Appendix A — Python Scientific Computing Reference

This appendix is a reference, not a tutorial. It gathers the computational tools assumed by the main chapters in one place, with runnable examples and pointers to where each tool is used. If you know Python but have not worked with NumPy, SciPy, or statsmodels before, read this appendix first — it provides the vocabulary the rest of the book speaks in.

import numpy as np
from scipy import stats, linalg, optimize
import statsmodels.api as sm
import matplotlib.pyplot as plt

plt.style.use('assets/book.mplstyle')
rng = np.random.default_rng(42)

NumPy Essentials

NumPy is the foundation everything else sits on. Every chapter in this book creates, reshapes, slices, and combines arrays. This section covers the patterns that recur most often.

A.0.1 Array creation

The most common constructors:

# Zeros, ones, and identity
print("zeros(3):       ", np.zeros(3))
print("ones((2,3)):  \n", np.ones((2, 3)))
print("eye(3):        \n", np.eye(3))

# Sequences
print("arange(0, 1, 0.3):", np.arange(0, 1, 0.3))
print("linspace(0, 1, 5):", np.linspace(0, 1, 5))
zeros(3):        [0. 0. 0.]
ones((2,3)):  
 [[1. 1. 1.]
 [1. 1. 1.]]
eye(3):        
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
arange(0, 1, 0.3): [0.  0.3 0.6 0.9]
linspace(0, 1, 5): [0.   0.25 0.5  0.75 1.  ]

arange uses a step size and nominally excludes the endpoint, but floating-point rounding can make it appear anyway. linspace uses a count and always includes both endpoints; prefer it for grids where exact endpoints matter.
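
A quick sketch of the hazard: with a fractional step, the element count (and whether the nominal endpoint sneaks in) depends on rounding, so it can vary with the step chosen.

# Hedged illustration: the element count depends on rounding
for step in [0.1, 0.2, 0.3]:
    grid = np.arange(0, 1, step)
    print(f"step={step}: len={len(grid)}, last={grid[-1]:.10f}")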

# From existing data
a = np.array([1.0, 2.0, 3.0])
print("array:    ", a, "dtype:", a.dtype)

# Diagonal matrix from a vector
print("diag([1,2,3]):\n", np.diag([1, 2, 3]))

# Repeat and tile
print("repeat:   ", np.repeat([1, 2], 3))
print("tile:     ", np.tile([1, 2], 3))
array:     [1. 2. 3.] dtype: float64
diag([1,2,3]):
 [[1 0 0]
 [0 2 0]
 [0 0 3]]
repeat:    [1 1 1 2 2 2]
tile:      [1 2 1 2 1 2]

A.0.2 Indexing and slicing

NumPy arrays support integer indexing, slicing, boolean indexing, and fancy (array) indexing. Slices return views — modifying the slice modifies the original. Fancy indexing returns copies.

X = np.arange(12).reshape(3, 4)
print("X:\n", X)

# Basic slicing (views)
print("Row 0:      ", X[0])
print("Col 1:      ", X[:, 1])
print("Submatrix:\n", X[:2, 1:3])

# Boolean indexing (copy)
mask = X > 5
print("X[X > 5]:   ", X[mask])

# Fancy indexing (copy)
rows = np.array([0, 2])
print("Rows 0,2: \n", X[rows])
X:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Row 0:       [0 1 2 3]
Col 1:       [1 5 9]
Submatrix:
 [[1 2]
 [5 6]]
X[X > 5]:    [ 6  7  8  9 10 11]
Rows 0,2: 
 [[ 0  1  2  3]
 [ 8  9 10 11]]

A.0.3 Broadcasting

Broadcasting lets NumPy operate on arrays with different shapes without explicit loops or copies. The rule: dimensions are compared from the right; they are compatible if they are equal or one of them is 1. A missing dimension is treated as 1.

# Scalar broadcast
a = np.array([1, 2, 3])
print("a + 10:", a + 10)

# Column vector + row vector -> matrix
col = np.array([[1], [2], [3]])      # shape (3, 1)
row = np.array([10, 20, 30, 40])     # shape (4,)
print("col + row:\n", col + row)     # shape (3, 4)

# Centering columns of a matrix (subtract column means)
X = rng.standard_normal((4, 3))
X_centered = X - X.mean(axis=0)      # (4,3) - (3,) -> (4,3)
print("Column means after centering:",
      X_centered.mean(axis=0).round(15))
a + 10: [11 12 13]
col + row:
 [[11 21 31 41]
 [12 22 32 42]
 [13 23 33 43]]
Column means after centering: [ 0. -0.  0.]

Broadcasting is used throughout the book — centering data (Ch. 11), evaluating densities on grids (Ch. 7), and computing pairwise distances (Ch. 19) all rely on it.

A.0.4 Common pitfalls

Three NumPy traps catch even experienced users. Recognizing them early saves hours of silent-bug debugging.

1. Views vs copies. Basic slicing returns a view that shares memory with the original array. Modifying the slice modifies the original — sometimes intentionally, often not:

original = np.array([10, 20, 30, 40, 50])
view_slice = original[1:4]   # basic slice -> view
view_slice[0] = 999
# The original is now corrupted:
print(f"original after view mutation: {original}")

# Safe pattern: call .copy() when you need independence
original = np.array([10, 20, 30, 40, 50])
safe_copy = original[1:4].copy()
safe_copy[0] = 999
print(f"original after copy mutation: {original}")
original after view mutation: [ 10 999  30  40  50]
original after copy mutation: [10 20 30 40 50]

Boolean and fancy indexing always return copies, so a[a > 0] = 0 modifies a in-place but b = a[a > 0] produces an independent array. See the expanded discussion later in this appendix.
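
A short illustration of that distinction:

a = np.array([-1.0, 2.0, -3.0, 4.0])
a[a < 0] = 0             # in-place assignment through a boolean mask
print(a)                 # [0. 2. 0. 4.]
b = a[a > 0]             # boolean indexing on the right-hand side: a copy
b[0] = -99
print(a)                 # unchanged: [0. 2. 0. 4.]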

2. Integer vs float arrays. NumPy infers dtype from the input. An array created from integers will silently truncate floats:

int_array = np.array([1, 2, 3])       # dtype: int64
int_array[0] = 1.9
# The 1.9 is truncated to 1, no warning:
print(f"int array after assigning 1.9: {int_array}")
print(f"dtype: {int_array.dtype}")

# Safe pattern: specify float dtype at creation
float_array = np.array([1, 2, 3], dtype=float)
float_array[0] = 1.9
print(f"float array: {float_array}")
int array after assigning 1.9: [1 2 3]
dtype: int64
float array: [1.9 2.  3. ]

This trap surfaces when building arrays from counts or indices and later assigning computed values. Always create numerical arrays with dtype=float when they will hold non-integer results.

3. Shape (n,) vs (n, 1). A 1-D array with shape (n,) is not a column vector. Matrix operations that expect a column vector — or code that relies on 2-D broadcasting rules — will silently produce wrong shapes:

v = np.array([1, 2, 3])               # shape (3,)
print(f"v.shape:       {v.shape}")
print(f"v.T.shape:     {v.T.shape}")  # transpose is a no-op!

# Outer product attempt: v @ v.T gives a scalar, not a matrix
print(f"v @ v.T:       {v @ v.T}")    # dot product = 14

# Fix: reshape to (3,1) to get a true column vector
v_col = v.reshape(-1, 1)              # shape (3, 1)
print(f"v_col @ v_col.T:\n{v_col @ v_col.T}")  # 3x3 outer product
v.shape:       (3,)
v.T.shape:     (3,)
v @ v.T:       14
v_col @ v_col.T:
[[1 2 3]
 [2 4 6]
 [3 6 9]]

When statsmodels or scipy expects a 2-D array and receives a 1-D array, the result is often a confusing shape mismatch error. Use x.reshape(-1, 1) to promote a 1-D array to a column vector, or np.atleast_2d(x).T for the same effect.

A.0.5 Reshape, ravel, and transpose

a = np.arange(12)
print("reshape(3,4):\n", a.reshape(3, 4))
print("reshape(3,-1):\n", a.reshape(3, -1))  # infer columns

M = a.reshape(3, 4)
print("ravel:", M.ravel())         # back to 1-D (view)
print("M.T:\n", M.T)              # transpose
reshape(3,4):
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
reshape(3,-1):
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
ravel: [ 0  1  2  3  4  5  6  7  8  9 10 11]
M.T:
 [[ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]]

Use -1 in one dimension to let NumPy infer it. ravel() returns a view when possible; flatten() always returns a copy.
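
np.shares_memory makes the view/copy difference observable directly:

M_demo = np.arange(6).reshape(2, 3)
print(np.shares_memory(M_demo, M_demo.ravel()))    # True: view when possible
print(np.shares_memory(M_demo, M_demo.flatten()))  # False: always a copy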

A.0.6 Stacking and splitting

Combining arrays is a constant need — assembling design matrices, collecting bootstrap samples, or building block-diagonal structures.

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Vertical and horizontal stacking
print("vstack:\n", np.vstack([a, b]))
print("hstack: ", np.hstack([a, b]))

# column_stack: treat 1-D arrays as columns
print("column_stack:\n", np.column_stack([a, b]))

# concatenate: general-purpose
M1 = np.ones((2, 3))
M2 = np.zeros((2, 3))
print("concatenate (axis=0):\n",
      np.concatenate([M1, M2], axis=0))
vstack:
 [[1 2 3]
 [4 5 6]]
hstack:  [1 2 3 4 5 6]
column_stack:
 [[1 4]
 [2 5]
 [3 6]]
concatenate (axis=0):
 [[1. 1. 1.]
 [1. 1. 1.]
 [0. 0. 0.]
 [0. 0. 0.]]

np.column_stack is the most common pattern in this book — it builds the design matrix \(\mathbf{X}\) from individual predictor vectors. np.vstack appears when stacking observation batches.
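
For example, assembling a small design matrix with an intercept column is a single call (a minimal sketch; sm.add_constant, used later in this appendix, automates the intercept column):

x1_demo = np.array([0.5, -1.2, 0.3, 2.0, -0.7])
x2_demo = np.array([1.0, 0.4, -0.9, 0.1, 1.5])
X_design = np.column_stack([np.ones(5), x1_demo, x2_demo])
print(X_design.shape)    # (5, 3): intercept plus two predictors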

A.0.7 Useful functions

Several NumPy functions appear often enough to warrant a brief catalog:

x = np.array([-2.5, 0.3, 1.7, 4.2, -0.1])

# where: element-wise conditional
print("where(x>0, x, 0):", np.where(x > 0, x, 0))

# clip: bound values
print("clip(-1, 2):     ", np.clip(x, -1, 2))

# searchsorted: binary search on sorted array
sorted_a = np.array([1, 3, 5, 7, 9])
print("searchsorted(4): ", np.searchsorted(sorted_a, 4))

# cumsum and diff
print("cumsum:          ", np.cumsum([1, 2, 3, 4]))
print("diff:            ", np.diff([1, 4, 6, 10]))

# einsum: Einstein summation (trace example)
A = np.array([[1, 2], [3, 4]])
print("trace via einsum:", np.einsum('ii', A))
print("matmul via einsum:\n",
      np.einsum('ij,jk->ik', A, A))
where(x>0, x, 0): [0.  0.3 1.7 4.2 0. ]
clip(-1, 2):      [-1.   0.3  1.7  2.  -0.1]
searchsorted(4):  2
cumsum:           [ 1  3  6 10]
diff:             [3 2 4]
trace via einsum: 5
matmul via einsum:
 [[ 7 10]
 [15 22]]

np.einsum is the Swiss army knife for tensor contractions. It appears in Ch. 19 for multivariate statistics and in Ch. 14 for mixed-effects computations where explicit loops over groups would be slow.
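
As a taste of that style, here is a hedged sketch of pairwise squared Euclidean distances without loops (a local generator is used so the shared rng state is untouched):

rng_demo = np.random.default_rng(0)
P = rng_demo.standard_normal((5, 3))        # 5 points in R^3
sq_norms = np.einsum('ij,ij->i', P, P)      # row-wise squared norms
# ||p_i - p_j||^2 = ||p_i||^2 + ||p_j||^2 - 2 p_i . p_j
D2 = sq_norms[:, None] + sq_norms[None, :] - 2 * (P @ P.T)
print(D2.diagonal().round(12))              # zeros, up to rounding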

A.0.8 Views vs copies: a common trap

Understanding when NumPy returns a view (shared memory) vs a copy (independent memory) prevents subtle bugs:

a = np.arange(5)
b = a[1:4]        # slice -> view
b[0] = 99
print(f"a after modifying slice: {a}")
# a is modified because b is a view

c = a[[1, 2, 3]]  # fancy index -> copy
c[0] = -1
print(f"a after modifying fancy: {a}")
# a is NOT modified because c is a copy
a after modifying slice: [ 0 99  2  3  4]
a after modifying fancy: [ 0 99  2  3  4]

Rule of thumb: basic slicing produces views; boolean and fancy indexing produce copies. When in doubt, call .copy() explicitly. This matters in Ch. 9 (bootstrap resampling) where accidentally modifying the original data through a view would corrupt results.

Matplotlib Essentials

Every figure in this book is produced with Matplotlib. The book’s style sheet (assets/book.mplstyle) sets publication-quality defaults — serif fonts, suppressed top/right spines, a colorblind-friendly cycle with alternating line styles, and 300 dpi output.

A.0.9 The Figure/Axes model

Matplotlib distinguishes between the figure (the canvas) and one or more axes (the individual plots). The explicit axes API gives full control:

fig, ax = plt.subplots()
x = np.linspace(0, 2 * np.pi, 200)
ax.plot(x, np.sin(x), label="sin")
ax.plot(x, np.cos(x), label="cos")
ax.set_xlabel("x")
ax.set_ylabel("f(x)")
ax.set_title("Sine and cosine")
ax.legend()
plt.show()

For multiple panels, use subplots with rows and columns:

fig, axes = plt.subplots(1, 3, figsize=(10, 3))
t = np.linspace(0, 4 * np.pi, 300)
for i, (func, name) in enumerate([
    (np.sin, "sin"), (np.cos, "cos"),
    (lambda x: np.sin(x) * np.exp(-x / 5), "damped sin"),
]):
    axes[i].plot(t, func(t))
    axes[i].set_title(name)
fig.tight_layout()
plt.show()

A.0.10 Common plot types used in the book

The chapters use a small set of plot types repeatedly; a calling-convention sketch in code follows the list below.

When to use each plot type:

  • Line plot — use for any quantity that evolves over an ordered index: optimization convergence traces (objective value vs iteration), power curves, or time series. The x-axis implies ordering.
  • Scatter plot — use to reveal the relationship between two continuous variables: residuals vs fitted values, observed vs predicted, or bivariate distributions. No ordering is implied; the eye looks for correlation, clusters, or outliers.
  • Step plot — use for quantities that jump at discrete points: empirical CDFs (Ch. 9), Kaplan–Meier survival curves (Ch. 18), or any piecewise-constant function. where="post" means the step rises at the data point and stays flat until the next one.
  • Histogram — use to visualize the distribution of a single variable: MCMC posterior samples (Ch. 8), bootstrap replications (Ch. 9), or residual distributions (Ch. 11). Set density=True to normalize the area to 1 so the histogram is comparable to a PDF overlay.
  • fill_between — use to show uncertainty bands around a curve: confidence intervals for regression lines (Ch. 11), credible intervals for posterior predictions (Ch. 8), or pointwise bootstrap bands (Ch. 9). The shaded region communicates “the true curve likely lies in here.”
  • Errorbar — use to display point estimates with error bars: coefficient estimates \(\pm\) standard errors (Ch. 10), treatment effects with confidence intervals (Ch. 21–23), or any setting where you want to compare several estimates side-by-side with their uncertainty.
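
A minimal calling-convention sketch covering the list above (synthetic data from a local generator; the figures are illustrative, not book output):

rng_plot = np.random.default_rng(0)
fig, axes = plt.subplots(2, 3, figsize=(10, 6))
t = np.linspace(0, 10, 100)
axes[0, 0].plot(t, np.cumsum(rng_plot.standard_normal(100)))     # line
axes[0, 1].scatter(rng_plot.standard_normal(50),
                   rng_plot.standard_normal(50), s=12)           # scatter
xs = np.sort(rng_plot.standard_normal(30))
axes[0, 2].step(xs, np.arange(1, 31) / 30, where="post")         # ECDF
axes[1, 0].hist(rng_plot.standard_normal(500), bins=30,
                density=True)                                    # histogram
y_hat = np.sin(t)
axes[1, 1].plot(t, y_hat)
axes[1, 1].fill_between(t, y_hat - 0.3, y_hat + 0.3, alpha=0.3)  # band
axes[1, 2].errorbar([0, 1, 2], [1.0, 1.4, 0.8],
                    yerr=[0.2, 0.15, 0.25], fmt="o")             # errorbar
fig.tight_layout()
plt.show()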

A.0.11 Contour plots

Contour plots visualize objective-function landscapes (Ch. 2–5):

x_grid = np.linspace(-2, 2, 200)
y_grid = np.linspace(-1, 3, 200)
Xg, Yg = np.meshgrid(x_grid, y_grid)
Z = (1 - Xg)**2 + 100 * (Yg - Xg**2)**2  # Rosenbrock

fig, ax = plt.subplots(figsize=(7, 5))
cs = ax.contour(
    Xg, Yg, np.log10(Z + 1),
    levels=15, cmap="viridis",
)
ax.clabel(cs, fontsize=8)
ax.plot(1, 1, "r*", markersize=12, label="minimum")
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
ax.set_title("Rosenbrock log-contours")
ax.legend()
plt.show()

A.0.12 Saving figures

fig.savefig("figure.pdf")         # vector, for print
fig.savefig("figure.png")         # raster, for screen
fig.savefig("figure.png", dpi=300, bbox_inches="tight")

The book style sheet sets savefig.dpi: 300 and savefig.bbox: tight by default, so the explicit keyword arguments are rarely needed.

A.0.13 What book.mplstyle sets

The style file configures: serif fonts (Latin Modern Roman / Computer Modern), figure size 8 x 5 at 150 dpi, suppressed top and right spines, an 8-color cycle designed for grayscale readability with alternating solid/dashed/dot-dash/dotted line styles, outward-facing ticks, and frameless legends. Loading it via plt.style.use('assets/book.mplstyle') in each chapter’s import block ensures every figure in the book has a consistent appearance.
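
The style file itself ships with the book's source; the following is an equivalent-in-spirit sketch of the main settings (the rcParams keys are standard Matplotlib, but the specific values are assumptions reconstructed from the description above, not the file's verbatim contents):

plt.rcParams.update({
    "font.family":       "serif",
    "figure.figsize":    (8, 5),
    "figure.dpi":        150,
    "axes.spines.top":   False,
    "axes.spines.right": False,
    "xtick.direction":   "out",
    "ytick.direction":   "out",
    "legend.frameon":    False,
    "savefig.dpi":       300,
    "savefig.bbox":      "tight",
})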

Floating Point and Conditioning

Every computation in this book runs on IEEE 754 double-precision floating point. Key facts:

  • Machine epsilon: \(\epsilon_{\text{mach}} \approx 2.2 \times 10^{-16}\). Two numbers closer than this relative to their magnitude are indistinguishable.
  • Representable range: doubles span roughly \(10^{-308}\) to \(10^{308}\). Results below this range underflow (degrading through subnormals and eventually reaching zero); results above it overflow to inf.
  • Catastrophic cancellation: \(a - b\) when \(a \approx b\) loses significant digits. Rounding of this kind is why np.logaddexp(0, x) is preferred over the naive np.log(1 + np.exp(x)): the naive form overflows for large \(x\) and rounds \(1 + e^{x}\) to 1 once \(e^{x}\) drops below machine epsilon.

print(f"Machine epsilon:  {np.finfo(float).eps:.2e}")
print(f"Smallest positive: {np.finfo(float).tiny:.2e}")
print(f"Largest finite:    {np.finfo(float).max:.2e}")
Machine epsilon:  2.22e-16
Smallest positive: 2.23e-308
Largest finite:    1.80e+308

A.0.14 Catastrophic cancellation in practice

# Naive vs stable log-sigmoid
x_vals = np.array([0.0, 20.0, 50.0, 100.0, 500.0, 710.0])

print(f"{'x':>6}  {'naive':>22}  {'stable':>22}")
for xv in x_vals:
    naive = np.log(1 / (1 + np.exp(-xv)))
    stable = -np.logaddexp(0, -xv)
    print(f"{xv:6.0f}  {naive:22.15e}  {stable:22.15e}")
     x                   naive                  stable
     0  -6.931471805599453e-01  -6.931471805599453e-01
    20  -2.061153694291927e-09  -2.061153620314381e-09
    50   0.000000000000000e+00  -1.928749847963918e-22
   100   0.000000000000000e+00  -3.720075976020836e-44
   500   0.000000000000000e+00  -7.124576406741286e-218
   710   0.000000000000000e+00  -4.476286225675130e-309

By \(x = 50\) the naive version already returns exactly zero: np.exp(-x) falls below machine epsilon, so 1 + np.exp(-x) rounds to 1 and the log is 0. At \(x = 710\) the intermediate np.exp(-x) underflows entirely, and the mirrored computation np.log(1 + np.exp(x)) would overflow, since np.exp(710) exceeds the double range. The stable version handles the whole range correctly. This matters for log-likelihood evaluation in logistic regression (Ch. 12–13).

A.0.15 Condition number

The condition number \(\kappa(\mathbf{A}) = \|\mathbf{A}\| \cdot \|\mathbf{A}^{-1}\|\) measures how much a linear system \(\mathbf{Ax} = \mathbf{b}\) amplifies input errors. A system with \(\kappa = 10^k\) loses roughly \(k\) digits of accuracy.

# Well-conditioned vs ill-conditioned
print(f"Identity(5):  kappa = "
      f"{np.linalg.cond(np.eye(5)):.1f}")
print(f"Hilbert(5):   kappa = "
      f"{np.linalg.cond(linalg.hilbert(5)):.0f}")
print(f"Hilbert(10):  kappa = "
      f"{np.linalg.cond(linalg.hilbert(10)):.2e}")
print(f"Hilbert(15):  kappa = "
      f"{np.linalg.cond(linalg.hilbert(15)):.2e}")
Identity(5):  kappa = 1.0
Hilbert(5):   kappa = 476607
Hilbert(10):  kappa = 1.60e+13
Hilbert(15):  kappa = 4.24e+17

The Hilbert matrix is the classic example of near-singularity. By order 15 the condition number exceeds \(10^{17}\), meaning no digits in the solution are reliable. Condition numbers appear in: Ch. 2 (optimizer convergence speed depends on \(\kappa\) of the Hessian), Ch. 11 (multicollinearity diagnosis via VIF, which is a condition-number proxy), and Ch. 20 (weak instruments produce ill-conditioned moment matrices).
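
A hedged demonstration of the digit-loss rule: perturb \(\mathbf{b}\) in the tenth decimal place and watch the solution of a Hilbert system move in its leading digits (the exact error depends on the perturbation draw):

H = linalg.hilbert(10)                    # kappa ~ 1.6e13
x_true = np.ones(10)
b_h = H @ x_true
rng_h = np.random.default_rng(0)
b_pert = b_h + 1e-10 * rng_h.standard_normal(10)
x_pert = linalg.solve(H, b_pert)
print(f"max |x_pert - x_true| = {np.max(np.abs(x_pert - x_true)):.2e}")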

A.0.16 Useful safeguards

# log1p and expm1 avoid cancellation near zero
x_small = 1e-15
print(f"log(1 + x): {np.log(1 + x_small):.6e}")
print(f"log1p(x):   {np.log1p(x_small):.6e}")

# logsumexp for stable log-sum-of-exponentials
from scipy.special import logsumexp
log_vals = np.array([1000, 1001, 1002])
print(f"\nlogsumexp:  {logsumexp(log_vals):.6f}")
# naive would overflow:
# np.log(np.sum(np.exp(log_vals)))  -> inf
log(1 + x): 1.110223e-15
log1p(x):   1.000000e-15

logsumexp:  1002.407606

Dense Linear Algebra (scipy.linalg)

SciPy’s linalg module wraps LAPACK and provides the decompositions used throughout the book. Prefer scipy.linalg over numpy.linalg — it is a strict superset with better LAPACK coverage and consistent error handling.

A.0.17 Key decompositions

Decomposition      | scipy call      | Cost         | Used in
-------------------|-----------------|--------------|---------------------------------------
LU                 | linalg.lu       | \(O(n^3)\)   | General linear systems
Cholesky           | linalg.cholesky | \(O(n^3/3)\) | Positive-definite systems (Ch. 8, 14)
QR                 | linalg.qr       | \(O(mn^2)\)  | OLS via QR (Ch. 11)
SVD                | linalg.svd      | \(O(mn^2)\)  | PCA (Ch. 19), pseudoinverse
Eigendecomposition | linalg.eigh     | \(O(n^3)\)   | PCA, MANOVA (Ch. 19)

A.0.18 Solving linear systems

A = np.array([[4.0, 2.0], [2.0, 3.0]])
b = np.array([1.0, 2.0])

# General solve (uses LU internally)
x = linalg.solve(A, b)
print(f"solve:    x = {x.round(6)}")

# Positive-definite solve (uses Cholesky, ~2x faster)
x_pd = linalg.solve(
    A, b, assume_a="pos",
)
print(f"solve PD: x = {x_pd.round(6)}")

# Verify
print(f"A @ x = {(A @ x).round(6)}, b = {b}")
solve:    x = [-0.125  0.75 ]
solve PD: x = [-0.125  0.75 ]
A @ x = [1. 2.], b = [1. 2.]

assume_a="pos" tells linalg.solve the matrix is positive definite, so it uses Cholesky instead of LU — halving the flop count. This matters when solving covariance systems repeatedly (Ch. 8, Ch. 14).
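
When the same matrix must be solved against many right-hand sides, factor once and reuse the factor; a sketch with cho_factor/cho_solve:

A_pd = np.array([[4.0, 2.0], [2.0, 3.0]])
chol = linalg.cho_factor(A_pd)               # O(n^3) factorization, once
for rhs in (np.array([1.0, 2.0]), np.array([0.0, 1.0])):
    x_rhs = linalg.cho_solve(chol, rhs)      # O(n^2) per right-hand side
    print(x_rhs.round(6))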

A.0.19 Decomposition examples

A = np.array([[4.0, 2.0], [2.0, 3.0]])

# Cholesky
L = linalg.cholesky(A, lower=True)
print(f"Cholesky: L @ L.T = A? "
      f"{np.allclose(L @ L.T, A)}")

# SVD
U, S, Vt = linalg.svd(A)
print(f"SVD singular values: {S.round(3)}")
print(f"Condition number: {S[0] / S[1]:.3f}")

# Eigendecomposition (symmetric)
eigvals, eigvecs = linalg.eigh(A)
print(f"Eigenvalues: {eigvals.round(3)}")
Cholesky: L @ L.T = A? True
SVD singular values: [5.562 1.438]
Condition number: 3.866
Eigenvalues: [1.438 5.562]

Use linalg.eigh (not linalg.eig) for symmetric matrices. It exploits symmetry for better speed and numerical stability, and guarantees real eigenvalues. The general linalg.eig does not: it returns a complex eigenvalue array, and rounding can leave tiny spurious imaginary parts even for symmetric input.

A.0.20 When to use scipy.linalg vs numpy.linalg

numpy.linalg covers basic operations (solve, eig, svd, norm, cond). scipy.linalg adds Cholesky with lower control, LU factorization, specialized solvers (solve_triangular, solve_banded, cho_solve), matrix functions (expm, logm), and the assume_a parameter for solve. Throughout this book, we import from scipy.linalg exclusively.
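
For instance, solve_triangular exploits structure that a general solve would ignore; a minimal sketch of forward substitution:

L_tri = np.array([[2.0, 0.0],
                  [1.0, 3.0]])
b_tri = np.array([2.0, 7.0])
x_tri = linalg.solve_triangular(L_tri, b_tri, lower=True)
print(x_tri)    # [1. 2.]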

References: Trefethen and Bau (1997), Golub and Van Loan (2013).

scipy.optimize Quick Reference

The optimization chapters (Ch. 2–6) develop these methods in detail. This section is a lookup table.

A.0.21 minimize: unconstrained and constrained

Method       | Order   | Gradient | Hessian | Best for                  | Chapter
-------------|---------|----------|---------|---------------------------|--------
Nelder-Mead  | 0       | No       | No      | Non-smooth, low-dim       | 5
Powell       | 0       | No       | No      | Non-smooth, moderate dim  | 5
CG           | 1       | Yes      | No      | Large-scale, smooth       | 2
BFGS         | quasi-2 | Yes      | No      | General smooth, default   | 2
L-BFGS-B     | quasi-2 | Yes      | No      | Large-scale with bounds   | 4
Newton-CG    | 2       | Yes      | Yes/HVP | Moderate-scale, smooth    | 2
trust-ncg    | 2       | Yes      | Yes     | Small-to-moderate, smooth | 2
trust-krylov | 2       | Yes      | HVP     | Large-scale trust-region  | 4
COBYLA       | 0       | No       | No      | Constrained, no gradients | 3
SLSQP        | 1       | Yes      | No      | Constrained, smooth       | 3
trust-constr | 2       | Yes      | Yes     | Constrained, general      | 3

When method is omitted, minimize picks BFGS (unconstrained, no bounds), L-BFGS-B (if bounds are provided), or SLSQP (if constraints are given). Always spell out method explicitly (STEERING.md §5.5).

# Basic minimize call
result = optimize.minimize(
    fun=lambda x: (x[0] - 1)**2 + (x[1] - 2.5)**2,
    x0=np.array([0.0, 0.0]),
    method='BFGS',
)
print(f"x* = {result.x.round(6)}")
print(f"f(x*) = {result.fun:.2e}")
print(f"Converged: {result.success}")
print(f"Iterations: {result.nit}")
x* = [1.  2.5]
f(x*) = 1.97e-15
Converged: True
Iterations: 2

A.0.22 least_squares vs curve_fit

Both solve nonlinear least-squares problems (Ch. 6). least_squares is the general interface — you supply the residual vector \(\mathbf{r}(\mathbf{x})\) and optionally the Jacobian. curve_fit is a convenience wrapper for the common case of fitting a model function \(y = f(x; \boldsymbol{\theta})\) to data.

# least_squares: full control
def residuals(params, x, y):
    a, b, c = params
    return y - (a * np.exp(-b * x) + c)

x_data = np.linspace(0, 4, 50)
y_data = (2.5 * np.exp(-1.3 * x_data) + 0.5
          + 0.1 * rng.standard_normal(50))

res = optimize.least_squares(
    fun=residuals,
    x0=[1.0, 1.0, 0.0],
    args=(x_data, y_data),
    method='trf',
)
print(f"least_squares: params = {res.x.round(3)}")
print(f"Cost: {res.cost:.4f}")
least_squares: params = [2.58  1.397 0.519]
Cost: 0.2414

Feature              | least_squares                 | curve_fit
---------------------|-------------------------------|-----------------------------
Input                | Residual vector               | Model function
Bounds               | Yes                           | Yes
Jacobian             | User-supplied or finite-diff  | Optional jac or finite-diff
Covariance of params | Manual from Jacobian          | Returned as pcov
Robust losses        | loss='huber', 'cauchy', etc.  | No
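
The curve_fit side of the same problem, for comparison (a sketch reusing x_data and y_data from the block above):

def model_f(x, a, b, c):
    return a * np.exp(-b * x) + c

popt, pcov = optimize.curve_fit(
    model_f, x_data, y_data,
    p0=[1.0, 1.0, 0.0],
)
perr = np.sqrt(np.diag(pcov))    # standard errors from the pcov diagonal
print(f"curve_fit params:     {popt.round(3)}")
print(f"curve_fit std errors: {perr.round(3)}")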

A.0.23 root: nonlinear equations

optimize.root solves \(\mathbf{F}(\mathbf{x}) = \mathbf{0}\). For scalar equations use optimize.brentq (bracketing, guaranteed convergence) or optimize.newton (Newton–Raphson). These appear in Ch. 10 for solving score equations and in Ch. 18 for survival function inversion.

# Scalar root-finding: solve x - cos(x) = 0
root_val = optimize.brentq(
    f=lambda x: x - np.cos(x),
    a=0, b=1,
)
print(f"brentq: x = {root_val:.10f}")
print(f"verify: x - cos(x) = "
      f"{root_val - np.cos(root_val):.2e}")
brentq: x = 0.7390851332
verify: x - cos(x) = -7.88e-15
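
For systems, optimize.root takes a vector-valued function; a sketch solving the intersection of the unit circle with the line \(x = y\):

def F_sys(v):
    x1_, x2_ = v
    return [x1_**2 + x2_**2 - 1.0, x1_ - x2_]

sol = optimize.root(F_sys, x0=[1.0, 0.0], method='hybr')
print(f"root: {sol.x.round(6)}")     # expect roughly (0.7071, 0.7071)
print(f"converged: {sol.success}")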

A.0.24 The OptimizeResult object

Most scipy.optimize drivers (minimize, least_squares, root, and their relatives) return an OptimizeResult, a dictionary-like object with standardized fields; scalar helpers such as brentq return a bare float unless full_output=True:

Attribute | Meaning
----------|------------------------------------------
.x        | Solution vector
.fun      | Objective value at solution
.success  | Whether the optimizer converged
.message  | Human-readable convergence status
.nit      | Number of iterations
.nfev     | Number of function evaluations
.jac      | Jacobian at solution (if computed)
.hess_inv | Inverse Hessian approximation (BFGS)

Always check .success and .message before trusting .x. An optimizer that hits the iteration limit returns success=False — the solution may still be close to optimal, but you should not assume so without checking. This is a recurring theme in Ch. 2–6.
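
A defensive pattern worth copying: deliberately starve the optimizer of iterations, then inspect the flags before using .x (a sketch):

res_short = optimize.minimize(
    fun=lambda x: (x[0] - 1)**2 + (x[1] - 2.5)**2,
    x0=np.array([0.0, 0.0]),
    method='Nelder-Mead',
    options={'maxiter': 3},    # force early termination
)
if not res_short.success:
    print(f"Not converged: {res_short.message}")
print(f"Best point so far: {res_short.x.round(3)}")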

scipy.stats Quick Reference

A.0.25 The frozen distribution API

All distributions in scipy.stats follow the same pattern: create a frozen distribution object, then call methods on it. This is the single most important API pattern in the book.

# Create a frozen distribution
rv = stats.norm(loc=0, scale=1)

# Core methods (same for every distribution)
x = np.linspace(-3, 3, 7)
print("pdf:", rv.pdf(x).round(4))
print("cdf:", rv.cdf(x).round(4))
print("ppf (quantile):", rv.ppf([0.025, 0.5, 0.975]).round(4))
print("mean:", rv.mean(), " var:", rv.var())
print("interval(0.95):", rv.interval(0.95))
print("rvs:", rv.rvs(size=5, random_state=rng).round(3))
pdf: [0.0044 0.054  0.242  0.3989 0.242  0.054  0.0044]
cdf: [0.0013 0.0228 0.1587 0.5    0.8413 0.9772 0.9987]
ppf (quantile): [-1.96  0.    1.96]
mean: 0.0  var: 1.0
interval(0.95): (np.float64(-1.959963984540054), np.float64(1.959963984540054))
rvs: [ 0.952 -2.252 -0.827 -0.782 -2.32 ]

Discrete distributions use pmf (probability mass function) instead of pdf:

rv_pois = stats.poisson(mu=5)
print("pmf(3):", rv_pois.pmf(3).round(4))
print("cdf(3):", rv_pois.cdf(3).round(4))
print("mean:", rv_pois.mean(), " var:", rv_pois.var())
pmf(3): 0.1404
cdf(3): 0.265
mean: 5.0  var: 5.0

A.0.26 Distribution catalog

Distribution        | scipy call                     | Shape params | Used in
--------------------|--------------------------------|--------------|-----------
Normal              | norm(loc, scale)               | —            | Throughout
Student \(t\)       | t(df)                          | df           | Ch. 10–11
Chi-squared         | chi2(df)                       | df           | Ch. 10
\(F\)               | f(dfn, dfd)                    | dfn, dfd     | Ch. 10–11
Gamma               | gamma(a, scale=s)              | a            | Ch. 7, 18
Beta                | beta(a, b)                     | a, b         | Ch. 7, 9
Exponential         | expon(scale=s)                 | —            | Ch. 18
Weibull             | weibull_min(c, scale=s)        | c            | Ch. 18
Multivariate normal | multivariate_normal(mean, cov) | —            | Ch. 8, 19
Poisson             | poisson(mu)                    | mu           | Ch. 12
Binomial            | binom(n, p)                    | n, p         | Ch. 12

distributions = {
    'Normal(0,1)':    stats.norm(0, 1),
    'Student t(5)':   stats.t(df=5),
    'Chi-sq(3)':      stats.chi2(df=3),
    'F(5,20)':        stats.f(dfn=5, dfd=20),
    'Gamma(2,1)':     stats.gamma(a=2, scale=1),
    'Exp(1)':         stats.expon(scale=1),
}

print(f"{'Distribution':<15} {'Mean':>8} {'Var':>8}"
      f" {'95% CI':>22}")
for name, rv in distributions.items():
    ci = rv.interval(0.95)
    print(f"{name:<15} {rv.mean():>8.3f} {rv.var():>8.3f}"
          f" {'({:.2f}, {:.2f})'.format(*ci):>22}")
Distribution        Mean      Var                 95% CI
Normal(0,1)        0.000    1.000          (-1.96, 1.96)
Student t(5)       0.000    1.667          (-2.57, 2.57)
Chi-sq(3)          3.000    6.000           (0.22, 9.35)
F(5,20)            1.111    0.710           (0.16, 3.29)
Gamma(2,1)         2.000    2.000           (0.24, 5.57)
Exp(1)             1.000    1.000           (0.03, 3.69)

A.0.27 Fitting distributions to data

# Generate data from a gamma distribution
data = stats.gamma.rvs(
    a=2.5, scale=1.5,
    size=500, random_state=rng,
)

# MLE fit
a_hat, loc_hat, scale_hat = stats.gamma.fit(
    data, floc=0,  # fix location to 0
)
print(f"True:  a=2.500, scale=1.500")
print(f"MLE:   a={a_hat:.3f}, scale={scale_hat:.3f}")
True:  a=2.500, scale=1.500
MLE:   a=2.611, scale=1.395

floc=0 fixes the location parameter — standard for distributions that are naturally non-negative. Without it, the optimizer may find a negative location that improves the likelihood but produces a nonsensical model.

A.0.28 Hypothesis tests catalog

Test               | scipy call                       | Null hypothesis
-------------------|----------------------------------|-----------------------------------
One-sample \(t\)   | ttest_1samp(x, mu0)              | \(\mu = \mu_0\)
Two-sample \(t\)   | ttest_ind(x, y)                  | \(\mu_x = \mu_y\)
Welch \(t\)        | ttest_ind(x, y, equal_var=False) | \(\mu_x = \mu_y\)
Paired \(t\)       | ttest_rel(x, y)                  | \(\mu_{x-y} = 0\)
Chi-squared GOF    | chisquare(obs, exp)              | Observed matches expected
\(F\)-test (ANOVA) | f_oneway(*groups)                | All group means equal
Pearson \(r\)      | pearsonr(x, y)                   | \(\rho = 0\)
Spearman \(\rho\)  | spearmanr(x, y)                  | No monotonic association
KS test            | kstest(x, cdf)                   | \(x\) follows the given CDF
Shapiro–Wilk       | shapiro(x)                       | \(x\) is normally distributed
Mann–Whitney \(U\) | mannwhitneyu(x, y)               | Stochastic equality
Kruskal–Wallis     | kruskal(*groups)                 | Stochastic equality (\(k\) groups)

A.0.29 Quasi-Monte Carlo

scipy.stats.qmc provides low-discrepancy sequences for numerical integration and space-filling experimental designs (Ch. 7):

from scipy.stats import qmc

sampler = qmc.Sobol(d=2, scramble=True, seed=rng)
points = sampler.random(64)
print(f"Sobol 64 points in [0,1]^2, "
      f"discrepancy = {qmc.discrepancy(points):.6f}")
Sobol 64 points in [0,1]^2, discrepancy = 0.000156

Parametric Hypothesis Testing

A.0.30 The \(t\), \(F\), and \(\chi^2\) tests

These are the building blocks for Wald, score, and likelihood-ratio tests (Ch. 10).

# One-sample t-test
x = rng.normal(loc=0.5, scale=1, size=30)
t_stat, t_pval = stats.ttest_1samp(x, popmean=0)
print(f"t-test (H0: mu=0): t={t_stat:.3f}, "
      f"p={t_pval:.4f}")

# Two-sample Welch t-test
y = rng.normal(loc=0, scale=1, size=30)
t2_stat, t2_pval = stats.ttest_ind(
    x, y, equal_var=False,
)
print(f"Welch t-test: t={t2_stat:.3f}, "
      f"p={t2_pval:.4f}")

# Chi-squared goodness of fit
observed = np.array([45, 35, 20])
expected = np.array([40, 40, 20])
chi2_stat, chi2_pval = stats.chisquare(
    observed, f_exp=expected,
)
print(f"Chi-sq test: stat={chi2_stat:.3f}, "
      f"p={chi2_pval:.4f}")
t-test (H0: mu=0): t=1.393, p=0.1742
Welch t-test: t=1.448, p=0.1531
Chi-sq test: stat=1.250, p=0.5353

A.0.31 One-way ANOVA (\(F\)-test)

The one-way ANOVA tests whether \(k\) groups share the same mean. Under the null, the \(F\) statistic follows an \(F_{k-1, N-k}\) distribution. This is the \(k\)-group generalization of the two-sample \(t\)-test and appears as a building block in Ch. 10–11.

group_a = rng.normal(loc=5.0, scale=1.0, size=30)
group_b = rng.normal(loc=5.5, scale=1.0, size=30)
group_c = rng.normal(loc=6.0, scale=1.0, size=30)

f_stat, f_pval = stats.f_oneway(
    group_a, group_b, group_c,
)
print(f"One-way ANOVA: F={f_stat:.3f}, "
      f"p={f_pval:.4f}")
One-way ANOVA: F=6.535, p=0.0023
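
To connect the statistic to its definition, a sketch computing \(F\) by hand from the between- and within-group sums of squares (it should reproduce f_oneway up to rounding):

groups = [group_a, group_b, group_c]
N_tot = sum(len(g) for g in groups)
k_grp = len(groups)
grand_mean = np.concatenate(groups).mean()
ssb = sum(len(g) * (g.mean() - grand_mean)**2 for g in groups)   # between
ssw = sum(((g - g.mean())**2).sum() for g in groups)             # within
f_manual = (ssb / (k_grp - 1)) / (ssw / (N_tot - k_grp))
print(f"manual F = {f_manual:.3f}")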

Nonparametric Tests

# Mann-Whitney U (two-sample rank test)
u_stat, u_pval = stats.mannwhitneyu(
    x, y, alternative='two-sided',
)
print(f"Mann-Whitney U: stat={u_stat:.1f}, "
      f"p={u_pval:.4f}")

# Kolmogorov-Smirnov (distribution comparison)
# Note: estimating loc/scale from the data makes the nominal
# p-value conservative (the Lilliefors correction addresses this)
ks_stat, ks_pval = stats.kstest(
    x, 'norm', args=(x.mean(), x.std()),
)
print(f"KS test: stat={ks_stat:.3f}, "
      f"p={ks_pval:.4f}")

# Shapiro-Wilk (normality)
sw_stat, sw_pval = stats.shapiro(x)
print(f"Shapiro-Wilk: stat={sw_stat:.3f}, "
      f"p={sw_pval:.4f}")

# Kruskal-Wallis (k-sample rank test)
kw_stat, kw_pval = stats.kruskal(
    group_a, group_b, group_c,
)
print(f"Kruskal-Wallis: stat={kw_stat:.3f}, "
      f"p={kw_pval:.4f}")
Mann-Whitney U: stat=561.0, p=0.1023
KS test: stat=0.124, p=0.7009
Shapiro-Wilk: stat=0.981, p=0.8527
Kruskal-Wallis: stat=11.872, p=0.0026

Correlation and Dependence

x_corr = rng.standard_normal(100)
y_corr = 0.7 * x_corr + 0.3 * rng.standard_normal(100)

# Pearson (linear association)
r, p = stats.pearsonr(x_corr, y_corr)
print(f"Pearson:  r={r:.3f}, p={p:.4f}")

# Spearman (monotonic association, rank-based)
rho, p_s = stats.spearmanr(x_corr, y_corr)
print(f"Spearman: rho={rho:.3f}, p={p_s:.4f}")

# Kendall (concordance, rank-based)
tau, p_k = stats.kendalltau(x_corr, y_corr)
print(f"Kendall:  tau={tau:.3f}, p={p_k:.4f}")
Pearson:  r=0.933, p=0.0000
Spearman: rho=0.928, p=0.0000
Kendall:  tau=0.774, p=0.0000

A.0.32 Partial correlation

Partial correlation measures the linear association between two variables after removing the effect of one or more confounders. It is used in Ch. 11 (regression diagnostics) and Ch. 17 (testing for Granger-causal relationships after conditioning on other series).

# Partial correlation of x and y given z
z_corr = 0.5 * x_corr + 0.5 * rng.standard_normal(100)

# Residualize x and y on z
def residualize(var, controls):
    """Residuals from OLS of var on controls."""
    X_ctrl = np.column_stack(controls)
    X_aug = np.column_stack([np.ones(len(var)), X_ctrl])
    beta = np.linalg.lstsq(X_aug, var, rcond=None)[0]
    return var - X_aug @ beta

x_resid = residualize(x_corr, [z_corr])
y_resid = residualize(y_corr, [z_corr])

r_partial, p_partial = stats.pearsonr(x_resid, y_resid)
print(f"Partial corr (x,y | z): "
      f"r={r_partial:.3f}, p={p_partial:.4f}")
print(f"Raw Pearson (x,y):      "
      f"r={r:.3f}")
Partial corr (x,y | z): r=0.851, p=0.0000
Raw Pearson (x,y):      r=0.933

The partial correlation is typically lower than the raw correlation because the confounder \(z\) explains some of the apparent association.

A.0.33 Correlation matrix

data_corr = np.column_stack([x_corr, y_corr, z_corr])
R = np.corrcoef(data_corr, rowvar=False)
print("Correlation matrix:")
print(R.round(3))
Correlation matrix:
[[1.    0.933 0.766]
 [0.933 1.    0.729]
 [0.766 0.729 1.   ]]

Power and Sample Size

Statistical power is the probability of rejecting the null hypothesis when a true effect of the assumed size exists. For a two-sample \(t\)-test with effect size \(d = (\mu_1 - \mu_2)/\sigma\):

from statsmodels.stats.power import TTestIndPower

power_analysis = TTestIndPower()

# Sample size for 80% power, effect size d=0.5, alpha=0.05
n_needed = power_analysis.solve_power(
    effect_size=0.5, alpha=0.05, power=0.8,
    alternative='two-sided',
)
print(f"Sample size needed (per group): {n_needed:.0f}")

# Power for n=50 per group
power = power_analysis.solve_power(
    effect_size=0.5, alpha=0.05, nobs1=50,
    alternative='two-sided',
)
print(f"Power with n=50: {power:.3f}")
Sample size needed (per group): 64
Power with n=50: 0.697

fig, ax = plt.subplots()
ns = np.arange(10, 200)
powers = [
    power_analysis.solve_power(
        0.5, alpha=0.05, nobs1=n_,
        alternative='two-sided',
    )
    for n_ in ns
]
ax.plot(ns, powers, linewidth=1.5)
ax.axhline(
    0.8, color="gray", linestyle="--",
    alpha=0.5, label="80% power",
)
ax.axvline(
    n_needed, color="C1", linestyle="--",
    alpha=0.5, label=f"n={n_needed:.0f}",
)
ax.set_xlabel("Sample size per group")
ax.set_ylabel("Power")
ax.set_title("Power curve: two-sample t-test, d=0.5")
ax.legend()
plt.show()

Random Number Generation

The book uses the modern numpy.random.Generator API exclusively (STEERING.md §5.2). The legacy np.random.seed() / np.random.randn() API is never used.

# The canonical pattern
rng = np.random.default_rng(42)

# Key methods
print(f"Normal:    {rng.standard_normal(3).round(3)}")
print(f"Uniform:   {rng.uniform(0, 1, 3).round(3)}")
print(f"Poisson:   {rng.poisson(5, 3)}")
print(f"Binomial:  {rng.binomial(10, 0.3, 3)}")
print(f"Choice:    "
      f"{rng.choice([1, 2, 3, 4], 3, replace=True)}")
print(f"Permute:   "
      f"{rng.permutation([10, 20, 30, 40, 50])}")

# Reproducibility
rng1 = np.random.default_rng(42)
rng2 = np.random.default_rng(42)
print(f"\nReproducible: "
      f"{np.array_equal(rng1.standard_normal(5), rng2.standard_normal(5))}")
Normal:    [ 0.305 -1.04   0.75 ]
Uniform:   [0.697 0.094 0.976]
Poisson:   [7 3 9]
Binomial:  [2 4 4]
Choice:    [2 4 2]
Permute:   [10 40 50 30 20]

Reproducible: True

Key difference from legacy API: Generator uses the PCG64 PRNG (better statistical properties than Mersenne Twister). Each Generator instance maintains its own state, so multiple generators can run independently without global state conflicts.
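
For independent streams in parallel work, spawn child generators rather than guessing at seed offsets; a sketch using Generator.spawn (available in NumPy 1.25+, so the pinned version below qualifies):

parent = np.random.default_rng(42)
child_a, child_b = parent.spawn(2)    # statistically independent streams
print(child_a.standard_normal(3).round(3))
print(child_b.standard_normal(3).round(3))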

A.0.34 Passing rng to library functions

scipy and statsmodels functions that accept randomness use the random_state parameter:

# scipy.stats distributions
samples = stats.norm.rvs(
    loc=0, scale=1, size=5,
    random_state=rng,
)
print(f"scipy.stats.rvs: {samples.round(3)}")

# scipy.stats.bootstrap (Ch. 9)
data_boot = (rng.standard_normal(50),)
res = stats.bootstrap(
    data_boot, np.mean,
    n_resamples=999,
    random_state=rng,
    method='percentile',
)
print(f"Bootstrap CI: ({res.confidence_interval.low:.3f},"
      f" {res.confidence_interval.high:.3f})")
scipy.stats.rvs: [ 1.129 -0.114 -0.84  -0.824  0.651]
Bootstrap CI: (-0.201, 0.169)

Statsmodels Patterns

Statsmodels is the primary library for chapters 10–18. Its API follows a distinctive pattern that differs from scipy’s functional style.

A.0.35 The .fit() → results pattern

Every statsmodels model follows the same two-step workflow: construct the model, then call .fit() to obtain a results object. All inference — coefficients, standard errors, confidence intervals, hypothesis tests — lives on the results object.

# Generate data
n = 100
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
y = 1.5 + 2.0 * x1 - 0.5 * x2 + rng.standard_normal(n)

# Array API: add constant manually
X = sm.add_constant(np.column_stack([x1, x2]))
model = sm.OLS(endog=y, exog=X)
results = model.fit()
print(results.summary().tables[1])
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.6404      0.099     16.498      0.000       1.443       1.838
x1             1.9633      0.102     19.326      0.000       1.762       2.165
x2            -0.4584      0.094     -4.853      0.000      -0.646      -0.271
==============================================================================

A.0.36 from_formula vs array API

Statsmodels supports both an array API (shown above) and a formula API using patsy/formulaic:

import pandas as pd

df = pd.DataFrame({'y': y, 'x1': x1, 'x2': x2})

# Formula API: intercept is automatic
results_f = sm.OLS.from_formula(
    'y ~ x1 + x2', data=df,
).fit()
print(f"Array API params:   "
      f"{results.params.round(3)}")
print(f"Formula API params: "
      f"{results_f.params.round(3)}")
Array API params:   [ 1.64   1.963 -0.458]
Formula API params: Intercept    1.640
x1           1.963
x2          -0.458
dtype: float64

The formula API adds the intercept automatically (an implicit 1 + term). Use y ~ x1 + x2 - 1 to suppress it. The array API requires sm.add_constant() explicitly.

In this book, the array API is preferred for transparency — the reader sees exactly what the design matrix \(\mathbf{X}\) contains. The formula API appears in examples where it makes the model specification more readable, particularly for interaction terms and categorical variables.
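
A sketch of where the formula API earns its keep: interactions and categoricals in one readable specification (the two-level factor g is invented here for illustration):

df['g'] = np.where(x1 > 0, 'a', 'b')    # hypothetical grouping column
res_cat = sm.OLS.from_formula(
    'y ~ x1 * x2 + C(g)',               # x1*x2 expands to x1 + x2 + x1:x2
    data=df,
).fit()
print(res_cat.params.round(3))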

A.0.37 The results object

The results object carries everything downstream analysis needs:

print(f"Coefficients:  {results.params.round(3)}")
print(f"Std errors:    {results.bse.round(3)}")
print(f"t-statistics:  {results.tvalues.round(3)}")
print(f"p-values:      {results.pvalues.round(4)}")
print(f"R-squared:     {results.rsquared:.4f}")
print(f"AIC:           {results.aic:.1f}")
print(f"BIC:           {results.bic:.1f}")
print(f"Log-likelihood:{results.llf:.1f}")
print(f"Residuals:     {results.resid[:5].round(3)}")
Coefficients:  [ 1.64   1.963 -0.458]
Std errors:    [0.099 0.102 0.094]
t-statistics:  [16.498 19.326 -4.853]
p-values:      [0. 0. 0.]
R-squared:     0.8053
AIC:           285.0
BIC:           292.8
Log-likelihood:-139.5
Residuals:     [0.098 0.103 0.146 0.091 2.21 ]

A.0.38 Complete worked example: fit, read, extract

The following example walks through the full OLS workflow — fitting a model, reading the summary table column by column, and extracting specific quantities programmatically for downstream use.

# Generate data with known coefficients
n_ex = 200
rng_ex = np.random.default_rng(99)
x1_ex = rng_ex.standard_normal(n_ex)
x2_ex = rng_ex.uniform(0, 10, n_ex)
# True model: y = 5 + 2*x1 - 0.3*x2 + noise
y_ex = (5.0 + 2.0 * x1_ex - 0.3 * x2_ex
        + rng_ex.standard_normal(n_ex))

# Build design matrix and fit
X_ex = sm.add_constant(
    np.column_stack([x1_ex, x2_ex])
)
res_ex = sm.OLS(endog=y_ex, exog=X_ex).fit()

# --- Reading the summary table ---
print(res_ex.summary().tables[1])
print()

# Column-by-column explanation:
#   coef     = point estimate (beta-hat)
#   std err  = standard error of the estimate
#   t        = coef / std err (Wald statistic)
#   P>|t|    = two-sided p-value for H0: coef = 0
#   [0.025   = lower bound of 95% confidence interval
#   0.975]   = upper bound of 95% confidence interval

# --- Extracting specific values ---
# Coefficient for x1 (index 1, since index 0 is intercept)
beta1_hat = res_ex.params[1]
se1 = res_ex.bse[1]
pval1 = res_ex.pvalues[1]
print(f"x1 coefficient:  {beta1_hat:.4f}")
print(f"x1 std error:    {se1:.4f}")
print(f"x1 p-value:      {pval1:.6f}")

# Confidence interval for x2 (index 2)
ci = res_ex.conf_int(alpha=0.05)
print(f"x2 95% CI:       [{ci[2, 0]:.4f}, "
      f"{ci[2, 1]:.4f}]")

# Model-level: R-squared and F-test
print(f"R-squared:       {res_ex.rsquared:.4f}")
print(f"F-statistic:     {res_ex.fvalue:.2f}")
print(f"F p-value:       {res_ex.f_pvalue:.2e}")
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          4.9491      0.142     34.975      0.000       4.670       5.228
x1             1.9799      0.071     27.986      0.000       1.840       2.119
x2            -0.2727      0.024    -11.584      0.000      -0.319      -0.226
==============================================================================

x1 coefficient:  1.9799
x1 std error:    0.0707
x1 p-value:      0.000000
x2 95% CI:       [-0.3192, -0.2263]
R-squared:       0.8152
F-statistic:     434.48
F p-value:       5.92e-73

The key pattern: results.params[j] gives the \(j\)-th coefficient, results.bse[j] its standard error, results.pvalues[j] its \(p\)-value, and results.conf_int()[j] its confidence interval. Index 0 is always the intercept (if add_constant was used). These programmatic accessors are essential when you need to feed regression output into further computation — plotting coefficient comparisons, computing marginal effects manually, or building test statistics from multiple models.
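
For instance, a coefficient plot takes only a few lines once the accessors are in hand (a sketch using res_ex from above; 1.96 approximates the normal 95% multiplier):

fig, ax = plt.subplots()
names = ['const', 'x1', 'x2']
ax.errorbar(
    range(len(names)), res_ex.params,
    yerr=1.96 * res_ex.bse, fmt='o', capsize=4,
)
ax.set_xticks(range(len(names)))
ax.set_xticklabels(names)
ax.axhline(0, color='gray', linestyle='--', alpha=0.5)
ax.set_ylabel("Estimate with 95% interval")
plt.show()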

A.0.39 Covariance type options

The default cov_type='nonrobust' assumes homoscedastic, independent errors. This is almost never defensible with real data. Statsmodels provides several robust alternatives:

cov_type    | Assumes                          | Use when
------------|----------------------------------|---------------------------------------
'nonrobust' | Homoscedastic, independent       | Textbook exercises only
'HC0'       | Heteroscedastic                  | White's original estimator
'HC1'       | Heteroscedastic                  | Small-sample correction (\(n/(n-k)\))
'HC3'       | Heteroscedastic                  | Preferred for small samples (Ch. 11)
'HAC'       | Heteroscedastic + autocorrelated | Time series (Ch. 15–17)
'cluster'   | Clustered errors                 | Grouped data (Ch. 14)

# Compare standard errors under different assumptions
for cov in ['nonrobust', 'HC0', 'HC3']:
    res = model.fit(cov_type=cov)
    print(f"{cov:>12}: SE = {res.bse.round(4)}")
   nonrobust: SE = [0.0994 0.1016 0.0944]
         HC0: SE = [0.0981 0.0944 0.0915]
         HC3: SE = [0.1011 0.0989 0.0963]

Always specify cov_type explicitly. Relying on the default is a common source of overconfident inference. See Ch. 11 for a detailed treatment.

A.0.40 Hypothesis testing on the results object

# Wald test: beta_1 = 2 (the true value)
t_test = results.t_test('x1 = 2')
print(f"t-test (beta_1=2): "
      f"t={float(t_test.tvalue):.3f}, "
      f"p={float(t_test.pvalue):.4f}")

# F-test: joint test that x1 = 0 and x2 = 0
f_test = results.f_test('x1 = 0, x2 = 0')
print(f"F-test (all slopes=0): "
      f"F={float(f_test.fvalue):.3f}, "
      f"p={float(f_test.pvalue):.6f}")

# Confidence intervals
print(f"\n95% confidence intervals:\n"
      f"{results.conf_int(alpha=0.05).round(3)}")
t-test (beta_1=2): t=-0.361, p=0.7185
F-test (all slopes=0): F=200.567, p=0.000000

95% confidence intervals:
[[ 1.443  1.838]
 [ 1.762  2.165]
 [-0.646 -0.271]]

A.0.41 Marginal effects

For nonlinear models (Ch. 12–13), coefficients are not directly interpretable as marginal effects. The get_margeff() method computes average marginal effects:

# Logistic regression example
y_binary = (y > y.mean()).astype(int)
logit_model = sm.Logit(endog=y_binary, exog=X)
logit_results = logit_model.fit(disp=0)

mfx = logit_results.get_margeff(at='overall')
print(mfx.summary())
        Logit Marginal Effects       
=====================================
Dep. Variable:                      y
Method:                          dydx
At:                           overall
==============================================================================
                dy/dx    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.3560      0.033     10.729      0.000       0.291       0.421
x2            -0.0773      0.031     -2.471      0.013      -0.139      -0.016
==============================================================================

The at='overall' option computes average marginal effects (AME) — the mean of the marginal effect evaluated at each observation. This is the most common choice. at='mean' evaluates at the sample mean of all covariates, which is simpler but potentially misleading for non-representative “average” individuals.
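
Comparing the two choices is one extra call; the raw effects live on the .margeff attribute (a sketch):

mfx_mean = logit_results.get_margeff(at='mean')
print(f"AME (at='overall'): {mfx.margeff.round(4)}")
print(f"MEM (at='mean'):    {mfx_mean.margeff.round(4)}")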

A.0.42 Reading summary() output

The summary() method produces three tables. The top table reports model-level statistics (R-squared, F-statistic, AIC, BIC, log-likelihood). The middle table reports per-coefficient results (estimate, standard error, \(t\)-statistic, \(p\)-value, and 95% confidence interval). The bottom table reports omnibus diagnostics (Omnibus test, Jarque–Bera, skewness, kurtosis, Durbin–Watson, condition number).

Key quantities to check immediately:

  • R-squared vs Adjusted R-squared: a large gap signals overfitting. Adjusted R-squared penalizes for the number of predictors.
  • F-statistic \(p\)-value: tests whether the model as a whole explains more variance than an intercept-only model. A non-significant \(F\)-test means none of the predictors matter jointly.
  • Condition number: large values (\(>30\)) indicate multicollinearity. See Ch. 11 for VIF-based diagnosis.
  • Durbin–Watson: values near 2 indicate no autocorrelation in residuals. Values near 0 or 4 indicate positive or negative autocorrelation respectively — a warning sign for cross-sectional models and a modeling choice for time-series models (Ch. 15).
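
The same quantities can be read programmatically from the results object instead of parsing summary() text (a sketch using res_ex from the worked example; condition_number and rsquared_adj are standard OLSResults attributes):

print(f"R-squared:      {res_ex.rsquared:.4f}")
print(f"Adj. R-squared: {res_ex.rsquared_adj:.4f}")
print(f"F p-value:      {res_ex.f_pvalue:.2e}")
print(f"Condition no.:  {res_ex.condition_number:.1f}")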

A.0.43 Predictions and diagnostics

# Point predictions with confidence intervals
predictions = results.get_prediction(X[:5])
pred_summary = predictions.summary_frame(alpha=0.05)
print(pred_summary[
    ['mean', 'mean_ci_lower', 'mean_ci_upper']
].round(3))
    mean  mean_ci_lower  mean_ci_upper
0  3.409          3.012          3.807
1 -0.534         -0.826         -0.243
2 -0.739         -1.107         -0.371
3  1.577          1.282          1.871
4  3.134          2.822          3.446

A.0.44 The statsmodels diagnostic toolkit

Statsmodels provides diagnostic tests as standalone functions and as methods on results objects:

from statsmodels.stats.diagnostic import (
    het_breuschpagan,
    acorr_breusch_godfrey,
)
from statsmodels.stats.stattools import (
    durbin_watson,
)

# Breusch-Pagan test for heteroscedasticity
bp_stat, bp_pval, _, _ = het_breuschpagan(
    results.resid, results.model.exog,
)
print(f"Breusch-Pagan: stat={bp_stat:.3f}, "
      f"p={bp_pval:.4f}")

# Durbin-Watson test for autocorrelation
dw = durbin_watson(results.resid)
print(f"Durbin-Watson: {dw:.3f} "
      f"(~2 = no autocorrelation)")
Breusch-Pagan: stat=1.011, p=0.6032
Durbin-Watson: 2.200 (~2 = no autocorrelation)

These diagnostics are developed fully in Ch. 11 (regression diagnostics) and Ch. 15 (time-series residual diagnostics).

Environment and Reproducibility Checklist

Every code block in this book was executed in the environment defined by requirements.txt. To reproduce results:

uv venv .venv
source .venv/bin/activate
uv pip install -r requirements.txt

The key packages and their roles:

Package      | Role                                      | Primary chapters
-------------|-------------------------------------------|---------------------------
numpy        | Array computation                         | All
scipy        | Optimization, statistics, linear algebra  | All
statsmodels  | Statistical models, inference             | 10–18
linearmodels | IV, panel data                            | 20–22
matplotlib   | Plotting                                  | All
pandas       | Data frames                               | 10–23
scikit-learn | Pipelines, datasets                       | Ch. 1; datasets elsewhere

Verify your environment matches:

import scipy
import statsmodels

print(f"NumPy:       {np.__version__}")
print(f"SciPy:       {scipy.__version__}")
print(f"statsmodels: {statsmodels.__version__}")
NumPy:       2.2.6
SciPy:       1.14.1
statsmodels: 0.14.4