Advanced Methods in SciPy and Statsmodels

The Theory They Never Fully Explained

Authors

Kun Deng

Claude (Anthropic)

Published

May 2, 2026

Preface

You call minimize, OLS, or Logit.fit() and the coefficients come back. But why did they come back? What algorithm ran? What assumptions were made? What happens when those assumptions fail and the output is silently wrong?

This book answers these questions. It is written for junior developers, beginning data analysts, and fresh graduate students who use, or are starting to use, scipy and statsmodels and want to understand what these libraries actually do under the hood: not just how to call a function, but what that function computes and why it works. If you have ever stared at a convergence warning, a negative $R^2$, or standard errors that seem too small, this book gives you the tools to diagnose and fix the problem.

Every chapter takes a method, develops the mathematics that justifies it, re-implements the core algorithm from scratch so you can see every moving part, and then covers the diagnostics and failure modes that the documentation rarely mentions. Each from-scratch implementation is verified against the library output: if the two agree to six significant digits, the implementation faithfully reflects the theory. No hand-waving, no “it can be shown”, just clear explanations backed by runnable code.
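The verification step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure any particular chapter uses: the data are synthetic, the coefficient values are made up, and `np.linalg.lstsq` stands in for the library routine so that the snippet needs only NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the check is reproducible
n = 200

# Design matrix: an intercept column plus two random features (synthetic data)
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, -2.0, 0.5])  # illustrative coefficients (assumed)
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# From-scratch estimate: solve the normal equations (X'X) b = X'y
beta_scratch = np.linalg.solve(X.T @ X, X.T @ y)

# Reference estimate from a library least-squares routine
beta_library, *_ = np.linalg.lstsq(X, y, rcond=None)

# Require agreement to a relative tolerance of 1e-6 (about six digits)
agrees = np.allclose(beta_scratch, beta_library, rtol=1e-6, atol=0.0)
print("from-scratch matches library:", agrees)
```

Agreement of this kind shows that the two computations solve the same problem; it does not by itself show that the model's assumptions hold for the data.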

Who this book is for. Junior developers building data pipelines, beginning data analysts moving beyond pandas and charts, and fresh graduate students who want to understand the methods they use in research. You write Python comfortably, use NumPy for array computation, and have taken at least one statistics course. Prior experience with scipy or statsmodels is not assumed — Chapter 1 teaches these from the ground up. The appendices provide refreshers on the scientific Python stack, probability, and matrix algebra for readers who need them.

How to Read This Book

Each chapter follows a fixed eleven-section template:

1. Motivation – why the method exists and when you need it
2. Mathematical Foundation – definitions, theorems, proofs
3. The Algorithm – pseudocode matching the notation ledger
4. Statistical Properties – what the theory guarantees
5. Library Implementation – the library’s choices and their consequences
6. From-Scratch Implementation – building it yourself, verified against the library
7. Diagnostics – how to tell when the method is working and when it is not
8. Computational Considerations – complexity, scaling, practical limits
9. Worked Example – end to end on real or synthetic data
10. Exercises – including at least one diagnostic-failure exercise
11. Bibliographic Notes – where the ideas came from and where to go deeper

Environment

All code targets the versions pinned in requirements.txt.

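# Create an isolated environment and install the pinned versions
python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate
pip install -r requirements.txt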

import scipy, numpy, statsmodels, matplotlib
print(f"scipy:        {scipy.__version__}")
print(f"numpy:        {numpy.__version__}")
print(f"statsmodels:  {statsmodels.__version__}")
print(f"matplotlib:   {matplotlib.__version__}")

scipy:        1.14.1
numpy:        2.2.6
statsmodels:  0.14.4
matplotlib:   3.9.2
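If you would rather check the environment programmatically than compare the printout by eye, something like the following sketch may help. It assumes that requirements.txt pins each package with an exact `name==version` line and that the pins equal the versions printed above; neither is confirmed here, and `parse_pins` and `check_pins` are hypothetical helper names introduced only for this illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(text):
    """Parse 'name==version' lines, skipping blank lines and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins


def check_pins(pins):
    """Compare each pinned version with the installed one."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if installed == pinned:
            report[name] = "ok"
        else:
            report[name] = f"mismatch (installed {installed})"
    return report


# Assumed contents of requirements.txt, taken from the printout above
example = """
scipy==1.14.1
numpy==2.2.6
statsmodels==0.14.4
matplotlib==3.9.2
"""

pins = parse_pins(example)
for name, status in check_pins(pins).items():
    print(f"{name:12s} {status}")
```

In practice you would pass the text of your own requirements.txt to `parse_pins` instead of the inline string. Lines that use other specifiers (such as `>=`) are ignored by this sketch.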
