# Preface {.unnumbered}

You call `minimize`, `OLS`, or `LogisticRegression.fit()` and the coefficients come back. But *why* did they come back? What algorithm ran? What assumptions were made? What happens when those assumptions fail and the output is silently wrong?

This book answers these questions. It is written for **junior developers, beginning data analysts, and fresh graduate students** who use scipy and statsmodels but want to understand what these libraries actually do under the hood: not just *how* to call a function, but *what that function computes* and *why it works*. If you have ever stared at a convergence warning, a negative $R^2$, or standard errors that seem too small, this book gives you the tools to diagnose and fix the problem.
Every chapter takes a method, develops the mathematics that justifies it, re-implements the core algorithm from scratch so you can see every moving part, and then covers the diagnostics and failure modes that the documentation rarely spells out. Each from-scratch implementation is verified against the library output: if the two agree to six digits, the derivation, the code, and the library are describing the same computation. There is no hand-waving and no "it can be shown", just clear explanations backed by runnable code.
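As a concrete picture of that verification step, here is a minimal sketch that uses simple linear regression because its formulas fit in a few lines. The function `ols_simple` is a hypothetical helper written for this illustration. The library reference is `scipy.stats.linregress`. "Six digits" is read here as a relative tolerance of `1e-6`, which is an assumption: individual chapters may use other models and tolerances.

```python
import numpy as np
from scipy import stats


def ols_simple(x, y):
    """Hypothetical from-scratch fit of y = a + b*x by ordinary least squares.

    Returns (slope, intercept, slope_stderr) from the textbook formulas.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_bar, y_bar = x.mean(), y.mean()
    sxx = np.sum((x - x_bar) ** 2)           # sum of squared deviations of x
    sxy = np.sum((x - x_bar) * (y - y_bar))  # sum of cross-deviations
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = y - (intercept + slope * x)
    sigma2 = np.sum(resid ** 2) / (n - 2)    # residual variance on n - 2 degrees of freedom
    slope_stderr = np.sqrt(sigma2 / sxx)
    return slope, intercept, slope_stderr


# Synthetic data with known coefficients: intercept 2.0, slope 0.5
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=200)

ours = ols_simple(x, y)
lib = stats.linregress(x, y)

# Fails loudly if any quantity differs by more than one part in a million
np.testing.assert_allclose(ours, (lib.slope, lib.intercept, lib.stderr), rtol=1e-6)
print("from-scratch OLS matches scipy.stats.linregress")
```

The check compares the standard error as well as the point estimates. Matching coefficients alone would not catch a wrong degrees-of-freedom correction, which is exactly the kind of silent error the chapters are designed to expose.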
**Who this book is for.** Junior developers building data pipelines, beginning data analysts moving beyond pandas and charts, and fresh graduate students who want to understand the methods they use in research. You write Python comfortably, use NumPy for array computation, and have taken at least one statistics course. Prior experience with scipy or statsmodels is *not* assumed; Chapter 1 teaches both from the ground up. The appendices provide refreshers on the scientific Python stack, probability, and matrix algebra for readers who need them.
## How to Read This Book

Each chapter follows a fixed eleven-section template:

1. **Motivation** -- why the method exists and when you need it
2. **Mathematical Foundation** -- definitions, theorems, proofs
3. **The Algorithm** -- pseudocode matching the notation ledger
4. **Statistical Properties** -- what the theory guarantees
5. **Library Implementation** -- the library's choices and their consequences
6. **From-Scratch Implementation** -- building it yourself, verified against the library
7. **Diagnostics** -- how to tell when the method is working and when it is not
8. **Computational Considerations** -- complexity, scaling, practical limits
9. **Worked Example** -- end to end on real or synthetic data
10. **Exercises** -- including at least one diagnostic-failure exercise
11. **Bibliographic Notes** -- where the ideas came from and where to go deeper

## Environment

All code targets the versions pinned in `requirements.txt`.

```bash
python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

```{python}
#| echo: true
#| label: version-check
import matplotlib
import numpy
import scipy
import statsmodels

# Print installed versions so they can be compared with requirements.txt
print(f"scipy: {scipy.__version__}")
print(f"numpy: {numpy.__version__}")
print(f"statsmodels: {statsmodels.__version__}")
print(f"matplotlib: {matplotlib.__version__}")
```
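The version check prints what is installed but leaves the comparison with the pins to you. Below is a minimal sketch that automates it. It assumes `requirements.txt` uses simple `name==version` pins, one per line, optionally followed by comments or environment markers. `parse_pins` and `check_pins` are hypothetical helpers written for this illustration, not part of the book's tooling. The sketch relies only on the standard library's `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def parse_pins(requirements_text):
    """Hypothetical helper: map package name -> pinned version for 'name==X.Y.Z' lines.

    Comments, environment markers, blank lines, and non-'==' specifiers are skipped.
    """
    pins = {}
    for line in requirements_text.splitlines():
        # Drop inline comments and anything after an environment marker (';')
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue
        name, pinned = (part.strip() for part in line.split("==", 1))
        pins[name.lower()] = pinned
    return pins


def check_pins(pins):
    """Hypothetical helper: return (package, pinned, installed) tuples that disagree.

    'installed' is None when the package is not installed at all.
    """
    mismatches = []
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches.append((name, pinned, installed))
    return mismatches


if __name__ == "__main__" and Path("requirements.txt").exists():
    problems = check_pins(parse_pins(Path("requirements.txt").read_text()))
    for name, pinned, installed in problems:
        print(f"{name}: pinned {pinned}, installed {installed}")
    if not problems:
        print("All pinned packages match.")
```

The comparison is an exact string match, so a pin written as `1.26` is reported as differing from an installed `1.26.0`. If that matters, comparing `packaging.version.Version` objects instead treats the two as equal.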