Part V: Real-World Applications

Chapter 21

Deployment: Where the Model Meets an Untrusted World

15 min read · 3 exercises

The porosity model worked beautifully in the notebook. Then someone wired it to a web form, and at 2 a.m. a drilling engineer in another time zone pasted in a log where the density curve reads −999, the null sentinel a tool writes when it has nothing to report. The notebook would have thrown a sea of red. The deployed service, if no one thought about it, returns a calm, confident porosity of 0.21 and a recommendation to perforate. That number is now in a real decision, and nothing flagged it as garbage.

Deployment is the moment a model stops talking to its author and starts talking to the world: untrusted inputs, a cost per prediction, and a pager that goes off when it breaks. The parts that get the attention (the Dockerfile, the cloud console, the CI pipeline) are mostly boilerplate you write once and copy forever. The engineering is the contract at the boundary: ship the exact artifact you tested, refuse input you cannot trust, and know what each prediction costs before the invoice arrives. This chapter builds that contract, and it runs. The infrastructure around it is shown, marked not to run, because a container and a cloud account cannot live inside a book; but the logic that makes the service safe is all here and tested.

What You'll Learn

  • Ship the artifact you tested: serialize and reload a trained model so the server never retrains
  • Treat the API boundary as hostile: validate every payload and refuse the null, the out-of-range, the missing curve
  • Put a dollar figure on a prediction, and decide managed-endpoint versus self-hosted from the request volume
  • Wrap the whole thing in the thin shell (Docker, FastAPI, CI) that turns a function into a service

What Runs Here, and What Doesn't

The model artifact, the request handler, and the cost model are real and execute. The Dockerfile, the FastAPI route, and the CI pipeline are shown with eval: false because you cannot start a container or a web server inside a rendered book, but they wrap the exact functions the chapter verifies.

Ship the Artifact, Not the Notebook

The first deployment mistake is shipping the code that trains the model and retraining on startup. That is slow and non-reproducible (a new random seed, a newer library, a shifted dataset), and it means the thing serving predictions is not the thing you validated. The fix is to treat the fitted model as a build artifact: train once, serialize the object to bytes, and load that exact object, byte for byte, on the server. Reloading must reproduce the original predictions exactly; if it does not, you have shipped a different model than you tested.

The artifact reproduces the original prediction to the bit, which is the whole point: the server runs this model, not a fresh one trained on whatever the environment happened to provide. Two cautions ride along with pickle. First, unpickling executes arbitrary code as it rebuilds the object, so an artifact must come only from a build pipeline you trust; never load a model file from an untrusted source (this is what the integrity check in Exercise 21.3 guards against). Second, a model pickled under one scikit-learn version can refuse to load, or worse, silently mispredict, under another, which is exactly why the packaging section pins the library versions the artifact was built against. The artifact and its environment travel together; that is what the container is for.
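The round trip this section describes can be sketched with the standard library alone. In the minimal sketch below, a plain dict of coefficients stands in for the fitted scikit-learn estimator so the example carries no dependencies; the names `model`, `predict`, and `artifact` and the coefficient values are illustrative assumptions, not the chapter's own listing.

```python
import pickle

# ASSUMPTION: a dict of coefficients stands in for a fitted estimator.
# A real service would pickle the fitted scikit-learn object instead.
model = {"intercept": 1.606, "rhob_coef": -0.606}


def predict(m: dict, rhob: float) -> float:
    """Linear stand-in for the model's predict call."""
    return m["intercept"] + m["rhob_coef"] * rhob


# Build step: serialize the fitted object to bytes exactly once.
artifact = pickle.dumps(model)

# Server startup: load those bytes; never retrain.
# Only unpickle bytes that came from a build pipeline you trust.
reloaded = pickle.loads(artifact)

# The reloaded object must agree exactly, not approximately.
sample_rhob = 2.35
assert predict(reloaded, sample_rhob) == predict(model, sample_rhob)
print("round trip reproduces the prediction exactly")
```

The comparison uses `==` rather than a tolerance on purpose: a reload that is only approximately right is a different model.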

The Boundary Is Untrusted

Inside the notebook, every input came from you. The moment the model is behind an endpoint, the input comes from a form, another service, or a tired engineer at 2 a.m., and some of it will be wrong. The single most valuable code in a deployment is the validation at the boundary: a handler that checks every payload against the physics before the model ever sees it, and refuses what it cannot trust with a reason a human can act on. A service that fails clearly on RHOB = −999 is safer, in the only way that matters, than one that returns a confident porosity for it.

The valid log returns a porosity and a net-pay flag (the > 0.08 test is the porosity floor below which a zone is generally not worth completing, a domain heuristic, not a law); the three bad ones each return a specific, actionable error instead of a fabricated answer. This is the inversion most people miss: in a notebook a crash is an annoyance, but in a service a crash is honest. The dangerous failure is the confident wrong answer that never crashes. The handler's job is to convert every untrusted input into either a trustworthy prediction or a clear refusal, and it is a pure function, so you can test it exhaustively before it ever sees traffic.
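A handler of the kind described here might look like the following sketch. The function name `predict_payload`, the −999 sentinel, the RHOB range [1.5, 3.1], and the 0.08 floor come from the chapter; everything else is an assumption made for illustration: the response shape, the error wording, and `standin_model`, which uses a simple density-porosity formula (quartz matrix 2.65 g/cm³, water 1.00 g/cm³) in place of the trained model.

```python
NULL_SENTINEL = -999.0                 # null value a logging tool writes
CURVE_RANGES = {"RHOB": (1.5, 3.1)}    # g/cm3; add other curves as needed
NET_PAY_FLOOR = 0.08                   # porosity floor (domain heuristic)


def standin_model(rhob: float) -> float:
    """ASSUMPTION: density porosity stands in for the trained model."""
    return (2.65 - rhob) / (2.65 - 1.00)


def predict_payload(payload: dict) -> dict:
    """Return a prediction or a refusal with a reason; pure function."""
    if not isinstance(payload, dict):
        return {"ok": False, "error": "payload must be an object of curves"}
    for curve, (lo, hi) in CURVE_RANGES.items():
        if curve not in payload:
            return {"ok": False, "error": f"missing curve: {curve}"}
        value = payload[curve]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return {"ok": False, "error": f"{curve} must be a number"}
        if value == NULL_SENTINEL:
            return {"ok": False,
                    "error": f"{curve} is the null sentinel ({NULL_SENTINEL:g})"}
        # NaN fails every comparison, so it is refused here as well.
        if not (lo <= value <= hi):
            return {"ok": False,
                    "error": f"{curve}={value} outside [{lo}, {hi}]"}
    porosity = standin_model(payload["RHOB"])
    return {"ok": True,
            "porosity": round(porosity, 4),
            "net_pay": porosity > NET_PAY_FLOOR}


# One valid log, then a null, an out-of-range value, and a missing curve.
for log in ({"RHOB": 2.35}, {"RHOB": -999}, {"RHOB": 3.9}, {"GR": 80}):
    print(predict_payload(log))
```

Because the handler never touches the network or the clock, each refusal path can be covered by an ordinary unit test.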

Package It So It Runs Anywhere

The artifact and the handler need an environment that is identical on your laptop, in CI, and in the cloud. That is all a container is: a frozen, reproducible box holding the exact Python, the exact libraries, and the artifact. The two files below define it. They do not run here (building an image needs Docker) but notice that they add no logic; they only pin what the verified code already needs.

What Does a Prediction Cost?

Cloud bills are an engineering input, not an afterthought. The same service has two very different cost shapes: a managed endpoint charges per call (cheap when idle, expensive at volume), while a self-hosted container is an always-on box that costs the same whether it serves ten requests or ten million. Which is right is not a matter of taste; it is a crossover you can compute from the expected request volume.

The numbers tell the whole story. At 5,000 requests a day the managed endpoint costs $60 a month against $272 for the always-on box, so pay-per-call wins easily. At 500,000 a day the managed bill has grown a hundredfold to $6,000 while the box is still $272, about 22 times cheaper: the same code, with the winner decided entirely by volume. Break-even sits near 23,000 requests a day. The failure mode here is cultural: a team spins up a GPU instance to serve a model that gets fifty requests a day, and pays for an idle box all year. Run the crossover first; let the request volume pick the architecture.
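The crossover can be reproduced in a few lines. This sketch takes the flat $272 per month from the figures quoted above and infers the rest: a 30-day month and a per-call price of $0.0004 are assumptions back-calculated from "$60 a month at 5,000 requests a day", and the function names are illustrative.

```python
DAYS_PER_MONTH = 30            # ASSUMPTION: implied by the quoted figures
PRICE_PER_CALL = 0.0004        # INFERRED: $60 / (5,000 req/day * 30 days)
SELF_HOSTED_MONTHLY = 272.0    # flat always-on cost quoted in the text


def managed_monthly(requests_per_day: float) -> float:
    """Pay-per-call cost: scales linearly with volume."""
    return requests_per_day * DAYS_PER_MONTH * PRICE_PER_CALL


def break_even_requests_per_day() -> float:
    """Daily volume at which the two monthly bills are equal."""
    return SELF_HOSTED_MONTHLY / (DAYS_PER_MONTH * PRICE_PER_CALL)


def cheaper_option(requests_per_day: float) -> str:
    """Pick the architecture with the lower monthly bill."""
    if managed_monthly(requests_per_day) < SELF_HOSTED_MONTHLY:
        return "managed"
    return "self-hosted"


for volume in (5_000, 500_000):
    print(f"{volume:>7,} req/day: managed ${managed_monthly(volume):,.0f}/mo, "
          f"self-hosted ${SELF_HOSTED_MONTHLY:,.0f}/mo "
          f"-> {cheaper_option(volume)}")
print(f"break-even: {break_even_requests_per_day():,.0f} req/day")
```

Under these assumptions the break-even works out to roughly 22,700 requests a day, which is the "near 23,000" figure. Swapping in a real quote only means changing the three constants.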

The Shell That Turns a Function into a Service

The last layer is the smallest. A FastAPI route exposes predict_payload over HTTP, and a CI pipeline guarantees that nothing ships unless the handler's tests pass. Both are shown, not run, but both are thin: the route is the handler with a decorator, and the pipeline is test, then deploy. The pipeline's real value is that it makes the validation tests a gate, not a suggestion.
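To show how little the HTTP layer adds, here is a framework-agnostic sketch using only the standard library. It is not the FastAPI route itself. The `route` function, the status-code mapping (400 for an unparseable body, 422 for a refused payload), and the stubbed `predict_payload` are all assumptions for illustration.

```python
import json
from typing import Tuple


def predict_payload(payload: dict) -> dict:
    """HYPOTHETICAL stub for the validated handler; prediction fields omitted."""
    if payload.get("RHOB") == -999:
        return {"ok": False, "error": "RHOB is the null sentinel (-999)"}
    return {"ok": True}


def route(body: bytes) -> Tuple[int, str]:
    """Thin shell: decode the body, delegate, map the result to a status."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and bad encodings
        return 400, json.dumps({"error": "body is not valid JSON"})
    if not isinstance(payload, dict):
        return 400, json.dumps({"error": "body must be a JSON object"})
    result = predict_payload(payload)
    # A refusal is a client error, never a silent 200.
    return (200 if result["ok"] else 422), json.dumps(result)


for raw in (b'{"RHOB": 2.35}', b'{"RHOB": -999}', b"not json"):
    print(route(raw))
```

All decisions about the data stay in the handler; the shell only translates between bytes and the handler's result, which is why the handler's tests are the ones worth gating on.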

That is the entire deployment. The container, the route, and the pipeline are boilerplate you will reuse for every model you ship; the artifact discipline, the boundary validation, and the cost crossover are the parts you have to get right, and the parts that run.

Exercises

These work on the boundary logic and the economics, the parts a deployment lives or dies on.

Exercise 21.1 (Practice): Catch the Unit Mix-up

A common silent error is a density curve sent in kg/m³ (≈2350) instead of g/cm³ (≈2.35). It is numeric and not a null, so the range check [1.5, 3.1] a...

Exercise 21.2 (Practice): Pick the Architecture

Your service is quoted at $0.20 per 1,000 managed calls, or a $0.25/hr instance plus $150/month of ops time if self-hosted. Compute the monthly cost ...

Exercise 21.3 (Practice): Refuse the Wrong Artifact

Add an integrity check to the deployment: write a function that, given a model artifact (the pickled bytes) and an expected SHA-256 hash, loads the mo...

Summary

  • Ship the artifact, not the trainer. Serialize the fitted model and load that exact object; the server must run the model you validated, byte for byte, never a fresh retrain.
  • The boundary is hostile. The highest-value code in a deployment is the handler that validates every payload against the physics and refuses the null, the out-of-range, and the missing curve. A clear error beats a confident wrong answer.
  • A crash is honest; a fabricated answer is not. In a service the dangerous failure is the one that never throws.
  • Cost is an engineering input. Managed-per-call versus always-on self-hosted is a computable crossover (~23,000 req/day here); let the request volume choose, and never pay for an idle box.
  • The infrastructure is the thin shell. Docker, FastAPI, and CI add no logic; they pin the environment, expose the handler, and make the validation tests a gate. The contract at the boundary is the engineering.