Exercise 21.3
Refuse the Wrong Artifact -- Model Integrity Check
Add an integrity check to the deployment: write a function that, given a model artifact (the pickled bytes) and an expected SHA-256 hash, loads the model only if the hash matches and otherwise refuses. Show that it loads the correct artifact and rejects one altered by a single byte. Why is "the right code with the wrong model file" a failure that ordinary tests never catch?
---
"The right code with the wrong model file" is a failure ordinary tests never catch: the service starts, answers requests, and quietly serves predictions from a stale or corrupted artifact. Tests exercise the code path, and a stale model still loads and still returns plausible numbers, so every test passes. The guard is an integrity check: load the model only if its bytes match a known fingerprint.
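A minimal sketch of why ordinary tests miss this failure. Two small dicts stand in for real model artifacts; the names `current`, `stale`, `predict`, and `smoke_test` are hypothetical and used only for illustration. A typical smoke test checks that the artifact loads and produces a plausible prediction. Both artifacts pass that test, but their SHA-256 fingerprints differ.

```python
import hashlib
import pickle

# Hypothetical stand-ins for two model artifacts: the intended one and a stale one.
# Both are valid, loadable pickles; only their parameters differ.
current = pickle.dumps({"version": "v2", "intercept": 0.30, "slope": -0.21})
stale = pickle.dumps({"version": "v1", "intercept": 0.28, "slope": -0.15})

def predict(model, vsh):
    """Toy linear porosity predictor (illustrative only)."""
    return model["intercept"] + model["slope"] * vsh

def smoke_test(blob):
    """The kind of check an ordinary test makes: it loads and predicts something plausible."""
    model = pickle.loads(blob)
    phi = predict(model, 0.5)
    return isinstance(phi, float) and 0.0 <= phi <= 0.4

# Both artifacts pass the smoke test, so the test cannot tell them apart...
print(smoke_test(current), smoke_test(stale))  # True True
# ...but their fingerprints differ, so a hash check can.
print(hashlib.sha256(current).hexdigest() == hashlib.sha256(stale).hexdigest())  # False
```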
The verified MODEL is embedded under a do-not-edit banner. Write one function:
def load_verified(blob, expected_sha256):
    """Return pickle.loads(blob) ONLY if its SHA-256 matches expected_sha256;
    otherwise raise ValueError."""

Exact procedure: compute hashlib.sha256(blob).hexdigest(). If it does not equal expected_sha256, raise ValueError. Otherwise return pickle.loads(blob).
At module level, serialize MODEL with pickle.dumps, compute its SHA-256 into GOOD_HASH, and expose ARTIFACT (the bytes), GOOD_HASH, and load_verified.
> Think about it: flip a single byte of the artifact and the hash changes completely, so the check refuses to load it. Why is hashing the artifact a better guard than, say, checking the file size or the model's class name?
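One way to see the answer, sketched under stated assumptions: `types.SimpleNamespace` objects stand in for real models, and `good`, `retrained`, and `corrupted` are hypothetical names. A one-byte corruption leaves the file size unchanged. A model retrained with a different seed has the same class name. Only the hash separates both impostors from the intended artifact.

```python
import hashlib
import pickle
from types import SimpleNamespace

# Hypothetical stand-ins: the intended model and one retrained with a different seed.
good = pickle.dumps(SimpleNamespace(kind="forest", n_estimators=60, seed=0))
retrained = pickle.dumps(SimpleNamespace(kind="forest", n_estimators=60, seed=1))
# A corrupted copy of the good artifact: lowest bit of the last byte flipped.
corrupted = good[:-1] + bytes([good[-1] ^ 1])

def fingerprint(blob):
    """SHA-256 hex digest of the raw artifact bytes."""
    return hashlib.sha256(blob).hexdigest()

# Size check: the corrupted copy has exactly the same length as the good one.
print(len(corrupted) == len(good))  # True -> a size check is fooled

# Class-name check: the retrained model is the same class as the good one.
same_class = type(pickle.loads(retrained)).__name__ == type(pickle.loads(good)).__name__
print(same_class)  # True -> a class-name check is fooled

# Hash check: both impostors get a different fingerprint from the good artifact.
print(fingerprint(corrupted) != fingerprint(good), fingerprint(retrained) != fingerprint(good))  # True True
```

The corrupted copy is never unpickled here. The comparison runs on raw bytes, which is also what lets the real check refuse an artifact before `pickle.loads` ever sees it.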
Reference solution. Try solving it yourself first. The solution below is one valid approach; yours may differ and still be correct.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
# ── Verified Chapter 21 porosity model (do not edit) ─────────────────────
def train_model(seed=0, n=1500):
    rng = np.random.default_rng(seed)
    Vsh = rng.uniform(0, 1, n)
    phi = np.clip(0.30 * (1 - 0.7 * Vsh) + rng.normal(0, 0.02, n), 0.02, 0.34)
    GR = 18 * (1 - Vsh) + 135 * Vsh + rng.normal(0, 7, n)
    RHOB = 2.65 * (1 - phi) + 1.0 * phi + rng.normal(0, 0.03, n)
    NPHI = phi + 0.3 * Vsh + rng.normal(0, 0.02, n)
    RT = np.clip(0.5 / (np.clip(phi, 0.03, 1) ** 2) * np.exp(rng.normal(0, 0.3, n)), 0.2, 2000)
    X = np.column_stack([GR, RHOB, NPHI, np.log10(RT)])
    return RandomForestRegressor(n_estimators=60, random_state=0).fit(X, phi)
MODEL = train_model()
# ── end do-not-edit ───────────────────────────────────────────
import pickle
import hashlib
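def load_verified(blob, expected_sha256):
    """Load the model ONLY if its bytes match the expected fingerprint."""
    # Hash BEFORE unpickling: pickle.loads can execute arbitrary code,
    # so unverified bytes must never reach it.
    if hashlib.sha256(blob).hexdigest() != expected_sha256:
        raise ValueError("artifact hash mismatch -- refusing to load")
    return pickle.loads(blob)

# Serialize the verified model and record its fingerprint.
ARTIFACT = pickle.dumps(MODEL)
GOOD_HASH = hashlib.sha256(ARTIFACT).hexdigest()

# The untouched artifact passes the check and loads.
_m = load_verified(ARTIFACT, GOOD_HASH)
print("correct artifact loaded:", type(_m).__name__)

# Flip the lowest bit of the last byte: same length, one byte different.
try:
    load_verified(ARTIFACT[:-1] + bytes([ARTIFACT[-1] ^ 1]), GOOD_HASH)
except ValueError as e:
    print("altered artifact refused:", e)
else:
    raise AssertionError("altered artifact was loaded -- integrity check failed")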
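In a real deployment the artifact usually lives in a file rather than a module-level variable. The sketch below shows one possible file-based variant. `load_verified_file`, the `model.pkl` file name, and the dict stand-in for the model are all hypothetical. The file is read once, so the bytes that are hashed are exactly the bytes that get unpickled.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

def load_verified(blob, expected_sha256):
    """Same check as the reference solution: hash first, unpickle only on a match."""
    if hashlib.sha256(blob).hexdigest() != expected_sha256:
        raise ValueError("artifact hash mismatch -- refusing to load")
    return pickle.loads(blob)

def load_verified_file(path, expected_sha256):
    """Hypothetical helper: read the artifact once, so the bytes hashed are the bytes loaded."""
    return load_verified(Path(path).read_bytes(), expected_sha256)

with tempfile.TemporaryDirectory() as d:
    artifact = Path(d) / "model.pkl"  # hypothetical file name
    # Build time: write the artifact and record (pin) its fingerprint.
    artifact.write_bytes(pickle.dumps({"version": "v2", "n_estimators": 60}))
    pinned = hashlib.sha256(artifact.read_bytes()).hexdigest()

    # Deploy time: the pinned hash loads the intended file...
    model = load_verified_file(artifact, pinned)
    print(model["version"])  # v2

    # ...and a file swapped for a different model is refused.
    artifact.write_bytes(pickle.dumps({"version": "v1", "n_estimators": 60}))
    try:
        load_verified_file(artifact, pinned)
        refused = False
    except ValueError:
        refused = True
    print(refused)  # True
```

One design caveat, offered as a general note rather than part of the exercise: the check is only as trustworthy as the place the expected hash comes from. A hash stored right next to the artifact can be swapped along with it.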