Refuse the Wrong Artifact -- Model Integrity Check

Level 2

Chapter 21: Cloud Deployment

Problem

Add an integrity check to the deployment: write a function that, given a model artifact (the pickled bytes) and an expected SHA-256 hash, loads the model only if the hash matches and otherwise refuses. Show it loads the correct artifact and rejects one that has been altered by a single byte. Why is "the right code with the wrong model file" a failure that ordinary tests never catch?

---

"The right code with the wrong model file" is a failure ordinary tests never catch -- the service starts, answers requests, and quietly serves predictions from a stale or corrupted artifact. The guard is an integrity check: load the model only if its bytes match a known fingerprint.
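
To make the failure concrete, here is a minimal sketch. It is not the chapter's porosity model: the two artifacts are hypothetical pickled dicts of linear coefficients, and the names `predict`, `current`, and `stale` are illustrative assumptions. Both artifacts load without error and pass a type check, yet they give different predictions for the same input.

```python
import pickle


def predict(model, x):
    """Hypothetical stand-in for a real model: a one-feature linear fit."""
    return model["slope"] * x + model["intercept"]


# Two artifacts with the same structure but different fitted values (made-up numbers).
current = pickle.dumps({"slope": -0.21, "intercept": 0.30})
stale = pickle.dumps({"slope": -0.15, "intercept": 0.28})

for name, blob in [("current", current), ("stale", stale)]:
    model = pickle.loads(blob)        # loads without error either way
    assert isinstance(model, dict)    # a type check passes either way
    print(name, "prediction at x=0.5:", round(predict(model, 0.5), 3))
```

Nothing in the serving code distinguishes the two files. Only a check on the artifact itself can.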

The verified MODEL is embedded under a do-not-edit banner. Write one function:

def load_verified(blob, expected_sha256):
    """Return pickle.loads(blob) ONLY if its SHA-256 matches expected_sha256;
    otherwise raise ValueError."""

Exact procedure: compute hashlib.sha256(blob).hexdigest(). If it does not equal expected_sha256, raise ValueError. Otherwise return pickle.loads(blob).

At module level, serialize MODEL with pickle.dumps, compute its SHA-256 into GOOD_HASH, and expose ARTIFACT (the bytes), GOOD_HASH, and load_verified.

> Think about it: flip a single byte of the artifact and the hash changes completely, so the check refuses to load it. Why is hashing the artifact a better guard than, say, checking the file size or the model's class name?
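
One way to explore that question is the sketch below. It uses a small hypothetical payload in place of the exercise's ARTIFACT, and the variable names are assumptions. It flips one bit and compares what a size check and a hash check each see.

```python
import hashlib
import pickle

# Hypothetical stand-in payload; any bytes object would behave the same way.
artifact = pickle.dumps({"weights": [0.1, 0.2, 0.3], "version": 7})

# Flip the lowest bit of the last byte, leaving the length unchanged.
altered = artifact[:-1] + bytes([artifact[-1] ^ 1])

good = hashlib.sha256(artifact).hexdigest()
bad = hashlib.sha256(altered).hexdigest()

print("same size:", len(artifact) == len(altered))
print("same hash:", good == bad)

# Count how many of the 64 hex digits changed after a one-bit edit.
differing = sum(a != b for a, b in zip(good, bad))
print("hex digits that differ:", differing, "of", len(good))
```

A size check cannot tell the two blobs apart. A class-name check would need the artifact unpickled first, which is the step the guard is meant to gate.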

Your solution

main.py

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# ── Verified Chapter 21 porosity model (do not edit) ─────────────────────
def train_model(seed=0, n=1500):
    rng = np.random.default_rng(seed)
    Vsh = rng.uniform(0, 1, n)
    phi = np.clip(0.30 * (1 - 0.7 * Vsh) + rng.normal(0, 0.02, n), 0.02, 0.34)
    GR = 18 * (1 - Vsh) + 135 * Vsh + rng.normal(0, 7, n)
    RHOB = 2.65 * (1 - phi) + 1.0 * phi + rng.normal(0, 0.03, n)
    NPHI = phi + 0.3 * Vsh + rng.normal(0, 0.02, n)
    RT = np.clip(0.5 / (np.clip(phi, 0.03, 1) ** 2) * np.exp(rng.normal(0, 0.3, n)), 0.2, 2000)
    X = np.column_stack([GR, RHOB, NPHI, np.log10(RT)])
    return RandomForestRegressor(n_estimators=60, random_state=0).fit(X, phi)

MODEL = train_model()
# ── end do-not-edit ───────────────────────────────────────────
import pickle
import hashlib

def load_verified(blob, expected_sha256):
    """Load the model ONLY if its bytes match the expected fingerprint."""
    # TODO: if hashlib.sha256(blob).hexdigest() != expected_sha256:
    # TODO:     raise ValueError("artifact hash mismatch -- refusing to load")
    # TODO: return pickle.loads(blob)
    return None

ARTIFACT = pickle.dumps(MODEL)
GOOD_HASH = hashlib.sha256(ARTIFACT).hexdigest()
print("hash:", GOOD_HASH[:12], "...")

Reference solution

Try solving it yourself first. The solution below is one valid approach; yours may differ and still be correct.

import numpy as np
from sklearn.ensemble import RandomForestRegressor


# ── Verified Chapter 21 porosity model (do not edit) ─────────────────────
def train_model(seed=0, n=1500):
    rng = np.random.default_rng(seed)
    Vsh = rng.uniform(0, 1, n)
    phi = np.clip(0.30 * (1 - 0.7 * Vsh) + rng.normal(0, 0.02, n), 0.02, 0.34)
    GR = 18 * (1 - Vsh) + 135 * Vsh + rng.normal(0, 7, n)
    RHOB = 2.65 * (1 - phi) + 1.0 * phi + rng.normal(0, 0.03, n)
    NPHI = phi + 0.3 * Vsh + rng.normal(0, 0.02, n)
    RT = np.clip(0.5 / (np.clip(phi, 0.03, 1) ** 2) * np.exp(rng.normal(0, 0.3, n)), 0.2, 2000)
    X = np.column_stack([GR, RHOB, NPHI, np.log10(RT)])
    return RandomForestRegressor(n_estimators=60, random_state=0).fit(X, phi)


MODEL = train_model()
# ── end do-not-edit ───────────────────────────────────────────
import pickle
import hashlib


def load_verified(blob, expected_sha256):
    """Load the model ONLY if its bytes match the expected fingerprint."""
    if hashlib.sha256(blob).hexdigest() != expected_sha256:
        raise ValueError("artifact hash mismatch -- refusing to load")
    return pickle.loads(blob)


ARTIFACT = pickle.dumps(MODEL)
GOOD_HASH = hashlib.sha256(ARTIFACT).hexdigest()

_m = load_verified(ARTIFACT, GOOD_HASH)
print("correct artifact loaded:", _m is not None)
try:
    load_verified(ARTIFACT[:-1] + bytes([ARTIFACT[-1] ^ 1]), GOOD_HASH)
except ValueError as e:
    print("altered artifact refused:", e)
