Tune the Alarm -- The Staleness Threshold

Level 1

Chapter 20: Dashboards & Apps

Problem

The alert thresholds (2 days stale, 10%/mo decline, 85% uptime) decide how many trucks roll. Sweep the staleness threshold and report how many wells it flags at each setting. Find the threshold that flags exactly the wells with a real problem (not normal telemetry lag), and explain the cost of setting it too low (false alarms) versus too high (a dead feed missed for days).

---

The staleness alarm decides how many wells land on the morning triage list. Set it too tight and a one-day telemetry lag pages someone; set it too loose and a genuinely dead feed goes unnoticed. This exercise sweeps the threshold and counts the cost.

The verified engine (make_field, well_kpis) is embedded under a do-not-edit banner. Write one function:

def sweep_stale(kpis, thresholds=(0, 1, 2, 3)):
    """How many wells the staleness alarm flags at each threshold.
    A well is flagged when days_stale > threshold. Returns {threshold: count}."""

Exact procedure: for each t in thresholds, count the wells with days_stale > t; return a dict {int(t): int(count)}.
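A minimal sketch of that procedure on a hand-made `days_stale` column (the four wells and their values are invented for illustration, not taken from `KPIS`; `count_flagged` is a hypothetical helper name). It shows two details: the comparison is strict, so a threshold of `t` only flags wells silent for `t + 1` days or more, and pandas typically returns NumPy integers from `.sum()`, which `int(...)` turns into plain Python ints (convenient if the dict is later serialized, e.g. to JSON).

```python
import json

import pandas as pd

# Hypothetical scorecard: only the days_stale column matters for this alarm.
toy = pd.DataFrame({"well": ["A", "B", "C", "D"], "days_stale": [0, 1, 2, 3]})

def count_flagged(kpis, thresholds):
    """Hypothetical helper following the stated procedure (strict '>')."""
    return {int(t): int((kpis.days_stale > t).sum()) for t in thresholds}

strict = count_flagged(toy, (0, 1, 2, 3))
# For contrast only: with '>=' a threshold t behaves like the strict threshold t - 1.
inclusive = {int(t): int((toy.days_stale >= t).sum()) for t in (0, 1, 2, 3)}

print(strict)     # {0: 3, 1: 2, 2: 1, 3: 0}
print(inclusive)  # {0: 4, 1: 3, 2: 2, 3: 1}

raw = (toy.days_stale > 0).sum()   # typically a NumPy integer, not a Python int
print(type(raw).__name__, "->", type(int(raw)).__name__)
print(json.dumps(strict))          # plain Python ints serialize without complaint
```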

At module level: FIELD = make_field(), KPIS = well_kpis(FIELD), sweep = sweep_stale(KPIS).

Expose: sweep_stale, sweep.
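Where `days_stale` comes from: `well_kpis` takes the newest day anywhere in the feed as the as-of date and subtracts each well's last reported day. The sketch below reproduces only that arithmetic on a tiny made-up long-format feed (the well names `W1`..`W3` and their day counts are hypothetical), to show why a well whose feed ends one day early already scores 1 and so trips a threshold of 0.

```python
import pandas as pd

# Hypothetical long-format feed: one row per (well, day) that actually reported.
feed = pd.concat([
    pd.DataFrame({"well": "W1", "day": range(10)}),  # last report on day 9
    pd.DataFrame({"well": "W2", "day": range(9)}),   # last report on day 8
    pd.DataFrame({"well": "W3", "day": range(7)}),   # last report on day 6
], ignore_index=True)

asof = feed["day"].max()                         # newest day anywhere in the feed
last_seen = feed.groupby("well")["day"].max()    # each well's most recent report
stale = {well: int(asof - last) for well, last in last_seen.items()}

print(stale)  # {'W1': 0, 'W2': 1, 'W3': 3}
```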

> Think about it: at a threshold of 0 days the alarm flags 4 of 6 wells -- but three of those merely reported a day late, which is normal telemetry lag, not a problem. At 2 days it flags exactly one: OD-001, whose feed has genuinely gone quiet for three days. The jump from 4 to 1 is the cost of a too-tight threshold: three false alarms every morning until the foreman stops trusting the screen. What is the cost of setting it the other way, at 5 days?
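One way to explore that question is to push the sweep past the default thresholds. The sketch below assumes the `days_stale` profile described above (OD-001 at 3 days, three wells 1 day late, two current); the labels `ok-1`..`ok-5` are placeholders, since which healthy wells lag by a day depends on the random seed. For each threshold it counts the flags that are false alarms and whether the genuinely dead feed is still caught.

```python
import pandas as pd

# Assumed days_stale profile taken from the discussion above: OD-001 is the one
# genuinely dead feed; "ok-1".."ok-5" are placeholder labels for the healthy wells.
kpis = pd.DataFrame({
    "well": ["OD-001", "ok-1", "ok-2", "ok-3", "ok-4", "ok-5"],
    "days_stale": [3, 1, 1, 1, 0, 0],
})
real_problem = {"OD-001"}

rows = {}
for t in range(6):                                   # extend the sweep out to 5 days
    flagged = set(kpis.loc[kpis.days_stale > t, "well"])
    false_alarms = len(flagged - real_problem)       # healthy wells paged anyway
    missed = len(real_problem - flagged)             # dead feeds nobody hears about
    rows[t] = (len(flagged), false_alarms, missed)
    print(f"t={t}: flagged={len(flagged)} false_alarms={false_alarms} missed={missed}")
```

Under this assumed profile, thresholds of 1 and 2 days both flag exactly OD-001. From 3 days upward the three-day-dead feed is not flagged at all, and at 5 days a feed has to be silent for six full days before it reaches the triage list: a well whose status nobody can see for most of a week.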


Your solution

main.py

import numpy as np
import pandas as pd

# ── Verified Chapter 20 field surveillance engine (do not edit) ──────────

# Per-well profiles: (id, qi, annual Di, b, problem). The field is mostly healthy;
# three wells carry the problems a morning surveillance check exists to catch.
WELLS = [
    ("OD-001", 1500, 0.22, 0.6, "stale"),      # feed stopped reporting 3 days ago
    ("OD-002", 2100, 0.30, 0.7, None),
    ("OD-003", 2600, 0.28, 0.8, "decline"),    # recent step drop -> steep 30-day decline
    ("OD-004", 1800, 0.20, 0.5, None),         # the steady earner
    ("OD-005", 1200, 0.35, 0.7, "downtime"),   # frequent outages -> low uptime
    ("OD-006", 2000, 0.26, 0.6, None),
]
DAYS = 730

def make_field(seed=11):
    """A field's DAILY production surveillance feed (long format). Sensor noise,
    real outages, and three planted problems -- a stale feed, a steep decline, and
    a low-uptime well -- the raw stream a monitoring dashboard sits on top of."""
    rng = np.random.default_rng(seed)
    rows = []
    for wid, qi, Di_yr, b, problem in WELLS:
        Di = Di_yr / 365.0
        t = np.arange(DAYS)
        q = qi / np.power(1 + b * Di * t, 1.0 / b)
        q = q * rng.normal(1.0, 0.03, DAYS)
        if problem == "decline":                       # a pressure/liquid-loading hit: -45% over last 40 d
            q[-40:] *= np.linspace(1.0, 0.55, 40)
        if problem == "downtime":                      # chronic intermittent producer
            q[rng.random(DAYS) < 0.20] = 0.0           # ~20% of days down -> ~80% uptime
        else:
            for _ in range(rng.integers(2, 5)):        # occasional multi-day outage
                s = rng.integers(0, DAYS - 6)
                q[s:s + rng.integers(1, 6)] = 0.0
        last = DAYS - (3 if problem == "stale" else rng.integers(0, 2))
        q = np.maximum(q[:last], 0.0)
        rows.append(pd.DataFrame({"well": wid, "day": np.arange(len(q)), "oil_bopd": q}))
    return pd.concat(rows, ignore_index=True)

def well_kpis(field):
    """The surveillance scorecard: one row per well, the numbers a foreman reads."""
    asof = field.day.max()
    out = []
    for w, g in field.groupby("well"):
        g = g.sort_values("day")
        rate = g.oil_bopd.values
        prod = rate > 0
        last_rate = float(rate[prod][-1]) if prod.any() else 0.0
        recent, prior = rate[-30:], rate[-60:-30]
        a_recent = recent[recent > 0].mean() if (recent > 0).any() else 0.0
        a_prior = prior[prior > 0].mean() if (prior > 0).any() else np.nan
        decl = (a_prior - a_recent) / a_prior * 100 if a_prior and not np.isnan(a_prior) else 0.0
        out.append(dict(well=w, last_rate=round(last_rate, 1), decline_30d_pct=round(float(decl), 1),
                        uptime_pct=round(float(prod.mean() * 100), 1),
                        cum_mbbl=round(rate.sum() / 1000, 1), days_stale=int(asof - g.day.max())))
    return pd.DataFrame(out)

def downsample(day, rate, n_buckets=120):
    """min/max decimation: per bucket keep BOTH the lowest and highest sample, so a
    spike or a zero (outage) is never averaged away. Returns reduced (day, rate)."""
    if len(day) <= 2 * n_buckets:
        return day, rate
    keep = []
    for chunk in np.array_split(np.arange(len(day)), n_buckets):
        r = rate[chunk]
        keep.append(chunk[r.argmin()])
        keep.append(chunk[r.argmax()])
    keep = np.unique(keep)
    return day[keep], rate[keep]
# ── end do-not-edit ───────────────────────────────────────────

def sweep_stale(kpis, thresholds=(0, 1, 2, 3)):
    """How many wells the staleness alarm flags at each threshold (days_stale > t)."""
    # TODO: for each t in thresholds, count the wells with days_stale > t; return {int(t): int(count)}
    return {}

FIELD = make_field()
KPIS = well_kpis(FIELD)
sweep = sweep_stale(KPIS)
print("sweep:", sweep)

Reference solution

Try solving it yourself first. The solution below is one valid approach; yours may differ and still be correct.

import numpy as np
import pandas as pd

# ── Verified Chapter 20 field surveillance engine (do not edit) ──────────

# Per-well profiles: (id, qi, annual Di, b, problem). The field is mostly healthy;
# three wells carry the problems a morning surveillance check exists to catch.
WELLS = [
    ("OD-001", 1500, 0.22, 0.6, "stale"),      # feed stopped reporting 3 days ago
    ("OD-002", 2100, 0.30, 0.7, None),
    ("OD-003", 2600, 0.28, 0.8, "decline"),    # recent step drop -> steep 30-day decline
    ("OD-004", 1800, 0.20, 0.5, None),         # the steady earner
    ("OD-005", 1200, 0.35, 0.7, "downtime"),   # frequent outages -> low uptime
    ("OD-006", 2000, 0.26, 0.6, None),
]
DAYS = 730


def make_field(seed=11):
    """A field's DAILY production surveillance feed (long format). Sensor noise,
    real outages, and three planted problems -- a stale feed, a steep decline, and
    a low-uptime well -- the raw stream a monitoring dashboard sits on top of."""
    rng = np.random.default_rng(seed)
    rows = []
    for wid, qi, Di_yr, b, problem in WELLS:
        Di = Di_yr / 365.0
        t = np.arange(DAYS)
        q = qi / np.power(1 + b * Di * t, 1.0 / b)
        q = q * rng.normal(1.0, 0.03, DAYS)
        if problem == "decline":                       # a pressure/liquid-loading hit: -45% over last 40 d
            q[-40:] *= np.linspace(1.0, 0.55, 40)
        if problem == "downtime":                      # chronic intermittent producer
            q[rng.random(DAYS) < 0.20] = 0.0           # ~20% of days down -> ~80% uptime
        else:
            for _ in range(rng.integers(2, 5)):        # occasional multi-day outage
                s = rng.integers(0, DAYS - 6)
                q[s:s + rng.integers(1, 6)] = 0.0
        last = DAYS - (3 if problem == "stale" else rng.integers(0, 2))
        q = np.maximum(q[:last], 0.0)
        rows.append(pd.DataFrame({"well": wid, "day": np.arange(len(q)), "oil_bopd": q}))
    return pd.concat(rows, ignore_index=True)


def well_kpis(field):
    """The surveillance scorecard: one row per well, the numbers a foreman reads."""
    asof = field.day.max()
    out = []
    for w, g in field.groupby("well"):
        g = g.sort_values("day")
        rate = g.oil_bopd.values
        prod = rate > 0
        last_rate = float(rate[prod][-1]) if prod.any() else 0.0
        recent, prior = rate[-30:], rate[-60:-30]
        a_recent = recent[recent > 0].mean() if (recent > 0).any() else 0.0
        a_prior = prior[prior > 0].mean() if (prior > 0).any() else np.nan
        decl = (a_prior - a_recent) / a_prior * 100 if a_prior and not np.isnan(a_prior) else 0.0
        out.append(dict(well=w, last_rate=round(last_rate, 1), decline_30d_pct=round(float(decl), 1),
                        uptime_pct=round(float(prod.mean() * 100), 1),
                        cum_mbbl=round(rate.sum() / 1000, 1), days_stale=int(asof - g.day.max())))
    return pd.DataFrame(out)


def downsample(day, rate, n_buckets=120):
    """min/max decimation: per bucket keep BOTH the lowest and highest sample, so a
    spike or a zero (outage) is never averaged away. Returns reduced (day, rate)."""
    if len(day) <= 2 * n_buckets:
        return day, rate
    keep = []
    for chunk in np.array_split(np.arange(len(day)), n_buckets):
        r = rate[chunk]
        keep.append(chunk[r.argmin()])
        keep.append(chunk[r.argmax()])
    keep = np.unique(keep)
    return day[keep], rate[keep]
# ── end do-not-edit ───────────────────────────────────────────


def sweep_stale(kpis, thresholds=(0, 1, 2, 3)):
    """How many wells the staleness alarm flags at each threshold (days_stale > t)."""
    return {int(t): int((kpis.days_stale > t).sum()) for t in thresholds}


FIELD = make_field()
KPIS = well_kpis(FIELD)
sweep = sweep_stale(KPIS)
print("wells flagged by staleness threshold:", sweep)
