Prove the Chart Is Honest -- min/max vs Mean Reduction

Level 2

Chapter 20: Dashboards & Apps

Problem

Take a well, inject a single one-day outage (set one day's rate to zero), and reduce the series both ways, mean-resampling and min/max decimation, to 100 buckets. Show that the mean curve's minimum stays well above zero while the min/max curve's minimum is exactly zero. By how many bopd does mean-resampling hide the outage? Why does this matter more for a 1-day event than a 30-day one?

---

A surveillance chart that averages its data into buckets will quietly erase the one-day outage you are watching for. This exercise proves it: reduce a daily series two ways and compare the lowest value each method lets you see.

The verified engine (make_field, downsample) is embedded under a do-not-edit banner. Write one function:

def reductions(day, rate, n_buckets=100):
    """Reduce a series to n_buckets two ways; return (mean_resample_min, minmax_min).
    mean_resample_min = the smallest bucket MEAN; minmax_min = the smallest sample
    the min/max decimation keeps."""

Exact procedure: split the indices into n_buckets with np.array_split(np.arange(len(day)), n_buckets). mean_resample_min is the minimum over each bucket's mean rate. For minmax_min, call the embedded downsample(day, rate, n_buckets) and take the min of the rates it returns.
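The bucket split above can be inspected on its own before writing reductions. The sketch below assumes a series of 730 daily samples (the real OD-002 feed may be one day shorter) and only uses np.array_split; the names n_samples, outage_idx and hit are illustrative, not part of the exercise.

```python
import numpy as np

# Assumed series length; the embedded engine may trim the last day.
n_samples, n_buckets, outage_idx = 730, 100, 400

# Same split the exercise prescribes: near-equal contiguous index chunks.
buckets = np.array_split(np.arange(n_samples), n_buckets)
sizes = [len(b) for b in buckets]

# Find the bucket that contains the injected outage day.
hit = next(i for i, b in enumerate(buckets) if outage_idx in b)

print("bucket sizes:", min(sizes), "to", max(sizes))
print("outage bucket:", hit, "holds", len(buckets[hit]), "samples")
```

Under that assumption the outage day shares its bucket with six producing days, so the bucket mean is roughly 6/7 of the healthy rate rather than zero.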

At module level, take well OD-002, inject a one-day outage (rate[400] = 0.0), and compute both, plus hidden_by = mean_resample_min - minmax_min.

Expose: reductions, mean_resample_min, minmax_min, hidden_by.

> Think about it: the outage drops the rate to zero for a single day. Mean-resampling averages that zero in with ~7 healthy neighbours, so the lowest bucket mean stays around 1,000 bopd -- the chart never dips. min/max decimation keeps the zero, so the outage is unmissable. Why does this gap matter more for a 1-day event than for a 30-day shut-in?
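One way to reason about the question is with a toy series. The sketch below is a simplified illustration, not the exercise data: it assumes a hypothetical flat 1,000 bopd well over 700 days (so every bucket holds exactly 7 days) and a helper named lowest_bucket_mean introduced here for illustration.

```python
import numpy as np

def lowest_bucket_mean(rate, n_buckets):
    """Smallest per-bucket mean after an np.array_split index split."""
    buckets = np.array_split(np.arange(len(rate)), n_buckets)
    return min(float(rate[b].mean()) for b in buckets)

# Hypothetical flat producer: 700 days at 1,000 bopd -> 100 buckets of 7 days.
flat = np.full(700, 1000.0)

results = {}
for length in (1, 30):
    r = flat.copy()
    r[400:400 + length] = 0.0          # shut the well in for `length` days
    results[length] = lowest_bucket_mean(r, 100)
    print(f"{length:>2}-day outage: lowest bucket mean = {results[length]:.1f}, "
          f"true minimum = {r.min():.1f}")
```

In this toy case the one-day outage only pulls its bucket down to 6/7 of the healthy rate, while the 30-day shut-in spans several whole buckets, so their means are zero and even the averaged chart shows it. The short event is the one that averaging hides.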

Your solution

main.py

import numpy as np
import pandas as pd

# ── Verified Chapter 20 field surveillance engine (do not edit) ──────────

# Per-well profiles: (id, qi, annual Di, b, problem). The field is mostly healthy;
# three wells carry the problems a morning surveillance check exists to catch.
WELLS = [
    ("OD-001", 1500, 0.22, 0.6, "stale"),      # feed stopped reporting 3 days ago
    ("OD-002", 2100, 0.30, 0.7, None),
    ("OD-003", 2600, 0.28, 0.8, "decline"),    # recent step drop -> steep 30-day decline
    ("OD-004", 1800, 0.20, 0.5, None),         # the steady earner
    ("OD-005", 1200, 0.35, 0.7, "downtime"),   # frequent outages -> low uptime
    ("OD-006", 2000, 0.26, 0.6, None),
]
DAYS = 730

def make_field(seed=11):
    """A field's DAILY production surveillance feed (long format). Sensor noise,
    real outages, and three planted problems -- a stale feed, a steep decline, and
    a low-uptime well -- the raw stream a monitoring dashboard sits on top of."""
    rng = np.random.default_rng(seed)
    rows = []
    for wid, qi, Di_yr, b, problem in WELLS:
        Di = Di_yr / 365.0
        t = np.arange(DAYS)
        q = qi / np.power(1 + b * Di * t, 1.0 / b)
        q = q * rng.normal(1.0, 0.03, DAYS)
        if problem == "decline":                       # a pressure/liquid-loading hit: -45% over last 40 d
            q[-40:] *= np.linspace(1.0, 0.55, 40)
        if problem == "downtime":                      # chronic intermittent producer
            q[rng.random(DAYS) < 0.20] = 0.0           # ~20% of days down -> ~80% uptime
        else:
            for _ in range(rng.integers(2, 5)):        # occasional multi-day outage
                s = rng.integers(0, DAYS - 6)
                q[s:s + rng.integers(1, 6)] = 0.0
        last = DAYS - (3 if problem == "stale" else rng.integers(0, 2))
        q = np.maximum(q[:last], 0.0)
        rows.append(pd.DataFrame({"well": wid, "day": np.arange(len(q)), "oil_bopd": q}))
    return pd.concat(rows, ignore_index=True)

def well_kpis(field):
    """The surveillance scorecard: one row per well, the numbers a foreman reads."""
    asof = field.day.max()
    out = []
    for w, g in field.groupby("well"):
        g = g.sort_values("day")
        rate = g.oil_bopd.values
        prod = rate > 0
        last_rate = float(rate[prod][-1]) if prod.any() else 0.0
        recent, prior = rate[-30:], rate[-60:-30]
        a_recent = recent[recent > 0].mean() if (recent > 0).any() else 0.0
        a_prior = prior[prior > 0].mean() if (prior > 0).any() else np.nan
        decl = (a_prior - a_recent) / a_prior * 100 if a_prior and not np.isnan(a_prior) else 0.0
        out.append(dict(well=w, last_rate=round(last_rate, 1), decline_30d_pct=round(float(decl), 1),
                        uptime_pct=round(float(prod.mean() * 100), 1),
                        cum_mbbl=round(rate.sum() / 1000, 1), days_stale=int(asof - g.day.max())))
    return pd.DataFrame(out)

def downsample(day, rate, n_buckets=120):
    """min/max decimation: per bucket keep BOTH the lowest and highest sample, so a
    spike or a zero (outage) is never averaged away. Returns reduced (day, rate)."""
    if len(day) <= 2 * n_buckets:
        return day, rate
    keep = []
    for chunk in np.array_split(np.arange(len(day)), n_buckets):
        r = rate[chunk]
        keep.append(chunk[r.argmin()])
        keep.append(chunk[r.argmax()])
    keep = np.unique(keep)
    return day[keep], rate[keep]
# ── end do-not-edit ───────────────────────────────────────────

def reductions(day, rate, n_buckets=100):
    """Reduce a series two ways; return (smallest bucket mean, smallest sample min/max keeps)."""
    # TODO: buckets = np.array_split(np.arange(len(day)), n_buckets)
    # TODO: mean_min = min(rate[b].mean() for b in buckets)
    # TODO: _, dr = downsample(day, rate, n_buckets)
    # TODO: return float(mean_min), float(dr.min())
    return 0.0, 0.0

FIELD = make_field()
_g = FIELD[FIELD.well == "OD-002"].sort_values("day")
day = _g.day.values
rate = _g.oil_bopd.values.copy()
rate[400] = 0.0
mean_resample_min, minmax_min = reductions(day, rate, 100)
hidden_by = mean_resample_min - minmax_min
print("mean min:", mean_resample_min, "minmax min:", minmax_min)

Reference solution

Try solving it yourself first. The solution below is one valid approach; yours may differ and still be correct.

import numpy as np
import pandas as pd

# ── Verified Chapter 20 field surveillance engine (do not edit) ──────────

# Per-well profiles: (id, qi, annual Di, b, problem). The field is mostly healthy;
# three wells carry the problems a morning surveillance check exists to catch.
WELLS = [
    ("OD-001", 1500, 0.22, 0.6, "stale"),      # feed stopped reporting 3 days ago
    ("OD-002", 2100, 0.30, 0.7, None),
    ("OD-003", 2600, 0.28, 0.8, "decline"),    # recent step drop -> steep 30-day decline
    ("OD-004", 1800, 0.20, 0.5, None),         # the steady earner
    ("OD-005", 1200, 0.35, 0.7, "downtime"),   # frequent outages -> low uptime
    ("OD-006", 2000, 0.26, 0.6, None),
]
DAYS = 730


def make_field(seed=11):
    """A field's DAILY production surveillance feed (long format). Sensor noise,
    real outages, and three planted problems -- a stale feed, a steep decline, and
    a low-uptime well -- the raw stream a monitoring dashboard sits on top of."""
    rng = np.random.default_rng(seed)
    rows = []
    for wid, qi, Di_yr, b, problem in WELLS:
        Di = Di_yr / 365.0
        t = np.arange(DAYS)
        q = qi / np.power(1 + b * Di * t, 1.0 / b)
        q = q * rng.normal(1.0, 0.03, DAYS)
        if problem == "decline":                       # a pressure/liquid-loading hit: -45% over last 40 d
            q[-40:] *= np.linspace(1.0, 0.55, 40)
        if problem == "downtime":                      # chronic intermittent producer
            q[rng.random(DAYS) < 0.20] = 0.0           # ~20% of days down -> ~80% uptime
        else:
            for _ in range(rng.integers(2, 5)):        # occasional multi-day outage
                s = rng.integers(0, DAYS - 6)
                q[s:s + rng.integers(1, 6)] = 0.0
        last = DAYS - (3 if problem == "stale" else rng.integers(0, 2))
        q = np.maximum(q[:last], 0.0)
        rows.append(pd.DataFrame({"well": wid, "day": np.arange(len(q)), "oil_bopd": q}))
    return pd.concat(rows, ignore_index=True)


def well_kpis(field):
    """The surveillance scorecard: one row per well, the numbers a foreman reads."""
    asof = field.day.max()
    out = []
    for w, g in field.groupby("well"):
        g = g.sort_values("day")
        rate = g.oil_bopd.values
        prod = rate > 0
        last_rate = float(rate[prod][-1]) if prod.any() else 0.0
        recent, prior = rate[-30:], rate[-60:-30]
        a_recent = recent[recent > 0].mean() if (recent > 0).any() else 0.0
        a_prior = prior[prior > 0].mean() if (prior > 0).any() else np.nan
        decl = (a_prior - a_recent) / a_prior * 100 if a_prior and not np.isnan(a_prior) else 0.0
        out.append(dict(well=w, last_rate=round(last_rate, 1), decline_30d_pct=round(float(decl), 1),
                        uptime_pct=round(float(prod.mean() * 100), 1),
                        cum_mbbl=round(rate.sum() / 1000, 1), days_stale=int(asof - g.day.max())))
    return pd.DataFrame(out)


def downsample(day, rate, n_buckets=120):
    """min/max decimation: per bucket keep BOTH the lowest and highest sample, so a
    spike or a zero (outage) is never averaged away. Returns reduced (day, rate)."""
    if len(day) <= 2 * n_buckets:
        return day, rate
    keep = []
    for chunk in np.array_split(np.arange(len(day)), n_buckets):
        r = rate[chunk]
        keep.append(chunk[r.argmin()])
        keep.append(chunk[r.argmax()])
    keep = np.unique(keep)
    return day[keep], rate[keep]
# ── end do-not-edit ───────────────────────────────────────────


def reductions(day, rate, n_buckets=100):
    """Reduce a series two ways; return (smallest bucket mean, smallest sample min/max keeps)."""
    buckets = np.array_split(np.arange(len(day)), n_buckets)
    mean_min = min(rate[b].mean() for b in buckets)
    _, dr = downsample(day, rate, n_buckets)
    return float(mean_min), float(dr.min())


FIELD = make_field()
_g = FIELD[FIELD.well == "OD-002"].sort_values("day")
day = _g.day.values
rate = _g.oil_bopd.values.copy()
rate[400] = 0.0                                  # inject a one-day outage
mean_resample_min, minmax_min = reductions(day, rate, 100)
hidden_by = mean_resample_min - minmax_min

print(f"mean-resample lowest: {mean_resample_min:.1f} bopd   min/max lowest: {minmax_min:.1f} bopd")
print(f"the outage mean-resampling hides: {hidden_by:.1f} bopd")
