Exercise 20.2
Prove the Chart Is Honest -- min/max vs Mean Reduction
Take a well, inject a single one-day outage (set one day's rate to zero), and reduce the series both ways, mean-resampling and min/max decimation, to 100 buckets. Show that the mean curve's minimum stays well above zero while the min/max curve's minimum is exactly zero. By how many bopd does mean-resampling hide the outage? Why does this matter more for a 1-day event than a 30-day one?
---
A surveillance chart that averages its data into buckets will quietly erase the one-day outage you are watching for. This exercise proves it: reduce a daily series two ways and compare the lowest value each method lets you see.
The verified engine (make_field, downsample) is embedded under a do-not-edit banner. Write one function:
def reductions(day, rate, n_buckets=100):
    """Reduce a series to n_buckets two ways; return (mean_resample_min, minmax_min).
    mean_resample_min = the smallest bucket MEAN; minmax_min = the smallest sample
    the min/max decimation keeps."""

Exact procedure: split the indices into n_buckets with np.array_split(np.arange(len(day)), n_buckets). mean_resample_min is the minimum over each bucket's mean rate. For minmax_min, call the embedded downsample(day, rate, n_buckets) and take the min of the rates it returns.
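To see what the bucketing step produces, here is a minimal sketch. It assumes a series of exactly 730 days, which is an assumption for illustration (the engine may trim a day off a well's feed), and it only inspects bucket sizes and boundaries, not rates.

```python
import numpy as np

# Assumed series length for illustration; the real well may be a day shorter.
n_days, n_buckets = 730, 100

# np.array_split tolerates uneven division: the first (n_days % n_buckets)
# buckets get one extra index, so bucket sizes differ by at most one.
buckets = np.array_split(np.arange(n_days), n_buckets)
sizes = [len(b) for b in buckets]
print("buckets of 8 days:", sizes.count(8), "| buckets of 7 days:", sizes.count(7))

# Locate the bucket that holds day 400, the day the exercise zeroes out.
idx = next(i for i, b in enumerate(buckets) if 400 in b)
first_day, last_day = int(buckets[idx][0]), int(buckets[idx][-1])
print(f"day 400 sits in bucket {idx}, covering days {first_day}..{last_day}")
```

Under that assumption, each bucket holds 7 or 8 days, so a one-day zero shares its bucket with 6 or 7 healthy samples.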
At module level, take well OD-002, inject a one-day outage (rate[400] = 0.0), and compute both, plus hidden_by = mean_resample_min - minmax_min.
Expose: reductions, mean_resample_min, minmax_min, hidden_by.
> Think about it: the outage drops the rate to zero for a single day. Mean-resampling averages that zero in with ~7 healthy neighbours, so the lowest bucket mean stays around 1,000 bopd -- the chart never dips. min/max decimation keeps the zero, so the outage is unmissable. Why does this gap matter more for a 1-day event than for a 30-day shut-in?
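The effect of outage length can be checked on a toy series before touching the field data. The sketch below is an illustration under stated assumptions, not the exercise's engine: `toy_series` and `lowest_seen` are hypothetical helpers, the rate is a flat 1,000 bopd with no noise, and the lowest bucket minimum stands in for min/max decimation (which keeps each bucket's lowest sample).

```python
import numpy as np

def lowest_seen(rate, n_buckets=100):
    """Return (lowest bucket mean, lowest bucket minimum) for a 1-D rate array."""
    buckets = np.array_split(np.arange(len(rate)), n_buckets)
    lowest_mean = min(rate[b].mean() for b in buckets)
    # Stand-in for min/max decimation: each bucket keeps its lowest sample.
    lowest_min = min(rate[b].min() for b in buckets)
    return float(lowest_mean), float(lowest_min)

def toy_series(outage_days, n_days=730, q=1000.0, start=400):
    """Hypothetical flat-rate well with one outage of the given length."""
    rate = np.full(n_days, q)
    rate[start:start + outage_days] = 0.0
    return rate

results = {}
for length in (1, 30):
    results[length] = lowest_seen(toy_series(length))
    mean_low, min_low = results[length]
    print(f"{length:>2}-day outage: lowest bucket mean = {mean_low:7.1f} bopd, "
          f"lowest kept sample = {min_low:.1f} bopd")
```

In this toy setup the one-day outage shares a 7-day bucket with six healthy days, so the lowest mean is 6/7 of the flat rate. The 30-day outage is longer than any bucket, so at least one bucket is entirely zero and even the mean curve reaches zero.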
Reference solution: try solving it yourself first. The solution below is one valid approach; yours may differ and still be correct.
import numpy as np
import pandas as pd
# ── Verified Chapter 20 field surveillance engine (do not edit) ──────────
# Per-well profiles: (id, qi, annual Di, b, problem). The field is mostly healthy;
# three wells carry the problems a morning surveillance check exists to catch.
WELLS = [
    ("OD-001", 1500, 0.22, 0.6, "stale"),     # feed stopped reporting 3 days ago
    ("OD-002", 2100, 0.30, 0.7, None),
    ("OD-003", 2600, 0.28, 0.8, "decline"),   # recent step drop -> steep 30-day decline
    ("OD-004", 1800, 0.20, 0.5, None),        # the steady earner
    ("OD-005", 1200, 0.35, 0.7, "downtime"),  # frequent outages -> low uptime
    ("OD-006", 2000, 0.26, 0.6, None),
]
DAYS = 730

def make_field(seed=11):
    """A field's DAILY production surveillance feed (long format). Sensor noise,
    real outages, and three planted problems -- a stale feed, a steep decline, and
    a low-uptime well -- the raw stream a monitoring dashboard sits on top of."""
    rng = np.random.default_rng(seed)
    rows = []
    for wid, qi, Di_yr, b, problem in WELLS:
        Di = Di_yr / 365.0
        t = np.arange(DAYS)
        q = qi / np.power(1 + b * Di * t, 1.0 / b)
        q = q * rng.normal(1.0, 0.03, DAYS)
        if problem == "decline":  # a pressure/liquid-loading hit: -45% over last 40 d
            q[-40:] *= np.linspace(1.0, 0.55, 40)
        if problem == "downtime":  # chronic intermittent producer
            q[rng.random(DAYS) < 0.20] = 0.0  # ~20% of days down -> ~80% uptime
        else:
            for _ in range(rng.integers(2, 5)):  # occasional multi-day outage
                s = rng.integers(0, DAYS - 6)
                q[s:s + rng.integers(1, 6)] = 0.0
        last = DAYS - (3 if problem == "stale" else rng.integers(0, 2))
        q = np.maximum(q[:last], 0.0)
        rows.append(pd.DataFrame({"well": wid, "day": np.arange(len(q)), "oil_bopd": q}))
    return pd.concat(rows, ignore_index=True)
def well_kpis(field):
    """The surveillance scorecard: one row per well, the numbers a foreman reads."""
    asof = field.day.max()
    out = []
    for w, g in field.groupby("well"):
        g = g.sort_values("day")
        rate = g.oil_bopd.values
        prod = rate > 0
        last_rate = float(rate[prod][-1]) if prod.any() else 0.0
        recent, prior = rate[-30:], rate[-60:-30]
        a_recent = recent[recent > 0].mean() if (recent > 0).any() else 0.0
        a_prior = prior[prior > 0].mean() if (prior > 0).any() else np.nan
        decl = (a_prior - a_recent) / a_prior * 100 if a_prior and not np.isnan(a_prior) else 0.0
        out.append(dict(well=w, last_rate=round(last_rate, 1), decline_30d_pct=round(float(decl), 1),
                        uptime_pct=round(float(prod.mean() * 100), 1),
                        cum_mbbl=round(rate.sum() / 1000, 1), days_stale=int(asof - g.day.max())))
    return pd.DataFrame(out)
def downsample(day, rate, n_buckets=120):
    """min/max decimation: per bucket keep BOTH the lowest and highest sample, so a
    spike or a zero (outage) is never averaged away. Returns reduced (day, rate)."""
    if len(day) <= 2 * n_buckets:
        return day, rate
    keep = []
    for chunk in np.array_split(np.arange(len(day)), n_buckets):
        r = rate[chunk]
        keep.append(chunk[r.argmin()])
        keep.append(chunk[r.argmax()])
    keep = np.unique(keep)
    return day[keep], rate[keep]
# ── end do-not-edit ───────────────────────────────────────────
def reductions(day, rate, n_buckets=100):
    """Reduce a series two ways; return (smallest bucket mean, smallest sample min/max keeps)."""
    buckets = np.array_split(np.arange(len(day)), n_buckets)
    mean_min = min(rate[b].mean() for b in buckets)
    _, dr = downsample(day, rate, n_buckets)
    return float(mean_min), float(dr.min())
FIELD = make_field()
_g = FIELD[FIELD.well == "OD-002"].sort_values("day")
day = _g.day.values
rate = _g.oil_bopd.values.copy()
rate[400] = 0.0 # inject a one-day outage
mean_resample_min, minmax_min = reductions(day, rate, 100)
hidden_by = mean_resample_min - minmax_min
print(f"mean-resample lowest: {mean_resample_min:.1f} bopd min/max lowest: {minmax_min:.1f} bopd")
print(f"the outage mean-resampling hides: {hidden_by:.1f} bopd")