Exercise 6.7
Outlier Detection by Well
A single month that's wildly off the trend is either a real event (a workover that doubled rate, a well test) or a data error (a fat-fingered entry). Either way you want it flagged for a human to look at, never silently averaged in.
The starter builds two years of monthly oil for two wells. OD-001 has one planted spike; OD-002 is clean. Write flag_outliers(df, well_col, value_col, n_sigma=3.0):
- For each well, compute the mean and standard deviation of
value_col. - Compute each row's z-score
(value − mean) / std. - Return the rows whose |z| > n_sigma, with a
z_scorecolumn added.
A well with no outliers contributes nothing. (This is the classic z-score screen; on small samples it's blind, which is why you run it across a full history, and why a flag means "go look", not "delete".)
Stuck? Reveal hints one at a time — they progress from nudge to near-solution.
visibilityReveal reference solutionexpand_more
Try solving it yourself first — the hints walk you through it. The solution below is one valid approach; yours may differ and still be correct.
import pandas as pd
rows = []
for i in range(24):
rows.append({"well_id": "OD-001", "month": i + 1, "oil_bopd": 2400 - 15 * i})
rows.append({"well_id": "OD-002", "month": i + 1, "oil_bopd": 1800 - 12 * i})
df = pd.DataFrame(rows)
df.loc[(df["well_id"] == "OD-001") & (df["month"] == 11), "oil_bopd"] = 5500
def flag_outliers(df, well_col, value_col, n_sigma=3.0):
flagged = []
for well, g in df.groupby(well_col):
mean = g[value_col].mean()
std = g[value_col].std() # sample std (ddof=1)
if not std or pd.isna(std):
continue
z = (g[value_col] - mean) / std
hit = g[z.abs() > n_sigma].copy()
hit["z_score"] = z[z.abs() > n_sigma]
flagged.append(hit)
if flagged:
return pd.concat(flagged).reset_index(drop=True)
return pd.DataFrame(columns=list(df.columns) + ["z_score"])
out = flag_outliers(df, "well_id", "oil_bopd")
for _, r in out.iterrows():
print(f"{r['well_id']} month {int(r['month'])}: {r['oil_bopd']} bopd "
f"(z = {r['z_score']:+.2f}) -- review")
lockCopying code is a Full Access feature.