Exercise 2.6
Daily Production Report Cleaner
Every morning, the field office emails a CSV-style dump of the previous day's production. Some lines are clean. Some are missing fields. Some have a typo where the oil rate should be. A real surveillance pipeline never crashes on the bad records. It parses what it can, logs what it can't, and produces a per-well summary the morning meeting can act on.
You have 15 daily records covering 5 wells (OD-001 through OD-005) in this format:
`well_name, date, oil_bopd, water_bwpd`

Some records are clean; some are missing the water rate; some have non-numeric values where the oil rate should be.
Build the pipeline in three layers:
### 1. `parse_record(record: str) -> dict | None`
Parse a single comma-separated record string into a dict with keys `well`, `date`, `oil_bopd`, and `water_bwpd`. Return `None` if any of the following holds:
- The record has fewer than 4 fields (`IndexError`).
- `oil_bopd` or `water_bwpd` cannot be converted to `float` (`ValueError`).
- `oil_bopd` is negative (a sensor error; the value is non-physical).
Catch the exceptions; do not crash the caller.
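To see which exception each failure mode actually raises, here is a small probe over a few sample strings in the documented format. The helper name `probe` and the sample strings are illustrative assumptions; the probe evaluates the fields in the same order a straightforward `parse_record` would.

```python
def probe(record: str) -> str:
    """Report which failure a 4-field parse of `record` hits, if any."""
    parts = record.split(",")
    try:
        oil = float(parts[2].strip())
        float(parts[3].strip())  # water rate: only checked for convertibility
    except IndexError:
        return "IndexError"  # fewer than 4 fields
    except ValueError:
        return "ValueError"  # non-numeric rate
    # A negative rate converts fine, so it needs an explicit check.
    return "negative oil" if oil < 0 else "ok"


samples = [
    "OD-001, 2026-04-01, 1842, 1100",  # clean
    "OD-002, 2026-04-02, 950",         # missing water rate
    "OD-003, 2026-04-01, N/A, 1800",   # non-numeric oil rate
    "OD-003, 2026-04-03, -50, 1900",   # non-physical oil rate
]
for s in samples:
    print(f"{s!r:40} -> {probe(s)}")
```

Note that the negative-rate rule is not an exception at all, so it has to be an ordinary `if` after the `try` block rather than another `except` clause.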
### 2. `summarise(records: list[str]) -> dict`
Apply `parse_record` to every record. Return a dict mapping each well name to a dict with these keys:
- `n_records`: the number of valid parsed records for that well
- `mean_oil`: the arithmetic mean of `oil_bopd` across those records
- `total_oil`: the sum of `oil_bopd` across those records
Wells with zero valid records must be omitted from the result dict. (No NaN, no zero; just leave the key out.)
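The omission rule falls out naturally if a well only gets a bucket once one of its valid records arrives. A minimal sketch using `collections.defaultdict`, assuming the records have already been parsed into `(well, oil_bopd)` pairs; that pre-parsed input shape and the name `aggregate` are illustrative, not part of the exercise.

```python
from collections import defaultdict


def aggregate(pairs):
    """Group (well, oil_bopd) pairs and compute per-well stats.

    A well's bucket is created only when one of its valid rates is
    appended, so wells with no valid records never appear in the result.
    """
    buckets = defaultdict(list)
    for well, oil in pairs:
        buckets[well].append(oil)
    return {
        well: {
            "n_records": len(rates),
            "mean_oil": sum(rates) / len(rates),
            "total_oil": sum(rates),
        }
        for well, rates in buckets.items()
    }


# Hypothetical pre-parsed input: OD-004 contributed no valid pairs.
stats = aggregate([("OD-001", 1842.0), ("OD-001", 1810.0), ("OD-002", 965.0)])
print(stats)
```

One caution with this design: a `defaultdict` lookup inserts an empty list for a missing key, so never *read* `buckets[well]` for a well you have not seen, or you will create exactly the empty bucket (and division by zero) the rule is meant to avoid.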
### 3. Run the pipeline on the data below
Use the 15 records in the `FIXTURES` list. Compute the summary into a variable called `report` and verify that it has the right shape.
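One way to verify the shape is a small checker that tests the structure of the result rather than specific numbers. A minimal sketch; the function name `check_report_shape` and the consistency rule (mean times count reproduces the total) are assumptions for illustration.

```python
import math

EXPECTED_KEYS = {"n_records", "mean_oil", "total_oil"}


def check_report_shape(report: dict) -> None:
    """Raise AssertionError unless `report` looks like {well: {stat: number}}."""
    assert isinstance(report, dict)
    for well, stats in report.items():
        assert isinstance(well, str) and well, f"bad well key: {well!r}"
        assert set(stats) == EXPECTED_KEYS, f"{well}: keys {sorted(stats)}"
        n = stats["n_records"]
        # Wells with zero valid records should have been omitted entirely.
        assert isinstance(n, int) and n > 0, f"{well}: n_records={n}"
        # Internal consistency: mean * n should reproduce the total.
        assert math.isclose(stats["mean_oil"] * n, stats["total_oil"]), well


# A hand-built entry in the required shape (OD-002's two valid records).
example = {
    "OD-002": {"n_records": 2, "mean_oil": 952.5, "total_oil": 1905.0},
}
check_report_shape(example)
print("shape ok")
```

Because the checks use `assert`, they are skipped when Python runs with `-O`; if a check like this guards a production job, raise explicit exceptions instead.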
> Think about it: in the real surveillance pipeline, what additional outputs would the morning meeting want? What would you log at WARNING level vs. ERROR level when records fail to parse?
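One possible answer to the logging half of the question, sketched with the standard `logging` module. The policy is an assumption, not a prescription: each rejected record is logged at WARNING (the pipeline expects some bad lines and keeps going), while a well that ends up with no valid records at all is logged at ERROR, because the morning meeting has lost visibility of it. The helpers `parse_with_reason` and `parse_and_log` and the `known_wells` list are hypothetical. Unlike the spec's `parse_record`, this variant checks the field count up front instead of catching `IndexError`, so each rejection can carry a specific reason.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("daily_report")


def parse_with_reason(record: str):
    """Return (parsed_dict, None) on success or (None, reason) on failure."""
    parts = [p.strip() for p in record.split(",")]
    if len(parts) < 4:
        return None, f"expected 4 fields, got {len(parts)}"
    try:
        oil, water = float(parts[2]), float(parts[3])
    except ValueError as exc:
        return None, f"non-numeric rate ({exc})"
    if oil < 0:
        return None, f"negative oil rate {oil}"
    return {"well": parts[0], "date": parts[1],
            "oil_bopd": oil, "water_bwpd": water}, None


def parse_and_log(records, known_wells):
    """Parse records, log each rejection, and flag wells with no valid data."""
    valid = []
    for line_no, record in enumerate(records, start=1):
        parsed, reason = parse_with_reason(record)
        if parsed is None:
            log.warning("line %d rejected: %s: %r", line_no, reason, record)
        else:
            valid.append(parsed)
    seen = {p["well"] for p in valid}
    for well in sorted(set(known_wells) - seen):
        log.error("%s has no valid records today", well)
    return valid


rows = ["OD-004, 2026-04-01, typo, 800", "OD-005, 2026-04-01, 2200, 890"]
good = parse_and_log(rows, known_wells=["OD-004", "OD-005"])
```

The same rejection reasons could also be counted per well and added to the summary, which gives the meeting a data-quality column alongside the rates.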
Try solving it yourself first. The reference solution below is one valid approach; yours may differ and still be correct.
```python
FIXTURES = [
    "OD-001, 2026-04-01, 1842, 1100",
    "OD-001, 2026-04-02, 1810, 1150",
    "OD-001, 2026-04-03, 1780, 1200",
    "OD-002, 2026-04-01, 965, 2100",
    "OD-002, 2026-04-02, 950",
    "OD-002, 2026-04-03, 940, 2050",
    "OD-003, 2026-04-01, N/A, 1800",
    "OD-003, 2026-04-02, 720, 1850",
    "OD-003, 2026-04-03, -50, 1900",
    "OD-004, 2026-04-01, typo, 800",
    "OD-004, 2026-04-02, 1200",
    "OD-004, 2026-04-03, -100, 850",
    "OD-005, 2026-04-01, 2200, 890",
    "OD-005, 2026-04-02, 2180, 920",
    "OD-005, 2026-04-03, 2150, 950",
]


def parse_record(record):
    """Parse one 'well, date, oil_bopd, water_bwpd' string, or return None."""
    try:
        parts = record.split(",")
        well = parts[0].strip()
        date = parts[1].strip()
        oil_bopd = float(parts[2].strip())
        water_bwpd = float(parts[3].strip())
    except (IndexError, ValueError):
        # Too few fields (IndexError) or a non-numeric rate (ValueError).
        return None
    # Sensor sanity: a negative oil rate is non-physical.
    if oil_bopd < 0:
        return None
    return {
        "well": well,
        "date": date,
        "oil_bopd": oil_bopd,
        "water_bwpd": water_bwpd,
    }


def summarise(records):
    # Bucket valid oil rates by well.
    by_well = {}
    for r in records:
        parsed = parse_record(r)
        if parsed is None:
            continue
        by_well.setdefault(parsed["well"], []).append(parsed["oil_bopd"])
    # Compute aggregates. Wells with zero valid records never get a bucket
    # in `by_well`, so they are naturally omitted from the output.
    return {
        well: {
            "n_records": len(rates),
            "mean_oil": sum(rates) / len(rates),
            "total_oil": sum(rates),
        }
        for well, rates in by_well.items()
    }


report = summarise(FIXTURES)

# Verify the shape: OD-004 has no valid records, so it must be absent.
assert set(report) == {"OD-001", "OD-002", "OD-003", "OD-005"}
assert all(set(s) == {"n_records", "mean_oil", "total_oil"} for s in report.values())

for well, stats in sorted(report.items()):
    # total_oil sums one daily rate per record, so it is a volume (bbl), not a rate.
    print(
        f"{well}: n={stats['n_records']}, "
        f"mean={stats['mean_oil']:.1f} bopd, "
        f"total={stats['total_oil']:.0f} bbl"
    )
```
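One gap worth knowing about: Python's `float()` also accepts the strings `"nan"` and `"inf"`, and `float("nan") < 0` is `False`, so a record whose oil field reads `nan` would slip past both the `ValueError` handler and the negative-rate check and then contaminate `mean_oil` and `total_oil`. None of the fixtures contain such values; if your field office could produce them (an assumption), a stricter sanity check could use `math.isfinite`. The helper name `is_physical_rate` is hypothetical.

```python
import math


def is_physical_rate(value: float) -> bool:
    """True for finite, non-negative rates; rejects nan, inf and negatives."""
    return math.isfinite(value) and value >= 0


for text in ["1842", "-50", "nan", "inf"]:
    rate = float(text)
    print(f"{text:>5}: {is_physical_rate(rate)}")
```

Swapping the reference solution's `if oil_bopd < 0:` for `if not is_physical_rate(oil_bopd):` would tighten the sanity check without changing anything else in the pipeline.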