
Exercise 2.6

Daily Production Report Cleaner

Level 3
Chapter 2: Python Essentials
Problem

Every morning, the field office emails a CSV-style dump of the previous day's production. Some lines are clean. Some are missing fields. Some have a typo where the oil rate should be. A real surveillance pipeline must not crash on the bad records. It parses what it can, logs what it can't, and produces a per-well summary the morning meeting can act on.

You have 15 daily records covering 5 wells (OD-001 through OD-005) in this format:

"well_name, date, oil_bopd, water_bwpd"

Some records are clean; some are missing the water rate; some have non-numeric values where the oil rate should be; some report a negative oil rate.

Build the pipeline in three layers:

### 1. parse_record(record: str) -> dict | None

Parse a single comma-separated record string into a dict with keys well, date, oil_bopd, water_bwpd. Return None on any of:

  • The record has fewer than 4 fields (IndexError).
  • oil_bopd or water_bwpd cannot be converted to float (ValueError).
  • oil_bopd is negative (sensor error, non-physical).

Catch the exceptions; do not crash the caller.
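
To see why these are the two exception types to catch, the short sketch below triggers each failure on its own. It is illustrative only; the variable names are not part of the exercise.

```python
# A short record: indexing past the last field raises IndexError.
parts = "OD-002, 2026-04-02, 950".split(",")
try:
    parts[3]
except IndexError as exc:
    short_error = type(exc).__name__

# A non-numeric rate: float() raises ValueError.
try:
    float("N/A")
except ValueError as exc:
    text_error = type(exc).__name__

# A negative rate converts without any exception (float() also ignores
# surrounding whitespace), so it needs an explicit check afterwards.
negative_rate = float(" -50 ")

print(short_error, text_error, negative_rate)
```

The negative case is the one that is easy to miss: nothing is raised, so a try/except alone will not reject it.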

### 2. summarise(records: list[str]) -> dict

Apply parse_record to every record. Return a dict mapping each well name to a dict with these keys:

  • n_records: number of valid parsed records for that well
  • mean_oil: the arithmetic mean of oil_bopd across those records
  • total_oil: the sum of oil_bopd across those records

Wells with zero valid records must be omitted from the result dict. (No NaN, no zero; just leave the key out.)
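
The omission rule also keeps the mean well defined, since a well with no valid records has nothing to divide by. A minimal sketch of the aggregation step, assuming the oil rates have already been grouped per well (rates_by_well and its well names are hypothetical, not the exercise data):

```python
# Hypothetical, already-grouped oil rates in bopd; "W-B" has no valid records.
rates_by_well = {"W-A": [100.0, 200.0], "W-B": []}

summary = {
    well: {
        "n_records": len(rates),
        "mean_oil": sum(rates) / len(rates),
        "total_oil": sum(rates),
    }
    for well, rates in rates_by_well.items()
    if rates  # skip empty lists: no key is created and no division by zero occurs
}

print(summary)
```

If empty buckets are never created in the first place, the filter becomes unnecessary; either design satisfies the requirement.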

### 3. Run the pipeline on the data below

Use the 15 fixture records (the list named records in the reference solution below). Compute the summary into a variable called report and verify it has the right shape.
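
One reading of "the right shape" is that every well maps to a dict with exactly the three keys from step 2, and that no well appears with zero records. A minimal sketch of such a check follows; check_shape is a hypothetical helper name, and the sample report is invented for illustration rather than taken from the exercise result.

```python
EXPECTED_KEYS = {"n_records", "mean_oil", "total_oil"}


def check_shape(report: dict) -> bool:
    """Return True if every well maps to a dict with exactly the expected keys."""
    for stats in report.values():
        if not isinstance(stats, dict) or set(stats) != EXPECTED_KEYS:
            return False
        if stats["n_records"] < 1:  # wells without valid records must be absent
            return False
    return True


# Invented sample, not the exercise result.
sample_report = {"W-A": {"n_records": 2, "mean_oil": 150.0, "total_oil": 300.0}}
print(check_shape(sample_report))
print(check_shape({"W-B": {"n_records": 0}}))
```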

> Think about it: in the real surveillance pipeline, what additional outputs would the morning meeting want? What would you log at WARNING level vs. ERROR level when records fail to parse?
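
As one possible answer to the logging question, the sketch below uses the standard logging module. It treats a missing field as a WARNING (the record is incomplete but otherwise plausible) and a non-numeric or negative rate as an ERROR (the value itself is wrong). That split, the logger name, and the function name classify_failure are assumptions for illustration, not rules from the exercise.

```python
import logging
from typing import Optional

logging.basicConfig(level=logging.WARNING, format="%(levelname)s %(message)s")
log = logging.getLogger("daily_report")  # hypothetical logger name


def classify_failure(record: str) -> Optional[int]:
    """Log why a record is unusable; return the level used, or None if it is fine."""
    parts = [p.strip() for p in record.split(",")]
    if len(parts) < 4:
        log.warning("incomplete record (%d fields): %r", len(parts), record)
        return logging.WARNING
    try:
        oil = float(parts[2])
        float(parts[3])  # only checked for convertibility here
    except ValueError:
        log.error("non-numeric rate: %r", record)
        return logging.ERROR
    if oil < 0:
        log.error("negative oil rate: %r", record)
        return logging.ERROR
    return None


classify_failure("OD-002, 2026-04-02, 950")        # logged at WARNING
classify_failure("OD-003, 2026-04-01, N/A, 1800")  # logged at ERROR
```

A team could reasonably draw the line elsewhere, for example by logging every rejected record at WARNING and reserving ERROR for a day on which a well has no usable data at all.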


Reference solution

Try solving it yourself first. The solution below is one valid approach; yours may differ and still be correct.

records = [
    "OD-001, 2026-04-01, 1842, 1100",
    "OD-001, 2026-04-02, 1810, 1150",
    "OD-001, 2026-04-03, 1780, 1200",
    "OD-002, 2026-04-01, 965, 2100",
    "OD-002, 2026-04-02, 950",
    "OD-002, 2026-04-03, 940, 2050",
    "OD-003, 2026-04-01, N/A, 1800",
    "OD-003, 2026-04-02, 720, 1850",
    "OD-003, 2026-04-03, -50, 1900",
    "OD-004, 2026-04-01, typo, 800",
    "OD-004, 2026-04-02, 1200",
    "OD-004, 2026-04-03, -100, 850",
    "OD-005, 2026-04-01, 2200,  890",
    "OD-005, 2026-04-02, 2180,  920",
    "OD-005, 2026-04-03, 2150,  950",
]


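def parse_record(record: str) -> dict | None:
    """Parse one record string; return None if it is short, non-numeric, or has negative oil."""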
    try:
        parts = record.split(",")
        well = parts[0].strip()
        date = parts[1].strip()
        oil_bopd = float(parts[2].strip())
        water_bwpd = float(parts[3].strip())
    except (IndexError, ValueError):
        return None

    # Sensor sanity: a negative oil rate is non-physical.
    if oil_bopd < 0:
        return None

    return {
        "well": well,
        "date": date,
        "oil_bopd": oil_bopd,
        "water_bwpd": water_bwpd,
    }


def summarise(records: list[str]) -> dict:
    """Summarise valid records per well: count, mean oil rate, and total oil."""
    # Bucket valid records by well.
    by_well = {}
    for r in records:
        parsed = parse_record(r)
        if parsed is None:
            continue
        by_well.setdefault(parsed["well"], []).append(parsed["oil_bopd"])

    # Compute aggregates. Wells with zero valid records are simply absent
    # from `by_well`, so they're naturally omitted from the output.
    return {
        well: {
            "n_records": len(rates),
            "mean_oil":  sum(rates) / len(rates),
            "total_oil": sum(rates),
        }
        for well, rates in by_well.items()
    }


report = summarise(records)
for well, stats in sorted(report.items()):
    # Summing daily rates (bopd) over one-day records gives a volume in barrels.
    print(
        f"{well}: n={stats['n_records']}, "
        f"mean={stats['mean_oil']:.1f} bopd, "
        f"total={stats['total_oil']:.0f} bbl"
    )
