Daily Production Report Cleaner

Level 3

Chapter 2: Python Essentials

Problem

Every morning, the field office emails a CSV-style dump of the previous day's production. Some lines are clean. Some are missing fields. Some have a typo where the oil rate should be. A real surveillance pipeline never crashes on the bad records. It parses what it can, logs what it can't, and produces a per-well summary the morning meeting can act on.

You have 15 daily records covering 5 wells (OD-001 through OD-005) in this format:

"well_name, date, oil_bopd, water_bwpd"

Some records are clean; some are missing the water rate; some have non-numeric values where the oil rate should be; and some report a negative oil rate.
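Before writing any parsing logic, it helps to see what a raw record looks like after a plain `str.split(",")`. This sketch uses two fixture lines and shows three things:

- Each field keeps its surrounding spaces. The OD-005 lines even carry two before the water rate.
- `float()` tolerates that whitespace on its own.
- A record missing its water rate yields three fields instead of four.

```python
# Split two raw fixture records on commas and inspect the resulting fields.
clean = "OD-005, 2026-04-01, 2200,  890"
short = "OD-002, 2026-04-02, 950"   # missing the water rate

clean_parts = clean.split(",")
short_parts = short.split(",")

print(clean_parts)       # ['OD-005', ' 2026-04-01', ' 2200', '  890']
print(len(short_parts))  # 3: one field short of the expected 4

# float() ignores leading/trailing whitespace, but text fields keep it,
# so the well name and date still need .strip().
print(float(clean_parts[3]))   # 890.0
print(clean_parts[1].strip())  # 2026-04-01
```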

Build the pipeline in three layers:

### 1. parse_record(record: str) -> dict | None

Parse a single comma-separated record string into a dict with keys well, date, oil_bopd, water_bwpd. Return None on any of:

- The record has fewer than 4 fields (IndexError).
- oil_bopd or water_bwpd cannot be converted to float (ValueError).
- oil_bopd is negative (sensor error, non-physical).

Catch the exceptions; do not crash the caller.
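The list above pairs an exception with the first two failure modes but not the third. The sketch below triggers each case on a fixture line to show why. `failure_kind` is a hypothetical diagnostic helper, not part of the required API.

Note that `-50` converts to a float without any error. The negative-rate rule therefore has to be an explicit `if` after the `try` block, not another `except`.

```python
def failure_kind(record):
    """Return a short label for how `record` fails (illustrative helper only)."""
    parts = record.split(",")
    try:
        oil = float(parts[2])
        float(parts[3])  # only checking that the water rate converts
    except IndexError:
        return "IndexError: fewer than 4 fields"
    except ValueError:
        return "ValueError: non-numeric rate"
    # A negative number converts cleanly, so it must be checked by hand.
    if oil < 0:
        return "negative oil: no exception, explicit check needed"
    return "ok"

for rec in ["OD-002, 2026-04-02, 950",         # IndexError
            "OD-003, 2026-04-01, N/A, 1800",   # ValueError
            "OD-003, 2026-04-03, -50, 1900",   # negative oil
            "OD-001, 2026-04-01, 1842, 1100"]: # ok
    print(rec, "->", failure_kind(rec))
```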

### 2. summarise(records: list[str]) -> dict

Apply parse_record to every record. Return a dict mapping each well name to a dict with these keys:

- n_records: number of valid parsed records for that well
- mean_oil: the arithmetic mean of oil_bopd across those records
- total_oil: the sum of oil_bopd across those records (each record covers one day, so this is cumulative oil in barrels, not a rate)

Wells with zero valid records must be omitted from the result dict. (No NaN, no zero; just leave the key out.)
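The aggregation step is a group-by. The reference solution uses `dict.setdefault`; this sketch shows the equivalent with `collections.defaultdict`. The input is hand-written `(well, oil_bopd)` pairs standing in for already-parsed records.

A well only gets a key when at least one valid rate is appended to it. That means the omission rule holds without any special-casing, and the mean can never divide by zero.

```python
from collections import defaultdict

# Stand-ins for valid parsed records: (well, oil_bopd) pairs.
# OD-004 never appears because none of its records would parse.
valid = [("OD-002", 965.0), ("OD-002", 940.0), ("OD-003", 720.0)]

rates_by_well = defaultdict(list)
for well, oil in valid:
    rates_by_well[well].append(oil)  # key is created on first append

summary = {
    well: {
        "n_records": len(rates),
        "mean_oil": sum(rates) / len(rates),  # len(rates) >= 1 by construction
        "total_oil": sum(rates),
    }
    for well, rates in rates_by_well.items()
}
print(summary)
```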

### 3. Run the pipeline on the data below

Run the pipeline on the 15 records in the records list in main.py, store the summary in a variable called report, and verify it has the right shape.
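"Verify it has the right shape" can be made concrete with a few assertions. This is one possible checker, assuming "right shape" means:

- Every value has exactly the three required keys.
- n_records is a positive integer.
- mean_oil times n_records agrees with total_oil.

`check_report_shape` is a hypothetical name. The sample dict is hand-built for illustration, with numbers worked out from the valid OD-002 and OD-003 fixture lines.

```python
import math

REQUIRED_KEYS = {"n_records", "mean_oil", "total_oil"}

def check_report_shape(report):
    """Raise AssertionError if `report` does not match the summary spec."""
    assert isinstance(report, dict), "report must be a dict"
    for well, stats in report.items():
        assert set(stats) == REQUIRED_KEYS, f"{well}: wrong keys {set(stats)}"
        assert isinstance(stats["n_records"], int) and stats["n_records"] > 0, \
            f"{well}: wells with no valid records should be omitted"
        # Mean and total must be consistent with each other.
        assert math.isclose(stats["mean_oil"] * stats["n_records"],
                            stats["total_oil"]), f"{well}: mean/total mismatch"

sample = {
    "OD-002": {"n_records": 2, "mean_oil": 952.5, "total_oil": 1905.0},
    "OD-003": {"n_records": 1, "mean_oil": 720.0, "total_oil": 720.0},
}
check_report_shape(sample)  # passes silently

# A zero-record entry should have been dropped, so this fails the check.
try:
    check_report_shape({"OD-004": {"n_records": 0, "mean_oil": 0.0, "total_oil": 0.0}})
except AssertionError as err:
    print("rejected:", err)
```

Against the real fixtures you could also assert `set(report) == {"OD-001", "OD-002", "OD-003", "OD-005"}`, since every OD-004 record is rejected.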

> Think about it: in the real surveillance pipeline, what additional outputs would the morning meeting want? What would you log at WARNING level vs. ERROR level when records fail to parse?
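Here is one possible answer to the logging question, sketched with the standard `logging` module. It logs each rejected record at WARNING, because the pipeline recovers and carries on. It reserves ERROR for something the meeting cannot work around: a well that ends up with no valid records at all.

That level split, the logger name `production_cleaner`, and the helper `parse_with_reason` are assumptions for illustration, not part of the exercise. The helper also checks the field count up front instead of catching IndexError, so each rejection carries an explicit reason.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("production_cleaner")  # hypothetical logger name

def parse_with_reason(record):
    """Return (parsed_dict, None) on success or (None, reason) on failure."""
    parts = [p.strip() for p in record.split(",")]
    if len(parts) < 4:
        return None, "fewer than 4 fields"
    try:
        oil, water = float(parts[2]), float(parts[3])
    except ValueError:
        return None, "non-numeric rate"
    if oil < 0:
        return None, "negative oil rate"
    return {"well": parts[0], "date": parts[1],
            "oil_bopd": oil, "water_bwpd": water}, None

raw = ["OD-004, 2026-04-01, typo, 800",
       "OD-004, 2026-04-02, 1200",
       "OD-004, 2026-04-03, -100, 850"]

valid_count = 0
for rec in raw:
    parsed, reason = parse_with_reason(rec)
    if parsed is None:
        # Recoverable: skip the record but leave a trail for the data team.
        log.warning("rejected %r: %s", rec, reason)
    else:
        valid_count += 1

if valid_count == 0:
    # Not recoverable for this well: it will be missing from the summary.
    log.error("OD-004: no valid records, well omitted from report")
```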


Your solution

main.py

```python
records = [
    # OD-001 - three clean records
    "OD-001, 2026-04-01, 1842, 1100",
    "OD-001, 2026-04-02, 1810, 1150",
    "OD-001, 2026-04-03, 1780, 1200",
    # OD-002 - one record is missing the water rate
    "OD-002, 2026-04-01, 965, 2100",
    "OD-002, 2026-04-02, 950",                 # missing water
    "OD-002, 2026-04-03, 940, 2050",
    # OD-003 - bad oil rate, negative oil
    "OD-003, 2026-04-01, N/A, 1800",            # bad oil
    "OD-003, 2026-04-02, 720, 1850",
    "OD-003, 2026-04-03, -50, 1900",            # negative oil - non-physical
    # OD-004 - every record is bad in some way
    "OD-004, 2026-04-01, typo, 800",            # bad oil
    "OD-004, 2026-04-02, 1200",                 # missing water
    "OD-004, 2026-04-03, -100, 850",            # negative oil
    # OD-005 - three clean records
    "OD-005, 2026-04-01, 2200,  890",
    "OD-005, 2026-04-02, 2180,  920",
    "OD-005, 2026-04-03, 2150,  950",
]

def parse_record(record):
    """
    Parse one comma-separated production record.

    Return a dict with keys 'well', 'date', 'oil_bopd', 'water_bwpd'
    on success, or None when the record is missing fields, has
    non-numeric rates, or reports a negative oil rate.
    """
    # TODO: split, convert, validate. Catch IndexError + ValueError.
    pass

def summarise(records):
    """
    Parse every record. Return a dict mapping well-name → {n_records,
    mean_oil, total_oil}. Omit wells with zero valid records.
    """
    # TODO
    pass

report = summarise(records)
for well, stats in sorted(report.items()):
    print(f"{well}: n={stats['n_records']}, "
          f"mean={stats['mean_oil']:.1f} bopd, "
          f"total={stats['total_oil']:.0f} bbl")
```

Reference solution

Try solving it yourself first. The solution below is one valid approach; yours may differ and still be correct.

```python
records = [
    "OD-001, 2026-04-01, 1842, 1100",
    "OD-001, 2026-04-02, 1810, 1150",
    "OD-001, 2026-04-03, 1780, 1200",
    "OD-002, 2026-04-01, 965, 2100",
    "OD-002, 2026-04-02, 950",
    "OD-002, 2026-04-03, 940, 2050",
    "OD-003, 2026-04-01, N/A, 1800",
    "OD-003, 2026-04-02, 720, 1850",
    "OD-003, 2026-04-03, -50, 1900",
    "OD-004, 2026-04-01, typo, 800",
    "OD-004, 2026-04-02, 1200",
    "OD-004, 2026-04-03, -100, 850",
    "OD-005, 2026-04-01, 2200,  890",
    "OD-005, 2026-04-02, 2180,  920",
    "OD-005, 2026-04-03, 2150,  950",
]


def parse_record(record):
    try:
        parts = record.split(",")
        well       =        parts[0].strip()
        date       =        parts[1].strip()
        oil_bopd   = float(parts[2].strip())
        water_bwpd = float(parts[3].strip())
    except (IndexError, ValueError):
        return None

    # Sensor sanity: a negative oil rate is non-physical.
    if oil_bopd < 0:
        return None

    return {
        "well": well,
        "date": date,
        "oil_bopd": oil_bopd,
        "water_bwpd": water_bwpd,
    }


def summarise(records):
    # Bucket valid records by well.
    by_well = {}
    for r in records:
        parsed = parse_record(r)
        if parsed is None:
            continue
        by_well.setdefault(parsed["well"], []).append(parsed["oil_bopd"])

    # Compute aggregates. Wells with zero valid records are simply absent
    # from `by_well`, so they're naturally omitted from the output.
    return {
        well: {
            "n_records": len(rates),
            "mean_oil":  sum(rates) / len(rates),
            "total_oil": sum(rates),
        }
        for well, rates in by_well.items()
    }


report = summarise(records)
for well, stats in sorted(report.items()):
    print(
        f"{well}: n={stats['n_records']}, "
        f"mean={stats['mean_oil']:.1f} bopd, "
        f"total={stats['total_oil']:.0f} bbl"
    )
```
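Because OD-004 has no valid records, it disappears from report entirely. That is what the spec asks for, but it is easy to miss in a morning meeting. One extra output that would surface it is a per-well count of rejected records, built on top of parse_record.

`count_rejects` is a hypothetical helper. It assumes the well name in the first field is intact even when the rest of the record is not, which holds for every line in this fixture.

```python
def parse_record(record):
    # Equivalent to the reference solution above (float() strips whitespace itself).
    try:
        parts = record.split(",")
        well, date = parts[0].strip(), parts[1].strip()
        oil_bopd, water_bwpd = float(parts[2]), float(parts[3])
    except (IndexError, ValueError):
        return None
    if oil_bopd < 0:
        return None
    return {"well": well, "date": date,
            "oil_bopd": oil_bopd, "water_bwpd": water_bwpd}

def count_rejects(records):
    """Map well name -> number of records parse_record rejected (hypothetical helper)."""
    rejects = {}
    for r in records:
        if parse_record(r) is None:
            # Assumes the first field (well name) survives even in a bad record.
            well = r.split(",")[0].strip()
            rejects[well] = rejects.get(well, 0) + 1
    return rejects

records = [
    "OD-001, 2026-04-01, 1842, 1100", "OD-001, 2026-04-02, 1810, 1150",
    "OD-001, 2026-04-03, 1780, 1200", "OD-002, 2026-04-01, 965, 2100",
    "OD-002, 2026-04-02, 950",        "OD-002, 2026-04-03, 940, 2050",
    "OD-003, 2026-04-01, N/A, 1800",  "OD-003, 2026-04-02, 720, 1850",
    "OD-003, 2026-04-03, -50, 1900",  "OD-004, 2026-04-01, typo, 800",
    "OD-004, 2026-04-02, 1200",       "OD-004, 2026-04-03, -100, 850",
    "OD-005, 2026-04-01, 2200,  890", "OD-005, 2026-04-02, 2180,  920",
    "OD-005, 2026-04-03, 2150,  950",
]

print(count_rejects(records))  # {'OD-002': 1, 'OD-003': 2, 'OD-004': 3}
```

Printed alongside report, this makes it explicit that OD-004 is missing because all three of its records were rejected, not because the well produced nothing.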
