Exercise 3.10

Data Format Converter

Level 3
Chapter 3: Data Structures
Problem

Reshaping data between formats is one of the most common jobs in petroleum data engineering. You're given records: a flat list of monthly readings (the rows you'd get from a CSV), in no particular month order. Convert it two ways and prove the conversion is lossless.
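For context, here is a minimal sketch of how such a flat list might come out of a CSV file. The CSV text, the column names, and the helper read_records are assumptions chosen to match the field names used in this exercise; they are not part of the exercise itself. Note that csv.DictReader returns every value as a string, so the numeric columns are converted explicitly.

```python
import csv
import io

# Hypothetical CSV export; the header names are assumed to match the exercise's fields.
CSV_TEXT = """well,month,oil_bopd,water_bwpd
OD-001,2,1460,120
OD-001,1,1500,100
"""


def read_records(text):
    """Parse CSV text into a flat list of dicts, converting numeric fields to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "well": row["well"],
            "month": int(row["month"]),
            "oil_bopd": int(row["oil_bopd"]),
            "water_bwpd": int(row["water_bwpd"]),
        })
    return rows


sample = read_records(CSV_TEXT)
print(sample)
```

Skipping the int() conversions would leave the rates as strings, and sum() over them would then raise a TypeError.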

Write three functions:

  1. group_by_well(records): return a dict mapping each well to a list of its records sorted by month (the JSON-by-well shape: one array of monthly records per well).

  2. well_summary(records): return a list with one dict per well holding: well, avg_oil_bopd, avg_water_bwpd, cum_oil_bbl, cum_water_bbl, and latest_water_cut. Convert each month's daily rate to a monthly volume with DAYS_PER_MONTH = 30 (so cumulative oil = sum of rates × 30). latest_water_cut is water / (oil + water), using the latest month's oil and water rates.

  3. round_trip_ok(records): serialise the grouped structure with json.dumps, read it back with json.loads, and confirm the total oil survives the round trip (return True/False). If a conversion changes your totals, it isn't lossless, and silently lossy conversions are how production numbers drift.
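The summary arithmetic can be checked by hand for a single well before writing the full function. The sketch below uses three monthly readings (the rates match well OD-001 in the sample data) and assumes they are already in month order; the variable names are illustrative only.

```python
DAYS_PER_MONTH = 30

# Three monthly readings for one well, already in month order (months 1, 2, 3).
oil_bopd = [1500, 1460, 1420]
water_bwpd = [100, 120, 150]

avg_oil = sum(oil_bopd) / len(oil_bopd)        # 4380 / 3
cum_oil = sum(oil_bopd) * DAYS_PER_MONTH       # 4380 * 30
cum_water = sum(water_bwpd) * DAYS_PER_MONTH   # 370 * 30

# Water cut = water / (oil + water), taken from the latest month only.
latest_water_cut = water_bwpd[-1] / (oil_bopd[-1] + water_bwpd[-1])

print(avg_oil, cum_oil, cum_water, round(latest_water_cut, 4))
# prints: 1460.0 131400 11100 0.0955
```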

Reference solution

Try solving it yourself first. The solution below is one valid approach; yours may differ and still be correct.

import json

records = [
    {"well": "OD-001", "month": 2, "oil_bopd": 1460, "water_bwpd": 120},
    {"well": "OD-001", "month": 1, "oil_bopd": 1500, "water_bwpd": 100},
    {"well": "OD-001", "month": 3, "oil_bopd": 1420, "water_bwpd": 150},
    {"well": "OD-003", "month": 1, "oil_bopd": 1200, "water_bwpd": 300},
    {"well": "OD-003", "month": 3, "oil_bopd": 1100, "water_bwpd": 420},
    {"well": "OD-003", "month": 2, "oil_bopd": 1150, "water_bwpd": 360},
    {"well": "OD-005", "month": 1, "oil_bopd":  800, "water_bwpd":  50},
    {"well": "OD-005", "month": 2, "oil_bopd":  790, "water_bwpd":  60},
    {"well": "OD-005", "month": 3, "oil_bopd":  770, "water_bwpd":  75},
]

DAYS_PER_MONTH = 30


def group_by_well(records):
    grouped = {}
    for r in records:
        grouped.setdefault(r["well"], []).append(r)
    for well in grouped:
        grouped[well].sort(key=lambda r: r["month"])
    return grouped


def well_summary(records):
    summary = []
    for well, rows in group_by_well(records).items():
        oils = [r["oil_bopd"] for r in rows]
        waters = [r["water_bwpd"] for r in rows]
        latest = rows[-1]  # rows are sorted by month
        summary.append({
            "well": well,
            "avg_oil_bopd": sum(oils) / len(oils),
            "avg_water_bwpd": sum(waters) / len(waters),
            "cum_oil_bbl": sum(oils) * DAYS_PER_MONTH,
            "cum_water_bbl": sum(waters) * DAYS_PER_MONTH,
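            # Water cut = water / total liquid for the latest month.
            # Returning 0.0 when the well produced nothing is an assumed convention.
            "latest_water_cut": (
                latest["water_bwpd"] / (latest["oil_bopd"] + latest["water_bwpd"])
                if latest["oil_bopd"] + latest["water_bwpd"] else 0.0
            ),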
        })
    return summary


def round_trip_ok(records):
    grouped = group_by_well(records)
    restored = json.loads(json.dumps(grouped))
    original_total = sum(r["oil_bopd"] for r in records)
    restored_total = sum(r["oil_bopd"] for rows in restored.values() for r in rows)
    return original_total == restored_total


for s in well_summary(records):
    print(f"{s['well']}: avg oil {s['avg_oil_bopd']:.0f} bopd, "
          f"cum oil {s['cum_oil_bbl']:,} bbl, latest WC {s['latest_water_cut']:.1%}")
print("Lossless:", round_trip_ok(records))
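Comparing only the oil total is a deliberately light check. As a possible extension, the sketch below first shows one well-known way a JSON round trip can change data (object keys always come back as strings), then compares every field of every record. The names flatten, record_key, and strict_round_trip_ok, and the by_month mapping, are hypothetical helpers introduced for illustration; they are not part of the reference solution.

```python
import json

# Pitfall: JSON object keys are always strings, so integer keys change type.
by_month = {1: 1500, 2: 1460}            # hypothetical month -> oil rate mapping
restored_months = json.loads(json.dumps(by_month))
print(restored_months)                   # keys come back as "1" and "2"
print(restored_months == by_month)       # False


def flatten(grouped):
    """Turn {well: [records]} back into one flat list of records."""
    return [r for rows in grouped.values() for r in rows]


def record_key(r):
    """Sort key that identifies a record by well and month."""
    return (r["well"], r["month"])


def strict_round_trip_ok(grouped, records):
    """Compare every field of every record, not just the oil total."""
    restored = json.loads(json.dumps(grouped))
    return sorted(flatten(restored), key=record_key) == sorted(records, key=record_key)


sample = [
    {"well": "OD-001", "month": 2, "oil_bopd": 1460, "water_bwpd": 120},
    {"well": "OD-001", "month": 1, "oil_bopd": 1500, "water_bwpd": 100},
]
grouped = {"OD-001": sorted(sample, key=record_key)}
print(strict_round_trip_ok(grouped, sample))  # True
```

The grouped-by-well structure survives because its keys are well names (already strings) and its values are plain ints; a structure keyed by month number would not compare equal after the round trip.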
