Exercise 3.10
Data Format Converter
Reshaping data between formats is one of the most common jobs in petroleum data engineering. You're given records: a flat list of monthly readings (the rows you'd get from a CSV), not sorted by month. Convert it into two shapes, grouped by well and summarised per well, and prove that the conversion is lossless.
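The records list in the solution is written out by hand, but in practice those rows come from a CSV file. Here is a minimal sketch of that first step. It assumes a header row whose column names match the dict keys (well, month, oil_bopd, water_bwpd), and the helper name load_records is hypothetical. Note that csv.DictReader returns every field as a string, so the numeric columns must be converted before any sorting, averaging or summing.

```python
import csv
import io

# Hypothetical CSV text; assumes the header matches the record keys used in this exercise.
CSV_TEXT = """well,month,oil_bopd,water_bwpd
OD-001,2,1460,120
OD-001,1,1500,100
OD-003,1,1200,300
"""


def load_records(text):
    """Parse CSV text into record dicts with the numeric fields converted to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # DictReader yields strings; convert the numeric columns explicitly.
        rows.append({
            "well": row["well"],
            "month": int(row["month"]),
            "oil_bopd": int(row["oil_bopd"]),
            "water_bwpd": int(row["water_bwpd"]),
        })
    return rows


parsed = load_records(CSV_TEXT)
print(parsed[0])
# {'well': 'OD-001', 'month': 2, 'oil_bopd': 1460, 'water_bwpd': 120}
```

Keeping month as an int matters: sorted as strings, "10" comes before "2", which is exactly the kind of silent error this exercise warns about.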
Write three functions:
group_by_well(records): return a dict mapping each well to a list of its records sorted by month (the JSON-by-well shape: one array of monthly records per well).
well_summary(records): return a list with one dict per well holding well, avg_oil_bopd, avg_water_bwpd, cum_oil_bbl, cum_water_bbl, and latest_water_cut. Convert a monthly rate to a volume with DAYS_PER_MONTH = 30 (so cumulative oil = the sum of the monthly rates × 30). latest_water_cut is water / (oil + water) for the latest month.
round_trip_ok(records): serialise the grouped structure with json.dumps, read it back with json.loads, and confirm that the total oil survives the round trip (return True or False). If a conversion changes your totals, it isn't lossless, and silently lossy conversions are how production numbers drift.
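Why check at all, if json.dumps and json.loads look symmetric? One real way JSON silently changes data is that object keys must be strings: json.dumps converts int keys to strings, and json.loads does not convert them back. The sketch below uses a hypothetical grouping keyed by month number (not part of the exercise) to show a structure that passes a totals check yet no longer equals the original.

```python
import json

# Hypothetical grouping keyed by month number (int) instead of well name.
by_month = {1: [1500, 1200], 2: [1460, 1150]}

restored = json.loads(json.dumps(by_month))

print(list(restored.keys()))   # ['1', '2']: int keys came back as strings
print(restored == by_month)    # False: the structure changed

# A totals-only check does not notice the change.
original_total = sum(sum(v) for v in by_month.values())
restored_total = sum(sum(v) for v in restored.values())
print(original_total == restored_total)  # True
```

Comparing the restored object to the original with == is a stricter test than comparing totals. For the exercise's grouped data, whose keys are already well-name strings and whose values are plain ints, that stricter comparison also holds.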
Reference solution
Try solving it yourself first. The solution below is one valid approach; yours may differ and still be correct.
import json
# Monthly readings as they would arrive from a CSV: one row per well per month,
# not sorted by month.
records = [
    {"well": "OD-001", "month": 2, "oil_bopd": 1460, "water_bwpd": 120},
    {"well": "OD-001", "month": 1, "oil_bopd": 1500, "water_bwpd": 100},
    {"well": "OD-001", "month": 3, "oil_bopd": 1420, "water_bwpd": 150},
    {"well": "OD-003", "month": 1, "oil_bopd": 1200, "water_bwpd": 300},
    {"well": "OD-003", "month": 3, "oil_bopd": 1100, "water_bwpd": 420},
    {"well": "OD-003", "month": 2, "oil_bopd": 1150, "water_bwpd": 360},
    {"well": "OD-005", "month": 1, "oil_bopd": 800, "water_bwpd": 50},
    {"well": "OD-005", "month": 2, "oil_bopd": 790, "water_bwpd": 60},
    {"well": "OD-005", "month": 3, "oil_bopd": 770, "water_bwpd": 75},
]
DAYS_PER_MONTH = 30  # converts a daily rate (bbl/day) to a monthly volume
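def group_by_well(records):
    """Map each well to its records, sorted by month."""
    grouped = {}
    for r in records:
        grouped.setdefault(r["well"], []).append(r)
    for well in grouped:
        # Readings arrive unsorted, so order each well's list by month.
        grouped[well].sort(key=lambda r: r["month"])
    return grouped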
def well_summary(records):
    """Return one summary dict per well."""
    summary = []
    for well, rows in group_by_well(records).items():
        oils = [r["oil_bopd"] for r in rows]
        waters = [r["water_bwpd"] for r in rows]
        latest = rows[-1]  # rows are sorted by month, so the last row is the latest
        summary.append({
            "well": well,
            "avg_oil_bopd": sum(oils) / len(oils),
            "avg_water_bwpd": sum(waters) / len(waters),
            # Cumulative volume = sum of monthly rates x days per month.
            "cum_oil_bbl": sum(oils) * DAYS_PER_MONTH,
            "cum_water_bbl": sum(waters) * DAYS_PER_MONTH,
            # Water cut = water / (oil + water) for the latest month.
            "latest_water_cut": latest["water_bwpd"] / (latest["oil_bopd"] + latest["water_bwpd"]),
        })
    return summary
def round_trip_ok(records):
    """Serialise the grouped data to JSON, parse it back, and compare total oil."""
    grouped = group_by_well(records)
    restored = json.loads(json.dumps(grouped))
    original_total = sum(r["oil_bopd"] for r in records)
    restored_total = sum(r["oil_bopd"] for rows in restored.values() for r in rows)
    return original_total == restored_total
for s in well_summary(records):
    print(f"{s['well']}: avg oil {s['avg_oil_bopd']:.0f} bopd, "
          f"cum oil {s['cum_oil_bbl']:,} bbl, latest WC {s['latest_water_cut']:.1%}")
print("Lossless:", round_trip_ok(records))
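To check the summary by hand, take OD-001. Its oil rates for months 1 to 3 are 1500, 1460 and 1420 bopd, so the average is 4380 / 3 = 1460 bopd and cumulative oil is 4380 × 30 = 131,400 bbl. Its latest month has 1420 bopd of oil and 150 bwpd of water, so the water cut is 150 / 1570, about 9.6%. The small sketch below reproduces those numbers independently of the solution functions, using OD-001's values copied from the records list; its output should agree with the OD-001 line the solution prints.

```python
DAYS_PER_MONTH = 30  # same convention as the exercise

# OD-001 monthly rates from the records list, in month order 1, 2, 3.
oil_bopd = [1500, 1460, 1420]
water_bwpd = [100, 120, 150]

avg_oil = sum(oil_bopd) / len(oil_bopd)        # 4380 / 3
cum_oil = sum(oil_bopd) * DAYS_PER_MONTH       # rate (bbl/day) x days
cum_water = sum(water_bwpd) * DAYS_PER_MONTH
latest_wc = water_bwpd[-1] / (oil_bopd[-1] + water_bwpd[-1])

print(f"avg oil {avg_oil:.0f} bopd, cum oil {cum_oil:,} bbl, "
      f"cum water {cum_water:,} bbl, latest WC {latest_wc:.1%}")
# avg oil 1460 bopd, cum oil 131,400 bbl, cum water 11,100 bbl, latest WC 9.6%
```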