Exercise 3.3
Production Data Reconciliation
Two departments hand you two lists of well identifiers: one from the drilling database and one from the production database. The lists almost never match, and finding the discrepancies is a daily data-engineering task.
You are given:
drilled = {"OD-001", "OD-002", "OD-003", "OD-004", "OD-005",
           "OD-006", "OD-007", "OD-008", "OD-009", "OD-010"}
produced = {"OD-001", "OD-003", "OD-005", "OD-007", "OD-008",
            "OD-009", "OD-011", "OD-012"}

Using set operations, compute these four variables:
both: wells present in both databases
drilled_only: drilled but never produced
produced_only: producing without drilling records
all_known: the union of both lists
Each result maps to a petroleum-data-engineering question worth discussing: wells that were drilled but never produced may belong on the abandonment list, while wells that are producing without drilling records may point to a missing-data quality issue worth chasing.
Reference solution
Try solving it yourself first. The solution below is one valid approach; yours may differ and still be correct.
drilled = {"OD-001", "OD-002", "OD-003", "OD-004", "OD-005",
           "OD-006", "OD-007", "OD-008", "OD-009", "OD-010"}
produced = {"OD-001", "OD-003", "OD-005", "OD-007", "OD-008",
            "OD-009", "OD-011", "OD-012"}

both = drilled & produced            # intersection: present in both databases
drilled_only = drilled - produced    # difference: drilled but never produced
produced_only = produced - drilled   # difference: producing without drilling records
all_known = drilled | produced       # union: every well seen in either database

# Sets are unordered, so sort before printing for a stable, readable result.
print(f"Both: {sorted(both)}")
print(f"Drilled only: {sorted(drilled_only)}")
print(f"Produced only: {sorted(produced_only)}")
print(f"All known: {sorted(all_known)} ({len(all_known)} wells)")
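One optional way to check the four results is to confirm that they fit together. The sketch below is not part of the reference solution. It assumes the same two input sets and introduces a hypothetical variable, mismatched, built with the symmetric-difference operator (^), which collects every well that appears in exactly one of the two databases.

```python
drilled = {"OD-001", "OD-002", "OD-003", "OD-004", "OD-005",
           "OD-006", "OD-007", "OD-008", "OD-009", "OD-010"}
produced = {"OD-001", "OD-003", "OD-005", "OD-007", "OD-008",
            "OD-009", "OD-011", "OD-012"}

both = drilled & produced
drilled_only = drilled - produced
produced_only = produced - drilled
all_known = drilled | produced

# The three pieces do not overlap, so together they should rebuild the union.
assert both | drilled_only | produced_only == all_known
assert len(both) + len(drilled_only) + len(produced_only) == len(all_known)

# Hypothetical extra variable: wells found in exactly one database.
mismatched = drilled ^ produced
assert mismatched == drilled_only | produced_only

print(f"Mismatched: {sorted(mismatched)} ({len(mismatched)} wells)")
# Mismatched: ['OD-002', 'OD-004', 'OD-006', 'OD-010', 'OD-011', 'OD-012'] (6 wells)
```

If either assertion fails on your own solution, one of the four variables was probably computed with the operands in the wrong order or with the wrong operator.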