
Exercise 3.3

Production Data Reconciliation

Level 1
Chapter 3: Data Structures
Problem

Two departments hand you two lists of well identifiers, one from the drilling database, one from the production database. They almost never match, and finding the discrepancies is a daily data-engineering task.

You are given:

drilled  = {"OD-001", "OD-002", "OD-003", "OD-004", "OD-005",
            "OD-006", "OD-007", "OD-008", "OD-009", "OD-010"}
produced = {"OD-001", "OD-003", "OD-005", "OD-007", "OD-008",
            "OD-009", "OD-011", "OD-012"}

Using set operations, compute these four variables:

  • both: wells present in both databases
  • drilled_only: drilled but never produced
  • produced_only: producing without drilling records
  • all_known: the union of both lists

Each result answers a petroleum data-engineering question worth discussing. For example, wells that were drilled but never produced may belong on the abandonment list, and wells that are producing without drilling records may point to a missing-data quality issue worth chasing.


Reference solution

Try solving it yourself first. The solution below is one valid approach; yours may differ and still be correct.

drilled = {"OD-001", "OD-002", "OD-003", "OD-004", "OD-005",
           "OD-006", "OD-007", "OD-008", "OD-009", "OD-010"}
produced = {"OD-001", "OD-003", "OD-005", "OD-007", "OD-008",
            "OD-009", "OD-011", "OD-012"}

both          = drilled & produced
drilled_only  = drilled - produced
produced_only = produced - drilled
all_known     = drilled | produced

print(f"Both:          {sorted(both)}")
print(f"Drilled only:  {sorted(drilled_only)}")
print(f"Produced only: {sorted(produced_only)}")
print(f"All known:     {sorted(all_known)}  ({len(all_known)} wells)")
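One way to double-check a solution like the one above is to recompute the same four sets with the named set methods and confirm that the pieces add up. The sketch below is only an illustration: the variable `mismatched` and the two consistency checks are additions for this example and are not required by the exercise.

```python
drilled = {"OD-001", "OD-002", "OD-003", "OD-004", "OD-005",
           "OD-006", "OD-007", "OD-008", "OD-009", "OD-010"}
produced = {"OD-001", "OD-003", "OD-005", "OD-007", "OD-008",
            "OD-009", "OD-011", "OD-012"}

# Method equivalents of the &, - and | operators.
both = drilled.intersection(produced)
drilled_only = drilled.difference(produced)
produced_only = produced.difference(drilled)
all_known = drilled.union(produced)

# Hypothetical extra variable: wells that appear in exactly one database.
mismatched = drilled.symmetric_difference(produced)

# Consistency checks: the three disjoint pieces must rebuild the union,
# and the mismatches must be exactly the two "only" sets combined.
assert len(all_known) == len(both) + len(drilled_only) + len(produced_only)
assert mismatched == drilled_only | produced_only

print(f"Mismatched: {sorted(mismatched)}  ({len(mismatched)} wells)")
```

With the data given in the problem, `both` holds 6 wells, `drilled_only` 4, `produced_only` 2, and `all_known` 12, so both checks pass. The method forms also accept any iterable, which can be convenient when one of the inputs arrives as a list rather than a set.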
