Exercise 6.5
Multi-Well Data Merge
Production data and well headers live in separate systems. The production database has rates by well and month; the well-master has location, operator, spud date. To rank wells, map the field, or do anything spatial, you join them. Then you allocate: who's actually carrying the field?
The starter gives a headers DataFrame (well, field, lat, lon, operator) and a prod DataFrame (well, date, oil_bopd, days_on). Write four functions:
merge_well_data(headers, prod): left-join the header columns onto every
production row (merge on well).
field_monthly_oil(prod): total field oil rate per month
(groupby("date")["oil_bopd"].sum()), a Series indexed by date.
well_contribution(prod): a DataFrame with one row per well: its
cum_oil_bbl (cumulative oil volume = Σ oil_bopd × days_on) and its pct_of_field (% of total field volume), sorted by volume descending.
top_producer(prod): the well with the most cumulative oil.
This contribution table is the first slide of every field review.
Stuck? Reveal hints one at a time — they progress from nudge to near-solution.
visibilityReveal reference solutionexpand_more
Try solving it yourself first — the hints walk you through it. The solution below is one valid approach; yours may differ and still be correct.
import pandas as pd
headers = pd.DataFrame([
{"well": "OD-001", "field": "OML 58", "lat": 4.77, "lon": 7.01, "operator": "NATIONAL PETRO"},
{"well": "OD-002", "field": "OML 58", "lat": 4.78, "lon": 7.02, "operator": "NATIONAL PETRO"},
{"well": "OD-003", "field": "OML 58", "lat": 4.76, "lon": 7.00, "operator": "NATIONAL PETRO"},
])
prod = pd.DataFrame([
{"well": "OD-001", "date": "2023-01-01", "oil_bopd": 2400, "days_on": 31},
{"well": "OD-001", "date": "2023-02-01", "oil_bopd": 2350, "days_on": 28},
{"well": "OD-001", "date": "2023-03-01", "oil_bopd": 2300, "days_on": 31},
{"well": "OD-002", "date": "2023-01-01", "oil_bopd": 1800, "days_on": 31},
{"well": "OD-002", "date": "2023-02-01", "oil_bopd": 1750, "days_on": 28},
{"well": "OD-002", "date": "2023-03-01", "oil_bopd": 1700, "days_on": 31},
{"well": "OD-003", "date": "2023-01-01", "oil_bopd": 3100, "days_on": 31},
{"well": "OD-003", "date": "2023-02-01", "oil_bopd": 3000, "days_on": 28},
{"well": "OD-003", "date": "2023-03-01", "oil_bopd": 2950, "days_on": 31},
])
prod["date"] = pd.to_datetime(prod["date"])
def merge_well_data(headers, prod):
return prod.merge(headers, on="well", how="left")
def field_monthly_oil(prod):
return prod.groupby("date")["oil_bopd"].sum()
def well_contribution(prod):
volume = prod["oil_bopd"] * prod["days_on"]
cum = volume.groupby(prod["well"]).sum().rename("cum_oil_bbl").reset_index()
cum["pct_of_field"] = 100.0 * cum["cum_oil_bbl"] / cum["cum_oil_bbl"].sum()
return cum.sort_values("cum_oil_bbl", ascending=False).reset_index(drop=True)
def top_producer(prod):
return well_contribution(prod).iloc[0]["well"]
contrib = well_contribution(prod)
print(contrib.to_string(index=False))
print(f"Top producer: {top_producer(prod)} "
f"({contrib.iloc[0]['pct_of_field']:.1f}% of field)")
lockCopying code is a Full Access feature.