Multi-Well Data Merge

Level 2

Chapter 6: Petroleum Data Sources

descriptionProblem

Production data and well headers live in separate systems. The production database has rates by well and month; the well-master has location, operator, spud date. To rank wells, map the field, or do anything spatial, you join them. Then you allocate: who's actually carrying the field?

The starter gives a headers DataFrame (well, field, lat, lon, operator) and a prod DataFrame (well, date, oil_bopd, days_on). Write four functions:

merge_well_data(headers, prod): left-join the header columns onto every

production row (merge on well).

field_monthly_oil(prod): total field oil rate per month

(groupby("date")["oil_bopd"].sum()), a Series indexed by date.

well_contribution(prod): a DataFrame with one row per well: its

cum_oil_bbl (cumulative oil volume = Σ oil_bopd × days_on) and its pct_of_field (% of total field volume), sorted by volume descending.

top_producer(prod): the well with the most cumulative oil.

This contribution table is the first slide of every field review.

lightbulbHints (0/3)

Stuck? Reveal hints one at a time — they progress from nudge to near-solution.

codeYour solution

main.py

import pandas as pd

headers = pd.DataFrame([
    {"well": "OD-001", "field": "OML 58", "lat": 4.77, "lon": 7.01, "operator": "NATIONAL PETRO"},
    {"well": "OD-002", "field": "OML 58", "lat": 4.78, "lon": 7.02, "operator": "NATIONAL PETRO"},
    {"well": "OD-003", "field": "OML 58", "lat": 4.76, "lon": 7.00, "operator": "NATIONAL PETRO"},
])

prod = pd.DataFrame([
    {"well": "OD-001", "date": "2023-01-01", "oil_bopd": 2400, "days_on": 31},
    {"well": "OD-001", "date": "2023-02-01", "oil_bopd": 2350, "days_on": 28},
    {"well": "OD-001", "date": "2023-03-01", "oil_bopd": 2300, "days_on": 31},
    {"well": "OD-002", "date": "2023-01-01", "oil_bopd": 1800, "days_on": 31},
    {"well": "OD-002", "date": "2023-02-01", "oil_bopd": 1750, "days_on": 28},
    {"well": "OD-002", "date": "2023-03-01", "oil_bopd": 1700, "days_on": 31},
    {"well": "OD-003", "date": "2023-01-01", "oil_bopd": 3100, "days_on": 31},
    {"well": "OD-003", "date": "2023-02-01", "oil_bopd": 3000, "days_on": 28},
    {"well": "OD-003", "date": "2023-03-01", "oil_bopd": 2950, "days_on": 31},
])
prod["date"] = pd.to_datetime(prod["date"])

def merge_well_data(headers, prod):
    """Left-join header columns onto each production row."""
    # TODO
    pass

def field_monthly_oil(prod):
    """Series: total field oil rate per month."""
    # TODO
    pass

def well_contribution(prod):
    """DataFrame: well, cum_oil_bbl, pct_of_field, sorted by volume desc."""
    # TODO
    pass

def top_producer(prod):
    """Name of the well with the most cumulative oil."""
    # TODO
    pass

print(well_contribution(prod))
print("Top producer:", top_producer(prod))

visibilityReveal reference solutionexpand_more

Try solving it yourself first — the hints walk you through it. The solution below is one valid approach; yours may differ and still be correct.

import pandas as pd

headers = pd.DataFrame([
    {"well": "OD-001", "field": "OML 58", "lat": 4.77, "lon": 7.01, "operator": "NATIONAL PETRO"},
    {"well": "OD-002", "field": "OML 58", "lat": 4.78, "lon": 7.02, "operator": "NATIONAL PETRO"},
    {"well": "OD-003", "field": "OML 58", "lat": 4.76, "lon": 7.00, "operator": "NATIONAL PETRO"},
])

prod = pd.DataFrame([
    {"well": "OD-001", "date": "2023-01-01", "oil_bopd": 2400, "days_on": 31},
    {"well": "OD-001", "date": "2023-02-01", "oil_bopd": 2350, "days_on": 28},
    {"well": "OD-001", "date": "2023-03-01", "oil_bopd": 2300, "days_on": 31},
    {"well": "OD-002", "date": "2023-01-01", "oil_bopd": 1800, "days_on": 31},
    {"well": "OD-002", "date": "2023-02-01", "oil_bopd": 1750, "days_on": 28},
    {"well": "OD-002", "date": "2023-03-01", "oil_bopd": 1700, "days_on": 31},
    {"well": "OD-003", "date": "2023-01-01", "oil_bopd": 3100, "days_on": 31},
    {"well": "OD-003", "date": "2023-02-01", "oil_bopd": 3000, "days_on": 28},
    {"well": "OD-003", "date": "2023-03-01", "oil_bopd": 2950, "days_on": 31},
])
prod["date"] = pd.to_datetime(prod["date"])


def merge_well_data(headers, prod):
    return prod.merge(headers, on="well", how="left")


def field_monthly_oil(prod):
    return prod.groupby("date")["oil_bopd"].sum()


def well_contribution(prod):
    volume = prod["oil_bopd"] * prod["days_on"]
    cum = volume.groupby(prod["well"]).sum().rename("cum_oil_bbl").reset_index()
    cum["pct_of_field"] = 100.0 * cum["cum_oil_bbl"] / cum["cum_oil_bbl"].sum()
    return cum.sort_values("cum_oil_bbl", ascending=False).reset_index(drop=True)


def top_producer(prod):
    return well_contribution(prod).iloc[0]["well"]


contrib = well_contribution(prod)
print(contrib.to_string(index=False))
print(f"Top producer: {top_producer(prod)} "
      f"({contrib.iloc[0]['pct_of_field']:.1f}% of field)")

lockCopying code is a Full Access feature.

arrow_back

6.4 Unit Converter for Log Data

6.6 Data Gap Analysis

arrow_forward