Exercise 5.4
Regional Comparison with Seaborn
Take the synthetic well data provided in the starter (40 wells, each with porosity, IP (initial production) rate, and water cut), assign each well to a "North" or "South" region, then build a 3-panel comparison using Seaborn.
The starter provides the well_data pandas DataFrame and adds a Region column that randomly assigns each well to North or South. Your job is to build a 1×3 figure with:
- axes[0]: sns.boxplot of water_cut grouped by Region.
- axes[1]: sns.scatterplot of ip_rate vs porosity, coloured by Region (hue="Region").
- axes[2]: sns.barplot of mean ip_rate by Region.
Seaborn's barplot computes the mean automatically (its default estimator) and draws a 95% bootstrapped confidence interval as the error bar.
Each panel needs its own title, and the figure needs an overall title set with fig.suptitle.
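To see roughly what the barplot's error bar represents, the sketch below computes a percentile bootstrap confidence interval for the mean by hand. It assumes seaborn's defaults of 1,000 bootstrap resamples and a 95% interval; the function name bootstrap_mean_ci and the toy DataFrame are hypothetical, and because resampling is random, the bounds will not match seaborn's exactly.

```python
import numpy as np
import pandas as pd


def bootstrap_mean_ci(values, n_boot=1000, level=95, seed=0):
    """Hypothetical helper: percentile bootstrap CI for the mean of a 1-D array."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Draw n_boot resamples (with replacement) and record the mean of each one
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    # The interval is the central `level` percent of the bootstrap means
    tail = (100 - level) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return values.mean(), low, high


# Hypothetical toy data standing in for well_data (not the starter's values)
rng = np.random.default_rng(7)
df = pd.DataFrame({
    "Region": np.where(rng.random(40) < 0.5, "North", "South"),
    "ip_rate": rng.normal(1500, 400, 40),
})

for region, group in df.groupby("Region"):
    mean, low, high = bootstrap_mean_ci(group["ip_rate"])
    print(f"{region}: mean={mean:.0f} bopd, 95% CI=({low:.0f}, {high:.0f})")
```

Each bar height corresponds to the mean, and the error bar spans roughly the low and high values printed here.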
> Think about it: if the boxplots show meaningfully different water cut distributions between North and South, what geological explanations might account for that? How would you design a follow-up analysis to test those hypotheses?
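One possible first step for such a follow-up, before turning to geology, is to ask whether a North/South difference in median water cut is larger than what randomly relabelling the wells would produce. The sketch below is a generic two-sided permutation test written with NumPy; the function permutation_test_median_diff and its defaults (5,000 permutations, fixed seed) are illustrative assumptions, not part of the exercise.

```python
import numpy as np


def permutation_test_median_diff(a, b, n_perm=5000, seed=0):
    """Hypothetical helper: two-sided permutation test for a difference in medians."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = np.median(a) - np.median(b)
    pooled = np.concatenate([a, b])
    n_a = len(a)
    count = 0
    for _ in range(n_perm):
        # Shuffling the pooled values is equivalent to shuffling the region labels
        shuffled = rng.permutation(pooled)
        diff = np.median(shuffled[:n_a]) - np.median(shuffled[n_a:])
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction keeps the estimated p-value strictly above zero
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value


# Toy example with two hypothetical groups of water-cut values (not the starter data)
rng = np.random.default_rng(42)
north = rng.beta(2, 5, 20)
south = rng.beta(2, 5, 20)
obs, p = permutation_test_median_diff(north, south)
print(f"median difference = {obs:.3f}, permutation p-value = {p:.3f}")
```

To apply it to the exercise data, pass the water_cut values for each region, for example well_data.loc[well_data["Region"] == "North", "water_cut"] and the matching South selection. Because the starter assigns regions at random, any difference found there is chance by construction, so the test should usually return a large p-value; real geological hypotheses would need real regional data.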
Try solving it yourself first. The reference solution below is one valid approach; yours may differ and still be correct.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# --- Starter: synthetic data for 40 wells (fixed seed for reproducibility) ---
np.random.seed(7)
n = 40
well_data = pd.DataFrame({
    "porosity": np.clip(np.random.normal(0.18, 0.05, n), 0.05, 0.30),  # fraction (v/v)
    "ip_rate": np.clip(np.random.normal(1500, 400, n), 200, 4000),    # bopd
    "water_cut": np.clip(np.random.beta(2, 5, n), 0, 1),  # beta draws already lie in [0, 1]
})
# Randomly assign each well to North or South (roughly 50/50)
well_data["Region"] = np.where(np.random.rand(n) < 0.5, "North", "South")
# --- Solution: one row of three panels ---
fig, axes = plt.subplots(1, 3, figsize=(15, 4.5))
# Panel 1: distribution of water cut in each region
sns.boxplot(data=well_data, x="Region", y="water_cut", ax=axes[0])
axes[0].set_title("Water cut by region")
axes[0].set_ylabel("Water cut")
# Panel 2: IP rate against porosity, points coloured by region
sns.scatterplot(
    data=well_data, x="porosity", y="ip_rate", hue="Region", ax=axes[1]
)
axes[1].set_title("IP rate vs porosity by region")
axes[1].set_xlabel("Porosity (v/v)")
axes[1].set_ylabel("IP rate (bopd)")
# Panel 3: mean IP rate per region; seaborn adds a bootstrapped CI error bar
sns.barplot(data=well_data, x="Region", y="ip_rate", ax=axes[2])
axes[2].set_title("Mean IP rate by region")
axes[2].set_ylabel("IP rate (bopd)")
# Overall figure title, then tidy spacing between panels
fig.suptitle("Regional comparison - North vs South", fontweight="bold")
plt.tight_layout()
plt.show()
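The panels can be paired with a small numeric summary so that visual impressions are checked against numbers. This sketch rebuilds the starter data with the same seed and recipe as the reference solution, then uses pandas named aggregation for per-region well counts, median water cut, and mean IP rate, plus the within-region Pearson correlation between porosity and IP rate for the scatter panel. The output column names (wells, median_water_cut, mean_ip_rate) are illustrative choices, not part of the exercise.

```python
import numpy as np
import pandas as pd

# Rebuild the starter's synthetic data (same seed and recipe as the reference solution)
np.random.seed(7)
n = 40
well_data = pd.DataFrame({
    "porosity": np.clip(np.random.normal(0.18, 0.05, n), 0.05, 0.30),
    "ip_rate": np.clip(np.random.normal(1500, 400, n), 200, 4000),
    "water_cut": np.clip(np.random.beta(2, 5, n), 0, 1),
})
well_data["Region"] = np.where(np.random.rand(n) < 0.5, "North", "South")

# Per-region numbers behind the boxplot (median) and barplot (mean) panels
summary = well_data.groupby("Region").agg(
    wells=("ip_rate", "count"),
    median_water_cut=("water_cut", "median"),
    mean_ip_rate=("ip_rate", "mean"),
)
print(summary.round(3))

# Within-region Pearson correlation between porosity and IP rate (scatter panel)
for region, group in well_data.groupby("Region"):
    r = group["porosity"].corr(group["ip_rate"])
    print(f"{region}: r(porosity, ip_rate) = {r:.2f}")
```

With only about 20 wells per region, these summaries are noisy, which is another reason to test any apparent difference before interpreting it.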