
Exercise 5.4

Regional Comparison with Seaborn

Level 2
Chapter 5: Data Visualization & Plotting
Problem

Using the synthetic well data from the starter (40 wells, each with porosity, IP rate, and water cut), with every well assigned to a "North" or "South" region, build a 3-panel regional comparison with Seaborn.

The starter provides a well_data pandas DataFrame and adds a Region column. Your job is to build a 1×3 figure with:

  1. axes[0]: sns.boxplot of water_cut grouped by Region.
  2. axes[1]: sns.scatterplot of ip_rate vs porosity, coloured by Region (hue="Region").
  3. axes[2]: sns.barplot of mean ip_rate by Region.

Seaborn's barplot computes the mean of each group automatically and draws a confidence interval (95% by default, estimated by bootstrapping) as the error bar.
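
To make the barplot's summary less of a black box, the sketch below approximates it by hand: a per-region mean of ip_rate plus a percentile bootstrap interval. It rebuilds the data with the same recipe as the reference solution. The helper bootstrap_mean_ci is a hypothetical name introduced for illustration, not a Seaborn function, and its defaults (1000 resamples, 95% level) are assumed to mirror Seaborn's defaults. Because bootstrapping is random, the interval will be close to, but not identical to, the error bars Seaborn draws.

```python
import numpy as np
import pandas as pd

# Rebuild the synthetic data (same seed and columns as the reference solution).
np.random.seed(7)
n = 40
well_data = pd.DataFrame({
    "porosity":  np.clip(np.random.normal(0.18, 0.05, n), 0.05, 0.30),
    "ip_rate":   np.clip(np.random.normal(1500, 400, n), 200, 4000),
    "water_cut": np.clip(np.random.beta(2, 5, n), 0, 1),
})
well_data["Region"] = np.where(np.random.rand(n) < 0.5, "North", "South")


def bootstrap_mean_ci(values, n_boot=1000, level=95, seed=0):
    """Percentile bootstrap CI for the mean (hypothetical helper, not a Seaborn API)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement: each row of idx is one bootstrap sample.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    tail = (100 - level) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return low, high


# One (mean, low, high) triple per region: roughly what each bar and whisker encodes.
ci_by_region = {}
for region, group in well_data.groupby("Region"):
    mean = group["ip_rate"].mean()
    low, high = bootstrap_mean_ci(group["ip_rate"])
    ci_by_region[region] = (mean, low, high)
    print(f"{region}: mean = {mean:.0f} bopd, 95% CI ~ [{low:.0f}, {high:.0f}]")
```

If the two intervals overlap heavily, the difference in bar heights is probably not worth over-interpreting.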

Each panel needs a title; the figure needs a fig.suptitle.

> Think about it: if the boxplots show meaningfully different water cut distributions between North and South, what geological explanations might account for that? How would you design a follow-up analysis to test those hypotheses?
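
One possible first step, before reaching for geological explanations, is to check whether an observed difference could be sampling noise. The sketch below is a simple permutation test on the difference in median water cut. It is only one of several reasonable designs, and the function name permutation_test_median_diff is hypothetical. The two samples are drawn from the same beta(2, 5) distribution used for water_cut in the exercise data, so there is no real regional effect to find here.

```python
import numpy as np


def permutation_test_median_diff(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test on |median(a) - median(b)| (illustrative helper)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = abs(np.median(a) - np.median(b))
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        # Shuffle the region labels by permuting the pooled values.
        shuffled = rng.permutation(pooled)
        diff = abs(np.median(shuffled[:len(a)]) - np.median(shuffled[len(a):]))
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value


# Assumed stand-in samples: both "regions" come from the same distribution.
rng = np.random.default_rng(7)
north = rng.beta(2, 5, 20)
south = rng.beta(2, 5, 20)

observed, p_value = permutation_test_median_diff(north, south)
print(f"Observed median difference: {observed:.3f}, permutation p-value: {p_value:.3f}")
```

A small p-value would only say that the regions differ, not why. Testing a geological hypothesis would still need extra variables (for example, structural position or distance to the aquifer) that this dataset does not contain.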


Reference solution

Try solving it yourself first. The solution below is one valid approach; yours may differ and still be correct.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic well data: a fixed seed keeps the figure reproducible.
np.random.seed(7)
n = 40
well_data = pd.DataFrame({
    "porosity":   np.clip(np.random.normal(0.18, 0.05, n), 0.05, 0.30),
    "ip_rate":    np.clip(np.random.normal(1500, 400, n), 200, 4000),
    "water_cut":  np.clip(np.random.beta(2, 5, n), 0, 1),
})
# Assign each well to a region at random (roughly half North, half South).
well_data["Region"] = np.where(np.random.rand(n) < 0.5, "North", "South")

# One row, three panels side by side.
fig, axes = plt.subplots(1, 3, figsize=(15, 4.5))

# Panel 1: distribution of water cut per region.
sns.boxplot(data=well_data, x="Region", y="water_cut", ax=axes[0])
axes[0].set_title("Water cut by region")
axes[0].set_ylabel("Water cut")

# Panel 2: IP rate against porosity, coloured by region.
sns.scatterplot(
    data=well_data, x="porosity", y="ip_rate", hue="Region", ax=axes[1]
)
axes[1].set_title("IP rate vs porosity by region")
axes[1].set_xlabel("Porosity (v/v)")
axes[1].set_ylabel("IP rate (bopd)")

# Panel 3: mean IP rate per region; error bars are Seaborn's default confidence interval.
sns.barplot(data=well_data, x="Region", y="ip_rate", ax=axes[2])
axes[2].set_title("Mean IP rate by region")
axes[2].set_ylabel("IP rate (bopd)")

fig.suptitle("Regional comparison - North vs South", fontweight="bold")
plt.tight_layout()
plt.show()
