Exercise 14.1
Drilling Data Cleanup - Filter to Active Drilling Intervals
Load a drilling dataset and filter it to active drilling intervals only (ROP > 0, WOB > 5 klbf, RPM > 30). What percentage of the raw data represents actual drilling vs. tripping, circulating, and other activities?
---
A raw drilling log is never 100% drilling. While the bit is on bottom cutting rock the channels look healthy, but the same file also holds tripping rows (pulling pipe, ROP = 0), circulating rows (mud moving but no rock cut, WOB <= 5 klbf), and connection rows (adding a stand, RPM <= 30). Before you compute a single efficiency metric you must strip those out, exactly as the book says: a row is active drilling only when
ROP > 0 AND WOB > 5 klbf AND RPM > 30A verified 240-row OML-127 raw log raw_log is embedded for you (do not edit it. It has columns 'Depth (ft)', 'ROP (ft/hr)', 'WOB (klbf)', 'RPM', built deterministically with np.random.seed(140) from one active block plus known tripping / circulating / connection rows.
Your tasks:
- Write
filter_active_drilling(df, rop_min=0.0, wob_min=5.0, rpm_min=30.0)
that returns a new DataFrame containing only the rows where ROP (ft/hr) > rop_min AND WOB (klbf) > wob_min AND RPM > rpm_min. Use strict > on all three. A row sitting exactly at a threshold is NOT active drilling.
- Write
active_fraction(df, **kw)that returns the active percentage as a
float in [0, 100]: 100 * len(filter_active_drilling(df, kw)) / len(df). It must forward kw to filter_active_drilling so custom thresholds work.
- Build the output variables the tests read:
active_df=filter_active_drilling(raw_log)(the cleaned DataFrame)n_active=int(len(active_df))n_total=int(len(raw_log))active_pct=active_fraction(raw_log)(a float)
> Think about it: the default rop_min=0.0 plus the book's wob_min=5.0, > rpm_min=30.0 is the standard active-drilling filter. Why does using strict > > instead of >= matter for a circulating row logged with exactly > WOB = 5.0? And once you have active_pct, what does the gap below 100% > tell the drilling engineer about rig time spent off bottom?
Stuck? Reveal hints one at a time — they progress from nudge to near-solution.
visibilityReveal reference solutionexpand_more
Try solving it yourself first — the hints walk you through it. The solution below is one valid approach; yours may differ and still be correct.
import numpy as np
import pandas as pd
# ── verified raw drilling log (do not edit) ──────────────────────────────
# Deterministic OML-127 raw log: 240 rows = one active-drilling block plus
# KNOWN non-drilling rows. Active drilling (per the book) is the row mask
# ROP (ft/hr) > 0 AND WOB (klbf) > 5 AND RPM > 30.
# Injected non-drilling rows each fail exactly one channel:
# tripping -> ROP = 0 (no rock cut)
# circulating -> WOB = 2.0 (<=5) (mud moving, bit off bottom)
# connection -> RPM = 0 (<=30) (drillstring not turning)
def _build_raw_log():
np.random.seed(140)
n_total = 240
n_active = 180 # rows 0..179 are the active-drilling block
depth = 8000.0 + np.cumsum(np.full(n_total, 5.0))
rop = np.zeros(n_total)
wob = np.zeros(n_total)
rpm = np.zeros(n_total)
# Active-drilling block: every channel clears its threshold (ROP>0, WOB>5, RPM>30).
rop[:n_active] = np.round(np.random.uniform(1.0, 120.0, n_active), 1)
wob[:n_active] = np.round(np.random.uniform(12.0, 40.0, n_active), 1)
rpm[:n_active] = np.round(np.random.uniform(60.0, 170.0, n_active)).astype(float)
# Tripping rows (180..199): ROP = 0.
s = slice(180, 200)
rop[s] = 0.0
wob[s] = np.round(np.random.uniform(8.0, 20.0, 20), 1)
rpm[s] = np.round(np.random.uniform(40.0, 120.0, 20)).astype(float)
# Circulating rows (200..219): WOB = 2.0 (<= 5).
s = slice(200, 220)
rop[s] = np.round(np.random.uniform(0.5, 5.0, 20), 1)
wob[s] = 2.0
rpm[s] = np.round(np.random.uniform(40.0, 120.0, 20)).astype(float)
# Connection rows (220..239): RPM = 0 (<= 30).
s = slice(220, 240)
rop[s] = np.round(np.random.uniform(0.5, 5.0, 20), 1)
wob[s] = np.round(np.random.uniform(8.0, 20.0, 20), 1)
rpm[s] = 0.0
return pd.DataFrame({
"Depth (ft)": np.round(depth, 1),
"ROP (ft/hr)": rop,
"WOB (klbf)": wob,
"RPM": rpm.astype(int),
})
raw_log = _build_raw_log()
def filter_active_drilling(df, rop_min=0.0, wob_min=5.0, rpm_min=30.0):
"""Return only the active-drilling rows of a drilling log.
A row is active drilling when (book thresholds, all STRICT >):
ROP (ft/hr) > rop_min AND WOB (klbf) > wob_min AND RPM > rpm_min
"""
mask = (
(df["ROP (ft/hr)"] > rop_min)
& (df["WOB (klbf)"] > wob_min)
& (df["RPM"] > rpm_min)
)
return df[mask]
def active_fraction(df, **kw):
"""Return the active-drilling percentage (0-100) of a log."""
return 100.0 * len(filter_active_drilling(df, **kw)) / len(df)
active_df = filter_active_drilling(raw_log)
n_active = int(len(active_df))
n_total = int(len(raw_log))
active_pct = active_fraction(raw_log)
print("total rows:", n_total, " active rows:", n_active)
print("active drilling fraction (%):", active_pct)
lockCopying code is a Full Access feature.