Exerciseschevron_rightChapter 14chevron_right14.1
fitness_center

Exercise 14.1

Drilling Data Cleanup - Filter to Active Drilling Intervals

Level 1
Chapter 14: Drilling Analytics
descriptionProblem

Load a drilling dataset and filter it to active drilling intervals only (ROP > 0, WOB > 5 klbf, RPM > 30). What percentage of the raw data represents actual drilling vs. tripping, circulating, and other activities?

---

A raw drilling log is never 100% drilling. While the bit is on bottom cutting rock the channels look healthy, but the same file also holds tripping rows (pulling pipe, ROP = 0), circulating rows (mud moving but no rock cut, WOB <= 5 klbf), and connection rows (adding a stand, RPM <= 30). Before you compute a single efficiency metric you must strip those out, exactly as the book says: a row is active drilling only when

ROP > 0   AND   WOB > 5 klbf   AND   RPM > 30

A verified 240-row OML-127 raw log raw_log is embedded for you (do not edit it. It has columns 'Depth (ft)', 'ROP (ft/hr)', 'WOB (klbf)', 'RPM', built deterministically with np.random.seed(140) from one active block plus known tripping / circulating / connection rows.

Your tasks:

  1. Write filter_active_drilling(df, rop_min=0.0, wob_min=5.0, rpm_min=30.0)

that returns a new DataFrame containing only the rows where ROP (ft/hr) > rop_min AND WOB (klbf) > wob_min AND RPM > rpm_min. Use strict > on all three. A row sitting exactly at a threshold is NOT active drilling.

  1. Write active_fraction(df, **kw) that returns the active percentage as a

float in [0, 100]: 100 * len(filter_active_drilling(df, kw)) / len(df). It must forward kw to filter_active_drilling so custom thresholds work.

  1. Build the output variables the tests read:
  • active_df = filter_active_drilling(raw_log) (the cleaned DataFrame)
  • n_active = int(len(active_df))
  • n_total = int(len(raw_log))
  • active_pct = active_fraction(raw_log) (a float)

> Think about it: the default rop_min=0.0 plus the book's wob_min=5.0, > rpm_min=30.0 is the standard active-drilling filter. Why does using strict > > instead of >= matter for a circulating row logged with exactly > WOB = 5.0? And once you have active_pct, what does the gap below 100% > tell the drilling engineer about rig time spent off bottom?

lightbulbHints (0/3)

Stuck? Reveal hints one at a time — they progress from nudge to near-solution.

codeYour solution
main.py
visibilityReveal reference solutionexpand_more

Try solving it yourself first — the hints walk you through it. The solution below is one valid approach; yours may differ and still be correct.

import numpy as np
import pandas as pd


# ── verified raw drilling log (do not edit) ──────────────────────────────
# Deterministic OML-127 raw log: 240 rows = one active-drilling block plus
# KNOWN non-drilling rows. Active drilling (per the book) is the row mask
#   ROP (ft/hr) > 0  AND  WOB (klbf) > 5  AND  RPM > 30.
# Injected non-drilling rows each fail exactly one channel:
#   tripping    -> ROP = 0          (no rock cut)
#   circulating -> WOB = 2.0 (<=5)  (mud moving, bit off bottom)
#   connection  -> RPM = 0  (<=30)  (drillstring not turning)
def _build_raw_log():
    np.random.seed(140)
    n_total = 240
    n_active = 180  # rows 0..179 are the active-drilling block
    depth = 8000.0 + np.cumsum(np.full(n_total, 5.0))
    rop = np.zeros(n_total)
    wob = np.zeros(n_total)
    rpm = np.zeros(n_total)

    # Active-drilling block: every channel clears its threshold (ROP>0, WOB>5, RPM>30).
    rop[:n_active] = np.round(np.random.uniform(1.0, 120.0, n_active), 1)
    wob[:n_active] = np.round(np.random.uniform(12.0, 40.0, n_active), 1)
    rpm[:n_active] = np.round(np.random.uniform(60.0, 170.0, n_active)).astype(float)

    # Tripping rows (180..199): ROP = 0.
    s = slice(180, 200)
    rop[s] = 0.0
    wob[s] = np.round(np.random.uniform(8.0, 20.0, 20), 1)
    rpm[s] = np.round(np.random.uniform(40.0, 120.0, 20)).astype(float)

    # Circulating rows (200..219): WOB = 2.0 (<= 5).
    s = slice(200, 220)
    rop[s] = np.round(np.random.uniform(0.5, 5.0, 20), 1)
    wob[s] = 2.0
    rpm[s] = np.round(np.random.uniform(40.0, 120.0, 20)).astype(float)

    # Connection rows (220..239): RPM = 0 (<= 30).
    s = slice(220, 240)
    rop[s] = np.round(np.random.uniform(0.5, 5.0, 20), 1)
    wob[s] = np.round(np.random.uniform(8.0, 20.0, 20), 1)
    rpm[s] = 0.0

    return pd.DataFrame({
        "Depth (ft)": np.round(depth, 1),
        "ROP (ft/hr)": rop,
        "WOB (klbf)": wob,
        "RPM": rpm.astype(int),
    })


raw_log = _build_raw_log()


def filter_active_drilling(df, rop_min=0.0, wob_min=5.0, rpm_min=30.0):
    """Return only the active-drilling rows of a drilling log.

    A row is active drilling when (book thresholds, all STRICT >):
        ROP (ft/hr) > rop_min  AND  WOB (klbf) > wob_min  AND  RPM > rpm_min
    """
    mask = (
        (df["ROP (ft/hr)"] > rop_min)
        & (df["WOB (klbf)"] > wob_min)
        & (df["RPM"] > rpm_min)
    )
    return df[mask]


def active_fraction(df, **kw):
    """Return the active-drilling percentage (0-100) of a log."""
    return 100.0 * len(filter_active_drilling(df, **kw)) / len(df)


active_df = filter_active_drilling(raw_log)
n_active = int(len(active_df))
n_total = int(len(raw_log))
active_pct = active_fraction(raw_log)

print("total rows:", n_total, " active rows:", n_active)
print("active drilling fraction (%):", active_pct)

lockCopying code is a Full Access feature.