Anomaly Detection Tuning - Surveillance Window & Sigma Sweep

Level 2

Chapter 13: Production Optimization

Problem

Using the surveillance framework from this chapter, experiment with different window sizes (7, 14, 30, 60 days) and different sigma thresholds (1.5, 2.0, 2.5, 3.0). For each combination, count the number of true positives, false positives, and missed anomalies. Which combination gives the best balance?

---

We'll tune the exact surveillance series from this chapter on an OML-58 well: a 365-day oil-rate record built from a gentle Arps-style decline plus Gaussian noise, with an injected fault (a sudden ~60% drop at day 200, a partial recovery from day 230, then a new lower baseline from day 260). The series is reproducible because it is seeded with np.random.seed(42); the verified build_series() builder and the SERIES it produces are embedded for you. Do not modify them.

The chapter flags a point as anomalous when the daily rate falls outside a rolling-mean ± σ·rolling-std band. A short window with a small σ reacts to every wobble and raises many false alarms; a large σ widens the band and suppresses alarms, at the risk of letting real faults through. Your job is to build the tunable detector and read out the trade-off.
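Written out, the band rule reads as follows. This is a sketch that assumes the pandas defaults used in the exercise: a trailing window of w days that includes day t, and the sample standard deviation (ddof = 1). The symbols r_t, μ_t and s_t are notation introduced here, not the chapter's.

```latex
\mu_t = \frac{1}{w}\sum_{i=t-w+1}^{t} r_i, \qquad
s_t = \sqrt{\frac{1}{w-1}\sum_{i=t-w+1}^{t}\left(r_i-\mu_t\right)^2}, \qquad
\text{flag}_t = \left[\, |r_t-\mu_t| > \sigma\, s_t \,\right], \quad t \ge w-1 .
```

For t < w - 1 the window is incomplete, so μ_t and s_t are NaN and no flag is raised.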

Constants embedded for you: DAYS = 365, TRUE_ANOMALY_START = 200 (the index where the injected drop begins), WINDOWS = [7, 14, 30, 60], SIGMAS = [1.5, 2.0, 2.5, 3.0].

Your tasks:

1. Write count_flags(series, window, sigma):

   - Wrap series in a pandas.Series.
   - Compute the rolling mean and rolling std with .rolling(window).
   - A point is flagged when rate < mean - sigma*std or rate > mean + sigma*std.
   - The warm-up region (where the rolling window has fewer than window points) is NaN and must not be flagged.
   - Return the flag count as a plain int.

2. Write tp_fp(series, window, sigma, true_start):

   - Flag points exactly as in count_flags.
   - A true positive is a flag at index >= true_start (inside the injected fault region); a false positive is a flag at index < true_start.
   - Return the tuple (true_positives, false_positives) as two ints.

3. Compute and expose these module-scope variables:

```python
flags_14_2 = count_flags(SERIES, 14, 2.0)
flags_7_2 = count_flags(SERIES, 7, 2.0)
tp_14_2, fp_14_2 = tp_fp(SERIES, 14, 2.0, TRUE_ANOMALY_START)
```
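Both functions apply the same flagging rule, so one way to avoid writing it twice is a small mask helper. The sketch below runs it on a hand-made 8-point series; `flag_mask`, the toy values, and the window/sigma choices are illustrative assumptions, not part of the chapter code.

```python
import numpy as np
import pandas as pd


def flag_mask(series, window, sigma):
    """Boolean mask: True where the rate falls outside mean +/- sigma*std.

    Hypothetical helper (not from the chapter). Comparisons against NaN
    evaluate to False, so the warm-up region is never flagged.
    """
    s = pd.Series(series, dtype=float)
    mean = s.rolling(window).mean()
    std = s.rolling(window).std()
    return ((s < mean - sigma * std) | (s > mean + sigma * std)).to_numpy()


# Toy data: roughly flat at 100, then a sharp drop at index 6.
toy = [100, 101, 99, 100, 101, 99, 40, 100]
true_start = 6
mask = flag_mask(toy, window=4, sigma=1.2)

total = int(mask.sum())              # what count_flags would return
tp = int(mask[true_start:].sum())    # flags at index >= true_start
fp = int(mask[:true_start].sum())    # flags at index < true_start
print(mask.tolist(), total, tp, fp)
```

Only the drop at index 6 is flagged. The first three points sit in the warm-up region, and the point after the drop is not flagged because the drop itself inflates the rolling std of that window.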

> Think about it: the 14-day / 2σ chapter setting flags 15 points; of those, 7 land inside the real fault region (true positives) and 8 are noise upstream (false positives). Raising σ to 3.0 on the same window collapses that to a couple of flags. Why can raising σ never increase the flag count for a fixed window? And what does that cost you in missed faults?
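One way to explore the question is to rewrite the rule as a rolling z-score: a point is flagged exactly when |rate - mean| / std exceeds σ, so the flagged set can only shrink as σ grows. A second effect is worth knowing. Because the pandas rolling window includes the current point, that point can never sit more than (n - 1)/√n sample standard deviations from its own window mean (Samuelson's inequality). For a 7-day window the cap is about 2.27, so thresholds of 2.5 and 3.0 cannot fire at all. The sketch below checks both effects on synthetic noise; `rolling_z`, the demo seed, and the demo series are assumptions made for illustration, not the chapter's data.

```python
import numpy as np
import pandas as pd


def rolling_z(series, window):
    """|rate - rolling mean| / rolling std; NaN in the warm-up region."""
    s = pd.Series(series, dtype=float)
    mean = s.rolling(window).mean()
    std = s.rolling(window).std()        # sample std (ddof=1), pandas default
    return ((s - mean).abs() / std).to_numpy()


rng = np.random.default_rng(0)           # arbitrary demo seed
demo = rng.normal(1000.0, 30.0, 500)     # noise only
demo[250] -= 600.0                       # one large illustrative drop

summary = {}
for window in (7, 14, 30, 60):
    z = rolling_z(demo, window)
    cap = (window - 1) / np.sqrt(window)  # largest z any point can reach
    zmax = float(np.nanmax(z))
    # NaN > sigma is False, so warm-up points are never counted.
    counts = [int((z > sigma).sum()) for sigma in (1.5, 2.0, 2.5, 3.0)]
    summary[window] = (cap, zmax, counts)
    print(window, round(cap, 3), round(zmax, 3), counts)
```

Each printed row shows the window, the theoretical cap, the largest z observed, and the flag counts for σ = 1.5, 2.0, 2.5, 3.0. The counts never rise from left to right, and the 7-day row ends in two zeros regardless of how large the drop is.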

Your solution

main.py

```python
import numpy as np
import pandas as pd

# ── Verified chapter surveillance series builder (do not edit) ───────────
# Reproduces the seed-42 synthetic well from the chapter ch13-surveillance
# cell: gentle Arps-style decline + Gaussian noise, with an injected fault
# (sudden ~60% drop at day 200, partial recovery, new lower baseline).
DAYS = 365

def build_series():
    np.random.seed(42)
    base_rate = 1200 * np.exp(-0.0005 * np.arange(DAYS))
    noise = np.random.normal(0, 30, DAYS)
    normal = base_rate + noise
    anomaly = normal.copy()
    anomaly[200:230] *= 0.4   # sudden 60% drop
    anomaly[230:260] *= 0.7   # partial recovery
    anomaly[260:]    *= 0.85  # new lower baseline
    return np.round(anomaly, 0)

SERIES = build_series()              # length-365 oil-rate array (STB/d)
TRUE_ANOMALY_START = 200             # the injected drop begins at this index
WINDOWS = [7, 14, 30, 60]           # rolling-window sizes to sweep (days)
SIGMAS = [1.5, 2.0, 2.5, 3.0]      # sigma thresholds to sweep

def count_flags(series, window, sigma):
    """Rolling-mean +/- sigma*rolling-std anomaly count.

    Flag a point when rate < mean - sigma*std OR rate > mean + sigma*std.
    The NaN warm-up region (first window-1 points) is NOT flagged.
    Return the flag count as a plain int.
    """
    # TODO: s = pd.Series(series)
    # TODO: mean = s.rolling(window).mean()
    # TODO: std = s.rolling(window).std()
    # TODO: flags = (s < mean - sigma * std) | (s > mean + sigma * std)
    # TODO: return int(flags.sum())
    pass

def tp_fp(series, window, sigma, true_start):
    """Return (true_positives, false_positives) as two ints.

    tp = flags at index >= true_start (inside the injected fault region)
    fp = flags at index <  true_start (noise upstream of the fault)
    """
    # TODO: s = pd.Series(series)
    # TODO: mean = s.rolling(window).mean()
    # TODO: std = s.rolling(window).std()
    # TODO: flags = (s < mean - sigma * std) | (s > mean + sigma * std)
    # TODO: idx = np.where(flags.values)[0]
    # TODO: tp = int(np.sum(idx >= true_start))
    # TODO: fp = int(np.sum(idx <  true_start))
    # TODO: return tp, fp
    pass

# TODO: flags_14_2 = count_flags(SERIES, 14, 2.0)
# TODO: flags_7_2  = count_flags(SERIES, 7, 2.0)
# TODO: tp_14_2, fp_14_2 = tp_fp(SERIES, 14, 2.0, TRUE_ANOMALY_START)
flags_14_2 = None
flags_7_2 = None
tp_14_2 = None
fp_14_2 = None

print("flags (14-day, 2.0-sigma):", flags_14_2)
print("flags (7-day,  2.0-sigma):", flags_7_2)
print("true positives / false positives (14, 2.0):", tp_14_2, "/", fp_14_2)
```

Reference solution

Try solving it yourself first. The solution below is one valid approach; yours may differ and still be correct.

```python
import numpy as np
import pandas as pd


# ── Verified chapter surveillance series builder (do not edit) ───────────
# Reproduces the seed-42 synthetic well from the chapter ch13-surveillance
# cell: gentle Arps-style decline + Gaussian noise, with an injected fault
# (sudden ~60% drop at day 200, partial recovery, new lower baseline).
DAYS = 365


def build_series():
    np.random.seed(42)
    base_rate = 1200 * np.exp(-0.0005 * np.arange(DAYS))
    noise = np.random.normal(0, 30, DAYS)
    normal = base_rate + noise
    anomaly = normal.copy()
    anomaly[200:230] *= 0.4   # sudden 60% drop
    anomaly[230:260] *= 0.7   # partial recovery
    anomaly[260:]    *= 0.85  # new lower baseline
    return np.round(anomaly, 0)


SERIES = build_series()              # length-365 oil-rate array (STB/d)
TRUE_ANOMALY_START = 200             # the injected drop begins at this index
WINDOWS = [7, 14, 30, 60]           # rolling-window sizes to sweep (days)
SIGMAS = [1.5, 2.0, 2.5, 3.0]      # sigma thresholds to sweep


def count_flags(series, window, sigma):
    """Rolling-mean +/- sigma*rolling-std anomaly count.

    Flag a point when rate < mean - sigma*std OR rate > mean + sigma*std.
    The NaN warm-up region (first window-1 points) is NOT flagged.
    Return the flag count as a plain int.
    """
    s = pd.Series(series)
    mean = s.rolling(window).mean()
    std = s.rolling(window).std()
    flags = (s < mean - sigma * std) | (s > mean + sigma * std)
    return int(flags.sum())


def tp_fp(series, window, sigma, true_start):
    """Return (true_positives, false_positives) as two ints.

    tp = flags at index >= true_start (inside the injected fault region)
    fp = flags at index <  true_start (noise upstream of the fault)
    """
    s = pd.Series(series)
    mean = s.rolling(window).mean()
    std = s.rolling(window).std()
    flags = (s < mean - sigma * std) | (s > mean + sigma * std)
    idx = np.where(flags.values)[0]
    tp = int(np.sum(idx >= true_start))
    fp = int(np.sum(idx < true_start))
    return tp, fp


flags_14_2 = count_flags(SERIES, 14, 2.0)
flags_7_2 = count_flags(SERIES, 7, 2.0)
tp_14_2, fp_14_2 = tp_fp(SERIES, 14, 2.0, TRUE_ANOMALY_START)

print("flags (14-day, 2.0-sigma):", flags_14_2)
print("flags (7-day,  2.0-sigma):", flags_7_2)
print("true positives / false positives (14, 2.0):", tp_14_2, "/", fp_14_2)
```
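The problem statement asks for the full window × sigma sweep, which the three module-scope variables only sample. A minimal sketch of that sweep follows. The exercise does not define a "missed anomaly", so the sketch assumes one: the three regime changes injected by build_series() (days 200, 230, 260) are treated as events, and an event counts as missed when no flag lands on it or within `TOLERANCE` days after it. `EVENT_DAYS`, `TOLERANCE`, and `sweep` are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd

DAYS = 365
TRUE_ANOMALY_START = 200
WINDOWS = [7, 14, 30, 60]
SIGMAS = [1.5, 2.0, 2.5, 3.0]

# Assumption (not from the chapter): the injected regime changes are the
# events to detect, and a flag up to TOLERANCE days late still counts.
EVENT_DAYS = [200, 230, 260]
TOLERANCE = 2


def build_series():
    """Seed-42 series, copied unchanged from the exercise."""
    np.random.seed(42)
    base_rate = 1200 * np.exp(-0.0005 * np.arange(DAYS))
    noise = np.random.normal(0, 30, DAYS)
    anomaly = base_rate + noise
    anomaly[200:230] *= 0.4   # sudden 60% drop
    anomaly[230:260] *= 0.7   # partial recovery
    anomaly[260:] *= 0.85     # new lower baseline
    return np.round(anomaly, 0)


def sweep(series):
    """Return one row of tp / fp / missed per (window, sigma) pair."""
    s = pd.Series(series)
    rows = []
    for window in WINDOWS:
        mean = s.rolling(window).mean()
        std = s.rolling(window).std()
        for sigma in SIGMAS:
            low, high = mean - sigma * std, mean + sigma * std
            flags = ((s < low) | (s > high)).to_numpy()
            idx = np.flatnonzero(flags)
            tp = int(np.sum(idx >= TRUE_ANOMALY_START))
            fp = int(np.sum(idx < TRUE_ANOMALY_START))
            # Count events with no flag in [day, day + TOLERANCE].
            missed = sum(
                not flags[day:day + TOLERANCE + 1].any() for day in EVENT_DAYS
            )
            rows.append({"window": window, "sigma": sigma,
                         "tp": tp, "fp": fp, "missed": int(missed)})
    return pd.DataFrame(rows)


table = sweep(build_series())
print(table.to_string(index=False))
```

When reading the table, keep in mind that tp follows the exercise's definition, so it counts every flag at or after day 200, including noise flags late in the record. The missed column depends entirely on the assumed event list and tolerance, and a different definition could shift which combination looks best balanced.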
