Null Value Detection

Level 2

Chapter 6: Petroleum Data Sources

Problem

A logging run never comes back clean. Tools read badly in washed-out hole, a curve drops out over the bad interval, and you get blocks of the LAS null value (-999.25). Before you trust a curve for quantitative work (porosity, Sw, net pay), check how much of it is actually null. A common rule of thumb, and the default this exercise uses, is that a curve more than ~20% null is unreliable for calculations.
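
The rule of thumb above comes down to a single ratio. The following is a minimal sketch in plain Python, assuming the samples are held in a list and the sentinel is -999.25; the helper name `null_percentage` and the sample values are hypothetical and only illustrate the arithmetic.

```python
NULL_VALUE = -999.25  # LAS null sentinel (assumed; a real file declares it in its header)


def null_percentage(samples, null_value=NULL_VALUE):
    """Return the percentage of samples equal to the null sentinel."""
    if not samples:
        return 0.0  # an empty curve has nothing to count
    n_null = sum(1 for value in samples if value == null_value)
    return 100.0 * n_null / len(samples)


# Hypothetical five-sample curve with two null readings
curve = [0.21, -999.25, -999.25, 0.18, 0.19]
pct = null_percentage(curve)
print(f"{pct:.1f}% null")
print("unreliable" if pct > 20.0 else "usable")
```

Two nulls out of five samples is well above the ~20% guideline, so this curve would be flagged.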

The starter holds a LAS file for OD-009 where the neutron (NPHI) and resistivity (RT) curves have gaps. Write two functions:

count_nulls_by_curve(las_text): read the file, build las.df() (so -999.25 is already NaN), and return a DataFrame with one row per data curve and columns curve, n_null, null_pct, sorted by null_pct descending (worst curve first).

unreliable_curves(summary, threshold=20.0): given that summary DataFrame, return the list of curve names whose null_pct is strictly greater than threshold.
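
Why it matters that -999.25 is already NaN can be seen with pandas alone. The sketch below is an illustration of the effect, not lasio's implementation: it assumes a small hand-built DataFrame (the values are hypothetical) and replaces the sentinel explicitly, which is the state las.df() hands you.

```python
import numpy as np
import pandas as pd

NULL_VALUE = -999.25  # sentinel declared in the LAS header

# Hypothetical raw readings with the sentinel still in place
raw = pd.DataFrame(
    {
        "RT": [3.0, NULL_VALUE, 20.0],
        "NPHI": [NULL_VALUE, NULL_VALUE, 0.16],
    }
)

# Swap the sentinel for NaN so pandas treats it as missing data
masked = raw.replace(NULL_VALUE, np.nan)

raw_mean = raw["NPHI"].mean()        # dragged far negative by the sentinel
masked_mean = masked["NPHI"].mean()  # NaN is skipped by default

print(masked.isna().sum())
print(raw_mean, masked_mean)
```

With the sentinel left in, any statistic on the curve is meaningless; once it is NaN, isna() can count it and aggregations skip it.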

This two-step check is what decides whether you can compute porosity over an interval or have to flag it as no-data.


Your solution

main.py

import lasio
import io
import pandas as pd

LAS_TEXT = """~VERSION INFORMATION
 VERS.   2.0 : CWLS LOG ASCII STANDARD - VERSION 2.0
 WRAP.   NO  : ONE LINE PER DEPTH STEP
~WELL INFORMATION
 WELL.  OD-009 : Well Name
 FLD.   OML 58 : Field Name
 NULL.  -999.2500 : NULL VALUE
~CURVE INFORMATION
 DEPT.FT   : Depth
 GR  .GAPI : Gamma Ray
 RT  .OHMM : Deep Resistivity
 RHOB.G/CC : Bulk Density
 NPHI.V/V  : Neutron Porosity
~A  DEPT       GR       RT       RHOB     NPHI
 9100.000   85.0     3.0      2.45   -999.25
 9100.500   90.0     2.5      2.48   -999.25
 9101.000   45.0    20.0      2.31     0.16
 9101.500   40.0  -999.25     2.29     0.14
 9102.000   38.0    28.0      2.28   -999.25
 9102.500   42.0    22.0      2.30     0.15
 9103.000   88.0     2.2      2.47   -999.25
 9103.500   92.0     2.0      2.49     0.30
 9104.000   44.0    18.0      2.32     0.17
 9104.500   39.0    25.0      2.30     0.15
"""

def count_nulls_by_curve(las_text):
    """DataFrame [curve, n_null, null_pct] sorted by null_pct desc."""
    # TODO
    pass

def unreliable_curves(summary, threshold=20.0):
    """List of curve names with null_pct strictly greater than threshold."""
    # TODO
    pass

summary = count_nulls_by_curve(LAS_TEXT)
print(summary)
print("Unreliable:", unreliable_curves(summary))

Reference solution

Try solving it yourself first. The solution below is one valid approach; yours may differ and still be correct.

import lasio
import io
import pandas as pd

LAS_TEXT = """~VERSION INFORMATION
 VERS.   2.0 : CWLS LOG ASCII STANDARD - VERSION 2.0
 WRAP.   NO  : ONE LINE PER DEPTH STEP
~WELL INFORMATION
 WELL.  OD-009 : Well Name
 FLD.   OML 58 : Field Name
 NULL.  -999.2500 : NULL VALUE
~CURVE INFORMATION
 DEPT.FT   : Depth
 GR  .GAPI : Gamma Ray
 RT  .OHMM : Deep Resistivity
 RHOB.G/CC : Bulk Density
 NPHI.V/V  : Neutron Porosity
~A  DEPT       GR       RT       RHOB     NPHI
 9100.000   85.0     3.0      2.45   -999.25
 9100.500   90.0     2.5      2.48   -999.25
 9101.000   45.0    20.0      2.31     0.16
 9101.500   40.0  -999.25     2.29     0.14
 9102.000   38.0    28.0      2.28   -999.25
 9102.500   42.0    22.0      2.30     0.15
 9103.000   88.0     2.2      2.47   -999.25
 9103.500   92.0     2.0      2.49     0.30
 9104.000   44.0    18.0      2.32     0.17
 9104.500   39.0    25.0      2.30     0.15
"""


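def count_nulls_by_curve(las_text):
    """DataFrame [curve, n_null, null_pct] sorted by null_pct desc."""
    las = lasio.read(io.StringIO(las_text))
    df = las.df()  # DEPT is the index; -999.25 is already NaN
    rows = []
    for col in df.columns:  # data curves only; the depth index is not a column
        n_null = int(df[col].isna().sum())
        rows.append({"curve": col, "n_null": n_null, "null_pct": 100.0 * n_null / len(df)})
    # Worst curve first; renumber the rows 0..n-1 after sorting
    return pd.DataFrame(rows).sort_values("null_pct", ascending=False).reset_index(drop=True)


def unreliable_curves(summary, threshold=20.0):
    """List of curve names with null_pct strictly greater than threshold."""
    return list(summary[summary["null_pct"] > threshold]["curve"])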


summary = count_nulls_by_curve(LAS_TEXT)
print(summary.to_string(index=False))
print("Unreliable (>20% null):", unreliable_curves(summary))
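
The per-column loop can also be written with whole-frame operations. The sketch below is one possible variant, not the course's reference method. To stay independent of lasio it assumes a hand-built DataFrame that stands in for las.df(), using the same values as the exercise file with the nulls already NaN.

```python
import numpy as np
import pandas as pd

# Stand-in for las.df(): exercise values, DEPT as index, nulls already NaN
df = pd.DataFrame(
    {
        "GR": [85.0, 90.0, 45.0, 40.0, 38.0, 42.0, 88.0, 92.0, 44.0, 39.0],
        "RT": [3.0, 2.5, 20.0, np.nan, 28.0, 22.0, 2.2, 2.0, 18.0, 25.0],
        "RHOB": [2.45, 2.48, 2.31, 2.29, 2.28, 2.30, 2.47, 2.49, 2.32, 2.30],
        "NPHI": [np.nan, np.nan, 0.16, 0.14, np.nan, 0.15, np.nan, 0.30, 0.17, 0.15],
    },
    index=pd.Index([9100.0 + 0.5 * i for i in range(10)], name="DEPT"),
)

is_null = df.isna()  # boolean frame: True where a sample is missing

summary = (
    pd.DataFrame(
        {
            "curve": list(df.columns),
            "n_null": is_null.sum().to_numpy(),               # count of True per column
            "null_pct": (100.0 * is_null.mean()).to_numpy(),  # mean of booleans = null fraction
        }
    )
    .sort_values("null_pct", ascending=False, kind="stable")
    .reset_index(drop=True)
)

# Strict comparison: a curve exactly at the threshold is not flagged
unreliable = summary.loc[summary["null_pct"] > 20.0, "curve"].tolist()

print(summary.to_string(index=False))
print("Unreliable (>20% null):", unreliable)
```

NPHI is missing 4 of 10 samples and RT is missing 1 of 10, so only NPHI exceeds the 20% threshold.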

