Exercise 6.2
Null Value Detection
A logging run never comes back clean. Tools read poorly in bad hole, a curve drops out over a washed-out interval, and you are left with blocks of the LAS null value (-999.25). Before you trust a curve for quantitative work (porosity, Sw, net pay), check how much of it is actually null. A common rule of thumb is that a curve that is more than about 20% null is unreliable for calculations.
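The null percentage behind that rule of thumb is simply the null sample count divided by the total sample count. A minimal sketch in plain Python, assuming the samples are still raw values with -999.25 as the sentinel (the helper name null_pct is hypothetical, and the list mirrors the NPHI column of the starter file):

```python
NULL_VALUE = -999.25  # LAS null sentinel used in this exercise

def null_pct(samples, null_value=NULL_VALUE):
    """Return the percentage of samples equal to the null sentinel."""
    if not samples:
        return 0.0  # assumption: an empty curve is reported as 0% null
    n_null = sum(1 for v in samples if v == null_value)
    return 100.0 * n_null / len(samples)

# NPHI samples as they appear in the starter file: 4 of 10 are null
nphi = [-999.25, -999.25, 0.16, 0.14, -999.25, 0.15, -999.25, 0.30, 0.17, 0.15]
pct = null_pct(nphi)
print(f"NPHI null: {pct:.1f}%")  # 40.0%, well above the ~20% rule of thumb
```

Once lasio has converted the sentinel to NaN, the same count comes from isna() rather than an equality test.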
The starter holds a LAS file for OD-009 where the neutron (NPHI) and resistivity (RT) curves have gaps. Write two functions:
count_nulls_by_curve(las_text): read the file, build las.df() (so -999.25 is already NaN), and return a DataFrame with one row per data curve and columns curve, n_null, null_pct, sorted by null_pct descending (worst curve first).
unreliable_curves(summary, threshold=20.0): given that summary DataFrame, return the list of curve names whose null_pct is strictly greater than threshold.
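The "strictly greater than" wording matters at the boundary: a curve sitting exactly on the threshold is not flagged. A small sketch of that behaviour, using hypothetical summary rows as plain dictionaries in place of the DataFrame (the function name flag_unreliable and the row values are illustrative only, not taken from the starter file):

```python
# Hypothetical summary rows standing in for the summary DataFrame
summary_rows = [
    {"curve": "NPHI", "null_pct": 40.0},
    {"curve": "RT", "null_pct": 20.0},  # exactly at the threshold
    {"curve": "GR", "null_pct": 0.0},
]

def flag_unreliable(rows, threshold=20.0):
    """Return curve names whose null_pct is strictly above threshold."""
    return [r["curve"] for r in rows if r["null_pct"] > threshold]

flagged = flag_unreliable(summary_rows)
print(flagged)  # ['NPHI'] because RT at exactly 20.0 is not strictly greater
```

Using >= instead would also flag RT here, so the choice of operator is worth testing explicitly.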
This two-function check is what decides whether you can compute porosity over an interval or have to flag it as no-data.
Reference solution
Try solving it yourself first. The solution below is one valid approach; yours may differ and still be correct.
import lasio
import io
import pandas as pd
LAS_TEXT = """~VERSION INFORMATION
VERS. 2.0 : CWLS LOG ASCII STANDARD - VERSION 2.0
WRAP. NO : ONE LINE PER DEPTH STEP
~WELL INFORMATION
WELL. OD-009 : Well Name
FLD. OML 58 : Field Name
NULL. -999.2500 : NULL VALUE
~CURVE INFORMATION
DEPT.FT : Depth
GR .GAPI : Gamma Ray
RT .OHMM : Deep Resistivity
RHOB.G/CC : Bulk Density
NPHI.V/V : Neutron Porosity
~A DEPT GR RT RHOB NPHI
9100.000 85.0 3.0 2.45 -999.25
9100.500 90.0 2.5 2.48 -999.25
9101.000 45.0 20.0 2.31 0.16
9101.500 40.0 -999.25 2.29 0.14
9102.000 38.0 28.0 2.28 -999.25
9102.500 42.0 22.0 2.30 0.15
9103.000 88.0 2.2 2.47 -999.25
9103.500 92.0 2.0 2.49 0.30
9104.000 44.0 18.0 2.32 0.17
9104.500 39.0 25.0 2.30 0.15
"""

def count_nulls_by_curve(las_text):
    las = lasio.read(io.StringIO(las_text))
    df = las.df()  # DEPT is the index; -999.25 is already NaN
    rows = []
    for col in df.columns:
        n_null = int(df[col].isna().sum())
        rows.append({"curve": col, "n_null": n_null, "null_pct": 100.0 * n_null / len(df)})
    # Worst curve first, with a clean 0..n-1 index
    return pd.DataFrame(rows).sort_values("null_pct", ascending=False).reset_index(drop=True)

def unreliable_curves(summary, threshold=20.0):
    # Strictly greater than: a curve exactly at the threshold is not flagged
    return list(summary[summary["null_pct"] > threshold]["curve"])

summary = count_nulls_by_curve(LAS_TEXT)
print(summary.to_string(index=False))
print("Unreliable (>20% null):", unreliable_curves(summary))