Exercise 6.8
LAS to DataFrame Pipeline
In a field study you process dozens of wells, so you write the cleanup once and run it on every LAS file. The starter holds a log for OD-012 where the SP curve is mostly dead (washed out) and two other curves (RT and NPHI) have short gaps.
Write `las_to_analysis_ready(las_text)` that returns `(df, summary)`:
- Read the LAS with lasio and build `las.df()` (nulls are already converted to NaN).
- Drop any curve that is more than 50% null; it can't be trusted.
- Interpolate short interior gaps of ≤ 3 consecutive NaN (`df.interpolate(method="linear", limit=3, limit_area="inside")`).
- Reset the depth index to a `DEPTH_FT` column and add `DEPTH_M` (`DEPTH_FT` × 0.3048).
- Return the cleaned DataFrame and a `summary` dict with `n_curves_in`, `curves_dropped` (list), `n_interpolated` (how many NaN values you filled), and `n_rows`.
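Before wiring in lasio, it can help to check the two cleaning rules against the starter's numbers. The sketch below rebuilds the five curves from the ~A section by hand, with NaN written in where the file has -999.25 (the substitution lasio performs on read) and the depth column left out. It is a pure-pandas check, not part of the required function.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Curve values copied from the starter's ~A section; -999.25 replaced by NaN.
curves = pd.DataFrame({
    "GR":   [85.0, 88.0, 45.0, 42.0, 40.0, 44.0, 90.0, 92.0, 46.0, 41.0],
    "RT":   [3.0, 2.8, 20.0, nan, 25.0, 22.0, 2.2, 2.0, 18.0, 24.0],
    "RHOB": [2.45, 2.46, 2.31, 2.30, 2.29, 2.30, 2.47, 2.49, 2.32, 2.30],
    "NPHI": [0.18, 0.17, 0.16, nan, nan, 0.15, 0.30, 0.31, 0.17, 0.15],
    "SP":   [nan, nan, 20.0, nan, nan, 22.0, nan, nan, 21.0, 19.0],
})

# Rule 1: fraction of NaN per curve; anything above 0.5 is dropped.
null_frac = curves.isna().mean()
dropped = list(null_frac[null_frac > 0.50].index)
kept = curves.drop(columns=dropped)

# Rule 2: fill interior runs, then count how many NaN disappeared.
before = int(kept.isna().sum().sum())
filled = kept.interpolate(method="linear", limit=3, limit_area="inside")
n_filled = before - int(filled.isna().sum().sum())

print(dropped, n_filled)  # → ['SP'] 3
```

SP is null in 6 of 10 samples (60%), so it is the only curve dropped. The remaining gaps (one RT sample, two NPHI samples) are interior and short, so all three NaN values are filled; for example RT at row 3 becomes 22.5, midway between 20.0 and 25.0.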
The discipline matters: you drop what's unusable, you fill only gaps short enough to trust, and you never overwrite the raw values silently.
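One caveat about "fill only gaps short enough to trust": in pandas, `limit=3` caps how many consecutive NaNs get filled; it does not skip longer gaps. A run of five interior NaNs has its first three filled and the last two left as NaN, so a gap you considered too long is partially bridged anyway. On OD-012 this never bites (the RT and NPHI gaps are one and two samples long), but if you want a run filled only when the whole run is at most three samples, one option is to measure each run's length and copy interpolated values only into the short ones. `interpolate_short_gaps` is a hypothetical helper written for this sketch, not a pandas function.

```python
import numpy as np
import pandas as pd


def interpolate_short_gaps(df, max_gap=3):
    """Fill interior NaN runs of length <= max_gap; leave longer runs as NaN.

    Hypothetical helper: unlike interpolate(limit=...), a run longer than
    max_gap is left entirely untouched instead of being partially filled.
    """
    # Interpolate every interior gap (no limit); we choose what to keep below.
    filled = df.interpolate(method="linear", limit_area="inside")
    out = df.copy()
    for col in df.columns:
        isnull = df[col].isna()
        # A new run starts wherever null/not-null status changes.
        run_id = isnull.ne(isnull.shift(fill_value=False)).cumsum()
        # Length of the run each row belongs to.
        run_len = run_id.map(run_id.value_counts())
        ok = isnull & (run_len <= max_gap)
        out.loc[ok, col] = filled.loc[ok, col]
    return out


# A 2-sample gap followed by a 5-sample gap.
s = pd.DataFrame({"X": [1.0, np.nan, np.nan, 4.0, np.nan, np.nan,
                        np.nan, np.nan, np.nan, 10.0]})
limited = s.interpolate(method="linear", limit=3, limit_area="inside")
strict = interpolate_short_gaps(s, max_gap=3)
print(limited["X"].tolist())  # 5-sample gap partially filled
print(strict["X"].tolist())   # 5-sample gap left as NaN
```

Leading and trailing NaNs stay NaN in both versions, because `limit_area="inside"` never extrapolates past the first or last valid sample.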
Try solving it yourself first. The reference solution below is one valid approach; yours may differ and still be correct.
```python
import io

import lasio
import pandas as pd

LAS_TEXT = """~VERSION INFORMATION
VERS. 2.0 : CWLS LOG ASCII STANDARD - VERSION 2.0
WRAP. NO : ONE LINE PER DEPTH STEP
~WELL INFORMATION
WELL. OD-012 : Well Name
FLD. OML 58 : Field Name
NULL. -999.2500 : NULL VALUE
~CURVE INFORMATION
DEPT.FT : Depth
GR .GAPI : Gamma Ray
RT .OHMM : Deep Resistivity
RHOB.G/CC : Bulk Density
NPHI.V/V : Neutron Porosity
SP .MV : Spontaneous Potential
~A DEPT GR RT RHOB NPHI SP
9200.000 85.0 3.0 2.45 0.18 -999.25
9200.500 88.0 2.8 2.46 0.17 -999.25
9201.000 45.0 20.0 2.31 0.16 20.0
9201.500 42.0 -999.25 2.30 -999.25 -999.25
9202.000 40.0 25.0 2.29 -999.25 -999.25
9202.500 44.0 22.0 2.30 0.15 22.0
9203.000 90.0 2.2 2.47 0.30 -999.25
9203.500 92.0 2.0 2.49 0.31 -999.25
9204.000 46.0 18.0 2.32 0.17 21.0
9204.500 41.0 24.0 2.30 0.15 19.0
"""

M_PER_FT = 0.3048


def las_to_analysis_ready(las_text):
    las = lasio.read(io.StringIO(las_text))
    df = las.df()  # depth index; -999.25 already NaN
    n_curves_in = df.shape[1]  # curves only; DEPT is the index

    # 1. Drop curves more than 50% null.
    null_frac = df.isna().mean()
    curves_dropped = list(null_frac[null_frac > 0.50].index)
    df = df.drop(columns=curves_dropped)

    # 2. Interpolate short interior gaps (<= 3 consecutive).
    n_before = int(df.isna().sum().sum())
    df = df.interpolate(method="linear", limit=3, limit_area="inside")
    n_interpolated = n_before - int(df.isna().sum().sum())

    # 3. Depth as a column, plus metres.
    df = df.reset_index().rename(columns={"DEPT": "DEPTH_FT"})
    df["DEPTH_M"] = df["DEPTH_FT"] * M_PER_FT

    summary = {
        "n_curves_in": n_curves_in,
        "curves_dropped": curves_dropped,
        "n_interpolated": n_interpolated,  # filled samples, not gaps
        "n_rows": len(df),
    }
    return df, summary


df, summary = las_to_analysis_ready(LAS_TEXT)
print(f"Curves in: {summary['n_curves_in']}, dropped {summary['curves_dropped']}, "
      f"filled {summary['n_interpolated']} values, {summary['n_rows']} rows")
print(df.to_string(index=False))
```
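The reference solution reports how many values were filled but not which ones. If you want "never overwrite the raw values silently" to hold inside the DataFrame as well, one option is to keep a boolean flag column per curve that marks interpolated samples. The sketch below assumes you call it after the drop step in place of the plain `interpolate` line; `interpolate_with_flags` and the `_INTERP` suffix are hypothetical names chosen for illustration, not lasio or pandas conventions.

```python
import numpy as np
import pandas as pd


def interpolate_with_flags(df, limit=3):
    """Interpolate interior gaps and add a <curve>_INTERP boolean column per curve.

    Hypothetical helper; the _INTERP suffix is an illustrative convention.
    """
    was_null = df.isna()
    filled = df.interpolate(method="linear", limit=limit, limit_area="inside")
    # True where the sample was NaN before and holds a number now.
    flags = (was_null & filled.notna()).add_suffix("_INTERP")
    return pd.concat([filled, flags], axis=1)


# Small excerpt shaped like the starter's RT and NPHI gaps.
curves = pd.DataFrame({
    "RT": [20.0, np.nan, 25.0],
    "NPHI": [0.16, np.nan, 0.15],
})
out = interpolate_with_flags(curves)
print(out)
print(int(out.filter(like="_INTERP").sum().sum()))  # number of filled samples
```

Summing the flags gives the same number the reference solution computes as `n_interpolated` from before-and-after NaN totals, so the count in `summary` and the per-sample record stay consistent.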