LAS to DataFrame Pipeline

Level 3

Chapter 6: Petroleum Data Sources

Problem

In a field study you process dozens of wells, so you write the cleanup once and run it on every LAS file. The starter holds a log for well OD-012 in which the SP curve is mostly null (washed out) and two other curves, RT and NPHI, have short gaps.

Write las_to_analysis_ready(las_text) returning (df, summary):

Read the LAS and build las.df() (nulls already become NaN).

Drop any curve that is more than 50% null. It can't be trusted.

Interpolate short interior gaps of ≤ 3 consecutive NaN (df.interpolate(method="linear", limit=3, limit_area="inside")).

Reset the depth index to a DEPTH_FT column and add DEPTH_M (DEPTH_FT × 0.3048).

Return the cleaned DataFrame and a summary dict with n_curves_in, curves_dropped (list), n_interpolated (how many NaN you filled), and n_rows.
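The drop, fill, and depth steps can be tried on a small hand-built frame before wiring in lasio. This is a minimal sketch, not the reference solution: the frame below copies the RT, NPHI, and SP columns of the starter log by hand, with the -999.25 nulls already written as NaN to stand in for what las.df() returns. The names toy, kept, filled, and out are illustrative only.

```python
import numpy as np
import pandas as pd

nan = np.nan
depth_ft = [9200.0 + 0.5 * i for i in range(10)]

# Hand-copied from the starter log; NaN stands in for the -999.25 nulls.
toy = pd.DataFrame(
    {
        "RT":   [3.0, 2.8, 20.0, nan, 25.0, 22.0, 2.2, 2.0, 18.0, 24.0],
        "NPHI": [0.18, 0.17, 0.16, nan, nan, 0.15, 0.30, 0.31, 0.17, 0.15],
        "SP":   [nan, nan, 20.0, nan, nan, 22.0, nan, nan, 21.0, 19.0],
    },
    index=pd.Index(depth_ft, name="DEPT"),
)

# Fraction of null samples per curve: the mean of a boolean mask.
null_frac = toy.isna().mean()
to_drop = list(null_frac[null_frac > 0.50].index)
kept = toy.drop(columns=to_drop)

# Fill interior gaps only; leading or trailing NaN would stay NaN.
filled = kept.interpolate(method="linear", limit=3, limit_area="inside")
n_filled = int(kept.isna().sum().sum() - filled.isna().sum().sum())

# Depth index becomes a column, with a metres column alongside it.
out = filled.rename_axis("DEPTH_FT").reset_index()
out["DEPTH_M"] = out["DEPTH_FT"] * 0.3048

print(null_frac.to_dict())
print(to_drop, n_filled)
print(out.head())
```

In this frame SP is null in 6 of 10 samples, so it crosses the 50% threshold and is dropped, while RT (one gap sample) and NPHI (two gap samples) are kept and filled.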

The discipline matters: you drop what's unusable, you fill only gaps short enough to trust, and you never overwrite the raw values silently.
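One way to honour "never overwrite the raw values silently" is to keep a record of which samples were filled. The sketch below is an assumption about how that could look and is not part of the exercise's required output: the helper name interpolate_with_flags and the _FILLED column suffix are hypothetical.

```python
import numpy as np
import pandas as pd


def interpolate_with_flags(df, limit=3):
    """Return (filled_df, flags_df); flags are True where a NaN was filled.

    Hypothetical helper: the name and the flag layout are illustrative.
    """
    filled = df.interpolate(method="linear", limit=limit, limit_area="inside")
    # A cell was filled if it was NaN before and holds a value afterwards.
    flags = df.isna() & filled.notna()
    return filled, flags


raw = pd.DataFrame({"RT": [20.0, np.nan, 25.0], "NPHI": [0.16, 0.15, 0.14]})
filled, flags = interpolate_with_flags(raw)

# interpolate returns a new object, so the raw frame still has its NaN.
print(int(raw["RT"].isna().sum()))
print(flags["RT"].tolist())
print(filled.join(flags.add_suffix("_FILLED")))
```

Carrying the flag columns next to the curves lets a later step exclude or highlight interpolated samples instead of treating them as measurements.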


Your solution

main.py

import lasio
import io
import pandas as pd

LAS_TEXT = """~VERSION INFORMATION
 VERS.   2.0 : CWLS LOG ASCII STANDARD - VERSION 2.0
 WRAP.   NO  : ONE LINE PER DEPTH STEP
~WELL INFORMATION
 WELL.  OD-012 : Well Name
 FLD.   OML 58 : Field Name
 NULL.  -999.2500 : NULL VALUE
~CURVE INFORMATION
 DEPT.FT   : Depth
 GR  .GAPI : Gamma Ray
 RT  .OHMM : Deep Resistivity
 RHOB.G/CC : Bulk Density
 NPHI.V/V  : Neutron Porosity
 SP  .MV   : Spontaneous Potential
~A  DEPT       GR       RT        RHOB     NPHI       SP
 9200.000   85.0     3.0       2.45     0.18    -999.25
 9200.500   88.0     2.8       2.46     0.17    -999.25
 9201.000   45.0    20.0       2.31     0.16      20.0
 9201.500   42.0  -999.25      2.30   -999.25   -999.25
 9202.000   40.0    25.0       2.29   -999.25   -999.25
 9202.500   44.0    22.0       2.30     0.15      22.0
 9203.000   90.0     2.2       2.47     0.30    -999.25
 9203.500   92.0     2.0       2.49     0.31    -999.25
 9204.000   46.0    18.0       2.32     0.17      21.0
 9204.500   41.0    24.0       2.30     0.15      19.0
"""

def las_to_analysis_ready(las_text):
    """Return (cleaned_df, summary_dict)."""
    # TODO
    pass

df, summary = las_to_analysis_ready(LAS_TEXT)
print(summary)
print(df.head())

Reference solution

Try solving it yourself first. The solution below is one valid approach; yours may differ and still be correct.

import lasio
import io
import pandas as pd

LAS_TEXT = """~VERSION INFORMATION
 VERS.   2.0 : CWLS LOG ASCII STANDARD - VERSION 2.0
 WRAP.   NO  : ONE LINE PER DEPTH STEP
~WELL INFORMATION
 WELL.  OD-012 : Well Name
 FLD.   OML 58 : Field Name
 NULL.  -999.2500 : NULL VALUE
~CURVE INFORMATION
 DEPT.FT   : Depth
 GR  .GAPI : Gamma Ray
 RT  .OHMM : Deep Resistivity
 RHOB.G/CC : Bulk Density
 NPHI.V/V  : Neutron Porosity
 SP  .MV   : Spontaneous Potential
~A  DEPT       GR       RT        RHOB     NPHI       SP
 9200.000   85.0     3.0       2.45     0.18    -999.25
 9200.500   88.0     2.8       2.46     0.17    -999.25
 9201.000   45.0    20.0       2.31     0.16      20.0
 9201.500   42.0  -999.25      2.30   -999.25   -999.25
 9202.000   40.0    25.0       2.29   -999.25   -999.25
 9202.500   44.0    22.0       2.30     0.15      22.0
 9203.000   90.0     2.2       2.47     0.30    -999.25
 9203.500   92.0     2.0       2.49     0.31    -999.25
 9204.000   46.0    18.0       2.32     0.17      21.0
 9204.500   41.0    24.0       2.30     0.15      19.0
"""

M_PER_FT = 0.3048


def las_to_analysis_ready(las_text):
    las = lasio.read(io.StringIO(las_text))
    df = las.df()  # depth index; -999.25 already NaN
    n_curves_in = df.shape[1]

    # 1. Drop curves more than 50% null.
    null_frac = df.isna().mean()
    curves_dropped = list(null_frac[null_frac > 0.50].index)
    df = df.drop(columns=curves_dropped)

    # 2. Interpolate short interior gaps (<= 3 consecutive).
    n_before = int(df.isna().sum().sum())
    df = df.interpolate(method="linear", limit=3, limit_area="inside")
    n_interpolated = n_before - int(df.isna().sum().sum())

    # 3. Depth as a column, plus metres. rename_axis names the index
    #    directly, so this works whatever the depth mnemonic is (DEPT, DEPTH, ...).
    df = df.rename_axis("DEPTH_FT").reset_index()
    df["DEPTH_M"] = df["DEPTH_FT"] * M_PER_FT

    summary = {
        "n_curves_in": n_curves_in,
        "curves_dropped": curves_dropped,
        "n_interpolated": n_interpolated,
        "n_rows": len(df),
    }
    return df, summary


df, summary = las_to_analysis_ready(LAS_TEXT)
print(f"Curves in: {summary['n_curves_in']}, dropped {summary['curves_dropped']}, "
      f"filled {summary['n_interpolated']} gaps, {summary['n_rows']} rows")
print(df.to_string(index=False))
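A caveat about limit=3: in pandas, limit caps how many consecutive NaN are filled, so a longer gap is partly filled (its first three samples, with the default forward direction) rather than skipped. The starter log has no gap longer than two samples, so the reference solution is unaffected. If the intent is to leave long gaps entirely untouched, one possible approach is to measure each NaN run first and fill only the runs that are short enough. The sketch below assumes that stricter reading; fill_short_gaps is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd


def fill_short_gaps(s, max_gap=3):
    """Linearly fill interior NaN runs of length <= max_gap; leave longer runs as NaN.

    Hypothetical helper for illustration.
    """
    is_na = s.isna()
    # Start a new run id wherever the NaN mask changes value.
    run_id = is_na.astype(int).diff().ne(0).cumsum()
    # Length of the run each sample belongs to.
    run_len = run_id.map(run_id.value_counts())
    short = is_na & (run_len <= max_gap)
    # Interpolate every interior gap, then keep only the fills in short runs.
    full = s.interpolate(method="linear", limit_area="inside")
    return s.where(~short, full)


s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, np.nan, np.nan, np.nan, 9.0])

plain = s.interpolate(method="linear", limit=3, limit_area="inside")
strict = fill_short_gaps(s, max_gap=3)

print(int(plain.isna().sum()))   # 2: the five-sample gap is partly filled
print(int(strict.isna().sum()))  # 5: the five-sample gap is left alone
```

To apply the same idea per curve, the helper could be passed to df.apply, which calls it on each column in turn.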
