Raw vs Curated Data and Transformations

Raw vendor data is the starting point; curated curves and models are versioned outputs of documented transformation pipelines.

Explanation

Raw data stores vendor quotes and time series as delivered, preserving the original record for traceability.

Curated artefacts such as arbitrage-free curves, vol surfaces, and calibrated models are produced by cleaning and transformation steps.

Transformation logic is captured in configuration objects that describe filters, bootstrapping methods, interpolation, and calibration settings.
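As a minimal sketch of such a configuration object, the Python below models the three settings that appear later on this page (filter strength, smoothing, bootstrap method). The class name `TransformConfig`, its field names, the 0 to 1 scaling of the percentages, and the `fingerprint` helper are assumptions made for illustration, not part of any particular system.

```python
from dataclasses import dataclass, asdict
import hashlib
import json

# Bootstrap options as listed on this page; the tuple name is hypothetical.
ALLOWED_BOOTSTRAP = ("linear", "piecewise", "spline")


@dataclass(frozen=True)
class TransformConfig:
    """Hypothetical transform configuration; field names are illustrative."""
    filter_strength: float  # 0.0 to 1.0, higher removes more outliers
    smoothing: float        # 0.0 to 1.0, higher reduces kinks
    bootstrap: str          # one of ALLOWED_BOOTSTRAP

    def __post_init__(self):
        # Reject configurations that fall outside the documented ranges.
        if not 0.0 <= self.filter_strength <= 1.0:
            raise ValueError("filter_strength must be in [0, 1]")
        if not 0.0 <= self.smoothing <= 1.0:
            raise ValueError("smoothing must be in [0, 1]")
        if self.bootstrap not in ALLOWED_BOOTSTRAP:
            raise ValueError(f"bootstrap must be one of {ALLOWED_BOOTSTRAP}")

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys) so equal configs hash identically.
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


cfg = TransformConfig(filter_strength=0.40, smoothing=0.55, bootstrap="spline")
print(cfg.fingerprint()[:12])
```

Freezing the dataclass means a setting cannot be edited after the object is built, so a changed setting always yields a new object with a different fingerprint.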

Each curated artefact links back to both its raw data snapshot and its transformation configuration, enabling replay and challenge of the pipeline.
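One way to picture this linkage is to store a content hash of the raw snapshot and of the configuration alongside the curated values, then replay the transform and compare. The sketch below assumes JSON-serialisable inputs; `CuratedArtefact`, `build`, `replay_matches`, and the toy `clip_transform` are hypothetical names, and a real pipeline would use its own identifiers and storage.

```python
from dataclasses import dataclass
import hashlib
import json


def content_hash(obj) -> str:
    """SHA-256 of a canonical JSON encoding (assumes JSON-serialisable input)."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class CuratedArtefact:
    """Hypothetical curated output carrying links back to its inputs."""
    raw_snapshot_hash: str
    config_hash: str
    values: tuple


def build(raw_quotes, config, transform) -> CuratedArtefact:
    values = tuple(transform(raw_quotes, config))
    return CuratedArtefact(content_hash(raw_quotes), content_hash(config), values)


def replay_matches(artefact, raw_quotes, config, transform) -> bool:
    """Re-run the pipeline and check that inputs and output match the artefact."""
    if content_hash(raw_quotes) != artefact.raw_snapshot_hash:
        return False
    if content_hash(config) != artefact.config_hash:
        return False
    return tuple(transform(raw_quotes, config)) == artefact.values


def clip_transform(raw_quotes, config):
    # Toy transform for illustration only: clip each quote to a configured band.
    lo, hi = config["min"], config["max"]
    return [min(max(q, lo), hi) for q in raw_quotes]


raw = [1.0, 1.2, 9.9, 1.3]
cfg = {"min": 0.5, "max": 2.0}
artefact = build(raw, cfg, clip_transform)
print(replay_matches(artefact, raw, cfg, clip_transform))
```

A challenger who suspects the curated values can rerun `replay_matches` with the stored snapshot and configuration; a mismatch in either hash shows that the inputs differ from those the artefact claims.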

Tags: raw data, curated, transformations, pipeline

Interactive visualisation

Raw vs curated data transformations

Raw vendor quotes are stored as-is. Curated curves are versioned outputs of an explicit transform configuration.

Transformation configuration

Filter: 40%
Smooth: 55%
Bootstrap method: linear, piecewise, or spline

Interpretation

Raw data is the audit record. It should not be “cleaned in place”.
Curated artefacts are derived outputs: they must carry the transform configuration and a version.
When you change filters or interpolation, you are creating a new curated artefact.
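The versioning rule in the points above can be sketched as a small registry that hands out a new version number for every distinct pairing of snapshot and configuration, and returns the existing number when nothing has changed. `CuratedStore`, `register`, and the snapshot id `"snap-001"` are hypothetical names used only for this illustration.

```python
import json


class CuratedStore:
    """Toy in-memory registry: one version per distinct (snapshot, config) pair."""

    def __init__(self):
        self._versions = {}  # (snapshot_id, canonical config JSON) -> version

    def register(self, snapshot_id: str, config: dict) -> int:
        # Sorted keys make the lookup independent of dictionary ordering.
        key = (snapshot_id, json.dumps(config, sort_keys=True))
        if key not in self._versions:
            self._versions[key] = len(self._versions) + 1
        return self._versions[key]


store = CuratedStore()
spline_cfg = {"filter_strength": 0.40, "smoothing": 0.55, "bootstrap": "spline"}
linear_cfg = {"filter_strength": 0.40, "smoothing": 0.55, "bootstrap": "linear"}

v1 = store.register("snap-001", spline_cfg)        # first artefact
v1_again = store.register("snap-001", spline_cfg)  # same inputs, same version
v2 = store.register("snap-001", linear_cfg)        # changed config, new version
print(v1, v1_again, v2)
```

The raw snapshot is only ever referenced by its id here; nothing in the registry writes back to it, which mirrors the rule that raw data is never cleaned in place.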

cfg (INFO): {"filter_strength":"40%","smoothing":"55%","bootstrap":"spline"}

Figure: raw quotes (points) → curated curve (line)

A higher filter setting removes more outliers, and a higher smoothing setting reduces kinks in the curve. Any change to the configuration should create a new curated artefact version.
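The effect of the two settings can be illustrated with a toy filter and a toy smoother. The rules used here (a median-based band that tightens as the filter strength rises, and a blend of each point with the mean of its neighbours) are assumptions chosen for simplicity, not the method behind the visualisation, and the function names are hypothetical.

```python
from statistics import median


def filter_outliers(quotes, strength):
    """Drop quotes far from the median; a higher strength gives a tighter band.

    The band rule (largest deviation scaled by 1 - strength) is illustrative.
    """
    med = median(quotes)
    deviations = [abs(q - med) for q in quotes]
    threshold = (1.0 - strength) * max(deviations)
    return [q for q, d in zip(quotes, deviations) if d <= threshold]


def smooth(values, weight):
    """Blend each interior point with the mean of its two neighbours.

    weight = 0 leaves the values unchanged; weight = 1 replaces each interior
    point with its neighbour mean.
    """
    out = list(values)
    for i in range(1, len(values) - 1):
        neighbour_mean = (values[i - 1] + values[i + 1]) / 2.0
        out[i] = (1.0 - weight) * values[i] + weight * neighbour_mean
    return out


# Raw quotes are held in a tuple so the toy transforms cannot modify them.
raw_quotes = (2.0, 2.1, 5.0, 2.2, 2.3)

filtered = filter_outliers(raw_quotes, strength=0.40)
curated = smooth(filtered, weight=0.55)
print(filtered)
print(curated)
```

Both functions return new lists and leave `raw_quotes` untouched, so each choice of settings produces a separate derived output from the same raw record.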

Pipeline graph (toy)

Raw snapshot → Transform config → Curated artefact (versioned)

Raw vendor snapshot: immutable
Transform config: spline / f=40% / s=55%
Curated artefact: versioned output