Raw vs Curated Data and Transformations
Raw vendor data is the starting point; curated curves and models are versioned outputs of documented transformation pipelines.
Explanation
The raw data layer stores vendor quotes and time series exactly as delivered, preserving the original record for traceability.
Curated artefacts such as arbitrage-free curves, vol surfaces, and calibrated models are produced by cleaning and transformation steps.
Transformation logic is captured in configuration objects that describe filters, bootstrapping methods, interpolation, and calibration settings.
Each curated artefact links back to both its raw data snapshot and its transformation configuration, enabling replay and challenge of the pipeline.
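The linkage described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference design: the class names (`RawSnapshot`, `TransformConfig`, `CuratedArtefact`), the field names, the `content_hash` helper, and the choice of SHA-256 over canonical JSON are all assumptions made for the example.

```python
from dataclasses import dataclass, asdict
import hashlib
import json


def content_hash(payload) -> str:
    """Stable SHA-256 over a JSON-serialisable payload (hypothetical helper)."""
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


@dataclass(frozen=True)  # frozen: the raw record cannot be edited after creation
class RawSnapshot:
    vendor: str
    as_of: str
    quotes: tuple  # (tenor, rate) pairs, exactly as delivered

    @property
    def snapshot_id(self) -> str:
        return content_hash(
            {"vendor": self.vendor, "as_of": self.as_of, "quotes": self.quotes}
        )


@dataclass(frozen=True)
class TransformConfig:
    filter_strength: float
    smoothing: float
    bootstrap: str
    interpolation: str = "linear"  # assumed default, for illustration only

    @property
    def config_id(self) -> str:
        return content_hash(asdict(self))


@dataclass(frozen=True)
class CuratedArtefact:
    kind: str
    snapshot_id: str  # link back to the raw data snapshot
    config_id: str    # link back to the transformation configuration
    points: tuple


def build_artefact(raw: RawSnapshot, cfg: TransformConfig, transform) -> CuratedArtefact:
    """Run `transform` on a copy of the raw quotes and record the lineage."""
    points = tuple(transform(raw.quotes, cfg))
    return CuratedArtefact("curve", raw.snapshot_id, cfg.config_id, points)
```

Because both identifiers are derived from content, replaying `build_artefact` with the same snapshot and configuration yields an equal artefact, which is what makes the pipeline open to challenge.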
Tags: raw data, curated, transformations, pipeline
Interactive visualisation
Raw vendor quotes are stored as-is. Curated curves are versioned outputs of an explicit transform configuration.
Transformation configuration: Filter 40%, Smooth 55%, Bootstrap method: spline
Interpretation
- Raw data is the audit record. It should not be “cleaned in place”.
- Curated artefacts are derived outputs: each must carry its transform config and version.
- When you change filters or interpolation, you are creating a new curated artefact.
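The third point can be made concrete with a small append-only store. The sketch below is an assumption-laden illustration: `CuratedStore`, `config_fingerprint`, and the snapshot identifier `"snap-001"` are hypothetical names, and a real system would persist versions rather than hold them in memory.

```python
import hashlib
import json


def config_fingerprint(cfg: dict) -> str:
    """Short, order-independent fingerprint of a transform config (hypothetical)."""
    blob = json.dumps(cfg, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:12]


class CuratedStore:
    """Append-only store: existing versions are never overwritten."""

    def __init__(self):
        self._versions = []

    def publish(self, snapshot_id: str, cfg: dict) -> int:
        key = (snapshot_id, config_fingerprint(cfg))
        for record in self._versions:
            if record["key"] == key:
                # Same raw snapshot and same config: same artefact version.
                return record["version"]
        version = len(self._versions) + 1
        self._versions.append({"key": key, "version": version, "cfg": dict(cfg)})
        return version


store = CuratedStore()
base = {"filter_strength": "40%", "smoothing": "55%", "bootstrap": "spline"}

v1 = store.publish("snap-001", base)
v1_again = store.publish("snap-001", base)
v2 = store.publish("snap-001", {**base, "filter_strength": "60%"})
```

Republishing an unchanged config returns the existing version, while changing only the filter setting produces a new one and leaves the earlier version in place.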
cfg: {"filter_strength":"40%","smoothing":"55%","bootstrap":"spline"}
Raw quotes (points) → Curated curve (line)
A higher filter setting removes more outliers; a higher smoothing setting reduces kinks in the curve. Changing the config should create a new curated artefact version.
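To show how the two settings could behave, here is a toy sketch. The source does not specify the actual filter or smoother, so both techniques are stand-ins chosen for brevity: the filter drops quotes that sit far from a 3-point local median, and the smoother blends each interior point with the average of its neighbours. The function names, the threshold mapping, and the sample quotes are all illustrative assumptions, and settings are expressed as fractions (0.40 rather than "40%").

```python
from statistics import median


def filter_outliers(quotes, strength):
    """Return a new list without quotes far from the local 3-point median.

    strength in [0, 1]: 0 keeps every quote, higher values drop more.
    The linear threshold mapping is an assumption made for this sketch.
    """
    if not quotes:
        return []
    rates = [rate for _, rate in quotes]
    residuals = []
    for i, rate in enumerate(rates):
        window = rates[max(0, i - 1): i + 2]
        residuals.append(abs(rate - median(window)))
    threshold = (1.0 - strength) * max(residuals)
    return [q for q, res in zip(quotes, residuals) if res <= threshold]


def smooth(quotes, smoothing):
    """Blend each interior rate with the mean of its two neighbours."""
    if len(quotes) < 3:
        return list(quotes)
    out = [quotes[0]]
    for i in range(1, len(quotes) - 1):
        tenor, rate = quotes[i]
        neighbour_avg = (quotes[i - 1][1] + quotes[i + 1][1]) / 2
        out.append((tenor, (1 - smoothing) * rate + smoothing * neighbour_avg))
    out.append(quotes[-1])
    return out


def roughness(quotes):
    """Sum of absolute second differences: a crude measure of kinks."""
    rates = [rate for _, rate in quotes]
    return sum(
        abs(rates[i - 1] - 2 * rates[i] + rates[i + 1])
        for i in range(1, len(rates) - 1)
    )


# Made-up (tenor, rate) quotes with one obvious spike at tenor 3.
raw_quotes = [(1, 1.0), (2, 1.1), (3, 5.0), (4, 1.3), (5, 1.4)]

filtered = filter_outliers(raw_quotes, strength=0.40)
curated = smooth(filtered, smoothing=0.55)
```

Both functions return new lists, so `raw_quotes` is left exactly as delivered; the curated points are a separate, derived output.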
Pipeline graph (toy)
Raw snapshot → Transform config → Curated artefact (versioned)
- Raw vendor snapshot: immutable
- Transform config: spline / f=40% / s=55%
- Curated artefact: versioned output