Data Contract
The backtest harness needs enough information to replay the same model surface consistently across folds and across repos.
Minimum Required Inputs
For a runnable pilot manifest, the repo needs:
- a weekly source table
- a date column
- a KPI / response column
- the locked model formula
- priors
- boundaries
- repo targets and repo paths
- fit settings and seed policy
If recommendation stability is in scope, the repo also needs:
- media spend history or equivalent channel-spend history
- a declared recommendation contract
- a channel map from spend inputs to model terms / allocation variables
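As a rough illustration of the two checklists above, the sketch below reports which inputs a manifest is still missing. Every key name here (`source_table`, `channel_map`, and so on) is a hypothetical stand-in chosen for this example, not the actual schema of the pilot manifest.

```python
# Hypothetical key names; the real manifest schema may differ.
REQUIRED_KEYS = {
    "source_table", "date_column", "kpi_column", "formula",
    "priors", "boundaries", "repo_targets", "repo_paths",
    "fit_settings", "seed_policy",
}
# Extra inputs needed only when recommendation stability is in scope.
STABILITY_KEYS = {"spend_history", "recommendation_contract", "channel_map"}


def missing_manifest_keys(manifest, stability_in_scope=False):
    """Return the sorted list of required keys absent from `manifest`."""
    required = set(REQUIRED_KEYS)
    if stability_in_scope:
        required |= STABILITY_KEYS
    return sorted(required - set(manifest))


# A manifest that covers the minimum inputs but not the stability extras.
manifest = {key: "placeholder" for key in REQUIRED_KEYS}
missing_base = missing_manifest_keys(manifest)
missing_with_stability = missing_manifest_keys(manifest, stability_in_scope=True)
```

A check like this could run before any fold is fitted, so that an incomplete manifest fails early rather than partway through a replay.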
Why The Source Table Matters
The formulas in scope include lagged and rolling terms. That means fold inputs must be rebuilt from source data at each cutoff rather than sliced from a full-sample engineered matrix.
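A minimal sketch of that rebuild step, assuming a weekly source table held as a list of rows with hypothetical `week`, `kpi`, and `spend` fields. The function name, the lag of one week, and the three-week rolling mean are illustrative choices only; the locked formulas define the real terms.

```python
from datetime import date, timedelta
from statistics import mean


def build_fold_inputs(source_rows, cutoff, lag=1, window=3):
    """Rebuild lagged and rolling terms from rows dated on or before `cutoff`."""
    # Filter the raw source first, so nothing after the cutoff can leak in.
    train = sorted(
        (row for row in source_rows if row["week"] <= cutoff),
        key=lambda row: row["week"],
    )
    warmup = max(lag, window - 1)  # rows needed before both terms are defined
    fold = []
    for i in range(warmup, len(train)):
        spend_window = [train[j]["spend"] for j in range(i - window + 1, i + 1)]
        fold.append({
            "week": train[i]["week"],
            "kpi": train[i]["kpi"],
            "spend_lag": train[i - lag]["spend"],
            "spend_roll_mean": mean(spend_window),
        })
    return fold


# Toy weekly source table: eight weeks of made-up values.
start = date(2024, 1, 1)
source = [
    {"week": start + timedelta(weeks=k), "kpi": 100.0 + k, "spend": 10.0 * (k + 1)}
    for k in range(8)
]
fold = build_fold_inputs(source, cutoff=start + timedelta(weeks=5))
```

Calling the function once per cutoff gives each fold its own engineered matrix, derived only from the source rows available at that point in time.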
Tracked Data Packages
The repo keeps smaller GitHub-friendly replication packages under ../data/:
- _st: active engineering pilot
- _ov: reserve candidate
- _os: retained stress fixture
Large reviewed bundles under data_review/ are kept local only.
Current Active Pilot
The active engineering pilot is _st.
The active manifest currently lives in the local planning layer at
.planning/research/pilot_manifest.yaml.
Note that .planning/ is local-only in the current repo setup, so it is not
available on GitHub. Colleagues working from GitHub alone should rely on the
tracked replication data, README, docs, and report instead.