the current recommendation layer is backtest-owned and provisional; it is
suitable for engineering comparison, but not yet an owner-approved production
allocator.
Chapter 1
Getting Started
Core orientation and the minimum steps to inspect the active backtest harness.
Overview
DSAMbayes-backtest is a dedicated repository for a reproducible walk-forward
backtest harness for DSAMbayes marketing mix models.
The repo exists to keep the evaluation contract outside the source modelling
repositories. It compares original DSAMbayes and DSAMbayes-Charles-Dev on the
same data, folds, and scoring rules.
What The Repo Owns
the manifest and replay contract
blocked walk-forward fold planning
common KPI-scale holdout scoring
adjacent-refit stability analytics
repo-owned result artifacts and summary tables
a short reporting surface for engineering and peer-review use
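The blocked walk-forward fold planning listed above can be sketched as follows. This is an illustrative Python sketch, not the repo's implementation (the harness itself is run via Rscript). The function name `plan_folds`, the parameters `min_train`, `horizon`, and `step`, and the choice of an expanding training window are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fold:
    fold_id: int
    train_end: int   # index of the last training week (inclusive)
    test_start: int  # index of the first holdout week
    test_end: int    # index of the last holdout week (inclusive)


def plan_folds(n_weeks: int, min_train: int, horizon: int, step: int) -> list[Fold]:
    """Plan expanding-window folds, each followed by one contiguous holdout block.

    All parameter names are hypothetical; the real manifest may differ.
    """
    if min(n_weeks, min_train, horizon, step) <= 0:
        raise ValueError("all arguments must be positive")
    folds = []
    train_end = min_train - 1
    # Keep adding folds while a full holdout block still fits in the series.
    while train_end + horizon <= n_weeks - 1:
        folds.append(
            Fold(len(folds) + 1, train_end, train_end + 1, train_end + horizon)
        )
        train_end += step
    return folds


for fold in plan_folds(n_weeks=104, min_train=78, horizon=8, step=8):
    print(fold)
```

Each holdout block starts immediately after its training cutoff, so no holdout week is ever used for fitting within the same fold.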
Current Scope
The current milestone is intentionally narrow:
one single-series time-series pilot at a time
one shared comparison contract across both repos
one shared external scorer
explicit guardrails around lagged and rolling features
Hierarchical / panel backtesting is deferred.
Current Worked Example
The active engineering example is _st, run under an explicit engineering-only
scale = FALSE policy. On that example, the repo currently supports:
forward holdout scoring
parameter stability
provisional fixed-budget recommendation stability
The engineering worked example is not the final stakeholder-facing verdict. The
intended next substantive use is the UK dataset once it is available.
Quickstart
This is the minimum path to inspect the active backtest harness.
That report is the current compact worked example for the library.
Chapter 2
Reference
Reference material for the current data contract, CLI surface, metrics, and result artifacts.
Data Contract
The backtest harness needs enough information to replay the same model surface
consistently across folds and across repos.
Minimum Required Inputs
For a runnable pilot manifest, the repo needs:
a weekly source table
a date column
a KPI / response column
the locked model formula
priors
boundaries
repo targets and repo paths
fit settings and seed policy
If recommendation stability is in scope, the repo also needs:
media spend history or equivalent channel-spend history
a declared recommendation contract
a channel map from spend inputs to model terms / allocation variables
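A pre-flight check of the inputs listed above might look like the sketch below. It is illustrative Python only: every key name (`source_table`, `kpi_column`, `channel_map`, and so on) is hypothetical and chosen to mirror the list, since the actual manifest schema is not shown here.

```python
# Hypothetical key names that mirror the required-input list; the real
# manifest schema may use different names and nesting.
REQUIRED_KEYS = {
    "source_table", "date_column", "kpi_column", "formula",
    "priors", "boundaries", "repo_targets", "fit_settings", "seed_policy",
}
RECOMMENDATION_KEYS = {"spend_history", "recommendation_contract", "channel_map"}


def missing_manifest_keys(manifest: dict, recommendation_in_scope: bool = False) -> list[str]:
    """Return the sorted names of required keys absent from the manifest."""
    required = set(REQUIRED_KEYS)
    if recommendation_in_scope:
        required |= RECOMMENDATION_KEYS
    return sorted(required - set(manifest))


# A deliberately incomplete manifest: "priors" is left out.
example = {key: "..." for key in REQUIRED_KEYS if key != "priors"}
print(missing_manifest_keys(example))
print(missing_manifest_keys(example, recommendation_in_scope=True))
```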
Why The Source Table Matters
The formulas in scope include lagged and rolling terms. That means fold inputs
must be rebuilt from source data at each cutoff rather than sliced from a
full-sample engineered matrix.
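One way to read this requirement is "truncate the source first, then engineer features". The Python sketch below illustrates that ordering; it is not the repo's code. The function name `build_fold_inputs`, the one-week lag, and the four-week trailing mean are assumptions, since the actual formula terms are not listed here.

```python
def build_fold_inputs(spend: list[float], cutoff: int, lag: int = 1, window: int = 4) -> list[dict]:
    """Engineer lagged and trailing-mean features from weeks 0..cutoff only."""
    visible = spend[: cutoff + 1]  # truncate the source BEFORE any feature work
    rows = []
    for t in range(len(visible)):
        start = max(0, t - window + 1)
        trailing = visible[start : t + 1]
        rows.append({
            "week": t,
            "spend": visible[t],
            "spend_lag": visible[t - lag] if t >= lag else None,
            "spend_roll_mean": sum(trailing) / len(trailing),
        })
    return rows


source = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
# Change every post-cutoff value: the rebuilt fold inputs must not move.
altered = source[:6] + [999.0, 999.0]
assert build_fold_inputs(source, cutoff=5) == build_fold_inputs(altered, cutoff=5)
print(build_fold_inputs(source, cutoff=5)[-1])
```

Because the truncation happens before any lag or rolling term is computed, nothing after the cutoff can influence a fold's inputs, whatever feature definitions are swapped in.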
Tracked Data Packages
The repo keeps smaller GitHub-friendly replication packages under ../data/:
_st: active engineering pilot
_ov: reserve candidate
_os: retained stress fixture
Large reviewed bundles under data_review/ are kept local only.
Current Active Pilot
The active engineering pilot is _st.
The active manifest currently lives in the local planning layer at
.planning/research/pilot_manifest.yaml.
Note that .planning/ is local-only in the current repo setup, so colleagues
using GitHub alone should rely on the tracked replication data, README, docs,
and report rather than the local planning spine.
Results And Artifacts
Backtest outputs are written under run-scoped result trees so filtered reruns do
not overwrite earlier summaries.
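The idea of a run-scoped result tree can be sketched as below. This is illustrative Python under assumed conventions: the directory naming scheme (label, UTC timestamp, random suffix) and the helper name `new_run_dir` are hypothetical and do not describe the repo's actual layout.

```python
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path


def new_run_dir(results_root: Path, label: str = "run") -> Path:
    """Create a fresh directory for one run; never reuse an existing one."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_dir = results_root / f"{label}-{stamp}-{uuid.uuid4().hex[:8]}"
    # exist_ok=False makes an accidental overwrite fail loudly.
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir


with tempfile.TemporaryDirectory() as root:
    first = new_run_dir(Path(root))
    second = new_run_dir(Path(root))  # e.g. a filtered rerun
    print(first.name, second.name, sep="\n")
```

A filtered rerun then writes its summaries into its own directory, leaving the earlier run's files untouched.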
The holdout and parameter-stability summaries are identical across repos on
the active _st example. Recommendation stability is also present, but it should
still be treated as provisional.
Rscript scripts/dsambayes-backtest.R plan \
--manifest .planning/research/pilot_manifest.yaml
run
Execute a batch or write a dry-run result tree.
Dry run:
Rscript scripts/dsambayes-backtest.R run \
--manifest .planning/research/pilot_manifest.yaml \
--dry-run
Target one repo:
Rscript scripts/dsambayes-backtest.R run \
--manifest .planning/research/pilot_manifest.yaml \
--repo-target charles_dev
Target one fold:
Rscript scripts/dsambayes-backtest.R run \
--manifest .planning/research/pilot_manifest.yaml \
--fold-id 1
Common Options
--manifest <path>
--repo-target <name>
--fold-id <n>
--dry-run
--results-root <dir>
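For scripted use, the invocations above can be assembled programmatically. The Python sketch below uses only the flags listed in this section; the helper `build_command` is hypothetical, and it assumes the options can be combined in one call, which the examples above show only individually.

```python
import shlex
from typing import Optional

SCRIPT = "scripts/dsambayes-backtest.R"


def build_command(
    subcommand: str,
    manifest: str,
    repo_target: Optional[str] = None,
    fold_id: Optional[int] = None,
    dry_run: bool = False,
    results_root: Optional[str] = None,
) -> list[str]:
    """Assemble an argument list from the documented flags only."""
    cmd = ["Rscript", SCRIPT, subcommand, "--manifest", manifest]
    if repo_target is not None:
        cmd += ["--repo-target", repo_target]
    if fold_id is not None:
        cmd += ["--fold-id", str(fold_id)]
    if dry_run:
        cmd.append("--dry-run")
    if results_root is not None:
        cmd += ["--results-root", results_root]
    return cmd


cmd = build_command(
    "run",
    ".planning/research/pilot_manifest.yaml",
    repo_target="charles_dev",
    fold_id=1,
    dry_run=True,
)
print(shlex.join(cmd))  # pass `cmd` to subprocess.run to execute it
```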
Current Limitation
The CLI is designed around the current M1 single-series parity surface. It is
not yet a general hierarchical / panel backtest runner.
Metrics Reference
Forward Holdout Metrics
RMSE
Root mean squared error on the observed KPI scale.
WMAPE
Weighted mean absolute percentage error on the observed KPI scale.
Mean error
Signed bias on the observed KPI scale.
SMAPE
Secondary holdout metric on the observed KPI scale.
Holdout ELPD / log score
Secondary probabilistic diagnostics when compatible posterior outputs are
available.
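The forward holdout metrics above can be written out as a short Python sketch. It is illustrative rather than the repo's scorer, and several conventions are assumptions: mean error is taken as predicted minus observed, SMAPE uses the 2|e| / (|y| + |yhat|) variant, and the ELPD input is assumed to be a draws-by-points matrix of pointwise log-likelihoods.

```python
import math


def rmse(y, yhat):
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(y, yhat)) / len(y))


def wmape(y, yhat):
    # Total absolute error weighted by total absolute observed KPI.
    return sum(abs(p - a) for a, p in zip(y, yhat)) / sum(abs(a) for a in y)


def mean_error(y, yhat):
    # Sign convention is an assumption: positive means over-prediction.
    return sum(p - a for a, p in zip(y, yhat)) / len(y)


def smape(y, yhat):
    # One common SMAPE variant; others exist. Undefined if y and yhat are both 0.
    return sum(2 * abs(p - a) / (abs(a) + abs(p)) for a, p in zip(y, yhat)) / len(y)


def holdout_elpd(log_lik):
    """Sum over holdout points of log(mean over draws of exp(log_lik[draw][point]))."""
    n_draws = len(log_lik)
    total = 0.0
    for i in range(len(log_lik[0])):
        column = [row[i] for row in log_lik]
        peak = max(column)  # log-sum-exp shift for numerical stability
        total += peak + math.log(sum(math.exp(v - peak) for v in column)) - math.log(n_draws)
    return total


observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
for name, fn in [("RMSE", rmse), ("WMAPE", wmape), ("Mean error", mean_error), ("SMAPE", smape)]:
    print(f"{name}: {fn(observed, predicted):.4f}")
```

All four point metrics take values on the observed KPI scale, so any model-side transformation would need to be inverted before scoring.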
Stability Metrics
standardized_posterior_shift
Adjacent-refit coefficient shift scaled by posterior uncertainty.
allocation_turnover
0.5 * sum(abs(w_t - w_{t-1})) across matched channels.
marginal_response_rank_corr
Spearman correlation of channel marginal-response ranks across adjacent
refits on the shared recommendation surface.
Important note:
marginal_response_rank_corr is not ROI. It is a rank comparison on the
repo-owned recommendation surface. The metric was deliberately renamed from a
previous ROI-style label because the current allocator does not compute true
ROI.
Interpretation Guidance
Holdout metrics address predictive performance.
Parameter stability addresses how much posterior media effects move between
adjacent refits.
Recommendation stability addresses how much the recommended allocation
surface moves between adjacent refits under one controlled comparison
scenario.
Recommendation stability in the current repo should still be treated as
provisional, because the allocator surface is backtest-owned and not yet an
owner-approved production policy.