DSAMbayes Backtest Documentation

This directory contains lightweight documentation for the DSAMbayes-backtest library.

Current scope:

single-series time-series backtest workflows,
repo-external parity comparison of DSAMbayes and DSAMbayes-Charles-Dev,
common holdout scoring, parameter stability, and provisional recommendation stability.

Current worked example:

the active engineering pilot is _st
the active report is ../reports/DSAMbayes_backtest_report/main.pdf

Important limit:

the current recommendation layer is backtest-owned and provisional; it is suitable for engineering comparison, but not yet an owner-approved production allocator.

Chapter 1

Getting Started

Core orientation and the minimum steps to inspect the active backtest harness.

Overview

DSAMbayes-backtest is a dedicated repository for a reproducible walk-forward backtest harness for DSAMbayes marketing mix models.

The repo exists to keep the evaluation contract outside the source modelling repositories. It compares:

original DSAMbayes
DSAMbayes-Charles-Dev

on the same data, folds, and scoring rules.

What The Repo Owns

the manifest and replay contract
blocked walk-forward fold planning
common KPI-scale holdout scoring
adjacent-refit stability analytics
repo-owned result artifacts and summary tables
a short reporting surface for engineering and peer-review use

Current Scope

The current milestone is intentionally narrow:

one single-series time-series pilot at a time
one shared comparison contract across both repos
one shared external scorer
explicit guardrails around lagged and rolling features

Hierarchical / panel backtesting is deferred.

Current Worked Example

The active engineering example is _st, run under an explicit engineering-only scale = FALSE policy. On that example, the repo currently supports:

forward holdout scoring
parameter stability
provisional fixed-budget recommendation stability

The engineering worked example is not the final stakeholder-facing verdict. The intended next substantive use is the UK dataset once it is available.

Quickstart

This is the minimum path to inspect the active backtest harness.

1. Validate The Active Manifest

Rscript scripts/dsambayes-backtest.R validate \
  --manifest .planning/research/pilot_manifest.yaml

This checks the active pilot manifest, dataset contract, repo paths, and fold schedule.

2. Inspect The Planned Run Matrix

Rscript scripts/dsambayes-backtest.R plan \
  --manifest .planning/research/pilot_manifest.yaml

This prints the planned repo-by-fold run surface.
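As a rough illustration of what a repo-by-fold run surface looks like, the sketch below enumerates every (repo target, fold) pair. It is written in Python for illustration only and is not the harness's own R code. The function name `build_run_matrix`, the label `original`, and the fold count are assumptions; only `charles_dev` appears in the CLI examples.

```python
from itertools import product


def build_run_matrix(repo_targets, fold_ids):
    """Return one planned run per (repo target, fold) pair."""
    return [
        {"repo_target": repo, "fold_id": fold}
        for repo, fold in product(repo_targets, fold_ids)
    ]


# Hypothetical inputs: "original" and the three folds are assumed for the example.
matrix = build_run_matrix(["original", "charles_dev"], [1, 2, 3])
for run in matrix:
    print(f"repo_target={run['repo_target']}  fold_id={run['fold_id']:02d}")
```

Two repo targets and three folds give six planned runs, one per directory of the form repo_target=<repo>/fold_id=<nn>/ in the result tree.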

3. Do A Dry Run

Rscript scripts/dsambayes-backtest.R run \
  --manifest .planning/research/pilot_manifest.yaml \
  --dry-run

This writes run-scoped result directories and status artifacts without fitting the external repos.

4. Inspect The Current Worked Example

The completed _st engineering example is under:

results_engineering_m1_st_full/_st/engineering_m1_st_scale_false/
run_id=20260407T211743.943118Z__all-repos__all-folds__live/

Key summary files:

summary/holdout_summary.csv
summary/parameter_stability_summary.csv
summary/recommendation_stability_summary.csv

5. Read The Report

See:

../reports/DSAMbayes_backtest_report/main.pdf

That report is the current compact worked example for the library.

Chapter 2

Reference

Reference material for the current data contract, CLI surface, metrics, and result artifacts.

Data Contract

The backtest harness needs enough information to replay the same model surface consistently across folds and across repos.

Minimum Required Inputs

For a runnable pilot manifest, the repo needs:

a weekly source table
a date column
a KPI / response column
the locked model formula
priors
boundaries
repo targets and repo paths
fit settings and seed policy

If recommendation stability is in scope, the repo also needs:

media spend history or equivalent channel-spend history
a declared recommendation contract
a channel map from spend inputs to model terms / allocation variables

Why The Source Table Matters

The formulas in scope include lagged and rolling terms. That means fold inputs must be rebuilt from source data at each cutoff rather than sliced from a full-sample engineered matrix.
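The ordering described above (restrict the source rows to the fold first, then engineer features) can be sketched as follows. This is a minimal Python illustration, not the harness's R implementation. It assumes expanding-window folds, a trailing rolling mean as the engineered term, and that holdout-period spend is visible when building holdout features; all function names and numbers are hypothetical.

```python
def plan_blocked_folds(n_rows, initial_train, horizon):
    """Expanding-window folds: each holdout block directly follows its training rows."""
    folds = []
    train_end = initial_train
    fold_id = 1
    while train_end + horizon <= n_rows:
        folds.append({
            "fold_id": fold_id,
            "train_end": train_end,              # exclusive row index
            "holdout_end": train_end + horizon,  # exclusive row index
        })
        train_end += horizon
        fold_id += 1
    return folds


def rolling_mean(values, window):
    """Trailing mean over up to `window` values ending at the current row."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def build_fold_inputs(source_values, fold, window):
    """Slice the source rows to the fold first, then engineer the feature."""
    visible = source_values[: fold["holdout_end"]]
    feature = rolling_mean(visible, window)
    return {
        "train": feature[: fold["train_end"]],
        "holdout": feature[fold["train_end"]: fold["holdout_end"]],
    }


# Hypothetical weekly spend series.
spend = [10, 12, 9, 14, 11, 13, 15, 10, 12, 16, 14, 13]
folds = plan_blocked_folds(len(spend), initial_train=6, horizon=2)
first_fold_inputs = build_fold_inputs(spend, folds[0], window=3)
```

Because `build_fold_inputs` never sees rows beyond the fold's holdout block, any transform added later (for example, a scaling step) cannot pick up information from future weeks.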

Tracked Data Packages

The repo keeps smaller GitHub-friendly replication packages under ../data/:

_st active engineering pilot
_ov reserve candidate
_os retained stress fixture

Large reviewed bundles under data_review/ are kept local only.

Current Active Pilot

The active engineering pilot is _st.

The active manifest currently lives in the local planning layer at .planning/research/pilot_manifest.yaml.

Note that .planning/ is local-only in the current repo setup, so colleagues using GitHub alone should rely on the tracked replication data, README, docs, and report rather than the local planning spine.

Results And Artifacts

Backtest outputs are written under run-scoped result trees so filtered reruns do not overwrite earlier summaries.
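One way to get run-scoped directories is to put a UTC timestamp and the run's scope into the directory name. The Python sketch below reproduces the pattern visible in the worked-example run_id; the pattern is inferred from that single example, and the function name and argument names are assumptions, so the real harness may compose its identifiers differently.

```python
from datetime import datetime, timezone


def build_run_id(started_at, repo_scope, fold_scope, mode):
    """Join a UTC timestamp and scope labels with double underscores."""
    stamp = started_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%S.%fZ")
    return "__".join([stamp, repo_scope, fold_scope, mode])


started = datetime(2026, 4, 7, 21, 17, 43, 943118, tzinfo=timezone.utc)
run_id = build_run_id(started, "all-repos", "all-folds", "live")
print(f"run_id={run_id}")
```

Because the timestamp carries microseconds, two runs started at different times land in different directories even when their scope labels match.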

Result Tree Shape

Typical layout:

results.../
  <dataset_id>/
    <comparison_label>/
      run_id=.../
        experiment_manifest.yaml
        fold_manifest.csv
        summary/
          run_status.csv
          holdout_scores.csv
          holdout_summary.csv
          parameter_stability_summary.csv
          recommendation_stability_summary.csv
        repo_target=<repo>/
          fold_id=01/
            run_manifest.yaml
            run_status.json
            fit_payload.rds
            prediction_payload.rds
            holdout_scores.csv
            recommendations.csv
          stability/
            parameter_drift.csv
            parameter_drift_summary.csv
            recommendation_drift.csv
            recommendation_drift_summary.csv

Most Important Summary Files

summary/run_status.csv: Fold-by-fold execution state.
summary/holdout_summary.csv: Repo-level forward holdout comparison.
summary/parameter_stability_summary.csv: Repo-level adjacent-refit parameter drift summary.
summary/recommendation_stability_summary.csv: Repo-level recommendation stability summary on the current provisional shared recommendation surface.
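Before reading a run, it can help to confirm that the four summary files listed above are present. The following Python sketch is a hypothetical helper, not part of the harness; it only checks for file existence and makes no assumption about column names. The demo builds a throwaway directory that mimics a partially written run tree.

```python
import tempfile
from pathlib import Path

# File names taken from the summary list above.
SUMMARY_FILES = [
    "run_status.csv",
    "holdout_summary.csv",
    "parameter_stability_summary.csv",
    "recommendation_stability_summary.csv",
]


def missing_summary_files(run_dir):
    """Return the summary files that are absent under <run_dir>/summary/."""
    summary_dir = Path(run_dir) / "summary"
    return [name for name in SUMMARY_FILES if not (summary_dir / name).is_file()]


# Demo: a temporary run directory that contains only run_status.csv.
with tempfile.TemporaryDirectory() as tmp:
    summary_dir = Path(tmp) / "summary"
    summary_dir.mkdir()
    (summary_dir / "run_status.csv").write_text("")
    missing = missing_summary_files(tmp)

print(missing)
```

In practice `run_dir` would point at a `run_id=.../` directory inside the result tree.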

Current Worked Example

The active _st engineering batch is:

results_engineering_m1_st_full/_st/engineering_m1_st_scale_false/
run_id=20260407T211743.943118Z__all-repos__all-folds__live/

The holdout and parameter-stability summaries are identical across repos on that example. Recommendation stability is also present, but it should still be treated as provisional.

CLI Reference

The main entrypoint is:

Rscript scripts/dsambayes-backtest.R <command> [options]

`validate`

Validate a pilot manifest and print the planned run scope.

Rscript scripts/dsambayes-backtest.R validate \
  --manifest .planning/research/pilot_manifest.yaml

`plan`

Build the run matrix for the active manifest.

Rscript scripts/dsambayes-backtest.R plan \
  --manifest .planning/research/pilot_manifest.yaml

`run`

Execute a batch or write a dry-run result tree.

Dry run:

Rscript scripts/dsambayes-backtest.R run \
  --manifest .planning/research/pilot_manifest.yaml \
  --dry-run

Target one repo:

Rscript scripts/dsambayes-backtest.R run \
  --manifest .planning/research/pilot_manifest.yaml \
  --repo-target charles_dev

Target one fold:

Rscript scripts/dsambayes-backtest.R run \
  --manifest .planning/research/pilot_manifest.yaml \
  --fold-id 1

Common Options

--manifest <path>
--repo-target <name>
--fold-id <n>
--dry-run
--results-root <dir>

Current Limitation

The CLI is designed around the current M1 single-series parity surface. It is not yet a general hierarchical / panel backtest runner.

Metrics Reference

Forward Holdout Metrics

RMSE: Root mean squared error on the observed KPI scale.
WMAPE: Weighted mean absolute percentage error on the observed KPI scale.
Mean error: Signed bias on the observed KPI scale.
SMAPE: Symmetric mean absolute percentage error on the observed KPI scale; a secondary holdout metric.
Holdout ELPD / log score: Secondary probabilistic diagnostics, reported when compatible posterior outputs are available.
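The point metrics above can be sketched in a few lines of Python. These are common textbook definitions, not necessarily the exact formulas in the shared scorer: the sign convention for mean error (predicted minus actual) and the SMAPE variant (absolute error over the mean of absolute actual and absolute predicted) are assumptions, and the numbers are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def wmape(actual, predicted):
    """Total absolute error divided by total absolute actuals."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / sum(abs(a) for a in actual)


def mean_error(actual, predicted):
    """Signed bias; positive means over-prediction under this convention."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Mean of 2|p - a| / (|a| + |p|); rows where both are zero count as zero error."""
    terms = []
    for a, p in zip(actual, predicted):
        denom = abs(a) + abs(p)
        terms.append(0.0 if denom == 0 else 2 * abs(p - a) / denom)
    return sum(terms) / len(terms)


# Hypothetical holdout block on the KPI scale.
actual = [100, 120, 80, 100]
predicted = [110, 110, 90, 100]

print(round(rmse(actual, predicted), 3))
print(round(wmape(actual, predicted), 3))
print(round(mean_error(actual, predicted), 3))
print(round(smape(actual, predicted), 3))
```

WMAPE weights each week by its KPI volume, so it is less sensitive than SMAPE to large percentage misses on low-volume weeks.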

Stability Metrics

standardized_posterior_shift: Adjacent-refit coefficient shift scaled by posterior uncertainty.
allocation_turnover: 0.5 * sum(abs(w_t - w_{t-1})) across matched channels, where w_t is the allocation weight at refit t.
marginal_response_rank_corr: Spearman correlation of channel marginal-response ranks across adjacent refits on the shared recommendation surface.

Important note:

marginal_response_rank_corr is not ROI. It is a rank comparison on the repo-owned recommendation surface. The metric was deliberately renamed from a previous ROI-style label because the current allocator does not compute true ROI.
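The three stability metrics can be illustrated with a small Python sketch. The turnover formula follows the definition given above; the Spearman helper is a standard rank correlation with average ranks for ties. The scaling used in `standardized_posterior_shift` (a pooled posterior standard deviation across the two refits) is an assumption, since the text only says the shift is scaled by posterior uncertainty. Channel names and values are hypothetical.

```python
import math


def allocation_turnover(prev_weights, curr_weights):
    """0.5 * sum(|w_t - w_{t-1}|) over channels present in both refits."""
    matched = prev_weights.keys() & curr_weights.keys()
    return 0.5 * sum(abs(curr_weights[c] - prev_weights[c]) for c in matched)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


def standardized_posterior_shift(prev_mean, prev_sd, curr_mean, curr_sd):
    """Assumed scaling: mean shift divided by the pooled posterior sd."""
    pooled_sd = math.sqrt((prev_sd ** 2 + curr_sd ** 2) / 2)
    return (curr_mean - prev_mean) / pooled_sd


# Hypothetical adjacent refits over three channels.
prev_w = {"tv": 0.5, "search": 0.3, "social": 0.2}
curr_w = {"tv": 0.4, "search": 0.4, "social": 0.2}
turnover = allocation_turnover(prev_w, curr_w)

prev_marginal = [3.0, 2.0, 1.0]  # tv, search, social
curr_marginal = [2.5, 3.0, 1.0]
rank_corr = spearman(prev_marginal, curr_marginal)

shift = standardized_posterior_shift(0.20, 0.05, 0.26, 0.05)
```

In this example a tenth of the budget moves from tv to search, so turnover is about 0.1, and the swap between the top two channels lowers the rank correlation to 0.5.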

Interpretation Guidance

Holdout metrics address predictive performance.
Parameter stability addresses how much posterior media effects move between adjacent refits.
Recommendation stability addresses how much the recommended allocation surface moves between adjacent refits under one controlled comparison scenario.

Recommendation stability in the current repo should still be treated as provisional, because the allocator surface is backtest-owned and not yet an owner-approved production policy.
