# Walk-forward optimization

`Backtest.optimize()` picks the best parameter combination over the
*entire* dataset — and reports its in-sample performance. This is the
classic recipe for fitting noise. The "best Sharpe 2.4" you see almost
never holds up live.

`Backtest.walk_forward_optimize()` runs the same grid search inside
rolling windows and reports performance on data the strategy was
never tuned on:

```
                        train          test
window 0:   [─────────────────────][──────]
window 1:           [─────────────────────][──────]
window 2:                   [─────────────────────][──────]
                                                step
```

For each window, QTrade picks the best parameters on the train slice
and evaluates them on the test slice that immediately follows.
Aggregated across windows, the test-slice results are your honest,
out-of-sample (OoS) estimate of how the strategy actually performs.
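For intuition, the train-then-test loop described above can be sketched in plain Python. This is not QTrade's implementation. It works on integer bar indices rather than timestamps, assumes the first train slice starts at bar 0 and that a trailing window that does not fit completely is dropped, and uses a hypothetical `score(params, start, end)` callback in place of a real backtest run.

```python
from itertools import product


def walk_forward(n_bars, train_window, test_window, step, grid, score, constraint=None):
    """Toy walk-forward loop over integer bar indices.

    `score(params, start, end)` is a hypothetical stand-in for "run the
    strategy with `params` on bars [start, end) and return the metric".
    """
    names = list(grid)
    candidates = [dict(zip(names, values)) for values in product(*grid.values())]
    if constraint is not None:
        candidates = [p for p in candidates if constraint(p)]

    windows = []
    start = 0
    # Stop once a full train + test span no longer fits in the data.
    while start + train_window + test_window <= n_bars:
        train_end = start + train_window
        test_end = train_end + test_window
        # Grid search on the train slice only.
        best = max(candidates, key=lambda p: score(p, start, train_end))
        # Score the frozen winner on the unseen test slice.
        windows.append({
            "train": (start, train_end),
            "test": (train_end, test_end),
            "best_params": best,
            "oos_score": score(best, train_end, test_end),
        })
        start += step
    return windows


# Toy score that always prefers n == 10, just to exercise the loop.
demo = walk_forward(
    n_bars=400, train_window=200, test_window=50, step=50,
    grid={"n": [5, 10, 15]},
    score=lambda p, start, end: -abs(p["n"] - 10),
)
for w in demo:
    print(w["train"], w["test"], w["best_params"])
```

The key property is that `best` is chosen using only `[start, train_end)`; the test slice never influences which parameters win.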

## API

```python
result = bt.walk_forward_optimize(
    train_window=200,
    test_window=50,
    maximize='Sharpe Ratio',
    step=50,
    constraint=lambda p: p['n1'] < p['n2'],
    n1=range(5, 30, 5),
    n2=range(20, 60, 10),
)
```

Required:

- **`train_window`**: bars per training slice.
- **`test_window`**: bars per test slice (immediately follows the train slice).
- **`maximize`**: name of the metric from `calculate_stats()` to
  maximize within each train slice. Common choices: `Sharpe Ratio`,
  `Total Return [%]`, `Calmar Ratio`. See the
  [stats glossary](stats_glossary.md) for what each means.
- **`**params_grid`**: same syntax as `optimize()` — keyword arguments
  whose values are iterables of candidates.

Optional:

- **`step`**: how many bars to advance between consecutive train starts.
  Defaults to `test_window` (non-overlapping test windows). Smaller
  values give overlapping (correlated) test windows; larger values
  leave gaps.
- **`constraint`**: a predicate `lambda p: bool` applied to each parameter
  dict before it is evaluated; combinations for which it returns `False`
  are skipped. Use this to rule out nonsensical combinations (e.g. a fast
  SMA window longer than the slow one).
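To see how `constraint` and the grid interact, here is a sketch of the expansion for the grid used in the API example. It assumes the keyword arguments are combined as a Cartesian product (the usual grid-search behaviour); QTrade's internal ordering may differ.

```python
from itertools import product

# Same candidate ranges as the API example.
grid = {"n1": range(5, 30, 5), "n2": range(20, 60, 10)}


def constraint(p):
    # Fast window must be shorter than the slow window.
    return p["n1"] < p["n2"]


# Every combination of one value per keyword argument.
combos = [dict(zip(grid, values)) for values in product(*grid.values())]
valid = [p for p in combos if constraint(p)]

print(len(combos), len(valid))  # 20 18
```

Only `n1=20, n2=20` and `n1=25, n2=20` are filtered out. Every surviving combination is backtested on every train slice, so the grid size multiplies runtime: roughly 18 × `n_windows` train runs for this grid.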

## Reading the result

`result` is a dict:

```python
result['windows']    # list of per-window dicts
result['summary']    # aggregate OoS metrics
```

Each window dict contains:

| Key | Value |
|---|---|
| `train_start`, `train_end` | Timestamps of the train slice. |
| `test_start`, `test_end` | Timestamps of the test slice. |
| `best_params` | The parameter combo that maximized `maximize` on the train slice. |
| `train_stats` | Full stats dict from the train run with `best_params`. |
| `test_stats` | Full stats dict from the test run with `best_params`. |
| `test_equity` | The test slice's `equity_history` Series. |

The summary:

| Key | Meaning |
|---|---|
| `n_windows` | Total number of train/test pairs evaluated. |
| `mean_oos_return` | Average `Total Return [%]` across test windows. |
| `hit_rate` | Fraction of test windows with positive return. |
| `min_oos_return`, `max_oos_return` | Worst and best test-window `Total Return [%]`, respectively. |
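The summary fields can be reproduced from the per-window list. The sketch below assumes they are simple, unweighted statistics over each window's `Total Return [%]`; `summarize` is a hypothetical helper, so check QTrade's source if you need the exact definitions.

```python
def summarize(windows):
    """Recompute the summary fields from result['windows'] (unweighted)."""
    returns = [w["test_stats"]["Total Return [%]"] for w in windows]
    n = len(returns)
    return {
        "n_windows": n,
        "mean_oos_return": sum(returns) / n,
        "hit_rate": sum(r > 0 for r in returns) / n,  # strictly positive only
        "min_oos_return": min(returns),  # worst window
        "max_oos_return": max(returns),  # best window
    }


# Usage with fake windows: four test slices returning 4%, -2%, 1% and 5%.
fake = [{"test_stats": {"Total Return [%]": r}} for r in (4.0, -2.0, 1.0, 5.0)]
print(summarize(fake))
```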

## A typical workflow

```python
from qtrade.backtest import Backtest

# Assumes `bt` is a Backtest instance already set up with your data and
# strategy, as in Getting Started.

# 1. Cheap in-sample search to confirm the strategy is even worth tuning.
best_params, best_stats, _ = bt.optimize(
    maximize='Sharpe Ratio',
    n1=range(5, 30, 5),
    n2=range(20, 60, 10),
    constraint=lambda p: p['n1'] < p['n2'],
)
print("In-sample Sharpe:", best_stats['Sharpe Ratio'])

# 2. Walk-forward to get the realistic number.
result = bt.walk_forward_optimize(
    train_window=200,
    test_window=50,
    maximize='Sharpe Ratio',
    n1=range(5, 30, 5),
    n2=range(20, 60, 10),
    constraint=lambda p: p['n1'] < p['n2'],
)
# Compare like with like: Sharpe in-sample vs. Sharpe out-of-sample.
oos_sharpes = [w['test_stats']['Sharpe Ratio'] for w in result['windows']]
print("OoS mean Sharpe:", sum(oos_sharpes) / len(oos_sharpes))
print("OoS hit rate:", result['summary']['hit_rate'])
print(f"OoS mean return: {result['summary']['mean_oos_return']:.2f}%")

# 3. Inspect window by window for parameter stability.
for w in result['windows']:
    print(
        f"{w['test_start'].date()}–{w['test_end'].date()}: "
        f"params={w['best_params']} "
        f"oos_return={w['test_stats']['Total Return [%]']:.2f}%"
    )
```

If the in-sample Sharpe is far above the mean OoS Sharpe across the test
windows, the gap is your overfit. The OoS number is the one to trust.

## Choosing the windows

The right `train_window` and `test_window` depend on:

- **Strategy "memory"**: how many bars does the strategy need before a
  parameter choice can be judged? A 20-bar SMA needs 20 bars just to warm
  up, so give it at least ~50 bars to generate trades worth scoring. A
  model with a 200-bar lookback needs a much larger `train_window`.
- **Market regime length**: a train window that's too small sees only
  one regime; one that's too large straddles several regimes and averages
  their conflicting optima. 6–12 months of daily bars (roughly 126–252
  trading days) is a typical starting point.
- **Number of windows you can afford**: longer windows = fewer windows,
  but more data per fit; shorter windows = more windows but each is
  noisier. Aim for **at least 5–10 windows** so the summary statistics
  are meaningful.

A common rule of thumb is `train_window ≈ 4–10 × test_window`; the API
example above (200 / 50) sits at the low end of that range.
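To check the 5–10 window target before running anything, you can count windows up front. This sketch assumes the first train slice starts at bar 0, consecutive slices advance by `step`, and a trailing window that does not fit completely is dropped; `count_windows` is a hypothetical helper, not part of QTrade.

```python
def count_windows(n_bars, train_window, test_window, step=None):
    """Number of complete train/test pairs that fit in n_bars."""
    step = test_window if step is None else step  # documented default
    span = train_window + test_window
    if n_bars < span:
        return 0
    return (n_bars - span) // step + 1


# Roughly five years of daily bars (about 252 trading days per year).
print(count_windows(1260, train_window=252, test_window=63))   # 16
print(count_windows(1260, train_window=252, test_window=126))  # 8
```

Both configurations clear the minimum. With only two years of data (504 bars), the same 252/63 split would leave just 4 windows.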

## Common pitfalls

### "OoS hit rate is 0.5"

Coin flip. Either the strategy genuinely doesn't have edge, or your
parameter grid doesn't cover the meaningful space. Try:

1. Look at *which* params win in each window — if they're wildly
   different across windows, the strategy isn't stable.
2. Widen the grid. If the winning value keeps landing at the edge of
   its range (e.g. `n1=5` or `n1=25` for `range(5, 30, 5)`), the true
   optimum may lie outside the grid.
3. Try a longer `train_window`.
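The first two checks in the list above can be automated against `result['windows']`. The sketch below is a hypothetical helper (`param_report` is not part of QTrade) that counts how often each winning combination recurs and how often each parameter's winner sits at the minimum or maximum of its candidate range.

```python
from collections import Counter


def param_report(windows, grid):
    """Summarize which params won and how often they hit the grid edges."""
    # Count identical winning combos (dicts are unhashable, so use sorted tuples).
    winners = Counter(tuple(sorted(w["best_params"].items())) for w in windows)
    # For each parameter, count windows whose winner is the grid min or max.
    edge_hits = {
        name: sum(w["best_params"][name] in (min(values), max(values)) for w in windows)
        for name, values in grid.items()
    }
    return winners, edge_hits


# Usage with three fake windows and the grid from the examples above.
grid = {"n1": range(5, 30, 5), "n2": range(20, 60, 10)}
fake = [
    {"best_params": {"n1": 5, "n2": 30}},
    {"best_params": {"n1": 5, "n2": 30}},
    {"best_params": {"n1": 25, "n2": 50}},
]
winners, edge_hits = param_report(fake, grid)
print(winners.most_common(1))
print(edge_hits)
```

Here `n1` wins at an edge in every window, which is the signal to widen its range.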

### "Mean OoS return is positive but min is catastrophic"

A common signature of strategies that work most of the time but blow
up periodically (e.g. martingale-like position sizing, unhedged short volatility).
The mean hides the tail.

Always look at the per-window list, not just the summary.
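A quick way to surface the tail is to rank windows by OoS return and print the worst few with their dates. `worst_windows` is a hypothetical helper; it assumes the window dict keys listed in the table above.

```python
def worst_windows(windows, k=3):
    """Return (test_start, test_end, return %) for the k worst test windows."""
    ranked = sorted(windows, key=lambda w: w["test_stats"]["Total Return [%]"])
    return [
        (w["test_start"], w["test_end"], w["test_stats"]["Total Return [%]"])
        for w in ranked[:k]
    ]


# Fake windows (strings stand in for timestamps): the mean is +0.8%,
# but one window loses 9%.
fake = [
    {"test_start": f"w{i}", "test_end": f"w{i}-end", "test_stats": {"Total Return [%]": r}}
    for i, r in enumerate([3.0, 2.5, -9.0, 4.0, 3.5])
]
for start, end, ret in worst_windows(fake, k=2):
    print(start, end, f"{ret:.1f}%")
```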

### "Walk-forward Sharpe is much lower than in-sample Sharpe"

Expected and normal: the gap measures how much of the in-sample result
was overfit. As a rough heuristic, a 2:1 ratio (in-sample 2.0 → OoS 1.0)
is acceptable, while 5:1 suggests you are mostly fitting noise. To close
the gap, reduce parameter freedom (fewer parameters or a coarser grid) or
strengthen the strategy's inductive bias (add a constraint, share
parameters across assets, etc.).
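Because each window carries both `train_stats` and `test_stats`, you can measure this ratio within the walk-forward run itself rather than against a single full-dataset `optimize()` result. The sketch below uses a hypothetical `decay_ratio` helper and takes the ratio of the means; when the OoS mean is zero or negative it returns infinity, since the strategy has no out-of-sample edge to compare against.

```python
def decay_ratio(windows, metric="Sharpe Ratio"):
    """Mean in-sample vs. mean out-of-sample `metric` across windows."""
    ins = [w["train_stats"][metric] for w in windows]
    oos = [w["test_stats"][metric] for w in windows]
    mean_in = sum(ins) / len(ins)
    mean_oos = sum(oos) / len(oos)
    # A ratio is only meaningful when the OoS mean is positive.
    ratio = mean_in / mean_oos if mean_oos > 0 else float("inf")
    return mean_in, mean_oos, ratio


fake = [
    {"train_stats": {"Sharpe Ratio": 2.5}, "test_stats": {"Sharpe Ratio": 1.25}},
    {"train_stats": {"Sharpe Ratio": 1.5}, "test_stats": {"Sharpe Ratio": 0.75}},
]
print(decay_ratio(fake))  # (2.0, 1.0, 2.0)
```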

### "Step smaller than test_window"

This produces *overlapping* test windows. The summary metrics
double-count bars and inflate `n_windows`. Sometimes you want this
(e.g. for adaptive walk-forward that retrains weekly), but understand
that the windows aren't statistically independent.
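To see the double-counting concretely, this sketch counts how many test windows cover each bar. It assumes windows start at bar 0, advance by `step`, and drop any trailing partial window; `bar_coverage` is a hypothetical helper.

```python
def bar_coverage(n_bars, train_window, test_window, step):
    """counts[i] = number of test windows that include bar i."""
    counts = [0] * n_bars
    start = 0
    while start + train_window + test_window <= n_bars:
        test_start = start + train_window
        for i in range(test_start, test_start + test_window):
            counts[i] += 1
        start += step
    return counts


overlapping = bar_coverage(400, train_window=200, test_window=50, step=25)
tested = [c for c in overlapping if c > 0]
# Peak overlap and the average number of times each tested bar is counted.
print(max(overlapping), sum(tested) / len(tested))
```

With `step=25` and `test_window=50`, the 200 tested bars are counted 1.75 times on average and up to twice each; with `step=50` every tested bar is counted exactly once.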

## See also

- [Stats glossary](stats_glossary.md) — pick the right `maximize` metric.
- [Getting Started](getting_started.md) — basic Backtest workflow.
- [`examples/portfolio_strategy.py`](https://github.com/gguan/qtrade/blob/main/examples/portfolio_strategy.py)
  — multi-asset walk-forward in a runnable file.