Ferrous-Solitaire/docs/testing-architecture.md

# Testing Architecture — Engine-first Validation

Ferrous Solitaire validation is split into three layers with clear ownership:

1. **Rust unit tests (`solitaire_core`)**
   - move generation and legality
   - deal generation determinism
   - scoring and penalties
   - undo semantics
   - win detection

2. **Engine integration tests (`solitaire_wasm` debug API)**
   - autonomous game execution without UI/pointer simulation
   - invariant checks after every move
   - deterministic seed replay
   - high-volume seeded runs (including long-running soak tests)

3. **Playwright UI tests**
   - rendering consistency with engine state
   - drag/drop and keyboard UX behavior
   - responsive layout behavior
   - browser-compatibility checks

## Source of truth

The Rust engine is authoritative. Browser tests must interact with the game via
debug API hooks, not via pixel/OCR solving or hardcoded screen coordinates.

## Debug API surfaces

Two automation surfaces are exposed:

- `solitaire_wasm::SolitaireGame` methods:
  - `debug_snapshot()`
  - `debug_legal_moves()`
  - `debug_move_history()`
  - `debug_apply_legal_move(index)`
  - `debug_apply_move_json(json)`
- Browser bridge on `game.html`:
  - `window.__FERROUS_DEBUG__.snapshot()`
  - `window.__FERROUS_DEBUG__.legalMoves()`
  - `window.__FERROUS_DEBUG__.moveHistory()`
  - `window.__FERROUS_DEBUG__.applyLegalMove(index)`
  - `window.__FERROUS_DEBUG__.applyMove(move)`
  - `window.__FERROUS_DEBUG__.failureReport()`
  - `window.__FERROUS_DEBUG__.runAutoplay(options)`
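
The bridge methods listed above could drive an autonomous run roughly as follows. This is a minimal sketch, not the project's harness: it assumes the methods are synchronous, that `legalMoves()` returns an array, and that the snapshot exposes a card count. The `DebugBridge` interface, the `cardCount` field, `checkInvariants`, `runUntilStuck`, and `makeFakeBridge` are all hypothetical names introduced for illustration.

```typescript
// Hypothetical shape of the subset of window.__FERROUS_DEBUG__ used here.
// Method names come from the list above; return types are assumptions.
interface DebugBridge {
  snapshot(): { cardCount: number };
  legalMoves(): unknown[];
  applyLegalMove(index: number): void;
  moveHistory(): unknown[];
}

// Hypothetical invariant: a standard deck never gains or loses cards.
function checkInvariants(bridge: DebugBridge): void {
  const { cardCount } = bridge.snapshot();
  if (cardCount !== 52) {
    const moves = bridge.moveHistory().length;
    throw new Error(`invariant violated: ${cardCount} cards after ${moves} moves`);
  }
}

// Takes the first legal move until none remain or the step budget runs out.
// Returns the number of moves applied.
function runUntilStuck(bridge: DebugBridge, maxSteps: number): number {
  let steps = 0;
  while (steps < maxSteps && bridge.legalMoves().length > 0) {
    bridge.applyLegalMove(0);
    checkInvariants(bridge); // check after every move, not only at the end
    steps += 1;
  }
  return steps;
}

// In-memory stand-in so the sketch runs without a browser or the engine.
function makeFakeBridge(totalMoves: number): DebugBridge {
  const history: number[] = [];
  return {
    snapshot: () => ({ cardCount: 52 }),
    legalMoves: () => (history.length < totalMoves ? ["move"] : []),
    applyLegalMove: (index) => {
      history.push(index);
    },
    moveHistory: () => history,
  };
}

const stepsTaken = runUntilStuck(makeFakeBridge(5), 350);
```

In a Playwright test, a loop like this would typically run inside `page.evaluate` against `window.__FERROUS_DEBUG__`, so that no pointer simulation is involved.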

## Required failure payload

Every automation failure must capture:

- seed
- move history
- current game state
- screenshot
- browser trace
- console logs

`failureReport()` provides the engine-side fields (`seed`, `moveHistory`,
`currentState`) so UI harnesses only need to attach browser artifacts.
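
A UI harness might combine the two halves of the payload along these lines. The sketch below is an assumption-laden illustration: only `seed`, `moveHistory`, and `currentState` are named in this document, their value types are guessed, and the browser-side field names (`screenshotPath`, `tracePath`, `consoleLogs`) as well as `buildFailurePayload` and `missingFields` are hypothetical.

```typescript
// Engine-side fields named in this section; their value types are assumptions.
interface EngineFailureReport {
  seed: number;
  moveHistory: unknown[];
  currentState: unknown;
}

// Browser-side artifacts attached by the UI harness; field names are hypothetical.
interface BrowserArtifacts {
  screenshotPath: string;
  tracePath: string;
  consoleLogs: string[];
}

type FailurePayload = EngineFailureReport & BrowserArtifacts;

// Merges the engine report with the browser artifacts into one record.
function buildFailurePayload(
  engine: EngineFailureReport,
  browser: BrowserArtifacts,
): FailurePayload {
  return { ...engine, ...browser };
}

// Lists which of the six required pieces are undefined or an empty string.
function missingFields(payload: Partial<FailurePayload>): string[] {
  const required: (keyof FailurePayload)[] = [
    "seed",
    "moveHistory",
    "currentState",
    "screenshotPath",
    "tracePath",
    "consoleLogs",
  ];
  return required.filter((key) => payload[key] === undefined || payload[key] === "");
}

const examplePayload = buildFailurePayload(
  { seed: 42, moveHistory: [], currentState: {} },
  { screenshotPath: "failure.png", tracePath: "trace.zip", consoleLogs: [] },
);
```

Checking `missingFields(examplePayload)` before writing the report would let a harness fail loudly when an artifact was not captured, instead of producing an incomplete payload.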

## Execution guidance

- Fast verification:
  - `cargo test -p solitaire_core -p solitaire_wasm`
- Full verification:
  - `cargo test --workspace`
  - `cargo clippy --workspace -- -D warnings`
- Long unattended soak:
  - `cargo test -p solitaire_wasm debug_api_autonomous_thousands_seed_soak -- --ignored`

### Browser e2e harness

The Playwright suite lives under `solitaire_server/e2e/` and boots
`solitaire_server` via Playwright `webServer` config.

- Install + run:
  - `cd solitaire_server/e2e`
  - `npm ci`
  - `npx playwright install chromium`
  - `npm test`
- Cycle metrics batch run:
  - `cd solitaire_server/e2e`
  - `npm run review:cycles -- --games 1000 --steps 350 --policy baseline --max-visits 1 --out /tmp/cycle-baseline.json`
  - `npm run review:cycles -- --games 1000 --steps 350 --policy loop_aware --max-visits 2 --out /tmp/cycle-loop-aware.json`
  - `npm run review:cycles:regression` (thresholded gate, writes `test-results/cycle-regression.json`)
  - `npm run review:cycles:candidate` (loop-aware candidate run, writes `test-results/cycle-candidate.json`)
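
This document does not define `--max-visits`. One plausible reading, assumed here purely for illustration, is that it caps how many times a game may reach the same position before the run is counted as cycling. Under that assumption, visit counting could be sketched as follows; `VisitTracker` and its `record` method are hypothetical and do not describe the actual review script.

```typescript
// Hypothetical cycle detector: counts how often each serialized state is seen.
class VisitTracker {
  private readonly visits = new Map<string, number>();
  private readonly maxVisits: number;

  constructor(maxVisits: number) {
    this.maxVisits = maxVisits;
  }

  // Records one visit and returns true once the state has been seen
  // more than maxVisits times.
  record(state: unknown): boolean {
    // JSON.stringify is key-order sensitive, so states must be serialized
    // consistently for equal positions to produce equal keys.
    const key = JSON.stringify(state);
    const count = (this.visits.get(key) ?? 0) + 1;
    this.visits.set(key, count);
    return count > this.maxVisits;
  }
}

const strictTracker = new VisitTracker(1);
const firstVisit = strictTracker.record({ stock: 24, waste: 0 });
const secondVisit = strictTracker.record({ stock: 24, waste: 0 });
```

With a limit of 1, the second visit to an identical position is flagged; with a limit of 2, the third visit would be the first one flagged.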

### Cycle-risk regression baseline and guardrails

- Current regression gate command:
  - `npm run review:cycles:regression`
  - config: `games=240`, `steps=350`, `policy=baseline`, `max-visits=1`
- Current guardrail thresholds:
  - `all.cycle_rate_pct <= 86`
  - `draw1.cycle_rate_pct <= 76`
  - `draw3.cycle_rate_pct <= 95`
  - `all.win_rate_pct >= 14`
  - zero invariant/apply/page/console issue counts
- Baseline sample (240 games):
  - overall: `win_rate=15.8%`, `cycle_rate=84.2%`
  - draw-one: `win_rate=25.8%`, `cycle_rate=74.2%`
  - draw-three: `win_rate=5.8%`, `cycle_rate=94.2%`
- Candidate loop-aware sample (240 games, lookahead via simulated move + restore):
  - overall: `win_rate=20.4%`, `cycle_rate=32.5%`
  - draw-one: `win_rate=33.3%`, `cycle_rate=16.7%`
  - draw-three: `win_rate=7.5%`, `cycle_rate=48.3%`
  - no invariant/apply/page/console issues in the sampled run
- Additional 500-game candidate soak:
  - overall: `win_rate=20.2%`, `cycle_rate=28.6%`, `step_budget=51.2%`
  - draw-three remains the dominant risk (`cycle_rate=45.2%`)
- Step-budget guardrail: the cycle metrics regression supports explicit
  `max_step_budget_rate_*` thresholds. The candidate command
  (`npm run review:cycles:candidate`) enforces `max_step_budget_rate_all <= 60`
  so that games cannot silently drift from cycles into step-budget stalls.
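
The cycle and win-rate thresholds in this section can be read as a simple gate over a metrics report. The following is a minimal sketch under stated assumptions: the metric names and limits are taken from the list above, but the nested `CycleReport` shape and the `guardrailViolations` helper are hypothetical, and the zero-issue-count and step-budget checks are left out.

```typescript
// Assumed report shape; only the metric names and limits come from this section.
interface ModeMetrics {
  cycle_rate_pct: number;
  win_rate_pct: number;
}

interface CycleReport {
  all: ModeMetrics;
  draw1: ModeMetrics;
  draw3: ModeMetrics;
}

// Returns one message per violated threshold; an empty array means the gate passes.
function guardrailViolations(report: CycleReport): string[] {
  const violations: string[] = [];
  const atMost = (name: string, value: number, limit: number): void => {
    if (value > limit) violations.push(`${name}=${value} exceeds ${limit}`);
  };
  const atLeast = (name: string, value: number, limit: number): void => {
    if (value < limit) violations.push(`${name}=${value} is below ${limit}`);
  };
  atMost("all.cycle_rate_pct", report.all.cycle_rate_pct, 86);
  atMost("draw1.cycle_rate_pct", report.draw1.cycle_rate_pct, 76);
  atMost("draw3.cycle_rate_pct", report.draw3.cycle_rate_pct, 95);
  atLeast("all.win_rate_pct", report.all.win_rate_pct, 14);
  return violations;
}

// The 240-game baseline sample quoted in this section.
const baselineSample: CycleReport = {
  all: { cycle_rate_pct: 84.2, win_rate_pct: 15.8 },
  draw1: { cycle_rate_pct: 74.2, win_rate_pct: 25.8 },
  draw3: { cycle_rate_pct: 94.2, win_rate_pct: 5.8 },
};

const baselineViolations = guardrailViolations(baselineSample);
```

The baseline sample sits within two percentage points of every limit, so the gate is meant to catch regressions from the current behavior, not to express a target.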