# Testing Architecture: Engine-first Validation

Ferrous Solitaire validation is split into three layers with clear ownership:

1. **Rust unit tests (`solitaire_core`)**
   - move generation and legality
   - deal generation determinism
   - scoring and penalties
   - undo semantics
   - win detection
2. **Engine integration tests (`solitaire_wasm` debug API)**
   - autonomous game execution without UI/pointer simulation
   - invariant checks after every move
   - deterministic seed replay
   - high-volume seeded runs (including long-running soak tests)
3. **Playwright UI tests**
   - verify rendering vs engine state
   - drag/drop and keyboard UX behavior
   - responsive layout behavior
   - browser-compatibility checks

## Source of truth

The Rust engine is authoritative. Browser tests must interact with the game via debug API hooks, not via pixel/OCR solving or hardcoded screen coordinates.

## Debug API surfaces

Two automation surfaces are exposed:

- `solitaire_wasm::SolitaireGame` methods:
  - `debug_snapshot()`
  - `debug_legal_moves()`
  - `debug_move_history()`
  - `debug_apply_legal_move(index)`
  - `debug_apply_move_json(json)`
- Browser bridge on `game.html`:
  - `window.__FERROUS_DEBUG__.snapshot()`
  - `window.__FERROUS_DEBUG__.legalMoves()`
  - `window.__FERROUS_DEBUG__.moveHistory()`
  - `window.__FERROUS_DEBUG__.applyLegalMove(index)`
  - `window.__FERROUS_DEBUG__.applyMove(move)`
  - `window.__FERROUS_DEBUG__.failureReport()`
  - `window.__FERROUS_DEBUG__.runAutoplay(options)`

## Required failure payload

Every automation failure should capture:

- seed
- move history
- current game state
- screenshot
- browser trace
- console logs

`failureReport()` provides the engine-side fields (`seed`, `moveHistory`, `currentState`) so UI harnesses only need to attach browser artifacts.

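
As a rough illustration of how a harness might combine the debug bridge with the failure payload described above, here is a minimal TypeScript sketch. The method names `snapshot`, `legalMoves`, `applyLegalMove`, and `failureReport` come from the bridge list. Everything else is hypothetical: the return types, the synchronous call style, the artifact field names, the `reason` field, and the helpers `playWithInvariantChecks` and `createFakeBridge`. In a real Playwright test each bridge call would go through `page.evaluate`, because the bridge lives on `window`.

```typescript
// Engine-side fields named in the text; their concrete types are assumptions.
interface EngineFailureReport {
  seed: number | string;
  moveHistory: unknown[];
  currentState: unknown;
}

// Browser-side artifacts attached by the UI harness (hypothetical field names).
interface BrowserArtifacts {
  screenshotPath: string;
  tracePath: string;
  consoleLogs: string[];
}

// Slice of the debug bridge used here. Method names are from the document;
// return types and synchronous calls are assumptions.
interface DebugBridge {
  snapshot(): unknown;
  legalMoves(): unknown[];
  applyLegalMove(index: number): unknown;
  failureReport(): EngineFailureReport;
}

type FailurePayload = EngineFailureReport & BrowserArtifacts & { reason: string };

// Applies the first legal move until none remain or maxSteps is reached,
// checking a caller-supplied invariant after every move. Returns a full
// failure payload on the first violation, or null if the run stayed clean.
function playWithInvariantChecks(
  bridge: DebugBridge,
  maxSteps: number,
  checkInvariant: (snapshot: unknown) => string | null,
  collectArtifacts: () => BrowserArtifacts,
): FailurePayload | null {
  for (let step = 0; step < maxSteps; step++) {
    if (bridge.legalMoves().length === 0) {
      return null; // no legal moves left
    }
    bridge.applyLegalMove(0);
    const violation = checkInvariant(bridge.snapshot());
    if (violation !== null) {
      return { ...bridge.failureReport(), ...collectArtifacts(), reason: violation };
    }
  }
  return null;
}

// In-memory stand-in for the real bridge so the sketch runs without a browser.
function createFakeBridge(seed: number, totalMoves: number): DebugBridge {
  const history: number[] = [];
  return {
    snapshot: () => ({ movesApplied: history.length }),
    legalMoves: () => (history.length < totalMoves ? ["only-move"] : []),
    applyLegalMove: (index: number) => {
      history.push(index);
    },
    failureReport: () => ({
      seed,
      moveHistory: history.slice(),
      currentState: { movesApplied: history.length },
    }),
  };
}

// Deliberately artificial invariant: flag the run once more than two moves
// have been applied, just to exercise the failure path.
const demoPayload = playWithInvariantChecks(
  createFakeBridge(42, 5),
  10,
  (snapshot) =>
    (snapshot as { movesApplied: number }).movesApplied > 2 ? "more than 2 moves applied" : null,
  () => ({ screenshotPath: "failure.png", tracePath: "trace.zip", consoleLogs: [] }),
);
console.log(JSON.stringify(demoPayload));
```

Keeping the engine-side report and the browser artifacts as separate inputs mirrors the split described above: the engine supplies `seed`, `moveHistory`, and `currentState`, and the harness only adds what the browser can see.
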
## Execution guidance

- Fast verification:
  - `cargo test -p solitaire_core -p solitaire_wasm`
- Full verification:
  - `cargo test --workspace`
  - `cargo clippy --workspace -- -D warnings`
- Long unattended soak:
  - `cargo test -p solitaire_wasm debug_api_autonomous_thousands_seed_soak -- --ignored`

### Browser e2e harness

The Playwright suite lives under `solitaire_server/e2e/` and boots `solitaire_server` via Playwright `webServer` config.

- Install + run:
  - `cd solitaire_server/e2e`
  - `npm ci`
  - `npx playwright install chromium`
  - `npm test`
- Cycle metrics batch run:
  - `cd solitaire_server/e2e`
  - `npm run review:cycles -- --games 1000 --steps 350 --policy baseline --max-visits 1 --out /tmp/cycle-baseline.json`
  - `npm run review:cycles -- --games 1000 --steps 350 --policy loop_aware --max-visits 2 --out /tmp/cycle-loop-aware.json`
  - `npm run review:cycles:regression` (thresholded gate, writes `test-results/cycle-regression.json`)
  - `npm run review:cycles:candidate` (loop-aware candidate run, writes `test-results/cycle-candidate.json`)

### Cycle-risk regression baseline and guardrails

- Current regression gate command:
  - `npm run review:cycles:regression`
  - config: `games=240`, `steps=350`, `policy=baseline`, `max-visits=1`
- Current guardrail thresholds:
  - `all.cycle_rate_pct <= 86`
  - `draw1.cycle_rate_pct <= 76`
  - `draw3.cycle_rate_pct <= 95`
  - `all.win_rate_pct >= 14`
  - zero invariant/apply/page/console issue counts
- Baseline sample (240 games):
  - overall: `win_rate=15.8%`, `cycle_rate=84.2%`
  - draw-one: `win_rate=25.8%`, `cycle_rate=74.2%`
  - draw-three: `win_rate=5.8%`, `cycle_rate=94.2%`
- Candidate loop-aware sample (240 games, lookahead via simulated move + restore):
  - overall: `win_rate=20.4%`, `cycle_rate=32.5%`
  - draw-one: `win_rate=33.3%`, `cycle_rate=16.7%`
  - draw-three: `win_rate=7.5%`, `cycle_rate=48.3%`
  - no invariant/apply/page/console issues in the sampled run
- Additional 500-game candidate soak:
  - overall: `win_rate=20.2%`, `cycle_rate=28.6%`, `step_budget=51.2%`
  - draw-three remains the dominant risk (`cycle_rate=45.2%`)
- Fix applied: cycle metrics regression now supports explicit `max_step_budget_rate_*` thresholds. Candidate command now enforces `max_step_budget_rate_all <= 60` to prevent silent drift from cycles into step-budget stalls.
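
To make the guardrail thresholds above concrete, the following TypeScript sketch checks a metrics report against them. It is not the actual gate behind `npm run review:cycles:regression`. The report shape (`CycleReport`, including the `issues` object) and the function name `checkGuardrails` are assumptions inferred from the dotted metric names; only the threshold values and the baseline sample numbers come from the list above.

```typescript
interface GroupMetrics {
  cycle_rate_pct: number;
  win_rate_pct: number;
}

// Hypothetical report shape; only the dotted metric names come from the text.
interface CycleReport {
  all: GroupMetrics;
  draw1: GroupMetrics;
  draw3: GroupMetrics;
  issues: { invariant: number; apply: number; page: number; console: number };
}

// Returns one message per violated guardrail; an empty array means the gate passes.
function checkGuardrails(report: CycleReport): string[] {
  const violations: string[] = [];
  const atMost = (name: string, value: number, limit: number): void => {
    if (value > limit) {
      violations.push(`${name}=${value} exceeds max ${limit}`);
    }
  };

  atMost("all.cycle_rate_pct", report.all.cycle_rate_pct, 86);
  atMost("draw1.cycle_rate_pct", report.draw1.cycle_rate_pct, 76);
  atMost("draw3.cycle_rate_pct", report.draw3.cycle_rate_pct, 95);

  if (report.all.win_rate_pct < 14) {
    violations.push(`all.win_rate_pct=${report.all.win_rate_pct} is below min 14`);
  }

  // Every issue counter must be exactly zero.
  const issueKinds = ["invariant", "apply", "page", "console"] as const;
  for (const kind of issueKinds) {
    if (report.issues[kind] !== 0) {
      violations.push(`${kind} issue count is ${report.issues[kind]}, expected 0`);
    }
  }
  return violations;
}

// Baseline sample (240 games) from the list above; zero issue counts are assumed.
const baselineSample: CycleReport = {
  all: { win_rate_pct: 15.8, cycle_rate_pct: 84.2 },
  draw1: { win_rate_pct: 25.8, cycle_rate_pct: 74.2 },
  draw3: { win_rate_pct: 5.8, cycle_rate_pct: 94.2 },
  issues: { invariant: 0, apply: 0, page: 0, console: 0 },
};

console.log(checkGuardrails(baselineSample)); // prints []
```

The baseline sample sits close to every limit (for example `84.2` against a ceiling of `86`), so collecting all violations in one pass, instead of stopping at the first, makes it easier to see which variant drifted.
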