feat(e2e): add Playwright browser test suite for web routes

solitaire_server/e2e/:
- smoke.spec.js: verifies /play-classic loads, exposes window.__FERROUS_DEBUG__
  bridge, keyboard parity (Space=draw, U=undo), debug failure report, and
  replay payload builder exports schema-v2 moves.
- gameplay_review.spec.js: HUD/controls render check, stock-click + undo
  player flow, draw-mode toggle, autonomous play invariant batch, and
  cycle-detection regression guard.
- cycle_metrics.js: headless cycle-rate analysis tool; run via
  `npm run review:cycles` with configurable policy, game count, and
  thresholds. Regression gate baked into package.json scripts.
- playwright.config.js: targets the local server at http://localhost:8080.
- package.json / package-lock.json: @playwright/test 1.60.0.

.gitea/workflows/web-e2e.yml:
- Runs on pushes to solitaire_server/, solitaire_wasm/, solitaire_core/, or
  Cargo changes. Starts the server binary, waits for /health, runs the full
  Playwright suite, uploads test-results/ on failure.

docs/testing-architecture.md: documents the three-tier test strategy
(unit → Playwright smoke → cycle regression) and the __FERROUS_DEBUG__
bridge contract.

scripts/update_quaternions_deps.sh: helper to bump the Quaternions registry
deps (klondike, card_game) by version and run the full safety gate including
deterministic replay checks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@@ -0,0 +1,115 @@
# Testing Architecture — Engine-first Validation

Ferrous Solitaire validation is split into three layers with clear ownership:

1. **Rust unit tests (`solitaire_core`)**
   - move generation and legality
   - deal generation determinism
   - scoring and penalties
   - undo semantics
   - win detection

2. **Engine integration tests (`solitaire_wasm` debug API)**
   - autonomous game execution without UI/pointer simulation
   - invariant checks after every move
   - deterministic seed replay
   - high-volume seeded runs (including long-running soak tests)

3. **Playwright UI tests**
   - verify rendering vs engine state
   - drag/drop and keyboard UX behavior
   - responsive layout behavior
   - browser-compatibility checks

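
To make "invariant checks after every move" concrete, here is a minimal sketch of one such check: card conservation (52 cards, no duplicates). The snapshot shape used here (`stock`, `waste`, `foundations`, `tableau` holding card-ID strings) is an assumption for illustration; the real `debug_snapshot()` output may be structured differently.

```javascript
// Hypothetical snapshot shape (an assumption, not the real debug_snapshot() schema):
// { stock: string[], waste: string[], foundations: string[][], tableau: string[][] }
function checkCardConservation(snapshot) {
  const cards = [
    ...snapshot.stock,
    ...snapshot.waste,
    ...snapshot.foundations.flat(),
    ...snapshot.tableau.flat(),
  ];
  const issues = [];
  if (cards.length !== 52) {
    issues.push(`expected 52 cards, found ${cards.length}`);
  }
  const seen = new Set();
  for (const card of cards) {
    if (seen.has(card)) issues.push(`duplicate card: ${card}`);
    seen.add(card);
  }
  return issues; // empty array means the invariant holds
}

// Illustration only: build a full deck and split it across piles.
const suits = ["C", "D", "H", "S"];
const ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"];
const deck = suits.flatMap((suit) => ranks.map((rank) => rank + suit));

const healthy = {
  stock: deck.slice(0, 24),
  waste: [],
  foundations: [[], [], [], []],
  tableau: [deck.slice(24)],
};
// Same state with one card cloned into the waste pile.
const broken = { ...healthy, waste: [deck[0]] };

console.log(checkCardConservation(healthy).length); // 0
console.log(checkCardConservation(broken).length); // 2 (wrong count + duplicate)
```

Returning a list of issues rather than throwing lets a harness collect every violation for a failing seed before it reports.
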
## Source of truth

The Rust engine is authoritative. Browser tests must interact with the game via
debug API hooks, not via pixel/OCR solving or hardcoded screen coordinates.

## Debug API surfaces

Two automation surfaces are exposed:

- `solitaire_wasm::SolitaireGame` methods:
  - `debug_snapshot()`
  - `debug_legal_moves()`
  - `debug_move_history()`
  - `debug_apply_legal_move(index)`
  - `debug_apply_move_json(json)`
- Browser bridge on `game.html`:
  - `window.__FERROUS_DEBUG__.snapshot()`
  - `window.__FERROUS_DEBUG__.legalMoves()`
  - `window.__FERROUS_DEBUG__.moveHistory()`
  - `window.__FERROUS_DEBUG__.applyLegalMove(index)`
  - `window.__FERROUS_DEBUG__.applyMove(move)`
  - `window.__FERROUS_DEBUG__.failureReport()`
  - `window.__FERROUS_DEBUG__.runAutoplay(options)`

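
As a rough illustration of how a harness might drive the bridge without pointer simulation, the sketch below plays a game by always applying the first legal move. It uses the method names listed above, but the return shapes (`legalMoves()` returning an array, `snapshot()` returning an object with a `won` flag) are assumptions, and `makeFakeBridge` is a hypothetical stand-in so the example runs outside a browser.

```javascript
// Drives any object exposing the bridge methods named above.
// Assumed return shapes: legalMoves() -> array, snapshot() -> { won: boolean }.
function playFirstLegal(bridge, maxSteps) {
  let steps = 0;
  while (steps < maxSteps) {
    if (bridge.snapshot().won) return { outcome: "won", steps };
    const moves = bridge.legalMoves();
    if (moves.length === 0) return { outcome: "stuck", steps };
    bridge.applyLegalMove(0); // trivial policy: always take the first legal move
    steps += 1;
  }
  return { outcome: "step_budget", steps };
}

// Stand-in bridge for illustration only: a "game" that is won after N moves.
function makeFakeBridge(movesToWin) {
  const history = [];
  return {
    snapshot: () => ({ won: history.length >= movesToWin }),
    legalMoves: () => (history.length >= movesToWin ? [] : [{ kind: "draw" }]),
    applyLegalMove: (index) => {
      history.push(index);
    },
    moveHistory: () => [...history],
  };
}

const bridge = makeFakeBridge(3);
const result = playFirstLegal(bridge, 350);
console.log(result.outcome, result.steps); // won 3
console.log(bridge.moveHistory().length); // 3
```

In a Playwright test the same loop would presumably run inside the page, for example through `page.evaluate`, against `window.__FERROUS_DEBUG__` rather than a fake.
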
## Required failure payload

Every automation failure should capture:

- seed
- move history
- current game state
- screenshot
- browser trace
- console logs

`failureReport()` provides the engine-side fields (`seed`, `moveHistory`,
`currentState`) so UI harnesses only need to attach browser artifacts.

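
The split between engine-side fields and browser artifacts could be enforced with a small helper like the one sketched here. The names `seed`, `moveHistory`, and `currentState` come from `failureReport()` as described above; `screenshot`, `trace`, and `consoleLogs` are hypothetical field names chosen for this example.

```javascript
// Engine-side names come from failureReport(); the three browser-side names
// (screenshot, trace, consoleLogs) are hypothetical.
const REQUIRED_FIELDS = [
  "seed",
  "moveHistory",
  "currentState",
  "screenshot",
  "trace",
  "consoleLogs",
];

// Merges the engine report with browser artifacts and refuses to return
// a payload that lacks any of the six required fields.
function buildFailurePayload(engineReport, browserArtifacts) {
  const payload = { ...engineReport, ...browserArtifacts };
  const missing = REQUIRED_FIELDS.filter(
    (field) => payload[field] === undefined || payload[field] === null
  );
  if (missing.length > 0) {
    throw new Error(`failure payload missing: ${missing.join(", ")}`);
  }
  return payload;
}

// Illustration with made-up values.
const examplePayload = buildFailurePayload(
  { seed: 12345, moveHistory: [{ kind: "draw" }], currentState: { won: false } },
  { screenshot: "failure.png", trace: "trace.zip", consoleLogs: [] }
);
console.log(Object.keys(examplePayload).length); // 6
```

Failing loudly on a missing field keeps incomplete reports from reaching CI artifacts, where a missing seed would make the failure impossible to replay.
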
## Execution guidance

- Fast verification:
  - `cargo test -p solitaire_core -p solitaire_wasm`
- Full verification:
  - `cargo test --workspace`
  - `cargo clippy --workspace -- -D warnings`
- Long unattended soak:
  - `cargo test -p solitaire_wasm debug_api_autonomous_thousands_seed_soak -- --ignored`

### Browser e2e harness

The Playwright suite lives under `solitaire_server/e2e/` and boots
`solitaire_server` via Playwright `webServer` config.

- Install + run:
  - `cd solitaire_server/e2e`
  - `npm ci`
  - `npx playwright install chromium`
  - `npm test`
- Cycle metrics batch run:
  - `cd solitaire_server/e2e`
  - `npm run review:cycles -- --games 1000 --steps 350 --policy baseline --max-visits 1 --out /tmp/cycle-baseline.json`
  - `npm run review:cycles -- --games 1000 --steps 350 --policy loop_aware --max-visits 2 --out /tmp/cycle-loop-aware.json`
  - `npm run review:cycles:regression` (thresholded gate, writes `test-results/cycle-regression.json`)
  - `npm run review:cycles:candidate` (loop-aware candidate run, writes `test-results/cycle-candidate.json`)

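
One plausible reading of the `--max-visits` flag is that a game counts as cycling once any game state is visited more than that many times. The sketch below implements that reading; it is an assumption about what `cycle_metrics.js` does, not a description of its actual logic, and the state keys are placeholder strings rather than real snapshots.

```javascript
// Counts visits per state key and reports a cycle as soon as any state
// is seen more than maxVisits times. Assumed semantics for --max-visits.
function detectCycle(stateKeys, maxVisits) {
  const visits = new Map();
  for (let step = 0; step < stateKeys.length; step += 1) {
    const key = stateKeys[step];
    const count = (visits.get(key) ?? 0) + 1;
    visits.set(key, count);
    if (count > maxVisits) return { cycled: true, step, key };
  }
  return { cycled: false };
}

// Placeholder trace: state "b" is visited three times.
const trace = ["a", "b", "c", "b", "d", "b"];
console.log(detectCycle(trace, 1)); // { cycled: true, step: 3, key: 'b' }
console.log(detectCycle(trace, 2)); // { cycled: true, step: 5, key: 'b' }
console.log(detectCycle(["a", "b", "c"], 1)); // { cycled: false }
```

Under this reading, raising `--max-visits` from 1 to 2 tolerates a single revisit before flagging the game.
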
### Cycle-risk regression baseline and guardrails

- Current regression gate command:
  - `npm run review:cycles:regression`
  - config: `games=240`, `steps=350`, `policy=baseline`, `max-visits=1`
- Current guardrail thresholds:
  - `all.cycle_rate_pct <= 86`
  - `draw1.cycle_rate_pct <= 76`
  - `draw3.cycle_rate_pct <= 95`
  - `all.win_rate_pct >= 14`
  - zero invariant/apply/page/console issue counts
- Baseline sample (240 games):
  - overall: `win_rate=15.8%`, `cycle_rate=84.2%`
  - draw-one: `win_rate=25.8%`, `cycle_rate=74.2%`
  - draw-three: `win_rate=5.8%`, `cycle_rate=94.2%`
- Candidate loop-aware sample (240 games, lookahead via simulated move + restore):
  - overall: `win_rate=20.4%`, `cycle_rate=32.5%`
  - draw-one: `win_rate=33.3%`, `cycle_rate=16.7%`
  - draw-three: `win_rate=7.5%`, `cycle_rate=48.3%`
  - no invariant/apply/page/console issues in the sampled run
- Additional 500-game candidate soak:
  - overall: `win_rate=20.2%`, `cycle_rate=28.6%`, `step_budget=51.2%`
  - draw-three remains the dominant risk (`cycle_rate=45.2%`)
- Fix applied: cycle metrics regression now supports explicit
  `max_step_budget_rate_*` thresholds. Candidate command now enforces
  `max_step_budget_rate_all <= 60` to prevent silent drift from cycles into
  step-budget stalls.

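
The guardrail thresholds above can be read as a simple gate over a metrics report. The following sketch checks the baseline sample against them. The nested report shape (`all`, `draw1`, `draw3` objects with `*_pct` fields) is inferred from the dotted threshold names and is an assumption about the JSON that the tool writes; the `regressed` values are invented for illustration.

```javascript
// Threshold values are the ones listed above; the max/min grouping is this sketch's own.
const thresholds = {
  max: {
    "all.cycle_rate_pct": 86,
    "draw1.cycle_rate_pct": 76,
    "draw3.cycle_rate_pct": 95,
  },
  min: { "all.win_rate_pct": 14 },
};

// Resolves a dotted path such as "all.cycle_rate_pct" inside a report object.
function lookup(report, path) {
  return path.split(".").reduce((node, key) => node?.[key], report);
}

// Returns human-readable failures; a missing metric also counts as a failure.
function checkGuardrails(report, limits) {
  const failures = [];
  for (const [path, limit] of Object.entries(limits.max)) {
    const value = lookup(report, path);
    if (!(value <= limit)) failures.push(`${path}=${value} exceeds ${limit}`);
  }
  for (const [path, limit] of Object.entries(limits.min)) {
    const value = lookup(report, path);
    if (!(value >= limit)) failures.push(`${path}=${value} is below ${limit}`);
  }
  return failures;
}

// Baseline sample (240 games) from the list above.
const baseline = {
  all: { win_rate_pct: 15.8, cycle_rate_pct: 84.2 },
  draw1: { win_rate_pct: 25.8, cycle_rate_pct: 74.2 },
  draw3: { win_rate_pct: 5.8, cycle_rate_pct: 94.2 },
};
// Invented numbers showing what a regression would look like.
const regressed = { ...baseline, all: { win_rate_pct: 12.0, cycle_rate_pct: 90.0 } };

console.log(checkGuardrails(baseline, thresholds).length); // 0
console.log(checkGuardrails(regressed, thresholds).length); // 2
```

The baseline passes with little headroom (84.2 against a limit of 86 overall), so the gate is tight enough to catch modest regressions.
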