d45b7cb82b
solitaire_server/e2e/:
- smoke.spec.js: verifies /play-classic loads, exposes window.__FERROUS_DEBUG__ bridge, keyboard parity (Space=draw, U=undo), debug failure report, and replay payload builder exports schema-v2 moves.
- gameplay_review.spec.js: HUD/controls render check, stock-click + undo player flow, draw-mode toggle, autonomous play invariant batch, and cycle-detection regression guard.
- cycle_metrics.js: headless cycle-rate analysis tool; run via `npm run review:cycles` with configurable policy, game count, and thresholds. Regression gate baked into package.json scripts.
- playwright.config.js: targets the local server at http://localhost:8080.
- package.json / package-lock.json: @playwright/test 1.60.0.

.gitea/workflows/web-e2e.yml:
- Runs on pushes to solitaire_server/, solitaire_wasm/, solitaire_core/, or Cargo changes. Starts the server binary, waits for /health, runs the full Playwright suite, uploads test-results/ on failure.

docs/testing-architecture.md: documents the three-tier test strategy (unit → Playwright smoke → cycle regression) and the __FERROUS_DEBUG__ bridge contract.

scripts/update_quaternions_deps.sh: helper to bump the Quaternions registry deps (klondike, card_game) by version and run the full safety gate including deterministic replay checks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Testing Architecture — Engine-first Validation
Ferrous Solitaire validation is split into three layers with clear ownership:
- Rust unit tests (`solitaire_core`)
  - move generation and legality
  - deal generation determinism
  - scoring and penalties
  - undo semantics
  - win detection
- Engine integration tests (`solitaire_wasm` debug API)
  - autonomous game execution without UI/pointer simulation
  - invariant checks after every move
  - deterministic seed replay
  - high-volume seeded runs (including long-running soak tests)
- Playwright UI tests
  - verify rendering vs engine state
  - drag/drop and keyboard UX behavior
  - responsive layout behavior
  - browser-compatibility checks
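The "deterministic seed replay" item in the engine layer can be illustrated with a small sketch. It is not the project's test: `makeFakeGame`, `runSeed`, and `isDeterministic` are hypothetical names, and the fake game is a stand-in linear congruential generator rather than the real `solitaire_wasm` engine. The idea shown is simply to run the same seed twice and require identical move histories.

```javascript
// Hypothetical seeded stand-in for the engine. Each step advances a
// linear congruential generator and records the new state as the "move".
function makeFakeGame(seed) {
  let state = seed >>> 0;
  const history = [];
  return {
    step() {
      state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
      history.push(state);
    },
    moveHistory: () => history.slice(),
  };
}

// Play a fixed number of steps from a seed and return the recorded history.
function runSeed(makeGame, seed, steps) {
  const game = makeGame(seed);
  for (let i = 0; i < steps; i += 1) {
    game.step();
  }
  return game.moveHistory();
}

// A seed replays deterministically when two independent runs agree exactly.
function isDeterministic(makeGame, seed, steps) {
  const first = JSON.stringify(runSeed(makeGame, seed, steps));
  const second = JSON.stringify(runSeed(makeGame, seed, steps));
  return first === second;
}

console.log(isDeterministic(makeFakeGame, 42, 50)); // prints true
```

Comparing serialized histories keeps the check independent of how moves are represented, at the cost of requiring that moves serialize stably.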
Source of truth
The Rust engine is authoritative. Browser tests must interact with the game via debug API hooks, not via pixel/OCR solving or hardcoded screen coordinates.
Debug API surfaces
Two automation surfaces are exposed:
- `solitaire_wasm::SolitaireGame` methods:
  - `debug_snapshot()`
  - `debug_legal_moves()`
  - `debug_move_history()`
  - `debug_apply_legal_move(index)`
  - `debug_apply_move_json(json)`
- Browser bridge on `game.html`:
  - `window.__FERROUS_DEBUG__.snapshot()`
  - `window.__FERROUS_DEBUG__.legalMoves()`
  - `window.__FERROUS_DEBUG__.moveHistory()`
  - `window.__FERROUS_DEBUG__.applyLegalMove(index)`
  - `window.__FERROUS_DEBUG__.applyMove(move)`
  - `window.__FERROUS_DEBUG__.failureReport()`
  - `window.__FERROUS_DEBUG__.runAutoplay(options)`
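As a rough illustration of driving a game through the bridge instead of through pixels, the sketch below loops over `legalMoves()` and `applyLegalMove(index)`. The return shapes are assumptions, not documented behavior: it assumes `legalMoves()` returns an array and that `snapshot()` returns an object with a boolean `won` field. `playFirstLegalMove` and `makeFakeBridge` are hypothetical helpers, and the fake bridge exists only so the sketch runs without a browser.

```javascript
// Play with a naive "first legal move" policy against any object shaped
// like the debug bridge. Assumed shapes: legalMoves() -> array,
// snapshot() -> { won: boolean }.
function playFirstLegalMove(bridge, maxSteps) {
  let steps = 0;
  while (steps < maxSteps) {
    if (bridge.snapshot().won) {
      return { outcome: "won", steps };
    }
    const moves = bridge.legalMoves();
    if (moves.length === 0) {
      return { outcome: "stuck", steps };
    }
    bridge.applyLegalMove(0); // always take the first legal move
    steps += 1;
  }
  return { outcome: "step_budget", steps };
}

// Hypothetical in-memory stand-in for window.__FERROUS_DEBUG__.
// It "wins" after a fixed number of applied moves.
function makeFakeBridge(movesUntilWin) {
  let remaining = movesUntilWin;
  const history = [];
  return {
    snapshot: () => ({ won: remaining === 0 }),
    legalMoves: () => (remaining > 0 ? [{ kind: "fake" }] : []),
    applyLegalMove: (index) => {
      history.push(index);
      remaining -= 1;
    },
    moveHistory: () => history.slice(),
  };
}

const result = playFirstLegalMove(makeFakeBridge(3), 10);
console.log(result.outcome, result.steps); // prints: won 3
```

In a Playwright spec the same calls would run inside `page.evaluate`, so the loop executes in the page where `window.__FERROUS_DEBUG__` is defined.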
Required failure payload
Every automation failure should capture:
- seed
- move history
- current game state
- screenshot
- browser trace
- console logs
failureReport() provides the engine-side fields (seed, moveHistory,
currentState) so UI harnesses only need to attach browser artifacts.
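One way a harness might combine the two halves of the payload is sketched below. The engine-side field names (`seed`, `moveHistory`, `currentState`) come from the text above; the browser-side names (`screenshotPath`, `tracePath`, `consoleLogs`) and the helper `buildFailurePayload` are hypothetical and chosen only for illustration.

```javascript
// The six items every automation failure should capture.
const REQUIRED_FIELDS = [
  "seed",
  "moveHistory",
  "currentState",
  "screenshot",
  "trace",
  "consoleLogs",
];

// Merge the engine report with browser artifacts and list anything missing.
// The browserArtifacts property names are assumptions for this sketch.
function buildFailurePayload(engineReport, browserArtifacts) {
  const payload = {
    seed: engineReport.seed,
    moveHistory: engineReport.moveHistory,
    currentState: engineReport.currentState,
    screenshot: browserArtifacts.screenshotPath,
    trace: browserArtifacts.tracePath,
    consoleLogs: browserArtifacts.consoleLogs,
  };
  const missing = REQUIRED_FIELDS.filter(
    (key) => payload[key] === undefined || payload[key] === null
  );
  return { payload, missing };
}

const example = buildFailurePayload(
  { seed: 7, moveHistory: [], currentState: {} },
  { screenshotPath: "failure.png", tracePath: "trace.zip", consoleLogs: [] }
);
console.log(example.missing.length); // prints 0
```

Returning the `missing` list, rather than throwing, lets a harness still write a partial payload when an artifact could not be collected.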
Execution guidance
- Fast verification:
  - `cargo test -p solitaire_core -p solitaire_wasm`
- Full verification:
  - `cargo test --workspace`
  - `cargo clippy --workspace -- -D warnings`
- Long unattended soak:
  - `cargo test -p solitaire_wasm debug_api_autonomous_thousands_seed_soak -- --ignored`
Browser e2e harness
The Playwright suite lives under solitaire_server/e2e/ and boots
solitaire_server via Playwright webServer config.
- Install + run:
  - `cd solitaire_server/e2e`
  - `npm ci`
  - `npx playwright install chromium`
  - `npm test`
- Cycle metrics batch run:
  - `cd solitaire_server/e2e`
  - `npm run review:cycles -- --games 1000 --steps 350 --policy baseline --max-visits 1 --out /tmp/cycle-baseline.json`
  - `npm run review:cycles -- --games 1000 --steps 350 --policy loop_aware --max-visits 2 --out /tmp/cycle-loop-aware.json`
  - `npm run review:cycles:regression` (thresholded gate, writes `test-results/cycle-regression.json`)
  - `npm run review:cycles:candidate` (loop-aware candidate run, writes `test-results/cycle-candidate.json`)
Cycle-risk regression baseline and guardrails
- Current regression gate command:
  - `npm run review:cycles:regression`
  - config: `games=240`, `steps=350`, `policy=baseline`, `max-visits=1`
- Current guardrail thresholds:
  - `all.cycle_rate_pct <= 86`
  - `draw1.cycle_rate_pct <= 76`
  - `draw3.cycle_rate_pct <= 95`
  - `all.win_rate_pct >= 14`
  - zero invariant/apply/page/console issue counts
- Baseline sample (240 games):
  - overall: `win_rate=15.8%`, `cycle_rate=84.2%`
  - draw-one: `win_rate=25.8%`, `cycle_rate=74.2%`
  - draw-three: `win_rate=5.8%`, `cycle_rate=94.2%`
- Candidate loop-aware sample (240 games, lookahead via simulated move + restore):
  - overall: `win_rate=20.4%`, `cycle_rate=32.5%`
  - draw-one: `win_rate=33.3%`, `cycle_rate=16.7%`
  - draw-three: `win_rate=7.5%`, `cycle_rate=48.3%`
  - no invariant/apply/page/console issues in the sampled run
- Additional 500-game candidate soak:
  - overall: `win_rate=20.2%`, `cycle_rate=28.6%`, `step_budget=51.2%`
  - draw-three remains the dominant risk (`cycle_rate=45.2%`)
- Fix applied: cycle metrics regression now supports explicit `max_step_budget_rate_*` thresholds. Candidate command now enforces `max_step_budget_rate_all <= 60` to prevent silent drift from cycles into step-budget stalls.
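The regression guardrails listed above amount to a handful of comparisons against a metrics report. The sketch below shows one way such a gate could be evaluated; it is not the logic in `cycle_metrics.js`. The threshold values and the `all`/`draw1`/`draw3` metric paths are taken from the list above, while the nested report layout, the `issues` object and its key names, and the helpers `lookup` and `evaluateGate` are assumptions made for illustration.

```javascript
// Guardrail thresholds from the regression gate.
const GUARDRAILS = [
  { path: "all.cycle_rate_pct", op: "<=", limit: 86 },
  { path: "draw1.cycle_rate_pct", op: "<=", limit: 76 },
  { path: "draw3.cycle_rate_pct", op: "<=", limit: 95 },
  { path: "all.win_rate_pct", op: ">=", limit: 14 },
];

// Hypothetical key names for the four issue counters that must be zero.
const ISSUE_COUNTERS = ["invariant", "apply", "page", "console"];

// Resolve a dotted path such as "all.cycle_rate_pct" inside a nested object.
function lookup(obj, path) {
  return path
    .split(".")
    .reduce((node, key) => (node == null ? undefined : node[key]), obj);
}

// Return a list of human-readable violations; an empty list means the gate passes.
function evaluateGate(report) {
  const failures = [];
  for (const { path, op, limit } of GUARDRAILS) {
    const value = lookup(report, path);
    const ok =
      typeof value === "number" &&
      (op === "<=" ? value <= limit : value >= limit);
    if (!ok) {
      failures.push(`${path} ${op} ${limit} violated (got ${value})`);
    }
  }
  for (const key of ISSUE_COUNTERS) {
    const count = lookup(report, `issues.${key}`);
    if (count !== 0) {
      failures.push(`issues.${key} must be 0 (got ${count})`);
    }
  }
  return failures;
}

// Baseline sample rates from the list above; the zero issue counts are assumed.
const baselineSample = {
  all: { win_rate_pct: 15.8, cycle_rate_pct: 84.2 },
  draw1: { win_rate_pct: 25.8, cycle_rate_pct: 74.2 },
  draw3: { win_rate_pct: 5.8, cycle_rate_pct: 94.2 },
  issues: { invariant: 0, apply: 0, page: 0, console: 0 },
};
console.log(evaluateGate(baselineSample)); // prints []
```

Treating a missing metric as a failure, rather than skipping it, means a malformed report cannot pass the gate silently.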