funman300 d45b7cb82b
feat(e2e): add Playwright browser test suite for web routes
solitaire_server/e2e/:
- smoke.spec.js: verifies /play-classic loads, exposes window.__FERROUS_DEBUG__
  bridge, keyboard parity (Space=draw, U=undo), debug failure report, and
  replay payload builder exports schema-v2 moves.
- gameplay_review.spec.js: HUD/controls render check, stock-click + undo
  player flow, draw-mode toggle, autonomous play invariant batch, and
  cycle-detection regression guard.
- cycle_metrics.js: headless cycle-rate analysis tool; run via
  `npm run review:cycles` with configurable policy, game count, and
  thresholds. Regression gate baked into package.json scripts.
- playwright.config.js: targets the local server at http://localhost:8080.
- package.json / package-lock.json: @playwright/test 1.60.0.

.gitea/workflows/web-e2e.yml:
- Runs on pushes to solitaire_server/, solitaire_wasm/, solitaire_core/,
  or Cargo changes. Starts the server binary, waits for /health, runs
  the full Playwright suite, uploads test-results/ on failure.

docs/testing-architecture.md: documents the three-tier test strategy
  (unit → Playwright smoke → cycle regression) and the __FERROUS_DEBUG__
  bridge contract.

scripts/update_quaternions_deps.sh: helper to bump the Quaternions
  registry deps (klondike, card_game) by version and run the full
  safety gate including deterministic replay checks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-02 12:40:30 -07:00


Testing Architecture — Engine-first Validation

Ferrous Solitaire splits validation into three layers, each owning a distinct set of concerns:

  1. Rust unit tests (solitaire_core)

    • move generation and legality
    • deal generation determinism
    • scoring and penalties
    • undo semantics
    • win detection
  2. Engine integration tests (solitaire_wasm debug API)

    • autonomous game execution without UI/pointer simulation
    • invariant checks after every move
    • deterministic seed replay
    • high-volume seeded runs (including long-running soak tests)
  3. Playwright UI tests

    • verify that rendering matches engine state
    • drag/drop and keyboard UX behavior
    • responsive layout behavior
    • browser-compatibility checks
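Deal determinism and deterministic seed replay both reduce to one testable property: the same seed must always produce the same deal. The sketch below illustrates that property with a toy seeded shuffle (the mulberry32 PRNG driving a Fisher-Yates shuffle). It is not the solitaire_core deal algorithm, and `seededDeal` and `isDeterministicDeal` are hypothetical names.

```javascript
// Toy seeded PRNG (mulberry32). Illustrative only: not the engine's RNG.
function mulberry32(seed) {
  let a = seed | 0;
  return function next() {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Hypothetical deal: Fisher-Yates shuffle of 52 card ids driven by the seeded PRNG.
function seededDeal(seed) {
  const deck = Array.from({ length: 52 }, (_, i) => i);
  const rand = mulberry32(seed);
  for (let i = deck.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [deck[i], deck[j]] = [deck[j], deck[i]];
  }
  return deck;
}

// Property check: two deals from the same seed are identical,
// and each deal is a permutation of all 52 cards.
function isDeterministicDeal(dealFn, seed) {
  const first = dealFn(seed);
  const second = dealFn(seed);
  const sameOrder =
    first.length === second.length && first.every((card, i) => card === second[i]);
  const isPermutation = first.length === 52 && new Set(first).size === 52;
  return sameOrder && isPermutation;
}
```

A Rust unit test in solitaire_core would express the same check against the engine's own deal function.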

Source of truth

The Rust engine is authoritative. Browser tests must observe and drive game state through the debug API hooks, not by reading the board from pixels or OCR or by clicking hardcoded screen coordinates.
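The following is a minimal sketch of a UI check that drives the game with the keyboard but asserts on engine state through the bridge instead of on pixels. It relies on the Space=draw and U=undo bindings exercised by smoke.spec.js. It also assumes a fresh deal with a non-empty stock and that undo removes the last entry from moveHistory(). `checkDrawUndoParity` is a hypothetical helper. `page` is a Playwright Page, of which only `page.evaluate` and `page.keyboard.press` are used.

```javascript
// Read engine state through the bridge rather than from rendered pixels.
async function moveCount(page) {
  return page.evaluate(() => window.__FERROUS_DEBUG__.moveHistory().length);
}

// Hypothetical check: Space should record exactly one move and U should take it back.
// Assumes a non-empty stock and that undo pops the last moveHistory() entry.
async function checkDrawUndoParity(page) {
  const before = await moveCount(page);
  await page.keyboard.press("Space");
  const afterDraw = await moveCount(page);
  await page.keyboard.press("u");
  const afterUndo = await moveCount(page);
  return {
    drawRecorded: afterDraw === before + 1,
    undoRestored: afterUndo === before,
  };
}
```

In a spec this might be used as `expect(await checkDrawUndoParity(page)).toEqual({ drawRecorded: true, undoRestored: true })`.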

Debug API surfaces

Two automation surfaces are exposed:

  • solitaire_wasm::SolitaireGame methods:
    • debug_snapshot()
    • debug_legal_moves()
    • debug_move_history()
    • debug_apply_legal_move(index)
    • debug_apply_move_json(json)
  • Browser bridge on game.html:
    • window.__FERROUS_DEBUG__.snapshot()
    • window.__FERROUS_DEBUG__.legalMoves()
    • window.__FERROUS_DEBUG__.moveHistory()
    • window.__FERROUS_DEBUG__.applyLegalMove(index)
    • window.__FERROUS_DEBUG__.applyMove(move)
    • window.__FERROUS_DEBUG__.failureReport()
    • window.__FERROUS_DEBUG__.runAutoplay(options)
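The bridge methods above are sufficient to drive a game with no pointer simulation. The sketch below is a hypothetical harness-side loop, not the built-in runAutoplay(options). It applies legal moves by index, checks a caller-supplied invariant after every move, and returns failureReport() on the first violation.

It assumes that legalMoves() returns an array and that an empty array means no further moves are possible. Move selection defaults to the first legal move. The example invariant `conservesCards` assumes a snapshot shape of `{ piles: [[card, ...], ...] }`, which is hypothetical.

```javascript
// Hypothetical harness loop. `bridge` is anything exposing legalMoves(),
// applyLegalMove(index), snapshot(), and failureReport(); in a browser it
// would be window.__FERROUS_DEBUG__.
function runWithInvariants(bridge, { maxSteps, invariant, pick = () => 0 }) {
  for (let step = 0; step < maxSteps; step++) {
    const moves = bridge.legalMoves();
    if (moves.length === 0) {
      return { ok: true, steps: step, reason: "no_legal_moves" };
    }
    bridge.applyLegalMove(pick(moves, step));

    // Check the invariant after every move, not only at the end of the run.
    const problem = invariant(bridge.snapshot());
    if (problem) {
      return { ok: false, steps: step + 1, problem, report: bridge.failureReport() };
    }
  }
  return { ok: true, steps: maxSteps, reason: "step_budget" };
}

// Example invariant under the assumed snapshot shape: no card is created or lost.
function conservesCards(snapshot) {
  const total = snapshot.piles.reduce((sum, pile) => sum + pile.length, 0);
  return total === 52 ? null : `expected 52 cards, found ${total}`;
}
```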

Required failure payload

Every automation failure should capture:

  • seed
  • move history
  • current game state
  • screenshot
  • browser trace
  • console logs

failureReport() provides the engine-side fields (seed, moveHistory, currentState), so UI harnesses only need to attach the browser artifacts: screenshot, trace, and console logs.
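The sketch below shows how a Playwright harness might merge failureReport() with browser-side artifacts and flag any required field that is missing. The artifact field names (`screenshot`, `trace`, `consoleLogs`) and `assembleFailurePayload` are hypothetical. In a real spec, the artifacts would come from Playwright, for example `page.screenshot()` and console messages collected with `page.on("console", ...)`.

```javascript
// Engine-side fields come from failureReport(); browser-side fields from the harness.
const ENGINE_FIELDS = ["seed", "moveHistory", "currentState"];
const BROWSER_FIELDS = ["screenshot", "trace", "consoleLogs"];

// Hypothetical helper: combine both halves and list any required field that is absent.
// A seed of 0 or an empty history still counts as present.
function assembleFailurePayload(report, artifacts) {
  const payload = {};
  const missing = [];
  for (const field of ENGINE_FIELDS) {
    if (report && report[field] !== undefined) payload[field] = report[field];
    else missing.push(field);
  }
  for (const field of BROWSER_FIELDS) {
    if (artifacts && artifacts[field] !== undefined) payload[field] = artifacts[field];
    else missing.push(field);
  }
  return { payload, complete: missing.length === 0, missing };
}
```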

Execution guidance

  • Fast verification:
    • cargo test -p solitaire_core -p solitaire_wasm
  • Full verification:
    • cargo test --workspace
    • cargo clippy --workspace -- -D warnings
  • Long unattended soak:
    • cargo test -p solitaire_wasm debug_api_autonomous_thousands_seed_soak -- --ignored

Browser e2e harness

The Playwright suite lives under solitaire_server/e2e/ and boots solitaire_server through the webServer option in playwright.config.js, which targets http://localhost:8080.

  • Install + run:
    • cd solitaire_server/e2e
    • npm ci
    • npx playwright install chromium
    • npm test
  • Cycle metrics batch run:
    • cd solitaire_server/e2e
    • npm run review:cycles -- --games 1000 --steps 350 --policy baseline --max-visits 1 --out /tmp/cycle-baseline.json
    • npm run review:cycles -- --games 1000 --steps 350 --policy loop_aware --max-visits 2 --out /tmp/cycle-loop-aware.json
    • npm run review:cycles:regression (thresholded gate, writes test-results/cycle-regression.json)
    • npm run review:cycles:candidate (loop-aware candidate run, writes test-results/cycle-candidate.json)
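The batch runs report per-game outcomes as percentages. The sketch below shows one way to derive those figures, under assumptions that cycle_metrics.js may not share. A run counts as a cycle as soon as any state key has been visited more than `maxVisits` times. Otherwise, it is a win or, failing that, a step-budget stall. (In the 500-game soak further down, the win, cycle, and step-budget rates sum to 100%.)

`classifyRun`, `summarize`, `pct`, and the `draw1`/`draw3` mode labels are hypothetical. The `*_rate_pct` names follow the guardrail metrics.

```javascript
// Hypothetical per-run classification. `stateKeys` is the sequence of state
// identifiers observed during one run (for example, hashes of snapshot()).
function classifyRun({ stateKeys, won }, { maxVisits }) {
  const visits = new Map();
  for (const key of stateKeys) {
    const count = (visits.get(key) || 0) + 1;
    if (count > maxVisits) return "cycle"; // state revisited too often
    visits.set(key, count);
  }
  return won ? "win" : "step_budget";
}

// Percentage rounded to one decimal place, e.g. pct(31, 120) is 25.8.
const pct = (part, whole) => (whole === 0 ? 0 : Math.round((part / whole) * 1000) / 10);

// Aggregate outcomes overall ("all") and per draw mode.
function summarize(runs) {
  const groups = { all: runs };
  for (const run of runs) {
    (groups[run.mode] = groups[run.mode] || []).push(run);
  }
  const summary = {};
  for (const [name, group] of Object.entries(groups)) {
    const count = (outcome) => group.filter((r) => r.outcome === outcome).length;
    summary[name] = {
      games: group.length,
      win_rate_pct: pct(count("win"), group.length),
      cycle_rate_pct: pct(count("cycle"), group.length),
      step_budget_rate_pct: pct(count("step_budget"), group.length),
    };
  }
  return summary;
}
```

Under this reading, raising max-visits from 1 to 2 tolerates one extra revisit per state before a run is flagged as a cycle.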

Cycle-risk regression baseline and guardrails

  • Current regression gate command:
    • npm run review:cycles:regression
    • config: games=240, steps=350, policy=baseline, max-visits=1
  • Current guardrail thresholds:
    • all.cycle_rate_pct <= 86
    • draw1.cycle_rate_pct <= 76
    • draw3.cycle_rate_pct <= 95
    • all.win_rate_pct >= 14
    • zero invariant/apply/page/console issue counts
  • Baseline sample (240 games):
    • overall: win_rate=15.8%, cycle_rate=84.2%
    • draw-one: win_rate=25.8%, cycle_rate=74.2%
    • draw-three: win_rate=5.8%, cycle_rate=94.2%
  • Candidate loop-aware sample (240 games, lookahead via simulated move + restore):
    • overall: win_rate=20.4%, cycle_rate=32.5%
    • draw-one: win_rate=33.3%, cycle_rate=16.7%
    • draw-three: win_rate=7.5%, cycle_rate=48.3%
    • no invariant/apply/page/console issues in the sampled run
  • Additional 500-game candidate soak:
    • overall: win_rate=20.2%, cycle_rate=28.6%, step_budget_rate=51.2% (runs that exhausted the step budget)
    • draw-three remains the dominant risk (cycle_rate=45.2%)
  • Fix applied: the cycle metrics regression tooling now supports explicit max_step_budget_rate_* thresholds, and the candidate command enforces max_step_budget_rate_all <= 60 so that a drop in cycle rate cannot silently turn into a rise in step-budget stalls.
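The guardrails above amount to a list of bounded metrics plus a zero-issue rule. The sketch below implements such a gate using the documented limits. The rule format, the `issues` object shape, `lookup`, and `checkGate` are hypothetical; the real gate is wired through `npm run review:cycles:regression`. The separate step-budget rule mirrors the candidate command's max_step_budget_rate_all <= 60.

```javascript
// Documented regression guardrails (the rule format itself is hypothetical).
const REGRESSION_RULES = [
  { metric: "all.cycle_rate_pct", max: 86 },
  { metric: "draw1.cycle_rate_pct", max: 76 },
  { metric: "draw3.cycle_rate_pct", max: 95 },
  { metric: "all.win_rate_pct", min: 14 },
];
// Extra rule enforced by the candidate command.
const CANDIDATE_RULES = [{ metric: "all.step_budget_rate_pct", max: 60 }];

// Resolve a dotted path such as "draw1.cycle_rate_pct" inside the metrics object.
const lookup = (metrics, path) =>
  path.split(".").reduce((node, key) => (node == null ? undefined : node[key]), metrics);

// Returns human-readable violations; an empty array means the gate passes.
function checkGate(metrics, rules, issues = {}) {
  const violations = [];
  for (const rule of rules) {
    const value = lookup(metrics, rule.metric);
    if (typeof value !== "number") violations.push(`${rule.metric}: missing`);
    else if (rule.max !== undefined && value > rule.max) violations.push(`${rule.metric}=${value} > ${rule.max}`);
    else if (rule.min !== undefined && value < rule.min) violations.push(`${rule.metric}=${value} < ${rule.min}`);
  }
  // Invariant, apply, page, and console issue counts must all be zero.
  for (const [kind, count] of Object.entries(issues)) {
    if (count !== 0) violations.push(`${kind} issues: ${count}`);
  }
  return violations;
}
```

Feeding the 240-game baseline sample through `REGRESSION_RULES` yields no violations. It also shows how little headroom remains: for example, the draw-three cycle rate of 94.2 sits just under its ceiling of 95.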