PSSaaS Architect — Phase 9: Parallel Validation Against Desktop App

Role: PSSaaS Systems Architect Phase: PowerFill Phase 9 (Parallel Validation + Tom/Greg Critique) — empirical proof that PSSaaS PowerFill produces per-loan-equivalent allocations to the legacy Desktop App on the same input data Date dispatched: TBD (next Architect session; can dispatch in parallel with Phase 8.5 work since the surfaces don't overlap) Model required: Opus 4.7 High Thinking — verify in the Cursor model picker before responding. If running under any other model, STOP and escalate. Estimated effort: Spec line 651 ("Phase 9 — Validation + Tom/Greg critique + cutover — 2-3 weeks"). Phase 9 as scoped here is the validation harness + first comparison run + first Tom/Greg review surface, NOT cutover (that's Phase 10+). Realistic estimate: 3-5 Architect-sessions for the harness + first end-to-end comparison + the report-generation surface; subsequent customer-DB validation runs are operator-driven (Phase 9 builds the lever; the PO + customer reps pull it).

Predecessor work: Phase 8 fully COMPLETE as of 2026-04-19 — both workstreams + the A54 fix + the F-W2-PSD-1 / Path γ tenant-slot fix all shipped. The 6-step PowerFill orchestration runs end-to-end-Complete on PS_DemoData via staging API; the React UI surface at https://pssaas.staging.powerseller.com/app/ exposes the operator workflow (auth deferred to Phase 8.5; staging is currently public). Phase 9 is the next sequential phase per spec, dispatched in parallel with Phase 8.5 (auth + Superset embedding) which has its own kickoff queued behind PSX-Infra-side work (#30 + #31 in session-handoff Backlog).

FOUR substantive context updates since the W2 kickoff (f4531ae) that Phase 9 must internalize:

End-to-end Complete-run on PS_DemoData empirically achievable post-A54-fix (cf8ef8b); 12+ historical runs in pfill_run_history (3 Complete + 7 Failed + 2 Cancelled) provide a baseline corpus for the harness to compare against.
A66 + A65 banked observations are now load-bearing for Phase 9's harness design — A65 directly names the harness as the proving-ground for "do the multi-pa_key + settlement-date-variance triggers fire on this customer DB?" probes; A66 names the post-Complete empty-output behavior on syn-trade-empty datasets like PS_DemoData (Phase 9 must distinguish "PSSaaS produced empty because UE rebuild-empty per A66" from "PSSaaS produced empty because of a bug" — the harness's verdict logic must understand both).
A68 added (tenant-id-vs-config-slot conflation as design wart). Phase 9's harness will be a third writer into pfill_run_history (the harness itself, possibly running on AKS or a dev workstation) — the convention used must agree with the existing 'ps-demodata' tag, OR Phase 8.5 closure (long-term decoupling of TenantId from connection-string-slot routing) must land first. Currently the safer assumption is "convention-agree with existing rows" (write tenant_id='ps-demodata'); fold-into-Phase-8.5 is the long-term answer.
Phase 8.5 (NEW) is queued in parallel — PSSaaS joins ecosystem auth (Keycloak via oauth2-proxy) + replaces "View in Superset" anchor links with embedded SDK. Surfaces don't overlap with Phase 9 (Phase 9 is harness + comparison + reports; 8.5 is auth + embed). Phase 9 should NOT make the harness depend on auth being landed — the harness should be runnable today against unauth staging, and remain runnable against Phase 8.5's auth-protected staging by adding an OIDC-token mint step in the harness invocation. Build the harness assuming it'll need to grow that step but doesn't yet.

Per the PO's strategic frame (re-stated for Architect awareness, since Phase 9 ships the load-bearing demo asset): the PSSaaS migration thesis is "chunk-by-chunk extraction from the Desktop App, with each chunk verifiably matching legacy behavior loan-by-loan against real customer data." Phase 9's harness is the empirical proof of that thesis for chunk #1 (PowerFill). Without it, the Greg demo is "look at our cool new UI"; with it, the Greg demo is "here is empirical evidence the chunk-extraction model works." This framing is in the PO-written A54-fix completion report (docs-site/docs/handoffs/powerfill-a54-fix-greg-demo-readiness.md §"Bug as Feature" demo narrative); Phase 9's harness output should slot into that demo narrative as the "and here's loan-by-loan parity proof" slide.

Session-start checklist

Read these in this order before doing anything else:

CLAUDE.md — project identity, role-identification procedure, push-is-an-ask convention
AGENTS.md — agent memory: principles, lessons, F-PSD findings summary
docs-site/docs/agents/architect-context.md — your role definition
docs-site/docs/agents/process-discipline.md — canonical practices, gates, antipatterns. Per banked observation 2026-04-19 (commit 8dba7b4 message), the next revision is likely to add "Subagent Output Defended Beyond Scope" antipattern + a Backlog re-read pass canonical promotion + a refinement of practice #13 cell-granularity ("does artifact X get written to with the same convention from every environment that can write it?"). Use the latest committed version of process-discipline.md as authoritative, but anticipate these refinements landing during Phase 9 if the next discipline-doc revision ships in parallel.
docs-site/docs/agents/handoff-prompts.md — Templates for delegation
docs-site/docs/handoffs/pssaas-session-handoff.md — current state. Re-read the Backlog table at planning time per the trigger-based countermeasure shape (canonical commit 10133c6) — Backlog re-read pass at planning-start now has 3-instance corroboration (F-7-7 anticipated; F-8-BR-1 caught; F-W2-CONTRACT-1 + F-W2-BR-3 caught) and the pattern-recognition is canonical-adoption-ready. Use it explicitly in §2 of your plan.
docs-site/docs/specs/powerfill-engine.md — full spec. Phase 9-relevant sections: §Phased Implementation (line ~651), §Algorithm (the per-stage semantics A1 documents, which Phase 9's harness validates per-loan), §Data Contracts (the 8 Phase 7 endpoint shapes are the harness's read surface), §Run APIs (the operator-grade run-mgmt surface from Phases 6e/7).
docs-site/docs/handoffs/powerfill-phase-8-w2-completion.md — most recent completion report. The §Capability × Environment matrix demonstrates practice #13 application; Phase 9's harness completion report should produce the same matrix shape, but with cells about per-customer-DB validation runs instead of per-environment deploy verifications (per Architect's W2 recommendation #2: "Phase 9 should add a parallel-validation harness 'data-shape compatibility' pre-flight per A65 + A66 + the W2 environment matrix").
docs-site/docs/handoffs/powerfill-a54-fix-greg-demo-readiness.md — the PO-facing Greg-demo narrative. Phase 9's harness output is the load-bearing slide of that narrative. The harness's report format should slot into the §"Bug as Feature" 5-slide structure, specifically as the "and here's loan-by-loan parity proof against real customer data" addition.
docs-site/docs/handoffs/powerfill-phase-7-completion.md — Phase 7 endpoint contracts (the 8 GET endpoints the harness reads from). Per A67 + F-W2-CONTRACT-1: the canonical wire-shape is RunEndpoints.cs MapGet/MapPost registrations + the OpenAPI document at /api/powerfill/swagger/, NOT the ReportContracts.cs XML doc-comments (which still claim a /reports/ path segment that doesn't exist).
docs-site/docs/legacy/powerfill-deep-dive.md + docs-site/docs/legacy/n_cst_powerfill.sru (in raw legacy source) — the Desktop App side of the parallel validation. The harness invokes the Desktop App's PowerFill plugin path; the Architect needs a working understanding of how the Desktop App is invoked headlessly OR via PowerBuilder script-mode OR via direct T-SQL exec of psp_powerfill_* against PS_DemoData (skipping the PB front-end entirely; the PB layer mostly orchestrates the same psp_powerfill_* calls PSSaaS now wraps).
docs-site/docs/specs/powerfill-assumptions-log.md — A1 (revised; Phase 9's per-loan correctness validation is the gate per A1's Banking note); A28 + A37 (RESOLVED); A38 (RESOLVED); A41-A45; A47-A58; A60 (latest-Complete-wins; affects harness's read-side semantics); A61; A62 (PS_DemoData view drift; Phase 9 close-out per Backlog #24); A63-A64 (Phase 8 W1; A64 platform-tailwind note about multi-tenant Superset registration becoming easier under platform-Superset); A65 (Phase 9 directly named: harness must probe multi-pa_key + settlement-date variance triggers on each customer DB); A66 (NEW; harness's verdict logic must distinguish UE-rebuild-empty from buggy-empty); A67 (cosmetic XML doc fix, harness can serve as the forcing function to align it); A68 (NEW 2026-04-20; harness convention must agree with existing 'ps-demodata' tag OR fold the long-term decoupling into Phase 8.5).
docs-site/docs/devlog/ — most recent: 2026-04-19g-powerfill-phase-8-w2.md is the W2 ship; expect a 2026-04-19h-powerfill-phase-9-kickoff.md as your devlog at session end.

After reading those, acknowledge your role and proceed.

YOUR TASK — Phase 9: Parallel Validation Harness

Per the spec's Phase 9 row + every prior phase's "Phase 9 parallel-validation will exercise this" carry-over (A1, A47, A53, A54-historical, A56-historical, A58, A65, A66 directly named), Phase 9 ships:

Harness: A reproducible tool that takes a frozen input snapshot, runs PowerFill in PSSaaS (via the staging API or a local equivalent), runs PowerFill in the Desktop App equivalent path (likely direct T-SQL exec of the legacy procs against the same DB, skipping the PB front-end), and produces a per-loan-pool-allocation diff report with a verdict (Match / TolerableDiff / Divergent) per loan + a summary stat per run.

First comparison: One end-to-end harness run against PS_DemoData with a documented input snapshot, producing a comparison report demonstrating the harness works. NOT a multi-customer-DB sweep; that's operator-driven post-Phase-9.

Demo asset: The harness's first-run output report formatted to slot into the existing PO-written powerfill-a54-fix-greg-demo-readiness.md "Bug as Feature" 5-slide demo narrative as the "loan-by-loan parity proof" addition.

Phase 9 has no React UI work (auth + embedding is Phase 8.5, dispatched in parallel) and no production cutover work (that's Phase 10+). Phase 9's surface is the harness + first comparison + the report shape the PO can use with Greg.

Inherited context (do not re-litigate)

Topic	State as of `9a83b92`
Phase 7 / Phase 8 W1 / Phase 8 W2 / A54 fix / Path γ / A68 banked / Backlog #30 + #31 / A64+A68 platform-tailwinds	All COMPLETE as of HEAD; sentinel `phase-8-superset-react-ready-a54-fixed`; staging at `https://pssaas.staging.powerseller.com/` (currently public; Phase 8.5 will gate behind Keycloak in parallel with Phase 9)
End-to-end Complete-run on PS_DemoData	Empirically achievable in ~30s post-A54-fix; 12+ rows in `pfill_run_history` (3 Complete + 7 Failed + 2 Cancelled); the canonical tagged-`tenant_id='ps-demodata'` row population per the local-route convention
6-step orchestration	All 6 steps (BX cash-grids → BX settle-and-price → candidates → conset → pool_guide → UE) exercise green end-to-end on PS_DemoData
Existing endpoints (which the harness reads from)	`POST /run` (202; the harness triggers the PSSaaS run via this), `GET /runs/{id}` (the harness polls via this for run completion), 8 Phase 7 `GET /runs/{id}/<report>` (the harness reads outputs via these)
A1 per-stage allocation semantics	Documented in the assumptions log; Phase 9's per-loan correctness validation is the gate (per A1's Banking note "the legacy proc body deploys verbatim per ADR-021; per-stage-semantic correctness validation against Desktop App output is the Phase 9 parallel-validation gate")
A65 — multi-pa_key + settlement-date-variance	Two distinct A54 triggers; both fire on PS_DemoData. The harness's pre-flight should explicitly probe both on each tenant DB it validates against (per the W2 Architect's Phase 9 Recommendation #2)
A66 — UE clears + rebuilds-empty on syn-trade-empty datasets	The harness's verdict logic must distinguish "PSSaaS empty because UE rebuild-empty per A66" (Match if Desktop App also produces empty under the same path) from "PSSaaS empty because of a bug" (Divergent). Hub Dashboard 1 (run history) is the canonical proof-of-life on PS_DemoData; the user-facing report tables are 0-row by design on this dataset
A68 — tenant-id-vs-config-slot conflation	The harness will be a third writer into `pfill_run_history`. Currently safest convention: write `tenant_id='ps-demodata'` (matches existing 12 rows on PS_DemoData). Long-term: fold the decoupling into Phase 8.5 per A68's platform-tailwind note. Phase 9 should NOT re-tag existing rows; it should adopt the existing convention.
Backlog #30 — Superset → pss-platform migration	DONE 2026-04-19 (PSX Infra completed during Phase 9 dispatch window; ~3 min cutover). Hostname unchanged at `bi.staging.powerseller.com`; all 20 dashboards / 56 charts / 77 datasets preserved (`pg_dump` immediately pre-cutover); same Keycloak SSO + same `superset` OIDC client + same admin credentials; both data sources (PSX postgres cross-namespace via FQDN + SQL MI via VNet peering) still work; same Superset Python image SHA so `pymssql` still installed. Old `psx-staging` Superset pod scaled to 0 with deployment retained for 24-hour rollback insurance. Phase 9 implication: NONE. The base-URL paranoia from this kickoff's draft no longer applies — `bi.staging.powerseller.com` is the stable forever URL. Hard-coded `psx-staging` namespace references in our codebase need updating (Collaborator-side sweep in flight); the Architect can ignore the namespace for harness purposes since the harness reads from the PSSaaS API, not Superset directly.
Backlog #31 — Phase 8.5 (ecosystem auth + embedded Superset)	Queued in parallel with Phase 9. Not a Phase 9 prerequisite. Phase 9's harness should be runnable today against unauth staging, and remain runnable against Phase 8.5's auth-protected staging by adding an OIDC-token mint step in the harness invocation. Build the harness assuming it'll grow that step but doesn't yet.
W2 React UI	Live at `https://pssaas.staging.powerseller.com/app/`. Not directly Phase 9-relevant (the harness is operator-driven, not UI-driven), but the UI's Hub dashboard link + run-status page provide a useful click-through for Phase 9 reviewers wanting to see "did this PSSaaS run actually Complete?" without curling.
Canonical Claim-vs-Evidence family + practice #13	Live (commits `95084cb` + `10133c6` + `d4a70af` + `863c139` + `4b08b51`). Most relevant for Phase 9: practice #13 — the harness's first-run completion report MUST produce a Capability × Environment matrix with explicit "verified vs not measured here" per cell. The W2 completion report's matrix is the canonical example to copy; Phase 9 adapts it to "per customer DB" cells (PS_DemoData verified; PS608 customer DB NOT MEASURED HERE pending customer-rep approval; future tenants NOT MEASURED HERE).
W2 process observation banked but not yet canonical	"Subagent Output Defended Beyond Scope" antipattern (the PS608-tenant-dropdown scope drift); "convention conflation under low-corroboration count" (A68 root pattern); "Single-Probe Confidence" (the PSX Infra falsification + the embedded-SDK-OFF claim). Phase 9 may surface more instances; bank as observations in the completion report's Counterfactual Retro, don't pre-litigate canonical adoption.

Explicit scope (IN)

Workstream 1: Harness design + first comparison

Architectural decision (Alternatives-First Gate): how to invoke the Desktop App side of the comparison? Three candidate options:
- (A) Direct T-SQL exec of the legacy psp_powerfill_conset + psp_powerfill_pool_guide + psp_powerfillUE procs against PS_DemoData via sqlcmd, skipping the PowerBuilder front-end entirely. The n_cst_powerfill.sru is mostly orchestration over these procs anyway. Lowest friction; matches what we already do for ad-hoc PoC runs.
- (B) PowerBuilder headless invocation via PB's command-line / script-mode if available — gives true Desktop-App-equivalent path including any PB-side state setup. Higher friction; uncertain whether headless PB invocation is straightforward.
- (C) Snapshot-then-compare — take a PS_DemoData snapshot of the relevant pfill_* tables BEFORE Desktop App runs, have a human (or the PO) trigger Desktop App via the normal UI, take a snapshot AFTER, compare PSSaaS-against-the-pre-snapshot vs Desktop-App-against-the-pre-snapshot. Trades headless invocation difficulty for human-in-the-loop overhead.
- Recommend: (A) for the first harness instance; document (B)/(C) as future extensions in the Phase 9 ADR. A is the direct-comparison path: PSSaaS (via staging API) and Desktop App equivalent (via direct sqlcmd) both touch the same procs against the same DB; the diff is between whatever the procs produce in the two invocations. Note: (A) trades "exact Desktop App path" for "exact legacy proc body path" — A1's Banking note specifies the legacy proc body is the canonical contract per ADR-021.
- Document choice in NEW ADR-027 (Phase 9 Parallel Validation Harness Design).
NEW tools/parallel-validation/ directory (or wherever fits the repo layout — Architect's decision; document in ADR-027). Contains:
- The harness invoker (Python? .NET console app? PowerShell? — Alternatives-First Gate decision; recommend Python for sqlcmd + JSON manipulation + report rendering ergonomics, but Architect can defend an alternative).
- A harness_config.yaml (or similar) declaring the input snapshot + the comparison thresholds + the per-column tolerance bands.
- The output-report renderer (Markdown or HTML for the PO to share with Greg + Tom).
Input-snapshot capture mechanism — what gets frozen? At minimum:
- The pfill_run_history options_json from the reference run (so the harness re-runs with identical options).
- The DB state checksum (e.g. loan + pscat_* + pfill_constraints + pfill_carry_cost row counts + a hash of loan.id+loan.note_rate for the relevant pipeline subset).
- The reference timestamp (so PSSaaS's start_date_default derivation matches exactly per Q9).
PSSaaS side invocation:
- Trigger via POST /run with the resolved options from the input snapshot.
- Poll GET /runs/{id} for terminal state (Complete / Failed / Cancelled).
- Read all 8 Phase 7 reports.
- Optionally: also direct-query pfill_pool_guide etc. for cross-checks the report APIs don't cover.
Desktop App side invocation (assuming Option A):
- Run the same 6-step proc sequence directly via sqlcmd against the SAME DB but possibly different pa_key / scratch-table namespace to avoid colliding with the in-flight PSSaaS run. Architect must figure out the scratch-table isolation pattern (##cte_* global temp tables are session-scoped; consider running PSSaaS and Desktop-App-equivalent in separate sqlcmd sessions).
- Capture the same 8 report-shape outputs from the post-run state.
Diff engine + verdict logic:
- Per-loan comparison across the relevant report shapes (Pooling Guide, Cash Trade Slotting, Recap, Pool Candidates, Switching, Kickouts, Existing Disposition, Guide).
- Per-column tolerance (e.g. price comparisons within $0.001; carry-cost within rounding tolerance per A36; row-count exact equality for inclusion/exclusion lists).
- Verdict per loan: Match / TolerableDiff / Divergent.
- Verdict per run: aggregate stats (% Match / % TolerableDiff / % Divergent; absolute Divergent count).
- A66-aware: if PSSaaS produces 0 rows on a syn-trade-empty dataset AND the Desktop-App-equivalent path produces 0 rows on the same dataset, that's Match (NOT Divergent). The verdict logic must be aware that "both produce empty for the same A66 reason" is the right answer.
Output report shape:
- Markdown (or HTML) artifact suitable for inclusion in a Greg-demo deck.
- Top of report: per-run summary verdict (e.g. "PSSaaS-vs-Desktop-App on PS_DemoData run <harness-run-id>: 514/515 loans Match; 1 TolerableDiff (rounding); 0 Divergent. Time: 30s PSSaaS + 28s Desktop-App-equivalent.").
- Per-section breakdown by report type.
- Per-Divergent-loan detail with the specific column(s) that diverged, the PSSaaS value, the Desktop-App value, and the magnitude of the divergence.
- Slot-into-PO-demo: the report's top-level summary line is what slots into the existing powerfill-a54-fix-greg-demo-readiness.md "Bug as Feature" demo as the "loan-by-loan parity proof" slide.

Workstream 2: Phase 9 close-out items (carried over from prior phases)

The accumulated "close at Phase 9" carry-overs from prior phases:

A62 closure (PS_DemoData view drift): per Backlog #24, deploy 002_CreatePowerFillViews.sql to PS_DemoData OR rename PSSaaS view to pfillv2_*. Phase 9 is the natural close point (the harness will exercise the existing-disposition endpoint; if A62 is open, the harness emits a Note-handling carve-out for that endpoint). Recommend: rename PSSaaS view to pfillv2_* (less risk than blind-overwriting an encrypted legacy view; matches the PSSaaS-namespacing convention).
A67 closure (ReportContracts.cs XML doc-comment Truth Rot): the harness's read surface IS the canonical contract; Phase 9 should fix the XML docs to match the actual RunEndpoints.cs routes. Trivial 8-line edit; satisfies the W2 Architect's Recommendation #4 to use Phase 9 as the forcing function.
Process-discipline observations from W2-deploy session (banked but not yet canonical): "Subagent Output Defended Beyond Scope" + "convention conflation under low-corroboration count" + "Single-Probe Confidence". Phase 9's Counterfactual Retro should reference these; if Phase 9 surfaces additional instances of any, that's the canonical-promotion forcing function.

Cross-cutting

Status sentinel bump to phase-9-validation-ready (preserves the phase-N-<short-name> pattern; do NOT carry the -a54-fixed sub-suffix forward into Phase 9 — A54 closure is now historical, not gating).
Spec amendment to docs-site/docs/specs/powerfill-engine.md — Phase 9 row in §Phased Implementation table marks "validation harness DONE; first comparison run DONE; cutover not in scope".
Assumptions log additions — A69+ for new Phase 9 findings.
NEW ADR-027 — Phase 9 Parallel Validation Harness Design (mandatory if harness lands this session).
Pre-push docs-build check per Phase 6e/7/8-W1/8-W2 banked discipline: docker build -f docs-site/Dockerfile.prod docs-site before push if any new docs-site/docs/** files created.

Explicit scope (OUT)

Multi-customer-DB sweep — operator-driven post-Phase-9 (Phase 9 builds the lever; PO + customer reps pull it for PS608 / future tenants once customer-rep approval lands).
Production cutover — Phase 10+ (the spec's "cutover" wording in the Phase 9 row is a 2026-03 spec-line assumption that's been superseded by the 8.5 + 9 + 10 split; Phase 9 ships validation only).
Auth integration — Phase 8.5.
Superset embedded SDK — Phase 8.5.
React UI changes for harness output viewing — out of scope; the harness output is a Markdown/HTML artifact that the PO views via the docs site OR shares as a leave-behind.
New PowerFill API endpoints — Phase 6e + 7 closed those; Phase 9 reads from the existing surface.
Real-time monitoring dashboard for harness runs — out of scope; Phase 9 produces one-shot reports.
Performance tuning of PowerFill itself — out of scope unless the harness reveals a per-loan correctness issue rooted in a perf shortcut.
Re-tagging existing pfill_run_history rows to a new tenant_id convention — A68's long-term decoupling lives in Phase 8.5; Phase 9 adopts the existing 'ps-demodata' convention.
A54 fix re-litigation — RESOLVED 2026-04-19; if the harness reveals A54 is fired by a customer DB in a way the fix doesn't address, that's an A65 follow-up filed against the customer DB, NOT a re-opening of A54's fix.

Process discipline (canonical, non-negotiable)

Gates that must produce documented output

Gate	Where to apply	What "documented output" means
Three-layer Primary-Source Verification Gate (now 3-instance corroborated; canonical-promotion-anticipated)	Spec-vs-implementation: verify Phase 7 endpoint contracts match what the harness's HTTP-client layer assumes. NVO-vs-implementation: verify the Desktop-App-equivalent invocation (Option A or B or C) actually exercises the same proc body PSSaaS does. Implementation-vs-runtime: re-read session-handoff Backlog table during planning; F-W2-CONTRACT-1 + F-W2-BR-3 caught at W2 planning are canonical evidence this catches issues before the PoC.	A Phase 9 plan §2 findings table per layer + explicit Backlog re-read pass log per row.
Alternatives-First Gate	At least 3 architectural decisions: (a) Desktop-App invocation path (A / B / C above); (b) harness implementation language (Python / .NET / PowerShell / TypeScript); (c) verdict-rendering format (Markdown / HTML / JSON-with-static-renderer).	A Phase 9 plan §3 alternatives section per decision; ADR-027 for the harness-design choices.
Required Delegation Categories	Heavily delegable: per-report diff logic (one delegated subagent per report shape = up to 8 micro-deliverables); the harness invoker scaffold; the verdict-renderer template. Self-implement: the architectural-contract-per-artifact load-bearing parts — Desktop-App invocation pattern + the "is this a real divergence or an A66 expected-empty?" verdict logic + the snapshot-capture mechanism.	A Phase 9 plan §8 delegation inventory with subagent prompts AND Deliberate Non-Delegation justifications per practice #9.
Reviewable Chunks at intra-session scope	Consider checkpointing after the harness scaffold lands + first end-to-end comparison run (against a known-good PS_DemoData input) before producing the full per-report diff output.	If checkpointing, send a plan-stage Architect Report after the first end-to-end run.
Deploy Verification Gate	Arm (a) sentinel = `phase-9-validation-ready`. Arm (b) harness invocable from a non-Architect machine (Collaborator-side reproducibility check). Arm (c) end-to-end harness run against PS_DemoData produces a verdict report.	A Phase 9 completion report Markdown citing screenshots / report excerpts + the per-tenant-DB Capability matrix per practice #13.
Counterfactual Retro	At session end	A retro section. Phase 8 W2 banked 7 observations including "Backlog re-read pass IS canonical-adoption-ready at 3-instance corroboration"; Phase 9 should report whether the practice continues to pay off (or if 4-instance corroboration justifies pulling the trigger on canonical promotion).

Antipatterns to avoid (canonical list applies)

Phase-0 Truth Rot — A57 was qualified by A59; the Backlog re-read pass at planning-start has been the empirical safeguard. Phase 9's harness design WILL surface contract-vs-implementation drift if any exists; embrace it as forcing function for A62 + A67 closure.
Empirical-Citation Type Mismatch (Phase 5 origin) — when reading from the 8 Phase 7 endpoints in the harness, use the actual JSON property names from ReportContracts.cs (snake_case via [JsonPropertyName]), NOT the C# property names (PascalCase). The OpenAPI / Swagger UI at http://pssaas.powerseller.local/api/swagger/ is the canonical wire-shape reference; consider auto-generating types from it (or manually mirroring as the W2 React UI does).
Verification Avoidance (Phase 4 origin) — dotnet build (if any backend changes) + the harness's own test surface (if any) before declaring complete; the harness's first end-to-end comparison run on PS_DemoData IS the integration test.
Ghost Deploy (PSX origin) — the harness is operator-driven, not deployed-as-a-service, so this antipattern doesn't directly apply BUT the harness's invocation pattern should support content-match verification (e.g. the harness's output report should include the exact PSSaaS sentinel + Desktop-App-proc-version it ran against, so the PO can verify in seconds whether the report is from the right code).
Delegation Skip (Phase 4 origin) — per-report diff logic is the heaviest delegation candidate; architectural-contract-per-artifact decisions (ADR-027, the verdict semantics, the snapshot-capture mechanism) are yours to self-implement.
Capability Inflation (Phase 8 W1 / Claim-vs-Evidence family) — the harness's first comparison run produces a result on PS_DemoData ONLY. Do NOT extend that to "validates PowerFill on customer data" — that's a Capability Inflation framing. The honest claim is "validates PowerFill behavior on PS_DemoData against the legacy proc body when invoked via [chosen Option]". Customer DB validation is post-Phase-9 operator work.
Capability Drift (Claim-vs-Evidence family) — if the W2 React UI's tenant-picker constants drift between sessions (e.g. someone re-introduces PS608), the harness assumptions could quietly become wrong. Phase 9's harness should explicitly verify the tenant-id convention agrees with pfill_run_history row tags before invoking PSSaaS — a 1-query pre-flight that surfaces A68-class drift early.
Subagent Output Defended Beyond Scope (banked but not yet canonical; W2 origin) — when reviewing delegated diff-logic subagent output, the explicit first question is "is this what the kickoff asked for?" before any disposition framing. Output additions outside scope are removed unless re-justified against the kickoff.

Tooling (verified post-Phase 8 W2)

WSL Ubuntu with dotnet 8.0.420, jq 1.6, gh 2.4.0 (un-authed). Use wsl.exe -- bash -lc '...' for shell work.
Windows-side kubectl at C:\Program Files\Docker\Docker\resources\bin\kubectl.exe, kubeconfig at ~/.kube/config with PSS-cluster context (PSX-shared cluster).
PS_DemoData public-endpoint password in docker-compose.override.yml: M0th3rFuck1ng$$44$$ (Compose interpolates $$ → $, actual M0th3rFuck1ng$44$). Pass via env var in single-quoted PowerShell string. Or use docker exec -e PWORD='...' pssaas-db sh -c '...' pattern.
PS_DemoData private-endpoint for AKS connectivity at hostedps-sql.086ea791c2f1.database.windows.net,1433 — wired into the staging API Deployment via secret pssaas-secrets:SQLMI_CONNECTION_STRING.
EXECUTE on dbo procs is GRANTED to kevin_pssaas_dev on PS_DemoData. db_ddladmin also granted (per A30 resolution).
Pre-push docs-build check pattern (Phase 6e lesson; now 4-instance corroborated): mandatory if any new docs files use URL templates or unfamiliar MDX syntax.
Node.js / npm: not directly required for Phase 9 (harness is not a React surface) unless Architect chooses TypeScript implementation. Windows-host Node v22.x is available; production Docker base node:22-alpine if needed.
Python 3.10+ in WSL Ubuntu — preferred harness implementation language per Collaborator recommendation; pip install pyodbc requests pyyaml jinja2 covers the typical needs.

Environment state (verified post-Path γ + 9a83b92)

Surface	State
Local API	`phase-8-superset-react-ready-a54-fixed` ✓ (`ps-demodata` tenant slot wired to PS_DemoData public endpoint via `docker-compose.override.yml`)
Staging API	`phase-8-superset-react-ready-a54-fixed` ✓ (`default` AND `ps-demodata` tenant slots both wired to PS_DemoData private endpoint via `pssaas-secrets:SQLMI_CONNECTION_STRING` per Path γ)
Phase 7 endpoints (live verified)	All 8 endpoints respond on staging + locally against PS_DemoData; the 12 historical run-history rows surface under `X-Tenant-Id: ps-demodata` on both routes
`pfill_run_history` on PS_DemoData	12 rows total (3 Complete + 7 Failed + 2 Cancelled), all tagged `tenant_id='ps-demodata'`; latest Complete is `1ce2b077-af9d-4969-a348-b535ba265bbd` (2026-04-19T22:10:11)
End-to-end PowerFill run on PS_DemoData	Empirically Complete in ~30s; `allocated_count: 515`; `pool_guide_count: 515`; UE step succeeds with 12 forensic events. Hub Dashboard 1 shows the 12-row history.
A66 empty-state	Post-Complete, the 11 user-facing `pfill_*` data tables are 0 rows on PS_DemoData (UE rebuilt-empty per A66 — by design). Phase 9's harness verdict logic must understand this.
Superset infrastructure	36 + 8 = 44 queries + 6 deploy scripts in `infra/superset/`; Phase 8 W1 dashboards at IDs 13-20. Migration #30 DONE 2026-04-19 — `bi.staging.powerseller.com` is stable + dashboards intact + auth-gated via Keycloak SSO (HTTP 302 to anonymous probes; HTTP 200 with same dashboard IDs to authenticated users). The harness reads from the PSSaaS API (NOT Superset), so this is informational-only.
React frontend	LIVE at `https://pssaas.staging.powerseller.com/app/` (currently public; Phase 8.5 will gate behind Keycloak in parallel). Not directly Phase 9-relevant but useful as a click-through verification surface.
Backlog re-read pass at planning	3-instance corroborated; canonical-promotion-anticipated. Use it explicitly in Phase 9 §2 plan.
Practice #13 Capability × Environment matrix	Canonical; W2 completion report is the canonical example. Phase 9's matrix adapts to "per customer DB" cells (PS_DemoData verified; PS608 NOT MEASURED HERE; future tenants NOT MEASURED HERE).

Companion references

Doc	Purpose
`docs-site/docs/specs/powerfill-engine.md` §Phased Implementation + §Algorithm + §Data Contracts + §Run APIs	Authoritative scope + algorithm semantics + endpoint contracts
`docs-site/docs/handoffs/powerfill-phase-8-w2-completion.md`	W2 completion report; precedent for Phase 9 completion-report shape (especially §Capability × Environment matrix)
`docs-site/docs/handoffs/powerfill-a54-fix-greg-demo-readiness.md`	The PO-facing demo narrative; Phase 9's harness output slots in as the "loan-by-loan parity proof" addition
`docs-site/docs/handoffs/powerfill-phase-7-completion.md`	Phase 7 endpoint contracts (the 8 GET endpoints the harness reads) + ADR-025
`docs-site/docs/adr/adr-021-powerfill-port-strategy.md`	Verbatim-port discipline + §Narrow Bug-Fix Carve-Out (the A54 fix is the canonical first instance; future Phase-9-surfaced legacy bugs follow the same pattern)
`docs-site/docs/legacy/powerfill-deep-dive.md`	Legacy plugin reverse-engineering; the Desktop App side context the harness invocation must respect
`src/backend/PowerSeller.SaaS.Modules.PowerFill/Sql/008_CreateAllocationProcedure.sql` + `009_CreatePoolGuideProcedure.sql` + `011_CreatePowerFillUeProcedure.sql`	The exact proc bodies PSSaaS deploys; the Desktop-App-equivalent invocation (Option A) reads from the SAME bodies on PS_DemoData (which PS_DemoData has had since the PSSaaS deployment that put them there)
`src/backend/PowerSeller.SaaS.Modules.PowerFill/Contracts/ReportContracts.cs` + `RunContracts.cs`	Phase 7 + 6e wire-shape source of truth (snake_case JSON properties; harness mirrors these)
`src/backend/PowerSeller.SaaS.Modules.PowerFill/Endpoints/RunEndpoints.cs`	The 12 endpoint contracts the harness invokes (4 from 6e + 8 from 7)
`infra/azure/k8s/pssaas-staging/services.yaml`	Kubernetes deployment manifest; Phase γ-amended with `Tenants__ps-demodata__ConnectionString`; reference for the convention Phase 9's harness must respect
`docker-compose.override.yml.example` + `docker-compose.override.yml` (gitignored)	Local-dev tenant-slot wiring; Phase 9's harness may run locally against the same PS_DemoData via this path

Deliverables

When Phase 9 is complete, the Collaborator and PO should be able to verify each without trusting your word:

Code commits — atomic, logically grouped. DO NOT push — the PO pushes; you git add and git commit only.
tools/parallel-validation/ (or similar; Architect's directory choice documented in ADR-027) containing the harness implementation.
Harness configuration file — declarative input snapshot + tolerance bands + comparison thresholds.
First end-to-end harness run output — Markdown or HTML report at docs-site/docs/devlog/2026-04-XX-powerfill-phase-9-first-validation-run.md (or similar location) showing PSSaaS-vs-Desktop-App-equivalent on PS_DemoData with per-loan verdict + per-run summary.
Demo-asset slot-in — the report's top-level summary line + selected per-loan-divergence detail formatted to slot into powerfill-a54-fix-greg-demo-readiness.md as the "loan-by-loan parity proof" addition. Could be an amendment to that doc OR a sibling doc the PO weaves in.
Sentinel bump to phase-9-validation-ready.
NEW ADR-027 — Phase 9 Parallel Validation Harness Design documenting the Alternatives-First Gate decisions (invocation path / language / report format).
Spec amendment marking Phase 9 (validation harness scope) DONE; cutover scope deferred to Phase 10+.
Assumption log A69+ for new Phase 9 findings.
A62 closure (if Phase 9 takes the recommended pfillv2_* rename path; mark Backlog #24 done).
A67 closure (XML doc-comment fix; mark in assumptions log).
docs-site/docs/handoffs/powerfill-phase-9-completion.md — W2 completion-report format + Capability × Environment matrix per practice #13.
Devlog entry at docs-site/docs/devlog/2026-04-XX-powerfill-phase-9.md.
Pre-push docs-build check if any new docs-site/docs/** files.

Reporting protocol

Standard Architect Report format when you're done — what was produced / decisions / assumptions / open questions / recommended next steps / process notes.

If the harness reveals a real PSSaaS-vs-Desktop-App divergence beyond rounding tolerance — STOP and surface as A69+. The disposition is PO's call (could be a PSSaaS bug to fix; could be a legacy-bug carve-out per ADR-021's pattern; could be a Tom-or-Greg consultation).

If the chosen invocation path (Option A / B / C) encounters an unanticipated constraint — STOP and surface, don't paper over.

If a multi-day session reaches a natural pause point with partial Phase 9 completion (e.g. harness scaffold + Option A invocation works but verdict logic incomplete), that's fine — write a handoff so the next Architect session resumes cleanly.

The PO milestone for this phase: "I have empirical loan-by-loan evidence that PSSaaS PowerFill matches the legacy Desktop App on PS_DemoData." Achievable when the harness's first comparison run completes with a verdict report. Subsequent customer-DB validation runs are operator-driven post-Phase-9.

What success looks like

Harness scaffold exists + invokable from a non-Architect machine (Collaborator-side reproducibility check passes)
First end-to-end comparison run on PS_DemoData produces a verdict report
Verdict report's top-level summary slot-fits into the existing PO Greg-demo narrative
A66 expected-empty case is correctly classified as Match (NOT Divergent) — empirically demonstrates the verdict logic is A66-aware
Sentinel reflects phase-9-validation-ready
ADR-027 documents the harness-design choices
A62 + A67 closure landed (or explicit defer with rationale)
Capability × Environment matrix in completion report explicitly distinguishes "PS_DemoData verified; future customer DBs NOT MEASURED HERE pending operator-driven runs"

Begin when ready. Local environment + staging environment are both fully wired; PS_DemoData has 12 historical runs (3 Complete + 7 Failed + 2 Cancelled); the 12 PowerFill endpoints (4 run-mgmt + 8 reports) are live; the operator React UI is live at https://pssaas.staging.powerseller.com/app/ for click-through verification; A54 is RESOLVED so end-to-end Complete-runs are reproducible.

Reminder: Opus 4.7 High Thinking. Verify model in picker before sending your first response. Do NOT push.

Session-start checklist​

YOUR TASK — Phase 9: Parallel Validation Harness​

Inherited context (do not re-litigate)​

Explicit scope (IN)​

Workstream 1: Harness design + first comparison​

Workstream 2: Phase 9 close-out items (carried over from prior phases)​

Cross-cutting​

Explicit scope (OUT)​

Process discipline (canonical, non-negotiable)​

Gates that must produce documented output​

Antipatterns to avoid (canonical list applies)​

Tooling (verified post-Phase 8 W2)​

Environment state (verified post-Path γ + 9a83b92)​

Companion references​

Deliverables​

Reporting protocol​

What success looks like​