
PSSaaS Architect — Phase 9: Parallel Validation Against Desktop App

Role: PSSaaS Systems Architect

Phase: PowerFill Phase 9 (Parallel Validation + Tom/Greg Critique): empirical proof that PSSaaS PowerFill produces per-loan-equivalent allocations to the legacy Desktop App on the same input data

Date dispatched: TBD (next Architect session; can dispatch in parallel with Phase 8.5 work since the surfaces don't overlap)

Model required: Opus 4.7 High Thinking. Verify in the Cursor model picker before responding. If running under any other model, STOP and escalate.

Estimated effort: Spec line 651 ("Phase 9 — Validation + Tom/Greg critique + cutover — 2-3 weeks"). Phase 9 as scoped here is the validation harness + first comparison run + first Tom/Greg review surface, NOT cutover (that's Phase 10+). Realistic estimate: 3-5 Architect-sessions for the harness + first end-to-end comparison + the report-generation surface; subsequent customer-DB validation runs are operator-driven (Phase 9 builds the lever; the PO + customer reps pull it).

Predecessor work: Phase 8 fully COMPLETE as of 2026-04-19 — both workstreams + the A54 fix + the F-W2-PSD-1 / Path γ tenant-slot fix all shipped. The 6-step PowerFill orchestration runs end-to-end-Complete on PS_DemoData via staging API; the React UI surface at https://pssaas.staging.powerseller.com/app/ exposes the operator workflow (auth deferred to Phase 8.5; staging is currently public). Phase 9 is the next sequential phase per spec, dispatched in parallel with Phase 8.5 (auth + Superset embedding) which has its own kickoff queued behind PSX-Infra-side work (#30 + #31 in session-handoff Backlog).

FOUR substantive context updates since the W2 kickoff (f4531ae) that Phase 9 must internalize:

  1. End-to-end Complete-run on PS_DemoData empirically achievable post-A54-fix (cf8ef8b); 12+ historical runs in pfill_run_history (3 Complete + 7 Failed + 2 Cancelled) provide a baseline corpus for the harness to compare against.
  2. A66 + A65 banked observations are now load-bearing for Phase 9's harness design — A65 directly names the harness as the proving-ground for "do the multi-pa_key + settlement-date-variance triggers fire on this customer DB?" probes; A66 names the post-Complete empty-output behavior on syn-trade-empty datasets like PS_DemoData (Phase 9 must distinguish "PSSaaS produced empty because UE rebuild-empty per A66" from "PSSaaS produced empty because of a bug" — the harness's verdict logic must understand both).
  3. A68 added (tenant-id-vs-config-slot conflation as design wart). Phase 9's harness will be a third writer into pfill_run_history (the harness itself, possibly running on AKS or a dev workstation) — the convention used must agree with the existing 'ps-demodata' tag, OR Phase 8.5 closure (long-term decoupling of TenantId from connection-string-slot routing) must land first. Currently the safer assumption is "convention-agree with existing rows" (write tenant_id='ps-demodata'); fold-into-Phase-8.5 is the long-term answer.
  4. Phase 8.5 (NEW) is queued in parallel — PSSaaS joins ecosystem auth (Keycloak via oauth2-proxy) + replaces "View in Superset" anchor links with embedded SDK. Surfaces don't overlap with Phase 9 (Phase 9 is harness + comparison + reports; 8.5 is auth + embed). Phase 9 should NOT make the harness depend on auth being landed — the harness should be runnable today against unauth staging, and remain runnable against Phase 8.5's auth-protected staging by adding an OIDC-token mint step in the harness invocation. Build the harness assuming it'll need to grow that step but doesn't yet.
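
Item 4's "grow an OIDC-token mint step later" requirement can be kept cheap by making the token source an optional hook from day one. The sketch below is illustrative only: `build_headers` and the `mint_token` callable are hypothetical names, not part of the existing codebase; only the `X-Tenant-Id` header comes from the current API usage.

```python
from typing import Callable, Dict, Optional


def build_headers(
    tenant_id: str,
    mint_token: Optional[Callable[[], str]] = None,
) -> Dict[str, str]:
    """Build request headers for a harness call to the PSSaaS API.

    mint_token is a hypothetical hook. It is None today (staging is
    unauthenticated); once Phase 8.5 lands it would be a callable that
    returns an OIDC access token.
    """
    headers = {"X-Tenant-Id": tenant_id, "Accept": "application/json"}
    if mint_token is not None:
        # Only attach credentials when a token source has been configured.
        headers["Authorization"] = f"Bearer {mint_token()}"
    return headers


if __name__ == "__main__":
    # Today: no auth step.
    print(build_headers("ps-demodata"))
    # Later: same call site, with a token source plugged in.
    print(build_headers("ps-demodata", mint_token=lambda: "example-token"))
```

With this shape the harness call sites do not change when auth lands; only the invocation supplies a token source.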

Per the PO's strategic frame (re-stated for Architect awareness, since Phase 9 ships the load-bearing demo asset): the PSSaaS migration thesis is "chunk-by-chunk extraction from the Desktop App, with each chunk verifiably matching legacy behavior loan-by-loan against real customer data." Phase 9's harness is the empirical proof of that thesis for chunk #1 (PowerFill). Without it, the Greg demo is "look at our cool new UI"; with it, the Greg demo is "here is empirical evidence the chunk-extraction model works." This framing is in the PO-written A54-fix completion report (docs-site/docs/handoffs/powerfill-a54-fix-greg-demo-readiness.md §"Bug as Feature" demo narrative); Phase 9's harness output should slot into that demo narrative as the "and here's loan-by-loan parity proof" slide.


Session-start checklist

Read these in this order before doing anything else:

  1. CLAUDE.md — project identity, role-identification procedure, push-is-an-ask convention
  2. AGENTS.md — agent memory: principles, lessons, F-PSD findings summary
  3. docs-site/docs/agents/architect-context.md — your role definition
  4. docs-site/docs/agents/process-discipline.md — canonical practices, gates, antipatterns. Per banked observation 2026-04-19 (commit 8dba7b4 message), the next revision is likely to add "Subagent Output Defended Beyond Scope" antipattern + a Backlog re-read pass canonical promotion + a refinement of practice #13 cell-granularity ("does artifact X get written to with the same convention from every environment that can write it?"). Use the latest committed version of process-discipline.md as authoritative, but anticipate these refinements landing during Phase 9 if the next discipline-doc revision ships in parallel.
  5. docs-site/docs/agents/handoff-prompts.md — Templates for delegation
  6. docs-site/docs/handoffs/pssaas-session-handoff.md — current state. Re-read the Backlog table at planning time per the trigger-based countermeasure shape (canonical commit 10133c6) — Backlog re-read pass at planning-start now has 3-instance corroboration (F-7-7 anticipated; F-8-BR-1 caught; F-W2-CONTRACT-1 + F-W2-BR-3 caught) and the pattern-recognition is canonical-adoption-ready. Use it explicitly in §2 of your plan.
  7. docs-site/docs/specs/powerfill-engine.md — full spec. Phase 9-relevant sections: §Phased Implementation (line ~651), §Algorithm (the per-stage semantics A1 documents, which Phase 9's harness validates per-loan), §Data Contracts (the 8 Phase 7 endpoint shapes are the harness's read surface), §Run APIs (the operator-grade run-mgmt surface from Phases 6e/7).
  8. docs-site/docs/handoffs/powerfill-phase-8-w2-completion.md — most recent completion report. The §Capability × Environment matrix demonstrates practice #13 application; Phase 9's harness completion report should produce the same matrix shape, but with cells about per-customer-DB validation runs instead of per-environment deploy verifications (per Architect's W2 recommendation #2: "Phase 9 should add a parallel-validation harness 'data-shape compatibility' pre-flight per A65 + A66 + the W2 environment matrix").
  9. docs-site/docs/handoffs/powerfill-a54-fix-greg-demo-readiness.md — the PO-facing Greg-demo narrative. Phase 9's harness output is the load-bearing slide of that narrative. The harness's report format should slot into the §"Bug as Feature" 5-slide structure, specifically as the "and here's loan-by-loan parity proof against real customer data" addition.
  10. docs-site/docs/handoffs/powerfill-phase-7-completion.md — Phase 7 endpoint contracts (the 8 GET endpoints the harness reads from). Per A67 + F-W2-CONTRACT-1: the canonical wire-shape is RunEndpoints.cs MapGet/MapPost registrations + the OpenAPI document at /api/powerfill/swagger/, NOT the ReportContracts.cs XML doc-comments (which still claim a /reports/ path segment that doesn't exist).
  11. docs-site/docs/legacy/powerfill-deep-dive.md + docs-site/docs/legacy/n_cst_powerfill.sru (in raw legacy source) — the Desktop App side of the parallel validation. The harness invokes the Desktop App's PowerFill plugin path; the Architect needs a working understanding of how the Desktop App is invoked headlessly OR via PowerBuilder script-mode OR via direct T-SQL exec of psp_powerfill_* against PS_DemoData (skipping the PB front-end entirely; the PB layer mostly orchestrates the same psp_powerfill_* calls PSSaaS now wraps).
  12. docs-site/docs/specs/powerfill-assumptions-log.md — A1 (revised; Phase 9's per-loan correctness validation is the gate per A1's Banking note); A28 + A37 (RESOLVED); A38 (RESOLVED); A41-A45; A47-A58; A60 (latest-Complete-wins; affects harness's read-side semantics); A61; A62 (PS_DemoData view drift; Phase 9 close-out per Backlog #24); A63-A64 (Phase 8 W1; A64 platform-tailwind note about multi-tenant Superset registration becoming easier under platform-Superset); A65 (Phase 9 directly named: harness must probe multi-pa_key + settlement-date variance triggers on each customer DB); A66 (NEW; harness's verdict logic must distinguish UE-rebuild-empty from buggy-empty); A67 (cosmetic XML doc fix, harness can serve as the forcing function to align it); A68 (NEW 2026-04-20; harness convention must agree with existing 'ps-demodata' tag OR fold the long-term decoupling into Phase 8.5).
  13. docs-site/docs/devlog/ — most recent: 2026-04-19g-powerfill-phase-8-w2.md is the W2 ship; expect a 2026-04-19h-powerfill-phase-9-kickoff.md as your devlog at session end.

After reading those, acknowledge your role and proceed.


YOUR TASK — Phase 9: Parallel Validation Harness

Per the spec's Phase 9 row + every prior phase's "Phase 9 parallel-validation will exercise this" carry-over (A1, A47, A53, A54-historical, A56-historical, A58, A65, A66 directly named), Phase 9 ships:

Harness: A reproducible tool that takes a frozen input snapshot, runs PowerFill in PSSaaS (via the staging API or a local equivalent), runs PowerFill in the Desktop App equivalent path (likely direct T-SQL exec of the legacy procs against the same DB, skipping the PB front-end), and produces a per-loan-pool-allocation diff report with a verdict (Match / TolerableDiff / Divergent) per loan + a summary stat per run.
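
One possible way to make the "frozen input snapshot" checkable is to fingerprint it: table row counts plus an order-independent hash over loan id and note rate. This is a minimal sketch under assumptions; `pipeline_hash` and `snapshot_fingerprint` are hypothetical helper names, and the rows would in practice come from a query against the tenant DB rather than a literal list.

```python
import hashlib
from decimal import Decimal
from typing import Dict, Iterable, Tuple


def pipeline_hash(rows: Iterable[Tuple[str, Decimal]]) -> str:
    """SHA-256 over (loan id, note rate) pairs, independent of row order."""
    digest = hashlib.sha256()
    for loan_id, note_rate in sorted(rows):
        # normalize() drops trailing zeros so 6.50 and 6.5 hash identically;
        # the "f" format avoids exponent notation such as 1E+2.
        rate_text = format(note_rate.normalize(), "f")
        digest.update(f"{loan_id}|{rate_text}\n".encode("utf-8"))
    return digest.hexdigest()


def snapshot_fingerprint(
    row_counts: Dict[str, int],
    rows: Iterable[Tuple[str, Decimal]],
) -> Dict[str, object]:
    """Combine per-table row counts with the pipeline hash."""
    return {
        "row_counts": dict(sorted(row_counts.items())),
        "pipeline_sha256": pipeline_hash(rows),
    }


if __name__ == "__main__":
    sample = [("L2", Decimal("6.50")), ("L1", Decimal("7.125"))]
    print(snapshot_fingerprint({"loan": 2, "pfill_constraints": 0}, sample))
```

Recording the fingerprint alongside each comparison report would let a reviewer confirm that both sides ran against the same input state.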

First comparison: One end-to-end harness run against PS_DemoData with a documented input snapshot, producing a comparison report demonstrating the harness works. NOT a multi-customer-DB sweep; that's operator-driven post-Phase-9.

Demo asset: The harness's first-run output report formatted to slot into the existing PO-written powerfill-a54-fix-greg-demo-readiness.md "Bug as Feature" 5-slide demo narrative as the "loan-by-loan parity proof" addition.
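
Since the demo asset hinges on a single headline sentence, it may help to derive that sentence mechanically from the per-loan verdicts. The following is a sketch only; `summary_line` is a hypothetical helper and the exact wording of the headline remains the Architect's call.

```python
from collections import Counter
from typing import Iterable


def summary_line(dataset: str, run_id: str, verdicts: Iterable[str]) -> str:
    """Render a one-line run summary from per-loan verdict strings.

    Verdict strings are expected to be "Match", "TolerableDiff" or
    "Divergent"; a verdict that never occurs is counted as 0.
    """
    counts = Counter(verdicts)
    total = sum(counts.values())
    return (
        f"PSSaaS-vs-Desktop-App on {dataset} run {run_id}: "
        f"{counts['Match']}/{total} loans Match; "
        f"{counts['TolerableDiff']} TolerableDiff; "
        f"{counts['Divergent']} Divergent."
    )


if __name__ == "__main__":
    example = ["Match"] * 514 + ["TolerableDiff"]
    print(summary_line("PS_DemoData", "demo-run", example))
```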

Phase 9 has no React UI work (auth + embedding is Phase 8.5, dispatched in parallel) and no production cutover work (that's Phase 10+). Phase 9's surface is the harness + first comparison + the report shape the PO can use with Greg.

Inherited context (do not re-litigate)

Topic | State as of 9a83b92
Phase 7 / Phase 8 W1 / Phase 8 W2 / A54 fix / Path γ / A68 banked / Backlog #30 + #31 / A64+A68 platform-tailwinds | All COMPLETE as of HEAD; sentinel phase-8-superset-react-ready-a54-fixed; staging at https://pssaas.staging.powerseller.com/ (currently public; Phase 8.5 will gate behind Keycloak in parallel with Phase 9)
End-to-end Complete-run on PS_DemoData | Empirically achievable in ~30s post-A54-fix; 12+ rows in pfill_run_history (3 Complete + 7 Failed + 2 Cancelled); the canonical tagged-tenant_id='ps-demodata' row population per the local-route convention
6-step orchestration | All 6 steps (BX cash-grids → BX settle-and-price → candidates → conset → pool_guide → UE) exercise green end-to-end on PS_DemoData
Existing endpoints (which the harness reads from) | POST /run (202; the harness triggers the PSSaaS run via this), GET /runs/{id} (the harness polls via this for run completion), 8 Phase 7 GET /runs/{id}/<report> (the harness reads outputs via these)
A1 per-stage allocation semantics | Documented in the assumptions log; Phase 9's per-loan correctness validation is the gate (per A1's Banking note "the legacy proc body deploys verbatim per ADR-021; per-stage-semantic correctness validation against Desktop App output is the Phase 9 parallel-validation gate")
A65: multi-pa_key + settlement-date-variance | Two distinct A54 triggers; both fire on PS_DemoData. The harness's pre-flight should explicitly probe both on each tenant DB it validates against (per the W2 Architect's Phase 9 Recommendation #2)
A66: UE clears + rebuilds-empty on syn-trade-empty datasets | The harness's verdict logic must distinguish "PSSaaS empty because UE rebuild-empty per A66" (Match if Desktop App also produces empty under the same path) from "PSSaaS empty because of a bug" (Divergent). Hub Dashboard 1 (run history) is the canonical proof-of-life on PS_DemoData; the user-facing report tables are 0-row by design on this dataset
A68: tenant-id-vs-config-slot conflation | The harness will be a third writer into pfill_run_history. Currently safest convention: write tenant_id='ps-demodata' (matches existing 12 rows on PS_DemoData). Long-term: fold the decoupling into Phase 8.5 per A68's platform-tailwind note. Phase 9 should NOT re-tag existing rows; it should adopt the existing convention.
Backlog #30: Superset → pss-platform migration | DONE 2026-04-19 (PSX Infra completed during Phase 9 dispatch window; ~3 min cutover). Hostname unchanged at bi.staging.powerseller.com; all 20 dashboards / 56 charts / 77 datasets preserved (pg_dump immediately pre-cutover); same Keycloak SSO + same superset OIDC client + same admin credentials; both data sources (PSX postgres cross-namespace via FQDN + SQL MI via VNet peering) still work; same Superset Python image SHA so pymssql still installed. Old psx-staging Superset pod scaled to 0 with deployment retained for 24-hour rollback insurance. Phase 9 implication: NONE. The base-URL paranoia from this kickoff's draft no longer applies: bi.staging.powerseller.com is the stable forever URL. Hard-coded psx-staging namespace references in our codebase need updating (Collaborator-side sweep in flight); the Architect can ignore the namespace for harness purposes since the harness reads from the PSSaaS API, not Superset directly.
Backlog #31: Phase 8.5 (ecosystem auth + embedded Superset) | Queued in parallel with Phase 9. Not a Phase 9 prerequisite. Phase 9's harness should be runnable today against unauth staging, and remain runnable against Phase 8.5's auth-protected staging by adding an OIDC-token mint step in the harness invocation. Build the harness assuming it'll grow that step but doesn't yet.
W2 React UI | Live at https://pssaas.staging.powerseller.com/app/. Not directly Phase 9-relevant (the harness is operator-driven, not UI-driven), but the UI's Hub dashboard link + run-status page provide a useful click-through for Phase 9 reviewers wanting to see "did this PSSaaS run actually Complete?" without curling.
Canonical Claim-vs-Evidence family + practice #13 | Live (commits 95084cb + 10133c6 + d4a70af + 863c139 + 4b08b51). Most relevant for Phase 9: practice #13: the harness's first-run completion report MUST produce a Capability × Environment matrix with explicit "verified vs not measured here" per cell. The W2 completion report's matrix is the canonical example to copy; Phase 9 adapts it to "per customer DB" cells (PS_DemoData verified; PS608 customer DB NOT MEASURED HERE pending customer-rep approval; future tenants NOT MEASURED HERE).
W2 process observation banked but not yet canonical | "Subagent Output Defended Beyond Scope" antipattern (the PS608-tenant-dropdown scope drift); "convention conflation under low-corroboration count" (A68 root pattern); "Single-Probe Confidence" (the PSX Infra falsification + the embedded-SDK-OFF claim). Phase 9 may surface more instances; bank as observations in the completion report's Counterfactual Retro, don't pre-litigate canonical adoption.
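
The endpoint roles in the table above (POST /run to trigger, GET /runs/{id} to poll) imply a wait-for-terminal-state loop on the PSSaaS side. The sketch below assumes, hypothetically, that the run resource exposes its state in a `status` field; check the OpenAPI document for the real property name. The HTTP call is injected as a callable so the loop itself can be exercised without a network.

```python
import time
from typing import Callable, Mapping

# Terminal states named in this kickoff.
TERMINAL_STATES = frozenset({"Complete", "Failed", "Cancelled"})


def wait_for_terminal(
    fetch_run: Callable[[], Mapping[str, object]],
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_run() until the run reaches a terminal state.

    fetch_run is expected to return the decoded JSON body of
    GET /runs/{id}. The "status" key is an assumption, not a confirmed
    wire-shape.
    """
    deadline = clock() + timeout_s
    while True:
        status = str(fetch_run().get("status"))
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"run still '{status}' after {timeout_s}s")
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated responses; "Running" is a placeholder non-terminal state.
    fake = iter([{"status": "Running"}, {"status": "Complete"}])
    print(wait_for_terminal(lambda: next(fake), sleep=lambda _s: None))
```

In a real harness `fetch_run` would wrap an HTTP GET (for example via `requests`) carrying the `X-Tenant-Id` header.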

Explicit scope (IN)

Workstream 1: Harness design + first comparison

  • Architectural decision (Alternatives-First Gate): how to invoke the Desktop App side of the comparison? Three candidate options:
    • (A) Direct T-SQL exec of the legacy psp_powerfill_conset + psp_powerfill_pool_guide + psp_powerfillUE procs against PS_DemoData via sqlcmd, skipping the PowerBuilder front-end entirely. The n_cst_powerfill.sru is mostly orchestration over these procs anyway. Lowest friction; matches what we already do for ad-hoc PoC runs.
    • (B) PowerBuilder headless invocation via PB's command-line / script-mode if available — gives true Desktop-App-equivalent path including any PB-side state setup. Higher friction; uncertain whether headless PB invocation is straightforward.
    • (C) Snapshot-then-compare — take a PS_DemoData snapshot of the relevant pfill_* tables BEFORE Desktop App runs, have a human (or the PO) trigger Desktop App via the normal UI, take a snapshot AFTER, compare PSSaaS-against-the-pre-snapshot vs Desktop-App-against-the-pre-snapshot. Trades headless invocation difficulty for human-in-the-loop overhead.
    • Recommend: (A) for the first harness instance; document (B)/(C) as future extensions in the Phase 9 ADR. A is the direct-comparison path: PSSaaS (via staging API) and Desktop App equivalent (via direct sqlcmd) both touch the same procs against the same DB; the diff is between whatever the procs produce in the two invocations. Note: (A) trades "exact Desktop App path" for "exact legacy proc body path" — A1's Banking note specifies the legacy proc body is the canonical contract per ADR-021.
    • Document choice in NEW ADR-027 (Phase 9 Parallel Validation Harness Design).
  • NEW tools/parallel-validation/ directory (or wherever fits the repo layout — Architect's decision; document in ADR-027). Contains:
    • The harness invoker (Python? .NET console app? PowerShell? — Alternatives-First Gate decision; recommend Python for sqlcmd + JSON manipulation + report rendering ergonomics, but Architect can defend an alternative).
    • A harness_config.yaml (or similar) declaring the input snapshot + the comparison thresholds + the per-column tolerance bands.
    • The output-report renderer (Markdown or HTML for the PO to share with Greg + Tom).
  • Input-snapshot capture mechanism — what gets frozen? At minimum:
    • The pfill_run_history options_json from the reference run (so the harness re-runs with identical options).
    • The DB state checksum (e.g. loan + pscat_* + pfill_constraints + pfill_carry_cost row counts + a hash of loan.id+loan.note_rate for the relevant pipeline subset).
    • The reference timestamp (so PSSaaS's start_date_default derivation matches exactly per Q9).
  • PSSaaS side invocation:
    • Trigger via POST /run with the resolved options from the input snapshot.
    • Poll GET /runs/{id} for terminal state (Complete / Failed / Cancelled).
    • Read all 8 Phase 7 reports.
    • Optionally: also direct-query pfill_pool_guide etc. for cross-checks the report APIs don't cover.
  • Desktop App side invocation (assuming Option A):
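    • Run the same 6-step proc sequence directly via sqlcmd against the SAME DB but possibly different pa_key / scratch-table namespace to avoid colliding with the in-flight PSSaaS run. Architect must figure out the scratch-table isolation pattern. Note that in SQL Server, ##-prefixed global temp tables such as ##cte_* are visible to every session on the instance (only single-# temp tables are session-scoped), so running PSSaaS and Desktop-App-equivalent in separate sqlcmd sessions does not by itself isolate them; consider serializing the two runs so they never overlap.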
    • Capture the same 8 report-shape outputs from the post-run state.
  • Diff engine + verdict logic:
    • Per-loan comparison across the relevant report shapes (Pooling Guide, Cash Trade Slotting, Recap, Pool Candidates, Switching, Kickouts, Existing Disposition, Guide).
    • Per-column tolerance (e.g. price comparisons within $0.001; carry-cost within rounding tolerance per A36; row-count exact equality for inclusion/exclusion lists).
    • Verdict per loan: Match / TolerableDiff / Divergent.
    • Verdict per run: aggregate stats (% Match / % TolerableDiff / % Divergent; absolute Divergent count).
    • A66-aware: if PSSaaS produces 0 rows on a syn-trade-empty dataset AND the Desktop-App-equivalent path produces 0 rows on the same dataset, that's Match (NOT Divergent). The verdict logic must be aware that "both produce empty for the same A66 reason" is the right answer.
  • Output report shape:
    • Markdown (or HTML) artifact suitable for inclusion in a Greg-demo deck.
    • Top of report: per-run summary verdict (e.g. "PSSaaS-vs-Desktop-App on PS_DemoData run <harness-run-id>: 514/515 loans Match; 1 TolerableDiff (rounding); 0 Divergent. Time: 30s PSSaaS + 28s Desktop-App-equivalent.").
    • Per-section breakdown by report type.
    • Per-Divergent-loan detail with the specific column(s) that diverged, the PSSaaS value, the Desktop-App value, and the magnitude of the divergence.
    • Slot-into-PO-demo: the report's top-level summary line is what slots into the existing powerfill-a54-fix-greg-demo-readiness.md "Bug as Feature" demo as the "loan-by-loan parity proof" slide.
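
The diff-engine and verdict bullets above can be sketched as follows. This is an illustration under stated assumptions, not the harness design: the function names, the row shape (one mapping of column to value per loan id), and the rule "unequal but within the column's tolerance band means TolerableDiff" are all hypothetical choices that ADR-027 may decide differently.

```python
from decimal import Decimal
from enum import Enum
from typing import Dict, Mapping


class Verdict(str, Enum):
    MATCH = "Match"
    TOLERABLE_DIFF = "TolerableDiff"
    DIVERGENT = "Divergent"


def compare_loan(
    pssaas: Mapping[str, object],
    legacy: Mapping[str, object],
    tolerances: Mapping[str, Decimal],
) -> Verdict:
    """Compare one loan's columns; columns without a band need exact equality."""
    if pssaas.keys() != legacy.keys():
        return Verdict.DIVERGENT
    worst = Verdict.MATCH
    for column, left in pssaas.items():
        right = legacy[column]
        if left == right:
            continue
        band = tolerances.get(column)
        within_band = (
            band is not None
            and left is not None
            and right is not None
            and abs(Decimal(str(left)) - Decimal(str(right))) <= band
        )
        if not within_band:
            return Verdict.DIVERGENT
        worst = Verdict.TOLERABLE_DIFF
    return worst


def compare_run(
    pssaas_rows: Dict[str, Mapping[str, object]],
    legacy_rows: Dict[str, Mapping[str, object]],
    tolerances: Mapping[str, Decimal],
) -> Dict[str, Verdict]:
    """Per-loan verdicts; a loan present on only one side is Divergent."""
    verdicts: Dict[str, Verdict] = {}
    for loan_id in sorted(pssaas_rows.keys() | legacy_rows.keys()):
        if loan_id not in pssaas_rows or loan_id not in legacy_rows:
            verdicts[loan_id] = Verdict.DIVERGENT
        else:
            verdicts[loan_id] = compare_loan(
                pssaas_rows[loan_id], legacy_rows[loan_id], tolerances
            )
    return verdicts


def run_verdict(verdicts: Mapping[str, Verdict]) -> Verdict:
    """Aggregate verdict. Both sides empty (the A66 case) yields Match."""
    values = set(verdicts.values())
    if Verdict.DIVERGENT in values:
        return Verdict.DIVERGENT
    if Verdict.TOLERABLE_DIFF in values:
        return Verdict.TOLERABLE_DIFF
    return Verdict.MATCH
```

Note that an empty PSSaaS output compared against a non-empty legacy output marks every legacy loan Divergent, which is the "empty because of a bug" side of the A66 distinction.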

Workstream 2: Phase 9 close-out items (carried over from prior phases)

The accumulated "close at Phase 9" carry-overs from prior phases:

  • A62 closure (PS_DemoData view drift): per Backlog #24, deploy 002_CreatePowerFillViews.sql to PS_DemoData OR rename PSSaaS view to pfillv2_*. Phase 9 is the natural close point (the harness will exercise the existing-disposition endpoint; if A62 is open, the harness emits a Note-handling carve-out for that endpoint). Recommend: rename PSSaaS view to pfillv2_* (less risk than blind-overwriting an encrypted legacy view; matches the PSSaaS-namespacing convention).
  • A67 closure (ReportContracts.cs XML doc-comment Truth Rot): the harness's read surface IS the canonical contract; Phase 9 should fix the XML docs to match the actual RunEndpoints.cs routes. Trivial 8-line edit; satisfies the W2 Architect's Recommendation #4 to use Phase 9 as the forcing function.
  • Process-discipline observations from W2-deploy session (banked but not yet canonical): "Subagent Output Defended Beyond Scope" + "convention conflation under low-corroboration count" + "Single-Probe Confidence". Phase 9's Counterfactual Retro should reference these; if Phase 9 surfaces additional instances of any, that's the canonical-promotion forcing function.

Cross-cutting

  • Status sentinel bump to phase-9-validation-ready (preserves the phase-N-<short-name> pattern; do NOT carry the -a54-fixed sub-suffix forward into Phase 9 — A54 closure is now historical, not gating).
  • Spec amendment to docs-site/docs/specs/powerfill-engine.md — Phase 9 row in §Phased Implementation table marks "validation harness DONE; first comparison run DONE; cutover not in scope".
  • Assumptions log additions — A69+ for new Phase 9 findings.
  • NEW ADR-027 — Phase 9 Parallel Validation Harness Design (mandatory if harness lands this session).
  • Pre-push docs-build check per Phase 6e/7/8-W1/8-W2 banked discipline: docker build -f docs-site/Dockerfile.prod docs-site before push if any new docs-site/docs/** files created.

Explicit scope (OUT)

  • Multi-customer-DB sweep — operator-driven post-Phase-9 (Phase 9 builds the lever; PO + customer reps pull it for PS608 / future tenants once customer-rep approval lands).
  • Production cutover — Phase 10+ (the spec's "cutover" wording in the Phase 9 row is a 2026-03 spec-line assumption that's been superseded by the 8.5 + 9 + 10 split; Phase 9 ships validation only).
  • Auth integration — Phase 8.5.
  • Superset embedded SDK — Phase 8.5.
  • React UI changes for harness output viewing — out of scope; the harness output is a Markdown/HTML artifact that the PO views via the docs site OR shares as a leave-behind.
  • New PowerFill API endpoints — Phase 6e + 7 closed those; Phase 9 reads from the existing surface.
  • Real-time monitoring dashboard for harness runs — out of scope; Phase 9 produces one-shot reports.
  • Performance tuning of PowerFill itself — out of scope unless the harness reveals a per-loan correctness issue rooted in a perf shortcut.
  • Re-tagging existing pfill_run_history rows to a new tenant_id convention — A68's long-term decoupling lives in Phase 8.5; Phase 9 adopts the existing 'ps-demodata' convention.
  • A54 fix re-litigation — RESOLVED 2026-04-19; if the harness reveals A54 is fired by a customer DB in a way the fix doesn't address, that's an A65 follow-up filed against the customer DB, NOT a re-opening of A54's fix.

Process discipline (canonical, non-negotiable)

Gates that must produce documented output

Gate | Where to apply | What "documented output" means
Three-layer Primary-Source Verification Gate (now 3-instance corroborated; canonical-promotion-anticipated) | Spec-vs-implementation: verify Phase 7 endpoint contracts match what the harness's HTTP-client layer assumes. NVO-vs-implementation: verify the Desktop-App-equivalent invocation (Option A or B or C) actually exercises the same proc body PSSaaS does. Implementation-vs-runtime: re-read session-handoff Backlog table during planning; F-W2-CONTRACT-1 + F-W2-BR-3 caught at W2 planning are canonical evidence this catches issues before the PoC. | A Phase 9 plan §2 findings table per layer + explicit Backlog re-read pass log per row.
Alternatives-First Gate | At least 3 architectural decisions: (a) Desktop-App invocation path (A / B / C above); (b) harness implementation language (Python / .NET / PowerShell / TypeScript); (c) verdict-rendering format (Markdown / HTML / JSON-with-static-renderer). | A Phase 9 plan §3 alternatives section per decision; ADR-027 for the harness-design choices.
Required Delegation Categories | Heavily delegable: per-report diff logic (one delegated subagent per report shape = up to 8 micro-deliverables); the harness invoker scaffold; the verdict-renderer template. Self-implement: the architectural-contract-per-artifact load-bearing parts: Desktop-App invocation pattern + the "is this a real divergence or an A66 expected-empty?" verdict logic + the snapshot-capture mechanism. | A Phase 9 plan §8 delegation inventory with subagent prompts AND Deliberate Non-Delegation justifications per practice #9.
Reviewable Chunks at intra-session scope | Consider checkpointing after the harness scaffold lands + first end-to-end comparison run (against a known-good PS_DemoData input) before producing the full per-report diff output. | If checkpointing, send a plan-stage Architect Report after the first end-to-end run.
Deploy Verification Gate | Arm (a) sentinel = phase-9-validation-ready. Arm (b) harness invocable from a non-Architect machine (Collaborator-side reproducibility check). Arm (c) end-to-end harness run against PS_DemoData produces a verdict report. | A Phase 9 completion report Markdown citing screenshots / report excerpts + the per-tenant-DB Capability matrix per practice #13.
Counterfactual Retro | At session end | A retro section. Phase 8 W2 banked 7 observations including "Backlog re-read pass IS canonical-adoption-ready at 3-instance corroboration"; Phase 9 should report whether the practice continues to pay off (or if 4-instance corroboration justifies pulling the trigger on canonical promotion).

Antipatterns to avoid (canonical list applies)

  • Phase-0 Truth Rot — A57 was qualified by A59; the Backlog re-read pass at planning-start has been the empirical safeguard. Phase 9's harness design WILL surface contract-vs-implementation drift if any exists; embrace it as forcing function for A62 + A67 closure.
  • Empirical-Citation Type Mismatch (Phase 5 origin) — when reading from the 8 Phase 7 endpoints in the harness, use the actual JSON property names from ReportContracts.cs (snake_case via [JsonPropertyName]), NOT the C# property names (PascalCase). The OpenAPI / Swagger UI at http://pssaas.powerseller.local/api/swagger/ is the canonical wire-shape reference; consider auto-generating types from it (or manually mirroring as the W2 React UI does).
  • Verification Avoidance (Phase 4 origin): run dotnet build (if any backend changes) + the harness's own test surface (if any) before declaring complete; the harness's first end-to-end comparison run on PS_DemoData IS the integration test.
  • Ghost Deploy (PSX origin) — the harness is operator-driven, not deployed-as-a-service, so this antipattern doesn't directly apply BUT the harness's invocation pattern should support content-match verification (e.g. the harness's output report should include the exact PSSaaS sentinel + Desktop-App-proc-version it ran against, so the PO can verify in seconds whether the report is from the right code).
  • Delegation Skip (Phase 4 origin) — per-report diff logic is the heaviest delegation candidate; architectural-contract-per-artifact decisions (ADR-027, the verdict semantics, the snapshot-capture mechanism) are yours to self-implement.
  • Capability Inflation (Phase 8 W1 / Claim-vs-Evidence family) — the harness's first comparison run produces a result on PS_DemoData ONLY. Do NOT extend that to "validates PowerFill on customer data" — that's a Capability Inflation framing. The honest claim is "validates PowerFill behavior on PS_DemoData against the legacy proc body when invoked via [chosen Option]". Customer DB validation is post-Phase-9 operator work.
  • Capability Drift (Claim-vs-Evidence family) — if the W2 React UI's tenant-picker constants drift between sessions (e.g. someone re-introduces PS608), the harness assumptions could quietly become wrong. Phase 9's harness should explicitly verify the tenant-id convention agrees with pfill_run_history row tags before invoking PSSaaS — a 1-query pre-flight that surfaces A68-class drift early.
  • Subagent Output Defended Beyond Scope (banked but not yet canonical; W2 origin) — when reviewing delegated diff-logic subagent output, the explicit first question is "is this what the kickoff asked for?" before any disposition framing. Output additions outside scope are removed unless re-justified against the kickoff.
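
The 1-query pre-flight named under Capability Drift could look roughly like the sketch below. It assumes the harness has already fetched the `tenant_id` value of every `pfill_run_history` row (for example through pyodbc); the function and exception names are hypothetical.

```python
from collections import Counter
from typing import Iterable


class TenantConventionError(RuntimeError):
    """Raised when existing run-history tags disagree with the harness tag."""


def check_tenant_convention(
    existing_tags: Iterable[str],
    harness_tenant_id: str,
) -> None:
    """Fail fast if pfill_run_history holds rows under a different tenant tag.

    existing_tags is one tenant_id value per existing row. An empty table
    passes, since there is no prior convention to disagree with.
    """
    counts = Counter(existing_tags)
    unexpected = {
        tag: n for tag, n in counts.items() if tag != harness_tenant_id
    }
    if unexpected:
        raise TenantConventionError(
            f"harness would write tenant_id={harness_tenant_id!r} but "
            f"existing rows use other tags: {unexpected}"
        )


if __name__ == "__main__":
    # Matches the 12 rows currently tagged 'ps-demodata'.
    check_tenant_convention(["ps-demodata"] * 12, "ps-demodata")
    print("tenant convention OK")
```

Running this before the POST /run call keeps an A68-class mismatch from silently producing a run the read endpoints cannot see.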

Tooling (verified post-Phase 8 W2)

  • WSL Ubuntu with dotnet 8.0.420, jq 1.6, gh 2.4.0 (un-authed). Use wsl.exe -- bash -lc '...' for shell work.
  • Windows-side kubectl at C:\Program Files\Docker\Docker\resources\bin\kubectl.exe, kubeconfig at ~/.kube/config with PSS-cluster context (PSX-shared cluster).
  • PS_DemoData public-endpoint password in docker-compose.override.yml: M0th3rFuck1ng$$44$$ (Compose interpolates $$$, actual M0th3rFuck1ng$44$). Pass via env var in single-quoted PowerShell string. Or use docker exec -e PWORD='...' pssaas-db sh -c '...' pattern.
  • PS_DemoData private-endpoint for AKS connectivity at hostedps-sql.086ea791c2f1.database.windows.net,1433 — wired into the staging API Deployment via secret pssaas-secrets:SQLMI_CONNECTION_STRING.
  • EXECUTE on dbo procs is GRANTED to kevin_pssaas_dev on PS_DemoData. db_ddladmin also granted (per A30 resolution).
  • Pre-push docs-build check pattern (Phase 6e lesson; now 4-instance corroborated): mandatory if any new docs files use URL templates or unfamiliar MDX syntax.
  • Node.js / npm: not directly required for Phase 9 (harness is not a React surface) unless Architect chooses TypeScript implementation. Windows-host Node v22.x is available; production Docker base node:22-alpine if needed.
  • Python 3.10+ in WSL Ubuntu — preferred harness implementation language per Collaborator recommendation; pip install pyodbc requests pyyaml jinja2 covers the typical needs.

Environment state (verified post-Path γ + 9a83b92)

| Surface | State |
| --- | --- |
| Local API | phase-8-superset-react-ready-a54-fixed ✓ (ps-demodata tenant slot wired to PS_DemoData public endpoint via docker-compose.override.yml) |
| Staging API | phase-8-superset-react-ready-a54-fixed ✓ (default AND ps-demodata tenant slots both wired to PS_DemoData private endpoint via pssaas-secrets:SQLMI_CONNECTION_STRING per Path γ) |
| Phase 7 endpoints (live verified) | All 8 endpoints respond on staging + locally against PS_DemoData; the 12 historical run-history rows surface under X-Tenant-Id: ps-demodata on both routes |
| pfill_run_history on PS_DemoData | 12 rows total (3 Complete + 7 Failed + 2 Cancelled), all tagged tenant_id='ps-demodata'; latest Complete is 1ce2b077-af9d-4969-a348-b535ba265bbd (2026-04-19T22:10:11) |
| End-to-end PowerFill run on PS_DemoData | Empirically Complete in ~30s; allocated_count: 515; pool_guide_count: 515; UE step succeeds with 12 forensic events. Hub Dashboard 1 shows the 12-row history. |
| A66 empty-state | Post-Complete, the 11 user-facing pfill_* data tables are 0 rows on PS_DemoData (UE rebuilt-empty per A66, by design). Phase 9's harness verdict logic must understand this. |
| Superset infrastructure | 36 + 8 = 44 queries + 6 deploy scripts in infra/superset/; Phase 8 W1 dashboards at IDs 13-20. Migration #30 DONE 2026-04-19: bi.staging.powerseller.com is stable + dashboards intact + auth-gated via Keycloak SSO (HTTP 302 to anonymous probes; HTTP 200 with same dashboard IDs to authenticated users). The harness reads from the PSSaaS API (NOT Superset), so this is informational-only. |
| React frontend | LIVE at https://pssaas.staging.powerseller.com/app/ (currently public; Phase 8.5 will gate behind Keycloak in parallel). Not directly Phase 9-relevant but useful as a click-through verification surface. |
| Backlog re-read pass at planning | 3-instance corroborated; canonical-promotion-anticipated. Use it explicitly in Phase 9 §2 plan. |
| Practice #13 Capability × Environment matrix | Canonical; W2 completion report is the canonical example. Phase 9's matrix adapts to "per customer DB" cells (PS_DemoData verified; PS608 NOT MEASURED HERE; future tenants NOT MEASURED HERE). |
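
Before the first comparison run, the harness could confirm that the baseline corpus still matches the state recorded above. The sketch below tallies run-history rows by status and picks the latest Complete run. The row keys (run_id, status, started_at) are hypothetical stand-ins for whatever RunContracts.cs actually emits, and it assumes timestamps are ISO-8601 strings in one consistent format so that string ordering equals chronological ordering.

```python
from collections import Counter

# Baseline recorded in this document for PS_DemoData before any Phase 9 run.
EXPECTED_BASELINE = {"Complete": 3, "Failed": 7, "Cancelled": 2}


def summarize_runs(rows):
    """Return (status counts, run_id of the latest Complete run or None).

    Row keys are assumptions; adjust to the real wire shape.
    """
    counts = Counter(row["status"] for row in rows)
    complete = [row for row in rows if row["status"] == "Complete"]
    latest = max(complete, key=lambda row: row["started_at"], default=None)
    return dict(counts), (latest["run_id"] if latest else None)


def baseline_intact(rows):
    """True when the run-history corpus still has the documented 3/7/2 split."""
    counts, _ = summarize_runs(rows)
    return counts == EXPECTED_BASELINE
```

Each harness-triggered run adds a row to pfill_run_history, so a check like this is only meaningful as a pre-run guard against an unexpectedly changed environment.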

Companion references

| Doc | Purpose |
| --- | --- |
| docs-site/docs/specs/powerfill-engine.md §Phased Implementation + §Algorithm + §Data Contracts + §Run APIs | Authoritative scope + algorithm semantics + endpoint contracts |
| docs-site/docs/handoffs/powerfill-phase-8-w2-completion.md | W2 completion report; precedent for Phase 9 completion-report shape (especially §Capability × Environment matrix) |
| docs-site/docs/handoffs/powerfill-a54-fix-greg-demo-readiness.md | The PO-facing demo narrative; Phase 9's harness output slots in as the "loan-by-loan parity proof" addition |
| docs-site/docs/handoffs/powerfill-phase-7-completion.md | Phase 7 endpoint contracts (the 8 GET endpoints the harness reads) + ADR-025 |
| docs-site/docs/adr/adr-021-powerfill-port-strategy.md | Verbatim-port discipline + §Narrow Bug-Fix Carve-Out (the A54 fix is the canonical first instance; future Phase-9-surfaced legacy bugs follow the same pattern) |
| docs-site/docs/legacy/powerfill-deep-dive.md | Legacy plugin reverse-engineering; the Desktop App side context the harness invocation must respect |
| src/backend/PowerSeller.SaaS.Modules.PowerFill/Sql/008_CreateAllocationProcedure.sql + 009_CreatePoolGuideProcedure.sql + 011_CreatePowerFillUeProcedure.sql | The exact proc bodies PSSaaS deploys; the Desktop-App-equivalent invocation (Option A) reads from the SAME bodies on PS_DemoData (they were installed there by the PSSaaS deployment) |
| src/backend/PowerSeller.SaaS.Modules.PowerFill/Contracts/ReportContracts.cs + RunContracts.cs | Phase 7 + 6e wire-shape source of truth (snake_case JSON properties; harness mirrors these) |
| src/backend/PowerSeller.SaaS.Modules.PowerFill/Endpoints/RunEndpoints.cs | The 12 endpoint contracts the harness invokes (4 from 6e + 8 from 7) |
| infra/azure/k8s/pssaas-staging/services.yaml | Kubernetes deployment manifest; Path γ-amended with Tenants__ps-demodata__ConnectionString; reference for the convention Phase 9's harness must respect |
| docker-compose.override.yml.example + docker-compose.override.yml (gitignored) | Local-dev tenant-slot wiring; Phase 9's harness may run locally against the same PS_DemoData via this path |

Deliverables

When Phase 9 is complete, the Collaborator and PO should be able to verify each without trusting your word:

  1. Code commits — atomic, logically grouped. DO NOT push — the PO pushes; you git add and git commit only.
  2. tools/parallel-validation/ (or similar; Architect's directory choice documented in ADR-027) containing the harness implementation.
  3. Harness configuration file — declarative input snapshot + tolerance bands + comparison thresholds.
  4. First end-to-end harness run output — Markdown or HTML report at docs-site/docs/devlog/2026-04-XX-powerfill-phase-9-first-validation-run.md (or similar location) showing PSSaaS-vs-Desktop-App-equivalent on PS_DemoData with per-loan verdict + per-run summary.
  5. Demo-asset slot-in — the report's top-level summary line + selected per-loan-divergence detail formatted to slot into powerfill-a54-fix-greg-demo-readiness.md as the "loan-by-loan parity proof" addition. Could be an amendment to that doc OR a sibling doc the PO weaves in.
  6. Sentinel bump to phase-9-validation-ready.
  7. NEW ADR-027 — Phase 9 Parallel Validation Harness Design documenting the Alternatives-First Gate decisions (invocation path / language / report format).
  8. Spec amendment marking Phase 9 (validation harness scope) DONE; cutover scope deferred to Phase 10+.
  9. Assumption log A69+ for new Phase 9 findings.
  10. A62 closure (if Phase 9 takes the recommended pfillv2_* rename path; mark Backlog #24 done).
  11. A67 closure (XML doc-comment fix; mark in assumptions log).
  12. docs-site/docs/handoffs/powerfill-phase-9-completion.md — W2 completion-report format + Capability × Environment matrix per practice #13.
  13. Devlog entry at docs-site/docs/devlog/2026-04-XX-powerfill-phase-9.md.
  14. Pre-push docs-build check if any new docs-site/docs/** files.
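
To make deliverable 3 concrete, here is one possible shape for the declarative harness configuration, sketched as a frozen dataclass that validates a parsed mapping (for example the result of yaml.safe_load). Every key name, the nesting into snapshot and comparison, and the idea of a divergent-loan limit are assumptions for illustration; the real shape is an ADR-027 decision.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Mapping


@dataclass(frozen=True)
class HarnessConfig:
    """Hypothetical config: input snapshot plus tolerance band and threshold."""

    tenant_id: str
    database: str
    baseline_run_id: str
    amount_tolerance: Decimal
    max_divergent_loans: int

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "HarnessConfig":
        # Go through str() so a YAML float like 0.01 becomes an exact Decimal.
        tolerance = Decimal(str(raw["comparison"]["amount_tolerance"]))
        limit = int(raw["comparison"]["max_divergent_loans"])
        if tolerance < 0 or limit < 0:
            raise ValueError("tolerance and divergence limit must be non-negative")
        snap = raw["snapshot"]
        return cls(
            tenant_id=snap["tenant_id"],
            database=snap["database"],
            baseline_run_id=snap["baseline_run_id"],
            amount_tolerance=tolerance,
            max_divergent_loans=limit,
        )
```

Keeping the snapshot identity (tenant, database, baseline run) next to the comparison thresholds means a verdict report can quote the exact configuration it was produced under.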

Reporting protocol

Use the standard Architect Report format when you're done: what was produced / decisions / assumptions / open questions / recommended next steps / process notes.

If the harness reveals a real PSSaaS-vs-Desktop-App divergence beyond rounding tolerance, STOP and surface it as A69+. The disposition is the PO's call (could be a PSSaaS bug to fix; could be a legacy-bug carve-out per ADR-021's pattern; could be a Tom-or-Greg consultation).
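
The "beyond rounding tolerance" test can be sketched as a per-loan comparison. The example assumes each side has been reduced to a mapping of loan id to a (trade id, allocated amount) pair; that shape, the function name, and the reason strings are illustrative only and do not reflect the actual pfill_* schema.

```python
from decimal import Decimal


def compare_allocations(pssaas, legacy, tolerance):
    """Return {loan_id: reason} for every loan that differs beyond tolerance.

    Inputs map loan_id -> (trade_id, amount as Decimal); hypothetical shape.
    """
    divergent = {}
    for loan_id in sorted(pssaas.keys() | legacy.keys()):
        if loan_id not in pssaas:
            divergent[loan_id] = "missing in PSSaaS"
        elif loan_id not in legacy:
            divergent[loan_id] = "missing in legacy"
        else:
            p_trade, p_amount = pssaas[loan_id]
            l_trade, l_amount = legacy[loan_id]
            if p_trade != l_trade:
                divergent[loan_id] = f"trade differs: {p_trade} vs {l_trade}"
            elif abs(p_amount - l_amount) > tolerance:
                divergent[loan_id] = f"amount differs: {p_amount} vs {l_amount}"
    return divergent
```

An empty result means every loan matched within tolerance. Any non-empty result is the signal to stop and log an A69+ entry rather than widen the tolerance until the diff disappears.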

If the chosen invocation path (Option A / B / C) encounters an unanticipated constraint — STOP and surface, don't paper over.

If a multi-day session reaches a natural pause point with partial Phase 9 completion (e.g. harness scaffold + Option A invocation works but verdict logic incomplete), that's fine — write a handoff so the next Architect session resumes cleanly.

The PO milestone for this phase: "I have empirical loan-by-loan evidence that PSSaaS PowerFill matches the legacy Desktop App on PS_DemoData." Achievable when the harness's first comparison run completes with a verdict report. Subsequent customer-DB validation runs are operator-driven post-Phase-9.

What success looks like

  • Harness scaffold exists and is invokable from a non-Architect machine (Collaborator-side reproducibility check passes)
  • First end-to-end comparison run on PS_DemoData produces a verdict report
  • Verdict report's top-level summary slot-fits into the existing PO Greg-demo narrative
  • A66 expected-empty case is correctly classified as Match (NOT Divergent) — empirically demonstrates the verdict logic is A66-aware
  • Sentinel reflects phase-9-validation-ready
  • ADR-027 documents the harness-design choices
  • A62 + A67 closure landed (or explicit defer with rationale)
  • Capability × Environment matrix in completion report explicitly distinguishes "PS_DemoData verified; future customer DBs NOT MEASURED HERE pending operator-driven runs"
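
The A66-aware criterion above can be illustrated with a small table-level classifier. It treats "both sides empty after a Complete run" as the expected A66 state rather than as missing data. The Inconclusive verdict for non-Complete runs and the row representation are assumptions of this sketch, not behaviour specified by the spec.

```python
from enum import Enum


class Verdict(Enum):
    MATCH = "Match"
    DIVERGENT = "Divergent"
    INCONCLUSIVE = "Inconclusive"  # assumption: not a verdict named in the spec


def classify_table(pssaas_rows, legacy_rows, run_status):
    """Classify one pfill_* table given both sides' rows and the run status."""
    if run_status != "Complete":
        # A Failed or Cancelled run leaves tables in an undefined state.
        return Verdict.INCONCLUSIVE
    if not pssaas_rows and not legacy_rows:
        # A66: tables are rebuilt empty by the UE step, by design.
        return Verdict.MATCH
    # Order-insensitive comparison of row tuples.
    same = sorted(pssaas_rows) == sorted(legacy_rows)
    return Verdict.MATCH if same else Verdict.DIVERGENT
```

The explicit empty-state branch guards against a harness that requires at least one row before it will report a match, which would wrongly flag every post-Complete PS_DemoData table.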

Begin when ready. Local environment + staging environment are both fully wired; PS_DemoData has 12 historical runs (3 Complete + 7 Failed + 2 Cancelled); the 12 PowerFill endpoints (4 run-mgmt + 8 reports) are live; the operator React UI is live at https://pssaas.staging.powerseller.com/app/ for click-through verification; A54 is RESOLVED so end-to-end Complete-runs are reproducible.

Reminder: Opus 4.7 High Thinking. Verify model in picker before sending your first response. Do NOT push.