ADR-028: Phase 9 Parallel Validation Harness Design

Status

Proposed (2026-04-20) — drafted as part of the Phase 9 dispatch per the kickoff at powerfill-phase-9-kickoff. Records three architectural decisions surfaced via Alternatives-First Gate during Phase 9 planning + a fourth framing decision (Frame D Hybrid) surfaced via Andon-cord pre-plan exchange between the Architect and the PO.

Note on ADR number assignment (renumbered 2026-04-20 in commit batch): the Phase 9 kickoff initially mentioned ADR-027 as the Phase 9 harness ADR slot, but during the Phase 9 Architect dispatch window the Collaborator independently authored ADR-027 (Superset Embedding Strategy) in commit ece500e (banking the PSX Collaborator's reply to the embedding-pattern relay; landed before this Phase 9 ADR was committed). Per the canonical "ADRs are numbered sequentially and never renumbered" rule, the first-committed ADR-027 (Superset Embedding) keeps its number; this Phase 9 harness design ADR took ADR-028 instead. The Phase 9 kickoff document's reference to "NEW ADR-027" is now historical; the actual Phase 9 ADR is ADR-028. The Phase 8.5 PSSaaS Auth Strategy ADR (originally projected as ADR-028) folds into the Collaborator's ADR-027 per the framing decision documented there.

Context

The PSSaaS migration thesis is "chunk-by-chunk extraction from the Desktop App, with each chunk verifiably matching legacy behavior loan-by-loan against real customer data." Phase 9 ships the empirical proof of that thesis for chunk #1 (PowerFill). Without a parallel-validation surface, the Greg-demo claim is "look at our cool new UI"; with it, the claim is "here is empirical evidence the chunk-extraction model works."

Phase 9's harness ships:

A reproducible tool that runs PSSaaS PowerFill alongside a Desktop-App-equivalent invocation against the same tenant DB and produces a per-loan-allocation diff report with a verdict per loan (Match / TolerableDiff / Divergent / Incomparable) and a per-run summary stat.
A first comparison run against PS_DemoData with a documented input snapshot, producing a comparison report demonstrating the harness works.
A demo asset: the harness's first-run output report formatted to slot into the existing PO-written Greg-demo "Bug as Feature" narrative as the loan-by-loan parity-proof addition.
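
The per-loan verdicts named above can be sketched roughly as follows. This is a minimal illustration, not the harness's actual diff engine: the `classify_loan` and `summarize` names, the `{loan_id: value}` input shape, the 0.01 tolerance, and the rule that a loan missing from one side is Incomparable are all assumptions made for the example.

```python
from collections import Counter
from decimal import Decimal
from enum import Enum


class Verdict(Enum):
    MATCH = "Match"
    TOLERABLE_DIFF = "TolerableDiff"
    DIVERGENT = "Divergent"
    INCOMPARABLE = "Incomparable"


# Assumed rounding tolerance; the real harness may use a different value.
TOLERANCE = Decimal("0.01")


def classify_loan(pssaas_value, legacy_value, tolerance=TOLERANCE):
    """Return a Verdict for one loan given the value each path produced."""
    # Assumption: a loan missing from either side cannot be compared.
    if pssaas_value is None or legacy_value is None:
        return Verdict.INCOMPARABLE
    delta = abs(Decimal(pssaas_value) - Decimal(legacy_value))
    if delta == 0:
        return Verdict.MATCH
    if delta <= tolerance:
        return Verdict.TOLERABLE_DIFF
    return Verdict.DIVERGENT


def summarize(pssaas_rows, legacy_rows):
    """Compare two {loan_id: value} dicts; return verdicts and per-run counts."""
    verdicts = {}
    for loan_id in sorted(set(pssaas_rows) | set(legacy_rows)):
        verdicts[loan_id] = classify_loan(
            pssaas_rows.get(loan_id), legacy_rows.get(loan_id)
        )
    return verdicts, Counter(v.value for v in verdicts.values())
```

Passing values as strings (for example `"100.005"`) keeps the `Decimal` arithmetic exact, which matters when the tolerance itself is a rounding threshold.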

A framing decision (D-9-0) and four decisions about how the harness is built (D-9-1 through D-9-4) drive the architecture.

Decision

Framing Decision (D-9-0): Frame D Hybrid (PO-confirmed pre-plan)

Adopted: Frame D Hybrid.

Per the Andon-cord pre-plan exchange between the Architect and the PO (captured in the Phase 9 plan §"Framing locked"), Phase 9's first comparison run on PS_DemoData proves orchestration parity (PSSaaS's C# orchestration of the SQL procs produces row-equivalent outputs to direct sqlcmd EXEC of the same procs against the same DB). It does NOT prove legacy-vs-fixed-body parity because the ADR-021 §Narrow Bug-Fix Carve-Out is forward-only — the A54-fixed psp_powerfill_pool_guide body is deployed to PS_DemoData; both invocation paths execute the same fixed body; the legacy unmodified body deterministically Fails on PS_DemoData (the canonical "Bug as Feature" demo signal).

The Capability × Environment matrix in every harness output report explicitly carries cells like "Legacy unmodified proc body vs PSSaaS-fixed proc body parity: NOT MEASURABLE HERE — pending customer-rep approval" (Capability Inflation countermeasure per canonical practice #13).
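
One way to picture such a matrix is as a nested mapping rendered to a Markdown table. The sketch below is illustrative only: the `render_matrix` name, the "Measured" and "Not yet run" labels, and the abbreviated "NOT MEASURABLE HERE" cell text are assumptions, not the report template's actual contents.

```python
# Hypothetical matrix contents; only the capability wording of the second
# row and the "Customer DB" column name are taken from this ADR.
ENVIRONMENTS = ["PS_DemoData", "Customer DB"]

MATRIX = {
    "Orchestration parity (PSSaaS vs direct EXEC)": {
        "PS_DemoData": "Measured",
        "Customer DB": "Not yet run",
    },
    "Legacy unmodified proc body vs PSSaaS-fixed proc body parity": {
        "PS_DemoData": "NOT MEASURABLE HERE",
        "Customer DB": "Not yet run",
    },
}


def render_matrix(matrix, environments):
    """Render a capability-by-environment mapping as a Markdown table."""
    header = "| Capability | " + " | ".join(environments) + " |"
    divider = "|" + " --- |" * (len(environments) + 1)
    lines = [header, divider]
    for capability, cells in matrix.items():
        # A missing cell is rendered as an explicit gap, never as a silent pass.
        row = [cells.get(env, "NOT MEASURED") for env in environments]
        lines.append("| " + capability + " | " + " | ".join(row) + " |")
    return "\n".join(lines)
```

Defaulting absent cells to an explicit "NOT MEASURED" string keeps an unfilled cell from reading as an implied pass.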

Alternatives considered + rejected:

Frame A (orchestration parity, sole framing) — rejected as sole framing because packaging it as "loan-by-loan parity vs Desktop App" in a Greg-demo slide titled "loan-by-loan parity proof against real customer data" would be a Capability Inflation instance (the canonical example named in process-discipline.md).
Frame B (defer to customer-DB run) — rejected because Phase 9's kickoff explicitly defers multi-customer-DB sweeps to operator-driven post-Phase-9 work; choosing B re-scopes Phase 9 to the wrong phase.
Frame C (snapshot pre/post-fix on PS_DemoData) — rejected because it re-tells the A54 fix story already told in powerfill-a54-fix-greg-demo-readiness.md, conflating "the fix works" with "PSSaaS produces the right answer per loan".

Decision 1 (D-9-1): Desktop App equivalent invocation path

Chosen: Option A — Direct sqlcmd EXEC of psp_pfill_bx_settle_and_price, psp_powerfill_conset, psp_powerfill_pool_guide, and psp_powerfillUE (plus optionally psp_pfill_bx_cash_grids if bx_price_floor is set per A12) against the target tenant DB via pyodbc. Skips the PowerBuilder front-end entirely; matches the legacy ADR-021 verbatim-port discipline (the procs ARE the canonical contract per A1's Banking note).

Alternatives:

Option B — PowerBuilder headless invocation — rejected. Higher friction; uncertain headless support; no available test surface; re-introduces a PB dependency the modular-monolith ADR-004 + ADR-021 explicitly avoid.
Option C — Snapshot-then-compare — rejected for first-run instance. Trades headless-invocation difficulty for human-in-the-loop overhead. Documented as a future extension in the §"Future considerations" section below.

Note on parameter mapping fidelity: the harness's SqlcmdInvoker._resolve_six_params mirrors PowerFillRunService.cs lines 389-394 + 549-554 byte-for-byte, including the cl/co scope mapping and the pc/po price-mode mapping per A40 + F-6d-5. UE takes the same 6 parameters as conset.
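
The direct-EXEC path chosen in Option A can be sketched as below. This is a hedged illustration, not the harness's `SqlcmdInvoker`: the `plan_procs` and `exec_procs` names are hypothetical, the execution order is assumed to follow the order in which the procs are listed above, the position of the optional `psp_pfill_bx_cash_grids` step is an assumption, and the six-parameter mapping is deliberately not reproduced (callers supply positional arguments themselves).

```python
PROC_SEQUENCE = (
    "psp_pfill_bx_settle_and_price",
    "psp_powerfill_conset",
    "psp_powerfill_pool_guide",
    "psp_powerfillUE",
)
OPTIONAL_CASH_GRIDS = "psp_pfill_bx_cash_grids"


def plan_procs(bx_price_floor=None):
    """Return the proc names to EXEC, in order, for one legacy-side run."""
    procs = list(PROC_SEQUENCE)
    if bx_price_floor is not None:
        # Assumed position: directly after settle-and-price.
        procs.insert(1, OPTIONAL_CASH_GRIDS)
    return procs


def exec_procs(cursor, procs, params_by_proc=None):
    """EXEC each proc on a DB-API cursor, such as one obtained from pyodbc.

    params_by_proc maps a proc name to its positional arguments; procs
    without an entry are executed with no arguments.
    """
    params_by_proc = params_by_proc or {}
    executed = []
    for proc in procs:
        args = tuple(params_by_proc.get(proc, ()))
        # pyodbc uses "?" as its positional parameter marker.
        placeholders = ", ".join("?" for _ in args)
        sql = f"EXEC {proc} {placeholders}".strip()
        if args:
            cursor.execute(sql, args)
        else:
            cursor.execute(sql)
        executed.append(sql)
    return executed
```

Separating the plan from the execution lets the sequence be unit-tested against a stub cursor without a database connection.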

Decision 2 (D-9-2): Harness implementation language

Chosen: Python 3.10+ in WSL Ubuntu with the pyodbc + requests + PyYAML + Jinja2 stack. Kickoff §"Tooling" line 180 explicitly recommends this combination; Architect-side fluency is high; the dev-environment install path requires msodbcsql18 + unixodbc-dev + python3-pip from the Microsoft apt repo, captured in the harness's README.md §"Prerequisites".

Alternatives:

.NET console app — rejected. The PSSaaS API contract types could be reused via project reference, but the diff-rendering ergonomics are weaker; would create a fourth deploy artifact (after api, docs, frontend).
PowerShell — rejected. Cross-platform constraints (the kickoff anticipates the harness must work in WSL Ubuntu); JSON manipulation ergonomics weaker than Python.
TypeScript — rejected. Would re-use the React UI's existing wire-shape types but would re-introduce a Node-runtime dependency the Option L runtime constraint (see D-9-4 below) explicitly avoids.

Decision 3 (D-9-3): Verdict-rendering output format

Chosen: Markdown rendered via Jinja2 from tools/parallel-validation/templates/comparison_report.md.j2. The output artifact lives at docs-site/docs/devlog/2026-04-20-powerfill-phase-9-first-validation-run.md (Docusaurus-rendered; PO can paste sections into the Greg-demo deck or screenshot for slides). Per kickoff §"Demo asset" the artifact must "slot into the existing PO-written powerfill-a54-fix-greg-demo-readiness.md 'Bug as Feature' demo narrative" — Markdown is the same format that doc uses.

Alternatives:

HTML — rejected for first instance. Heavier toolchain; less natural for the docs-site source-of-truth integration.
JSON-with-static-renderer — rejected. Premature for a 1-shot first-run output; can be added as a sibling Jinja2 template if a future phase needs programmatic consumption.

Decision 4 (D-9-4): Runtime location for the harness binary (Option L)

Chosen: Option L — Local-only. The harness runs in WSL Ubuntu, invokes the local pssaas-api container (http://pssaas.powerseller.local/api/powerfill) for the PSSaaS side and direct sqlcmd to the PS_DemoData public endpoint for the legacy-equivalent side. OIDC sidecar planned-for, NOT built this session.

Alternatives:

Option S — Local harness against staging API + private endpoint — rejected. Requires the Architect's WSL to reach the SQL MI private endpoint (which is reachable from AKS via VNet peering, not from a dev workstation). Mixing API-against-staging + sqlcmd-against-public-endpoint creates a configuration discrepancy the harness has to encode. Architect-vs-Infra ownership ambiguity per architect-context.md "Infrastructure operations escalate to the Collaborator/PSX Infra Agent".
Option K — Containerized K8s Job — rejected. +1 sub-session of plumbing (image build, GHCR push, K8s manifest, GHA path-filter, RBAC) before any harness-as-tool work happens. Phase 9's PO milestone is "loan-by-loan parity evidence on PS_DemoData" — building a Job-deployable harness produces zero additional evidence over Option L for the first comparison run; it just changes WHO can run it WHERE. Kept as a future-extension if post-Phase-9 customer-DB runs reveal local-only doesn't generalize.

Demo-vs-runtime dynamic (PO-clarified during Q1 follow-up): the Greg demo lives on staging React UI; the harness output artifact's clickable run-status URLs point at staging via the harness_config.yaml :: report.pssaas_ui_base_url indirection (default https://pssaas.staging.powerseller.com). The harness binary runs locally; its output references staging surfaces. PSSaaS's API path is identical on local + staging (same proc body via SQL MI), so the parity claim is environment-independent.
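
The config indirection described above might look like the following sketch. The `run_status_url` name and the `/powerfill/runs/<run_id>` route are hypothetical; only the `report.pssaas_ui_base_url` key and its default value come from this ADR. The `config` argument stands in for the parsed contents of harness_config.yaml.

```python
from urllib.parse import quote

DEFAULT_UI_BASE_URL = "https://pssaas.staging.powerseller.com"


def run_status_url(config, run_id):
    """Build a clickable run-status link from the parsed harness config."""
    base = config.get("report", {}).get("pssaas_ui_base_url", DEFAULT_UI_BASE_URL)
    # Hypothetical route; the real UI path is not specified in this ADR.
    return f"{base.rstrip('/')}/powerfill/runs/{quote(str(run_id), safe='')}"
```

Because every link is derived from one config value, changing that value redirects all links in future reports while already-rendered reports stay untouched.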

Consequences

Positive

Zero new platform infrastructure. The harness reuses the existing local-dev pssaas-api + the PS_DemoData public endpoint. No new GHCR images, no new K8s manifests, no new GHA workflows, no new secrets.
Honest about what's measured. Frame D Hybrid + the Capability × Environment matrix per practice #13 prevent Capability Inflation in the demo-asset framing. The harness's own self-test exercises the load-bearing semantics (A66-aware Match; Asymmetric-failure Incomparable) at unit level.
Surfaces real findings. The first comparison run surfaced A69 (state-dependent UE failure on non-empty post-pool_guide state on PS_DemoData) — exactly the class of finding the Phase 9 harness was built to surface. The harness earned its Phase 9 charter on its very first run.
Reproducible. tools/parallel-validation/README.md documents the install + invocation pattern; a non-Architect machine can reproduce by docker compose --profile dev up + python harness.py.

Negative

Sqlcmd-direct path skips the C#-side PowerFillCandidateBuilder pre-step. That step writes diagnostic counters into PSSaaS's RunSummary (constraint_count / candidate_count / etc.) but does NOT populate pfill_loan2trade_candy_level_01 directly — psp_powerfill_conset rebuilds that table itself per 008_CreateAllocationProcedure.sql lines 1300-1301. So the proc-body output is symmetric across both invocation paths; only the C#-side counter set is asymmetric. The harness's verdict logic doesn't rely on those counters.
OIDC integration is not yet wired. When Phase 8.5 lands per Backlog #31, the harness's HTTP client signatures will need a bearer-token argument. The function shapes already accept an optional bearer_token-style parameter for forward compatibility; wiring is a future commit.
PSX Infra ownership of msodbcsql18 + ODBC Driver 18 install on any non-Architect machine. The harness's README.md §Prerequisites documents the apt sequence, but per architect-context.md infrastructure operations escalate to the Collaborator/PSX Infra Agent. Banking observation: F-W2-TOOLING-1 was Node-on-WSL; this ADR's tooling-prereqs is the analogous case for pyodbc.
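
The forward-compatible function shape mentioned in the OIDC item above could be sketched as follows. The `build_headers` name and the `Accept` header are assumptions for illustration; the `X-Tenant-Id: ps-demodata` header is the one this ADR says the harness sends.

```python
def build_headers(tenant_id="ps-demodata", bearer_token=None):
    """Return request headers; Authorization is added only when a token exists."""
    headers = {"X-Tenant-Id": tenant_id, "Accept": "application/json"}
    if bearer_token:
        # Not wired today; becomes relevant once Phase 8.5 OIDC lands.
        headers["Authorization"] = f"Bearer {bearer_token}"
    return headers
```

The resulting dict can be passed as the `headers=` argument of a `requests.get` call, so callers that have no token yet keep working unchanged.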

Risks and Mitigations

Risk	Mitigation
Harness's first comparison run reveals real PSSaaS-vs-sqlcmd-direct divergence beyond rounding tolerance	STOP and surface as A69+ per kickoff §"Reporting protocol". Already exercised on Phase 9's first run (A69 banked).
Harness's A66-aware verdict logic mis-classifies a buggy-empty case as Match	Diff-engine self-test Cases 1, 6, 7 cover the load-bearing semantics: A66 happy-path, Failed-PSSaaS, Asymmetric-Failed. Verdict logic adjusted post-first-run to suppress A66 when EITHER side Failed.
Sqlcmd-direct path overwrites `pfill_*` tables PSSaaS just wrote, racing if the harness invokes them in parallel	Harness invokes them sequentially (PSSaaS first; reads HTTP reports into in-memory dicts; THEN sqlcmd-direct EXECs). PSSaaS's data is materialized before any sqlcmd write.
Harness's clickable run-status URLs point at staging via `pssaas_ui_base_url` config indirection; if Backlog #30's Superset migration happens to change the staging URL, the report links break	Config indirection lives in `harness_config.yaml`; one-line update flips all URLs in future runs. Already-rendered reports remain historical artifacts.
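
The A66-aware rule and its post-first-run adjustment (suppress A66 when either side Failed) can be sketched as a small guard that runs before any row-level diff. This is an assumption-laden illustration: the `run_level_verdict` name, the `"Failed"` status string, and the choice to return Incomparable for any failed side are not confirmed by this ADR beyond the self-test case names cited above.

```python
def run_level_verdict(pssaas_status, legacy_status, pssaas_rows, legacy_rows):
    """Return a run-level verdict, or None when a row-by-row diff is needed."""
    # Failure on EITHER side suppresses the A66 rule: an empty result from
    # a failed run is not evidence of parity.
    if pssaas_status == "Failed" or legacy_status == "Failed":
        return "Incomparable"
    # A66: both paths legitimately produce an empty result set.
    if not pssaas_rows and not legacy_rows:
        return "Match"
    # Otherwise defer to the per-loan comparison.
    return None
```

Checking the failure condition first is what prevents a buggy-empty result from being classified as a Match.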

Related ADRs

ADR-021: PowerFill Port Strategy — verbatim-port discipline + the §Narrow Bug-Fix Carve-Out that forward-only-deploys A54 fixes; the canonical reference for what Frame D's "same fixed proc body" claim means.
ADR-022: PowerFill Allocation Algorithm — the iterative-passes algorithm whose per-stage semantics A1 documents and Phase 9's per-loan correctness validation is the gate for.
ADR-024: PowerFill Async Run Pattern — the BackgroundService + Channel pattern that's the PSSaaS-side invocation surface the harness reads from.
ADR-025: PowerFill Report API Pattern — the latest-Complete-wins semantics + freshness verdicts that the harness's HTTP-client layer needs to understand (see also A60 in the assumptions log).

Related Assumptions

A1 (per-stage allocation semantics) — Phase 9's per-loan correctness validation is the canonical gate per A1's Banking note.
A60 (latest-Complete-wins; ADR-025 reference).
A65 (multi-pa_key + per-loan settlement-date variance are two distinct A54 triggers; harness pre-flight could probe these on a customer DB; deferred to operator-driven post-Phase-9 sweep).
A66 (UE rebuild-empty on syn-trade-empty datasets like PS_DemoData) — encoded in the harness's verdict logic as the A66-aware Match rule.
A67 (ReportContracts.cs XML doc Truth Rot) — closed in this Phase 9 commit batch.
A68 (tenant_id-vs-config-slot conflation) — the harness uses X-Tenant-Id: ps-demodata to match the existing pfill_run_history row tags per A68's short-term Path γ disposition.
A69 (NEW this session) — state-dependent UE failure on non-empty post-pool_guide state surfaced by the harness's first run; banked for Greg/Tom consultation.

Future Considerations

Snapshot replay (Option C from D-9-1) — capture pre-run DB state + replay after each comparison to enable true side-by-side comparisons of two harness runs against the same input. Currently the harness relies on the proc bodies being deterministic given identical inputs.
Multi-tenant operator-driven sweep — point the harness at a customer DB (substituting connection string + tenant ID) for the post-Phase-9 validation runs the kickoff defers.
Phase 8.5 OIDC sidecar — when Keycloak lands per Backlog #31, wire bearer-token authentication into the harness's HTTP client.
HTML / JSON output formats — sibling Jinja2 templates if a future phase needs programmatic consumption beyond the Markdown v1.
Containerized harness (Option K from D-9-4) — defer until evidence emerges that local-only invocation doesn't generalize to a customer-DB scenario.

Revision Triggers

This ADR is revised when:

The Frame D framing is challenged by Greg/Tom in a way that makes the orchestration-equivalence claim insufficient as the demo signal.
The first non-PS_DemoData customer-DB validation run is performed (closes the Capability × Environment matrix's "Customer DB" column).
A second invocation-path option (B / C from D-9-1) becomes preferred over the direct-sqlcmd default.
The Phase 8.5 OIDC integration lands and the harness's HTTP client signature needs to bake bearer-token in by default.
