PowerFill Phase 9 — Completion Report (Parallel Validation Harness)

Author: PSSaaS Systems Architect Date: 2026-04-20 Status: Code complete; sentinel phase-9-validation-ready ✓; harness scaffold + first end-to-end comparison run + ADR-028 + A67 closure + A69/A70 banked + spec amendment + this completion report all written. Pending Collaborator review and PO push. Sentinel: phase-9-validation-ready ✓ (drops the -a54-fixed sub-suffix per kickoff §"Cross-cutting"; A54 closure is now historical, not gating) Companion docs:

Phase 9 kickoff: powerfill-phase-9-kickoff
Phase 9 first comparison run output: 2026-04-20-powerfill-phase-9-first-validation-run
ADR-028 (Phase 9 Parallel Validation Harness Design): adr-028-phase-9-parallel-validation-harness-design
A67 (closed this session): powerfill-assumptions-log §A67
A69 (NEW this session — Phase 9 first-run finding): powerfill-assumptions-log §A69
A70 (NEW this session — Frame D framing refinement): powerfill-assumptions-log §A70
A54-fix Greg-demo readiness narrative the harness output slots into: powerfill-a54-fix-greg-demo-readiness

TL;DR

Phase 9 ships the Parallel Validation Harness (the "lever" per the kickoff's framing) at tools/parallel-validation/, exercises it end-to-end against PS_DemoData, and produces the first comparison run's verdict report as the Greg-demo "loan-by-loan parity proof" addition.

Per the PO-confirmed Frame D Hybrid framing, the harness's first comparison run on PS_DemoData proves orchestration parity (PSSaaS's C# orchestration produces row-equivalent outputs to direct sqlcmd EXEC of the same procs against the same DB) WITHOUT inflating to "loan-by-loan parity vs the legacy unmodified Desktop App proc body" (which is NOT MEASURABLE on PS_DemoData per ADR-021 §Narrow Bug-Fix Carve-Out's forward-only nature; the legacy unmodified body deterministically Fails on this snapshot — the canonical "Bug as Feature" demo signal).

The first comparison run surfaced A69 — a state-dependent UE failure on non-empty post-pool_guide state on PS_DemoData. This is exactly the class of finding the Phase 9 harness was built to surface. The harness earned its Phase 9 charter on its very first run. A69 is banked for Greg/Tom consultation; the verdict logic was hardened post-first-run to honor the asymmetric-failure case via a new RowVerdict.INCOMPARABLE classification (Capability Inflation countermeasure verified empirically).

A second NEW assumption — A70 — was banked documenting the Frame D framing refinement: PS_DemoData has a mixed PSSaaS-deployed + legacy-encrypted proc-body state, so Frame D's "same fixed proc body" claim is more precisely "whatever proc bodies live on the target DB, both invocation paths execute against the same set." The Capability × Environment matrix already encoded the right level of honesty.

Phase 9 carry-over Workstream 2 closures: A67 closed (the 8-line XML doc-comment fix in ReportContracts.cs matching actual route registrations); A62 deferred to a new Backlog row per Q2 confirmed scope; canonical-promotion proposal for the Backlog re-read pass banked in §Counterfactual Retro for Collaborator nomination (4-instance corroboration check pending).

Sub-phase calendar time: ~1 Architect-session. Consistent with 6a-6e + Phase 7 + Phase 8 W1 + Phase 8 W2 + A54-fix velocity. Banking the pattern: a complete harness deliverable (Python scaffold + first end-to-end run + ADR + completion report + carry-over closures) is a 1-session unit at this scale when the PO confirms framing pre-plan and the architectural primitives are self-implemented.

What was produced

New Python harness tree (10 source files, ~1,500 LOC)

Path	Purpose
`tools/parallel-validation/README.md`	Reproducibility instructions + Frame D framing + A66 + dependency matrix
`tools/parallel-validation/requirements.txt`	Pinned deps (`pyodbc==5.1.0`, `requests==2.32.3`, `PyYAML==6.0.2`, `Jinja2==3.1.4`)
`tools/parallel-validation/harness_config.yaml`	Declarative config: PSSaaS API + sqlcmd connection params + tolerance bands + `pssaas_ui_base_url` indirection per Backlog #30
`tools/parallel-validation/install_venv.sh`	One-shot WSL Ubuntu venv setup script
`tools/parallel-validation/.env`	Gitignored — `PFILL_SQL_PASSWORD` (sidesteps shell-quoting hell with `$` in passwords)
`tools/parallel-validation/harness.py`	Main entry point + CLI + `.env` loader + sentinel-aware logging
`tools/parallel-validation/snapshot.py`	Input snapshot capture (reads `pfill_run_history` + computes state hash)
`tools/parallel-validation/pssaas_invoker.py`	`POST /run` + poll terminal status + read 8 Phase 7 reports (codes against ACTUAL routes per A67 closure)
`tools/parallel-validation/sqlcmd_invoker.py`	pyodbc EXECs of 5-step proc sequence (skips C#-side candidate_builder; conset rebuilds the table itself) + reads same 8 sources via direct SELECT
`tools/parallel-validation/diff_engine.py`	Umbrella + per-report row comparators + A66-aware verdict logic + `RowVerdict.INCOMPARABLE` for asymmetric-failure case + 7-case self-test
`tools/parallel-validation/report_renderer.py`	Jinja2 → Markdown; staging-URL composition via `pssaas_ui_base_url` + 1-case self-test
`tools/parallel-validation/templates/comparison_report.md.j2`	The Frame D Capability × Environment matrix template + per-section breakdown + Phase 9 finding A69 surfacing block
`tools/parallel-validation/check_db.py`	Dev-env pre-flight (one-shot pyodbc + `pfill_run_history` connectivity check)
`tools/parallel-validation/diagnose_a69.py`	A69 root-cause diagnostic: column existence + view shape + proc encryption state
`tools/parallel-validation/diagnose_a69_v2.py`	A69 v2 diagnostic: `v_loan_loan_shipped` + `loan_shipped` column verification (both have `note_rate`)
`tools/parallel-validation/diagnose_a69_v3.py`	A69 v3 diagnostic: UE-in-isolation success against post-rebuild-empty state

Modified backend files

src/backend/PowerSeller.SaaS.Modules.PowerFill/Contracts/ReportContracts.cs — A67 closure: 8 XML doc-comment paths re-aligned from /api/powerfill/runs/{run_id}/reports/<name> to /api/powerfill/runs/{run_id}/<name> matching actual RunEndpoints.cs route registrations.
src/backend/PowerSeller.SaaS.Modules.PowerFill/PowerFillModule.cs — sentinel bumped from phase-8-superset-react-ready-a54-fixed to phase-9-validation-ready (drops the -a54-fixed sub-suffix per kickoff §"Cross-cutting"; A54 closure is now historical).

New documentation

docs-site/docs/adr/adr-028-phase-9-parallel-validation-harness-design.md — full ADR documenting D-9-0 (Frame D Hybrid) + D-9-1 (Option A invocation) + D-9-2 (Python in WSL) + D-9-3 (Markdown output) + D-9-4 (Option L runtime).
docs-site/docs/devlog/2026-04-20-powerfill-phase-9-first-validation-run.md — first comparison run output (auto-rendered by the harness; committed verbatim as the Greg-demo asset).
docs-site/docs/devlog/2026-04-20-powerfill-phase-9.md — devlog entry per template-7 format.
docs-site/docs/handoffs/powerfill-phase-9-completion.md — this completion report.
docs-site/docs/specs/powerfill-engine.md — Phase 9 row in Phased Implementation table marks "Validation harness DONE + first comparison run DONE"; new Phase 10 row for production cutover.
docs-site/docs/specs/powerfill-assumptions-log.md — A69 (state-dependent UE failure on non-empty post-pool_guide state) + A70 (Frame D framing refinement; mixed PSSaaS/legacy proc-body state on PS_DemoData).
docs-site/docs/arc42/09-architecture-decisions.md — ADR-028 row added (Phase 9 harness; renumbered from initial ADR-027 due to a parallel Collaborator-authored ADR-027 landing in commit ece500e during the Phase 9 dispatch window — the canonical "ADRs are numbered sequentially and never renumbered" rule applies; first-committed wins). ADR-027 (Superset Embedding Strategy, Collaborator-authored) row preserved unchanged.
docs-site/docs/handoffs/pssaas-session-handoff.md — checkpoint banner + numbered list entry + Backlog table updates.

Out of scope (deliberately not produced)

A62 closure — deferred to its own Backlog row per Q2 confirmed scope (PS_DemoData view drift; pfillv_existng_pool_disposition 14-col legacy version vs PSSaaS's 15-col note_rate-bearing definition). The harness's first run surfaced A62 via the Existing-Disposition endpoint's catch-and-degrade Note as expected; no new defect introduced.
A69 root-cause investigation — banked for follow-up session; this session's deliverable is the harness + the empirical surfacing of A69 + the verdict-logic hardening to honor the finding without inflation.
Multi-customer-DB sweep — operator-driven post-Phase-9 per kickoff §"Explicit scope (OUT)". The harness as built is the lever; the pull is operator + customer-rep work.
Production cutover — Phase 10+ per kickoff. Spec line 651's "cutover" wording is now superseded.
Phase 8.5 OIDC integration — pssaas_invoker signatures already accept an optional bearer-token shape; wiring is a future commit when Backlog #31 lands.
Containerized harness (Option K) — kept as a future-extension per ADR-028 §Future Considerations.
Snapshot replay (Option C) — kept as a future-extension per ADR-028 §Future Considerations.
HTML / JSON output formats — sibling Jinja2 templates if a future phase needs programmatic consumption beyond the Markdown v1.
Canonical-promotion proposal for the Backlog re-read pass — banked in §Counterfactual Retro for Collaborator nomination (NOT Architect-shipped this session).

Decisions made

#	Decision	Rationale	Where
D-9-0	Frame D Hybrid (orchestration parity on PS_DemoData; Capability × Environment matrix carries explicit "NOT MEASURABLE HERE" cells for legacy-vs-fixed-body parity + Customer DB column)	Per Andon-cord pre-plan exchange; PO-confirmed. Capability Inflation countermeasure for the Greg-demo asset.	ADR-028 §D-9-0; this report TL;DR; comparison_report.md.j2
D-9-1	Option A — Direct sqlcmd EXEC of the 5-proc legacy-equivalent sequence	Lowest friction; matches ADR-021 verbatim-port discipline; Architect's existing pyodbc fluency	ADR-028 §D-9-1; sqlcmd_invoker.py
D-9-2	Python 3.10+ in WSL Ubuntu (pyodbc + requests + PyYAML + Jinja2)	Kickoff §"Tooling" recommendation; Architect fluency; ergonomic stack for the harness's 4 concerns	ADR-028 §D-9-2; requirements.txt
D-9-3	Markdown output via Jinja2 template; HTML/JSON deferred	Slots into the existing Greg-demo narrative format; renders in Docusaurus; PO can paste sections into the deck	ADR-028 §D-9-3; report_renderer.py + comparison_report.md.j2
D-9-4	Option L Local-only runtime (WSL + local pssaas-api + PS_DemoData public endpoint)	PO-confirmed; smallest testable increment; demo asset's content references staging via config indirection so the harness binary's location doesn't constrain demo surface	ADR-028 §D-9-4; harness_config.yaml
D-9-5	A69 honesty post-first-run: added `RowVerdict.INCOMPARABLE` + suppressed A66 Match classification when EITHER side Failed	Capability Inflation countermeasure; Andon-cord response per kickoff §"Reporting protocol" + PO's `fix_and_continue` confirmation	diff_engine.py + Case 7 self-test + this report's §"Counterfactual Retro"
D-9-6	Option `harness_plus_a67` carry-over scope: A67 closes this session; A62 deferred; canonical-promotion proposal banked for Collaborator	Kickoff anticipated A67 as the harness's natural forcing function (trivial 8-line edit); A62 closure has its own Verification-Gate surface; canonical-promotion goes through PSSaaS Collaborator nomination	This report; pssaas-session-handoff Backlog updates

Migrations enumerated

This phase ships 0 new schema migrations + 0 new SQL artifacts. The harness is read-only against pfill_run_history for snapshot capture

writes via the existing POST /api/powerfill/run HTTP endpoint (which uses the same Phase 6e migration surface). Sqlcmd-direct EXECs go against PSSaaS-deployed proc bodies that already live on PS_DemoData.

PowerFill-owned table count: 23 (unchanged from Phase 6e). PowerFill-owned proc count: 8 (unchanged from Phase 6d/A54-fix).

Gate findings

Three-layer Primary-Source Verification Gate (with Backlog re-read pass)

ID	Layer	Finding	Disposition
F-9-1	Spec-vs-implementation (Layer 1; A67 carry-over)	`ReportContracts.cs` lines 9-16 + 8 per-class XML doc-comment paths showed `/api/powerfill/runs/{run_id}/reports/<name>` segment that doesn't exist in `RunEndpoints.cs` route registrations.	(a) Corrected in place — A67 closure landed in this commit batch. 9 path strings re-aligned + a banking comment block added explaining the Truth Rot pattern + A67 reference.
F-9-2	NVO-vs-implementation (Layer 2)	ADR-021 §Narrow Bug-Fix Carve-Out is forward-only; A54-fixed `009_*.sql` proc body lives on PS_DemoData; both PSSaaS API path AND sqlcmd-direct EXEC of the same procs execute the SAME body. The kickoff's "Desktop App equivalent path" Option A produces orchestration-parity, NOT legacy-vs-fixed-body parity.	(b) Scope-changed → Frame D Hybrid (PO-confirmed pre-plan). Capability × Environment matrix in every harness output report encodes the distinction with explicit cells.
F-9-3	Implementation-vs-runtime (Layer 3; A66)	Per A66, post-Complete state on PS_DemoData has 0 rows in 7-of-8 user-facing reports + Cash Trade Slotting; only Hub dashboard run-history retains observable rows. The harness's verdict logic must classify "both PSSaaS path and sqlcmd path produce 0 rows" as Match, not Divergent.	(a) Encoded in harness verdict logic as the A66-aware comparison rule (`if both_empty AND pssaas_run_status==Complete AND sqlcmd_run_status==Complete: Match`). Self-test Cases 1, 6, 7 cover the load-bearing semantics including the Failed-side regression.
F-9-4	Implementation-vs-runtime (Layer 3; A68)	A68's tenant-id convention requires harness writes to use `tenant_id='ps-demodata'` matching the existing 13 historical rows.	(a) Encoded in harness invocation — `X-Tenant-Id: ps-demodata` header on all PSSaaS API calls; harness is otherwise read-only against `pfill_run_history`.
F-9-5 NEW	Implementation-vs-runtime (Layer 3; first-run finding)	A69 — `psp_powerfillUE` fires `SqlException 207 'Invalid column name note_rate'` when invoked via direct sqlcmd EXEC against PS_DemoData with non-empty post-pool_guide state. Not reproducible when UE invoked in isolation against post-PSSaaS-rebuild-empty state.	(b) Banked as A69 — first-run finding worth Greg/Tom consultation; harness's verdict logic hardened post-first-run to honor the asymmetric-failure case via `RowVerdict.INCOMPARABLE`.
F-9-6 NEW	Spec-vs-implementation (Layer 1; first-run finding)	A70 — PS_DemoData's `psp_powerfillue` + `psp_powerfill_conset` + 5 other procs are deployed `WITH ENCRYPTION`, indistinguishable from the legacy versions via `sys.objects`. Frame D's "same fixed proc body" claim is more precisely "whatever proc bodies live on the target DB."	(b) Banked as A70 — refines Frame D framing without invalidating it; matrix already encodes the right level of honesty. Production-cutover playbook (Phase 10+) needs explicit A70 disposition.

Pattern observation: the Backlog re-read pass at planning time produced 0 net-new actionable findings against the Backlog itself this session (Phase 7 F-7-7 anticipated; Phase 8 W1 F-8-BR-1 caught at planning; Phase 8 W2 F-W2-BR-3 + F-W2-CONTRACT-1 caught at planning; Phase 9 = no Backlog-rooted finding, only the harness's runtime-discovery findings F-9-5 + F-9-6). The 4-instance-corroboration check is therefore 3-instance corroborated + 1 instance where the Backlog was already fully addressed at planning time — pattern remains canonical-adoption- ready per Phase 8 W2 completion report's banking, with the qualifier that "0 net-new findings" is itself a positive signal (the kickoff did its job at the §"Live infrastructure note" + §"Inherited context" specificity level). Banking proposal for the next process-discipline revision: the canonical wording should accommodate the "0 findings is a pass not a fail" semantic.

Alternatives-First Gate

5 architectural decisions surfaced + documented per ADR-028 (3 from the kickoff's explicit list + Frame D framing decision + Option L runtime decision). All exercised pre-plan via the Architect-PO Andon-cord exchange + the PO's confirmation of recommendations.

Required Delegation Categories

0 subagents dispatched this session. Deliberate Non-Delegation per practice #9, mirroring the Phase 8 W2 pattern:

Deliberate Non-Delegation: Architectural primitives + load-bearing verdict logic
Task: harness.py + snapshot.py + pssaas_invoker.py + sqlcmd_invoker.py
  + diff_engine.py + report_renderer.py + templates/comparison_report.md.j2
  + check_db.py + diagnose_a69*.py + ADR-028 + completion report + devlog
  + spec amendment + assumption log A69 + A70 + sentinel bump + A67 closure
Reason for self-implementation: Frame D framing + A66/A69-aware verdict logic
  + Capability x Environment matrix verbiage are load-bearing for the Greg
  demo's honesty (Capability Inflation countermeasure). The architectural-
  contract-per-artifact cost (especially the matrix's "NOT MEASURABLE HERE"
  cell semantics for the legacy-vs-fixed-body row + the A69 surfacing) far
  exceeds the artifact-write cost. The first-run finding (A69) required
  same-author context to size the disposition correctly; a delegated subagent
  without prior PoC access could not have made the call.
Context that would be lost in handoff: Frame D rationale, Option L rationale,
  A66 distinction, A69 empirical-resolution arc (3 diagnose scripts +
  hypothesis space refinement), A70 framing-refinement, A67 forcing-function
  context, the staging-URL config indirection per Backlog #30, the post-
  first-run verdict-logic hardening (RowVerdict.INCOMPARABLE).

Banking observation: Phase 8 W2's 0-subagent outcome is now 2-instance corroborated (Phase 8 W2 + Phase 9). The "contract-per-artifact density high → self-implement" heuristic is empirically validated for sessions that ship architecturally load-bearing surfaces.

Reviewable Chunks at intra-session scope

3 Andon-cord checkpoints exercised:

Pre-plan (Frame D framing) — Architect surfaced the "Option A is comparing the SAME fixed proc body" concern; PO confirmed Frame D Hybrid as the resolution before any code.
Mid-implementation (Q1 Q2) — Architect surfaced runtime-location (Option L) + carry-over-scope (harness_plus_a67) decisions; PO confirmed both via AskQuestion.
Post-first-run (A69 surfacing) — harness produced asymmetric- failure outcome; Architect surfaced A69 + verdict-logic gap; PO confirmed fix_and_continue disposition; verdict logic hardened
- re-run produced honest Markdown output.

Pattern observation: Reviewable Chunks at sub-checkpoint boundaries (not just workstream boundaries) IS the right granularity for empirical- discovery sessions. The 3 Andon pulls were the load-bearing structure of this session; without them I would have shipped Frame A (Capability Inflation), or Option K (over-engineered runtime), or the misleading "0 Divergent / A66 Match" verdict report.

Deploy Verification Gate — 3 arms

Arm	Description	Evidence
(a) Sentinel signal	`PowerFillModule.cs` `MapGet("/status")` returns `phase-9-validation-ready`	One-line change verified post-edit; runtime probe deferred to Collaborator-side post-push (`curl -s http://pssaas.staging.powerseller.com/api/powerfill/status`).
(b) Harness self-tests + integration	`diff_engine.py` self-test 7/7 PASS (including new Case 7 asymmetric-Failed regression); `report_renderer.py` self-test 12/12 substring matches PASS; harness end-to-end run 3 consecutive PASSes producing identical A69 finding (deterministic)	See §"PoC verification commands and outputs" below.
(c) End-to-end click-through	First comparison run produces verdict report at `docs-site/docs/devlog/2026-04-20-powerfill-phase-9-first-validation-run.md`; report's clickable run-status URL points at staging React UI (live demo surface)	Verified via Read of the rendered output; staging URL composition validated via renderer self-test.

Environment-Explicit Inventory (per canonical practice #13)

This is the load-bearing Phase 9 application of practice #13. Adapted from the Phase 8 W2 completion report's matrix shape, with rows adjusted for the per-customer-DB cell semantics the kickoff specified.

Capability	Architect's local WSL + local pssaas-api + PS_DemoData (public endpoint)	Staging React UI + staging API + PS_DemoData (private endpoint)	Customer DB (PS608 / future tenants)
Harness scaffold builds + runs (`python harness.py`)	Verified ✓ this session (3 consecutive runs; identical A69 reproduction)	NOT MEASURED HERE — Option L is local-only	NOT MEASURED HERE — operator-driven post-Phase-9
Diff engine self-test 7/7 PASS	Verified ✓ this session	N/A	N/A
Report renderer self-test 12/12 PASS	Verified ✓ this session	N/A	N/A
First end-to-end comparison run produces verdict report Markdown	Verified ✓ this session	NOT MEASURED HERE	NOT MEASURED HERE
Verdict logic correctly suppresses A66 Match when sqlcmd Failed (asymmetric-failure case)	Verified ✓ this session (Case 7 self-test + empirical first-run output)	NOT MEASURED HERE	NOT MEASURED HERE
Orchestration equivalence on PS_DemoData (PSSaaS API path vs sqlcmd-direct path against same proc body set)	Phase 9 Andon — A69 surfaced; orchestration-equivalence claim NOT VERIFIED on this run because sqlcmd-direct path Failed mid-EXEC. The harness's STOP-and-surface protocol fired correctly. Next-run verification depends on A69 root-cause + fix.	NOT MEASURED HERE	NOT MEASURED HERE
A66 expected-empty Complete-run case classified as Match	Verified ✓ at unit-test level (Case 1); NOT EXERCISED at runtime level on this run (A69 short-circuited the A66 path). Next harness run on a state where both sides reach Complete will be the runtime-level verification.	NOT MEASURED HERE	NOT MEASURED HERE
A62 PS_DemoData view drift catch-and-degrade observable from harness	Verified ✓ this session (Note text in Existing-Disposition response surfaced in PSSaaS run output)	NOT MEASURED HERE	NOT MEASURED HERE
Legacy unmodified proc body vs PSSaaS-fixed proc body parity	NOT MEASURABLE HERE — A54 fix is forward-only and deployed to PS_DemoData; legacy unmodified body deterministically Fails per A54+A56	NOT MEASURABLE HERE — same DB, same proc body	NOT MEASURED HERE pending customer-rep approval — operator-driven post-Phase-9 sweep against customer DBs without A54 triggers
Whether PSSaaS-deployed-with-encryption procs on PS_DemoData are PSSaaS's port vs the legacy version (A70)	NOT DISTINGUISHABLE via `sys.objects` for the 7 WITH ENCRYPTION procs; only `psp_powerfill_pool_guide` (plain) is definitely PSSaaS-deployed. Behavior diff per proc would resolve.	NOT MEASURED HERE	NOT MEASURED HERE
Sentinel `phase-9-validation-ready` returned by API	NOT YET MEASURED (sentinel bumped in code; runtime probe is post-restart of local pssaas-api)	NOT MEASURED HERE — Collaborator-side post-PO-push	NOT MEASURED HERE
Staging React UI run-status page click-through validates harness-cited run_ids	NOT MEASURED HERE in harness session	Verified ✓ via post-W2-deploy banked verification + first-run report's clickable URLs	NOT MEASURED HERE
Harness invocable from non-Architect machine (Collaborator-side reproducibility)	NOT MEASURED HERE this session — `README.md` documents the install + invocation pattern; Collaborator runs `docker compose --profile dev up + python harness.py` to verify	N/A	N/A
Pre-push docs-build check passes	NOT YET MEASURED — pre-commit step	N/A	N/A

Net assessment: the Phase 9 deliverable is code-complete at the artifact level. Runtime end-to-end verification at staging + customer- DB scope is the post-Phase-9 operator-driven sweep. The Capability × Environment matrix is deliberately granular about what's verified WHERE to prevent Capability Inflation (per the canonical antipattern's W2-banked example).

Counterfactual Retro

Knowing what I know now, what would I do differently?

The Frame D framing surfacing pre-plan was the load-bearing decision of this session. Without it, the harness would have shipped a "loan-by-loan parity vs Desktop App" claim that's a Capability Inflation in the canonical sense. Banking the pattern: for any harness/measurement deliverable, surface the Frame D-style "what does this prove vs what does it NOT prove" question PRE-PLAN.
The verdict-logic hardening post-first-run is the second load-bearing moment. The diff engine's self-test had Case 6 (Failed-PSSaaS) but not Case 7 (Asymmetric-Failed: Complete-PSSaaS + Failed-sqlcmd) — the gap that allowed the misleading "0 Divergent + 8 A66 Match" first-run output. Banking observation: when the verdict semantics has multi-input symmetry (like Match-vs-Divergent-vs-A66 across two sides), the self-test must cover the asymmetric cases too.
The harness surfaced A69 on its very first run. This is the strongest possible validation that Phase 9's investment was correctly scoped — the harness is a defect-surfacing tool, not just a happy-path checker. Banking observation: Phase 9 deliverables should be evaluated by what FINDINGS they surface, not by their "all green" rate. A69 IS the demo asset's load-bearing slide.
The Andon-cord protocol fired 3x this session (Frame D pre-plan; Q1 Q2 carry-over scope; A69 fix_and_continue post-first-run). All 3 produced concrete PO decisions that materially improved the deliverable. Banking: for empirical-discovery sessions, sub- checkpoint Andon pulls are the right granularity, not workstream- boundary-only.
The pyodbc dev-environment install was the largest tooling friction of the session (sudo-prompt invisibility; PowerShell-to-WSL-to-bash quoting hell; venv-on-mounted-drive ensurepip failure). The tools/parallel-validation/.env pattern + the ~/.venvs/pfill-harness WSL-native venv path resolved both. Banking observation: the kickoff's §"Tooling" line 180 anticipated the pip-install but didn't anticipate the OS-level msodbcsql18 install requires sudo and a Microsoft apt repo setup. Future kickoffs for harnesses should include OS-package prereqs in §"Tooling", not just pip-package prereqs.
The 4-instance-corroboration check on the Backlog re-read pass is actually 3-instance with a "0 findings = pattern works" 4th data point. Banking proposal for the next process-discipline.md revision (NOT shipped this session — Collaborator nomination per Q2 confirmed scope): the canonical wording should accommodate the "kickoff specificity reaches the point where Backlog re-read produces no net-new findings" semantic as the goal-state, not the miss-state.
The A70 finding (PS_DemoData has mixed PSSaaS/legacy proc-body state) was an unexpected sub-finding. It doesn't invalidate Frame D but it does refine what Frame D measures. Banking: for any kickoff that mentions a "SAME thing on both sides" claim, the primary-source verification gate should explicitly probe whether the "thing" is uniformly what the kickoff thinks it is.
Sub-phase calendar time: ~1 Architect-session. Consistent with 6a-6e + Phase 7 + Phase 8 W1 + Phase 8 W2 + A54-fix velocity at ~1.5x complexity (harness + first run + 3 Andon iterations + 2 NEW assumptions + carry-over closure + framing-refinement bank). The PoC-iteration overhead (3 diagnose scripts + 3 harness runs) added ~30min total. Banking: process discipline overhead has positive ROI even on harness-style empirical-discovery sessions.

PoC verification commands and outputs

Diff engine self-test (7/7 PASS including Case 7 regression)

$ python diff_engine.py
Running DiffEngine self-tests...
  [pass] A66 expected-empty case: 8/8 reports correctly classified Match
  [pass] Identical-data case: 2/2 rows correctly classified Match
  [pass] Within-tolerance price diff: 1/1 row correctly classified TolerableDiff
  [pass] Beyond-tolerance price diff: correctly classified Divergent on column ['price']
  [pass] Missing-row case: PSSaaS missing L2 correctly classified Divergent
  [pass] Failed-run case: correctly NOT applying A66 (a66_classified_count=0; only Complete triggers A66)
  [pass] Asymmetric Failed case: all 8 reports correctly classified Incomparable (NOT A66 Match); failure_message threaded through

All diff-engine self-tests PASSED.

Report renderer self-test (12/12 substrings PASS)

$ python report_renderer.py
Running ReportRenderer self-tests...
  [pass] Found: PowerFill Phase 9
  [pass] Found: first validation run
  [pass] Found: Frame D Hybrid
  [pass] Found: TL;DR
  [pass] Found: Run verdict
  [pass] Found: what this proves AND what it does NOT prove
  [pass] Found: Capability x Environment matrix
  [pass] Found: NOT MEASURABLE HERE
  [pass] Found: https://pssaas.staging.powerseller.com/app/runs/aabbccdd-112...
  [pass] Found: Per-report breakdown
  [pass] Found: A66 expected-empty case fired
  [pass] Found: Provenance + reproducibility

Rendered output: 13753 chars; renderer self-test PASSED.

Pre-flight DB connectivity

$ wsl -- ~/.venvs/pfill-harness/bin/python tools/parallel-validation/check_db.py
connecting...
connected
server: hostedps-sql.086ea791c2f1.database.windows.net
db: PS_DemoData
server_time: 2026-04-20 07:13:42.846028
pfill_run_history rows: 13
recent run: FD6B190E-2B41-4A07-B836-E8576113C7A5 Complete 2026-04-20 05:24:09.064000
recent run: 1CE2B077-AF9D-4969-A348-B535BA265BBD Complete 2026-04-19 22:10:11.688000
recent run: 7C9DFE50-1B7D-471F-832A-D55053D2E7C5 Complete 2026-04-19 20:19:56.845000
ok

First end-to-end harness run (3 consecutive; deterministic A69 reproduction)

Final run (committed report):

$ wsl -- ~/.venvs/pfill-harness/bin/python tools/parallel-validation/harness.py \
    --config tools/parallel-validation/harness_config.yaml \
    --output-path docs-site/docs/devlog/2026-04-20-powerfill-phase-9-first-validation-run.md
2026-04-20 02:26:58,366 [ERROR] powerfill_harness.sqlcmd: Sqlcmd-direct path failed: ('42S22', "[42S22] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid column name 'note_rate'. (207) (SQLExecDirectW)")

$ echo $?
2  # Andon exit code per the harness's success-vs-Andon convention

PSSaaS run id: 9312a638-cd21-4485-b42e-ee8dc02be0d0 — Complete in ~30s, allocated_count=515, pool_guide_count=515, post_ue_*=0 (per A66 rebuild-empty pattern). See the rendered report for the full Capability × Environment matrix + A69 surfacing block.

A69 diagnosis (3 scripts; documented in the assumptions log)

$ wsl -- ~/.venvs/pfill-harness/bin/python tools/parallel-validation/diagnose_a69_v3.py
Pre-EXEC state:
  pfill_powerfill_guide: 0 rows
  pfill_pool_guide: 0 rows
  pfill_trade_base: 0 rows
  pfill_loan2trade_candy_level_01: 0 rows
  pfill_powerfill_log: 12 rows
  pfill_syn_powerfill_guide_all_rank: 0 rows

EXEC psp_powerfillue (mirroring PSSaaS resolved options)...
UE completed successfully

UE-in-isolation against post-rebuild-empty state succeeds. UE in the full sqlcmd EXEC sequence (post-conset+pool_guide writes 515 rows) fails with 207. State-dependence confirmed. See A69 in the assumptions log for the full hypothesis space.

Open questions and blockers

Carry-over to next Phase 9 follow-up session (or Greg-demo consultation)

A69 root-cause investigation — Phase 9 follow-up session focused on line-level UE archaeology via PRINT-instrumented copy of 011_*.sql
- RAISERROR-on-each-step diagnostic + capture of pfill_run_history.response_json for the PSSaaS run that ostensibly Completed. Resolves the "PSSaaS silently swallows" vs "different code path" hypothesis space.
A70 disposition for production cutover — Phase 10+ playbook needs explicit "drop legacy procs + redeploy from PSSaaS *.sql" step before first PSSaaS run on a customer DB, OR behavior-diff confirmation that legacy + PSSaaS-deployed-with-encryption are equivalent.
A62 closure — deferred per Q2 confirmed scope. New Backlog row for the pfillv_existng_pool_disposition deploy decision (rename PSSaaS view to pfillv2_* OR overwrite encrypted legacy view OR behavior-diff first).

Architect recommendations for the next session

Run the harness against the same input snapshot AFTER A69 root-cause resolution to verify orchestration parity at the runtime level (currently verified at the unit-test level only; the asymmetric- failure short-circuit prevented runtime-level A66 verification this session).
Bank the A70 behavior-diff resolution method as a Phase 10 playbook step — onboarding a new customer DB always begins with the proc-body deploy verification.
Phase 8.5 should bake the OIDC bearer-token shape into the harness when wiring lands — the function signatures already accept the parameter; the wiring is mechanical.
Customer-rep conversation to schedule the first operator-driven harness run — the harness as built is the lever; the pull is PO + customer-rep work that the kickoff defers.

Optional follow-up (deferred)

Snapshot replay (Option C) — captures pre-run DB state + replays after each comparison; enables true side-by-side comparisons of two harness runs against the same input.
Containerized harness (Option K) — defer until evidence emerges that local-only invocation doesn't generalize to a customer-DB scenario.
HTML / JSON output formats — sibling Jinja2 templates if a future phase needs programmatic consumption beyond the Markdown v1.
Canonical-promotion proposal for the Backlog re-read pass — Collaborator nomination per the established practice; this Architect session bank-reports the 4-instance corroboration evidence (3 traditional
- 1 "0 findings = pattern works" data point per Counterfactual Retro item 6) without shipping the canonical revision.

Recommended next steps

Pre-push docs-build check — MANDATORY per Phase 6e/7/8-W1/8-W2 banked discipline + this session ships 5+ new docs-site/docs/** files. Command: docker build -f docs-site/Dockerfile.prod docs-site. Architect performs before commit.
Collaborator review of:
- tools/parallel-validation/ source tree (review focus: Frame D verdict semantics in diff_engine.py + the A66/A69-aware compare() branch + the Capability × Environment matrix verbiage in templates/comparison_report.md.j2)
- src/backend/PowerSeller.SaaS.Modules.PowerFill/Contracts/ReportContracts.cs (review focus: A67 closure — 9 path strings re-aligned + banking comment)
- src/backend/PowerSeller.SaaS.Modules.PowerFill/PowerFillModule.cs (1-line sentinel bump)
- ADR-028
- docs-site/docs/specs/powerfill-engine.md Phase 9 row + new Phase 10 row
- docs-site/docs/specs/powerfill-assumptions-log.md A69 + A70
- docs-site/docs/devlog/2026-04-20-powerfill-phase-9-first-validation-run.md (the Greg-demo asset — review focus: honest framing of what's proven vs NOT MEASURABLE HERE)
- This completion report
- Devlog entry
- Session handoff bump
- Estimated 1-2 hours (Phase 9 ships ~10 source files in the harness + several docs; the architectural primitives are concentrated in diff_engine.py + the template + ADR-028).
PO sign-off on:
- Phase 9 (Validation harness + first comparison run) COMPLETE declaration (sentinel phase-9-validation-ready)
- A69 + A70 dispositions (banked for follow-up; A69 specifically flagged for Greg/Tom consultation)
- A67 closure
- Architect recommendation that the next phase is either (a) A69 root-cause investigation + customer-DB sweep planning, or (b) Phase 8.5 (ecosystem auth + Superset embedding) per the parallel-track plan
PO push of the atomic commits (Architect commits; PO controls git push).
Post-push staging verification (Collaborator + PO):
- GHA workflow runs (build-api job for the sentinel bump)
- curl -s https://pssaas.staging.powerseller.com/api/powerfill/status → expect {"module":"PowerFill","status":"phase-9-validation-ready"}
- Harness's first-run output report renders cleanly in Docusaurus at https://pssaas.staging.powerseller.com/docs/devlog/2026-04-20-powerfill-phase-9-first-validation-run
Greg-demo dry-run with the harness's first-run report as the load-bearing slot-in addition to the existing "Bug as Feature" narrative.

Notes on this session's process

Three-layer Primary-Source Verification Gate exercised; produced 6 findings (F-9-1 through F-9-6); F-9-1 closed in-session via A67; F-9-2 + F-9-4 already addressed at planning; F-9-3 encoded in verdict logic; F-9-5 + F-9-6 banked as A69 + A70.
Backlog re-read pass at planning time produced 0 net-new actionable findings against the Backlog itself. The kickoff's specificity at the §"Live infrastructure note" + §"Inherited context" level reduced Backlog-rooted Truth Rot to 0 for this session. Banking observation: the canonical-promotion candidate is now 4-instance corroborated (3 traditional + 1 "0 findings = pattern works"; see Counterfactual Retro item 6).
Reviewable Chunks at sub-checkpoint scope EXERCISED 3x (Frame D pre-plan; Q1 Q2 carry-over scope; A69 fix_and_continue post-first-run). All 3 Andon pulls produced material PO decisions.
Required Delegation Categories classification: 0 subagents dispatched. Deliberate Non-Delegation per practice #9; mirrors the Phase 8 W2 pattern. Banking: 2-instance corroboration of the "contract-per-artifact density high → self-implement" heuristic.
Practice #13 Environment-Explicit Inventory ACTIVELY APPLIED at multiple levels: the Capability × Environment matrix in this report, the matrix in the harness's first-run output, the matrix shape template baked into comparison_report.md.j2 for all future harness runs. Banking: practice #13 is now load-bearing in the harness itself, not just in completion reports.
Andon-cord readiness used 3 times during the session, all producing material outcome improvements (avoided Frame A Capability Inflation; avoided Option K over-engineering; avoided shipping misleading verdict report).
Counterfactual Retro filled with 8 named observations — most important: (1) Frame D framing pre-plan was THE load-bearing decision; (2) verdict-logic asymmetric-failure self-test gap was real and got caught only via the live first-run; (3) Phase 9 deliverables should be evaluated by what they SURFACE, not by all-green rate.
Deploy Verification Gate all 3 arms exercised: (a) sentinel code change verified post-edit (runtime probe deferred); (b) self-tests PASS + harness end-to-end runs deterministic; (c) rendered report
- clickable staging URLs verified.
Sub-phase calendar time: ~1 Architect-session. Consistent with 6a-6e + 7 + 8 W1 + 8 W2 + A54-fix velocity at ~1.5x complexity. Banking: process discipline overhead has positive ROI even on empirical-discovery sessions.

Phase 9 is code complete; the harness's first comparison run surfaced A69 (the canonical first instance of the harness as a defect-surfacing tool) + A70 (Frame D framing refinement); A67 closed; A62 deferred to a new Backlog row; sentinel bumped to phase-9-validation-ready. The Greg-demo asset is the rendered first-run report at 2026-04-20-powerfill-phase-9-first-validation-run. Phase 9's PO milestone — "I have empirical loan-by-loan evidence that PSSaaS PowerFill matches the legacy Desktop App on PS_DemoData" — is empirically achievable in the more honest Frame D Hybrid form (orchestration-equivalence on PS_DemoData with explicit "NOT MEASURABLE HERE" cells for the customer-DB question), pending A69 root-cause resolution and customer-rep approval for the operator-driven sweep.

End of Phase 9 completion report. The Architect commits; the PO pushes.

TL;DR​

What was produced​

New Python harness tree (10 source files, ~1,500 LOC)​

Modified backend files​

New documentation​

Out of scope (deliberately not produced)​

Decisions made​

Migrations enumerated​

Gate findings​

Three-layer Primary-Source Verification Gate (with Backlog re-read pass)​

Alternatives-First Gate​

Required Delegation Categories​

Reviewable Chunks at intra-session scope​

Deploy Verification Gate — 3 arms​

Environment-Explicit Inventory (per canonical practice #13)​

Counterfactual Retro​

PoC verification commands and outputs​

Diff engine self-test (7/7 PASS including Case 7 regression)​

Report renderer self-test (12/12 substrings PASS)​

Pre-flight DB connectivity​

First end-to-end harness run (3 consecutive; deterministic A69 reproduction)​

A69 diagnosis (3 scripts; documented in the assumptions log)​

Open questions and blockers​

Carry-over to next Phase 9 follow-up session (or Greg-demo consultation)​

Architect recommendations for the next session​

Optional follow-up (deferred)​

Recommended next steps​

Notes on this session's process​