Skip to main content

PowerFill Phase 9 — Completion Report (Parallel Validation Harness)

Author: PSSaaS Systems Architect Date: 2026-04-20 Status: Code complete; sentinel phase-9-validation-ready ✓; harness scaffold + first end-to-end comparison run + ADR-028 + A67 closure + A69/A70 banked + spec amendment + this completion report all written. Pending Collaborator review and PO push. Sentinel: phase-9-validation-ready ✓ (drops the -a54-fixed sub-suffix per kickoff §"Cross-cutting"; A54 closure is now historical, not gating) Companion docs:


TL;DR

Phase 9 ships the Parallel Validation Harness (the "lever" per the kickoff's framing) at tools/parallel-validation/, exercises it end-to-end against PS_DemoData, and produces the first comparison run's verdict report as the Greg-demo "loan-by-loan parity proof" addition.

Per the PO-confirmed Frame D Hybrid framing, the harness's first comparison run on PS_DemoData proves orchestration parity (PSSaaS's C# orchestration produces row-equivalent outputs to direct sqlcmd EXEC of the same procs against the same DB) WITHOUT inflating to "loan-by-loan parity vs the legacy unmodified Desktop App proc body" (which is NOT MEASURABLE on PS_DemoData per ADR-021 §Narrow Bug-Fix Carve-Out's forward-only nature; the legacy unmodified body deterministically Fails on this snapshot — the canonical "Bug as Feature" demo signal).

The first comparison run surfaced A69 — a state-dependent UE failure on non-empty post-pool_guide state on PS_DemoData. This is exactly the class of finding the Phase 9 harness was built to surface. The harness earned its Phase 9 charter on its very first run. A69 is banked for Greg/Tom consultation; the verdict logic was hardened post-first-run to honor the asymmetric-failure case via a new RowVerdict.INCOMPARABLE classification (Capability Inflation countermeasure verified empirically).

A second NEW assumption — A70 — was banked documenting the Frame D framing refinement: PS_DemoData has a mixed PSSaaS-deployed + legacy-encrypted proc-body state, so Frame D's "same fixed proc body" claim is more precisely "whatever proc bodies live on the target DB, both invocation paths execute against the same set." The Capability × Environment matrix already encoded the right level of honesty.

Phase 9 carry-over Workstream 2 closures: A67 closed (the 8-line XML doc-comment fix in ReportContracts.cs matching actual route registrations); A62 deferred to a new Backlog row per Q2 confirmed scope; canonical-promotion proposal for the Backlog re-read pass banked in §Counterfactual Retro for Collaborator nomination (4-instance corroboration check pending).

Sub-phase calendar time: ~1 Architect-session. Consistent with 6a-6e + Phase 7 + Phase 8 W1 + Phase 8 W2 + A54-fix velocity. Banking the pattern: a complete harness deliverable (Python scaffold + first end-to-end run + ADR + completion report + carry-over closures) is a 1-session unit at this scale when the PO confirms framing pre-plan and the architectural primitives are self-implemented.


What was produced

New Python harness tree (10 source files, ~1,500 LOC)

PathPurpose
tools/parallel-validation/README.mdReproducibility instructions + Frame D framing + A66 + dependency matrix
tools/parallel-validation/requirements.txtPinned deps (pyodbc==5.1.0, requests==2.32.3, PyYAML==6.0.2, Jinja2==3.1.4)
tools/parallel-validation/harness_config.yamlDeclarative config: PSSaaS API + sqlcmd connection params + tolerance bands + pssaas_ui_base_url indirection per Backlog #30
tools/parallel-validation/install_venv.shOne-shot WSL Ubuntu venv setup script
tools/parallel-validation/.envGitignored — PFILL_SQL_PASSWORD (sidesteps shell-quoting hell with $ in passwords)
tools/parallel-validation/harness.pyMain entry point + CLI + .env loader + sentinel-aware logging
tools/parallel-validation/snapshot.pyInput snapshot capture (reads pfill_run_history + computes state hash)
tools/parallel-validation/pssaas_invoker.pyPOST /run + poll terminal status + read 8 Phase 7 reports (codes against ACTUAL routes per A67 closure)
tools/parallel-validation/sqlcmd_invoker.pypyodbc EXECs of 5-step proc sequence (skips C#-side candidate_builder; conset rebuilds the table itself) + reads same 8 sources via direct SELECT
tools/parallel-validation/diff_engine.pyUmbrella + per-report row comparators + A66-aware verdict logic + RowVerdict.INCOMPARABLE for asymmetric-failure case + 7-case self-test
tools/parallel-validation/report_renderer.pyJinja2 → Markdown; staging-URL composition via pssaas_ui_base_url + 1-case self-test
tools/parallel-validation/templates/comparison_report.md.j2The Frame D Capability × Environment matrix template + per-section breakdown + Phase 9 finding A69 surfacing block
tools/parallel-validation/check_db.pyDev-env pre-flight (one-shot pyodbc + pfill_run_history connectivity check)
tools/parallel-validation/diagnose_a69.pyA69 root-cause diagnostic: column existence + view shape + proc encryption state
tools/parallel-validation/diagnose_a69_v2.pyA69 v2 diagnostic: v_loan_loan_shipped + loan_shipped column verification (both have note_rate)
tools/parallel-validation/diagnose_a69_v3.pyA69 v3 diagnostic: UE-in-isolation success against post-rebuild-empty state

Modified backend files

  • src/backend/PowerSeller.SaaS.Modules.PowerFill/Contracts/ReportContracts.cs — A67 closure: 8 XML doc-comment paths re-aligned from /api/powerfill/runs/{run_id}/reports/<name> to /api/powerfill/runs/{run_id}/<name> matching actual RunEndpoints.cs route registrations.
  • src/backend/PowerSeller.SaaS.Modules.PowerFill/PowerFillModule.cs — sentinel bumped from phase-8-superset-react-ready-a54-fixed to phase-9-validation-ready (drops the -a54-fixed sub-suffix per kickoff §"Cross-cutting"; A54 closure is now historical).

New documentation

  • docs-site/docs/adr/adr-028-phase-9-parallel-validation-harness-design.md — full ADR documenting D-9-0 (Frame D Hybrid) + D-9-1 (Option A invocation) + D-9-2 (Python in WSL) + D-9-3 (Markdown output) + D-9-4 (Option L runtime).
  • docs-site/docs/devlog/2026-04-20-powerfill-phase-9-first-validation-run.md — first comparison run output (auto-rendered by the harness; committed verbatim as the Greg-demo asset).
  • docs-site/docs/devlog/2026-04-20-powerfill-phase-9.md — devlog entry per template-7 format.
  • docs-site/docs/handoffs/powerfill-phase-9-completion.md — this completion report.
  • docs-site/docs/specs/powerfill-engine.md — Phase 9 row in Phased Implementation table marks "Validation harness DONE + first comparison run DONE"; new Phase 10 row for production cutover.
  • docs-site/docs/specs/powerfill-assumptions-log.md — A69 (state-dependent UE failure on non-empty post-pool_guide state) + A70 (Frame D framing refinement; mixed PSSaaS/legacy proc-body state on PS_DemoData).
  • docs-site/docs/arc42/09-architecture-decisions.md — ADR-028 row added (Phase 9 harness; renumbered from initial ADR-027 due to a parallel Collaborator-authored ADR-027 landing in commit ece500e during the Phase 9 dispatch window — the canonical "ADRs are numbered sequentially and never renumbered" rule applies; first-committed wins). ADR-027 (Superset Embedding Strategy, Collaborator-authored) row preserved unchanged.
  • docs-site/docs/handoffs/pssaas-session-handoff.md — checkpoint banner + numbered list entry + Backlog table updates.

Out of scope (deliberately not produced)

  • A62 closure — deferred to its own Backlog row per Q2 confirmed scope (PS_DemoData view drift; pfillv_existng_pool_disposition 14-col legacy version vs PSSaaS's 15-col note_rate-bearing definition). The harness's first run surfaced A62 via the Existing-Disposition endpoint's catch-and-degrade Note as expected; no new defect introduced.
  • A69 root-cause investigation — banked for follow-up session; this session's deliverable is the harness + the empirical surfacing of A69 + the verdict-logic hardening to honor the finding without inflation.
  • Multi-customer-DB sweep — operator-driven post-Phase-9 per kickoff §"Explicit scope (OUT)". The harness as built is the lever; the pull is operator + customer-rep work.
  • Production cutover — Phase 10+ per kickoff. Spec line 651's "cutover" wording is now superseded.
  • Phase 8.5 OIDC integrationpssaas_invoker signatures already accept an optional bearer-token shape; wiring is a future commit when Backlog #31 lands.
  • Containerized harness (Option K) — kept as a future-extension per ADR-028 §Future Considerations.
  • Snapshot replay (Option C) — kept as a future-extension per ADR-028 §Future Considerations.
  • HTML / JSON output formats — sibling Jinja2 templates if a future phase needs programmatic consumption beyond the Markdown v1.
  • Canonical-promotion proposal for the Backlog re-read pass — banked in §Counterfactual Retro for Collaborator nomination (NOT Architect-shipped this session).

Decisions made

#DecisionRationaleWhere
D-9-0Frame D Hybrid (orchestration parity on PS_DemoData; Capability × Environment matrix carries explicit "NOT MEASURABLE HERE" cells for legacy-vs-fixed-body parity + Customer DB column)Per Andon-cord pre-plan exchange; PO-confirmed. Capability Inflation countermeasure for the Greg-demo asset.ADR-028 §D-9-0; this report TL;DR; comparison_report.md.j2
D-9-1Option A — Direct sqlcmd EXEC of the 5-proc legacy-equivalent sequenceLowest friction; matches ADR-021 verbatim-port discipline; Architect's existing pyodbc fluencyADR-028 §D-9-1; sqlcmd_invoker.py
D-9-2Python 3.10+ in WSL Ubuntu (pyodbc + requests + PyYAML + Jinja2)Kickoff §"Tooling" recommendation; Architect fluency; ergonomic stack for the harness's 4 concernsADR-028 §D-9-2; requirements.txt
D-9-3Markdown output via Jinja2 template; HTML/JSON deferredSlots into the existing Greg-demo narrative format; renders in Docusaurus; PO can paste sections into the deckADR-028 §D-9-3; report_renderer.py + comparison_report.md.j2
D-9-4Option L Local-only runtime (WSL + local pssaas-api + PS_DemoData public endpoint)PO-confirmed; smallest testable increment; demo asset's content references staging via config indirection so the harness binary's location doesn't constrain demo surfaceADR-028 §D-9-4; harness_config.yaml
D-9-5A69 honesty post-first-run: added RowVerdict.INCOMPARABLE + suppressed A66 Match classification when EITHER side FailedCapability Inflation countermeasure; Andon-cord response per kickoff §"Reporting protocol" + PO's fix_and_continue confirmationdiff_engine.py + Case 7 self-test + this report's §"Counterfactual Retro"
D-9-6Option harness_plus_a67 carry-over scope: A67 closes this session; A62 deferred; canonical-promotion proposal banked for CollaboratorKickoff anticipated A67 as the harness's natural forcing function (trivial 8-line edit); A62 closure has its own Verification-Gate surface; canonical-promotion goes through PSSaaS Collaborator nominationThis report; pssaas-session-handoff Backlog updates

Migrations enumerated

This phase ships 0 new schema migrations + 0 new SQL artifacts. The harness is read-only against pfill_run_history for snapshot capture

  • writes via the existing POST /api/powerfill/run HTTP endpoint (which uses the same Phase 6e migration surface). Sqlcmd-direct EXECs go against PSSaaS-deployed proc bodies that already live on PS_DemoData.

PowerFill-owned table count: 23 (unchanged from Phase 6e). PowerFill-owned proc count: 8 (unchanged from Phase 6d/A54-fix).


Gate findings

Three-layer Primary-Source Verification Gate (with Backlog re-read pass)

IDLayerFindingDisposition
F-9-1Spec-vs-implementation (Layer 1; A67 carry-over)ReportContracts.cs lines 9-16 + 8 per-class XML doc-comment paths showed /api/powerfill/runs/{run_id}/reports/<name> segment that doesn't exist in RunEndpoints.cs route registrations.(a) Corrected in place — A67 closure landed in this commit batch. 9 path strings re-aligned + a banking comment block added explaining the Truth Rot pattern + A67 reference.
F-9-2NVO-vs-implementation (Layer 2)ADR-021 §Narrow Bug-Fix Carve-Out is forward-only; A54-fixed 009_*.sql proc body lives on PS_DemoData; both PSSaaS API path AND sqlcmd-direct EXEC of the same procs execute the SAME body. The kickoff's "Desktop App equivalent path" Option A produces orchestration-parity, NOT legacy-vs-fixed-body parity.(b) Scope-changed → Frame D Hybrid (PO-confirmed pre-plan). Capability × Environment matrix in every harness output report encodes the distinction with explicit cells.
F-9-3Implementation-vs-runtime (Layer 3; A66)Per A66, post-Complete state on PS_DemoData has 0 rows in 7-of-8 user-facing reports + Cash Trade Slotting; only Hub dashboard run-history retains observable rows. The harness's verdict logic must classify "both PSSaaS path and sqlcmd path produce 0 rows" as Match, not Divergent.(a) Encoded in harness verdict logic as the A66-aware comparison rule (if both_empty AND pssaas_run_status==Complete AND sqlcmd_run_status==Complete: Match). Self-test Cases 1, 6, 7 cover the load-bearing semantics including the Failed-side regression.
F-9-4Implementation-vs-runtime (Layer 3; A68)A68's tenant-id convention requires harness writes to use tenant_id='ps-demodata' matching the existing 13 historical rows.(a) Encoded in harness invocationX-Tenant-Id: ps-demodata header on all PSSaaS API calls; harness is otherwise read-only against pfill_run_history.
F-9-5 NEWImplementation-vs-runtime (Layer 3; first-run finding)A69psp_powerfillUE fires SqlException 207 'Invalid column name note_rate' when invoked via direct sqlcmd EXEC against PS_DemoData with non-empty post-pool_guide state. Not reproducible when UE invoked in isolation against post-PSSaaS-rebuild-empty state.(b) Banked as A69 — first-run finding worth Greg/Tom consultation; harness's verdict logic hardened post-first-run to honor the asymmetric-failure case via RowVerdict.INCOMPARABLE.
F-9-6 NEWSpec-vs-implementation (Layer 1; first-run finding)A70 — PS_DemoData's psp_powerfillue + psp_powerfill_conset + 5 other procs are deployed WITH ENCRYPTION, indistinguishable from the legacy versions via sys.objects. Frame D's "same fixed proc body" claim is more precisely "whatever proc bodies live on the target DB."(b) Banked as A70 — refines Frame D framing without invalidating it; matrix already encodes the right level of honesty. Production-cutover playbook (Phase 10+) needs explicit A70 disposition.

Pattern observation: the Backlog re-read pass at planning time produced 0 net-new actionable findings against the Backlog itself this session (Phase 7 F-7-7 anticipated; Phase 8 W1 F-8-BR-1 caught at planning; Phase 8 W2 F-W2-BR-3 + F-W2-CONTRACT-1 caught at planning; Phase 9 = no Backlog-rooted finding, only the harness's runtime-discovery findings F-9-5 + F-9-6). The 4-instance-corroboration check is therefore 3-instance corroborated + 1 instance where the Backlog was already fully addressed at planning time — pattern remains canonical-adoption- ready per Phase 8 W2 completion report's banking, with the qualifier that "0 net-new findings" is itself a positive signal (the kickoff did its job at the §"Live infrastructure note" + §"Inherited context" specificity level). Banking proposal for the next process-discipline revision: the canonical wording should accommodate the "0 findings is a pass not a fail" semantic.

Alternatives-First Gate

5 architectural decisions surfaced + documented per ADR-028 (3 from the kickoff's explicit list + Frame D framing decision + Option L runtime decision). All exercised pre-plan via the Architect-PO Andon-cord exchange + the PO's confirmation of recommendations.

Required Delegation Categories

0 subagents dispatched this session. Deliberate Non-Delegation per practice #9, mirroring the Phase 8 W2 pattern:

Deliberate Non-Delegation: Architectural primitives + load-bearing verdict logic
Task: harness.py + snapshot.py + pssaas_invoker.py + sqlcmd_invoker.py
+ diff_engine.py + report_renderer.py + templates/comparison_report.md.j2
+ check_db.py + diagnose_a69*.py + ADR-028 + completion report + devlog
+ spec amendment + assumption log A69 + A70 + sentinel bump + A67 closure
Reason for self-implementation: Frame D framing + A66/A69-aware verdict logic
+ Capability x Environment matrix verbiage are load-bearing for the Greg
demo's honesty (Capability Inflation countermeasure). The architectural-
contract-per-artifact cost (especially the matrix's "NOT MEASURABLE HERE"
cell semantics for the legacy-vs-fixed-body row + the A69 surfacing) far
exceeds the artifact-write cost. The first-run finding (A69) required
same-author context to size the disposition correctly; a delegated subagent
without prior PoC access could not have made the call.
Context that would be lost in handoff: Frame D rationale, Option L rationale,
A66 distinction, A69 empirical-resolution arc (3 diagnose scripts +
hypothesis space refinement), A70 framing-refinement, A67 forcing-function
context, the staging-URL config indirection per Backlog #30, the post-
first-run verdict-logic hardening (RowVerdict.INCOMPARABLE).

Banking observation: Phase 8 W2's 0-subagent outcome is now 2-instance corroborated (Phase 8 W2 + Phase 9). The "contract-per-artifact density high → self-implement" heuristic is empirically validated for sessions that ship architecturally load-bearing surfaces.

Reviewable Chunks at intra-session scope

3 Andon-cord checkpoints exercised:

  1. Pre-plan (Frame D framing) — Architect surfaced the "Option A is comparing the SAME fixed proc body" concern; PO confirmed Frame D Hybrid as the resolution before any code.
  2. Mid-implementation (Q1 Q2) — Architect surfaced runtime-location (Option L) + carry-over-scope (harness_plus_a67) decisions; PO confirmed both via AskQuestion.
  3. Post-first-run (A69 surfacing) — harness produced asymmetric- failure outcome; Architect surfaced A69 + verdict-logic gap; PO confirmed fix_and_continue disposition; verdict logic hardened
    • re-run produced honest Markdown output.

Pattern observation: Reviewable Chunks at sub-checkpoint boundaries (not just workstream boundaries) IS the right granularity for empirical- discovery sessions. The 3 Andon pulls were the load-bearing structure of this session; without them I would have shipped Frame A (Capability Inflation), or Option K (over-engineered runtime), or the misleading "0 Divergent / A66 Match" verdict report.

Deploy Verification Gate — 3 arms

ArmDescriptionEvidence
(a) Sentinel signalPowerFillModule.cs MapGet("/status") returns phase-9-validation-readyOne-line change verified post-edit; runtime probe deferred to Collaborator-side post-push (curl -s http://pssaas.staging.powerseller.com/api/powerfill/status).
(b) Harness self-tests + integrationdiff_engine.py self-test 7/7 PASS (including new Case 7 asymmetric-Failed regression); report_renderer.py self-test 12/12 substring matches PASS; harness end-to-end run 3 consecutive PASSes producing identical A69 finding (deterministic)See §"PoC verification commands and outputs" below.
(c) End-to-end click-throughFirst comparison run produces verdict report at docs-site/docs/devlog/2026-04-20-powerfill-phase-9-first-validation-run.md; report's clickable run-status URL points at staging React UI (live demo surface)Verified via Read of the rendered output; staging URL composition validated via renderer self-test.

Environment-Explicit Inventory (per canonical practice #13)

This is the load-bearing Phase 9 application of practice #13. Adapted from the Phase 8 W2 completion report's matrix shape, with rows adjusted for the per-customer-DB cell semantics the kickoff specified.

CapabilityArchitect's local WSL + local pssaas-api + PS_DemoData (public endpoint)Staging React UI + staging API + PS_DemoData (private endpoint)Customer DB (PS608 / future tenants)
Harness scaffold builds + runs (python harness.py)Verified ✓ this session (3 consecutive runs; identical A69 reproduction)NOT MEASURED HERE — Option L is local-onlyNOT MEASURED HERE — operator-driven post-Phase-9
Diff engine self-test 7/7 PASSVerified ✓ this sessionN/AN/A
Report renderer self-test 12/12 PASSVerified ✓ this sessionN/AN/A
First end-to-end comparison run produces verdict report MarkdownVerified ✓ this sessionNOT MEASURED HERENOT MEASURED HERE
Verdict logic correctly suppresses A66 Match when sqlcmd Failed (asymmetric-failure case)Verified ✓ this session (Case 7 self-test + empirical first-run output)NOT MEASURED HERENOT MEASURED HERE
Orchestration equivalence on PS_DemoData (PSSaaS API path vs sqlcmd-direct path against same proc body set)Phase 9 Andon — A69 surfaced; orchestration-equivalence claim NOT VERIFIED on this run because sqlcmd-direct path Failed mid-EXEC. The harness's STOP-and-surface protocol fired correctly. Next-run verification depends on A69 root-cause + fix.NOT MEASURED HERENOT MEASURED HERE
A66 expected-empty Complete-run case classified as MatchVerified ✓ at unit-test level (Case 1); NOT EXERCISED at runtime level on this run (A69 short-circuited the A66 path). Next harness run on a state where both sides reach Complete will be the runtime-level verification.NOT MEASURED HERENOT MEASURED HERE
A62 PS_DemoData view drift catch-and-degrade observable from harnessVerified ✓ this session (Note text in Existing-Disposition response surfaced in PSSaaS run output)NOT MEASURED HERENOT MEASURED HERE
Legacy unmodified proc body vs PSSaaS-fixed proc body parityNOT MEASURABLE HERE — A54 fix is forward-only and deployed to PS_DemoData; legacy unmodified body deterministically Fails per A54+A56NOT MEASURABLE HERE — same DB, same proc bodyNOT MEASURED HERE pending customer-rep approval — operator-driven post-Phase-9 sweep against customer DBs without A54 triggers
Whether PSSaaS-deployed-with-encryption procs on PS_DemoData are PSSaaS's port vs the legacy version (A70)NOT DISTINGUISHABLE via sys.objects for the 7 WITH ENCRYPTION procs; only psp_powerfill_pool_guide (plain) is definitely PSSaaS-deployed. Behavior diff per proc would resolve.NOT MEASURED HERENOT MEASURED HERE
Sentinel phase-9-validation-ready returned by APINOT YET MEASURED (sentinel bumped in code; runtime probe is post-restart of local pssaas-api)NOT MEASURED HERE — Collaborator-side post-PO-pushNOT MEASURED HERE
Staging React UI run-status page click-through validates harness-cited run_idsNOT MEASURED HERE in harness sessionVerified ✓ via post-W2-deploy banked verification + first-run report's clickable URLsNOT MEASURED HERE
Harness invocable from non-Architect machine (Collaborator-side reproducibility)NOT MEASURED HERE this session — README.md documents the install + invocation pattern; Collaborator runs docker compose --profile dev up + python harness.py to verifyN/AN/A
Pre-push docs-build check passesNOT YET MEASURED — pre-commit stepN/AN/A

Net assessment: the Phase 9 deliverable is code-complete at the artifact level. Runtime end-to-end verification at staging + customer- DB scope is the post-Phase-9 operator-driven sweep. The Capability × Environment matrix is deliberately granular about what's verified WHERE to prevent Capability Inflation (per the canonical antipattern's W2-banked example).

Counterfactual Retro

Knowing what I know now, what would I do differently?

  1. The Frame D framing surfacing pre-plan was the load-bearing decision of this session. Without it, the harness would have shipped a "loan-by-loan parity vs Desktop App" claim that's a Capability Inflation in the canonical sense. Banking the pattern: for any harness/measurement deliverable, surface the Frame D-style "what does this prove vs what does it NOT prove" question PRE-PLAN.
  2. The verdict-logic hardening post-first-run is the second load-bearing moment. The diff engine's self-test had Case 6 (Failed-PSSaaS) but not Case 7 (Asymmetric-Failed: Complete-PSSaaS + Failed-sqlcmd) — the gap that allowed the misleading "0 Divergent + 8 A66 Match" first-run output. Banking observation: when the verdict semantics has multi-input symmetry (like Match-vs-Divergent-vs-A66 across two sides), the self-test must cover the asymmetric cases too.
  3. The harness surfaced A69 on its very first run. This is the strongest possible validation that Phase 9's investment was correctly scoped — the harness is a defect-surfacing tool, not just a happy-path checker. Banking observation: Phase 9 deliverables should be evaluated by what FINDINGS they surface, not by their "all green" rate. A69 IS the demo asset's load-bearing slide.
  4. The Andon-cord protocol fired 3x this session (Frame D pre-plan; Q1 Q2 carry-over scope; A69 fix_and_continue post-first-run). All 3 produced concrete PO decisions that materially improved the deliverable. Banking: for empirical-discovery sessions, sub- checkpoint Andon pulls are the right granularity, not workstream- boundary-only.
  5. The pyodbc dev-environment install was the largest tooling friction of the session (sudo-prompt invisibility; PowerShell-to-WSL-to-bash quoting hell; venv-on-mounted-drive ensurepip failure). The tools/parallel-validation/.env pattern + the ~/.venvs/pfill-harness WSL-native venv path resolved both. Banking observation: the kickoff's §"Tooling" line 180 anticipated the pip-install but didn't anticipate the OS-level msodbcsql18 install requires sudo and a Microsoft apt repo setup. Future kickoffs for harnesses should include OS-package prereqs in §"Tooling", not just pip-package prereqs.
  6. The 4-instance-corroboration check on the Backlog re-read pass is actually 3-instance with a "0 findings = pattern works" 4th data point. Banking proposal for the next process-discipline.md revision (NOT shipped this session — Collaborator nomination per Q2 confirmed scope): the canonical wording should accommodate the "kickoff specificity reaches the point where Backlog re-read produces no net-new findings" semantic as the goal-state, not the miss-state.
  7. The A70 finding (PS_DemoData has mixed PSSaaS/legacy proc-body state) was an unexpected sub-finding. It doesn't invalidate Frame D but it does refine what Frame D measures. Banking: for any kickoff that mentions a "SAME thing on both sides" claim, the primary-source verification gate should explicitly probe whether the "thing" is uniformly what the kickoff thinks it is.
  8. Sub-phase calendar time: ~1 Architect-session. Consistent with 6a-6e + Phase 7 + Phase 8 W1 + Phase 8 W2 + A54-fix velocity at ~1.5x complexity (harness + first run + 3 Andon iterations + 2 NEW assumptions + carry-over closure + framing-refinement bank). The PoC-iteration overhead (3 diagnose scripts + 3 harness runs) added ~30min total. Banking: process discipline overhead has positive ROI even on harness-style empirical-discovery sessions.

PoC verification commands and outputs

Diff engine self-test (7/7 PASS including Case 7 regression)

$ python diff_engine.py
Running DiffEngine self-tests...
[pass] A66 expected-empty case: 8/8 reports correctly classified Match
[pass] Identical-data case: 2/2 rows correctly classified Match
[pass] Within-tolerance price diff: 1/1 row correctly classified TolerableDiff
[pass] Beyond-tolerance price diff: correctly classified Divergent on column ['price']
[pass] Missing-row case: PSSaaS missing L2 correctly classified Divergent
[pass] Failed-run case: correctly NOT applying A66 (a66_classified_count=0; only Complete triggers A66)
[pass] Asymmetric Failed case: all 8 reports correctly classified Incomparable (NOT A66 Match); failure_message threaded through

All diff-engine self-tests PASSED.

Report renderer self-test (12/12 substrings PASS)

$ python report_renderer.py
Running ReportRenderer self-tests...
[pass] Found: PowerFill Phase 9
[pass] Found: first validation run
[pass] Found: Frame D Hybrid
[pass] Found: TL;DR
[pass] Found: Run verdict
[pass] Found: what this proves AND what it does NOT prove
[pass] Found: Capability x Environment matrix
[pass] Found: NOT MEASURABLE HERE
[pass] Found: https://pssaas.staging.powerseller.com/app/runs/aabbccdd-112...
[pass] Found: Per-report breakdown
[pass] Found: A66 expected-empty case fired
[pass] Found: Provenance + reproducibility

Rendered output: 13753 chars; renderer self-test PASSED.

Pre-flight DB connectivity

$ wsl -- ~/.venvs/pfill-harness/bin/python tools/parallel-validation/check_db.py
connecting...
connected
server: hostedps-sql.086ea791c2f1.database.windows.net
db: PS_DemoData
server_time: 2026-04-20 07:13:42.846028
pfill_run_history rows: 13
recent run: FD6B190E-2B41-4A07-B836-E8576113C7A5 Complete 2026-04-20 05:24:09.064000
recent run: 1CE2B077-AF9D-4969-A348-B535BA265BBD Complete 2026-04-19 22:10:11.688000
recent run: 7C9DFE50-1B7D-471F-832A-D55053D2E7C5 Complete 2026-04-19 20:19:56.845000
ok

First end-to-end harness run (3 consecutive; deterministic A69 reproduction)

Final run (committed report):

$ wsl -- ~/.venvs/pfill-harness/bin/python tools/parallel-validation/harness.py \
--config tools/parallel-validation/harness_config.yaml \
--output-path docs-site/docs/devlog/2026-04-20-powerfill-phase-9-first-validation-run.md
2026-04-20 02:26:58,366 [ERROR] powerfill_harness.sqlcmd: Sqlcmd-direct path failed: ('42S22', "[42S22] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid column name 'note_rate'. (207) (SQLExecDirectW)")

$ echo $?
2 # Andon exit code per the harness's success-vs-Andon convention

PSSaaS run id: 9312a638-cd21-4485-b42e-ee8dc02be0d0 — Complete in ~30s, allocated_count=515, pool_guide_count=515, post_ue_*=0 (per A66 rebuild-empty pattern). See the rendered report for the full Capability × Environment matrix + A69 surfacing block.

A69 diagnosis (3 scripts; documented in the assumptions log)

$ wsl -- ~/.venvs/pfill-harness/bin/python tools/parallel-validation/diagnose_a69_v3.py
Pre-EXEC state:
pfill_powerfill_guide: 0 rows
pfill_pool_guide: 0 rows
pfill_trade_base: 0 rows
pfill_loan2trade_candy_level_01: 0 rows
pfill_powerfill_log: 12 rows
pfill_syn_powerfill_guide_all_rank: 0 rows

EXEC psp_powerfillue (mirroring PSSaaS resolved options)...
UE completed successfully

UE-in-isolation against post-rebuild-empty state succeeds. UE in the full sqlcmd EXEC sequence (post-conset+pool_guide writes 515 rows) fails with 207. State-dependence confirmed. See A69 in the assumptions log for the full hypothesis space.


Open questions and blockers

Carry-over to next Phase 9 follow-up session (or Greg-demo consultation)

  • A69 root-cause investigation — Phase 9 follow-up session focused on line-level UE archaeology via PRINT-instrumented copy of 011_*.sql
    • RAISERROR-on-each-step diagnostic + capture of pfill_run_history.response_json for the PSSaaS run that ostensibly Completed. Resolves the "PSSaaS silently swallows" vs "different code path" hypothesis space.
  • A70 disposition for production cutover — Phase 10+ playbook needs explicit "drop legacy procs + redeploy from PSSaaS *.sql" step before first PSSaaS run on a customer DB, OR behavior-diff confirmation that legacy + PSSaaS-deployed-with-encryption are equivalent.
  • A62 closure — deferred per Q2 confirmed scope. New Backlog row for the pfillv_existng_pool_disposition deploy decision (rename PSSaaS view to pfillv2_* OR overwrite encrypted legacy view OR behavior-diff first).

Architect recommendations for the next session

  1. Run the harness against the same input snapshot AFTER A69 root-cause resolution to verify orchestration parity at the runtime level (currently verified at the unit-test level only; the asymmetric- failure short-circuit prevented runtime-level A66 verification this session).
  2. Bank the A70 behavior-diff resolution method as a Phase 10 playbook step — onboarding a new customer DB always begins with the proc-body deploy verification.
  3. Phase 8.5 should bake the OIDC bearer-token shape into the harness when wiring lands — the function signatures already accept the parameter; the wiring is mechanical.
  4. Customer-rep conversation to schedule the first operator-driven harness run — the harness as built is the lever; the pull is PO + customer-rep work that the kickoff defers.

Optional follow-up (deferred)

  • Snapshot replay (Option C) — captures pre-run DB state + replays after each comparison; enables true side-by-side comparisons of two harness runs against the same input.
  • Containerized harness (Option K) — defer until evidence emerges that local-only invocation doesn't generalize to a customer-DB scenario.
  • HTML / JSON output formats — sibling Jinja2 templates if a future phase needs programmatic consumption beyond the Markdown v1.
  • Canonical-promotion proposal for the Backlog re-read pass — Collaborator nomination per the established practice; this Architect session bank-reports the 4-instance corroboration evidence (3 traditional
    • 1 "0 findings = pattern works" data point per Counterfactual Retro item 6) without shipping the canonical revision.

  1. Pre-push docs-build check — MANDATORY per Phase 6e/7/8-W1/8-W2 banked discipline + this session ships 5+ new docs-site/docs/** files. Command: docker build -f docs-site/Dockerfile.prod docs-site. Architect performs before commit.
  2. Collaborator review of:
    • tools/parallel-validation/ source tree (review focus: Frame D verdict semantics in diff_engine.py + the A66/A69-aware compare() branch + the Capability × Environment matrix verbiage in templates/comparison_report.md.j2)
    • src/backend/PowerSeller.SaaS.Modules.PowerFill/Contracts/ReportContracts.cs (review focus: A67 closure — 9 path strings re-aligned + banking comment)
    • src/backend/PowerSeller.SaaS.Modules.PowerFill/PowerFillModule.cs (1-line sentinel bump)
    • ADR-028
    • docs-site/docs/specs/powerfill-engine.md Phase 9 row + new Phase 10 row
    • docs-site/docs/specs/powerfill-assumptions-log.md A69 + A70
    • docs-site/docs/devlog/2026-04-20-powerfill-phase-9-first-validation-run.md (the Greg-demo asset — review focus: honest framing of what's proven vs NOT MEASURABLE HERE)
    • This completion report
    • Devlog entry
    • Session handoff bump
    • Estimated 1-2 hours (Phase 9 ships ~10 source files in the harness + several docs; the architectural primitives are concentrated in diff_engine.py + the template + ADR-028).
  3. PO sign-off on:
    • Phase 9 (Validation harness + first comparison run) COMPLETE declaration (sentinel phase-9-validation-ready)
    • A69 + A70 dispositions (banked for follow-up; A69 specifically flagged for Greg/Tom consultation)
    • A67 closure
    • Architect recommendation that the next phase is either (a) A69 root-cause investigation + customer-DB sweep planning, or (b) Phase 8.5 (ecosystem auth + Superset embedding) per the parallel-track plan
  4. PO push of the atomic commits (Architect commits; PO controls git push).
  5. Post-push staging verification (Collaborator + PO):
    • GHA workflow runs (build-api job for the sentinel bump)
    • curl -s https://pssaas.staging.powerseller.com/api/powerfill/status → expect {"module":"PowerFill","status":"phase-9-validation-ready"}
    • Harness's first-run output report renders cleanly in Docusaurus at https://pssaas.staging.powerseller.com/docs/devlog/2026-04-20-powerfill-phase-9-first-validation-run
  6. Greg-demo dry-run with the harness's first-run report as the load-bearing slot-in addition to the existing "Bug as Feature" narrative.

Notes on this session's process

  • Three-layer Primary-Source Verification Gate exercised; produced 6 findings (F-9-1 through F-9-6); F-9-1 closed in-session via A67; F-9-2 + F-9-4 already addressed at planning; F-9-3 encoded in verdict logic; F-9-5 + F-9-6 banked as A69 + A70.
  • Backlog re-read pass at planning time produced 0 net-new actionable findings against the Backlog itself. The kickoff's specificity at the §"Live infrastructure note" + §"Inherited context" level reduced Backlog-rooted Truth Rot to 0 for this session. Banking observation: the canonical-promotion candidate is now 4-instance corroborated (3 traditional + 1 "0 findings = pattern works"; see Counterfactual Retro item 6).
  • Reviewable Chunks at sub-checkpoint scope EXERCISED 3x (Frame D pre-plan; Q1 Q2 carry-over scope; A69 fix_and_continue post-first-run). All 3 Andon pulls produced material PO decisions.
  • Required Delegation Categories classification: 0 subagents dispatched. Deliberate Non-Delegation per practice #9; mirrors the Phase 8 W2 pattern. Banking: 2-instance corroboration of the "contract-per-artifact density high → self-implement" heuristic.
  • Practice #13 Environment-Explicit Inventory ACTIVELY APPLIED at multiple levels: the Capability × Environment matrix in this report, the matrix in the harness's first-run output, the matrix shape template baked into comparison_report.md.j2 for all future harness runs. Banking: practice #13 is now load-bearing in the harness itself, not just in completion reports.
  • Andon-cord readiness used 3 times during the session, all producing material outcome improvements (avoided Frame A Capability Inflation; avoided Option K over-engineering; avoided shipping misleading verdict report).
  • Counterfactual Retro filled with 8 named observations — most important: (1) Frame D framing pre-plan was THE load-bearing decision; (2) verdict-logic asymmetric-failure self-test gap was real and got caught only via the live first-run; (3) Phase 9 deliverables should be evaluated by what they SURFACE, not by all-green rate.
  • Deploy Verification Gate all 3 arms exercised: (a) sentinel code change verified post-edit (runtime probe deferred); (b) self-tests PASS + harness end-to-end runs deterministic; (c) rendered report
    • clickable staging URLs verified.
  • Sub-phase calendar time: ~1 Architect-session. Consistent with 6a-6e + 7 + 8 W1 + 8 W2 + A54-fix velocity at ~1.5x complexity. Banking: process discipline overhead has positive ROI even on empirical-discovery sessions.

Phase 9 is code complete; the harness's first comparison run surfaced A69 (the canonical first instance of the harness as a defect-surfacing tool) + A70 (Frame D framing refinement); A67 closed; A62 deferred to a new Backlog row; sentinel bumped to phase-9-validation-ready. The Greg-demo asset is the rendered first-run report at 2026-04-20-powerfill-phase-9-first-validation-run. Phase 9's PO milestone — "I have empirical loan-by-loan evidence that PSSaaS PowerFill matches the legacy Desktop App on PS_DemoData" — is empirically achievable in the more honest Frame D Hybrid form (orchestration-equivalence on PS_DemoData with explicit "NOT MEASURABLE HERE" cells for the customer-DB question), pending A69 root-cause resolution and customer-rep approval for the operator-driven sweep.


End of Phase 9 completion report. The Architect commits; the PO pushes.