PowerFill Phase 9 — Completion Report (Parallel Validation Harness)
Author: PSSaaS Systems Architect
Date: 2026-04-20
Status: Code complete; sentinel phase-9-validation-ready ✓; harness scaffold + first end-to-end comparison run + ADR-028 + A67 closure + A69/A70 banked + spec amendment + this completion report all written. Pending Collaborator review and PO push.
Sentinel: phase-9-validation-ready ✓ (drops the -a54-fixed sub-suffix per kickoff §"Cross-cutting"; A54 closure is now historical, not gating)
Companion docs:
- Phase 9 kickoff:
powerfill-phase-9-kickoff - Phase 9 first comparison run output:
2026-04-20-powerfill-phase-9-first-validation-run - ADR-028 (Phase 9 Parallel Validation Harness Design):
adr-028-phase-9-parallel-validation-harness-design - A67 (closed this session):
powerfill-assumptions-log §A67 - A69 (NEW this session — Phase 9 first-run finding):
powerfill-assumptions-log §A69 - A70 (NEW this session — Frame D framing refinement):
powerfill-assumptions-log §A70 - A54-fix Greg-demo readiness narrative the harness output slots into:
powerfill-a54-fix-greg-demo-readiness
TL;DR
Phase 9 ships the Parallel Validation Harness (the "lever" per the
kickoff's framing) at tools/parallel-validation/, exercises it
end-to-end against PS_DemoData, and produces the first comparison run's
verdict report as the Greg-demo "loan-by-loan parity proof" addition.
Per the PO-confirmed Frame D Hybrid framing, the harness's first comparison run on PS_DemoData proves orchestration parity (PSSaaS's C# orchestration produces row-equivalent outputs to direct sqlcmd EXEC of the same procs against the same DB) WITHOUT inflating to "loan-by-loan parity vs the legacy unmodified Desktop App proc body" (which is NOT MEASURABLE on PS_DemoData per ADR-021 §Narrow Bug-Fix Carve-Out's forward-only nature; the legacy unmodified body deterministically Fails on this snapshot — the canonical "Bug as Feature" demo signal).
The first comparison run surfaced A69 — a state-dependent UE failure
on non-empty post-pool_guide state on PS_DemoData. This is exactly the
class of finding the Phase 9 harness was built to surface. The harness
earned its Phase 9 charter on its very first run. A69 is banked for
Greg/Tom consultation; the verdict logic was hardened post-first-run to
honor the asymmetric-failure case via a new RowVerdict.INCOMPARABLE
classification (Capability Inflation countermeasure verified empirically).
A second NEW assumption — A70 — was banked documenting the Frame D framing refinement: PS_DemoData has a mixed PSSaaS-deployed + legacy-encrypted proc-body state, so Frame D's "same fixed proc body" claim is more precisely "whatever proc bodies live on the target DB, both invocation paths execute against the same set." The Capability × Environment matrix already encoded the right level of honesty.
Phase 9 carry-over Workstream 2 closures: A67 closed (the 8-line
XML doc-comment fix in ReportContracts.cs matching actual route
registrations); A62 deferred to a new Backlog row per Q2 confirmed
scope; canonical-promotion proposal for the Backlog re-read pass banked
in §Counterfactual Retro for Collaborator nomination (4-instance
corroboration check pending).
Sub-phase calendar time: ~1 Architect-session. Consistent with 6a-6e + Phase 7 + Phase 8 W1 + Phase 8 W2 + A54-fix velocity. Banking the pattern: a complete harness deliverable (Python scaffold + first end-to-end run + ADR + completion report + carry-over closures) is a 1-session unit at this scale when the PO confirms framing pre-plan and the architectural primitives are self-implemented.
What was produced
New Python harness tree (10 source files, ~1,500 LOC)
| Path | Purpose |
|---|---|
tools/parallel-validation/README.md | Reproducibility instructions + Frame D framing + A66 + dependency matrix |
tools/parallel-validation/requirements.txt | Pinned deps (pyodbc==5.1.0, requests==2.32.3, PyYAML==6.0.2, Jinja2==3.1.4) |
tools/parallel-validation/harness_config.yaml | Declarative config: PSSaaS API + sqlcmd connection params + tolerance bands + pssaas_ui_base_url indirection per Backlog #30 |
tools/parallel-validation/install_venv.sh | One-shot WSL Ubuntu venv setup script |
tools/parallel-validation/.env | Gitignored — PFILL_SQL_PASSWORD (sidesteps shell-quoting hell with $ in passwords) |
tools/parallel-validation/harness.py | Main entry point + CLI + .env loader + sentinel-aware logging |
tools/parallel-validation/snapshot.py | Input snapshot capture (reads pfill_run_history + computes state hash) |
tools/parallel-validation/pssaas_invoker.py | POST /run + poll terminal status + read 8 Phase 7 reports (codes against ACTUAL routes per A67 closure) |
tools/parallel-validation/sqlcmd_invoker.py | pyodbc EXECs of 5-step proc sequence (skips C#-side candidate_builder; conset rebuilds the table itself) + reads same 8 sources via direct SELECT |
tools/parallel-validation/diff_engine.py | Umbrella + per-report row comparators + A66-aware verdict logic + RowVerdict.INCOMPARABLE for asymmetric-failure case + 7-case self-test |
tools/parallel-validation/report_renderer.py | Jinja2 → Markdown; staging-URL composition via pssaas_ui_base_url + 1-case self-test |
tools/parallel-validation/templates/comparison_report.md.j2 | The Frame D Capability × Environment matrix template + per-section breakdown + Phase 9 finding A69 surfacing block |
tools/parallel-validation/check_db.py | Dev-env pre-flight (one-shot pyodbc + pfill_run_history connectivity check) |
tools/parallel-validation/diagnose_a69.py | A69 root-cause diagnostic: column existence + view shape + proc encryption state |
tools/parallel-validation/diagnose_a69_v2.py | A69 v2 diagnostic: v_loan_loan_shipped + loan_shipped column verification (both have note_rate) |
tools/parallel-validation/diagnose_a69_v3.py | A69 v3 diagnostic: UE-in-isolation success against post-rebuild-empty state |
Modified backend files
src/backend/PowerSeller.SaaS.Modules.PowerFill/Contracts/ReportContracts.cs— A67 closure: 8 XML doc-comment paths re-aligned from/api/powerfill/runs/{run_id}/reports/<name>to/api/powerfill/runs/{run_id}/<name>matching actualRunEndpoints.csroute registrations.src/backend/PowerSeller.SaaS.Modules.PowerFill/PowerFillModule.cs— sentinel bumped fromphase-8-superset-react-ready-a54-fixedtophase-9-validation-ready(drops the-a54-fixedsub-suffix per kickoff §"Cross-cutting"; A54 closure is now historical).
New documentation
docs-site/docs/adr/adr-028-phase-9-parallel-validation-harness-design.md— full ADR documenting D-9-0 (Frame D Hybrid) + D-9-1 (Option A invocation) + D-9-2 (Python in WSL) + D-9-3 (Markdown output) + D-9-4 (Option L runtime).docs-site/docs/devlog/2026-04-20-powerfill-phase-9-first-validation-run.md— first comparison run output (auto-rendered by the harness; committed verbatim as the Greg-demo asset).docs-site/docs/devlog/2026-04-20-powerfill-phase-9.md— devlog entry per template-7 format.docs-site/docs/handoffs/powerfill-phase-9-completion.md— this completion report.docs-site/docs/specs/powerfill-engine.md— Phase 9 row in Phased Implementation table marks "Validation harness DONE + first comparison run DONE"; new Phase 10 row for production cutover.docs-site/docs/specs/powerfill-assumptions-log.md— A69 (state-dependent UE failure on non-empty post-pool_guide state) + A70 (Frame D framing refinement; mixed PSSaaS/legacy proc-body state on PS_DemoData).docs-site/docs/arc42/09-architecture-decisions.md— ADR-028 row added (Phase 9 harness; renumbered from initial ADR-027 due to a parallel Collaborator-authored ADR-027 landing in commitece500eduring the Phase 9 dispatch window — the canonical "ADRs are numbered sequentially and never renumbered" rule applies; first-committed wins). ADR-027 (Superset Embedding Strategy, Collaborator-authored) row preserved unchanged.docs-site/docs/handoffs/pssaas-session-handoff.md— checkpoint banner + numbered list entry + Backlog table updates.
Out of scope (deliberately not produced)
- A62 closure — deferred to its own Backlog row per Q2 confirmed scope (PS_DemoData view drift;
pfillv_existng_pool_disposition14-col legacy version vs PSSaaS's 15-colnote_rate-bearing definition). The harness's first run surfaced A62 via the Existing-Disposition endpoint's catch-and-degrade Note as expected; no new defect introduced. - A69 root-cause investigation — banked for follow-up session; this session's deliverable is the harness + the empirical surfacing of A69 + the verdict-logic hardening to honor the finding without inflation.
- Multi-customer-DB sweep — operator-driven post-Phase-9 per kickoff §"Explicit scope (OUT)". The harness as built is the lever; the pull is operator + customer-rep work.
- Production cutover — Phase 10+ per kickoff. Spec line 651's "cutover" wording is now superseded.
- Phase 8.5 OIDC integration —
pssaas_invokersignatures already accept an optional bearer-token shape; wiring is a future commit when Backlog #31 lands. - Containerized harness (Option K) — kept as a future-extension per ADR-028 §Future Considerations.
- Snapshot replay (Option C) — kept as a future-extension per ADR-028 §Future Considerations.
- HTML / JSON output formats — sibling Jinja2 templates if a future phase needs programmatic consumption beyond the Markdown v1.
- Canonical-promotion proposal for the Backlog re-read pass — banked in §Counterfactual Retro for Collaborator nomination (NOT Architect-shipped this session).
Decisions made
| # | Decision | Rationale | Where |
|---|---|---|---|
| D-9-0 | Frame D Hybrid (orchestration parity on PS_DemoData; Capability × Environment matrix carries explicit "NOT MEASURABLE HERE" cells for legacy-vs-fixed-body parity + Customer DB column) | Per Andon-cord pre-plan exchange; PO-confirmed. Capability Inflation countermeasure for the Greg-demo asset. | ADR-028 §D-9-0; this report TL;DR; comparison_report.md.j2 |
| D-9-1 | Option A — Direct sqlcmd EXEC of the 5-proc legacy-equivalent sequence | Lowest friction; matches ADR-021 verbatim-port discipline; Architect's existing pyodbc fluency | ADR-028 §D-9-1; sqlcmd_invoker.py |
| D-9-2 | Python 3.10+ in WSL Ubuntu (pyodbc + requests + PyYAML + Jinja2) | Kickoff §"Tooling" recommendation; Architect fluency; ergonomic stack for the harness's 4 concerns | ADR-028 §D-9-2; requirements.txt |
| D-9-3 | Markdown output via Jinja2 template; HTML/JSON deferred | Slots into the existing Greg-demo narrative format; renders in Docusaurus; PO can paste sections into the deck | ADR-028 §D-9-3; report_renderer.py + comparison_report.md.j2 |
| D-9-4 | Option L Local-only runtime (WSL + local pssaas-api + PS_DemoData public endpoint) | PO-confirmed; smallest testable increment; demo asset's content references staging via config indirection so the harness binary's location doesn't constrain demo surface | ADR-028 §D-9-4; harness_config.yaml |
| D-9-5 | A69 honesty post-first-run: added RowVerdict.INCOMPARABLE + suppressed A66 Match classification when EITHER side Failed | Capability Inflation countermeasure; Andon-cord response per kickoff §"Reporting protocol" + PO's fix_and_continue confirmation | diff_engine.py + Case 7 self-test + this report's §"Counterfactual Retro" |
| D-9-6 | Option harness_plus_a67 carry-over scope: A67 closes this session; A62 deferred; canonical-promotion proposal banked for Collaborator | Kickoff anticipated A67 as the harness's natural forcing function (trivial 8-line edit); A62 closure has its own Verification-Gate surface; canonical-promotion goes through PSSaaS Collaborator nomination | This report; pssaas-session-handoff Backlog updates |
Migrations enumerated
This phase ships 0 new schema migrations + 0 new SQL artifacts. The
harness is read-only against pfill_run_history for snapshot capture
- writes via the existing
POST /api/powerfill/runHTTP endpoint (which uses the same Phase 6e migration surface). Sqlcmd-direct EXECs go against PSSaaS-deployed proc bodies that already live on PS_DemoData.
PowerFill-owned table count: 23 (unchanged from Phase 6e). PowerFill-owned proc count: 8 (unchanged from Phase 6d/A54-fix).
Gate findings
Three-layer Primary-Source Verification Gate (with Backlog re-read pass)
| ID | Layer | Finding | Disposition |
|---|---|---|---|
| F-9-1 | Spec-vs-implementation (Layer 1; A67 carry-over) | ReportContracts.cs lines 9-16 + 8 per-class XML doc-comment paths showed /api/powerfill/runs/{run_id}/reports/<name> segment that doesn't exist in RunEndpoints.cs route registrations. | (a) Corrected in place — A67 closure landed in this commit batch. 9 path strings re-aligned + a banking comment block added explaining the Truth Rot pattern + A67 reference. |
| F-9-2 | NVO-vs-implementation (Layer 2) | ADR-021 §Narrow Bug-Fix Carve-Out is forward-only; A54-fixed 009_*.sql proc body lives on PS_DemoData; both PSSaaS API path AND sqlcmd-direct EXEC of the same procs execute the SAME body. The kickoff's "Desktop App equivalent path" Option A produces orchestration-parity, NOT legacy-vs-fixed-body parity. | (b) Scope-changed → Frame D Hybrid (PO-confirmed pre-plan). Capability × Environment matrix in every harness output report encodes the distinction with explicit cells. |
| F-9-3 | Implementation-vs-runtime (Layer 3; A66) | Per A66, post-Complete state on PS_DemoData has 0 rows in 7-of-8 user-facing reports + Cash Trade Slotting; only Hub dashboard run-history retains observable rows. The harness's verdict logic must classify "both PSSaaS path and sqlcmd path produce 0 rows" as Match, not Divergent. | (a) Encoded in harness verdict logic as the A66-aware comparison rule (if both_empty AND pssaas_run_status==Complete AND sqlcmd_run_status==Complete: Match). Self-test Cases 1, 6, 7 cover the load-bearing semantics including the Failed-side regression. |
| F-9-4 | Implementation-vs-runtime (Layer 3; A68) | A68's tenant-id convention requires harness writes to use tenant_id='ps-demodata' matching the existing 13 historical rows. | (a) Encoded in harness invocation — X-Tenant-Id: ps-demodata header on all PSSaaS API calls; harness is otherwise read-only against pfill_run_history. |
| F-9-5 NEW | Implementation-vs-runtime (Layer 3; first-run finding) | A69 — psp_powerfillUE fires SqlException 207 'Invalid column name note_rate' when invoked via direct sqlcmd EXEC against PS_DemoData with non-empty post-pool_guide state. Not reproducible when UE invoked in isolation against post-PSSaaS-rebuild-empty state. | (b) Banked as A69 — first-run finding worth Greg/Tom consultation; harness's verdict logic hardened post-first-run to honor the asymmetric-failure case via RowVerdict.INCOMPARABLE. |
| F-9-6 NEW | Spec-vs-implementation (Layer 1; first-run finding) | A70 — PS_DemoData's psp_powerfillue + psp_powerfill_conset + 5 other procs are deployed WITH ENCRYPTION, indistinguishable from the legacy versions via sys.objects. Frame D's "same fixed proc body" claim is more precisely "whatever proc bodies live on the target DB." | (b) Banked as A70 — refines Frame D framing without invalidating it; matrix already encodes the right level of honesty. Production-cutover playbook (Phase 10+) needs explicit A70 disposition. |
Pattern observation: the Backlog re-read pass at planning time produced 0 net-new actionable findings against the Backlog itself this session (Phase 7 F-7-7 anticipated; Phase 8 W1 F-8-BR-1 caught at planning; Phase 8 W2 F-W2-BR-3 + F-W2-CONTRACT-1 caught at planning; Phase 9 = no Backlog-rooted finding, only the harness's runtime-discovery findings F-9-5 + F-9-6). The 4-instance-corroboration check is therefore 3-instance corroborated + 1 instance where the Backlog was already fully addressed at planning time — pattern remains canonical-adoption- ready per Phase 8 W2 completion report's banking, with the qualifier that "0 net-new findings" is itself a positive signal (the kickoff did its job at the §"Live infrastructure note" + §"Inherited context" specificity level). Banking proposal for the next process-discipline revision: the canonical wording should accommodate the "0 findings is a pass not a fail" semantic.
Alternatives-First Gate
5 architectural decisions surfaced + documented per ADR-028 (3 from the kickoff's explicit list + Frame D framing decision + Option L runtime decision). All exercised pre-plan via the Architect-PO Andon-cord exchange + the PO's confirmation of recommendations.
Required Delegation Categories
0 subagents dispatched this session. Deliberate Non-Delegation per practice #9, mirroring the Phase 8 W2 pattern:
Deliberate Non-Delegation: Architectural primitives + load-bearing verdict logic
Task: harness.py + snapshot.py + pssaas_invoker.py + sqlcmd_invoker.py
+ diff_engine.py + report_renderer.py + templates/comparison_report.md.j2
+ check_db.py + diagnose_a69*.py + ADR-028 + completion report + devlog
+ spec amendment + assumption log A69 + A70 + sentinel bump + A67 closure
Reason for self-implementation: Frame D framing + A66/A69-aware verdict logic
+ Capability x Environment matrix verbiage are load-bearing for the Greg
demo's honesty (Capability Inflation countermeasure). The architectural-
contract-per-artifact cost (especially the matrix's "NOT MEASURABLE HERE"
cell semantics for the legacy-vs-fixed-body row + the A69 surfacing) far
exceeds the artifact-write cost. The first-run finding (A69) required
same-author context to size the disposition correctly; a delegated subagent
without prior PoC access could not have made the call.
Context that would be lost in handoff: Frame D rationale, Option L rationale,
A66 distinction, A69 empirical-resolution arc (3 diagnose scripts +
hypothesis space refinement), A70 framing-refinement, A67 forcing-function
context, the staging-URL config indirection per Backlog #30, the post-
first-run verdict-logic hardening (RowVerdict.INCOMPARABLE).
Banking observation: Phase 8 W2's 0-subagent outcome is now 2-instance corroborated (Phase 8 W2 + Phase 9). The "contract-per-artifact density high → self-implement" heuristic is empirically validated for sessions that ship architecturally load-bearing surfaces.
Reviewable Chunks at intra-session scope
3 Andon-cord checkpoints exercised:
- Pre-plan (Frame D framing) — Architect surfaced the "Option A is comparing the SAME fixed proc body" concern; PO confirmed Frame D Hybrid as the resolution before any code.
- Mid-implementation (Q1 Q2) — Architect surfaced runtime-location
(Option L) + carry-over-scope (
harness_plus_a67) decisions; PO confirmed both via AskQuestion. - Post-first-run (A69 surfacing) — harness produced asymmetric-
failure outcome; Architect surfaced A69 + verdict-logic gap; PO
confirmed
fix_and_continuedisposition; verdict logic hardened- re-run produced honest Markdown output.
Pattern observation: Reviewable Chunks at sub-checkpoint boundaries (not just workstream boundaries) IS the right granularity for empirical- discovery sessions. The 3 Andon pulls were the load-bearing structure of this session; without them I would have shipped Frame A (Capability Inflation), or Option K (over-engineered runtime), or the misleading "0 Divergent / A66 Match" verdict report.
Deploy Verification Gate — 3 arms
| Arm | Description | Evidence |
|---|---|---|
| (a) Sentinel signal | PowerFillModule.cs MapGet("/status") returns phase-9-validation-ready | One-line change verified post-edit; runtime probe deferred to Collaborator-side post-push (curl -s http://pssaas.staging.powerseller.com/api/powerfill/status). |
| (b) Harness self-tests + integration | diff_engine.py self-test 7/7 PASS (including new Case 7 asymmetric-Failed regression); report_renderer.py self-test 12/12 substring matches PASS; harness end-to-end run 3 consecutive PASSes producing identical A69 finding (deterministic) | See §"PoC verification commands and outputs" below. |
| (c) End-to-end click-through | First comparison run produces verdict report at docs-site/docs/devlog/2026-04-20-powerfill-phase-9-first-validation-run.md; report's clickable run-status URL points at staging React UI (live demo surface) | Verified via Read of the rendered output; staging URL composition validated via renderer self-test. |
Environment-Explicit Inventory (per canonical practice #13)
This is the load-bearing Phase 9 application of practice #13. Adapted from the Phase 8 W2 completion report's matrix shape, with rows adjusted for the per-customer-DB cell semantics the kickoff specified.
| Capability | Architect's local WSL + local pssaas-api + PS_DemoData (public endpoint) | Staging React UI + staging API + PS_DemoData (private endpoint) | Customer DB (PS608 / future tenants) |
|---|---|---|---|
Harness scaffold builds + runs (python harness.py) | Verified ✓ this session (3 consecutive runs; identical A69 reproduction) | NOT MEASURED HERE — Option L is local-only | NOT MEASURED HERE — operator-driven post-Phase-9 |
| Diff engine self-test 7/7 PASS | Verified ✓ this session | N/A | N/A |
| Report renderer self-test 12/12 PASS | Verified ✓ this session | N/A | N/A |
| First end-to-end comparison run produces verdict report Markdown | Verified ✓ this session | NOT MEASURED HERE | NOT MEASURED HERE |
| Verdict logic correctly suppresses A66 Match when sqlcmd Failed (asymmetric-failure case) | Verified ✓ this session (Case 7 self-test + empirical first-run output) | NOT MEASURED HERE | NOT MEASURED HERE |
| Orchestration equivalence on PS_DemoData (PSSaaS API path vs sqlcmd-direct path against same proc body set) | Phase 9 Andon — A69 surfaced; orchestration-equivalence claim NOT VERIFIED on this run because sqlcmd-direct path Failed mid-EXEC. The harness's STOP-and-surface protocol fired correctly. Next-run verification depends on A69 root-cause + fix. | NOT MEASURED HERE | NOT MEASURED HERE |
| A66 expected-empty Complete-run case classified as Match | Verified ✓ at unit-test level (Case 1); NOT EXERCISED at runtime level on this run (A69 short-circuited the A66 path). Next harness run on a state where both sides reach Complete will be the runtime-level verification. | NOT MEASURED HERE | NOT MEASURED HERE |
| A62 PS_DemoData view drift catch-and-degrade observable from harness | Verified ✓ this session (Note text in Existing-Disposition response surfaced in PSSaaS run output) | NOT MEASURED HERE | NOT MEASURED HERE |
| Legacy unmodified proc body vs PSSaaS-fixed proc body parity | NOT MEASURABLE HERE — A54 fix is forward-only and deployed to PS_DemoData; legacy unmodified body deterministically Fails per A54+A56 | NOT MEASURABLE HERE — same DB, same proc body | NOT MEASURED HERE pending customer-rep approval — operator-driven post-Phase-9 sweep against customer DBs without A54 triggers |
| Whether PSSaaS-deployed-with-encryption procs on PS_DemoData are PSSaaS's port vs the legacy version (A70) | NOT DISTINGUISHABLE via sys.objects for the 7 WITH ENCRYPTION procs; only psp_powerfill_pool_guide (plain) is definitely PSSaaS-deployed. Behavior diff per proc would resolve. | NOT MEASURED HERE | NOT MEASURED HERE |
Sentinel phase-9-validation-ready returned by API | NOT YET MEASURED (sentinel bumped in code; runtime probe is post-restart of local pssaas-api) | NOT MEASURED HERE — Collaborator-side post-PO-push | NOT MEASURED HERE |
| Staging React UI run-status page click-through validates harness-cited run_ids | NOT MEASURED HERE in harness session | Verified ✓ via post-W2-deploy banked verification + first-run report's clickable URLs | NOT MEASURED HERE |
| Harness invocable from non-Architect machine (Collaborator-side reproducibility) | NOT MEASURED HERE this session — README.md documents the install + invocation pattern; Collaborator runs docker compose --profile dev up + python harness.py to verify | N/A | N/A |
| Pre-push docs-build check passes | NOT YET MEASURED — pre-commit step | N/A | N/A |
Net assessment: the Phase 9 deliverable is code-complete at the artifact level. Runtime end-to-end verification at staging + customer- DB scope is the post-Phase-9 operator-driven sweep. The Capability × Environment matrix is deliberately granular about what's verified WHERE to prevent Capability Inflation (per the canonical antipattern's W2-banked example).
Counterfactual Retro
Knowing what I know now, what would I do differently?
- The Frame D framing surfacing pre-plan was the load-bearing decision of this session. Without it, the harness would have shipped a "loan-by-loan parity vs Desktop App" claim that's a Capability Inflation in the canonical sense. Banking the pattern: for any harness/measurement deliverable, surface the Frame D-style "what does this prove vs what does it NOT prove" question PRE-PLAN.
- The verdict-logic hardening post-first-run is the second load-bearing moment. The diff engine's self-test had Case 6 (Failed-PSSaaS) but not Case 7 (Asymmetric-Failed: Complete-PSSaaS + Failed-sqlcmd) — the gap that allowed the misleading "0 Divergent + 8 A66 Match" first-run output. Banking observation: when the verdict semantics has multi-input symmetry (like Match-vs-Divergent-vs-A66 across two sides), the self-test must cover the asymmetric cases too.
- The harness surfaced A69 on its very first run. This is the strongest possible validation that Phase 9's investment was correctly scoped — the harness is a defect-surfacing tool, not just a happy-path checker. Banking observation: Phase 9 deliverables should be evaluated by what FINDINGS they surface, not by their "all green" rate. A69 IS the demo asset's load-bearing slide.
- The Andon-cord protocol fired 3x this session (Frame D pre-plan; Q1 Q2 carry-over scope; A69 fix_and_continue post-first-run). All 3 produced concrete PO decisions that materially improved the deliverable. Banking: for empirical-discovery sessions, sub- checkpoint Andon pulls are the right granularity, not workstream- boundary-only.
- The pyodbc dev-environment install was the largest tooling friction
of the session (sudo-prompt invisibility; PowerShell-to-WSL-to-bash
quoting hell; venv-on-mounted-drive ensurepip failure). The
tools/parallel-validation/.envpattern + the~/.venvs/pfill-harnessWSL-native venv path resolved both. Banking observation: the kickoff's §"Tooling" line 180 anticipated the pip-install but didn't anticipate the OS-levelmsodbcsql18install requires sudo and a Microsoft apt repo setup. Future kickoffs for harnesses should include OS-package prereqs in §"Tooling", not just pip-package prereqs. - The 4-instance-corroboration check on the Backlog re-read pass is
actually 3-instance with a "0 findings = pattern works" 4th data
point. Banking proposal for the next
process-discipline.mdrevision (NOT shipped this session — Collaborator nomination per Q2 confirmed scope): the canonical wording should accommodate the "kickoff specificity reaches the point where Backlog re-read produces no net-new findings" semantic as the goal-state, not the miss-state. - The A70 finding (PS_DemoData has mixed PSSaaS/legacy proc-body state) was an unexpected sub-finding. It doesn't invalidate Frame D but it does refine what Frame D measures. Banking: for any kickoff that mentions a "SAME thing on both sides" claim, the primary-source verification gate should explicitly probe whether the "thing" is uniformly what the kickoff thinks it is.
- Sub-phase calendar time: ~1 Architect-session. Consistent with 6a-6e + Phase 7 + Phase 8 W1 + Phase 8 W2 + A54-fix velocity at ~1.5x complexity (harness + first run + 3 Andon iterations + 2 NEW assumptions + carry-over closure + framing-refinement bank). The PoC-iteration overhead (3 diagnose scripts + 3 harness runs) added ~30min total. Banking: process discipline overhead has positive ROI even on harness-style empirical-discovery sessions.
PoC verification commands and outputs
Diff engine self-test (7/7 PASS including Case 7 regression)
$ python diff_engine.py
Running DiffEngine self-tests...
[pass] A66 expected-empty case: 8/8 reports correctly classified Match
[pass] Identical-data case: 2/2 rows correctly classified Match
[pass] Within-tolerance price diff: 1/1 row correctly classified TolerableDiff
[pass] Beyond-tolerance price diff: correctly classified Divergent on column ['price']
[pass] Missing-row case: PSSaaS missing L2 correctly classified Divergent
[pass] Failed-run case: correctly NOT applying A66 (a66_classified_count=0; only Complete triggers A66)
[pass] Asymmetric Failed case: all 8 reports correctly classified Incomparable (NOT A66 Match); failure_message threaded through
All diff-engine self-tests PASSED.
Report renderer self-test (12/12 substrings PASS)
$ python report_renderer.py
Running ReportRenderer self-tests...
[pass] Found: PowerFill Phase 9
[pass] Found: first validation run
[pass] Found: Frame D Hybrid
[pass] Found: TL;DR
[pass] Found: Run verdict
[pass] Found: what this proves AND what it does NOT prove
[pass] Found: Capability x Environment matrix
[pass] Found: NOT MEASURABLE HERE
[pass] Found: https://pssaas.staging.powerseller.com/app/runs/aabbccdd-112...
[pass] Found: Per-report breakdown
[pass] Found: A66 expected-empty case fired
[pass] Found: Provenance + reproducibility
Rendered output: 13753 chars; renderer self-test PASSED.
Pre-flight DB connectivity
$ wsl -- ~/.venvs/pfill-harness/bin/python tools/parallel-validation/check_db.py
connecting...
connected
server: hostedps-sql.086ea791c2f1.database.windows.net
db: PS_DemoData
server_time: 2026-04-20 07:13:42.846028
pfill_run_history rows: 13
recent run: FD6B190E-2B41-4A07-B836-E8576113C7A5 Complete 2026-04-20 05:24:09.064000
recent run: 1CE2B077-AF9D-4969-A348-B535BA265BBD Complete 2026-04-19 22:10:11.688000
recent run: 7C9DFE50-1B7D-471F-832A-D55053D2E7C5 Complete 2026-04-19 20:19:56.845000
ok
First end-to-end harness run (3 consecutive; deterministic A69 reproduction)
Final run (committed report):
$ wsl -- ~/.venvs/pfill-harness/bin/python tools/parallel-validation/harness.py \
--config tools/parallel-validation/harness_config.yaml \
--output-path docs-site/docs/devlog/2026-04-20-powerfill-phase-9-first-validation-run.md
2026-04-20 02:26:58,366 [ERROR] powerfill_harness.sqlcmd: Sqlcmd-direct path failed: ('42S22', "[42S22] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid column name 'note_rate'. (207) (SQLExecDirectW)")
$ echo $?
2 # Andon exit code per the harness's success-vs-Andon convention
PSSaaS run id: 9312a638-cd21-4485-b42e-ee8dc02be0d0 — Complete in
~30s, allocated_count=515, pool_guide_count=515, post_ue_*=0 (per A66
rebuild-empty pattern). See the rendered report for the full Capability
× Environment matrix + A69 surfacing block.
A69 diagnosis (3 scripts; documented in the assumptions log)
$ wsl -- ~/.venvs/pfill-harness/bin/python tools/parallel-validation/diagnose_a69_v3.py
Pre-EXEC state:
pfill_powerfill_guide: 0 rows
pfill_pool_guide: 0 rows
pfill_trade_base: 0 rows
pfill_loan2trade_candy_level_01: 0 rows
pfill_powerfill_log: 12 rows
pfill_syn_powerfill_guide_all_rank: 0 rows
EXEC psp_powerfillue (mirroring PSSaaS resolved options)...
UE completed successfully
UE-in-isolation against post-rebuild-empty state succeeds. UE in the full sqlcmd EXEC sequence (post-conset+pool_guide writes 515 rows) fails with 207. State-dependence confirmed. See A69 in the assumptions log for the full hypothesis space.
Open questions and blockers
Carry-over to next Phase 9 follow-up session (or Greg-demo consultation)
- A69 root-cause investigation — Phase 9 follow-up session focused
on line-level UE archaeology via PRINT-instrumented copy of
011_*.sql- RAISERROR-on-each-step diagnostic + capture of
pfill_run_history.response_jsonfor the PSSaaS run that ostensibly Completed. Resolves the "PSSaaS silently swallows" vs "different code path" hypothesis space.
- RAISERROR-on-each-step diagnostic + capture of
- A70 disposition for production cutover — Phase 10+ playbook needs explicit "drop legacy procs + redeploy from PSSaaS *.sql" step before first PSSaaS run on a customer DB, OR behavior-diff confirmation that legacy + PSSaaS-deployed-with-encryption are equivalent.
- A62 closure — deferred per Q2 confirmed scope. New Backlog row
for the
pfillv_existng_pool_dispositiondeploy decision (rename PSSaaS view topfillv2_*OR overwrite encrypted legacy view OR behavior-diff first).
Architect recommendations for the next session
- Run the harness against the same input snapshot AFTER A69 root-cause resolution to verify orchestration parity at the runtime level (currently verified at the unit-test level only; the asymmetric- failure short-circuit prevented runtime-level A66 verification this session).
- Bank the A70 behavior-diff resolution method as a Phase 10 playbook step — onboarding a new customer DB always begins with the proc-body deploy verification.
- Phase 8.5 should bake the OIDC bearer-token shape into the harness when wiring lands — the function signatures already accept the parameter; the wiring is mechanical.
- Customer-rep conversation to schedule the first operator-driven harness run — the harness as built is the lever; the pull is PO + customer-rep work that the kickoff defers.
Optional follow-up (deferred)
- Snapshot replay (Option C) — captures pre-run DB state + replays after each comparison; enables true side-by-side comparisons of two harness runs against the same input.
- Containerized harness (Option K) — defer until evidence emerges that local-only invocation doesn't generalize to a customer-DB scenario.
- HTML / JSON output formats — sibling Jinja2 templates if a future phase needs programmatic consumption beyond the Markdown v1.
- Canonical-promotion proposal for the Backlog re-read pass —
Collaborator nomination per the established practice; this Architect
session bank-reports the 4-instance corroboration evidence (3 traditional
- 1 "0 findings = pattern works" data point per Counterfactual Retro item 6) without shipping the canonical revision.
Recommended next steps
- Pre-push docs-build check — MANDATORY per Phase 6e/7/8-W1/8-W2
banked discipline + this session ships 5+ new
docs-site/docs/**files. Command:docker build -f docs-site/Dockerfile.prod docs-site. Architect performs before commit. - Collaborator review of:
tools/parallel-validation/source tree (review focus: Frame D verdict semantics indiff_engine.py+ the A66/A69-awarecompare()branch + the Capability × Environment matrix verbiage intemplates/comparison_report.md.j2)src/backend/PowerSeller.SaaS.Modules.PowerFill/Contracts/ReportContracts.cs(review focus: A67 closure — 9 path strings re-aligned + banking comment)src/backend/PowerSeller.SaaS.Modules.PowerFill/PowerFillModule.cs(1-line sentinel bump)- ADR-028
docs-site/docs/specs/powerfill-engine.mdPhase 9 row + new Phase 10 rowdocs-site/docs/specs/powerfill-assumptions-log.mdA69 + A70docs-site/docs/devlog/2026-04-20-powerfill-phase-9-first-validation-run.md(the Greg-demo asset — review focus: honest framing of what's proven vs NOT MEASURABLE HERE)- This completion report
- Devlog entry
- Session handoff bump
- Estimated 1-2 hours (Phase 9 ships ~10 source files in the
harness + several docs; the architectural primitives are
concentrated in
diff_engine.py+ the template + ADR-028).
- PO sign-off on:
- Phase 9 (Validation harness + first comparison run) COMPLETE
declaration (sentinel
phase-9-validation-ready) - A69 + A70 dispositions (banked for follow-up; A69 specifically flagged for Greg/Tom consultation)
- A67 closure
- Architect recommendation that the next phase is either (a) A69 root-cause investigation + customer-DB sweep planning, or (b) Phase 8.5 (ecosystem auth + Superset embedding) per the parallel-track plan
- Phase 9 (Validation harness + first comparison run) COMPLETE
declaration (sentinel
- PO push of the atomic commits (Architect commits; PO controls
git push). - Post-push staging verification (Collaborator + PO):
- GHA workflow runs (
build-apijob for the sentinel bump) curl -s https://pssaas.staging.powerseller.com/api/powerfill/status→ expect{"module":"PowerFill","status":"phase-9-validation-ready"}- Harness's first-run output report renders cleanly in Docusaurus
at
https://pssaas.staging.powerseller.com/docs/devlog/2026-04-20-powerfill-phase-9-first-validation-run
- GHA workflow runs (
- Greg-demo dry-run with the harness's first-run report as the load-bearing slot-in addition to the existing "Bug as Feature" narrative.
Notes on this session's process
- Three-layer Primary-Source Verification Gate exercised; produced 6 findings (F-9-1 through F-9-6); F-9-1 closed in-session via A67; F-9-2 + F-9-4 already addressed at planning; F-9-3 encoded in verdict logic; F-9-5 + F-9-6 banked as A69 + A70.
- Backlog re-read pass at planning time produced 0 net-new actionable findings against the Backlog itself. The kickoff's specificity at the §"Live infrastructure note" + §"Inherited context" level reduced Backlog-rooted Truth Rot to 0 for this session. Banking observation: the canonical-promotion candidate is now 4-instance corroborated (3 traditional + 1 "0 findings = pattern works"; see Counterfactual Retro item 6).
- Reviewable Chunks at sub-checkpoint scope EXERCISED 3x (Frame D pre-plan; Q1 Q2 carry-over scope; A69 fix_and_continue post-first-run). All 3 Andon pulls produced material PO decisions.
- Required Delegation Categories classification: 0 subagents dispatched. Deliberate Non-Delegation per practice #9; mirrors the Phase 8 W2 pattern. Banking: 2-instance corroboration of the "contract-per-artifact density high → self-implement" heuristic.
- Practice #13 Environment-Explicit Inventory ACTIVELY APPLIED at
multiple levels: the Capability × Environment matrix in this report,
the matrix in the harness's first-run output, the matrix shape
template baked into
comparison_report.md.j2for all future harness runs. Banking: practice #13 is now load-bearing in the harness itself, not just in completion reports. - Andon-cord readiness used 3 times during the session, all producing material outcome improvements (avoided Frame A Capability Inflation; avoided Option K over-engineering; avoided shipping misleading verdict report).
- Counterfactual Retro filled with 8 named observations — most important: (1) Frame D framing pre-plan was THE load-bearing decision; (2) verdict-logic asymmetric-failure self-test gap was real and got caught only via the live first-run; (3) Phase 9 deliverables should be evaluated by what they SURFACE, not by all-green rate.
- Deploy Verification Gate all 3 arms exercised: (a) sentinel code
change verified post-edit (runtime probe deferred); (b) self-tests
PASS + harness end-to-end runs deterministic; (c) rendered report
- clickable staging URLs verified.
- Sub-phase calendar time: ~1 Architect-session. Consistent with 6a-6e + 7 + 8 W1 + 8 W2 + A54-fix velocity at ~1.5x complexity. Banking: process discipline overhead has positive ROI even on empirical-discovery sessions.
Phase 9 is code complete; the harness's first comparison run
surfaced A69 (the canonical first instance of the harness as a
defect-surfacing tool) + A70 (Frame D framing refinement); A67
closed; A62 deferred to a new Backlog row; sentinel bumped to
phase-9-validation-ready. The Greg-demo asset is the rendered
first-run report at
2026-04-20-powerfill-phase-9-first-validation-run.
Phase 9's PO milestone — "I have empirical loan-by-loan evidence
that PSSaaS PowerFill matches the legacy Desktop App on PS_DemoData" —
is empirically achievable in the more honest Frame D Hybrid form
(orchestration-equivalence on PS_DemoData with explicit "NOT MEASURABLE
HERE" cells for the customer-DB question), pending A69 root-cause
resolution and customer-rep approval for the operator-driven sweep.
End of Phase 9 completion report. The Architect commits; the PO pushes.