PowerFill Sub-Phase 6e — Async Runs + Audit + Concurrency + Phase 6 Completion
Date: 2026-04-19
Agent: PSSaaS Systems Architect (Opus 4.7 High Thinking)
Scope: Convert POST /api/powerfill/run to async (202 + run_id + Location header; background worker); add pfill_run_history audit table with BR-8 filtered unique index + BR-9 cleanup + Q3 Option B input_loan_ids_json + Q7 Option B failure_step/failure_message + Phase 6e response_json; add GET /runs, GET /runs/{run_id}, POST /runs/{run_id}/cancel; ship the canonical Phase 6 completion sentinel phase-6e-async-runs-ready.
Why
Phase 6e is the final Phase 6 sub-phase. The PowerFill spec (§Run Execution Model) requires asynchronous runs; sub-phases 6a-6d shipped a synchronous best-effort surface so the orchestration was independently verifiable before adding the async runtime. 6e converts the surface to true async, adds the pfill_run_history audit table per spec §Audit Trail, enforces BR-8 single-active-run-per-tenant via a SQL filtered unique index, implements BR-9 failure-state cleanup, and ships the canonical Phase 6 completion sentinel.
After 6e ships, Phase 6 (Core Allocation Engine) is COMPLETE and Phase 7 (Reports / recap query APIs) becomes available.
What Was Done
SQL artifact (1 new file)
012_CreatePfillRunHistoryTable.sql (104 lines): pfill_run_history table (14 cols: 11 spec canonical + Q3 Option B input_loan_ids_json + Q7 Option B failure_step + failure_message + Phase 6e response_json); filtered unique index ux_pfill_run_history_tenant_active (BR-8); cursor pagination index ix_pfill_run_history_tenant_started_at; idempotent guards + PRINT-in-guards (A32) + A50 SET preamble.
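The BR-8 mechanism can be illustrated with a minimal sketch. It uses Python and SQLite's partial unique index as a stand-in for the SQL Server filtered unique index; the table shape, the `try_start_run` / `finish_run` helpers, and the active-status names are assumptions for illustration only, not the contents of the 012 script.

```python
import sqlite3

# Hypothetical active-status names; the real set is pinned by RunStatusTests.
ACTIVE_STATUSES = ("Pending", "PreProcessing", "Allocating", "PostProcessing")


def open_db() -> sqlite3.Connection:
    """Create an in-memory table with a partial unique index on active rows."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE run_history ("
        " run_id TEXT PRIMARY KEY, tenant_id TEXT NOT NULL, status TEXT NOT NULL)"
    )
    active_list = ", ".join(f"'{s}'" for s in ACTIVE_STATUSES)
    # Uniqueness on tenant_id applies only to rows whose status is active,
    # so any number of terminal rows per tenant can coexist.
    db.execute(
        "CREATE UNIQUE INDEX ux_run_history_tenant_active"
        f" ON run_history (tenant_id) WHERE status IN ({active_list})"
    )
    return db


def try_start_run(db: sqlite3.Connection, run_id: str, tenant_id: str) -> bool:
    """Insert a Pending row; return False if the tenant already has an active run."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO run_history VALUES (?, ?, 'Pending')",
                (run_id, tenant_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def finish_run(db: sqlite3.Connection, run_id: str, status: str) -> None:
    """Move a run to a terminal status, which removes it from the index."""
    with db:
        db.execute(
            "UPDATE run_history SET status = ? WHERE run_id = ?", (status, run_id)
        )
```

Because the database itself rejects the second active row, the check and the insert cannot race, which is the property an application-level "look first, then insert" guard would lack.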
EF Core entity (1 new file)
PowerFillRunHistory.cs: 14 cols, PK (RunId UUID). Registered in PowerFillModule.RegisterEntities. PowerFill-owned table count: 22 → 23.
Service classes (6 new files)
- IRunProgressSink.cs: interface + NoopRunProgressSink for the orchestrator's per-step status-transition callback.
- PowerFillRunCancelRegistry.cs: process-singleton ConcurrentDictionary<Guid, CancellationTokenSource>.
- PowerFillRunQueue.cs: bounded Channel<RunJob> (capacity 64, 2s enqueue timeout → 503 on saturation) + RunJob immutable record carrying captured tenant identity.
- PowerFillRunHistoryService.cs: scoped audit/cleanup CRUD; BR-8 SqlException 2627 → BR8ConflictException translation; BR-9 cleanup of 7 user-facing tables (preserves 4 syn-trades + log per A58); MarkAbandonedActiveRunsAsync for startup reconciliation.
- PowerFillRunBackgroundService.cs: BackgroundService channel reader; per-job DI scope; tenant-context replay (resolves F-6e-5); explicit Cancelled vs Failed terminal classification via job.CancellationToken.IsCancellationRequested (D6); response.Status reconciled with terminal decision before persisting response_json (D7); BR-9 cleanup invocation in finally block.
- PowerFillRunStartupReconciliationService.cs: IHostedService running once at app startup; iterates every known tenant + per-tenant DI scope + marks abandoned active rows as Failed.
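The bounded-queue behaviour (fixed capacity, a short enqueue timeout, and a saturation signal the endpoint can turn into a 503) can be sketched as follows. This is a Python `asyncio.Queue` analogue, not the Channel-based C# implementation; the `RunQueue` class and `try_enqueue` / `dequeue` names are hypothetical, and only the capacity of 64 and the 2s timeout come from the description above.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class RunJob:
    """Immutable job record; tenant identity is captured at enqueue time."""
    run_id: str
    tenant_id: str


class RunQueue:
    """Bounded FIFO queue whose producers give up after a timeout."""

    def __init__(self, capacity: int = 64, enqueue_timeout: float = 2.0):
        self._queue = asyncio.Queue(maxsize=capacity)
        self._timeout = enqueue_timeout

    async def try_enqueue(self, job: RunJob) -> bool:
        """Return False when the queue stays full for the whole timeout."""
        try:
            await asyncio.wait_for(self._queue.put(job), timeout=self._timeout)
            return True
        except asyncio.TimeoutError:
            return False  # caller would translate this into HTTP 503

    async def dequeue(self) -> RunJob:
        """Wait for the next job in FIFO order (the worker's read side)."""
        return await self._queue.get()
```

Carrying `tenant_id` inside the job is what lets the worker rebuild tenant context after the originating HTTP request has finished.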
RunService refactor
PowerFillRunService.cs: ExecuteAsync(request, ct) refactored to delegate to new ExecuteResolvedAsync(options, runId, IRunProgressSink, ct); the latter is the worker entry point. Status transitions via IRunProgressSink at PreProcessing / Allocating / PostProcessing boundaries. The legacy entry point preserves back-compat with the 50+ existing PowerFillRunServiceTests.
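The shape of this refactor, a legacy entry point that delegates to the worker entry point with a no-op sink, can be sketched in Python. The function names mirror the C# ones loosely; `RecordingSink`, the `"Complete"` status, and the returned dictionary are assumptions made for the example.

```python
import uuid
from typing import Protocol


class RunProgressSink(Protocol):
    """Callback the orchestrator invokes at each status boundary."""

    def on_status(self, run_id: str, status: str) -> None: ...


class NoopRunProgressSink:
    """Sink for callers that do not track status (the legacy path)."""

    def on_status(self, run_id: str, status: str) -> None:
        pass


class RecordingSink:
    """Hypothetical sink that stores transitions, e.g. to write audit rows."""

    def __init__(self) -> None:
        self.seen: list[str] = []

    def on_status(self, run_id: str, status: str) -> None:
        self.seen.append(status)


def execute_resolved(options: dict, run_id: str, sink: RunProgressSink) -> dict:
    """Worker entry point: reports each boundary before doing that phase's work."""
    for status in ("PreProcessing", "Allocating", "PostProcessing"):
        sink.on_status(run_id, status)
        # The real steps for this phase would run here.
    return {"run_id": run_id, "status": "Complete"}  # status name is assumed


def execute(options: dict) -> dict:
    """Legacy entry point: same work, fresh run id, no status reporting."""
    return execute_resolved(options, str(uuid.uuid4()), NoopRunProgressSink())
```

Existing callers and tests keep using `execute` unchanged, while the background worker passes a sink that persists each transition.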
Endpoint refactor
RunEndpoints.cs: POST /run returns 202 Accepted + RunSubmissionResponse + Location: /api/powerfill/runs/{run_id} (with 409 on BR-8 / 503 on queue saturation / 400 on invalid options); new endpoints GET /runs (paginated list), GET /runs/{run_id} (full RunResponse from response_json), POST /runs/{run_id}/cancel; POST /candidates/preview unchanged.
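As a compact reading of the POST /run contract, the sketch below maps each submission outcome to its HTTP status and headers. Only the status codes and the Location path come from the description above; the `SubmitOutcome` enum, the function name, and the body field names are hypothetical.

```python
from enum import Enum


class SubmitOutcome(Enum):
    ACCEPTED = "accepted"
    INVALID_OPTIONS = "invalid_options"
    ACTIVE_RUN_EXISTS = "active_run_exists"  # BR-8 conflict
    QUEUE_SATURATED = "queue_saturated"


_ERROR_STATUS = {
    SubmitOutcome.INVALID_OPTIONS: 400,
    SubmitOutcome.ACTIVE_RUN_EXISTS: 409,
    SubmitOutcome.QUEUE_SATURATED: 503,
}


def submission_response(outcome: SubmitOutcome, run_id: str):
    """Return (status_code, headers, body) for a run submission outcome."""
    if outcome is SubmitOutcome.ACCEPTED:
        location = f"/api/powerfill/runs/{run_id}"
        # Body fields are illustrative, not the RunSubmissionResponse schema.
        return 202, {"Location": location}, {"run_id": run_id}
    return _ERROR_STATUS[outcome], {}, {"error": outcome.value}
```

A client would follow the Location header and poll GET /runs/{run_id} until the run reaches a terminal status.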
Module registration
PowerFillModule.cs: registered PowerFillRunHistoryService (scoped), PowerFillRunQueue + PowerFillRunCancelRegistry (singleton), PowerFillRunBackgroundService + PowerFillRunStartupReconciliationService (hosted services); registered PowerFillRunHistory entity; sentinel bumped to phase-6e-async-runs-ready.
Tests (4 new files + 1 extension)
- PowerFillRunCancelRegistryTests.cs: 10 tests (Register/TryGet/TryCancel/Unregister + thread-safety + multi-run isolation).
- PowerFillRunQueueTests.cs: 6 tests (FIFO ordering + cancel-propagation + saturation behaviour with timeout).
- PowerFillRunHistoryServiceTests.cs: 14 tests (Insert canonical-column round-trip + JSON round-trip + tenant scoping + List pagination + cursor logic + GetStatus + Finalize/Transition argument validation).
- RunStatusTests.cs: 8 tests (enum value count + ordering + active-set integrity vs SQL filter predicate + JSON serialisation contract pinning the BR-8-critical strings byte-for-byte).
- EntityConfigurationTests.cs extension: added pfill_run_history to ExpectedTableNames (count 22 → 23) + AssertPk<PowerFillRunHistory>(RunId).
Test totals: 158 → 206 passed, 6 skipped, 0 failed. +48 net-new tests for 6e.
Documentation
- adr-024-powerfill-async-run-pattern.md: full ADR documenting BackgroundService + Channel decision (Q1 PO-confirmed Option A); Options A-D considered; future-considerations section (multi-pod, replay, scheduled runs).
- powerfill-engine.md spec amendments: §Run Execution Model (full async lifecycle), §Audit Trail (14-col schema), BR-8 (filtered index mechanism), BR-9 (cleanup scope split), §Run APIs (POST /run 202 + new endpoints), new §"Phase 6e PSSaaS-explicit tables" sub-section.
- powerfill-assumptions-log.md: A58 added (BR-9 cleanup scope split + forensic preservation rationale); A56 carry-over update (Phase 6e PoC reproduces identical A54 outcome and validates orchestration layer); A57 second-corroboration note (kickoff specificity → 0 net-new Truth Rot for second consecutive sub-phase).
- 09-architecture-decisions.md: ADR-024 row added.
- powerfill-phase-6e-completion.md: completion report (~600 lines) with full PoC verification commands and outputs, 8 Gate findings, 11 decisions table, counterfactual retro, Phase 6 completion declaration.
- This devlog entry.
Files Produced / Modified
New:
- src/backend/PowerSeller.SaaS.Modules.PowerFill/Sql/012_CreatePfillRunHistoryTable.sql
- src/backend/PowerSeller.SaaS.Modules.PowerFill/Domain/PowerFillRunHistory.cs
- src/backend/PowerSeller.SaaS.Modules.PowerFill/Services/IRunProgressSink.cs
- src/backend/PowerSeller.SaaS.Modules.PowerFill/Services/PowerFillRunCancelRegistry.cs
- src/backend/PowerSeller.SaaS.Modules.PowerFill/Services/PowerFillRunQueue.cs
- src/backend/PowerSeller.SaaS.Modules.PowerFill/Services/PowerFillRunHistoryService.cs
- src/backend/PowerSeller.SaaS.Modules.PowerFill/Services/PowerFillRunBackgroundService.cs
- src/backend/PowerSeller.SaaS.Modules.PowerFill/Services/PowerFillRunStartupReconciliationService.cs
- src/backend/tests/PowerSeller.SaaS.Modules.PowerFill.Tests/Services/PowerFillRunCancelRegistryTests.cs
- src/backend/tests/PowerSeller.SaaS.Modules.PowerFill.Tests/Services/PowerFillRunQueueTests.cs
- src/backend/tests/PowerSeller.SaaS.Modules.PowerFill.Tests/Services/PowerFillRunHistoryServiceTests.cs
- src/backend/tests/PowerSeller.SaaS.Modules.PowerFill.Tests/Contracts/RunStatusTests.cs
- docs-site/docs/adr/adr-024-powerfill-async-run-pattern.md
- docs-site/docs/handoffs/powerfill-phase-6e-completion.md
- docs-site/docs/devlog/2026-04-19c-powerfill-phase-6e.md (this file)
Modified:
- src/backend/PowerSeller.SaaS.Modules.PowerFill/Contracts/RunContracts.cs (RunStatus 2→7 values + 5 new contract types)
- src/backend/PowerSeller.SaaS.Modules.PowerFill/Services/PowerFillRunService.cs (extracted ExecuteResolvedAsync entry point)
- src/backend/PowerSeller.SaaS.Modules.PowerFill/Endpoints/RunEndpoints.cs (rewritten POST /run + 3 new endpoints)
- src/backend/PowerSeller.SaaS.Modules.PowerFill/PowerFillModule.cs (service registrations + sentinel bump)
- src/backend/tests/PowerSeller.SaaS.Modules.PowerFill.Tests/EntityConfigurationTests.cs (+1 table-name + 1 PK assertion)
- docs-site/docs/specs/powerfill-engine.md (major amendments: 5 sections)
- docs-site/docs/specs/powerfill-assumptions-log.md (A58 added; A56 carry-over; A57 corroboration)
- docs-site/docs/arc42/09-architecture-decisions.md (ADR-024 row added)
Key Decisions
| # | Decision | Reference |
|---|---|---|
| D1 | In-memory Channel<T> + BackgroundService + per-job DI scope (Q1 PO-confirmed Option A) | ADR-024 |
| D2 | SQL filtered unique index for BR-8 (Q2 PO-confirmed Option A) | 012 SQL + RunStatusTests |
| D3 | pfill_run_history 14 cols (Q3 + Q7 Option B + 6e response_json) | 012 SQL + PowerFillRunHistory entity |
| D4 | BR-9 cleanup scope split: clear 7 user-facing, preserve 4 syn-trades + log (A58) | PowerFillRunHistoryService.CleanupRunOutputTablesAsync |
| D5 | Tenant-context propagation via RunJob capture-on-enqueue + replay-on-dequeue (resolves F-6e-5) | RunJob record + PowerFillRunBackgroundService |
| D6 | Cancel-detection via job.CancellationToken.IsCancellationRequested (NOT linked-token CT) | PowerFillRunBackgroundService.ExecuteOneJobAsync |
| D7 | response.Status reconciled with worker's terminal decision before serialising response_json | PowerFillRunBackgroundService finally block |
Full decision details + rationale in the completion report §Decisions made.
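D6 can be illustrated with a small sketch: when a job ends with an exception, the worker decides between Cancelled and Failed by asking the job's own cancel token rather than by inspecting the exception. This is a Python analogue that uses `threading.Event` in place of a CancellationTokenSource; the class and function names and the `"Complete"` status are assumptions for illustration.

```python
import threading
from typing import Callable


class CancelRegistry:
    """Process-wide map of run id to that run's cancel token."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._tokens: dict[str, threading.Event] = {}

    def register(self, run_id: str) -> threading.Event:
        with self._lock:
            return self._tokens.setdefault(run_id, threading.Event())

    def try_cancel(self, run_id: str) -> bool:
        """Signal cancellation; False if the run is unknown or already done."""
        with self._lock:
            token = self._tokens.get(run_id)
        if token is None:
            return False
        token.set()
        return True

    def unregister(self, run_id: str) -> None:
        with self._lock:
            self._tokens.pop(run_id, None)


def run_one_job(
    run_id: str, work: Callable[[threading.Event], None], registry: CancelRegistry
) -> str:
    """Run one job and return its terminal status."""
    token = registry.register(run_id)
    try:
        work(token)
        return "Complete"  # status name is assumed
    except Exception:
        # The per-job token is the source of truth, whatever the exception
        # type: a cancelled job may surface as an ordinary error.
        return "Cancelled" if token.is_set() else "Failed"
    finally:
        registry.unregister(run_id)
```

Keying the decision on the per-job token keeps a user-requested cancel distinguishable from other interruptions that happen to raise the same exception type.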
What's Next
Phase 6 (Core Allocation Engine) is COMPLETE. Sentinel: phase-6e-async-runs-ready. The full 6-step orchestration (BX cash-grids → BX settle-and-price → candidate-builder → allocation → pool_guide → UE) is structurally deployed; the orchestration layer is empirically validated against PS_DemoData; Steps 1-4 reproducibly produce the 515-allocation baseline; Steps 5-6 are deferred to Phase 9 per the documented A54+A56 carry-over.
Phase 7 (Reports / recap query APIs) is now available. The 8 read endpoints per spec §Output APIs (/runs/{id}/guide, /runs/{id}/recap, /runs/{id}/switching, etc.) surface the run-output tables that 6e's BR-9 preserves. Phase 7 should:
- Follow the 6d/6e kickoff specificity pattern (per A57's 2-session corroboration).
- Explicitly scope around A54 + A56 carry-over (read APIs that depend on Step 6/UE return empty against A54-affected runs until Phase 9).
- Revisit the test harness (SQL-Server-backed integration tests for the InMemory-blocked paths, extending PFILL_TEST_SQLSERVER).
- NOT introduce a second background-work consumer without explicitly revisiting ADR-024.
Phase 8 (React UI) + Phase 9 (Parallel Validation) breakdowns can begin drafting in parallel with Phase 7 implementation.
Risks Captured
- A54 (legacy proc PK violation on PS_DemoData snapshot) — STILL DEFERRED Phase 9. Phase 6 ships with this carry-over; Phase 7's read APIs don't depend on Step 6 succeeding.
- A56 (Step 5 fail-fast cascade) — STILL OBSERVATION, doubly-blocked with A54. Phase 9 is the gate.
- InMemory test caveat: ExecuteUpdateAsync / ExecuteSqlRawAsync are not supported under the InMemory provider, so several PowerFillRunHistoryService methods (TransitionStatus / Finalize / Cleanup / MarkAbandonedActiveRuns) lack unit-test coverage. Live PoC against PS_DemoData covers them; Phase 7 should add SQL-Server-backed integration coverage.
- Pod restart abandons in-flight runs: PowerFillRunStartupReconciliationService mitigates by sweeping abandoned active rows at app startup; multi-pod safety is a Phase 7+ concern per ADR-024.
- Cancel-vs-Failed terminal classification subtlety: the per-job CTS check (D6) is the load-bearing seam; future BackgroundService work should bank this pattern explicitly.
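The startup sweep that mitigates the pod-restart risk amounts to one guarded UPDATE per tenant. The sketch below shows the idea with Python and SQLite; the table shape, status names, failure message, and function names are assumptions, and a single shared connection stands in for the per-tenant DI scopes.

```python
import sqlite3

# Hypothetical active-status names; the real set is pinned by RunStatusTests.
ACTIVE_STATUSES = ("Pending", "PreProcessing", "Allocating", "PostProcessing")


def make_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the audit table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE run_history (run_id TEXT PRIMARY KEY, tenant_id TEXT,"
        " status TEXT, failure_message TEXT)"
    )
    return db


def mark_abandoned_active_runs(
    db: sqlite3.Connection, tenant_id: str, reason: str = "abandoned at restart"
) -> int:
    """Fail every still-active row for one tenant; return the rows changed."""
    placeholders = ", ".join("?" for _ in ACTIVE_STATUSES)
    with db:
        cursor = db.execute(
            "UPDATE run_history SET status = 'Failed', failure_message = ?"
            f" WHERE tenant_id = ? AND status IN ({placeholders})",
            (reason, tenant_id, *ACTIVE_STATUSES),
        )
    return cursor.rowcount


def reconcile_all_tenants(db: sqlite3.Connection, tenant_ids: list[str]) -> dict:
    """Run the sweep once per known tenant, as the hosted service does at startup."""
    return {t: mark_abandoned_active_runs(db, t) for t in tenant_ids}
```

The sweep is idempotent: a second pass finds no active rows and changes nothing. It assumes a single process, though, since with several pods it could fail a run that another pod is still executing.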
Process Notes
- Sub-phase calendar time: ~1 Architect-session. Consistent with 6a, 6b, 6c, 6d, pre-6b sweep — well under the breakdown's 5-7 day estimate.
- 0 net-new Truth Rot findings against the kickoff/prompt itself. Second consecutive sub-phase with a clean kickoff (after 6d). A57's pattern observation now has 2-session corroboration; v3.1 nomination drafting is well-supported.
- No subagent delegation this sub-phase — 6e is greenfield (no SQL transcription); the architectural decisions are PSSaaS-novel and self-implementation kept the live-PoC observations (D6 / D7 / D8) in the Architect's context where they could be acted on immediately.
- Andon-cord used twice — Cancelled-vs-Failed misclassification at PoC time (D6 fix); EntityConfig test failures after entity registration (routine test-extension fix). Both surfaced via the build-feedback loop and were fixed in-session.
- All 3 Deploy Verification Gate arms exercised: sentinel green; live API exercised through happy-path enqueue, BR-8 enforcement, BR-9 cleanup, GET pagination, cancel mid-flight; deployed cleanly to local pssaas-db AND PS_DemoData; idempotent re-deploy verified; filter predicate text matches RunStatusTests.ActiveStatuses byte-for-byte.
Phase 6 Retrospective (cross-sub-phase observation)
The Phase 6 completion arc shipped 5+1 sub-phases (6a, pre-6b sweep, 6b, 6c, 6d, 6e) over ~5-6 Architect sessions across ~3 calendar days, against a breakdown estimate of 29-41 days. The compression came from:
- Aggressive subagent delegation for SQL transcription — 4 clean first-attempts at 670 / 5,837 / 3,274 / 6,613 lines. The Template 2 / Phase 3 SQL-transcription protocol scaled cleanly.
- The PowerFillRunService extension model: every sub-phase added a Step N + RunSummary fields without rewriting existing steps. JSON contract preserved across all sub-phases.
- Per-sub-phase SQL deploy file (006/008/009/010/011/012): keeping diffs reviewable + revertable + testable in isolation.
- The 3-layer Primary-Source Verification Gate — caught findings BEFORE they propagated into wasted work; produced 50+ findings across 5 sub-phases; ~zero rework.
- The discipline shipped in 6a kept compounding through 6e — every sub-phase's completion report became the next sub-phase's kickoff input; every assumption + every D-decision + every Gate finding accumulated into searchable shared context.
- The Andon-cord protocol — A54 in 6c surfaced a real legacy-proc bug; the response was "Stop, document, escalate disposition, proceed with verbatim port" — not "silently work around." A56 in 6d compounded the observation; A56 in 6e validated the orchestration layer against the predicted outcome.
- The Architect-PO collaboration model — Q1/Q2/Q3/Q7 PO-confirmed defaults inherited from the open-questions doc; PO checkpoints at 6a → 6c → 6d planning; A54 Option C disposition consistently carried from 6c → 6d → 6e.
Banking for Phase 7 estimate: 1-2 Architect-sessions per major sub-phase, NOT 5-7 days. The 5-7 day estimate was calibrated for an Architect doing all the work manually; the subagent + reuse pattern materially changes velocity.