PowerFill Sub-Phase 6e — Async Runs + Audit + Concurrency + Phase 6 Completion

Date: 2026-04-19
Agent: PSSaaS Systems Architect (Opus 4.7 High Thinking)
Scope: Convert POST /api/powerfill/run to async (202 + run_id + Location header; background worker); add pfill_run_history audit table with BR-8 filtered unique index + BR-9 cleanup + Q3 Option B input_loan_ids_json + Q7 Option B failure_step/failure_message + Phase 6e response_json; add GET /runs, GET /runs/{run_id}, POST /runs/{run_id}/cancel; ship the canonical Phase 6 completion sentinel phase-6e-async-runs-ready.

Why

Phase 6e is the final Phase 6 sub-phase. The PowerFill spec (§Run Execution Model) requires asynchronous runs; sub-phases 6a-6d shipped a synchronous best-effort surface so the orchestration was independently verifiable before adding the async runtime. 6e converts the surface to true async, adds the pfill_run_history audit table per spec §Audit Trail, enforces BR-8 single-active-run-per-tenant via a SQL filtered unique index, implements BR-9 failure-state cleanup, and ships the canonical Phase 6 completion sentinel.

After 6e ships, Phase 6 (Core Allocation Engine) is COMPLETE and Phase 7 (Reports / recap query APIs) becomes available.

What Was Done

SQL artifact (1 new file)

012_CreatePfillRunHistoryTable.sql (104 lines) — pfill_run_history table (14 cols: 11 spec canonical + Q3 Option B input_loan_ids_json + Q7 Option B failure_step + failure_message + Phase 6e response_json); filtered unique index ux_pfill_run_history_tenant_active (BR-8); cursor pagination index ix_pfill_run_history_tenant_started_at; idempotent guards + PRINT-in-guards (A32) + A50 SET preamble.
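
To make the BR-8 mechanism concrete, here is a minimal sketch of a filtered unique index enforcing one active run per tenant. It uses Python with SQLite's partial indexes as a stand-in for SQL Server, purely so the example is self-contained; it is not the contents of 012_CreatePfillRunHistoryTable.sql. The three-column table, the status strings "Pending" and "Complete", the membership of the active set, and the names create_schema, insert_run, and BR8ConflictError are all assumptions made for illustration.

```python
import sqlite3

# Assumed active set. The source names PreProcessing, Allocating and
# PostProcessing; "Pending" is a hypothetical initial status.
ACTIVE_STATUSES = ("Pending", "PreProcessing", "Allocating", "PostProcessing")


class BR8ConflictError(Exception):
    """Raised when a tenant already has an active run (hypothetical name)."""


def create_schema(conn: sqlite3.Connection) -> None:
    active = ", ".join(f"'{s}'" for s in ACTIVE_STATUSES)
    conn.executescript(
        f"""
        CREATE TABLE IF NOT EXISTS pfill_run_history (
            run_id    TEXT PRIMARY KEY,
            tenant_id TEXT NOT NULL,
            status    TEXT NOT NULL
        );
        -- Filtered (partial) unique index: uniqueness is only enforced for
        -- rows whose status is in the active set, so finished runs pile up
        -- freely while at most one active row per tenant can exist.
        CREATE UNIQUE INDEX IF NOT EXISTS ux_pfill_run_history_tenant_active
            ON pfill_run_history (tenant_id)
            WHERE status IN ({active});
        """
    )


def insert_run(conn, run_id, tenant_id, status="Pending"):
    try:
        conn.execute(
            "INSERT INTO pfill_run_history (run_id, tenant_id, status) "
            "VALUES (?, ?, ?)",
            (run_id, tenant_id, status),
        )
    except sqlite3.IntegrityError as exc:
        # Simplification: every integrity error is treated as a BR-8 conflict.
        raise BR8ConflictError(tenant_id) from exc
```

Because the database rejects the second active row, two concurrent submissions for the same tenant cannot both succeed, which a check-then-insert in application code would not guarantee. Once a run's status moves out of the active set, the same tenant can submit again.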

EF Core entity (1 new file)

PowerFillRunHistory.cs — 14 cols, PK (RunId UUID). Registered in PowerFillModule.RegisterEntities. PowerFill-owned table count: 22 → 23.

Service classes (6 new files)

IRunProgressSink.cs — interface + NoopRunProgressSink for the orchestrator's per-step status-transition callback.
PowerFillRunCancelRegistry.cs — process-singleton ConcurrentDictionary<Guid, CancellationTokenSource>.
PowerFillRunQueue.cs — bounded Channel<RunJob> (capacity 64, 2s enqueue timeout → 503 on saturation) + RunJob immutable record carrying captured tenant identity.
PowerFillRunHistoryService.cs — scoped audit/cleanup CRUD; BR-8 SqlException 2627 → BR8ConflictException translation; BR-9 cleanup of 7 user-facing tables (preserves 4 syn-trades + log per A58); MarkAbandonedActiveRunsAsync for startup reconciliation.
PowerFillRunBackgroundService.cs — BackgroundService channel reader; per-job DI scope; tenant-context replay (resolves F-6e-5); explicit Cancelled vs Failed terminal classification via job.CancellationToken.IsCancellationRequested (D6); response.Status reconciled with terminal decision before persisting response_json (D7); BR-9 cleanup invocation in finally block.
PowerFillRunStartupReconciliationService.cs — IHostedService running once at app startup; iterates every known tenant + per-tenant DI scope + marks abandoned active rows as Failed.
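
The bounded queue with an enqueue timeout described for PowerFillRunQueue.cs can be pictured with the following Python asyncio sketch. It is an analogy, not a port of the C# Channel code: RunQueue, QueueSaturatedError, and the two RunJob fields are hypothetical names, and only the capacity of 64 and the 2-second timeout come from the entry above.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class RunJob:
    """Immutable job record. Tenant identity is captured at enqueue time so
    the worker can replay it later, outside the original request context."""
    run_id: str
    tenant_id: str


class QueueSaturatedError(Exception):
    """The endpoint layer would translate this into HTTP 503."""


class RunQueue:
    def __init__(self, capacity: int = 64, enqueue_timeout: float = 2.0) -> None:
        self._queue = asyncio.Queue(maxsize=capacity)
        self._timeout = enqueue_timeout

    async def enqueue(self, job: RunJob) -> None:
        # put() blocks while the queue is full; give up after the timeout
        # instead of holding the HTTP request open indefinitely.
        try:
            await asyncio.wait_for(self._queue.put(job), timeout=self._timeout)
        except asyncio.TimeoutError:
            raise QueueSaturatedError(job.run_id) from None

    async def dequeue(self) -> RunJob:
        # Waits until a job is available; jobs come out in FIFO order.
        return await self._queue.get()
```

A bounded queue turns overload into an explicit, fast failure for the caller rather than unbounded memory growth in the process.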

RunService refactor

PowerFillRunService.cs — ExecuteAsync(request, ct) refactored to delegate to new ExecuteResolvedAsync(options, runId, IRunProgressSink, ct); the latter is the worker entry point. Status transitions via IRunProgressSink at PreProcessing / Allocating / PostProcessing boundaries. The legacy entry point preserves back-compat with the 50+ existing PowerFillRunServiceTests.
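
The shape of that refactor, a legacy entry point delegating to a worker entry point that reports status through a sink, can be sketched as follows. This is a Python illustration under stated assumptions, not the C# code: the method names, the dict-based options, and the idea that the legacy path generates its own run id are hypothetical; only the three status names and the no-op sink come from this entry.

```python
import uuid
from typing import Protocol


class RunProgressSink(Protocol):
    def on_status(self, run_id: str, status: str) -> None: ...


class NoopRunProgressSink:
    """Discards transitions; suits callers that do not track status."""

    def on_status(self, run_id: str, status: str) -> None:
        pass


class RecordingSink:
    """Test double that remembers every transition it receives."""

    def __init__(self) -> None:
        self.seen = []

    def on_status(self, run_id: str, status: str) -> None:
        self.seen.append((run_id, status))


def execute_resolved(options: dict, run_id: str, sink: RunProgressSink) -> dict:
    """Worker entry point: announces each phase boundary to the sink."""
    for status in ("PreProcessing", "Allocating", "PostProcessing"):
        sink.on_status(run_id, status)
        # The real step work for this phase would run here.
    return {"run_id": run_id, "options": options}


def execute(request: dict) -> dict:
    """Legacy entry point: same pipeline, transitions go nowhere."""
    return execute_resolved(request, str(uuid.uuid4()), NoopRunProgressSink())
```

Keeping the old signature as a thin wrapper is what lets existing callers and tests keep working while the background worker supplies a sink that persists transitions.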

Endpoint refactor

RunEndpoints.cs — POST /run returns 202 Accepted + RunSubmissionResponse + Location: /api/powerfill/runs/{run_id} (with 409 on BR-8 / 503 on queue saturation / 400 on invalid options); new endpoints GET /runs (paginated list), GET /runs/{run_id} (full RunResponse from response_json), POST /runs/{run_id}/cancel; POST /candidates/preview unchanged.
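
The status-code contract for POST /run listed above can be summarised in a small sketch. It is written in Python for illustration only; HttpResult, submit_run, the start_run callback, the exception names, and the response body shapes are hypothetical. The status codes and the Location path are taken from the entry above.

```python
from dataclasses import dataclass, field


@dataclass
class HttpResult:
    status: int
    headers: dict = field(default_factory=dict)
    body: dict = field(default_factory=dict)


class InvalidOptionsError(Exception):
    pass


class BR8ConflictError(Exception):
    pass


class QueueSaturatedError(Exception):
    pass


def submit_run(request: dict, start_run) -> HttpResult:
    """start_run(request) returns a run id or raises one of the errors above."""
    try:
        run_id = start_run(request)
    except InvalidOptionsError as exc:
        return HttpResult(400, body={"error": str(exc)})
    except BR8ConflictError:
        # Another run is already active for this tenant.
        return HttpResult(409, body={"error": "active run already exists"})
    except QueueSaturatedError:
        # The bounded queue did not accept the job within the timeout.
        return HttpResult(503, body={"error": "run queue is saturated"})
    # Accepted: the client polls the Location URL for progress and results.
    return HttpResult(
        202,
        headers={"Location": f"/api/powerfill/runs/{run_id}"},
        body={"run_id": run_id},
    )
```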

Module registration

PowerFillModule.cs — registered PowerFillRunHistoryService (scoped), PowerFillRunQueue + PowerFillRunCancelRegistry (singleton), PowerFillRunBackgroundService + PowerFillRunStartupReconciliationService (hosted services); registered PowerFillRunHistory entity; sentinel bumped to phase-6e-async-runs-ready.
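
The singleton lifetime matters for the cancel registry: the cancel endpoint and the background worker must see the same map. A minimal Python sketch of such a registry follows, assuming a lock-guarded dict and threading.Event as stand-ins for ConcurrentDictionary and CancellationTokenSource. The class and method names are hypothetical.

```python
import threading


class RunCancelRegistry:
    """One instance per process, shared by the endpoint and the worker."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._events = {}

    def register(self, run_id: str) -> threading.Event:
        # Returns the existing flag if the run was already registered.
        with self._lock:
            return self._events.setdefault(run_id, threading.Event())

    def try_cancel(self, run_id: str) -> bool:
        # False means the run is unknown here (finished or never started).
        with self._lock:
            event = self._events.get(run_id)
        if event is None:
            return False
        event.set()
        return True

    def unregister(self, run_id: str) -> None:
        # Called by the worker when the run reaches a terminal state.
        with self._lock:
            self._events.pop(run_id, None)
```

With a scoped or transient lifetime, each request would get its own empty registry and a cancel call could never reach the running job.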

Tests (4 new files + 1 extension)

PowerFillRunCancelRegistryTests.cs — 10 tests (Register/TryGet/TryCancel/Unregister + thread-safety + multi-run isolation).
PowerFillRunQueueTests.cs — 6 tests (FIFO ordering + cancel-propagation + saturation behaviour with timeout).
PowerFillRunHistoryServiceTests.cs — 14 tests (Insert canonical-column round-trip + JSON round-trip + tenant scoping + List pagination + cursor logic + GetStatus + Finalize/Transition argument validation).
RunStatusTests.cs — 8 tests (enum value count + ordering + active-set integrity vs SQL filter predicate + JSON serialisation contract pinning the BR-8-critical strings byte-for-byte).
EntityConfigurationTests.cs extension — added pfill_run_history to ExpectedTableNames (count 22 → 23) + AssertPk<PowerFillRunHistory>(RunId).

Test totals: 158 → 206 passed, 6 skipped, 0 failed. +48 net-new tests for 6e.

Documentation

adr-024-powerfill-async-run-pattern.md — full ADR documenting BackgroundService + Channel decision (Q1 PO-confirmed Option A); Options A-D considered; future-considerations section (multi-pod, replay, scheduled runs).
powerfill-engine.md spec amendments — §Run Execution Model (full async lifecycle), §Audit Trail (14-col schema), BR-8 (filtered index mechanism), BR-9 (cleanup scope split), §Run APIs (POST /run 202 + new endpoints), new §"Phase 6e PSSaaS-explicit tables" sub-section.
powerfill-assumptions-log.md — A58 added (BR-9 cleanup scope split + forensic preservation rationale); A56 carry-over update (Phase 6e PoC reproduces identical A54 outcome and validates orchestration layer); A57 second-corroboration note (kickoff specificity → 0 net-new Truth Rot for second consecutive sub-phase).
09-architecture-decisions.md — ADR-024 row added.
powerfill-phase-6e-completion.md — completion report (~600 lines) with full PoC verification commands and outputs, 8 Gate findings, 11 decisions table, counterfactual retro, Phase 6 completion declaration.
This devlog entry.

Files Produced / Modified

New:

src/backend/PowerSeller.SaaS.Modules.PowerFill/Sql/012_CreatePfillRunHistoryTable.sql
src/backend/PowerSeller.SaaS.Modules.PowerFill/Domain/PowerFillRunHistory.cs
src/backend/PowerSeller.SaaS.Modules.PowerFill/Services/IRunProgressSink.cs
src/backend/PowerSeller.SaaS.Modules.PowerFill/Services/PowerFillRunCancelRegistry.cs
src/backend/PowerSeller.SaaS.Modules.PowerFill/Services/PowerFillRunQueue.cs
src/backend/PowerSeller.SaaS.Modules.PowerFill/Services/PowerFillRunHistoryService.cs
src/backend/PowerSeller.SaaS.Modules.PowerFill/Services/PowerFillRunBackgroundService.cs
src/backend/PowerSeller.SaaS.Modules.PowerFill/Services/PowerFillRunStartupReconciliationService.cs
src/backend/tests/PowerSeller.SaaS.Modules.PowerFill.Tests/Services/PowerFillRunCancelRegistryTests.cs
src/backend/tests/PowerSeller.SaaS.Modules.PowerFill.Tests/Services/PowerFillRunQueueTests.cs
src/backend/tests/PowerSeller.SaaS.Modules.PowerFill.Tests/Services/PowerFillRunHistoryServiceTests.cs
src/backend/tests/PowerSeller.SaaS.Modules.PowerFill.Tests/Contracts/RunStatusTests.cs
docs-site/docs/adr/adr-024-powerfill-async-run-pattern.md
docs-site/docs/handoffs/powerfill-phase-6e-completion.md
docs-site/docs/devlog/2026-04-19c-powerfill-phase-6e.md (this file)

Modified:

src/backend/PowerSeller.SaaS.Modules.PowerFill/Contracts/RunContracts.cs (RunStatus 2→7 values + 5 new contract types)
src/backend/PowerSeller.SaaS.Modules.PowerFill/Services/PowerFillRunService.cs (extracted ExecuteResolvedAsync entry point)
src/backend/PowerSeller.SaaS.Modules.PowerFill/Endpoints/RunEndpoints.cs (rewritten POST /run + 3 new endpoints)
src/backend/PowerSeller.SaaS.Modules.PowerFill/PowerFillModule.cs (service registrations + sentinel bump)
src/backend/tests/PowerSeller.SaaS.Modules.PowerFill.Tests/EntityConfigurationTests.cs (+1 table-name + 1 PK assertion)
docs-site/docs/specs/powerfill-engine.md (major amendments: 5 sections)
docs-site/docs/specs/powerfill-assumptions-log.md (A58 added; A56 carry-over; A57 corroboration)
docs-site/docs/arc42/09-architecture-decisions.md (ADR-024 row added)

Key Decisions

#	Decision	Reference
D1	In-memory `Channel<T>` + `BackgroundService` + per-job DI scope (Q1 PO-confirmed Option A)	ADR-024
D2	SQL filtered unique index for BR-8 (Q2 PO-confirmed Option A)	012 SQL + RunStatusTests
D3	`pfill_run_history` 14 cols (Q3 + Q7 Option B + 6e `response_json`)	012 SQL + PowerFillRunHistory entity
D4	BR-9 cleanup scope split: clear 7 user-facing, preserve 4 syn-trades + log (A58)	PowerFillRunHistoryService.CleanupRunOutputTablesAsync
D5	Tenant-context propagation via `RunJob` capture-on-enqueue + replay-on-dequeue (resolves F-6e-5)	RunJob record + PowerFillRunBackgroundService
D6	Cancel-detection via `job.CancellationToken.IsCancellationRequested` (NOT linked-token CT)	PowerFillRunBackgroundService.ExecuteOneJobAsync
D7	`response.Status` reconciled with worker's terminal decision before serialising `response_json`	PowerFillRunBackgroundService finally block

Full decision details + rationale in the completion report §Decisions made.
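
Decisions D6 and D7 interact, so a small sketch of the worker's terminal handling may help. It is a Python illustration, not the C# ExecuteOneJobAsync: the function name, the "Complete" status string, the dict-shaped response, and the choice to run cleanup for every non-successful outcome are assumptions. The point it demonstrates is classifying by the job's own cancel flag rather than by the kind of exception that surfaced, presumably because a linked token can also fire for reasons other than a user cancel.

```python
import threading


def execute_one_job(run, job_cancelled: threading.Event, cleanup) -> dict:
    """Runs one job and returns the response that would be persisted."""
    terminal = "Failed"
    response = {}
    try:
        response = run()
        terminal = "Complete"
    except Exception as exc:
        # D6: ask the per-job cancel flag, not the exception type, whether
        # this was a user-requested cancel or a genuine failure.
        terminal = "Cancelled" if job_cancelled.is_set() else "Failed"
        response = {"failure_message": str(exc)}
    finally:
        # BR-9: clear run outputs when the run did not succeed (assumed scope).
        if terminal != "Complete":
            cleanup()
    # D7: the persisted response must agree with the terminal decision,
    # whatever status the run itself last reported.
    response["status"] = terminal
    return response
```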

What's Next

Phase 6 (Core Allocation Engine) is COMPLETE; the sentinel is phase-6e-async-runs-ready. The full 6-step orchestration (BX cash-grids → BX settle-and-price → candidate-builder → allocation → pool_guide → UE) is structurally deployed, and the orchestration layer has been empirically validated against PS_DemoData. Steps 1-4 reproducibly produce the 515-allocation baseline; Steps 5-6 are deferred to Phase 9 per the documented A54+A56 carry-over.

Phase 7 (Reports / recap query APIs) is now available. The 8 read endpoints per spec §Output APIs (/runs/{id}/guide, /runs/{id}/recap, /runs/{id}/switching, etc.) surface the run-output tables that 6e's BR-9 preserves. Phase 7 should:

Follow the 6d/6e kickoff specificity pattern (per A57's 2-session corroboration).
Explicitly scope around A54 + A56 carry-over (read APIs that depend on Step 6/UE return empty against A54-affected runs until Phase 9).
Revisit the test harness (SQL-Server-backed integration tests for the InMemory-blocked paths, extending PFILL_TEST_SQLSERVER).
NOT introduce a second background-work consumer without explicitly revisiting ADR-024.

Phase 8 (React UI) + Phase 9 (Parallel Validation) breakdowns can begin drafting in parallel with Phase 7 implementation.

Risks Captured

A54 (legacy proc PK violation on PS_DemoData snapshot) — STILL DEFERRED Phase 9. Phase 6 ships with this carry-over; Phase 7's read APIs don't depend on Step 6 succeeding.
A56 (Step 5 fail-fast cascade) — STILL OBSERVATION, doubly-blocked with A54. Phase 9 is the gate.
InMemory test caveat — ExecuteUpdateAsync / ExecuteSqlRawAsync not supported, so several PowerFillRunHistoryService methods (TransitionStatus / Finalize / Cleanup / MarkAbandonedActiveRuns) lack unit-test coverage. Live PoC against PS_DemoData covers them; Phase 7 should add SQL-Server-backed integration coverage.
Pod restart abandons in-flight runs — PowerFillRunStartupReconciliationService mitigates by sweeping abandoned active rows at app startup; multi-pod safety is a Phase 7+ concern per ADR-024.
Cancel-vs-Failed terminal classification subtlety — the per-job CTS check (D6) is the load-bearing seam; future BackgroundService work should bank this pattern explicitly.
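
The startup sweep that mitigates the pod-restart risk above can be sketched as a per-tenant UPDATE over rows still in an active status. The Python/SQLite code below is an illustration only: the table columns shown, the active status set, the failure message text, and the function names are assumptions, and the real service resolves a DI scope per tenant rather than sharing one connection.

```python
import sqlite3

# Assumed active set; it should match the BR-8 index predicate exactly.
ACTIVE_STATUSES = ("Pending", "PreProcessing", "Allocating", "PostProcessing")


def mark_abandoned_active_runs(conn, tenant_id, reason="abandoned: process restarted"):
    """Marks the tenant's still-active rows as Failed; returns rows changed."""
    placeholders = ", ".join("?" for _ in ACTIVE_STATUSES)
    cursor = conn.execute(
        "UPDATE pfill_run_history "
        "SET status = 'Failed', failure_message = ? "
        f"WHERE tenant_id = ? AND status IN ({placeholders})",
        (reason, tenant_id, *ACTIVE_STATUSES),
    )
    return cursor.rowcount


def reconcile_on_startup(conn, tenant_ids):
    """Runs once at startup, before the worker accepts new jobs."""
    return {t: mark_abandoned_active_runs(conn, t) for t in tenant_ids}
```

Without this sweep, a row left in an active status by a crashed process would keep tripping the BR-8 unique index and block that tenant from ever submitting again.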

Process Notes

Sub-phase calendar time: ~1 Architect-session. Consistent with 6a, 6b, 6c, 6d, pre-6b sweep — well under the breakdown's 5-7 day estimate.
0 net-new Truth Rot findings against the kickoff/prompt itself. Second consecutive sub-phase with a clean kickoff (after 6d). A57's pattern observation now has 2-session corroboration; v3.1 nomination drafting is well-supported.
No subagent delegation this sub-phase — 6e is greenfield (no SQL transcription); the architectural decisions are PSSaaS-novel and self-implementation kept the live-PoC observations (D6 / D7 / D8) in the Architect's context where they could be acted on immediately.
Andon-cord used twice — Cancelled-vs-Failed misclassification at PoC time (D6 fix); EntityConfig test failures after entity registration (routine test-extension fix). Both surfaced via the build-feedback loop and were fixed in-session.
All 3 Deploy Verification Gate arms exercised: sentinel green; live API exercised through happy-path enqueue, BR-8 enforcement, BR-9 cleanup, GET pagination, cancel mid-flight; deployed cleanly to local pssaas-db AND PS_DemoData; idempotent re-deploy verified; filter predicate text matches RunStatusTests.ActiveStatuses byte-for-byte.

Phase 6 Retrospective (cross-sub-phase observation)

The Phase 6 completion arc shipped 5+1 sub-phases (6a, pre-6b sweep, 6b, 6c, 6d, 6e) over ~5-6 Architect sessions across ~3 calendar days, against a breakdown estimate of 29-41 days. The compression came from:

Aggressive subagent delegation for SQL transcription — 4 clean first-attempts at 670 / 5,837 / 3,274 / 6,613 lines. The Template 2 / Phase 3 SQL-transcription protocol scaled cleanly.
The PowerFillRunService extension model — every sub-phase added a Step N + RunSummary fields without rewriting existing steps. JSON contract preserved across all sub-phases.
Per-sub-phase SQL deploy file (006/008/009/010/011/012) — keeping diffs reviewable + revertable + testable in isolation.
The 3-layer Primary-Source Verification Gate — caught findings BEFORE they propagated into wasted work; produced 50+ findings across 5 sub-phases; ~zero rework.
The discipline shipped in 6a kept compounding through 6e — every sub-phase's completion report became the next sub-phase's kickoff input; every assumption + every D-decision + every Gate finding accumulated into searchable shared context.
The Andon-cord protocol — A54 in 6c surfaced a real legacy-proc bug; the response was "Stop, document, escalate disposition, proceed with verbatim port" — not "silently work around." A56 in 6d compounded the observation; A56 in 6e validated the orchestration layer against the predicted outcome.
The Architect-PO collaboration model — Q1/Q2/Q3/Q7 PO-confirmed defaults inherited from the open-questions doc; PO checkpoints at 6a → 6c → 6d planning; A54 Option C disposition consistently carried from 6c → 6d → 6e.

Banking for Phase 7 estimate: 1-2 Architect-sessions per major sub-phase, NOT 5-7 days. The 5-7 day estimate was calibrated for an Architect doing all the work manually; the subagent + reuse pattern materially changes velocity.
