PowerFill Phase 8.5 — Ecosystem Auth + Embedded Superset SDK
Date: 2026-04-20
Agent: PSSaaS Systems Architect (Claude Opus 4.7 High Thinking)
Scope: Ship the last demo-blocker before Greg-demo readiness. PSSaaS joins ecosystem auth via oauth2-proxy + Keycloak (W1) + replaces W2 anchor-link "View in Superset" with @superset-ui/embedded-sdk <EmbeddedDashboard> (W2) + ships .NET 8 SupersetGuestTokenClient doing the 3-step Superset handshake (W3) + lands the A68 long-term decoupling code shape (W4). All 4 workstreams in 1 Architect-session at ~1.5x complexity vs the 6a-6e + Phase 7/8/9 baseline. Sentinel bumped to phase-8-5-ecosystem-ready (drops the historical -a54-fixed and -validation-ready suffixes; Phase 8.5 is the new authoritative milestone). 250 .NET tests pass + 6 skipped (was 233 pre-Phase-8.5; +17 net-new: 8 SupersetGuestTokenClient + 9 TenantMiddleware). Frontend npm run build clean (272.69KB JS gzipped; SDK lazy-loaded chunk separated at lib-Cb7LCDYX.js). ADR-027 status flipped Proposed → Accepted with full Architect-at-dispatch deferred-decision dispositions documented; ADR-029 NEW documenting the A68 decoupling shape + canonical-identity-convention deferral per PO disposition. Pending PO push + cross-project-relay request to PSX Infra for Keycloak pssaas-app client + 5-item Superset embedding pre-flight verification.
Why
Per the Phase 8.5 kickoff at powerfill-phase-8-5-kickoff: Phase 8.5 is the LAST demo-blocker before Greg-demo readiness per PO sequence preference (A54 fix → W2 → Phase 9 → Phase 8.5 → Greg demo). The PO milestone "I'd like to do it in staging, and I should have to authenticate via Keycloak to access /app" + the product-instinct upgrade for embedded-vs-anchor-links cannot be satisfied without ecosystem auth + the embedded SDK. Without Phase 8.5, the Greg demo would be "look at our cool new UI on a public-staging URL with new-tab Superset"; with Phase 8.5, the demo becomes "log in via Keycloak, watch a run, see the dashboard inline."
Per ADR-027 (Proposed at 2026-04-19; Accepted at this Phase 8.5 ship): the framing decisions D-8.5-1 through D-8.5-5 inherit from PSX Collaborator's empirical embedding-pattern reply (archived 2026-04-19 at 2026-04-19-psx-superset-embedding-relay). The load-bearing Q3 architectural-mismatch correction (oauth2-proxy + static-site, NOT NextAuth + Next.js — our Vite static bundle is structurally PSX-Docs-shaped) shaped W1 design.
Per A68 (banked 2026-04-19, surfaced empirically 2026-04-20 by Phase 8 W2 staging deploy): the long-term decoupling of logical TenantId from connection-string-slot routing was a natural fold-in for Phase 8.5's auth boundary. With Keycloak as source-of-truth for authenticated user identity, the OIDC tenant_id claim becomes the natural anchor for TenantId-on-rows. PO disposition (Phase 8.5 plan §3 (d)): code-shape only at W4; canonical-identity convention deferred to Phase 10+ first-real-customer-onboarding.
What
Workstream 1 — oauth2-proxy + Keycloak realm/client setup
Committed atomically as commit 4c3b921 per the Reviewable Chunks W1 checkpoint shape. PSSaaS-side artifacts: infra/oauth2-proxy/oauth2-proxy.cfg; infra/azure/k8s/pssaas-staging/services.yaml (oauth2-proxy ConfigMap + Deployment + Service); infra/azure/k8s/ingress/pssaas-ingress.yaml (delegates /oauth2/, /app/, /api/ to oauth2-proxy:4180; preserves /docs/ direct-to-docs:3000 per kickoff non-negotiable); infra/azure/k8s/pssaas-staging/oauth2-proxy-secrets.yaml.example (template for the K8s Secret holding client_secret + cookie_secret; filled version uncommitted + locally-applied by PO); .github/workflows/deploy-staging.yaml (oauth2-proxy rolling-restart step with first-deploy + Secret-existence guards).
Cross-project-relay request to PSX Infra at docs-site/docs/agents/cross-project-relays/2026-04-20-pssaas-keycloak-pssaas-app-client-request.md: 2 artifact deliveries (Keycloak pssaas-app confidential client in pss-platform realm + realm-side mapper for tenant_id claim) + 5 Superset embedding pre-flight verifications (EMBEDDED_SUPERSET=True, TALISMAN_ENABLED=False, X-Frame-Options=ALLOWALL, GUEST_TOKEN_JWT_SECRET stable, PUBLIC_ROLE_LIKE pre-flip per PSX gotcha #1 — the half-day-debug-trap) + 3 open questions (Superset admin creds for guest-token-mint, registration script invocation pattern confirmation, any post-Backlog-#30 gotchas we're missing).
W1 checkpoint report at docs-site/docs/handoffs/powerfill-phase-8-5-w1-checkpoint.md covers the W1 chunk-boundary artifact: 6 W1-specific architectural decisions (D-8.5-W1-1 through D-8.5-W1-6); the load-bearing 5-item PUBLIC_ROLE_LIKE pre-flight verification list; the 4-environment-column Capability × Environment matrix per practice #13 (with explicit "Post-PSX-Infra-collaboration smoke-test" cell); 5 Counterfactual Retro observations.
Workstream 2 — Embedded Superset SDK in React UI
Added @superset-ui/embedded-sdk@^0.3.0 (PSX-pinned version) to src/frontend/package.json as a dynamic-import dependency. The SDK lazy-loads in a separate Vite chunk (lib-Cb7LCDYX.js, 6.82KB / 2.78KB gzipped per npm run build output) so it doesn't bloat the initial bundle.
New shared component src/frontend/src/components/EmbeddedDashboard.tsx mirrors the PSX web/app/principal/components/SupersetEmbed.tsx shape per ADR-027 D-8.5-2. It is a useEffect chain that (1) fetches {guest_token, dashboard_uuid} from /api/superset/guest-token, (2) calls embedDashboard({id: uuid, supersetDomain, mountPoint, fetchGuestToken}) with the SDK's refresh callback returning the JWT string, and (3) applies the iframe-sizing trick: a setInterval(100ms) poll runs until containerRef.current.querySelector('iframe') returns non-null, then applies width/height/border-radius styles (PSX's empirically-debugged pattern; copied faithfully). Cleanup contract per AGENTS.md async-leak countermeasure: AbortController on initial fetchGuestToken + clearInterval on poll + embedded.unmount() on unmount. A66/A69 honesty preserved at the component level (errors render in a labeled error block; loading state surfaces explicitly; never silently fails to a misleading state).
Anchor-link → embedded-component swaps per ADR-027 D-8.5-3 + Phase 8.5 plan §3 (c) "full replace" disposition:
- src/frontend/src/pages/reports/reportShell.tsx: anchor link replaced with a small data-test-id="superset-embed-marker" indicator in the header; <EmbeddedDashboard> rendered ABOVE the GenericTable when verdict === 'Current'. A66 BLUE banner UX preserved (FreshnessBanner BEFORE embed; embed NOT rendered for TerminalEmpty verdict to avoid a confusing empty iframe).
- src/frontend/src/pages/RunStatus.tsx: per-report Superset anchors replaced with embed markers; Hub anchor replaced with inline <EmbeddedDashboard dashboardKey="hub"> rendered as a section. Embed only shown for terminal states (active runs poll fresh data every 2s; embed would visually compete).
- src/frontend/src/pages/Home.tsx: Hub anchor card replaced with inline <EmbeddedDashboard dashboardKey="hub" height="640px"> as the canonical proof-of-life surface per A66 + ADR-027 demo narrative coherence.
src/frontend/src/api/types.ts extended with GuestTokenResponse { guest_token, dashboard_uuid } + DashboardKey union type. src/frontend/src/api/client.ts adds fetchGuestToken(dashboardKey, options) with separate SUPERSET_API_BASE constant (cross-cutting /api/superset/, NOT under /api/powerfill/). src/frontend/src/config/supersetDashboards.ts rewritten to add key: DashboardKey field per dashboard + SUPERSET_DOMAIN constant; UUIDs left server-side (resolved per-mint by the .NET endpoint's response — F-8.5-W2-SDK-1 architectural finding; see Findings).
src/frontend/src/App.tsx Header tenant-picker DISABLED with tooltip "Tenant determined by your authenticated Keycloak session (OIDC tenant_id claim)" (W4 fold-in consolidated into the W2 visible UI changes); added Sign out link to /oauth2/sign_out (oauth2-proxy endpoint); footer sentinel reference updated.
Workstream 3 — .NET 8 guest-token-mint endpoint + registration script
New src/backend/PowerSeller.SaaS.Modules.Superset/ module per ADR-004 modular-monolith pattern. The load-bearing piece is Services/SupersetGuestTokenClient.cs: typed HttpClient (registered via the new SupersetExtensions.AddSupersetModule(IConfiguration) extension — canonical first instance of IHttpClientFactory use in the codebase per Phase 8.5 plan §2 Consolidation Gate disposition) doing the 3-step handshake. Per PSX Collab gotcha "CSRF is the most-missed piece":
- Step 1: POST /api/v1/security/login returns access_token
- Step 2: GET /api/v1/security/csrf_token/ with Bearer Authorization returns csrf_token
- Step 3: POST /api/v1/security/guest_token/ with Bearer Authorization + X-CSRFToken header + Referer header (cousin gotcha: Superset CSRF middleware checks Referer even when CSRFToken is present) + body {user, resources: [{type:dashboard, id:uuid}], rls:[]} returns the guest token
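The three requests in the handshake can be sketched as plain data, which makes the header requirements easy to see. This is a minimal Python illustration, not the .NET SupersetGuestTokenClient itself. The PlannedRequest type, the function names, the "provider"/"refresh" login fields, the single-field user object, and the choice of the base URL as the Referer value are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlannedRequest:
    """One handshake request, described but not sent (hypothetical type)."""
    method: str
    url: str
    headers: dict
    body: Optional[dict] = None


def login_request(base_url: str, username: str, password: str) -> PlannedRequest:
    # Step 1: admin credentials in, access_token out.
    # "provider" and "refresh" are assumed payload fields.
    return PlannedRequest(
        "POST",
        f"{base_url}/api/v1/security/login",
        {"Content-Type": "application/json"},
        {"username": username, "password": password, "provider": "db", "refresh": True},
    )


def csrf_request(base_url: str, access_token: str) -> PlannedRequest:
    # Step 2: the CSRF token is requested with the Bearer token from step 1.
    return PlannedRequest(
        "GET",
        f"{base_url}/api/v1/security/csrf_token/",
        {"Authorization": f"Bearer {access_token}"},
    )


def guest_token_request(base_url: str, access_token: str, csrf_token: str,
                        dashboard_uuid: str, username: str) -> PlannedRequest:
    # Step 3: Bearer, X-CSRFToken and Referer are all attached together.
    return PlannedRequest(
        "POST",
        f"{base_url}/api/v1/security/guest_token/",
        {
            "Authorization": f"Bearer {access_token}",
            "X-CSRFToken": csrf_token,
            "Referer": base_url,  # assumed value; the report only says Referer is required
            "Content-Type": "application/json",
        },
        {
            "user": {"username": username},  # assumed minimal user object
            "resources": [{"type": "dashboard", "id": dashboard_uuid}],
            "rls": [],
        },
    )
```

Keeping request construction separate from sending is one way to pin the header and body shape in tests without a live Superset instance.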
SemaphoreSlim-guarded session memo with LoginSessionCacheSeconds = 300 default reduces Superset chattiness (Option A scoping = up to 9 mints per operator click-through). 401-invalidates-cache for mid-session admin credential rotation handling. No Polly retry in v1 per Phase 8.5 plan §6 banked observation — failure modes surface cleanly through the embed-side error block per A69 honesty pattern.
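The session memo described above can be illustrated with a small lock-guarded cache. This is a Python sketch of the idea, with threading.Lock standing in for SemaphoreSlim. The SessionCache and mint_with_retry names are hypothetical, and the single automatic retry after a 401 is an assumption: the report states that a 401 invalidates the cache and that a retry performs a fresh handshake, but not who issues the retry.

```python
import threading
import time
from typing import Callable, Optional, Tuple

Session = Tuple[str, str]  # (access_token, csrf_token)


class SessionCache:
    """Caches one login + CSRF session for ttl_seconds (hypothetical helper)."""

    def __init__(self, handshake: Callable[[], Session],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._handshake = handshake
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._lock = threading.Lock()
        self._session: Optional[Session] = None
        self._expires_at = 0.0

    def get(self) -> Session:
        # The lock ensures concurrent callers do not each run a handshake.
        with self._lock:
            if self._session is None or self._clock() >= self._expires_at:
                self._session = self._handshake()
                self._expires_at = self._clock() + self._ttl
            return self._session

    def invalidate(self) -> None:
        with self._lock:
            self._session = None


def mint_with_retry(cache: SessionCache,
                    mint: Callable[[Session], Tuple[int, Optional[str]]]):
    """mint(session) returns (status, token); on 401, drop the cache and retry once."""
    status, token = mint(cache.get())
    if status == 401:
        cache.invalidate()  # credentials may have rotated mid-session
        status, token = mint(cache.get())
    return status, token
```

With a 300-second lifetime, a burst of mints within one operator click-through shares a single login and CSRF fetch instead of repeating all three calls per dashboard.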
Endpoints/SupersetEndpoints.cs exposes POST /api/superset/guest-token (cross-cutting per ADR-027 + plan §3 (a)). Auth check: rejects with HTTP 401 if X-Forwarded-Access-Token is missing (oauth2-proxy MUST forward it; direct cluster bypass fails closed). Other responses: 400 on unknown dashboard_key; 503 on known-but-unconfigured key (W3 script not yet run); 502 on Superset handshake failure; 200 on success.
Configuration/PowerFillEmbedUuids.cs IOptions binds the dashboard_key → UUID resolution map from appsettings.Staging.json + the K8s ConfigMap powerfill-embed-uuids (envFrom configMapRef in services.yaml). Defaults to empty strings; the endpoint distinguishes 400-for-unknown-key from 503-for-empty-UUID by checking known-keys list.
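The endpoint's status-code decisions, including the 400-versus-503 distinction, can be sketched as one pure function. This Python illustration is an assumption-laden outline rather than the endpoint code: guest_token_status, HandshakeError, and the mint callable are hypothetical names, and the order of the checks is inferred from the description.

```python
from typing import Callable, Mapping, Optional, Sequence, Tuple


class HandshakeError(Exception):
    """Stand-in for any failure of the Superset handshake (hypothetical)."""


def guest_token_status(forwarded_access_token: Optional[str],
                       dashboard_key: str,
                       known_keys: Sequence[str],
                       uuid_map: Mapping[str, str],
                       mint: Callable[[str], str]) -> Tuple[int, Optional[str]]:
    # Fail closed: without the header oauth2-proxy forwards, nothing is minted.
    if not forwarded_access_token:
        return 401, None
    # A key that is not in the known-keys list is a client error.
    if dashboard_key not in known_keys:
        return 400, None
    # A known key with an empty UUID means registration has not run yet.
    uuid = uuid_map.get(dashboard_key, "")
    if not uuid:
        return 503, None
    try:
        token = mint(uuid)
    except HandshakeError:
        return 502, None  # the upstream (Superset) failed, not the caller
    return 200, token
```

Checking the known-keys list before the UUID map is what lets an empty-string default stay distinguishable from a typo in the key.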
The infra/superset/register-powerfill-embeds.py per-dashboard registration script was DELEGATED to a fast subagent per Phase 8.5 plan §6 Required Delegation Categories disposition. Subagent chose Flask app context as primary path (matches existing deploy-powerfill.py + deploy-dashboards-flask.py patterns; avoids the broken Keycloak service-account REST auth flagged in AGENTS.md); --via-rest documented as fallback for the day Keycloak auth is unblocked. Idempotent get-or-create via EmbeddedDashboardDAO.upsert(...); writes UUIDs to infra/superset/powerfill-embed-uuids.json (NOT committed by the script — PSX Infra runs the script + Architect commits the resulting JSON separately); prints kubectl create configmap command with all 8 UUIDs pre-filled. Subagent output passed the W2-PS608-antipattern first-question check ("is this what the kickoff asked for?" before any disposition framing); no scope drift.
8 net-new tests in src/backend/tests/PowerSeller.SaaS.Modules.Superset.Tests/SupersetGuestTokenClientTests.cs pin the load-bearing 3-step handshake behavior: happy-path 3-step + headers; X-CSRFToken + Referer + Authorization Bearer all attached; resources array + user object body shape; session caching (subsequent mints reuse cached login + CSRF, NOT 3 fresh HTTP calls); 401-invalidates-cache with fresh handshake on retry; missing-AdminUsername throws InvalidOperationException; login-500 throws HttpRequestException; empty-BaseUrl throws InvalidOperationException at constructor.
Workstream 4 — A68 long-term decoupling fold-in
src/backend/PowerSeller.SaaS.Infrastructure/Data/Tenant.cs NEW: public sealed record Tenant(string TenantId, string ConnectionString); is the A68 decoupling primitive. TenantRegistry.Resolve(identity) returns a nullable Tenant? record (null when the identity is unknown); the old GetConnectionString(tenantId) is preserved for backward compat with a deprecation comment.
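The decoupling idea is that the identity used for lookup and the logical TenantId stored on rows no longer have to be the same string. A rough Python analogue of the record and registry follows. The class layout mirrors the description, while the sample identity, tenant id, and connection string used with it are invented for illustration.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Tenant:
    """Immutable pair: logical tenant id plus where its data lives."""
    tenant_id: str
    connection_string: str


class TenantRegistry:
    def __init__(self, tenants: Mapping[str, Tenant]):
        # Keys are lookup identities; values carry the logical tenant id.
        self._tenants = dict(tenants)

    def resolve(self, identity: str) -> Optional[Tenant]:
        # None signals an unknown identity; the caller decides how to reject it.
        return self._tenants.get(identity)

    def get_connection_string(self, tenant_id: str) -> Optional[str]:
        # Legacy accessor kept for backward compatibility; prefer resolve().
        tenant = self.resolve(tenant_id)
        return tenant.connection_string if tenant else None
```

For example, a registry built as TenantRegistry({"slot-a": Tenant("acme", "Server=example;Database=acme")}) resolves the identity "slot-a" to a tenant whose tenant_id is "acme", so routing and row ownership can change independently.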
src/backend/PowerSeller.SaaS.Api/Middleware/TenantMiddleware.cs REWRITTEN with resolution-precedence INVERTED per Phase 8.5 plan §7 + ADR-029:
- OIDC tenant_id claim from X-Forwarded-Access-Token (NEW primary; Phase 8.5)
- ClaimsPrincipal claim (RESERVED Phase 10+ JwtBearer)
- X-Tenant-Id legacy header (BACKWARD-COMPAT)
- "default" fallback
Internal helper ExtractClaimFromAccessToken does base64url decode of JWT payload + claim extraction. No signature validation per trust-of-gateway model — oauth2-proxy already validated the token against Keycloak JWKS before forwarding; full ASP.NET Core JwtBearerHandler integration is the Phase 10+ replacement. <InternalsVisibleTo Include="PowerSeller.SaaS.Api.Tests" /> added to API csproj so tests can pin the helper directly.
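The claim extraction and the resolution precedence can be sketched together. This Python version is illustrative only: it decodes the JWT payload without verifying the signature, which is acceptable only under the trust-of-gateway model described above. The function names are hypothetical, headers are modeled as a plain case-sensitive dict, and treating a non-string or empty claim as missing is an assumption.

```python
import base64
import json
from typing import Mapping, Optional


def extract_claim_from_access_token(token: str, claim: str) -> Optional[str]:
    """Read one claim from a JWT payload WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:  # header.payload.signature
        return None
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    try:
        data = json.loads(base64.urlsafe_b64decode(payload))
    except ValueError:  # covers bad base64, bad UTF-8 and bad JSON
        return None
    value = data.get(claim) if isinstance(data, dict) else None
    return value if isinstance(value, str) and value else None


def resolve_tenant_identity(headers: Mapping[str, str],
                            principal_claim: Optional[str] = None) -> str:
    # 1. OIDC claim from the token the gateway forwarded.
    token = headers.get("X-Forwarded-Access-Token", "")
    from_token = extract_claim_from_access_token(token, "tenant_id") if token else None
    # 2. ClaimsPrincipal claim, 3. legacy header, 4. "default".
    return from_token or principal_claim or headers.get("X-Tenant-Id") or "default"
```

Because a malformed token or a token without the claim yields None rather than an error, resolution falls through to the legacy header, and a claim always wins over a conflicting header.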
9 net-new tests in src/backend/tests/PowerSeller.SaaS.Api.Tests/Middleware/TenantMiddlewareTests.cs pin the resolution-precedence inversion: happy-path OIDC claim resolves; missing-claim+missing-header default fallback; unknown-identity 401; claim-vs-header conflict (claim wins) — the load-bearing assertion; backward-compat header-only; token-without-claim falls back to header; malformed-token falls back to header; helper-direct claim extraction (positive + missing-claim cases).
ADR-029 (Accepted) at docs-site/docs/adr/adr-029-pssaas-tenant-identity-strategy.md documents the long-term shape, the v1 code-shape-only PO disposition, the JWT trust-of-gateway model, the 4 alternatives considered (slug / UUID / keep-ps-demodata / full-JwtBearer-now), and the Phase 10+ follow-up checklist. A68 status updated to PARTIALLY RESOLVED in docs-site/docs/specs/powerfill-assumptions-log.md with the 2026-04-20 status update block at the top of the entry.
Cross-cutting
Sentinel bumped from phase-9-validation-ready to phase-8-5-ecosystem-ready in PowerFillModule.cs (drops the historical phase-9 marker; Phase 8.5 is the new authoritative milestone). Spec amendment at docs-site/docs/specs/powerfill-engine.md adds Phase 8.5 row marked DONE between rows 9 and 10 (sequenced per PO dispatch order: 8 → 9 → 8.5 → 10). ADR-027 status flipped Proposed → Accepted with full §"Decisions deferred to Phase 8.5 Architect — Architect-at-dispatch resolutions" rewrite documenting each deferred item's actual disposition. ADR index at arc42/09-architecture-decisions.md updated.
Completion report at powerfill-phase-8-5-completion; 18-row Capability × Environment matrix per practice #13 with 4 environment columns; bilateral cross-boundary cutover verification recipe + A69 honesty preservation evidence + 8-observation Counterfactual Retro. Greg-demo-readiness handoff amendment adds two load-bearing slot-in slides per kickoff: "live operator workflow against auth-protected staging URL" + "embedded dashboards inside PSSaaS UI."
Findings
| ID | Severity | Disposition |
|---|---|---|
| F-8.5-W2-SDK-1 | Spec-vs-implementation (planning-time) | RESOLVED at dispatch — UUID lifecycle stays server-side; supersetDashboards.ts schema simplified (no uuid field); SDK gets UUID from per-mint response. Cleaner separation of concerns than the plan's original shape. |
| F-8.5-W3-1 | Implementation-vs-runtime (subagent-time) | RESOLVED at delegation — register-powerfill-embeds.py uses Flask app context as primary path per AGENTS.md "REST API broken for service accounts on Keycloak instances"; --via-rest fallback documented |
| F-8.5-W3-TEST-1 | Implementation-vs-runtime (build-time) | RESOLVED in same session — RecordingHandler refactored to eagerly buffer body to a separate string list (avoids ObjectDisposedException after production-side using var request disposes) |
| F-8.5-W4-TEST-1 | Implementation-vs-runtime (build-time) | RESOLVED in same session — Assert.Equal(0, StatusCode) was wrong (DefaultHttpContext defaults to 200); changed to Assert.NotEqual(401, ...) |
| F-8.5-W4-INTERNAL-1 | Spec-vs-implementation (build-time) | RESOLVED in same session — added InternalsVisibleTo for Api.Tests so the internal helper can be pinned by xunit (canonical .NET pattern) |
| F-8.5-CROSS-1 | Backlog re-read pass (planning-time) | Backlog #31 marked DONE at this commit batch; session-handoff bump captures the closure |
The 5-instance-corroborated Backlog re-read pass at planning-time held empirically. Build-time findings (F-8.5-W3-TEST-1, F-8.5-W4-TEST-1, F-8.5-W4-INTERNAL-1) are caught by Deploy Verification Gate arm (b): the build pipeline itself is the canonical countermeasure for that class of finding.
Notes
- Reviewable Chunks shape mixed: explicit W1 checkpoint per PO disposition; W2-4 then proceeded in the same session per user-side directive. The W1 checkpoint artifact remains authoritative for the W1 chunk; this session's completion report consolidates both. Banking observation: Reviewable Chunks shape can be overridden mid-session by PO directive without invalidating the prior chunk artifact.
- Required Delegation: 1 subagent dispatched (W3 register script); rest self-implemented per 5-instance-corroborated "contract-per-artifact density high → self-implement" + 2-instance-corroborated "delegate mechanical idempotent script gen" heuristics.
- Practice #13 Environment-Explicit Inventory ACTIVELY APPLIED — 18-row, 4-column matrix in completion report explicitly distinguishes Architect-session vs Post-PO-push vs Post-PSX-Infra-pre-flight vs Production verification levels. PSSaaS-side artifact-level GREEN; runtime-level NOT MEASURED HERE pending bilateral cross-boundary cutover verification.
- Andon-cord readiness: pulled twice (both for build-time test-code bugs); both fixed in under 5 minutes each. The Docker-based dotnet test workflow (~30s per cycle) makes this a habitable check.
- Counterfactual Retro filled in completion report with 8 observations; most important: (1) single-pass execution is a defensible Reviewable Chunks variant; (2) trust-of-gateway is the right v1 JWT validation model; (3) UUID-stays-server-side is structurally cleaner than the plan's original shape; (4) tests-fail-then-fix cycle caught real bugs (Deploy Verification Gate arm b at work); (8) pre-push docs-build check now 11-instance corroborated.
- Sub-phase calendar time: ~1 Architect-session. Consistent with prior phase velocities at ~1.5x complexity (4 workstreams in one session).
Phase 8.5 is PSSaaS-side code-complete + build-verified + test-pinned; the PSX Infra collaboration ask is dispatched; Greg-demo dry run becomes runnable end-to-end against auth-protected staging URL post-bilateral-verification. Sentinel reflects phase-8-5-ecosystem-ready.