ADR-029: PSSaaS Tenant Identity Strategy (Phase 8.5 W4)

Status: Accepted (2026-04-20). Code-shape changes shipped at Phase 8.5 W4; the canonical-identity convention is deferred to first-real-customer onboarding (Phase 10+) per PO disposition.

Date: 2026-04-20

Supersedes: N/A (formalizes the previously implicit "X-Tenant-Id header is identity" convention as the Path γ short-term fallback while ADR-029 defines the long-term shape)

Related: ADR-005 (Database-per-tenant: the multi-DB substrate this ADR threads identity through), ADR-013 (Identity Strategy, Proposed; ADR-027 and this ADR jointly commit PSSaaS to Keycloak as the authentication source-of-truth), ADR-027 (Superset Embedding Strategy: Phase 8.5 W1 ships oauth2-proxy in front of /app/ + /api/, which is what makes the OIDC tenant_id claim available to TenantMiddleware via X-Forwarded-Access-Token)

Context

Throughout PSSaaS Phases 0-8, the tenant_id value persisted on multi-tenant rows (e.g., pfill_run_history.tenant_id) was sourced from _tenant.TenantId, which TenantMiddleware set from the inbound X-Tenant-Id header. The same string served two logically distinct purposes:

  1. Lookup key into TenantRegistry (selecting which connection string to use)
  2. Stable customer-organization identity persisted on every row

Single-writer scenarios masked the conflation entirely. PowerFill Phases 0-8 had only the Architect's local route writing into PS_DemoData; X-Tenant-Id: ps-demodata resolved against the Tenants:ps-demodata config slot; all pfill_run_history rows got tagged tenant_id='ps-demodata'. Coherent.

The conflation became load-bearing when Phase 8 W2 staging deployed and a SECOND writer arrived (the staging-deployed React UI making default-header requests). Per A68 (banked 2026-04-19, surfaced empirically 2026-04-20):

  • Staging API had its SQL MI connection string mapped to the Tenants:default config slot (the staging Kubernetes manifests' convention)
  • A default-header request from the React UI would write rows tagged tenant_id='default'
  • Local route's historical 12 rows were tagged tenant_id='ps-demodata'
  • Same physical database, two different tenant_id values, mutually invisible row sets
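
The split described above can be reproduced in miniature. The following is a minimal sketch, not the PSSaaS schema: it uses an in-memory SQLite table as a stand-in for PS_DemoData, a reduced pfill_run_history with only two columns, and a hypothetical visible_rows helper, and it assumes readers filter on tenant_id. It is written in Python purely for illustration; the project code is C#.

```python
import sqlite3

# Hypothetical in-memory stand-in for PS_DemoData; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pfill_run_history ("
    "run_id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL)"
)

# Twelve historical rows written by the local route (X-Tenant-Id: ps-demodata).
conn.executemany(
    "INSERT INTO pfill_run_history (tenant_id) VALUES (?)",
    [("ps-demodata",)] * 12,
)

# One row written by a default-header request that resolved to the "default" slot.
conn.execute("INSERT INTO pfill_run_history (tenant_id) VALUES (?)", ("default",))


def visible_rows(tenant_id: str) -> int:
    """Count the rows a reader filtering on tenant_id can see."""
    row = conn.execute(
        "SELECT COUNT(*) FROM pfill_run_history WHERE tenant_id = ?",
        (tenant_id,),
    ).fetchone()
    return row[0]


print(visible_rows("ps-demodata"))  # 12
print(visible_rows("default"))      # 1
```

Both writers share one physical table, yet neither sees the other's rows once a tenant_id filter is applied.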

The Path γ short-term disposition (commit 8dba7b4, 2026-04-19) added Tenants__ps-demodata__ConnectionString to the staging API Deployment so the slot name agreed with the data tag, unblocking the Phase 8 W2 PO milestone click-through. It did not decouple the conflation — it just shifted the convention so the slot name and the data tag agreed on 'ps-demodata' everywhere.
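
For readers unfamiliar with the naming, ASP.NET Core's environment-variable configuration provider maps a double underscore to the ":" key separator, so Tenants__ps-demodata__ConnectionString populates the Tenants:ps-demodata:ConnectionString key. The sketch below mimics that mapping in Python to show which slot names exist before and after Path γ. The helper names and the placeholder connection strings are hypothetical, and whether the default slot remained configured afterwards is an assumption.

```python
from typing import Dict


def env_to_config_key(name: str) -> str:
    """Mimic the '__' -> ':' mapping applied to environment variable names."""
    return name.replace("__", ":")


def tenant_slots(env: Dict[str, str]) -> Dict[str, str]:
    """Collect Tenants:<slot>:ConnectionString entries, keyed by slot name."""
    slots: Dict[str, str] = {}
    for name, value in env.items():
        parts = env_to_config_key(name).split(":")
        if len(parts) == 3 and parts[0] == "Tenants" and parts[2] == "ConnectionString":
            slots[parts[1]] = value
    return slots


# Placeholder values only; real connection strings are not part of this ADR.
before = {"Tenants__default__ConnectionString": "<staging connection string>"}
after = {
    **before,
    "Tenants__ps-demodata__ConnectionString": "<staging connection string>",
}

print(sorted(tenant_slots(before)))  # ['default']
print(sorted(tenant_slots(after)))   # ['default', 'ps-demodata']
```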

Phase 8.5 W1 ships oauth2-proxy in front of /app/ and /api/, with pass_access_token = true so the OIDC access token is forwarded to the API via X-Forwarded-Access-Token. The OIDC tenant_id claim (set by a Keycloak realm-side mapper per cross-project-relay 2026-04-20 request item #2) becomes the natural anchor for authenticated tenant identity, structurally distinct from the routing.

ADR-029 formalizes the long-term shape. PO disposition (2026-04-20, plan-stage): code-shape changes ship at Phase 8.5 W4; the data-migration arm is deferred until first-real-customer onboarding (Phase 10+).

Decision

PSSaaS commits to a two-concept tenant model where logical identity is decoupled from connection-string-slot routing:

Two stable concepts

  1. Tenant identity (the value persisted on row columns like pfill_run_history.tenant_id)

    • Sourced from the OIDC tenant_id claim on the access token forwarded by oauth2-proxy via X-Forwarded-Access-Token
    • Invariant across environments + deploys for a given customer organization (Phase 10+ canonical-identity-convention TBD)
    • For Phase 8.5 v1 staging: the literal value ps-demodata (matches historical row tags so no migration is needed per the PO's "minimize Greg-demo risk" disposition)
  2. Connection-string routing

    • Selected via TenantRegistry.Resolve(identity) returning a Tenant(TenantId, ConnectionString) record
    • The Resolve method MAY (Phase 10+) return a TenantId distinct from the lookup identity (e.g., an OIDC sub claim that maps to a stable customer-org slug for row-tagging)
    • For Phase 8.5 v1: TenantId equals the lookup identity (no canonical-rename layer; deferred per PO disposition)
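
The two concepts can be sketched as a small record plus a lookup. This is an illustrative Python analogue of the C# Tenant record and TenantRegistry.Resolve, not the shipped code: the snake_case names, the choice to return None for an unknown identity, and the placeholder connection string are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Tenant:
    """Analogue of the Tenant(TenantId, ConnectionString) record."""
    tenant_id: str          # value persisted on rows
    connection_string: str  # value used for routing only


class TenantRegistry:
    def __init__(self, slots: Dict[str, str]) -> None:
        # slots maps a config slot name to its connection string.
        self._slots = dict(slots)

    def resolve(self, identity: str) -> Optional[Tenant]:
        """Look up the slot for an identity; None if no slot matches (assumed)."""
        connection_string = self._slots.get(identity)
        if connection_string is None:
            return None
        # Phase 8.5 v1: tenant_id equals the lookup identity. A Phase 10+
        # canonical-rename layer would map identity to a different tenant_id here.
        return Tenant(tenant_id=identity, connection_string=connection_string)

    def get_connection_string(self, tenant_id: str) -> Optional[str]:
        """Backward-compat shim that delegates to resolve()."""
        tenant = self.resolve(tenant_id)
        return tenant.connection_string if tenant else None


registry = TenantRegistry({"ps-demodata": "<placeholder connection string>"})
print(registry.resolve("ps-demodata"))
print(registry.resolve("unknown-org"))  # None
```

Because callers receive both values in one record, a later change to how tenant_id is derived stays inside resolve and does not touch call sites.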

Resolution precedence in TenantMiddleware (Phase 8.5 W4 inversion)

1. OIDC tenant_id claim from X-Forwarded-Access-Token (NEW: primary; Phase 8.5)
2. ClaimsPrincipal tenant_id claim (RESERVED: Phase 10+ JwtBearer)
3. X-Tenant-Id legacy request header (BACKWARD-COMPAT: pre-Phase-8.5)
4. "default" tenant slot (DEV/TEST fallback)

Pre-Phase-8.5 had this inverted (header was primary, claim was last); A68 documents why that broke under multi-writer scenarios. The W2 React UI continues sending X-Tenant-Id in its fetch helpers, but the API ignores it whenever a token-borne claim is available. (We deliberately did NOT remove the header from the React UI to keep the v1 contract backward-compatible; the picker became visually disabled in App.tsx with a tooltip explaining the OIDC-claim source-of-truth.)
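
The precedence order amounts to "first non-empty source wins". A minimal Python sketch follows; the function name and parameters are hypothetical, and treating an empty string the same as a missing value is an assumption rather than documented middleware behavior.

```python
from typing import Optional


def resolve_identity(
    token_claim: Optional[str],      # 1. tenant_id from X-Forwarded-Access-Token
    principal_claim: Optional[str],  # 2. ClaimsPrincipal tenant_id (reserved, Phase 10+)
    header_value: Optional[str],     # 3. legacy X-Tenant-Id header
) -> str:
    """Return the first non-empty source in precedence order, else 'default'."""
    for candidate in (token_claim, principal_claim, header_value):
        if candidate:
            return candidate
    return "default"  # 4. DEV/TEST fallback slot


# Claim and header disagree: the claim wins.
print(resolve_identity("ps-demodata", None, "default"))  # ps-demodata
# No token in the loop (pre-Phase-8.5 dev): the header is used.
print(resolve_identity(None, None, "ps-demodata"))       # ps-demodata
# Nothing supplied: the default slot is used.
print(resolve_identity(None, None, None))                # default
```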

JWT trust model (Phase 8.5 v1)

TenantMiddleware extracts the tenant_id claim by base64url-decoding the JWT payload and parsing it with JsonDocument.Parse. It does not validate the signature. This is acceptable for Phase 8.5 v1 under a trust-of-gateway model because:

  • oauth2-proxy validated the token against the Keycloak JWKS before forwarding (per infra/oauth2-proxy/oauth2-proxy.cfg + the OIDC discovery flow)
  • The .NET API is only reachable via oauth2-proxy in production / staging (ingress reconfiguration per Phase 8.5 W1)
  • Direct cluster bypass would also bypass X-Forwarded-Access-Token injection entirely; the legacy header / default fallbacks would catch local-dev scenarios

In Phase 10+, when the trust-of-gateway model evolves (or the API is exposed directly, outside oauth2-proxy), the next step is to integrate ASP.NET Core's JwtBearerHandler with full signature validation against Keycloak's JWKS endpoint. The ExtractClaimFromAccessToken helper retires at that point.

Canonical-identity-convention deferral (PO disposition)

Per PO 2026-04-20 (Phase 8.5 plan §3 (d)): keep ps-demodata for the Greg-demo window. The convention decision (a UUID, a slug such as watermark-tpo or sandbox, or a hashed value) is deferred to first-real-customer onboarding, when an actual second writer exists to drive the choice. Risks of deferring:

  • The next time a second writer arrives (e.g., Phase 9 harness if it needs to write to PS_DemoData), the writer must use 'ps-demodata' for tenant_id (mechanically the same constraint Path γ enforced on the staging API config slot — but now exposed at Keycloak realm-mapper config rather than at K8s Deployment env vars)
  • The eventual data migration (re-tagging historical rows from 'ps-demodata' to whatever the canonical convention chooses) becomes more annoying if the row count grows large (currently ~12; Phase 9 harness writes will add ~N per harness run)

These risks are accepted given the Greg-demo-window cost-benefit: a canonical-identity convention picked NOW with no second-writer pressure is more likely to be wrong than one picked with a real customer's organizational shape on the table.

Consequences

Positive

  • A68 partially resolved. Code shape decoupled at Phase 8.5 W4: TenantRegistry.Resolve(identity) returns the (TenantId, ConnectionString) tuple; new callers don't have to reason about "which identity gets persisted" because the tuple separates them.
  • OIDC claim wins over header by default. Future writers (the Phase 9 harness post-OIDC-token-mint, additional React clients, mobile clients eventually) inherit the correct identity automatically as long as they authenticate via Keycloak.
  • Backward-compatible. The W2 React UI's existing X-Tenant-Id header continues to work in pre-Phase-8.5 dev environments where oauth2-proxy isn't in the loop. The ApiClient code didn't need to change at W4 ship time.
  • Pattern composability. Future PSSaaS modules / surfaces inherit the TenantRegistry.Resolve shape; no Phase-by-Phase litigation of identity sourcing.

Negative

  • Deferred-canonical-identity has a "second-writer surprise" risk described above. Mitigated by: (a) the Keycloak realm-side mapper is a single configuration point that can be updated to emit a different value if needed; (b) the SupersetEndpoints.cs + TenantMiddleware code paths are convention-agnostic (they just thread whatever claim value is set); (c) the migration-when-it-comes is a one-shot UPDATE script per A68's documented shape.
  • JWT-payload extraction without signature validation is a trust-of-gateway pattern. Acceptable per the rationale above for Phase 8.5 v1; flagged as the Phase 10+ replacement target.
  • Two convention-binding points must agree: the Keycloak realm-side mapper's emitted tenant_id claim value AND the Tenants:<identity>:ConnectionString config slot keys in K8s. If they drift, the middleware returns 401 "Unknown tenant" — a loud failure mode (not a silent wrong-tenant write), but operator-side coordination is needed at any re-config.
  • The Tenant-picker is visually disabled in v1 but not removed. Future-deprecation-debt: the React UI still has 4 files supporting the now-vestigial picker (TenantContext / tenantContextObject / useTenant / tenantConstants). Removal is deferred to the same Phase 10+ window as the canonical-identity convention.
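
The drift failure mode noted above (mapper value and config slot keys disagreeing) can be illustrated with a small sketch. The function name, return shape, and the success body are hypothetical; only the 401 "Unknown tenant" outcome comes from this ADR.

```python
from typing import Dict, Tuple


def resolve_or_reject(claim_value: str, slots: Dict[str, str]) -> Tuple[int, str]:
    """Return (status, body): 401 when the claim value has no matching config slot."""
    if claim_value not in slots:
        return 401, "Unknown tenant"
    return 200, f"resolved {claim_value}"


# Config slots as deployed (placeholder connection string).
slots = {"ps-demodata": "<placeholder connection string>"}

# Mapper and slot keys agree.
print(resolve_or_reject("ps-demodata", slots))    # (200, 'resolved ps-demodata')
# Mapper was changed without a matching slot: loud failure, no silent write.
print(resolve_or_reject("watermark-tpo", slots))  # (401, 'Unknown tenant')
```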

Operational

  • TenantMiddleware is the single point of identity resolution; future code that needs the canonical TenantId calls _tenant.TenantId (unchanged from pre-Phase-8.5 — the value is just sourced differently now)
  • TenantRegistry.Resolve(identity) is the new preferred API for connection-string lookups; old GetConnectionString(tenantId) is preserved as a backward-compat shim
  • Keycloak pssaas-app client must include a realm-side mapper emitting tenant_id as a claim on access tokens (cross-project-relay 2026-04-20 request item #2 covers the PSX-Infra-side delivery)
  • No data migration this phase. A68's "one-shot UPDATE script re-tagging historical rows" is explicitly deferred per PO disposition.

Alternatives Considered

A. Stable customer-org slug (e.g., watermark-tpo, sandbox)

Rejected for Phase 8.5 v1, candidate for Phase 10+. Pros: human-readable; matches PSX's participant_type='principal' + buyer_id='watermark-tpo' convention; greppable in logs / Superset filters / kubectl env values. Cons: collision risk if a customer ever picks a slug another customer already used; the canonical-naming-authority question is open.

B. UUID

Rejected. Pros: collision-proof; future-proof; opaque (no organizational meaning leaks). Cons: significantly worse operator UX during the Greg demo because Superset filters / Backlog audits / kubectl env var values become 36-character hyphenated hex strings; debugging a "wrong tenant" issue is much harder when the literal value carries no semantic content.

C. Keep ps-demodata; defer canonical convention to Phase 10+ (CHOSEN)

Chosen per PO disposition. Pros: lowest risk for Greg demo (no migration; no convention-decision-fork now); the architectural decoupling code shape lands cleanly without the data-migration arm; the convention decision happens with a real customer's organizational shape on the table. Cons: deferred decision still has to land eventually; the second-writer-surprise risk described above.

D. Full ASP.NET Core JwtBearerHandler integration NOW

Rejected for Phase 8.5 v1, candidate for Phase 10+. Pros: first-class signature validation; standard ASP.NET Core auth pipeline; AuthorizationPolicies become available. Cons: significant scope expansion at a phase explicitly bounded as "demo-blocker resolution"; oauth2-proxy already validates the token; trust-of-gateway is a defensible v1 model per the rationale above; the Phase 10+ migration is non-breaking (the ExtractClaimFromAccessToken helper is the only thing that has to retire, and the resolution-precedence order stays the same).

Decision provenance

  • A68 banked 2026-04-19 — surfaced by the Phase 8 W2 staging deploy F-W2-PSD-1 finding; documented the conflation pattern + Path γ short-term disposition
  • Phase 8.5 plan §3 (d) PO disposition 2026-04-20 — keep ps-demodata; defer canonical-identity convention to first-real-customer onboarding
  • Phase 8.5 plan §7 W4 deliverables — code shape only; migration is no-op; new ADR documents the partial closure shape
  • Cross-project-relay 2026-04-20 request item #2 — Keycloak realm-side mapper for the OIDC tenant_id claim (PSX-Infra-owned delivery)
  • ADR-027 — establishes the oauth2-proxy in front of /api/ that makes X-Forwarded-Access-Token available to TenantMiddleware

Architect-at-dispatch checklist

Phase 8.5 W4 implementation summary (this commit):

  1. New Tenant sealed record at src/backend/PowerSeller.SaaS.Infrastructure/Data/Tenant.cs
  2. TenantRegistry.Resolve(identity) added; GetConnectionString(tenantId) preserved for backward compat
  3. TenantMiddleware.InvokeAsync resolution-precedence inverted: OIDC claim wins, header is fallback
  4. TenantMiddleware.ExtractClaimFromAccessToken(...) internal helper for base64url JWT-payload claim extraction (no signature validation per trust-of-gateway model)
  5. InternalsVisibleTo("PowerSeller.SaaS.Api.Tests") so the helper can be pinned by xunit
  6. Frontend tenant-picker DISABLED with tooltip in src/frontend/src/App.tsx Header (full removal deferred)
  7. 9 new TenantMiddleware tests covering: happy-path claim resolution; missing-claim-and-header default fallback; unknown-identity 401; claim-vs-header conflict (claim wins); backward-compat header-only; token-without-claim falls back to header; malformed-token falls back to header; helper-direct claim extraction (positive + missing-claim cases)
  8. A68 status updated to PARTIALLY RESOLVED in docs-site/docs/specs/powerfill-assumptions-log.md

Phase 10+ follow-up

When a real customer enters the system and the canonical-identity convention must be picked:

  1. Architect re-evaluates options A vs B above with the actual customer-org context
  2. Decision documented as either an ADR-029 amendment OR a successor ADR
  3. TenantRegistry.Resolve implementation updated to map identity → (TenantId, ConnectionString) per the chosen convention (currently the implementation is the identity function on TenantId)
  4. One-shot UPDATE script per A68's documented shape re-tags historical pfill_run_history.tenant_id rows
  5. Keycloak realm-side mapper updated to emit the new convention
  6. (Optional) Full JwtBearerHandler integration; ExtractClaimFromAccessToken retires
  7. (Optional) Tenant-picker removed entirely from React UI; the 4 supporting files (TenantContext / tenantContextObject / useTenant / tenantConstants) deleted
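
As a rough illustration of step 4, a one-shot re-tag could look like the sketch below. A68's documented script shape is not reproduced in this ADR, so this is an assumption-laden stand-in: it runs against an in-memory SQLite table rather than the real database, the retag_tenant helper is hypothetical, and 'watermark-tpo' is used only as an example target value, not a chosen convention.

```python
import sqlite3


def retag_tenant(conn: sqlite3.Connection, old_id: str, new_id: str) -> int:
    """Re-tag every row from old_id to new_id in one transaction; return rows changed."""
    with conn:  # commits on success, rolls back if the UPDATE raises
        cursor = conn.execute(
            "UPDATE pfill_run_history SET tenant_id = ? WHERE tenant_id = ?",
            (new_id, old_id),
        )
    return cursor.rowcount


# Stand-in table seeded with twelve historical rows.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE pfill_run_history ("
    "run_id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL)"
)
demo.executemany(
    "INSERT INTO pfill_run_history (tenant_id) VALUES (?)",
    [("ps-demodata",)] * 12,
)

changed = retag_tenant(demo, "ps-demodata", "watermark-tpo")
print(changed)  # 12
```

Running the re-tag and the Keycloak mapper update (step 5) in the same maintenance window avoids a period in which new writes carry a tag that historical rows no longer share.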