ADR-029: PSSaaS Tenant Identity Strategy (Phase 8.5 W4)
Status: Accepted (2026-04-20) — code-shape changes shipped at Phase 8.5 W4; canonical-identity-convention deferred to first-real-customer-onboarding (Phase 10+) per PO disposition.
Date: 2026-04-20
Supersedes: N/A (formalizes the previously-implicit "X-Tenant-Id header is identity" convention as Path γ short-term fallback while ADR-029 defines the long-term shape)
Related: ADR-005 (Database-per-tenant — the multi-DB substrate this ADR threads identity through), ADR-013 (Identity Strategy — Proposed; ADR-027 + this ADR jointly commit PSSaaS to Keycloak as the authentication source-of-truth), ADR-027 (Superset Embedding Strategy — Phase 8.5 W1 ships oauth2-proxy in front of /app/ + /api/, which is what makes the OIDC tenant_id claim available to TenantMiddleware via X-Forwarded-Access-Token)
Context
Throughout PSSaaS Phases 0-8, the tenant_id value persisted on multi-tenant rows (e.g., pfill_run_history.tenant_id) was sourced from _tenant.TenantId, which TenantMiddleware set from the inbound X-Tenant-Id header. The same string served two logically distinct purposes:
- Lookup key into TenantRegistry (selecting which connection string to use)
- Stable customer-organization identity persisted on every row
Single-writer scenarios masked the conflation entirely. PowerFill Phases 0-8 had only the Architect's local route writing into PS_DemoData; X-Tenant-Id: ps-demodata resolved against the Tenants:ps-demodata config slot; all pfill_run_history rows got tagged tenant_id='ps-demodata'. Coherent.
The conflation became load-bearing when Phase 8 W2 staging deployed and a SECOND writer arrived (the staging-deployed React UI making default-header requests). Per A68 (banked 2026-04-19, surfaced empirically 2026-04-20):
- Staging API had its SQL MI connection string mapped to the Tenants:default config slot (the staging Kubernetes manifests' convention)
- A default-header request from the React UI would write rows tagged tenant_id='default'
- Local route's historical 12 rows were tagged tenant_id='ps-demodata'
- Same physical database, two different tenant_id values, mutually invisible row sets
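The failure mode in the list above can be reproduced in a few lines. The following is a minimal sketch in TypeScript (the React UI's language), not the actual .NET data path: the in-memory table, the writeRun and visibleRuns helpers, and the assumption that reads filter on tenant_id are all hypothetical stand-ins.

```typescript
// Hypothetical in-memory stand-in for pfill_run_history; only tenant_id matters here.
type RunRow = { runId: number; tenantId: string };

const table: RunRow[] = [];

// Pre-Phase-8.5 behavior: the routing slot name doubles as the persisted row tag.
function writeRun(slotName: string, runId: number): void {
  table.push({ runId, tenantId: slotName });
}

// Assumption: reads are scoped by the same string, so each writer sees only its own tag.
function visibleRuns(slotName: string): number[] {
  return table.filter((row) => row.tenantId === slotName).map((row) => row.runId);
}

writeRun("ps-demodata", 1); // local route, X-Tenant-Id: ps-demodata
writeRun("default", 2);     // staging React UI, default header

// Same physical table, but neither writer can see the other's row.
const localView = visibleRuns("ps-demodata");
const stagingView = visibleRuns("default");
```

Both rows live in the same table, yet each view contains only the row written under its own slot name, which is the "mutually invisible row sets" symptom.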
The Path γ short-term disposition (commit 8dba7b4, 2026-04-19) added Tenants__ps-demodata__ConnectionString to the staging API Deployment so the slot name agreed with the data tag, unblocking the Phase 8 W2 PO milestone click-through. It did not decouple the conflation — it just shifted the convention so the slot name and the data tag agreed on 'ps-demodata' everywhere.
Phase 8.5 W1 ships oauth2-proxy in front of /app/ and /api/, with pass_access_token = true so the OIDC access token is forwarded to the API via X-Forwarded-Access-Token. The OIDC tenant_id claim (set by a Keycloak realm-side mapper per cross-project-relay 2026-04-20 request item #2) becomes the natural anchor for authenticated tenant identity, structurally distinct from the routing.
ADR-029 formalizes the long-term shape. PO disposition (2026-04-20, plan-stage): code-shape changes ship at Phase 8.5 W4; the data-migration arm is deferred until first-real-customer onboarding (Phase 10+).
Decision
PSSaaS commits to a two-concept tenant model where logical identity is decoupled from connection-string-slot routing:
Two stable concepts
- Tenant identity (the value persisted on row columns like pfill_run_history.tenant_id)
  - Sourced from the OIDC tenant_id claim on the access token forwarded by oauth2-proxy via X-Forwarded-Access-Token
  - Invariant across environments + deploys for a given customer organization (Phase 10+ canonical-identity-convention TBD)
  - For Phase 8.5 v1 staging: the literal value ps-demodata (matches historical row tags so no migration is needed per the PO's "minimize Greg-demo risk" disposition)
- Connection-string routing
  - Selected via TenantRegistry.Resolve(identity) returning a Tenant(TenantId, ConnectionString) record
  - The Resolve method MAY (Phase 10+) return a TenantId distinct from the lookup identity (e.g., an OIDC sub claim that maps to a stable customer-org slug for row-tagging)
  - For Phase 8.5 v1: TenantId equals the lookup identity (no canonical-rename layer; deferred per PO disposition)
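To make the two-concept split concrete, here is a minimal sketch of the Resolve shape, written in TypeScript rather than the API's .NET. The class layout, the Map-backed slot store, and the example connection string are assumptions for illustration; only the Tenant(TenantId, ConnectionString) pairing and the v1 rule "TenantId equals the lookup identity" come from this ADR.

```typescript
// Mirrors the Tenant(TenantId, ConnectionString) record named in this ADR.
interface Tenant {
  readonly tenantId: string;         // identity: persisted on rows
  readonly connectionString: string; // routing: used only to pick the database
}

class TenantRegistry {
  // Hypothetical storage: lookup identity -> connection string.
  private readonly slots: ReadonlyMap<string, string>;

  constructor(slots: ReadonlyMap<string, string>) {
    this.slots = slots;
  }

  resolve(identity: string): Tenant | undefined {
    const connectionString = this.slots.get(identity);
    if (connectionString === undefined) return undefined;
    // Phase 8.5 v1: tenantId equals the lookup identity.
    // A Phase 10+ canonical-rename layer would map identity to a different tenantId here.
    return { tenantId: identity, connectionString };
  }
}

// Placeholder connection string, not a real one.
const registry = new TenantRegistry(
  new Map([["ps-demodata", "Server=example;Database=PS_DemoData"]]),
);
const resolved = registry.resolve("ps-demodata");
const missing = registry.resolve("unknown-org");
```

Because callers receive both values in one record, a later change to how tenantId is derived would not require touching code that only needs the connection string.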
Resolution precedence in TenantMiddleware (Phase 8.5 W4 inversion)
1. OIDC tenant_id claim from X-Forwarded-Access-Token (NEW: primary; Phase 8.5)
2. ClaimsPrincipal tenant_id claim (RESERVED: Phase 10+ JwtBearer)
3. X-Tenant-Id legacy request header (BACKWARD-COMPAT: pre-Phase-8.5)
4. "default" tenant slot (DEV/TEST fallback)
Pre-Phase-8.5 had this inverted (header was primary, claim was last); A68 documents why that broke under multi-writer scenarios. The W2 React UI continues sending X-Tenant-Id in its fetch helpers, but the API ignores it whenever a token-borne claim is available. (We deliberately did NOT remove the header from the React UI to keep the v1 contract backward-compatible; the picker became visually disabled in App.tsx with a tooltip explaining the OIDC-claim source-of-truth.)
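The precedence order above can be sketched as a single ordered scan. This is an illustrative TypeScript model, not the TenantMiddleware source: the IdentitySources field names and the treatment of blank values as absent are assumptions.

```typescript
type Source = "access-token" | "principal" | "header" | "default";

// Inputs the middleware would see on one request; field names are illustrative.
interface IdentitySources {
  accessTokenTenantId?: string; // tenant_id decoded from X-Forwarded-Access-Token
  principalTenantId?: string;   // ClaimsPrincipal tenant_id (reserved for Phase 10+ JwtBearer)
  headerTenantId?: string;      // legacy X-Tenant-Id request header
}

interface ResolvedIdentity {
  identity: string;
  source: Source;
}

function resolveIdentity(sources: IdentitySources): ResolvedIdentity {
  // Order matters: token claim first, legacy header last before the fallback.
  const candidates: Array<[Source, string | undefined]> = [
    ["access-token", sources.accessTokenTenantId],
    ["principal", sources.principalTenantId],
    ["header", sources.headerTenantId],
  ];
  for (const [source, value] of candidates) {
    // Assumption: blank values count as absent; the ADR does not specify this.
    if (value !== undefined && value.trim() !== "") {
      return { identity: value, source };
    }
  }
  // DEV/TEST fallback slot.
  return { identity: "default", source: "default" };
}

// The React UI still sends X-Tenant-Id, but the token-borne claim wins.
const behindProxy = resolveIdentity({ accessTokenTenantId: "ps-demodata", headerTenantId: "default" });
// Pre-Phase-8.5 dev environment without oauth2-proxy: the header is used.
const localDev = resolveIdentity({ headerTenantId: "ps-demodata" });
```

Returning the source alongside the identity is a choice made for this sketch; it makes it easy to log which rule fired when diagnosing an unexpected tenant.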
JWT trust model (Phase 8.5 v1)
TenantMiddleware decodes the JWT payload via base64url-decode and JsonDocument.Parse to extract the tenant_id claim. It does not validate the signature. This is safe because:
- oauth2-proxy validated the token against the Keycloak JWKS before forwarding (per infra/oauth2-proxy/oauth2-proxy.cfg + the OIDC discovery flow)
- The .NET API is only reachable via oauth2-proxy in production / staging (ingress reconfiguration per Phase 8.5 W1)
- Direct cluster bypass would also bypass X-Forwarded-Access-Token injection entirely; the legacy header / default fallbacks would catch local-dev scenarios
In Phase 10+, when the trust-of-gateway model evolves (or direct API exposure outside oauth2-proxy is added), the next step is to integrate ASP.NET Core's JwtBearerHandler with full signature validation against Keycloak's JWKS endpoint. The ExtractClaimFromAccessToken helper retires at that point.
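For readers unfamiliar with the pattern, unvalidated payload extraction looks roughly like the sketch below. It is a TypeScript analogue of the idea (base64url-decode the middle segment, parse the JSON, read one claim), not a port of the .NET helper; the function names, the error handling, and the unsigned demo token are assumptions.

```typescript
// Decode one base64url segment to a UTF-8 string.
function base64UrlDecode(segment: string): string {
  let base64 = segment.replace(/-/g, "+").replace(/_/g, "/");
  while (base64.length % 4 !== 0) base64 += "="; // restore stripped padding
  const binary = atob(base64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

// Returns the named string claim, or undefined when the token is malformed
// or the claim is absent. NOTE: the signature segment is never checked; this
// is only safe behind a gateway that has already validated the token.
function extractClaimFromAccessToken(token: string, claim: string): string | undefined {
  const parts = token.split(".");
  const payloadSegment = parts[1];
  if (parts.length !== 3 || payloadSegment === undefined) return undefined;
  try {
    const payload: unknown = JSON.parse(base64UrlDecode(payloadSegment));
    if (typeof payload !== "object" || payload === null) return undefined;
    const value = (payload as Record<string, unknown>)[claim];
    return typeof value === "string" ? value : undefined;
  } catch {
    return undefined; // invalid base64 or invalid JSON
  }
}

// Demo only: build an unsigned token with a placeholder signature segment.
function base64UrlEncode(text: string): string {
  return btoa(text).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

const demoToken = [
  base64UrlEncode(JSON.stringify({ alg: "RS256", typ: "JWT" })),
  base64UrlEncode(JSON.stringify({ sub: "user-1", tenant_id: "ps-demodata" })),
  "signature-not-checked",
].join(".");

const demoTenantId = extractClaimFromAccessToken(demoToken, "tenant_id");
```

The demo token carries a meaningless signature and is still accepted, which is exactly why this approach depends on the API being unreachable except through the validating proxy.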
Canonical-identity-convention deferral (PO disposition)
Per PO 2026-04-20 (Phase 8.5 plan §3 (d)): keep ps-demodata for the Greg-demo window. The convention decision (UUID vs slug like watermark-tpo vs sandbox vs hashed) is deferred to first-real-customer-onboarding when there's an actual second writer to drive the choice. Risks of deferring:
- The next time a second writer arrives (e.g., Phase 9 harness if it needs to write to PS_DemoData), the writer must use 'ps-demodata' for tenant_id (mechanically the same constraint Path γ enforced on the staging API config slot, but now exposed at Keycloak realm-mapper config rather than at K8s Deployment env vars)
- The eventual data migration (re-tagging historical rows from 'ps-demodata' to whatever the canonical convention chooses) becomes more annoying if the row count grows large (currently ~12; Phase 9 harness writes will add ~N per harness run)
These risks are accepted given the Greg-demo-window cost-benefit: a canonical-identity convention picked NOW with no second-writer pressure is more likely to be wrong than one picked with a real customer's organizational shape on the table.
Consequences
Positive
- A68 partially resolved. Code shape decoupled at Phase 8.5 W4: TenantRegistry.Resolve(identity) returns the (TenantId, ConnectionString) tuple; new callers don't have to reason about "which identity gets persisted" because the tuple separates them.
- OIDC claim wins over header by default. Future writers (the Phase 9 harness post-OIDC-token-mint, additional React clients, mobile clients eventually) inherit the correct identity automatically as long as they authenticate via Keycloak.
- Backward-compatible. The W2 React UI's existing X-Tenant-Id header continues to work in pre-Phase-8.5 dev environments where oauth2-proxy isn't in the loop. The ApiClient code didn't need to change at W4 ship time.
- Pattern composability. Future PSSaaS modules / surfaces inherit the Tenant.Resolve shape; no Phase-by-Phase litigation of identity sourcing.
Negative
- Deferred-canonical-identity has a "second-writer surprise" risk described above. Mitigated by: (a) the Keycloak realm-side mapper is a single configuration point that can be updated to emit a different value if needed; (b) the SupersetEndpoints.cs + TenantMiddleware code paths are convention-agnostic (they just thread whatever claim value is set); (c) the migration-when-it-comes is a one-shot UPDATE script per A68's documented shape.
- JWT-payload extraction without signature validation is a trust-of-gateway pattern. Acceptable per the rationale above for Phase 8.5 v1; flagged as the Phase 10+ replacement target.
- Two convention-binding points must agree: the Keycloak realm-side mapper's emitted tenant_id claim value AND the Tenants:<identity>:ConnectionString config slot keys in K8s. If they drift, the middleware returns 401 "Unknown tenant". This is a loud failure mode (not a silent wrong-tenant write), but operator-side coordination is needed at any re-config.
- The Tenant-picker is visually disabled in v1 but not removed. Future-deprecation-debt: the React UI still has 4 files supporting the now-vestigial picker (TenantContext / tenantContextObject / useTenant / tenantConstants). Removal is deferred to the same Phase 10+ window as the canonical-identity convention.
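Drift between the two binding points could be caught before deploy with a small check. The sketch below is a hypothetical TypeScript helper, not an existing PSSaaS tool: it assumes slot names can be read from environment variable keys shaped like the Tenants__ps-demodata__ConnectionString variable mentioned in Context, and it compares names case-sensitively.

```typescript
// Extract tenant slot names from keys of the form Tenants__<slot>__ConnectionString.
function configuredSlots(envKeys: string[]): Set<string> {
  const pattern = /^Tenants__(.+)__ConnectionString$/;
  const slots = new Set<string>();
  for (const key of envKeys) {
    const slot = pattern.exec(key)?.[1];
    if (slot !== undefined) slots.add(slot);
  }
  return slots;
}

// Returns the mapper-emitted tenant_id values that have no matching slot.
// A non-empty result means those tenants would receive 401 "Unknown tenant".
function findDrift(mapperValues: string[], envKeys: string[]): string[] {
  const slots = configuredSlots(envKeys);
  return mapperValues.filter((value) => !slots.has(value));
}

// Agreement: the mapper emits ps-demodata and the Deployment defines that slot.
const aligned = findDrift(
  ["ps-demodata"],
  ["Tenants__ps-demodata__ConnectionString", "ASPNETCORE_ENVIRONMENT"],
);

// Drift: the Deployment only defines the default slot.
const drifted = findDrift(["ps-demodata"], ["Tenants__default__ConnectionString"]);
```

A check like this would run on the operator side; it does not change the runtime behavior, where the 401 remains the safety net.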
Operational
- TenantMiddleware is the single point of identity resolution; future code that needs the canonical TenantId calls _tenant.TenantId (unchanged from pre-Phase-8.5; the value is just sourced differently now)
- TenantRegistry.Resolve(identity) is the new preferred API for connection-string lookups; old GetConnectionString(tenantId) is preserved as a backward-compat shim
- Keycloak pssaas-app client must include a realm-side mapper emitting tenant_id as a claim on access tokens (cross-project-relay 2026-04-20 request item #2 covers the PSX-Infra-side delivery)
- No data migration this phase. A68's "one-shot UPDATE script re-tagging historical rows" is explicitly deferred per PO disposition.
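One plausible shape for the backward-compat shim is a thin wrapper that delegates to the new lookup. The TypeScript sketch below illustrates that idea only; whether the real GetConnectionString delegates to Resolve is an assumption, as are the function-based layout and the placeholder connection string.

```typescript
type Tenant = { tenantId: string; connectionString: string };

// Hypothetical slot store; the connection string is a placeholder.
const tenantSlots = new Map<string, string>([
  ["ps-demodata", "Server=example;Database=PS_DemoData"],
]);

// New preferred lookup: returns identity and routing together.
function resolve(identity: string): Tenant | undefined {
  const connectionString = tenantSlots.get(identity);
  return connectionString === undefined ? undefined : { tenantId: identity, connectionString };
}

// Shim: old callers keep the string-returning signature,
// while the lookup itself goes through resolve().
function getConnectionString(tenantId: string): string | undefined {
  return resolve(tenantId)?.connectionString;
}

const viaShim = getConnectionString("ps-demodata");
const viaResolve = resolve("ps-demodata")?.connectionString;
```

Routing the shim through the same lookup keeps the two entry points from diverging if a canonical-rename layer is added later.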