Process Discipline — PowerSeller Ecosystem
Canonical document. PSSaaS hosts. PSX and MBS Access reference this file.
This document defines the continuous-improvement discipline every AI agent in the PowerSeller ecosystem must practice. The goal is short feedback loops — surface decisions before committing, check work mid-flight, retro after delivery, and maintain a shared vocabulary for failure modes.
Governing Principle
Every agent is on the assembly line. Every agent can pull the Andon cord when they see a problem. Nothing proceeds silently when something looks wrong.
This is explicitly borrowed from the Toyota Production System's jidoka concept. The worker closest to the work has the authority and the responsibility to stop the line. For AI agents, that means: any Collaborator, Architect, or Developer agent can nominate an antipattern, escalate a concern, or propose reversing a decision — regardless of role hierarchy.
The Practices
Gates (Before Acting)
1. Alternatives-First Gate
Before committing to any non-trivial approach, surface 2-3 options with reasons for rejecting the losers.
Triggers when:
- Replicating a pattern from a sibling project
- Choosing an architecture, tool, or library
- Adopting a convention that will spread through the codebase
- Any decision that would be painful to reverse
What a gated decision looks like:
Options considered:
A. <option> — pros, cons, rejected because ...
B. <option> — pros, cons, rejected because ...
C. <option> — pros, cons, RECOMMENDED because ...
Smallest testable increment: <what we'd ship first to validate>
PO input needed? Yes / No — <why>
Counters: Sibling mimicry, Delayed alternatives, Hidden default.
2. Consolidation Gate
Before adding a new code path that duplicates the responsibility of an existing path, explicitly choose one of:
- Consolidate into the new path (delete or migrate the old)
- Extend the old path instead of adding a new one
- Document why both paths must coexist (an ADR or design note with justification and a tracking issue)
Adding-alongside is prohibited without one of those three choices made explicitly. Silent duplication creates Silent Parallel Code Paths — a named antipattern.
Triggers when:
- Writing code that overlaps in responsibility with an existing module/class/function
- "Just adding a new method" that could reasonably have lived on an existing class
- Introducing a new service/repository/handler that resembles an existing one
- Fixing a bug by forking the code path instead of repairing it
Note: Alternatives-First Gate asks "which option is best?" Consolidation Gate asks the deeper question — "should this be one path or two?" — before the options are even framed.
Counters: Silent Parallel Code Paths.
3. Primary-Source Verification Gate
Before finalizing a phase plan (or any deliverable that depends on a prior "authoritative" artifact), re-verify the prior artifact's key structural claims — counts, names, identities — against primary source.
"Primary source" means both:
- (a) Static artifacts — legacy code, vendor documentation, committed specs, schema DDL files — AND
- (b) Running systems where accessible — running databases queryable via `information_schema`, running services via their APIs, deployed images via their manifests
When a live system is accessible, prefer it for questions about what is; reserve static artifacts for questions about what should be. The gap between "is" and "should be" is often where Truth Rot hides. (Scope clarified 2026-04-16 after the gate caught 3 Truth Rot instances in PowerFill Phase 2, one of which was only discoverable by querying the running dev DB.)
Primary source is not a derived document (another spec, ADR, deep dive, or assumptions log).
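The gate's core move, comparing a derived artifact's structural claims against a primary-source probe, can be sketched as a pure function. This is illustrative only: the table and column names below are hypothetical, not drawn from any PowerSeller schema, and the "observed" dict stands in for the result of a real `information_schema` query.

```python
"""Minimal sketch of a Primary-Source Verification check.
Table/column names are hypothetical, not from any real schema."""

def verify_structural_claims(claimed: dict[str, set[str]],
                             observed: dict[str, set[str]]) -> list[str]:
    """Compare a derived artifact's claimed tables/columns against what a
    primary-source probe (e.g. an information_schema query) actually
    returned. Returns findings; an empty list means the claims held."""
    findings = []
    for table, cols in claimed.items():
        if table not in observed:
            findings.append(f"MISSING TABLE: spec claims '{table}' exists")
            continue
        for col in sorted(cols - observed[table]):
            findings.append(f"MISSING COLUMN: spec claims {table}.{col}")
    return findings

# Spec (derived artifact) vs. live-DB probe result (primary source):
spec = {"pfill_orders": {"id", "status", "slot_code"}}
live = {"pfill_orders": {"id", "status"}}  # slot_code renamed upstream
for finding in verify_structural_claims(spec, live):
    print(finding)  # prints: MISSING COLUMN: spec claims pfill_orders.slot_code
```

Each returned finding then needs a disposition under the Gate Output Action rubric; a zero-finding result is itself evidence worth recording.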
Gate Output Action (mandatory per finding):
When the gate produces a finding, the plan must explicitly record how the finding was handled. Running the gate is not enough; the plan must act on the finding. Three valid dispositions:
- (a) Corrected in place — the prior artifact was updated immediately; subsequent work proceeds against the corrected artifact
- (b) Scope-changed — the plan's scope itself was altered to reflect what the gate revealed (e.g., moving items to a later phase, splitting a task, adding a task)
- (c) Deferred with justification — the finding is real but addressing it now is wrong; explain why and when it gets addressed
Option (c) requires written justification. Options (a) and (b) are self-documenting. (Added 2026-04-16 after PowerFill Phase 3 plan re-scoped from 5 procedures to 3 in response to a gate finding — Gate Output Under-Weighting is the failure mode this prevents.)
This rubric applies by analogy to the Alternatives-First Gate and Consolidation Gate: when those gates produce findings, the same (a)/(b)/(c) disposition applies.
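The mandatory-disposition rubric can be made mechanical: every finding carries exactly one of the three dispositions, and option (c) refuses to exist without written justification. A minimal sketch, assuming nothing about real tooling (the class and disposition labels are illustrative, not an actual PowerSeller artifact format):

```python
"""Sketch of the Gate Output Action rubric: every finding gets an explicit
(a)/(b)/(c) disposition; deferred findings demand written justification.
Structure is illustrative, not a real PowerSeller tool."""
from dataclasses import dataclass

DISPOSITIONS = {"corrected-in-place", "scope-changed", "deferred"}

@dataclass
class GateFinding:
    summary: str
    disposition: str
    justification: str = ""  # required only for "deferred"

    def __post_init__(self):
        if self.disposition not in DISPOSITIONS:
            raise ValueError(f"unknown disposition: {self.disposition}")
        if self.disposition == "deferred" and not self.justification.strip():
            # Option (c) without a why/when is Gate Output Under-Weighting.
            raise ValueError("deferred findings require written justification")

ok = GateFinding("view count was 5, source shows 3", "scope-changed")
try:
    GateFinding("stale column name", "deferred")  # no justification -> rejected
except ValueError as e:
    print(e)  # prints: deferred findings require written justification
```

The point of the hard failure on an unjustified deferral is that "the gate ran" can never silently substitute for "the plan acted on the finding."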
If any claim fails verification:
- Stop
- Correct the prior artifact before drafting the current plan
- Back-propagate the correction to any other artifacts that referenced the wrong claim
Triggers when:
- Beginning Phase N of a multi-phase project where Phase 0 produced specs/deep-dives
- Implementing against a derived document (spec, ADR, assumptions log) rather than against the source it describes
- Any time a prior "authoritative" artifact is being relied on for structural detail (not just direction)
Note: Alternatives-First Gate and Consolidation Gate cover approach selection. Primary-Source Verification Gate covers trusting your own prior documentation. That's a distinct failure mode.
Counters: Phase-0 Truth Rot.
4. Deploy Verification Gate
Before interpreting results from any run that depends on newly-committed code, verify the new code is actually executing in the system that produced the results. "Pushed it, should be live" is insufficient.
The deployment pipeline can silently fail to pick up new code without raising an error — missing volume mount, stale image cache, container that didn't restart, sidecar with its own copy, schema script that didn't run, config reference that points at the old version. Apparent results — logs, metrics, query output, dashboard numbers — come from the prior version of the code. Decisions made on those results are decisions made on stale behavior, with all the false confidence that produces. (Added 2026-04-16 after the PSX Collaborator's Ghost Deploy Andon-cord nomination — see PSX dev cluster's arq-worker volume-mount incident, 2026-04-16.)
Verification arms (use whichever fits the change):
- (a) Sentinel signal — include in the commit a log line, counter, metric, or response field that would NOT exist in the prior version. After the run, confirm the sentinel appears in output. Best fit for: application-code changes, observability additions, behavior changes where the new behavior emits something distinguishable.
- (b) Container/pod inspection — `grep`/`cat` the running container for the expected file content, import statement, env var value, or module export. Best fit for: infrastructure changes, config/wiring changes, situations where (a) isn't natural (silent code paths, internal refactors).
- (c) Live database probe — for schema, view, procedure, or seed-data changes, query the running database (`information_schema.routines`, `information_schema.columns`, `sys.objects`, `sp_helptext`, `SELECT TOP 1 …`) to confirm the change is present. Best fit for: SQL deployments. The integration-test pattern — transient DB → apply scripts → assert via `information_schema` — is the canonical implementation; PowerFill Phase 2 and Phase 3 integration tests are exemplars.
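Arm (a) reduces to a simple scan: the sentinel is any token that could not exist in the prior version, so its absence from the run's output means the run executed stale code. A hedged sketch (log lines and the sentinel format are made up for illustration):

```python
"""Sketch of Deploy Verification arm (a): scan a run's output for a
sentinel emitted only by the new commit. Log text is illustrative."""

def sentinel_present(log_lines: list[str], sentinel: str) -> bool:
    """True iff the sentinel (a token that did NOT exist in the prior
    version) appears anywhere in the run's output."""
    return any(sentinel in line for line in log_lines)

run_log = [
    "2026-04-16 12:00:01 worker started",
    "2026-04-16 12:00:02 slotting v2 path active [sentinel:PF-1234]",
]
assert sentinel_present(run_log, "[sentinel:PF-1234]")          # new code is live
assert not sentinel_present(run_log[:1], "[sentinel:PF-1234]")  # Ghost Deploy: stop
```

If the second case fires in practice, the run's results are uninterpretable and the deploy gap is the blocker, per the stop conditions below.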
Triggers when:
- Reporting results from a run that consumed newly-committed code (your own or another agent's)
- Approving a phase or accepting a deliverable based on observed system behavior
- Building a Counterfactual Retro on numbers from a recent deploy
- Any handoff that asserts "the new behavior is in production" without showing how that was confirmed
If the sentinel doesn't appear, or the probe doesn't show the expected state:
- Stop the cycle
- Do not interpret the run's results — they came from the wrong system
- Investigate the deploy gap before any further work proceeds (this is a blocker, not a footnote)
- Document the gap in the devlog when it's resolved — Ghost Deploy incidents are evidence the gate was needed
Note: Primary-Source Verification Gate covers trusting prior derived artifacts. Deploy Verification Gate covers trusting that committed code is executing. Both are "verify before acting"; the difference is what you're verifying.
Counters: Ghost Deploy.
5. PO Attention Routing
Classify every decision before acting:
| Mode | When | Example |
|---|---|---|
| PO decides | "What" questions — what to build, for whom, in what order, when to ship | Adding PowerFill; deprecating a feature; picking an anchor customer |
| Collaborator decides, reports after | "How" questions with strategic implications | Tool choice; architecture pattern; cross-product touches |
| Agent decides silently | "How" questions with purely operational scope | Code style; test file structure; variable naming |
Heuristic (category-based, not time-based): If the decision changes the system's externally-observable behavior, or touches how Kevin, Tom, Greg, Jay, or Lisa will interact with it, it's at minimum Collaborator-level.
Counters: Attention drain (over-asking), Hidden default (under-asking — silent operational decisions that turn out to be strategic).
Checkpoints (During)
6. Reviewable Chunks
Work producing multiple artifacts or substantial output gets a mid-point check-in offer. The PO can decline ("keep going") but the offer gets made.
Rule of thumb: if the work will take long enough that you're producing 2+ files or 2+ substantial artifacts without surfacing anything, you owe a checkpoint.
Counters: Batch accumulation.
7. Fail-Fast Permission
Any agent can propose reversing a decision without ceremony if early signal says it's wrong. No sunk-cost preservation. Explicit permission to say "the work done so far is wrong, we should redo it" without political cost.
The mechanism:
- Agent notices the signal
- Agent names the problem and proposes the reversal
- PO or Collaborator accepts or rejects quickly
- If accepted: the previous work is documented as a dead-end in the devlog, not hidden
Counters: Sunk cost retention.
8. Diagnostic-First Rule
For any non-trivial fix (anything beyond a typo, syntax error, or one-line config change), the agent must either:
- (a) Show evidence the hypothesis is correct before making the change, OR
- (b) Add diagnostic logging that will produce evidence, commit the logging separately, run once to gather data, and only then propose the fix
"I'm pretty sure this is the issue" is insufficient.
The diagnostic logging commit is separate from the fix commit. This ensures:
- Diagnostics survive if the fix is reversed
- The audit trail shows what evidence was gathered before the change
- Future debugging sessions inherit the logging
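Arm (b) in code form: the diagnostic commit adds only the logging needed to confirm or refute the hypothesis, and one run's captured evidence is inspected before any fix is proposed. A minimal sketch using the stdlib `logging` module; the function, logger name, and the "suspected" condition are all hypothetical:

```python
"""Sketch of the Diagnostic-First Rule, arm (b): ship targeted logging
first, run once, inspect the evidence, only then propose a fix.
Function and field names are illustrative."""
import logging

log = logging.getLogger("diag.slotting")

def slot_trade(trade: dict) -> str:
    # Diagnostic commit: record the evidence the fix hypothesis depends on.
    # This line ships SEPARATELY from (and before) any behavioral change,
    # so it survives even if the eventual fix is reversed.
    log.info("slot_trade input: status=%s pool=%s",
             trade.get("status"), trade.get("pool"))
    return "cash" if trade.get("pool") is None else "pooled"

# Capture one run of evidence in-process (stand-in for reading real logs):
records: list[str] = []
handler = logging.Handler()
handler.emit = lambda r: records.append(r.getMessage())
log.addHandler(handler)
log.setLevel(logging.INFO)

slot_trade({"status": "failed", "pool": None})
assert records == ["slot_trade input: status=failed pool=None"]
```

Only once the captured line confirms (or refutes) the hypothesis does the separate fix commit get drafted.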
Triggers when:
- Investigating a bug that can't be reproduced with a one-line test
- Multiple iterations haven't converged on a fix ("let me try another approach" without new evidence)
- A fix hypothesis is based on reading code rather than observing behavior
Note: Evidence-Free Diagnosis is the upstream failure that makes Sunk Cost Retention inevitable — once several hypothesis-driven commits are in, the cost of starting over grows. Diagnostic-First Rule breaks the cycle at the source.
Counters: Evidence-Free Diagnosis.
9. Required Delegation Categories
When an agent is about to do work that falls into a pre-defined default-delegate category, it must either (a) delegate to a lower-tier agent (fast subagent via Task tool, manual relay to Cursor Auto/Sonnet, or Developer agent), OR (b) document a Deliberate Non-Delegation justification in the plan.
Default-delegate categories (enumerate in developer-context.md):
- Templated entity scaffolding — more than 3 similar entities
- Boilerplate tests following an existing pattern — more than 5 similar tests
- Mechanical find/replace across files — more than 10 files
- SQL script generation from a documented schema — more than 5 tables
- Find-and-replace refactors where the target state is unambiguous
Why delegate these:
- Preserves specialist context for work that actually requires it
- Enables parallelism (fast subagents can run in parallel)
- Applies cheaper models to mechanical work
- Forces the decision to be conscious rather than defaulted
Deliberate Non-Delegation format (when justified):
Deliberate Non-Delegation: <category matched>
Task: <description>
Reason for self-implementation: <why delegation would cost more than self-doing>
Context that would be lost in handoff: <specific findings, Truth-Rot corrections, etc.>
Counters: Delegation Skip.
Retros (After)
10. Counterfactual Retro
After each deliverable, ask:
"If I were starting over knowing what I know now, what would I do differently — and why am I not doing that?"
Not "what worked / what didn't" — that's performative. The counterfactual question forces action.
If the answer produces an insight, act on it:
- Update the practice
- Nominate an antipattern
- Reverse a decision if warranted (see Fail-Fast Permission)
Counters: Performative retro.
11. Outcome-Linked Retro
When a decision's outcome becomes observable, check whether it produced what was expected. If not, update the decision model — not just the process log.
Example: If we adopted "Reviewable Chunks" expecting fewer missed course-corrections, and after a few deliverables we still had late-stage rework, we revise the practice rather than claim it's working.
12. Three-Category Criterion (procedural)
Categorize every framing claim as one of:
- Predictive: something broke; the framing diagnosed and prescribed
- Prophylactic: nothing broke yet; the framing prevents a class of break
- Descriptive: vocabulary upgrade; same behavior, clearer name
Predictive AND prophylactic claims earn architectural-direction ADR weight. Descriptive earns vocabulary capture only. The criterion applies to one's own framings, not just to material under review.
Example: a "kickoff specificity reduces Truth Rot probability" observation was banked as a Predictive claim after 2 corroborating sessions; when session 3 surfaced a mis-cite that had slipped through, the claim had to be downgraded to Predictive-with-qualifier rather than retained as a strong Predictive. (PSSaaS A57 → A59 arc, 2026-04-19.)
Counters: Predictive Inflation.
13. Environment-Explicit Inventory (procedural)
Capability inventories use a Capability × Environment matrix with default-empty cells. Empty cells are NOT "it works"; they are explicit "not measured here." Joint design documents and capability claims start from this convention.
Example: Phase 7's "F-7-8 Cash Trade Slotting returns 688 real rows on Failed runs" claim, framed as the canonical Phase 9 validation pattern, was true on the partial-environment of "Failed runs only." A Capability × Environment matrix with rows for "Failed run" + "Complete run" would have surfaced the empty cell that A66 (post-A54-fix) eventually filled in: "Complete run → 0 rows because UE supersedes." (PSSaaS Phase 7 → A54-fix arc, 2026-04-19.)
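The default-empty convention is the whole trick: a plain dict keyed on (capability, environment) pairs makes "not measured here" the zero value rather than something anyone has to remember to write down. A sketch, with capability and environment names echoing the F-7-8 example but values invented for illustration:

```python
"""Sketch of a Capability x Environment matrix with default-empty cells.
Names echo the F-7-8 example; the values are illustrative."""

CAPS = ["F-7-8 Cash Trade Slotting"]
ENVS = ["Failed run", "Complete run"]

# Default-empty: a missing cell means "not measured here", never "works".
matrix: dict[tuple[str, str], str] = {}
matrix[("F-7-8 Cash Trade Slotting", "Failed run")] = "688 rows (verified)"

def cell(cap: str, env: str) -> str:
    return matrix.get((cap, env), "NOT MEASURED")

for env in ENVS:
    print(f"{env}: {cell(CAPS[0], env)}")
# → Failed run: 688 rows (verified)
# → Complete run: NOT MEASURED   (the empty cell A66 eventually filled in)
```

Capability claims then read off the matrix: a claim may only span the cells that are actually filled.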
Counters: Capability Inflation.
14. Durable Regression Harness on Cadence (structural)
Encode capability assumptions in a regression test that re-runs on a cadence proportional to upstream change rate. Unlike #12 and #13 (procedural; cost human attention), this is structural and costs engineering build-out. The asymmetry is fundamental: capability drift's countermeasure cannot be "review better" because the original review was correct at write-time.
Example: PSX X22 (Batch 1 safety check at api/routes/pricing.py:225 encoded an assumption that was correct at write-time; upstream engine optimization silently invalidated it; regression harness scripts/trace_llpa_evaluation_baseline.py with --diag-scope flag caught the drift on the next exercise; 2026-04-19).
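The shape of such a harness is small: re-encode the write-time assumption as an executable assertion and let the cadence run, not human review, be the thing that notices drift. A sketch under stated assumptions — the engine stub and its one-result-per-pair contract are illustrative stand-ins, not the real `evaluate_stacking()`:

```python
"""Sketch of a cadence regression test that re-encodes a write-time
capability assumption. The engine stub and its contract are stand-ins."""

def evaluate_stacking(pairs: list[tuple[str, str]]) -> list[dict]:
    """Stand-in for an upstream engine; imagine it lives in another repo
    and may be optimized out from under this test at any time."""
    return [{"pair": p, "result": "ok"} for p in pairs]

def test_one_result_per_geometric_pair():
    """Encodes the assumption downstream safety checks depend on. If an
    upstream optimization starts coalescing pairs, this fails on the next
    cadence run instead of silently invalidating downstream code."""
    pairs = [("A", "B"), ("A", "C"), ("B", "C")]
    results = evaluate_stacking(pairs)
    assert len(results) == len(pairs), "upstream drift: result-per-pair broken"

test_one_result_per_geometric_pair()  # on cadence, not on attention
```

The cadence should scale with upstream change rate: a fast-moving dependency earns a nightly run; a frozen one can re-run per release.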
Counters: Capability Drift.
15. Backlog Re-Read Pass (trigger-based)
At planning-start for any Architect kickoff or phase plan, re-read the Backlog table in docs-site/docs/handoffs/pssaas-session-handoff.md end-to-end before drafting the plan. The trigger is the act of starting a new phase plan; the cost is composing an existing capability (the Backlog table already exists; reading it is free) at the right moment.
The failure mode this prevents is a specific Phase-0-Truth-Rot variant: a Backlog item that should constrain the new phase's scope, or surface as inherited context, slips through because the planner trusted the kickoff-prompt's inherited-context section to be complete, even though the kickoff was itself drafted before the most recent Backlog additions landed. The Primary-Source Verification Gate catches drift between derived artifacts and primary source; this practice catches drift between the planning trigger and the most recently updated state-of-board document.
Triggers when:
- Drafting an Architect kickoff for a new phase
- Beginning a planning session for a new sub-phase or workstream
- Starting any work that will produce a plan-shaped artifact (ADR, deep dive, completion report § Recommended next steps)
What the pass produces (per phase):
A short table of Backlog rows that landed since the last phase plan, classified per finding:
| ID | Backlog row | Layer matched | Disposition |
|---|---|---|---|
| F-<phase>-BR-<row#> | <one-line summary> | Implementation-vs-runtime | (a) / (b) / (c) per Gate Output Action |
A "0 net-new findings" outcome is itself a valid + valuable result — it's evidence the kickoff's inherited-context section captured the relevant Backlog state correctly. Banking those zeros as part of the corroboration count is part of the discipline.
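The pass itself is a filter over the Backlog by landed-date. A minimal sketch, assuming nothing about the real table's columns (row shape and dates below are invented):

```python
"""Sketch of the Backlog Re-Read pass: select rows that landed since the
last phase plan; each selected row needs an (a)/(b)/(c) disposition.
Row shape and dates are illustrative."""
from datetime import date

backlog = [
    {"row": 12, "summary": "UE supersedes Step 4 map", "landed": date(2026, 4, 18)},
    {"row": 13, "summary": "guest-token mint endpoint", "landed": date(2026, 4, 20)},
]

def rows_landed_since(rows: list[dict], last_plan: date) -> list[dict]:
    """Rows added after the last plan was drafted. An empty result is
    itself a bankable zero-finding, not a wasted pass."""
    return [r for r in rows if r["landed"] > last_plan]

new_rows = rows_landed_since(backlog, last_plan=date(2026, 4, 19))
assert [r["row"] for r in new_rows] == [13]
```

Each selected row becomes one line of the findings table above; an empty selection banks a zero toward the corroboration count.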
Empirical corroboration as of canonical promotion (2026-04-20):
6 instances across PowerFill Phase 7 / Phase 8 W1 / Phase 8 W2 / Phase 9 / Phase 8.5 W1 / Phase 8.5 W2-4-cross-cutting. Mix of caught-findings + zero-finding pattern-validation runs. Per-instance evidence in the respective phase completion reports under §"Primary-Source Verification Gate."
Counters: Phase-0 Truth Rot (the kickoff-time-vs-state-of-board variant specifically).
16. Cross-Boundary Cutover Verification Recipe (procedural)
When one team cuts over an infrastructure boundary the other team consumes, both sides run primary-source verification independently before trusting the cutover. The pattern formalizes a specific shape of Deploy Verification Gate (#4) that arises when ownership crosses a project boundary — neither side can rely on the other's "should be transparent" framing without empirical confirmation from their own vantage point.
Three checks at the consumer side:
- (a) HTTP /health on the unchanged-hostname surface — confirms the surface itself is alive and addressable
- (b) Content-resolution probes on auth-gated artifacts — e.g. dashboard IDs returning HTTP 302 to anonymous probes confirm the artifacts exist + auth-gating works as designed; this is the second-probe layer that prevents Single-Probe Confidence
- (c) Pod count by label in BOTH old and new namespaces — `kubectl get pods -n <new> -l app=<svc>` confirms 1/1 Running in the new home; `kubectl get pods -n <old> -l app=<svc>` confirms scaled to 0 / no resources in the old home. The both-sides probe converts a presence claim into a presence + cleanup claim.
Equivalent checks at the producer side from their angle (e.g. az sql mi show confirming endpoint identity for a SQL MI cutover; kubectl rollout status for a Deployment cutover; the analog of (b) for whatever the producer's auth-or-routing surface is).
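The three consumer-side checks compose into a single verdict. A hedged sketch that takes already-collected probe results as inputs (the real probes would be `curl` and `kubectl`; parameter names and the pass/fail thresholds here are illustrative):

```python
"""Sketch composing the three consumer-side cutover checks into one
verdict. Probe results are passed in pre-collected; field names and
thresholds are illustrative."""

def cutover_failures(health_status: int,
                     artifact_statuses: list[int],
                     new_ns_pods: int,
                     old_ns_pods: int) -> list[str]:
    """Empty list = presence AND cleanup both confirmed; anything else
    means the cutover is not yet trustworthy from the consumer side."""
    failures = []
    if health_status != 200:                      # (a) surface alive
        failures.append("health check failed")
    if any(s != 302 for s in artifact_statuses):  # (b) auth-gated artifacts resolve
        failures.append("artifact probe did not redirect as designed")
    if new_ns_pods < 1:                           # (c) running in new home
        failures.append("no pods in new namespace")
    if old_ns_pods != 0:                          # (c) cleaned up in old home
        failures.append("old namespace not scaled to zero")
    return failures

assert cutover_failures(200, [302, 302], 1, 0) == []
assert "old namespace not scaled to zero" in cutover_failures(200, [302], 1, 1)
```

Note that the function only ever reports failures; it never converts a missing probe into an implicit pass, mirroring the default-empty convention of practice #13.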
Triggers when:
- A producer team announces an infrastructure cutover the consumer team relies on
- A consumer team is asked to trust a "should be transparent" framing of an upstream change
- Any cross-project resource movement (namespace migration, hostname change, secret rotation that spans owners)
The pattern generalizes beyond the canonical Superset-namespace-migration instance to any infra boundary where producer says "should be transparent." Empirical-verification-on-both-sides converts "should be" into "is."
Origin: PSX Infra ↔ PSSaaS Collaborator Superset → pss-platform migration close-out 2026-04-19. PSX Infra explicitly adopted the recipe as canonical for any future Superset move; PSSaaS adopted it for any future producer-or-consumer-side infra cutover. Cheaper to copy the recipe than to re-derive each time.
Counters: Ghost Deploy (cross-boundary variant); Single-Probe Confidence (when ownership crosses a project line and a single producer-side claim could be load-bearing for consumer-side work).
Procedural vs Structural vs Trigger-Based Countermeasures
The Practices above fall into one of three shapes:
- PROCEDURAL: cost human attention. Examples: Alternatives-First Gate, Reviewable Chunks, Three-Category Criterion (#12), Environment-Explicit Inventory (#13). Effective when the claim-vs-evidence gap is detectable at attention-time (before commit, during review).
- STRUCTURAL: cost engineering build-out (NEW infrastructure). Examples: Durable Regression Harness on Cadence (#14), Deploy Verification Gate (when implemented as automated probes rather than manual checklist). Required when the failure mode appears outside attention-windows during continuous or scheduled execution (after upstream changes, during long-running production behavior, post-pod-restart, etc.).
- TRIGGER-BASED: cost composition of EXISTING infrastructure on event detection. Different from procedural (the cost isn't applying a criterion at write-time; it's recognizing a trigger condition just activated). Different from structural (doesn't require building new infrastructure; composes existing capability). Effective when (a) the relevant capability already exists AND (b) the right time to invoke it is anchored to a specific event rather than continuous cadence or attention-time. Anticipated examples: re-exercise the diagnostic surface after each major bug closes (Diagnostic Masking countermeasure, standalone nomination pending); re-read the session-handoff Backlog table at planning-start (PSSaaS Phase 7 CR #1 candidate practice, separate canonical adoption pending).
All three shapes are valid; the choice constrains what kinds of failures the countermeasure can actually catch. When proposing a new countermeasure, characterize it as one of the three in the nomination template so canonical readers know what each countermeasure actually demands.
Hybrid countermeasures exist and are valid. Real-world countermeasures may compose multiple shapes — e.g., a Cursor IDE rule (.cursor/rules/*.mdc) auto-loaded on file-glob match is structural (cost of building + maintaining the rule file) + trigger-based (continuous file-glob-match trigger composing existing IDE infrastructure). Hybrids do not break the taxonomy; they show that the taxonomy categorizes pure shapes and that real countermeasures may combine them. Canonical example: MBS Access's planned .cursor/rules/odoo-v18-deprecations.mdc (response to Capability Drift mechanism (b) — knowledge propagation across agents; the rule file makes deprecation knowledge auto-encountered rather than devlog-resident).
Antipatterns (Shared Vocabulary)
Any agent can pull the Andon cord — nominate a new antipattern when they see one. The Collaborator consolidates nominations; the PO approves canonical naming. Named antipatterns become shared vocabulary across PSSaaS, PSX, and MBS Access.
Initial Vocabulary
| Name | Behavior | Countermeasure |
|---|---|---|
| Sibling mimicry (also PSX default) | Copying a sibling-project pattern without checking fit for the current project | Alternatives-First Gate |
| Delayed alternatives | Surfacing options after committing to one | Alternatives-First Gate |
| Silent parallel code paths (PSX origin) | Adding a new feature or fix as a parallel code path alongside an existing one; over time, paths drift independently and bug fixes don't propagate | Consolidation Gate |
| Phase-0 Truth Rot (PSSaaS origin, 2026-04-16) | Phase 0 artifacts (specs, deep dives, assumptions logs) drift from source truth; later phases trust them without re-verifying against legacy code/DB/docs; errors propagate silently | Primary-Source Verification Gate |
| Batch accumulation | Deferring review until too much is done to course-correct cheaply | Reviewable Chunks |
| Sunk cost retention | Keeping a design that's wrong because work has been done | Fail-Fast Permission |
| Evidence-free diagnosis (PSX origin) | Applying a fix based on a hypothesis about root cause without first gathering evidence; multiple hypothesis-driven iterations accumulate before anyone measures what's actually happening | Diagnostic-First Rule |
| Delegation skip (PSSaaS origin, 2026-04-16) | A high-context specialist does work the multi-agent structure specifies should flow to a less-specialized agent; context pressure concentrates, cheaper models unused, second-set-of-eyes lost | Required Delegation Categories |
| Gate Output Under-Weighting (PSSaaS origin, 2026-04-16) | An agent runs a verification gate, captures a finding, but treats it as a footnote in the plan instead of re-scoping around it; the gate works, the follow-through doesn't | Gate Output Action mandatory disposition per finding |
| Ghost Deploy (PSX origin, 2026-04-16) | An agent reports results from a system run, assuming the latest committed code is what executed. The deployment pipeline silently fails to pick up the new code (missing volume mount, stale image, unmounted sidecar, schema script that didn't run, config reference to old version). Apparent results — logs, metrics, dashboard numbers — come from the prior version. Decisions made on those results are decisions made on stale behavior, with the false confidence that produces. Worse failure mode than known uncertainty | Deploy Verification Gate |
| Attention drain | Asking PO for feedback on operational trivia, degrading signal-to-noise | PO Attention Routing |
| Performative retro | "What worked / what didn't" with no resulting action | Counterfactual Retro |
| Hidden default | Unchosen defaults silently governing — no one consciously picked them | Force explicit choice |
| Instruction fade (PSX origin) | Rules forgotten within or across sessions | Align trigger, handoff discipline |
| Predictive Inflation (PSX origin, 2026-04-19) | Claiming the predictive validation status of a framing before diagnostic evidence supports the claim. The framing might prove out as predictive later, but at the time of the claim the diagnostic is incomplete or actively contradicting the framing. Distinct from Evidence-Free Diagnosis (which is about applying a fix without evidence the hypothesis is correct); Predictive Inflation is about claiming a framing-level success status without evidence the framing earned it. Lead canonical example: PSX bistability calibration check (Xigo Collaborator catch, 2026-04-17 to 2026-04-19) — Xarbi Collaborator was about to lock into the lens ADR a framing of bistability evidence as "second predictive validation" of brief §3-§4; Xigo Collaborator surfaced that the diagnostic was mid-flight; the diagnostic later rejected the bistability framing entirely (3a-extended showed tristable misclassification). PSSaaS supporting example: A57 → A59 arc (2026-04-19) where "kickoff specificity reduces Truth Rot" was banked at 2-session corroboration, and at session 3 one mis-cite required the claim to be downgraded to "reduces but does not eliminate." | Three-Category Criterion (#12) |
| Capability Inflation (PSSaaS + PSX co-origin, 2026-04-19) | Claiming a deployed-and-working capability based on partial-environment evidence. The capability worked in the environment where it was tested, but the claim implicitly extends to environments (staging, production, different run-states) where it wasn't tested. The gap between claim and evidence is the failure mode. Distinct from Predictive Inflation (which is about framing-validation status); Capability Inflation is about deployment / working-state status. Distinct from Hidden Default (which is about silent governance choices); Capability Inflation is about partial evidence presented as complete. Lead canonical example: PSSaaS F-7-8 / 688-row arc (2026-04-19) — Phase 7 completion report framed "Cash Trade Slotting returns 688 real rows on Failed runs" as "the canonical Phase 9 validation pattern"; Phase 8 W1 doubled down by framing the same 688-row dashboard as "the proof-of-life dashboard"; the A54 fix completion report (same-day) walked the claim back via F-A54-8 + A66 ("688 → 0 rows because UE supersedes Step 4's pfill_cash_market_map per A66"). The capability claim was true on the partial-environment of "Failed runs only" and silently extended to "any run." Strong supporting example: PSX Teaching Dashboard (2026-04-11 broken; 6 days of "Teaching is operational" claims across multiple handoffs without anyone exercising the capability for a post-2026-04-11 buyer). MBS Access supporting example (2026-04 sprint): ADR-0040 Power Query injection declared "stack confirmed working end-to-end" after developer agent committed code; downstream PO test revealed the injection silently no-oped on UTF-16-encoded customXml/item1.xml because the engine decoded as UTF-8 — claim based on "code committed" rather than "downloaded file contains the substituted key" (commit 8c95506). | Environment-Explicit Inventory (#13) |
| Single-Probe Confidence (PSSaaS origin, 2026-04-20) | An agent probes one artifact at one boundary, gets a plausible signal, and reasons forward as if the claim has been verified — when the boundary the claim crosses (infrastructure vs application; cross-project; cross-tenant; auth-gated surface) means a single probe is structurally insufficient. Distinct from Evidence-Free Diagnosis (which is no evidence at all); Single-Probe Confidence is some evidence treated as if it were enough evidence. Distinct from Capability Inflation (which is partial-environment evidence presented as complete); Single-Probe Confidence is single-probe evidence at a single boundary treated as cross-boundary verification. The structural shape: when the claim becomes load-bearing for downstream work owned by a different agent, project, or tenant, the cost of a wrong claim is borne at the boundary; the cost of a second probe to confirm is borne by the claimant. The asymmetry is what makes single-probe-and-reason-forward feel reasonable in the moment and turn out wrong at the boundary. Lead canonical example: PSX Keycloak realm-name claim (PSSaaS Architect 2026-04-20 W1 cross-project relay specified realm pss-platform based on inferring "K8s namespace name = realm name" from a single signal; PSX Infra empirically disconfirmed at delivery — master, psx-staging, pss-services, no pss-platform; resolved by creating pssaas-app in psx-staging instead). Supporting example 1: PSX Infra "two databases" falsification arc (PSSaaS Phase 8 W2 staging-verify single-probe claim disconfirmed by PSX Infra explicit two-database probe). Supporting example 2: embedded-SDK-OFF claim (PSSaaS Phase 8 W2 staging-verify near-miss; the claim could not be verified without authenticated PSX access; would have been load-bearing for the W3 .NET guest-token-mint endpoint design had it shipped uncorrected). | Counter (procedural): when a claim crosses an ownership boundary (Infra vs Application; cross-project; cross-tenant; auth-gated surface), require either (i) a second independent probe of a different artifact OR (ii) explicit owner-confirmation from the boundary-owning party before treating the finding as load-bearing. Counter (trigger-based): the Cross-Boundary Cutover Verification Recipe (#16) is the structural-shape composition of (i) for the canonical infrastructure-cutover variant; the analog for non-cutover boundary crossings is the relay-compose-time check ("ask Project X what realms exist before naming one"). |
| Capability Drift (PSX origin, 2026-04-19) | A capability claim was honest at write-time. An upstream component changed. The claim silently became wrong. No fault of the original claim; the change happened post-write-time and invalidated assumptions the claim depended on. Distinct from Capability Inflation (a claim outrunning evidence at write-time); Capability Drift is a correct claim at T1 becoming wrong at T2. Distinct from Phase-0 Truth Rot (facts from the start of the project that were never updated); Capability Drift is about capabilities that worked until upstream changes invalidated them. Two known mechanisms (banked for future decomposition if a third surfaces): (a) upstream code change — an external dependency, library, or sibling-system optimization invalidates an assumption; (b) knowledge propagation — knowledge of the correct pattern lived in a devlog or other ephemeral artifact that the next agent didn't encounter; the "drift" is across actors, not across code-time. Lead canonical example (mechanism a): PSX X22 (the Batch 1 safety check at api/routes/pricing.py:225 encoded an assumption — evaluate_stacking() returns one LayerEvaluationResult per geometric pair — that was correct at write-time; an upstream engine optimization silently invalidated it; the regression harness scripts/trace_llpa_evaluation_baseline.py with the --diag-scope flag caught the drift on the next exercise; 2026-04-19). PSSaaS supporting example (mechanism a): A58 ("BR-9 cleanup scope split: 4 syn-trades + log tables preserved across BR-9") was VERIFIED at Phase 6e write-time (when A56 fail-fast prevented UE from running); A66 walked it back when A54 closed and UE began rebuilding the tables itself. MBS Access supporting example (mechanism b): the <div class="oe_chatter"> pattern was deprecated by Odoo v18 and replaced with <chatter/>; documented in the pss_mbs_access devlog 2026-03-04 (v18.0.1.197.17). When a developer agent built mbs_workbook_template_views.xml under ADR-0040, the deprecated pattern was used anyway because the knowledge lived in a devlog rather than a structural or trigger-based countermeasure the new agent would automatically encounter. | Durable Regression Harness on Cadence (#14) — mechanism (a). Trigger-based (e.g. .cursor/rules/*.mdc auto-loaded on file-glob match) — mechanism (b). |
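The trigger-based countermeasure for mechanism (b) can be made concrete. Below is a minimal sketch of a Cursor rule that would have surfaced the chatter deprecation automatically on file-glob match (assuming a `.cursor/rules` layout; the filename, glob, and wording are illustrative, not an existing rule in any project):

```
---
description: Odoo v18 chatter migration: block the deprecated oe_chatter pattern
globs: **/*views*.xml
alwaysApply: false
---
Odoo v18 replaced `<div class="oe_chatter">` with `<chatter/>`
(pss_mbs_access devlog 2026-03-04, v18.0.1.197.17).
When editing view XML:
- Use `<chatter/>`; never emit `<div class="oe_chatter">`.
- If an existing file still contains the deprecated pattern, flag it
  for migration rather than replicating it.
```

Because the rule fires on the glob rather than on an agent remembering to read a devlog, the knowledge travels with the file type instead of with the actor.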
Out of Scope for the Claim-vs-Evidence Family — Diagnostic Masking
The four antipatterns above (Predictive Inflation / Capability Inflation / Single-Probe Confidence / Capability Drift) form a family that addresses claims-vs-evidence-at-claim-time failures. Single-Probe Confidence (PSSaaS origin, 2026-04-20) joined as the fourth member when realm-name conflation in the W1 Keycloak relay closed the third corroborating instance of the cross-boundary single-probe shape. Structurally it sits between Capability Inflation (partial-environment evidence presented as cross-environment-complete) and Capability Drift (a correct-at-write-time claim that became wrong): single-probe-at-one-boundary evidence presented as cross-boundary-verified.
A separate, structurally distinct failure mode — Diagnostic Masking — exists where a bug's downstream symptom is suppressed by another bug that fires first, and the symptom only becomes observable once the first bug is fixed. PSSaaS A66 is the canonical example (UE's clear-and-rebuild semantics on syn-trade-empty datasets were unobservable until A54 + A56 closed). Diagnostic Masking is NOT a member of the Claim-vs-Evidence family because the countermeasure shape is different: it requires re-exercising the full diagnostic surface after each major bug closes, not stricter attention to claim quality. A standalone Diagnostic Masking nomination is anticipated; this note prevents future submissions from absorbing it into the family by mistake.
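The distinct countermeasure shape for Diagnostic Masking can be sketched in code: after a major bug closes, re-run the entire diagnostic surface, not just the probe tied to the fix, because fixing one bug can unmask symptoms of another that fires later in the pipeline. A minimal illustration; all names are hypothetical, not part of any real PSSaaS harness:

```python
# Sketch: re-exercise every registered probe after a bug closes.
# Illustrative names only -- not a real harness.
from typing import Callable

DIAGNOSTICS: dict[str, Callable[[], bool]] = {}

def diagnostic(name: str):
    """Register a probe that returns True when its surface looks healthy."""
    def register(fn: Callable[[], bool]) -> Callable[[], bool]:
        DIAGNOSTICS[name] = fn
        return fn
    return register

def reexercise_after_close(closed_bug: str) -> list[str]:
    """Run every probe after `closed_bug` is fixed; return the names of
    probes that now FAIL -- newly observable symptoms the closed bug may
    have been masking. `closed_bug` is carried for the retro log."""
    return [name for name, probe in DIAGNOSTICS.items() if not probe()]

# Toy model of the A54/A56/A66 shape: probe B's failure is unreachable
# while bug A's fail-fast aborts the run before B's code path executes.
state = {"A_fixed": False}

@diagnostic("A: fail-fast on empty dataset")
def probe_a() -> bool:
    return state["A_fixed"]

@diagnostic("B: clear-and-rebuild semantics")
def probe_b() -> bool:
    # B only becomes observable once A no longer aborts the run.
    return not state["A_fixed"]

state["A_fixed"] = True          # bug A closes
print(reexercise_after_close("A"))  # → ['B: clear-and-rebuild semantics']
```

The point of the shape is in the return value: the masked probe only fails after the masking bug is fixed, which is exactly why claim-quality discipline alone cannot catch it.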
Nominating a New Antipattern
Use the canonical nomination template below. The quality of the nomination determines the quality of the consolidation: rigor in the submission saves time in review.
Nomination Template
Nomination: <Proposed Name>
Behavior: <One paragraph describing the observable failure pattern.
What the agent does, what goes wrong, why it matters.>
Example: <Concrete instance with project, date if known, specific
details. Enough for someone unfamiliar with the situation to
understand what happened.>
Distinct from: <Explicit comparison against existing antipatterns that
might seem similar. Cite the existing name and explain the distinction.>
- <Existing antipattern> — <why this is different>
- <Existing antipattern> — <why this is different>
Proposed name: <The name, with alternatives considered if relevant>
Proposed countermeasure: <Name and description of countermeasure.
Whether it's an existing practice or a new one. If new, explain how
it fits in the Gates/Checkpoints/Retros structure.>
Proposed countermeasure shape: <Procedural, Structural, or
Trigger-Based — see §"Procedural vs Structural vs Trigger-Based
Countermeasures" above for the distinctions. Procedural
countermeasures cost human attention; Structural countermeasures
cost engineering build-out (new infrastructure); Trigger-Based
countermeasures cost composition of existing infrastructure on
event detection. All three are valid; the choice constrains what
kinds of failures the countermeasure can actually catch.>
Why new countermeasure (if proposing one): <What gap the new
countermeasure fills that existing practices don't cover.>
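For projects that keep nomination drafts as structured data, a small pre-submission check can enforce the template's required fields before a draft reaches a Collaborator. This is a sketch under that assumption; the field names mirror the template above, and the helper itself is hypothetical tooling, not an existing practice:

```python
# Hypothetical pre-submission check for nomination drafts.
REQUIRED_FIELDS = [
    "nomination",               # proposed name
    "behavior",
    "example",
    "distinct_from",            # list of (existing antipattern, distinction)
    "proposed_countermeasure",
    "countermeasure_shape",     # procedural | structural | trigger-based
]
VALID_SHAPES = {"procedural", "structural", "trigger-based"}

def validate_nomination(draft: dict) -> list[str]:
    """Return a list of problems; an empty list means the draft is complete."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if not draft.get(f)]
    shape = str(draft.get("countermeasure_shape", "")).lower()
    if shape and shape not in VALID_SHAPES:
        problems.append(f"unknown countermeasure shape: {shape!r}")
    return problems

draft = {
    "nomination": "Ghost Deploy",
    "behavior": "...",
    "example": "...",
    "distinct_from": [("Capability Inflation", "false confidence vs ...")],
    "proposed_countermeasure": "Deploy Verification Gate",
    "countermeasure_shape": "Procedural",
}
print(validate_nomination(draft))  # → []
```

A draft that omits the "Distinct from" analysis fails fast here rather than during Collaborator consolidation, which is where the template's rigor is supposed to pay off.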
Nomination Process
- Any agent identifies a behavior pattern producing bad outcomes
- The agent completes the nomination template (above) and submits it to their project's Collaborator
- The Collaborator consolidates: checks distinctness against the existing vocabulary, refines wording, and cross-references sibling-project examples if relevant
- The Collaborator presents the nomination to the PO with a recommendation
- The PO approves the canonical naming, or sends the nomination back for revision
- Once approved, the PSSaaS Collaborator (owner of the canonical) adds it to this document
- The PSSaaS Collaborator drafts a relay to cross-project Collaborators announcing the addition
- The PO relays the update to other projects as needed
Until PO approval, the nomination is a working draft — agents can use it informally but shouldn't treat it as canonical.
What makes a strong nomination:
- Concrete example (not hypothetical)
- Explicit "Distinct from" analysis against existing vocabulary
- Proposed countermeasure that either maps to an existing practice or adds a well-scoped new one
- If adding a practice, clarity on where it fits in Gates/Checkpoints/Retros
The PSX Collaborator's 2026-04-16 nominations (Silent Parallel Code Paths, Evidence-Free Diagnosis, Ghost Deploy) are model submissions; see the devlog entry for that day for the full text. The Ghost Deploy nomination is particularly strong on its "Distinct from" analysis: it explicitly compares against three existing antipatterns and articulates why false confidence is a worse failure mode than acknowledged ignorance.
Cadence (Signal-Based, Not Time-Based)
AI agents cannot reliably measure wall-clock time. Trigger revision based on signals, not dates:
- Immediate revision after any named antipattern occurrence — add the specific incident to the antipattern's examples; adjust the countermeasure if needed
- Pattern-driven revision — when the same issue surfaces in 2-3 separate retros, the practice itself needs revision, not just awareness
- PO-triggered revision — Kevin calls for a review
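The pattern-driven trigger above is mechanical enough to sketch: count how many separate retros have surfaced each issue, and flag the underlying practice for revision once the count crosses the stated 2-3 threshold. This is a hypothetical illustration of the signal, not existing tooling; the class name and threshold are assumptions:

```python
# Hypothetical tracker for the pattern-driven revision signal.
from collections import Counter

PATTERN_THRESHOLD = 2  # lower edge of the "2-3 separate retros" rule above

class RevisionSignals:
    def __init__(self) -> None:
        self._retro_mentions: Counter = Counter()

    def record_retro(self, issues: list[str]) -> list[str]:
        """Record one retro's issues; return any issues whose recurrence
        now signals that the practice itself needs revision."""
        # De-duplicate within a retro so the count is per-retro, not per-mention.
        self._retro_mentions.update(set(issues))
        return [issue for issue, n in self._retro_mentions.items()
                if n >= PATTERN_THRESHOLD]

signals = RevisionSignals()
signals.record_retro(["alternatives skipped", "stale handoff"])  # no signal yet
due = signals.record_retro(["alternatives skipped"])             # second retro
print(due)  # → ['alternatives skipped']
```

The design choice worth noting is the per-retro de-duplication: three mentions of the same issue in one retro is awareness, while the same issue across separate retros is the signal that the practice, not the agent, needs to change.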
Cross-Project Distribution
PSSaaS hosts the canonical. PSX and MBS Access reference it.
- PSX: A short stub in PSX's CLAUDE.md or AGENTS.md links back to this file. PSX agents read PSSaaS's canonical at session start.
- MBS Access: Same pattern. Lower priority since less agent work happens there.
- Updates propagate manually. When this document changes, the PSSaaS Collaborator notifies Kevin, who relays the update to PSX and MBS Access Collaborators in their respective sessions.
Why PSSaaS Hosts Rather Than PSX
PSX has more execution maturity but less structural discipline. PSSaaS is where the discipline is being built deliberately. Hosting the canonical here reinforces that PSSaaS is the clean-slate platform and that discipline is a first-class feature of the ecosystem, not just of one project.
Checklist for Every Agent Session
At session start:
- Confirm role
- Read CLAUDE.md, role context doc, session handoff
- Read this document (or skim if recent)
Before any non-trivial decision:
- Alternatives-First Gate applied?
- Consolidation Gate checked (would this create Silent Parallel Code Paths)?
- Primary-Source Verification Gate applied (if relying on prior derived artifacts)?
- Deploy Verification Gate applied (if interpreting results from a recent deploy)?
- PO Attention Routing classified?
During work:
- Offering Reviewable Chunks where appropriate?
- Comfortable reversing if early signal says wrong?
- Diagnostic-First Rule applied to non-trivial fixes (evidence before code change)?
- Required Delegation Categories checked — delegating when category matched, or documenting Deliberate Non-Delegation?
After each deliverable:
- Counterfactual Retro performed (and acted on if needed)?
- Outcome tracked if observable?
- Antipattern to nominate?
What This Document Is Not
- Not a ceremony to check boxes while doing the same work anyway
- Not a defense mechanism to blame a process when outcomes are bad
- Not a substitute for judgment — practices are heuristics, not rules
The point is to accelerate learning. If a practice isn't accelerating learning, we revise it.
Open Questions
None currently. When ambiguity arises, agents should raise it via the Andon cord — either as an antipattern nomination, a practice-revision proposal, or a PO escalation.