Process Discipline v3 — Ghost Deploy + Deploy Verification Gate

Date: 2026-04-16
Agent: Collaborator
Scope: Consolidate and codify the PSX Collaborator's Ghost Deploy antipattern nomination, with one PSSaaS-specific refinement to the countermeasure. PO approved.

Why

The PSX Collaborator pulled the Andon cord with a third nomination today, drawn from a real PSX dev-cluster incident this morning: arq-worker pods had the app-source hostPath volume declared in the pod spec, but the matching volumeMounts entry was missing from the container spec. Three commits' worth of new code (Fix 1, Phase 1 observability, Phase 2 MVP) were "deployed" (pushed, image rebuilt, pods restarted), but every convergent loop run executed the pre-fix image. The PSX Collaborator built a Counterfactual Retro on observability data that came from the old code, approved Phase 2 MVP based on those numbers, and then "deployed" Phase 2 MVP into the same unmounted worker. The gap was caught only when a NewRez run produced Phase 1 observability log entries that shouldn't have been possible. That forced an investigation: infra patched the mount, and commit f8f7a46 documented the wiring fix.

This is the failure mode where an agent's apparent evidence comes from the wrong system, and decisions get made on stale behavior with all the false confidence that produces.
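
The mismatch behind this incident (a volume declared at pod level but not mounted in the container) is mechanical enough to check in a script. The following is a minimal sketch, not the fix that was applied: it assumes a pod manifest in the shape that `kubectl get pod <name> -o json` returns, and the image name and hostPath in the example are hypothetical.

```python
import json


def unmounted_volumes(pod: dict) -> dict[str, list[str]]:
    """Map each container name to the declared volumes it does not mount."""
    spec = pod.get("spec", {})
    declared = [volume["name"] for volume in spec.get("volumes", [])]
    gaps = {}
    for container in spec.get("containers", []):
        mounted = {mount["name"] for mount in container.get("volumeMounts", [])}
        missing = [name for name in declared if name not in mounted]
        if missing:
            gaps[container["name"]] = missing
    return gaps


# Hypothetical manifest: the volume is declared, but the container has no volumeMounts.
example_pod = json.loads("""
{
  "spec": {
    "volumes": [{"name": "app-source", "hostPath": {"path": "/srv/app"}}],
    "containers": [{"name": "arq-worker", "image": "example/worker:dev"}]
  }
}
""")

print(unmounted_volumes(example_pod))
```

A non-empty result is a hint rather than proof of a fault, since a multi-container pod can legitimately declare a volume that only some containers mount.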

What Was Done

New Practice Added

Deploy Verification Gate (Gate #4, before acting) — Before interpreting results from any run that depends on newly-committed code, verify the new code is actually executing. Three verification arms:
- (a) Sentinel signal — include a log line, counter, metric, or response field in the commit that wouldn't exist in the prior version; confirm it appears in run output.
- (b) Container/pod inspection — grep / cat the running container for expected file content, import, env var, or module export.
- (c) Live database probe — for SQL/schema/seed changes, query information_schema.routines, information_schema.columns, sys.objects, sp_helptext to confirm presence.
- "Pushed it, should be live" is insufficient.
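
Arm (b) can be scripted rather than run by hand. The sketch below is one possible shape, assuming `kubectl` is on the PATH and the container image ships `grep`; the pod name, namespace, file path, and search string in the usage comment are hypothetical placeholders.

```python
import subprocess


def inspect_command(pod: str, path: str, needle: str, namespace: str = "default") -> list[str]:
    """Build a kubectl command that greps one file inside the running pod."""
    return [
        "kubectl", "exec", "-n", namespace, pod, "--",
        # -F: treat the needle as a fixed string; -q: report via exit status only.
        "grep", "-F", "-q", "--", needle, path,
    ]


def code_is_present(pod: str, path: str, needle: str, namespace: str = "default") -> bool:
    """Return True only when grep exits 0, meaning the needle was found."""
    command = inspect_command(pod, path, needle, namespace)
    result = subprocess.run(command, capture_output=True)
    return result.returncode == 0


# Hypothetical usage (requires cluster access):
# code_is_present("arq-worker-0", "/app/worker.py", "phase1_observability", "dev")
```

A `False` result means only that the check did not confirm the code. A missing pod, a missing `grep` binary, or an authentication failure also exits non-zero, so the command's stderr is worth reading before concluding the deploy is a ghost.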

Total practices: 11 (was 10). The canonical was renumbered to keep the Gates contiguous (#1 Alternatives-First, #2 Consolidation, #3 Primary-Source Verification, #4 Deploy Verification, #5 PO Attention Routing).

New Antipattern Added

Ghost Deploy (PSX origin, 2026-04-16) — An agent reports results from a system run, assuming the latest committed code is what executed. The deployment pipeline silently fails to pick up the new code (missing volume mount, stale image cache, container that didn't restart, sidecar with its own copy, schema script that didn't run, config reference to old version). Apparent results — logs, metrics, dashboard numbers — come from the prior version. Decisions made on those results are decisions made on stale behavior, with the false confidence that produces.

Total named antipatterns: 14 (was 13).

PSSaaS-Specific Refinement

PSX's nomination proposed two verification arms: sentinel signal and container inspection. PSSaaS has a different deployment surface, where much of the work is SQL artifacts (schema scripts, view definitions, stored procedures, seed data). Adding a "log line" to a CREATE PROCEDURE doesn't fit, and grepping inside an mssql container is awkward. We had already independently arrived at the right pattern (transient DB → apply scripts → assert via information_schema) for the PowerFill Phase 2 and Phase 3 integration tests. The refinement makes that arm canonical:

(c) Live database probe — for schema, view, procedure, or seed-data changes, query the running database (information_schema.routines, information_schema.columns, sys.objects, sp_helptext, SELECT TOP 1 …) to confirm the change is present.

The PowerFill integration tests are listed as the canonical exemplars for this arm.
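
As an illustration of arm (c), and not a copy of the PowerFill tests, a probe for stored procedures might look like the sketch below. It assumes a DB-API cursor from a driver that uses `?` placeholders (pyodbc is one such driver); the cursor is supplied by the caller, and any schema or routine names passed in are the caller's own.

```python
def missing_routines(cursor, schema: str, expected: set[str]) -> set[str]:
    """Return the expected routine names that the live database does not report."""
    cursor.execute(
        "SELECT ROUTINE_NAME FROM INFORMATION_SCHEMA.ROUTINES "
        "WHERE ROUTINE_SCHEMA = ?",
        (schema,),
    )
    # Compare case-insensitively; this assumes a case-insensitive database collation.
    present = {row[0].casefold() for row in cursor.fetchall()}
    return {name for name in expected if name.casefold() not in present}
```

An empty set is the passing condition. The same shape extends to `INFORMATION_SCHEMA.COLUMNS` for schema changes by swapping the query and the compared column.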

Vocabulary Drift Caught and Corrected

While editing psx-relay-stub.md to add Ghost Deploy, I noticed the stub's "Initial Vocabulary" list was missing five previously-accepted antipatterns — Silent Parallel Code Paths, Phase-0 Truth Rot, Evidence-Free Diagnosis, Delegation Skip, Gate Output Under-Weighting. The stub had not been updated as the canonical evolved across v2 / v3 / v4 / v4.1 of the process doc. PSX would have inherited a stale vocabulary list on next session start. Corrected in the same edit. Logged here so the gap-detection mechanism (this drift was caught only because Ghost Deploy forced a stub edit) becomes visible.

Files Modified

- docs-site/docs/agents/process-discipline.md: added Gate #4 Deploy Verification Gate; added Ghost Deploy to antipattern table; renumbered practices #5–#11; added Deploy Verification Gate to session checklist; updated nomination exemplars
- docs-site/docs/agents/collaborator-context.md: added Deploy Verification Gate to "Before Acting on Anything Non-Trivial" list
- AGENTS.md: added Deploy Verification Gate to Gates summary; added Ghost Deploy to named antipatterns
- docs-site/docs/agents/psx-relay-stub.md: added Ghost Deploy + 5 previously-missing antipatterns to vocabulary list (drift correction)
- This devlog entry

Counterfactual Retro

If I were starting over knowing what I know now, what would I do differently — and why am I not doing that?

- The PSX-relay stub's vocabulary list should auto-derive from the canonical's antipattern table, not be a hand-maintained copy. The drift I caught here is the predictable result of having two sources of truth. Acting on it: not yet. Auto-derivation is feasible (a small Python script reads the markdown table) but adds infrastructure for a problem that surfaces every few weeks. For now, it is on the backlog; the per-update discipline is "when adding to canonical, also update the stub." If drift recurs, that's a signal to invest in the auto-derivation.
- Ghost Deploy could have been caught in PSSaaS too. I had at least two near-misses in this session: the Superset dashboard cache pointing at old SQL, and the early setup.sh line-endings issue that almost let me read query results from an empty schema. PSX surfaced the pattern first because their dev cycle has more deploy-driven feedback loops. PSSaaS will benefit equally: every SQL deployment, every Docker rebuild, every Kubernetes image swap is a potential Ghost Deploy surface.
- The Deploy Verification Gate's arm (c), the live DB probe, was already invented in the PowerFill Phase 2 and Phase 3 integration tests. I didn't recognize it as a generalizable practice until PSX's nomination forced the categorization. Acting on it: when patterns recur in concrete work, they should bubble into named practice candidates earlier. Future Counterfactual Retros will include "is there a pattern here worth promoting to canonical?" as an explicit question.
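
The auto-derivation deferred in the first retro item could be sketched roughly as follows. This is an assumption-heavy illustration: it supposes the canonical's antipattern table is a pipe-delimited markdown table whose first header cell is `Antipattern`, which the devlog does not confirm, and the sample table uses placeholder descriptions.

```python
def antipattern_names(markdown: str, header: str = "Antipattern") -> list[str]:
    """Collect first-column values from the table whose first header cell matches."""
    names = []
    in_table = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped.startswith("|"):
            in_table = False  # any non-table line ends the current table
            continue
        cells = [cell.strip() for cell in stripped.strip("|").split("|")]
        if not in_table:
            # First row of a table: decide whether this is the one we want.
            in_table = cells[0].strip("*") == header
            continue
        if set(cells[0]) <= set("-: "):
            continue  # separator row such as |---|---|
        names.append(cells[0].strip("*"))
    return names


def stub_drift(markdown: str, stub_names: list[str]) -> list[str]:
    """Return canonical names missing from the stub list, in table order."""
    known = {name.casefold() for name in stub_names}
    return [n for n in antipattern_names(markdown) if n.casefold() not in known]


# Hypothetical canonical excerpt; the descriptions are placeholders.
CANONICAL = """\
| Antipattern | Description |
|---|---|
| Ghost Deploy | ... |
| Delegation Skip | ... |
"""

print(stub_drift(CANONICAL, ["Delegation Skip"]))
```

The parser does not handle escaped pipes inside cells, which would matter if any antipattern description in the first column contained one.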

What's Next

- Relay back to PSX Collaborator confirming acceptance, with the (c) refinement called out
- Update the PSX-side CLAUDE.md reference block when Kevin next opens a PSX session (no PSSaaS action needed)
- Apply the Deploy Verification Gate to the next PSSaaS deploy that depends on newly-committed code, likely the PSX infra agent's manual kubectl set image for the Phase 3 GHCR images. The verification arm (a) is already in place: the /api/powerfill/status endpoint now returns phase-3-preprocess-ready (a sentinel that wouldn't exist in the prior version); confirming that string after rollout is the gate's first canonical use in PSSaaS.
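
The post-rollout sentinel check on the status endpoint could be automated along these lines. This is a sketch under assumptions: the base URL is a hypothetical placeholder, and because the devlog does not specify the response format, the check only looks for the sentinel string anywhere in the response body.

```python
from urllib.request import urlopen

SENTINEL = "phase-3-preprocess-ready"


def body_has_sentinel(body: str, sentinel: str = SENTINEL) -> bool:
    """True when the sentinel string appears anywhere in the response body."""
    return sentinel in body


def status_confirms_rollout(base_url: str, timeout: float = 5.0) -> bool:
    """Fetch the status endpoint and look for the sentinel in what it returns."""
    url = f"{base_url.rstrip('/')}/api/powerfill/status"
    with urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
    return body_has_sentinel(body)


# Hypothetical usage after the image swap:
# status_confirms_rollout("http://localhost:8080")
```

Network errors propagate as exceptions rather than returning `False`, which keeps "could not reach the service" distinct from "reached it and the old code answered".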
