Your AI Agent Governance Is Just a Suggestion
Behavioral rules for AI agents are text in the context window. Under pressure (deep in a fix loop, resolving conflicting instructions, running low on context), the model rationalizes around them. This isn’t a hypothetical failure mode. It’s documented.
Documented Failures
These are real incidents from a single workspace running Claude Code with CLAUDE.md governance rules over several months of daily use.
| Incident | What happened |
|---|---|
| Root-owned .git files | A merge helper ran as sudo, left root-owned files across the working tree. Subsequent git commands failed with permission errors. CLAUDE.md said “use make fix-ownership”; the agent ran sudo chown instead. |
| Fix-loops | Agent edited the same file 5+ times without diagnosing the root cause. Each edit was individually reasonable. The pattern was not. |
| Hardcoded host paths | Agent wrote /home/joe/ paths in container code instead of /home/dev/. The behavioral rule about correct path prefixes was in CLAUDE.md. The agent read it. |
| Index corruption | Non-atomic writes corrupted the experience store index. Recovery required manual intervention. |
| False positive pytest | Substring matching on test output reported tests passed when they hadn’t. Agent moved to the next task based on the false signal. |
| Status probe false flags | gh pr checks returns non-zero exit on pending checks. Agent interpreted this as a failure and entered a retry loop. |
| Silent CSV failures | except (ValueError, TypeError): continue in a CSV parser. 0 out of 3,853 rows matched. No error raised. The broad except clause masked a column name mismatch. |
Every one of these had a behavioral rule that should have prevented it. The rules were present in the context. The agent processed them, weighed them against the immediate goal, and decided the situation warranted a different approach.
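The silent CSV failure is the easiest of these to guard against mechanically. The following is a minimal sketch, not the workspace's actual parser: the function name `parse_amounts` and the `amount` column are hypothetical. It keeps the same narrow `except` clause but counts skipped rows, and it treats "every row skipped" as a schema error rather than dirty data.

```python
import csv
import io


def parse_amounts(text: str, column: str = "amount") -> list[float]:
    """Parse one numeric column, counting skipped rows instead of hiding them."""
    values: list[float] = []
    skipped = 0
    for row in csv.DictReader(io.StringIO(text)):
        try:
            # row.get() returns None when the column name is wrong, so float()
            # raises TypeError: the error a bare `continue` would swallow.
            values.append(float(row.get(column)))
        except (ValueError, TypeError):
            skipped += 1
    if skipped and not values:
        # Every row failed: that points at a schema mismatch, not dirty data.
        raise ValueError(f"0 of {skipped} rows parsed; is column {column!r} present?")
    return values
```

With a correctly named column, a stray non-numeric row is still skipped quietly. With a misnamed column, the call fails loudly instead of returning an empty result.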
This Post as Evidence
The governance hook (enforce.py) blocked the agent three times during the writing of this post’s first draft:
| Blocked command | Rule triggered | Replacement |
|---|---|---|
| cat (Bash tool) | Banned bash: use Read tool | Read tool provides audit trail |
| find (Bash tool) | Banned bash: use Glob tool | Glob tool provides audit trail |
| sudo chown | Learned ban from prior incident | make fix-ownership |
The agent didn’t choose to use the correct tools. The hook rejected the incorrect ones and the agent adapted. That’s the difference between a suggestion and a constraint.
Mechanical Enforcement Architecture
The system uses a single PreToolUse hook: a Python script that receives tool invocations as JSON on stdin and returns block decisions. Seven rules execute sequentially. First block wins.
| Order | Rule | What it enforces |
|---|---|---|
| 1 | check_phase_allowed | Tool must be permitted in the current workflow phase |
| 2 | check_banned_bash | Block shell commands that have dedicated tool equivalents (cat, grep, find, sed) |
| 3 | check_no_sudo_chown | Learned ban: sudo chown caused root-ownership cascades |
| 4 | check_path_rule | Code references must use container paths, not host paths |
| 5 | check_tdd_discipline | Tests must exist before implementation in applicable phases |
| 6 | check_ci_awareness | CI status checks must account for non-zero exit on pending |
| 7 | check_learned_bans | Patterns extracted from prior session failures via closed-loop analysis |
The agent can think whatever it wants about whether cat is faster than the Read tool. The hook rejects the command before it executes. The agent gets a rejection reason and finds another approach.
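A first-block-wins chain can be sketched in a few lines. This is an illustration under stated assumptions, not the contents of enforce.py: it implements only three of the seven rules, the rule bodies are guesses at the simplest check that fits each description, and the `tool_name` and `tool_input` keys reflect my understanding of the hook input format, which should be verified against current Claude Code documentation.

```python
import json
import re
from typing import Callable, Optional

# Replacements taken from the blocked-command table; grep and sed are omitted
# because the post does not name their replacement tools.
BANNED_BASH = {"cat": "Read", "find": "Glob"}
HOST_PREFIX = "/home/joe/"  # host path from the incident table

Event = dict
Rule = Callable[[Event], Optional[str]]  # returns a rejection reason or None


def _command(event: Event) -> str:
    """Return the shell command for Bash tool calls, else an empty string."""
    if event.get("tool_name") != "Bash":
        return ""
    return event.get("tool_input", {}).get("command", "")


def check_banned_bash(event: Event) -> Optional[str]:
    words = _command(event).split()
    if words and words[0] in BANNED_BASH:
        return f"Banned bash: use the {BANNED_BASH[words[0]]} tool instead of {words[0]}"
    return None


def check_no_sudo_chown(event: Event) -> Optional[str]:
    if re.search(r"\bsudo\s+chown\b", _command(event)):
        return "Learned ban: use make fix-ownership instead of sudo chown"
    return None


def check_path_rule(event: Event) -> Optional[str]:
    # Crude scan of the whole tool input for the host prefix.
    if HOST_PREFIX in json.dumps(event.get("tool_input", {})):
        return f"Host path {HOST_PREFIX} found; use the container path"
    return None


RULES: list[Rule] = [check_banned_bash, check_no_sudo_chown, check_path_rule]


def evaluate(event: Event) -> Optional[str]:
    """Run rules in order; the first rejection reason wins."""
    for rule in RULES:
        reason = rule(event)
        if reason is not None:
            return reason
    return None


sample = {"tool_name": "Bash", "tool_input": {"command": "sudo chown -R dev ."}}
print(evaluate(sample))
```

A real hook script would read the event with `json.load(sys.stdin)` and translate a non-None reason into whatever blocking signal the hook system expects. To my knowledge, Claude Code treats exit code 2 with the reason on stderr as a block, but confirm this against the current hooks reference.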
Phase State Machine
The agent operates in an eight-phase workflow. Each phase whitelists specific tool operations.
| Phase | Value | Allowed operations |
|---|---|---|
| INTAKE | 0 | Read files, search code |
| REQUIREMENTS | 1 | Read, search, draft requirements |
| PLAN | 2 | Read, search; no writes, no execution |
| TEST_SPEC | 3 | Read, write test files |
| IMPLEMENTATION | 4 | Read, write source files, run commands |
| VERIFICATION | 5 | Run pytest and ruff; no source edits |
| DONE | 6 | Reporting only |
| MAINTENANCE | 7 | Controlled maintenance operations |
An agent in PLAN cannot write code regardless of confidence. An agent in VERIFICATION cannot edit source files regardless of test failures; it must regress to IMPLEMENTATION first. These aren’t suggestions the agent can reinterpret. The hook rejects the tool call.
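The phase table maps naturally onto an `IntEnum` plus a whitelist lookup. The sketch below uses the phase names and values from the table; the operation names (`write_source`, `run_pytest`, and so on) are hypothetical labels invented for illustration, since the table describes allowed operations only in prose.

```python
from enum import IntEnum
from typing import Optional


class Phase(IntEnum):
    INTAKE = 0
    REQUIREMENTS = 1
    PLAN = 2
    TEST_SPEC = 3
    IMPLEMENTATION = 4
    VERIFICATION = 5
    DONE = 6
    MAINTENANCE = 7


# Hypothetical operation labels, one set per phase, following the table.
ALLOWED = {
    Phase.INTAKE: {"read", "search"},
    Phase.REQUIREMENTS: {"read", "search", "draft_requirements"},
    Phase.PLAN: {"read", "search"},
    Phase.TEST_SPEC: {"read", "write_test"},
    Phase.IMPLEMENTATION: {"read", "write_source", "run_command"},
    Phase.VERIFICATION: {"run_pytest", "run_ruff"},
    Phase.DONE: {"report"},
    Phase.MAINTENANCE: {"maintenance"},
}


def check_phase_allowed(phase: Phase, operation: str) -> Optional[str]:
    """Return None if the operation is whitelisted, else a rejection reason."""
    if operation in ALLOWED[phase]:
        return None
    return f"{operation} is not allowed in phase {phase.name}"


print(check_phase_allowed(Phase.PLAN, "write_source"))
```

Because the check is a set lookup keyed on the current phase, there is nothing for the agent to argue with: an operation is either in the set or it is rejected.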
Integrity Layer
The experience store uses CRC32 checksums on every event:
- Each event is JSON-serialized with deterministic formatting
- zlib.crc32() computes a 32-bit checksum, stored as an 8-character hex wrapper
- On read, the checksum is revalidated; corrupted events are flagged, not silently loaded
- Append-only writes with fcntl file locking; events can be added but not modified or deleted
This matters because the agent reads from this store to inform decisions. Corrupted experience data produces downstream decisions based on bad state, the kind of bug that’s invisible after the fact.
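The checksum-and-append scheme can be sketched with the standard library alone. The record layout (`crc32` and `payload` keys), the function names, and the choice of `fcntl.flock` over `fcntl.lockf` are assumptions made for illustration; the actual store format is not described here. The `fcntl` module is Unix-only.

```python
import fcntl  # Unix-only advisory locking
import json
import os
import zlib
from typing import Optional


def seal(event: dict) -> str:
    """Serialize deterministically and wrap with an 8-character hex CRC32."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    crc = format(zlib.crc32(payload.encode("utf-8")), "08x")
    return json.dumps({"crc32": crc, "payload": payload})


def unseal(line: str) -> Optional[dict]:
    """Return the event, or None when the stored checksum does not match."""
    record = json.loads(line)
    payload = record["payload"]
    if format(zlib.crc32(payload.encode("utf-8")), "08x") != record["crc32"]:
        return None  # flag corruption instead of loading bad state
    return json.loads(payload)


def append_event(path: str, event: dict) -> None:
    """Append one sealed event while holding an exclusive advisory lock."""
    with open(path, "a", encoding="utf-8") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            f.write(seal(event) + "\n")
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

Note that CRC32 detects accidental corruption, not deliberate forgery, and that the lock only serializes cooperating writers. In a sketch like this, the append-only property comes from exposing no other write path.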
Pattern Detection
Some failure modes aren’t about individual actions. They’re about patterns.
Fix-loop detection. Counts edits to the same file within a session. It raises a warning at the configured warn threshold and a critical alert at the critical threshold. Both values are loaded from config, not hardcoded. The pattern of editing runner.py five times in one session is almost never productive, even if each individual edit is reasonable. An agent in a fix loop won’t reliably recognize it. An edit counter will.
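A per-file edit counter is about as simple as a detector gets. In this sketch the class name and the threshold values 3 and 5 are hypothetical; in the real system both thresholds come from configuration.

```python
from collections import Counter


class FixLoopDetector:
    """Count edits per file within a session and classify the running total."""

    def __init__(self, warn_threshold: int, critical_threshold: int):
        self.warn_threshold = warn_threshold
        self.critical_threshold = critical_threshold
        self.edits: Counter = Counter()

    def record_edit(self, path: str) -> str:
        self.edits[path] += 1
        count = self.edits[path]
        if count >= self.critical_threshold:
            return "critical"
        if count >= self.warn_threshold:
            return "warning"
        return "ok"


# Hypothetical thresholds; pass in values loaded from config in practice.
detector = FixLoopDetector(warn_threshold=3, critical_threshold=5)
levels = [detector.record_edit("runner.py") for _ in range(5)]
print(levels)
```

The detector has no opinion about whether any single edit was reasonable. It only reports that the count crossed a line.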
Memory drift detection. Monitors line counts of governance files (CLAUDE.md, MEMORY.md) against configured budgets. Flags overages and identifies misrouted content: operational data (version numbers, CI status, timestamps) stuffed into files meant for rules and patterns.
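Drift detection can be approximated with a line budget and a few regular expressions. The patterns below are hypothetical heuristics for the three kinds of operational data named above, and the function name `check_drift` and the budgets are invented for the example.

```python
import re

# Hypothetical heuristics for operational data that belongs elsewhere.
OPERATIONAL = {
    "version number": re.compile(r"\bv?\d+\.\d+\.\d+\b"),
    "timestamp": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "CI status": re.compile(r"\bCI\b.*\b(passing|failing|pending)\b", re.IGNORECASE),
}


def check_drift(name: str, text: str, budget: int) -> list[str]:
    """Return findings for a governance file: budget overage and misrouted lines."""
    findings = []
    lines = text.splitlines()
    if len(lines) > budget:
        findings.append(f"{name}: line count {len(lines)} exceeds budget {budget}")
    for number, line in enumerate(lines, start=1):
        for label, pattern in OPERATIONAL.items():
            if pattern.search(line):
                findings.append(f"{name}:{number}: looks like {label}")
    return findings


text = "Use make fix-ownership\nCI is failing on main\nReleased 2024-01-05\n"
for finding in check_drift("CLAUDE.md", text, budget=2):
    print(finding)
```

Heuristics like these will produce false positives, which is acceptable for a detector whose output goes to review rather than to a block decision.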
Neither detector blocks operations directly. They surface patterns for human review or a meta-governance layer to act on. Detection is mechanical: counters and thresholds, not self-assessment.
Authority Stack
| Layer | Mechanism | What it provides |
|---|---|---|
| Container boundaries | Filesystem permissions, Docker isolation | Agent cannot access paths outside its container |
| PreToolUse hook chain | 7 sequential rules, first-block-wins | Execution-boundary enforcement of banned commands, phase gates, path rules |
| Phase state machine | 8-phase IntEnum with auto-advance | Workflow cadence: prevents skipping requirements or tests |
| Experience store | CRC32 checksummed, append-only, file-locked | Tamper-evident audit trail |
| Pattern detectors | Fix-loop counters, memory drift budgets | Surface behavioral anomalies mechanically |
| CLAUDE.md | Governance policy document | Defines rules, backed by hooks that enforce the critical ones |
Defense in depth. No single layer is sufficient. Stacked together, they create boundaries that hold when the agent is under pressure, confused, or wrong about what it should do.
Status
The enforcement hooks, phase state machine, experience store, and pattern detectors are available in ouroboros v4.0.0 on PyPI. Designed for Claude Code’s hook system. The principles apply to any agent framework with a pre-execution interception point.
Configuration details reflect a production environment at time of writing. Implementation specifics vary based on tooling versions, platform updates, and organizational requirements. Validate approaches against current documentation before deployment.