Your AI Agent Governance Is Just a Suggestion

March 20, 2026 ai-governance, claude-code, agent-security, mechanical-enforcement, hooks, autonomous-agents, ouroboros

Behavioral rules for AI agents are text in the context window. Under pressure — deep in a fix loop, resolving conflicting instructions, running low on context — the model rationalizes around them. This isn’t a hypothetical failure mode. It’s documented.

Documented Failures

These are real incidents from a single workspace running Claude Code with CLAUDE.md governance rules over several months of daily use.

Incident	What happened
Root-owned .git files	A merge helper ran under sudo and left root-owned files across the working tree. Subsequent `git` commands failed with permission errors. CLAUDE.md said “use `make fix-ownership`”; the agent ran `sudo chown` instead.
Fix-loops	Agent edited the same file 5+ times without diagnosing the root cause. Each edit was individually reasonable. The pattern was not.
Hardcoded host paths	Agent wrote `/home/joe/` paths in container code instead of `/home/dev/`. The behavioral rule about correct path prefixes was in the CLAUDE.md. The agent read it.
Index corruption	Non-atomic writes corrupted the experience store index. Recovery required manual intervention.
False positive pytest	Substring matching on test output reported tests passed when they hadn’t. Agent moved to the next task based on the false signal.
Status probe false flags	`gh pr checks` returns non-zero exit on pending checks. Agent interpreted this as a failure and entered a retry loop.
Silent CSV failures	`except (ValueError, TypeError): continue` in a CSV parser. 0 out of 3,853 rows matched. No error raised. The broad except clause masked a column name mismatch.

Every one of these had a behavioral rule that should have prevented it. The rules were present in the context. The agent processed them, weighed them against the immediate goal, and decided the situation warranted a different approach.
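The silent CSV failure shows how little code it takes to hide a total failure. Below is a minimal sketch of a louder parser, not the workspace's actual code: the `parse_amounts` helper and the `amount` column name are hypothetical. It checks the header up front and counts skipped rows, so a column name mismatch raises instead of producing zero matches.

```python
import csv
import io


def parse_amounts(text: str, column: str = "amount") -> list[float]:
    """Parse one numeric column; refuse to return quietly when nothing parses."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None or column not in reader.fieldnames:
        # A header mismatch fails loudly instead of skipping every row.
        raise KeyError(f"column {column!r} not in header {reader.fieldnames}")
    values: list[float] = []
    skipped = 0
    for row in reader:
        try:
            values.append(float(row[column]))
        except (ValueError, TypeError):
            skipped += 1  # count the skip rather than hiding it
    if skipped and not values:
        raise ValueError(f"all {skipped} rows failed to parse")
    return values
```

The narrow `except` clause stays, but it can no longer swallow a structural problem: the header check and the skip counter turn "0 rows matched" into an error.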

This Post as Evidence

The governance hook (enforce.py) blocked the agent three times during the writing of this post’s first draft:

Blocked command	Rule triggered	Replacement
`cat` (Bash tool)	Banned bash — use Read tool	Read tool provides audit trail
`find` (Bash tool)	Banned bash — use Glob tool	Glob tool provides audit trail
`sudo chown`	Learned ban from prior incident	`make fix-ownership`

The agent didn’t choose to use the correct tools. The hook rejected the incorrect ones and the agent adapted. That’s the difference between a suggestion and a constraint.

Mechanical Enforcement Architecture

The system uses a single PreToolUse hook — a Python script that receives tool invocations as JSON on stdin and returns block decisions. Seven rules execute sequentially. First block wins.

Order	Rule	What it enforces
1	`check_phase_allowed`	Tool must be permitted in the current workflow phase
2	`check_banned_bash`	Block shell commands that have dedicated tool equivalents (`cat`, `grep`, `find`, `sed`)
3	`check_no_sudo_chown`	Learned ban — `sudo chown` caused root-ownership cascades
4	`check_path_rule`	Code references must use container paths, not host paths
5	`check_tdd_discipline`	Tests must exist before implementation in applicable phases
6	`check_ci_awareness`	CI status checks must account for non-zero exit on pending
7	`check_learned_bans`	Patterns extracted from prior session failures via closed-loop analysis

The agent can think whatever it wants about whether `cat` is faster than the Read tool. The hook rejects the command before it executes. The agent gets a rejection reason and finds another approach.
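A first-block-wins chain can be sketched in a few dozen lines. This is an illustration, not the contents of `enforce.py`. It assumes the hook payload carries `tool_name` and `tool_input` keys, that Write and Edit calls expose `content` and `new_string` fields, and that exiting with status 2 blocks the call and returns stderr to the agent. Check all three against the current Claude Code hook documentation. Only three of the seven rules are shown.

```python
import json
import re
import sys
from typing import Callable, Optional

Rule = Callable[[dict], Optional[str]]  # returns a block reason, or None to allow

# cat -> Read and find -> Glob come from the post; the other hints are generic.
BANNED_BASH = {"cat": "Read tool", "find": "Glob tool",
               "grep": "dedicated search tool", "sed": "dedicated edit tool"}


def _bash_command(call: dict) -> str:
    """Return the shell command for Bash calls, or an empty string otherwise."""
    if call.get("tool_name") != "Bash":
        return ""
    return call.get("tool_input", {}).get("command", "")


def check_banned_bash(call: dict) -> Optional[str]:
    # Simplification: only the first word is inspected, so pipelines slip through.
    words = _bash_command(call).split()
    if words and words[0] in BANNED_BASH:
        return f"Banned bash: use the {BANNED_BASH[words[0]]} instead of `{words[0]}`"
    return None


def check_no_sudo_chown(call: dict) -> Optional[str]:
    if re.search(r"\bsudo\s+chown\b", _bash_command(call)):
        return "Learned ban: run `make fix-ownership` instead of `sudo chown`"
    return None


def check_path_rule(call: dict) -> Optional[str]:
    # Assumed field names: `content` for Write, `new_string` for Edit.
    tool_input = call.get("tool_input", {})
    text = str(tool_input.get("content", "")) + str(tool_input.get("new_string", ""))
    if "/home/joe/" in text:
        return "Path rule: use the container prefix /home/dev/, not /home/joe/"
    return None


RULES: list[Rule] = [check_banned_bash, check_no_sudo_chown, check_path_rule]


def first_block(call: dict, rules: list[Rule] = RULES) -> Optional[str]:
    """Run rules in order and return the first block reason, if any."""
    for rule in rules:
        reason = rule(call)
        if reason is not None:
            return reason
    return None


def main() -> int:
    """Hook entry point: read one tool call from stdin, exit 2 to block."""
    reason = first_block(json.load(sys.stdin))
    if reason is not None:
        print(reason, file=sys.stderr)
        return 2
    return 0
```

A real hook script would end with `sys.exit(main())`. Keeping each rule a pure function from payload to optional reason makes the chain easy to test without spawning the agent.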

Phase State Machine

The agent operates in an eight-phase workflow. Each phase whitelists specific tool operations.

Phase	Value	Allowed operations
INTAKE	0	Read files, search code
REQUIREMENTS	1	Read, search, draft requirements
PLAN	2	Read, search — no writes, no execution
TEST_SPEC	3	Read, write test files
IMPLEMENTATION	4	Read, write source files, run commands
VERIFICATION	5	Run pytest and ruff — no source edits
DONE	6	Reporting only
MAINTENANCE	7	Controlled maintenance operations

An agent in PLAN cannot write code regardless of confidence. An agent in VERIFICATION cannot edit source files regardless of test failures — it regresses to IMPLEMENTATION first. These aren’t suggestions the agent can reinterpret. The hook rejects the tool call.
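The phase gate reduces to an enum and a lookup. In the sketch below, the phase names and values come from the table above, but the tool whitelists are assumptions: they work at tool-name granularity, and the entries for DONE and MAINTENANCE are guesses. A real implementation would also need to separate test files from source files, which this sketch does not attempt.

```python
from enum import IntEnum
from typing import Optional


class Phase(IntEnum):
    INTAKE = 0
    REQUIREMENTS = 1
    PLAN = 2
    TEST_SPEC = 3
    IMPLEMENTATION = 4
    VERIFICATION = 5
    DONE = 6
    MAINTENANCE = 7


# Hypothetical whitelist, keyed by phase. Tool names are illustrative.
ALLOWED_TOOLS = {
    Phase.INTAKE: {"Read", "Glob", "Grep"},
    Phase.REQUIREMENTS: {"Read", "Glob", "Grep", "Write"},
    Phase.PLAN: {"Read", "Glob", "Grep"},            # no writes, no execution
    Phase.TEST_SPEC: {"Read", "Write", "Edit"},
    Phase.IMPLEMENTATION: {"Read", "Write", "Edit", "Bash"},
    Phase.VERIFICATION: {"Read", "Bash"},            # no source edits
    Phase.DONE: {"Read"},                            # assumption
    Phase.MAINTENANCE: {"Read", "Bash"},             # assumption
}


def check_phase_allowed(phase: Phase, tool_name: str) -> Optional[str]:
    """Return a block reason if the tool is not whitelisted for this phase."""
    if tool_name in ALLOWED_TOOLS[phase]:
        return None
    return f"{tool_name} is not permitted in phase {phase.name}"


def regress_for_source_edit(phase: Phase) -> Phase:
    """Source edits during VERIFICATION require stepping back first."""
    return Phase.IMPLEMENTATION if phase is Phase.VERIFICATION else phase
```

Because the check is a set lookup, no argument the agent constructs can widen it. The only way to edit source from VERIFICATION is to change the phase.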

Integrity Layer

The experience store uses CRC32 checksums on every event:

- Each event is JSON-serialized with deterministic formatting.
- `zlib.crc32()` computes a 32-bit checksum, stored as an 8-character hex wrapper.
- On read, the checksum is revalidated. Corrupted events are flagged, not silently loaded.
- Writes are append-only and guarded by `fcntl` file locking: events can be added but not modified or deleted.

This matters because the agent reads from this store to inform decisions. Corrupted experience data produces downstream decisions based on bad state — the kind of bug that’s invisible after the fact.
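The checksum-and-append cycle can be sketched with the standard library alone. The on-disk layout here (one event per line, hex checksum, a space, then the JSON payload) and the function names `seal`, `verify`, and `append_event` are assumptions for illustration. The post specifies only CRC32, deterministic JSON, and `fcntl` locking. The `fcntl` module is available on Unix-like systems only.

```python
import fcntl
import json
import os
import zlib
from typing import Optional


def seal(event: dict) -> str:
    """Serialize deterministically and prefix an 8-character hex CRC32."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    checksum = format(zlib.crc32(payload.encode("utf-8")), "08x")
    return f"{checksum} {payload}"


def verify(line: str) -> Optional[dict]:
    """Return the event if the checksum matches, else None (flag as corrupt)."""
    checksum, _, payload = line.rstrip("\n").partition(" ")
    if format(zlib.crc32(payload.encode("utf-8")), "08x") != checksum:
        return None
    return json.loads(payload)


def append_event(path: str, event: dict) -> None:
    """Append one sealed event under an exclusive advisory lock."""
    with open(path, "a", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            fh.write(seal(event) + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # push the line to disk before unlocking
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```

CRC32 catches accidental corruption such as a torn write. It is not a cryptographic hash, so it offers no protection against deliberate tampering by anything that can recompute the checksum.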

Pattern Detection

Some failure modes aren’t about individual actions. They’re about patterns.

Fix-loop detection. Counts edits to the same file within a session. Warning at the configured warn threshold, critical at the critical threshold. Both values loaded from config — not hardcoded. The pattern of editing runner.py five times in one session is almost never productive, even if each individual edit is reasonable. An agent in a fix loop won’t reliably recognize it. An edit counter will.
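An edit counter of this kind fits in one small class. This is a sketch under assumptions: the `FixLoopDetector` name and the threshold values in the usage note are hypothetical, and in the real system the thresholds come from config.

```python
from collections import Counter
from typing import Optional


class FixLoopDetector:
    """Counts edits per file in a session and reports a severity level."""

    def __init__(self, warn_threshold: int, critical_threshold: int) -> None:
        if warn_threshold > critical_threshold:
            raise ValueError("warn threshold must not exceed critical threshold")
        self.warn_threshold = warn_threshold
        self.critical_threshold = critical_threshold
        self.edits: Counter = Counter()

    def record_edit(self, path: str) -> Optional[str]:
        """Record one edit; return 'warning', 'critical', or None."""
        self.edits[path] += 1
        count = self.edits[path]
        if count >= self.critical_threshold:
            return "critical"
        if count >= self.warn_threshold:
            return "warning"
        return None
```

With illustrative thresholds of 3 and 5, the third edit to `runner.py` returns `"warning"` and the fifth returns `"critical"`, while edits to other files are counted separately.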

Memory drift detection. Monitors line counts of governance files (CLAUDE.md, MEMORY.md) against configured budgets. Flags overages and identifies misrouted content — operational data (version numbers, CI status, timestamps) stuffed into files meant for rules and patterns.
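Both checks, the line budget and the misrouted-content scan, can be sketched as one function. The regular expressions below are rough, hypothetical heuristics for the three kinds of operational data named above. They will produce false positives, which is acceptable for a detector that only surfaces findings for review.

```python
import re

# Hypothetical heuristics for operational data that does not belong in rule files.
OPERATIONAL_PATTERNS = {
    "version number": re.compile(r"\bv?\d+\.\d+\.\d+\b"),
    "timestamp": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "CI status": re.compile(r"\bCI\b.*\b(passing|failing|pending)\b", re.IGNORECASE),
}


def check_drift(name: str, text: str, budget: int) -> list[str]:
    """Return findings for a governance file: budget overage, then suspect lines."""
    findings: list[str] = []
    lines = text.splitlines()
    if len(lines) > budget:
        findings.append(f"{name}: {len(lines)} lines exceeds budget of {budget}")
    for number, line in enumerate(lines, start=1):
        for label, pattern in OPERATIONAL_PATTERNS.items():
            if pattern.search(line):
                findings.append(f"{name}:{number}: possible {label}")
    return findings
```

The function returns findings rather than raising, matching the detector's role: it reports, and something else decides what to do.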

Neither detector blocks operations directly. They surface patterns for human review or a meta-governance layer to act on. Detection is mechanical: counters and thresholds, not self-assessment.

Authority Stack

Layer	Mechanism	What it provides
Container boundaries	Filesystem permissions, Docker isolation	Agent cannot access paths outside its container
PreToolUse hook chain	7 sequential rules, first-block-wins	Execution-boundary enforcement of banned commands, phase gates, path rules
Phase state machine	8-phase IntEnum with auto-advance	Workflow cadence — prevents skipping requirements or tests
Experience store	CRC32 checksummed, append-only, file-locked	Tamper-evident audit trail
Pattern detectors	Fix-loop counters, memory drift budgets	Surface behavioral anomalies mechanically
CLAUDE.md	Governance policy document	Defines rules — backed by hooks that enforce the critical ones

Defense in depth. No single layer is sufficient. Stacked together, they create boundaries that hold when the agent is under pressure, confused, or wrong about what it should do.

Status

The enforcement hooks, phase state machine, experience store, and pattern detectors are available in ouroboros v4.0.0 on PyPI. Designed for Claude Code’s hook system. The principles apply to any agent framework with a pre-execution interception point.

Configuration details reflect a production environment at time of writing. Implementation specifics vary based on tooling versions, platform updates, and organizational requirements. Validate approaches against current documentation before deployment.
