Containerized AI Development with Governance and Intelligence Layers

docker, governance, infrastructure, ai-agents, security, containers, automation, intelligence

Changes: v1.5 → v1.7

Two architecture versions were deployed in a single push. v1.6 adds containerization and workspace governance. v1.7 adds the intelligence and automation layer on top.

Version | Title | Components Added
v1.6 | Containerization and Workspace Governance | Docker orchestration, sandbox, dashboard, SOPS, 5-layer governance, pre-commit enforcement
v1.7 | Intelligence and Automation Layer | Prometheus, Cassandra, Sentinel, OpenClaw nightly cron, skills library, extensions

Previous state: bare-metal services on WSL2, flat workspace structure, manual secret handling, no overnight automation.

New state: Docker-orchestrated containers with security hardening, five-layer workspace hierarchy, SOPS-encrypted secrets, 12-hook pre-commit enforcement, and 11 overnight cron jobs handling everything from code analysis to content generation.


v1.6: Containerization and Governance

Container Architecture

Three containers run in production, each with security hardening applied.

Container | Role | Port | Managed By
Gateway | AI agent orchestration, session routing | 18789 | Docker Compose
Dev Sandbox | Isolated code execution, multi-runtime environment | 9500 | Standalone Dockerfile
Buddy Dashboard | Second-brain UI, memory curation, automation monitoring | 5050 | Docker Compose

Docker Compose orchestrates the gateway and dashboard. The dev sandbox runs via a standalone Dockerfile with the same hardening. All containers share:

  • cap_drop: ALL (no Linux capabilities granted by default)
  • security_opt: no-new-privileges (prevents privilege escalation)
  • Memory and PID limits (prevent resource exhaustion)
  • Isolated Docker bridge network
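
The shared hardening settings above can be checked mechanically. The following is a minimal sketch, assuming each service is available as a dictionary shaped like a Docker Compose service definition. The `check_hardening` helper, the sample limits, and the network name are hypothetical and not taken from the actual configuration.

```python
# Sketch: verify that a Compose-style service definition carries the shared hardening.
# The dict keys mirror Docker Compose service keys; the sample values are illustrative.

NO_NEW_PRIVS = {"no-new-privileges", "no-new-privileges:true"}


def check_hardening(service: dict) -> list[str]:
    """Return a list of hardening problems (an empty list means all checks pass)."""
    problems = []
    if "ALL" not in service.get("cap_drop", []):
        problems.append("cap_drop must include ALL")
    if not NO_NEW_PRIVS & set(service.get("security_opt", [])):
        problems.append("security_opt must set no-new-privileges")
    if "mem_limit" not in service:
        problems.append("memory limit missing")
    if "pids_limit" not in service:
        problems.append("PID limit missing")
    if not service.get("networks"):
        problems.append("no isolated network attached")
    return problems


# Hypothetical gateway definition; limits and network name are assumptions.
gateway = {
    "cap_drop": ["ALL"],
    "security_opt": ["no-new-privileges:true"],
    "mem_limit": "2g",
    "pids_limit": 256,
    "networks": ["internal"],
    "ports": ["18789:18789"],
}

print(check_hardening(gateway))                # hardened example: nothing to report
print(check_hardening({"cap_drop": ["ALL"]}))  # lists the settings that are missing
```

A check like this could run in CI next to the governance tests, so that a new service cannot be added without the shared hardening.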

Secret Management

Secrets are encrypted with Mozilla SOPS using age keys. The pattern:

  1. At rest: all secrets are SOPS-encrypted in the repository
  2. At decrypt: decryption happens to tmpfs (RAM-backed filesystem) only
  3. At runtime: containers mount the tmpfs decrypted secrets as read-only
  4. At shutdown: tmpfs is cleared, no plaintext persists

No plaintext secrets exist in any repository, on any persistent filesystem, or in any container image.
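
The decrypt-to-tmpfs pattern can be sketched as follows. This is an illustration under stated assumptions, not the actual tooling: it assumes the `sops` CLI is on the PATH, that `/dev/shm` is a tmpfs mount (as on most Linux systems), and the function names and the injectable `run` parameter are hypothetical.

```python
import os
import subprocess
from pathlib import Path

# Assumption: /dev/shm is RAM-backed; the real tmpfs mount point is not stated in the text.
TMPFS_ROOT = Path("/dev/shm")


def decrypt_to_tmpfs(encrypted: Path, out_dir: Path,
                     tmpfs_root: Path = TMPFS_ROOT, run=subprocess.run) -> Path:
    """Decrypt a SOPS file and write the plaintext only under the tmpfs root."""
    root = tmpfs_root.resolve()
    out_dir = out_dir.resolve()
    if out_dir != root and root not in out_dir.parents:
        raise ValueError(f"{out_dir} is not under {root}; refusing to write plaintext")
    # `sops --decrypt <file>` prints the decrypted content to stdout.
    result = run(["sops", "--decrypt", str(encrypted)], check=True, capture_output=True)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / encrypted.name
    target.unlink(missing_ok=True)
    # Create the file owner-read-only from the start; containers would mount it read-only.
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o400)
    with os.fdopen(fd, "wb") as fh:
        fh.write(result.stdout)
    return target


def clear_secrets(out_dir: Path) -> None:
    """Remove decrypted files at shutdown so no plaintext remains."""
    for item in out_dir.glob("*"):
        if item.is_file():
            item.unlink()
```

The path check is the important part: decryption fails loudly rather than writing plaintext to a persistent filesystem.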

Workspace Governance

Five-layer workspace hierarchy enforcing separation of concerns:

Layer | Purpose | Editable by agents?
_governance | Workspace rules, policies, boundary documents | No (human-only)
_foundation | Shared libraries (lib-verification, lib-harmonia) | With review
_active | Live services and applications | Yes (governed)
_archive | Retired services, preserved for reference | No (read-only)
_experiments | Sandbox for prototyping, no production promotion | Yes (ungoverned)

The hierarchy is enforced by pre-commit hooks and CI validation.
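
As a rough illustration of how a hook or CI check might apply the layer table, the sketch below maps a workspace-relative path to an agent edit policy. The function name, the three policy labels, and the deny-by-default rule for paths outside the five layers are assumptions.

```python
from pathlib import PurePosixPath

# Agent edit policy per top-level layer, following the table above.
LAYER_POLICY = {
    "_governance": "deny",     # human-only
    "_foundation": "review",   # editable with review
    "_active": "allow",        # editable, governed
    "_archive": "deny",        # read-only
    "_experiments": "allow",   # editable, ungoverned
}


def agent_edit_policy(path: str) -> str:
    """Return 'allow', 'review', or 'deny' for a workspace-relative path."""
    parts = PurePosixPath(path).parts
    if not parts:
        return "deny"
    # Paths outside the five layers are denied; this default is an assumption.
    return LAYER_POLICY.get(parts[0], "deny")


print(agent_edit_policy("_active/gateway/main.py"))
print(agent_edit_policy("_governance/rules.md"))
```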

Pre-commit Enforcement

12 hooks run on every commit across the workspace:

Category | Hooks | Count
Python quality | ruff (lint), ruff-format | 2
File hygiene | trailing-whitespace, end-of-file-fixer, check-merge-conflict, detect-private-key, check-added-large-files | 5
Format validation | check-yaml, check-json, check-toml | 3
Governance | governance-tests (lib-verification), check-cross-brand-imports | 2

Cross-brand detection prevents code from one project referencing another, maintaining strict workspace isolation.
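
One way such a check could work is to parse each Python file and flag absolute imports whose top-level package belongs to another project. The sketch below is not the actual check-cross-brand-imports hook; the brand names `brand_a` and `brand_b` and the function signature are hypothetical.

```python
import ast


def cross_brand_imports(source: str, own_brand: str, all_brands: set[str]) -> list[str]:
    """Return modules imported from brands other than `own_brand`."""
    foreign = all_brands - {own_brand}
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # absolute "from x import y" only
        else:
            continue
        # Compare the top-level package of each import against the foreign brands.
        hits += [name for name in names if name.split(".")[0] in foreign]
    return hits


sample = "import os\nfrom brand_b.billing import invoice\nimport brand_a.utils\n"
print(cross_brand_imports(sample, "brand_a", {"brand_a", "brand_b"}))
```

A pre-commit hook built on this idea would exit non-zero whenever the returned list is non-empty for any staged file.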

Dev Sandbox Capabilities

The sandbox container provides an isolated multi-runtime development environment:

Language Runtimes: Python 3.12, Node.js 22, Hugo extended edition

ML & Data Stack: PyTorch, scikit-learn, pandas, numpy, Jupyter

Media & Browser: Chromium (headless), ffmpeg

API: Sandbox API on port 9500 for isolated code execution via OpenClaw’s sandbox-exec extension.

Permissions Framework

Agent permissions use a scoped, time-limited design:

  • Scoped: each permission grants access to specific operations, not blanket access
  • Expiring: all grants expire after 90 days, requiring re-authorization
  • Auditable: every permission grant and usage is logged
  • Layered: workspace governance, pre-commit hooks, and runtime checks each enforce independently

No single bypass disables all layers.
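
The scoped, expiring, and auditable properties can be sketched as a small data model. Only the 90-day lifetime comes from the text; the `Grant` class, the operation naming scheme, and the in-memory audit log are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

GRANT_LIFETIME = timedelta(days=90)  # from the text: all grants expire after 90 days


@dataclass(frozen=True)
class Grant:
    agent: str
    operation: str        # one specific operation, never a wildcard
    granted_at: datetime

    @property
    def expires_at(self) -> datetime:
        return self.granted_at + GRANT_LIFETIME


audit_log: list[dict] = []  # stand-in for a persistent audit trail


def is_allowed(grants: list[Grant], agent: str, operation: str, now: datetime) -> bool:
    """Check scope and expiry, and record the decision in the audit log."""
    allowed = any(
        g.agent == agent and g.operation == operation and now < g.expires_at
        for g in grants
    )
    audit_log.append({"at": now.isoformat(), "agent": agent,
                      "operation": operation, "allowed": allowed})
    return allowed
```

Because the check matches on an exact operation name, a grant for one operation never implies access to another, and an expired grant fails closed until it is re-authorized.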


v1.7: Intelligence and Automation Layer

Intelligence Pipelines

Three specialized pipelines were added to track the AI landscape:

Pipeline | Purpose | Status
Prometheus | AI model evolution tracking | Active
Cassandra | AI predictions lifecycle management | Active
AI Sentinel | News intelligence and daily digests | Active

OpenClaw Skills Library

25 task-specific skills giving the AI agent structured access to operations: builds, deploys, content generation, video processing, pipeline management, and system status checks. Each skill is defined in SKILL.md format with metadata for Buddy’s command system.

OpenClaw Extensions

4 enforcement plugins adding runtime governance:

Extension | Purpose
governance-enforcement | Workspace and security rule enforcement
sandbox-exec | Routes execution to dev sandbox container
cron-circuit-breaker | Auto-disables jobs after consecutive failures
webserver-enforcement | Web server access controls
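
The behavior described for cron-circuit-breaker, disabling a job after consecutive failures, can be sketched as below. This is not the extension's code: the class name and the threshold of three failures are hypothetical, since the source does not state the real limit.

```python
from dataclasses import dataclass


@dataclass
class CronCircuitBreaker:
    max_failures: int = 3          # hypothetical threshold; the real value is not stated
    consecutive_failures: int = 0
    enabled: bool = True

    def record(self, success: bool) -> None:
        """Update the breaker after a job run."""
        if success:
            self.consecutive_failures = 0  # any success resets the streak
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_failures:
            self.enabled = False           # stays off until someone re-enables it


breaker = CronCircuitBreaker()
for outcome in [False, False, True, False, False, False]:
    breaker.record(outcome)
print(breaker.enabled)  # the final three failures in a row trip the breaker
```

Counting only consecutive failures means an occasional flaky run does not disable a job, while a job that is persistently broken stops consuming model time.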

Overnight Cron Automation

OpenClaw manages 11 scheduled jobs (8 daily + 3 weekly), not a single monolithic runner. Each job runs in an isolated session with its own model, timeout, and safety rules.

Daily Schedule:

Time | Job | Model
22:00 | Infrastructure health + architecture drift detection | Sonnet
22:50 | Deep code analysis (CLAUDE.md generation, code reviews) | Opus
01:30 | Content generation | Opus
02:00 | Content & research (blog posts, social batches) | Opus
03:00 | Feature discovery (proposals for new content and features) | Sonnet
04:00 | Test & CI generation (test files, GitHub Actions workflows) | Sonnet
05:30 | Morning briefing (summary of all overnight work) | Haiku
06:00 | Content generation (weekdays only) | Sonnet

Weekly: prediction updates, feedback synthesis, staging cleanup.

The 22:00 infrastructure health job includes architecture drift detection: it reads the architecture page and changelog, compares against the actual environment, and writes drift reports to staging if anything has changed. This is how gaps between documentation and reality get caught automatically.
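
The comparison step of drift detection can be sketched as a diff between documented and observed state. The example below is a simplified illustration: it only compares service ports, the service identifiers are hypothetical, and how the real job parses the architecture page or inspects the environment is not described in the source.

```python
def detect_drift(documented: dict[str, int], actual: dict[str, int]) -> list[str]:
    """Compare documented service ports with observed ones and describe differences."""
    report = []
    for name in sorted(documented.keys() | actual.keys()):
        if name not in actual:
            report.append(f"{name}: documented but not running")
        elif name not in documented:
            report.append(f"{name}: running but undocumented")
        elif documented[name] != actual[name]:
            report.append(
                f"{name}: port {documented[name]} documented, {actual[name]} observed"
            )
    return report


# Ports taken from the container table; the identifiers are illustrative.
documented = {"gateway": 18789, "dev-sandbox": 9500, "buddy-dashboard": 5050}
# Hypothetical observation with one changed port and one extra service.
actual = {"gateway": 18789, "dev-sandbox": 9500, "buddy-dashboard": 5051, "metrics": 9090}

for line in detect_drift(documented, actual):
    print(line)
```

An empty report means documentation and environment agree, in which case nothing would need to be written to staging.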

All jobs follow safety rules: never push code, never delete files, never modify cron jobs. Output goes to staging for human review in the morning briefing.
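
A guard for these safety rules might filter a job's proposed actions before anything is executed. In the sketch below, the action type names and the dictionary shape are hypothetical; only the three prohibitions come from the text.

```python
# Hypothetical action names for the three rules: never push code,
# never delete files, never modify cron jobs.
FORBIDDEN_ACTIONS = {"push_code", "delete_file", "modify_cron"}


def review_actions(proposed: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split proposed actions into those allowed and those blocked by the safety rules."""
    allowed = [a for a in proposed if a["type"] not in FORBIDDEN_ACTIONS]
    blocked = [a for a in proposed if a["type"] in FORBIDDEN_ACTIONS]
    return allowed, blocked


proposed = [
    {"type": "write_staging", "target": "staging/report.md"},
    {"type": "push_code", "target": "origin/main"},
]
allowed, blocked = review_actions(proposed)
print([a["type"] for a in allowed])
print([a["type"] for a in blocked])
```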

Rationale

Why containerize? The bare-metal WSL2 setup worked, but created risks: dependency conflicts between services, no resource isolation, and difficulty reproducing the environment. Docker provides reproducible, isolated, hardened containers.

Why workspace governance? As the number of services grew past 15, flat directory structures became unmanageable. The five-layer system provides clear boundaries: what’s shared, what’s active, what’s experimental, and what’s archived.

Why SOPS? Secrets management was ad hoc: some in .env files, some in memory. SOPS + age provides encryption at rest with a simple, auditable decryption pattern. The tmpfs constraint means a powered-off machine holds no plaintext secrets, only the SOPS-encrypted files.

Why OpenClaw cron instead of system cron? OpenClaw’s cron system runs jobs through the AI gateway with session isolation, model selection, timeout enforcement, and circuit breaker protection. System cron can run scripts; OpenClaw cron can run AI agents with governance.


Architecture version: v1.5 → v1.7. Architecture drift detection runs nightly at 22:00 via OpenClaw cron Job 1.

Configuration details reflect a production environment at time of writing. Implementation specifics vary based on tooling versions, platform updates, and organizational requirements. Validate approaches against current documentation before deployment.