All posts
Agentic Engineering

The Agentic Project Harness

A baseline operating layer for agent-friendly repositories — four readiness levels from context harness to product harness.

Every repository that an AI agent touches needs a harness: instructions, checks, scripts, docs, pipelines, observability, recovery, and feedback loops. Not the product itself, but the operating layer around it.

The goal is to make a repo easy for humans and agents to understand, modify, verify, deploy, recover, and operate — without relying on hidden context.

The core rule

If a repo needs hidden memory to be changed safely, the harness is incomplete. Encode that memory in files, scripts, checks, dashboards, or documented workflows.

Ground truth is disk, not memory.

Four readiness levels

The harness is organized into four levels so you can adopt incrementally. Each level adds what matters at that stage.

L0: Context harness

Make the repo legible.

Item Purpose
Agent instructions (AGENTS.md / CLAUDE.md) Defines repo-specific behavior, guardrails, commands, and forbidden actions
README Fast entry point for humans
Skills Encodes repeatable workflows agents should follow
MCP configs Gives agents structured access to external tools
Docs folder Stores durable product, architecture, and operational context
Scratch-pad Safe place for temporary investigation notes
Scripts folder Holds repeatable operational commands
Decision records Preserves why important choices were made

L1: Development harness

Make local development repeatable and reviewable.

  • .env.example — documents required environment variables without leaking secrets
  • .gitignore / .dockerignore — keeps noise and secrets out of git and Docker contexts
  • Local bootstrap — one documented setup path for a new machine
  • Linting and formatting — catches problems before review, keeps diffs focused on behavior
  • Pre-commit hooks — moves cheap checks earlier than CI
  • Tests and type checks — verifies correctness
  • Automated migrations — handles schema changes reproducibly
  • Changelog folder — records what changed and why
  • Dependency management — controls the supply chain

L2: Production harness

Make the project shippable and recoverable.

This level is required before production traffic, sensitive data, payments, or customer commitments.

  • CI/CD pipelines with gated deployment
  • Health checks and monitoring
  • Database backup and restore automation
  • Alerting and runbooks
  • Security audits and vulnerability scanning
  • Secrets management
  • Performance testing
  • Incident history

L3: Product harness

Learn from users and operate as a product.

This level is project-dependent — adopt it when stage, team size, compliance, or growth motion justifies it.

  • User feedback collection and analysis
  • License and legal documentation
  • Feature flags and experimentation
  • Customer support system
  • Issue tracking
  • Marketing and outreach

Recommended repo shape

.
├── AGENTS.md or CLAUDE.md
├── README.md
├── .env.example
├── .gitignore
├── .dockerignore
├── docs/
├── scripts/
├── changelog/
├── scratch-pad/
├── infra/
├── migrations/
├── tests/
└── .github/workflows/

Folder names matter less than discoverability. A new maintainer or agent should find instructions, commands, docs, verification, release history, and runbooks within minutes.

Completion discipline

The harness includes a completion protocol that agents follow before declaring work done:

  1. Recover state from disk — start with git status and git diff to find the actual changed files
  2. Inspect — look for edge cases, stale assumptions, missing tests, state leakage, and docs drift
  3. Verify — run the completion check script and fix issues before final response
  4. Report — state files changed, checks run, and remaining risk

A lightweight completion_check.sh script prints dirty files, touched files, untracked files, conflict markers, debug/TODO markers, and large uncommitted files. Wire it into git hooks later.

When context is tight, scale the pass down but don't skip it. Prioritize: money/payment state, auth/privacy, production/deploy risk, data correctness, runtime state, user-facing UX, docs/tests.

Standards

Each harness item uses one of three requirement levels:

Standard Meaning
Required Every serious repo needs this. Missing it creates avoidable risk.
Production-required Required before production traffic, sensitive data, or customer commitments.
Project-dependent Required only when stage, team size, or compliance justifies it.

This keeps the harness proportionate to the project's actual risk surface rather than applying everything everywhere.