Agentic Engineering

The Agentic Project Harness

A baseline operating layer for agent-friendly repositories — four readiness levels from context harness to product harness.

May 11, 2025

Every repository that an AI agent touches needs a harness: instructions, checks, scripts, docs, pipelines, observability, recovery, and feedback loops. Not the product itself, but the operating layer around it.

The goal is to make a repo easy for humans and agents to understand, modify, verify, deploy, recover, and operate — without relying on hidden context.

The core rule

If a repo needs hidden memory to be changed safely, the harness is incomplete. Encode that memory in files, scripts, checks, dashboards, or documented workflows.

Ground truth is disk, not memory.

Four readiness levels

The harness is organized into four levels so you can adopt incrementally. Each level adds what matters at that stage.

L0: Context harness

Make the repo legible.

Item	Purpose
Agent instructions (AGENTS.md / CLAUDE.md)	Defines repo-specific behavior, guardrails, commands, and forbidden actions
README	Fast entry point for humans
Skills	Encodes repeatable workflows agents should follow
MCP configs	Gives agents structured access to external tools
Docs folder	Stores durable product, architecture, and operational context
Scratch-pad	Safe place for temporary investigation notes
Scripts folder	Holds repeatable operational commands
Decision records	Preserves why important choices were made

L1: Development harness

Make local development repeatable and reviewable.

.env.example — documents required environment variables without leaking secrets
.gitignore / .dockerignore — keeps noise and secrets out of git and Docker contexts
Local bootstrap — one documented setup path for a new machine
Linting and formatting — catches problems before review, keeps diffs focused on behavior
Pre-commit hooks — moves cheap checks earlier than CI
Tests and type checks — verifies correctness
Automated migrations — handles schema changes reproducibly
Changelog folder — records what changed and why
Dependency management — controls the supply chain

L2: Production harness

Make the project shippable and recoverable.

This level is required before production traffic, sensitive data, payments, or customer commitments.

CI/CD pipelines with gated deployment
Health checks and monitoring
Database backup and restore automation
Alerting and runbooks
Security audits and vulnerability scanning
Secrets management
Performance testing
Incident history

L3: Product harness

Learn from users and operate as a product.

This level is project-dependent — adopt it when stage, team size, compliance, or growth motion justifies it.

User feedback collection and analysis
License and legal documentation
Feature flags and experimentation
Customer support system
Issue tracking
Marketing and outreach

Recommended repo shape

.
├── AGENTS.md or CLAUDE.md
├── README.md
├── .env.example
├── .gitignore
├── .dockerignore
├── docs/
├── scripts/
├── changelog/
├── scratch-pad/
├── infra/
├── migrations/
├── tests/
└── .github/workflows/

Folder names matter less than discoverability. A new maintainer or agent should find instructions, commands, docs, verification, release history, and runbooks within minutes.

Completion discipline

The harness includes a completion protocol that agents follow before declaring work done:

Recover state from disk — start with git status and git diff to find the actual changed files
Inspect — look for edge cases, stale assumptions, missing tests, state leakage, and docs drift
Verify — run the completion check script and fix issues before final response
Report — state files changed, checks run, and remaining risk

A lightweight completion_check.sh script prints dirty files, touched files, untracked files, conflict markers, debug/TODO markers, and large uncommitted files. Wire it into git hooks later.

When context is tight, scale the pass down but don't skip it. Prioritize: money/payment state, auth/privacy, production/deploy risk, data correctness, runtime state, user-facing UX, docs/tests.

Standards

Each harness item uses one of three requirement levels:

Standard	Meaning
Required	Every serious repo needs this. Missing it creates avoidable risk.
Production-required	Required before production traffic, sensitive data, or customer commitments.
Project-dependent	Required only when stage, team size, or compliance justifies it.

This keeps the harness proportionate to the project's actual risk surface rather than applying everything everywhere.