Agentic Engineering

Handling Context Compaction

What coding agents lose to compaction, resumes, and long goal-driven loops — and the disk-anchored protocol that keeps work correct across them.

May 16, 2026

Context compaction is the silent failure mode of long-running coding agents. A session runs long, the harness compacts the older turns into a summary, the agent picks up where it left off, and the resumed agent's beliefs about its own work are now built on a lossy snapshot. The diff it remembers may not match the diff on disk. Approvals it remembers asking for may have been for a different scope. Skill instructions it thinks it loaded may have been silently dropped.

The same shape shows up across long interactive sessions, manual resumes, interrupted runs, subagent handoffs, and goal-driven loops like Codex's /goal. The longer the work, the worse the compression, and the more confident the agent gets about state it can no longer actually verify.

This post is about how to handle that. The mechanic is simple: after compaction or any meaningful interruption, treat disk — not conversation memory — as the source of truth, and rebuild before continuing.

What compaction quietly loses

A compaction summary is a paraphrase, not a record. Things that routinely drop out, in roughly the order they cause problems:

The actual diff. The summary may say "we added the X handler" but lose the exact files touched, the exact strings changed, the surrounding edits, and the parts of the change that didn't survive review iterations.
Loaded skill bodies. The skill index — the visible list of available skills — is usually preserved. The actual content of each skill (its rules, gotchas, do/don'ts) is not. The agent sees the name and assumes it still knows the body. It does not.
Source-of-truth doc state. The agent's mental model of AGENTS.md, CLAUDE.md, repo-local skills, or the PRD is a paraphrase of what it last read, not what is currently on disk.
Prior approvals. A human OK to deploy, push, delete, or commit was tied to a specific scope. The summary may carry "user approved" and lose the scope. The agent then applies the approval to something the human never sanctioned.
Live command state. Terminals, processes, emulators, SSH sessions, deploys, watchers, and background automations have continued to run during the work. The summary doesn't model them. The agent has to look.
Implementation details outside the summary. The reason a particular path was taken, the edge case that motivated a guard, the test that flaked once — these rarely make it into the compaction.

If the agent acts on memory after compaction, every one of these is a path to a wrong answer.

Ground truth is disk, not memory

The recovery rule is short: after compaction, resume, or any long-running stretch, the agent's belief about its work cannot be trusted to be complete. The agent must rebuild from disk before continuing.

This is the same principle behind The Agentic Project Harness: if a repo needs hidden memory to be changed safely, the harness is incomplete. Compaction makes this concrete — hidden memory is exactly what compaction destroys.

The context reset protocol

Run this any time the agent's context has been compacted, resumed, or stretched far past its original window — not only at the end of a task.

Restart from git status and git diff. Re-run them even if you remember the answer. The touched-file set is often different from what the summary said.
Re-open relevant skills. Read the actual skill body for any work-matching skill. The visible skill list is an index, not a substitute for the body.
Re-read source-of-truth docs. Only the ones the current change touches. Don't bulk-load every doc after a reset — that just consumes context without buying correctness.
Treat prior approvals as expired. Production deletions, deploys, catalog mutations, publishing, pushing, and committing all need fresh confirmation unless the exact approval is visible in the active context.
Recover live command state. Inspect terminals, processes, emulators, SSH sessions, deploys, watchers, and automations directly. Don't assume prior commands finished, failed, or are safe to ignore.
Don't assume the compaction summary captured every implementation detail. Important details routinely drop out. Treat the summary as a hint, not a record.

Codex session showing a 'Context automatically compacted' divider followed by the agent's message: I'm resuming from disk because the thread compacted after implementation. I'll re-open the relevant Android/UI skill guidance, inspect the actual diff, and only then close this out.

A Codex agent resuming after a thread compaction — re-opening the relevant skill body, inspecting the actual diff, and only then closing the task.

If the agent cannot inspect the diff meaningfully after context loss, the right move is to report an incomplete handoff — not to fill in the gap with plausible text.

The completion pass

Compaction is most dangerous at the moment the agent declares work done — that is when the wrong-state belief gets written into a final response the human will trust. So the same disk-anchored protocol runs as a fixed pass after any non-trivial edit:

Recover the touched-file set. git status --short and git diff --name-only. Trust this over recollection.
Review the actual diff. With eyes on the bytes that changed, not on a paraphrase.
Re-read every touched file and its surroundings. Each file and the nearby call sites, tests, docs, and resources that make the change correct.
Re-open authoritative docs and skills. When the change touches a sensitive surface — payments, auth, privacy, deploy, schema, release — open the source-of-truth doc or repo skill.
Look for the usual failure shapes. Edge cases, stale assumptions, missing or wrong tests, bad user-facing strings, state and cache leakage, money-state or idempotency issues, privacy leaks, docs drift.
Run a mechanical guardrail. A small completion-check script that surfaces dirty files, untracked files, conflict markers, debug/TODO markers, and large uncommitted blobs. This is a guardrail, not a substitute for semantic review.
Fix issues found in the pass. Before final response, not after.
Run the narrowest meaningful verification. The most targeted lint, type-check, or test command for what changed. For high-risk surfaces, widen it.
If a check is skipped, say so explicitly. Name what was skipped, why, and the residual risk.
Final response. State files changed, checks run, and remaining risk. Never claim completion from implementation alone.

Whether the work was compacted three times or never compacted at all, the final pass is identical. The point is to remove the compaction question from the equation: the agent doesn't have to know whether memory drifted, because it is verifying against disk anyway.

When even the pass doesn't fit

Late in a long session, the context budget for the completion pass can itself be squeezed. Don't skip it. Scale it down by priority:

Priority	Surface
1	Money or payment state
2	Auth and privacy
3	Production and deploy risk
4	Data correctness
5	Runtime state — caches, sessions, background jobs
6	User-facing UX
7	Docs and tests

Re-read only the highest-risk touched files and the closest call sites. Run at least one narrow verification command. If even that doesn't fit, stop and report an incomplete handoff that names: files changed, what was checked, what remains unchecked, and the next verification command to run.

A truthful incomplete handoff is more useful than a false completion claim. The human can finish the verification; they cannot recover lost trust in a confident final response that turned out to be wrong.

The final response

Every "done" response after a file edit names three things:

Files changed — derived from git status and git diff, not from memory.
Checks run — the actual commands invoked, with a note where something was skipped or blocked.
Remaining risk — known gaps, unverified paths, areas that need a human eye.

These three are the minimum. After a compaction, they are also the only parts of the final response the human can trust — because they are derived from disk, not from the lossy summary the agent is working from.

A drop-in for AGENTS.md or CLAUDE.md

Below is the discipline in the form most agents can follow directly. Drop it into your repo's AGENTS.md, CLAUDE.md, or whichever file your harness loads at session start. Reference it from any skill or workflow that edits files.

## Completion Discipline

After any file edit, completion is not done until a fresh pass is run from repository state, not conversation memory. If context limits prevent that pass, report an incomplete handoff instead of a completed task.

- Start the completion pass with `git status --short` and `git diff --name-only` to recover the actual touched-file set.
- Review the actual diff before final response.
- Re-read every touched file and the nearby call sites, tests, docs, or resources that make the change correct.
- If the thread was compacted, resumed, interrupted, or the work spanned many steps, first reconstruct state from disk: `git status`, `git diff`, relevant files, and recent test output if available.
- Check the relevant repo skill or source-of-truth doc when the change touches a sensitive surface (payments, auth, privacy, deploy, schema, release).
- Look for edge cases, stale assumptions, missing tests, bad user-facing strings, state or cache leakage, money-state or idempotency issues, privacy leaks, and docs drift.
- Run the repo completion-check script when present. It is a mechanical guardrail, not a substitute for semantic review.
- Fix issues found during the completion pass before final response.
- Run the narrowest meaningful verification. For high-risk surfaces (money, auth, deploy, release), widen the verification unless blocked.
- If verification is skipped or blocked, say exactly why and name the residual risk.
- Final response must state files changed, checks run, and remaining risk. Do not claim completion from implementation alone.

### Context Budget Rule

If context is tight, scale the completion pass down but do not skip it.

- Prioritize review in this order: money/payment state, auth/privacy, production/deploy risk, data correctness, runtime state, user-facing UX, docs/tests.
- Re-read only the highest-risk touched files and the closest call sites when there is not room for all of them.
- Run at least one narrow verification command when feasible.
- If there is not enough context to inspect the diff and verify meaningfully, stop and say the completion pass is incomplete. Name the files changed, what was checked, what remains unchecked, and the next verification command to run.

### Context Reset Protocol

Agents cannot rely on conversation memory after long work, compaction, or interruption.

- After compaction, resume, interruption, or a long-running session, restart Completion Discipline from disk.
- Treat loaded skill bodies as non-durable across compaction. Re-open the body of each relevant skill before continuing work that matches it; the visible skill list alone is not enough.
- Re-read only the relevant skills and source-of-truth docs. Do not bulk-load unrelated skills after reset.
- Treat prior approvals as non-durable unless the exact approval is visible in the active context. Re-ask before production deletion, deploys, publishing, pushing, committing, or other irreversible or external actions.
- Recover live command state before continuing. Inspect current terminal, process, SSH, deploy, test, watcher, or automation state instead of assuming prior commands finished, failed, or are safe to ignore.
- Do not assume the compaction summary captured all important implementation details.
- If the agent cannot inspect the diff meaningfully after context loss, report an incomplete handoff instead of completion.

Adapt the surfaces to your stack — replace the generic high-risk list with the ones that matter in your codebase (Stripe webhooks, Kubernetes rollouts, schema migrations, regulated data), and point the skill body reference at whatever format your harness uses.

Why this compounds in long loops

Goal-driven loops (/goal, agent runners, autonomous loops) amplify this problem. Each turn of the loop trusts the previous turn's summary, which already trusts the turn before, and the loss compounds. By turn ten, the agent's belief about what it has built is several compactions removed from the actual repo. Without a disk-anchored recovery at each turn, the loop produces a confident final state that doesn't match the code.

With the protocol, the loop is safe — not because compaction stops happening, but because every turn re-anchors. The mechanics are the same as a one-shot edit: recover from disk, inspect, verify, report. Only the cadence changes.

The discipline doesn't prevent compaction. Compaction is the harness doing its job. The discipline prevents the agent from acting as if compaction didn't happen.