Handling Context Compaction
What coding agents lose to compaction, resumes, and long goal-driven loops — and the disk-anchored protocol that keeps work correct across them.
Context compaction is the silent failure mode of long-running coding agents. The session runs long, the harness compresses older turns into a summary, the agent picks up where it left off — and its beliefs about its own work are now built on a lossy snapshot. The diff it remembers may not match the diff on disk. Approvals it remembers asking for may have been for a different scope. Skill instructions it thinks it loaded may have been silently dropped.
The same shape shows up across long interactive sessions, manual resumes, interrupted runs, subagent handoffs, and goal-driven loops like Codex's /goal. The longer the work, the worse the compression — and the more confident the agent gets about state it can no longer actually verify.
The fix is one rule, applied consistently: after compaction or any meaningful interruption, treat disk as the source of truth and rebuild before continuing.
What compaction quietly loses
A compaction summary is a paraphrase, not a record. Things drop out, in roughly the order they cause problems.
The actual diff goes first. The summary may say "we added the X handler" but lose the exact files touched, the exact strings changed, the surrounding edits, and the parts of the change that didn't survive review iterations. Then the skill bodies — the index of available skills is usually preserved, but the actual content (rules, gotchas, do/don'ts) is not. The agent sees the name and assumes it still knows the body. It doesn't. Source-of-truth doc state goes next: the agent's mental model of AGENTS.md, CLAUDE.md, or the PRD is a paraphrase of what it last read, not what is currently on disk.
Prior approvals are particularly dangerous. A human OK to deploy, push, delete, or commit was scoped to a specific action. The summary may carry "user approved" and lose the scope, and the agent applies the approval to something the human never sanctioned. Live command state is the same problem in a different form — terminals, processes, deploys, and watchers continued running through the work, and the summary doesn't model them. The agent has to look. And the why — the reason a particular path was taken, the edge case behind a guard, the test that flaked once — rarely makes it through compaction at all.
If the agent acts on memory after compaction, every one of these is a path to a wrong answer.
Disk is the source of truth
The recovery rule is short: after compaction, resume, or any long-running stretch, the agent's belief about its work cannot be trusted to be complete. The agent must rebuild from disk before continuing. This is the principle behind The Agentic Project Harness made concrete — hidden memory is exactly what compaction destroys.
The context reset protocol
Run this any time the context has been compacted, resumed, or stretched far past its original window — not only at the end of a task.
- Restart from
git statusandgit diffeven if you remember the answer. The touched-file set is often different from what the summary said. - Re-open the actual body of any skill matching the work. The visible skill list is an index, not a substitute.
- Re-read the source-of-truth docs the current change touches — and only those. Bulk-loading after a reset consumes context without buying correctness.
- Treat prior approvals as expired. Production deletions, deploys, catalog mutations, publishing, pushing, and committing all need fresh confirmation unless the exact approval sits in the active context.
- Recover live command state directly rather than assuming prior commands finished.
- Treat the summary as a hint, not a record.

A Codex agent resuming after a thread compaction — re-opening the relevant skill body, inspecting the actual diff, and only then closing the task.
If the agent cannot inspect the diff meaningfully after context loss, the right move is to report an incomplete handoff — not to fill the gap with plausible text.
The completion pass
Compaction is most dangerous at the moment the agent declares work done — that's when the wrong-state belief gets written into a final response the human will trust. The same disk-anchored protocol runs as a fixed pass after any non-trivial edit:
- Recover the touched-file set.
git status --shortandgit diff --name-only. Trust this over recollection. - Review the actual diff. Eyes on the bytes, not on a paraphrase.
- Re-read every touched file and its surroundings. Call sites, tests, docs, resources that make the change correct.
- Re-open authoritative docs and skills when the change touches payments, auth, privacy, deploy, schema, or release.
- Look for the usual failure shapes. Edge cases, stale assumptions, missing or wrong tests, bad user-facing strings, state and cache leakage, idempotency or money-state issues, privacy leaks, docs drift.
- Run the mechanical guardrail. A small completion-check script — dirty files, untracked files, conflict markers, debug/TODO markers, large uncommitted blobs. Guardrail, not a substitute for semantic review.
- Fix what the pass found before final response, not after.
- Run the narrowest meaningful verification. The most targeted lint, type-check, or test. For high-risk surfaces, widen it.
- If a check is skipped, say so. Name it, the reason, and the residual risk.
- Final response. Files changed, checks run, remaining risk. Never claim completion from implementation alone.
Whether the work was compacted three times or never compacted at all, the pass is identical. The point is to take the compaction question off the table entirely — the agent doesn't have to know whether memory drifted, because it's verifying against disk anyway.
When the pass itself doesn't fit
Late in a long session, the context budget for the pass can itself get squeezed. Don't skip it. Scale it down by priority:
| Priority | Surface |
|---|---|
| 1 | Money / payment state |
| 2 | Auth and privacy |
| 3 | Production and deploy risk |
| 4 | Data correctness |
| 5 | Runtime state — caches, sessions, background jobs |
| 6 | User-facing UX |
| 7 | Docs and tests |
Re-read only the highest-risk touched files and their closest call sites. Run at least one narrow verification. If even that doesn't fit, stop and report an incomplete handoff that names: files changed, what was checked, what remains unchecked, and the next verification command to run.
A truthful incomplete handoff is more useful than a false completion claim. The human can finish the verification; they cannot recover lost trust in a confident final response that turned out to be wrong.
A drop-in for AGENTS.md or CLAUDE.md
The discipline in the form most agents can follow directly. Drop it into your repo's agent instructions and reference it from any skill that edits files.
## Completion Discipline
After any file edit, completion is not done until a fresh pass is run from repository state, not conversation memory. If context limits prevent the pass, report an incomplete handoff instead.
- Start with `git status --short` and `git diff --name-only` to recover the touched-file set.
- Review the actual diff before final response.
- Re-read every touched file and the nearby call sites, tests, docs, or resources.
- If the thread was compacted, resumed, or interrupted, reconstruct state from disk first.
- Check the relevant skill or source-of-truth doc when the change touches a sensitive surface (payments, auth, privacy, deploy, schema, release).
- Look for edge cases, stale assumptions, missing tests, bad user-facing strings, state or cache leakage, money-state or idempotency issues, privacy leaks, docs drift.
- Run the completion-check script when present. Mechanical guardrail, not a substitute for semantic review.
- Fix issues during the pass, not after.
- Run the narrowest meaningful verification. Widen it for high-risk surfaces.
- If verification is skipped, say why and name the residual risk.
- Final response states files changed, checks run, remaining risk. Do not claim completion from implementation alone.
### Context Budget
If context is tight, scale the pass down — do not skip it.
- Priority order: money/payment state, auth/privacy, deploy risk, data correctness, runtime state, user-facing UX, docs/tests.
- Re-read only the highest-risk touched files and their closest call sites.
- Run at least one narrow verification command.
- If even that does not fit, stop and report an incomplete handoff. Name files changed, what was checked, what remains unchecked, and the next command to run.
### Context Reset
Agents cannot rely on conversation memory after long work, compaction, or interruption.
- After compaction, resume, or a long-running session, restart Completion Discipline from disk.
- Treat loaded skill bodies as non-durable. Re-open the body of each relevant skill; the visible list is not enough.
- Re-read only relevant skills and source-of-truth docs. No bulk-loading.
- Treat prior approvals as expired unless the exact approval is visible in active context.
- Recover live command state — terminals, processes, SSH, deploys, watchers — before continuing.
- Do not assume the compaction summary captured all important implementation details.
- If the diff cannot be inspected meaningfully after context loss, report an incomplete handoff.
Adapt the surfaces to your stack — Stripe webhooks, Kubernetes rollouts, schema migrations, regulated data — and point the skill reference at whatever your harness uses.
Goal-driven loops amplify the problem. Each turn trusts the previous turn's summary, which already trusts the turn before, and the loss compounds. By turn ten, the agent's belief about what it has built is several compactions removed from the actual repo. The protocol doesn't stop compaction — compaction is the harness doing its job. It stops the agent from acting as if compaction didn't happen.