Multi-Agent Workflow Patterns
Four structured workflows — research, implementation, documentation, and security — that use multi-agent pipelines with built-in review cycles.
Building software with AI agents works better when each agent has a clear role and explicit handoffs. These four workflows cover the basics — researching, building, documenting, and assessing security. Each one uses a primary agent that coordinates specialist subagents, with GPT-backed review cycles before output is finalized.
Research
The research workflow produces a single written deliverable, then improves confidence with separate passes for writing quality and factual accuracy.
- Draft. The Researcher writes a full markdown document and saves it to documents/research/.
- Self-proofread. Clean up grammar, clarity, and structure before external review.
- Proofread (GPT). A proofreader subagent runs the editorial pass — grammar, clarity, structure, consistency, completeness.
- Accuracy check (GPT). A separate subagent runs the factual pass — technical correctness, logical consistency, source validity, currency.
- Reconcile and save. The Researcher dedupes findings from both passes, independently verifies each one, applies the valid fixes, and saves the final document.
Writing quality and factual accuracy are different skills, and a single reviewer tends to blend them and miss issues in both categories. Splitting the pass lets each reviewer focus on one job and produce sharper findings. The final output states what was researched, where it was saved, how many findings came from each reviewer, and how many were accepted.
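The reconcile step and the counts in the final summary can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual implementation: the `Finding` fields, the reviewer labels, and the `is_valid` callback (standing in for the Researcher's independent verification) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    reviewer: str  # hypothetical labels: "proofread" or "accuracy"
    location: str  # where in the draft the issue was found
    issue: str     # short description of the problem


def reconcile(findings, is_valid):
    """Dedupe findings across both passes, then keep those that pass verification."""
    seen = set()
    unique = []
    for f in findings:
        # Assumed dedupe key: same location and same issue text, ignoring case.
        key = (f.location, f.issue.strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(f)

    accepted = [f for f in unique if is_valid(f)]

    # Per-reviewer counts, as reported in the final summary.
    reported = {}
    for f in findings:
        reported[f.reviewer] = reported.get(f.reviewer, 0) + 1

    return {"reported": reported, "unique": len(unique), "accepted": accepted}


findings = [
    Finding("proofread", "Intro", "Passive voice obscures the actor"),
    Finding("accuracy", "Intro", "passive voice obscures the actor"),
    Finding("accuracy", "Section 2", "Version number is outdated"),
]
# The lambda is a placeholder for real verification of each finding.
summary = reconcile(findings, is_valid=lambda f: "outdated" in f.issue)
print(summary["reported"], summary["unique"], len(summary["accepted"]))
```

Keeping the per-reviewer counts separate from the deduplicated list makes it possible to report both how much each reviewer found and how much survived verification.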
Implementation
The implementation workflow moves from design through code to structured review, with two complete review cycles before delivery. It opens with a routing decision: large work that needs design goes through the Architect, while small, well-scoped work goes directly to the Implementer. If a direct-path task grows too large, it routes back to the Architect.
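As a rough sketch, the routing decision could look like the code below. The workflow does not say how "large" is measured, so the `needs_design` flag, the `estimated_files` signal, and the `MAX_DIRECT_FILES` threshold are all hypothetical stand-ins.

```python
from dataclasses import dataclass

# Assumed threshold for illustration only; the workflow defines no number.
MAX_DIRECT_FILES = 5


@dataclass
class Task:
    description: str
    needs_design: bool    # hypothetical flag set during triage
    estimated_files: int  # hypothetical size signal


def route(task):
    """Pick the starting agent for a task."""
    if task.needs_design or task.estimated_files > MAX_DIRECT_FILES:
        return "architect"
    return "implementer"


def reroute_if_grown(files_touched_so_far):
    """Send a direct-path task back to the Architect once it outgrows the threshold."""
    if files_touched_so_far > MAX_DIRECT_FILES:
        return "architect"
    return "implementer"


small = Task("Rename a config key", needs_design=False, estimated_files=2)
large = Task("Add multi-tenant billing", needs_design=True, estimated_files=20)
print(route(small), route(large), reroute_if_grown(9))
```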
The Architect side runs three steps before handoff. First, design: load project knowledge, ask clarifying questions, offload codebase research to an Explore subagent, and write the architecture to .architect-output/architecture.md. Then scrutinize: a GPT pass reviews the architecture against requirements coverage, reference patterns, database design, API consistency, permissions, migration sanity, and task split validity. Finally, hand off to the Implementer with explicit instructions to follow the architecture document.
The Implementer side then runs five steps of its own:
- Implement. Load relevant skills, work task by task, track every changed file in .architect-output/changed-files.md.
- Review cycle 1 (GPT). Code Reviewer checks the full changed-file set, self-verifies findings, then runs two GPT scrutiny passes. Findings go back to Implementer.
- Fix. The Implementer addresses every finding, whether minor, major, or critical, before the second review.
- Review cycle 2 (GPT). Verify the fixes with another two-pass scrutiny. Anything still open is escalated rather than looped.
- Summary. Files created, files modified, migrations, review cycle count, auto-fixed findings, anything unresolved.
The first cycle finds issues; the second checks the fixes actually resolved them without introducing new problems. Past cycle two, unresolved issues go to a human — the loop is bounded by design.
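The bounded loop can be sketched as below. The `review` and `fix` callables are hypothetical stand-ins for the Code Reviewer and the Implementer, and the early exit when a cycle finds nothing is an assumption of this sketch rather than something the workflow specifies.

```python
MAX_REVIEW_CYCLES = 2  # the workflow's fixed count


def run_review_cycles(review, fix, max_cycles=MAX_REVIEW_CYCLES):
    """Run review, fix, review; return whatever is still open for escalation.

    review() returns the list of open findings.
    fix(findings) applies fixes for the given findings.
    """
    cycles_run = 0
    open_findings = []
    for cycle in range(1, max_cycles + 1):
        open_findings = review()
        cycles_run = cycle
        if not open_findings:
            break  # assumed shortcut: nothing left to fix or verify
        if cycle < max_cycles:
            fix(open_findings)
        # After the last cycle nothing is fixed; leftovers are escalated.
    return {"cycles": cycles_run, "escalated": open_findings}


# Toy example: the fix step resolves "bug A" but not "bug B".
state = {"open": ["bug A", "bug B"]}


def review():
    return list(state["open"])


def fix(findings):
    state["open"] = [f for f in state["open"] if f != "bug A"]


result = run_review_cycles(review, fix)
print(result)
```

Because the loop is a `for` over a fixed range rather than a `while` on open findings, it cannot run indefinitely; anything left after the last cycle is returned for a human to handle.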
Documentation
The documentation workflow produces matched pairs of business and technical documents, working through a backlog one feature at a time.
- Queue. Documenter reads documents/documentation-backlog.md and creates a todo list, one feature per item.
- Draft business doc. A Draft Documenter subagent reads every relevant source file and writes a business-focused document for PM and analyst audiences.
- Draft technical doc. A separate Draft Documenter run reads the same source code plus the sibling business document, then writes the technical reference.
- Verify (GPT). A verifier reads both documents and the source code together, checking completeness, accuracy, pattern adherence, cross-document consistency, and clarity.
- Fix and update. Apply fixes for critical and major findings, save the files, update the backlog status.
- Repeat for the next feature.
The workflow runs sequentially because parallelizing across features risks inconsistencies between the business and technical documents for the same feature. It also avoids fast exploration subagents for code reading — documentation accuracy requires the higher-quality drafting subagent to read source files directly rather than relying on summarized output.
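A minimal sketch of the sequential loop follows. The backlog file's real format is not described, so the markdown checkbox layout (`- [ ]` and `- [x]`) is an assumption, and `document_feature` is a hypothetical callback that stands in for the draft, verify, and fix steps.

```python
PENDING = "- [ ] "  # assumed marker for an undocumented feature
DONE = "- [x] "     # assumed marker for a documented feature


def pending_features(backlog_text):
    """Return feature names that are still unchecked, in file order."""
    features = []
    for line in backlog_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(PENDING):
            features.append(stripped[len(PENDING):])
    return features


def mark_done(backlog_text, feature):
    """Check off the first exact match for the feature."""
    lines = backlog_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == f"{PENDING}{feature}":
            lines[i] = line.replace(PENDING, DONE, 1)
            break
    return "\n".join(lines)


def document_all(backlog_text, document_feature):
    """Process one feature at a time, updating the backlog after each."""
    for feature in pending_features(backlog_text):
        document_feature(feature)  # business doc, technical doc, verify, fix
        backlog_text = mark_done(backlog_text, feature)
    return backlog_text


backlog = "- [x] Login\n- [ ] Billing\n- [ ] Exports"
order = []
updated = document_all(backlog, order.append)
print(order)
print(updated)
```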
Security
The security workflow is a structured read-only assessment that ends with a prioritized report, not code changes.
- Scope and threat intel. Pen Tester defines the assessment scope and researches current CVEs, advisories, and relevant threat intelligence before touching any code.
- Dependency and config audit. Review requirements files, security configuration, rate limiting, Docker setup, secrets handling.
- OWASP Top 10. Step through access control, injection, cryptographic failures, misconfiguration, logging gaps, SSRF, and the rest. Remove false positives and calibrate severity.
- Scrutiny round 1 (GPT). Verify exploitability of findings, look for missed attack paths, challenge severity ratings, merge or reject findings.
- Scrutiny round 2 (GPT). Check the merged report for remaining blind spots, severity consistency, remediation quality.
- Report. Executive summary, threat intelligence, findings, documented exceptions, recommendations.
Two reasons it's read-only. First, the deliverable is a report — letting the assessor silently fix issues hides things the team should see and breaks assessor independence. Second, every codebase has intentional security decisions that look like vulnerabilities, so the workflow carries a documented exceptions list. Those don't get flagged again on every run.
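One way to picture the exceptions list is as a filter applied before the report is assembled. In this sketch the `SecurityFinding` fields, the `(rule, path)` exception key, and the four-level severity scale are assumptions for illustration; the workflow does not define any of them.

```python
from dataclasses import dataclass

# Assumed severity scale, ordered from most to least urgent.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass(frozen=True)
class SecurityFinding:
    rule: str      # hypothetical identifier for the check that fired
    path: str      # file the finding refers to
    severity: str  # one of the SEVERITY_ORDER keys


def split_findings(findings, exceptions):
    """Separate documented exceptions from reportable findings.

    exceptions is a set of (rule, path) pairs that the team has already
    accepted as intentional decisions.
    """
    reportable, excepted = [], []
    for f in findings:
        if (f.rule, f.path) in exceptions:
            excepted.append(f)
        else:
            reportable.append(f)
    # Prioritize the report: most severe findings first.
    reportable.sort(key=lambda f: SEVERITY_ORDER[f.severity])
    return reportable, excepted


findings = [
    SecurityFinding("debug-endpoint", "app/internal.py", "medium"),
    SecurityFinding("sql-injection", "app/search.py", "critical"),
    SecurityFinding("weak-hash", "app/legacy.py", "high"),
]
exceptions = {("debug-endpoint", "app/internal.py")}
reportable, excepted = split_findings(findings, exceptions)
print([f.rule for f in reportable], [f.rule for f in excepted])
```

Excepted findings are returned rather than dropped so they can still appear in the report's documented exceptions section.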
What's shared across all four
One coordinator owns each workflow; the heavy lifting goes to focused subagents. External GPT review is always a separate step from the work itself, usually run in two passes rather than one (the documentation workflow uses a single verification pass). Workflow state lives on disk (architecture docs, changed-file lists, backlog files), not in conversation memory. Findings from GPT review get deduplicated and independently verified, and the Implementer or Documenter can reject them with justification rather than accepting them blindly. Review cycles have a fixed count, usually two, and unresolved issues escalate to a human instead of looping. Every workflow ends with a concrete summary of what was produced, what was reviewed, and what remains open.