Orchestrating Multiple Claude Code Agents with a Director Loop

The common pattern every serious Claude Code user eventually adopts: open two terminal windows, run one Claude Code session in each, and play telephone operator between them. One session writes the code. One session reviews it. Copy the verdict from window A, paste it into window B, hit enter, alt-tab back, wait, repeat.

It works. It also wastes an afternoon. The agents do the work in seconds; the human shuttling text between them is the bottleneck.

The obvious fix: a third Claude Code session that doesn't write code at all. It directs the other two. It reads their screens, drafts their next prompts, approves their permission requests, and decides when to ship. The user talks to it; everything else is its job.

The tool that makes this real is cmux. It's a terminal multiplexer that exposes every pane and surface as an addressable CLI object: cmux send, cmux read-screen, cmux wait-for, the whole thing. The orchestrator session uses that CLI to drive other Claude Code sessions running in adjacent panes. cmux is the wiring; Claude Code on both ends is the brain.

Below is the full anatomy of the setup, the antipatterns that cost an afternoon, and the workflow that's now the default for any non-trivial change.

The director principle

The orchestrator session has one rule that everything else falls out of: it never writes code, never edits files, never runs implementation commands. Its three jobs are:

Stay aligned with the user. When asked "what's going on?", it answers in one breath.
Take decisions. Read the implementing sessions, evaluate output, approve or deny permissions, call SHIP or NO-SHIP.
Direct. Spawn new panes, draft precise prompts, redirect drifting sessions, order reverts.

The user does not tell it to do those things. The user states a goal. "Refactor this 1,500-line file into a folder package." "Add a docstring to these five modules." It picks the tool, the parallelism, the model per session, the sequencing. That is the entire bargain. Goals in; the how is handled.

This means the orchestrator's context window is precious. Every raw cmux send-key Enter Bash call that lands in its conversation is mechanics-noise stealing space from the next decision. The single biggest win in this whole setup came from one realization: the orchestrator should delegate cmux mechanics to a subagent the same way it delegates implementation to a child session.

Two delegation patterns, one default

Early on the orchestrator picked between two ways to spawn an implementer:

Pattern A, interactive cmux pane. A real Claude Code session living in a visible split, permission prompts surfacing naturally, the orchestrator able to cmux send corrections mid-run.
Pattern B, headless claude -p subprocess. Bounded one-shot, output piped to a JSONL file, no interactivity.

Pattern B sounds clean. It is a trap. The instant the orchestrator runs claude -p ... | tee output.jsonl inside its own Bash, two things go wrong: the tee triggers the orchestrator's own permission wall, and the subprocess blocks the orchestrator's main loop so the user can't message it. A director that can't take messages from the user is no longer a director.

Pattern A is the default. Always. The rule: if the task involves writing or editing any file at all, open a real cmux pane. Pattern B is for genuinely fire-and-forget read-only probes, and even then it's rarely used.

The spawn ritual collapses to:

cmux new-split right --pane pane:1
cmux list-pane-surfaces --pane pane:2          # → surface:5
cmux send --surface surface:5 "claude --model claude-sonnet-4-6 --permission-mode acceptEdits"
cmux send-key --surface surface:5 Enter
# Then point the session at a task file written via the Write tool

Two flags do enormous lifting there. --model is explicit; never let Claude pick. Opus 4.7 for the orchestrator, Sonnet 4.6 for implementers, Haiku 4.5 for cheap read-only probes. --permission-mode acceptEdits makes the implementing session auto-approve file edits, so it doesn't stall waiting for a "yes" click on its own output. Bash commands still surface; that's what the permission pipe-back is for.

The polling antipattern (and the one primitive that actually works)

The first version of the orchestrator did this constantly:

until cmux read-screen --surface surface:5 --lines 40 | grep -qE "done|complete"; do
  sleep 4
done

Three problems. Claude Code's harness will block bare sleep calls. Short-polling burns the prompt cache. And it locks the orchestrator's Bash tool, so the user can't talk to it mid-task.

The next attempt used cmux notify. The implementing session would fire cmux notify --surface <orchestrator> when it finished. The orchestrator would sit idle until... nothing. The notification appeared in cmux's UI as a system ping, but the Claude Code agent loop on the orchestrator never re-engaged.

This is the most important non-obvious finding in the whole setup: a Claude Code session at its idle prompt only re-engages on three triggers: user input, a background Bash subprocess it started completing, or an explicit scheduled wakeup. A cmux notify from another pane satisfies none of those. It is UI ornamentation.

The primitive that actually works is cmux wait-for running in run_in_background: true:

# Orchestrator launches this as a background Bash:
cmux wait-for impl-myslug-done --timeout 1200

# The implementing session's last action in its task prompt:
cmux wait-for -S impl-myslug-done

When the implementer signals, the orchestrator's background Bash exits. The harness flags background-task-complete. The orchestrator's agent loop wakes up. Then one targeted cmux read-screen to verify what actually happened, and decision time. Zero polling, fully event-driven, orchestrator stays conversational the whole time.

cmux notify still has a place: as a desktop courtesy ping, badging a surface, flashing a screen corner. It just cannot be the sequencing mechanism. Always wait-for. Always with a timeout so a crashed implementer doesn't hang the loop forever.

The workflow-manager subagent

After a first full run, the orchestrator's conversation contained nine raw cmux Bash entries for one tiny task. cmux new-split, cmux list-pane-surfaces, cmux send for the model launch, cmux send-key Enter, cmux send for /rename, cmux send-key Enter, cmux send for the task pointer, cmux send-key Enter, cmux trigger-flash. At three tasks per session that's a quarter of the orchestrator's window gone before any decisions get made.

The fix is the same trick the orchestrator pattern uses on the user side: delegate. A subagent, call it cmux-workflow-manager, owns every low-level cmux command. The orchestrator invokes it with intent; the subagent returns structured results.

Seven operations, all with the same input/output discipline:

Operation	What it does
`draft_task`	Reads source files, derives responsibilities, writes a task file, returns the expected diff scope as one line
`spawn`	Creates the pane, launches Claude with the right flags, sends `/rename` + task pointer, returns surface + wait-token
`verify`	Reads the actual file on disk (not the implementer's claims), runs `git diff` + AST parse, returns SHIP / NO-SHIP / NEEDS-CODE-REVIEW
`redirect`	Sends corrective instructions to a drifting session, optionally issues a new wait-token
`approve_permission`	Reads a permission prompt, takes the orchestrator's yes/no, sends the keystroke
`status`	Snapshots all live `impl-*` sessions and classifies each as running / idle / blocked / done
`close`	Exports the transcript, appends a ledger entry with a real timestamp, closes the pane, equalizes remaining panes

After this, the orchestrator's context shows two entries per task instead of nine:

⏺ cmux-workflow-manager(spawn impl-myslug)
  ⎿ SPAWNED slug=impl-myslug surface:9 wait_token=impl-myslug-done
⏺ Bash(cmux wait-for impl-myslug-done --timeout 600, run_in_background: true)

Roughly 80% less mechanics-noise. The director gets to be a director.

There's one constraint to be careful about: the cmux wait-for itself has to live in the orchestrator's Bash, not in the subagent. The harness only re-engages an agent when a background task that agent started completes. A subagent's Bash is its own context; it doesn't wake the orchestrator. So the subagent does spawn and returns; the orchestrator starts the wait-for; on completion the orchestrator invokes verify.

Why the orchestrator never trusts the implementer

The first NO-SHIP came on a one-line lint fix. The task: drop as exc from a except Exception as exc: block because the linter said exc was unused. The implementing session did exactly that, and also stripped exc from a downstream f"status_flip: {exc}" log call, silently degrading the rollback error report. The linter was wrong (or rather, too narrow). The implementer didn't notice. It reported "done."

The orchestrator's verify step read the actual file, diffed it, saw the f-string mutation, and called NO-SHIP. Redirect: "You stripped exc from the f-string at line 720. Keep the f-string, only drop the as exc binding from the except clause." Same session, same context, single corrective message. SHIP on the second try.

This is why the workflow manager's verify does three independent ground-truth checks: Read the file, git diff --stat, python -c "import ast; ast.parse(...)". Not because the implementing session lies; because it can over-fix, under-fix, or hallucinate completion in entirely good faith.

The hard rule baked into CLAUDE.md: the orchestrator never SHIPs based on the implementing session's self-report. Always re-read the file on disk. The session's narration is a hypothesis; the diff is the evidence.

This also creates a natural two-tier SHIP gate. For trivial changes (docstrings, comments, formatting, naming), mechanical verify is enough. For changes touching production code paths (agent loops, request handlers, financial computations, migrations), verify returns NEEDS-CODE-REVIEW and the orchestrator must invoke a production-code-reviewer subagent before SHIP. Any CRITICAL finding from the reviewer is automatic NO-SHIP with the findings used as the corrective message.

Naming, exports, and a session ledger

Once 2+ implementing panes are alive, surface:7 and surface:9 are meaningless. The implementing session's first action, always, is /rename impl-<slug>. The slug names the work: impl-trade-doctor-decomp, impl-alert-hygiene, impl-data-lens-docstring. Now cmux list-pane-surfaces reads like a project board instead of a number salad.

The export discipline mirrors this. When work ships, /export tmp/sessions/<slug>.txt dumps the full transcript to a predictable path. The orchestrator can Read it in chunks for deep audits or to build a handoff prompt when an implementing session nears its context limit.

And one append-only file ties everything together: tmp/sessions/_ledger.md. The close operation appends one line per session in this format:

2026-05-15T21:28:26+05:30 | SHIP | tools-docstrings | Expanded module docstrings across 10 tool modules

Real ISO-8601 timestamp from date -Iseconds. Outcome token strictly one of SHIP, NO-SHIP, ABANDON. Not SHIPPED, not Closed, not Done. Slug stripped of the impl- prefix. Concrete summary naming files or functions touched. The orchestrator reads the ledger at startup (or after /clear) to recover context about prior work in the session.

These look like nitpicks until a subagent fabricates sequential placeholder timestamps (T00:00:00, T00:01:00, T00:02:00) for parallel ledger entries. Format strictness is what makes the ledger trustworthy at audit time. The way to get the subagent to honor it is to put GOOD / BAD examples at the top of the operation section, not in a "Hard Rules" appendix 200 lines later. Rules buried deep in a long prompt don't propagate; rules with concrete contrast right above the step do.

The permission system, demystified the hard way

There are two settings files. They look similar. They do not behave similarly.

.claude/settings.json accepts an allowedTools array. This is not the file that controls per-command allow lists at runtime. A clean one of these has no effect.

.claude/settings.local.json has a permissions.allow array. This is the real file. The format is Bash(cmd:*) with a colon-asterisk, not Bash(cmd*) with just an asterisk. The space-vs-colon thing eats hours if unknown.

Even with the right file and the right format, glob patterns have weird gaps. Bash(cmux send:*) covers cmux send ..., but a fresh subcommand like cmux set-buffer ... slips through until you add it. The catch-all Bash(cmux:*) is the cleanest answer. Same trick for git: Bash(git -C:*) handles every git -C <path> ... invocation regardless of path.

Two layers, two control surfaces:

Layer	What it gates	How you control it
Bash	Shell commands	`permissions.allow` entries `Bash(cmd:*)` in `settings.local.json`
Write / Edit / Read	File operations	`--permission-mode acceptEdits` on the session itself

These are independent. acceptEdits does not bypass a Bash prompt; an allow-list entry does not bypass a Write prompt for a new file outside the project. The orchestrator's cmo alias is claude --model opus --permission-mode acceptEdits precisely so Write prompts for task files never appear.

One path must not hold handoff files: .claude/. Claude Code treats anything inside that directory as configuration territory and prompts every time, regardless of permission mode. "Yes, allow Claude to edit its own settings." That's a deliberate safety guard against runaway sessions rewriting their own config. Use tmp/sessions/ at the project root instead: a normal directory, acceptEdits covers it, .gitignore it, done.

The other gotcha is static-analysis permission prompts. Long multi-line strings inside cmux send get blocked because Claude Code's analyzer can't statically prove what the command will do. The robust fix: never embed the prompt body inline. Use the Write tool to drop the task at tmp/sessions/<slug>-task.txt, then send one short cmux send pointing the session at the file path. Every cmux call stays single-line and statically clean.

Clarification: the two-way channel

When the implementer reads the task and realizes the goal is ambiguous, without a channel it either guesses (bad) or sits idle (waste). The clarification protocol gives it a way out.

Every drafted task ends with:

If anything is ambiguous, or you discover the goal conflicts with what the code
actually does:
  1. Write your question to tmp/sessions/<slug>-question.txt
  2. Run: cmux wait-for -S <slug>-question
Do NOT guess. Do NOT proceed on assumption.

The orchestrator's background wait-for accepts either <slug>-done or <slug>-question as the signal. When the question token fires instead, the orchestrator reads the question file, drafts an answer, and invokes redirect with the clarification plus a new wait-token. The implementer never has to guess.

This is one of those features that pays for itself the first time. A 30-second clarification beats a 20-minute respawn after the implementer goes the wrong direction.

The SessionStart hook

Every fresh orchestrator session used to dump ~50 lines of cmux identify --json output into its context just to learn its own surface ID. Useful once, never again.

A SessionStart hook fixes that:

# ~/.claude/hooks/orchestrator-context.sh
echo "── cmux context ─────────────────────────"
echo "self: $(cmux identify --json | jq -r '"\(.focused.surface_ref) in \(.focused.pane_ref) (workspace \(.focused.workspace_ref))"')"
echo "layout: $(cmux list-panes)"
echo "named surfaces visible: $(cmux list-pane-surfaces | tr '\n' ' ')"
echo "─────────────────────────────────────────"

Wired into ~/.claude/settings.json as a SessionStart hook. Every Claude Code session in cmux now boots with a one-block topology summary already in context. No JSON dump, no orchestrator self-discovery, ~4k tokens saved per session.

Three Claude Codes, one human

Once everything above is in place, the loop looks like this:

Open a fresh orchestrator with the cmo alias (claude --model opus --permission-mode acceptEdits). SessionStart hook injects topology. State "this is an orchestrator session. Refactor file X into a folder package, same style as Y."
The orchestrator invokes cmux-workflow-manager.draft_task. The subagent reads X and Y inside its own context, writes a precise task spec to tmp/sessions/impl-x-refactor-task.txt, returns the expected diff scope as one line.
The orchestrator invokes cmux-workflow-manager.spawn. A new cmux pane appears with a named Sonnet session, task file pointed at, wait-token armed.
The orchestrator starts cmux wait-for impl-x-refactor-done --timeout 1200 in a background Bash. Returns to the chat. The user can ask it anything; it's still listening.
The implementer does the work. File edits auto-approve (acceptEdits). Bash commands prompt through approve_permission if they're not in the allow-list. When done, it runs cmux wait-for -S impl-x-refactor-done.
The orchestrator's background Bash exits. The harness wakes it. It invokes verify: read the actual file, diff stat, AST parse. For a hot-path refactor it returns NEEDS-CODE-REVIEW; the orchestrator spawns production-code-reviewer against the transcript and the file. Any CRITICAL is automatic NO-SHIP plus redirect.
SHIP. The orchestrator invokes close: export transcript, real timestamp from date -Iseconds, append the ledger line, close the pane, equalize panes via cmux send-key "C-Cmd-=".
A one-line SHIP summary appears in the chat. Total raw cmux calls in the orchestrator's context for this task: 1, the initial topology check from the SessionStart hook.

For five parallel implementations the same thing happens five times concurrently, with five wait-for background Bashes and five verify / close cycles. The orchestrator does not block on any one of them.

What this actually buys

The honest measurement: the work the agents do is the same. Sonnet writes Sonnet's code; the reviewer catches what the reviewer catches. LLM output per minute is not 10x'd.

What does change is the human role. The user used to be the message bus, the permission adjudicator, the context-switcher between four windows, the audit gate. All of that is now the orchestrator's job. The user is back to being the product owner: stating goals, evaluating outcomes, redirecting when things drift. Hands leave the keyboard for minutes at a time during a refactor that used to demand them constantly.

That is the 10x. Not "the AI got faster." "The human stopped being the slow part."

It also produces visibly better code, for one structural reason. The audit gate is now non-skippable. Before this setup the review step got hand-waved when tired ("looks fine, ship it"). Now the orchestrator can't ship a hot-path change without a NEEDS-CODE-REVIEW round, and the user can't override that without typing the override manually. Discipline that depends on user discipline fails on Friday at 6pm. Discipline encoded into the workflow doesn't.

A few hard-won rules

Sticky-note rules:

cmux notify is not a sync primitive. Use cmux wait-for in run_in_background: true. Always with a timeout.
acceptEdits goes on the alias, not patched per-session. Write prompts are gated by permission mode, not by allow-list patterns.
tmp/sessions/, never .claude/sessions/. The latter is treated as settings territory and prompts every time.
Two settings files exist. settings.local.json with permissions.allow and the Bash(cmd:*) format is the real one.
Make cmux resolve to the CLI shim, not the GUI app binary. If which cmux returns the .app bundle path, prepend /Applications/cmux.app/Contents/Resources/bin to PATH.
Never embed long prompts inside cmux send. Write the prompt to a file with the Write tool; the cmux call just points at the path.
Format-strict rules need GOOD / BAD examples at the top of the operation, not buried in an appendix. Subagents don't internalize rules they don't see in context.
Pre-approve foreseeable patterns in settings.local.json before running, not reactively in the middle of a run. If three impl sessions will each run git -C <path> diff, add the entry first.
The orchestrator never SHIPs on self-report. The implementer's narration is a hypothesis; the file on disk is the evidence.

The whole setup is roughly: one CLAUDE.md section for the rules, one subagent definition for the workflow manager, one SessionStart hook for topology, the right alias for the orchestrator, the right entries in the allow-list. A weekend of iteration to get it right; a permanent change in how shipping works after.

For anyone juggling Claude Code windows like a switchboard operator, this is the stop-doing-that move. The agents are fast. cmux is the wiring. The orchestrator is the missing layer. Build it once and stop being the bottleneck.

References

cmux: the terminal multiplexer this workflow is built on.
Claude Code: Anthropic's official CLI for Claude.
Claude Code documentation: hooks, slash commands, subagents, settings reference.