# Swarm-Launcher Redesign — Parallel Execution Plan

Companion to `swarm-launcher-redesign-plan.md`. Maps the six phases onto a dependency DAG and assigns parallel work waves so the redesign lands in ~3–4 days of wall time instead of the ~7–10 it would take serially.
## Dependency DAG

```
Wave 1 — Preflight (parallel, ~3 hours)
┌──────────────────────────────────────────────────────┐
│ Agent P1: 0.1 DB ergo + 0.2 SWARM_DB_PATH plumbing   │
│ Agent P2: 0.3 label format + 0.4 worker contract doc │
│           + shared fake-harness test fixture         │
└────────────────────┬─────────────────────────────────┘
                     │ merge to main
┌────────────────────┴─────────────────────────────────┐
│                                                      │
│ Wave 2A — Worker spawn track (~1.5 days)             │
│ Agent W: Phase 1 (swarmDb, reservationKeeper)        │
│          → Phase 2 (swarmLauncher)                   │
│ Files:   src/agents/swarmDb.ts                       │
│          src/agents/swarmReservationKeeper.ts        │
│          src/agents/swarmLauncher.ts                 │
│          (small) src/agents/codeAgentSwarm.ts edit   │
│                                                      │
│ Wave 2B — Peer track (~1 day, parallel with 2A)      │
│ Agent P: Phase 3 (swarmPeer, peer manager)           │
│ Files:   src/agents/swarmPeer.ts                     │
│          src/agents/swarmPeerManager.ts              │
│          (read-only) swarm-mcp/src/{registry,tasks,  │
│                      messages,context,kv}.ts         │
└────────────────────┬─────────────────────────────────┘
                     │ both merged
                     ▼
Wave 3 — Integration (~1 day, sequential)
Agent I: Phase 4 (code_task rewires to swarm path)
Files:   src/tools/replyTools.ts (heavy edit)
         src/agents/swarmTaskWaiter.ts (new)
         src/voice/voiceSessionManager.ts (small)
         src/bot.ts (small, runtime construction)
         src/bot/agentTasks.ts (small)
         src/settings/settingsSchema.ts (execution.mode flag)
                     │
                     ▼
Wave 4 — Soak (release cycle, operator-driven, no agent)
                     │
                     ▼
Wave 5 — Deletion (~half day, one agent)
Agent D: Phase 6 (delete old in-process session machinery)
```
## Why this DAG
The tracks parallelize cleanly because the new code is mostly new files, not edits to shared files:
- `swarmDb.ts`, `swarmReservationKeeper.ts`, `swarmLauncher.ts`, `swarmPeer.ts`, `swarmPeerManager.ts`, `swarmTaskWaiter.ts` — all new. Zero merge surface with each other.
- The few shared-edit files (`codeAgentSwarm.ts`, `settingsSchema.ts`, `replyTools.ts`) are touched by exactly one wave each, so no two agents fight over the same lines.
- Wave 2A and Wave 2B share only the low-level DB connection helper from Phase 1.1. Wave 2B can stub that helper for its first day of work and swap to the real one when 2A's branch lands.
The narrow point is Wave 3 (integration): it must read all of 2A and 2B's output. That's why it's a single agent with full context, not a parallel job.
## Agent assignments
### Wave 1 — Preflight
#### Agent P1 (~2 hours)
- Inputs: this plan + the redesign doc.
- Deliverables:
  - `src/agents/swarmDbConnection.ts` — small helper that opens `SWARM_DB_PATH` (or `~/.swarm-mcp/swarm.db`) with WAL + a 3s busy timeout via `bun:sqlite`. Exported for reuse by Phase 1 and Phase 3. A sketch follows this list.
  - `src/agents/swarmDbConnection.test.ts` — verifies WAL mode + busy timeout + concurrent open.
  - Schema-snapshot test: spawn `bun run /path/to/swarm-mcp/src/index.ts` once, then assert the `instances` table has `id, scope, directory, root, file_root, pid, label, adopted, heartbeat, registered_at`.
  - Settings resolver `getSwarmDbPath(settings)` if not already present.
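A minimal sketch of what that helper could look like, assuming `bun:sqlite` and the `SWARM_DB_PATH` fallback described above (the exported names are illustrative, not a final API):

```ts
// Hypothetical sketch of src/agents/swarmDbConnection.ts; names are assumptions.
import { Database } from "bun:sqlite";
import { homedir } from "node:os";
import { join } from "node:path";

const DEFAULT_DB_PATH = join(homedir(), ".swarm-mcp", "swarm.db");

/** SWARM_DB_PATH wins so tests can point every connection at an isolated temp DB. */
export function resolveSwarmDbPath(): string {
  return process.env.SWARM_DB_PATH ?? DEFAULT_DB_PATH;
}

export function openSwarmDb(path: string = resolveSwarmDbPath()): Database {
  const db = new Database(path, { create: true });
  // WAL lets Clanky's peer and spawned workers read/write concurrently;
  // the 3s busy timeout keeps writers from failing fast under contention.
  db.exec("PRAGMA journal_mode = WAL;");
  db.exec("PRAGMA busy_timeout = 3000;");
  return db;
}
```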
#### Agent P2 (~2 hours)
- Inputs: this plan + `src/agents/codeAgentSwarm.ts`.
- Deliverables:
  - Updated `buildSwarmLabel` in `src/agents/codeAgentSwarm.ts:66` to emit `origin:clanky provider:<harness> role:<role> thread:<channelId> user:<userId>`.
  - New `docs/architecture/swarm-worker-contract.md` describing what every Clanky-spawned worker must do (auto-adopt, claim/update task, post result + metadata, error/exit semantics, progress via `annotate`).
  - Shared fake-harness fixture at `src/agents/__fixtures__/fakeSwarmWorker.ts` — a small Bun script Wave 2A and 2B will both use in tests. It registers via swarm-mcp adoption, optionally claims a task, optionally posts a fake result, and exits. Parameterized via env vars (`FAKE_WORKER_BEHAVIOR=adopt_then_exit | claim_and_complete | hang | etc.`). A sketch follows this list.
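A rough shape for the fixture, assuming the worker adopts by writing to the shared DB through the Wave 1 helper. The `SWARM_INSTANCE_ID` env var and both table shapes are assumptions to be aligned with swarm-mcp's real adoption path:

```ts
// Hypothetical sketch of src/agents/__fixtures__/fakeSwarmWorker.ts.
// Column names follow the instances schema asserted by the Wave 1
// schema-snapshot test; the tasks-table shape is a guess.
import { openSwarmDb } from "../swarmDbConnection";

const db = openSwarmDb();
const instanceId = process.env.SWARM_INSTANCE_ID!; // assumed: injected by the launcher
const behavior = process.env.FAKE_WORKER_BEHAVIOR ?? "adopt_then_exit";

// Adopt the row the launcher reserved for us, as a real harness would.
db.query("UPDATE instances SET adopted = 1, pid = ?, heartbeat = ? WHERE id = ?")
  .run(process.pid, Date.now(), instanceId);

switch (behavior) {
  case "adopt_then_exit":
    break; // exercises the happy adoption path only
  case "claim_and_complete":
    // Claim whatever task was assigned to us and post a fake result.
    db.query("UPDATE tasks SET status = 'done', result = ? WHERE assignee = ?")
      .run(JSON.stringify({ ok: true, note: "fake result" }), instanceId);
    break;
  case "hang":
    // Never exit: lets tests exercise cancellation and timeout paths.
    setInterval(() => {}, 60_000);
    break;
}
```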
Both agents land independently. Merge order doesn't matter.
### Wave 2A — Worker spawn track
#### Agent W (~1.5 days, sequential within the track)
- Inputs: Wave 1 merged. Reads `src/agents/codeAgentWorkspace.ts`, `src/llm/llmClaudeCode.ts`, `src/llm/llmCodexCli.ts`, and `swarm-mcp/apps/swarm-ui/src-tauri/src/writes.rs` (as reference).
- Deliverables (in order):
  - Phase 1: `src/agents/swarmDb.ts` (`reserveInstance`, `heartbeatUnadopted`, `deleteUnadopted`, `fullDeregister`) + `swarmReservationKeeper.ts` + tests using the fake harness.
  - Phase 2: `src/agents/swarmLauncher.ts` exporting `spawnPeer({...}) → SpawnedPeer`. Wires Phase 1 + workspace provisioning + env injection + adoption polling. Includes the new first-turn preamble builder. A skeleton of the flow follows this list.
  - Updates `src/agents/codeAgentSwarm.ts` to drop `applyCodeAgentFirstTurnPreamble`'s register-instructions in favor of a behavioral-only preamble.
  - Tests: full reserve → spawn → adopt → exit using `fakeSwarmWorker.ts`. Adoption-timeout test. Cancellation test.
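An illustrative skeleton of the reserve → spawn → poll-for-adoption sequence. The option and return shapes, the `isAdopted` helper, and the timeout default are assumptions; only the sequence itself comes from the plan:

```ts
// Hypothetical skeleton of spawnPeer in src/agents/swarmLauncher.ts.
// isAdopted and the option/return shapes are assumed, not final.
import { reserveInstance, deleteUnadopted, isAdopted } from "./swarmDb";

export interface SpawnPeerOptions {
  scope: string;
  cwd: string;
  label: string;           // from buildSwarmLabel
  harnessCmd: string[];    // e.g. the claude/codex CLI invocation
  adoptionTimeoutMs?: number;
}

export interface SpawnedPeer {
  instanceId: string;
  pid: number;
}

export async function spawnPeer(opts: SpawnPeerOptions): Promise<SpawnedPeer> {
  // 1. Reserve an unadopted instance row so the worker has an identity to claim.
  const instanceId = reserveInstance(opts.scope, opts.cwd, opts.label);

  // 2. Spawn the harness with the reservation injected via env.
  const child = Bun.spawn(opts.harnessCmd, {
    cwd: opts.cwd,
    env: { ...process.env, SWARM_INSTANCE_ID: instanceId },
  });

  // 3. Poll until the harness flips adopted=1, or clean up and fail.
  const deadline = Date.now() + (opts.adoptionTimeoutMs ?? 30_000);
  while (Date.now() < deadline) {
    if (isAdopted(instanceId)) return { instanceId, pid: child.pid };
    await Bun.sleep(500);
  }
  child.kill();
  deleteUnadopted(instanceId);
  throw new Error(`worker ${instanceId} never adopted its reservation`);
}
```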
### Wave 2B — Peer track
#### Agent P (~1 day, parallel with Wave 2A)
- Inputs: Wave 1 merged. Reads `swarm-mcp/src/{registry,tasks,messages,context,kv,events,paths}.ts` to understand the surface area, then ports/imports the needed pieces.
- Deliverables:
  - Phase 3: `src/agents/swarmPeer.ts` exporting `ClankyPeer` with `sendMessage`, `broadcast`, `pollMessages`, `requestTask`, `assignTask`, `getTask`, `updateTask`, `waitForActivity`, `annotate`.
  - `src/agents/swarmPeerManager.ts` exporting `ClankySwarmPeerManager.ensurePeer(scope, repoRoot, fileRoot)`.
  - Heartbeat loop (10s, mirroring swarm-mcp's own).
  - Tests: peer registers/heartbeats/deregisters; multi-scope isolation; restart recovery (stale peer rows cleanly re-registered).
Implementation choice: pick option A from the redesign plan (embed swarm-mcp DB modules in-process by re-implementing the small subset of operations Clanky calls, against the shared `swarmDbConnection.ts`). Do not spawn a swarm-mcp child for Clanky's own peer — that's option B and adds latency. A sketch of the resulting interface follows.
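The method names below come from the deliverable list above; the signatures and the `SwarmTask` shape are guesses, to be replaced by whatever the port of swarm-mcp's modules actually yields:

```ts
// Guessed interface for src/agents/swarmPeer.ts under option A (in-process,
// against the shared swarmDbConnection). Signatures and SwarmTask are assumptions.
export interface SwarmTask {
  id: string;
  status: "pending" | "claimed" | "done" | "error" | "cancelled";
  assignee?: string;
  result?: unknown;
}

export interface ClankyPeer {
  readonly instanceId: string;
  readonly scope: string;
  sendMessage(toInstanceId: string, body: string): Promise<void>;
  broadcast(body: string): Promise<void>;
  pollMessages(): Promise<Array<{ from: string; body: string }>>;
  requestTask(spec: { title: string; parentTaskId?: string }): Promise<SwarmTask>;
  assignTask(taskId: string, assigneeInstanceId: string): Promise<void>;
  getTask(taskId: string): Promise<SwarmTask | undefined>;
  updateTask(taskId: string, patch: Partial<SwarmTask>): Promise<void>;
  waitForActivity(opts?: { timeoutMs?: number }): Promise<void>;
  annotate(note: string): Promise<void>;
}
```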
### Wave 3 — Integration
#### Agent I (~1 day, sequential after 2A and 2B both merged)
- Inputs: everything in 2A + 2B + the redesign plan's Phase 4 spec.
- Deliverables:
  - `src/tools/spawnCodeWorker.ts` — the new tool handler. Owns the permissions gate, resource caps, cwd resolution, and the `peerManager.ensurePeer → peer.requestTask → spawnPeer → peer.assignTask` sequence (sketched after this list). Returns `{ workerId, taskId, scope }`. ~50 lines.
  - `src/tools/sharedToolSchemas.ts` — add the `spawn_code_worker` tool schema. Add the conditional swarm-mcp tool schemas (`request_task`, `wait_for_activity`, `get_task`, `update_task`, `send_message`, `broadcast`, `annotate`, `lock_file`, `unlock_file`, `check_file`, `list_instances`, `whoami`, `kv_*`). Do not delete `code_task` yet — Wave 5 does that.
  - `src/tools/toolRegistry.ts` — extend the conditional-tool pattern to gate the swarm-mcp tool surface behind `permissions.devTasks.allowedUserIds` + the dev-channel allowlist. Both `spawn_code_worker` and the swarm tools mount/unmount together for a given turn.
  - Each conditional swarm-mcp tool is a thin wrapper around the corresponding `peer.*` method — they all resolve through the per-scope planner peer, so a single peer identity speaks for the orchestrator across the turn.
  - `src/agents/swarmActivityBridge.ts` — runtime-side subscription registered once per active planner-peer scope. Watches swarm task events; emits `code_task_progress`/`code_task_result` synthetic messages into the reply pipeline; routes voice-realtime completions through `VoiceSessionManager.requestRealtimeCodeTaskFollowup(...)`. ~120 lines.
  - `src/agents/swarmTaskWaiter.ts` — small helper for `wait_for_activity`-style blocking used internally by both the conditional swarm tool and the `/clank code` slash command. Returns the `SubAgentTurnResult`-shaped object.
  - Wire `peerManager`, `swarmReservationKeeper`, and `swarmActivityBridge` into `src/bot.ts:353-426` runtime construction. All three are lifecycle-managed alongside `subAgentSessions`.
  - Tests: spawn happy path, cancel via `update_task`, timeout, permission gating (a non-dev user sees neither `spawn_code_worker` nor the swarm tools), resource-cap rejection before any DB writes — all using the Wave 1 fake harness.
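A hedged sketch of the handler's core sequence (`ensurePeer → requestTask → spawnPeer → assignTask`). The argument shape, the `peerManager` import, and the commented-out guard helpers are assumptions:

```ts
// Hypothetical core of src/tools/spawnCodeWorker.ts. Guard helpers are stubbed
// as comments; everything not named in the plan is illustrative.
import { spawnPeer } from "../agents/swarmLauncher";
import { peerManager } from "../agents/swarmPeerManager"; // assumed shared instance

export async function spawnCodeWorker(args: {
  scope: string;
  repoRoot: string;
  fileRoot: string;
  prompt: string;
  label: string;          // built upstream via buildSwarmLabel
  harnessCmd: string[];
}): Promise<{ workerId: string; taskId: string; scope: string }> {
  // Permissions gate + resource caps run first, before any DB writes,
  // so a rejected call leaves no orphaned reservation or task rows.
  // assertDevUserAllowed(...); assertUnderWorkerCap(...);  // hypothetical guards

  const peer = await peerManager.ensurePeer(args.scope, args.repoRoot, args.fileRoot);
  const task = await peer.requestTask({ title: args.prompt });
  const worker = await spawnPeer({
    scope: args.scope,
    cwd: args.repoRoot,
    label: args.label,
    harnessCmd: args.harnessCmd,
  });
  await peer.assignTask(task.id, worker.instanceId);
  return { workerId: worker.instanceId, taskId: task.id, scope: args.scope };
}
```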
### Wave 4 — Soak
No agent. The operator enables the new tool surface (`spawn_code_worker` + conditional swarm tools) for owner-only Discord users via `agentStack.overrides.devTasks.allowedUserIds`, runs for one release cycle, and watches:
- Adoption-failure rate
- Median `spawn_code_worker` return → terminal task event time
- Error classification for `spawn_code_worker` + swarm-tool runs, compared against pre-cutover baselines where available
- Cost/usage drift (worker self-report vs receipts)
- Orchestrator behavior signals: how often does the model use `send_message` followups vs spawning fresh workers? `request_task` with `parent_task_id`? Multi-worker fan-out?
### Wave 5 — Deletion
#### Agent D (~half day, after soak passes)
- Deliverables: Phase 6. Delete the `code_task` schema from `sharedToolSchemas.ts` and the entire `executeCodeTask` block in `replyTools.ts`. Delete most of `src/agents/codeAgent.ts`, plus `codexCliAgent.ts` and `backgroundTaskRunner.ts`. Shrink `codeAgentSwarm.ts`. Move `resolveCodeAgentConfig` + `isCodeAgentUserAllowed` to `codeAgentSettings.ts`. Trim `subAgentSession.ts` to drop the `"code"` type variant. Trim `llmClaudeCode.ts` and `llmCodexCli.ts` to keep only the arg-builders. Rewire `/clank code` and the dashboard form to call `spawn_code_worker` + `wait_for_activity` server-side. Update all remaining call sites.
This is mechanical. One agent, single PR, no parallelism needed.
## Coordination mechanics
### Worktrees, not the same checkout
Each Wave 2 agent works in its own git worktree off `main`:

```
clanky/                         # operator
clanky-worktrees/wave2a-spawn/  # Agent W
clanky-worktrees/wave2b-peer/   # Agent P
```
These worktrees are operator-managed at development time, not Clanky-spawned at runtime. The redesign drops the runtime `isolated_worktree` workspace mode (Clanky no longer creates worktrees per worker); manually creating worktrees here is just a normal multi-agent dev workflow.
Reasons:
- The redesign deletes a chunk of `src/agents/` later. Agents shouldn't see each other's WIP.
- Both agents get a clean branch off `main`, so neither inherits the other's unmerged commits.
### Shared fixtures live on main
`fakeSwarmWorker.ts` (Wave 1 P2) lands on `main` before Wave 2 starts. Both Wave 2 agents pull from `main`, so both have the fixture. No cross-branch dependency.
### Merge order between 2A and 2B
Either order works — they don't touch each other's files. Whichever PR is reviewed first lands first. The second rebases onto `main` (no conflicts expected).
### Merge sequence for Wave 3
Wave 3 (integration) requires both 2A and 2B merged. If 2B is delayed, Wave 3 can start against 2A only and stub `peer.requestTask`/`peer.waitForActivity` — but this is wasteful, so prefer to wait.
### Optional: dogfood by running the agents through swarm-mcp itself
Each Wave 2 agent registers in a swarm scope at the clanky repo root with `role:implementer name:wave2a` / `name:wave2b`. They use `lock_file` before editing the few shared files (`codeAgentSwarm.ts`, `settingsSchema.ts`). They post `annotate` calls as they finish each phase. The operator (or a `role:planner` peer) watches via swarm-ui.
This is meta — using swarm-mcp to build clanky's swarm-mcp integration — and surfaces real ergonomic issues. Recommended but optional.
## Headcount and wall-clock estimate
| Wave | Agents | Wall time | Reason for shape |
|---|---|---|---|
| 1 | 2 parallel | ~3 hours | Both tracks small and independent |
| 2 | 2 parallel (W + P) | ~1.5 days | Bottlenecked by Wave 2A, which runs its two phases sequentially |
| 3 | 1 | ~1 day | Integration must see everything |
| 4 | 0 (operator) | release cycle | Soak, not coding |
| 5 | 1 | ~0.5 day | Mechanical deletion |
Total wall time (excluding soak): ~3.5 days with 2–3 concurrent agents at peak. Serial baseline (one agent, no parallelism): ~7–10 days. Parallelism gain: roughly 2x.
You don't get more than 2x because Phase 4 is a hard serialization point. Spending agents on 4 different sub-tasks within Phase 4 costs more in coordination than it saves in time.
## Risks specific to parallel execution
| Risk | Mitigation |
|---|---|
| Wave 2A and 2B both touch `swarmDbConnection.ts` differently | Wave 1 P1 ships this first as a stable, tested helper. No edits in Wave 2. |
| Wave 2B stubs the DB helper differently from Wave 2A's real one | Wave 2B uses the real helper from Wave 1. No stubs. |
| Fake harness ships too late | Wave 1 P2 includes it. Wave 2 cannot start without it. |
| Wave 2B drifts from swarm-mcp's actual DB semantics | Re-port from `swarm-mcp/src/registry.ts` etc. directly. Add a "schema parity" test that calls swarm-mcp's register tool and Clanky's `peer.register` against the same DB and asserts identical row state. |
| Two agents both write to the same DB during tests | Each test uses an isolated temp `SWARM_DB_PATH`. Enforce in test fixtures (see the sketch below). |
| Wave 3 integration agent runs out of context loading 2A + 2B | Land 2A and 2B before kicking off Wave 3 so the integration agent reads finalized files, not WIP diffs. |
| Operator becomes the merge bottleneck | Two PRs in Wave 1, two in Wave 2, one in Wave 3, one in Wave 5. Six merges over ~4 days is fine. If reviews block, route through one trusted reviewer agent (`role:reviewer` in the dogfood swarm). |
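One way to enforce that isolation: a test preload that points the Wave 1 helper at a throwaway path before any test opens a connection (the preload location is an assumption):

```ts
// Hypothetical test preload: every openSwarmDb() call in this test process
// resolves to a unique throwaway DB instead of ~/.swarm-mcp/swarm.db.
import { tmpdir } from "node:os";
import { join } from "node:path";

process.env.SWARM_DB_PATH = join(
  tmpdir(),
  `swarm-test-${process.pid}-${Date.now()}.db`,
);
```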
## Bootstrapping the parallel run
If you start now:
- Operator: create git worktrees for `wave1-p1`, `wave1-p2`. Spawn two agents (`/clank code` or external claude/codex). Hand each their Wave 1 deliverable list from this doc.
- Merge both Wave 1 PRs (~3h later).
- Operator: create `wave2a-spawn`, `wave2b-peer` worktrees. Spawn two agents.
- As each PR lands, rebase the other onto `main` (no expected conflicts).
- After both merge, spawn the Wave 3 agent in a fresh worktree.
- Soak over the release cycle.
- Spawn the Wave 5 deletion agent at the end.
Total operator-active touchpoints: ~6 kickoff-and-merge pairs (one kickoff plus one merge per agent). Most of the wall time is agents working in parallel.
