docs/01-product-spec.md

Product Spec

Purpose

Build a local-first, graph-based orchestration harness for multiple coding agents. The system should make agent autonomy safe, visible, and controllable without forcing a fixed pipeline. Users can build custom workflows by wiring agent nodes together, while an orchestrator can supervise and coordinate complex tasks when needed.

Product vision

Customizable orchestration: Users build their own workflows by connecting agent nodes in a graph.
Agent autonomy: Each node acts as an autonomous CLI agent that can exchange handoffs with peers.
Orchestrator as supervisor: The orchestrator can act as the user's delegate to command many agents, but it should not be required for simple tasks.
Visibility over limits: We prioritize full visibility and stall detection over hard caps on run length.
Local-first privacy: All runs, events, and artifacts live on the user's machine.

Goals

Provide a flexible orchestration layer that works with existing CLI agents (Codex, Claude Code, Gemini, and others).
Enable fast, safe iteration by letting agents hand off work and collaborate through structured payloads.
Maintain a complete, readable event log of everything the system did and why.
Support both skip-permissions and non-skip-permissions modes for all providers, with skip-permissions as the default.
Allow agents to apply changes directly, while still giving the orchestrator and user the ability to review.

Non-goals

Cloud hosting, multi-tenant collaboration, or remote runners.
Automated merge/conflict resolution. Agents must instead be prompted to avoid conflicting edits, and the orchestrator reviews the outcomes.
Hidden agent behavior. Every action should be observable.

Mobile companion

v0 includes a mobile companion (Expo / React Native) for run monitoring, approvals, and node inspection.

Core concepts (product-level)

Run: A single orchestration session with a graph of nodes and edges.
Node: A single agent session (CLI process or wrapper) with its own context and tool access.
Edge: A data link between nodes that agents can use when sending handoffs.
Orchestrator: A supervisor node that can delegate, review, and reconcile.
Global workflow mode: Planning vs Implementation. Planning is docs + research only. Implementation allows code edits.
Orchestration mode: Auto vs Interactive. Auto can re-prompt the orchestrator to achieve the goal. Interactive pauses orchestration for user input.
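These concepts map onto a small data model. The TypeScript sketch below is one possible shape, not a published schema: the names `Run`, `NodeSpec`, `EdgeSpec`, and `validateRun`, the provider identifiers, and the choice to treat edges as directed are all assumptions. The orchestrator is modeled as an optional node role, because simple runs need not include one.

```typescript
// Hypothetical data model for runs, nodes, and edges. All names are illustrative.
type WorkflowMode = "planning" | "implementation";
type OrchestrationMode = "auto" | "interactive";

interface NodeSpec {
  id: string;
  provider: string; // e.g. "codex", "claude-code", "gemini" (assumed identifiers)
  role: "orchestrator" | "agent"; // an orchestrator node is optional
}

interface EdgeSpec {
  from: string; // source node id (edges assumed directed)
  to: string;   // target node id
}

interface Run {
  id: string;
  workflowMode: WorkflowMode;
  orchestrationMode: OrchestrationMode;
  nodes: NodeSpec[];
  edges: EdgeSpec[];
}

// Return one error message per edge endpoint that does not name an existing node.
function validateRun(run: Run): string[] {
  const ids = new Set(run.nodes.map((n) => n.id));
  const errors: string[] = [];
  for (const e of run.edges) {
    if (!ids.has(e.from)) errors.push(`edge source not found: ${e.from}`);
    if (!ids.has(e.to)) errors.push(`edge target not found: ${e.to}`);
  }
  return errors;
}

const run: Run = {
  id: "run-1",
  workflowMode: "planning",
  orchestrationMode: "interactive",
  nodes: [
    { id: "orch", provider: "codex", role: "orchestrator" },
    { id: "coder", provider: "claude-code", role: "agent" },
  ],
  edges: [
    { from: "orch", to: "coder" },
    { from: "coder", to: "reviewer" }, // "reviewer" was never created
  ],
};

console.log(validateRun(run)); // one error, for the missing "reviewer" node
```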

System overview

ASCII overview:

User
  |
  v
UI (graph + inspector) <----> Daemon (scheduler + event log)
  |                                  |
  |                                  v
  |                              Providers (CLI agents)
  |                                  |
  v                                  v
Workspace / Repo <---------------- Event + Artifact Store

Experience principles

Graph-first: The graph is the workflow. There is no fixed pipeline.
Autonomy with constraints: Nodes can work independently but must follow role-based permissions.
Consent and review: Risky actions can require approval, and the orchestrator can always review the results.
Determinism where possible: Inputs, outputs, and handoffs are logged for replay.
Operational clarity: The UI should show what each agent is doing, what it consumed, and what it produced.

Functional requirements (high-level)

Multi-provider orchestration

Support multiple CLI providers with session continuity.
Prefer true stateful CLI sessions (one long-lived process per node) to preserve runtime state. This applies to Claude Code, Codex (local fork via codex vuhlp), and Gemini (only when using a fork with stdin streaming).
Unified event model for messages, tools, diffs, and approvals.
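A unified event model is naturally expressed as a discriminated union. The sketch below assumes that each provider adapter normalizes its CLI output into these shapes; the names `RunEvent`, `BaseEvent`, and `summarize` and every field name are hypothetical. The `never` check makes the compiler flag any new event kind that the UI forgets to handle.

```typescript
// Hypothetical unified event model; field names are assumptions.
interface BaseEvent {
  runId: string;
  nodeId: string;
  ts: number; // milliseconds since epoch
}

type RunEvent =
  | (BaseEvent & { kind: "message"; role: "user" | "assistant"; text: string })
  | (BaseEvent & { kind: "tool"; tool: string; input: unknown; output?: string })
  | (BaseEvent & { kind: "diff"; path: string; patch: string })
  | (BaseEvent & { kind: "approval"; requestId: string; description: string });

// One-line summary for the UI; exhaustive over every event kind.
function summarize(e: RunEvent): string {
  switch (e.kind) {
    case "message":
      return `[${e.nodeId}] ${e.role}: ${e.text}`;
    case "tool":
      return `[${e.nodeId}] tool ${e.tool}`;
    case "diff":
      return `[${e.nodeId}] diff ${e.path}`;
    case "approval":
      return `[${e.nodeId}] approval requested: ${e.description}`;
    default: {
      // Compile-time error here if a new kind is added without a case.
      const unreachable: never = e;
      throw new Error(`unknown event: ${JSON.stringify(unreachable)}`);
    }
  }
}

const events: RunEvent[] = [
  { runId: "r1", nodeId: "coder", ts: 1, kind: "message", role: "assistant", text: "Starting task" },
  { runId: "r1", nodeId: "coder", ts: 2, kind: "diff", path: "src/app.ts", patch: "+line" },
];
for (const e of events) console.log(summarize(e));
// [coder] assistant: Starting task
// [coder] diff src/app.ts
```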

Graph workflow builder

Create nodes, connect edges, and run workflows.
Inputs are auto-consumed when delivered.
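One way to read "auto-consumed" is that delivering a handoff over an edge immediately hands it to the target node, with no manual trigger in between. The sketch below illustrates that under several assumptions: `HandoffRouter`, `Handoff`, and the payload fields are hypothetical names, edges are treated as directed, and "consuming" is reduced to recording the handoff instead of prompting a real agent.

```typescript
// Hypothetical handoff router; names and payload fields are assumptions.
interface Handoff {
  from: string; // sending node id
  to: string;   // receiving node id
  payload: { summary: string; artifacts: string[] }; // structured handoff body
}

class HandoffRouter {
  // Directed adjacency: source node id -> set of target node ids.
  private edges = new Map<string, Set<string>>();
  private consumed: Handoff[] = [];

  connect(from: string, to: string): void {
    const targets = this.edges.get(from) ?? new Set<string>();
    targets.add(to);
    this.edges.set(from, targets);
  }

  // Deliver a handoff over an existing edge; returns false if no edge exists.
  deliver(h: Handoff): boolean {
    if (!this.edges.get(h.from)?.has(h.to)) return false;
    this.consume(h); // auto-consume: processing starts on delivery, no manual step
    return true;
  }

  // Stand-in for prompting the target agent; here we only record the handoff.
  private consume(h: Handoff): void {
    this.consumed.push(h);
  }

  history(): readonly Handoff[] {
    return this.consumed;
  }
}

const router = new HandoffRouter();
router.connect("planner", "coder");
const delivered = router.deliver({
  from: "planner",
  to: "coder",
  payload: { summary: "Implement the parser", artifacts: ["docs/plan.md"] },
});
const rejected = router.deliver({ from: "coder", to: "planner", payload: { summary: "Done", artifacts: [] } });
console.log(delivered, rejected, router.history().length); // true false 1
```

Rejecting handoffs that have no matching edge keeps the graph authoritative: agents can only talk to the peers the user wired up.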

Orchestration modes

Auto: The orchestrator can be re-prompted in a loop to pursue the run goal.
Interactive: The orchestrator waits for user prompts; other nodes still process incoming inputs.
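The two modes can be reduced to a small decision made after each orchestrator turn. This is a minimal sketch, assuming the orchestrator reports a hypothetical `goalReached` flag; the function names and `NextStep` values are illustrative, not part of the spec.

```typescript
// Hypothetical gating of the orchestrator loop by orchestration mode.
type OrchestrationMode = "auto" | "interactive";

interface TurnResult {
  // Assumed to be reported by the orchestrator at the end of each turn.
  goalReached: boolean;
}

type NextStep = "reprompt-orchestrator" | "wait-for-user" | "done";

// Decide what the daemon does after the orchestrator finishes a turn.
function nextOrchestratorStep(mode: OrchestrationMode, turn: TurnResult): NextStep {
  if (turn.goalReached) return "done";
  // Auto keeps pursuing the goal; Interactive pauses for the user.
  return mode === "auto" ? "reprompt-orchestrator" : "wait-for-user";
}

// Non-orchestrator nodes always process delivered inputs; in Interactive
// mode the orchestrator acts only on user prompts.
function shouldAutoProcessInput(isOrchestrator: boolean, mode: OrchestrationMode): boolean {
  return !isOrchestrator || mode === "auto";
}

console.log(nextOrchestratorStep("auto", { goalReached: false })); // reprompt-orchestrator
console.log(nextOrchestratorStep("interactive", { goalReached: false })); // wait-for-user
```

Note that nothing here caps the number of Auto iterations; in keeping with the spec, runaway loops would be caught by stall detection rather than a fixed limit.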

Planning vs Implementation

Planning: read-only repo access; writes limited to docs (configurable).
Implementation: code edits allowed; docs can also be updated.
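A write check for the two workflow modes might look like the sketch below. It assumes the configurable docs scope is a list of repo-relative directory prefixes (`docsDirs`); `canWrite`, `normalize`, and `WritePolicy` are hypothetical names. Rejecting absolute paths and `..` segments in both modes is an extra assumption added to keep writes inside the repo, and the path handling is deliberately simplified (for example, Windows drive letters are not handled).

```typescript
// Hypothetical write-permission check for Planning vs Implementation.
type WorkflowMode = "planning" | "implementation";

interface WritePolicy {
  // Repo-relative directory prefixes treated as docs (configurable; assumed).
  docsDirs: string[];
}

// Normalize a repo-relative path: forward slashes, no leading "./".
function normalize(path: string): string {
  return path.replace(/\\/g, "/").replace(/^(\.\/)+/, "");
}

function canWrite(mode: WorkflowMode, path: string, policy: WritePolicy): boolean {
  const p = normalize(path);
  // Reject absolute paths and ".." segments in both modes (stay inside the repo).
  if (p.startsWith("/") || p.split("/").includes("..")) return false;
  if (mode === "implementation") return true; // code and docs edits allowed
  // Planning: only paths under a configured docs directory are writable.
  return policy.docsDirs.some((dir) => p.startsWith(dir.endsWith("/") ? dir : dir + "/"));
}

const policy: WritePolicy = { docsDirs: ["docs"] };
console.log(canWrite("planning", "docs/plan.md", policy)); // true
console.log(canWrite("planning", "src/main.ts", policy)); // false
```

Matching on `dir + "/"` rather than the bare prefix keeps a sibling such as `docsextra/` from being treated as docs.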

Loop safety without hard caps

Detect stalls and unproductive loops from signals such as repeated outputs, unchanged diffs, or no new artifacts.
On stall, pause orchestration and notify the user with evidence.
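The three stall signals can be checked over a sliding window of recent cycles, with each firing signal recorded as evidence for the user. The sketch below is an illustration under assumptions: `CycleSnapshot`, `detectStall`, the precomputed `diffHash`, the window size of 3, and the `minSignals` threshold are all hypothetical choices, not values from this spec.

```typescript
// Hypothetical stall detector; field names and thresholds are assumptions.
interface CycleSnapshot {
  output: string;        // agent output text for one cycle
  diffHash: string;      // fingerprint of the working-tree diff after the cycle (assumed precomputed)
  artifactCount: number; // total artifacts produced in the run so far
}

interface StallReport {
  stalled: boolean;
  evidence: string[]; // human-readable reasons shown to the user on pause
}

// Examine the last `window` cycles. Each of the three signals counts once;
// a stall is reported when at least `minSignals` (should be >= 1) fire.
function detectStall(history: CycleSnapshot[], window = 3, minSignals = 1): StallReport {
  if (history.length < window) return { stalled: false, evidence: [] };
  const recent = history.slice(-window);
  const first = recent[0];
  const evidence: string[] = [];
  if (recent.every((c) => c.output === first.output)) {
    evidence.push(`identical output for the last ${window} cycles`);
  }
  if (recent.every((c) => c.diffHash === first.diffHash)) {
    evidence.push(`diff unchanged for the last ${window} cycles`);
  }
  if (recent.every((c) => c.artifactCount === first.artifactCount)) {
    evidence.push(`no new artifacts in the last ${window} cycles`);
  }
  return { stalled: evidence.length >= minSignals, evidence };
}

const looping: CycleSnapshot[] = [
  { output: "Retrying build...", diffHash: "d41d", artifactCount: 2 },
  { output: "Retrying build...", diffHash: "d41d", artifactCount: 2 },
  { output: "Retrying build...", diffHash: "d41d", artifactCount: 2 },
];
const report = detectStall(looping);
if (report.stalled) console.log("Pausing run:", report.evidence.join("; "));
```

The default `minSignals = 1` follows the spec's "or". A research-heavy Planning run may legitimately leave the diff unchanged for several cycles, so raising `minSignals` to 2 is one way to trade sensitivity for fewer false pauses.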

Approval handling

Support skip-permissions (default) and non-skip-permissions modes for each provider.
Forward provider approval requests to the UI and allow user responses.
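Forwarding approvals amounts to parking each provider request until the UI answers it. The sketch below assumes a hypothetical `ApprovalBroker` with `request`/`respond` methods and a UI callback; none of these names come from a provider's API. In skip-permissions mode it approves immediately, mirroring the default. Depending on the provider, that mode may instead be passed to the CLI itself, in which case few requests would reach the broker at all.

```typescript
// Hypothetical approval broker between provider adapters and the UI.
// "skip" stands for skip-permissions mode (the default), "ask" for non-skip.
type PermissionMode = "skip" | "ask";
type Decision = "approve" | "deny";

interface ApprovalRequest {
  id: string;
  nodeId: string;
  action: string; // human-readable description forwarded to the UI
}

class ApprovalBroker {
  private mode: PermissionMode;
  private notifyUi: (req: ApprovalRequest) => void;
  private pending = new Map<string, (d: Decision) => void>();

  constructor(mode: PermissionMode = "skip", notifyUi: (req: ApprovalRequest) => void = () => {}) {
    this.mode = mode;
    this.notifyUi = notifyUi;
  }

  // Called by a provider adapter. In skip mode the request is approved
  // immediately; otherwise it waits until the user responds in the UI.
  request(req: ApprovalRequest): Promise<Decision> {
    if (this.mode === "skip") return Promise.resolve<Decision>("approve");
    return new Promise<Decision>((resolve) => {
      this.pending.set(req.id, resolve);
      this.notifyUi(req);
    });
  }

  // Called by the UI. Returns false if the id is unknown or already answered.
  respond(id: string, decision: Decision): boolean {
    const resolve = this.pending.get(id);
    if (!resolve) return false;
    this.pending.delete(id);
    resolve(decision);
    return true;
  }
}

const shown: ApprovalRequest[] = [];
const broker = new ApprovalBroker("ask", (r) => { shown.push(r); });
const decision = broker.request({ id: "a1", nodeId: "coder", action: "run npm install" });
broker.respond("a1", "deny");
decision.then((d) => console.log(`a1 -> ${d}`)); // a1 -> deny
```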

Complete observability

Event log for every run, including prompts, tool usage, and outputs.
Artifacts for diffs, logs, and transcripts.
Ability to reset a node's context quickly (clear session and start fresh).
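An append-only log and a fast node reset fit together if the reset itself is logged as an event. The in-memory sketch below assumes hypothetical names (`RunLog`, `LoggedEvent`, `NodeSession`, `resetNode`) and JSON Lines as the on-disk format; a real daemon would persist the log to local storage.

```typescript
// Hypothetical in-memory sketch; a real daemon would persist to local disk.
interface LoggedEvent {
  seq: number;    // monotonically increasing position in the run's log
  nodeId: string;
  kind: string;   // e.g. "prompt", "tool", "output", "reset"
  data: string;
}

class RunLog {
  private events: LoggedEvent[] = [];

  // Events are only ever appended, never edited or removed.
  append(nodeId: string, kind: string, data: string): LoggedEvent {
    const event: LoggedEvent = { seq: this.events.length, nodeId, kind, data };
    this.events.push(event);
    return event;
  }

  // Serialize as JSON Lines (one JSON object per line) for a local log file.
  toJsonl(): string {
    return this.events.map((e) => JSON.stringify(e)).join("\n");
  }
}

interface NodeSession {
  nodeId: string;
  sessionId: string;
  transcript: string[];
}

// Reset gives the node a fresh, empty session but records the reset in the
// log, so the earlier history stays visible and replayable.
function resetNode(session: NodeSession, log: RunLog, newSessionId: string): NodeSession {
  log.append(session.nodeId, "reset", `cleared session ${session.sessionId}`);
  return { nodeId: session.nodeId, sessionId: newSessionId, transcript: [] };
}

const log = new RunLog();
log.append("coder", "prompt", "Fix the failing test");
let session: NodeSession = { nodeId: "coder", sessionId: "s1", transcript: ["Fix the failing test"] };
session = resetNode(session, log, "s2");
console.log(log.toJsonl());
```

Logging the reset instead of truncating the log keeps the "nothing hidden" guarantee: the node starts fresh, but the run's history still shows what it did before.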

Success criteria

A developer can rebuild the system from these docs and match or improve on its current behavior.
A user can run a complex task with multiple agents, see everything, and intervene safely.
The system prevents useless loops without hard-stopping long productive sessions.

Open questions (track here as needed)

None. Update as product decisions evolve.