Failure modes and recovery
Status in v0
- Provider spawn errors mark nodes as failed.
- Errors are emitted as `node.progress`/`turn.status` events.
- On daemon start, persisted runs are rehydrated: runs in the `running` state are set to `paused`, and node connections are marked `disconnected`.
- Nodes can be started, stopped, reset, or interrupted via the API.
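The startup rehydration rules above can be sketched as a pure function over the runs loaded from storage. This is a minimal illustration under stated assumptions, not the daemon's code. The `PersistedRun`/`PersistedNode` shapes, the run statuses other than `running` and `paused`, the connection state `connected`, and the `rehydrateRuns` name are all hypothetical.

```typescript
// Hypothetical shapes; the real persisted-run schema is not documented here.
type RunStatus = "running" | "paused" | "completed" | "failed";
type ConnectionState = "connected" | "disconnected";

interface PersistedNode {
  id: string;
  connection: ConnectionState;
}

interface PersistedRun {
  id: string;
  status: RunStatus;
  nodes: PersistedNode[];
}

// Applies the v0 startup rules to runs loaded from storage:
// - a run that was `running` becomes `paused`; other statuses are kept;
// - every node connection is marked `disconnected`.
// Returns new objects and leaves the input untouched.
function rehydrateRuns(persisted: PersistedRun[]): PersistedRun[] {
  return persisted.map((run): PersistedRun => ({
    ...run,
    status: run.status === "running" ? "paused" : run.status,
    nodes: run.nodes.map((node): PersistedNode => ({
      ...node,
      connection: "disconnected",
    })),
  }));
}
```

Because the statuses are only rewritten and never advanced, a run paused this way still needs an explicit resume through the API before it continues.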
What exists
- Run pause/resume: `PATCH /api/runs/:runId` to update the run status.
- Node lifecycle: `/api/runs/:runId/nodes/:nodeId/start|stop|reset|interrupt`
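A caller could drive these endpoints with a small client like the one below. It is a sketch under stated assumptions. The paths come from the list above. The following are all hypothetical and not confirmed by this page:

- using `POST` for the node actions;
- a `{ status }` JSON body for the `PATCH` request;
- `running` as the status that resumes a run;
- the helper names `setRunStatus`, `nodeAction`, and `send`.

```typescript
// Hypothetical client helpers. Paths follow the documented routes; the HTTP
// method for node actions (POST) and the PATCH body shape are assumptions.
type NodeAction = "start" | "stop" | "reset" | "interrupt";

interface ApiRequest {
  method: "PATCH" | "POST";
  path: string;
  body?: unknown;
}

// Pause or resume a run by patching its status (body shape assumed).
function setRunStatus(runId: string, status: "paused" | "running"): ApiRequest {
  return {
    method: "PATCH",
    path: `/api/runs/${encodeURIComponent(runId)}`,
    body: { status },
  };
}

// Build a lifecycle request for a single node (POST is an assumption).
function nodeAction(runId: string, nodeId: string, action: NodeAction): ApiRequest {
  return {
    method: "POST",
    path: `/api/runs/${encodeURIComponent(runId)}/nodes/${encodeURIComponent(nodeId)}/${action}`,
  };
}

// Send a request with the global fetch (Node 18+ or a browser) and
// throw on any non-2xx response.
async function send(baseUrl: string, req: ApiRequest): Promise<Response> {
  const res = await fetch(new URL(req.path, baseUrl), {
    method: req.method,
    headers: req.body !== undefined ? { "Content-Type": "application/json" } : undefined,
    body: req.body !== undefined ? JSON.stringify(req.body) : undefined,
  });
  if (!res.ok) {
    throw new Error(`${req.method} ${req.path} failed with HTTP ${res.status}`);
  }
  return res;
}
```

For example, `send(baseUrl, nodeAction(runId, nodeId, "reset"))` would reset one node, and `send(baseUrl, setRunStatus(runId, "running"))` would resume a run that the daemon paused on startup, if the assumptions above hold. Keeping request construction separate from sending makes the path and body logic testable without a running daemon.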
Not implemented in v0
- Automatic retries with backoff
- Structured failure artifacts
- Merge conflict reconciliation
- Loop-stall auto escalation beyond pause
