%% docs/diagrams/voice-assistant-output-state.mmd
stateDiagram-v2
direction LR
state "Idle" as Idle
state "Response Pending" as ResponsePending
state "Awaiting Tool Outputs" as AwaitingToolOutputs
state "Speaking Live" as SpeakingLive
state "Speaking Buffered" as SpeakingBuffered
[*] --> Idle
Idle --> ResponsePending: "reply requested"
ResponsePending --> AwaitingToolOutputs: "tool call emitted"
AwaitingToolOutputs --> ResponsePending: "tool outputs submitted, follow-up requested"
ResponsePending --> SpeakingLive: "audio delta received"
ResponsePending --> SpeakingBuffered: "buffered TTS exists (API TTS / queued playback)"
SpeakingLive --> SpeakingBuffered: "live deltas stop, clankvox still buffered"
SpeakingLive --> Idle: "reply drained, no buffered playback"
SpeakingBuffered --> Idle: "clankvox idle, buffer depth = 0"
ResponsePending --> Idle: "silent response cleared or stale active response recovered"
AwaitingToolOutputs --> Idle: "request cancelled or session ends"
SpeakingLive --> Idle: "barge-in or stop playback"
SpeakingBuffered --> Idle: "barge-in or stop playback"
note right of ResponsePending
Bun owns the canonical phase.
OpenAI realtime contributes request/response signals.
end note
note right of SpeakingBuffered
clankvox contributes
tts_playback_state and buffer_depth.
Bun still owns the phase.
Buffered playback can enter here
without a prior live-delta phase.
end note
