%% docs/diagrams/voice-assistant-output-state.mmd

stateDiagram-v2
direction LR

state "Idle" as Idle
state "Response Pending" as ResponsePending
state "Awaiting Tool Outputs" as AwaitingToolOutputs
state "Speaking Live" as SpeakingLive
state "Speaking Buffered" as SpeakingBuffered

[*] --> Idle
Idle --> ResponsePending: "reply requested"
ResponsePending --> AwaitingToolOutputs: "tool call emitted"
AwaitingToolOutputs --> ResponsePending: "tool outputs submitted, follow-up requested"
ResponsePending --> SpeakingLive: "audio delta received"
ResponsePending --> SpeakingBuffered: "buffered TTS exists (API TTS / queued playback)"
SpeakingLive --> SpeakingBuffered: "live deltas stop, clankvox still buffered"
SpeakingLive --> Idle: "reply drained, no buffered playback"
SpeakingBuffered --> Idle: "clankvox idle, buffer depth = 0"
ResponsePending --> Idle: "silent response cleared or stale active response recovered"
AwaitingToolOutputs --> Idle: "request cancelled or session ends"
SpeakingLive --> Idle: "barge-in or stop playback"
SpeakingBuffered --> Idle: "barge-in or stop playback"

note right of ResponsePending
  Bun owns the canonical phase.
  OpenAI realtime contributes request/response signals.
end note

note right of SpeakingBuffered
  clankvox contributes
  tts_playback_state and buffer_depth.
  Bun still owns the phase.
  Buffered playback can enter here
  without a prior live-delta phase.
end note