ClankVox
ClankVox is Clanky's main Rust package for voice and media transport. It is the native media plane Clanky delegates to when a platform needs realtime sockets, codec work, packet timing, encryption, or low-level media telemetry.
ClankVox handles Discord voice and Go Live, and future platform-specific voice/media transports should live at the same layer.
Clanky stays agentic: prompts, settings, gateway control, Realtime sessions, tools, Pi delegation, and product behavior. ClankVox stays deterministic: platform media transport mechanics, RTP/RTCP, codecs, transport encryption, playback pacing, media capture/publish, and IPC. For Discord, that means Opus, DAVE, H264/VP8, music PCM, and Go Live watch/publish.
1. What You Can Do
ClankVox makes native media workflows available to Clanky. Its Discord-backed workflows include:
- join a Discord voice channel and capture speaker audio
- stream assistant speech or music back into Discord with correct pacing
- keep transport truth (what was actually sent and received) local, instead of guessing it from bytes queued on the Node side
- receive native Go Live frames for screen-watch workflows
- publish narrow H264-backed Go Live sources when Clanky orchestrates the source
- keep voice, screen, and playback telemetry available to Clanky's floor-control policy
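Here is one way to make "transport truth" concrete. The sketch counts 20 ms frames only when the send tick actually consumes them, so the numbers Clanky sees reflect what went out rather than what Node handed over. It assumes fixed 20 ms frames. `PlaybackQueue`, `Frame`, and `PlaybackTruth` are hypothetical names, not ClankVox's real types.

```rust
use std::collections::VecDeque;

/// Hypothetical: one 20 ms PCM frame waiting to be encoded and sent.
struct Frame {
    samples: Vec<i16>,
}

/// Hypothetical playback queue that tracks what the send tick has consumed,
/// instead of letting the control plane infer progress from queued bytes.
#[derive(Default)]
struct PlaybackQueue {
    pending: VecDeque<Frame>,
    frames_sent: u64,
}

/// Hypothetical snapshot a control plane could consume.
#[derive(Debug, PartialEq)]
struct PlaybackTruth {
    frames_sent: u64,
    frames_pending: usize,
    ms_sent: u64,
    ms_pending: u64,
    idle: bool,
}

/// Assumed fixed frame duration.
const FRAME_MS: u64 = 20;

impl PlaybackQueue {
    fn enqueue(&mut self, frame: Frame) {
        self.pending.push_back(frame);
    }

    /// Called once per send tick; a frame counts as sent only when popped here.
    fn pop_for_tick(&mut self) -> Option<Frame> {
        let frame = self.pending.pop_front()?;
        self.frames_sent += 1;
        Some(frame)
    }

    /// Report sent and pending audio in frames and milliseconds.
    fn truth(&self) -> PlaybackTruth {
        PlaybackTruth {
            frames_sent: self.frames_sent,
            frames_pending: self.pending.len(),
            ms_sent: self.frames_sent * FRAME_MS,
            ms_pending: self.pending.len() as u64 * FRAME_MS,
            idle: self.pending.is_empty(),
        }
    }
}
```

In this design, `idle` becomes true only after the tick has popped the last frame. That is the signal a floor-control policy would need before treating the bot as silent.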
2. What Clanky Lets ClankVox Handle
Clanky delegates realtime media transport work that should not live in the Node runtime. For Discord, that includes:
- Discord voice and stream-server sockets
- UDP/RTP send and receive
- codec advertisement and packet framing
- DAVE session lifecycle and encryption/decryption
- Opus encode/decode and PCM normalization
- inbound audio/video capture events
- outbound TTS/music playback cadence
- native Go Live watch and self-publish media paths
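PCM normalization means converting every source into the single layout the Opus encoder expects. Discord voice runs Opus at 48 kHz stereo. The input format below (24 kHz mono i16, a common speech-synthesis output) is an assumption chosen for illustration, and `normalize_24k_mono_to_48k_stereo` is a hypothetical helper. Linear interpolation keeps the sketch short, but a production resampler would normally apply a low-pass filter.

```rust
/// Hypothetical helper: convert 24 kHz mono PCM (assumed input format) into
/// 48 kHz interleaved stereo (L, R, L, R, ...).
/// Upsampling by 2 uses linear interpolation between neighbouring samples.
fn normalize_24k_mono_to_48k_stereo(input: &[i16]) -> Vec<i16> {
    // Each input sample yields two output time steps, each with two channels.
    let mut out = Vec::with_capacity(input.len() * 4);
    for (i, &s) in input.iter().enumerate() {
        // Next sample for interpolation; hold the last sample at the end.
        let next = input.get(i + 1).copied().unwrap_or(s);
        // Midpoint computed in i32 so extreme values cannot overflow.
        let mid = ((s as i32 + next as i32) / 2) as i16;
        // Original sample on both channels, then the interpolated midpoint.
        out.extend_from_slice(&[s, s, mid, mid]);
    }
    out
}
```

A 20 ms chunk of 480 mono samples at 24 kHz becomes 1,920 interleaved values, which is 960 stereo samples at 48 kHz.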
ClankVox does not decide whether the agent should answer, interrupt, remember, search, or delegate. It reports transport truth. Clanky applies product policy.
3. Mental Model
flowchart TB
platforms["Platform media surfaces"]
discord["Discord voice and stream servers"]
future["Future platform transports"]
clanky["Clanky Node runtime<br/>control plane, Realtime, tools, policy"]
ipc["stdin line JSON<br/>stdout framed messages"]
vox["ClankVox Rust media plane"]
transports["Transport implementations"]
voice["Discord voice<br/>audio capture + playback"]
watch["Discord stream_watch<br/>Go Live receive"]
publish["Discord stream_publish<br/>Go Live send"]
media["RTP, codecs, encryption, PCM"]
clanky <--> ipc
ipc <--> vox
vox --> transports
transports --> voice
transports --> watch
transports --> publish
transports -.-> future
voice --> media
watch --> media
publish --> media
platforms --> discord
platforms --> future
media <--> discord
The Discord transport exposes three roles. voice is the anchor voice
connection. stream_watch and stream_publish are separate stream-server legs
because Discord Go Live is not just an extra field on the normal voice socket.
Future platform transports can add their own roles while keeping product policy
in Clanky and media mechanics in ClankVox.
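One way to model the three roles is as independent slots keyed by role. A stream-server leg can then come and go without disturbing the anchor voice connection. `TransportRole`, `SlotState`, and `TransportSlots` are hypothetical and are not the actual types in src/app_state.rs. Two rules in the sketch are also assumptions inferred from voice being the anchor: stream legs require an active voice slot, and dropping voice tears them down.

```rust
use std::collections::HashMap;

/// The three Discord transport roles.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum TransportRole {
    Voice,
    StreamWatch,
    StreamPublish,
}

/// Hypothetical per-role connection state.
#[derive(Debug, Clone, PartialEq)]
enum SlotState {
    Connecting,
    Ready { endpoint: String },
}

/// Hypothetical slot table: one independent connection per role.
#[derive(Default)]
struct TransportSlots {
    slots: HashMap<TransportRole, SlotState>,
}

impl TransportSlots {
    /// Set a role's state. Assumption: stream legs need an active voice slot.
    fn set(&mut self, role: TransportRole, state: SlotState) -> Result<(), String> {
        if role != TransportRole::Voice && !self.slots.contains_key(&TransportRole::Voice) {
            return Err(format!("{role:?} requires an active voice connection"));
        }
        self.slots.insert(role, state);
        Ok(())
    }

    /// Remove one role. Assumption: removing voice also removes every stream leg.
    fn remove(&mut self, role: TransportRole) {
        self.slots.remove(&role);
        if role == TransportRole::Voice {
            self.slots.clear();
        }
    }

    fn get(&self, role: TransportRole) -> Option<&SlotState> {
        self.slots.get(&role)
    }
}
```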
Runtime Shape
The entrypoint is src/main.rs. At startup ClankVox:
- installs rustls crypto
- starts a single IPC writer and reader
- creates the shared AppState
- enters one tokio::select! loop that multiplexes IPC, voice events, music events, reconnect timers, and the 20 ms send tick
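The 20 ms send tick is the pacing clock for outbound audio. At 48 kHz, one 20 ms frame holds 960 samples per channel, or 3,840 bytes of 16-bit stereo PCM. This std-only sketch schedules ticks against fixed deadlines (start + n × 20 ms), so per-frame work does not accumulate drift. The real loop runs inside tokio::select!, and its exact timer is not shown here. `SendTicker` is a hypothetical name.

```rust
use std::time::{Duration, Instant};

const SAMPLE_RATE: u32 = 48_000;
const CHANNELS: u32 = 2;
const FRAME_MS: u64 = 20;
const TICK: Duration = Duration::from_millis(FRAME_MS);
/// 48_000 / 1000 * 20 = 960 samples per channel per frame.
const SAMPLES_PER_CHANNEL: u32 = SAMPLE_RATE / 1000 * FRAME_MS as u32;
/// 960 samples * 2 channels * 2 bytes (i16) = 3840 bytes per frame.
const PCM_BYTES_PER_FRAME: u32 = SAMPLES_PER_CHANNEL * CHANNELS * 2;

/// Hypothetical deadline-based ticker: tick n is due at start + n * 20 ms,
/// so time spent encoding one frame does not shift every later deadline.
struct SendTicker {
    start: Instant,
    ticks: u32,
}

impl SendTicker {
    fn new(start: Instant) -> Self {
        Self { start, ticks: 0 }
    }

    /// Deadline of the next tick that has not fired yet.
    fn next_deadline(&self) -> Instant {
        self.start + TICK * (self.ticks + 1)
    }

    /// Count (and consume) every tick whose deadline is at or before `now`.
    /// The caller decides whether to send or skip ticks after a late wake-up.
    fn due(&mut self, now: Instant) -> u32 {
        let mut n = 0;
        while self.next_deadline() <= now {
            self.ticks += 1;
            n += 1;
        }
        n
    }
}
```

Returning the number of due ticks leaves the late-wake-up policy (send a catch-up burst, or drop frames) to the caller. That keeps the decision in the deterministic media plane rather than in Node.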
Most behavior is split across supervisor-style modules:
- src/app_state.rs: shared state and transport slots
- src/connection_supervisor.rs: connect, disconnect, and reconnect control
- src/capture_supervisor.rs: inbound audio/video events and subscriptions
- src/playback_supervisor.rs: TTS/music playback and periodic send tick
- src/stream_publish.rs: outbound Go Live sender pipeline
- src/voice_conn.rs: Discord voice/stream transport, WebSocket, UDP, RTP, codec negotiation, DAVE, and packetization
- src/ipc.rs: Clanky <-> Rust message contracts
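The IPC contract pairs line-delimited JSON on stdin with framed messages on stdout. This page does not specify the exact stdout framing. The sketch therefore assumes a simple layout (1-byte kind, 4-byte big-endian length, payload) to show why framing matters: binary audio and JSON can share one pipe without escaping. `FrameKind`, `read_command_line`, `write_frame`, and `read_frame` are hypothetical. JSON parsing is omitted to keep the example std-only.

```rust
use std::io::{self, BufRead, Read, Write};

/// Hypothetical frame kinds; real ClankVox tags may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FrameKind {
    Json = 0,
    Audio = 1,
}

/// Read one newline-delimited command (stdin side). Returns Ok(None) at EOF.
fn read_command_line<R: BufRead>(reader: &mut R) -> io::Result<Option<String>> {
    let mut line = String::new();
    if reader.read_line(&mut line)? == 0 {
        return Ok(None);
    }
    Ok(Some(line.trim_end_matches(|c: char| c == '\r' || c == '\n').to_string()))
}

/// Write one framed message (stdout side) under the assumed layout:
/// 1-byte kind, 4-byte big-endian payload length, then the payload.
fn write_frame<W: Write>(out: &mut W, kind: FrameKind, payload: &[u8]) -> io::Result<()> {
    let len = u32::try_from(payload.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "payload too large"))?;
    out.write_all(&[kind as u8])?;
    out.write_all(&len.to_be_bytes())?;
    out.write_all(payload)?;
    out.flush()
}

/// Inverse of write_frame, as the reading side would decode it.
fn read_frame<R: Read>(input: &mut R) -> io::Result<(u8, Vec<u8>)> {
    let mut header = [0u8; 5];
    input.read_exact(&mut header)?;
    let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]) as usize;
    let mut payload = vec![0u8; len];
    input.read_exact(&mut payload)?;
    Ok((header[0], payload))
}
```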
What To Read
- Architecture: process model, ownership boundaries, transport roles, IPC, and module map.
- Diagram: docs-UI-friendly media-plane map.
- Audio Pipeline: capture, TTS, music, playback pacing, and telemetry.
- Go Live: native screen watch, native self publish, stream discovery, sender/receiver flows.
- Development: build/test commands, logs, and where to make changes.
For the product layer above ClankVox, jump to Clanky Start Here.
Build And Test
cargo test
OPUS_STATIC=1 OPUS_NO_PKG=1 cargo build --release
pnpm docs:dev
From Clanky, the Node wrapper normally uses:
pnpm voice:native:test
pnpm voice:build
Discord Boundaries
- ClankVox is Clanky's main package for native voice/media transports; Discord voice and Go Live are the implemented transport family.
- Inbound native screen watch is integrated end to end through stream_watch.
- Outbound publish exists and is intentionally narrow: sources are limited to YouTube-backed music/video URLs and browser-session PNG frames, the sender transport is H264, and Clanky owns source orchestration.
- Native Go Live behavior depends on Discord user-token/selfbot flows.
- Go Live DAVE video decrypt and raw UDP keyframe feedback remain the important transport constraints; see Go Live for detail.
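Outbound Go Live sends H264 over RTP, and src/voice_conn.rs owns packetization. A NAL unit too large for one RTP payload is normally carried with FU-A fragmentation from RFC 6184. The sketch below shows that technique, assuming NAL units without Annex B start codes and a caller-chosen payload budget. It is not ClankVox's actual packetizer. It also omits RTP headers, the marker bit, STAP-A aggregation, and DAVE encryption.

```rust
/// FU-A NAL unit type (RFC 6184).
const FU_A: u8 = 28;

/// Split one H264 NAL unit (no Annex B start code) into RTP payloads of at most
/// `max_payload` bytes: single-NAL mode when it fits, FU-A fragments otherwise.
fn packetize_nal(nal: &[u8], max_payload: usize) -> Vec<Vec<u8>> {
    assert!(max_payload > 2, "need room for the two FU-A header bytes");
    if nal.is_empty() {
        return Vec::new();
    }
    if nal.len() <= max_payload {
        return vec![nal.to_vec()];
    }
    let header = nal[0];
    // FU indicator: keep the F and NRI bits, set type to 28.
    let fu_indicator = (header & 0xE0) | FU_A;
    // FU header carries the original NAL type in its low 5 bits.
    let nal_type = header & 0x1F;
    // The original header byte is not repeated; fragments carry only the body.
    let chunks: Vec<&[u8]> = nal[1..].chunks(max_payload - 2).collect();
    let last = chunks.len() - 1;
    chunks
        .into_iter()
        .enumerate()
        .map(|(i, chunk)| {
            let mut fu_header = nal_type;
            if i == 0 {
                fu_header |= 0x80; // S (start) bit
            }
            if i == last {
                fu_header |= 0x40; // E (end) bit
            }
            let mut pkt = Vec::with_capacity(chunk.len() + 2);
            pkt.push(fu_indicator);
            pkt.push(fu_header);
            pkt.extend_from_slice(chunk);
            pkt
        })
        .collect()
}
```

For an IDR slice header byte 0x65 (NRI 3, type 5), every fragment starts with FU indicator 0x7C. The S and E bits let the receiver rebuild the original NAL unit.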
