clankvox
clankvox
clankvox is the Rust media-plane subprocess for Clanky.
It owns the Discord voice and Go Live transport work that Bun should not do itself:
- main voice transport for audio send/receive
- Discord DAVE lifecycle and transport encryption
- Opus encode/decode and RTP/UDP packet handling
- capture events for ASR and speaking state
- outbound playback for TTS and music
- native Go Live stream watch receive
- native Go Live stream publish send
- JSON-line IPC with the Bun runtime
Bun still owns the selfbot gateway session, orchestration, tools, prompts, settings, and product behavior. clankvox is deliberately the transport/media layer, not a second application runtime.
Current Roles
clankvox currently runs three transport roles:
voice: the normal Discord voice connection for audio capture and playbackstream_watch: a second stream-server connection for inbound Go Live video receivestream_publish: a second stream-server connection for outbound Go Live video send
The voice role is always the anchor session. The stream roles are optional and are started by Bun when Discord stream discovery produces the needed credentials.
Runtime Shape
The process entrypoint is src/main.rs.
At startup clankvox:
- installs rustls crypto
- starts a single IPC writer and an IPC reader
- creates shared
AppState - enters one
tokio::select!loop - multiplexes:
- inbound IPC commands from Bun
- voice events from
voice_conn - music events from the ffmpeg/yt-dlp pipeline
- a reconnect timer
- a 20ms send tick for live audio/video pacing
Most behavior is split across supervisor-style modules:
- src/app_state.rs: shared state and transport slots
- src/connection_supervisor.rs: connect/disconnect and reconnect control
- src/capture_supervisor.rs: inbound audio/video events and subscriptions
- src/playback_supervisor.rs: TTS/music playback and periodic send tick
- src/stream_publish.rs: outbound Go Live sender pipeline
- src/voice_conn.rs: Discord voice WebSocket, UDP, RTP, codec negotiation, DAVE, packetization
- src/dave.rs:
daveywrapper and role-specific encryption/decryption helpers - src/ipc.rs: Bun <-> Rust message contracts
What To Read
- docs/architecture.md: process model, ownership boundaries, transport roles, IPC, module map
- docs/audio-pipeline.md: capture, TTS, music, playback pacing, telemetry
- docs/go-live.md: native screen watch, native self publish, stream discovery, sender/receiver flows
- docs/development.md: build/test commands, logs, and where to make changes
Build And Test
From this directory:
cargo test
Release build on this repo normally uses static opus:
OPUS_STATIC=1 OPUS_NO_PKG=1 cargo build --release
From the repo root, the Bun wrapper uses:
bun run build:voice
Current Product Boundaries
The Go Live sender path exists, but current rollout is intentionally narrow:
- outbound publish is driven by Bun orchestration, currently through music video relay or explicit browser-session share
- current source gate is YouTube-backed music URLs plus browser-session PNG frames
- sender transport is H264-only
- live sender validation on Discord is still a separate follow-up from the code and test coverage already in place
Inbound native screen watch is already integrated end to end through stream_watch.
Known Limitations
- DAVE video decrypt on Go Live. DAVE video decrypt works at near 100% on the main voice connection (after the RTP padding strip fix in
rtp.rs). Go Live streams are a separate problem: thedaveycrate returnsUnencryptedWhenPassthroughDisabledfor every Go Live video frame, but those frames are actually encrypted. Video frames that fail DAVE decrypt are dropped. Screen watch relies on the unencrypted frames that arrive during the DAVE session transition window. Fixing Go Live requires either updatingdaveyto handle Go Live video DAVE framing, or switching to the WebRTC protocol path. - PLI/FIR does not work. Discord's media server does not process RTCP PLI/FIR feedback from raw UDP peers (only from WebRTC peers). PLI/FIR packets are still sent as a best-effort hint (periodic, after decoder reset, after DAVE ready), but keyframes cannot be requested on demand. The H264 depacketizer compensates by always prepending cached SPS+PPS to every emitted frame (after DAVE decrypt), and the persistent OpenH264 decoder processes all frames (IDR + P-frames) for reference state accumulation.
- VP8 ffmpeg decode EOF. VP8 still uses per-frame ffmpeg decode on the Bun side. The ffmpeg raw demuxer hangs on single-frame input; Bun works around this by piping through
cat | ffmpeg -fflags +genpts -f ivf -i pipe:0. H264 decode is handled entirely in-process by clankvox's persistent OpenH264 decoder and does not use ffmpeg.
Related Top-Level Docs
These local docs consolidate the transport/media-plane parts of several repo docs. The product-facing and cross-runtime docs still live at the repo root:
