Created: March 13, 2026
Last commit: April 4, 2026
Rust100.0%
rustdiscordgo-livedavewebrtcrtpopusffmpegyt-dlpbunipcGo LiveGoLivedaveyrustlsOpenH264WebSocketUDPASRAutomatic Speech RecognitionJSON-lines IPCbun (Bun runtime)DAVE (davey wrapper)
README.md

clankvox

clankvox is the Rust media-plane subprocess for Clanky.

It owns the Discord voice and Go Live transport work that Bun should not do itself:

  • main voice transport for audio send/receive
  • Discord DAVE lifecycle and transport encryption
  • Opus encode/decode and RTP/UDP packet handling
  • capture events for ASR and speaking state
  • outbound playback for TTS and music
  • native Go Live stream watch receive
  • native Go Live stream publish send
  • JSON-line IPC with the Bun runtime

Bun still owns the selfbot gateway session, orchestration, tools, prompts, settings, and product behavior. clankvox is deliberately the transport/media layer, not a second application runtime.

Current Roles

clankvox currently runs three transport roles:

  • voice: the normal Discord voice connection for audio capture and playback
  • stream_watch: a second stream-server connection for inbound Go Live video receive
  • stream_publish: a second stream-server connection for outbound Go Live video send

The voice role is always the anchor session. The stream roles are optional and are started by Bun when Discord stream discovery produces the needed credentials.

Runtime Shape

The process entrypoint is src/main.rs.

At startup clankvox:

  1. installs rustls crypto
  2. starts a single IPC writer and an IPC reader
  3. creates shared AppState
  4. enters one tokio::select! loop
  5. multiplexes:
    • inbound IPC commands from Bun
    • voice events from voice_conn
    • music events from the ffmpeg/yt-dlp pipeline
    • a reconnect timer
    • a 20ms send tick for live audio/video pacing

Most behavior is split across supervisor-style modules:

What To Read

Build And Test

From this directory:

cargo test

Release build on this repo normally uses static opus:

OPUS_STATIC=1 OPUS_NO_PKG=1 cargo build --release

From the repo root, the Bun wrapper uses:

bun run build:voice

Current Product Boundaries

The Go Live sender path exists, but current rollout is intentionally narrow:

  • outbound publish is driven by Bun orchestration, currently through music video relay or explicit browser-session share
  • current source gate is YouTube-backed music URLs plus browser-session PNG frames
  • sender transport is H264-only
  • live sender validation on Discord is still a separate follow-up from the code and test coverage already in place

Inbound native screen watch is already integrated end to end through stream_watch.

Known Limitations

  • DAVE video decrypt on Go Live. DAVE video decrypt works at near 100% on the main voice connection (after the RTP padding strip fix in rtp.rs). Go Live streams are a separate problem: the davey crate returns UnencryptedWhenPassthroughDisabled for every Go Live video frame, but those frames are actually encrypted. Video frames that fail DAVE decrypt are dropped. Screen watch relies on the unencrypted frames that arrive during the DAVE session transition window. Fixing Go Live requires either updating davey to handle Go Live video DAVE framing, or switching to the WebRTC protocol path.
  • PLI/FIR does not work. Discord's media server does not process RTCP PLI/FIR feedback from raw UDP peers (only from WebRTC peers). PLI/FIR packets are still sent as a best-effort hint (periodic, after decoder reset, after DAVE ready), but keyframes cannot be requested on demand. The H264 depacketizer compensates by always prepending cached SPS+PPS to every emitted frame (after DAVE decrypt), and the persistent OpenH264 decoder processes all frames (IDR + P-frames) for reference state accumulation.
  • VP8 ffmpeg decode EOF. VP8 still uses per-frame ffmpeg decode on the Bun side. The ffmpeg raw demuxer hangs on single-frame input; Bun works around this by piping through cat | ffmpeg -fflags +genpts -f ivf -i pipe:0. H264 decode is handled entirely in-process by clankvox's persistent OpenH264 decoder and does not use ffmpeg.

Related Top-Level Docs

These local docs consolidate the transport/media-plane parts of several repo docs. The product-facing and cross-runtime docs still live at the repo root:

clankvox - JCV's Portfolio | JCV Portfolio