Native Discord Screen Share
Scope: native Discord Go Live in this repo: inbound watch, outbound self publish, what is built, what is still open, and how the share-link fallback fits.
- Canonical media hub: `../capabilities/media.md`
- Product surface: `screen-share-system.md`
- Transport stack: `voice-provider-abstraction.md`
- Direct integration plan: `../archive/selfbot-stream-watch.md`
- clankvox local docs: `../../src/voice/clankvox/README.md`
- Reference implementations: Discord-video-stream, Discord-video-selfbot
This repo supports native Discord video in both directions:
- Native Go Live screen watch: subscribe to an active Discord Go Live stream through Discord's voice media protocol. Uses a separate `stream_watch` RTC transport.
- Webcam video watch: subscribe to a user's webcam video on the main voice connection. No separate transport needed — video arrives alongside audio on the same UDP socket. Activated as a fallback when the target has their webcam on but is not Go Live streaming.
- Native Discord self publish: create our own Go Live stream and send H264 video through the stream server connection.
- Share-link fallback: send a `/share/:token` link, capture with `getDisplayMedia()`, and POST JPEG frames back into the bot. This transport can be disabled completely with `STREAM_LINK_FALLBACK=false`.
The model still only sees start_screen_watch for inbound visual context (both Go Live and webcam). Outbound publish is a runtime capability tied to the music pipeline, not a new conversational tool.
In the full-brain voice path, start_screen_watch is exposed as soon as native
screen watch is supported and enabled for the session, even if only discovered
Go Live state exists and no active share has been bound yet. That lets the
brain initiate OP20 STREAM_WATCH in response to "can you see my stream now?"
instead of waiting for a fully ready watch state or frame-backed sharer roster
before the tool appears.
Current Status
Status validated March 13, 2026.
The native Discord screen watch pipeline is built end to end in clankvox and Bun:
- selfbot gateway stream discovery for `VOICE_STATE_UPDATE.self_stream`, `STREAM_CREATE`, `STREAM_SERVER_UPDATE`, and `STREAM_DELETE`, plus a `GUILD_CREATE` voice state scan for pre-existing streamers on connect
- Gateway OP20 `STREAM_WATCH` request dispatch from the selfbot session
- clankvox `stream_watch` IPC + second transport role
- video receive, DAVE decrypt, H264/VP8 depacketization
- H264: persistent in-process OpenH264 decoder in clankvox decodes all frames (IDR + P-frames), turbojpeg encodes to JPEG, emitted as `DecodedVideoFrame` IPC
- VP8: raw bitstream forwarded to Bun for per-frame ffmpeg keyframe decode
- explicit-target native watch can start from discovered Go Live state before native video-state frames have populated the sharer list
Validated live on the selfbot runtime (March 13, 2026):
- the selfbot receives stream discovery events
- `STREAM_WATCH` yields stream credentials
- Bun forwards those credentials into clankvox
- clankvox opens the second stream transport, completes the modern watcher handshake, reaches DAVE-ready, and forwards encrypted H264 frames back to Bun
- DAVE MLS E2EE session completes successfully on the stream watch transport. DAVE video decrypt works at near 100% on the main voice connection (after the RTP padding strip fix), but per-frame video decrypt still fails for Go Live streams — the `davey` crate classifies Go Live video as `UnencryptedWhenPassthroughDisabled`. Screen watch frames arrive during the post-commit unencrypted window instead.
- H264 frames are decoded in-process by clankvox's persistent OpenH264 decoder and emitted as pre-encoded JPEG via `DecodedVideoFrame` IPC
- VP8 keyframes are decoded by Bun via per-frame ffmpeg
- stream-watch note/commentary pipeline ingests frames and produces accurate visual commentary
- DAVE channel ID derivation (`BigInt(rtcServerId) - 1n`) confirmed working
- two-checkpoint link fallback suppression prevents duplicate transports when native watch is active
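The channel ID derivation above is small enough to show directly. This is an illustrative sketch, not code from the repo; the helper name is hypothetical, and the derivation (`rtc_server_id` minus one, as a BigInt) is the one documented in this file:

```typescript
// The stream-watch DAVE channel ID is the rtc_server_id minus one.
// BigInt arithmetic requires the 1n literal; mixing BigInt and number throws.
function daveChannelIdForStream(rtcServerId: string): bigint {
  return BigInt(rtcServerId) - 1n;
}
```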
The native Discord self-publish path is also built in code:
- selfbot gateway can send OP18 `STREAM_CREATE`, OP19 `STREAM_DELETE`, and OP22 `STREAM_SET_PAUSED`
- Bun tracks self stream discovery and routes self-owned credentials into a dedicated `stream_publish` transport role
- clankvox opens a second sender-side stream connection, advertises H264, encrypts video with DAVE, and packetizes outbound RTP
- Bun currently drives publish from two source families:
  - music/video relay: start on music play, pause on music pause, stop on music idle/error
  - browser-session share: explicit `share_browser_session` start/stop around an active browser session
- `voice.streamWatch.visualizerMode` controls how music publish renders:
  - `"cqt"` is the default and starts one shared `ffmpeg` pipeline inside clankvox that emits PCM for the main voice connection and H264 visualizer access units for Go Live
  - `stream_publish_play_visualizer` attaches the sender transport to that already-running visualizer feed instead of starting a second media fetch
  - `"off"` preserves the legacy URL-backed source-video relay path when a real source video track is preferred
- music publish is no longer limited to YouTube-backed video tracks when a visualizer is active; any playback URL the music pipeline already resolved for the active track can drive the visualizer
Sender-side live Discord validation is still pending in this repo snapshot. What is validated today is protocol shape, transport wiring, automated coverage for create/resume/switch control-plane behavior, browser-session frame forwarding coverage in Bun, and cargo test coverage inside clankvox.
The share-link fallback remains the recovery transport if native watch does not
connect or later becomes unhealthy. When the native stream_watch transport
fails or disconnects after start, Bun tears the native watch down and requests
the share-link path directly when requester + text-channel context are known.
Set STREAM_LINK_FALLBACK=false to disable that transport entirely. In that
mode, runtime stays native-only: capability reporting stops advertising the
share-link path, native startup failures do not create fallback sessions, and
post-connect transport failures log a skipped recovery instead of issuing a
share-link request.
Link fallback suppression
When a native stream watch is active, the link fallback is suppressed at two
checkpoints inside tryStartLinkFallback:
- pre_create — before creating the link session, check if native watch is already active for the target user with a ready transport or decoded frame
- post_compose — after composing the link message but before sending, re-check in case native watch became ready during async link creation
This prevents the race where the voice brain's start_screen_watch tool call
resolves before stream credentials arrive, triggers a link fallback, and then
native watch connects 200ms later — resulting in both transports running in
parallel. shouldSuppressLinkFallbackDueToNativeWatch checks transport status,
decoded frame presence, or active sharer state.
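The two-checkpoint shape described above can be sketched as follows. This is an illustrative sketch, not the repo's actual code: the state shape and the `tryStartLinkFallbackSketch` wrapper are hypothetical, though the checkpoint names and `shouldSuppressLinkFallbackDueToNativeWatch` mirror this document:

```typescript
// Hypothetical native-watch state checked at both suppression checkpoints.
interface NativeWatchState {
  transportReady: boolean;  // stream_watch transport reported ready
  hasDecodedFrame: boolean; // at least one decoded frame arrived
  activeSharer: boolean;    // sharer state is tracked as active
}

// Suppress the link fallback if any readiness signal is present.
function shouldSuppressLinkFallbackDueToNativeWatch(s: NativeWatchState | undefined): boolean {
  if (!s) return false;
  return s.transportReady || s.hasDecodedFrame || s.activeSharer;
}

async function tryStartLinkFallbackSketch(
  getNativeState: () => NativeWatchState | undefined,
  createLinkSession: () => Promise<string>,
  sendLinkMessage: (link: string) => Promise<void>,
): Promise<"suppressed_pre_create" | "suppressed_post_compose" | "sent"> {
  // Checkpoint 1: pre_create — native watch may already be live.
  if (shouldSuppressLinkFallbackDueToNativeWatch(getNativeState())) {
    return "suppressed_pre_create";
  }
  const link = await createLinkSession();
  // Checkpoint 2: post_compose — native watch may have become ready
  // during the async link creation (the ~200ms race described above).
  if (shouldSuppressLinkFallbackDueToNativeWatch(getNativeState())) {
    return "suppressed_post_compose";
  }
  await sendLinkMessage(link);
  return "sent";
}
```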
If the brain asks for start_screen_watch again for the same target while the
native watch is already active, Bun reuses that watch instead of reconnecting
the stream_watch transport. Tool results now separate transport attachment
from visual readiness: frameReady=false means the watch is up but no decoded
or buffered image frame exists yet, so the agent should not claim it can
already see the screen.
startVoiceScreenWatch also accepts a preferredTransport parameter. Callers
can pass "link" to skip native entirely and go straight to the share-link
path — used for recovery when native transport fails after initial connection.
If STREAM_LINK_FALLBACK=false, "link" requests are rejected.
Why The Regular Voice Connection Cannot See Go Live Streams
Discord uses a dual-connection architecture for Go Live:
┌─────────────────────────────────────────────────────────────┐
│ Regular Voice Connection │
│ - Audio (Opus) send/receive │
│ - Speaking state (OP5) │
│ - DAVE encryption │
│ - No Go Live video SSRCs on the main voice leg │
│ - No OP12 video state received │
│ - Endpoint: from VOICE_SERVER_UPDATE │
│ - Server ID: guild_id │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Stream Connection (separate voice server) │
│ - Video frame send/receive │
│ - Video SSRCs assigned in OP2 Ready streams[] array │
│ - OP12 video state exchanged │
│ - OP15 media sink wants for quality negotiation │
│ - Endpoint: from STREAM_SERVER_UPDATE │
│ - Server ID: rtc_server_id from STREAM_CREATE │
└─────────────────────────────────────────────────────────────┘
Confirmed via live debugging in the older bot-token runtime on March 12, 2026:
- OP2 Ready on the main voice connection returned `video_ssrc=None`
- No OP12 or OP18 video state was ever sent to the main voice connection
- Adding `video: true` and `streams` to the main voice Identify payload did not change this
- Sending OP12 on the main voice connection did not change this either
The selfbot fork changes the account/control-plane model, not the fact that Go Live uses a separate stream server connection.
Go Live video state and video frames live on a separate stream server that requires its own connection.
Webcam Video on the Voice Connection
Unlike Go Live, webcam video (when a user enables their camera in a voice channel) lives on the main voice connection:
- Discord sends OP12/OP18 video state updates with `streamType: "video"` on the main voice WebSocket
- The user's webcam SSRC appears alongside their audio SSRC on the same UDP socket
- No separate transport, stream key, or Go Live discovery is needed
- clankvox sends OP15 media sink wants on the voice connection to request the webcam SSRC
- The same H264 depacketization, DAVE E2EE decryption, persistent decoder, and JPEG pipeline applies
When both a Go Live stream_watch transport and webcam video exist simultaneously, clankvox partitions media sink wants: screen-share SSRCs route to stream_watch, webcam SSRCs route to the voice connection. The capture supervisor allows webcam frames from the voice transport to pass through even when a stream_watch connection is active (it only suppresses screen-type frames to avoid duplicates).
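The SSRC partitioning described above can be sketched as a small routing function. This is an illustrative sketch, not clankvox's actual (Rust) implementation; the types and the `routeSinkWants` name are hypothetical:

```typescript
// Route each video SSRC to the transport that should request it via OP15.
type TransportRole = "voice" | "stream_watch";

interface VideoStream {
  ssrc: number;
  streamType: "screen" | "video"; // "screen" = Go Live, "video" = webcam
}

function routeSinkWants(
  streams: VideoStream[],
  streamWatchActive: boolean,
): Record<TransportRole, number[]> {
  const wants: Record<TransportRole, number[]> = { voice: [], stream_watch: [] };
  for (const s of streams) {
    if (s.streamType === "screen") {
      // Screen-share SSRCs go to the dedicated stream transport when one exists.
      (streamWatchActive ? wants.stream_watch : wants.voice).push(s.ssrc);
    } else {
      // Webcam SSRCs always stay on the main voice connection.
      wants.voice.push(s.ssrc);
    }
  }
  return wants;
}
```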
On the TS side, `enableWatchStreamForUser` first tries Go Live discovery. If that fails (no stream key — the user isn't Go Live streaming), it checks `sharerHasWebcamOnly()` for webcam streams and subscribes with `preferredStreamType: null`. Frames flow through the same `DecodedVideoFrame` IPC → `ingestStreamFrame` → vision model path.
Discord Protocol: Go Live Stream Architecture
Gateway events (main Discord gateway, not voice gateway)
These are standard Discord gateway dispatch events, not voice WebSocket opcodes:
| Gateway opcode | Event name | Direction | Purpose |
|---|---|---|---|
| 18 | STREAM_CREATE | Client → Server | Start a Go Live stream |
| 19 | STREAM_DELETE | Client → Server | Stop a Go Live stream |
| 20 | STREAM_WATCH | Client → Server | Subscribe to watch someone's stream |
| 22 | STREAM_SET_PAUSED | Client → Server | Pause/resume a stream |
Gateway dispatch events received from Discord:
| Event | Payload | Purpose |
|---|---|---|
| STREAM_CREATE | { stream_key, rtc_server_id } | Stream created, provides stream server ID |
| STREAM_SERVER_UPDATE | { stream_key, endpoint, token } | Stream server ready, provides connection credentials |
| STREAM_DELETE | { stream_key } | Stream ended |
| VOICE_STATE_UPDATE | { ..., self_stream, self_video } | User voice state, includes streaming flag |
Stream key format
Stream keys identify a specific Go Live stream:
guild:<guildId>:<channelId>:<userId> # Guild voice channel
call:<channelId>:<userId> # DM / group call
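The two key shapes above can be built and parsed with small helpers. This is an illustrative sketch, not code from the repo; the function names are hypothetical, but the formats are exactly the ones documented here:

```typescript
// A stream key identifies one Go Live stream in a guild channel or DM call.
type StreamKey =
  | { kind: "guild"; guildId: string; channelId: string; userId: string }
  | { kind: "call"; channelId: string; userId: string };

function buildStreamKey(key: StreamKey): string {
  return key.kind === "guild"
    ? `guild:${key.guildId}:${key.channelId}:${key.userId}`
    : `call:${key.channelId}:${key.userId}`;
}

function parseStreamKey(raw: string): StreamKey {
  const parts = raw.split(":");
  if (parts[0] === "guild" && parts.length === 4) {
    return { kind: "guild", guildId: parts[1], channelId: parts[2], userId: parts[3] };
  }
  if (parts[0] === "call" && parts.length === 3) {
    return { kind: "call", channelId: parts[1], userId: parts[2] };
  }
  throw new Error(`unrecognized stream key: ${raw}`);
}
```

This is also the shape a watcher must synthesize when it wants to send OP20 before the `STREAM_CREATE` dispatch has arrived.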
Voice gateway opcodes (on the stream connection)
These are the same opcodes as the regular voice connection but exchanged on the stream-specific voice server:
| Voice opcode | Name | Purpose |
|---|---|---|
| 0 | IDENTIFY | Authenticate with stream server |
| 1 | SELECT_PROTOCOL | Negotiate codecs and encryption |
| 2 | READY | Server assigns SSRCs (including video SSRCs in streams[]) |
| 4 | SESSION_DESCRIPTION | Session description with encryption keys (SelectProtocol response) |
| 5 | SPEAKING | Signal speaking/streaming status (flag 2 = video) |
| 12 | VIDEO | Announce/receive video stream attributes |
| 15 | MEDIA_SINK_WANTS | Request specific video quality/SSRCs |
Protocol Flow: Watching a Go Live Stream
Sending (reference control-plane shape)
1. joinVoice()
→ Gateway OP4 VOICE_STATE_UPDATE { guild_id, channel_id }
→ Receive VOICE_STATE_UPDATE { session_id }
→ Receive VOICE_SERVER_UPDATE { endpoint, token }
→ Connect regular voice WebSocket to endpoint
→ Standard voice handshake (OP0→OP2→OP1→OP4)
2. createStream()
→ Gateway OP18 STREAM_CREATE { type, guild_id, channel_id }
→ Gateway OP22 STREAM_SET_PAUSED { stream_key, paused: false }
→ Receive STREAM_CREATE { stream_key, rtc_server_id }
→ Receive STREAM_SERVER_UPDATE { stream_key, endpoint, token }
→ Connect stream voice WebSocket to stream endpoint
→ Stream handshake: OP0 Identify with { server_id: rtc_server_id, video: true, streams: [...] }
→ OP2 Ready includes video SSRCs in streams[] array
→ OP12 VIDEO to announce stream attributes
→ Send video frames via RTP/UDP
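The control-plane sends in step 2 can be sketched as payload builders. This is a hedged sketch: the field names follow the reference selfbot libraries and the dispatch-event table above, but the exact OP18 payload shape is an assumption, not verified against this repo:

```typescript
// Assumed OP18 STREAM_CREATE gateway payload (shape per reference libraries).
function streamCreatePayload(guildId: string, channelId: string) {
  return {
    op: 18,
    d: {
      type: "guild",
      guild_id: guildId,
      channel_id: channelId,
      preferred_region: null,
    },
  };
}

// OP22 STREAM_SET_PAUSED, keyed by the documented stream key format.
function streamSetPausedPayload(streamKey: string, paused: boolean) {
  return { op: 22, d: { stream_key: streamKey, paused } };
}
```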
Publishing / sending (runtime flow)
1. A publishable source becomes active
→ music playback resolves a publishable source URL
→ if `voice.streamWatch.visualizerMode != "off"`, `music_play` starts one shared `ffmpeg` pipeline that emits both PCM audio and H264 visualizer video
→ if `voice.streamWatch.visualizerMode == "off"`, Bun keeps the legacy URL-backed source-video relay path
→ browser-session share resolves a live browser session key
2. Create or reuse self stream
→ If no self stream exists, send OP18 STREAM_CREATE
→ Send OP22 STREAM_SET_PAUSED { paused: false }
→ Wait for STREAM_CREATE + STREAM_SERVER_UPDATE discovery
3. Connect sender transport
→ Bun sends stream_publish_connect to clankvox
→ clankvox opens stream voice connection to stream endpoint
→ OP0 Identify uses rtc_server_id + main voice session_id
→ OP1 SelectProtocol advertises sender-side H264
→ OP12 VIDEO announces active screen stream attributes
4. Push media
→ visualizer-mode music publish sends `stream_publish_play_visualizer`
→ clankvox attaches the sender transport to the shared music visualizer H264 queue
→ legacy URL-backed music publish sends `stream_publish_play`
→ browser-session share sends stream_publish_browser_start + stream_publish_browser_frame payloads into clankvox
→ clankvox encodes or relays the active source to H264 Annex-B access units
→ DAVE encrypt + RTP packetize each access unit
→ OP5 speaking uses flag 2 on the stream connection
5. Pause / stop
→ music pause sends OP22 STREAM_SET_PAUSED { paused: true }
→ music resume reuses the same stream when discovery is still live
→ browser-session share stop sends OP19 STREAM_DELETE and tears the sender transport down
→ music stop/error/idle sends OP19 STREAM_DELETE and tears the sender transport down
Receiving / watching (runtime flow)
1. Detect Go Live
→ Listen for VOICE_STATE_UPDATE with self_stream: true
→ Seed provisional per-user session Go Live state from that early signal
→ OR listen for STREAM_CREATE dispatch event
→ OR scan GUILD_CREATE voice_states on connect for users already streaming
The voice session keeps all discovered Go Live users in `goLiveStreams`.
`goLiveStream` is only the current primary/legacy view, not the source of truth.
2. Subscribe to stream
→ Gateway OP20 STREAM_WATCH { stream_key }
→ If STREAM_CREATE has not arrived yet, synthesize stream_key from guild/channel/user IDs
→ Receive STREAM_CREATE { stream_key, rtc_server_id }
→ Receive STREAM_SERVER_UPDATE { stream_key, endpoint, token }
3. Connect to stream server
→ Open second voice WebSocket to stream endpoint
→ OP0 Identify with { server_id: rtc_server_id, video: true, streams: [...] }
→ OP2 Ready assigns video SSRCs in streams[] array
→ OP12 VIDEO with video state from the streamer
4. Request video quality
→ OP15 MEDIA_SINK_WANTS with `streams`, `pixelCounts`, and `any` default quality
5. Receive video frames
→ Encrypted RTP/UDP video packets arrive
→ Transport decrypt (AES-GCM/XChaCha20) → depacketize H264/VP8 → DAVE decrypt attempt
→ DAVE video decrypt works on the main voice connection; Go Live streams still fail (frames forwarded from post-commit unencrypted window)
→ H264: persistent OpenH264 decoder in clankvox → turbojpeg JPEG encode → DecodedVideoFrame IPC → stream-watch pipeline
→ VP8: raw bitstream IPC → Bun ffmpeg decode to JPEG → stream-watch pipeline
OP0 Identify (stream connection)
{
"op": 0,
"d": {
"server_id": "<rtc_server_id from STREAM_CREATE>",
"user_id": "<selfbot user id>",
"session_id": "<session_id from VOICE_STATE_UPDATE>",
"token": "<token from STREAM_SERVER_UPDATE>",
"video": true,
"streams": [{ "type": "screen", "rid": "100", "quality": 100 }],
"max_dave_protocol_version": 1
}
}
OP2 Ready (stream connection response)
{
"op": 2,
"d": {
"ssrc": 12345,
"ip": "1.2.3.4",
"port": 50000,
"modes": ["aead_aes256_gcm_rtpsize", "aead_xchacha20_poly1305_rtpsize"],
"streams": [
{
"type": "video",
"rid": "100",
"ssrc": 12346,
"rtx_ssrc": 12347,
"active": false,
"quality": 100
}
]
}
}
Video SSRCs are in d.streams[].ssrc, not in a top-level d.video_ssrc field.
OP15 Media Sink Wants
clankvox sends Discord media sink wants with `streams` and `pixelCounts` maps plus a top-level `any` default quality:
{
"op": 15,
"d": {
"any": 100,
"streams": {
"12346": 100,
"12347": 0
},
"pixelCounts": {
"12346": 921600.0,
"12347": 230400.0
}
}
}
The `pixelCounts` map is required for Discord to send video at the requested
resolution. Without it, Discord may send the lowest quality layer where
keyframes are very sparse. The `any` field provides a default quality for
SSRCs not explicitly listed.
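Building that payload is mechanical. A minimal sketch, assuming a hypothetical `buildMediaSinkWants` helper (the field names match the OP15 example above; the helper itself is not from the repo):

```typescript
// One requested SSRC with its quality and target pixel count.
interface SinkWant {
  ssrc: number;
  quality: number;    // 0 disables the layer (e.g. RTX SSRC)
  pixelCount: number; // e.g. 921600 for 1280x720
}

function buildMediaSinkWants(wants: SinkWant[], anyQuality = 100) {
  const streams: Record<string, number> = {};
  const pixelCounts: Record<string, number> = {};
  for (const w of wants) {
    streams[String(w.ssrc)] = w.quality;
    pixelCounts[String(w.ssrc)] = w.pixelCount;
  }
  // `any` is the default quality for SSRCs not listed explicitly.
  return { op: 15, d: { any: anyQuality, streams, pixelCounts } };
}
```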
Stream speaking flag
On the stream connection, speaking flag 2 indicates video (vs flag 1 for audio on the regular connection):
{ "op": 5, "d": { "speaking": 2, "delay": 0, "ssrc": 12345 } }
What Is Built
Video receive pipeline (clankvox, Rust)
All of this is implemented and tested:
- OP12/OP18 remote video state parsing and tracking (`voice_conn.rs`)
- OP14 session/codec update handling (`voice_conn.rs`)
- OP15 media sink wants send (`voice_conn.rs`, `capture_supervisor.rs`)
- Protected RTCP receiver-report / PLI / FIR keyframe feedback while waiting for the first renderable stream frame (`voice_conn.rs`, `capture_supervisor.rs`)
- Video codec advertisement in SelectProtocol: H264, VP8 decode (`voice_conn.rs`)
- Non-Opus RTP acceptance: H264 (PT 103), VP8 (PT 105) (`voice_conn.rs`)
- DAVE video decrypt path (`dave.rs`)
- H264 depacketization: single NAL, STAP-A, FU-A (`video.rs`)
- VP8 depacketization: payload descriptor stripping, frame reassembly (`video.rs`)
- RTP sequence gap detection to prevent cross-packet frame corruption (`video.rs`)
- Persistent H264 decode via OpenH264 (`openh264` crate v0.9) with per-user decoder state (`video_decoder.rs`)
- JPEG encode via turbojpeg (`turbojpeg` crate v1.4) for decoded H264 frames (`video_decoder.rs`)
- `DecodedVideoFrame` IPC emission with pre-encoded JPEG, dimensions, and scene-cut metrics (`video_decoder.rs`, `ipc.rs`)
- Video frame IPC: `UserVideoState`, `UserVideoFrame`, `UserVideoEnd`, `DecodedVideoFrame` (`ipc.rs`)
- Dedicated bounded outbound video queue with backpressure logging (`ipc.rs`)
- Per-user video subscription management (`capture_supervisor.rs`)
- Remote video state merge semantics for partial updates (`capture_supervisor.rs`)
- Session reconnect teardown of stale video state (`capture_supervisor.rs`)
Audio receive pipeline (clankvox, Rust)
All of this is implemented and tested:
- OP5 speaking payload parsing into the audio SSRC map (`voice_conn.rs`)
- DAVE audio decrypt on the main voice transport (`dave.rs`, `voice_conn.rs`)
- Audio SSRC remap fallback from successful DAVE decrypt when OP5 is delayed or missing (`voice_conn.rs`)
- Opus decode plus PCM downmix/resample into Bun-visible `UserAudio` IPC (`capture_supervisor.rs`, `audio_pipeline.rs`, `ipc.rs`)
Bun runtime integration
- `clankvoxClient.ts`: IPC for `subscribe_user_video`, `unsubscribe_user_video`, video events
- `nativeDiscordScreenShare.ts`: active sharer tracking, target resolution
- `sessionLifecycle.ts`: video state/frame/end event handlers, `onDecodedVideoFrame` for H264 JPEG ingest
- `screenShare.ts`: native-first with share-link fallback, telemetry
- `voiceStreamWatch.ts`: stream-watch pipeline, frame ingest from `DecodedVideoFrame` (H264) and ffmpeg (VP8)
Video send pipeline (clankvox + Bun)
Implemented in code:
- `streamDiscovery.ts`: OP18/OP19/OP22 send helpers plus self stream discovery
- `voiceStreamPublish.ts`: session state, visualizer-aware source resolution, and self-stream connect orchestration
- `voiceBrowserStreamPublish.ts`: browser-session capture loop and frame forwarding into clankvox
- `sessionLifecycle.ts`: publish transport lifecycle, recovery, and stream discovery callbacks
- `clankvoxClient.ts`: `stream_publish_*` IPC commands, including `stream_publish_play_visualizer`, and transport state relay
- clankvox: sender-side `stream_publish` transport role, H264 codec negotiation, DAVE video encrypt, RTP packetization, shared music visualizer pipeline, legacy URL relay, and browser-frame encode path
Current rollout boundary:
- outbound publish is wired to music/video relay and explicit browser-session share, not a general-purpose arbitrary file/share tool
- music relay defaults to a shared audio visualizer (`voice.streamWatch.visualizerMode: "cqt"`) and only falls back to source-video relay when that setting is `"off"`
- sender transport is H264-only
What is still open
- RTX retransmission receive — the current UDP receive path still drops RTX packets, so packet-loss recovery remains limited until retransmission support lands
- Live sender validation — the sender path is implemented, but this repo snapshot still needs Discord-live validation against a real selfbot session
- Broader publish surfaces — outbound publish still centers on music lifecycle and explicit browser-session share; there is no general-purpose arbitrary file/share tool yet
Implementation Plan
Current Receive Flow
- Selfbot gateway records active Go Live streams and credentials in stream discovery state
- On connect (or full reconnect), `GUILD_CREATE` voice states are scanned for users with `self_stream: true` to detect streams that were already active before the bot connected
- `start_screen_watch` requests OP20 `STREAM_WATCH` and records the expected stream key on the voice session
- If credentials are already present, Bun immediately sends `stream_watch_connect` to clankvox
- If credentials arrive later, discovery callbacks connect the `stream_watch` transport when `STREAM_SERVER_UPDATE` lands
- clankvox emits role-aware `transport_state`, `userVideoState`, `userVideoFrame`, and `userVideoEnd`
- For H264, clankvox decodes all frames (IDR + P-frames) via a persistent OpenH264 decoder, encodes to JPEG via turbojpeg, and emits `DecodedVideoFrame` IPC messages; Bun ingests the pre-encoded JPEG directly
- For VP8, Bun decodes sampled keyframes to JPEG via per-frame ffmpeg
- Decoded frames from either codec path feed into the existing stream-watch commentary path
- `STREAM_DELETE` prunes the cached sharer and tears native watch down
- Native transport failure/disconnect tears native watch down and directly requests share-link fallback when requester + text-channel context exist
Current Publish Flow
- music playback enters `playing`
- `voiceMusicPlayback.ts` resolves the playback URL for the active track and starts `music_play`
- if `voice.streamWatch.visualizerMode != "off"`, `music_play` starts a shared `ffmpeg` pipeline in clankvox that emits PCM audio plus H264 visualizer access units
- `voiceStreamPublish.ts` resolves a publishable source from session music state or an active browser-session share
- if no self stream exists yet, Bun sends OP18 `STREAM_CREATE`
- self stream discovery callbacks hand `STREAM_CREATE`/`STREAM_SERVER_UPDATE` credentials back into the owning voice session
- Bun sends `stream_publish_connect`, then either `stream_publish_play_visualizer` for shared visualizer mode or `stream_publish_play` for legacy URL relay
- clankvox starts or reuses the sender-side stream transport, marks video active, and sends H264 RTP frames
- music pause sends `stream_publish_pause` plus OP22 `paused: true`
- music resume reuses the live stream when possible instead of recreating it
- music idle/error/stop sends `stream_publish_stop`, `stream_publish_disconnect`, and OP19 `STREAM_DELETE`
Product Behavior
The model still only sees start_screen_watch. Runtime resolves transport:
- Check if target user has `self_stream: true` (Go Live active)
- If yes → open stream connection, subscribe, receive native frames
- If no → fall back to share-link path
The share-link fallback remains the recovery path when:
- The target user is not Go Live sharing
- Stream connection fails to establish
- Selfbot stream subscription fails or the native transport is unavailable
- Multiple sharers are active and target is ambiguous
Settings
Native subscription tuning and music Go Live visualizer settings live under voice.streamWatch:
- `visualizerMode` (default: `"cqt"`)
- `nativeDiscordMaxFramesPerSecond` (default: 2)
- `nativeDiscordPreferredQuality` (default: 100)
- `nativeDiscordPreferredPixelCount` (default: 921600 / 1280x720)
- `nativeDiscordPreferredStreamType` (default: `"screen"`)
There is no separate standalone settings block for outbound native publish. Music playback reuses voice.streamWatch.visualizerMode to choose between the shared audio-visualizer path and the legacy source-video relay path.
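The defaults above can be written out as a plain settings object. This is an illustrative shape only; the repo's real settings schema may differ:

```typescript
// Assumed shape of the voice.streamWatch defaults listed above.
const streamWatchDefaults = {
  visualizerMode: "cqt",
  nativeDiscordMaxFramesPerSecond: 2,
  nativeDiscordPreferredQuality: 100,
  nativeDiscordPreferredPixelCount: 1280 * 720, // 921600
  nativeDiscordPreferredStreamType: "screen",
} as const;
```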
Screen Watch Note Pipeline
voice.streamWatch now uses one screen-watch pipeline:
- Every admitted frame updates the latest-frame buffer.
- A separate note loop keeps rolling screen notes fresh with `noteProvider`, `noteModel`, `noteIntervalSeconds`, `noteIdleIntervalSeconds`, `staticFloor`, `changeThreshold`, and `changeMinIntervalSeconds`.
- Proactive commentary uses `commentaryIntervalSeconds` plus the normal voice quiet-window gates, and can optionally override the voice model with `commentaryProvider`/`commentaryModel`.
The full product-level behavior and settings contract live in screen-share-system.md.
Observability
Stream discovery layer (implemented)
- `native_discord_go_live_detected` — selfbot gateway saw `self_stream: true`
- `stream_discovery_existing_streamer_detected` — `GUILD_CREATE` voice state scan found a user already streaming on connect
- `stream_discovery_guild_create_scan_complete` — `GUILD_CREATE` scan finished with count of existing streamers found
- `stream_discovery_go_live_bootstrap_seeded` — Bun seeded provisional session Go Live state from `self_stream: true`
- `stream_discovery_go_live_bootstrap_cleared` — Bun cleared stale provisional session Go Live state after `self_stream: false`
- `native_discord_stream_watch_requested` — Gateway OP20 sent
- `native_discord_stream_server_received` — `STREAM_SERVER_UPDATE` received
- `native_discord_stream_connection_started` — second clankvox connection opening
Video receive pipeline (implemented)
- `clankvox_discord_video_state_observed` — voice connection saw OP12/OP18 video state
- `clankvox_native_video_state_received` — capture supervisor applied video state
- `clankvox_native_video_state_emitted` — state emitted over IPC to Bun
- `native_discord_screen_share_state_updated` — Bun updated active-sharer state
- `clankvox_native_video_subscribe_requested` — Rust accepted video subscription
- `clankvox_video_sink_wants_updated` — Rust recalculated OP15 media sink wants
- `clankvox_first_video_frame_forwarded` — first video frame forwarded to Bun
- `stream_watch_frame_ingested` — Bun ingested frame into stream-watch pipeline
Failure diagnostics
- `screen_watch_native_start_failed` — native path failed, includes `reason`, `nativeActiveSharerCount`, `selectionReason`
- `screen_watch_link_fallback_started` — fell back to share-link, includes `nativeFailureReason`
- `native_discord_stream_transport_link_fallback_requested` — native transport degraded after start and Bun requested link-only recovery
- `native_discord_stream_transport_link_fallback_failed` — degraded native transport could not recover into the share-link path
Risk Assessment
- DAVE video decrypt. DAVE video decrypt works at near 100% on the main voice connection after the RTP padding strip fix (`strip_rtp_padding` in `rtp.rs`). Go Live streams are a separate, still-open problem: the `davey` library classifies Go Live video frames as `UnencryptedWhenPassthroughDisabled` when they are actually encrypted. Video frames that fail DAVE decrypt are dropped (validated by `looks_like_valid_h264`). Screen watch relies on the initial unencrypted frames that arrive during the DAVE transition window after session commit. Stream connections use DAVE channel ID `BigInt(rtc_server_id) - 1n` (confirmed working live).
- PLI/FIR keyframe requests do not work. clankvox uses `protocol: "udp"` for stream connections. Discord's media server only processes RTCP PLI/FIR feedback from WebRTC peers. PLI/FIR packets are still sent as a best-effort hint (periodic, after decoder reset, after DAVE ready), but keyframes cannot be requested on demand. The persistent H264 decoder compensates by processing all frames (IDR + P-frames) for reference state accumulation, and auto-resets with PLI after 50 consecutive errors.
- Transport crypto AAD for RTP. Discord's `rtpsize` AEAD modes authenticate the RTP fixed header + CSRC + 4-byte extension prefix as AAD. The extension body is part of the ciphertext. `decrypt()` recomputes AAD from raw packet bytes and must NOT use `parse_rtp_header`'s `header_size` (which includes the extension body). Inbound RTCP (PT 72-76 per RFC 5761 mux) is filtered before decrypt.
- VP8 ffmpeg decode EOF. VP8 still uses per-frame ffmpeg decode on the Bun side. H264 decode is handled entirely in-process by clankvox's persistent OpenH264 decoder and does not use ffmpeg.
- RTX loss recovery is still limited. The receive path currently traces and drops RTX payloads; retransmission support remains future work.
- Sender compatibility is narrower than receive. Outbound publish is still H264-only and still centered on music lifecycle or browser-session share. `visualizerMode: "off"` also depends on a URL-backed source relay path being available.
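The `rtpsize` AAD rule from the risk list above reduces to a small length computation over the raw packet header bits. A minimal sketch (the repo's real code is Rust; this TypeScript helper and its name are illustrative, and the bit layout follows RFC 3550):

```typescript
// AAD length for Discord's `rtpsize` AEAD modes:
// fixed 12-byte RTP header + CSRC list + 4-byte extension profile/length
// prefix when the X bit is set. The extension *body* is ciphertext, not AAD.
function rtpsizeAadLength(packet: Uint8Array): number {
  const csrcCount = packet[0] & 0x0f;            // CC field: number of 32-bit CSRCs
  const hasExtension = (packet[0] & 0x10) !== 0; // X bit
  let len = 12 + csrcCount * 4;
  if (hasExtension) len += 4;                    // only the prefix, never the body
  return len;
}
```

This is exactly the distinction the risk item warns about: a full `header_size` that swallows the extension body would over-extend the AAD and break decryption.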
Reference: Discord-video-stream
The Discord-video-stream library implements the sending side of Go Live for selfbot accounts. Key observations:
- Uses `discord.js-selfbot-v13`, not regular `discord.js` — selfbot tokens may have different capabilities than bot tokens
- `VoiceConnection` for regular voice, `StreamConnection` for Go Live — both extend `BaseMediaConnection`
- Stream connection uses `rtc_server_id` from `STREAM_CREATE` as its `serverId`
- OP0 Identify always includes `video: true` and `streams: [{ type: "screen", rid: "100", quality: 100 }]`
- Video SSRCs come from OP2 Ready `d.streams[0].ssrc` and `d.streams[0].rtx_ssrc`, not `d.video_ssrc`
- Uses `protocol: "webrtc"` with SDP in SelectProtocol, not `protocol: "udp"`
- `STREAM_WATCH` (OP20) is defined but not implemented (library only sends, doesn't watch)
- Speaking flag 2 on the stream connection indicates video streaming
Reference: Discord-video-selfbot
The Discord-video-selfbot library is an older sender-side Go Live implementation. Useful observations:
- Uses a second `StreamConnection` for Go Live, separate from the main voice connection
- Fills the stream connection `serverId` from `STREAM_CREATE.rtc_server_id`
- Reuses the main voice `session_id` for the stream leg
- Uses `protocol: "udp"` with standard IP discovery and `SELECT_PROTOCOL`
- Does not implement `STREAM_WATCH` or inbound video receive
- Uses older `xsalsa20_*` encryption modes, so it is not a modern DAVE reference
Product Language
Prefer:
- "start screen watch"
- "watch your screen"
- "I can start watching that share"
- For spoken voice replies, say "open the link I sent" or "open that screen-share link" instead of reading the full URL aloud.
Avoid:
- "I can already see your screen" unless frame context is actually active
- Reading full screen-share URLs, hostnames, or token strings aloud in voice
- "native Discord watch always works" because it depends on stream discovery working for the selfbot session
