Clanky Self Technical Architecture
This document explains the live runtime shape of the experimental selfbot: where core decisions happen, how settings flow through the system, and which modules own text, voice, memory, tools, persistence, and the Rust media plane.
Docs map: ../README.md
Unified media product surface: ../capabilities/media.md
Product relationship tiers: relationship-model.md
For the clankvox-local transport/media docs, start at ../../src/voice/clankvox/README.md.
Canonical companion docs:
- ../reference/settings.md
- presence-and-attention.md
- activity.md
- initiative.md
- ../voice/voice-provider-abstraction.md
- presets.md
1. High-Level Components
Account/runtime model:
- one Discord user account is the primary runtime identity
- Bun owns gateway/session orchestration, prompts, tools, memory, and the dashboard
- clankvox is the Rust voice/media subprocess that owns Discord RTP, Opus, DAVE, music, TTS pacing, and native stream-watch media transport
- the product model is one Discord-native community entity: a deep personal assistant for the operator, and a higher-trust collaborator for explicitly approved others, not separate public/private bots
Code entrypoint:
- src/app.ts: bootstraps storage, services, the selfbot runtime, clankvox, and the dashboard server
Core runtime:
- src/bot.ts: Discord orchestration and scheduler entrypoints for the selfbot runtime
- src/bot/*: text reply admission, reply pipeline, ambient text initiative, automations, permissions, continuity, memory slices, and message history
- src/settings/settingsSchema.ts: canonical persisted settings schema
- src/settings/agentStack.ts: preset resolution and runtime binding helpers
- src/settings/dashboardSettingsState.ts: dashboard settings envelope (intent, effective, bindings)
- src/settings/settingsIntent.ts: intent minimization before persistence
- src/store/settingsNormalization.ts: settings normalization into the canonical shape
- src/llm.ts: provider/runtime abstraction for generation, embeddings, image/video generation, ASR, and TTS
- src/memory/*: durable memory extraction, storage, lookup, and reflection
- src/services/discovery.ts: passive feed collection for the unified initiative cycle
- src/voice/*: session lifecycle, capture, turn processing, voice-side admission, tool dispatch, output, and ambient voice thought delivery
- src/voice/clankvox/*: Rust media-plane subprocess for Discord voice transport, DAVE lifecycle, RTP/media parsing, and native screen-watch receive
- src/tools/*: shared text/voice tool schemas and execution wrappers
- src/agents/*: browser runtime, code-task swarm launcher and peer, and Minecraft session glue
- src/dashboard.ts and dashboard/src/*: REST control plane and dashboard UI
Behaviorally, the selfbot is documented as one shared attention system with text and voice spokes. That attention layer is currently implemented across several modules rather than one single package: text reply admission and recent windows, initiative, voice reply admission, thought generation, and music/floor overlays.
Permissioning and capability depth are part of that same product shape: community-safe capabilities are broadly available in shared Discord contexts, higher-trust collaborator capabilities are intentionally gated for approved users and resources, and owner-only device powers stay local to the operator's instance. The product-level relationship model is documented in relationship-model.md.
2. Runtime Lifecycle

At a high level:
- settings are loaded and normalized
- Discord gateway events and schedulers enter src/bot.ts
- shared conversational attention is shaped by direct address, recent engagement, and ambient cadence
- active text turns route into immediate reply admission, ambient text falls through to the initiative cycle, and voice sessions route into their domain handlers
- the LLM/tool layer is consulted only after deterministic guardrails pass
- actions and messages are persisted back into SQLite and memory logs
Text and voice are separate transports under that shared attention layer. Music playback, wake latch, and barge-in are overlays on the voice side, not separate attention modes.
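The ordering described above, deterministic guardrails first and routing toward the LLM/tool layer only afterwards, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in src/bot.ts: IncomingEvent, Guardrail, Route, routeEvent, and the two sample guardrails are hypothetical names introduced for this example.

```typescript
// Hypothetical event shape; the real runtime types are not shown in this document.
interface IncomingEvent {
  transport: "text" | "voice";
  authorBlocked: boolean;
  channelAllowed: boolean;
  isActiveTurn: boolean; // e.g. direct address or recent engagement
}

// A guardrail is a deterministic check: it returns a skip reason, or null to pass.
type Guardrail = (event: IncomingEvent) => string | null;

// Sample guardrails (assumptions, for illustration only).
const guardrails: Guardrail[] = [
  (e) => (e.authorBlocked ? "author_blocked" : null),
  (e) => (e.channelAllowed ? null : "channel_not_allowed"),
];

type Route =
  | { outcome: "skipped"; reason: string }
  | { outcome: "reply_admission" }
  | { outcome: "initiative_cycle" }
  | { outcome: "voice_handler" };

// Guardrails run first; only events that pass are routed onward.
function routeEvent(event: IncomingEvent): Route {
  for (const check of guardrails) {
    const reason = check(event);
    if (reason !== null) return { outcome: "skipped", reason };
  }
  if (event.transport === "voice") return { outcome: "voice_handler" };
  // Active text turns go to reply admission; ambient text falls through.
  return event.isActiveTurn
    ? { outcome: "reply_admission" }
    : { outcome: "initiative_cycle" };
}
```

Keeping the guardrails as pure functions means a skipped event never reaches a model call, which is the property the lifecycle list relies on.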
3. Tool Orchestration
The orchestrator is still tool-using and LLM-driven. The preset system resolves which external runtimes back those capabilities.
Shared tool schemas in src/tools/sharedToolSchemas.ts are concise capability contracts. The local non-MCP tool registry in src/tools/toolRegistry.ts is the central source of truth for which local tools exist on the reply and provider-native voice surfaces. Tool descriptions state what the tool does and the key contrast with nearby tools. Cross-modal tool-choice guidance lives in src/prompts/toolPolicy.ts; the text prompt, voice prompt, and provider-native realtime instructions all consume that shared policy and then add modality-specific constraints locally.
Core shared conversational tools:
- conversation_search
- memory_write
- web_search
- web_scrape
- browser_browse
- media generation tools
Reply-loop conditional tools:
- memory_search
- image_lookup
- start_screen_watch
- spawn_code_worker plus the swarm-mcp tool surface (request_task, wait_for_activity, get_task, update_task, send_message, annotate, lock_file, kv_*, …), mounted only for callers in permissions.devTasks.allowedUserIds on dev-allowed channels. See ../capabilities/code.md.
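As a rough sketch of the conditional mounting described above, the dev-task tools can be thought of as an extra set appended only when both the caller and the channel are allowed. The function toolsForCaller, the DevTaskPermissions shape, and the allowedChannelIds key are assumptions made for this example; only permissions.devTasks.allowedUserIds is named in this document, and the real gating lives in the tool registry and reply tools.

```typescript
// Hypothetical permissions shape for illustration.
interface DevTaskPermissions {
  allowedUserIds: string[]; // named in this document
  allowedChannelIds: string[]; // assumption: the real key for dev-allowed channels is not given here
}

// Tool names taken from the lists in this section.
const CORE_TOOLS: readonly string[] = [
  "conversation_search",
  "memory_write",
  "web_search",
  "web_scrape",
  "browser_browse",
];

// A subset of the dev-task surface; the full swarm-mcp list is longer.
const DEV_TASK_TOOLS: readonly string[] = [
  "spawn_code_worker",
  "request_task",
  "wait_for_activity",
  "get_task",
  "update_task",
];

// Mount dev-task tools only when both the caller and the channel are allowed.
function toolsForCaller(
  perms: DevTaskPermissions,
  userId: string,
  channelId: string,
): string[] {
  const devAllowed =
    perms.allowedUserIds.includes(userId) &&
    perms.allowedChannelIds.includes(channelId);
  return devAllowed ? [...CORE_TOOLS, ...DEV_TASK_TOOLS] : [...CORE_TOOLS];
}
```

Deciding the tool list before the model is called means a non-allowed caller never sees the dev-task tools in the prompt at all, rather than having calls rejected after the fact.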
Voice-centric tools:
- music_*
- play_soundboard
- join_voice_channel / leave_voice_channel
Core routing:
- local tool registry: src/tools/toolRegistry.ts
- text: src/tools/replyTools.ts
- voice: src/voice/voiceToolCallInfra.ts and src/voice/voiceToolCallDispatch.ts
- browser tasks: src/tools/browserTaskRuntime.ts
- code tasks: src/agents/swarmLauncher.ts (worker spawn) and src/agents/swarmPeer.ts (Clanky's controller peer)
Current voice dispatch modules:
- src/voice/voiceToolCallMemory.ts
- src/voice/voiceToolCallMusic.ts
- src/voice/voiceToolCallWeb.ts
- src/voice/voiceToolCallAgents.ts
Search runtime is resolved in src/services/search.ts:
- openai_native_web_search: hosted OpenAI Responses API lookup via the web_search_preview tool
- local_external_search: local provider-ordered search through Brave / SerpApi plus direct page-summary reads
There is no separate directive tool handler in the live architecture.
4. Settings Flow
The canonical settings contract now lives in ../reference/settings.md.
At a high level:
- the store persists authored intent, not materialized runtime state
- normalization produces the fully resolved effective runtime object
- dashboard/runtime helpers are exposed as derived bindings
- dashboard saves use compare-and-swap on settings.updated_at
- persistence and live-session application are separate outcomes
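The compare-and-swap save mentioned above can be illustrated with a small in-memory sketch. It is not the store's actual implementation: InMemorySettingsStore, SettingsRow, SaveResult, and the stale_version reason are hypothetical names. In SQLite the same idea is commonly expressed as an UPDATE whose WHERE clause includes the expected updated_at value, followed by a check of the changed-row count.

```typescript
// Hypothetical row shape: authored intent plus its save version.
interface SettingsRow {
  intentJson: string;
  updatedAt: string;
}

type SaveResult =
  | { ok: true; updatedAt: string }
  | { ok: false; reason: "stale_version"; currentUpdatedAt: string };

class InMemorySettingsStore {
  private row: SettingsRow;

  constructor(initial: SettingsRow) {
    this.row = { ...initial };
  }

  read(): SettingsRow {
    return { ...this.row };
  }

  // The write succeeds only if the caller saw the latest version.
  save(intentJson: string, expectedUpdatedAt: string, now: string): SaveResult {
    if (this.row.updatedAt !== expectedUpdatedAt) {
      return {
        ok: false,
        reason: "stale_version",
        currentUpdatedAt: this.row.updatedAt,
      };
    }
    this.row = { intentJson, updatedAt: now };
    return { ok: true, updatedAt: now };
  }
}
```

With this shape, two dashboard tabs that loaded the same version cannot silently overwrite each other: the second save is rejected and has to reload first. Applying a successful save to a live session would be a separate step, in line with the last bullet above.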

5. Persistence Model
Main runtime stores:
- data/clanker.db: SQLite database
- memory/YYYY-MM-DD.md: append-only daily logs
- memory/MEMORY.md: operator-facing memory snapshot
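For illustration, the daily log naming pattern above could be derived from a date like this. The helper name dailyLogPath and the choice of UTC for the day boundary are assumptions; this document does not say which time zone the runtime uses.

```typescript
// Builds a path of the form memory/YYYY-MM-DD.md.
// Assumption: the day is taken in UTC; the real runtime may use local time.
function dailyLogPath(date: Date): string {
  const day = date.toISOString().slice(0, 10); // "YYYY-MM-DD"
  return `memory/${day}.md`;
}
```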
Important tables:
- settings (key = 'runtime_settings' stores authored settings intent; updated_at is the dashboard save version)
- messages
- actions
- memory_facts
- memory_fact_vectors_native
- shared_links
- automations
- automation_runs
- response_triggers

6. Text Reply Flow
Entrypoint: Discord messageCreate handling in src/bot.ts for the selfbot user session.
This fork is message-first, but slash/app commands remain intentional fallback control surfaces for directly invoking capabilities when explicit operator control is useful. They complement the primary conversational surface rather than replacing it.
Main stages:
- permission and channel checks
- reply admission
- continuity and memory assembly
- LLM/tool loop
- delivery and persistence
The user-facing activity model for these paths is documented in activity.md.

7. Voice Runtime
Voice is split into independent layers:
- capture and turn promotion
- transcription
- reply admission
- generation and tool ownership
- output / barge-in
- proactive voice thought generation
These layers are the voice spoke of the shared attention model. They decide how attention becomes audible in a room; they do not define a separate voice-only mind.
Transport ownership in this fork:
- Bun owns Discord session lifecycle, turn orchestration, tools, and settings application
- clankvox owns the Discord media plane: voice sockets, Opus/RTP, DAVE, queued audio output, and native Go Live stream receive/send
- the Go Live stream-server legs are documented as additional clankvox transport roles in ../archive/selfbot-stream-watch.md and ../voice/discord-streaming.md
Canonical public surfaces:
- transport/runtime: agentStack.runtimeConfig.voice.*
- conversation behavior: voice.conversationPolicy.*
- admission: voice.admission.*
- transcription: voice.transcription.*
- session limits: voice.sessionLimits.*
- proactive cadence: initiative.voice.*
Voice-specific docs:
- ../voice/voice-provider-abstraction.md
- ../voice/voice-capture-and-asr-pipeline.md
- ../voice/voice-client-and-reply-orchestration.md
- ../voice/voice-output-and-barge-in.md
8. Unified Initiative Flow
Ambient text delivery is owned by src/bot/initiativeEngine.ts.
The runtime splits responsibility like this:
- permissions.replies.discoveryChannelIds: canonical eligible initiative channel pool
- initiative.text.*: initiative cadence, budgets, and tool-loop limits
- initiative.discovery.*: feed collection, self-curation, and media infrastructure
- src/services/discovery.ts: gathers passive feed candidates
The model decides:
- whether to post now, hold a thought for later, or drop it
- which eligible channel fits
- whether to use tools
- whether to include links
- whether to request media
This is the text spoke's ambient delivery path. The corresponding voice spoke is the voice thought engine.
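The choices the model makes in the list above map naturally onto a tagged union, with the runtime checking the chosen channel against the eligible pool. The sketch below is an assumption about shape only: InitiativeDecision, enforceEligibleChannel, and the field names are hypothetical, and dropping a post aimed at an ineligible channel is one possible policy rather than a description of what src/bot/initiativeEngine.ts does.

```typescript
// Hypothetical decision shape mirroring "post now, hold for later, or drop".
type InitiativeDecision =
  | {
      action: "post";
      channelId: string;
      text: string;
      includeLinks: boolean;
      requestMedia: boolean;
    }
  | { action: "hold"; thought: string }
  | { action: "drop" };

// The model proposes; the runtime still enforces the eligible channel pool.
// Assumption: an ineligible channel turns the decision into a drop.
function enforceEligibleChannel(
  decision: InitiativeDecision,
  eligibleChannelIds: readonly string[],
): InitiativeDecision {
  if (
    decision.action === "post" &&
    !eligibleChannelIds.includes(decision.channelId)
  ) {
    return { action: "drop" };
  }
  return decision;
}
```

Here eligibleChannelIds stands in for the pool configured at permissions.replies.discoveryChannelIds.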
Canonical references:
9. Memory Model
Durable memory is centered on memory_facts.
Facts use dual scope:
- scope='user' for user-portable facts (guild_id=NULL, optional user_id owner)
- scope='guild' for server-specific lore/rules (guild_id required)
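The dual-scope rule above can be expressed as a small validation check. This is a sketch under assumptions: FactRow and scopeError are hypothetical names, the fact column name is invented for the example, and the real constraints may be enforced elsewhere (for instance in the schema or in src/memory/memoryManager.ts).

```typescript
// Hypothetical row shape; only scope, guild_id, and user_id come from this document.
interface FactRow {
  scope: string;
  guild_id: string | null;
  user_id: string | null;
  fact: string; // assumption: illustrative column name
}

// Returns an error message when the row violates the scope rules, else null.
function scopeError(row: FactRow): string | null {
  if (row.scope === "user") {
    // User facts are portable across guilds, so guild_id must be NULL.
    return row.guild_id === null ? null : "user facts must have guild_id = NULL";
  }
  if (row.scope === "guild") {
    // Guild facts are server-specific, so guild_id is required.
    return row.guild_id ? null : "guild facts require guild_id";
  }
  return `unknown scope: ${row.scope}`;
}
```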
Current behavioral guidance model:
- guidance facts: always-on operating/persona guidance
- behavioral facts: retrieved by relevance
There is no separate directive store in the live runtime.
Relevant modules:
- src/memory/memoryManager.ts
- src/memory/memoryToolRuntime.ts
- src/bot/memorySlice.ts
- src/prompts/promptText.ts
- src/prompts/promptVoice.ts
10. Dashboard And Control Plane
The dashboard serves as the live control plane for:
- reading and writing settings
- inspecting memory
- inspecting actions and stats
- resetting to preset defaults
- viewing voice/session data
Key server entrypoints:
- GET /api/settings
- PUT /api/settings
- POST /api/settings/preset-defaults
- GET /api/stats
- GET /api/actions
- memory and voice history endpoints
- DELETE /api/memory/guild for confirmed community-memory purges within a guild
11. Latency-Critical Model Choices
The main levers that change cost and latency are:
- resolved orchestrator binding
- interaction.replyGeneration.* (temperature, max output tokens, reasoning effort)
- agentStack.runtimeConfig.voice.runtimeMode
- agentStack.runtimeConfig.voice.generation
- voice.conversationPolicy.replyPath
- voice.admission.mode
Voice classifier provider/model binding is resolved through preset defaults or agentStack.overrides.voiceAdmissionClassifier.
12. Action Log Kinds
Common action kinds include:
- sent_reply
- sent_message
- reply_skipped
- initiative_post
- automation_post
- llm_call
- llm_error
- image_call
- video_call
- voice_error
These power stats, diagnostics, and initiative/discovery feedback loops.
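As a rough illustration of how action kinds can feed stats, the sketch below tallies logged actions by kind. The kind names are the ones listed above; ACTION_KINDS, ActionKind, and countByKind are hypothetical names, and the list is described as "common" kinds, so the real set may be larger.

```typescript
// Kinds listed in this section; the runtime may log others as well.
const ACTION_KINDS = [
  "sent_reply",
  "sent_message",
  "reply_skipped",
  "initiative_post",
  "automation_post",
  "llm_call",
  "llm_error",
  "image_call",
  "video_call",
  "voice_error",
] as const;

type ActionKind = (typeof ACTION_KINDS)[number];

// Counts how many logged actions fall under each kind.
function countByKind(
  actions: ReadonlyArray<{ kind: ActionKind }>,
): Partial<Record<ActionKind, number>> {
  const counts: Partial<Record<ActionKind, number>> = {};
  for (const action of actions) {
    counts[action.kind] = (counts[action.kind] ?? 0) + 1;
  }
  return counts;
}
```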
