Runtime Debugger And Incident Replay Design
Status: proposed
References:
docs/operations/logging.mddocs/architecture/overview.mddocs/architecture/activity.mddocs/capabilities/memory.mddocs/voice/voice-provider-abstraction.mddocs/voice/voice-client-and-reply-orchestration.mddocs/voice/voice-capture-and-asr-pipeline.mddocs/voice/voice-output-and-barge-in.md

Goal
Build a first-class runtime debugger for Clanker that turns raw structured logs into incident reconstruction, replay, search, and operator-grade debugging for:
- text reply turns
- multimodal turns
- voice sessions and voice turns
- tool loops
- memory side effects
- transport/runtime failures
The product is not "more analytics." The product is a turn and session debugger that makes causality legible.
Product Thesis
Grafana/Loki already gives value because it exposes the raw event stream. The main remaining gap is reconstruction:
- operators can find the right events
- operators still have to manually stitch them into one story
The debugger should remove that stitching work.
Primary job:
- explain what happened
- explain why it happened
- explain what the bot saw
- explain what the runtime did next
Secondary job:
- summarize cost, latency, and trend data
If we have to choose, incident replay matters more than aggregate metrics.
Why This Is Worth Building
Clanker is not a simple request/response bot. It has:
- coalesced text turns
- direct and ambient reply admission
- prompt bundles and followup tool loops
- multimodal captioning and image lookup
- memory retrieval and reply ingestion
- multiple voice reply paths
- VAD, ASR bridge, admission, output lock, barge-in, and playback telemetry
- music and screen-share overlays
That means failures are usually causal chains, not isolated lines.
The debugger pays for itself if it reduces these categories of debugging from "open five panels and grep logs" to "open one replay view":
- odd or unsafe text replies
- mysterious skips
- wrong tool choices
- stale or missing memory
- voice sessions that heard something but said nothing
- transport/runtime regressions after provider or prompt changes
Current State
Existing useful surfaces:
- raw structured action log in
actions - Grafana/Loki search over
runtime-actions.ndjson - dashboard
ActionStream - dashboard
TextTab - dashboard
VoiceMonitor - persisted voice session history APIs
- live voice runtime snapshot and prompt-state views
response_triggersindex for trigger-message linkage
Current pain:
- text and multimodal turns are still event-centric instead of incident-centric
- voice history is session-scoped but not reconstructed into per-turn causality
- tool loops, memory events, and prompt context are visible but fragmented
- cross-surface debugging requires manual correlation by time and ID
- the current dashboard is implicitly single-bot even though the long-term need is broader observability across bots
Scope
This design targets Clanker first, but the contract should be reusable across all bots that emit the same structured action envelope.
That means the debugger must not assume:
- one bot name
- one guild
- one runtime family
- one provider stack
From the start, every replay/search API should be able to carry bot identity and deployment identity even if the first shipped UI uses only Clanker.
Operator Workflows
1. "Why did it say that?"
Open the final sent message or reply and see:
- triggering message(s)
- recent coalesced window
- admission/addressing decision
- prompt bundle
- memory and tool context
- model output(s)
- final delivered action
2. "Why did it not reply?"
Open the user message and see the precise stop point:
- deterministic admission deny
- classifier/generation
[SKIP] - cooldown or reply budget gate
- output lock
- empty ASR
- error or cancellation
3. "Why did voice feel broken?"
Open the affected VC session and see:
- session lifecycle
- speaker activity and VAD promotion
- ASR revisions/finals
- addressing/admission decisions
- generation and tool activity
- playback/output lock/barge-in
- recovery or failure
4. "What changed after a regression?"
Search incidents by:
- model or provider
- reply path (
bridge,brain,native) - guild/channel/user
- tool usage
- output lock reason
- trace source
- time window
Then compare before/after incident shapes.
Ideal Information Architecture
1. Debugger tab
Add a top-level dashboard tab dedicated to debugging rather than folding all new
surfaces into ActionStream, TextTab, or VoiceMonitor.
Sub-surfaces:
IncidentsReplayVoiceSearchSaved
2. Incident inbox
Default landing view.
Shows recent reconstructed incidents across the selected scope:
- text turn
- multimodal turn
- voice turn
- voice session anomaly
- tool/runtime error
Each row should show:
- bot
- time
- incident type
- guild/channel
- user/speaker
- short summary
- model/provider
- tool badges
- latency
- cost
- anomaly flags
3. Replay pane
Single-incident detail view with:
- readable timeline
- raw JSON panel
- linked artifacts
- correlation IDs
- deep link / permalink
4. Search
Structured filters plus free-text search over event content and selected metadata fields.
5. Saved views
Operators should be able to save:
- recurring queries
- incident permalinks
- named anomaly views
Canonical Debugger Data Model
Raw actions remains the source of truth.
The debugger adds derived entities on top of it.
1. Incident
Top-level reconstructed object.
Fields:
incidentIdincidentTypebotIdbotNamedeploymentguildIdchannelIduserIdstartedAtendedAtstatussummarycorrelationKeyslatencycostanomalies
Incident types:
text_turnmultimodal_turnvoice_sessionvoice_turnruntime_errortool_session
2. Event fragment
Normalized projection of one raw action row into the debugger domain.
Fields:
actionIdcreatedAtkindphaselaneshortLabelraw
3. Artifact
Renderable payload tied to an incident:
- prompt bundle
- transcript
- attachment thumbnail
- image lookup result
- web result set
- tool result payload
- memory facts
- screen-share frame summary
4. Anomaly
Derived flag created by projector rules, not only by explicit errors.
Examples:
reply_missing_after_direct_addresstool_loop_stalledempty_asr_after_promotionoutput_lock_stalehigh_followup_latencyrepeated_silence_gate_drop
Correlation Model
The debugger should reconstruct incidents from stable keys already present in logs, plus a few new ones that we should add.
Existing keys to use immediately
message_idguild_idchannel_iduser_idmetadata.triggerMessageIdmetadata.triggerMessageIdsmetadata.sessionIdmetadata.sourcemetadata.eventmetadata.traceSource
New keys to add
These should become part of the common action envelope for Clanker and any future bots using the same debugger:
botIdbotNameruntimeInstanceIddeploymentgitShaenvironmentincidentIdwhen a runtime already knows itturnIdfor text and voice replay groupingtoolCallIdandtoolSteppromptBundleIdartifactRefs
Correlation rules
Text and multimodal:
- use
triggerMessageIdas the primary join key - use
triggerMessageIdsfor coalesced context and sibling linkage - treat sent reply/message, reaction, llm call, tool results, and memory ingest as one incident when they share the same trigger set and close timestamps
Voice session:
- use
sessionIdas the primary join key - group all
voice_*actions with that session
Voice turn:
- add a canonical
turnId - until then, derive provisional turns from a chain of:
- promotion event
- transcript events
- addressing/admission
- generation/tool/output events
Ideal Replay UI For Text And Multimodal
The text replay view should feel like a linear story, not a table dump.
Sections:
1. Trigger
- final anchor message
- coalesced trigger list
- recent-room context window
- addressing signal
2. Inputs
- prompt bundle
- relevant memory slice
- detected attachments
- captioning results
- retrieved web/image/browser context
3. Model loop
- first model call
- tool requests in order
- tool results in order
- followup model call(s)
- parse/recovery events if structured output failed
4. Delivery
- reaction
- sent message or reply
- standalone vs reply
- mention resolution
- media directives
5. Persistence
- memory ingest
- fact writes
- conversation history writes
6. Performance
- queue
- memory
- llm
- followup
- typing
- send
- cost
Ideal Debugger For VC Runtime And Events
This is the most important part of the design. Voice debugging should not stop at "session history plus a flat event list."
It should reconstruct how the room moved.
Voice surface split
The voice debugger should have two connected views:
Live session debuggerSession replay
Live helps operators while the room is active. Replay helps after a bad session or regression.
Live session debugger
Top summary bar:
- bot
- guild / voice channel
- mode and reply path
- provider/runtime
- session age
- active participants
- current floor owner
- output lock state
- pending response state
- active response state
- pending deferred turns
Pinned health chips:
- transport connected / disconnected
- ASR healthy / degraded
- generation healthy / degraded
- output healthy / degraded
- music active
- screen-share active
Voice session timeline
Render the session as synchronized lanes instead of one flat feed.
Required lanes:
Session- joins, leaves, reconnects, session end
Membership- participant joins/leaves
Capture/VAD- activity start, provisional capture, promotion, silence gate
ASR- speech started/stopped, transcript revisions, finals, empty transcript
Admission- addressing, wake latch, command-only, output lock, deny reasons
Generation- model calls, prompts, followup loops, cancellations
Tools- tool start, tool result, tool failure, ownership path
Output- bot audio start/stop, preplay stash, interruption, replay recovery
Screen share- frame ingest, commentary request, brain-context refresh
Music- playback state, ducking, soundboard overlays
Operators should be able to collapse to only the lanes relevant to the current incident.
Voice turn reconstruction
Within a session, the debugger should reconstruct human-readable voice turns.
One voice turn card should include:
- provisional capture and promotion metadata
- utterance IDs / transcript revision chain
- final transcript
- direct address / command follow-up / ambient reasoning
- output lock reason if blocked
- generation prompt snapshot
- tool calls
- generated text
- whether speech was played, cancelled, or replayed
- barge-in or interruption details
Voice prompt and context surfaces
Ideal operator surfaces:
- current instructions
- classifier prompt snapshot
- generation prompt snapshot
- bridge prompt snapshot
- last generation context
- memory facts used for the session
- stream-watch brain context payload
These already exist in pieces in the live voice snapshot. The debugger should preserve them and present them in replay alongside the relevant turn.
Voice anomaly callouts
The voice debugger should surface likely failure causes automatically.
First anomaly set:
- promoted capture with empty ASR
- direct address denied by output lock
- output lock stuck after response done
- repeated silence-gate drops for same speaker/session
- generation succeeded but output never started
- tool call ownership mismatch
- repeated reconnects
- screen-share commentary storms
- music playback masking expected speech
Voice ideal drilldown
When an operator clicks "why didn’t it answer here?", the UI should answer in one screen:
- user speech was promoted at
t0 - transcript finalized at
t1 - admission denied or allowed at
t2 - model/tool loop happened at
t3 - output was blocked, cancelled, played, or replayed at
t4
That should be true for both live sessions and historical sessions.
Backend Design
1. Keep actions as the event ledger
Do not replace the raw action log.
The debugger should be a projection layer over:
actionsresponse_triggersmessages- memory and voice snapshot helpers where relevant
2. Add a projector layer
Introduce a debugger projection module that can:
- reconstruct incidents on read for MVP
- optionally persist derived summaries later for speed
Suggested module split:
src/debugger/incidentProjector.tssrc/debugger/textTurnProjector.tssrc/debugger/voiceSessionProjector.tssrc/debugger/voiceTurnProjector.tssrc/debugger/anomalyRules.ts
3. Add derived storage only after MVP proves the shape
MVP can project on read from actions.
If query latency or cross-bot scale becomes painful, add derived tables:
debug_incidentsdebug_incident_eventsdebug_voice_turnsdebug_artifacts
Those tables should cache projections, not become the source of truth.
4. Add debugger APIs
Required APIs:
GET /api/debug/incidentsGET /api/debug/incidents/:incidentIdGET /api/debug/text-turns/:triggerMessageIdGET /api/debug/voice-sessions/:sessionIdGET /api/debug/voice-sessions/:sessionId/turnsGET /api/debug/searchGET /api/debug/saved-viewsPOST /api/debug/saved-views
Search filters:
- bot
- guild
- channel
- user
- incident type
- kind
- model
- provider
- reply path
- tool name
- anomaly
- output lock reason
- trace source
- time range
5. Indexing and query performance
The current action store indexes are enough for raw feeds but not ideal for a rich debugger.
Expected additions:
idx_actions_guild_timeidx_actions_channel_timeidx_actions_message_ididx_actions_user_timeidx_response_triggers_created_at- targeted JSON indexes for frequently queried metadata keys where SQLite makes it worthwhile
If we add derived incident tables, index by:
incidentType, startedAtguildId, startedAtchannelId, startedAtuserId, startedAtsessionIdtriggerMessageIdanomaly
Frontend Design
1. New Debugger tab
Do not overload ActionStream into a full debugger.
Keep:
ActionStreamas the event microscopeTextTabas the message-centric quick inspectorVoiceMonitoras the live runtime console
Add:
Debuggeras the causal replay product
2. Deep links from existing surfaces
Every existing surface should link into the debugger:
- action row -> open related incident
- text message card -> open turn replay
- voice session card -> open session replay
- voice event row -> open containing turn
3. Layout rules
- default to two-pane: timeline left, details right
- no floating toasts
- keep status inline near the active query or replay
- raw JSON must always be one click away
Logging Gaps To Close Before Or During MVP
The debugger will work better if these gaps are closed early:
- canonical
turnIdfor text - canonical
turnIdfor voice - stable tool call IDs in all tool-owning paths
- bot identity and deployment identity on every action
- prompt bundle IDs
- artifact references for attachments, fetched pages, and returned media
- explicit output-start / output-finish events for all reply paths
- explicit "no reply because X" terminal markers where today silence is only inferable
Rollout Plan
Phase 1. Contract hardening
- audit current action coverage
- add missing correlation keys
- add missing terminal-state logs for text and voice
- add bot/deployment identity fields
Primary files:
src/store/storeActionLog.tssrc/bot.tssrc/bot/replyPipeline.tssrc/voice/*docs/operations/logging.md
Phase 2. Text and multimodal replay MVP
- implement text turn projector
- add replay API by
triggerMessageId - build first replay UI for sent reply, sent message, and skipped turns
- deep-link from
TextTabandActionStream
Primary files:
src/debugger/textTurnProjector.tssrc/dashboard/routesDebug.tsdashboard/src/components/Debugger/*dashboard/src/components/TextTab.tsxdashboard/src/components/ActionStream.tsx
Phase 3. Voice session debugger MVP
- implement session replay from current
voice_*actions - upgrade voice history view from flat events to causal lanes
- preserve existing live prompt-state and stream-watch surfaces
- add anomaly callouts for the top voice failure modes
Primary files:
src/debugger/voiceSessionProjector.tssrc/debugger/voiceTurnProjector.tssrc/dashboard/routesVoice.tsdashboard/src/components/VoiceMonitor.tsxdashboard/src/components/Debugger/*
Phase 4. Search and saved incident views
- global debugger query API
- structured filters
- saved views and permalinks
- operator-friendly incident summaries
Phase 5. Derived indexes and cached projections
- add incident cache tables only if projection-on-read becomes slow
- backfill existing actions into incident summaries
- keep raw log replay as fallback when cached summaries disagree
Phase 6. Cross-bot federation
- standardize the action envelope across bots
- make debugger APIs filter by bot
- add deployment and version dimensions
- allow one dashboard to search across multiple bot runtimes
Verification Plan
Focused verification only. No giant test explosion.
Critical test targets:
- text turn projector reconstructs tool loops correctly
- skipped-turn replay identifies the stop reason correctly
- voice session projector groups events into the correct session
- voice turn projector handles empty-ASR, interrupted, and replay-recovered turns
- anomaly rules do not mislabel ordinary successful turns as failures
- debugger APIs preserve raw references back to source action IDs
Non-Goals
- replacing Grafana/Loki
- hiding raw logs from operators
- inventing a second source of truth separate from
actions - building a generic enterprise observability suite before the replay core works
- fully normalizing every event type before shipping the first useful debugger
Done Criteria
- an operator can open any odd text reply and see one coherent replay
- an operator can open any skipped text turn and see the exact stop point
- an operator can open a bad VC session and understand the causal path from capture to output
- prompt, tool, memory, and delivery context are visible without manual grep
- existing raw event surfaces still exist for low-level forensics
- the debugger contract is reusable across bots that emit the same structured action model
Product language: Clanker should be debuggable like a person with a memory and attention trail, not like a pile of unrelated log lines.
