docs/tmp/archive/quality-improvement-plan.md

Quality Improvement Plan

Date: March 6, 2026
Baseline: 769 tests, 0 :any, typecheck clean, 5 subsystem state machine docs written
Focus: Bug prevention through interaction tests, E2E coverage, and error handling fixes

This plan targets the gaps that actually cause production bugs — timing-sensitive cross-domain interactions, missing E2E scenarios, and silent error swallowing — rather than further code extraction.


Priority 1: Fix Known Bugs (Quick Wins)

1A. Delete dead code — voiceRuntimeState.ts

src/voice/voiceRuntimeState.ts is dead code, superseded by src/voice/voiceRuntimeSnapshot.ts. It has no importers. Delete it.

This file is also the source of the getDeferredQueuedUserTurns / getJoinGreetingOpportunity phantom LSP errors; they go away once the file is deleted.

Effort: 5 minutes
Files: Delete src/voice/voiceRuntimeState.ts

1B. Fix fire-and-forget error swallowing (HIGH risk)

src/voice/voiceToolCallDispatch.ts:103 has .catch(() => {}) on endSession(). If endSession() rejects, the error is discarded, and the bot can stay stuck in a voice channel permanently with no logs.

Fix: .catch((err) => logger.error("endSession failed in scheduleLeaveVoiceChannel", { error: err })) (or whatever the project's logging pattern is).

Effort: 10 minutes
Files: src/voice/voiceToolCallDispatch.ts
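
A minimal sketch of the 1B fix, assuming the project logger exposes an error(message, meta) style function. The names reportRejection and LogFn, and the stand-in endSession below, are hypothetical and only illustrate the pattern: the call stays fire-and-forget, but a rejection now leaves a log entry.

```typescript
// Hypothetical logger shape; swap in the project's real logging pattern.
type LogFn = (message: string, meta: Record<string, unknown>) => void;

// Builds a rejection handler that records the failure instead of discarding it.
function reportRejection(log: LogFn, context: string): (err: unknown) => void {
  return (err: unknown): void => {
    const error = err instanceof Error ? err.message : String(err);
    log(`${context} failed`, { error });
  };
}

// Stand-in for the real endSession(); it always rejects so the handler runs.
async function endSession(): Promise<void> {
  throw new Error("voice connection already destroyed");
}

const logged: string[] = [];
const logError: LogFn = (message, meta) => {
  logged.push(`${message}: ${String(meta["error"])}`);
};

// Still fire-and-forget, but the failure is no longer silent.
void endSession().catch(
  reportRejection(logError, "endSession in scheduleLeaveVoiceChannel"),
);
```

Keeping the handler as a small factory means the same helper could cover the three MEDIUM-risk call sites as well, if that fits the project's conventions.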

1C. Fix fire-and-forget error swallowing (MEDIUM risk)

3 instances:

| File | Line | Fix |
| --- | --- | --- |
| src/services/screenShareSessionManager.ts | 399 | Log the error before coercing to null |
| src/bot/conversationContinuity.ts | 100 | Log memory retrieval failure before returning empty |
| src/video/videoContextService.ts | 773 | Log the error and don't permanently cache the failure result |

Effort: 20 minutes
Files: 3 files above
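
The videoContextService.ts item combines two rules: log the failure, and keep the fallback out of the cache so a later call can retry. The sketch below shows that shape with a hypothetical SuccessOnlyCache class. It is synchronous for brevity, while the real service is presumably async, so treat it as an illustration of the caching rule rather than a drop-in patch.

```typescript
// Hypothetical cache that stores successful results only.
class SuccessOnlyCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly onError: (key: K, err: unknown) => void;

  constructor(onError: (key: K, err: unknown) => void) {
    this.onError = onError;
  }

  get(key: K, compute: () => V, fallback: V): V {
    if (this.entries.has(key)) {
      return this.entries.get(key) as V;
    }
    try {
      const value = compute();
      this.entries.set(key, value); // Only successful results are cached.
      return value;
    } catch (err) {
      this.onError(key, err); // Log first, then degrade.
      return fallback; // Not cached, so the next call retries compute().
    }
  }
}
```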


Priority 2: Cross-Domain Interaction Tests

These test the timing-sensitive state reads identified in the subsystem docs. They are unit/integration tests (no Discord connection needed) but they validate the contracts between subsystems.

2A. Barge-in timing edge cases

Test the shouldBargeIn gate sequence under the exact conditions that cause production false-positives:

| Test Case | What It Validates |
| --- | --- |
| Pre-audio guard: user speaking while response pending but no audio delta yet | Barge-in should NOT fire: user can't interrupt what they haven't heard |
| Active flow guard: bot finished generating, subprocess draining buffered frames | Barge-in should NOT fire: response is effectively complete |
| Echo guard: bot audio started <1500ms ago | Barge-in should NOT fire: likely echo |
| Post-cancel race: response_done arrives between audio chunk and barge-in check | If cancel fails (response already done), should NOT queue retry or set full suppression |
| Assertiveness during bot speech: peak < 0.05 or active ratio < 0.06 | Barge-in should NOT fire: signal too weak to confirm intentional interruption |
| Interruption policy: scope="speaker" with non-matching userId | Barge-in should be blocked for non-speaker |

Test file: src/voice/bargeInController.test.ts
Dependencies: Mock VoiceSession with assistantOutput, pendingResponse, botTurnOpen, botTurnOpenAt, bargeInSuppressionUntil. Mock capture signal metrics.
Effort: 1-2 hours
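
To make the gate sequence concrete, here is a rough sketch of the guards in the table as a pure function. The thresholds (1500 ms, 0.05, 0.06) come from the table; everything else is an assumption made for illustration: the function name evaluateBargeIn, the input shape, the "anyone" scope value, the gate order, and the use of botTurnOpenAt as the audio start time. The real shouldBargeIn may differ, and the post-cancel race is left out because it concerns cancel handling rather than the gate itself.

```typescript
type BargeInVerdict =
  | "allow"
  | "blocked_policy"
  | "blocked_pre_audio"
  | "blocked_draining"
  | "blocked_echo"
  | "blocked_weak_signal";

// Hypothetical flattened view of the session and capture state the gate reads.
interface BargeInInput {
  nowMs: number;
  userId: string;
  policy: { scope: "anyone" | "speaker"; speakerUserId: string | null };
  audioDeltaReceived: boolean; // has the user heard any of this response yet?
  generationDone: boolean; // bot finished generating, buffered frames draining
  botTurnOpenAt: number | null; // assumed to mark when bot audio started
  peak: number;
  activeRatio: number;
}

const ECHO_GUARD_MS = 1500;
const MIN_PEAK = 0.05;
const MIN_ACTIVE_RATIO = 0.06;

function evaluateBargeIn(input: BargeInInput): BargeInVerdict {
  if (input.policy.scope === "speaker" && input.policy.speakerUserId !== input.userId) {
    return "blocked_policy";
  }
  if (!input.audioDeltaReceived) return "blocked_pre_audio";
  if (input.generationDone) return "blocked_draining";
  if (input.botTurnOpenAt !== null && input.nowMs - input.botTurnOpenAt < ECHO_GUARD_MS) {
    return "blocked_echo";
  }
  if (input.peak < MIN_PEAK || input.activeRatio < MIN_ACTIVE_RATIO) {
    return "blocked_weak_signal";
  }
  return "allow";
}
```

Returning a named verdict instead of a boolean lets each test case assert which guard fired, not just that barge-in was blocked.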

2B. Deferred turn flush timing

Test the interaction between output lock release and deferred turn flushing:

| Test Case | What It Validates |
| --- | --- |
| Phase transitions to idle → deferred turns flush | The syncAssistantOutputState → recheckDeferredVoiceActions path works |
| Active promoted capture blocks deferred turn flush | hasDeferredTurnBlockingActiveCapture prevents reply during active speech |
| Silence-only capture does NOT block deferred flush | Weak captures that never promoted shouldn't hold up deferred turns |
| Deferred turn re-runs admission gate on flush | Coalesced turn is re-evaluated, not blindly dispatched |
| Deferred action expires before output frees | Stale actions are cleaned up, not fired |
| Queued turns flush once output is genuinely clear | Deferred actions only exist for queued user turns now |

Test file: src/voice/deferredActionQueue.test.ts
Dependencies: Mock VoiceSession with output state, capture state, deferred actions.
Effort: 1-2 hours
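
The flush rules in the table reduce to a small decision: expire stale actions, wait while output is busy or a promoted capture is active, otherwise flush. The sketch below encodes that with hypothetical names (decideDeferredFlush, DeferredAction, FlushContext) and an assumed check order. A "flush" result is only a go-ahead to re-run the admission gate, which is not modelled here.

```typescript
// Hypothetical shapes; the real deferred action and session types will differ.
interface DeferredAction {
  expiresAtMs: number;
}

interface FlushContext {
  nowMs: number;
  outputPhase: string; // "idle" means the output lock is free
  activeCapture: { promoted: boolean } | null;
}

type FlushDecision = "expire" | "wait" | "flush";

function decideDeferredFlush(action: DeferredAction, ctx: FlushContext): FlushDecision {
  // Stale actions are cleaned up, never fired.
  if (ctx.nowMs >= action.expiresAtMs) return "expire";
  // The output lock is still held.
  if (ctx.outputPhase !== "idle") return "wait";
  // Only a promoted capture blocks; silence-only captures do not.
  if (ctx.activeCapture !== null && ctx.activeCapture.promoted) return "wait";
  return "flush";
}
```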

2C. ASR bridge commit race conditions

Test the shared ASR user lock and handoff logic:

| Test Case | What It Validates |
| --- | --- |
| Shared ASR user lock prevents concurrent access | Second user's beginAsrUtterance returns false while first user holds lock |
| Lock released after commit | releaseSharedAsrActiveUser unlocks, next user can proceed |
| Handoff replays buffered PCM | tryHandoffSharedAsr finds waiting promoted capture and flushes its audio |
| Circuit breaker after 3 empty commits | Forces close + reconnect |
| Audio buffer overflow drops oldest, not newest | 10s cap preserves recent audio |
| Commit during connecting phase buffers correctly | Audio queued as pending, flushed when ready |

Test file: src/voice/voiceAsrBridge.test.ts
Dependencies: Mock OpenAiRealtimeTranscriptionClient, mock VoiceSession with ASR state.
Effort: 1-2 hours
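
As an illustration of the overflow contract (drop oldest, keep newest, 10s cap), here is a small sketch of a duration-capped buffer. BoundedAudioBuffer and PcmChunk are hypothetical names, and chunks carry an id rather than real PCM bytes so the eviction order is easy to assert. The actual bridge may track bytes instead of milliseconds.

```typescript
// Hypothetical chunk shape: an id stands in for the PCM payload.
interface PcmChunk {
  id: number;
  durationMs: number;
}

class BoundedAudioBuffer {
  private chunks: PcmChunk[] = [];
  private totalMs = 0;
  private readonly capMs: number;

  constructor(capMs: number) {
    this.capMs = capMs;
  }

  push(chunk: PcmChunk): void {
    this.chunks.push(chunk);
    this.totalMs += chunk.durationMs;
    // Evict from the front so the most recent audio survives.
    // The newest chunk is always kept, even if it alone exceeds the cap.
    while (this.totalMs > this.capMs && this.chunks.length > 1) {
      const dropped = this.chunks.shift();
      if (dropped !== undefined) this.totalMs -= dropped.durationMs;
    }
  }

  bufferedMs(): number {
    return this.totalMs;
  }

  // Hands back everything buffered, oldest first, and resets the buffer.
  drain(): PcmChunk[] {
    const out = this.chunks;
    this.chunks = [];
    this.totalMs = 0;
    return out;
  }
}
```

The same drain() call is a natural fit for the "commit during connecting phase" case: audio queues while the connection is pending and is flushed in order once it is ready.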

2D. Output state machine transition contracts

Test the assistantOutput phase transitions that drive the output lock:

| Test Case | What It Validates |
| --- | --- |
| response_done before subprocess drain keeps lock | Phase should be speaking_buffered, not idle |
| Stale positive clankvox telemetry expires | Buffer depth updates that stop arriving eventually release the lock |
| Stale OpenAI active response cleared | isResponseInProgress() returns true but pendingResponse is gone → phase returns to idle |
| Barge-in forces immediate idle | Both speaking_live and speaking_buffered → idle |
| Tool call lifecycle: response_pending → awaiting_tool_outputs → response_pending | Tool call doesn't lose the pending response |

Test file: src/voice/assistantOutputState.test.ts (extend existing)
Dependencies: Already has test infrastructure. Extend with new scenarios.
Effort: 1 hour
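
One way to picture these contracts is as a transition function over the phase names used above. In the sketch below the phase names come from the table, while the event names and the nextPhase function are assumptions made for illustration. The real module derives its phase from session state and telemetry, so the staleness cases are not modelled.

```typescript
type Phase =
  | "idle"
  | "response_pending"
  | "awaiting_tool_outputs"
  | "speaking_live"
  | "speaking_buffered";

// Hypothetical event names standing in for the real signals.
type OutputEvent =
  | "response_requested"
  | "audio_delta"
  | "response_done"
  | "buffer_drained"
  | "barge_in"
  | "tool_call_started"
  | "tool_outputs_submitted";

function nextPhase(phase: Phase, event: OutputEvent): Phase {
  // Barge-in forces idle from either speaking phase.
  if (event === "barge_in") {
    return phase === "speaking_live" || phase === "speaking_buffered" ? "idle" : phase;
  }
  if (phase === "idle" && event === "response_requested") return "response_pending";
  if (phase === "response_pending" && event === "tool_call_started") return "awaiting_tool_outputs";
  if (phase === "awaiting_tool_outputs" && event === "tool_outputs_submitted") return "response_pending";
  if (phase === "response_pending" && event === "audio_delta") return "speaking_live";
  // response_done alone does not release the lock: buffered frames are still draining.
  if (phase === "speaking_live" && event === "response_done") return "speaking_buffered";
  if (phase === "speaking_buffered" && event === "buffer_drained") return "idle";
  // Any (phase, event) pair not listed leaves the phase unchanged.
  return phase;
}
```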

2E. Capture promotion contract tests

Test the two-phase capture lifecycle:

| Test Case | What It Validates |
| --- | --- |
| server_vad_confirmed requires matching utterance ID AND local thresholds | Server VAD alone doesn't promote; local signal must also pass |
| strong_local_audio promotes without server VAD | High-confidence local signal bypasses server VAD |
| Near-silence early abort at 1s | Captures with very weak signal abort early |
| Max duration timer forces finalize at 8s | Long captures don't run forever |
| speakingEnd → speakingStart within debounce continues same capture | Debounce prevents premature finalization |
| Promotion cancels pending system speech | Join greetings / thoughts cancelled when user starts speaking |

Test file: src/voice/captureManager.test.ts
Dependencies: Mock VoiceSession, mock clankvox events, mock ASR state.
Effort: 1-2 hours
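
The promotion and timer rules can be sketched as a single decision over a capture snapshot. The 1s and 8s limits and the two promotion reasons come from the table. The snapshot shape, the evaluateCapture name, the precomputed boolean signal flags, and the check order are hypothetical, and debounce and system-speech cancellation are not modelled.

```typescript
// Hypothetical snapshot of one in-flight capture.
interface CaptureSnapshot {
  elapsedMs: number;
  promoted: boolean;
  utteranceId: string;
  serverVadUtteranceId: string | null; // utterance the server VAD confirmed, if any
  passesLocalThresholds: boolean;
  strongLocalAudio: boolean;
  nearSilent: boolean;
}

type CaptureAction =
  | "finalize_max_duration"
  | "abort_near_silence"
  | "promote_server_vad_confirmed"
  | "promote_strong_local_audio"
  | "continue";

const NEAR_SILENCE_ABORT_MS = 1000;
const MAX_CAPTURE_MS = 8000;

function evaluateCapture(c: CaptureSnapshot): CaptureAction {
  if (c.elapsedMs >= MAX_CAPTURE_MS) return "finalize_max_duration";
  if (!c.promoted) {
    // A high-confidence local signal promotes without server VAD.
    if (c.strongLocalAudio) return "promote_strong_local_audio";
    // Server VAD needs a matching utterance ID and a passing local signal.
    if (c.serverVadUtteranceId === c.utteranceId && c.passesLocalThresholds) {
      return "promote_server_vad_confirmed";
    }
    if (c.elapsedMs >= NEAR_SILENCE_ABORT_MS && c.nearSilent) return "abort_near_silence";
  }
  return "continue";
}
```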


Priority 3: E2E Test Gaps

These require the bot-to-bot test infrastructure (DriverBot, test guild, separate bot tokens).

3A. Barge-in E2E test

The biggest E2E gap. No test currently validates the bot's behavior when a user interrupts mid-speech.

Scenario:

  1. Driver summons bot, asks a question that produces a long response
  2. While bot is speaking (wait for first audio bytes, then ~2s), driver plays a new audio fixture (interruption)
  3. Assert: bot stops speaking within a reasonable window
  4. Assert: bot processes the new input and responds to it

Prerequisite: Need a DriverBot helper that can play audio while capturing the bot's output simultaneously. Current playAudio is sequential. May need playAudioNonBlocking() or similar.

Test file: tests/e2e/voiceBargeIn.test.ts
Effort: 3-4 hours (including DriverBot helper extension)

3B. Supersede / rapid input E2E test

Test that newer input supersedes stale replies:

  1. Driver plays two utterances in rapid succession (second starts before bot finishes responding to first)
  2. Assert: bot's final response addresses the second utterance, not the first

This partially exists in voicePhysicalHarness.test.ts ("rapid sequential utterances") but doesn't validate response content, only that audio is received.

Enhancement to: tests/e2e/voicePhysicalHarness.test.ts
Effort: 1-2 hours

3C. Voice history API integration for test assertions

Current E2E tests only assert on audio byte counts (bot spoke / didn't speak). The dashboard exposes /api/voice/history/sessions and /api/voice/history/sessions/:id/events which could provide:

  • Exact event sequence (turn received, reply sent, barge-in, tool call)
  • Timing data
  • Transcript content
  • Distinction between TTS speech and music audio

Task: Build a VoiceHistoryAssertionHelper in tests/e2e/driver/ that polls the voice history API and provides assertion methods like assertEventSequence(["turn_received", "reply_started", "reply_completed"]).

Test file: tests/e2e/driver/voiceHistory.ts
Effort: 2-3 hours
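
The core of assertEventSequence could be an ordered-subsequence match over the fetched events. The sketch below covers only that matching step and leaves polling out. It assumes each event exposes a type string and that unrelated events may appear between the expected ones. The real response shape of the voice history API is not specified here, so VoiceHistoryEvent is a hypothetical type.

```typescript
// Hypothetical event shape; the real API payload may carry more fields.
interface VoiceHistoryEvent {
  type: string;
}

// Returns the first expected type that could not be matched in order, or null.
function firstUnmatchedEvent(events: VoiceHistoryEvent[], expected: string[]): string | null {
  let next = 0;
  for (const event of events) {
    if (next < expected.length && event.type === expected[next]) {
      next += 1;
    }
  }
  return next < expected.length ? expected[next] ?? null : null;
}

function assertEventSequence(events: VoiceHistoryEvent[], expected: string[]): void {
  const missing = firstUnmatchedEvent(events, expected);
  if (missing !== null) {
    const seen = events.map((event) => event.type).join(", ");
    throw new Error(`expected "${missing}" in order, saw: ${seen}`);
  }
}
```

Including the observed sequence in the failure message should make E2E failures diagnosable from the test output alone, without opening the dashboard.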


Concurrency Plan

Priority 1 (quick fixes) should be done sequentially on master — they're small and touch different files.

Priority 2 (interaction tests) can run as 3 parallel worktrees:

| Worktree | Tests | Files Owned |
| --- | --- | --- |
| W1 | 2A (barge-in) + 2D (output state) | bargeInController.test.ts, extend assistantOutputState.test.ts |
| W2 | 2B (deferred flush) + 2E (capture promotion) | deferredActionQueue.test.ts, captureManager.test.ts |
| W3 | 2C (ASR bridge) | voiceAsrBridge.test.ts |

All three create test files only — zero production code changes, zero conflicts.

Priority 3 (E2E tests) should be done after Priority 2 merges, since they require a running bot and test guild.


Expected Outcome

| Metric | Current | After Plan |
| --- | --- | --- |
| Tests | 769 | ~830-850 |
| Fire-and-forget (HIGH risk) | 1 | 0 |
| Fire-and-forget (MEDIUM risk) | 3 | 0 |
| Dead code files | 1 | 0 |
| Barge-in interaction tests | 0 | ~6 |
| Deferred action interaction tests | 0 | ~6 |
| ASR bridge interaction tests | 0 | ~6 |
| Capture promotion tests | 0 | ~6 |
| E2E barge-in coverage | none | 1 scenario |
| Cross-domain timing contracts tested | 0 | ~24 |