Quality Improvement Plan
Date: March 6, 2026
Baseline: 769 tests, 0 :any, typecheck clean, 5 subsystem state machine docs written
Focus: Bug prevention through interaction tests, E2E coverage, and error handling fixes
This plan targets the gaps that actually cause production bugs — timing-sensitive cross-domain interactions, missing E2E scenarios, and silent error swallowing — rather than further code extraction.
Priority 1: Fix Known Bugs (Quick Wins)
1A. Delete dead code — voiceRuntimeState.ts
src/voice/voiceRuntimeState.ts is dead code, superseded by src/voice/voiceRuntimeSnapshot.ts. It has no importers. Delete it.
This file is also the source of the getDeferredQueuedUserTurns / getJoinGreetingOpportunity LSP phantom errors, which go away once the file is deleted.
Effort: 5 minutes
Files: Delete src/voice/voiceRuntimeState.ts
1B. Fix fire-and-forget error swallowing (HIGH risk)
src/voice/voiceToolCallDispatch.ts:103 — .catch(() => {}) on endSession(). If this fails silently, the bot gets stuck in a voice channel permanently with no logs.
Fix: .catch((err) => logger.error("endSession failed in scheduleLeaveVoiceChannel", { error: err })) (or whatever the project's logging pattern is).
Effort: 10 minutes
Files: src/voice/voiceToolCallDispatch.ts
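A minimal sketch of the shape of this fix, assuming a logger that exposes `error(message, meta)`. `ErrorLogger`, `logEndSessionFailure`, and `endSessionAndLog` are hypothetical names for illustration, not the project's actual API.

```typescript
// Hypothetical logger shape; substitute the project's real logger.
interface ErrorLogger {
  error(message: string, meta: { error: unknown }): void;
}

// Builds a rejection handler that records the failure instead of discarding it.
function logEndSessionFailure(logger: ErrorLogger): (err: unknown) => void {
  return (err) => {
    logger.error("endSession failed in scheduleLeaveVoiceChannel", { error: err });
  };
}

// Fire-and-forget wrapper: the caller still does not await the promise,
// but a rejection now leaves a log entry.
function endSessionAndLog(endSession: () => Promise<void>, logger: ErrorLogger): void {
  void endSession().catch(logEndSessionFailure(logger));
}
```

Keeping the handler as a separate function makes it testable without awaiting anything: a unit test can call it directly with a fake logger.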
1C. Fix fire-and-forget error swallowing (MEDIUM risk)
3 instances:
| File | Line | Fix |
|---|---|---|
| src/services/screenShareSessionManager.ts | 399 | Log the error before coercing to null |
| src/bot/conversationContinuity.ts | 100 | Log memory retrieval failure before returning empty |
| src/video/videoContextService.ts | 773 | Log the error and don't permanently cache the failure result |
Effort: 20 minutes
Files: the 3 files above
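The videoContextService.ts row combines two changes: log the failure, and keep it out of the cache so a later call can retry. A minimal sketch of that policy, assuming a Map-backed cache and a result wrapper; `LookupResult`, `WarnLogger`, and `storeLookupResult` are hypothetical names, not the service's real types.

```typescript
// Hypothetical logger shape; substitute the project's real logger.
interface WarnLogger {
  warn(message: string, meta: { key: string; error: unknown }): void;
}

// Hypothetical wrapper for the outcome of one lookup attempt.
type LookupResult<T> = { ok: true; value: T } | { ok: false; error: unknown };

// Stores successes only. A failure is logged and surfaced as null,
// but nothing is written to the cache, so the next call retries.
function storeLookupResult<T>(
  cache: Map<string, T>,
  key: string,
  result: LookupResult<T>,
  logger: WarnLogger,
): T | null {
  if (result.ok) {
    cache.set(key, result.value);
    return result.value;
  }
  logger.warn("lookup failed; result not cached", { key, error: result.error });
  return null;
}
```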
Priority 2: Cross-Domain Interaction Tests
These test the timing-sensitive state reads identified in the subsystem docs. They are unit/integration tests (no Discord connection needed) but they validate the contracts between subsystems.
2A. Barge-in timing edge cases
Test the shouldBargeIn gate sequence under the exact conditions that cause production false-positives:
| Test Case | What It Validates |
|---|---|
| Pre-audio guard: user speaking while response pending but no audio delta yet | Barge-in should NOT fire — user can't interrupt what they haven't heard |
| Active flow guard: bot finished generating, subprocess draining buffered frames | Barge-in should NOT fire — response is effectively complete |
| Echo guard: bot audio started <1500ms ago | Barge-in should NOT fire — likely echo |
| Post-cancel race: response_done arrives between audio chunk and barge-in check | If cancel fails (response already done), should NOT queue retry or set full suppression |
| Assertiveness during bot speech: peak < 0.05 or active ratio < 0.06 | Barge-in should NOT fire — signal too weak to confirm intentional interruption |
| Interruption policy: scope="speaker" with non-matching userId | Barge-in should be blocked for non-speaker |
Test file: src/voice/bargeInController.test.ts
Dependencies: Mock VoiceSession with assistantOutput, pendingResponse, botTurnOpen, botTurnOpenAt, bargeInSuppressionUntil. Mock capture signal metrics.
Effort: 1-2 hours
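To make the gate sequence concrete, here is a simplified model of three of the guards above (pre-audio, echo, assertiveness), using the thresholds from the table. This is a sketch under assumptions, not the real shouldBargeIn: `evaluateBargeIn`, `BargeInSnapshot`, and the `audioDeltaReceived` field are hypothetical, and the real gate order may differ.

```typescript
// Hypothetical snapshot of the state the gate reads. Field names follow the
// dependencies listed above where possible; audioDeltaReceived is an assumption.
interface BargeInSnapshot {
  pendingResponse: boolean;
  audioDeltaReceived: boolean;
  botTurnOpen: boolean;
  botTurnOpenAt: number | null;
  bargeInSuppressionUntil: number;
}

interface CaptureSignal {
  peak: number;
  activeRatio: number;
}

type BargeInDecision = { allow: true } | { allow: false; reason: string };

const ECHO_GUARD_MS = 1500; // from the echo guard row
const MIN_PEAK = 0.05; // from the assertiveness row
const MIN_ACTIVE_RATIO = 0.06; // from the assertiveness row

function deny(reason: string): BargeInDecision {
  return { allow: false, reason };
}

function evaluateBargeIn(s: BargeInSnapshot, signal: CaptureSignal, now: number): BargeInDecision {
  if (now < s.bargeInSuppressionUntil) return deny("suppressed");
  // The user cannot interrupt audio they have not heard yet.
  if (s.pendingResponse && !s.audioDeltaReceived) return deny("pre_audio");
  if (!s.botTurnOpen) return deny("no_active_bot_turn");
  // Speech detected right after the bot starts is likely its own echo.
  if (s.botTurnOpenAt !== null && now - s.botTurnOpenAt < ECHO_GUARD_MS) return deny("echo_guard");
  if (signal.peak < MIN_PEAK || signal.activeRatio < MIN_ACTIVE_RATIO) return deny("weak_signal");
  return { allow: true };
}
```

Returning a reason alongside the decision lets each test case assert which guard blocked the barge-in, rather than only that it was blocked.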
2B. Deferred turn flush timing
Test the interaction between output lock release and deferred turn flushing:
| Test Case | What It Validates |
|---|---|
| Phase transitions to idle → deferred turns flush | The syncAssistantOutputState → recheckDeferredVoiceActions path works |
| Active promoted capture blocks deferred turn flush | hasDeferredTurnBlockingActiveCapture prevents reply during active speech |
| Silence-only capture does NOT block deferred flush | Weak captures that never promoted shouldn't hold up deferred turns |
| Deferred turn re-runs admission gate on flush | Coalesced turn is re-evaluated, not blindly dispatched |
| Deferred action expires before output frees | Stale actions are cleaned up, not fired |
| Queued turns flush once output is genuinely clear | Deferred actions only exist for queued user turns now |
Test file: src/voice/deferredActionQueue.test.ts
Dependencies: Mock VoiceSession with output state, capture state, deferred actions.
Effort: 1-2 hours
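The flush conditions in the table can be modelled as a single decision function, which is also a convenient seam for table-driven tests. This is a sketch under assumptions: `decideDeferredFlush`, the `expiresAt` field, and the `promoted` flag are hypothetical stand-ins for whatever the real deferred action queue and capture state expose.

```typescript
type OutputPhase =
  | "idle"
  | "response_pending"
  | "awaiting_tool_outputs"
  | "speaking_live"
  | "speaking_buffered";

// Hypothetical minimal shapes for illustration only.
interface ActiveCapture {
  promoted: boolean;
}

interface DeferredAction {
  expiresAt: number;
}

type FlushDecision = "flush" | "wait" | "expire";

function decideDeferredFlush(
  phase: OutputPhase,
  captures: ActiveCapture[],
  action: DeferredAction,
  now: number,
): FlushDecision {
  // Stale actions are cleaned up, never fired.
  if (now >= action.expiresAt) return "expire";
  // The output lock is still held.
  if (phase !== "idle") return "wait";
  // Only promoted captures block; silence-only captures do not.
  if (captures.some((c) => c.promoted)) return "wait";
  return "flush";
}
```

A "flush" result here only means the turn may proceed to the admission gate; per the table, the coalesced turn is re-evaluated there rather than dispatched blindly.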
2C. ASR bridge commit race conditions
Test the shared ASR user lock and handoff logic:
| Test Case | What It Validates |
|---|---|
| Shared ASR user lock prevents concurrent access | Second user's beginAsrUtterance returns false while first user holds lock |
| Lock released after commit | releaseSharedAsrActiveUser unlocks, next user can proceed |
| Handoff replays buffered PCM | tryHandoffSharedAsr finds waiting promoted capture and flushes its audio |
| Circuit breaker after 3 empty commits | Forces close + reconnect |
| Audio buffer overflow drops oldest, not newest | 10s cap preserves recent audio |
| Commit during connecting phase buffers correctly | Audio queued as pending, flushed when ready |
Test file: src/voice/voiceAsrBridge.test.ts
Dependencies: Mock OpenAiRealtimeTranscriptionClient, mock VoiceSession with ASR state.
Effort: 1-2 hours
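Three of the contracts above (single-holder lock, drop-oldest buffering, empty-commit circuit breaker) are small enough to model in isolation. The classes below are hypothetical illustrations of the behaviour the tests should pin down, not the bridge's real implementation; in particular, expressing the 10s cap as a byte budget is an assumption.

```typescript
// Only one user may hold the shared ASR stream at a time.
class SharedAsrLock {
  private activeUserId: string | null = null;

  tryAcquire(userId: string): boolean {
    if (this.activeUserId !== null && this.activeUserId !== userId) return false;
    this.activeUserId = userId;
    return true;
  }

  release(userId: string): void {
    if (this.activeUserId === userId) this.activeUserId = null;
  }
}

// Bounded PCM queue: on overflow the oldest chunks are dropped so the
// most recent audio survives.
class PendingAudioQueue {
  private chunks: Uint8Array[] = [];
  private totalBytes = 0;
  private readonly maxBytes: number;

  constructor(maxBytes: number) {
    this.maxBytes = maxBytes;
  }

  push(chunk: Uint8Array): void {
    this.chunks.push(chunk);
    this.totalBytes += chunk.byteLength;
    while (this.totalBytes > this.maxBytes && this.chunks.length > 1) {
      const dropped = this.chunks.shift();
      if (dropped) this.totalBytes -= dropped.byteLength;
    }
  }

  drain(): Uint8Array[] {
    const out = this.chunks;
    this.chunks = [];
    this.totalBytes = 0;
    return out;
  }
}

// Signals a forced close + reconnect after `limit` consecutive empty commits.
class EmptyCommitBreaker {
  private consecutiveEmpty = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Returns true when the caller should close and reconnect.
  recordCommit(transcript: string): boolean {
    if (transcript.trim().length > 0) {
      this.consecutiveEmpty = 0;
      return false;
    }
    this.consecutiveEmpty += 1;
    if (this.consecutiveEmpty < this.limit) return false;
    this.consecutiveEmpty = 0;
    return true;
  }
}
```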
2D. Output state machine transition contracts
Test the assistantOutput phase transitions that drive the output lock:
| Test Case | What It Validates |
|---|---|
| response_done before subprocess drain keeps lock | Phase should be speaking_buffered, not idle |
| Stale positive clankvox telemetry expires | Buffer depth updates that stop arriving eventually release the lock |
| Stale OpenAI active response cleared | isResponseInProgress() returns true but pendingResponse is gone → phase returns to idle |
| Barge-in forces immediate idle | Both speaking_live and speaking_buffered → idle |
| Tool call lifecycle: response_pending → awaiting_tool_outputs → response_pending | Tool call doesn't lose the pending response |
Test file: src/voice/assistantOutputState.test.ts (extend existing)
Dependencies: Already has test infrastructure. Extend with new scenarios.
Effort: 1 hour
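As a reading aid for the table, here is a toy model that derives the phase from a few inputs. It is a sketch of the contracts only: `derivePhase`, the `PhaseInputs` fields, and the 2000ms staleness window are assumptions, and the real syncAssistantOutputState logic is likely richer.

```typescript
type AssistantOutputPhase =
  | "idle"
  | "response_pending"
  | "awaiting_tool_outputs"
  | "speaking_live"
  | "speaking_buffered";

// Hypothetical inputs; names are illustrative.
interface PhaseInputs {
  hasPendingResponse: boolean;
  awaitingToolOutputs: boolean;
  audioStarted: boolean;
  responseInProgress: boolean; // what isResponseInProgress() reports
  bufferedFrames: number; // last reported subprocess buffer depth
  telemetryAgeMs: number; // time since that report
}

const TELEMETRY_STALE_MS = 2000; // assumed value

function derivePhase(i: PhaseInputs): AssistantOutputPhase {
  if (i.hasPendingResponse) {
    if (i.awaitingToolOutputs) return "awaiting_tool_outputs";
    return i.audioStarted ? "speaking_live" : "response_pending";
  }
  // No pending response: responseInProgress alone is treated as stale.
  // Only fresh, positive buffer telemetry keeps the lock.
  const draining = i.bufferedFrames > 0 && i.telemetryAgeMs < TELEMETRY_STALE_MS;
  return draining ? "speaking_buffered" : "idle";
}
```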
2E. Capture promotion contract tests
Test the two-phase capture lifecycle:
| Test Case | What It Validates |
|---|---|
| server_vad_confirmed requires matching utterance ID AND local thresholds | Server VAD alone doesn't promote; local signal must also pass |
| strong_local_audio promotes without server VAD | High-confidence local signal bypasses server VAD |
| Near-silence early abort at 1s | Captures with very weak signal abort early |
| Max duration timer forces finalize at 8s | Long captures don't run forever |
| speakingEnd → speakingStart within debounce continues same capture | Debounce prevents premature finalization |
| Promotion cancels pending system speech | Join greetings / thoughts cancelled when user starts speaking |
Test file: src/voice/captureManager.test.ts
Dependencies: Mock VoiceSession, mock clankvox events, mock ASR state.
Effort: 1-2 hours
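The two promotion paths in the first rows of the table can be sketched as one pure function. Everything here is an assumption made for illustration: `evaluatePromotion`, the signal and threshold shapes, and the idea that thresholds are passed in by the caller. The plan does not state the actual threshold values, so none are hard-coded.

```typescript
// Hypothetical shapes for illustration only.
interface LocalSignal {
  peak: number;
  activeRatio: number;
}

interface ServerVadEvent {
  utteranceId: string;
}

interface PromotionThresholds {
  minPeak: number;
  minActiveRatio: number;
  strongPeak: number;
  strongActiveRatio: number;
}

type PromotionReason = "server_vad_confirmed" | "strong_local_audio";

// Returns the reason a capture is promoted, or null if it stays unpromoted.
function evaluatePromotion(
  captureUtteranceId: string,
  signal: LocalSignal,
  serverVad: ServerVadEvent | null,
  t: PromotionThresholds,
): PromotionReason | null {
  // High-confidence local audio promotes without waiting for server VAD.
  if (signal.peak >= t.strongPeak && signal.activeRatio >= t.strongActiveRatio) {
    return "strong_local_audio";
  }
  // Server VAD counts only for the matching utterance, and only when the
  // local signal also clears the baseline thresholds.
  const passesLocal = signal.peak >= t.minPeak && signal.activeRatio >= t.minActiveRatio;
  const vadMatches = serverVad !== null && serverVad.utteranceId === captureUtteranceId;
  return vadMatches && passesLocal ? "server_vad_confirmed" : null;
}
```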
Priority 3: E2E Test Gaps
These require the bot-to-bot test infrastructure (DriverBot, test guild, separate bot tokens).
3A. Barge-in E2E test
The biggest E2E gap. No test currently validates the bot's behavior when a user interrupts mid-speech.
Scenario:
- Driver summons bot, asks a question that produces a long response
- While bot is speaking (wait for first audio bytes, then ~2s), driver plays a new audio fixture (interruption)
- Assert: bot stops speaking within a reasonable window
- Assert: bot processes the new input and responds to it
Prerequisite: Need a DriverBot helper that can play audio while capturing the bot's output simultaneously. Current playAudio is sequential. May need playAudioNonBlocking() or similar.
Test file: tests/e2e/voiceBargeIn.test.ts
Effort: 3-4 hours (including DriverBot helper extension)
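One possible shape for the non-blocking helper, as a sketch: wrap the existing sequential playAudio and return a handle instead of awaiting. The `PlaybackHandle` type and the assumed `playAudio(fixture): Promise<void>` signature are hypothetical; the real DriverBot API may differ.

```typescript
// Hypothetical handle returned to the test while playback continues.
interface PlaybackHandle {
  done: Promise<void>;
  isSettled(): boolean;
}

// Starts playback and returns immediately so the test can capture the
// bot's output while the interruption fixture is still playing.
function playAudioNonBlocking(
  playAudio: (fixture: string) => Promise<void>,
  fixture: string,
): PlaybackHandle {
  let settled = false;
  const done = playAudio(fixture).then(
    () => {
      settled = true;
    },
    (err: unknown) => {
      settled = true;
      throw err;
    },
  );
  return { done, isSettled: () => settled };
}
```

A test using this would still await `handle.done` before finishing, so that a playback failure surfaces as a test failure rather than an unhandled rejection.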
3B. Supersede / rapid input E2E test
Test that newer input supersedes stale replies:
- Driver plays two utterances in rapid succession (second starts before bot finishes responding to first)
- Assert: bot's final response addresses the second utterance, not the first
This partially exists in voicePhysicalHarness.test.ts ("rapid sequential utterances") but doesn't validate response content, only that audio is received.
Enhancement to: tests/e2e/voicePhysicalHarness.test.ts
Effort: 1-2 hours
3C. Voice history API integration for test assertions
Current E2E tests only assert on audio byte counts (bot spoke / didn't speak). The dashboard exposes /api/voice/history/sessions and /api/voice/history/sessions/:id/events which could provide:
- Exact event sequence (turn received, reply sent, barge-in, tool call)
- Timing data
- Transcript content
- Distinction between TTS speech and music audio
Task: Build a VoiceHistoryAssertionHelper in tests/e2e/driver/ that polls the voice history API and provides assertion methods like assertEventSequence(["turn_received", "reply_started", "reply_completed"]).
Test file: tests/e2e/driver/voiceHistory.ts
Effort: 2-3 hours
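The matching logic behind assertEventSequence could look like the sketch below. It assumes each history event carries a string `type` field and treats the expected list as an ordered subsequence, so unrelated events may appear in between; both are assumptions, since the plan does not specify the API's response shape. Polling the HTTP endpoint is left out.

```typescript
// Assumed minimal event shape; the real API response may carry more fields.
interface VoiceHistoryEvent {
  type: string;
}

// Returns -1 if every expected type appears in order, otherwise the index
// of the first expected type that could not be matched.
function findMissingSequenceIndex(events: VoiceHistoryEvent[], expected: string[]): number {
  let cursor = 0;
  for (const event of events) {
    if (cursor < expected.length && event.type === expected[cursor]) cursor += 1;
  }
  return cursor === expected.length ? -1 : cursor;
}

function assertEventSequence(events: VoiceHistoryEvent[], expected: string[]): void {
  const missing = findMissingSequenceIndex(events, expected);
  if (missing === -1) return;
  const seen = events.map((e) => e.type).join(", ");
  throw new Error(
    `expected event "${expected[missing]}" (position ${missing}) not found in order; saw: ${seen}`,
  );
}
```

Including the observed event types in the error message keeps E2E failures debuggable without re-querying the history API.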
Concurrency Plan
Priority 1 (quick fixes) should be done sequentially on master — they're small and touch different files.
Priority 2 (interaction tests) can run as 3 parallel worktrees:
| Worktree | Tests | Files Owned |
|---|---|---|
| W1 | 2A (barge-in) + 2D (output state) | bargeInController.test.ts, extend assistantOutputState.test.ts |
| W2 | 2B (deferred flush) + 2E (capture promotion) | deferredActionQueue.test.ts, captureManager.test.ts |
| W3 | 2C (ASR bridge) | voiceAsrBridge.test.ts |
All three create or extend test files only: zero production code changes, zero conflicts.
Priority 3 (E2E tests) should be done after Priority 2 merges, since they require a running bot and test guild.
Expected Outcome
| Metric | Current | After Plan |
|---|---|---|
| Tests | 769 | ~830-850 |
| Fire-and-forget (HIGH risk) | 1 | 0 |
| Fire-and-forget (MEDIUM risk) | 3 | 0 |
| Dead code files | 1 | 0 |
| Barge-in interaction tests | 0 | ~6 |
| Deferred action interaction tests | 0 | ~6 |
| ASR bridge interaction tests | 0 | ~6 |
| Capture promotion tests | 0 | ~6 |
| E2E barge-in coverage | none | 1 scenario |
| Cross-domain timing contracts tested | 0 | ~24 |
