docs/tmp/archive/presence-attention-next-steps-plan.md

Presence And Attention Next Steps Plan

Status: proposed

Reference:

Goal

Finish the last important convergence work after the recent shared-attention alignment:

  • keep one social mind across text and voice
  • preserve modality-specific floor ownership
  • treat bridge as an explicit transport exception, not a conceptual failure
  • keep cross-modal continuity strong without making Clanker jump into unrelated surfaces just because he is already active somewhere else

Current State

The runtime is now much closer to the canonical model:

  • text and voice both speak in terms of shared ACTIVE / AMBIENT
  • ambient text and ambient voice both support pending-thought continuity
  • docs now describe shared attention with transport-specific spokes instead of separate minds

The remaining work is mostly about sharpening boundaries and making the shipped exceptions explicit.

Remaining Gaps

1. Bridge exception is real, but not yet treated as a first-class design boundary

bridge reply mode cannot rely on native [SKIP], so classifier-first admission is still required there.

That is acceptable, but the code and docs should make the rule unambiguous:

  • bridge is classifier-first because of transport constraints
  • brain should stay generation-owned by default
  • native follows provider-native floor behavior

We should avoid language that makes the bridge exception look like a general statement about the whole voice spoke.

2. Brain-path voice admission still needs a cleaner ownership story

For brain reply mode, the desired rule is:

  • deterministic safety/cost/floor gates may still run first
  • after that, real active turns should reach the main brain
  • the model decides reply vs [SKIP]

The remaining cleanup is to make sure generation_decides really means that in practice, and to keep optional classifier-first behavior clearly optional instead of semantically central.

3. Cross-modal continuity needs stronger asymmetry guarantees

The important product behavior is continuity, not a literal shared state-machine module.

Desired behavior:

  • a text ping can make Clanker more awake in VC
  • a VC wakeup can make nearby text context feel continuous
  • being active in VC should not by itself make him jump into random text threads
  • being active in text should not by itself force voice output

The code is close to this already, but the contract should become more explicit in prompt framing, docs, and tests.

4. Lightweight continuity observability is still weaker than the behavior deserves

We now have shared continuity semantics, but runtime visibility is still fragmented.

We should be able to inspect:

  • current continuity mode
  • why it is ACTIVE
  • which speaker or author currently owns the live thread
  • whether continuity came from text, voice, interruption recovery, or command follow-up

This matters for debugging future regressions without re-litigating the model each time.

Execution Plan

1. Make reply-path ownership explicit

  • Audit voiceReplyDecision, settings mapping, and dashboard wording so bridge, brain, and native describe ownership clearly.
  • Keep bridge classifier-first by design.
  • Keep brain on generation_decides by default.
  • Treat optional classifier-first on brain as an override, not as the default mental model.

Primary files:

2. Lock in cross-modal asymmetry

  • Keep text ACTIVE promotion text-local: direct address, reply-to-bot, same-author follow-up, or other explicitly local thread signals.
  • Do not let “active voice session exists” silently widen unrelated text reply admission.
  • Keep voice floor-taking voice-local even when text recently promoted shared attention.
  • Add prompt language that frames other-surface activity as continuity context, not as automatic permission to speak.

Primary files:

3. Add a lightweight continuity debug surface

  • Expose current attention mode and reason in runtime snapshots/logs for both text and voice paths.
  • Make cross-modal continuity sources visible, not just inferred from scattered timestamps.
  • Prefer one canonical summary shape reused by logs, snapshots, and prompts.

This does not require a giant new hub module. A small shared summary contract is enough.

Primary files:

  • src/bot/replyAdmission.ts
  • src/voice/voiceReplyDecision.ts
  • src/voice/voiceRuntimeSnapshot.ts
  • src/voice/voiceSessionTypes.ts
  • any text-side runtime snapshot/debug surface that already exists

4. Tighten focused tests around the intended product contract

Add or update only the tests that protect the important behavior:

  • bridge stays classifier-first
  • brain active turns are generation-owned by default
  • active VC does not auto-promote unrelated text turns into ACTIVE
  • recent cross-modal continuity can still show up as prompt context without forcing output
  • runtime snapshots/logs expose the same attention reason the prompts see

Primary files:

  • src/voice/voiceReplyDecision.test.ts
  • src/bot/replyAdmission.test.ts
  • src/bot/voiceReplies.test.ts
  • other focused prompt/runtime tests only where the contract is otherwise unprotected

Non-Goals

  • Do not build a giant monolithic shared attention state machine just to satisfy the docs.
  • Do not erase modality-specific floor rules in the name of “unification.”
  • Do not make VC activeness a blanket text-participation permission.
  • Do not remove the bridge classifier path unless the transport can natively express [SKIP].

Done Criteria

  • bridge, brain, and native each have a crisp documented ownership model.
  • brain mode is clearly generation-owned by default after deterministic safety/cost/floor gates.
  • Cross-modal continuity is preserved, but unrelated text/voice surfaces do not auto-activate each other.
  • Shared attention reasons are visible in runtime/debug output, not only buried in prompt assembly.
  • Canonical docs describe the shipped behavior without apologizing for the distributed implementation.

Product language: Clanker should feel like one person whose attention carries across text and voice, while each medium still keeps its own natural right to the floor.