Wrong-context replies still appear on longer follow-ups that bypass the 30-char gate #397

@rockfordlhotka

Symptom

After #395 shipped the 30-char short-message defense in depth (BM25 dampening + tier-selector active-thread override + AgentLoopRunner memory-summary guard + behavioural directive), the shorter failure case from #383 is gone. But the longer variant from the same incident report is not addressed and remains reproducible.

From #383, Incident 1 (2026-05-10T02:53):

user:      Hopefully we can go this coming winter. My health seems better now   (67 chars, 12 words)
assistant: Noted. I've got that on the travel ledger: hoping for Cathedral City this coming winter,
           with health in a better place now.

The user message DOES introduce a new fact ("hoping to go this winter, health better now") that is on-topic for the very recent thread. The reply nonetheless summarises the just-stored memory entry as a non-sequitur closing instead of continuing the conversation. The request is also routed through the Low tier, because the prompt is conversational rather than complex.

Why the existing defenses don't catch this

Diagnosis

The 30-char threshold cleanly catches one-liner follow-ups ("ok", "I'll find out soon"). It does NOT catch fact-introducing short paragraphs because they're above the lexical-noise band. The failure mode in this incident isn't "low-signal message + noisy BM25 hits" — it's "user introduces a small fact AND the model fixates on storing-and-summarising it instead of continuing the discussion."

A length-only heuristic can't tell the difference between:

  • "Hopefully we can go this coming winter. My health seems better now" → fact-introducing follow-up that SHOULD save AND continue the conversation
  • "What is the capital of France?" → similar length, no thread context to continue

What differs is recent-window topical overlap: in the first case the user's prompt overlaps heavily with the last few assistant/user turns, in the second it does not. The BM25 / embedding query that injects the noisy entries should therefore be biased toward the recent window rather than built from the raw incoming message in isolation.
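
To make the distinction concrete, here is a minimal sketch of a lexical recent-window overlap score. It is not the project's implementation: the stop-word list, the function names, and the two prior turns are invented for illustration (the real thread from #383 is not reproduced in this issue), and a production signal would more likely use embeddings than raw tokens.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "is", "of", "we", "can", "my", "now",
              "this", "to", "in", "what", "i", "it", "and"}


def content_tokens(text: str) -> set[str]:
    """Lower-case word tokens with stop words removed."""
    return {t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS}


def window_overlap(message: str, recent_turns: list[str]) -> float:
    """Fraction of the message's content tokens that also occur in the recent window."""
    msg = content_tokens(message)
    if not msg:
        return 0.0
    window = set().union(*(content_tokens(t) for t in recent_turns))
    return len(msg & window) / len(msg)


# Invented prior turns standing in for the recent thread.
recent = [
    "I'd like to go back to Cathedral City when my health allows.",
    "Cathedral City in winter sounds good. How is your health holding up?",
]
follow_up = "Hopefully we can go this coming winter. My health seems better now"
unrelated = "What is the capital of France?"

print(window_overlap(follow_up, recent))   # non-zero: shares "go", "winter", "health"
print(window_overlap(unrelated, recent))   # zero: no shared content tokens
```

Message length plays no part in the score, which is the point: the two messages are separated by what they share with the thread, not by how long they are.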

Proposed approaches

Roughly cheapest → deepest, similar shape to the original #383 proposals:

  1. Per-turn BM25 query enrichment. Instead of querying long-term memory with just the incoming user message, query with the incoming message concatenated with the last N=2-3 user/assistant turns. The current top-of-thread is in the query, so BM25 stays anchored on what's actually being discussed.

  2. Recent-window embedding overlap gate. Compute the cosine similarity between the user message embedding and an embedding of the recent-history window. When overlap is high (≥ some threshold), suppress the BM25-delta injections entirely — the model already has the topic in conversation history, no need to inject memory.

  3. Tier-selector window-overlap signal. Like the active-thread heuristic in #395, but instead of priorTurns.Count >= 3 && recent, compute embedding overlap between the new message and the recent window. Promote Low → Balanced when overlap is high, since high overlap means the message IS continuing an established thread, regardless of length.

  4. Expand the memory-summary guard's user-message length gate. Replace the hard 30-char cap with a more sophisticated condition — e.g. "user message length ≤ 80 chars AND `SaveMemory` content has low novel-keyword overlap with the user message AND response matches `MemorySummaryReplyRegex`." Higher false-positive risk but catches the longer variant.

  5. Behavioural directive refinement. The current common-directive section is gated implicitly on the model deciding what counts as a "short follow-up." Tighten the language to call out the specific failure: "Even when the user introduces a new fact, your reply must address what they said in the context of the active thread, not just narrate that you stored the fact."
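
Approaches 1 to 3 share one ingredient, a comparison between the incoming message and the recent window, so they can be sketched together. The following is a rough illustration under stated assumptions, not the actual code: `toy_embed` is a bag-of-words stand-in for a real embedding model, the 0.3 threshold is a placeholder that telemetry would have to tune, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter

OVERLAP_THRESHOLD = 0.3  # hypothetical placeholder; needs tuning against telemetry


def enrich_query(message: str, prior_turns: list[str], n: int = 3) -> str:
    """Approach 1: build the memory query from the last n turns plus the message."""
    window = prior_turns[-n:] if n > 0 else []
    return "\n".join(window + [message])


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recent_window_overlap(message: str, prior_turns: list[str], n: int = 3) -> float:
    """Similarity between the message and the concatenated last n turns."""
    window = " ".join(prior_turns[-n:]) if n > 0 else ""
    return cosine(toy_embed(message), toy_embed(window))


def should_suppress_memory_injection(message: str, prior_turns: list[str]) -> bool:
    """Approach 2: skip BM25-delta injection when the thread already covers the topic."""
    return recent_window_overlap(message, prior_turns) >= OVERLAP_THRESHOLD


def select_tier(base_tier: str, message: str, prior_turns: list[str]) -> str:
    """Approach 3: promote Low to Balanced when the message continues the thread."""
    if base_tier == "Low" and recent_window_overlap(message, prior_turns) >= OVERLAP_THRESHOLD:
        return "Balanced"
    return base_tier
```

Computing the overlap once per turn and passing it to both the injection gate and the tier selector would avoid embedding the window twice; the sketch recomputes it only to keep each function self-contained.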

Goals

  • The Cathedral City reproducer (67-char fact-introducing follow-up on a topical thread) produces a thread-continuing reply, not a memory-summary closing.
  • Defenses combine; this should NOT replace the 30-char path, just extend coverage above it.
  • Telemetry is collected on whether the recent-window overlap signal is reliable enough to gate routing or guard decisions.
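
The first goal could be checked with something like the sketch below, which applies the extended guard condition from approach 4 to the Cathedral City reproducer. Several parts are assumptions: the regex is an invented stand-in (the real `MemorySummaryReplyRegex` pattern is not shown in this issue), "low novel-keyword overlap" is read here as "the saved content adds few keywords beyond the user message", the 0.5 cut-off is a placeholder, and the saved-memory text and the continuing reply are made up for the example.

```python
import re

# Hypothetical stand-in; NOT the real MemorySummaryReplyRegex.
MEMORY_SUMMARY_REPLY = re.compile(
    r"^\s*(noted|got it|i've (saved|stored|recorded))\b", re.IGNORECASE
)
MAX_USER_CHARS = 80        # length cap proposed in approach 4
MAX_NOVEL_FRACTION = 0.5   # hypothetical cut-off for "low novel-keyword overlap"


def keywords(text: str) -> set[str]:
    """Lower-case word tokens of the text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def novel_fraction(saved_content: str, user_message: str) -> float:
    """Share of saved-content keywords that do not appear in the user message."""
    saved = keywords(saved_content)
    if not saved:
        return 0.0
    return len(saved - keywords(user_message)) / len(saved)


def is_memory_summary_closing(user_message: str, saved_content: str, reply: str) -> bool:
    """Extended guard: short-ish message, little novelty saved, summary-style reply."""
    return (
        len(user_message) <= MAX_USER_CHARS
        and novel_fraction(saved_content, user_message) <= MAX_NOVEL_FRACTION
        and MEMORY_SUMMARY_REPLY.search(reply) is not None
    )


user_message = "Hopefully we can go this coming winter. My health seems better now"
saved_content = "Hoping to go this coming winter; health seems better now"  # invented
bad_reply = ("Noted. I've got that on the travel ledger: hoping for Cathedral City "
             "this coming winter, with health in a better place now.")
good_reply = "That's great news about your health. Which weeks were you thinking of?"  # invented

print(is_memory_summary_closing(user_message, saved_content, bad_reply))   # True
print(is_memory_summary_closing(user_message, saved_content, good_reply))  # False
```

A regression test built this way only verifies the shape of the reply. Whether the reply genuinely continues the thread would still need a human or model-graded check.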

Non-goals

Evidence

Related
