Symptom

After #395 shipped the 30-char short-message defense in depth (BM25 dampening + tier-selector active-thread override + AgentLoopRunner memory-summary guard + behavioural directive), the shorter failure case from #383 is gone. But the longer variant from the same incident report is not addressed and remains reproducible.
From #383, Incident 1 (2026-05-10T02:53):

user: Hopefully we can go this coming winter. My health seems better now (67 chars, 12 words)
assistant: Noted. I've got that on the travel ledger: hoping for Cathedral City this coming winter, with health in a better place now.
The user message DOES introduce a new fact ("hoping to go this winter, health better now") that is on-topic for the very recent thread, but the reply still summarises the just-stored memory entry as a non-sequitur closing rather than continuing the conversation — and gets routed through Low tier since the prompt is conversational, not complex.
Why the existing defenses don't catch this

AgentContextBuilder short-message gate (Skip BM25 topic injection on short user messages (#383) #384): 67 chars > 30, so the BM25 dampening doesn't engage.

Tier-selector active-thread override (Short-message wrong-context defense in depth (#383) #395): gated on promptText.Length ≤ 30, so this prompt routes Low as normal.

AgentLoopRunner memory-summary guard (Short-message wrong-context defense in depth (#383) #395): gated on originalUserRequest.Length ≤ 30, so the guard doesn't fire even though the response does match MemorySummaryReplyRegex.

The reply was generated by gpt-5.4-mini at Low tier.
Diagnosis

The 30-char threshold cleanly catches one-liner follow-ups ("ok", "I'll find out soon"). It does NOT catch fact-introducing short paragraphs because they're above the lexical-noise band. The failure mode in this incident isn't "low-signal message + noisy BM25 hits" — it's "user introduces a small fact AND the model fixates on storing-and-summarising it instead of continuing the discussion."
A length-only heuristic can't tell the difference between:
"Hopefully we can go this coming winter. My health seems better now" → fact-introducing follow-up that SHOULD save AND continue the conversation
"What is the capital of France?" → similar length, no thread context to continue
What differs is recent-window topical overlap: the user's prompt overlaps highly with the last few assistant/user turns. The BM25 / embedding query that injects the noisy entries should be biased toward the recent window, not the raw incoming message in isolation.
Proposed approaches
Roughly cheapest → deepest, similar shape to the original #383 proposals:
Per-turn BM25 query enrichment. Instead of querying long-term memory with just the incoming user message, query with the incoming message concatenated with the last N=2-3 user/assistant turns. The current top-of-thread is in the query, so BM25 stays anchored on what's actually being discussed.
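A minimal sketch of the enrichment, in Python for illustration (the function name, the `(role, text)` turn shape, and N=3 are assumptions, not existing code — the real change would live in AgentContextBuilder):

```python
def build_memory_query(incoming: str,
                       history: list[tuple[str, str]],
                       n_turns: int = 3) -> str:
    """Concatenate the incoming message with the last N turns so the
    BM25 query stays anchored on the active thread, not the raw message."""
    recent = history[-n_turns:]  # each turn is (role, text)
    window = " ".join(text for _, text in recent)
    return f"{window} {incoming}".strip()
```

With an empty history this degrades gracefully to the current behaviour (query = raw message), so it can ship without a feature flag.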
Recent-window embedding overlap gate. Compute the cosine similarity between the user message embedding and an embedding of the recent-history window. When overlap is high (≥ some threshold), suppress the BM25-delta injections entirely — the model already has the topic in conversation history, no need to inject memory.
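A sketch of the gate, assuming embeddings arrive as plain float vectors; the 0.75 threshold is a placeholder to be tuned from telemetry:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Plain cosine similarity; returns 0.0 for zero-norm inputs."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def should_inject_memories(msg_emb: list[float],
                           window_emb: list[float],
                           threshold: float = 0.75) -> bool:
    """Suppress BM25-delta injection when the message already overlaps the
    recent-history window: the topic is in conversation context already."""
    return cosine(msg_emb, window_emb) < threshold
```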
Tier-selector window-overlap signal. Like the active-thread heuristic in Short-message wrong-context defense in depth (#383) #395, but instead of priorTurns.Count >= 3 && recent, compute embedding overlap between the new message and the recent window. Promote Low → Balanced when overlap is high (high overlap means the message IS continuing an established thread, regardless of length).
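The promotion rule itself is trivial once an overlap score exists; a sketch (tier names from this issue, the 0.6 threshold hypothetical):

```python
def select_tier(base_tier: str, window_overlap: float,
                promote_threshold: float = 0.6) -> str:
    """Promote Low -> Balanced when the new message overlaps the recent
    window strongly enough to count as thread continuation."""
    if base_tier == "Low" and window_overlap >= promote_threshold:
        return "Balanced"
    return base_tier
```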
Expand the memory-summary guard's user-message length gate. Replace the hard 30-char cap with a more sophisticated condition — e.g. "user message length ≤ 80 chars AND `SaveMemory` content has low novel-keyword overlap with the user message AND response matches `MemorySummaryReplyRegex`." Higher false-positive risk but catches the longer variant.
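A sketch of the composite condition, in Python for illustration. The regex literal below is a hypothetical stand-in for `AgentLoopRunner.MemorySummaryReplyRegex`, and the 0.3 novelty cutoff is a placeholder; the structure is what matters:

```python
import re

# Hypothetical stand-in for AgentLoopRunner.MemorySummaryReplyRegex.
MEMORY_SUMMARY_REPLY = re.compile(
    r"^(noted|got that|i've got that)\b.*\b(ledger|memory)\b", re.IGNORECASE)

def novel_fraction(saved: str, user_msg: str) -> float:
    """Fraction of SaveMemory keywords NOT present in the user message.
    Low novelty means the save merely restates the user's own words."""
    saved_words = {w.lower() for w in re.findall(r"\w{4,}", saved)}
    msg_words = {w.lower() for w in re.findall(r"\w{4,}", user_msg)}
    if not saved_words:
        return 0.0
    return len(saved_words - msg_words) / len(saved_words)

def guard_should_fire(user_msg: str, saved: str, reply: str,
                      max_len: int = 80, max_novelty: float = 0.3) -> bool:
    """All three conditions must hold: longer length gate, low-novelty
    save, and a reply that reads as a memory-state summary."""
    return (len(user_msg) <= max_len
            and novel_fraction(saved, user_msg) <= max_novelty
            and bool(MEMORY_SUMMARY_REPLY.search(reply)))
```

On the Cathedral City reproducer this fires: the message is 67 chars, the saved fact restates the user's wording, and the reply matches the summary pattern.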
Behavioural directive refinement. The current common-directive section is gated implicitly on the model deciding what counts as a "short follow-up." Tighten the language to call out the specific failure: "Even when the user introduces a new fact, your reply must address what they said in the context of the active thread, not just narrate that you stored the fact."
Goals
The Cathedral City reproducer (67-char fact-introducing follow-up on a topical thread) produces a thread-continuing reply, not a memory-summary closing.
Defenses combine; this should NOT replace the 30-char path, just extend coverage above it.
Telemetry on whether the recent-window overlap signal is reliable enough to gate routing or guard decisions on it.
Non-goals

Building a per-turn embedding-similarity service if one doesn't already exist — start with cosine over the existing shared query embedding.
A general "don't fixate on stored memory" rule. The fix should be targeted at the specific signature: model invoked SaveMemory AND the reply is a memory-state summary rather than a topical continuation.
Evidence

`AgentLoopRunner.MemorySummaryReplyRegex` is already calibrated to match this phrasing; the guard just doesn't engage because the length gate excludes it. Reusing the regex (with a different length gate) is one viable path.