SDK 0.3.0 regression: per-turn mutating state in <system_reminder> block defeats Anthropic prompt cache #1296

@pmella

Summary

The Copilot CLI now auto-injects mutating session state (SQL tables list, todo status, added/removed tools, previous/new model) into a <system_reminder> block inside every user message. Because this content is part of the cached prefix and changes turn-by-turn within a session, it defeats Anthropic prompt-cache prefix matching for any consumer relying on cache_control: ephemeral for cross-turn cache reuse.

This is a regression in 0.3.0 vs 0.2.2 and has not been fixed through 1.0.0-beta.4 / CLI 1.0.47.

Observed in production

We measured a sustained 9 percentage-point drop in Anthropic prompt-cache hit rate for Claude Opus 4.5 traffic in our service immediately after upgrading from @github/copilot-sdk 0.2.2 (+ @github/copilot 1.0.34) to 0.3.0 (+ 1.0.40):

  • Pre-upgrade (months stable): ~89% cache hit rate
  • Post-upgrade (sustained through today): ~80% cache hit rate

The drop appeared within the typical container rollout window after merging the SDK bump. Cache rate has not recovered.

Behavior diff between 0.2.2 and 0.3.0 (reproduced locally)

Same input, same Claude Opus 4.5 model, same conversation prompt sequence:

SDK 0.2.2 user-message transformedContent (turns 1 and 2 identical, modulo timestamp):

<current_datetime>...</current_datetime>

Hello, can you help me with my spreadsheet?

<reminder>
<sql_tables>No tables currently exist. Default tables (todos, todo_deps) will be created automatically when you first use the SQL tool.</sql_tables>
</reminder>

SDK 0.3.0 user-message transformedContent:

Turn 1:

<current_datetime>...</current_datetime>

Hello, can you help me with my spreadsheet?

<system_reminder>
<sql_tables>No tables currently exist. Default tables (todos, todo_deps) will be created automatically when using the SQL tool for the first time.</sql_tables>
</system_reminder>

Turn 2 (same session):

<current_datetime>...</current_datetime>

Now do the same for B1 through B10.

<system_reminder>
<sql_tables>Available tables: todos, todo_deps, inbox_entries</sql_tables>
</system_reminder>

Two changes:

  1. Tag rename: <reminder> -> <system_reminder> (one-time invalidation on upgrade)
  2. Auto-mutating state: the SDK now tracks session state internally and injects it per turn. In 0.2.2 the <reminder> block was static across turns; in 0.3.0 the <system_reminder> block mutates as the SDK observes tool usage, model changes, todo updates, etc.

The second change is the ongoing cache-rate drag.

Why this matters

Anthropic prompt caching (cache_control: ephemeral) requires byte-exact prefix matching. When the cached prefix includes content that changes turn-by-turn within a session:

  • Cache writes still happen every turn (paying the cache-write surcharge of 1.25x the base input rate)
  • Cache reads find a shorter matching prefix because the mutating block ends the byte-match
  • Net effect: paying for cache writes that never get fully read back

For high-volume API consumers, the 9-pp drop translates to a measurable input-token cost increase and lost end-user latency benefits.
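To put a rough number on the cost side, here is a minimal sketch of the arithmetic. It assumes the hit rate is token-weighted, that every non-cached input token is billed as a cache write at 1.25x, and that cache reads are billed at 0.1x the base input rate. The function name and this simplified model are illustrative only, not how our service or the SDK computes anything.

```javascript
// Hypothetical cost model: relative input cost per token, where 1.0 is the base input price.
// Assumption: hitRate is the fraction of input tokens served from cache (token-weighted),
// and all remaining input tokens are written to the cache.
const CACHE_WRITE_MULTIPLIER = 1.25; // 5-minute cache write surcharge
const CACHE_READ_MULTIPLIER = 0.1;   // cache read discount

function effectiveInputMultiplier(hitRate) {
  if (hitRate < 0 || hitRate > 1) {
    throw new RangeError("hitRate must be in [0, 1]");
  }
  return hitRate * CACHE_READ_MULTIPLIER + (1 - hitRate) * CACHE_WRITE_MULTIPLIER;
}

const before = effectiveInputMultiplier(0.89); // pre-upgrade hit rate
const after = effectiveInputMultiplier(0.80);  // post-upgrade hit rate
console.log({ before, after, ratio: after / before });
```

Under those assumptions the relative input cost goes from roughly 0.23 to 0.33 of the uncached price, so a 9-pp drop in hit rate is far more than a 9% cost change. Real billing depends on how the hit rate is measured and on the cache TTL in use.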

Where in the code

The injection happens in the yUr function in the CLI bundle, around node_modules/@github/copilot/sdk/index.js line 3942 in CLI 1.0.40:

yUr = ({customAgentPrompt, problemStatement, capabilities, hasImages,
        ..., sqlTables, todoStatus, addedTools, removedTools,
        previousModel, newModel, ...}) => {
    let E = sqlTables !== void 0
        ? `<sql_tables>Available tables: ${tablesList}</sql_tables>`
        : `<sql_tables>No tables currently exist...</sql_tables>`;
    // ...similar for todoStatus, addedTools, removedTools, model changes...
    return template.with({
        ...,
        current_datetime: q0t(),
        additional_instructions: assembled,
    }).asXML().trim();
};

These state fields are passed in by the CLI's internal session state machine and are not exposed to SDK consumers.

Proposed fixes (ranked)

1. Session config to opt out of auto state injection (preferred)

Expose a session option like:

client.createSession({
    autoInjectSessionState: false,  // disables <system_reminder> auto-population
    // ...
});

When false, the SDK omits the <system_reminder> block from user messages, or includes only static defaults. Consumers who want the state-tracking behavior keep the default; consumers optimizing for prompt caching opt out.

2. Move mutating state out of the cached prefix

Restructure the user message so mutating state lives in a separate content block placed after any consumer-set cache_control marker. Multi-block user messages with structured content arrays are supported by Anthropic; the SDK could emit them.
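As a sketch of what option 2 could look like on the wire, the snippet below builds an Anthropic Messages API user message with two text blocks: the stable text carries the cache_control marker, and the mutating state follows it as a separate block. The helper name buildUserMessage is hypothetical, and nothing here reflects how the SDK currently assembles messages.

```javascript
// Hypothetical helper: splits a user message into a cacheable block and a dynamic block.
// The block shape ({ type: "text", text, cache_control }) follows the Anthropic Messages API.
function buildUserMessage(stableText, dynamicStateText) {
  const content = [
    // The cache breakpoint sits at the end of the stable block.
    { type: "text", text: stableText, cache_control: { type: "ephemeral" } },
  ];
  if (dynamicStateText) {
    // Per-turn state goes after the breakpoint, so it is not part of that cached prefix.
    content.push({ type: "text", text: dynamicStateText });
  }
  return { role: "user", content };
}

const message = buildUserMessage(
  "Now do the same for B1 through B10.",
  "<system_reminder>\n<sql_tables>Available tables: todos, todo_deps, inbox_entries</sql_tables>\n</system_reminder>"
);
console.log(JSON.stringify(message, null, 2));
```

With this layout, a change to the <system_reminder> text alters only the trailing block, while the bytes up to the marker stay identical.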

3. Document the cache impact

At minimum, document that <system_reminder> content is per-turn dynamic, so consumers can plan around it (e.g., avoid setting cache_control on the user-message block until this is opt-out-able).

Evidence and reproduction

We have a self-contained reproduction harness:

  • Two minimal Node.js test apps, one per SDK version
  • Same input prompt sent through each
  • Captures transformedContent from user.message session events
  • Side-by-side byte-level diff

Stage 1 (transformation diff) confirms the wrapper change. Stage 2 (5 independent sessions of the same prompt per version) confirms cross-session first-turn caching is consistent in both versions; the regression is specifically in intra-session multi-turn caching.
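For reference, the byte-level comparison in Stage 1 can be approximated with a helper like the one below. This is a simplified sketch rather than the harness itself: the function names are hypothetical, the sample strings use made-up timestamps, and the <current_datetime> value is masked so the comparison is "modulo timestamp".

```javascript
// Hypothetical helpers for comparing two captured transformedContent strings.

// Replace the timestamp payload so it does not count as a difference.
function maskDatetime(text) {
  return text.replace(
    /<current_datetime>[\s\S]*?<\/current_datetime>/g,
    "<current_datetime>MASKED</current_datetime>"
  );
}

// Return the byte offset of the first difference after masking, or -1 if identical.
function firstDivergence(a, b) {
  const bufA = Buffer.from(maskDatetime(a), "utf8");
  const bufB = Buffer.from(maskDatetime(b), "utf8");
  const limit = Math.min(bufA.length, bufB.length);
  let i = 0;
  while (i < limit && bufA[i] === bufB[i]) i++;
  if (i === limit && bufA.length === bufB.length) return -1;
  return i;
}

// Made-up sample captures; only the wrapper tag differs once timestamps are masked.
const oldCapture =
  "<current_datetime>2025-01-01T00:00:00.123Z</current_datetime>\n\nHello\n\n<reminder>";
const newCapture =
  "<current_datetime>2025-01-01T00:00:05.456Z</current_datetime>\n\nHello\n\n<system_reminder>";
console.log(firstDivergence(oldCapture, newCapture));
```

The same helper can be applied to consecutive turns within one session to see where the byte match ends.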

Happy to extract and share the harness publicly if helpful.

Related SDK behaviors worth flagging while we're here

While investigating, we also noted:

  1. <current_datetime> is prepended at byte 0 of every user message, with sub-second precision. This single field guarantees that a cache entry keyed on the user-message block is never reused across requests. A flag to disable or move it would help.
  2. Single-block user message means cache_control is all-or-nothing. A documented multi-block user-message API would let consumers mark stable portions (user profile, session metadata) with cache_control separately from per-request content.

These are separate from the regression above and could be addressed in follow-up issues if you prefer.

Versions

  • Repro'd in: @github/copilot-sdk 0.3.0 + @github/copilot 1.0.40
  • Last verified unfixed in: SDK 1.0.0-beta.4 + CLI 1.0.47
