[Bug] MCP Server json.dumps() escapes non-ASCII characters, causing 2.5-3x token overhead for CJK content #1962

@wchy1128

Description

crawl4ai version

latest

Expected Behavior

Bug Description

In deploy/docker/mcp_bridge.py, json.dumps() is called without ensure_ascii=False, causing all non-ASCII characters (Chinese, Japanese, Korean, etc.) to be escaped as \uXXXX sequences in MCP tool results.

Root Cause

There are 3 occurrences of json.dumps() without ensure_ascii=False:

# Line ~192
return [t.TextContent(type="text", text=json.dumps(err))]

# Line ~193
return [t.TextContent(type="text", text=json.dumps(res, default=str))]

# Line ~211 (read_resource)
return [t.TextContent(type="text", text=json.dumps(res, default=str))]

Python's json.dumps() defaults to ensure_ascii=True, which escapes all non-ASCII characters.
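A minimal sketch of the default behavior, using only the standard library. The `payload` dict is an illustrative stand-in for an MCP tool result, not the actual structure produced by mcp_bridge.py:

```python
import json

# Illustrative payload; the real tool result in mcp_bridge.py may differ.
payload = {"markdown": "跳转到内容"}

escaped = json.dumps(payload)                     # default: ensure_ascii=True
native = json.dumps(payload, ensure_ascii=False)  # keeps non-ASCII characters as-is

print(escaped)  # {"markdown": "\u8df3\u8f6c\u5230\u5185\u5bb9"}
print(native)   # {"markdown": "跳转到内容"}

# Both strings decode to the same object, so switching the flag is lossless.
assert json.loads(escaped) == json.loads(native) == payload
```

Because both forms are valid JSON for the same value, any conforming client should parse either one identically; only the size of the text differs.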

Impact

Token overhead measured with tiktoken (gpt-4 encoding):

| Text | ensure_ascii=True | ensure_ascii=False | Ratio |
|---|---|---|---|
| 30 CJK chars | 58 tokens | 21 tokens | 2.8x |
| 189 CJK chars | 625 tokens | 206 tokens | 3.0x |

  • Short CJK pages: ~300-600 extra tokens
  • Long CJK pages (e.g. Wikipedia): thousands of extra tokens
  • English pages: no impact (\uXXXX only affects non-ASCII)
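The overhead can be approximated without a tokenizer by comparing serialized lengths. This is a rough sketch: it counts characters, not tokens, so it will not reproduce the tiktoken figures above, and `serialized_lengths` is a hypothetical helper name introduced here for illustration:

```python
import json

def serialized_lengths(text: str) -> tuple[int, int]:
    """Return (escaped_len, native_len) in characters for a JSON-encoded string."""
    escaped = json.dumps(text)                     # ensure_ascii=True by default
    native = json.dumps(text, ensure_ascii=False)  # non-ASCII kept verbatim
    return len(escaped), len(native)

# Five CJK characters: each becomes a six-character \uXXXX escape.
cjk = serialized_lengths("跳转到内容")
print(cjk)      # (32, 7)

# ASCII-only text is unaffected by the flag.
english = serialized_lengths("Hello")
print(english)  # (7, 7)
```

Character count is only a proxy; how many tokens each `\uXXXX` escape costs depends on the tokenizer in use.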

Current Behavior

Evidence

The HTTP API returns native UTF-8, confirming that the issue is confined to the MCP serialization layer:

curl -s -X POST "http://localhost:11235/md" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://zh.wikipedia.org/wiki/Hello"}'
# Returns: "跳转到内容" (native Chinese)

MCP tool result:

{"markdown": "\u8df3\u8f6c\u5230\u5185\u5bb9..." }

Suggested Fix

# Line ~192
return [t.TextContent(type="text", text=json.dumps(err, ensure_ascii=False))]

# Line ~193
return [t.TextContent(type="text", text=json.dumps(res, default=str, ensure_ascii=False))]

# Line ~211
return [t.TextContent(type="text", text=json.dumps(res, default=str, ensure_ascii=False))]
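One way to keep the three call sites consistent is to route them through a single helper. This is only a sketch: `dumps_utf8` is a hypothetical name, not something that exists in mcp_bridge.py, and applying `default=str` to the error path as well is an assumption that slightly changes behavior there (unserializable values would be stringified instead of raising):

```python
import datetime
import json
from typing import Any

def dumps_utf8(obj: Any) -> str:
    """Serialize obj to JSON text without escaping non-ASCII characters.

    Hypothetical helper; default=str stringifies values json cannot encode.
    """
    return json.dumps(obj, default=str, ensure_ascii=False)

# CJK content stays readable in the serialized text.
print(dumps_utf8({"markdown": "跳转到内容"}))          # {"markdown": "跳转到内容"}

# Non-JSON types fall back to str(), as with default=str in the original calls.
print(dumps_utf8({"t": datetime.date(2024, 1, 1)}))   # {"t": "2024-01-01"}
```

Each `json.dumps(...)` call in the suggested fix could then become `dumps_utf8(...)`, so a future call site cannot silently fall back to the escaping default.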

Related Issues

This is a common problem across MCP implementations:

Is this reproducible?

Yes

Inputs Causing the Bug

Steps to Reproduce

Code snippets

OS

Crawl4AI version: 0.8.x (Docker deployment)

Python version

Python: 3.12

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

No response
