crawl4ai version
latest
Expected Behavior
Bug Description
In `deploy/docker/mcp_bridge.py`, `json.dumps()` is called without `ensure_ascii=False`, causing all non-ASCII characters (Chinese, Japanese, Korean, etc.) to be escaped as `\uXXXX` sequences in MCP tool results.
Root Cause
There are three occurrences of `json.dumps()` without `ensure_ascii=False`:

```python
# Line ~192
return [t.TextContent(type="text", text=json.dumps(err))]

# Line ~193
return [t.TextContent(type="text", text=json.dumps(res, default=str))]

# Line ~211 (read_resource)
return [t.TextContent(type="text", text=json.dumps(res, default=str))]
```

Python's `json.dumps()` defaults to `ensure_ascii=True`, which escapes all non-ASCII characters.
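A minimal stdlib-only demonstration of the two behaviors (illustrative, not the bridge code itself):

```python
import json

# The same CJK string that appears in the MCP tool result below.
payload = {"markdown": "跳转到内容"}

escaped = json.dumps(payload)                      # default: ensure_ascii=True
native = json.dumps(payload, ensure_ascii=False)   # keep UTF-8 as-is

print(escaped)  # {"markdown": "\u8df3\u8f6c\u5230\u5185\u5bb9"}
print(native)   # {"markdown": "跳转到内容"}

# Both forms parse back to the identical object, so the fix is lossless.
assert json.loads(escaped) == json.loads(native)
```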
Impact
Token overhead measured with tiktoken (gpt-4 encoding):

| Text | ensure_ascii=True | ensure_ascii=False | Ratio |
|------|-------------------|--------------------|-------|
| 30 CJK chars | 58 tokens | 21 tokens | 2.8x |
| 189 CJK chars | 625 tokens | 206 tokens | 3.0x |
- Short CJK pages: ~300-600 extra tokens
- Long CJK pages (e.g. Wikipedia): thousands of extra tokens
- English pages: no impact (`\uXXXX` escaping only affects non-ASCII characters)
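The blow-up is visible before tokenization even starts: each CJK character becomes a six-character `\uXXXX` escape. A stdlib-only sketch of the expansion (the tiktoken token counts above were measured separately):

```python
import json

text = "跳转到内容" * 6  # 30 CJK characters, matching the first table row

escaped = json.dumps({"markdown": text})                     # default escaping
native = json.dumps({"markdown": text}, ensure_ascii=False)  # raw UTF-8

# The 30-char string expands to 180 chars of "\uXXXX" escapes, and BPE
# tokenizers then spend far more tokens on the escaped form.
print(len(escaped), len(native))
```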
Current Behavior
Evidence
The HTTP API returns native UTF-8, confirming the issue is in the MCP serialization layer only:

```bash
curl -s -X POST "http://localhost:11235/md" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://zh.wikipedia.org/wiki/Hello"}'
# Returns: "跳转到内容" (native Chinese)
```

MCP tool result:

```json
{"markdown": "\u8df3\u8f6c\u5230\u5185\u5bb9..."}
```
Suggested Fix
```python
# Line ~192
return [t.TextContent(type="text", text=json.dumps(err, ensure_ascii=False))]

# Line ~193
return [t.TextContent(type="text", text=json.dumps(res, default=str, ensure_ascii=False))]

# Line ~211
return [t.TextContent(type="text", text=json.dumps(res, default=str, ensure_ascii=False))]
```
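A small regression test could lock the fix in. The helper below only mimics the fixed serialization; the name `serialize_tool_result` is illustrative, not the actual bridge API:

```python
import json

def serialize_tool_result(res) -> str:
    # Mirrors the proposed fix: keep non-ASCII characters as raw UTF-8.
    # Hypothetical helper for testing, not a function in mcp_bridge.py.
    return json.dumps(res, default=str, ensure_ascii=False)

def test_cjk_survives_serialization():
    out = serialize_tool_result({"markdown": "跳转到内容"})
    assert "跳转到内容" in out        # native characters preserved
    assert "\\u8df3" not in out       # no \uXXXX escape sequences

test_cjk_survives_serialization()
```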
Related Issues
This is a common problem across MCP implementations.
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Code snippets
OS
Crawl4AI version: 0.8.x (Docker deployment)
Python version
Python: 3.12
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
No response