perf: 3.3x faster encode_ordinary_batch for single-threaded workloads #531
Open
homanp wants to merge 1 commit into
Conversation
Bypass ThreadPoolExecutor when num_threads <= 1 or batch size is 1, calling the Rust binding directly. Also optimizes decode paths:

- decode_batch/decode_bytes_batch: same ThreadPoolExecutor bypass
- decode_tokens_bytes: map() with direct _core_bpe binding
- decode_with_offsets: ASCII fast-path using bytes.isascii()
- cached frozenset for special token validation
- precomputed token byte lengths lookup table

encode_ordinary_batch benchmark (scripts/benchmark.py, num_threads=1):

- baseline: 10.6M bytes/s
- this PR: 35.2M bytes/s (3.33x)

All 33 tests pass.
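One of the decode-path changes listed above, the ASCII fast path for offsets, can be sketched roughly like this (the function and its shape are illustrative, not the PR's actual code; only the bytes.isascii() trick comes from the description):

```python
def offsets_for_tokens(token_bytes: list[bytes]) -> list[int]:
    """Compute the starting character offset of each token in the decoded text.

    Fast path: when the concatenated output is pure ASCII, byte offsets and
    character offsets coincide, so a running sum of byte lengths suffices and
    no per-token UTF-8 decoding is needed.
    """
    joined = b"".join(token_bytes)
    offsets, pos = [], 0
    if joined.isascii():
        for tb in token_bytes:
            offsets.append(pos)
            pos += len(tb)
        return offsets
    # Slow path (simplified): count decoded characters per token. Real code
    # must also handle tokens that split a multi-byte UTF-8 sequence.
    for tb in token_bytes:
        offsets.append(pos)
        pos += len(tb.decode("utf-8", errors="replace"))
    return offsets
```

bytes.isascii() is a single C-level scan, so the check is cheap relative to decoding each token individually.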
Summary
3.3x faster encode_ordinary_batch for single-threaded workloads, plus decode path optimizations. All 33 tests pass.

Benchmark
Using scripts/benchmark.py with RAYON_NUM_THREADS=1, 10K documents: the huggingface baseline is unchanged at ~6.3M bytes/s, confirming the measurement is stable.
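A simplified version of that measurement looks like the following (the harness is illustrative; scripts/benchmark.py and RAYON_NUM_THREADS come from the PR, the rest is an assumption):

```python
import os
import time

# Rayon reads this at startup, so it must be set before the Rust core loads.
os.environ["RAYON_NUM_THREADS"] = "1"

def throughput_bytes_per_sec(encode_batch, documents: list[str]) -> float:
    """Measure encoding throughput in bytes/s for a batch-encode callable."""
    total_bytes = sum(len(doc.encode("utf-8")) for doc in documents)
    start = time.perf_counter()
    encode_batch(documents)
    elapsed = time.perf_counter() - start
    return total_bytes / elapsed
```

Running the same callable against the same corpus before and after the change gives the bytes/s figures quoted above.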
What changed
The main win: when num_threads <= 1 or batch size is 1, skip the ThreadPoolExecutor entirely and call self._core_bpe.encode_ordinary directly via map(). The executor has significant overhead for small/single-threaded batches (creating the pool, dispatching tasks, collecting results) that dominates when the actual encoding is fast (which it is, because it's Rust).

Most LLM applications encode/decode single prompts or small batches. The default num_threads=8 creates an 8-thread pool for every call even when there's only one item to process. Bypassing the executor for these cases removes pure Python overhead and lets the Rust core do its job without coordination cost.
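The bypass pattern described above can be sketched as follows (the _core_bpe attribute and method names follow the PR description; the surrounding class is a hypothetical stand-in for tiktoken's actual Encoding):

```python
from concurrent.futures import ThreadPoolExecutor

class Encoding:
    """Hypothetical wrapper mirroring the bypass pattern from the PR."""

    def __init__(self, core_bpe, num_threads: int = 8):
        # core_bpe: Rust binding exposing encode_ordinary(str) -> list[int]
        self._core_bpe = core_bpe
        self._num_threads = num_threads

    def encode_ordinary_batch(self, texts: list[str]) -> list[list[int]]:
        # Fast path: a thread pool only adds coordination overhead when
        # there is nothing to parallelize, so call the binding directly.
        if self._num_threads <= 1 or len(texts) == 1:
            return list(map(self._core_bpe.encode_ordinary, texts))
        # Slow path: fan out across threads; this only pays off because
        # the Rust core releases the GIL while encoding.
        with ThreadPoolExecutor(max_workers=self._num_threads) as executor:
            return list(executor.map(self._core_bpe.encode_ordinary, texts))
```

Both paths return results in input order, so the fast path is observationally identical to the executor path.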