Security audit documenting 221 silent int64-to-int32 truncation sites in vLLM's CUDA/C++ extensions that enable GPU buffer overflow via crafted GGUF model files.
This repository contains the full research report documenting a systemic pattern of silent integer truncation across the C++ and CUDA extensions of vLLM, the most widely deployed open-source inference engine for large language model serving.
Key findings:
- 221 instances where PyTorch's 64-bit tensor metadata (`int64_t`) is silently narrowed to 32-bit `int` variables in vLLM's `csrc/` directory
- Truncated values are used in GPU buffer allocations, kernel launch parameters, and loop bounds
- For GGUF model file code paths, tensor dimensions are directly attacker-controlled through the model file header
- This is the same vulnerability class that has produced 10 CVEs in llama.cpp and Ollama
- A proof-of-concept attack chain demonstrates how a crafted GGUF file triggers a deterministic GPU buffer overflow
The report covers:
- A formal definition of the weakness class (silent narrowing of attacker-controlled tensor metadata)
- The complete 221-site audit with breakdowns by file and API call
- A proof-of-concept attack chain from crafted GGUF file to GPU buffer overflow
- Comparative CVE evidence from llama.cpp and Ollama
- An argument for formal CWE classification of model-file-sourced memory corruption
- Recommended fixes for vLLM and the broader ML infrastructure ecosystem
The truncation count can be independently verified against vLLM commit 63babd1:

```bash
git clone https://github.com/vllm-project/vllm.git
cd vllm
git checkout 63babd1

# tensor.sizes()[idx] -> int (12 sites)
grep -rn 'int [a-z_]* = .*\.sizes()\[' csrc/ --include="*.cu" --include="*.cpp" \
  | grep -v int64_t | wc -l

# tensor.size(dim) -> int (159 sites)
grep -rn 'int [a-z_]* = .*\.size(' csrc/ --include="*.cu" --include="*.cpp" \
  | grep -v int64_t | grep -v constexpr | wc -l

# tensor.numel() -> int (22 sites)
grep -rn 'int [a-z_]* = .*\.numel()' csrc/ --include="*.cu" --include="*.cpp" \
  | grep -v int64_t | wc -l

# tensor.stride(dim) -> int (28 sites)
grep -rn 'int [a-z_]* = .*\.stride(' csrc/ --include="*.cu" --include="*.cpp" \
  | grep -v int64_t | wc -l

# Total: 221
```

| Date | Event |
|---|---|
| 2026-03-29 | Identified int64-to-int truncation pattern in vLLM GGUF kernels |
| 2026-03-29 | Completed systemic audit: 221 truncation sites across 20+ files |
| 2026-03-29 | Submitted GHSA-w2f2-mvhx-4xpg (memcpy_to_shm OOB) |
| 2026-03-29 | Submitted GHSA-5jv2-g5wq-cmr4 (int64-to-int truncation) |
| 2026-03-30 | Both GHSAs closed by maintainers |
| 2026-04-01 | Report shared with vLLM security lead for review |
| 2026-04-05 | Public release |
Both findings were submitted through coordinated disclosure via GitHub Security Advisory and were closed by the vLLM maintainers as not meeting their security bar. No embargo applies. The full disclosure narrative is in Section 6 of the report.
This research is part of a broader effort to document memory corruption vulnerabilities across the ML inference stack. Related CVEs in the same vulnerability class:
- llama.cpp: CVE-2025-53630, CVE-2026-27940, CVE-2025-49847, CVE-2026-33298, TALOS-2024-1913, TALOS-2024-1914, TALOS-2024-1915
- Ollama: CVE-2025-0315, CVE-2025-0317, CVE-2025-66959
A proposal for a new CWE entry (child of CWE-197) specific to integer truncation of tensor metadata in ML inference engines has been submitted to MITRE using this research as supporting evidence.
If you reference this work in your research, please cite:

```bibtex
@techreport{srivastava2026vllm,
  title  = {221 Silent Integer Truncations in vLLM: How a Single Malicious
            Model File Can Corrupt GPU Memory in the Most Widely Deployed
            LLM Inference Engine},
  author = {Srivastava, Aviral},
  year   = {2026},
  month  = {April},
  url    = {https://github.com/YOUR_USERNAME/vllm-integer-truncation-audit},
  note   = {Independent security research report}
}
```

Aviral Srivastava -- Independent Security Researcher
- CVE-2026-33017 (Langflow RCE, Critical 9.3, CISA KEV)
- CVE-2026-32628 (AnythingLLM SQL Injection)
- RSA Security Scholar 2025
- NIST OLIR contributor, MITRE CWE/ATLAS contributor
Contact: LinkedIn
This work is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).
You are free to share and adapt this material for any purpose, including commercial, as long as appropriate credit is given.