221 Silent Integer Truncations in vLLM

Security audit documenting 221 silent int64-to-int32 truncation sites in vLLM's CUDA/C++ extensions that enable GPU buffer overflow via crafted GGUF model files.

License: CC BY 4.0

Overview

This repository contains the full research report documenting a systemic pattern of silent integer truncation across the C++ and CUDA extensions of vLLM, the most widely deployed open-source inference engine for large language model serving.

Key findings:

  • 221 instances where PyTorch's 64-bit tensor metadata (int64_t) is silently narrowed to 32-bit int variables in vLLM's csrc/ directory
  • Truncated values are used in GPU buffer allocations, kernel launch parameters, and loop bounds
  • For GGUF model file code paths, tensor dimensions are directly attacker-controlled through the model file header
  • This is the same vulnerability class that has produced 10 CVEs in llama.cpp and Ollama
  • A proof-of-concept attack chain demonstrates how a crafted GGUF file triggers a deterministic GPU buffer overflow

Report

Read the full report (PDF)

The report covers:

  1. A formal definition of the weakness class (silent narrowing of attacker-controlled tensor metadata)
  2. The complete 221-site audit with breakdowns by file and API call
  3. A proof-of-concept attack chain from crafted GGUF file to GPU buffer overflow
  4. Comparative CVE evidence from llama.cpp and Ollama
  5. An argument for formal CWE classification of model-file-sourced memory corruption
  6. Recommended fixes for vLLM and the broader ML infrastructure ecosystem

Reproducing the Audit

The truncation count can be independently verified against vLLM commit 63babd1:

git clone https://github.com/vllm-project/vllm.git
cd vllm
git checkout 63babd1

# tensor.sizes()[idx] -> int (12 sites)
grep -rn 'int [a-z_]* = .*\.sizes()\[' csrc/ --include="*.cu" --include="*.cpp" \
  | grep -v int64_t | wc -l

# tensor.size(dim) -> int (159 sites)
grep -rn 'int [a-z_]* = .*\.size(' csrc/ --include="*.cu" --include="*.cpp" \
  | grep -v int64_t | grep -v constexpr | wc -l

# tensor.numel() -> int (22 sites)
grep -rn 'int [a-z_]* = .*\.numel()' csrc/ --include="*.cu" --include="*.cpp" \
  | grep -v int64_t | wc -l

# tensor.stride(dim) -> int (28 sites)
grep -rn 'int [a-z_]* = .*\.stride(' csrc/ --include="*.cu" --include="*.cpp" \
  | grep -v int64_t | wc -l

# Total: 221

Disclosure Timeline

Date        Event
2026-03-29  Identified int64-to-int truncation pattern in vLLM GGUF kernels
2026-03-29  Completed systemic audit: 221 truncation sites across 20+ files
2026-03-29  Submitted GHSA-w2f2-mvhx-4xpg (memcpy_to_shm OOB)
2026-03-29  Submitted GHSA-5jv2-g5wq-cmr4 (int64-to-int truncation)
2026-03-30  Both GHSAs closed by maintainers
2026-04-01  Report shared with vLLM security lead for review
2026-04-05  Public release

Both findings were submitted through coordinated disclosure via GitHub Security Advisory and were closed by the vLLM maintainers as not meeting their security bar. No embargo applies. The full disclosure narrative is in Section 6 of the report.

Related Work

This research is part of a broader effort to document memory corruption vulnerabilities across the ML inference stack. Related CVEs in the same vulnerability class:

llama.cpp: CVE-2025-53630, CVE-2026-27940, CVE-2025-49847, CVE-2026-33298, TALOS-2024-1913, TALOS-2024-1914, TALOS-2024-1915

Ollama: CVE-2025-0315, CVE-2025-0317, CVE-2025-66959

A proposal for a new CWE entry (child of CWE-197) specific to integer truncation of tensor metadata in ML inference engines has been submitted to MITRE using this research as supporting evidence.

Citation

If you reference this work in your research, please cite:

@techreport{srivastava2026vllm,
  title     = {221 Silent Integer Truncations in vLLM: How a Single Malicious
               Model File Can Corrupt GPU Memory in the Most Widely Deployed
               LLM Inference Engine},
  author    = {Srivastava, Aviral},
  year      = {2026},
  month     = {April},
  url       = {https://github.com/Aviral2642/vllm-integer-truncation-audit},
  note      = {Independent security research report}
}

Author

Aviral Srivastava -- Independent Security Researcher

  • CVE-2026-33017 (Langflow RCE, Critical 9.3, CISA KEV)
  • CVE-2026-32628 (AnythingLLM SQL Injection)
  • RSA Security Scholar 2025
  • NIST OLIR contributor, MITRE CWE/ATLAS contributor

Contact: LinkedIn

License

This work is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).

You are free to share and adapt this material for any purpose, including commercial, as long as appropriate credit is given.
