CommitLLM

CommitLLM is a cryptographic commit-and-audit protocol for open-weight LLM inference.

When you call an LLM provider, you have no proof they ran the model they claim, with the configuration they advertise, or that the output wasn't tampered with. CommitLLM closes that gap: the provider serves responses normally on GPU and returns a compact receipt. Trace data is opened only when challenged. A verifier checks the opening on CPU against committed model weights, deployment configuration, transcript state, and decode policy.

Status: active development. Lean 4 formalization is in progress. Internal crate names still use the verilm prefix and are pending rename to commitllm.

What It Gives You

Exact checks for the large linear shell, quantization and dequantization boundaries, and supported canonical replay steps
Exact audit detection for wrong weights, wrong deployment configuration, and wrong decode policy on opened traces
Approximate replay only at the attention interior, where native GPU FP16 and BF16 execution is not bit reproducible
Statistical routine-audit provenance for unopened prefix KV state
CPU-only verification on the client side
A normal GPU serving path for the provider, without a proving circuit or per-response proof generation

Current Prototype

Measured online tracing overhead is about 12-14% during generation
Measured verifier cost for Llama 70B is about 1.3 ms per challenged token under the 10-layer routine audit, and about 10 ms for a 1-token full audit
The corrected replay path is currently measured on Qwen2.5-7B-W8A8 and Llama-3.1-8B-W8A8
On those measured paths, worst-case INT8 attention mismatch beyond 1k tokens is L_inf = 8 for Qwen and L_inf = 9 for Llama, with more than 99.8% of elements staying within one quantization bucket

Protocol

Setup. The verifier builds a key from the public checkpoint. It contains a Merkle root over weights, secret Freivalds vectors for each matrix family (Wq, Wk, Wv, Wo, Wgate, Wup, Wdown, LM_head), and the model configuration needed for replay.
Commit. During inference the provider runs the model normally, captures retained state, and returns a compact receipt binding the execution trace, KV state, deployment manifest, prompt, sampling randomness, and token count.
Audit. When challenged, the provider opens the requested token positions and layer range. Routine audit samples prefix state. Deep audit opens all requested layers and prefix state.
Verify. The verifier checks:
- Embedding Merkle proof
- Shell matmuls via Freivalds
- Exact INT8 bridge tensors by canonical recomputation
- KV provenance
- Attention replay against the committed post-attention output
- Final-token tail from the captured residual
- LM-head binding
- Decode and output policy replay

The protocol is commitment-bound end-to-end. Within that binding, large linear components are verified by verifier-secret, information-theoretically sound algebraic checks, exact bridge tensors and supported nonlinear subcomputations are checked by canonical re-execution, attention is checked by bounded approximate replay, and routine KV provenance is statistical unless deep audit is used. Unsupported semantics fail closed.

Repository Layout

Component	Path
Core types and traits	`crates/verilm-core/`
Key generation	`crates/verilm-keygen/`
Verifier	`crates/verilm-verify/`
Prover (Rust)	`crates/verilm-prover/`
Prover (Python sidecar)	`sidecar/`
Python bindings	`crates/verilm-py/`
Test vectors	`crates/verilm-test-vectors/`
Benchmark and repro scripts	`scripts/modal/`
Lean formalization	`lean/`
Paper	`paper/`

Abstract

Large language models are increasingly used in settings where integrity matters, but users still lack technical assurance that a provider actually ran the claimed model, decode policy, and output behavior. Fingerprinting and statistical heuristics can provide signals, but not exact per-response verification. Zero-knowledge proof systems provide stronger guarantees, but at prover costs that remain impractical for production LLM serving.

We present CommitLLM, a cryptographic commit-and-audit protocol for open-weight LLM inference. CommitLLM keeps the provider on the normal serving path and keeps verifier work fast and CPU-only. It combines commitment binding, direct audit, and randomized algebraic fingerprints, including Freivalds-style checks for large matrix products, rather than per-response proof generation or full re-execution. Its main costs are retained-state memory over the audit window and audit bandwidth, not per-response proving.

The protocol is commitment-bound end-to-end. Within that binding, large linear layers are verified by verifier-secret, information-theoretically sound algebraic checks, quantization/dequantization boundaries and supported nonlinear subcomputations are checked by canonical re-execution, attention is verified by bounded approximate replay, and routine prefix-state provenance is statistical unless deep audit is used. Unsupported semantics fail closed.

In the current prototype, online tracing adds roughly 12-14% during generation. The larger commitment and finalization cost is measured separately, currently runs synchronously, and is a candidate for asynchronous deferral in production. For Llama 70B, the measured verifier cost is about 1.3 ms per challenged token under the 10-layer routine audit, and about 10 ms for a 1-token full audit. On the corrected replay path, the measured attention corridor is small on Qwen2.5-7B-W8A8 and Llama-3.1-8B-W8A8: at context lengths beyond 1k tokens, worst-case INT8 mismatch is single-digit (L_inf = 8 and 9) and more than 99.8% of elements remain within one quantization bucket.

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
.github/workflows		.github/workflows
article		article
crates		crates
lean		lean
paper		paper
redteam		redteam
research		research
scripts		scripts
sidecar		sidecar
site		site
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
flake.lock		flake.lock
flake.nix		flake.nix
roadmap.md		roadmap.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CommitLLM

What It Gives You

Current Prototype

Protocol

Repository Layout

Abstract

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

CommitLLM

What It Gives You

Current Prototype

Protocol

Repository Layout

Abstract

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages