Tech review of the vllm quantization LP by pareenaverma · Pull Request #3307 · ArmDeveloperEcosystem/arm-learning-paths

pareenaverma · 2026-05-19T10:28:31Z

Before submitting a pull request for a new Learning Path, please review Create a Learning Path

[x] I have reviewed Create a Learning Path

Please do not include any confidential information in your contribution. This includes confidential microarchitecture details and unannounced product information.

[x] I have checked my contribution for confidential information

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the Creative Commons Attribution 4.0 International License.

nikhil-arm · 2026-05-19T10:38:48Z

+| - stem           |      2|none  |     0|acc   |↑  |0.6053|±  |0.0345|
+```
+
+The INT8 model scores 0.6614 on MMLU compared to 0.6895 for BF16 — a drop of approximately 3%, which is consistent with the expected accuracy cost of INT8 weight quantization. For full reference results, see the [Red Hat model card](https://huggingface.co/RedHatAI/Meta-Llama-3.1-8B-quantized.w8a8#accuracy).


What is the reference point for claiming that a 3% drop is expected with INT8? I think it's slightly wrong to say this, especially with --limit 10 benchmarking.

Fair point. I can remove the 3% drop claim but still keep the output reference so the reader knows what to expect from running it. I'll use the same phrasing as in the previous comment instead.
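To make the disputed figure concrete, here is a minimal sketch that computes the gap between the two MMLU scores quoted in the diff above (0.6895 for BF16, 0.6614 for INT8). The function name `accuracy_drop` is hypothetical and only for illustration. It separates the absolute gap in percentage points from the drop relative to the BF16 baseline, since "approximately 3%" could be read either way.

```python
def accuracy_drop(baseline: float, quantized: float) -> tuple[float, float]:
    """Return (absolute drop, drop relative to the baseline) for two accuracy scores."""
    absolute = baseline - quantized
    relative = absolute / baseline
    return absolute, relative


# Scores quoted in the diff under review.
abs_drop, rel_drop = accuracy_drop(baseline=0.6895, quantized=0.6614)

print(f"absolute drop: {abs_drop * 100:.2f} percentage points")
print(f"relative drop: {rel_drop * 100:.2f}%")
```

With these inputs the absolute gap is about 2.8 percentage points and the relative drop is about 4.1%, so any wording that keeps a number should say which one it means.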

nikhil-arm · 2026-05-19T10:40:08Z

 ```

-We expect INT8 inference to show a slight accuracy drop compared to BF16. For reference results and expected accuracy differences, see the Red Hat model card: https://huggingface.co/RedHatAI/Meta-Llama-3.1-8B-quantized.w8a8#accuracy
+The output is similar to:


IMO it's better not to post accuracy numbers unless we have run full benchmarking on certain tasks, because these numbers would change heavily based on which prompts get randomly picked up for benchmarking.

Happy to add some phrasing that addresses that concern, such as: "The output above shows the format you can expect from lm_eval. At --limit 10, only 10 prompts are randomly selected per task, so the specific values will vary significantly between runs and are not a reliable basis for comparison. For published full-dataset accuracy figures, see the Red Hat model card."
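As a rough illustration of why small samples are noisy, the sketch below uses the textbook binomial standard error, sqrt(p(1-p)/n). It assumes each prompt is scored independently as correct or incorrect, and the accuracy of 0.7 is a hypothetical value chosen for illustration. This is not necessarily how lm_eval computes the stderr column it reports.

```python
import math


def binomial_stderr(accuracy: float, n: int) -> float:
    """Standard error of an accuracy estimated from n independent pass/fail prompts."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


# Hypothetical true accuracy of 0.7, evaluated on increasingly large samples.
for n in (10, 100, 1000):
    print(f"n={n:>4}: stderr ~ {binomial_stderr(0.7, n):.4f}")
```

Under these assumptions, 10 prompts give a standard error of roughly 0.14, which is far larger than the gap between the BF16 and INT8 scores being discussed.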

nikhil-arm · 2026-05-19T10:40:39Z

 lm_eval --model vllm --model_args pretrained=meta-llama/Llama-3.1-8B,dtype=bfloat16,max_model_len=4096 --tasks mmlu,gsm8k --batch_size auto
 ```

+The output is similar to:


IMO it's better not to post accuracy numbers unless we have run full benchmarking on certain tasks, because these numbers would change heavily based on which prompts get randomly picked up for benchmarking.
I understand the point of showcasing a representative output table, though.

almayne · 2026-05-19T13:55:00Z

+### Accuracy recovery: INT8/BF16 (--limit 10)
+| MMLU | GSM8k |
+|---|---|
+| 97% | 92% |


I generated these values without any limit applied. If you regenerated them yourself with that argument, then there's no issue here.
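For reference, the "INT8/BF16" recovery in the table heading is a simple ratio of the two scores. The sketch below assumes that definition and uses a hypothetical helper name, `accuracy_recovery`. The inputs are the MMLU scores quoted elsewhere in this review, not the runs behind the table, so the result is not expected to match the table values.

```python
def accuracy_recovery(quantized: float, baseline: float) -> float:
    """Fraction of the baseline accuracy retained by the quantized model."""
    if baseline <= 0:
        raise ValueError("baseline accuracy must be positive")
    return quantized / baseline


# MMLU scores quoted in the diff under review (INT8 and BF16 respectively).
recovery = accuracy_recovery(quantized=0.6614, baseline=0.6895)
print(f"MMLU recovery: {recovery:.1%}")
```

Because the ratio depends on both runs using the same prompt set, recovery figures computed with and without --limit are not directly comparable.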

pareenaverma added 3 commits May 19, 2026 11:26

Tech review of the vllm quant LP

9597ea0

Merge branch 'content_review' of https://github.com/pareenaverma/arm-…

3cc3cb4

…learning-paths into content_review

Merge branch 'ArmDeveloperEcosystem:main' into content_review

c621984

nikhil-arm reviewed May 19, 2026

nikhil-arm suggested changes May 19, 2026

almayne reviewed May 19, 2026

pareenaverma wants to merge 3 commits into ArmDeveloperEcosystem:main from pareenaverma:content_review
