🐛 [Bug] Torch-TensorRT Slower than ONNX TensorRT in Alpamayo-10B #4239

@cehongwang

Description

Benchmark Result

Per-clip minADE (m)

| # | Clip | ONNX-TRT | Torch-TRT | Δ |
|---|------|----------|-----------|---|
| 1 | 0043b781… | 0.1719 | 0.1721 | +0.0002 |
| 2 | 00ee8960… | 0.9539 | 0.9582 | +0.0043 |
| 3 | 0145f6e0… | 0.3925 | 0.3929 | +0.0004 |
| 4 | 01460b45… | 0.0876 | 0.0874 | −0.0002 |
| 5 | 01cf8186… | 0.8914 | 0.7777 | −0.1137 |
| 6 | 021a4585… | 0.0976 | 0.1003 | +0.0027 |
| 7 | 0347d9f9… | 1.5423 | 1.5467 | +0.0044 |
| 8 | 03aa0b51… | 1.3659 | 1.3488 | −0.0171 |
| 9 | 0455a6e4… | 0.2546 | 0.2537 | −0.0009 |
| 10 | 047c0263… | 0.5622 | 0.5603 | −0.0019 |
| 11 | 049445b3… | 0.4437 | 0.4453 | +0.0016 |
| | Average | 0.6149 | 0.6039 | −0.0110 |
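
The averages and per-clip differences can be re-derived from the numbers above. The following is a minimal sketch, not the benchmark's own tooling: the list names and the 1 cm cut-off (`CM_THRESHOLD`) are assumptions introduced here for illustration.

```python
# Per-clip minADE in metres, copied from the table above (clips 1-11 in order).
onnx_trt = [0.1719, 0.9539, 0.3925, 0.0876, 0.8914, 0.0976,
            1.5423, 1.3659, 0.2546, 0.5622, 0.4437]
torch_trt = [0.1721, 0.9582, 0.3929, 0.0874, 0.7777, 0.1003,
             1.5467, 1.3488, 0.2537, 0.5603, 0.4453]

# Signed per-clip difference; negative means Torch-TRT has the lower minADE.
deltas = [round(t - o, 4) for o, t in zip(onnx_trt, torch_trt)]

mean_onnx = sum(onnx_trt) / len(onnx_trt)
mean_torch = sum(torch_trt) / len(torch_trt)

# Hypothetical cut-off: flag clips whose difference is at least 1 cm.
CM_THRESHOLD = 0.01
outliers = [i + 1 for i, d in enumerate(deltas) if abs(d) >= CM_THRESHOLD]

print(f"mean ONNX-TRT  = {mean_onnx:.4f} m")
print(f"mean Torch-TRT = {mean_torch:.4f} m")
print(f"clips with |delta| >= 1 cm: {outliers}")
```

Flagging by magnitude rather than by sign keeps improvements and regressions in the same list, which is useful when the question is whether the two backends agree rather than which one is better.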

Stage timings (mean of clips 2–11, ms; clip 1 excluded as warm-up)

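| Stage | ONNX-TRT (ms) | Torch-TRT (ms) | Δ (ms) | Δ% |
|-------|---------------|----------------|--------|----|
| ViT | 364.0 | 365.4 | +1.4 | +0.4% |
| LLM Prefill | 890.7 | 875.0 | −15.7 | −1.8% |
| LLM Generation | 146.5 | 188.7 | +42.2 | +28.8% |
| Diffusor | 169.9 | 173.5 | +3.6 | +2.1% |
| E2E | 1571.9 | 1602.6 | +30.7 | +2.0% |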
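
For reference, the Δ and Δ% columns follow directly from the two timing columns. A small sketch under the assumption that Δ% is taken relative to the ONNX-TRT value; the dictionary layout and variable names are illustrative only.

```python
# Mean stage latency in milliseconds (clips 2-11), copied from the table above.
# Each value is (ONNX-TRT, Torch-TRT).
stage_ms = {
    "ViT": (364.0, 365.4),
    "LLM Prefill": (890.7, 875.0),
    "LLM Generation": (146.5, 188.7),
    "Diffusor": (169.9, 173.5),
    "E2E": (1571.9, 1602.6),
}

# Absolute difference, and difference relative to the ONNX-TRT baseline.
delta_ms = {k: round(t - o, 1) for k, (o, t) in stage_ms.items()}
delta_pct = {k: round((t - o) / o * 100, 1) for k, (o, t) in stage_ms.items()}

# Stage with the largest slowdown, excluding the end-to-end row.
worst_stage = max((k for k in stage_ms if k != "E2E"), key=lambda k: delta_ms[k])

for name in stage_ms:
    print(f"{name:15s} {delta_ms[name]:+7.1f} ms  {delta_pct[name]:+6.1f}%")
print("largest slowdown:", worst_stage)
```

Note that the generation slowdown alone is larger than the end-to-end slowdown; the faster prefill offsets part of it.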

Engine sizes

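| Engine | ONNX-TRT (MiB) | Torch-TRT (MiB) | Δ (MiB) |
|--------|----------------|-----------------|---------|
| LLM | 14484 | 14495 | +11 |
| Visual | 1106 | 1114 | +8 |
| Action | 4357 | 4380 | +23 |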

Shared execution context (per-runner workspace, bytes)

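| Runner | ONNX-TRT (bytes) | Torch-TRT (bytes) | Ratio |
|--------|------------------|-------------------|-------|
| LLM | 2,776,893,952 | 2,776,893,952 | 1.0× |
| Vision | 4,074,504,192 | 4,074,505,216 | 1.0× |
| Action | 265,029,632 | 2,025,095,168 | ~7.6× |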

Peak shared exec context is bounded by the vision runner (~4.07 GB) in both cases, so peak GPU memory is unchanged. Only the action runner's reserved workspace balloons under Torch-TRT.
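
The reasoning above can be checked numerically. This sketch assumes that, because the execution context is shared between runners, the peak is the maximum over runners rather than their sum; that reading comes from the paragraph above, and the variable names are illustrative.

```python
# Per-runner workspace in bytes, copied from the table above.
# Each value is (ONNX-TRT, Torch-TRT).
workspace_bytes = {
    "LLM": (2_776_893_952, 2_776_893_952),
    "Vision": (4_074_504_192, 4_074_505_216),
    "Action": (265_029_632, 2_025_095_168),
}

# Torch-TRT / ONNX-TRT ratio per runner.
ratio = {k: t / o for k, (o, t) in workspace_bytes.items()}

# Assumption: a shared context is sized by its largest user, so peak = max.
peak_onnx = max(o for o, _ in workspace_bytes.values())
peak_torch = max(t for _, t in workspace_bytes.values())

for name, r in ratio.items():
    print(f"{name:7s} ratio = {r:.2f}x")
print(f"peak ONNX-TRT  = {peak_onnx / 1e9:.2f} GB")
print(f"peak Torch-TRT = {peak_torch / 1e9:.2f} GB")
```

Under this assumption the action runner would have to exceed the vision runner's ~4.07 GB before its larger workspace affected peak GPU memory.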

Verdict

  • Accuracy: equivalent on average (Torch-TRT is about 1.1 cm better on this 11-clip subset). Per-clip differences are under 1 cm except for clip 8, where Torch-TRT is 1.7 cm closer, and clip 5, where it is 11 cm closer.
  • Latency: Torch-TRT is ~2% slower end-to-end (+30.7 ms). The regression is concentrated in LLM generation (+42.2 ms, +29%); prefill is marginally faster (−15.7 ms, −1.8%).
  • Memory: action-runner workspace ~7.6× larger under Torch-TRT (peak GPU memory unchanged because vision dominates).
