Is your feature request related to a problem? Please describe.
The current tools/llm directory only supports text-generation models (LLMs and VLMs). The transformer architecture now underpins a much wider range of model families — diffusion models, speech models, encoder models, video generation — and users optimizing those models with Torch-TRT have no equivalent tooling. Each model type requires different I/O handling, evaluation metrics, and benchmarking methodology that can't be shoehorned into run_llm.py.
Describe the solution you'd like
Extend the tools/ directory with one subdirectory per model family alongside the existing llm/ directory, each with its own task-specific run_*.py entry point (mirroring the existing run_llm.py) and benchmarking logic; a rough sketch of one possible runner follows the directory tree below:
```
tools/
  llm/              # existing: LLMs and VLMs
  diffusion/        # new
  audio/            # new
  encoder/          # new
  video/            # new
  neural_operator/  # new
```
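As a minimal sketch only (the file path tools/diffusion/run_diffusion.py, the CLI flags, the default model ID, and the choice to compile just the UNet are illustrative assumptions, not settled design), a diffusion runner could look roughly like this:

```python
# Hypothetical tools/diffusion/run_diffusion.py -- illustrative sketch, not the proposed implementation.
import argparse
import time

import torch
import torch_tensorrt  # noqa: F401  (ensures the Torch-TensorRT dynamo backend is available)
from diffusers import DiffusionPipeline


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", default="stabilityai/stable-diffusion-2-1")
    parser.add_argument("--prompt", default="a photo of an astronaut riding a horse")
    parser.add_argument("--steps", type=int, default=30)
    parser.add_argument("--benchmark", action="store_true")
    parser.add_argument("--iterations", type=int, default=10)
    args = parser.parse_args()

    pipe = DiffusionPipeline.from_pretrained(args.model, torch_dtype=torch.float16).to("cuda")

    # Compile only the UNet (the per-step hot loop) through the Torch-TensorRT
    # torch.compile backend; the text encoder and VAE stay in eager mode here.
    pipe.unet = torch.compile(pipe.unet, backend="torch_tensorrt", dynamic=False)

    # Warm-up call triggers compilation on the first UNet forward pass.
    pipe(args.prompt, num_inference_steps=args.steps)

    if args.benchmark:
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(args.iterations):
            pipe(args.prompt, num_inference_steps=args.steps)
        torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
        print(f"images/s: {args.iterations / elapsed:.3f}")


if __name__ == "__main__":
    main()
```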
The model types to add cover diffusion, audio/ASR (e.g., Whisper), encoder models (e.g., BERT), video generation, and neural operators.
Each runner should expose the same --benchmark flag and report latency/throughput in model-appropriate units (tokens/s for text, images/s for diffusion, real-time factor for audio).
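One way the shared reporting could be factored (purely a sketch; tools/common/benchmark.py and the timed/report helpers below are hypothetical names, not existing code) is a tiny module every runner calls with its own unit of work:

```python
# Hypothetical tools/common/benchmark.py -- illustrative sketch only.
import time
from contextlib import contextmanager

import torch


@contextmanager
def timed(results: dict):
    """Measure wall-clock time around a GPU workload."""
    torch.cuda.synchronize()
    start = time.perf_counter()
    yield
    torch.cuda.synchronize()
    results["elapsed_s"] = time.perf_counter() - start


def report(elapsed_s: float, work: float, unit: str):
    """Print latency plus throughput in a model-appropriate unit.

    unit examples: "tokens/s" (LLM), "images/s" (diffusion),
    "x realtime" (audio, with work = seconds of audio processed).
    """
    print(f"latency: {elapsed_s:.3f} s | throughput: {work / elapsed_s:.2f} {unit}")
```

An LLM runner would then call report(elapsed, tokens_generated, "tokens/s"), a diffusion runner report(elapsed, images, "images/s"), and an audio runner report(elapsed, audio_seconds, "x realtime").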
Describe alternatives you've considered
- Extend run_llm.py with --model-type — a single script becomes unwieldy; diffusion models have fundamentally different I/O (image tensors, noise schedules, guidance scale) that doesn't map onto the LLM text-generation loop.
- Leave each model type to users — this is the status quo and means there is no canonical Torch-TRT path for non-LLM workloads, even though TRT has published optimizations for all of these families (diffusion, BERT, Whisper).
- Only add diffusion — diffusion models are the most visible gap, but ASR (Whisper) and encoder models (BERT) are equally well-established TensorRT use cases with existing NVIDIA blog coverage and deserve first-class support.
Additional context