✨[Feature] Multi-Framework Runner Support in tools #4220

@peterkisfaludi

Is your feature request related to a problem? Please describe.
The current tools/llm directory only supports text-generation models (LLMs and VLMs). The transformer architecture now underpins a much wider range of model families — diffusion models, speech models, encoder models, video generation — and users optimizing those models with Torch-TRT have no equivalent tooling. Each model type requires different I/O handling, evaluation metrics, and benchmarking methodology that can't be shoehorned into run_llm.py.

Describe the solution you'd like
Extend the tools/ directory with one subdirectory per task type, alongside the existing llm/ directory. Each subdirectory has its own run_*.py entry point (in the style of run_llm.py) and its own benchmarking logic:

tools/
    llm/              # existing: LLMs and VLMs
    diffusion/        # new
    audio/            # new
    encoder/          # new
    video/            # new
    neural_operator/  # new

The model types to add:

[Image in the original issue: model types to add]

Each runner should expose the same --benchmark flag and report latency/throughput in
model-appropriate units (tokens/s for text, images/s for diffusion, real-time factor for
audio).
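
As a rough illustration of the shared --benchmark contract described above, here is a minimal sketch of what a common helper could look like. Everything except the --benchmark flag and the three unit names is an assumption made for illustration: `build_parser`, `summarize`, `benchmark`, `BenchmarkResult`, and the `--iterations` flag are hypothetical names, not existing Torch-TRT APIs. Real-time factor is taken here as processing time divided by audio duration, which is a common convention but not something the issue specifies.

```python
import argparse
import time
from dataclasses import dataclass
from typing import Callable, Sequence

# Units named in the issue text; the task keys are hypothetical.
UNITS = {"llm": "tokens/s", "diffusion": "images/s", "audio": "real-time factor"}


@dataclass
class BenchmarkResult:
    mean_latency_s: float
    value: float  # throughput, or real-time factor for audio
    unit: str


def build_parser(task: str) -> argparse.ArgumentParser:
    """Every runner exposes the same --benchmark flag."""
    parser = argparse.ArgumentParser(prog=f"run_{task}.py")
    parser.add_argument("--benchmark", action="store_true",
                        help="measure latency and throughput")
    # Hypothetical extra flag, not part of the proposal.
    parser.add_argument("--iterations", type=int, default=10)
    return parser


def summarize(task: str, latencies_s: Sequence[float],
              work_per_iter: float) -> BenchmarkResult:
    """work_per_iter: tokens or images produced per iteration,
    or seconds of audio processed per iteration for the audio task."""
    mean_latency = sum(latencies_s) / len(latencies_s)
    if task == "audio":
        # Assumed definition: processing time / audio duration (lower is better).
        value = mean_latency / work_per_iter
    else:
        value = work_per_iter / mean_latency
    return BenchmarkResult(mean_latency, value, UNITS[task])


def benchmark(task: str, step: Callable[[], None],
              work_per_iter: float, iterations: int) -> BenchmarkResult:
    """Time `step` with a wall clock. A real GPU runner would also need to
    synchronize the device before reading the clock."""
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        step()
        latencies.append(time.perf_counter() - start)
    return summarize(task, latencies, work_per_iter)
```

With a helper along these lines, each run_*.py would only supply its own `step` callable and the amount of work per iteration, while the flag handling and the reporting format stay identical across runners.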

Describe alternatives you've considered

  • Extend run_llm.py with --model-type — a single script becomes unwieldy; diffusion models have fundamentally different I/O (image tensors, noise schedules, guidance scale) that doesn't map onto the LLM text-generation loop.
  • Leave each model type to users — this is the status quo and means there is no canonical Torch-TRT path for non-LLM workloads, even though TRT has published optimizations for all of these families (diffusion, BERT, Whisper).
  • Only add diffusion — diffusion models are the most visible gap, but ASR (Whisper) and encoder models (BERT) are equally well-established TensorRT use cases with existing NVIDIA blog coverage and deserve first-class support.

Additional context
