feat(ai): add ROCm and MIGraphX execution providers for AMD GPUs #335

Draft
Benehiko wants to merge 1 commit into deven96:main from Benehiko:feat/rocm-execution-provider

Benehiko commented May 18, 2026

Note: This PR was generated with Claude. The patch has been compile-checked, unit-tested, regenerated for Go/Node/Python SDKs, and smoke-tested end-to-end via ahnlich-cli — but ROCm/MIGraphX on AMD hardware has not been validated because the current upstream Dockerfile ships a CUDA-flavour ONNX Runtime bundle. Treat the runtime side of this PR as enabling the wire-up; an AMD-hardware validation is a follow-up that depends on packaging a ROCm-enabled or MIGraphX-enabled ORT image.

Summary

Adds two opt-in execution providers so ahnlich-ai can target AMD GPUs:

  • ROCM — ONNX Runtime's ROCm execution provider (works on onnxruntime < 1.23)
  • MIGRAPHX — AMD's recommended replacement after onnxruntime removed the ROCm provider in 1.23 (ROCm EP removal note)

Both wire through the same path the existing CUDA, TENSOR_RT, DIRECT_ML, and CORE_ML providers use — no new dependencies, no behaviour change for existing providers.

The ort crate (2.0.0-rc.5, already pinned in ahnlich/ai/Cargo.toml) exposes both as ROCmExecutionProvider and MIGraphXExecutionProvider.

Why two providers

ahnlich currently pins ort to 2.0.0-rc.5 (against ONNX Runtime 1.19), where the ROCm execution provider still ships. The ROCm provider was removed from onnxruntime in 1.23, and AMD recommends MIGraphX for new builds. To stay useful both today and after an ORT bump, this PR adds both variants. Maintainers can keep both, or drop ROCM once ahnlich bumps the ORT pin to 1.23 or later.
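The version reasoning above can be sketched as a small helper. This is an illustration only and not part of the patch: `AmdProvider`, `parse_major_minor`, and `amd_providers_for` are hypothetical names, and the sketch considers only the ORT release number. Whether a provider is actually usable also depends on how the ORT binary was built.

```rust
/// AMD-oriented providers discussed in this PR (hypothetical stand-in enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AmdProvider {
    Rocm,
    Migraphx,
}

/// Parses the leading "major.minor" of a version string such as "1.19.0".
pub fn parse_major_minor(version: &str) -> Option<(u32, u32)> {
    let mut parts = version.split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Lists which AMD providers an ONNX Runtime release can offer at all.
/// Assumption: ROCm is offered only before 1.23; MIGraphX in every release considered here.
pub fn amd_providers_for(ort_version: &str) -> Option<Vec<AmdProvider>> {
    let version = parse_major_minor(ort_version)?;
    if version < (1, 23) {
        Some(vec![AmdProvider::Rocm, AmdProvider::Migraphx])
    } else {
        Some(vec![AmdProvider::Migraphx])
    }
}
```

With the current pin, `amd_providers_for("1.19.0")` yields both variants; from `"1.23.0"` onward only `Migraphx` remains.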

Changes

Protocol + Rust core

  • protos/ai/execution_provider.proto — add ROCM = 4 and MIGRAPHX = 5 with doc comments that flow through to every language binding via the generators
  • protos/README.md — mention ROCM and MIGRAPHX alongside CUDA / TENSOR_RT
  • ahnlich/types/src/ai/execution_provider.rs — regenerated Rust enum (mirrors what build.rs produces from the updated .proto)
  • ahnlich/ai/src/engine/ai/providers/ort/mod.rs
    • import ROCmExecutionProvider and MIGraphXExecutionProvider from ort
    • add InnerAIExecutionProvider::ROCm and InnerAIExecutionProvider::MIGraphX
    • extend the From<AIExecutionProvider> impl for both variants
    • extend register_provider to call ROCmExecutionProvider::default().register(...) and MIGraphXExecutionProvider::default().register(...)
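The shape of the ort/mod.rs change can be sketched with stand-in types. This is not the code from the patch: `AiExecutionProvider`, `InnerProvider`, and `provider_name` are hypothetical names, and where the real `register_provider` calls into the ort crate, this sketch only returns the name of the provider type each arm would register.

```rust
/// Stand-in for the protobuf-generated enum (hypothetical; variant list taken from this PR).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AiExecutionProvider {
    TensorRt,
    Cuda,
    DirectMl,
    CoreMl,
    Rocm,
    Migraphx,
}

/// Stand-in for the internal enum, which additionally has a CPU variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InnerProvider {
    Cpu,
    TensorRt,
    Cuda,
    DirectMl,
    CoreMl,
    Rocm,
    Migraphx,
}

impl From<AiExecutionProvider> for InnerProvider {
    fn from(provider: AiExecutionProvider) -> Self {
        match provider {
            AiExecutionProvider::TensorRt => InnerProvider::TensorRt,
            AiExecutionProvider::Cuda => InnerProvider::Cuda,
            AiExecutionProvider::DirectMl => InnerProvider::DirectMl,
            AiExecutionProvider::CoreMl => InnerProvider::CoreMl,
            // The two arms this PR adds.
            AiExecutionProvider::Rocm => InnerProvider::Rocm,
            AiExecutionProvider::Migraphx => InnerProvider::Migraphx,
        }
    }
}

/// Names the ort provider type each variant corresponds to.
/// The real code registers the provider on a session builder instead of returning a string.
pub fn provider_name(provider: InnerProvider) -> &'static str {
    match provider {
        InnerProvider::Cpu => "CPUExecutionProvider",
        InnerProvider::TensorRt => "TensorRTExecutionProvider",
        InnerProvider::Cuda => "CUDAExecutionProvider",
        InnerProvider::DirectMl => "DirectMLExecutionProvider",
        InnerProvider::CoreMl => "CoreMLExecutionProvider",
        InnerProvider::Rocm => "ROCmExecutionProvider",
        InnerProvider::Migraphx => "MIGraphXExecutionProvider",
    }
}
```

Because both matches are exhaustive, adding a variant to either enum without extending the corresponding arm fails at compile time, which is what keeps the new providers on the same path as the existing ones.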

DSL grammar + parser + tests

  • ahnlich/dsl/src/syntax/syntax.pest — extend the execution_provider rule so the pest tokeniser actually emits rocm / migraphx. Without this the DSL rejects the new keywords at the parser layer before parse_to_execution_provider ever sees them.
  • ahnlich/dsl/src/ai.rs — accept "rocm" and "migraphx" in parse_to_execution_provider
  • ahnlich/dsl/src/tests/ai.rs — add test_get_sim_n_parse_rocm_execution_provider and test_get_sim_n_parse_migraphx_execution_provider, mirroring the existing TensorRT / CUDA round-trips
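A minimal sketch of the keyword mapping these DSL changes describe, using stand-in types. `ExecutionProvider` and `parse_execution_provider` are hypothetical names rather than the actual contents of ahnlich/dsl/src/ai.rs, and the "cuda" and "tensorrt" spellings are assumptions; only "rocm" and "migraphx" are confirmed by this PR.

```rust
/// Stand-in for the execution provider enum (hypothetical; subset of variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecutionProvider {
    TensorRt,
    Cuda,
    Rocm,
    Migraphx,
}

/// Maps a DSL keyword to a provider, rejecting anything unknown.
pub fn parse_execution_provider(keyword: &str) -> Result<ExecutionProvider, String> {
    match keyword {
        // Assumed spellings for the existing providers.
        "tensorrt" => Ok(ExecutionProvider::TensorRt),
        "cuda" => Ok(ExecutionProvider::Cuda),
        // Keywords added by this PR.
        "rocm" => Ok(ExecutionProvider::Rocm),
        "migraphx" => Ok(ExecutionProvider::Migraphx),
        other => Err(format!("unknown execution provider: {other}")),
    }
}
```

In the real DSL, an unknown keyword is rejected earlier, by the pest grammar, so the error arm here stands in for that parser-layer rejection. This is also why the grammar rule has to list the new keywords before the mapping function can ever see them.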

SDK regeneration

  • sdk/ahnlich-client-go/grpc/ai/execution_provider/execution_provider.pb.go — regenerated via buf generate
  • sdk/ahnlich-client-node/grpc/ai/execution_provider_pb.ts — regenerated via buf generate
  • sdk/ahnlich-client-py/ahnlich_client_py/grpc/ai/execution_provider/__init__.py: ROCM + MIGRAPHX variants added in the shape betterproto emits. A full make grpc-update-python run in the maintainer environment is recommended so any incidental codegen drift (formatter / generator version) is captured properly.

Docs

  • README.md — document ROCm and MIGraphX prerequisites under "Execution Providers", including the note that upstream ORT removed the ROCm provider in 1.23

Verification done locally

  • cargo check -p ahnlich_types -p dsl — passes
  • cargo test -p dsl: 30 / 30 passing (the two new tests round-trip "rocm" and "migraphx" end-to-end through parse_ai_query)
  • cargo check -p ai — passes against the actual libonnxruntime.so 1.19.0 shipped in the official ahnlich-ai image (used as ORT_LIB_LOCATION), confirming the patched ort/mod.rs compiles cleanly
  • End-to-end smoke test: built the patched binary into a local Docker image, ran it under a container runtime, and verified via ahnlich-cli:
    • executionprovider rocm and executionprovider migraphx parse successfully (server responds with the expected store-lookup error against an empty store)
    • executionprovider bogus is rejected at the parser layer (the negative test that surfaced the missing syntax.pest rule in the first place)
  • Proto regen: confirmed build.rs regenerates ahnlich/types/src/ai/execution_provider.rs from the updated .proto to the same Rust shape this PR ships; confirmed buf generate reproduces the committed Go and Node stubs

What is intentionally not included

  • Dockerfile / image work. The existing Dockerfile builds against onnxruntime-linux-x64-gpu-1.19.0.tgz, which is the CUDA flavour. To actually exercise ROCm or MIGraphX at runtime, ahnlich-ai needs an image built against a ROCm-enabled or MIGraphX-enabled ORT (either built from source with --use_rocm / --use_migraphx, or pulled from AMD's rocm-onnxruntime package). That is a packaging concern that deserves its own PR + maintainer input — probably Dockerfile.rocm and/or Dockerfile.migraphx plus a CI matrix entry.
  • AMD-hardware validation. Without a ROCm/MIGraphX-enabled ORT image the new register_provider arms can only be exercised through the DSL → gRPC layer (which is done above), not through actual model inference.
  • A full make grpc-update-python run. The Python __init__.py change is a minimal manual mirror of what betterproto would emit. A maintainer-side poetry run generate_from_protos is recommended to catch any formatter drift.

Manual checklist for whoever picks this up

  • Run make grpc-update-python from a clean environment to confirm the Python stub matches the patch
  • Decide whether to keep both ROCM and MIGRAPHX or drop ROCM once the ORT pin moves to 1.23 or later
  • Add Dockerfile.rocm (or Dockerfile.migraphx) bundling a ROCm-enabled / MIGraphX-enabled ORT, plus a CI matrix entry
  • On an AMD host with the appropriate runtime installed: confirm a model loads through each new provider
  • Confirm CPU fallback still works when the AMD runtime is absent (InnerAIExecutionProvider::CPU path is unchanged, so this should be free, but worth a sanity run)
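The CPU-fallback item could be checked with something shaped like the sketch below. It is an assumption-heavy illustration: `Provider` and `select_provider` are hypothetical, the registration step is modelled as a caller-supplied closure, and this PR does not show how ahnlich-ai itself reacts when a provider fails to register.

```rust
/// Stand-in provider enum (hypothetical; only the variants relevant to the check).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    Cpu,
    Rocm,
    Migraphx,
}

/// Tries to register the requested provider and falls back to CPU on failure.
/// `try_register` models the registration call; it returns Err when the runtime is absent.
pub fn select_provider<F>(requested: Provider, try_register: F) -> Provider
where
    F: Fn(Provider) -> Result<(), String>,
{
    match try_register(requested) {
        Ok(()) => requested,
        Err(_) => Provider::Cpu,
    }
}
```

On a host without the AMD runtime, the closure would return an error and the selection resolves to `Provider::Cpu`; on an AMD host with a matching ORT build it would return `Ok(())` and the requested provider is kept.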

@Iamdavidonuh
Collaborator

This looks interesting.

CC: @Ayobami-00

Benehiko force-pushed the feat/rocm-execution-provider branch 2 times, most recently from 750bffe to 27ce036 on May 19, 2026 18:28
Benehiko changed the title from "feat(ai): add ROCm execution provider for AMD GPUs" to "feat(ai): add ROCm and MIGraphX execution providers for AMD GPUs" on May 19, 2026
Benehiko force-pushed the feat/rocm-execution-provider branch 2 times, most recently from 517c297 to 5cb85ec on May 19, 2026 19:43

Wires ort's ROCmExecutionProvider and MIGraphXExecutionProvider through
the same path as the existing CUDA / TensorRT / DirectML / CoreML
providers so ahnlich-ai can target AMD GPUs on Linux when the host has a
matching ROCm or MIGraphX runtime and an ORT build that supports either
provider.

Two variants are included because upstream onnxruntime removed the ROCm
execution provider in release 1.23 and recommends MIGraphX as its
replacement. ahnlich currently pins ort to 2.0.0-rc.5 (against ORT 1.19,
which still ships ROCm), so both variants stay useful until the ORT pin
moves past 1.23.

- protos/ai/execution_provider.proto: add ROCM = 4 and MIGRAPHX = 5
- ahnlich/types/src/ai/execution_provider.rs: regenerated Rust enum
- ahnlich/ai/src/engine/ai/providers/ort/mod.rs: register
  ROCmExecutionProvider and MIGraphXExecutionProvider via
  InnerAIExecutionProvider::ROCm and ::MIGraphX
- ahnlich/dsl/src/ai.rs: accept "rocm" and "migraphx" in
  parse_to_execution_provider
- ahnlich/dsl/src/syntax/syntax.pest: extend the execution_provider rule
  to tokenise "rocm" and "migraphx" (without this the DSL rejects the
  new keywords at the parser layer before parse_to_execution_provider
  runs)
- ahnlich/dsl/src/tests/ai.rs: add round-trip tests for "rocm" and
  "migraphx" through parse_ai_query (mirrors the existing TensorRT and
  CUDA tests)
- sdk/ahnlich-client-go/grpc/ai/execution_provider/execution_provider.pb.go:
  regenerate via `buf generate`
- sdk/ahnlich-client-node/grpc/ai/execution_provider_pb.ts:
  regenerate via `buf generate`
- sdk/ahnlich-client-py/ahnlich_client_py/grpc/ai/execution_provider/__init__.py:
  add ROCM and MIGRAPHX variants (matches betterproto's emit; full
  `make grpc-update-python` regen left to the maintainer's environment)
- README.md / protos/README.md: document ROCm and MIGraphX prerequisites
  and the ORT 1.23 ROCm removal

Generated as a reference patch with Claude — not validated against AMD
hardware. Verified locally with:

- `cargo test -p dsl` (30 / 30 passing, +2 new tests)
- `cargo check -p ai` against the actual libonnxruntime.so 1.19.0
  bundled in the official ahnlich-ai image
- end-to-end DSL smoke test via ahnlich-cli: the patched binary boots,
  accepts `executionprovider rocm` and `executionprovider migraphx`
  through the gRPC layer, and rejects unknown tokens at the parser
- proto regen confirmed reproducible (build.rs round-trips
  execution_provider.rs cleanly, `buf generate` produces the same
  Go/Node stubs committed here)

Benehiko force-pushed the feat/rocm-execution-provider branch from 5cb85ec to e63d12e on May 19, 2026 19:54