Changelog for MIGraphX

Full documentation for MIGraphX is available at https://rocmdocs.amd.com/projects/AMDMIGraphX/en/latest/.

Develop

Added

Added auto_pad attribute support for the ONNX ConvTranspose operator, supporting SAME_UPPER, SAME_LOWER, and VALID padding modes for static shapes (#4638).
Added a dedicated logger for MIGraphX.
[Linux] Use HSA API to query number of chiplets for architectures where this is applicable (ex. gfx90a).
Added Eigen third party headers for ref GEMMs (#4631).
Added a fuse_horizontal pass which batches independent cross embedding gather instructions (#4599).
Added GPU JIT Resize kernel (#4553).
Added environment variable MIGRAPHX_SKIP_BENCHMARKING which when enabled, skips tuning of MIGraphX and rocMLIR kernels (#4628).
Added Cubic resize jit kernel (#4652).
Added JIT compiler for fill operation (#4666).
Added JIT compiler for multinomial operation (#4721).
Added build support for python 3.14 (#4754).
Added debug symbols for MIGraphX instructions such that parsed and compiled instructions can be tracked back to their ONNX origin node (#4626)

Changed

Converted nonzero operator from device implementation to JIT compilation (#4720).
Converted prefix_scan_sum operator from device implementation to JIT compilation (#4720).
Converted reverse operator from device implementation to JIT compilation (#4645).
Refactored instruction output alias to return a vector of aliases (#4540).
Changed parsing of ONNX ops like ConstantOfShape to insert undefined if expected shape has 0 elements (#4567).
Updated the ONNX clip operator to support opset 13 (#4518).
Updated argmin and argmax ops to be implemented as reduction ops, so they now have JIT support and can fuse (#4620).
Replaced usages of std::cout and std::cerr with the logger (#4732)
Converted RNN variable sequence length operations (rnn_var_sl_shift_sequence, rnn_var_sl_shift_output, rnn_var_sl_last_output) from device implementation to JIT compilation (#4755).

Resolved issues

Fixed a regression in simplify_algebra where find_conv_broadcast_input could trigger Dimensions do not match for padded broadcast-convolution rewrites in no-interior spatial cases (#4738).
Fixed a bug with operators pack_fp4, unpack_fp4, and the fuse_mlir pass handling non-standard input shapes (#4560).
Fixed an issue in propagate_precision pass where precision could be incorrectly propagated across type boundaries (e.g., from integral to floating-point) (#4603).
Fixed an issue with clip operator when using fp16 input type on opset 6 (#4518).
Fixed an issue with reshape_lazy's shape computation that was leading to invalid reshapes (#4594).
Fixed eliminate_pad pass bug that was removing nonzero pad instructions (#4600).
Fixed an issue with convert output overflowing when converting inf/-inf to integral types (#4669).
Fixed issue with find_concat_op matcher merging converted int32 inputs after bf16/fp16 quant during compilation (#4745)

Optimized

Replaced Hillis-Steele scan algorithm with a wave-based hierarchical scan, reducing work complexity from O(N log N) to O(N) and synchronization from O(log N) to 2 __syncthreads() calls (#4720).
Optimized fusion for local_window mode of GQA operator (#4617).
Removed extra assignments and inserts of op names in find_nop_reshapes(#4696).
Added a new pass to replace convolution with constant broadcast input with a reduced GEMM which improves model compilation time (#4621).
Implemented JIT compilation for logsoftmax by decomposing it into fusible operations (log, exp, reduce_max, reduce_sum), enabling kernel fusion. (#4630).
Improved find_attention to move evaluable constant inputs inside the operator, allowing rocMLIR to detect causal masks. (#4660)
Added early return for find_conv_dot_horiz_fusion matcher based on if operator output size is less than two (#4662).
Add matcher to simplify_algebra to find and replace pow(x, 2) with mul(x, x) (#4681)

Removed

Removed legacy device implementations for argmin and argmax in favor of the JIT implementations recently added (#4658).

MIGraphX 2.15 for ROCm 7.2.0

Added

Added MXFP4 support for Quark and Brevitas quantized models.
Added dynamic shape support for DepthToSpace Op.
Added bias and key_mask_padding inputs for the MultiHeadAttention operator.
Added GEMM+GEMM fusions.
Added the dim_params input parameter to the parse_onnx python call.
Created an API to query supported ONNX Operators get_onnx_operators().
Added right pad masking mode for Multihead Attention.
Added support for Flash Decoding.
Added Torch-MIGraphX installation instructions.
Added Operator Builders with supporting documentation.
Added index range check to the Gather operator.
Added log(exp(x)) → x and log(a/b) → log(a) - log(b) algebraic simplifications (#4630).

Changed

Updated the Resize operator to support linear mode for Dynamic shapes.
Switch to --input-dim instead of --batch to set any dynamic dimensions when using migraphx-driver.
Different stride sizes are now supported in ONNX if branches.
ONNX version change to 1.18.0 to support PyTorch 2.9.
Refactor GroupQueryAttention.
Enable PipelineRepoRef parameter in CI.
Hide LLVM symbols that come from ROCmlir and provide option for stripping in release mode.
Model compilation failures now produce an mxr file for debugging the failure.
Bump SQlite3 to 3.50.4.

Resolved issues

Fixed an issue in propagate_precision pass where precision could be incorrectly propagated across type category boundaries (e.g., from integral to floating-point types).
Quiet nrvo and noreturn warnings (#4429).
Fixed pointwise: Wrong number of arguments error when quantizing certain models to int8 (#4398).
TopK exception bugfix (#4329).
Updated SD3 example for change in optimum-onnx[onnxruntime] (#4344).
Fixed an issue with Torch-MIGraphX where the model compilation would fail (#4388)
Fixed an issue where a reduction was broadcast with different dimensions than the input (#4408).
Resolved a path name issue stopping some files being created on Windows for debugging (#4420).
Fix "reduce_sum: axes: value out of range" error in simplify_reshapes (#4443).
Updated README rbuild installation instructions to use python venv to avoid warning (#4405).
Ensured directories exist when generating files for debugging (#4383).
Resolved a compilation hang issue (#4428).

Optimized

Converted the LRN operator to an optimized pooling operator.
Streamlined the find_matches function.
Reduce the number of splits used for split_reduce.
Improve layout propagation in poinwise fusion when using broadcasted inputs.

Removed

MIGraphX 2.14 for ROCm 7.1.0

Added

Added Python 3.13 support.
Added PyTorch wheels to the Dockerfile.
Added Python API for returning serialized bytes.
Added fixed_pad operator for padding dynamic shapes to the maximum static shape.
Added matcher to upcast base Softmax operations.
Added support for the convolution_backwards operator through rocMLIR.
Added LSE output to attention fusion.
Added flags to EnableControlFlowGuard due to BinSkim errors.
Added new environment variable documentation and reorganized structure.
Added stash_type attribute for LayerNorm and expanded test coverage.
Added operator builders (phase 2).
Added MIGRAPHX_GPU_HIP_FLAGS to allow extra HIP compile flags.

Changed

Updated C API to include current() caller information in error reporting.
Updated documentation dependencies:
- rocm-docs-core bumped from 1.21.1 → 1.25.0 across releases.
- Doxygen updated to 1.14.0.
- urllib3 updated from 2.2.2 → 2.5.0.
Updated src/CMakeLists.txt to support msgpack 6.x (msgpack-cxx).
Updated model zoo test generator to fix test issues and add summary logging.
Updated rocMLIR and ONNXRuntime mainline references across commits.
Updated module sorting algorithm for improved reliability.
Restricted FP8 quantization to dot and convolution operators.
Moved ONNX Runtime launcher script into MIGraphX and updated build scripts.
Simplified ONNX Resize operator parser for correctness and maintainability.
Updated any_ptr assertion to avoid failure on default HIP stream.
Print kernel and module information on compile failure.

Resolved issues

Fixed error in MIGRAPHX_GPU_COMPILE_PARALLEL documentation (#4337).
Fixed rocMLIR rewrite_reduce issue (#4218).
Fixed bug with invert_permutation on GPU (#4194).
Fixed compile error when MIOPEN is disabled (missing std includes) (#4281).
Fixed ONNX Resize parsing when input and output shapes are identical (#4133, #4161).
Fixed issue with MHA in attention refactor (#4152).
Fixed synchronization issue from upstream ONNX Runtime (#4189).
Fixed spelling error in “Contiguous” (#4287).
Fixed tidy complaint about duplicate header (#4245).
Fixed reshape, transpose, and broadcast rewrites between pointwise and reduce operators (#3978).
Fixed extraneous include file in HIPRTC-based compilation (#4130).
Fixed CI Perl dependency issue for SLES builds (#4254).
Fixed compiler warnings for ROCm 7.0 of error: unknown warning option '-Wnrvo'(#4192).

Optimized

Reduced nested visits in reference operators to improve compile time.
Avoided dynamic memory allocation during kernel launches.
Removed redundant NOP instructions for GFX11/12 platforms.
Improved Graphviz output (node color and layout updates).
Optimized interdependency checking during compilation.
Skip hipBLASLt solutions requiring workspace size larger than 128 MB for efficient memory utilization.

Removed

Removed Perl dependency from SLES builds.
Removed redundant includes and unused internal dependencies.

MIGraphX 2.13 for ROCm 7.0.0

Added

Support for OCP FP8 on AMD Instinct MI350X accelerators.
Support for PyTorch 2.7 via Torch-MIGraphX.
Support for the Microsoft ONNX Contrib Operators (Self) Attention, RotaryEmbedding, QuickGelu, BiasAdd, BiasSplitGelu, SkipLayerNorm.
Support for Sigmoid and AddN TensorFlow operators.
Added GroupQuery Attention support for LLMs.
Added support for edge mode in the ONNX Pad operator.
Added ONNX runtime Python driver.
Added FLUX e2e example.
Added C++ and Python APIs to save arguments to a graph as a msgpack file, and then read the file back.
Added rocMLIR fusion for kv-cache attention.
Introduced a check for file-write errors.

Changed

quantize_bf16 for quantizing the model to BF16 has been made visible in the MIGraphX user API.
Print additional kernel/module information in the event of compile failure.
Use hipBLASLt instead of rocBLAS on newer GPUs.
1x1 convolutions are now rewritten to GEMMs.
BF16::max is now represented by its encoding rather than its expected value.
Direct warnings now go to cout rather cerr.
FP8 uses hipBLASLt rather than rocBLAS.
ONNX models are now topologically sorted when nodes are unordered.
Improved layout of Graphviz output.
Enhanced debugging for migraphx-driver: consumed environment variables are printed, timestamps and duration are added to the summary.
Add a trim size flag to the verify option for migraphx-driver.
Node names are printed to track parsing within the ONNX graph when using the MIGRAPHX_TRACE_ONNX_PARSER flag.
Update accuracy checker to output test data with the --show-test-data flag.
The MIGRAPHX_TRACE_BENCHMARKING option now allows the problem cache file to be updated after finding the best solution.

Removed

ROCM_USE_FLOAT8 macro.
The BF16 GEMM test was removed for Navi21, as it is unsupported by rocBLAS and hipBLASLt on that platform.

Optimized

Use common average in compile_ops to reduce run-to-run variations when tuning.
Improved the performance of the TopK operator.
Conform to a single layout (NHWC or NCHW) during compilation rather than combining two.
Slice Channels Conv Optimization (slice output fusion)
Horizontal fusion optimization after pointwise operations.
Reduced the number of literals used in GridSample linear sampler.
Fuse multiple outputs for pointwise operations.
Fuse reshapes on pointwise inputs for MLIR output fusion.
MUL operation not folded into the GEMM when the GEMM is used more than once.
Broadcast not fused after convolution or GEMM MLIR kernels.
Avoid reduction fusion when operator data-types mismatch.

Resolved issues

Compilation workaround ICE in clang 20 when using views::transform.
Fix bug with reshape_lazy in MLIR.
Quantizelinear fixed for Nearbyint operation.
Check for empty strings in ONNX node inputs for operations like Resize.
Parse Resize fix: only check keep_aspect_ratio_policy attribute for sizes input.
Nonmaxsuppression: fixed issue where identical boxes/scores not ordered correctly.
Fixed a bug where events were created on the wrong device in a multi-gpu scenario.
Fixed out of order keys in value for comparisons and hashes when caching best kernels.
Fixed Controlnet MUL types do not match error.
Fixed check for scales if ROI input is present in Resize operation.
Einsum: Fixed a crash on empty squeeze operations.

MIGraphX 2.12 for ROCm 6.4.0

Added

Support for gfx1200 and gfx1201
hipBLASLt support for contiguous transpose GEMM fusion and GEMM pointwise fusions for improved performance
Support for hardware specific FP8 datatypes (FP8 OCP and FP8 FNUZ)
Add support for the BF16 datatype
ONNX Operator Support for com.microsoft.MultiHeadAttention, com.microsoft.NhwcConv, and com.microsoft.MatMulIntgerFloat
migraphx-driver can now produce output for use with Netron
migraphx-driver now includes a time parameter (similar to perf) that is more accurate for very fast kernels
An end-to-end Stable Diffusion 3 example with option to disable T5 encoder on VRAM-limited GPUs has been added
Added support to track broadcast axes in shape_transform_descriptor
Added support for unsigned types with rocMLIR
Added a script to convert mxr files to ONNX models
Added the MIGRAPHX_SET_GEMM_PROVIDER environment variable to choose between rocBLAS and hipBLASLt. Set MIGRAPHX_SET_GEMM_PROVIDER to rocblas to use rocBLAS, or to hipblaslt to use hipBLASLt.

Changed

With the exception of gfx90a, switched to using hipBLASLt instead of rocBLAS
Included the min/max/median of the perf run as part of the summary report
Enable non-packed inputs for rocMLIR
Always output a packed type for q/dq after determining non-packed tensors were inefficient
Even if using NHWC, MIGraphX will always convert group convolutions to NCHW for best performance
Renamed the layout_nhwc to layout_convolution and ensured that either the weights are the same layout as the inputs or set the input and weights to NHWC
Minimum version of Cmake is now 3.27

Removed

Removed fp8e5m2fnuz rocBLAS support
__AMDGCN_WAVEFRONT_SIZE has been deprecated.
Removed a warning that printed to stdout when using FP8 types
Remove zero point parameter for dequantizelinear when its zero

Optimized

Prefill buffers when MLIR produces a multioutput buffer
Improved the resize operator performance which should improve overall performance of models that use it
Allow the reduce operator to be split across an axis to improve fusion performance. The MIGRAPHX_SPLIT_REDUCE_SIZE environment variable has been added to allow the minimum size of the reduction to be adjusted for a possible model specific performance improvement
Added MIGRAPHX_DISABLE_PASSES environment variable for debugging
Added MIGRAPHX_MLIR_DUMP environment variable to be set to a folder where individual final rocMLIR modules can be saved for investigation
Improved the C++ API to allow onnxruntime access to fp8 quantization

Resolved Issues

Fixed multistream execution with larger models (#3757)
Peephole LSTM Error (#3768)
Fixed BertSquad example that could include a broken tokenizers package (#3556)
Fixed Attention fusion ito not error with a shape mismatch when a trailing pointwise contains a literal (#3758)
Fixed instruction::replace() logic to handle more complex cases (#3574)
MatMulNBits could fail with a shape error (#3698)
Fixed a bug were some models could fail to compile with an error flatten: Shapes are not in standard layout (#3579)

MIGraphX 2.11 for ROCm 6.3.0

Added

Initial code to run on Windows
Support for gfx120x GPU
Support for FP8, and INT4
Support for the Log2 internal operator
Support for the GCC 14 compiler
The BitwiseAnd, Scan, SoftmaxCrossEntropyLoss, GridSample, and NegativeLogLikelihoodLoss ONNX operators
The MatMulNBits, QuantizeLinear/DequantizeLinear, GroupQueryAttention, SkipSimplifiedLayerNormalization, and SimplifiedLayerNormalization Microsoft Contrib operators
Dynamic batch parameter support to OneHot operator
Split-K as an optional performance improvement
Scripts to validate ONNX models from the ONNX Model Zoo
GPU Pooling Kernel
--mlir flag to the migraphx-driver program to offload entire module to rocMLIR
Fusing split-reduce with MLIR
Multiple outputs for the MLIR + Pointwise fusions
Pointwise fusions with MLIR across reshape operations
MIGRAPHX_MLIR_DUMP environment variable to dump MLIR modules to MXRs
The 3 option to MIGRAPHX_TRACE_BENCHMARKING to print the MLIR program for improved debug output
MIGRAPHX_ENABLE_HIPBLASLT_GEMM environment variable to call hipBlasLt libraries
MIGRAPHX_VERIFY_DUMP_DIFF to improve the debugging of accuracy issues
reduce_any and reduce_all options to the Reduce operation via Torch MIGraphX
Examples for RNNT, and ControlNet

Changed

Switched to MLIR's 3D Convolution operator.
MLIR is now used for Attention operations by default on gfx942 and newer ASICs.
Names and locations for VRM specific libraries have changed.
Use random mode for benchmarking GEMMs and convolutions.
Python version is now printed with an actual version number.

Removed

Disabled requirements for MIOpen and rocBlas when running on Windows.
Removed inaccurate warning messages when using exhaustive-tune.
Remove the hard coded path in MIGRAPHX_CXX_COMPILER allowing the compiler to be installed in different locations.

Optimized

Improved:
- Infrastructure code to enable better Kernel fusions with all supported data types
- Subsequent model compile time by creating a cache for already performant kernels
- Use of Attention fusion with models
- Performance of the Softmax JIT kernel and of the Pooling operator
- Tuning operations through a new 50ms delay before running the next kernel
- Performance of several convolution based models through an optimized NHWC layout
- Performance for the FP8 datatype
- GPU utilization
- Verification tools
- Debug prints
- Documentation, including gpu-driver utility documentation
- Summary section of the migraphx-driver perf command
Reduced model compilation time
Reordered some compiler passes to allow for more fusions
Preloaded tiles into LDS to improve performance of pointwise transposes
Exposed the external_data_path property in onnx_options to set the path from onnxruntime

Resolved Issues

Fixed a bug with gfx1030 that overwrote dpp_reduce.
Fixed a bug in 1arg dynamic reshape that created a failure.
Fixed a bug with dot_broadcast and inner_broadcast that caused compile failures.
Fixed a bug where some configs were failing when using exhaustive-tune.
Fixed the ROCM Install Guide URL.
Fixed an issue while building a whl package due to an apostrophe.
Fixed the BERT Squad example requirements file to support different versions of Python.
Fixed a bug that stopped the Vicuna model from compiling.
Fixed failures with the verify option of migraphx-driver that would cause the application to exit early.

MIGraphX 2.10 for ROCm 6.2.0

Additions

Added support for ONNX Runtime MIGraphX EP on Windows
Added FP8 Python API
Added examples for SD 2.1 and SDXL
Improved Dynamic Batch to support BERT
Added a --test flag in migraphx-driver to validate the installation
Added support for ONNX Operator: Einsum
Added uint8 support in ONNX Operators
Added fusion for group convolutions
Added rocMLIR conv3d support
Added rocgdb to the Dockerfile

Optimizations

Improved ONNX Model Zoo coverage
Reorganized memcpys with ONNX Runtime to improve performance
Replaced scaler multibroadcast + unsqueeze with just a multibroadcast
Improved MLIR kernel selection for multibroadcasted GEMMs
Improved details of the perf report
Enable mlir by default for GEMMs with small K
Allow specifying dot or convolution fusion for mlir with environmental flag
Improve performance on small reductions by doing multiple reduction per wavefront
Add additional algebraic simplifications for mul-add-dot sequence of operations involving constants
Use MLIR attention kernels in more cases
Enables MIOpen and CK fusions for MI300 gfx arches
Support for QDQ quantization patterns from Brevitas which have explicit cast/convert nodes before and after QDQ pairs
Added Fusion of "contiguous + pointwise" and "layout + pointwise" operations which may result in performance gains in certain cases
Added Fusion for "pointwise + layout" and "pointwise + contiguous" operations which may result in performance gains when using NHWC layout
Added Fusion for "Pointwise + concat" operation which may help in performance in certain cases
Fixes a bug in "concat + pointwise" fusion where output shape memory layout wasn't maintained
Simplifies "slice + concat" pattern in SDXL UNet
eliminates ZeroPoint/Shift in QuantizeLinear or DeQuantizeLinear ops if zero points values are zeros
Improved inference performance by fusing Reduce to Broadcast
Added additional information when printing the perf report
Improve scalar fusions when not all strides are 0
Added support for multi outputs in pointwise ops
Improve reduction fusion with reshape operators
Use the quantized output when an operator is used again

Resolved issues

Super Resolution model verification failed with FP16
Suppressed confusing messages when compiling the model
Mod operator failed to compile with int8 and int32 inputs
Prevented spawning too many threads for constant propagation when parallel STL is not enabled
Fixed a bug when running migraphx-driver with the --run 1 option
Layernorm Accuracy fix: calculations in FP32
Update Docker generator script to ROCm 6.1 to point at Jammy
Floating Point exception fix for dim (-1) in reshape operator
Fixed issue with int8 accuracy and models which were failing due to requiring a fourth bias input
Fixed missing inputs not previously handled for quantized bias for the weights, and data values of the input matrix
Fixed order of operations for int8 quantization which were causing inaccuracies and slowdowns
Removed list initializer of prefix_scan_sum which was causing issues during compilation and resulting in the incorrect constructor to be used at compile
Fixed the MIGRAPHX_GPU_COMPILE_PARALLEL flag to enable users to control number of threads used for parallel compilation

Changes

Changed default location of libraries with release specific ABI changes
Reorganized documentation in GitHub

Removals

Removed the --model flag with migraphx-driver

MIGraphX 2.9 for ROCm 6.1.0

Additions

Added beta version of FP8, functional, not performant
Created a dockerfile with MIGraphX+ONNX Runtime EP+Torch
Added support for the Hardmax, DynamicQuantizeLinear, Qlinearconcat, Unique, QLinearAveragePool, QLinearSigmoid, QLinearLeakyRelu, QLinearMul, IsInf operators
Created web site examples for Whisper, Llama-2, and Stable Diffusion 2.1
Created examples of using the ONNX Runtime MIGraphX Execution Provider with the InceptionV3 and Resnet50 models
Updated operators to support ONNX Opset 19
Enable fuse_pointwise and fuse_reduce in the driver
Add support for dot-(mul)-softmax-dot offloads to MLIR
Added Blas auto-tuning for GEMMs
Added dynamic shape support for the multinomial operator
Added fp16 to accuracy checker
Added initial code for running on Windows OS

Optimizations

Improved the output of migraphx-driver command
Documentation now shows all environment variables
Updates needed for general stride support
Enabled Asymmetric Quantization
Added ScatterND unsupported reduction modes
Rewrote softmax for better performance
General improvement to how quantization is performed to support INT8
Used problem_cache for gemm tuning
Improved performance by always using rocMLIR for quantized convolution
Improved group convolutions by using rocMLIR
Improved accuracy of fp16 models
ScatterElements unsupported reduction
Added concat fusions
Improved INT8 support to include UINT8
Allow reshape ops between dq and quant_op
Improve dpp reductions on navi
Have the accuracy checker print the whole final buffer
Added support for handling dynamic Slice and ConstantOfShape ONNX operators
Add support for the dilations attribute to Pooling ops
Add layout attribute support for LSTM operator
Improved performance by removing contiguous for reshapes
Handle all slice input variations
Add scales attribute parse in upsample for older opset versions
Added support for uneven Split operations
Improved unit testing to run in python virtual environments

Resolved issues

Fixed outstanding issues in autogenerated documentation
Update model zoo paths for examples
Fixed promote_literals_test by using additional if condition
Fixed export API symbols from dynamic library
Fixed bug in pad operator from dimension reduction
Fixed using the LD to embed files and enable by default when building shared libraries on linux
fixed get_version()
Fixed Round operator inaccuracy
Fixed wrong size check when axes not present for slice
Set the .SO version correctly

Changes

Cleanup LSTM and RNN activation functions
Placed gemm_pointwise at a higher priority than layernorm_pointwise
Updated README to mention the need to include GPU_TARGETS when building MIGraphX

Removals

Removed unused device kernels from Gather and Pad operators
Removed int8x4 format

MIGraphX 2.8 for ROCm 6.0.0

Additions

Support for MI300 GPUs
Support for TorchMIGraphX via PyTorch
Boosted overall performance by integrating rocMLIR
INT8 support for ONNX Runtime
Support for ONNX version 1.14.1
Added new operators: Qlinearadd, QlinearGlobalAveragePool, Qlinearconv, Shrink, CastLike, and RandomUniform
Added an error message for when gpu_targets is not set during MIGraphX compilation
Added parameter to set tolerances with migraphx-driver verify
Added support for MXR files > 4 GB
Added MIGRAPHX_TRACE_MLIR flag
BETA added capability for using ROCm Composable Kernels via the MIGRAPHX_ENABLE_CK=1 environment variable

Optimizations

Improved performance support for INT8
Improved time precision while benchmarking candidate kernels from CK or MLIR
Removed contiguous from reshape parsing
Updated the ConstantOfShape operator to support Dynamic Batch
Simplified dynamic shapes-related operators to their static versions, where possible
Improved debugging tools for accuracy issues
Included a print warning about miopen_fusion while generating mxr
General reduction in system memory usage during model compilation
Created additional fusion opportunities during model compilation
Improved debugging for matchers
Improved general debug messages

Resolved issues

Fixed scatter operator for nonstandard shapes with some models from ONNX Model Zoo
Provided a compile option to improve the accuracy of some models by disabling Fast-Math
Improved layernorm + pointwise fusion matching to ignore argument order
Fixed accuracy issue with ROIAlign operator
Fixed computation logic for the Trilu operator
Fixed support for the DETR model

Changes

Changed MIGraphX version to 2.8
Extracted the test packages into a separate deb file when building MIGraphX from source

Removals

Removed building Python 2.7 bindings

MIGraphX 2.7 for ROCm 5.7.0

Additions

hipRTC no longer requires dev packages for MIGraphX runtime and allows the ROCm install to be in a different directory than build time
Added support for multi-target execution
Added Dynamic Batch support with C++/Python APIs
Added migraphx.create_argument to Python API
Added dockerfile example for Ubuntu 22.04
Added TensorFlow supported ops in driver similar to exist onnx operator list
Added a MIGRAPHX_TRACE_MATCHES_FOR env variable to filter the matcher trace
Improved debugging by printing max,min,mean and stddev values for TRACE_EVAL = 2
You can now use the fast_math flag instead of ENV for GELU
Print message from driver if offload copy is set for compiled program

Optimizations

Optimized for ONNX Runtime 1.14.0
Improved compile times by only building for the GPU on the system
Improved performance of pointwise/reduction kernels when using NHWC layouts
Loaded specific version of the migraphx_py library
Annotated functions with the block size so the compiler can do a better job of optimizing
Enabled reshape on nonstandard shapes
Used half HIP APIs to compute max and min
Added support for broadcasted scalars to unsqueeze operator
Improved multiplies with dot operator
Handled broadcasts across dot and concat
Added verify namespace for better symbol resolution

Resolved issues

Resolved accuracy issues with FP16 resnet50
Updated cpp generator to handle inf from float
Fixed assertion error during verify and made DCE work with tuples
Fixed convert operation for NaNs
Fixed shape typo in API test
Fixed compile warnings for shadowing variable names
Added missing specialization for the nullptr hash function

Changes

Bumped version of half library to 5.6.0
Bumped CI to support ROCm 5.6
Made building tests optional
Replaced np.bool with bool per NumPy request

Removals

Removed int8x4 rocBlas calls due to deprecation
Removed std::reduce usage because not all operating systems support it

MIGraphX 2.5 for ROCm 5.5.0

Additions

Y-Model feature will store tuning information with the optimized model
Added Python 3.10 bindings
Accuracy checker tool based on ONNX runtime
ONNX operators parse_split, and Trilu
Build support for ROCm MLIR
Added the migraphx-driver flag to print optimizations in Python (--python)
Added JIT implementation of the Gather and Pad operators, which results in better handling for larger tensor sizes

Optimizations

Improved performance of Transformer-based models
Improved performance of the Pad, Concat, Gather, and Pointwise operators
Improved ONNX/pb file loading speed
Added a general optimize pass that runs several passes, such as simplify_reshapes, algebra, and DCE in a loop

Resolved issues

Improved parsing for TensorFlow Protobuf files
Resolved various accuracy issues with some ONNX models
Resolved a gcc-12 issue with MIVisionX
Improved support for larger sized models and batches
Use --offload-arch instead of --cuda-gpu-arch for the HIP compiler
Changes inside JIT to use float accumulator for large reduce ops of half type to avoid overflow
Changes inside JIT to temporarily use cosine to compute sine function

Changes

Changed version and location of third-party build dependencies in order to pick up fixes

FilesExpand file tree

CHANGELOG.md

Latest commit

History

CHANGELOG.md

File metadata and controls

Changelog for MIGraphX

Develop

Added

Changed

Resolved issues

Optimized

Removed

MIGraphX 2.15 for ROCm 7.2.0

Added

Changed

Resolved issues

Optimized

Removed

MIGraphX 2.14 for ROCm 7.1.0

Added

Changed

Resolved issues

Optimized

Removed

MIGraphX 2.13 for ROCm 7.0.0

Added

Changed

Removed

Optimized

Resolved issues

MIGraphX 2.12 for ROCm 6.4.0

Added

Changed

Removed

Optimized

Resolved Issues

MIGraphX 2.11 for ROCm 6.3.0

Added

Changed

Removed

Optimized

Resolved Issues

MIGraphX 2.10 for ROCm 6.2.0

Additions

Optimizations

Resolved issues

Changes

Removals

MIGraphX 2.9 for ROCm 6.1.0

Additions

Optimizations

Resolved issues

Changes

Removals

MIGraphX 2.8 for ROCm 6.0.0

Additions

Optimizations

Resolved issues

Changes

Removals

MIGraphX 2.7 for ROCm 5.7.0

Additions

Optimizations

Resolved issues

Changes

Removals

MIGraphX 2.5 for ROCm 5.5.0

Additions

Optimizations

Resolved issues

Changes