AMD Releases ROCm Version 6.3
AMD released ROCm Version 6.3, enhancing AI and HPC workloads on Instinct GPUs with features like SGLang, FlashAttention-2, a Fortran Compiler, multi-node FFT support, and improved computer vision libraries.
AMD has announced the release of ROCm Version 6.3, an open-source platform designed to enhance performance for AI, machine learning (ML), and high-performance computing (HPC) workloads on AMD Instinct GPU accelerators. This version introduces several key features aimed at improving developer productivity and application efficiency. Notably, SGLang is integrated for optimizing generative AI model inferencing, promising up to 6X higher throughput. The re-engineered FlashAttention-2 enhances transformer model training and inference, achieving up to 3X speedups. Additionally, the new AMD Fortran Compiler allows legacy Fortran applications to leverage GPU acceleration without extensive code rewrites. ROCm 6.3 also introduces multi-node Fast Fourier Transform (FFT) support for distributed computing, essential for industries like oil and gas and climate modeling. Enhancements to computer vision libraries, including AV1 codec support and GPU-accelerated JPEG decoding, further empower developers in media processing and AI applications. Overall, ROCm 6.3 continues AMD's commitment to providing tools that facilitate innovation and scalability in competitive sectors.
- ROCm 6.3 enhances AI and HPC workloads on AMD Instinct GPUs.
- Key features include SGLang for generative AI and optimized FlashAttention-2 for transformers.
- The AMD Fortran Compiler enables legacy code to utilize GPU acceleration.
- Multi-node FFT support improves distributed computing capabilities.
- Updates to computer vision libraries enhance media processing efficiency.
Related
AMD ROCm 6.2 Release Appears Imminent for Advancing Open-Source GPU Compute
The upcoming AMD ROCm 6.2 release will enhance the open-source GPU compute stack with new features. AMD aims to expand support for consumer GPUs and package ROCm 6.2 for Fedora 41, including the AOMP compiler for OpenMP device offloading. Expectations include AI improvements, performance optimizations, and broader Radeon GPU support. Community engagement is encouraged.
Run Stable Diffusion 10x Faster on AMD GPUs
AMD GPUs now offer a competitive alternative to NVIDIA for AI image generation, achieving up to 10 times faster performance with Microsoft’s Olive tool, optimizing models for enhanced efficiency and accessibility.
FireAttention V3: Enabling AMD as a Viable Alternative for GPU Inference
FireAttention V3 enhances AMD's MI300 GPU for large language model inference, achieving significant performance improvements over NVIDIA's H100, but requires further optimizations for memory-intensive and compute-bound applications.
AMD ROCm Looks Like It Will Be Supporting OpenCL 3.0 Soon
AMD's ROCm compute stack is set to support OpenCL 3.0 soon, aligning with competitors like NVIDIA and Intel, following recent GitHub activity indicating progress and a completed upgrade ticket.
AMD Developing Next-Gen Fortran Compiler Based on Flang, Optimized for AMD GPUs
AMD is developing an open-source Fortran compiler based on LLVM's Flang, optimized for OpenMP offloading to AMD GPUs, enhancing support for ROCm and HIP frameworks, with a dedicated GitHub repository.
I’ve been working with the AMD MI300X for a few weeks, trying to get matrix multiplication running with tools like CK, Triton, or hipBLAS. However, the performance is only about 50% of the theoretical peak (FP16: ~650 TFLOPS achieved vs. the 1,300 TFLOPS quoted in the whitepaper). Note that this is with matrices initialized to zero. With random floats, performance drops by a further 20%—this is confirmed in AMD’s documentation.
Meanwhile, the H100, the MI300X’s competitor, has a theoretical FP16 peak of 1,000 TFLOPS, and I can achieve 800-900 TFLOPS with matrix multiplication using CUTLASS and random-float initialization.
AMD needs to improve their software quickly if they want to catch up with NVIDIA.
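The utilization figures in the comment above follow from the standard FLOP count for a dense matrix multiply: an m×k by k×n GEMM performs 2·m·n·k floating-point operations. A minimal sketch of the arithmetic (function names are illustrative; the peak figures are the ones quoted in the comment, not values I have verified independently):

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved TFLOPS for a dense m*k @ k*n GEMM: 2*m*n*k FLOPs total."""
    return (2 * m * n * k) / seconds / 1e12

def peak_utilization(achieved_tflops: float, peak_tflops: float) -> float:
    """Fraction of the card's theoretical peak actually reached."""
    return achieved_tflops / peak_tflops

# Example: a hypothetical 8192^3 FP16 GEMM finishing in 1.69 ms,
# measured against the MI300X's quoted 1,300 TFLOPS FP16 peak.
tflops = gemm_tflops(8192, 8192, 8192, 1.69e-3)
print(f"{tflops:.0f} TFLOPS, {peak_utilization(tflops, 1300):.0%} of peak")
```

Plugging in the H100 numbers the same way (800-900 achieved against a 1,000 TFLOPS peak) gives the 80-90% utilization the commenter contrasts with the MI300X's ~50%.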
https://rocm.docs.amd.com/projects/radeon/en/latest/docs/com...
Looking at the latest support matrix, it seems ROCm basically only supports these bleeding-edge cards: "AMD Radeon RX 7900 XTX, AMD Radeon RX 7900 XT, AMD Radeon RX 7900 GRE, AMD Radeon PRO W7900, AMD Radeon PRO W7900DS, AMD Radeon PRO W7800".
Surely I'm misinterpreting this and that can't be all the cards they support with latest ROCm. Does anyone know a more complete list?