PyTorch 2.4 Now Supports Intel GPUs for Faster Workloads
PyTorch 2.4 introduces support for the Intel Data Center GPU Max Series, accelerating AI workloads with minimal code changes. PyTorch 2.5 is expected to expand functionality and benchmark coverage, and community contributions are invited.
PyTorch 2.4 has introduced support for the Intel Data Center GPU Max Series, accelerating AI workloads for both training and inference. The integration preserves a consistent programming experience, so users need only minimal code changes to target the new hardware backend. Both eager and graph modes are supported, with performance-critical graphs and operators optimized through the oneAPI Deep Neural Network Library (oneDNN) and the oneAPI Math Kernel Library (oneMKL). Migrating from CUDA to Intel GPUs requires little more than changing the device name in existing code. PyTorch 2.5 is expected to add more ATen operators, full support for the Dynamo Torchbench and TIMM benchmarks, and enhanced profiling capabilities. The community is encouraged to evaluate these new features and contribute to the ongoing development of Intel GPU support in PyTorch.
- PyTorch 2.4 now supports Intel Data Center GPU Max Series for improved AI performance.
- Users can transition from CUDA to Intel GPUs with minimal code changes.
- Future updates in PyTorch 2.5 will enhance functionality and support for additional benchmarks.
- The integration aims to provide a seamless experience across different hardware platforms.
- Community contributions are welcomed to further develop Intel GPU support in PyTorch.
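The device-name migration described above can be sketched as follows. PyTorch exposes Intel GPUs through the `xpu` device string; the fallback chain below is an illustrative pattern (not from the original article) so the snippet also runs on CUDA or CPU machines:

```python
import torch

# Select an Intel GPU ("xpu") when a PyTorch build with XPU support
# finds one, otherwise fall back to CUDA, then CPU. Existing CUDA code
# typically needs only this device-string change to target Intel GPUs.
if hasattr(torch, "xpu") and torch.xpu.is_available():
    device = torch.device("xpu")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

# The rest of the script is device-agnostic.
model = torch.nn.Linear(16, 4).to(device)
x = torch.randn(8, 16, device=device)
y = model(x)
print(device, y.shape)
```

The same pattern composes with graph mode: wrapping the model in `torch.compile(model)` before the forward pass routes performance-critical graphs through the optimized backend.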
Related
TPU transformation: A look back at 10 years of our AI-specialized chips
Google has advanced its AI capabilities with Tensor Processing Units (TPUs), specialized chips for AI workloads, enhancing performance and efficiency, and making them available through Cloud services for external developers.
Geekbench AI 1.0
Geekbench AI 1.0 has been released as a benchmarking suite for AI workloads, offering three performance scores, accuracy measurements, and support for multiple frameworks across various platforms, with future updates planned.
dstack (K8s alternative) adds support for AMD accelerators on RunPod
dstack has introduced support for AMD accelerators on RunPod, enabling efficient AI container orchestration with MI300X GPUs, which offer higher VRAM and memory bandwidth, enhancing model deployment capabilities.
UntetherAI: Record-Breaking MLPerf Benchmarks
Untether AI has excelled in MLPerf® Inference v4.1 benchmarks, achieving top performance and energy efficiency with its At-Memory architecture and speedAI®240 Slim accelerator cards, alongside the imAIgine® SDK for deployment.
Run Stable Diffusion 10x Faster on AMD GPUs
AMD GPUs now offer a competitive alternative to NVIDIA for AI image generation, achieving up to 10 times faster performance with Microsoft’s Olive tool, optimizing models for enhanced efficiency and accessibility.
"Intel To Sunset First-Gen Max Series GPU To Focus On Gaudi, Falcon Shores Chips" (2024-05) https://www.crn.com/news/components-peripherals/2024/intel-s...