Today I would like to introduce you to AMD’s latest update for OpenCL: AMD APP SDK 2.5 and the Catalyst 11.7 drivers. Since the Catalyst 11.4 driver release back in April, AMD’s OpenCL runtime for Windows platforms has been integrated into the Catalyst drivers, giving end users an easy path to performance enhancements and other new features.
With this latest release we have added key performance enhancements for APUs that free applications from the CPU-to-GPU bandwidth limitation imposed by the PCIe bus, achieving effective data transfer rates as high as 15GB/s. See my other post "CPU-to-GPU data transfers exceed 15GB/s using APU Zero Copy path" for additional details.
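For readers who want to compare their own measurements against a figure like this, an effective rate is just bytes moved divided by elapsed wall-clock time. The helper below is a minimal sketch: the name `effective_gb_per_s` is hypothetical, and it assumes 1 GB = 1e9 bytes, which this post does not specify.

```c
/* Effective transfer rate in GB/s.
   Assumption: 1 GB = 1e9 bytes (the convention is not stated in the post).
   Returns 0.0 for a non-positive duration instead of dividing by zero. */
double effective_gb_per_s(double bytes, double seconds)
{
    if (seconds <= 0.0) {
        return 0.0;
    }
    return bytes / seconds / 1e9;
}
```

The elapsed time would typically come from timing a buffer map or copy on the host; how that is measured is left to the caller.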
On Windows platforms the runtime now includes broad multi-GPU support. This includes OpenCL support for an APU combined with a discrete GPU, which lets compute performance scale across both GPUs, as well as support for PowerExpress.
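In OpenCL the application decides how to divide work among devices, so scaling across an APU and a discrete GPU usually means splitting the global work size between their command queues. The sketch below shows one possible static split, proportional to a per-device weight such as the value reported by `CL_DEVICE_MAX_COMPUTE_UNITS`. The function name `split_work` and the weighting scheme are assumptions for illustration, not something the SDK provides.

```c
#include <stddef.h>

/* Split `total` work-items across `n` devices in proportion to weights[]
   (for example, compute-unit counts). Every share is first rounded down to
   a multiple of `granularity` (for example, the work-group size); whatever
   is left over is then added to the last device's share.
   Returns 0 on success, -1 on invalid input.
   Note: total * weights[i] is computed in unsigned long long and is
   assumed not to overflow for realistic sizes. */
int split_work(size_t total, const unsigned *weights, size_t n,
               size_t granularity, size_t *shares)
{
    unsigned long long sum = 0;
    size_t assigned = 0;
    size_t i;

    if (n == 0 || granularity == 0) {
        return -1;
    }
    for (i = 0; i < n; ++i) {
        sum += weights[i];
    }
    if (sum == 0) {
        return -1;
    }
    for (i = 0; i < n; ++i) {
        unsigned long long s = (unsigned long long)total * weights[i] / sum;
        shares[i] = (size_t)(s / granularity) * granularity;
        assigned += shares[i];
    }
    shares[n - 1] += total - assigned; /* leftover goes to the last device */
    return 0;
}
```

With weights of 5 and 20 and a granularity of 64, a total of 1024 work-items is split into 192 and 832. A static split like this ignores clock speed and memory bandwidth differences, so measured throughput may be a better weight than compute-unit counts.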
The Khronos FP64 extension is enabled for the "Cypress" family of double-precision-capable GPUs, and we plan to enable it for all double-precision-capable GPUs going forward.
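An application can check for the extension before building double-precision kernels. The sketch below assumes the device's space-separated extension string has already been fetched with `clGetDeviceInfo` and `CL_DEVICE_EXTENSIONS`. The helper `has_extension` and the `scale` kernel are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* OpenCL C kernel source that opts in to double precision through the
   Khronos pragma. The kernel `scale` is a made-up example. */
const char fp64_kernel_src[] =
    "#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n"
    "__kernel void scale(__global double *x, double a) {\n"
    "    size_t i = get_global_id(0);\n"
    "    x[i] *= a;\n"
    "}\n";

/* Return 1 if `name` appears as a whole token in the space-separated
   extension list `ext_list`, 0 otherwise. Matching whole tokens avoids
   false positives when one extension name is a prefix of another. */
int has_extension(const char *ext_list, const char *name)
{
    size_t n = strlen(name);
    const char *p = ext_list;

    if (n == 0) {
        return 0;
    }
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == ext_list) || (p[-1] == ' ');
        int ends = (p[n] == '\0') || (p[n] == ' ');
        if (starts && ends) {
            return 1;
        }
        p += 1; /* keep searching after this partial match */
    }
    return 0;
}
```

A host program would call `has_extension(extensions, "cl_khr_fp64")` and only pass `fp64_kernel_src` to the OpenCL compiler when it returns 1, falling back to a single-precision kernel otherwise.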
New features in this release include:
- Kernel launch times have been further reduced.
- The LLVM compiler version used for OpenCL kernels has been upgraded.
- The compiler includes support for use of SSE3 and SSE4.
- Added support for partial use of FMA4 and XOP instructions.
- It is no longer necessary to use the -fno-alias compiler command line option.
- PCIe transfer overhead has been reduced under Linux.
- Transfers between CPUs and GPUs are improved for buffers declared with either the CL_MEM_USE_HOST_PTR or the CL_MEM_ALLOC_HOST_PTR flag.
- For APUs, zero copy buffers created as CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_ONLY offer improved GPU read performance.
- The runtime supports multi-GPU operation, including simultaneous use of the GPU in an APU and a discrete GPU, on systems running Windows.
- OpenCL built-in functions leverage AVX on capable CPUs.
- Support for PowerExpress 4.0.
- Support for atomic counters for discrete GPUs.
- Support for headless GPU operation.
- OpenCL can be used by a Windows service.
- UVD3 / MPEG-2 support.
- The clFFT library supports radix 3 and radix 5, including support for mixed radix 2/3/5.
- The BLAS library supports the D/S SYRK, D/S SYR2K, D/S GEMV, D/S SYMV functions.
- The Khronos FP64 extension is supported for the ATI Radeon™ HD 5900 and 5800 series, as well as the AMD FirePro™ V8800 and V8700 series.
- gDEBugger 6.0 extension is available for Visual Studio.
- Starting with Catalyst 11.8, improved runtime features will appear regularly in the monthly Catalyst releases for Windows.
- Kernel Analyzer 1.9 supports Catalyst releases 11.4 to 11.7.
- APP Profiler provides:
  - Improved API trace.
  - Improved timeline visualization.
  - Support for analyzing OpenCL Application trace.
  - Thread ID and sequence number now are included in the profile output.