The approximate discrete Radon transform on graphics processing units: A case study in auto-tuning of OpenCL implementations.



The Open Computing Language (OpenCL) is designed to provide a platform-independent specification for programming heterogeneous computing systems. The performance of an OpenCL program, however, does not transfer easily from one platform to another. Auto-tuning is one technique that addresses this problem: it automates the performance optimization of OpenCL programs by systematically applying program transformations. We introduce a novel auto-tuning framework that generates OpenCL programs and report on a case study that computes an approximate discrete Radon transform. Experiments on four different graphics processing units indicate that, for a wide range of problem sizes and input parameters, the execution times of the auto-tuned OpenCL programs are shorter than those of three hand-tuned CUDA implementations.