Nvidia Hardly Interested in Optimal Performance of CUDA on x86 Processors

From X-bit Labs: Nvidia Corp. and Portland Group last week introduced a special compiler that can retarget software originally developed for the Nvidia CUDA architecture to x86, and presumably vice versa. This lets software developers ensure broad compatibility for their programs while still offering certain advantages on highly parallel GPU architectures. However, it will hardly make the life of software makers much easier, according to Alex Herrera, an analyst with Jon Peddie Research.

There are many reasons why various applications, including those in the supercomputer space, are not rewritten for graphics processors such as ATI Radeon or Nvidia GeForce, despite the bright prospects of higher performance. One of the main reasons is legacy code that continues to be used and will hardly be dropped, since it already works. The compiler jointly developed by PGI and Nvidia will allow developers to test the CUDA-based software approach on x86 platforms and determine its reliability. Such software may work stably enough, but the performance of CUDA-based software on x86 will hardly be optimal, claims Mr. Herrera.

Just as Nvidia's GPU-based PhysX tools, which rely on CUDA, do not support SIMD extensions like SSE2, the new compiler may not support features such as AVX found in AMD Bulldozer and Intel Sandy Bridge microprocessors. As a result, applications will not run at the maximum possible performance on x86 platforms.

"CUDA on x86 is going to be slower than an application optimized to run on x86 without CUDA, probably a lot slower. So a developer running a CUDA application on x86 and then on Fermi is going to see a larger speed-up than he might otherwise have had had he first optimized on a conventional, non-CUDA x86 platform. Bigger speedup numbers serve Nvidia’s purposes of showcasing how much faster GPUs are than CPUs on many floating-point intensive applications," said the analyst.
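
The arithmetic behind the analyst's argument can be sketched in a few lines of Python. All three runtimes below are invented purely for illustration; they are not measurements from Nvidia, PGI or Jon Peddie Research, and the function and variable names are hypothetical.

```python
def speedup(baseline_seconds: float, gpu_seconds: float) -> float:
    """Speed-up factor of the GPU run relative to a given CPU baseline."""
    return baseline_seconds / gpu_seconds


# Hypothetical runtimes for the same workload (assumed values, not benchmarks).
cuda_on_x86_seconds = 120.0   # CUDA code compiled for x86, no SIMD tuning
native_x86_seconds = 30.0     # hand-optimized, non-CUDA x86 version
fermi_gpu_seconds = 6.0       # the CUDA code running on a Fermi GPU

# Speed-up reported when the slow CUDA-on-x86 build is used as the baseline.
reported = speedup(cuda_on_x86_seconds, fermi_gpu_seconds)

# Speed-up measured against the optimized x86 build instead.
against_optimized = speedup(native_x86_seconds, fermi_gpu_seconds)

print(f"GPU vs. CUDA-on-x86 baseline:  {reported:.1f}x")
print(f"GPU vs. optimized x86 baseline: {against_optimized:.1f}x")
```

With these assumed numbers the GPU takes the same six seconds in both comparisons; only the CPU baseline changes, and the slower baseline alone accounts for the larger headline figure.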

In the end, designers of both special-purpose and commercial consumer software will still have to implement different code paths for different hardware, something they already do.
