1st June 2013

Vasily Volkov (UC Berkeley): Unrolling parallel loops

Loop unrolling is not only good for sequential programming; it has similarly dramatic effects in highly parallel codes as well. See Unrolling parallel loops (local copy), and also #pragma unroll in the NVIDIA CUDA programming guide.
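To make the idea concrete, here is a minimal sketch of an unrolled parallel loop. It is not taken from Volkov's slides: the kernel name saxpy_unrolled, the helper saxpy_unrolled_matches_reference and the factor UNROLL = 4 are all made up for illustration. Each thread handles several elements instead of one, and all loads are issued before any arithmetic so that they are independent of each other.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical unroll factor: the number of elements each thread processes.
constexpr int UNROLL = 4;

// y[i] = a * x[i] + y[i], with UNROLL elements per thread.
// Elements of one thread are spaced blockDim.x apart, so neighbouring
// threads still touch neighbouring addresses (coalesced access).
__global__ void saxpy_unrolled(int n, float a, const float* x, float* y)
{
    int base = blockIdx.x * blockDim.x * UNROLL + threadIdx.x;
    float xv[UNROLL] = {};
    float yv[UNROLL] = {};

    // Issue all loads first; they do not depend on each other.
    #pragma unroll
    for (int k = 0; k < UNROLL; ++k) {
        int i = base + k * blockDim.x;
        if (i < n) { xv[k] = x[i]; yv[k] = y[i]; }
    }

    // Then do the arithmetic and the stores.
    #pragma unroll
    for (int k = 0; k < UNROLL; ++k) {
        int i = base + k * blockDim.x;
        if (i < n) y[i] = a * xv[k] + yv[k];
    }
}

// Host-side check (hypothetical helper): runs the kernel for n > 0 elements
// and compares against a CPU result. Inputs are small integers, so the
// float results are exact.
bool saxpy_unrolled_matches_reference(int n)
{
    std::vector<float> hx(n), hy(n), expected(n);
    for (int i = 0; i < n; ++i) {
        hx[i] = static_cast<float>(i % 100);
        hy[i] = 1.0f;
        expected[i] = 2.0f * hx[i] + hy[i];
    }

    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* dx = nullptr;
    float* dy = nullptr;
    if (cudaMalloc(&dx, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dy, bytes) != cudaSuccess) { cudaFree(dx); return false; }
    cudaMemcpy(dx, hx.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int per_block = threads * UNROLL;               // elements covered by one block
    int blocks = (n + per_block - 1) / per_block;   // round up
    saxpy_unrolled<<<blocks, threads>>>(n, 2.0f, dx, dy);

    bool ok = cudaMemcpy(hy.data(), dy, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dx);
    cudaFree(dy);
    for (int i = 0; i < n && ok; ++i) ok = (hy[i] == expected[i]);
    return ok;
}
```

Calling saxpy_unrolled_matches_reference(1000003) from a main function should return true on a machine with a CUDA device. The price for the unrolling is visible in the code: the arrays xv and yv occupy 2 * UNROLL registers per thread, so fewer threads may fit on a multiprocessor.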

Some bullet points of the presentation:

More resources consumed per thread

Note: each load costs 2 arithmetic instructions

Conclusion:

Dead link: Vasily Volkov's homepage has more information on CUDA optimizations.

Cédric Augonnet, Samuel Thibault and Raymond Namyst call Vasily Volkov a "CUDA-hero" in How to get portable performance on accelerator-based platforms without the agonizing pain.

In a similar vein, Dr. Mark Harris describes the beneficial effect of unrolling in parallel reduction.
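A rough sketch of what unrolling looks like in a reduction follows. This is not Harris's code, only an illustration under my own assumptions: the names block_sum and block_sum_matches are made up, and the block size is passed as a template parameter so that the trip count of the reduction loop is known at compile time and the compiler can unroll it completely.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Sums `in` into one partial result per block.
// BLOCK must be a power of two and equal to blockDim.x at launch.
template <int BLOCK>
__global__ void block_sum(const float* in, float* out, int n)
{
    __shared__ float s[BLOCK];
    int tid = threadIdx.x;

    // Grid-stride loop: each thread first accumulates privately.
    float acc = 0.0f;
    for (int i = blockIdx.x * BLOCK + tid; i < n; i += gridDim.x * BLOCK)
        acc += in[i];
    s[tid] = acc;
    __syncthreads();

    // Tree reduction in shared memory. BLOCK is a compile-time constant,
    // so this loop has a fixed number of steps and can be fully unrolled.
    #pragma unroll
    for (int stride = BLOCK / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();   // every thread reaches this, not only the active ones
    }

    if (tid == 0) out[blockIdx.x] = s[0];
}

// Host-side check (hypothetical helper): sums n ones and expects n.
// All partial sums are small integers, so the float arithmetic is exact
// for n up to 2^24.
bool block_sum_matches(int n)
{
    constexpr int BLOCK = 256;
    const int blocks = 64;
    std::vector<float> h_in(n, 1.0f);
    std::vector<float> h_out(blocks, 0.0f);

    float* d_in = nullptr;
    float* d_out = nullptr;
    size_t in_bytes = static_cast<size_t>(n) * sizeof(float);
    size_t out_bytes = blocks * sizeof(float);
    if (cudaMalloc(&d_in, in_bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, out_bytes) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in.data(), in_bytes, cudaMemcpyHostToDevice);

    block_sum<BLOCK><<<blocks, BLOCK>>>(d_in, d_out, n);

    bool ok = cudaMemcpy(h_out.data(), d_out, out_bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_in);
    cudaFree(d_out);

    // Final step on the host: add the per-block partial sums.
    double total = 0.0;
    for (float v : h_out) total += v;
    return ok && total == static_cast<double>(n);
}
```

Calling block_sum_matches(1 << 20) from a main function should return true on a machine with a CUDA device. The sketch keeps a __syncthreads() after every step for safety; Harris's presentation goes further and also removes synchronization within the last warp.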




Categories: CUDA, mathematics
Author: Elmar Klausmeier