Day 2, Workshop Programming of Heterogeneous Systems in Physics
Day 2 of the conference featured the talks below. Prof. Dr. Bernd Brügmann gave a short introduction. He pointed out that Jena ranks number 10 in physics in Germany and has about 100,000 inhabitants and 20,000 students.
Dr. Karl Rupp, Vienna, Lessons Learned in Developing the Linear Algebra Library ViennaCL. Notes: C++ operator overloading normally creates temporaries, special trickery is necessary to circumvent this, ViennaCL is not callable from Fortran due to C++/operator overloading, eigen.tuxfamily.org, Karl Rupp's slides, with CUDA 5+6, OpenCL and CUDA are more or less on par
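The "special trickery" for avoiding temporaries is commonly done with expression templates. Below is a minimal sketch of that general technique, not ViennaCL's actual code: the names `Vec` and `Sum` are hypothetical and only vector addition is covered. The `operator+` overloads return a lightweight node instead of a new vector, and the assignment evaluates the whole expression in a single loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration types; this is not the ViennaCL API.

// Lazy node for "lhs + rhs": stores references and computes entries on demand.
template <typename L, typename R>
struct Sum {
    const L& lhs;
    const R& rhs;
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

struct Vec {
    std::vector<double> data;

    explicit Vec(std::size_t n, double value = 0.0) : data(n, value) {}

    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning from an expression runs one loop and allocates no temporary vector.
    template <typename L, typename R>
    Vec& operator=(const Sum<L, R>& expr) {
        assert(expr.size() == size());
        for (std::size_t i = 0; i < size(); ++i) {
            data[i] = expr[i];
        }
        return *this;
    }
};

// a + b builds a node; nothing is computed yet.
inline Sum<Vec, Vec> operator+(const Vec& a, const Vec& b) {
    return {a, b};
}

// (a + b) + c nests the existing node inside a new one.
template <typename L, typename R>
Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& a, const Vec& b) {
    return {a, b};
}
```

With this, `x = a + b + c;` compiles to one pass over the data. The nodes hold references, so an expression must be assigned within the same statement and not stored for later use.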
Prof. Dipl.-Ing. Dr. Gundolf Haase, Graz, Interpolation with Radial Basis Functions on GPGPUs using CUDA. Notes: AVL Graz, car industry, simulation software, OpenACC disappointing, significant speedup with GPU/CUDA, rule of thumb: start with OpenMP, then MIC, then OpenACC
Lars Kühne, Jena, A Concurrent Algorithm for Computing the Flow Complex.
Axel Hübl, Helmholtz-Zentrum Dresden-Rossendorf, Scaling Plasma Simulations to more than 18,000 GPUs.
Carsten Eye Frigaard, www.lab4241.com, Running GADGET2 on GPUs: Optimizing Tree-search Algorithms by Detailed Profiling of GPU Code. Notes: gpuprofgui, C-source level counters, PTX level counters, SASS level counters, BARRA, UNISIM
M. Sc. Moritz Kreutzer, Erlangen, Building blocks for sparse linear algebra on heterogeneous hardware. Notes: excellent talk, 45% comes from accelerators in the Top50 supercomputers, vulnerability to hardware faults, fusing kernels, checkpoints, ESSEX programme/project, JDS, CRS, SSE, AVX, Sliced ELLPACK, computation done in permuted fashion
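For readers unfamiliar with the storage formats mentioned, here is a minimal sketch of the textbook CRS (compressed row storage) layout and a serial sparse matrix-vector product on top of it. The struct `CrsMatrix` and the function `spmv` are hypothetical names for illustration; this is not the code from the talk, and it does not cover JDS or Sliced ELLPACK.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CRS container for illustration.
struct CrsMatrix {
    std::size_t rows = 0;
    std::vector<std::size_t> row_ptr;  // size rows + 1; row r owns entries [row_ptr[r], row_ptr[r + 1])
    std::vector<std::size_t> col_idx;  // column index of each stored entry
    std::vector<double> values;        // value of each stored entry
};

// Computes y = A * x, touching only the stored (nonzero) entries.
std::vector<double> spmv(const CrsMatrix& A, const std::vector<double>& x) {
    assert(A.row_ptr.size() == A.rows + 1);
    assert(A.col_idx.size() == A.values.size());
    std::vector<double> y(A.rows, 0.0);
    for (std::size_t r = 0; r < A.rows; ++r) {
        double sum = 0.0;
        for (std::size_t k = A.row_ptr[r]; k < A.row_ptr[r + 1]; ++k) {
            sum += A.values[k] * x[A.col_idx[k]];
        }
        y[r] = sum;
    }
    return y;
}
```

The inner loop length varies from row to row, which is one reason why formats that group or pad rows are attractive on wide SIMD units and GPUs.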
Dipl.-Phys. Marcus Noack, Oslo, Parallel and simultaneous computation of eikonal and transport equations by taking full advantage of GPU computer architecture. Notes: Oil, seismic, just a single CUDA kernel, used OpenMP
Dr. Manfred Liebmann, Graz, Optimal Control of the Schrödinger Equation on Many-Core Architectures. Notes: Crank-Nicolson much worse, Intel compiler not better than gcc/g++, 10+ PDEs per iteration, good initial approximation necessary, GPU two times faster, unitarity not a problem
Dipl.-Inf. Ralf Seidler, Jena, Implementing the Radon Transform using Advanced Techniques on GPGPUs. Notes: GTX750 consumer card, no problem with single precision
Prof. Dr. Gerhard Zumbusch, Jena, A parallel functional language for high performance finite difference stencil codes. Notes: Very interesting, excellent presentation, large gap between GFlop/s and memory speed, you have to fuse operations, Runge-Kutta discretization, you are measuring memory speed not computational speed
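The point about fusing operations can be illustrated with a toy example. This is a sketch under my own assumptions, not the functional language from the talk: the function names are hypothetical and the operation is a simple `alpha * x + y`. The unfused version streams five arrays through memory (read `x`, write `tmp`, read `tmp`, read `y`, write `out`), the fused version only three, while the arithmetic is identical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused: two separate loops and a temporary array in between.
std::vector<double> scale_add_unfused(double alpha,
                                      const std::vector<double>& x,
                                      const std::vector<double>& y) {
    assert(x.size() == y.size());
    std::vector<double> tmp(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        tmp[i] = alpha * x[i];      // pass 1: read x, write tmp
    }
    std::vector<double> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = tmp[i] + y[i];     // pass 2: read tmp and y, write out
    }
    return out;
}

// Fused: one loop, no temporary array, same arithmetic per element.
std::vector<double> scale_add_fused(double alpha,
                                    const std::vector<double>& x,
                                    const std::vector<double>& y) {
    assert(x.size() == y.size());
    std::vector<double> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = alpha * x[i] + y[i];  // read x and y, write out
    }
    return out;
}
```

For arrays much larger than the caches, the run time of both versions is typically dominated by this memory traffic and not by the two floating-point operations per element.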
Mohammed Sourouri, Oslo, An Optimized Intra-Node Communication Scheme Using Multiple CUDA Streams and OpenMP Threads.
Carsten Eckert, Helmholtz-Zentrum Dresden-Rossendorf, An adaptive, load-balanced MPI/GPU-Code for calculating the gain in High Power Laser media. Notes: ArchLinux, 64 GPUs, all communication via MPI, 1 point = 1 kernel, Tesla K20M, Computational Radiation Physics, Monte-Carlo integration
Dr. Erik Rodner, Jena, Computational Challenges for Visual Recognition with Deep Learning Architectures.