Best paper prize winners

EuroMPI 2016 Best Paper: Towards millions of communicating threads
Hoang-Vu Dang, Marc Snir and William Gropp

In this paper, we explore the advantages that accrue from avoiding the use of wildcards (MPI_ANY_SOURCE and MPI_ANY_TAG) in MPI. We show that, with this change, one can efficiently support millions of concurrently communicating lightweight threads using send-receive communication.
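One general reason wildcards matter is message matching. The toy Python sketch below is not the design from the paper. It only contrasts two hypothetical posted-receive stores. When no receive uses a wildcard, an incoming (source, tag) pair can match only receives with exactly that key, so a dictionary of FIFO queues finds the earliest eligible receive in constant time. When wildcards are allowed, an earlier wildcard receive may take precedence, so the store has to be scanned in posting order. The class names, the `ANY` constant, and the omission of communicators and unexpected-message queues are all simplifying assumptions.

```python
from collections import defaultdict, deque

ANY = -1  # hypothetical stand-in for MPI_ANY_SOURCE / MPI_ANY_TAG


class ExactMatchQueue:
    """Posted receives when wildcards are never used (communicator omitted)."""

    def __init__(self):
        # (source, tag) -> receive handles in posting order
        self._posted = defaultdict(deque)

    def post_receive(self, source, tag, handle):
        self._posted[(source, tag)].append(handle)

    def match(self, source, tag):
        # Only receives with exactly this key can match; the FIFO head is
        # the earliest one, which preserves MPI's ordering for that key.
        q = self._posted.get((source, tag))
        if not q:
            return None
        handle = q.popleft()
        if not q:
            del self._posted[(source, tag)]
        return handle


class ScanMatchQueue:
    """Posted receives when wildcards are allowed: linear scan in post order."""

    def __init__(self):
        self._posted = []  # (source, tag, handle) in posting order

    def post_receive(self, source, tag, handle):
        self._posted.append((source, tag, handle))

    def match(self, source, tag):
        # An earlier wildcard receive must win over a later exact one,
        # so every posted receive may need to be inspected.
        for i, (s, t, h) in enumerate(self._posted):
            if s in (source, ANY) and t in (tag, ANY):
                del self._posted[i]
                return h
        return None
```

For example, after posting a receive from `ANY` source and then one from source 2, a message from source 2 must go to the wildcard receive first, which the dictionary lookup alone could not determine.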


EuroMPI 2016 Best Paper Runner-Up: Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning
Ammar Ahmad Awan, Khaled Hamidouche, Akshay Venkatesh and Dhabaleswar Panda

Emerging paradigms such as Deep Learning require communication of very large GPU buffers. However, existing MPI runtimes have been optimized for communicating CPU buffers with relatively small messages. In this paper, we investigate the performance bottlenecks in existing MPI runtimes and propose new designs that optimize large-message broadcast of GPU buffers by exploiting NCCL and CUDA-Aware MPI. We illustrate the benefits of our design using micro-benchmarks and the CNTK framework.


EuroMPI 2016 Best Paper Runner-Up: Generalisation of Recursive Doubling for AllReduce
Martin Ruefenacht, Mark Bull and Stephen Booth

The performance of AllReduce is crucial at scale. The recursive doubling algorithm with pairwise exchange theoretically achieves O(log₂ N) scaling for short messages with N peers, but its gains are limited by how far network latency can be improved. A multi-way exchange can instead be implemented using message pipelining, which is easier to improve than latency. Using our method, recursive multiplying, we show reductions in AllReduce execution time of between 8% and 40% on a Cray XC30 compared with recursive doubling.
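The sketch below is a sequential Python simulation that illustrates the idea of generalizing pairwise exchange to a k-way exchange. At each step, every process combines partial sums with the k-1 peers whose rank differs from its own in a single base-k digit. With k = 2 this is ordinary recursive doubling (partner = rank XOR 2^s) and takes log₂ N steps. With larger k it takes log_k N steps, and each step has k-1 sends that could be pipelined. This is not the authors' implementation. It models only which processes combine data, not timing. It also assumes a sum reduction and that N is a power of k, a simplification the paper may not require.

```python
from typing import List, Tuple


def recursive_multiplying_allreduce(values: List[int], k: int = 2) -> Tuple[List[int], int]:
    """Simulate a sum-AllReduce using k-way exchanges (k = 2: recursive doubling).

    Returns the per-process results and the number of communication steps.
    Simplifying assumption: len(values) is a power of k.
    """
    n = len(values)
    if k < 2:
        raise ValueError("k must be at least 2")
    size = 1
    while size < n:
        size *= k
    if n == 0 or size != n:
        raise ValueError("number of processes must be a power of k")

    partial = list(values)
    steps = 0
    stride = 1  # k**steps: place value of the base-k digit exchanged this step
    while stride < n:
        new_partial = [0] * n
        for rank in range(n):
            digit = (rank // stride) % k
            base = rank - digit * stride  # this rank with the current digit zeroed
            # Exchange group: the k ranks differing from `rank` only in this digit.
            # Each rank receives k-1 partial sums (candidates for pipelining).
            group = [base + d * stride for d in range(k)]
            new_partial[rank] = sum(partial[p] for p in group)
        partial = new_partial
        stride *= k
        steps += 1
    return partial, steps
```

For instance, with eight processes holding 0 through 7, k = 2 needs three steps and k = 8 needs one, and every process ends with the total 28 in both cases.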



Last updated: 18 Jul 2016 at 22:58