Assessment of NVSHMEM for High Performance Computing

Chung-Hsing Hsu, Neena Imam


High Performance Computing has been a driving force behind important tasks such as scientific discovery and deep learning. It tends to achieve performance through greater concurrency and heterogeneity, where the underlying complexity of richer topologies is managed through software abstraction.

In this paper, we present our assessment of NVSHMEM, an experimental programming library that supports the Partitioned Global Address Space programming model for NVIDIA GPU clusters. NVSHMEM offers several concrete advantages. One is that it reduces overheads and software complexity by allowing communication and computation to be interleaved vs. separating them into different phases. Another is that it implements the OpenSHMEM specification to provide efficient fine-grained one-sided communication, streamlining away overheads due to tag matching, wildcards, and unexpected messages which have compounding effect with increasing concurrency. It also offers ease of use by abstracting away low-level configuration operations that are required to enable low-overhead communication and direct loads and stores across processes.

We evaluated NVSHMEM in terms of usability, functionality, and scalability by running two math kernels, matrix multiplication and Jacobi solver, and one full application, Horovod, on the 27,648-GPU Summit supercomputer. Our exercise of NVSHMEM at scale contributed to making NVSHMEM more robust and preparing it for production release.


High Performance Computing (HPC); CUDA; OpenSHMEM; scalability

Full Text:



  • There are currently no refbacks.