Authors: Yusuke Nagasaka (Tokyo Institute of Technology), Akira Nukada (Tokyo Institute of Technology), Satoshi Matsuoka (Tokyo Institute of Technology)

Abstract: Sparse general matrix-matrix multiplication (SpGEMM) is a key kernel of preconditioners such as the algebraic multigrid method, and of graph algorithms. The performance of SpGEMM is low because of random memory accesses to both the input and output matrices. Moreover, the non-zero pattern of the resulting matrix is not known beforehand, which makes memory management difficult. Several existing GPU implementations compute SpGEMM quickly but consume a large amount of temporary memory. We devise a new SpGEMM algorithm that requires only a small amount of memory, so that larger matrices can be computed within the limited device memory of a GPU. Accesses to the input matrices are optimized for coalesced memory access. We devise an efficient hash table on shared memory to compute the output matrix, with appropriate case analysis for better load balancing. Our algorithm achieves speedups of up to 4.0x in single precision and 3.3x in double precision over existing fast SpGEMM libraries.
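The hash-based accumulation idea can be illustrated with a minimal CPU sketch. This is not the authors' CUDA kernel: it is a Gustavson-style row-by-row CSR product in Python, where each output row is accumulated in a hash table keyed by column index, playing the role that the per-row shared-memory hash table plays on the GPU. All function and variable names here are illustrative.

```python
def spgemm_csr(a_ptr, a_col, a_val, b_ptr, b_col, b_val):
    """Multiply two CSR matrices A*B and return the product in CSR form.

    Illustrative sketch only: on the GPU the per-row accumulator would be
    a hash table in shared memory, built cooperatively by a thread group.
    """
    n_rows = len(a_ptr) - 1
    c_ptr, c_col, c_val = [0], [], []
    for i in range(n_rows):
        acc = {}  # hash table: column index -> accumulated value
        # For each non-zero A(i, k), scale row k of B and merge into acc.
        for jj in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_col[jj], a_val[jj]
            for kk in range(b_ptr[k], b_ptr[k + 1]):
                j = b_col[kk]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[kk]
        # Emit row i in column order; the output size was unknown until now,
        # which is why SpGEMM memory management is hard in general.
        for j in sorted(acc):
            c_col.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_col))
    return c_ptr, c_col, c_val
```

For example, multiplying A = [[1, 2], [0, 3]] by B = [[0, 1], [4, 0]] (both given in CSR form) yields [[8, 1], [12, 0]], with the hash table holding at most two entries per row.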

Poster: pdf

Two-page extended abstract: pdf
