SC16 Salt Lake City, UT

31. Cache-Oblivious Wavefront Algorithms for Dynamic Programming Problems: Efficient Scheduling with Optimal Cache Performance and High Parallelism

Authors: Jesmin Jahan Tithi (Stony Brook University)Pramod Ganapathi (Stony Brook University)Rezaul Chowdhury (Stony Brook University)Yuan Tang (Fudan University)

Abstract: Wavefront algorithms are algorithms on grids where execution proceeds in a wavefront manner from the start-to-the-end of the execution. Iterative-wavefront algorithms for evaluating dynamic programming (DP) recurrences exploit optimal-parallelism but show poor cache performance. Tiled-iterative-wavefront algorithms achieve optimal cache-complexity and high-parallelism but are cache-aware and not portable, neither cache-adaptive. In contrast, standard cache-oblivious recursive divide-and-conquer (CORDAC) algorithms have optimal serial-cache-complexity but often have low-parallelism due to artificial-dependencies among subtasks. The cache-oblivious wavefront algorithms for DP problems are variants of CORDAC algorithms with reduced artificial-dependencies and, hence, have better parallelism. We show how to transform a CORDAC algorithm to a recursive-wavefront algorithm to achieve optimal parallel-cache-complexity and high-parallelism under state-of-the-art schedulers for fork-join programs. These cache-oblivious wavefront algorithms use closed-form formulas to compute at what time each divide-and-conquer function must be launched in-order-to achieve high-parallelism without losing cache-performance. We present experimental performance and scalability results showing superiority of these new algorithms.

Poster: pdf
Two-page extended abstract: pdf

Poster Index