SC16 Salt Lake City, UT

90. Fault-Tolerant Scheduler for Shareable Virtualized GPU Resource

Authors: Daeyoun Kang (Korea Advanced Institute of Science and Technology), Tae Joon Jun (Korea Advanced Institute of Science and Technology), Daeyoung Kim (Korea Advanced Institute of Science and Technology)

Abstract: Container-based virtualization is now widely used alongside traditional virtual machines to maximize the utilization of computing resources. Unlike traditional resources, however, a GPU is hard to share among multiple containers. Recently it has become possible to share a GPU among multiple containers using the volume-sharing feature. In addition, high-end GPUs such as the NVIDIA K20 support Hyper-Q, which allows multiple CPU processes to access a single GPU. Problems remain, however, because of the GPU's distinctive characteristics. Unlike system memory, GPU memory cannot be swapped out. Also, a GPU kernel running on a Streaming Multiprocessor cannot be switched out while it is running. These restrictions make it hard to share a GPU among multiple containers and may result in deadlock. In this poster, we propose an interface for a new fault-tolerant scheduler that takes GPU memory usage into account. We have implemented this interface to restrict each container's GPU memory usage and thereby prevent deadlock.
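The idea of capping each container's GPU memory can be illustrated with a minimal sketch. This is not the authors' interface: the class GpuMemoryLimiter, its method names, and the 4 GiB device size are all hypothetical, and the sketch only does bookkeeping in plain Python rather than intercepting real GPU allocation calls. It assumes that limits are fixed when a container registers and that an over-limit request is rejected immediately instead of blocking, so no container waits on memory held by another.

```python
class GpuMemoryLimiter:
    """Hypothetical per-container GPU memory accounting (illustration only)."""

    def __init__(self, total_bytes):
        self.total_bytes = total_bytes
        self.limits = {}  # container id -> maximum bytes it may hold
        self.used = {}    # container id -> bytes currently held

    def register(self, container_id, limit_bytes):
        # GPU memory cannot be swapped out, so the sum of all limits
        # must fit in physical device memory.
        reserved = sum(v for k, v in self.limits.items() if k != container_id)
        if reserved + limit_bytes > self.total_bytes:
            raise ValueError("limits would exceed GPU memory")
        self.limits[container_id] = limit_bytes
        self.used.setdefault(container_id, 0)

    def try_allocate(self, container_id, nbytes):
        # Reject instead of blocking: a container never waits for memory
        # that another container is holding.
        if self.used[container_id] + nbytes > self.limits[container_id]:
            return False
        self.used[container_id] += nbytes
        return True

    def release(self, container_id, nbytes):
        # Never let the counter drop below zero.
        self.used[container_id] = max(0, self.used[container_id] - nbytes)


GIB = 1024 ** 3
limiter = GpuMemoryLimiter(total_bytes=4 * GIB)  # assumed device size
limiter.register("container-a", 1 * GIB)
limiter.register("container-b", 3 * GIB)
first = limiter.try_allocate("container-a", 768 * 1024 ** 2)   # within limit
second = limiter.try_allocate("container-a", 512 * 1024 ** 2)  # would exceed limit
```

Because the limits sum to at most the device size and requests never block, every accepted allocation is backed by physical memory, which is one simple way to rule out the memory-related deadlock described above.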

Poster: pdf
Two-page extended abstract: pdf
