Dynamic Task Scheduling Scheme for a GPGPU Programming Framework

Kazuhiko Ohno, Rei Yamamoto, Hiroaki Tanaka


The computational power and physical memory size of a single GPU device are often insufficient for large-scale problems. Using CUDA, the user must explicitly partition such problems into several tasks, repeating data transfers and kernel executions. Explicit device switching is also needed to use multiple GPUs. Furthermore, low-level hand optimizations, such as load balancing and choosing the task granularity, are required to achieve high performance. To handle large-scale problems without any additional user code, we introduce an implicit dynamic task scheduling scheme into MESI-CUDA, our variation of CUDA. MESI-CUDA is designed to abstract low-level GPU features: virtual shared variables and logical thread mappings hide the complex memory hierarchy and physical characteristics, while explicit parallel execution using kernel functions remains the same as in CUDA. In our scheme, each kernel invocation in the user code is translated into a "job" submission to the runtime scheduler. The scheduler partitions each job into tasks, considering the device memory size, and dynamically schedules them onto the available GPU devices. Thus the user can simply specify kernel invocations independently of the execution environment. The evaluation results show that our scheme can automatically utilize heterogeneous GPU devices with small overhead.


GPGPU; CUDA; parallel programming; compiler; optimization; scheduling
