Optimal Periods for Probing Convergence of Infinite-stage Dynamic Programmings on GPUs
In this paper, we propose a basic technique to minimize the computational time in executing the infinite-stage dynamic programming (DP) on a GPU. The infinite-stage DP involves computations to probe whether a value function gets sufficiently close to the optimal one. Such computations for probing convergence become obvious when an infinite-stage DP is executed on a GPU, since those computations are not necessary for finite-stage DPs, and hide behind loops for updating state values when a DP is executed on a CPU. The heart of the proposed technique is to suppress those computations for probing by thinning out them. By the proposed technique, differences between state values before and after being updated are periodically transferred to the main memory, then are checked to probe convergence. This intermittent probing makes contrast to ordinary methods in which computations for probing are processed every time. The technique also proposes a formulation to determine optimal periods for probing based on simple statistics given by preliminary experiments. The effectiveness of the proposed technique is examined on two problems; the one is a kind of the animat problem in which an agent moves around in a maze to collect foods, and the other is the mountain-car problem in which a pow- erless car on a slope struggles to pass over a higher peak. Computational results display that a method with the proposed technique decreases computational times for both problems compared to methods in which computations for probing convergence are processed every time, and the degree of decreasing seems remarkable when the state space is larger.
dynamic programming; value iteration; GPU
- There are currently no refbacks.