RL-driven Annealing Computation for QUBO on Multi-GPU System
Abstract
Quadratic Unconstrained Binary Optimization (QUBO) has emerged as a unifying framework for diverse NP-hard combinatorial optimization problems. However, a major challenge in existing QUBO solvers is the need for extensive manual tuning of algorithmic hyperparameters, such as temperature schedules in simulated annealing, whose effectiveness can vary greatly across problem instances. In this paper, we propose RL-driven annealing (RLA), a novel approach that integrates reinforcement learning (RL) with annealing-based local search. Rather than relying on hand-crafted heuristics, RLA trains an agent to adaptively determine a bit-flip policy by observing statistical properties of the energy differences in the objective function. Crucially, RLA encodes QUBO states as fixed-dimensional statistics, making the method scalable across problem sizes. To further support large-scale problems, we employ distributed training on multi-GPU platforms using JAX's pmap, parallelizing both environment simulation and policy updates. Experimental evaluations on benchmark datasets, including TSP, QAP, Max-Cut, and randomly generated QUBO instances, demonstrate that RLA achieves solution quality on par with or better than conventional annealing-based methods, while maintaining robust performance across diverse problem instances without extensive hyperparameter tuning. These results highlight RLA as a promising step toward a flexible and practical solver for QUBO-based applications.
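
As a concrete illustration of the state encoding described in the abstract, the sketch below computes, in JAX, the one-bit-flip energy differences of a QUBO objective E(x) = x^T Q x and condenses them into a fixed-length feature vector. The particular summary statistics and all function names are illustrative assumptions, not taken from the paper; only the identity dE_k = (1 - 2 x_k) * ((Q + Q^T) x)_k + Q_kk follows directly from the QUBO definition.

    import jax
    import jax.numpy as jnp

    def energy(Q, x):
        # QUBO objective E(x) = x^T Q x over binary x in {0, 1}^n.
        return x @ Q @ x

    def flip_deltas(Q, x):
        # Energy change dE_k of flipping bit k, derived from E(x) = x^T Q x:
        #   dE_k = (1 - 2 x_k) * ((Q + Q^T) x)_k + Q_kk
        h = (Q + Q.T) @ x
        return (1.0 - 2.0 * x) * h + jnp.diag(Q)

    def state_features(Q, x):
        # Hypothetical fixed-dimensional summary of the current state:
        # statistics of the one-bit-flip energy differences, so the
        # feature size is independent of the number of variables n.
        d = flip_deltas(Q, x)
        return jnp.array([
            d.mean(), d.std(), d.min(), d.max(),
            (d < 0).mean(),  # fraction of strictly improving flips
        ])

Because only summary statistics of the flip-energy spectrum enter the feature vector, the same policy network can be applied unchanged to instances of different size n, which is the scalability property the abstract claims.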
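The multi-GPU training mentioned in the abstract could be organized roughly as follows, with each device simulating its own shard of annealing environments and gradients averaged across devices inside jax.pmap. This is a minimal data-parallel sketch with a placeholder quadratic loss; the authors' actual RL objective and update rule are not given in the abstract, and all names here are assumptions.

    import functools
    import jax
    import jax.numpy as jnp

    def loss_fn(params, features):
        # Placeholder for the RL policy loss; a dummy quadratic stands in
        # for the unspecified objective.
        return jnp.mean((features @ params) ** 2)

    @functools.partial(jax.pmap, axis_name="dev")
    def update(params, features, lr):
        # Each device evaluates the loss on its own shard of environment
        # states, then gradients are averaged across devices (pmean)
        # before a synchronous SGD step.
        grads = jax.grad(loss_fn)(params, features)
        grads = jax.lax.pmean(grads, axis_name="dev")
        return params - lr * grads

    n_dev = jax.local_device_count()
    params = jax.device_put_replicated(jnp.zeros((5,)), jax.local_devices())
    features = jnp.ones((n_dev, 32, 5))   # per-device batch of state features
    lr = jnp.full((n_dev,), 1e-2)         # pmap inputs need a device axis
    params = update(params, features, lr)
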
