Enhancing neural combinatorial optimization by progressive training paradigm
| Published in: | Neurocomputing (Amsterdam) Vol. 659; p. 131707 |
|---|---|
| Main Authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 01.01.2026 |
| Subjects: | |
| ISSN: | 0925-2312 |
| Online Access: | Get full text |
| Summary: | Neural Combinatorial Optimization (NCO) methods have garnered considerable attention due to their effectiveness in automatic algorithm design for solving combinatorial optimization problems. Current constructive NCO methods predominantly employ a one-stage training paradigm using either reinforcement learning (RL) or supervised learning (SL). One-stage training inevitably entails either computation-intensive labeling (i.e., solving for optimal solutions) in SL or less-informative sparse rewards in RL. In this work, we propose a progressive training paradigm that pre-trains a neural network on small-scale instances using SL and then fine-tunes it using RL. In the former stage, optimal solutions serve as labels that effectively guide the neural network training, thereby bypassing the sparse-reward issue. In the latter stage, the neural network is trained using RL to solve large-scale problems, avoiding the optimal-solution labels that are hard to obtain at scale. Moreover, we propose a decomposition-based approach that enables RL training on larger problem scales, alleviating the insufficient-memory issue induced by the heavy neural network. The proposed paradigm advances existing NCO models to obtain near-optimal solutions for the Traveling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP) with up to 10,000 nodes. Additionally, it enhances generalization performance across instances of different sizes and distributions, as well as real-world TSPLib and CVRPLib instances. |
|---|---|
| DOI: | 10.1016/j.neucom.2025.131707 |
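
The summary above describes a two-stage recipe: supervised pre-training on small instances with optimal-solution labels, followed by RL fine-tuning on larger instances. The sketch below is a minimal illustration of that paradigm on TSP, not the authors' implementation: the tiny MLP policy, the random-uniform instance generator, and the nearest-neighbour "labels" (a cheap stand-in for the exact solutions an actual SL stage would use) are all assumptions, and the paper's decomposition-based memory-saving component is omitted.

```python
# Progressive (SL -> RL) training sketch for a constructive TSP policy.
# Everything here is illustrative and hypothetical, not the paper's model.
import torch
import torch.nn as nn

class TinyPolicy(nn.Module):
    """Scores each unvisited city given the current city (toy stand-in for an NCO model)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(4, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def step_logits(self, cur, cities, visited):
        # cur: (2,), cities: (n, 2), visited: bool mask (n,)
        feats = torch.cat([cur.expand_as(cities), cities], dim=-1)   # (n, 4)
        logits = self.net(feats).squeeze(-1)
        return logits.masked_fill(visited, float("-inf"))

def rollout(policy, cities, greedy=False, teacher=None):
    """Construct a tour node by node; returns (tour, total log-prob, tour length)."""
    n = cities.size(0)
    visited = torch.zeros(n, dtype=torch.bool)
    tour, logp = [0], torch.tensor(0.0)
    visited[0] = True
    for t in range(1, n):
        logits = policy.step_logits(cities[tour[-1]], cities, visited)
        probs = torch.softmax(logits, dim=-1)
        if teacher is not None:          # SL: follow the label tour
            nxt = teacher[t]
        elif greedy:                     # baseline: deterministic rollout
            nxt = int(probs.argmax())
        else:                            # RL: sample from the policy
            nxt = int(torch.multinomial(probs, 1))
        logp = logp + torch.log(probs[nxt] + 1e-10)
        visited[nxt] = True
        tour.append(nxt)
    order = cities[tour + [0]]
    length = (order[1:] - order[:-1]).norm(dim=-1).sum()
    return tour, logp, length

def nearest_neighbour_tour(cities):
    """Cheap label generator standing in for an exact solver on small instances."""
    n, visited, tour = cities.size(0), {0}, [0]
    for _ in range(n - 1):
        d = (cities - cities[tour[-1]]).norm(dim=-1)
        d[list(visited)] = float("inf")
        nxt = int(d.argmin()); visited.add(nxt); tour.append(nxt)
    return tour

policy = TinyPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Stage 1: supervised pre-training on small instances with (pseudo-)optimal labels.
for _ in range(200):
    cities = torch.rand(20, 2)
    label = nearest_neighbour_tour(cities)
    _, logp, _ = rollout(policy, cities, teacher=label)
    loss = -logp                          # imitation: maximise likelihood of the label tour
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: RL fine-tuning on larger instances (REINFORCE with a greedy-rollout baseline).
for _ in range(200):
    cities = torch.rand(100, 2)
    _, logp, length = rollout(policy, cities)
    with torch.no_grad():
        _, _, baseline = rollout(policy, cities, greedy=True)
    loss = (length - baseline) * logp     # reward = negative tour length, only at tour end
    opt.zero_grad(); loss.backward(); opt.step()
```

The greedy-rollout baseline in the second stage is one common variance-reduction choice for RL fine-tuning of constructive policies; the paper may use a different baseline or decomposition scheme for large instances.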