Enhancing neural combinatorial optimization by progressive training paradigm

Bibliographic Details
Published in:	Neurocomputing (Amsterdam) Vol. 659; p. 131707
Main Authors:	Cao, Zhi, Wu, Yaoxin, Hou, Yaqing, Ge, Hongwei
Format:	Journal Article
Language:	English
Published:	Elsevier B.V. 01.01.2026
Subjects:	Attention model; Neural combinatorial optimization; Reinforcement learning; Vehicle routing problem
ISSN:	0925-2312

Description
Summary:	Neural Combinatorial Optimization (NCO) methods have attracted considerable attention for their effectiveness in automatically designing algorithms for combinatorial optimization problems. Current constructive NCO methods predominantly use a one-stage training paradigm based on either reinforcement learning (RL) or supervised learning (SL). One-stage training inevitably entails either computation-intensive labeling in SL (i.e., computing optimal solutions) or sparse, less informative rewards in RL. In this work, we propose a progressive training paradigm that pre-trains a neural network on small-scale instances using SL and then fine-tunes it using RL. In the first stage, optimal solutions serve as labels that effectively guide the training of the neural network, thereby bypassing the sparse-reward issue. In the second stage, the neural network is trained using RL to solve large-scale problems, avoiding the need for optimal-solution labels, which are hard to obtain. Moreover, we propose a decomposition-based approach that enables RL training at larger problem scales, alleviating the memory shortage caused by the heavy neural network. The proposed paradigm enables existing NCO models to obtain near-optimal solutions for the Traveling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP) with up to 10,000 nodes. Additionally, it improves generalization across instances of different sizes and distributions, as well as on real-world TSPLib and CVRPLib instances.
DOI:	10.1016/j.neucom.2025.131707
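
The two-stage idea in the summary (SL pre-training on small instances with optimal labels, then RL fine-tuning on larger instances without labels) can be illustrated with a toy sketch. This is not the authors' model or code: the attention network is replaced here by a hypothetical one-parameter softmax policy that prefers nearby nodes, optimal labels come from brute-force enumeration, the RL stage is plain REINFORCE with a mean baseline, and the decomposition-based approach is omitted. All function names and hyperparameters are assumptions chosen for illustration.

```python
import itertools
import math
import random


def tour_length(pts, tour):
    """Length of the closed tour visiting pts in the given order."""
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))


def optimal_tour(pts):
    """Brute-force optimal tour; only feasible for tiny instances (the costly SL label)."""
    rest = min(itertools.permutations(range(1, len(pts))),
               key=lambda p: tour_length(pts, (0,) + p))
    return [0, *rest]


def step_probs(pts, cur, remaining, theta):
    """Toy policy (assumption): softmax over -theta * distance to each remaining node."""
    d = [math.dist(pts[cur], pts[j]) for j in remaining]
    logits = [-theta * x for x in d]
    m = max(logits)
    w = [math.exp(v - m) for v in logits]
    z = sum(w)
    return d, [x / z for x in w]


def grad_log_prob(d, probs, k):
    """d/dtheta of log softmax(-theta * d)[k] = -d[k] + sum_j p_j * d_j."""
    return -d[k] + sum(p * x for p, x in zip(probs, d))


def random_instance(n, rng):
    return [(rng.random(), rng.random()) for _ in range(n)]


def pretrain_sl(theta, n_nodes, n_instances, lr, rng):
    """Stage 1: maximize the log-likelihood of optimal tours on small instances."""
    for _ in range(n_instances):
        pts = random_instance(n_nodes, rng)
        label = optimal_tour(pts)
        g = 0.0
        for t in range(n_nodes - 1):
            remaining = label[t + 1:]  # the labeled next node is remaining[0]
            d, probs = step_probs(pts, label[t], remaining, theta)
            g += grad_log_prob(d, probs, 0)
        theta += lr * g  # gradient ascent on log-likelihood
    return theta


def rollout(pts, theta, rng):
    """Sample a tour from the policy; also return the summed log-prob gradient."""
    tour, remaining, g = [0], list(range(1, len(pts))), 0.0
    while remaining:
        d, probs = step_probs(pts, tour[-1], remaining, theta)
        k = rng.choices(range(len(remaining)), weights=probs)[0]
        g += grad_log_prob(d, probs, k)
        tour.append(remaining.pop(k))
    return tour, g


def finetune_rl(theta, n_nodes, n_instances, lr, rng, samples=8):
    """Stage 2: REINFORCE with a mean baseline on larger, unlabeled instances."""
    for _ in range(n_instances):
        pts = random_instance(n_nodes, rng)
        rolls = [rollout(pts, theta, rng) for _ in range(samples)]
        lengths = [tour_length(pts, tour) for tour, _ in rolls]
        baseline = sum(lengths) / samples
        grad = sum((length - baseline) * g
                   for length, (_, g) in zip(lengths, rolls)) / samples
        theta -= lr * grad  # gradient descent on expected tour length
    return theta


def progressive_train(seed=0):
    """Run SL on 6-node instances, then RL on 30-node instances (sizes are assumptions)."""
    rng = random.Random(seed)
    theta_sl = pretrain_sl(0.0, n_nodes=6, n_instances=200, lr=0.1, rng=rng)
    theta_rl = finetune_rl(theta_sl, n_nodes=30, n_instances=50, lr=0.05, rng=rng)
    return theta_sl, theta_rl
```

Calling `progressive_train()` returns the policy parameter after each stage. The point of the sketch is only the hand-off: the RL stage starts from the SL-trained parameter rather than from scratch, and needs no optimal tours for the larger instances.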