DRL-ED-TSPP: A Deep Reinforcement Learning Model With Encoder-Decoder for Solving the Traveling Salesman Problem With Profits
| Title: | DRL-ED-TSPP: A Deep Reinforcement Learning Model With Encoder-Decoder for Solving the Traveling Salesman Problem With Profits |
|---|---|
| Authors: | Bin Hu, Liying Zhao |
| Source: | IEEE Access, Vol 13, Pp 111372-111391 (2025) |
| Publisher information: | Institute of Electrical and Electronics Engineers (IEEE), 2025. |
| Publication year: | 2025 |
| Keywords: | TSPP, travel planning, deep learning, encoder-decoder structure, Electrical engineering. Electronics. Nuclear engineering, TK1-9971 |
| Description: | The rapid growth of smart cultural tourism necessitates intelligent path planning solutions that harmonize diverse stakeholder interests, including tourist satisfaction, attraction utilization, and guide efficiency. Traditional approaches, however, struggle to address the nonlinear optimization objectives inherent in balancing cultural value density against spatiotemporal costs. This paper proposes DRL-ED-TSPP, a deep reinforcement learning (DRL) model with an Encoder-Decoder architecture, to solve the Traveling Salesman Problem with Profits (TSPP) for sustainable cultural tourism planning. The model addresses the critical challenge of designing closed-loop tours that maximize cultural value per unit time while minimizing travel costs, a problem exacerbated by the NP-hard nature of TSPP and the inefficiencies of existing heuristic or metaheuristic methods. Our solution integrates a Transformer-based encoder with a Composite Feature Embedding (CFE) layer to fuse heterogeneous inputs (spatial coordinates, travel costs, and cultural value matrices) into a unified semantic space, enabling robust representation of nonlinear relationships. The encoder employs stacked self-attention layers to capture spatial-temporal dependencies and thematic continuity, while the decoder leverages a two-stage attention mechanism combining multi-head global context modeling and single-head cultural value prioritization. A dynamic masking strategy ensures Hamiltonian cycle constraints, and dual exploration-exploitation strategies (stochastic sampling for training, greedy search for inference) enhance solution diversity and decision efficiency. Evaluations on synthetic datasets (SyntheticGrid-20, RealCluster-50, MixedCity-100) and real-world cultural sites (Hangzhou West Lake, Xi’an Qin-Han-Tang Zone) demonstrate that DRL-ED-TSPP outperforms state-of-the-art baselines, achieving 4.6–7.0% improvement in cost-to-value ratio, 2.5–7.5% higher cultural value accumulation, and 9–15% faster inference times. |
| Publication type: | Article |
| ISSN: | 2169-3536 |
| DOI: | 10.1109/access.2025.3582025 |
| Access URL: | https://doaj.org/article/427a1d552ef84104b1dd29915747f528 |
| Rights: | CC BY |
| Document code: | edsair.doi.dedup.....ab8d54d7492f3c79548e19dcaf4b60b2 |
| Database: | OpenAIRE |
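
The record's abstract mentions a dynamic masking strategy for the Hamiltonian cycle constraint and two decoding modes (stochastic sampling for training, greedy search for inference). The following is a minimal Python sketch of how such a decoding loop is commonly structured, not the authors' implementation. The function names and the `toy_scores` heuristic (site value divided by distance) are hypothetical stand-ins for the learned attention decoder, which this record does not specify.

```python
import math
import random


def masked_softmax(scores, visited):
    """Softmax over unvisited nodes only; visited nodes get probability 0."""
    masked = [-math.inf if i in visited else s for i, s in enumerate(scores)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [0.0 if s == -math.inf else math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


def toy_scores(current, coords, values):
    """Hypothetical stand-in for decoder logits: value per unit distance."""
    return [values[j] / (math.dist(coords[current], coords[j]) + 1e-9)
            for j in range(len(coords))]


def build_tour(coords, values, greedy, rng=None, start=0):
    """Visit every node exactly once, then return to the start node."""
    rng = rng or random.Random(0)
    tour, visited = [start], {start}
    while len(visited) < len(coords):
        probs = masked_softmax(toy_scores(tour[-1], coords, values), visited)
        candidates = [i for i in range(len(coords)) if i not in visited]
        if greedy:
            # Inference mode: take the most probable unvisited node.
            nxt = max(candidates, key=probs.__getitem__)
        else:
            # Training mode: sample in proportion to the masked probabilities.
            weights = [probs[i] for i in candidates]
            nxt = rng.choices(candidates, weights=weights, k=1)[0]
        tour.append(nxt)
        visited.add(nxt)
    return tour + [start]  # close the loop


coords = [(0, 0), (1, 0), (1, 1), (0, 1), (2, 2)]  # illustrative data
values = [1.0, 2.0, 3.0, 1.5, 4.0]                 # illustrative data
print(build_tour(coords, values, greedy=True))
print(build_tour(coords, values, greedy=False, rng=random.Random(1)))
```

Because the mask removes visited nodes before normalization, both modes can only produce closed tours that visit each site once; the modes differ only in how the next site is picked from the remaining probability mass.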