Utilising a two-dimensional action space as the basis for the PPO algorithm, job shops can jointly schedule AGVs and machinery

Bibliographic Details
Title: Utilising a two-dimensional action space as the basis for the PPO algorithm, job shops can jointly schedule AGVs and machinery
Authors: Zhenzhen Sun, Hailan Tian, Hongmeng Wang, Shaohua Yan, Tao Han, Haoyu Rong
Publisher Information: Springer Science and Business Media LLC, 2025.
Publication Year: 2025
Description: Traditional methods for jointly scheduling AGVs and machines in job shops are often too constrained to adapt to complex production environments and changing demand. This paper therefore proposes a PPO algorithm based on a two-dimensional action space for the joint scheduling of AGVs and machines in the job shop. First, the computation of the loss function is modified to suit the two-dimensional action space so that policy-gradient learning remains stable. Then, the PPO-Clip version of the objective function is used for policy updates. Experiments on randomly generated instances show that the PPO algorithm based on the two-dimensional action space achieves better training and convergence performance on this problem. Compared with traditional methods, the algorithm substantially improves the best solution found on most instances. In scalability tests on large-scale scenarios, the 2D-PPO-based algorithm shows high solution efficiency and robustness, and its solution quality in particular stays consistent with that of the best-performing scheduling rule.
Document Type: Article
DOI: 10.21203/rs.3.rs-7413549/v1
Rights: CC BY
Accession Number: edsair.doi...........f3bb3a1f32d0eb6c22db621e4b2d5918
Database: OpenAIRE
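The abstract mentions a loss computation adapted to a two-dimensional action space, followed by a PPO-Clip update, but this record does not give the formulas. The sketch below is only an illustration of how such a loss might look, not the authors' method. It assumes that the two action dimensions (here hypothetically named "op" for the operation choice and "agv" for the vehicle choice) are sampled independently given the state, so that the joint log-probability is the sum of the two per-dimension log-probabilities. The function names, dictionary keys, and the clip range of 0.2 are likewise assumptions made for the example.

```python
import math


def joint_logp(logp_op, logp_agv):
    # ASSUMPTION: the two action dimensions are independent given the state,
    # so log pi(op, agv | s) = log pi(op | s) + log pi(agv | s).
    return logp_op + logp_agv


def clipped_surrogate(new_logp, old_logp, advantage, clip_eps=0.2):
    """Standard PPO-Clip surrogate for one sample (a quantity to maximise)."""
    # Probability ratio between the updated policy and the data-collecting policy.
    ratio = math.exp(new_logp - old_logp)
    # Restrict the ratio to the interval [1 - eps, 1 + eps].
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)


def ppo_clip_loss(batch, clip_eps=0.2):
    """Mean negative surrogate over a batch of samples (a quantity to minimise).

    Each sample is a dict with hypothetical keys: new_logp_op, new_logp_agv,
    old_logp_op, old_logp_agv, advantage.
    """
    total = 0.0
    for s in batch:
        new_lp = joint_logp(s["new_logp_op"], s["new_logp_agv"])
        old_lp = joint_logp(s["old_logp_op"], s["old_logp_agv"])
        total += clipped_surrogate(new_lp, old_lp, s["advantage"], clip_eps)
    return -total / len(batch)
```

With a single joint ratio, the clipping acts on the combined probability of the (operation, AGV) pair rather than on each dimension separately. Clipping each dimension on its own would be another plausible design, and the abstract does not say which one the paper uses.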