A modified multi-agent proximal policy optimization algorithm for multi-objective dynamic partial-re-entrant hybrid flow shop scheduling problem

Bibliographic Details
Published in: Engineering Applications of Artificial Intelligence, Volume 140, p. 109688
Main Authors: Wu, Jiawei; Liu, Yong
Format: Journal Article
Language: English
Published: Elsevier Ltd, 15.01.2025
ISSN: 0952-1976
Description
Summary: This paper extends a novel model for modern flexible manufacturing systems: the multi-objective dynamic partial-re-entrant hybrid flow shop scheduling problem (MDPR-HFSP). The model considers partial-re-entrant processing, dynamic disturbance events, green manufacturing demand, and machine workload. Despite advancements in applying deep reinforcement learning to dynamic workshop scheduling, current methods face challenges in training scheduling policies under partial-re-entrant processing constraints and multiple manufacturing objectives. To solve the MDPR-HFSP, we propose a modified multi-agent proximal policy optimization (MMAPPO) algorithm, which employs a routing agent (RA) for machine assignment and a sequencing agent (SA) for job selection. Four machine assignment rules and four job selection rules are integrated so that RA and SA choose optimal actions at rescheduling points. In addition, reward signals are created by combining objective weight vectors with reward vectors, and the training parameters under each weight vector are saved to flexibly optimize the three objectives. Furthermore, we design an adaptive trust region clipping method that introduces the Wasserstein distance to improve how the proximal policy optimization algorithm constrains the difference between new and old policies. Moreover, we conduct comprehensive numerical experiments comparing the proposed MMAPPO algorithm with nine composite scheduling rules and the basic multi-agent proximal policy optimization algorithm. The results demonstrate that MMAPPO is more effective in solving the MDPR-HFSP and achieves superior convergence and diversity of solutions. Finally, a semiconductor wafer manufacturing case is solved by MMAPPO, and the resulting schedule meets the responsiveness requirement.

Highlights:
• Considers partial-re-entrant flows, dynamic events, and multiple objectives in the HFSP.
• A novel multi-agent DRL scheme is developed for dynamic scheduling.
• Adaptive trust region clipping is proposed to improve the constraint on policy updates.
• Comprehensive experiments verify superiority in solution quality and efficiency.
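Two of the mechanisms summarized above lend themselves to a compact illustration: scalarizing the multi-objective reward with an objective weight vector, and adapting the PPO clipping range using the Wasserstein distance between the new and old action distributions. The sketch below is a minimal, hypothetical PyTorch rendering of these ideas; the function names, tensor shapes, the treatment of the discrete dispatching-rule support as equally spaced points, and the specific rule for shrinking the clip range are all assumptions made for illustration, not the authors' implementation.

import torch

def scalarised_reward(reward_vec, weight_vec):
    # Weighted sum of the per-objective reward vector under one sampled
    # weight vector (one scalar reward per transition).
    return (weight_vec * reward_vec).sum(dim=-1)

def wasserstein_1d(p, q):
    # W1 distance between two categorical action distributions whose
    # supports are treated as equally spaced points on a line
    # (closed form: sum of absolute CDF differences).
    return (torch.cumsum(p, dim=-1) - torch.cumsum(q, dim=-1)).abs().sum(dim=-1)

def adaptive_clip_loss(new_probs, old_probs, actions, advantages,
                       eps_base=0.2, k=1.0):
    # PPO clipped surrogate in which the clip range tightens as the new
    # policy drifts further (in W1) from the old one -- a hedged sketch of
    # adaptive trust region clipping, not the paper's exact update rule.
    idx = actions.unsqueeze(-1)
    ratio = (new_probs.gather(-1, idx).squeeze(-1)
             / old_probs.gather(-1, idx).squeeze(-1))
    w = wasserstein_1d(new_probs, old_probs)   # per-state policy divergence
    eps = eps_base / (1.0 + k * w)             # smaller clip range when far apart
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()

Under these assumptions, with four candidate rules per agent, new_probs and old_probs would be (batch, 4) probability tensors produced by the RA or SA policy network at rescheduling points, actions a (batch,) tensor of chosen rule indices, and the scalarised reward would feed the advantage estimates used in the surrogate loss.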
DOI: 10.1016/j.engappai.2024.109688