Parallel Multistep Evaluation With Efficient Data Utilization for Safe Neural Critic Control and Its Application to Orbital Maneuver Systems

Bibliographic Details
Published in:	IEEE Transactions on Neural Networks and Learning Systems, Vol. 36, No. 9, pp. 17114-17127
Main authors:	Wang, Jiangyu; Wang, Ding; Ren, Jin; Liu, Derong; Qiao, Junfei
Format:	Journal Article
Language:	English
Published:	United States, IEEE, 1 September 2025
Subjects:	Adaptation models; Adaptive dynamic programming (ADP); Aerospace electronics; approximate errors; Artificial neural networks; Cost function; Costs; Data models; data-driven; Heuristic algorithms; learning systems; neural networks (NNs); Optimal control; Q-learning; safe optimal control; Safety
ISSN:	2162-237X, 2162-2388

Description
Summary:	Data-driven methods have significantly advanced optimal learning control, but some approaches overlook systematic considerations of data utilization, including safety, efficiency, and error accumulation. To address these gaps in safe neural critic control, this article introduces a parallel multistep evaluation mechanism that combines data from system interaction with data generated by data-driven models. Based on this evaluation mechanism, we propose a novel parallel multistep Q-learning algorithm that enhances data utilization efficiency and mitigates error accumulation. Furthermore, we formulate a novel control barrier function (CBF) to ensure safety during the learning and control processes; it can handle asymmetric constraints and adjust the constraint strength. In addition, the analysis reveals that the multistep information introduced by data-driven models influences the learning performance of the actor-critic neural networks (NNs). Finally, parallel multistep Q-learning, which exploits data with respect to safety, efficiency, and error bounds, is validated on an orbital maneuver system.
DOI:	10.1109/TNNLS.2025.3570716
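
The summary describes an evaluation that combines data from real system interaction with data generated by a data-driven model. As a rough illustration of that general idea only, the sketch below forms a generic n-step cost-to-go target: the first stage cost comes from an observed transition, the following steps are rolled out with a learned model, and the critic bootstraps at the horizon. This is not the article's parallel multistep Q-learning algorithm and it omits the control barrier function entirely; the function name `multistep_target` and the arguments `model`, `policy`, `cost`, and `q_value` are hypothetical names introduced here.

```python
def multistep_target(x_next, first_cost, model, policy, cost, q_value,
                     n_steps, gamma=1.0):
    """Generic n-step cost-to-go target (illustrative sketch, not the paper's method).

    x_next     : state reached by the real system after the observed transition
    first_cost : stage cost observed on that real transition
    model      : hypothetical learned one-step model, model(x, u) -> next state
    policy     : current control law, policy(x) -> u
    cost       : stage cost, cost(x, u) -> float
    q_value    : current critic estimate, q_value(x, u) -> float
    n_steps    : total number of stage costs summed (1 real + n_steps - 1 simulated)
    """
    target = first_cost          # step 1: measured on the real system
    x = x_next
    discount = gamma
    for _ in range(n_steps - 1): # steps 2..n: generated by the learned model
        u = policy(x)
        target += discount * cost(x, u)
        x = model(x, u)
        discount *= gamma
    # Bootstrap with the critic at the end of the horizon.
    target += discount * q_value(x, policy(x))
    return target


# Toy usage with an assumed scalar model x' = 0.5 * x + u and a zero policy.
toy_model = lambda x, u: 0.5 * x + u
toy_policy = lambda x: 0.0
toy_cost = lambda x, u: x ** 2 + u ** 2
toy_q = lambda x, u: 2.0 * x ** 2

target = multistep_target(1.0, 4.0, toy_model, toy_policy, toy_cost, toy_q,
                          n_steps=3)
print(target)
```

With `n_steps=1` the target reduces to the usual one-step form (observed cost plus the critic's estimate at the next state). Longer horizons rely more on the model and less on the critic, which is one way a multistep scheme can trade model error against bootstrapping error.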