Efficient fuel-optimal multi-impulse orbital transfer via contrastive pre-trained reinforcement learning
| Published in: | Advances in Space Research, Vol. 75, No. 10, pp. 7377-7396 |
|---|---|
| Main authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 15 May 2025 |
| Subjects: | |
| ISSN: | 0273-1177 |
| Summary: | Multi-impulse transfers between noncoplanar orbits are significant for on-orbit service spacecraft. This paper investigates the complex optimization problem of multi-impulse orbital transfer involving a chaser and a target. The chaser is subject to constraints on impulse magnitude and time, while the target may experience uncertain disturbances, causing it to deviate from the nominal orbit. The complexity of this problem imposes a significant computational burden on numerical methods, making it challenging for spacecraft to autonomously plan trajectory transfers in real time. To mitigate this burden, we propose a robust, fast, and autonomous algorithm for the optimization challenge, which can rapidly plan transfer trajectories. Even if the terminal conditions suddenly change, our algorithm can quickly adjust the trajectory based on observed states without the need to completely re-plan. The algorithm comprises an intelligent trajectory generator and a Lambert transfer algorithm. The intelligent generator is based on a reinforcement learning (RL) method called contrastive-pre-trained Reinforcement Learning (CPRL), which emulates human learning habits to avoid the temporal credit assignment problem that arises with long time horizons and sparse rewards during the training phase. When the chaser reaches an admissible range, determined by the impulse constraints and geometric relations of the conic curve, the algorithm adopts the Lambert transfer to complete the mission. Compared to traditional genetic and particle swarm algorithms, our method achieves a significant improvement in computational speed. Even with deviations, the average mission success rate remains at 96.8%. Numerical simulations confirm that our algorithm processes data quickly, can be deployed online, and is capable of handling various tasks in real time. |
| DOI: | 10.1016/j.asr.2025.02.049 |