HQA: Hybrid Q-learning and AODV multi-path routing algorithm for Flying Ad-hoc Networks
| Published in: | Vehicular Communications Vol. 55; p. 100947 |
|---|---|
| Main authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Inc, 01.10.2025 |
| Subjects: | |
| ISSN: | 2214-2096 |
| Summary: | Reliable and efficient data transmission between Unmanned Aerial Vehicle (UAV) nodes is critical for the control of UAV swarms and relies heavily on effective routing protocols in Flying Ad-hoc Networks (FANETs). However, Q-learning-based FANET routing protocols, which are gaining widespread attention, face two significant challenges: 1) the insufficient stability of Q-learning leads to unreliable route selection in certain scenarios and to higher packet loss rates; 2) in void regions with frequent topology changes and vast path exploration spaces, Q-learning converges too slowly to adapt to dynamic environmental changes, which reduces the packet delivery rate (PDR). This paper proposes a hybrid Q-learning/AODV (HQA) multi-path routing algorithm that integrates Q-learning with the AODV protocol to address these challenges. HQA includes a Bayesian stability evaluator for adaptive Q-learning/AODV switching and a dual-update reward mechanism that integrates reliable AODV paths into Q-learning training, enabling rapid void recovery and latency-optimized routing. Experimental results demonstrate HQA's superiority over baseline protocols: compared to AODV, HQA reduces average end-to-end delay by 13.6–23.9% and improves PDR by 5.4–9.1% in non-void and void states, respectively. It outperforms QMR by 2.2–6.3% in PDR while achieving 25.6% and 53.2% higher average PDR than QMR and AODV across network densities. The hybrid design accelerates convergence by 40% versus standalone Q-learning through AODV-assisted rewards, maintaining scalability under dynamic topology changes. These findings indicate that the HQA algorithm adapts more quickly to rapid changes in FANETs and handles void regions better, offering a promising solution for enhancing the performance and reliability of FANETs. |
| DOI: | 10.1016/j.vehcom.2025.100947 |