HQA: Hybrid Q-learning and AODV multi-path routing algorithm for Flying Ad-hoc Networks

Bibliographic details
Published in: Vehicular Communications, Volume 55, p. 100947
Main authors: Sun, Chen; Hou, Liang; Yu, Suqi; Shu, Jian
Format: Journal Article
Language: English
Published: Elsevier Inc, 1 October 2025
ISSN: 2214-2096
Description
Summary: Reliable and efficient data transmission between Unmanned Aerial Vehicle (UAV) nodes is critical for controlling UAV swarms and relies heavily on effective routing protocols in Flying Ad-hoc Networks (FANETs). However, Q-learning-based FANET routing protocols, which are gaining widespread attention, face two significant challenges: 1) the insufficient stability of Q-learning leads to unreliable route selection in certain scenarios and to higher packet loss rates; 2) in void regions with frequent topology changes and vast path exploration spaces, Q-learning converges too slowly to adapt to dynamic environmental changes, which reduces the packet delivery rate (PDR). This paper proposes a hybrid Q-learning/AODV (HQA) multi-path routing algorithm that integrates Q-learning with the AODV protocol to address these challenges. HQA includes a Bayesian stability evaluator for adaptive Q-learning/AODV switching and a dual-update reward mechanism that integrates reliable AODV paths into Q-learning training, enabling rapid void recovery and latency-optimized routing. Experimental results demonstrate HQA's superiority over baseline protocols: compared with AODV, HQA reduces average end-to-end delay by 13.6–23.9% and improves PDR by 5.4–9.1% in non-void and void states, respectively. It outperforms QMR by 2.2–6.3% in PDR while achieving 25.6% and 53.2% higher average PDR than QMR and AODV across network densities. The hybrid design accelerates convergence by 40% versus standalone Q-learning through AODV-assisted rewards, maintaining scalability under dynamic topology changes. These findings indicate that HQA adapts more quickly to the rapid changes in FANETs and handles void regions better, offering a promising solution for enhancing the performance and reliability of FANETs.
DOI: 10.1016/j.vehcom.2025.100947
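
Illustrative sketch (not from the paper): the summary names two mechanisms, a Bayesian stability evaluator that switches between Q-learning and AODV, and a dual-update reward that feeds reliable AODV paths into Q-learning training. The record gives no formulas, so the Python below is only one possible reading. The Beta-Bernoulli estimator, the switching threshold, the bonus reward, and every class, method, and parameter name are assumptions made for illustration and should not be taken as the authors' design.

```python
from collections import defaultdict


class BetaStability:
    """Beta-Bernoulli estimate of how often Q-learning forwarding succeeds.

    Assumption: stability is modelled as a success probability with a
    Beta(alpha, beta) posterior. The paper's evaluator may differ.
    """

    def __init__(self, alpha=1.0, beta=1.0):
        self.alpha = alpha  # prior pseudo-count of successful deliveries
        self.beta = beta    # prior pseudo-count of failed deliveries

    def observe(self, success):
        # Each delivery outcome adds one count to the posterior.
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def mean(self):
        # Posterior mean of the success probability.
        return self.alpha / (self.alpha + self.beta)


class HybridRouter:
    """Hypothetical per-node router that switches between two modes."""

    def __init__(self, lr=0.5, discount=0.9, threshold=0.5, aodv_bonus=1.0):
        self.q = defaultdict(float)  # (destination, next_hop) -> Q-value
        self.stability = BetaStability()
        self.lr = lr
        self.discount = discount
        self.threshold = threshold    # assumed switching threshold
        self.aodv_bonus = aodv_bonus  # assumed reward for AODV-verified hops

    def choose_mode(self):
        # Fall back to AODV when Q-learning looks unstable.
        if self.stability.mean() >= self.threshold:
            return "q_learning"
        return "aodv"

    def update(self, dest, next_hop, reward, next_best_q=0.0):
        # Standard one-step Q-learning update from an observed reward.
        key = (dest, next_hop)
        target = reward + self.discount * next_best_q
        self.q[key] += self.lr * (target - self.q[key])

    def update_from_aodv(self, dest, next_hop, next_best_q=0.0):
        # Second update path: reinforce a hop on a route AODV discovered.
        self.update(dest, next_hop, self.aodv_bonus, next_best_q)


router = HybridRouter()
for outcome in (False, False, False):  # three failed Q-learning deliveries
    router.stability.observe(outcome)
mode_after_failures = router.choose_mode()  # posterior mean is 1/5
router.update_from_aodv("D", "B")           # Q(D, B) moves from 0.0 to 0.5
```

In this reading, a run of failed deliveries lowers the posterior mean below the threshold and the node falls back to AODV, while hops on AODV-discovered routes still update the Q-table so that Q-learning can resume from a better starting point once it is judged stable again.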