SOD-YOLO: Small-Object-Detection Algorithm Based on Improved YOLOv8 for UAV Images

Bibliographic details
Published in:	Remote sensing (Basel, Switzerland) Vol. 16; no. 16; p. 3057
Main authors:	Li, Yangang; Li, Qi; Pan, Jie; Zhou, Ying; Zhu, Hongliang; Wei, Hongwei; Liu, Chong
Format:	Journal Article
Language:	English
Published:	Basel MDPI AG 20.08.2024
Subjects:	Accuracy; Algorithms; data collection; Data integration; Deep learning; Drones; Feature extraction; feature fusion; Feature maps; Frequency dependence; Ground stations; Information processing; Military technology; Multisensor fusion; neck; object detection; Object recognition; Parameters; Receptive field; Semantics; small objects; Spatial data; UAV; Unmanned aerial vehicles
ISSN:	2072-4292

Description
Summary:	The rapid development of unmanned aerial vehicle (UAV) technology has contributed to the increasing sophistication of UAV-based object-detection systems, which are now extensively utilized in civilian and military sectors. However, object detection from UAV images faces numerous challenges, including significant variations in object size, changing spatial configurations, and cluttered backgrounds with multiple interfering elements. To address these challenges, we propose SOD-YOLO, a model based on YOLOv8, to detect small objects in UAV images. The model integrates the receptive field convolutional block attention module (RFCBAM) into the backbone network to perform downsampling, improving feature extraction efficiency and mitigating the spatial information sparsity caused by downsampling. Additionally, we developed a novel neck architecture, the balanced spatial and semantic information fusion pyramid network (BSSI-FPN), designed for multi-scale feature fusion. The BSSI-FPN balances spatial and semantic information across feature maps using three primary strategies: fully utilizing large-scale features, increasing the frequency of multi-scale feature fusion, and implementing dynamic upsampling. The experimental results on the VisDrone2019 dataset demonstrate that SOD-YOLO-s improves the mAP50 indicator by 3% compared to YOLOv8s while reducing the number of parameters and computational complexity by 84.2% and 30%, respectively. Compared to YOLOv8l, SOD-YOLO-l improves the mAP50 indicator by 7.7% and reduces the number of parameters by 59.6%. Compared to other existing methods, SOD-YOLO-l achieves the highest detection accuracy, demonstrating the superiority of the proposed method.
DOI:	10.3390/rs16163057