Leveraging large visual models for enhanced object detection: An improved SAM-YOLOv5 model
| Published in: | Knowledge-based systems Vol. 330; p. 114757 |
|---|---|
| Main authors: | , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V, 25.11.2025 |
| Subjects: | |
| ISSN: | 0950-7051 |
| Summary: | Although various object detection methods have been developed, the accuracy of existing algorithms remains insufficient, particularly for small and distant objects. To address these challenges, we propose an improved object detection model, I-SAM-YOLOv5, which combines the strengths of a large vision model, the Segment Anything Model (SAM), with YOLOv5. The framework incorporates a large visual feature fusion (LVFF) module, in which the powerful visual features of SAM are integrated into YOLOv5 to improve feature representation. In addition, an enhanced fixed-resolution feature pyramid network (FRFPN) is employed to refine and strengthen feature extraction. Experimental results on the COCO and KITTI datasets demonstrate considerable improvements in detection accuracy across almost all model scales (n, s, m, l, and x). At scale n, the model achieves an 8.47% increase in mean average precision (mAP) on COCO and a 5.48% improvement on KITTI over the YOLOv5 baseline. To further assess the effectiveness of I-SAM-YOLOv5, we conduct ablation studies examining different LVFF variants, FRFPN designs, feature fusion positions, adapters, and multi-layer perceptron (MLP) configurations. The results confirm the robust performance gains of the proposed framework. This study advances object detection and extends the use of large vision models to computer vision applications such as intelligent transportation systems. |
|---|---|
| DOI: | 10.1016/j.knosys.2025.114757 |