Leveraging large visual models for enhanced object detection: An improved SAM-YOLOv5 model
| Published in: | Knowledge-based systems Vol. 330; p. 114757 |
|---|---|
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 25.11.2025 |
| ISSN: | 0950-7051 |
| Summary: | Although many object detection methods have been developed, the accuracy of existing algorithms remains insufficient, particularly for small and distant objects. To address these challenges, we propose an improved object detection model, I-SAM-YOLOv5, which combines the strengths of a large vision model, the Segment Anything Model (SAM), with those of YOLOv5. The framework incorporates a large visual feature fusion (LVFF) module, in which the powerful visual features of SAM are integrated into YOLOv5 to improve feature representation. In addition, an enhanced fixed-resolution feature pyramid network (FRFPN) is employed to refine and strengthen feature extraction. Experimental results on the COCO and KITTI datasets demonstrate considerable improvements in detection accuracy across almost all model scales (n, s, m, l, and x). At scale n, our model achieves a significant 8.47% increase in mean average precision (mAP) on COCO and a 5.48% improvement on KITTI compared with the YOLOv5 baseline. To further assess the effectiveness of I-SAM-YOLOv5, we conduct ablation studies examining different LVFF variants, FRFPN designs, feature fusion positions, adapters, and multi-layer perceptron (MLP) configurations. The results confirm the robust performance gains of the proposed framework. This study advances object detection and extends the application of large vision models to computer vision tasks in domains such as intelligent transportation systems. |
|---|---|
| DOI: | 10.1016/j.knosys.2025.114757 |
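
The summary says that features from a large vision model are integrated into the detector's features, but this record does not describe how the LVFF module does so. The sketch below shows only one generic way such a fusion could work, under stated assumptions: the large-model feature map is resized to the detector's spatial size by nearest-neighbour sampling, projected to the detector's channel count with a 1x1 linear map, and added element-wise. The function names (`resize_nearest`, `project_1x1`, `fuse`), the additive fusion, and all numbers are hypothetical illustrations, not the authors' design.

```python
from typing import List

# A feature map as nested lists: channels x height x width.
Feature = List[List[List[float]]]


def resize_nearest(feat: Feature, out_h: int, out_w: int) -> Feature:
    """Resize every channel to (out_h, out_w) by nearest-neighbour sampling."""
    in_h, in_w = len(feat[0]), len(feat[0][0])
    return [
        [[ch[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
         for y in range(out_h)]
        for ch in feat
    ]


def project_1x1(feat: Feature, weight: List[List[float]]) -> Feature:
    """Apply a 1x1 projection; weight has shape out_channels x in_channels."""
    h, w = len(feat[0]), len(feat[0][0])
    return [
        [[sum(wk * feat[k][y][x] for k, wk in enumerate(row)) for x in range(w)]
         for y in range(h)]
        for row in weight
    ]


def fuse(det_feat: Feature, large_feat: Feature,
         weight: List[List[float]]) -> Feature:
    """Align large-model features to the detector map and add them (assumed)."""
    h, w = len(det_feat[0]), len(det_feat[0][0])
    aligned = project_1x1(resize_nearest(large_feat, h, w), weight)
    return [
        [[d + a for d, a in zip(d_row, a_row)]
         for d_row, a_row in zip(d_ch, a_ch)]
        for d_ch, a_ch in zip(det_feat, aligned)
    ]


# Toy inputs (made up): a 1-channel 2x2 detector map and a 2-channel 1x1
# large-model map, with a projection from 2 channels down to 1.
det = [[[1.0, 2.0], [3.0, 4.0]]]
large = [[[1.0]], [[2.0]]]
proj = [[0.5, 0.25]]
fused = fuse(det, large, proj)
```

In a real detector the projection weights would be learned, and the fusion could equally be a concatenation or an attention step; the choice of fusion position is one of the factors the summary lists among its ablation studies.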