Research on Improved Attentional Video Instance Segmentation Algorithm
| Published in: | International Conference on Industrial Mechatronics and Automation (Online), pp. 2128-2133 |
|---|---|
| Main authors: | , , , , |
| Format: | Conference paper |
| Language: | English |
| Publication details: | IEEE, 06.08.2023 |
| Subject: | |
| ISSN: | 2152-744X |
| Online access: | Get full text |
| Summary: | Video instance segmentation extends image instance segmentation by incorporating tracking, which helps address problems that arise in object detection, such as dense occlusion. Traditional Transformer-based methods, such as segmentation tracking, have several drawbacks, including insensitivity to small objects, slow convergence during training, and high complexity. To address these issues, we propose Deformable Relative Inter-frame Communication Transformers (DR_IFC), an enhanced attention-based video instance segmentation algorithm built on the efficient Inter-frame Communication Transformers (IFC) algorithm. By improving the attention module and the position encoding module, the proposed method improves segmentation and tracking performance while boosting training efficiency. Specifically, we develop a deformable attention module, which focuses on the salient regions of every feature map and improves the Transformer's memory and computational efficiency. Furthermore, to strengthen the model's representational capacity, we employ relative position encoding to explicitly model the positional relationship between any two tokens in the Transformer input sequence. The DR_IFC algorithm was evaluated on the YouTube-VIS dataset, which includes complex and diverse scenes. Experimental results show that the proposed method significantly improves segmentation and tracking performance. Without tuning any additional hyperparameters, such as learning rate and weight decay, DR_IFC achieved consistent mAP gains of up to 4.43% over the original IFC algorithm, demonstrating the usefulness of the improved attention module and position encoding module. |
|---|---|
| DOI: | 10.1109/ICMA57826.2023.10215956 |
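
The summary mentions relative position encoding, which models the positional relationship between any two tokens. The abstract does not give the paper's formulation, so the following is only a minimal sketch of one common variant: a scalar bias, looked up by the clipped offset between two tokens, is added to each attention score. The function names, the `rel_bias` table, and `max_dist` are illustrative assumptions, not taken from DR_IFC.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def relative_attention(q, k, v, rel_bias, max_dist):
    """Single-head attention with a scalar bias per clipped relative offset.

    q, k, v: lists of n vectors (lists of floats).
    rel_bias: 2 * max_dist + 1 floats; entry [o + max_dist] is the bias
        for offset o = j - i (hypothetical table; learned in a real model).
    """
    n, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i in range(n):
        scores = []
        for j in range(n):
            # The bias depends only on how far apart tokens i and j are.
            offset = max(-max_dist, min(max_dist, j - i))
            dot = sum(a * b for a, b in zip(q[i], k[j]))
            scores.append(dot * scale + rel_bias[offset + max_dist])
        w = softmax(scores)
        out.append([sum(w[j] * v[j][c] for j in range(n))
                    for c in range(len(v[0]))])
    return out


# Identical token contents, so only the position bias can change the output.
tokens = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
values = [[1.0], [2.0], [3.0]]
no_bias = [0.0, 0.0, 0.0, 0.0, 0.0]      # offsets -2..2
left_bias = [0.0, 5.0, 0.0, 0.0, 0.0]    # favors the token at offset -1

out_plain = relative_attention(tokens, tokens, values, no_bias, max_dist=2)
out_left = relative_attention(tokens, tokens, values, left_bias, max_dist=2)
print(out_plain)
print(out_left)
```

With a zero bias table, every token attends uniformly. With the biased table, the middle token attends mostly to its left neighbor even though all token contents are the same, which is the sense in which the pairwise position relationship is made explicit.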