Research on Improved Attentional Video Instance Segmentation Algorithm

Bibliographic details
Published in:	International Conference on Industrial Mechatronics and Automation (Online), pp. 2128-2133
Main authors:	Zhang, Wen; Mei, Konghao; Sun, Zhexuan; Chen, Guangkun; Lv, Shengrong
Format:	Conference paper
Language:	English
Publication details:	IEEE, 06.08.2023
Subjects:	Deformable attention; Image coding; Image segmentation; Measurement; Mechatronics; Object detection; Relative position encoding; Training; Transformer; Transformers; Video instance segmentation
ISSN:	2152-744X

Description
Summary:	Video instance segmentation extends image instance segmentation by incorporating tracking to address problems that arise in object detection tasks, such as dense occlusion. Traditional Transformer-based methods, such as segmentation tracking, have several drawbacks, including insensitivity to small objects, slow convergence during training, and high complexity. To address these issues, we propose Deformable Relative Inter-frame Communication Transformers (DR_IFC), an enhanced attentional video instance segmentation algorithm based on the efficient video instance segmentation algorithm Inter-frame Communication Transformers (IFC). By improving the attention module and the position encoding module of the video instance segmentation algorithm, the proposed method improves segmentation and tracking performance while boosting training efficiency. Specifically, we develop a deformable attention module, which focuses on the key elements highlighted across all feature map components and improves the Transformer's memory and computational efficiency. Furthermore, to strengthen the model's representational capacity, we employ relative position encoding to explicitly characterize the positional relationship between any two tokens in the Transformer input sequence. The DR_IFC algorithm was evaluated on the YouTube-VIS dataset, which includes complex and diverse scenes. Experimental results demonstrate that the proposed method significantly enhances segmentation and tracking performance. Without adjusting any additional hyperparameters, such as learning rate and weight decay, DR_IFC achieved consistent gains in mAP of up to 4.43% over the original IFC algorithm, demonstrating the usefulness of the upgraded attention module and position encoding module.
DOI:	10.1109/ICMA57826.2023.10215956
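
The summary says that relative position encoding explicitly characterizes the positional relationship between any two tokens. As a rough illustration only, the sketch below shows one common way to do this: a learned bias, indexed by the clipped offset between two token positions, is added to the attention score. This is not the DR_IFC implementation. The function name `relative_position_attention`, the `rel_bias` table, and the `max_distance` clipping are all hypothetical choices made for this example, and the record does not state which relative encoding variant the paper uses.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def relative_position_attention(queries, keys, values, rel_bias, max_distance):
    """Single-head attention with an additive relative position bias.

    Assumption for illustration: rel_bias holds 2 * max_distance + 1 scalars,
    one per clipped offset (j - i) in [-max_distance, max_distance].
    Returns (outputs, attention_weights).
    """
    n = len(queries)
    scale = 1.0 / math.sqrt(len(queries[0]))
    weights = []
    for i in range(n):
        scores = []
        for j in range(n):
            content = sum(q * k for q, k in zip(queries[i], keys[j])) * scale
            # The bias depends only on how far apart tokens i and j are.
            offset = max(-max_distance, min(max_distance, j - i))
            scores.append(content + rel_bias[offset + max_distance])
        weights.append(softmax(scores))
    width = len(values[0])
    outputs = [
        [sum(weights[i][j] * values[j][c] for j in range(n)) for c in range(width)]
        for i in range(n)
    ]
    return outputs, weights


if __name__ == "__main__":
    tokens = [[0.0, 0.0]] * 3            # identical content, so only position matters
    vals = [[1.0], [2.0], [3.0]]
    bias = [-1.0, 0.0, 2.0, 0.0, -1.0]   # hypothetical table favouring offset 0
    _, attn = relative_position_attention(tokens, tokens, vals, bias, 2)
    for row in attn:
        print([round(w, 3) for w in row])
```

Because the bias is looked up by offset rather than by absolute index, two token pairs separated by the same distance receive the same positional term wherever they sit in the sequence.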