Research on Improved Attentional Video Instance Segmentation Algorithm

Detailed bibliography
Published in:	International Conference on Industrial Mechatronics and Automation (Online), pp. 2128-2133
Main authors:	Zhang, Wen; Mei, Konghao; Sun, Zhexuan; Chen, Guangkun; Lv, Shengrong
Format:	Conference paper
Language:	English
Published:	IEEE, 6 August 2023
Subjects:	Deformable attention; Image coding; Image segmentation; Measurement; Mechatronics; Object detection; Relative position encoding; Training; Transformer; Transformers; Video instance segmentation
ISSN:	2152-744X

Description
Abstract:	Video instance segmentation extends image instance segmentation by incorporating tracking to address problems that arise in object detection, such as dense occlusion. Traditional Transformer-based methods, such as segmentation tracking, have several drawbacks, including insensitivity to small objects, slow convergence during training, and high complexity. To address these issues, we propose Deformable Relative Inter-frame Communication Transformers (DR_IFC), an improved attention-based video instance segmentation algorithm built on the efficient Inter-frame Communication Transformers (IFC) algorithm. By improving the attention module and the position encoding module, the proposed method improves segmentation and tracking performance while also improving training efficiency. Specifically, we develop a deformable attention module that focuses on the salient elements across all feature map components and improves the Transformer's memory and computational efficiency. Furthermore, to strengthen the model's representational capacity, we employ relative position encoding to explicitly encode the positional relationship between any two tokens in the Transformer input sequence. The DR_IFC algorithm was evaluated on the YouTube-VIS dataset, which includes complex and diverse scenes. Experimental results show that the proposed method significantly enhances segmentation and tracking performance. Without adjusting any additional hyperparameters, such as learning rate and weight decay, DR_IFC achieved consistent mAP gains of up to 4.43% over the original IFC algorithm, demonstrating the usefulness of the improved attention module and position encoding module.
DOI:	10.1109/ICMA57826.2023.10215956
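
The abstract says that relative position encoding explicitly characterizes the positional relationship between any two tokens in the Transformer input sequence. The record does not give the formulation used in DR_IFC, so the following is only a minimal sketch of one common variant, an additive bias looked up by the offset between two token positions. Every name here (`relative_index`, `attention_with_relative_bias`, `bias_table`, `max_distance`) is hypothetical, and the bias values are hand-picked for illustration rather than learned.

```python
import math


def relative_index(i, j, max_distance):
    """Map the offset j - i to a bias-table index, clipping long offsets."""
    offset = max(-max_distance, min(max_distance, j - i))
    return offset + max_distance


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_relative_bias(queries, keys, values, bias_table, max_distance):
    """Scaled dot-product attention with an additive relative position bias.

    Assumed (illustrative) scoring rule:
        score(i, j) = q_i . k_j / sqrt(d) + bias_table[clip(j - i)]
    """
    d = len(queries[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            content = sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            scores.append(content + bias_table[relative_index(i, j, max_distance)])
        weights = softmax(scores)
        # Weighted sum of value vectors, one output channel at a time.
        out = [sum(w * v[c] for w, v in zip(weights, values))
               for c in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy input: four 2-dimensional tokens used as queries, keys, and values.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
max_distance = 2
# One bias per clipped offset -2, -1, 0, +1, +2 (hand-picked, not learned).
bias_table = [-1.0, -0.5, 0.0, -0.5, -1.0]

outputs, weights = attention_with_relative_bias(
    tokens, tokens, tokens, bias_table, max_distance
)
```

Because the bias depends only on the offset `j - i`, the same table entry is shared by every pair of tokens at that distance, which is what makes the encoding relative instead of absolute. In a trained model the table would be a learned parameter, typically one per attention head.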