AToM: Adaptive Token Merging for Efficient Acceleration of Vision Transformer
Saved in:
| Published in: | IEEE Transactions on Computers, Vol. 74, No. 5, pp. 1620 - 1633 |
|---|---|
| Main authors: | , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | IEEE, 01.05.2025 |
| Subjects: | |
| ISSN: | 0018-9340, 1557-9956 |
| Online access: | Full text |
| Abstract: | Recently, Vision Transformers (ViTs) have set a new standard in computer vision (CV), showing unparalleled image processing performance. However, their substantial computational requirements hinder practical deployment, especially on resource-limited devices common in CV applications. Token merging has emerged as a solution, condensing tokens with similar features to cut computational and memory demands. Yet, existing applications on ViTs often miss the mark in token compression, with rigid merging strategies and a lack of in-depth analysis of ViT merging characteristics. To overcome these issues, this paper introduces Adaptive Token Merging (AToM), a comprehensive algorithm-architecture co-design for accelerating ViTs. The AToM algorithm employs an image-adaptive, fine-grained merging strategy, significantly boosting computational efficiency. We also optimize the merging and unmerging processes to minimize overhead, employing techniques like First-Come-First-Merge mapping and Linear Distance Calculation. On the hardware side, the AToM architecture is tailor-made to exploit the AToM algorithm's benefits, with specialized engines for efficient merge and unmerge operations. Our pipeline architecture ensures end-to-end ViT processing, minimizing latency and memory overhead from the AToM algorithm. Across various hardware platforms including CPU, EdgeGPU, and GPU, AToM achieves average end-to-end speedups of 10.9×, 7.7×, and 5.4×, alongside energy savings of 24.9×, 1.8×, and 16.7×. Moreover, AToM offers 1.2×-1.9× higher effective throughput compared to existing transformer accelerators. |
|---|---|
| ISSN: | 0018-9340, 1557-9956 |
| DOI: | 10.1109/TC.2025.3540638 |
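
To make the token-merging idea in the abstract concrete, here is a minimal, generic sketch of merging tokens with similar features. This is an illustration of the general technique only, not the AToM algorithm: the greedy pairing and averaging rule below are assumptions, and `merge_similar_tokens` is a hypothetical helper, unrelated to the paper's First-Come-First-Merge mapping or Linear Distance Calculation.

```python
import torch

def merge_similar_tokens(x: torch.Tensor, r: int) -> torch.Tensor:
    """Generic token-merging sketch: average the r most similar token pairs.

    x: (N, D) token matrix for one image. NOT the AToM algorithm; the paper's
    image-adaptive, fine-grained strategy is only described at a high level here.
    """
    # Cosine similarity between all token pairs.
    xn = x / x.norm(dim=-1, keepdim=True)
    sim = xn @ xn.T
    sim.fill_diagonal_(-float("inf"))  # ignore self-similarity

    merged = x.clone()
    alive = torch.ones(x.shape[0], dtype=torch.bool)
    for _ in range(r):
        # Greedily pick the currently most similar surviving pair (i, j).
        i, j = divmod(sim.argmax().item(), sim.shape[1])
        merged[i] = (merged[i] + merged[j]) / 2  # merge by averaging (assumed rule)
        alive[j] = False
        sim[j, :] = -float("inf")  # retire token j from future pairings
        sim[:, j] = -float("inf")
    return merged[alive]  # N - r tokens remain

# Example: halve the 196 patch tokens (plus class token) of a ViT-S/16 input.
tokens = torch.randn(197, 384)
reduced = merge_similar_tokens(tokens, r=98)  # -> (99, 384)
```

Reducing the token count this way shrinks the quadratic attention cost in every subsequent layer, which is the source of the speedups the abstract reports; an unmerge step (not sketched) would restore per-token outputs for tasks that need them.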