AToM: Adaptive Token Merging for Efficient Acceleration of Vision Transformer
Saved in:
| Published in: | IEEE Transactions on Computers, Vol. 74, No. 5, pp. 1620 - 1633 |
|---|---|
| Main authors: | , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | IEEE, 01.05.2025 |
| Subjects: | |
| ISSN: | 0018-9340, 1557-9956 |
| Online access: | Full text |
| Abstract: | Recently, Vision Transformers (ViTs) have set a new standard in computer vision (CV), showing unparalleled image processing performance. However, their substantial computational requirements hinder practical deployment, especially on resource-limited devices common in CV applications. Token merging has emerged as a solution, condensing tokens with similar features to cut computational and memory demands. Yet, existing applications on ViTs often miss the mark in token compression, with rigid merging strategies and a lack of in-depth analysis of ViT merging characteristics. To overcome these issues, this paper introduces Adaptive Token Merging (AToM), a comprehensive algorithm-architecture co-design for accelerating ViTs. The AToM algorithm employs an image-adaptive, fine-grained merging strategy, significantly boosting computational efficiency. We also optimize the merging and unmerging processes to minimize overhead, employing techniques like First-Come-First-Merge mapping and Linear Distance Calculation. On the hardware side, the AToM architecture is tailor-made to exploit the AToM algorithm's benefits, with specialized engines for efficient merge and unmerge operations. Our pipeline architecture ensures end-to-end ViT processing, minimizing latency and memory overhead from the AToM algorithm. Across various hardware platforms including CPU, EdgeGPU, and GPU, AToM achieves average end-to-end speedups of 10.9×, 7.7×, and 5.4×, alongside energy savings of 24.9×, 1.8×, and 16.7×. Moreover, AToM offers 1.2×-1.9× higher effective throughput compared to existing transformer accelerators. |
|---|---|
| ISSN: | 0018-9340, 1557-9956 |
| DOI: | 10.1109/TC.2025.3540638 |
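
To make the token-merging idea in the abstract concrete, here is a minimal, generic sketch of merging tokens with similar features. This is an illustration of the general technique only, not the AToM algorithm: the greedy pairing and averaging rule below are assumptions, and `merge_similar_tokens` is a hypothetical helper, unrelated to the paper's First-Come-First-Merge mapping or Linear Distance Calculation.

```python
import torch

def merge_similar_tokens(x: torch.Tensor, r: int) -> torch.Tensor:
    """Generic token-merging sketch: average the r most similar token pairs.

    x: (N, D) token matrix for one image. NOT the AToM algorithm; the paper's
    image-adaptive, fine-grained strategy is only described at a high level here.
    """
    # Cosine similarity between all token pairs.
    xn = x / x.norm(dim=-1, keepdim=True)
    sim = xn @ xn.T
    sim.fill_diagonal_(-float("inf"))  # ignore self-similarity

    merged = x.clone()
    alive = torch.ones(x.shape[0], dtype=torch.bool)
    for _ in range(r):
        # Greedily pick the currently most similar surviving pair (i, j).
        i, j = divmod(sim.argmax().item(), sim.shape[1])
        merged[i] = (merged[i] + merged[j]) / 2  # merge by averaging (assumed rule)
        alive[j] = False
        sim[j, :] = -float("inf")  # retire token j from future pairings
        sim[:, j] = -float("inf")
    return merged[alive]  # N - r tokens remain

# Example: halve the 196 patch tokens (plus class token) of a ViT-S/16 input.
tokens = torch.randn(197, 384)
reduced = merge_similar_tokens(tokens, r=98)  # -> (99, 384)
```

Reducing the token count this way shrinks the quadratic attention cost in every subsequent layer, which is the source of the speedups the abstract reports; an unmerge step (not sketched) would restore per-token outputs for tasks that need them.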