Adventurer: Optimizing Vision Mamba Architecture Designs for Efficiency

In this work, we introduce the Adventurer series models where we treat images as sequences of patch tokens and employ uni-directional language models to learn visual representations. This modeling paradigm allows us to process images in a recurrent formulation with linear complexity relative to the...

Bibliographic Details
Published in: Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), Vol. 2025, pp. 30157-30166
Main Authors: Wang, Feng, Yang, Timing, Yu, Yaodong, Ren, Sucheng, Wei, Guoyizhe, Wang, Angtian, Shao, Wei, Zhou, Yuyin, Yuille, Alan, Xie, Cihang
Format: Conference Proceeding; Journal Article
Language: English
Published: United States, IEEE, 01.06.2025
ISSN: 1063-6919