MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts

Bibliographic Details
Published in: Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 19541-19551
Main Authors: Wang, Peijie; Li, Zhong-Zhi; Yin, Fei; Ran, Dekang; Liu, Cheng-Lin
Format: Conference Proceeding
Language: English
Published: IEEE, 10 June 2025
ISSN: 1063-6919
Abstract: Multimodal Large Language Models (MLLMs) have shown promising capabilities in mathematical reasoning within visual contexts across various datasets. However, most existing multimodal math benchmarks are limited to single-visual contexts, which diverges from the multi-visual scenarios commonly encountered in real-world mathematical applications. To address this gap, we introduce MV-MATH: a meticulously curated dataset of 2,009 high-quality mathematical problems. Each problem integrates multiple images interleaved with text, derived from authentic K-12 scenarios, and enriched with detailed annotations. MV-MATH includes multiple-choice, free-form, and multi-step questions, covering 11 subject areas across 3 difficulty levels, and serves as a comprehensive and rigorous benchmark for assessing MLLMs' mathematical reasoning in multi-visual contexts. Through extensive experimentation, we observe that MLLMs encounter substantial challenges in multi-visual math tasks, with a considerable performance gap relative to human capabilities on MV-MATH. Furthermore, we analyze the performance and error patterns of various models, providing insights into MLLMs' mathematical reasoning capabilities within multi-visual settings. The data and code are available at https://eternal8080.github.io/MV-MATH.github.io/.
Author Details:
  1. Wang, Peijie (wangpeijie2023@ia.ac.cn), MAIS, Institute of Automation of Chinese Academy of Sciences
  2. Li, Zhong-Zhi (lizhongzhi2022@ia.ac.cn), MAIS, Institute of Automation of Chinese Academy of Sciences
  3. Yin, Fei (fyin@nlpr.ia.ac.cn), MAIS, Institute of Automation of Chinese Academy of Sciences
  4. Ran, Dekang (randekang2025@ia.ac.cn), MAIS, Institute of Automation of Chinese Academy of Sciences
  5. Liu, Cheng-Lin (liucl@nlpr.ia.ac.cn), MAIS, Institute of Automation of Chinese Academy of Sciences
CODEN: IEEPAD
DOI: 10.1109/CVPR52734.2025.01820
Discipline: Applied Sciences
EISBN: 9798331543648
EISSN: 1063-6919
Funding: National Natural Science Foundation of China (funder ID: 10.13039/501100001809)
Publication Title Abbreviation: CVPR
Subject Terms: Benchmark testing; Codes; Cognition; Computational modeling; Computer vision; Large language models; math reasoning; Mathematical models; multi-image reasoning; multimodal reasoning; Pattern recognition; Systematics; Visualization
URI: https://ieeexplore.ieee.org/document/11092597