MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
| Published in: | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) pp. 19541 - 19551 |
|---|---|
| Main Authors: | Wang, Peijie; Li, Zhong-Zhi; Yin, Fei; Ran, Dekang; Liu, Cheng-Lin |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 10.06.2025 |
| Subjects: | |
| ISSN: | 1063-6919 |
| Abstract | Multimodal Large Language Models (MLLMs) have shown promising capabilities in mathematical reasoning within visual contexts across various datasets. However, most existing multimodal math benchmarks are limited to single-visual contexts, which diverges from the multi-visual scenarios commonly encountered in real-world mathematical applications. To address this gap, we introduce MV-MATH: a meticulously curated dataset of 2,009 high-quality mathematical problems. Each problem integrates multiple images interleaved with text, derived from authentic K-12 scenarios, and enriched with detailed annotations. MV-MATH includes multiple-choice, free-form, and multi-step questions, covering 11 subject areas across 3 difficulty levels, and serves as a comprehensive and rigorous benchmark for assessing MLLMs' mathematical reasoning in multi-visual contexts. Through extensive experimentation, we observe that MLLMs encounter substantial challenges in multi-visual math tasks, with a considerable performance gap relative to human capabilities on MV-MATH. Furthermore, we analyze the performance and error patterns of various models, providing insights into MLLMs' mathematical reasoning capabilities within multi-visual settings. The data and code: https://eternal8080.github.io/MV-MATH.github.io/. |
|---|---|
| Author | Liu, Cheng-Lin; Ran, Dekang; Wang, Peijie; Li, Zhong-Zhi; Yin, Fei |
| Author_xml | – sequence: 1 givenname: Peijie surname: Wang fullname: Wang, Peijie email: wangpeijie2023@ia.ac.cn organization: MAIS, Institute of Automation of Chinese Academy of Sciences – sequence: 2 givenname: Zhong-Zhi surname: Li fullname: Li, Zhong-Zhi email: lizhongzhi2022@ia.ac.cn organization: MAIS, Institute of Automation of Chinese Academy of Sciences – sequence: 3 givenname: Fei surname: Yin fullname: Yin, Fei email: fyin@nlpr.ia.ac.cn organization: MAIS, Institute of Automation of Chinese Academy of Sciences – sequence: 4 givenname: Dekang surname: Ran fullname: Ran, Dekang email: randekang2025@ia.ac.cn organization: MAIS, Institute of Automation of Chinese Academy of Sciences – sequence: 5 givenname: Cheng-Lin surname: Liu fullname: Liu, Cheng-Lin email: liucl@nlpr.ia.ac.cn organization: MAIS, Institute of Automation of Chinese Academy of Sciences |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IH CBEJK RIE RIO |
| DOI | 10.1109/CVPR52734.2025.01820 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP) 1998-present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Applied Sciences |
| EISBN | 9798331543648 |
| EISSN | 1063-6919 |
| EndPage | 19551 |
| ExternalDocumentID | 11092597 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: National Natural Science Foundation of China funderid: 10.13039/501100001809 |
| GroupedDBID | 6IE 6IH 6IL 6IN AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IJVOP OCL RIE RIL RIO |
| ID | FETCH-LOGICAL-i93t-1bff8ac899fef37ddf7a49580a1c09e3ea2b5862c994e47227d4cf54e3ecdf223 |
| IEDL.DBID | RIE |
| IngestDate | Wed Aug 20 06:20:57 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i93t-1bff8ac899fef37ddf7a49580a1c09e3ea2b5862c994e47227d4cf54e3ecdf223 |
| PageCount | 11 |
| ParticipantIDs | ieee_primary_11092597 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-June-10 |
| PublicationDateYYYYMMDD | 2025-06-10 |
| PublicationDate_xml | – month: 06 year: 2025 text: 2025-June-10 day: 10 |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) |
| PublicationTitleAbbrev | CVPR |
| PublicationYear | 2025 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0003211698 |
| Score | 2.3075953 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 19541 |
| SubjectTerms | Benchmark testing; Codes; Cognition; Computational modeling; Computer vision; Large language models; math reasoning; Mathematical models; multi-image reasoning; multimodal reasoning; Pattern recognition; Systematics; Visualization |
| Title | MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts |
| URI | https://ieeexplore.ieee.org/document/11092597 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |