CMMMU

This repo contains the evaluation code for the paper "CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark".

Introduction

CMMMU includes 12k manually collected multimodal questions from college exams, quizzes, and textbooks. Like its companion benchmark, MMMU, it covers six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering. The questions span 30 subjects and comprise 39 highly heterogeneous image types, such as charts, diagrams, maps, tables, music sheets, and chemical structures.

Evaluation

Please refer to our eval folder for more details.

🏆 Mini-Leaderboard

Model                             Val (900)   Test (11K)
GPT-4V(ision) (Playground)        42.5        43.7
Qwen-VL-PLUS*                     39.5        36.8
Yi-VL-34B                         36.2        36.5
Yi-VL-6B                          35.8        35.0
InternVL-Chat-V1.1*               34.7        34.0
Qwen-VL-7B-Chat                   30.7        31.3
SPHINX-MoE*                       29.3        29.5
InternVL-Chat-ViT-6B-Vicuna-7B    26.4        26.7
InternVL-Chat-ViT-6B-Vicuna-13B   27.4        26.1
CogAgent-Chat                     24.6        23.6
Emu2-Chat                         23.8        24.5
Chinese-LLaVA                     25.5        23.4
VisCPM                            25.2        22.7
mPLUG-OWL2                        20.8        22.2
Frequent Choice                   24.1        26.0
Random Choice                     21.6        21.6

*: results provided by the authors.
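
The last two rows of the table are non-model baselines. As a rough illustration of what such baselines measure, the sketch below scores a "frequent choice" and a "random choice" strategy on toy multiple-choice data. It is not the code from the eval folder: the helper names (`accuracy`, `frequent_choice`, `random_choice`), the A-D option set, the toy answers, and the reading of "Frequent Choice" as "always answer with the most common gold label in a reference split" are all assumptions made for this example.

```python
import random
from collections import Counter


def accuracy(predictions, answers):
    """Return the percentage of predictions that exactly match the gold answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def frequent_choice(reference_answers, n):
    """Answer all n questions with the most common label in reference_answers."""
    most_common_label, _ = Counter(reference_answers).most_common(1)[0]
    return [most_common_label] * n


def random_choice(options, n, seed=0):
    """Answer each of n questions with a uniformly random option (seeded)."""
    rng = random.Random(seed)
    return [rng.choice(options) for _ in range(n)]


# Toy gold labels (hypothetical, not taken from CMMMU).
val_answers = ["A", "B", "B", "C", "B", "D"]
test_answers = ["B", "A", "B", "C"]
options = ["A", "B", "C", "D"]

freq_preds = frequent_choice(val_answers, len(test_answers))
rand_preds = random_choice(options, len(test_answers))

print(f"Frequent Choice: {accuracy(freq_preds, test_answers):.1f}")
print(f"Random Choice:   {accuracy(rand_preds, test_answers):.1f}")
```

Baselines of this kind give a floor for reading the table: a model scoring near them is doing little better than guessing. For the actual scoring rules, including any non-multiple-choice question types, the eval folder remains the reference.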

Disclaimers

The guidelines for the annotators emphasized strict compliance with copyright and licensing rules from the initial data source, specifically avoiding materials from websites that forbid copying and redistribution. Should you encounter any data samples potentially breaching the copyright or licensing regulations of any site, we encourage you to contact us. Upon verification, such samples will be promptly removed.

Contact

Citation

BibTeX:

@article{zhang2024cmmmu,
        title={CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark},
        author={Ge, Zhang and Xinrun, Du and Bei, Chen and Yiming, Liang and Tongxu, Luo and Tianyu, Zheng and Kang, Zhu and Yuyang, Cheng and Chunpu, Xu and Shuyue, Guo and Haoran, Zhang and Xingwei, Qu and Junjie, Wang and Ruibin, Yuan and Yizhi, Li and Zekun, Wang and Yudong, Liu and Yu-Hsuan, Tsai and Fengji, Zhang and Chenghua, Lin and Wenhao, Huang and Wenhu, Chen and Jie, Fu},
        journal={arXiv preprint arXiv:2401.11944},
        year={2024},
      }
