yuweihao / mm-vet
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
Home Page: https://arxiv.org/abs/2308.02490
License: Apache License 2.0
Thanks for the wonderful evaluation work!
However, the Otter evaluation seems to be based on our early version (around May 2023). Since last month we have had a stronger MPT7B-based version of Otter, built on OpenFlamingoV2. Would you consider evaluating it instead of the LLaMA7B version?
Both OpenFlamingoV2 and Otter-MPT7B perform much better than the previous versions. Please consider using them to update the evaluation results.
We also have a video version of Otter. Would you like to add it for further VideoLLM evaluation?
Otter-Image: https://huggingface.co/luodian/OTTER-Image-MPT7B
Otter-Video: https://huggingface.co/luodian/OTTER-Video-LLaMA7B-DenseCaption
Hi there,
Congrats on your work. I discovered it through the paper page: https://huggingface.co/papers/2408.00765. I see the paper already has a linked Space; it would be great to also make the benchmark available as a 🤗 Dataset, so that people can load the data in 2 lines of code:
from datasets import load_dataset
dataset = load_dataset("your-hf-organization/mm-vet")
This also comes with a dataset viewer, enabling people to see the first few rows in the browser.
See for instance these multimodal datasets as examples:
This could perhaps be made available as part of the National University of Singapore organization: https://huggingface.co/NationalUniversityofSingapore.
Let me know if you need any help regarding this!
Cheers,
Niels
Open-source @ HF
Would you mind sharing your GPT-4V inference and evaluation scripts? I would really appreciate your help.
I tried it myself and only achieved a score of 61.8, far from 67.7.
       rec   ocr   know  gen   spat  math  total  std  runs
gpt4v  54.8  74.1  39.6  42.9  74.7  75.2  61.8   0.0  [61.8]

       know_gen_rec  rec   spat_ocr  math_spat_ocr  spat_rec  ocr   math_ocr  know_rec  know_ocr_gen_rec  ...
gpt4v  33.9          81.1  79.6      92.1           66.7      73.3  59.0      27.8      71.2              ...

       ...  spat_ocr_rec  ocr_rec  know_spat_ocr  spat_know_rec  spat_ocr_gen  math_spat_ocr_rec  total  std  runs
gpt4v  ...  42.9          100.0    66.7           100.0          85.0          0.0                61.8   0.0  [61.8]
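The score tables above list per-capability averages, an overall total, a standard deviation, and the per-run totals. Below is a minimal sketch of how numbers of this shape can be aggregated. It assumes MM-Vet-style scoring, where a GPT-4 grader gives each sample a score between 0 and 1. The sample data and the helper names (`capability_scores`, `integration_scores`, `summarize_runs`) are hypothetical, and this is not the repository's evaluation script.

```python
from statistics import mean, pstdev

# Hypothetical per-sample grading results: each sample carries the set of
# capabilities it requires and a grader score in [0, 1].
# These values are invented for illustration and are not MM-Vet data.
samples = [
    {"caps": {"rec", "know", "gen"}, "score": 0.5},
    {"caps": {"rec"}, "score": 1.0},
    {"caps": {"ocr", "spat"}, "score": 0.5},
    {"caps": {"ocr", "math"}, "score": 0.0},
]


def capability_scores(samples):
    """Per-capability score: mean score (x100) over every sample needing that capability."""
    caps = sorted({c for s in samples for c in s["caps"]})
    scores = {
        c: round(100 * mean(s["score"] for s in samples if c in s["caps"]), 1)
        for c in caps
    }
    # The total averages over all samples, not over the capability columns.
    scores["total"] = round(100 * mean(s["score"] for s in samples), 1)
    return scores


def integration_scores(samples):
    """Per-combination score: samples grouped by their exact capability set."""
    groups = {}
    for s in samples:
        # Alphabetical key such as "gen_know_rec"; the real column names may
        # order the tags differently (e.g. "know_gen_rec").
        key = "_".join(sorted(s["caps"]))
        groups.setdefault(key, []).append(s["score"])
    return {k: round(100 * mean(v), 1) for k, v in groups.items()}


def summarize_runs(run_totals):
    """Mean and population std of the total score over repeated grading runs."""
    return round(mean(run_totals), 1), round(pstdev(run_totals), 1)


print(capability_scores(samples))
print(integration_scores(samples))
print(summarize_runs([61.8]))
```

Using the population standard deviation (`pstdev`) here is an assumption. It makes a single run report a std of 0.0, which matches the `std 0.0` / `runs [61.8]` pattern in the tables above. The sample standard deviation (`stdev`) would raise an error on a single run.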
How do you evaluate BARD on your dataset? Do you input the questions manually, or do you have an API you access? It would be much appreciated if you could share your BARD inference scripts, thanks!
Hi,
For all the results, do you evaluate the respective models zero-shot, or do you fine-tune them on your dataset?
Hi!
The v2 leaderboard link, which points to paperswithcode.com, returns a 404.
Hi, thanks very much for the contribution! However, the Hugging Face Space seems to be down. Could you please check? Thanks!