Topic: mllm
Something interesting about mllm
mllm,Multimodal chatbot with computer vision capabilities integrated
Organization: 360cvgroup
mllm,[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
User: ahnsun
Home Page: https://ahnsun.github.io/merlin/
mllm,Composition of Multimodal Language Models From Scratch
User: alexander-moore
mllm,Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.
User: atfortes
mllm,Awesome_Multimodel is a curated GitHub repository that collects resources for Multimodal Large Language Models (MLLM). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundational models, and more. Stay updated with the latest advancements.
User: atomic-man007
mllm,EVE: Encoder-Free Vision-Language Models from BAAI
Organization: baaivision
mllm,A Video Chat Agent with Temporal Prior
Organization: bigai-nlco
mllm,✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
User: bradyfu
mllm,Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine
User: buaadreamer
mllm,Example code for fine-tuning multimodal large language models with LLaMA-Factory
User: buaadreamer
mllm,Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Organization: cambrian-mllm
Home Page: https://cambrian-mllm.github.io/
mllm,Large Visual Language Model (LVLM), Large Language Model (LLM), Multimodal Large Language Model (MLLM), Alignment, Agent, AI System, Survey
User: charliedddd
mllm,[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
User: circleradon
mllm,Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-training-like MLLM on RTX3090/4090 24GB.
User: coobiw
mllm,Code for the MultipanelVQA benchmark "Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA"
Organization: eric-ai-lab
Home Page: https://sites.google.com/view/multipanelvqa/home
mllm,[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection
Organization: foundationvision
mllm,[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
Organization: foundationvision
Home Page: https://groma-mllm.github.io/
mllm,Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
User: gokayfem
mllm,Official Repo of Graphist
Organization: graphic-design-ai
Home Page: https://arxiv.org/abs/2404.14368
mllm,InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Organization: internlm
mllm,Conducting learning and research on MLLM based on the MME rankings.
User: islinxu
mllm,Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Organization: microsoft
Home Page: https://aka.ms/GeneralAI
mllm,Datasets, case studies and benchmarks for extracting structured information from PDFs, HTML files or images, created by the Parsee.ai team. Datasets also on Hugging Face: https://huggingface.co/parsee-ai
Organization: parsee-ai
Home Page: https://parsee.ai
mllm,Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
Organization: showlab
Home Page: https://fingerrec.github.io/visincontext/
mllm,Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions
User: sterzhang
mllm,Official code for Paper "Mantis: Multi-Image Instruction Tuning"
Organization: tiger-ai-lab
Home Page: https://tiger-ai-lab.github.io/Mantis/
mllm,This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
Organization: ucsc-vlaa
mllm,Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
Organization: visualwebbench
Home Page: https://visualwebbench.github.io/
mllm,Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Organization: x-plug
Home Page: https://arxiv.org/abs/2406.01014
mllm,mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
Organization: x-plug
mllm,mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Organization: x-plug
mllm,mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigation
Organization: x-plug
mllm,Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Organization: x-plug
mllm,Awesome list for attacks on large language models.
User: xirui-li
mllm,MOSSBench: A webpage for an oversensitivity benchmark
User: xirui-li
Home Page: https://xirui-li.github.io/MOSSBench/
mllm,MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery
User: zzq2000
Home Page: https://arxiv.org/abs/2402.18169