Topic: mllm

Something interesting about mllm

👇 Here are 40 public repositories matching this topic...

360cvgroup / seechat

mllm,Multimodal chatbot with computer vision capabilities integrated

Organization: 360cvgroup

chatbot gpt4 mllm

ahnsun / merlin

mllm,[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds

User: ahnsun

Home Page: https://ahnsun.github.io/merlin/

mllm

alexander-moore / vlm

mllm,Composition of Multimodal Language Models From Scratch

User: alexander-moore

ai llm machine-learning mllm multimodal-large-language-models vision-language-model vlm mmllm

atfortes / awesome-llm-reasoning

mllm,Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.

User: atfortes

language-models reasoning prompt question-answering in-context-learning chatgpt chain-of-thought prompt-engineering cot awesome

atomic-man007 / awesome_multimodel_llm

mllm,Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLM). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundational models, and more. Stay updated with the latest advancements.

User: atomic-man007

chatgpt dataset gpt llm mllm multimodel nlp pretrained-models

baai-dcai / bunny

mllm,A family of lightweight multimodal models.

User: baai-dcai

mllm chatgpt gpt-4 multimodal-large-language-models vlm chinese english

baai-dcai / dataoptim

mllm,A collection of visual instruction tuning datasets.

User: baai-dcai

llm mllm visual-instruction-tuning

baaivision / eve

mllm,EVE: Encoder-Free Vision-Language Models from BAAI

Organization: baaivision

instruction-following large-language-models mllm multimodal-large-language-models vlm encoder-free-vlm llm clip vision-language-models

bigai-nlco / lstp-chat

mllm,A Video Chat Agent with Temporal Prior

Organization: bigai-nlco

llm mllm multimodal-large-language-models spatial-temporal video-language visual-instruction-tuning

bradyfu / woodpecker

mllm,✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.

User: bradyfu

hallucination hallucinations large-language-models llm mllm multimodal-large-language-models multimodality

buaadreamer / chinese-llava-med

mllm,Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine

User: buaadreamer

llava medical mllm multimodal chinese qwen1-5 ai gpt4v huggingface-datasets minigpt4

buaadreamer / mllm-finetuning-demo

mllm,Demo of Finetuning Multimodal LLM with LLaMA-Factory (example code for fine-tuning multimodal large language models)

User: buaadreamer

llama-factory llava mllm paligemma yi-vl finetune-llm lora huggingface-datasets transformers pretraining

cambrian-mllm / cambrian

mllm,Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Organization: cambrian-mllm

Home Page: https://cambrian-mllm.github.io/

chatbot clip computer-vision dino instruction-tuning large-language-models llms mllm multimodal-large-language-models representation-learning

charliedddd / aisurveypapers

mllm,Large Visual Language Model (LVLM), Large Language Model (LLM), Multimodal Large Language Model (MLLM), Alignment, Agent, AI System, Survey

User: charliedddd

agent agi ai-system llm mllm survey lvlm alignment

circleradon / osprey

mllm,[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

User: circleradon

mllm sam visual-instruction-tuning pixel-understanding

coobiw / minigpt4qwen

mllm,Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-training-like MLLM on RTX3090/4090 24GB.

User: coobiw

multimodal-large-language-models deepspeed model-parallel pipeline-parallelism mllm qwen fine-tuning pretraining video-language-model video-large-language-models

eric-ai-lab / multipanelvqa

mllm,Code for the MultipanelVQA benchmark "Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA"

Organization: eric-ai-lab

Home Page: https://sites.google.com/view/multipanelvqa/home

mllm vlm vqa multipanel-understanding screen-ai

foundationvision / generateu

mllm,[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection

Organization: foundationvision

mllm multimodality object-detection open-vocabulary open-vocabulary-detection open-world

foundationvision / groma

mllm,[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

Organization: foundationvision

Home Page: https://groma-mllm.github.io/

grounding llm mllm large-language-models foundation-models llama llama2 multimodal vision-language-model

gokayfem / comfyui_vlm_nodes

mllm,Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation

User: gokayfem

nodes comfyui custom-nodes llava llm siglip phi15 img2text joytag image-captioning

graphic-design-ai / graphist

mllm,Official Repo of Graphist

Organization: graphic-design-ai

Home Page: https://arxiv.org/abs/2404.14368

graphic-design hlg layout-generation llm lmm mllm

internlm / internlm-xcomposer

mllm,InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Organization: internlm

chatgpt visual-language-learning multi-modality foundation gpt-4 instruction-tuning mllm multimodal vision-language-model language-model

islinxu / mllm-research-learn

mllm,Conducting learning and research on MLLM based on the MME rankings.

User: islinxu

mllm

kwaivgi / uniaa

mllm,Unified Multi-modal IAA Baseline and Benchmark

Organization: kwaivgi

image-aesthetic-assessment benchmark dataset llava mllm

microsoft / unilm

mllm,Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Organization: microsoft

Home Page: https://aka.ms/GeneralAI

nlp pre-trained-model unilm minilm layoutlm layoutxlm beit document-ai trocr beit-3

parsee-ai / parsee-datasets

mllm,Datasets, case studies and benchmarks for extracting structured information from PDFs, HTML files or images, created by the Parsee.ai team. Datasets also on Hugging Face: https://huggingface.co/parsee-ai

Organization: parsee-ai

Home Page: https://parsee.ai

datasets llm mllm rag

showlab / visincontext

mllm,Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning

Organization: showlab

Home Page: https://fingerrec.github.io/visincontext/

efficient in-context-learning llm mllm

sterzhang / image-textualization

mllm,Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions

User: sterzhang

dense-captioning mllm text-image

tidedra / vl-rlhf

mllm,An RLHF Infrastructure for Vision-Language Models

User: tidedra

dpo llm lmm mllm rlhf vlm

tiger-ai-lab / mantis

mllm,Official code for Paper "Mantis: Multi-Image Instruction Tuning"

Organization: tiger-ai-lab

Home Page: https://tiger-ai-lab.github.io/Mantis/

language vision fuyu llava-llama3 lmm mantis mllm video vlm multi-image-understanding

ucsc-vlaa / sight-beyond-text

mllm,This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"

Organization: ucsc-vlaa

llama2 llava llm mllm vicuna vision-language ai-alignment alignment vlm