dirtyharrylyl / llm-in-vision

Recent LLM-based CV and related works. Welcome to comment/contribute!

llm-in-vision's Issues

Please help me change the name of my paper

Can you help me update the name of my paper:

From

(arXiv 2023.6) Aligning Large Multi-Modal Model with Robust Instruction Tuning.

To

(arXiv 2023.6) Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning.

Thanks!

Please add this paper~

(arXiv 2024.3) MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control, [Paper], [Project]

Please add these papers

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
VITRON: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
GLaMM: Pixel Grounding Large Multimodal Model
Planting a SEED of Vision in Large Language Model

Please add this paper https://arxiv.org/pdf/2312.17240.pdf

Please add this paper

EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE

Any plans to turn the repo into an "awesome" series list?

Such as https://github.com/zhimin-z/awesome-awesome-machine-learning

Add this paper please

Efficient Multimodal Learning from Data-centric Perspective

Add these papers please

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training (https://arxiv.org/pdf/2403.09611.pdf)
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models (https://arxiv.org/pdf/2403.18814.pdf)

Paper proposal: "DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback"

Hi, thanks for your great effort in putting this list together. Could you please add our recent paper on large vision-language models to the list? The title is: DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback. The arXiv link. Thanks!

Please add this paper

AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents

Add these papers please

Lumos: Empowering Multimodal LLMs with Scene Text Recognition
Déjà Vu Memorization in Vision-Language Models
Red Teaming Visual Language Models
VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation

Please add this paper: ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models

Please add these papers

DetGPT: Detect What You Need via Reasoning
Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld
InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model
Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE
