magic-research / bubogpt

BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs

Home Page: https://bubo-gpt.github.io/

License: BSD 3-Clause "New" or "Revised" License

Python 99.64% Shell 0.36%

bubogpt's Introduction

BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs

A multi-modal LLM capable of jointly understanding text, vision, and audio, and grounding knowledge into visual objects.

[Project Page] [Arxiv] [Demo Video] [Gradio] [Data] [Model]

bubogpt_framework

BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
Yang Zhao*, Zhijie Lin*, Daquan Zhou, Zilong Huang, Jiashi Feng and Bingyi Kang† (*Equal Contribution, †Project Lead)
Bytedance Inc.

HuggingFace space

News🔥

2023/07/21 - Huggingface demo released!

Setup

Clone this repository and navigate to the current folder.

Environment

Our code is based on Python 3.9, CUDA 11.7 and PyTorch 2.0.1.

pip3 install -r pre-requirements.txt
pip3 install -r requirements.txt

Models

Follow the instructions to prepare the pretrained Vicuna weights, and update the llama_model entry in bubogpt/configs/models/mmgpt4.yaml.
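
For reference, the entry to update might look like the following sketch (the path is a placeholder; substitute the directory of your own prepared Vicuna weights):

```yaml
# in bubogpt/configs/models/mmgpt4.yaml
# Vicuna
llama_model: "/path/to/vicuna-7b"
```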

## get pre-trained checkpoints
mkdir checkpoints && cd checkpoints;
wget https://huggingface.co/spaces/Vision-CAIR/minigpt4/resolve/main/blip2_pretrained_flant5xxl.pth;
wget https://huggingface.co/spaces/xinyu1205/recognize-anything/resolve/main/ram_swin_large_14m.pth;
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth;
wget https://huggingface.co/spaces/abhishek/StableSAM/resolve/main/sam_vit_h_4b8939.pth;
wget https://huggingface.co/magicr/BuboGPT-ckpt/resolve/main/bubogpt_7b.pth
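
Before launching, you can sanity-check that every file landed in checkpoints/. This is a small stdlib-only sketch, not part of the repo:

```python
from pathlib import Path

# Checkpoints fetched by the wget commands above.
EXPECTED = [
    "blip2_pretrained_flant5xxl.pth",
    "ram_swin_large_14m.pth",
    "groundingdino_swint_ogc.pth",
    "sam_vit_h_4b8939.pth",
    "bubogpt_7b.pth",
]

def missing_checkpoints(ckpt_dir: str = "checkpoints") -> list[str]:
    """Return the names of expected checkpoint files not found in ckpt_dir."""
    d = Path(ckpt_dir)
    return [name for name in EXPECTED if not (d / name).is_file()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    print("all checkpoints present" if not missing else f"missing: {missing}")
```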

For training, download the MiniGPT-4 checkpoint to checkpoints.

Data

Stage1

Stage2

Usage

Gradio demo

Run gradio demo with:

python3 app.py --cfg-path eval_configs/mmgpt4_eval.yaml --gpu-id 0

Training

Browse the dataset config folder, and replace the storage item with path/to/your/data for each dataset.
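
This edit can also be scripted across configs; the helper below is a hypothetical stdlib-only sketch (set_storage is not part of the repo) that rewrites each storage: entry while preserving indentation:

```python
def set_storage(cfg_text: str, data_path: str) -> str:
    """Replace the value of every `storage:` key in a dataset config's text,
    preserving indentation. Hypothetical helper, not part of the repo."""
    out = []
    for line in cfg_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("storage:"):
            indent = line[: len(line) - len(stripped)]
            line = f"{indent}storage: {data_path}"
        out.append(line)
    return "\n".join(out)

# Example: point a (made-up) dataset entry at a local data directory.
example = "datasets:\n  cc_sbu:\n    storage: path/to/your/data"
print(set_storage(example, "/data/cc_sbu"))
```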

Stage 1: Audio pre-training

bash dist_train.sh train_configs/mmgpt4_stage1_audio.yaml

Stage 2: Multi-modal instruct tuning

bash dist_train.sh train_configs/mmgpt4_stage2_mm.yaml

Demo

1. Image Understanding with Grounding

2. Audio Understanding

3. Aligned Audio-Image Understanding

4. Arbitrary Audio-Image Understanding

For more demonstrations, please refer to the examples.

Acknowledgement

This codebase is mainly built on the following repos:

bubogpt's People

Contributors

awalkzy, ikuinen, magicfilm


bubogpt's Issues

About the bubogpt checkpoint that only completed the first stage of training

Thanks to the authors for their outstanding contribution to the open source community; this is great work! You currently provide a complete BuboGPT checkpoint that includes both the first and second stages of training. Could you also provide a checkpoint that has only completed the first stage of training? Thanks again for your contributions to the open source community!

Question about magicr/vicuna-7b

Thank you for your excellent work. The 'magicr/vicuna-7b' repository seems to be private. I would like to know if it is different from other Vicuna models. Thanks!

When loading ImageBind, EOFError, ran out of input

This is my mmgpt4.yaml file:

model:
  arch: mm_gpt4

  # Imagebind
  freeze_imagebind: True

  # Q-Former
  freeze_qformer: True
  q_former_model: "checkpoints/blip2_pretrained_flant5xxl.pth"
  num_query_token: 32

  # Vicuna
  llama_model: "saved_weight/tokenizer.model"

  # generation configs
  prompt: ""

preprocess:
    vis_processor:
        train:
          name: "imagebind_vision_train"
          image_size: 224
        eval:
          name: "imagebind_vision_eval"
          image_size: 224
    text_processor:
        train:
          name: "imagebind_caption"
        eval:
          name: "imagebind_caption"

Extending for Video

Do you have any plans on extending the current work for videos too?

I tried to modify it but it seems there are lots of things to be modified in between😅

Command-line script

I am deploying on a Linux server where running the demo with Gradio is not supported. Is there a command-line script to run the model?

How do you get the bubo icon?

Dear authors,

Thank you for your wonderful work! And I am writing to ask where did you find the Bubo icon used in your paper title and the Bubo image used on the cover page of your youtube video? Did you generate the images or download them?

Look forward to your reply.

Thanks,
Hiusam

No module named 'constants.constant'; 'constants' is not a package

Hi,

Installing requirements.txt went well; however, I am getting the error below. Even after running pip install constants, the error is still there:

C:\Users\User1\Downloads\bubogpt-main\bubogpt-main>python eval_scripts/qualitative_eval.py --cfg-path eval_configs/mmgpt4_eval.yaml --gpu-id 0
Traceback (most recent call last):
  File "C:\Users\User1\Downloads\bubogpt-main\bubogpt-main\eval_scripts\qualitative_eval.py", line 15, in <module>
    from constants.constant import LIGHTER_COLOR_MAP_HEX
ModuleNotFoundError: No module named 'constants.constant'; 'constants' is not a package

Can't install requirements.txt


Hello, I get an error when I try to install requirements.txt

ERROR: Could not find a version that satisfies the requirement torch==2.0.0+cu117 (from versions: 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1)
ERROR: No matching distribution found for torch==2.0.0+cu117

What is the difference from MiniGPT-4?

As the title says.
From the paper, compared with MiniGPT-4, it adds audio as a supported modality, and it appends a pipeline after the LLM (Vicuna) output to align entities with their locations in the image.

About GPT-4 in match.py

I notice that you directly use OpenAI's GPT-4 to match caption and grounded entity. Why not train a custom model by leveraging existing datasets like the ones used in KOSMOS-2 or Shikra?

Loading ImageBind got Killed

When running

python3 app.py --cfg-path eval_configs/mmgpt4_eval.yaml --gpu-id 0

It gets this far before it is killed:
Initializing Chat
Loading ImageBind
Killed

Do you know how I can solve this?
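
Not an answer from this thread, but `Killed` during checkpoint loading typically means the Linux out-of-memory killer stopped the process. A stdlib-only, Linux-specific sketch for checking available RAM before launching (`available_gib` is a hypothetical helper):

```python
def available_gib(meminfo_path: str = "/proc/meminfo") -> float:
    """Return available system memory in GiB, parsed from /proc/meminfo (Linux only)."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                # Value is reported in kB; convert kB -> GiB.
                return int(line.split()[1]) / (1024 ** 2)
    raise RuntimeError("MemAvailable not found in meminfo")

if __name__ == "__main__":
    print(f"{available_gib():.1f} GiB available")
```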
