ziplab / longvlm

LongVLM: Efficient Long Video Understanding via Large Language Models

[ECCV 2024] This is the official repository for our paper: LongVLM: Efficient Long Video Understanding via Large Language Models by Yuetian Weng, Mingfei Han, Haoyu He, Xiaojun Chang and Bohan Zhuang.

Architecture

Installation

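conda create -n longvlm python==3.9
conda activate longvlm
# The +cu111 wheels are hosted on the PyTorch wheel index, not on PyPI
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
pip install -r requirements.txt
export PYTHONPATH="./:$PYTHONPATH"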

### For Training, install flash-attention
pip install ninja
git clone https://github.com/HazyResearch/flash-attention.git
cd flash-attention
git checkout v1.0.7
python setup.py install

Pretrained model

Download the pretrained base model and the delta weights from Hugging Face.

Apply delta weights to the base model:

python scripts/apply_delta.py \
    --base YOUR_PATH_TO_LLAMA_7B \
    --target YOUR_PATH_TO_LLAVA_7B \
    --delta YOUR_PATH_TO_LLAVA_DELTA_7B
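Conceptually, applying delta weights means adding each delta parameter to the base parameter of the same name. The sketch below illustrates only that arithmetic on plain Python lists. The function name `apply_delta` and the `Weights` type are hypothetical, and the repository's `scripts/apply_delta.py` works on real checkpoints and may do more (for example, handle resized embeddings), so treat this as an assumption about the general convention rather than a description of that script.

```python
from typing import Dict, List

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def apply_delta(base: Weights, delta: Weights) -> Weights:
    """Return target weights where target[name] = base[name] + delta[name]."""
    missing = set(delta) - set(base)
    if missing:
        raise KeyError(f"parameters missing from base: {sorted(missing)}")

    target: Weights = {}
    for name, delta_values in delta.items():
        base_values = base[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise sum recovers the fine-tuned parameter.
        target[name] = [b + d for b, d in zip(base_values, delta_values)]
    return target
```

For instance, a base parameter `[1.0, 2.0]` with a delta of `[0.5, -1.0]` yields a target of `[1.5, 1.0]`.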

Data preparation

Download the ActivityNet dataset and arrange it in the following directory structure:

datasets
|___anet
|   |___v1-2/
|   |   |___train/
|   |   |   |___xxxx.mp4
|   |   |   |___xxxx.mp4
|   |   |___val/
|   |   |   |___xxxx.mp4
|   |   |   |___xxxx.mp4
|   |___v1-3/
|   |   |___train_val/
|   |   |   |___xxxx.mp4
|   |   |   |___xxxx.mp4
|   |___anet_benchmark_video_id.txt
|   |___video_list_v1_2_val.txt
|   |___video_list_v1_2_train.txt
|   |___video_list_v1_3_trainval.txt
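Before extracting features, it can help to confirm that every listed video is actually on disk. The helper below is a minimal sketch under two assumptions that the README does not state: each list file holds one video ID per line, and each video is stored as `<id>.mp4`. The function name `missing_videos` is hypothetical.

```python
from pathlib import Path
from typing import List


def missing_videos(list_file: Path, video_dir: Path) -> List[str]:
    """Return the IDs in list_file that have no matching <id>.mp4 in video_dir."""
    # Assumption: one video ID per line; blank lines are ignored.
    ids = [line.strip() for line in list_file.read_text().splitlines() if line.strip()]
    return [vid for vid in ids if not (video_dir / f"{vid}.mp4").is_file()]
```

Calling `missing_videos(Path("datasets/anet/video_list_v1_2_val.txt"), Path("datasets/anet/v1-2/val"))` would then return the IDs that still need to be downloaded, or an empty list if the split is complete.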

Extract visual features using CLIP ViT-L/14:

python scripts/save_features.py \
    --video_dir_path YOUR_PATH_TO_VIDEO_FOLDER \
    --list_file YOUR_PATH_TO_VIDEO_ID_FILE \
    --clip_feat_path_local YOUR_PATH_TO_SAVE_VISUAL_FEATURE_L \
    --clip_feat_path_mem YOUR_PATH_TO_SAVE_VISUAL_FEATURE_G
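The command above handles one video folder and one list file at a time. The sketch below builds the corresponding command for each split in the dataset layout. The pairing of list files with folders is inferred from their names, and sharing one pair of output directories across splits is also an assumption. Only the four flags shown above are used; `anet_benchmark_video_id.txt` is left out because the README does not say which folder it refers to.

```python
import shlex
from pathlib import Path
from typing import List

# (list file, video folder) pairs relative to datasets/anet.
# Assumption: the pairing is inferred from the file and folder names.
SPLITS = [
    ("video_list_v1_2_train.txt", "v1-2/train"),
    ("video_list_v1_2_val.txt", "v1-2/val"),
    ("video_list_v1_3_trainval.txt", "v1-3/train_val"),
]


def build_commands(anet_root: str, local_out: str, mem_out: str) -> List[List[str]]:
    """Build one save_features.py invocation per split."""
    root = Path(anet_root)
    commands = []
    for list_name, video_subdir in SPLITS:
        commands.append([
            "python", "scripts/save_features.py",
            "--video_dir_path", str(root / video_subdir),
            "--list_file", str(root / list_name),
            "--clip_feat_path_local", local_out,
            "--clip_feat_path_mem", mem_out,
        ])
    return commands


if __name__ == "__main__":
    # Output directories are hypothetical; replace them with your own paths.
    for cmd in build_commands("datasets/anet", "features/local", "features/mem"):
        print(shlex.join(cmd))
```

Printing the commands rather than running them keeps the sketch side-effect free; each list can be passed to `subprocess.run(cmd, check=True)` once the paths look right.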

Training and Evaluation

sh run.sh

Examples

Acknowledgement

This work builds on Video-ChatGPT. Thanks to the authors for their awesome work.

longvlm's Issues

Unable to Reproduce Video-ChatGPT Benchmark Results

Hello,

Thank you for your open-source contribution. I trained a model using the code you provided, but my results on the Video-ChatGPT benchmark differ from those reported in the paper. My scores on the five metrics are 2.72, 2.47, 3.11, 2.29, and 2.71, for an average of 2.66, compared with the reported average of 2.89.
Since different versions of ChatGPT might affect the outcome, could you please provide a pretrained model for testing? This would help verify the results. Thank you for your assistance.

Dependency error

When I run pip install -r requirements.txt, the following error occurs:
ERROR: Cannot install -r requirements.txt (line 205) and nvidia-nccl-cu12==2.18.1 because these package versions have conflicting dependencies.

The conflict is caused by:
The user requested nvidia-nccl-cu12==2.18.1
torch 2.2.2 depends on nvidia-nccl-cu12==2.19.3; platform_system == "Linux" and platform_machine == "x86_64"

To fix this you could try to:

loosen the range of package versions you've specified
remove package versions to allow pip attempt to solve the dependency conflict

Could you please help me?

Thanks!

When will the LongVLM code be released?

LongVLM seems like an amazing leap forward in VLM space. And I am looking forward to the release of the video model!

Question about evaluation and ChatGPT version

Thank you for open-sourcing this work. May I ask when your experiments were run? Since different versions of ChatGPT can lead to different evaluation results, how did you compare your model with Video-ChatGPT?

Code release

Hi~ Thanks for the amazing work! Could I know when the code will be released?
