longvlm's Introduction

LongVLM: Efficient Long Video Understanding via Large Language Models

[ECCV 2024] This is the official repository for our paper: LongVLM: Efficient Long Video Understanding via Large Language Models by Yuetian Weng, Mingfei Han, Haoyu He, Xiaojun Chang and Bohan Zhuang.

Architecture

[Figure: LongVLM architecture]

Installation

conda create -n longvlm python==3.9
conda activate longvlm
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
pip install -r requirements.txt
export PYTHONPATH="./:$PYTHONPATH"

### For Training, install flash-attention
pip install ninja
git clone https://github.com/HazyResearch/flash-attention.git
cd flash-attention
git checkout v1.0.7
python setup.py install

Pretrained model

Download the pretrained base model and the delta weights from Hugging Face.

Apply delta weights to the base model:

python scripts/apply_delta.py \
    --base YOUR_PATH_TO_LLAMA_7B \
    --target YOUR_PATH_TO_LLAVA_7B \
    --delta YOUR_PATH_TO_LLAVA_DELTA_7B
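Delta weights are commonly distributed as the element-wise difference between the fine-tuned and base parameters, so applying them amounts to an addition per parameter. The sketch below illustrates that arithmetic only, using plain Python lists. It assumes `scripts/apply_delta.py` follows this convention, which is not confirmed here; `apply_delta` and the parameter names are hypothetical, and the real script may handle extra cases such as resized embeddings.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def apply_delta(base: Weights, delta: Weights) -> Weights:
    """Return target weights where target[name] = base[name] + delta[name].

    Assumption: the delta stores (fine-tuned - base) for every parameter.
    """
    missing = set(delta) - set(base)
    if missing:
        raise KeyError(f"parameters missing from base: {sorted(missing)}")

    target: Weights = {}
    for name, delta_values in delta.items():
        base_values = base[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Element-wise addition recovers the fine-tuned parameter.
        target[name] = [b + d for b, d in zip(base_values, delta_values)]
    return target


# Toy example with made-up parameter names and values.
base = {"layer.weight": [0.5, -1.0], "layer.bias": [0.0]}
delta = {"layer.weight": [0.25, 0.5], "layer.bias": [1.0]}
target = apply_delta(base, delta)
print(target)
```

In practice the same addition would run over tensors loaded from the `--base` and `--delta` checkpoints, with the result written to `--target`.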

Data preparation

Download the ActivityNet dataset and arrange it in the following directory structure:

datasets
|___anet
|   |___v1-2/
|   |   |___train/
|   |   |   |___xxxx.mp4
|   |   |   |___xxxx.mp4
|   |   |___val/
|   |   |   |___xxxx.mp4
|   |   |   |___xxxx.mp4
|   |___v1-3/
|   |   |___train_val/
|   |   |   |___xxxx.mp4
|   |   |   |___xxxx.mp4
|   |___anet_benchmark_video_id.txt
|   |___video_list_v1_2_val.txt
|   |___video_list_v1_2_train.txt
|   |___video_list_v1_3_trainval.txt
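Before extracting features, it can help to confirm that every video named in a list file is actually present on disk. The following is a minimal sketch, assuming each list file holds one video ID per line and that each video is stored as `<id>.mp4`; neither assumption is stated in the repository, and `find_missing_videos` is a hypothetical helper. The pairing of list files with folders is inferred from their names.

```python
from pathlib import Path
from typing import List


def find_missing_videos(video_dir: Path, list_file: Path) -> List[str]:
    """Return the IDs from list_file that have no matching <id>.mp4 in video_dir.

    Assumption: one video ID per line, blank lines ignored.
    """
    ids = [line.strip() for line in list_file.read_text().splitlines() if line.strip()]
    return [vid for vid in ids if not (video_dir / f"{vid}.mp4").is_file()]


# Folder / list-file pairs, inferred from the names in the layout above.
ANET_ROOT = Path("datasets/anet")
PAIRS = [
    ("v1-2/train", "video_list_v1_2_train.txt"),
    ("v1-2/val", "video_list_v1_2_val.txt"),
    ("v1-3/train_val", "video_list_v1_3_trainval.txt"),
]

for subdir, list_name in PAIRS:
    list_path = ANET_ROOT / list_name
    if not list_path.is_file():
        print(f"skipping {list_name}: list file not found")
        continue
    missing = find_missing_videos(ANET_ROOT / subdir, list_path)
    print(f"{subdir}: {len(missing)} video(s) missing")
```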

Extract visual features using CLIP ViT-L/14:

python scripts/save_features.py \
    --video_dir_path YOUR_PATH_TO_VIDEO_FOLDER \
    --list_file YOUR_PATH_TO_VIDEO_ID_FILE \
    --clip_feat_path_local YOUR_PATH_TO_SAVE_VISUAL_FEATURE_L \
    --clip_feat_path_mem YOUR_PATH_TO_SAVE_VISUAL_FEATURE_G
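Because the dataset has several splits, the extraction command has to be run once per video folder and list file. The sketch below only builds those command lines and prints them; it does not execute anything. The folder and list-file pairing is inferred from the names in the dataset layout, and the output folders `features/local` and `features/mem` are hypothetical placeholders. Whether all splits can share one output folder is also an assumption.

```python
import shlex
from pathlib import Path
from typing import List

# (video folder, list file) pairs, inferred from the dataset layout.
SPLITS = [
    ("v1-2/train", "video_list_v1_2_train.txt"),
    ("v1-2/val", "video_list_v1_2_val.txt"),
    ("v1-3/train_val", "video_list_v1_3_trainval.txt"),
]


def build_commands(anet_root: Path, feat_root: Path) -> List[List[str]]:
    """Build one save_features.py invocation per split, as argument lists."""
    commands = []
    for video_subdir, list_name in SPLITS:
        commands.append([
            "python", "scripts/save_features.py",
            "--video_dir_path", str(anet_root / video_subdir),
            "--list_file", str(anet_root / list_name),
            # Hypothetical output locations for the two feature types.
            "--clip_feat_path_local", str(feat_root / "local"),
            "--clip_feat_path_mem", str(feat_root / "mem"),
        ])
    return commands


commands = build_commands(Path("datasets/anet"), Path("features"))
for cmd in commands:
    # shlex.join quotes arguments so each line can be pasted into a shell.
    print(shlex.join(cmd))
```

Each printed line can be reviewed and then run by hand, or passed to `subprocess.run(cmd, check=True)` once the paths are confirmed.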

Training and Evaluation

sh run.sh

Examples

[Figures: four example images]

Acknowledgement

This work builds on Video-ChatGPT. We thank the authors for their excellent work.

longvlm's Issues

Unable to Reproduce Video-ChatGPT Benchmark Results

Hello,

Thank you for your open-source contribution. I trained a model using the code you provided, but my results on the Video-ChatGPT benchmark differ from those reported in the paper. My scores on the five metrics are 2.72, 2.47, 3.11, 2.29, and 2.71, for an average of 2.66, compared with the reported average of 2.89.
Considering that different versions of ChatGPT might affect the outcomes, could you please provide a pretrained model for testing? This would help verify the results. Thank you for your assistance.

Dependency error

When I run pip install -r requirements.txt, the following error occurs:
ERROR: Cannot install -r requirements.txt (line 205) and nvidia-nccl-cu12==2.18.1 because these package versions have conflicting dependencies.

The conflict is caused by:
The user requested nvidia-nccl-cu12==2.18.1
torch 2.2.2 depends on nvidia-nccl-cu12==2.19.3; platform_system == "Linux" and platform_machine == "x86_64"

To fix this you could try to:

  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip attempt to solve the dependency conflict

Could you please help me?

Thanks!

Question about evaluation and ChatGPT version

Thank you for open-sourcing this work. May I ask when your experiments were run? Since different versions of ChatGPT can lead to different evaluation results, how did you compare against Video-ChatGPT?

Code release

Hi~ Thanks for the amazing work! Could I know when the code will be released?
