GithubHelp home page GithubHelp logo

tyroneli / open-sora-plan Goto Github PK

View Code? Open in Web Editor NEW

This project forked from pku-yuangroup/open-sora-plan

0.0 0.0 0.0 235.88 MB

This project aim to reproducing Sora (Open AI T2V model), but we only have limited resource. We deeply wish the all open source community can contribute to this project.

License: Other

Shell 0.26% Python 45.26% Jupyter Notebook 54.48%

open-sora-plan's Introduction

Open-Sora Plan

[Project Page] [中文主页]

This project aims to create a simple and scalable repo, to reproduce Sora (OpenAI, but we prefer to call it "CloseAI" ) and build knowledge about Video-VQVAE (VideoGPT) + DiT at scale. However, we have limited resources, we deeply wish all open-source community can contribute to this project. Pull request are welcome!!!

本项目希望通过开源社区的力量复现Sora,由北大-兔展AIGC联合实验室共同发起,当前我们资源有限仅搭建了基础架构,无法进行完整训练,希望通过开源社区逐步增加模块并筹集资源进行训练,当前版本离目标差距巨大,仍需持续完善和快速迭代,欢迎Pull request!!!

The architecture of Open-Sora-Plan

News

[2024.03.01] Training codes are available now! Learn more in our project page. Please feel free to watch 👀 this repository for the latest updates.

Todo

  • support variable aspect ratios, resolutions, durations training

  • add class-conditioning on embeddings

  • sampling script

  • fine-tune Video-VQVAE on higher resolution

  • incorporating more conditions

  • training with more data and more GPU

Requirements and Installation

The recommended requirements are as follows.

  • Python >= 3.8
  • Pytorch >= 1.13.1
  • CUDA Version >= 11.7
  • Install required packages:
git clone https://github.com/PKU-YuanGroup/Open-Sora-Plan
cd Open-Sora-Plan
conda create -n opensora python=3.8 -y
conda activate opensora
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
pip install -r requirements.txt
cd VideoGPT
pip install -e .
cd ..

Usage

Datasets

We test the code with UCF-101 dataset. In order to download UCF-101 dataset, you can download the necessary files in here. The code assumes a ucf101 directory with the following structure

UCF-101/
    ApplyEyeMakeup/
        v1.avi
        ...
    ...
    YoYo/
        v1.avi
        ...

Video-VQVAE (VideoGPT)

Training

Refer to origin repo. Use the scripts/train_vqvae.py script to train a Video-VQVAE. Execute python scripts/train_vqvae.py -h for information on all available training settings. A subset of more relevant settings are listed below, along with default values.

cd VideoGPT
VQ-VAE Specific Settings
  • --embedding_dim: number of dimensions for codebooks embeddings
  • --n_codes 2048: number of codes in the codebook
  • --n_hiddens 240: number of hidden features in the residual blocks
  • --n_res_layers 4: number of residual blocks
  • --downsample 4 4 4: T H W downsampling stride of the encoder
Training Settings
  • --gpus 2: number of gpus for distributed training
  • --sync_batchnorm: uses SyncBatchNorm instead of BatchNorm3d when using > 1 gpu
  • --gradient_clip_val 1: gradient clipping threshold for training
  • --batch_size 16: batch size per gpu
  • --num_workers 8: number of workers for each DataLoader
Dataset Settings
  • --data_path <path>: path to an hdf5 file or a folder containing train and test folders with subdirectories of videos
  • --resolution 128: spatial resolution to train on
  • --sequence_length 16: temporal resolution, or video clip length

Reconstructing

python VideoGPT/rec_video.py --video-path "assets/origin_video_0.mp4" --rec-path "rec_video_0.mp4" --num-frames 500 --sample-rate 1
python VideoGPT/rec_video.py --video-path "assets/origin_video_1.mp4" --rec-path "rec_video_1.mp4" --resolution 196 --num-frames 600 --sample-rate 1

We present four reconstructed videos in this demonstration, arranged from left to right as follows:

3s 596x336 10s 256x256 18s 196x196 24s 168x96

VideoDiT (DiT)

Training

cd DiT
torchrun  --nproc_per_node=8 train.py \
  --model DiT-XL/122 --pt-ckpt DiT-XL-2-256x256.pt \
  --vae ucf101_stride4x4x4 \
  --data-path /remote-home/yeyang/UCF-101 --num-classes 101 \
  --sample-rate 2 --num-frames 8 --max-image-size 128 \
  --epochs 1400 --global-batch-size 256 --lr 1e-4 \
  --ckpt-every 1000 --log-every 1000 

Sampling

Coming soon.

Acknowledgement

  • DiT: Scalable Diffusion Models with Transformers.
  • VideoGPT: Video Generation using VQ-VAE and Transformers.
  • FiT: Flexible Vision Transformer for Diffusion Model.
  • Positional Interpolation: Extending Context Window of Large Language Models via Positional Interpolation.

License

  • The service is a research preview intended for non-commercial use only. See LICENSE.txt for details.

Contributors

open-sora-plan's People

Contributors

binzhu-ece avatar chg0901 avatar cxh0519 avatar junwuzhang19 avatar linb203 avatar liuhanchen-github avatar shyuanbest avatar tzy010822 avatar yuanli2333 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.