VideoBooth

This repository contains the implementation of the following paper:

VideoBooth: Diffusion-based Video Generation with Image Prompts
Yuming Jiang, Tianxing Wu, Shuai Yang, Chenyang Si, Dahua Lin, Yu Qiao, Chen Change Loy, Ziwei Liu

From MMLab@NTU affiliated with S-Lab, Nanyang Technological University and Shanghai AI Laboratory.

Overview

VideoBooth generates videos featuring the subjects specified in the image prompts.

TODO

Release the training code.
Release the training dataset.

Installation

Clone the repository.

git clone https://github.com/Vchitect/VideoBooth.git
cd VideoBooth

Install the environment.

conda env create -f environment.yml
conda activate videobooth

Download the pretrained models (Stable Diffusion v1.4, VideoBooth) and place them in the folder ./pretrained_models/.
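
As a minimal sketch, the target folder can be prepared from the repository root before downloading. The commands below only create and list the directory; the checkpoints themselves are downloaded separately, and no particular file names are assumed here.

```shell
# Create the folder that the README expects for the downloaded weights.
mkdir -p ./pretrained_models

# After downloading, list the folder to confirm the files are in place.
ls -lh ./pretrained_models
```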

Inference

The following command runs inference on one provided example.

python sample_scripts/sample.py --config sample_scripts/configs/panda.yaml

To use your own image, you need to segment the subject first. We use Grounded-SAM to segment the subject from images.

Citation

If you find our repo useful for your research, please consider citing our paper:

@article{jiang2023videobooth,
    author = {Jiang, Yuming and Wu, Tianxing and Yang, Shuai and Si, Chenyang and Lin, Dahua and Qiao, Yu and Loy, Chen Change and Liu, Ziwei},
    title = {VideoBooth: Diffusion-based Video Generation with Image Prompts},
    year = {2023}
}
