videobooth's Introduction

VideoBooth

This repository will contain the implementation of the following paper:

VideoBooth: Diffusion-based Video Generation with Image Prompts
Yuming Jiang, Tianxing Wu, Shuai Yang, Chenyang Si, Dahua Lin, Yu Qiao, Chen Change Loy, Ziwei Liu

From MMLab@NTU, affiliated with S-Lab, Nanyang Technological University, and Shanghai AI Laboratory.

Overview

VideoBooth generates videos that contain the subjects specified in the image prompts.

[Figure: overall structure]

TODO

  • Release the training code.
  • Release the training dataset.

Installation

  1. Clone the repository.

    git clone https://github.com/Vchitect/VideoBooth.git
    cd VideoBooth

  2. Install the environment.

    conda env create -f environment.yml
    conda activate videobooth

  3. Download the pretrained models (Stable Diffusion v1.4 and VideoBooth) and place them in the ./pretrained_models/ folder.
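
Before running inference, it can help to confirm that the checkpoints actually landed in ./pretrained_models/. The snippet below is a minimal sketch, not part of the repository: the helper name `list_pretrained_files` is hypothetical, and because this README does not list the expected checkpoint file names, the script only reports what it finds rather than checking for specific files.

```python
from pathlib import Path


def list_pretrained_files(root="./pretrained_models"):
    """Return sorted relative paths of all files under root, or [] if root is missing."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(
        p.relative_to(root_path).as_posix()
        for p in root_path.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    files = list_pretrained_files()
    if not files:
        # Nothing found: the download step has probably not been completed yet.
        print("No files found under ./pretrained_models/; download the checkpoints first.")
    else:
        print(f"Found {len(files)} file(s) under ./pretrained_models/:")
        for name in files:
            print("  " + name)
```

Run it from the repository root so that the relative path resolves to the folder named in step 3.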

Inference

We provide one example configuration for running inference:

python sample_scripts/sample.py --config sample_scripts/configs/panda.yaml
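
If you keep several configuration files, the command above can be wrapped in a small driver script. This is a sketch under assumptions: the helper names `build_command` and `run_all` are hypothetical, and it assumes every `.yaml` file in `sample_scripts/configs` is a valid inference config in the same format as `panda.yaml`. Only the script path and the `--config` flag come from this README.

```python
import subprocess
import sys
from pathlib import Path

# Script path and flag are taken from the inference command in this README.
SAMPLE_SCRIPT = "sample_scripts/sample.py"


def build_command(config_path):
    """Build the argument list for one inference run with the given config."""
    return [sys.executable, SAMPLE_SCRIPT, "--config", str(config_path)]


def run_all(config_dir="sample_scripts/configs"):
    """Run inference once per .yaml file in config_dir; return the number of runs."""
    directory = Path(config_dir)
    if not directory.is_dir():
        return 0
    configs = sorted(directory.glob("*.yaml"))
    for config in configs:
        command = build_command(config)
        print("Running:", " ".join(command))
        # check=True stops the loop at the first failing run.
        subprocess.run(command, check=True)
    return len(configs)


if __name__ == "__main__":
    run_all()
```

Using `sys.executable` rather than a bare `python` keeps the runs inside the currently active conda environment.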

To use your own image, first segment the subject from it. We use Grounded-SAM for this segmentation step.

Citation

If you find our repo useful for your research, please consider citing our paper:

@article{jiang2023videobooth,
    author = {Jiang, Yuming and Wu, Tianxing and Yang, Shuai and Si, Chenyang and Lin, Dahua and Qiao, Yu and Loy, Chen Change and Liu, Ziwei},
    title = {VideoBooth: Diffusion-based Video Generation with Image Prompts},
    year = {2023}
}

videobooth's Issues

A question about cross-frame attention

Thanks for your excellent work. I could not find anything related to cross-frame attention in references 30 and 76, which are cited in Section 3.1. Could you please point to the source for cross-frame attention? Thanks.
