gitdzreal93 / cogvideo

This project was forked from thudm/cogvideo.


Text-to-video generation.

License: Apache License 2.0

Shell 4.02% Python 95.61% Dockerfile 0.37%


CogVideo

This is the official repo for the paper: CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers.

News! The demo for CogVideo is available!

It's also integrated into Hugging Face Spaces 🤗 using Gradio. Try out the web demo on Hugging Face Spaces.

News! The code and model for text-to-video generation are now available! Currently we only support Simplified Chinese input.

CogVideo_samples.mp4
@article{hong2022cogvideo,
  title={CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers},
  author={Hong, Wenyi and Ding, Ming and Zheng, Wendi and Liu, Xinghan and Tang, Jie},
  journal={arXiv preprint arXiv:2205.15868},
  year={2022}
}

Web Demo

The CogVideo demo is at https://wudao.aminer.cn/cogvideo/, where you can try text-to-video generation hands-on. Input is in Chinese.

Generated Samples

Video samples generated by CogVideo. The actual text inputs are in Chinese. Each sample is a 4-second clip of 32 frames, and here we sample 9 frames uniformly for display purposes.
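To make the display convention concrete, the sketch below picks 9 evenly spaced frame indices out of a 32-frame clip (a 4-second clip of 32 frames corresponds to 8 frames per second). The README does not say exactly which frames were shown, so the rounding scheme and the helper name `uniform_frame_indices` are assumptions for illustration only.

```python
from typing import List


def uniform_frame_indices(num_frames: int, num_display: int) -> List[int]:
    """Return `num_display` evenly spaced indices in [0, num_frames - 1].

    Hypothetical helper: the first and last frames are always included and
    intermediate positions are rounded to the nearest integer.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be positive")
    if not 1 <= num_display <= num_frames:
        raise ValueError("num_display must be between 1 and num_frames")
    if num_display == 1:
        return [0]
    step = (num_frames - 1) / (num_display - 1)
    return [round(i * step) for i in range(num_display)]


# A 4-second clip of 32 frames, with 9 frames picked for display.
indices = uniform_frame_indices(32, 9)
print(indices)  # [0, 4, 8, 12, 16, 19, 23, 27, 31]
```

Including both endpoints means the displayed strip always spans the full 4 seconds of the clip.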

Intro images

More samples

CogVideo is able to generate relatively high-frame-rate videos. A 4-second clip of 32 frames is shown below.

High-frame-rate sample

Getting Started

Setup

  • Hardware: Linux servers with NVIDIA A100 GPUs are recommended, but you can also run the pretrained models on less powerful GPUs by lowering --max-inference-batch-size and --batch-size, or train smaller models on them.
  • Environment: install dependencies via pip install -r requirements.txt.
  • LocalAttention: Make sure you have CUDA installed and compile the local attention kernel.
pip install git+https://github.com/Sleepychord/Image-Local-Attention
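Because the local attention kernel is compiled during installation, it can help to confirm that the CUDA compiler is reachable before running the pip command. The sketch below assumes the build looks for `nvcc` on `PATH` or under `CUDA_HOME/bin`, a common convention for CUDA extensions that this README does not state; the helper name `cuda_build_preflight` is hypothetical.

```python
import os
import shutil
from typing import Dict, Optional


def cuda_build_preflight(cuda_home: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Report where a CUDA compiler (nvcc) can be found, if anywhere.

    Assumption: the kernel build needs nvcc, found either on PATH or under
    CUDA_HOME/bin. This only inspects the filesystem; it compiles nothing.
    """
    home = cuda_home if cuda_home is not None else os.environ.get("CUDA_HOME")
    nvcc_in_home = None
    if home:
        candidate = os.path.join(home, "bin", "nvcc")
        if os.path.isfile(candidate):
            nvcc_in_home = candidate
    return {
        "nvcc_on_path": shutil.which("nvcc"),
        "cuda_home": home,
        "nvcc_in_cuda_home": nvcc_in_home,
    }


if __name__ == "__main__":
    report = cuda_build_preflight()
    for key, value in report.items():
        print(f"{key}: {value or 'not found'}")
    if not (report["nvcc_on_path"] or report["nvcc_in_cuda_home"]):
        print("nvcc not found: install the CUDA toolkit before building the kernel.")
```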

Docker

Alternatively, you can use Docker to handle all dependencies.

  1. Run ./build_image.sh
  2. Run ./run_image.sh
  3. Run ./install_image_local_attention

Optionally, you can then commit the result as a new image so that you don't have to install image local attention again.

Download

Our code automatically downloads the models to, or detects them in, the path given by the environment variable SAT_HOME. You can also manually download CogVideo-Stage1, CogVideo-Stage2, and CogView2-dsr and place them under SAT_HOME in folders named cogvideo-stage1, cogvideo-stage2, and cogview2-dsr.
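If you download the models manually, a quick check that the expected folders exist under SAT_HOME can save a failed run. This is a minimal sketch: the folder names come from the paragraph above, but the helper `missing_model_dirs` is hypothetical and does not inspect what is inside each folder.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Optional

# Folder names taken from the README; the contents of each folder are not checked.
EXPECTED_MODEL_DIRS = ("cogvideo-stage1", "cogvideo-stage2", "cogview2-dsr")


def missing_model_dirs(sat_home: Optional[str] = None) -> List[str]:
    """Return the expected model folders that are not present under SAT_HOME.

    Hypothetical helper: it only checks that each folder exists; it does not
    download anything or validate checkpoint files.
    """
    root = sat_home if sat_home is not None else os.environ.get("SAT_HOME")
    if not root:
        raise RuntimeError("SAT_HOME is not set")
    base = Path(root)
    return [name for name in EXPECTED_MODEL_DIRS if not (base / name).is_dir()]


# Demo with a throwaway directory standing in for SAT_HOME,
# containing only the two CogVideo stage folders.
with tempfile.TemporaryDirectory() as demo_home:
    for name in ("cogvideo-stage1", "cogvideo-stage2"):
        (Path(demo_home) / name).mkdir()
    demo_missing = missing_model_dirs(demo_home)
    print(demo_missing)  # ['cogview2-dsr']
```

In a real setup you would call `missing_model_dirs()` with no argument so that it reads SAT_HOME from the environment.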

Text-to-Video Generation

./script/inference_cogvideo_pipeline.sh

The main arguments for inference are:

  • --input-source [path or "interactive"]. Path to an input file with one query per line, or "interactive" to launch a command-line prompt instead.
  • --output-path [path]. The folder where results are written.
  • --batch-size [int]. The number of samples to generate per query.
  • --max-inference-batch-size [int]. Maximum batch size per forward pass. Reduce it if you run out of GPU memory (OOM).
  • --stage1-max-inference-batch-size [int]. Maximum batch size per forward pass in Stage 1. Reduce it if you run out of GPU memory (OOM).
  • --both-stages. Run Stage 1 and Stage 2 sequentially.
  • --use-guidance-stage1. Use classifier-free guidance in Stage 1; strongly recommended for better results.
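The relationship between --batch-size and --max-inference-batch-size can be pictured as splitting each query's samples into forward passes of bounded size, and --input-source expects a file with one query per line. The sketch below illustrates both under stated assumptions: it assumes samples are generated in consecutive chunks of at most --max-inference-batch-size (CogVideo's actual sampling loop is not shown here), the helper names `plan_forward_passes` and `write_query_file` are hypothetical, and the two sample queries ("A lion is drinking water." and "A person is skiing.") are made up.

```python
import os
import tempfile
from typing import Iterable, List


def plan_forward_passes(batch_size: int, max_inference_batch_size: int) -> List[int]:
    """Split `batch_size` samples into chunks no larger than the per-forward limit.

    Assumption: each chunk is one forward pass, so lowering the limit trades
    speed for lower peak GPU memory.
    """
    if batch_size < 1 or max_inference_batch_size < 1:
        raise ValueError("both sizes must be positive")
    full, rest = divmod(batch_size, max_inference_batch_size)
    return [max_inference_batch_size] * full + ([rest] if rest else [])


def write_query_file(path: str, queries: Iterable[str]) -> int:
    """Write one query per line as UTF-8 (input is Chinese) and return the count."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for query in queries:
            query = query.strip()
            if query:  # skip blank queries so every line is a real prompt
                f.write(query + "\n")
                count += 1
    return count


print(plan_forward_passes(16, 6))  # [6, 6, 4]

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "queries.txt")
    demo_count = write_query_file(demo_path, ["一只狮子在喝水。", "", "一个人在滑雪。"])
    with open(demo_path, encoding="utf-8") as f:
        demo_lines = f.read().splitlines()
```

Under this reading, halving --max-inference-batch-size roughly doubles the number of forward passes while leaving the total number of samples per query unchanged.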

We recommend setting the environment variable SAT_HOME to the directory where downloaded models should be stored.

Currently only Chinese input is supported.

Contributors

ak391, mallorbc, wenyihong
