
KandinskyVideo — multilingual end-to-end text2video latent diffusion model

License: Apache License 2.0


kandinskyvideo's Introduction

Kandinsky Video — a new text-to-video generation model

SoTA quality among open-source solutions

This repository is the official implementation of the Kandinsky Video model.

Paper | Project page | Hugging Face Spaces | Telegram-bot | Habr post | Our text-to-image model | Replicate


Kandinsky Video is a text-to-video generation model based on the FusionFrames architecture and the Kandinsky 3.0 text-to-image model. It consists of two main stages: keyframe generation and interpolation. Our approach to temporal conditioning allows us to generate videos with high-quality appearance, smoothness, and dynamics.

Pipeline


The encoded text prompt enters the U-Net keyframe generation model, which includes temporal layers or blocks. The sampled latent keyframes are then passed to the latent interpolation model, which predicts three interpolation frames between each pair of adjacent keyframes. Finally, a temporal MoVQ-GAN decoder produces the output video.
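
The frame count implied by this two-stage scheme can be sketched as follows. This is a minimal illustration, assuming three frames are inserted into every gap between adjacent keyframes; the helper name `total_frames` and the keyframe counts are hypothetical and not part of this repository.

```python
def total_frames(num_keyframes: int, interpolated_per_gap: int = 3) -> int:
    """Count frames after interpolation (illustrative helper, not the repo's API)."""
    if num_keyframes < 1:
        raise ValueError("need at least one keyframe")
    gaps = num_keyframes - 1  # one gap between each pair of adjacent keyframes
    return num_keyframes + gaps * interpolated_per_gap


if __name__ == "__main__":
    for k in (2, 4, 8):  # hypothetical keyframe counts
        print(k, "keyframes ->", total_frames(k), "frames")
```

Under this assumption, two keyframes yield five frames, and each additional keyframe adds four more.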

Architecture details

  • Text encoder (Flan-UL2) - 8.6B
  • Latent Diffusion U-Net3D - 4.0B
  • MoVQ encoder/decoder - 256M
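
As a rough sanity check on hardware requirements, the parameter counts listed above can be summed and converted to a weights-only memory estimate. This sketch assumes every component is stored at the same precision (2 bytes per parameter for fp16), which the README does not state, and it ignores activations and other runtime memory. The names `PARAM_COUNTS`, `total_params`, and `weights_gb` are hypothetical.

```python
# Parameter counts as listed in the architecture details above.
PARAM_COUNTS = {
    "Text encoder (Flan-UL2)": 8_600_000_000,
    "Latent Diffusion U-Net3D": 4_000_000_000,
    "MoVQ encoder/decoder": 256_000_000,
}


def total_params() -> int:
    """Sum of all listed component sizes."""
    return sum(PARAM_COUNTS.values())


def weights_gb(bytes_per_param: int = 2) -> float:
    """Weights-only size in decimal gigabytes (assumes uniform precision)."""
    return total_params() * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"total parameters: {total_params() / 1e9:.3f}B")
    print(f"fp16 weights (assumed): ~{weights_gb(2):.1f} GB")
```

The estimate is a lower bound on memory use, since inference also needs room for activations and intermediate latents.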

How to use

See our Jupyter notebooks with examples in the ./examples folder.

1. text2video

from video_kandinsky3 import get_T2V_pipeline

t2v_pipe = get_T2V_pipeline('cuda', fp16=True)

fps = 'medium'  # one of: 'low', 'medium', 'high'
video = t2v_pipe(
    'a red car is drifting on the mountain road, close view, fast movement',
    width=640, height=384, fps=fps
)

Results

"A car moving on the road from the sea to the mountains" "A red car drifting, 4k video" "Chemistry laboratory, chemical explosion, 4k" "Erupting volcano raw power, molten lava, and the forces of the Earth"
"Luminescent jellyfish swims underwater, neon, 4k" "Majestic waterfalls in a lush rainforest power, mist, and biodiversity" "White ghost flies through a night clearing, 4k" "Wildlife migration herds on the move, crossing landscapes in harmony"
"Majestic humpback whale breaching power, grace, and ocean spectacle" "Evoke the sense of wonder in a time-lapse journey through changing seasons" "Explore the fascinating world of underwater creatures in a visually stunning sequence" "Polar ice caps the pristine wilderness of the Arctic and Antarctic"
"Rolling waves on a sandy beach relaxation, rhythm, and coastal beauty" "Sloth in slow motion deliberate movements, relaxation, and arboreal life" "Time-lapse of a flower blooming growth, beauty, and the passage of time" "Craft a heartwarming narrative showcasing the bond between a human and their loyal pet companion"

Authors

BibTeX

If you use our work in your research, please cite our publication:

@article{arkhipkin2023fusionframes,
  title     = {FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline},
  author    = {Arkhipkin, Vladimir and Shaheen, Zein and Vasilev, Viacheslav and Dakhova, Elizaveta and Kuznetsov, Andrey and Dimitrov, Denis},
  journal   = {arXiv preprint arXiv:2311.13073},
  year      = {2023}, 
}

kandinskyvideo's People

Contributors

anvilarth, chenxwh, denndimitrov, kuznetsoffandrey, oribetelgeuse, vivasilev

kandinskyvideo's Issues

How to save result to an mp4 file?

After your example code...

from video_kandinsky3 import get_T2V_pipeline

t2v_pipe = get_T2V_pipeline('cuda', fp16=True)

fps = 'medium'  # one of: 'low', 'medium', 'high'
video = t2v_pipe(
    'a red car is drifting on the mountain road, close view, fast movement',
    width=640, height=384, fps=fps
)

how do I save the video to an mp4 file?

Also what is the recommended fps to use?

RuntimeError: size of tensors must match except in dimension 1

from video_kandinsky3 import get_T2V_pipeline
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:8024"

t2v_pipe = get_T2V_pipeline('cuda', fp16=True)

fps = 'medium'  # one of: 'low', 'medium', 'high'
video = t2v_pipe(
    'a red car is drifting on the mountain road, close view, fast movement',
    width=640, height=384, fps=fps
)


RuntimeError: size of tensors must match except in dimension 1. Expected size 2 but got size 3 for tensor number 1 in the list

open source

Thank you for your outstanding work. Will the training code be open sourced?
