
This project is a fork of nateraw/stable-diffusion-videos


Create 🔥 videos with Stable Diffusion by exploring the latent space and morphing between text prompts

License: Apache License 2.0



stable-diffusion-videos

Try it yourself in Colab: Open In Colab

TPU version (~6x faster than standard Colab GPUs): Open In Colab

Example - morphing between "blueberry spaghetti" and "strawberry spaghetti"

berry_good_spaghetti.2.mp4

Installation

pip install stable_diffusion_videos
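
To confirm the install, you can check that the main pipeline class (used throughout this README) imports cleanly; this is optional:

# Optional sanity check: this import should succeed after installation
from stable_diffusion_videos import StableDiffusionWalkPipeline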

Usage

Check out the examples folder for example scripts 👀

Making Videos

Note: For Apple M1 architecture, use torch.float32 instead, as torch.float16 is not available on MPS.

from stable_diffusion_videos import StableDiffusionWalkPipeline
import torch

pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

video_path = pipeline.walk(
    prompts=['a cat', 'a dog'],
    seeds=[42, 1337],
    num_interpolation_steps=3,
    height=512,  # use multiples of 64 if > 512. Multiples of 8 if < 512.
    width=512,   # use multiples of 64 if > 512. Multiples of 8 if < 512.
    output_dir='dreams',        # Where images/videos will be saved
    name='animals_test',        # Subdirectory of output_dir where images/videos will be saved
    guidance_scale=8.5,         # Higher adheres to prompt more, lower lets model take the wheel
    num_inference_steps=50,     # Number of diffusion steps per image generated. 50 is good default
)
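
Per the Apple M1 note above, here's a minimal sketch of the same example on Apple Silicon. It assumes your PyTorch build has MPS support (you can check with torch.backends.mps.is_available()):

from stable_diffusion_videos import StableDiffusionWalkPipeline
import torch

# torch.float16 is not available on MPS, so load the weights in float32
pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float32,
).to("mps")

video_path = pipeline.walk(
    prompts=['a cat', 'a dog'],
    seeds=[42, 1337],
    num_interpolation_steps=3,
)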

Making Music Videos

New! Music can be added to the video by providing a path to an audio file. The audio will inform the rate of interpolation so the videos move to the beat 🎶

from stable_diffusion_videos import StableDiffusionWalkPipeline
import torch

pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

# Seconds in the song.
audio_offsets = [146, 148]  # [Start, end]
fps = 30  # Use lower values for testing (5 or 10), higher values for better quality (30 or 60)

# Convert seconds to frames
num_interpolation_steps = [(b-a) * fps for a, b in zip(audio_offsets, audio_offsets[1:])]
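# With the values above: [(148 - 146) * 30] == [60] frames for the single transition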

video_path = pipeline.walk(
    prompts=['a cat', 'a dog'],
    seeds=[42, 1337],
    num_interpolation_steps=num_interpolation_steps,
    audio_filepath='audio.mp3',
    audio_start_sec=audio_offsets[0],
    fps=fps,
    height=512,  # use multiples of 64 if > 512. Multiples of 8 if < 512.
    width=512,   # use multiples of 64 if > 512. Multiples of 8 if < 512.
    output_dir='dreams',        # Where images/videos will be saved
    guidance_scale=7.5,         # Higher adheres to prompt more, lower lets model take the wheel
    num_inference_steps=50,     # Number of diffusion steps per image generated. 50 is good default
)
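
The zip pairing above generalizes: N prompts need N seeds and N audio offsets, giving N-1 interpolation segments. A sketch with three prompts (the timestamps and third seed are made up for illustration):

from stable_diffusion_videos import StableDiffusionWalkPipeline
import torch

pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

# Three timestamps (seconds in the song) -> two prompt-to-prompt transitions
audio_offsets = [7, 9, 12]
fps = 30

# One step count per transition: [(9-7)*30, (12-9)*30] == [60, 90]
num_interpolation_steps = [(b - a) * fps for a, b in zip(audio_offsets, audio_offsets[1:])]

video_path = pipeline.walk(
    prompts=['a cat', 'a dog', 'a horse'],
    seeds=[42, 1337, 2022],
    num_interpolation_steps=num_interpolation_steps,
    audio_filepath='audio.mp3',
    audio_start_sec=audio_offsets[0],
    fps=fps,
)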

Using the UI

from stable_diffusion_videos import StableDiffusionWalkPipeline, Interface
import torch

pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

interface = Interface(pipeline)
interface.launch()
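
If you're in a hosted notebook (e.g. Colab), you may want a public URL. Assuming the Interface wraps a Gradio app and launch() forwards Gradio's keyword arguments (an assumption, not confirmed by this README), something like this may work:

# Assumption: launch() forwards kwargs to Gradio; share=True requests a temporary public URL
interface.launch(share=True)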

Credits

This work was built off of a script shared by @karpathy. The script was modified into this gist, which was then updated and adapted into this repo.

Contributing

You can file issues and feature requests here

Enjoy 🤗

Extras

Upsample with Real-ESRGAN

You can also 4x upsample your images with Real-ESRGAN!

It's included when you pip install the latest version of stable-diffusion-videos!

You'll be able to use upsample=True in the walk function, like this:

pipeline.walk(['a cat', 'a dog'], [234, 345], upsample=True)

The above may cause you to run out of VRAM. No problem, you can do upsampling separately.

To upsample an individual image:

from stable_diffusion_videos import RealESRGANModel

model = RealESRGANModel.from_pretrained('nateraw/real-esrgan')
enhanced_image = model('your_file.jpg')
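
Assuming the model returns a PIL image (an assumption; check the return type in your version), you can save the result like any other image:

# Assuming a PIL.Image return value, write the upsampled image to disk
enhanced_image.save('your_file_4x.jpg')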

Or, to do a whole folder:

from stable_diffusion_videos import RealESRGANModel

model = RealESRGANModel.from_pretrained('nateraw/real-esrgan')
model.upsample_imagefolder('path/to/images/', 'path/to/output_dir')
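
This pairs naturally with walk(): since frames are saved under output_dir/name, you can point the folder upsampler at an earlier run. A sketch reusing the paths from the Usage section (the exact on-disk layout may vary by version, so adjust the input path accordingly):

from stable_diffusion_videos import RealESRGANModel

model = RealESRGANModel.from_pretrained('nateraw/real-esrgan')
# Upsample the frames saved by the 'animals_test' walk; the output dir name is hypothetical
model.upsample_imagefolder('dreams/animals_test', 'dreams/animals_test_4x')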
