lucidrains / lumiere-pytorch Goto Github PK

View Code? Open in Web Editor NEW

224.0 24.0 7.0 1.17 MB

Implementation of Lumiere, SOTA text-to-video generation from Google Deepmind, in Pytorch

License: MIT License

Python 100.00%

artificial-intelligence deep-learning denoising-diffusion text-to-video

lumiere-pytorch's Introduction

lumiere-pytorch's People

Contributors

Stargazers

Watchers

Forkers

umcai f-amerehi lcsouzamenezes goswamig tengfei86 unanan gerardragbir

lumiere-pytorch's Issues

Is this complete?

This looks amazing. I am yet to give it a try, but is this complete or still a work in progress? Would love to share feedback once I set this up and try it out.

Thanks

What is the difference between the two `time`s?

1. `time` at the third dimension of the input tensor `video`:

https://github.com/lucidrains/lumiere-pytorch/blob/main/lumiere_pytorch/lumiere.py#L569

2. `time` as the input parameter passed into `KarrasUnet`'s method `forward`:

https://github.com/lucidrains/lumiere-pytorch/blob/main/README.md#usage
https://github.com/lucidrains/denoising-diffusion-pytorch/blob/main/denoising_diffusion_pytorch/karras_unet.py#L560

Incorrect time_dim for intermediate temporal layers

I have been working through your code trying to get it working, and I believe I found an issue when you set the time_dim for the temporal layers here:

def set_time_dim_(
    klasses: Tuple[Type[Module]],
    model: Module,
    time_dim: int
):
    for model in model.modules():
        if isinstance(model, klasses):
            model.time_dim = time_dim

You are setting the same time_dim for all of layers, but the size of the temporal dimension is cut in half after each step in the UNet. Because of this, the model crashes when trying to reshape/rearrange the tensors for intermediate layers (for instance here (maybe others as well?):

if is_video:
    batch_size = x.shape[0]
    x = rearrange(x, 'b c t h w -> b h w t c')
else:
    assert exists(batch_size) or exists(self.time_dim)

    rearrange_kwargs = dict(b = batch_size, t = self.time_dim)
    x = rearrange(x, '(b t) c h w -> b h w t c', **compact_values(rearrange_kwargs))

I am working on my on workaround in the same set_time_dim function but thought I would report it in case it is helpful.

lucidrains / lumiere-pytorch Goto Github PK

lumiere-pytorch's Introduction

Lumiere - Pytorch

Appreciation

Install

Usage

Todo

Citations

lumiere-pytorch's People

Contributors

Stargazers

Watchers

Forkers

lumiere-pytorch's Issues

1. time at the third dimension of the input tensor video:

2. time as the input parameter passed into KarrasUnet's method forward:

Recommend Projects

Recommend Topics

Recommend Org

Jobs

1. `time` at the third dimension of the input tensor `video`:

2. `time` as the input parameter passed into `KarrasUnet`'s method `forward`: