tiger-ai-lab / consisti2v
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation
Home Page: https://tiger-ai-lab.github.io/ConsistI2V/
License: MIT License
@wren93 For higher-resolution experiments, have you seen the frame-jumping issue, namely that the generated video differs from the conditioning image?
Hi authors,
Thanks for this awesome work! In the paper, ConsistI2V is trained on the WebVid-10M dataset. If I want to reproduce the training, which website should I use to download this dataset? Thanks!
Thank you for your significant contributions. I attempted to use ConsistI2V to train on our tasks and found that each iteration takes approximately 24.14 s/it with the default parameters (8 GPUs, batch size 3, 256x256 resolution). I'm curious how long each iteration takes when you use the default training YAML. Could you share your experience?
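For context on whether 24.14 s/it is workable, a back-of-the-envelope conversion from per-iteration latency to wall-clock time can help. This is a minimal sketch; the 24.14 s/it figure comes from the question above, and the 120,000-step count is only an assumed example, not a confirmed training schedule.

```python
# Illustrative estimate: convert seconds-per-iteration into wall-clock days.
# The step count below is a hypothetical example, not the repo's actual schedule.
def estimate_training_days(sec_per_iter: float, total_steps: int) -> float:
    """Wall-clock days needed for `total_steps` iterations at `sec_per_iter`."""
    return sec_per_iter * total_steps / 86400  # 86400 seconds in a day

days = estimate_training_days(24.14, 120_000)
print(f"{days:.1f} days")  # → 33.5 days
```

At this speed, a run of that length would take roughly a month, which is why per-iteration latency is worth profiling before committing to a full training run.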
A few questions about the results:
Thank you for sharing the excellent work! I am confused about the "negative prompt" shown in your demo and code; it doesn't seem to be mentioned in the paper. What is it used for?
Hello, thanks for your nice work! I want to use the code to reproduce the camera-motion results. When I simply set a camera motion (such as pan_left), a dimension mismatch occurs in the z_T calculation. How do I use the code correctly to get the camera-motion results?
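One way a pan can be applied to an initial noise latent without changing its shape is to shift along the width axis and fill the exposed columns with fresh noise. The sketch below is only an illustration of that idea, assuming a (frames, channels, height, width) latent; the names `pan_left` and `z_T` come from the question, and this is not the repository's actual implementation.

```python
import numpy as np

# Hypothetical sketch: shift the latent left and refill the vacated right-hand
# columns with new Gaussian noise, so the tensor shape (and hence z_T) is
# unchanged and no dimension mismatch can occur. NOT the repo's actual code.
def pan_left(z_T: np.ndarray, shift: int, rng: np.random.Generator) -> np.ndarray:
    """z_T has shape (frames, channels, height, width); the shape is preserved."""
    out = np.roll(z_T, -shift, axis=-1)  # move existing content to the left
    out[..., -shift:] = rng.standard_normal(out[..., -shift:].shape)  # fresh noise
    return out

rng = np.random.default_rng(0)
z = rng.standard_normal((16, 4, 32, 32))
assert pan_left(z, 4, rng).shape == z.shape  # shape preserved
```

The key design point is that the motion is expressed as a re-indexing of the existing latent rather than as a change of tensor size, which sidesteps shape mismatches downstream.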
Data: WebVid-10M (80K videos selected)
Training iterations: 120,000
Hello, your work is really cool!
I have been fine-tuning your model on my dataset of 25k videos, starting from your TIGER-Lab/ConsistI2V checkpoint. Due to limited resources, I used a batch size of 2 on 2 RTX 6000 GPUs while keeping the rest of the configuration the same. However, I noticed that the geometry of moving objects is blurry.
Is this an expected outcome, given that I cannot replicate the batch size of 192? Do the number of GPUs or the dataset size matter here? Did you observe this problem while training the model, and did it go away after training for longer?
Hi there!
I'm excited about the ConsistI2V project's ability to generate videos that stay consistent with the source image. I noticed the code isn't currently available in the repository. While I understand it's still under development, I'm curious whether there's any information about a potential release timeframe.
I appreciate any insights you can share about the code's availability. Thanks for your time and the awesome project!
Hello, I am very interested in your work, and I am really impressed with your demo. I would like to ask how many GPUs were used to train the diffusion model and how long the training took. Additionally, the dataset is sampled from WebVid-10M, and I noticed that you sample only 16 frames from each video. How do you ensure that the sampled sequences are sufficiently dynamic, and is the 16-frame sampling a tradeoff? Looking forward to your response!
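A common way to make a 16-frame clip cover more motion is to sample frames with a temporal stride rather than taking 16 consecutive frames, so the clip spans a longer portion of the source video. The sketch below illustrates that idea; the stride value and function name are assumptions for illustration, not the repository's confirmed sampling scheme.

```python
# Hypothetical sketch: strided frame sampling so a 16-frame clip spans more of
# the source video (and therefore more motion). Stride value is illustrative.
def sample_frame_indices(video_len: int, n_frames: int = 16,
                         stride: int = 3, start: int = 0) -> list[int]:
    """Return n_frames indices spaced `stride` apart, starting at `start`.
    In training, `start` would typically be drawn at random."""
    span = (n_frames - 1) * stride + 1  # frames covered by the clip
    if video_len < start + span:
        raise ValueError("video too short for this stride")
    return [start + i * stride for i in range(n_frames)]

idx = sample_frame_indices(100)
print(idx[:4], idx[-1])  # → [0, 3, 6, 9] 45
```

The tradeoff the question raises is visible here: a larger stride captures more dynamics per clip but makes frame-to-frame changes bigger, which the model must then learn to bridge.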
Hi,
I wonder why there is always a watermark-like pattern appearing in the generated videos. Any idea how to get rid of it?
Hi there - thanks for this amazing project and releasing the code!
I'm trying to run autoregressive inference using the default YAML file inference_autoregress, but the resulting video ends up the same length as with regular inference.
Any ideas what I might be doing wrong?
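In autoregressive image-to-video generation, the final video is typically built from several chunks, with each later chunk conditioned on frame(s) from the previous one, so if the chunk count stays at 1 the output length matches single-pass inference. The arithmetic below is a hedged sketch of that relationship; the function and parameter names are illustrative, not the repository's actual config keys.

```python
# Hypothetical sketch: total length of an autoregressively generated video.
# Each chunk after the first reuses `overlap` conditioning frame(s) from the
# previous chunk, so only the remaining frames are new.
def total_frames(frames_per_chunk: int, num_chunks: int, overlap: int = 1) -> int:
    return frames_per_chunk + (num_chunks - 1) * (frames_per_chunk - overlap)

print(total_frames(16, 1))  # → 16  (one chunk: same length as regular inference)
print(total_frames(16, 3))  # → 46  (three chunks chained together)
```

So a first thing worth checking is whether the autoregressive step/chunk count in the YAML is actually greater than 1; with a single chunk the two modes are expected to produce identical lengths.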
Hello, could you please tell me how much CUDA memory I should prepare for training?
Thanks for your interesting work and the nice code!
I want to know how to increase the resolution of the resulting video. Does it require retraining on a higher-resolution dataset?
Dear Authors,
Thank you for your great work. Could you please take a look? The checkpoint does not seem to be available on Hugging Face at the moment. Thank you!
Firstly, excellent work. Consistency with the first frame is very important in practical image animation. I have played with ConsistI2V on Replicate using different images. However, the generated animations have a low-resolution issue: the input images are high resolution, but the output video is low resolution. Even the demo outputs on Replicate are low resolution.
1. What are the prompt settings for the outputs in the video gallery on the project page?
2. Is the low resolution related to the prompt settings, or is it a limitation of the model itself?
Again, thank you for your excellent work.