tiger-ai-lab / consisti2v
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation
Home Page: https://tiger-ai-lab.github.io/ConsistI2V/
License: MIT License
@wren93 For higher-resolution experiments, have you seen the frame-jumping issue, namely that the generated video differs from the conditioning image?
Hi authors,
Thanks for this awesome work! In the paper, ConsistI2V is trained on the WebVid-10M dataset. If I want to reproduce the training, which website should I use to download this dataset? Thanks!
Thank you for your significant contributions. I attempted to use ConsistI2V to train on our tasks and found that each iteration takes approximately 24.14 s/it with the default parameters (8 GPUs, batch size 3, 256x256 resolution). I'm curious how long each iteration takes when you use the default training YAML. Could you share your experience?
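For context on whether 24.14 s/it is workable, a back-of-the-envelope conversion from per-iteration latency to wall-clock time can help. This is a minimal sketch; the 24.14 s/it figure comes from the question above, and the 120,000-step count is only an assumed example, not a confirmed training schedule.

```python
# Illustrative estimate: convert seconds-per-iteration into wall-clock days.
# The step count below is a hypothetical example, not the repo's actual schedule.
def estimate_training_days(sec_per_iter: float, total_steps: int) -> float:
    """Wall-clock days needed for `total_steps` iterations at `sec_per_iter`."""
    return sec_per_iter * total_steps / 86400  # 86400 seconds in a day

days = estimate_training_days(24.14, 120_000)
print(f"{days:.1f} days")  # → 33.5 days
```

At this speed, a run of that length would take roughly a month, which is why per-iteration latency is worth profiling before committing to a full training run.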
A few questions about the results:
Thank you for sharing the excellent work! I am confused about the "negative prompt" shown in your demo and code; it doesn't seem to be mentioned in the paper. What is it used for?
Hello, thanks for your nice work! I want to use the code to reproduce the camera-motion results. When I simply set a camera motion (such as pan_left), a dimension mismatch occurs in the z_T calculation. How do I use the code correctly to get the camera-motion results?
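One way a pan can be applied to an initial noise latent without changing its shape is to shift along the width axis and fill the exposed columns with fresh noise. The sketch below is only an illustration of that idea, assuming a (frames, channels, height, width) latent; the names `pan_left` and `z_T` come from the question, and this is not the repository's actual implementation.

```python
import numpy as np

# Hypothetical sketch: shift the latent left and refill the vacated right-hand
# columns with new Gaussian noise, so the tensor shape (and hence z_T) is
# unchanged and no dimension mismatch can occur. NOT the repo's actual code.
def pan_left(z_T: np.ndarray, shift: int, rng: np.random.Generator) -> np.ndarray:
    """z_T has shape (frames, channels, height, width); the shape is preserved."""
    out = np.roll(z_T, -shift, axis=-1)  # move existing content to the left
    out[..., -shift:] = rng.standard_normal(out[..., -shift:].shape)  # fresh noise
    return out

rng = np.random.default_rng(0)
z = rng.standard_normal((16, 4, 32, 32))
assert pan_left(z, 4, rng).shape == z.shape  # shape preserved
```

The key design point is that the motion is expressed as a re-indexing of the existing latent rather than as a change of tensor size, which sidesteps shape mismatches downstream.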
Data: WebVid-10M (80K videos selected)
Training iterations: 120,000
Hello, your work is really cool!
I have been fine-tuning your model on my dataset of 25k videos, starting from your TIGER-Lab/ConsistI2V checkpoint. Due to limited resources, I used a batch size of 2 on 2 RTX 6000 GPUs while keeping the rest of the configuration the same. However, I noticed that the geometry of moving objects is blurry.
Is this an expected outcome, given that I cannot replicate the batch size of 192? Do the number of GPUs or the dataset size matter here? Did you observe this problem while training the model, and did it go away after training for longer?
Hi there!
I'm excited about the ConsistI2V project's ability to generate videos that stay consistent with the source image. I noticed the code isn't currently available in the repository. While I understand it's still under development, I'm curious whether there's any information about a potential release timeframe.
I appreciate any insights you can share about the code's availability. Thanks for your time and the awesome project!
Hello, I am very interested in your work, and I am really impressed with your demo. I would like to ask how many GPUs were used to train the diffusion model and how long the training took. Additionally, the dataset is sampled from WebVid-10M, and I noticed that you sample only 16 frames from each video. How do you ensure that the sampled sequences are sufficiently dynamic, and is the 16-frame sampling a tradeoff? Looking forward to your response!
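A common way to make a 16-frame clip cover more motion is to sample frames with a temporal stride rather than taking 16 consecutive frames, so the clip spans a longer portion of the source video. The sketch below illustrates that idea; the stride value and function name are assumptions for illustration, not the repository's confirmed sampling scheme.

```python
# Hypothetical sketch: strided frame sampling so a 16-frame clip spans more of
# the source video (and therefore more motion). Stride value is illustrative.
def sample_frame_indices(video_len: int, n_frames: int = 16,
                         stride: int = 3, start: int = 0) -> list[int]:
    """Return n_frames indices spaced `stride` apart, starting at `start`.
    In training, `start` would typically be drawn at random."""
    span = (n_frames - 1) * stride + 1  # frames covered by the clip
    if video_len < start + span:
        raise ValueError("video too short for this stride")
    return [start + i * stride for i in range(n_frames)]

idx = sample_frame_indices(100)
print(idx[:4], idx[-1])  # → [0, 3, 6, 9] 45
```

The tradeoff the question raises is visible here: a larger stride captures more dynamics per clip but makes frame-to-frame changes bigger, which the model must then learn to bridge.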
Hi,
I wonder why there is always a watermark-like pattern appearing in the generated videos. Any idea how to get rid of it?
Hi there - thanks for this amazing project and releasing the code!
I'm trying to run autoregressive inference using the default YAML file inference_autoregress, but the resulting video ends up the same length as with regular inference.
Any ideas what I might be doing wrong?
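In autoregressive image-to-video generation, the final video is typically built from several chunks, with each later chunk conditioned on frame(s) from the previous one, so if the chunk count stays at 1 the output length matches single-pass inference. The arithmetic below is a hedged sketch of that relationship; the function and parameter names are illustrative, not the repository's actual config keys.

```python
# Hypothetical sketch: total length of an autoregressively generated video.
# Each chunk after the first reuses `overlap` conditioning frame(s) from the
# previous chunk, so only the remaining frames are new.
def total_frames(frames_per_chunk: int, num_chunks: int, overlap: int = 1) -> int:
    return frames_per_chunk + (num_chunks - 1) * (frames_per_chunk - overlap)

print(total_frames(16, 1))  # → 16  (one chunk: same length as regular inference)
print(total_frames(16, 3))  # → 46  (three chunks chained together)
```

So a first thing worth checking is whether the autoregressive step/chunk count in the YAML is actually greater than 1; with a single chunk the two modes are expected to produce identical lengths.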
Hello, could you please tell me how much CUDA memory I should prepare for training?
Thanks for your interesting work and the nice code!
I want to know how to increase the resolution of the resulting video. Does it require retraining on a higher-resolution dataset?
Dear Authors,
Thank you for your great work. Could you please take a look? The checkpoint does not seem to be available on Hugging Face at the moment. Thank you!
Firstly, excellent work. Consistency with the first frame is very important in practical image animation. I have played with ConsistI2V on Replicate using different images. However, the generated animations have a low-resolution issue: the input images are high resolution, but the output video is low resolution. Even the demo outputs on Replicate are low resolution.
1. What are the prompt settings for the outputs in the video gallery on the project page?
2. Is the low resolution related to the prompt settings, or is it a limitation of the model itself?
Again, thank you for your excellent work.