picsart-ai-research / streamingt2v
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
Home Page: https://streamingt2v.github.io/
This is my error. I downloaded laion/CLIP-ViT-H-14-laion2B-s32B-b79K/ and placed it under t2v_enhanced, but the path does not seem to match what the code expects.
Errors:
When I run the code, the following error occurred.
Traceback (most recent call last):
File "/home/StreamingT2V/t2v_enhanced/inference.py", line 97, in <module>
stream_long_gen(args.prompt, short_video, n_autoreg_gen, args.negative_prompt, args.seed, args.num_steps, args.image_guidance, name, stream_cli, stream_model)
TypeError: stream_long_gen() takes 9 positional arguments but 10 were given
In inference.py, 10 arguments are passed in:
stream_long_gen(args.prompt, short_video, n_autoreg_gen, args.negative_prompt, args.seed, args.num_steps, args.image_guidance, name, stream_cli, stream_model)
However, stream_long_gen is defined with only 9 parameters:
def stream_long_gen(prompt, short_video, n_autoreg_gen, seed, t, image_guidance, result_file_stem, stream_cli, stream_model):
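A plausible fix, assuming the missing parameter is the negative prompt (the call site passes args.negative_prompt as the fourth argument, but the definition has no matching parameter), would be to add it to the signature in the same position:

# Hypothetical fix: accept the negative prompt in the same position as
# the call site in inference.py passes it, so the arities match.
def stream_long_gen(prompt, short_video, n_autoreg_gen, negative_prompt,
                    seed, t, image_guidance, result_file_stem,
                    stream_cli, stream_model):
    ...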
I am trying to deploy StreamingT2V on RunPod, but it gets stuck on a modelscope download. I wonder what that model is, and how to download it manually and move it to a suitable location.
/usr/local/lib/python3.10/dist-packages/diffusers/models/transformer_temporal.py:24: FutureWarning: `TransformerTemporalModelOutput` is deprecated and will be removed in version 0.29. Importing `TransformerTemporalModelOutput` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerTemporalModelOutput`, instead.
deprecate("TransformerTemporalModelOutput", "0.29", deprecation_message)
/usr/local/lib/python3.10/dist-packages/diffusers/models/transformer_temporal.py:29: FutureWarning: `TransformerTemporalModel` is deprecated and will be removed in version 0.29. Importing `TransformerTemporalModel` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerTemporalModel`, instead.
deprecate("TransformerTemporalModel", "0.29", deprecation_message)
/usr/local/lib/python3.10/dist-packages/diffusers/models/transformer_temporal.py:34: FutureWarning: `TransformerTemporalModelOutput` is deprecated and will be removed in version 0.29. Importing `TransformerSpatioTemporalModel` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerSpatioTemporalModel`, instead.
deprecate("TransformerTemporalModelOutput", "0.29", deprecation_message)
2024-04-20 00:58:19,707 - modelscope - INFO - PyTorch version 2.0.0 Found.
2024-04-20 00:58:19,707 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-04-20 00:58:19,732 - modelscope - INFO - Loading done! Current index file version is 1.9.0, with md5 4dda297b0e7635fe0be07f1409d42589 and a total number of 921 components indexed
Loading pipeline components...: 100%|█████████████████████████████████████████████████████| 5/5 [00:00<00:00, 6.76it/s]
It seems like you have activated model offloading by calling `enable_model_cpu_offload`, but are now manually moving the pipeline to GPU. It is strongly recommended against doing so as memory gains from offloading are likely to be lost. Offloading automatically takes care of moving the individual components vae, image_encoder, unet, scheduler, feature_extractor to GPU when needed. To make sure offloading works as expected, you should consider moving the pipeline back to CPU: `pipeline.to('cpu')` or removing the move altogether if you use offloading.
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:173: UserWarning:
NVIDIA H100 PCIe with CUDA capability sm_90 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75 sm_80 sm_86.
If you want to use the NVIDIA H100 PCIe GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Loading pipeline components...: 100%|█████████████████████████████████████████████████████| 7/7 [00:01<00:00, 6.43it/s]
2024-04-20 00:58:26,276 - modelscope - INFO - Use user-specified model revision: v1.1.0
Downloading: 15%|█████████▊ | 800M/5.26G [00:23<00:50, 94.3MB/s]
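If the download keeps stalling, one workaround (a sketch, assuming the stalled download is the modelscope enhancement model fetched at revision v1.1.0; the model id below is an assumption, so check the repo name modelscope prints in the log before the download starts) is to pre-fetch it with modelscope's snapshot_download and let the pipeline pick it up from the cache:

from modelscope.hub.snapshot_download import snapshot_download

# Hypothetical model id; files land under ~/.cache/modelscope/hub by default,
# which is where the pipeline looks for them on the next run.
path = snapshot_download('damo/Video-to-Video', revision='v1.1.0')
print(path)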
Any chance you can replace the rm calls with something compatible with both Linux and Windows?
When running inference.py I get:
'rm' is not recognized as an internal or external command,
operable program or batch file.
There is no indication of which script/line causes it.
The output movie is still created, so it's not a complete showstopper.
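For reference, a portable replacement (a sketch, not the project's actual code, since the offending call site isn't identified; paths are illustrative) would swap the shelled-out rm for Python's own file APIs:

import os
import shutil

# Like `rm -rf results/tmp`, but works on Windows too:
shutil.rmtree("results/tmp", ignore_errors=True)
# Like `rm results/tmp.mp4`:
if os.path.exists("results/tmp.mp4"):
    os.remove("results/tmp.mp4")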
The following error occurred while running "python gradio_demo.py":
(st2v) root@autodl-container-e3fa488242-5bb059c5:~/autodl-tmp/StreamingT2V/t2v_enhanced# python gradio_demo.py
/root/miniconda3/envs/st2v/lib/python3.10/site-packages/diffusers/models/transformer_temporal.py:24: FutureWarning: `TransformerTemporalModelOutput` is deprecated and will be removed in version 0.29. Importing `TransformerTemporalModelOutput` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerTemporalModelOutput`, instead.
deprecate("TransformerTemporalModelOutput", "0.29", deprecation_message)
/root/miniconda3/envs/st2v/lib/python3.10/site-packages/diffusers/models/transformer_temporal.py:29: FutureWarning: `TransformerTemporalModel` is deprecated and will be removed in version 0.29. Importing `TransformerTemporalModel` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerTemporalModel`, instead.
deprecate("TransformerTemporalModel", "0.29", deprecation_message)
/root/miniconda3/envs/st2v/lib/python3.10/site-packages/diffusers/models/transformer_temporal.py:34: FutureWarning: `TransformerTemporalModelOutput` is deprecated and will be removed in version 0.29. Importing `TransformerSpatioTemporalModel` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerSpatioTemporalModel`, instead.
deprecate("TransformerTemporalModelOutput", "0.29", deprecation_message)
2024-04-18 13:24:47,828 - modelscope - INFO - PyTorch version 2.0.0 Found.
2024-04-18 13:24:47,829 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-04-18 13:24:47,863 - modelscope - INFO - Loading done! Current index file version is 1.13.3, with md5 f8e838123bfe60156f78863dc483a14b and a total number of 972 components indexed
Traceback (most recent call last):
File "/root/autodl-tmp/StreamingT2V/t2v_enhanced/gradio_demo.py", line 42, in
msxl_model = init_v2v_model(cfg_v2v)
TypeError: init_v2v_model() missing 1 required positional argument: 'device'
May I ask how to solve it?
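A plausible workaround, assuming init_v2v_model in model_init.py now requires a device argument that gradio_demo.py was not updated to pass:

# Hypothetical one-line fix in gradio_demo.py; check the current
# signature of init_v2v_model in model_init.py:
msxl_model = init_v2v_model(cfg_v2v, device="cuda")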
I am trying to generate a 248-frame sequence. It generates a low-resolution video of 31 seconds, but the final enhanced video is only 4 seconds long. I found this for many different cases with varying frame counts. Why is the enhanced (high-res) video length limited to 4 seconds?
Please find a sample case with output files here.
https://drive.google.com/drive/folders/1eUNgtO_Ndf0qCza31BzgPEsU4Y75HZl8?usp=sharing
How to resolve this?
Please save GitHub some bandwidth and list the average VRAM usage / minimum VRAM requirements per generation type on the home page. Thanks!
Hi, I wonder whether there are any plans on releasing the training pipeline of StreamingT2V? Thank you very much!
I want to use the downloaded CLIP model, so I changed 'pretrained' to the model path, like this:
model, _, _ = open_clip.create_model_and_transforms(
arch,
pretrained='/root/.cache/huggingface/hub/laion/CLIP-ViT-H-14-laion2B-s32B-b79K/open_clip_pytorch_model.bin',
device=torch.device("cpu"),
#pretrained=version,
)
But it raises this error:
Traceback (most recent call last):
File "/root/miniconda3/envs/st2v/lib/python3.10/site-packages/jsonargparse/typehints.py", line 785, in adapt_typehints
val = adapt_class_type(val, serialize, instantiate_classes, sub_add_kwargs, prev_val=prev_val)
File "/root/miniconda3/envs/st2v/lib/python3.10/site-packages/jsonargparse/typehints.py", line 996, in adapt_class_type
return val_class(**{**init_args, **dict_kwargs})
File "/root/proj/StreamingT2V/t2v_enhanced/model/diffusers_conditional/models/controlnet/image_embedder.py", line 75, in init
model, _, _ = open_clip.create_model_and_transforms(
File "/root/miniconda3/envs/st2v/lib/python3.10/site-packages/open_clip/factory.py", line 387, in create_model_and_transforms
model = create_model(
File "/root/miniconda3/envs/st2v/lib/python3.10/site-packages/open_clip/factory.py", line 291, in create_model
load_checkpoint(model, checkpoint_path)
File "/root/miniconda3/envs/st2v/lib/python3.10/site-packages/open_clip/factory.py", line 160, in load_checkpoint
resize_text_pos_embed(state_dict, model)
File "/root/miniconda3/envs/st2v/lib/python3.10/site-packages/open_clip/model.py", line 573, in resize_text_pos_embed
assert old_width == width, 'text pos_embed width changed!'
AssertionError: text pos_embed width changed!
Is there another way to load the CLIP model locally?
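That assertion usually fires when the arch string passed to create_model_and_transforms doesn't match the checkpoint (an assumption here, since the failing arch value isn't shown). For laion/CLIP-ViT-H-14-laion2B-s32B-b79K the matching architecture is ViT-H-14, so a local load would look like:

import open_clip
import torch

# The arch string must match the checkpoint; for
# laion/CLIP-ViT-H-14-laion2B-s32B-b79K that is "ViT-H-14".
model, _, _ = open_clip.create_model_and_transforms(
    "ViT-H-14",
    pretrained="/root/.cache/huggingface/hub/laion/CLIP-ViT-H-14-laion2B-s32B-b79K/open_clip_pytorch_model.bin",
    device=torch.device("cpu"),
)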
I don't know why your framework needs to use modelscope, but it downloads a lot of models into the Conda environment, which takes up a lot of space. I suggest placing them under a folder in the project in future versions; that would be better.
Thank you, I will continue to support this product.
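As a stopgap, modelscope's cache location can be redirected with the MODELSCOPE_CACHE environment variable (a general modelscope feature, not something specific to this repo; the path below is illustrative):

import os

# Redirect modelscope's download cache into the project tree; this must
# run before modelscope is imported anywhere in the process.
os.environ["MODELSCOPE_CACHE"] = "/root/proj/StreamingT2V/modelscope_cache"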
Hi, are there any configs/settings to reproduce the videos shown here as demos? My brief tests with the default settings in inference.py yield bad results, see:
(.venv) PS D:\Python file\StreamingT2V-main\StreamingT2V-main\t2v_enhanced> python inference.py --prompt="A cat running on the street"
Traceback (most recent call last):
File "D:\Python file\StreamingT2V-main\StreamingT2V-main\t2v_enhanced\inference.py", line 11, in
from t2v_enhanced.model.video_ldm import VideoLDM
ModuleNotFoundError: No module named 't2v_enhanced'
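That import error typically means Python can't find the repo root on sys.path when the script is launched from inside t2v_enhanced. A common workaround (an assumption about this repo's layout, based on the path in the traceback) is to make the repo root importable before the t2v_enhanced imports run:

# At the top of inference.py, before `from t2v_enhanced...` imports:
import os
import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))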
VideoCrafter2 is a strong T2V model. Did you apply StreamingT2V to it?
I'd be interested in combining your model with the ones mentioned in the title. Any way to do that?
Hi authors, thanks for your work! Could you please provide the evaluation code for Table 8, "Quantitative comparison to state-of-the-art open-source text-to-long-video generators"?
Thank you for your great contributions! I noticed in your paper that the model is trained on a dataset from publicly available sources. Could you possibly provide further details about this?
I encountered a problem while running: "modelscope - WARNING - task video to video output keys are missing." How can I resolve this issue? Thank you.
error: Validation failed: No action for key "trainer.barebones" to check its value.
If possible, please help by providing a document. Many thanks.
Hi, I found that here:
https://github.com/Picsart-AI-Research/StreamingT2V/blame/c1b8068bcbcdbbfa0dd0df3371d3c93a1f5132de/t2v_enhanced/model_init.py#L71C7-L71C7
def init_streamingt2v_model(ckpt_file, result_fol):
...
cli = CustomCLI(VideoLDM)
It seems that VideoLDM must be initialized from damo-vilab/text-to-video-ms-1.7b (in the config, pipeline_repo: damo-vilab/text-to-video-ms-1.7b). Moreover, ckpt_file, which is the path to Stream_t2v.ckpt, is just added to sys.argv but not used in any function.
I am confused about this piece of code and hope to get your explanation, thanks :)
File: model_init.py
pipe.enable_model_cpu_offload()
return pipe.to(device)
It seems that after enabling the CPU offloading option, the model is sent to the CUDA device. This is done in a number of model initializations. It seems the correct option would be:
pipe.enable_model_cpu_offload()
return pipe
For cases where model offloading is not done:
return pipe.to(device)
Thanks,
python gradio_demo.py
gives this error then aborts
Traceback (most recent call last):
File "D:\Tests\StreamingT2V\StreamingT2V\t2v_enhanced\gradio_demo.py", line 43, in <module>
msxl_model = init_v2v_model(cfg_v2v)
TypeError: init_v2v_model() missing 1 required positional argument: 'device'
Thanks for your interesting work. Intuitively, autoregressive generation may result in poor quality after some iterations, which is also reflected in my experiments with this project. But I noticed that the videos on your YouTube channel suffer little from this issue. Do you use any tricks?
I tried to test this on the huggingface.co Space, but it doesn't work, and nothing worked on my mobile phone.
You can try here:
https://huggingface.co/spaces/PAIR/StreamingT2V
Hi,
when I run this command:
python inference.py --prompt="A cat running on the street"
I got the following "CUDA out of memory" error message. I have two NVIDIA 4090s, but it seems I can only use one of them. Could you let me know how to change config.yaml so I could run inference.py with an NVIDIA 4090 (24 GB)? Thanks!
...
Traceback (most recent call last):
File "/home/bizon/boxu/models/8-DT/22007_streamingT2V_202404/StreamingT2V/t2v_enhanced/inference.py", line 66, in
stream_cli, stream_model = init_streamingt2v_model(ckpt_file_streaming_t2v, result_fol)
File "/home/bizon/boxu/models/8-DT/22007_streamingT2V_202404/StreamingT2V/t2v_enhanced/model_init.py", line 105, in init_streamingt2v_model
model.load_state_dict(torch.load(
File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 809, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 1172, in _load
result = unpickler.load()
File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 1142, in persistent_load
typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 1116, in load_tensor
wrap_storage=restore_location(storage, location),
File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 217, in default_restore_location
result = fn(storage, location)
File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 187, in _cuda_deserialize
return obj.cuda(device)
File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/_utils.py", line 81, in _cuda
untyped_storage = torch.UntypedStorage(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 23.65 GiB total capacity; 22.20 GiB already allocated; 22.06 MiB free; 22.67 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
...
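The traceback shows the OOM happens while torch.load is deserializing the checkpoint directly onto GPU 0. A common mitigation (a sketch; variable names follow the traceback but are assumptions, and whether the surrounding code tolerates a CPU-first load is untested) is to load the state dict onto CPU and let the model move weights to the GPU later:

# In model_init.py, deserialize the checkpoint on CPU instead of
# directly onto the (already busy) GPU:
state = torch.load(ckpt_file, map_location="cpu")
model.load_state_dict(state)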
Hello.
Thanks for making code public.
The code works fine for 24 frames (--num_frames=24) but throws an error for 48 frames. It happens in the Animation class of the utils/iimage.py file:
self.anim_str = self.anim_obj.to_html5_video()
Fix: the default embed_limit is 20.0 (in MB). Set it to 1000.0 MB:
self.anim_str = self.anim_obj.to_html5_video(embed_limit=1000.0)
OS: Ubuntu22.04
matplotlib.__version__ = '3.8.3'
Thanks.
Very good job! What kind of GPU configuration do you need, and what is the training time?
How can I reduce memory usage enough to run on a 16 GB GPU?
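For what it's worth, the standard diffusers memory-saving switches look like the sketch below (these are generic diffusers APIs; whether they are enough for this repo's pipelines on 16 GB is an assumption, not a tested claim):

from diffusers import DiffusionPipeline

# Generic diffusers memory savers, shown on the base T2V pipeline that
# this repo builds on; applicability to StreamingT2V itself is assumed.
pipe = DiffusionPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b")
pipe.enable_model_cpu_offload()   # keep idle components on CPU
pipe.enable_vae_slicing()         # decode the VAE in slices
pipe.enable_attention_slicing()   # chunk attention computations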