picsart-ai-research / streamingt2v
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
Home Page: https://streamingt2v.github.io/
This is my error. I downloaded laion/CLIP-ViT-H-14-laion2B-s32B-b79K/ and placed it under t2v_enhanced, but the path does not seem to match what the code expects.
Errors:
When I run the code, the following error occurred.
Traceback (most recent call last):
File "/home/StreamingT2V/t2v_enhanced/inference.py", line 97, in <module>
stream_long_gen(args.prompt, short_video, n_autoreg_gen, args.negative_prompt, args.seed, args.num_steps, args.image_guidance, name, stream_cli, stream_model)
TypeError: stream_long_gen() takes 9 positional arguments but 10 were given
In inference.py, 10 arguments are passed in:
stream_long_gen(args.prompt, short_video, n_autoreg_gen, args.negative_prompt, args.seed, args.num_steps, args.image_guidance, name, stream_cli, stream_model)
However, stream_long_gen is defined with only 9 parameters:
def stream_long_gen(prompt, short_video, n_autoreg_gen, seed, t, image_guidance, result_file_stem, stream_cli, stream_model):
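A plausible fix, assuming the missing parameter is the negative prompt (the call site passes args.negative_prompt as the fourth argument, but the definition has no matching parameter), would be to add it to the signature in the same position:

# Hypothetical fix: accept the negative prompt in the same position as
# the call site in inference.py passes it, so the arities match.
def stream_long_gen(prompt, short_video, n_autoreg_gen, negative_prompt,
                    seed, t, image_guidance, result_file_stem,
                    stream_cli, stream_model):
    ...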
I am trying to deploy StreamingT2V on RunPod, but it gets stuck on a modelscope download. I wonder what that model is, and how to download it manually and move it to a suitable location.
/usr/local/lib/python3.10/dist-packages/diffusers/models/transformer_temporal.py:24: FutureWarning: `TransformerTemporalModelOutput` is deprecated and will be removed in version 0.29. Importing `TransformerTemporalModelOutput` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerTemporalModelOutput`, instead.
deprecate("TransformerTemporalModelOutput", "0.29", deprecation_message)
/usr/local/lib/python3.10/dist-packages/diffusers/models/transformer_temporal.py:29: FutureWarning: `TransformerTemporalModel` is deprecated and will be removed in version 0.29. Importing `TransformerTemporalModel` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerTemporalModel`, instead.
deprecate("TransformerTemporalModel", "0.29", deprecation_message)
/usr/local/lib/python3.10/dist-packages/diffusers/models/transformer_temporal.py:34: FutureWarning: `TransformerTemporalModelOutput` is deprecated and will be removed in version 0.29. Importing `TransformerSpatioTemporalModel` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerSpatioTemporalModel`, instead.
deprecate("TransformerTemporalModelOutput", "0.29", deprecation_message)
2024-04-20 00:58:19,707 - modelscope - INFO - PyTorch version 2.0.0 Found.
2024-04-20 00:58:19,707 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-04-20 00:58:19,732 - modelscope - INFO - Loading done! Current index file version is 1.9.0, with md5 4dda297b0e7635fe0be07f1409d42589 and a total number of 921 components indexed
Loading pipeline components...: 100%|█████████████████████████████████████████████████████| 5/5 [00:00<00:00, 6.76it/s]
It seems like you have activated model offloading by calling `enable_model_cpu_offload`, but are now manually moving the pipeline to GPU. It is strongly recommended against doing so as memory gains from offloading are likely to be lost. Offloading automatically takes care of moving the individual components vae, image_encoder, unet, scheduler, feature_extractor to GPU when needed. To make sure offloading works as expected, you should consider moving the pipeline back to CPU: `pipeline.to('cpu')` or removing the move altogether if you use offloading.
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:173: UserWarning:
NVIDIA H100 PCIe with CUDA capability sm_90 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75 sm_80 sm_86.
If you want to use the NVIDIA H100 PCIe GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Loading pipeline components...: 100%|█████████████████████████████████████████████████████| 7/7 [00:01<00:00, 6.43it/s]
2024-04-20 00:58:26,276 - modelscope - INFO - Use user-specified model revision: v1.1.0
Downloading: 15%|█████████▊ | 800M/5.26G [00:23<00:50, 94.3MB/s]
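If the download keeps stalling, one workaround (a sketch, assuming the stalled download is the modelscope enhancement model fetched at revision v1.1.0; the model id below is an assumption, so check the repo name modelscope prints in the log before the download starts) is to pre-fetch it with modelscope's snapshot_download and let the pipeline pick it up from the cache:

from modelscope.hub.snapshot_download import snapshot_download

# Hypothetical model id; files land under ~/.cache/modelscope/hub by default,
# which is where the pipeline looks for them on the next run.
path = snapshot_download('damo/Video-to-Video', revision='v1.1.0')
print(path)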
Any chance you can replace the rm calls with something compatible with both Linux and Windows?
When running inference.py I get:
'rm' is not recognized as an internal or external command,
operable program or batch file.
There is no indication of which script/line causes it.
The output movie is still created, so it's not a complete showstopper.
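For reference, a portable replacement (a sketch, not the project's actual code, since the offending call site isn't identified; paths are illustrative) would swap the shelled-out rm for Python's own file APIs:

import os
import shutil

# Like `rm -rf results/tmp`, but works on Windows too:
shutil.rmtree("results/tmp", ignore_errors=True)
# Like `rm results/tmp.mp4`:
if os.path.exists("results/tmp.mp4"):
    os.remove("results/tmp.mp4")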
The following error occurred while running "python gradio_demo.py":
(st2v) root@autodl-container-e3fa488242-5bb059c5:~/autodl-tmp/StreamingT2V/t2v_enhanced# python gradio_demo.py
/root/miniconda3/envs/st2v/lib/python3.10/site-packages/diffusers/models/transformer_temporal.py:24: FutureWarning: `TransformerTemporalModelOutput` is deprecated and will be removed in version 0.29. Importing `TransformerTemporalModelOutput` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerTemporalModelOutput`, instead.
deprecate("TransformerTemporalModelOutput", "0.29", deprecation_message)
/root/miniconda3/envs/st2v/lib/python3.10/site-packages/diffusers/models/transformer_temporal.py:29: FutureWarning: `TransformerTemporalModel` is deprecated and will be removed in version 0.29. Importing `TransformerTemporalModel` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerTemporalModel`, instead.
deprecate("TransformerTemporalModel", "0.29", deprecation_message)
/root/miniconda3/envs/st2v/lib/python3.10/site-packages/diffusers/models/transformer_temporal.py:34: FutureWarning: `TransformerTemporalModelOutput` is deprecated and will be removed in version 0.29. Importing `TransformerSpatioTemporalModel` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerSpatioTemporalModel`, instead.
deprecate("TransformerTemporalModelOutput", "0.29", deprecation_message)
2024-04-18 13:24:47,828 - modelscope - INFO - PyTorch version 2.0.0 Found.
2024-04-18 13:24:47,829 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-04-18 13:24:47,863 - modelscope - INFO - Loading done! Current index file version is 1.13.3, with md5 f8e838123bfe60156f78863dc483a14b and a total number of 972 components indexed
Traceback (most recent call last):
File "/root/autodl-tmp/StreamingT2V/t2v_enhanced/gradio_demo.py", line 42, in
msxl_model = init_v2v_model(cfg_v2v)
TypeError: init_v2v_model() missing 1 required positional argument: 'device'
May I ask how to solve it?
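A plausible workaround, assuming init_v2v_model in model_init.py now requires a device argument that gradio_demo.py was not updated to pass:

# Hypothetical one-line fix in gradio_demo.py; check the current
# signature of init_v2v_model in model_init.py:
msxl_model = init_v2v_model(cfg_v2v, device="cuda")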
I am trying to generate a 248-frame sequence. It generates a low-resolution video of 31 seconds, but the final enhanced video is only 4 seconds long. I found this for many different cases with varying frame counts. Why is the enhanced (high-res) video length limited to 4 seconds?
Please find a sample case with output files here.
https://drive.google.com/drive/folders/1eUNgtO_Ndf0qCza31BzgPEsU4Y75HZl8?usp=sharing
How to resolve this?
Please save GitHub some bandwidth and list the average VRAM usage / minimum VRAM requirements per generation type on the home page. Thanks!
Hi, I wonder whether there are any plans on releasing the training pipeline of StreamingT2V? Thank you very much!
I want to use the downloaded CLIP model, so I changed 'pretrained' to the model path, like this:
model, _, _ = open_clip.create_model_and_transforms(
arch,
pretrained='/root/.cache/huggingface/hub/laion/CLIP-ViT-H-14-laion2B-s32B-b79K/open_clip_pytorch_model.bin',
device=torch.device("cpu"),
#pretrained=version,
)
But it raises this error:
Traceback (most recent call last):
File "/root/miniconda3/envs/st2v/lib/python3.10/site-packages/jsonargparse/typehints.py", line 785, in adapt_typehints
val = adapt_class_type(val, serialize, instantiate_classes, sub_add_kwargs, prev_val=prev_val)
File "/root/miniconda3/envs/st2v/lib/python3.10/site-packages/jsonargparse/typehints.py", line 996, in adapt_class_type
return val_class(**{**init_args, **dict_kwargs})
File "/root/proj/StreamingT2V/t2v_enhanced/model/diffusers_conditional/models/controlnet/image_embedder.py", line 75, in init
model, _, _ = open_clip.create_model_and_transforms(
File "/root/miniconda3/envs/st2v/lib/python3.10/site-packages/open_clip/factory.py", line 387, in create_model_and_transforms
model = create_model(
File "/root/miniconda3/envs/st2v/lib/python3.10/site-packages/open_clip/factory.py", line 291, in create_model
load_checkpoint(model, checkpoint_path)
File "/root/miniconda3/envs/st2v/lib/python3.10/site-packages/open_clip/factory.py", line 160, in load_checkpoint
resize_text_pos_embed(state_dict, model)
File "/root/miniconda3/envs/st2v/lib/python3.10/site-packages/open_clip/model.py", line 573, in resize_text_pos_embed
assert old_width == width, 'text pos_embed width changed!'
AssertionError: text pos_embed width changed!
Is there another way to load the CLIP model locally?
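That assertion usually fires when the arch string passed to create_model_and_transforms doesn't match the checkpoint (an assumption here, since the failing arch value isn't shown). For laion/CLIP-ViT-H-14-laion2B-s32B-b79K the matching architecture is ViT-H-14, so a local load would look like:

import open_clip
import torch

# The arch string must match the checkpoint; for
# laion/CLIP-ViT-H-14-laion2B-s32B-b79K that is "ViT-H-14".
model, _, _ = open_clip.create_model_and_transforms(
    "ViT-H-14",
    pretrained="/root/.cache/huggingface/hub/laion/CLIP-ViT-H-14-laion2B-s32B-b79K/open_clip_pytorch_model.bin",
    device=torch.device("cpu"),
)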
I don't know why your framework needs to use modelscope, but it downloads a lot of models into the Conda environment, which takes up a lot of space. I suggest placing them under a folder in the project in future versions; that would be better.
Thank you, I will continue to support this product.
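As a stopgap, modelscope's cache location can be redirected with the MODELSCOPE_CACHE environment variable (a general modelscope feature, not something specific to this repo; the path below is illustrative):

import os

# Redirect modelscope's download cache into the project tree; this must
# run before modelscope is imported anywhere in the process.
os.environ["MODELSCOPE_CACHE"] = "/root/proj/StreamingT2V/modelscope_cache"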
Hi, are there any configs/settings to reproduce the videos shown here as demos? My brief tests with the default settings in inference.py yield bad results, see:
(.venv) PS D:\Python file\StreamingT2V-main\StreamingT2V-main\t2v_enhanced> python inference.py --prompt="A cat running on the street"
Traceback (most recent call last):
File "D:\Python file\StreamingT2V-main\StreamingT2V-main\t2v_enhanced\inference.py", line 11, in
from t2v_enhanced.model.video_ldm import VideoLDM
ModuleNotFoundError: No module named 't2v_enhanced'
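That import error typically means Python can't find the repo root on sys.path when the script is launched from inside t2v_enhanced. A common workaround (an assumption about this repo's layout, based on the path in the traceback) is to make the repo root importable before the t2v_enhanced imports run:

# At the top of inference.py, before `from t2v_enhanced...` imports:
import os
import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))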
VideoCrafter2 is a strong T2V model. Did you apply StreamingT2V to it?
I'd be interested in combining your model with the ones mentioned in the title. Any way to do that?
Hi authors, thanks for your work! Could you please provide the evaluation code for Table 8, "Quantitative comparison to state-of-the-art open-source text-to-long-video generators"?
Thank you for your great contributions! I noticed in your paper that the model is trained on a dataset from publicly available sources. Could you possibly provide further details about this?
I encountered a problem while running: "modelscope - WARNING - task video to video output keys are missing." How can I resolve this issue? Thank you.
error: Validation failed: No action for key "trainer.barebones" to check its value.
If possible, please help by providing a document. Many thanks.
Hi, I found that here:
https://github.com/Picsart-AI-Research/StreamingT2V/blame/c1b8068bcbcdbbfa0dd0df3371d3c93a1f5132de/t2v_enhanced/model_init.py#L71C7-L71C7
def init_streamingt2v_model(ckpt_file, result_fol):
...
cli = CustomCLI(VideoLDM)
It seems that VideoLDM must be initialized from damo-vilab/text-to-video-ms-1.7b (in the config, pipeline_repo: damo-vilab/text-to-video-ms-1.7b). Moreover, ckpt_file, which is the path to Stream_t2v.ckpt, is just added to sys.argv but not used in any function.
I am confused about this piece of code and hope to get your explanation, thanks :)
File: model_init.py
pipe.enable_model_cpu_offload()
return pipe.to(device)
It seems that after enabling the CPU offloading option, the model is sent to the CUDA device. This is done in a number of model initializations. It seems the correct option would be:
pipe.enable_model_cpu_offload()
return pipe
For cases where model offloading is not done:
return pipe.to(device)
Thanks,
python gradio_demo.py
gives this error then aborts
Traceback (most recent call last):
File "D:\Tests\StreamingT2V\StreamingT2V\t2v_enhanced\gradio_demo.py", line 43, in <module>
msxl_model = init_v2v_model(cfg_v2v)
TypeError: init_v2v_model() missing 1 required positional argument: 'device'
Thanks for your interesting work. Intuitively, autoregressive generation may result in poor quality after some iterations, which is also reflected in my experiments with this project. But I noticed that the videos on your YouTube channel suffer little from this issue. Do you use any tricks?
I tried to test this on the huggingface.co Space, but it doesn't work, and nothing worked on my mobile phone.
You can try here:
https://huggingface.co/spaces/PAIR/StreamingT2V
Hi,
when I run this command:
python inference.py --prompt="A cat running on the street"
I got the following "CUDA out of memory" error message. I have two NVIDIA 4090s, but it seems I can only use one of them. Could you let me know how to change config.yaml so I could run inference.py with an NVIDIA 4090 (24 GB)? Thanks!
...
Traceback (most recent call last):
File "/home/bizon/boxu/models/8-DT/22007_streamingT2V_202404/StreamingT2V/t2v_enhanced/inference.py", line 66, in
stream_cli, stream_model = init_streamingt2v_model(ckpt_file_streaming_t2v, result_fol)
File "/home/bizon/boxu/models/8-DT/22007_streamingT2V_202404/StreamingT2V/t2v_enhanced/model_init.py", line 105, in init_streamingt2v_model
model.load_state_dict(torch.load(
File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 809, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 1172, in _load
result = unpickler.load()
File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 1142, in persistent_load
typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 1116, in load_tensor
wrap_storage=restore_location(storage, location),
File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 217, in default_restore_location
result = fn(storage, location)
File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 187, in _cuda_deserialize
return obj.cuda(device)
File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/_utils.py", line 81, in _cuda
untyped_storage = torch.UntypedStorage(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 23.65 GiB total capacity; 22.20 GiB already allocated; 22.06 MiB free; 22.67 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
...
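The traceback shows the OOM happens while torch.load is deserializing the checkpoint directly onto GPU 0. A common mitigation (a sketch; variable names follow the traceback but are assumptions, and whether the surrounding code tolerates a CPU-first load is untested) is to load the state dict onto CPU and let the model move weights to the GPU later:

# In model_init.py, deserialize the checkpoint on CPU instead of
# directly onto the (already busy) GPU:
state = torch.load(ckpt_file, map_location="cpu")
model.load_state_dict(state)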
Hello.
Thanks for making code public.
The code works fine for 24 frames (--num_frames=24) but throws an error for 48 frames. It happens in the Animation class of the utils/iimage.py file:
self.anim_str = self.anim_obj.to_html5_video()
Fix: the default embed_limit is 20.0 (in MB). Set it to 1000.0 MB:
self.anim_str = self.anim_obj.to_html5_video(embed_limit=1000.0)
OS: Ubuntu22.04
matplotlib.__version__ = '3.8.3'
Thanks.
Very good job! What kind of GPU configuration do you need, and what is the training time?
How can I reduce memory usage enough to run on a 16 GB GPU?
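For what it's worth, the standard diffusers memory-saving switches look like the sketch below (these are generic diffusers APIs; whether they are enough for this repo's pipelines on 16 GB is an assumption, not a tested claim):

from diffusers import DiffusionPipeline

# Generic diffusers memory savers, shown on the base T2V pipeline that
# this repo builds on; applicability to StreamingT2V itself is assumed.
pipe = DiffusionPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b")
pipe.enable_model_cpu_offload()   # keep idle components on CPU
pipe.enable_vae_slicing()         # decode the VAE in slices
pipe.enable_attention_slicing()   # chunk attention computations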