
picsart-ai-research / streamingt2v

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

Home Page: https://streamingt2v.github.io/

Language: Python 100.00%
Topics: long-video-generation

streamingt2v's People

Contributors

honghuis, hpoghos, levon-khachatryan, oldnaari


streamingt2v's Issues

Regarding the error issue with "arch='ViT-H-14', version='laion2b_s32b_b79k'"

This is my error. I downloaded laion/CLIP-ViT-H-14-laion2B-s32B-b79K and placed it under t2v_enhanced, but the path does not seem to match what the code expects.

Errors:

  • An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.
  • Expected a <class 'NoneType'>
    Given value type: <class 'jsonargparse.namespace.Namespace'>
    Given value: Namespace(class_path='t2v_enhanced.model.diffusers_conditional.models.controlnet.image_embedder.FrozenOpenCLIPImageEmbedder', init_args=Namespace(arch='ViT-H-14', version='laion2b_s32b_b79k', device='cuda', max_length=77, freeze=True, antialias=True, ucg_rate=0.0, unsqueeze_dim=False, repeat_to_max_len=False, num_image_crops=0, output_tokens=False))
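
A possible workaround, assuming the embedder resolves the laion2b_s32b_b79k tag through the Hugging Face cache rather than through a directory placed by hand: pre-download the checkpoint into the cache once, after which later runs should find it locally. The repo id and filename below are the upstream open_clip defaults, not taken from this project's code.

from huggingface_hub import hf_hub_download

# Warm the Hugging Face cache so open_clip's 'laion2b_s32b_b79k' tag
# resolves without network access on later runs.
path = hf_hub_download(
    repo_id="laion/CLIP-ViT-H-14-laion2B-s32B-b79K",
    filename="open_clip_pytorch_model.bin",
)
print(path)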

[bug] in inference.py

When I run the code, the following error occurs.

Traceback (most recent call last):
  File "/home/StreamingT2V/t2v_enhanced/inference.py", line 97, in <module>
    stream_long_gen(args.prompt, short_video, n_autoreg_gen, args.negative_prompt, args.seed, args.num_steps, args.image_guidance, name, stream_cli, stream_model)
TypeError: stream_long_gen() takes 9 positional arguments but 10 were given

In inference.py, 10 arguments are passed in:

stream_long_gen(args.prompt, short_video, n_autoreg_gen, args.negative_prompt, args.seed, args.num_steps, args.image_guidance, name, stream_cli, stream_model)

However, stream_long_gen is defined with only 9 parameters:

def stream_long_gen(prompt, short_video, n_autoreg_gen, seed, t, image_guidance, result_file_stem, stream_cli, stream_model):
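
A likely fix, assuming the extra argument is args.negative_prompt, which the call site passes but the definition never accepted: add a matching parameter in the same position and forward it to wherever negative prompts are consumed. A sketch, not the upstream patch:

def stream_long_gen(prompt, short_video, n_autoreg_gen, negative_prompt,
                    seed, t, image_guidance, result_file_stem,
                    stream_cli, stream_model):
    # negative_prompt is assumed here; thread it through to the
    # generation call that consumes negative prompts.
    ...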

Stuck in model downloading...

I am trying to deploy StreamingT2V on RunPod, but it gets stuck during a ModelScope download. What is that model, and how can I download it manually and move it to the right location?

/usr/local/lib/python3.10/dist-packages/diffusers/models/transformer_temporal.py:24: FutureWarning: `TransformerTemporalModelOutput` is deprecated and will be removed in version 0.29. Importing `TransformerTemporalModelOutput` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerTemporalModelOutput`, instead.
  deprecate("TransformerTemporalModelOutput", "0.29", deprecation_message)
/usr/local/lib/python3.10/dist-packages/diffusers/models/transformer_temporal.py:29: FutureWarning: `TransformerTemporalModel` is deprecated and will be removed in version 0.29. Importing `TransformerTemporalModel` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerTemporalModel`, instead.
  deprecate("TransformerTemporalModel", "0.29", deprecation_message)
/usr/local/lib/python3.10/dist-packages/diffusers/models/transformer_temporal.py:34: FutureWarning: `TransformerTemporalModelOutput` is deprecated and will be removed in version 0.29. Importing `TransformerSpatioTemporalModel` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerSpatioTemporalModel`, instead.
  deprecate("TransformerTemporalModelOutput", "0.29", deprecation_message)
2024-04-20 00:58:19,707 - modelscope - INFO - PyTorch version 2.0.0 Found.
2024-04-20 00:58:19,707 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-04-20 00:58:19,732 - modelscope - INFO - Loading done! Current index file version is 1.9.0, with md5 4dda297b0e7635fe0be07f1409d42589 and a total number of 921 components indexed
Loading pipeline components...: 100%|█████████████████████████████████████████████████████| 5/5 [00:00<00:00,  6.76it/s]
It seems like you have activated model offloading by calling `enable_model_cpu_offload`, but are now manually moving the pipeline to GPU. It is strongly recommended against doing so as memory gains from offloading are likely to be lost. Offloading automatically takes care of moving the individual components vae, image_encoder, unet, scheduler, feature_extractor to GPU when needed. To make sure offloading works as expected, you should consider moving the pipeline back to CPU: `pipeline.to('cpu')` or removing the move altogether if you use offloading.
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:173: UserWarning:
NVIDIA H100 PCIe with CUDA capability sm_90 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75 sm_80 sm_86.
If you want to use the NVIDIA H100 PCIe GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Loading pipeline components...: 100%|█████████████████████████████████████████████████████| 7/7 [00:01<00:00,  6.43it/s]
2024-04-20 00:58:26,276 - modelscope - INFO - Use user-specified model revision: v1.1.0
Downloading:  15%|█████████▊                                                        | 800M/5.26G [00:23<00:50, 94.3MB/s]
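
A possible manual route, assuming the stalled 5.26 GB download is the ModelScope video-to-video enhancement model (the v1.1.0 revision in the log is consistent with damo/Video-to-Video, but the exact model id is an assumption; check the enhancement config): fetch it once with snapshot_download, which caches under ~/.cache/modelscope and resumes interrupted downloads on reruns.

from modelscope.hub.snapshot_download import snapshot_download

# Hypothetical model id; substitute the one your config references.
model_dir = snapshot_download('damo/Video-to-Video', revision='v1.1.0')
print(model_dir)  # the cached location that later runs will reuse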

rm not supported on Windows

Any chance you can replace the rm calls with something compatible with both Linux and Windows?
When running inference.py I get:

'rm' is not recognized as an internal or external command,
operable program or batch file.

There is no traceback indicating which script/line causes it.
The output movie is still created, so it is not a complete showstopper.
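
A portable replacement, assuming the project shells out with something like os.system(f"rm -rf {path}"); the helper name is hypothetical:

import os
import shutil

def remove_path(path):
    # Deletes a file or a directory tree on Linux and Windows alike,
    # without invoking the Unix rm binary.
    if os.path.isdir(path):
        shutil.rmtree(path, ignore_errors=True)
    elif os.path.exists(path):
        os.remove(path)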

[BUG] The following error occurred while running "python gradio_demo.py"

The following error occurred while running "python gradio_demo.py":

(st2v) root@autodl-container-e3fa488242-5bb059c5:~/autodl-tmp/StreamingT2V/t2v_enhanced# python gradio_demo.py
/root/miniconda3/envs/st2v/lib/python3.10/site-packages/diffusers/models/transformer_temporal.py:24: FutureWarning: `TransformerTemporalModelOutput` is deprecated and will be removed in version 0.29. Importing `TransformerTemporalModelOutput` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerTemporalModelOutput`, instead.
  deprecate("TransformerTemporalModelOutput", "0.29", deprecation_message)
/root/miniconda3/envs/st2v/lib/python3.10/site-packages/diffusers/models/transformer_temporal.py:29: FutureWarning: `TransformerTemporalModel` is deprecated and will be removed in version 0.29. Importing `TransformerTemporalModel` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerTemporalModel`, instead.
  deprecate("TransformerTemporalModel", "0.29", deprecation_message)
/root/miniconda3/envs/st2v/lib/python3.10/site-packages/diffusers/models/transformer_temporal.py:34: FutureWarning: `TransformerTemporalModelOutput` is deprecated and will be removed in version 0.29. Importing `TransformerSpatioTemporalModel` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerSpatioTemporalModel`, instead.
  deprecate("TransformerTemporalModelOutput", "0.29", deprecation_message)
2024-04-18 13:24:47,828 - modelscope - INFO - PyTorch version 2.0.0 Found.
2024-04-18 13:24:47,829 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-04-18 13:24:47,863 - modelscope - INFO - Loading done! Current index file version is 1.13.3, with md5 f8e838123bfe60156f78863dc483a14b and a total number of 972 components indexed
Traceback (most recent call last):
  File "/root/autodl-tmp/StreamingT2V/t2v_enhanced/gradio_demo.py", line 42, in <module>
    msxl_model = init_v2v_model(cfg_v2v)
TypeError: init_v2v_model() missing 1 required positional argument: 'device'

May I ask how to solve it?
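
A plausible fix, assuming init_v2v_model gained a required device parameter after the demo script was written; the "cuda" target below is an assumption:

# Pass the target device explicitly when initializing the v2v model.
msxl_model = init_v2v_model(cfg_v2v, device="cuda")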

List VRAM usage on title page

Please save GitHub some bandwidth and list the average VRAM usage / minimum VRAM requirements per generation type on the home page. Thanks!

Failed to load CLIP from local

I want to use the downloaded CLIP model, so I changed 'pretrained' to the model path, like this:

model, _, _ = open_clip.create_model_and_transforms(
    arch,
    pretrained='/root/.cache/huggingface/hub/laion/CLIP-ViT-H-14-laion2B-s32B-b79K/open_clip_pytorch_model.bin',
    device=torch.device("cpu"),
    # pretrained=version,
)

But it raises this error:

Traceback (most recent call last):
  File "/root/miniconda3/envs/st2v/lib/python3.10/site-packages/jsonargparse/typehints.py", line 785, in adapt_typehints
    val = adapt_class_type(val, serialize, instantiate_classes, sub_add_kwargs, prev_val=prev_val)
  File "/root/miniconda3/envs/st2v/lib/python3.10/site-packages/jsonargparse/typehints.py", line 996, in adapt_class_type
    return val_class(**{**init_args, **dict_kwargs})
  File "/root/proj/StreamingT2V/t2v_enhanced/model/diffusers_conditional/models/controlnet/image_embedder.py", line 75, in __init__
    model, _, _ = open_clip.create_model_and_transforms(
  File "/root/miniconda3/envs/st2v/lib/python3.10/site-packages/open_clip/factory.py", line 387, in create_model_and_transforms
    model = create_model(
  File "/root/miniconda3/envs/st2v/lib/python3.10/site-packages/open_clip/factory.py", line 291, in create_model
    load_checkpoint(model, checkpoint_path)
  File "/root/miniconda3/envs/st2v/lib/python3.10/site-packages/open_clip/factory.py", line 160, in load_checkpoint
    resize_text_pos_embed(state_dict, model)
  File "/root/miniconda3/envs/st2v/lib/python3.10/site-packages/open_clip/model.py", line 573, in resize_text_pos_embed
    assert old_width == width, 'text pos_embed width changed!'
AssertionError: text pos_embed width changed!

Is there another way to load the CLIP model locally?
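
One alternative worth trying, assuming the checkpoint is already in the local Hugging Face cache: keep pretrained="laion2b_s32b_b79k" and force cache-only resolution, instead of pointing at a hand-placed .bin whose layout may not match what open_clip expects for this architecture. A sketch, not a confirmed fix for the pos_embed assertion:

import os
os.environ["HF_HUB_OFFLINE"] = "1"  # resolve weights from the local HF cache only

import open_clip
import torch

model, _, _ = open_clip.create_model_and_transforms(
    "ViT-H-14",
    pretrained="laion2b_s32b_b79k",
    device=torch.device("cpu"),
)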

Make a suggestion

I don't know why your framework needs to use ModelScope, but it downloads a lot of models into the Conda environment, which takes up a lot of space. I suggest that future versions place them in a folder inside the project; that would be better.

Thank you, I will continue to support this project.
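
An interim workaround, assuming ModelScope honors the MODELSCOPE_CACHE environment variable (recent releases do; the target path is hypothetical): redirect its cache into the project tree before anything imports modelscope.

import os

# Hypothetical location inside the project; set before importing modelscope.
os.environ["MODELSCOPE_CACHE"] = "./checkpoints/modelscope"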

Reproducing demo videos on site?

Hi, are there any configs/settings to reproduce the videos shown here as demos? My brief tests with the default settings in inference.py yield bad results; see:

A_monkeyh_holding_blue_flames_14_07_31_309858_enhanced.mp4

No module named 't2v_enhanced'


(.venv) PS D:\Python file\StreamingT2V-main\StreamingT2V-main\t2v_enhanced> python inference.py --prompt="A cat running on the street"
Traceback (most recent call last):
  File "D:\Python file\StreamingT2V-main\StreamingT2V-main\t2v_enhanced\inference.py", line 11, in <module>
    from t2v_enhanced.model.video_ldm import VideoLDM
ModuleNotFoundError: No module named 't2v_enhanced'
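
A common workaround, assuming t2v_enhanced is meant to be imported from the repository root (running from inside t2v_enhanced hides the package): either run the script from the repository root, or put the root on sys.path before the failing import, as in this sketch:

import os
import sys

# Prepend the repository root (the parent of t2v_enhanced/) to the module path.
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from t2v_enhanced.model.video_ldm import VideoLDM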

Evaluation Code

Hi authors, thanks for your work! Could you please provide the evaluation code for Table 8, "Quantitative comparison to state-of-the-art open-source text-to-long-video generators"?

details on training dataset

Thank you for your great contributions! I noticed in your paper that the model is trained on a dataset from publicly available sources. Could you possibly provide further details about it?

Some Errors:

When I execute the following statement, I get this error:

[screenshots of the command and the resulting error were attached as images and are not recoverable]

Why would such an error be reported? I have followed the readme document exactly.

And I get this error when executing another statement:

[screenshot attached as an image; not recoverable]

How should I deal with the above problems?

Must initialize from damo-vilab/text-to-video-ms-1.7b?

Hi, I found that here:
https://github.com/Picsart-AI-Research/StreamingT2V/blame/c1b8068bcbcdbbfa0dd0df3371d3c93a1f5132de/t2v_enhanced/model_init.py#L71C7-L71C7

def init_streamingt2v_model(ckpt_file, result_fol):
    ...
    cli = CustomCLI(VideoLDM)

It seems that VideoLDM must be initialized from damo-vilab/text-to-video-ms-1.7b (the config sets pipeline_repo: damo-vilab/text-to-video-ms-1.7b). Moreover, ckpt_file, which is the path to Stream_t2v.ckpt, is only added to sys.argv but never used in any function.
I am confused about this piece of code and hope to get your explanation, thanks :)

Warning message: Enabling CPU offloading option for models

File: model_init.py

pipe.enable_model_cpu_offload()
return pipe.to(device)

It seems that after enabling the CPU offloading option, the model is sent to the CUDA device. This is done in a number of model initializations. The correct option would seem to be:

pipe.enable_model_cpu_offload()
return pipe

For cases, where model offloading is not done:
return pipe.to(device)
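
A consolidated sketch of the suggested pattern; device and the offload flag stand in for whatever the surrounding init code provides. With offloading enabled, diffusers moves components to the GPU on demand, so the final .to(device) should be skipped:

def finalize_pipe(pipe, device, offload):
    # Hypothetical helper: only pin the whole pipeline to the device
    # when CPU offloading is disabled.
    if offload:
        pipe.enable_model_cpu_offload()
        return pipe
    return pipe.to(device)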

Thanks,

Error launching gradio_demo.py under Windows

python gradio_demo.py

gives this error and then aborts:

Traceback (most recent call last):
  File "D:\Tests\StreamingT2V\StreamingT2V\t2v_enhanced\gradio_demo.py", line 43, in <module>
    msxl_model = init_v2v_model(cfg_v2v)
TypeError: init_v2v_model() missing 1 required positional argument: 'device'

"CUDA out of memory" when running inference.py on nvidia-4090

Hi,

when I run this command:
python inference.py --prompt="A cat running on the street"

I got the following "CUDA out of memory" error message. I have two NVIDIA 4090s, but it seems I can only use one of them. Could you let me know how to change config.yaml so I can run inference.py on an NVIDIA 4090 (24 GB)? Thanks!

...
Traceback (most recent call last):
  File "/home/bizon/boxu/models/8-DT/22007_streamingT2V_202404/StreamingT2V/t2v_enhanced/inference.py", line 66, in <module>
    stream_cli, stream_model = init_streamingt2v_model(ckpt_file_streaming_t2v, result_fol)
  File "/home/bizon/boxu/models/8-DT/22007_streamingT2V_202404/StreamingT2V/t2v_enhanced/model_init.py", line 105, in init_streamingt2v_model
    model.load_state_dict(torch.load(
  File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 809, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 1172, in _load
    result = unpickler.load()
  File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 1142, in persistent_load
    typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 1116, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 217, in default_restore_location
    result = fn(storage, location)
  File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 187, in _cuda_deserialize
    return obj.cuda(device)
  File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/_utils.py", line 81, in _cuda
    untyped_storage = torch.UntypedStorage(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 23.65 GiB total capacity; 22.20 GiB already allocated; 22.06 MiB free; 22.67 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
...
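
One mitigation worth trying, assuming the OOM happens while torch.load deserializes CUDA tensors straight onto GPU 0 (as the traceback suggests): map the checkpoint to CPU during loading and move the model once afterwards. A sketch against model_init.py; the state-dict key handling is an assumption:

import torch

# Load on CPU so deserialization does not allocate on GPU 0.
state = torch.load(ckpt_file, map_location="cpu")
model.load_state_dict(state.get("state_dict", state))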

Error for frames 48 or more

Hello.

Thanks for making code public.

The code works fine for 24 frames (--num_frames=24) but throws an error for 48 frames. It happens in the Animation class of the utils/iimage.py file.

self.anim_str = self.anim_obj.to_html5_video()

Fix: the default embed_limit is 20.0 (in MB). Set it to 1000.0 MB:

self.anim_str = self.anim_obj.to_html5_video(embed_limit=1000.0)

OS: Ubuntu22.04
matplotlib.__version__ = '3.8.3'

Thanks.
