sihyun-yu / digan
Official PyTorch implementation of Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks (ICLR 2022).
Home Page: https://sihyun.me/digan/
Hi, thanks for your great work. I am planning to train your model on a custom dataset, but I encounter the following error:
CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`
I tried several ways to solve this, such as reducing the batch size to 1, reducing the number of GPUs to 1, and reducing the image resolution to 64x64. I am training on NVIDIA Titan Xp GPUs with 12 GB of memory. Nothing has worked so far.
Can you help me resolve this issue?
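CUDA and cuBLAS errors are often reported asynchronously, so the traceback may point at an operation other than the one that failed. A common first debugging step is to rerun on a single GPU with `CUDA_LAUNCH_BLOCKING=1`, which makes the error surface at the failing call. The sketch below only builds that environment and relaunches a command; `debug_env` and `run_with_debug_env` are hypothetical helper names, and `cmd` would be the same launch.py arguments you already use for training.

```python
import os
import subprocess


def debug_env(base=None, gpu="0"):
    """Return a copy of the environment prepared for CUDA debugging (hypothetical helper)."""
    env = dict(os.environ if base is None else base)
    # Make kernel launches synchronous so CUDA/cuBLAS errors are raised
    # at the operation that actually failed.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    # Restrict the run to a single GPU to rule out multi-GPU effects.
    env["CUDA_VISIBLE_DEVICES"] = gpu
    return env


def run_with_debug_env(cmd, gpu="0"):
    """Run `cmd` (a list of arguments) with the debug environment and return its exit code."""
    return subprocess.run(cmd, env=debug_env(gpu=gpu), check=False).returncode
```

Synchronous launches slow training down considerably, so this setting is only meant for reproducing the error, not for regular runs.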
Hello, thank you for making this project open source.
When I set up the project, I downloaded the UCF-101 dataset and placed it in /data/UCF101/train, but when I ran the training command:
python src/infra/launch.py hydra.run.dir=. +experiment_name=test +dataset.name=UCF-101
I got an error:
Error: --data: [Errno 2] No such file or directory: '/data/UCF-101/train'
Is something wrong with my setup?
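The path in the error ends in `UCF-101` (with a hyphen), while the data was placed under `/data/UCF101`, so the loader appears to build the directory name from `+dataset.name=UCF-101`; renaming the folder to `/data/UCF-101/train` is the likely fix. A small stdlib check like the one below can catch such near-miss names before launching a run. The helper `check_dataset_dir` and the `<root>/<name>/<split>` layout are assumptions inferred from the error message, not DIGAN's own code.

```python
import difflib
import os


def check_dataset_dir(root, name, split="train"):
    """Return (expected_path, suggestions); suggestions is empty if the path exists.

    Assumes the layout <root>/<name>/<split>, as suggested by the error message.
    """
    expected = os.path.join(root, name, split)
    if os.path.isdir(expected):
        return expected, []
    siblings = []
    if os.path.isdir(root):
        siblings = [d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))]
    # Suggest sibling folders whose names are close to the requested one,
    # e.g. 'UCF101' when 'UCF-101' was expected.
    return expected, difflib.get_close_matches(name, siblings, n=3, cutoff=0.6)
```

For the case above, `check_dataset_dir("/data", "UCF-101")` would report `/data/UCF-101/train` as missing and suggest `UCF101`.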
Hi,
thanks for your work! I want to use your repo with a .zip dataset, but I get the following error:
File "/DIGAN/training/dataset.py", line 538, in __init__
classes, class_to_idx = find_classes(path)
File "/DIGAN/training/dataset.py", line 68, in find_classes
classes = [d for d in os.listdir(dir) if os.path.isdir(os.path.join(dir, d))]
NotADirectoryError: [Errno 20] Not a directory: '/DIGAN/data/dataset.zip'
Also, I wanted to ask whether I can combine your model with the FVD evaluation of StyleGAN-V. Could you integrate their evaluation protocol into your pipeline so that it runs during training? I am having trouble doing that myself, and I think their protocol gives a better FVD evaluation.
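The traceback shows `find_classes()` calling `os.listdir` on the dataset path, which only works for directories, so a `.zip` path fails there. Below is a sketch of a variant that accepts either a folder or an archive. The name `find_classes_any` is hypothetical, and it assumes each class is a top-level directory both on disk and inside the archive (an archive with a single wrapping root folder would yield just that folder).

```python
import os
import zipfile


def find_classes_any(path):
    """Return (classes, class_to_idx) for a folder dataset or a .zip archive."""
    if os.path.isfile(path) and zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            names = zf.namelist()
        # Archive entries always use '/', e.g. 'class_a/video_0/00000.png';
        # keep only the first path component of entries inside a folder.
        classes = sorted({n.split("/", 1)[0] for n in names if "/" in n})
    else:
        classes = sorted(d for d in os.listdir(path) if os.path.isdir(os.path.join(path, d)))
    class_to_idx = {c: i for i, c in enumerate(classes)}
    return classes, class_to_idx
```

The rest of the loader still has to read frames from the archive rather than from disk, so this only removes the first failure point.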
Hello! I have two questions about computing FVD.
frechet_video_distance.py forces the generated sequence to have length 16:
fake = torch.cat([rearrange(
    G(z, c, timesteps=16, noise_mode='const')[0].clamp(-1, 1).cpu(),
    '(b t) c h w -> b t h w c', t=16) for z, c in zip(grid_z, grid_c)])
If I want to train DIGAN with clips of length 32 or 128, what should I change in the FVD computation?
Thank you very much!
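The hard-coded 16 appears twice in that snippet (`timesteps=16` and `t=16`), so both would have to change together. How to score longer videos is a protocol choice; two common options are to split each generated video into non-overlapping 16-frame clips, or to subsample 16 evenly spaced frames. The hypothetical helper below only computes frame indices for either option and leaves feature extraction to the existing FVD code.

```python
def clip_indices(num_frames, clip_len=16, mode="chunk"):
    """Frame indices for evaluating a long video with a fixed-length clip model.

    mode="chunk":  non-overlapping clips of clip_len consecutive frames.
    mode="stride": one clip of clip_len frames sampled at an even stride.
    """
    if num_frames < clip_len:
        raise ValueError(f"need at least {clip_len} frames, got {num_frames}")
    if mode == "chunk":
        return [list(range(s, s + clip_len))
                for s in range(0, num_frames - clip_len + 1, clip_len)]
    if mode == "stride":
        step = num_frames // clip_len
        return [list(range(0, step * clip_len, step))]
    raise ValueError(f"unknown mode: {mode!r}")
```

With a `(b, t, h, w, c)` array of generated videos, indexing the time axis with each returned list selects one clip. Either choice changes what FVD measures, so it should match the protocol behind any numbers you compare against.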
Hi,
Thank you for your work.
I was looking at your code and I noticed that in ToRGBLayer, the modulated_conv2d function is used to generate the RGB frames. Does this mean that the network is not fully implicit but contains convolutions in the last layer, or did I miss something?
digan/src/training/networks.py, line 386 (commit 8368d5b)
Thank you for your help!
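Whether this breaks implicitness depends on the kernel size. In the StyleGAN2-ADA code base, `ToRGBLayer` uses a 1x1 kernel with `demodulate=False`. Assuming DIGAN's `ToRGBLayer` at that line does the same (worth checking), the convolution only mixes channels within each pixel, i.e. it acts as a per-coordinate linear layer. The NumPy sketch below, with hypothetical function names, illustrates that equivalence.

```python
import numpy as np


def modulated_conv1x1(x, weight, style, demodulate=False, eps=1e-8):
    """1x1 modulated convolution on a single sample.

    x: (C_in, H, W) features, weight: (C_out, C_in), style: (C_in,).
    """
    w = weight * style[None, :]  # modulate the input channels
    if demodulate:
        w = w / np.sqrt((w ** 2).sum(axis=1, keepdims=True) + eps)
    # A 1x1 kernel applies the same matrix to every pixel's feature vector.
    return np.einsum("oi,ihw->ohw", w, x)


def per_pixel_linear(x, weight, style):
    """The same computation written as an explicit per-coordinate linear layer."""
    w = weight * style[None, :]
    _, h, wd = x.shape
    out = np.empty((w.shape[0], h, wd))
    for i in range(h):
        for j in range(wd):
            out[:, i, j] = w @ x[:, i, j]
    return out
```

With a larger kernel, the output at a pixel would also depend on its neighbours, and the layer would no longer be purely coordinate-wise.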
Following the README file, I failed to run the project. Here are some suggestions and questions:
data/UCF-101?
launch.py is just a wrapper around train.py. Since some default settings may not be available for everyone, why not provide a set of train.py?
Thanks for sharing your great work. I found the following issues in dataset.py:
if 'kinetics' in self._path or 'KINETICS' in self._path or 'SKY' in self._path:
    if train:
        dir_path = os.path.join(self._path, 'train')
    else:
        dir_path = os.path.join(self._path, 'val')
and line #579 should be changed to:
self._all_fnames = {os.path.relpath(os.path.join(root, fname), start=dir_path) for root, _dirs, files in os.walk(dir_path) for fname in files}
Otherwise, it won't work for the data from Kinetics and Sky datasets.
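A self-contained version of the proposed fix is sketched below. It assumes Kinetics and Sky data are stored in `train/` and `val/` subfolders, and that other datasets keep using the root path directly (that `else` branch is an assumption; the snippet above only covers the Kinetics/Sky case). The key point is that `relpath` is taken against `dir_path`, so file names stay relative to the split folder that is actually walked. `list_split_files` is a hypothetical name.

```python
import os


def list_split_files(path, train=True):
    """Return (dir_path, relative file names) for one split (hypothetical helper)."""
    if 'kinetics' in path or 'KINETICS' in path or 'SKY' in path:
        dir_path = os.path.join(path, 'train' if train else 'val')
    else:
        dir_path = path  # assumption: other datasets are not split into subfolders
    fnames = {
        os.path.relpath(os.path.join(root, fname), start=dir_path)
        for root, _dirs, files in os.walk(dir_path)
        for fname in files
    }
    return dir_path, fnames
```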
def _get_zipfile() is not defined in the code, but it is used in lines #582 and #607. The following lines can be added after line #598:
def _get_zipfile(self):
    assert self._type == 'zip'
    if self._zipfile is None:
        self._zipfile = zipfile.ZipFile(self._path)
    return self._zipfile
def _file_ext(fname) should be changed to def _file_ext(self, fname).
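The two `dataset.py` fixes can be combined as in the sketch below: a `ZipFile` opened on first use and cached on the instance, and a `_file_ext` whose definition matches the call `self._file_ext(fname)`. Either an instance method taking `(self, fname)` or a `@staticmethod` taking `(fname)` satisfies that call. `ZipFrameIndex` is a hypothetical, trimmed-down stand-in for the dataset class, and `IMAGE_EXTS` replaces `PIL.Image.EXTENSION` only to keep the sketch free of a Pillow dependency.

```python
import os
import zipfile

# Stand-in for PIL.Image.EXTENSION (assumption, to avoid a Pillow dependency).
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}


class ZipFrameIndex:
    """Hypothetical minimal index over a .zip archive of video frames."""

    def __init__(self, path):
        self._path = path
        self._type = "zip"
        self._zipfile = None  # opened on first use, then reused

    def _get_zipfile(self):
        assert self._type == "zip"
        if self._zipfile is None:
            self._zipfile = zipfile.ZipFile(self._path)
        return self._zipfile

    def _file_ext(self, fname):
        # Instance method, so self._file_ext(fname) binds self implicitly.
        return os.path.splitext(fname)[1].lower()

    def image_fnames(self):
        names = self._get_zipfile().namelist()
        return sorted(f for f in names if self._file_ext(f) in IMAGE_EXTS)

    def close(self):
        if self._zipfile is not None:
            self._zipfile.close()
            self._zipfile = None
```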
Great work, and thanks for releasing the code! Do you have any plans to release the trained models as well? Thanks!
Hi, I am getting an error when running the training script with the command provided in README.md:
python src/infra/launch.py hydra.run.dir=. +experiment_name=exp01 +dataset.name=kinetics
The error is as follows:
"self._image_fnames = sorted(fname for fname in self._all_fnames if self._file_ext(fname) in PIL.Image.EXTENSION)
TypeError: _file_ext() missing 1 required positional argument: 'fname'"
I processed the Kinetics data according to the steps in the prepare_data folder.
Hi,
Thank you for your work. When I run generate_videos.py on the pretrained checkpoints, I get the following error:
Loading networks from "../digan/pretrained/ucf-101-train-test.pkl"...
Traceback (most recent call last):
File "src/scripts/generate_videos.py", line 59, in <module>
generate_videos()
File "/mnt/home/v_jiangshihao/miniconda3/envs/digan/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/mnt/home/v_jiangshihao/miniconda3/envs/digan/lib/python3.8/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/mnt/home/v_jiangshihao/miniconda3/envs/digan/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/mnt/home/v_jiangshihao/miniconda3/envs/digan/lib/python3.8/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/mnt/home/v_jiangshihao/miniconda3/envs/digan/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "src/scripts/generate_videos.py", line 38, in generate_videos
G = legacy.load_network_pkl(f)['G_ema'].to(device).eval() # type: ignore
File "/mnt/home/v_jiangshihao/digan_new/src/legacy.py", line 21, in load_network_pkl
data = _LegacyUnpickler(f).load()
File "/mnt/home/v_jiangshihao/digan_new/src/torch_utils/persistence.py", line 190, in _reconstruct_persistent_obj
module = _src_to_module(meta.module_src)
File "/mnt/home/v_jiangshihao/digan_new/src/torch_utils/persistence.py", line 226, in _src_to_module
exec(src, module.__dict__) # pylint: disable=exec-used
File "<string>", line 14, in <module>
ModuleNotFoundError: No module named 'torchsde'
Do you know what's the cause of that? Thanks for your help!
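The traceback points at the likely cause: the checkpoint is loaded through `torch_utils/persistence.py`, whose `_src_to_module` re-executes module source code stored inside the pickle. If that embedded source imports `torchsde`, the package must be installed (`pip install torchsde`) even when the local repository never imports it. A quick stdlib check such as the hypothetical `missing_modules` below can confirm this before loading; the list of module names is supplied by you, not read from the pickle.

```python
import importlib.util


def missing_modules(names):
    """Return the entries of `names` whose top-level package cannot be found."""
    missing = []
    for name in names:
        top = name.split(".", 1)[0]
        # find_spec returns None when no importable module of that name exists.
        if importlib.util.find_spec(top) is None:
            missing.append(name)
    return missing
```

For example, `missing_modules(["torch", "torchsde", "click"])` run inside the `digan` environment would list `torchsde` if it is not installed.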
Hi, I could not find the code for performing video-related tasks that were shown in the paper such as video interpolation, extrapolation, inversion, etc. I added these functionalities on top of your repository here - https://github.com/skymanaditya1/digan/blob/master/src/scripts/project.py.
Please let me know if this looks okay to you. If you like, I can open a PR for it (after refactoring, of course).
Hi,
You compare against MoCoGAN-HD on Taichi, although they do not report results on this dataset in their paper. I assume you used their repo to train on Taichi. Could you please share the checkpoint you used? I am trying to compare against both of your works.
Also, could you share how you performed the time extrapolation, i.e., how did you adjust Ts?
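For an implicit video generator, temporal interpolation and extrapolation amount to querying different time coordinates: a denser set of times inside the training range for interpolation, and times beyond it for extrapolation. The sketch below assumes a hypothetical normalization in which the training clip spans t in [0, 1]; DIGAN's actual time scaling, and how the authors adjusted Ts, are not specified here, so `time_coords` is illustration only.

```python
def time_coords(num_frames, train_len, step=1.0, start=0.0):
    """Time coordinates in units where frames 0..train_len-1 map onto [0, 1].

    step < 1 yields in-between frames (interpolation);
    (num_frames - 1) * step > train_len - 1 yields t > 1 (extrapolation).
    """
    if train_len < 2:
        raise ValueError("train_len must be at least 2")
    return [(start + i * step) / (train_len - 1) for i in range(num_frames)]
```

For a model trained on 16-frame clips, `time_coords(31, 16, step=0.5)` doubles the frame rate over the same interval, while `time_coords(32, 16)` runs past the end of the training range.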
I trained DIGAN on a dataset at 128x128 resolution and now want to generate output at 256x256. However, when I load the pretrained model, img_resolution is set to 128x128. I have tried changing the output resolution in several places but have not managed to make it work. Any help would be appreciated.
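Because the generator is implicit, rendering at 256x256 in principle means evaluating it on a denser grid of spatial coordinates. The difficulty is that the loaded model's coordinate grid appears to be sized by `img_resolution`. The sketch below only builds such a grid at an arbitrary resolution, assuming pixel-center coordinates normalized to [-1, 1] (a common convention, not necessarily DIGAN's). Feeding it to the generator would require changes to the synthesis network that are not shown here, and `coord_grid` is a hypothetical name.

```python
def coord_grid(height, width):
    """Row-major (y, x) pixel-center coordinates normalized to [-1, 1]."""
    ys = [-1.0 + (2 * i + 1) / height for i in range(height)]
    xs = [-1.0 + (2 * j + 1) / width for j in range(width)]
    return [(y, x) for y in ys for x in xs]
```

With pixel centers, the 256x256 grid spans the same extent as the 128x128 grid, and each 128-grid coordinate lies exactly midway between two 256-grid coordinates.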
Dear authors,
Hello! First of all, thank you for your inspiring work!
I encountered an issue with multi-GPU training on my 8 V100 16GB GPUs. When distributing the models across GPUs with the following code,
if rank == 0:
    print(f'Distributing across {num_gpus} GPUs...')
ddp_modules = dict()
for name, module in [('G_mapping', G.mapping), ('G_synthesis', G.synthesis), ('D', D), (None, G_ema), ('augment_pipe', augment_pipe)]:
    if rank == 0:
        print("[Distributing] Module {} ...".format(name))
    if (num_gpus > 1) and (module is not None) and len(list(module.parameters())) != 0:
        module.requires_grad_(True)
        module = torch.nn.parallel.DistributedDataParallel(module, device_ids=[device], broadcast_buffers=False,
                                                           find_unused_parameters=False)
        module.requires_grad_(False)
    if rank == 0:
        print("[Distributed] Module {}".format(name))
    if name is not None:
        ddp_modules[name] = module
the process failed on the first module, G_mapping, reporting:
[Distributing] Module G_mapping ...
RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1640811806235/work/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:957, unhandled cuda error, NCCL version 21.0.3
ncclUnhandledCudaError: Call to CUDA function failed.
The GPU memory usage is as follows:
wangyuhan-8-v100 Sat Mar 5 12:28:13 2022 460.73.01
[0] Tesla V100-SXM2-16GB | 36'C, 22 % | 15415 / 16160 MB | yuhan:python/31701(1283M) yuhan:python/31696(6905M) yuhan:python/31699(1151M) yuhan:python/31700(1241M) yuhan:python/31697(1283M) yuhan:python/31698(1283M) yuhan:python/31702(1175M) yuhan:python/31703(1099M)
[1] Tesla V100-SXM2-16GB | 37'C, 0 % | 2022 / 16160 MB | yuhan:python/31697(2019M)
[2] Tesla V100-SXM2-16GB | 38'C, 0 % | 2022 / 16160 MB | yuhan:python/31698(2019M)
[3] Tesla V100-SXM2-16GB | 39'C, 0 % | 2014 / 16160 MB | yuhan:python/31699(2011M)
[4] Tesla V100-SXM2-16GB | 35'C, 0 % | 2014 / 16160 MB | yuhan:python/31700(2011M)
[5] Tesla V100-SXM2-16GB | 35'C, 0 % | 2022 / 16160 MB | yuhan:python/31701(2019M)
[6] Tesla V100-SXM2-16GB | 36'C, 0 % | 2014 / 16160 MB | yuhan:python/31702(2011M)
[7] Tesla V100-SXM2-16GB | 37'C, 0 % | 2014 / 16160 MB | yuhan:python/31703(2011M)
I am not very familiar with this, but GPU_0 seems to be running out of memory. I am wondering whether this is the cause of the ncclUnhandledCudaError.
Could you please help me figure out what caused this error? Does your implementation work on 16 GB V100 GPUs?
Thank you very much.
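The memory table hints at a possible cause: every process except 31696 also holds about 1.1 to 1.3 GB on GPU 0, in addition to its own GPU. This pattern typically means each rank created a CUDA context on device 0, which leaves GPU 0 nearly full; that is a plausible explanation for the NCCL failure, not a confirmed one. The stdlib sketch below parses status lines in the format shown above and reports PIDs present on more than one GPU. The function names and regular expressions are hypothetical and tied to this exact output format.

```python
import re
from collections import defaultdict

# "[0] Tesla V100-SXM2-16GB | 36'C, 22 % | 15415 / 16160 MB | user:python/31701(1283M) ..."
LINE_RE = re.compile(r"^\[(\d+)\][^|]*\|[^|]*\|\s*(\d+)\s*/\s*(\d+)\s*MB\s*\|(.*)$")
PROC_RE = re.compile(r"/(\d+)\((\d+)M\)")


def parse_gpu_status(lines):
    """Map GPU index -> {'used_mb', 'total_mb', 'procs': [(pid, mb), ...]}."""
    gpus = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip the header and anything else that is not a GPU row
        idx, used, total, procs = m.groups()
        gpus[int(idx)] = {
            "used_mb": int(used),
            "total_mb": int(total),
            "procs": [(int(pid), int(mb)) for pid, mb in PROC_RE.findall(procs)],
        }
    return gpus


def pids_on_multiple_gpus(gpus):
    """Return {pid: [gpu indices]} for processes that appear on more than one GPU."""
    seen = defaultdict(set)
    for idx, info in gpus.items():
        for pid, _mb in info["procs"]:
            seen[pid].add(idx)
    return {pid: sorted(idxs) for pid, idxs in seen.items() if len(idxs) > 1}
```

If the check confirms stray contexts, a commonly suggested remedy is to call `torch.cuda.set_device(rank)` in each process before any other CUDA call; whether that resolves this particular NCCL error is not established here.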
Thanks for your great work!
I have some questions about the FVD calculation on UCF-101 dataset.
As noted in the paper, there are two different experiments for UCF-101, (train) and (train + test) split.
My questions are as below:
Hi,
I cannot download the i3d_pretrained_400.pt file.
Could you provide it?
Thanks!