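I installed all dependencies successfully, but testing on the Deep Video Deblurring Dataset fails at the forward step with "RuntimeError: CUDA call failed". Just before the traceback, the log prints "error in forward_cuda_kernel: no kernel image is available for execution on the device". The output of conda list, the config, and the full log follow.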
Name Version Build Channel
_libgcc_mutex 0.1 main defaults
argparse 1.4.0 pypi_0 pypi
blas 1.0 mkl defaults
ca-certificates 2020.7.22 0 defaults
certifi 2020.6.20 py37_0 defaults
cffi 1.14.2 py37he30daa8_0 defaults
cudatoolkit 9.0 h13b8566_0 defaults
cycler 0.10.0 pypi_0 pypi
easydict 1.9 pypi_0 pypi
freetype 2.10.2 h5ab3b9f_0 defaults
future 0.18.2 pypi_0 pypi
intel-openmp 2020.2 254 defaults
jpeg 9b h024ee3a_2 defaults
kiwisolver 1.2.0 pypi_0 pypi
lcms2 2.11 h396b838_0 defaults
ld_impl_linux-64 2.33.1 h53a641e_7 defaults
libedit 3.1.20191231 h14c3975_1 defaults
libffi 3.3 he6710b0_2 defaults
libgcc-ng 9.1.0 hdf63c60_0 defaults
libpng 1.6.37 hbc83047_0 defaults
libstdcxx-ng 9.1.0 hdf63c60_0 defaults
libtiff 4.1.0 h2733197_1 defaults
lz4-c 1.9.2 he6710b0_1 defaults
matplotlib 3.3.1 pypi_0 pypi
mkl 2020.2 256 defaults
mkl-service 2.3.0 py37he904b0f_0 defaults
mkl_fft 1.1.0 py37h23d657b_0 defaults
mkl_random 1.1.1 py37h0573a6f_0 defaults
ncurses 6.2 he6710b0_1 defaults
ninja 1.10.1 py37hfd86e86_0 defaults
numpy 1.19.1 py37hbc911f0_0 defaults
numpy-base 1.19.1 py37hfa32c7d_0 defaults
olefile 0.46 py37_0 defaults
opencv-python 4.4.0.42 pypi_0 pypi
openexr 1.3.2 pypi_0 pypi
openssl 1.1.1g h7b6447c_0 defaults
pillow 7.2.0 py37hb39fc2d_0 defaults
pip 20.2.2 py37_0 defaults
protobuf 3.13.0 pypi_0 pypi
pycparser 2.20 py_2 defaults
pyexr 0.3.8 pypi_0 pypi
pyparsing 2.4.7 pypi_0 pypi
python 3.7.9 h7579374_0 defaults
python-dateutil 2.8.1 pypi_0 pypi
pytorch 1.0.1 py3.7_cuda9.0.176_cudnn7.4.2_2 pytorch
readline 8.0 h7b6447c_0 defaults
scipy 1.5.2 pypi_0 pypi
setuptools 49.6.0 py37_0 defaults
six 1.15.0 py_0 defaults
sqlite 3.33.0 h62c20be_0 defaults
tensorboardx 2.1 pypi_0 pypi
tk 8.6.10 hbc83047_0 defaults
torchvision 0.2.2 py_3 pytorch
wheel 0.35.1 py_0 defaults
xz 5.2.5 h7b6447c_0 defaults
zlib 1.2.11 h7b6447c_3 defaults
zstd 1.4.5 h9ceee32_0 defaults
Use config:
{'CONST': {'DEVICE': 'all',
'NUM_WORKER': 1,
'TEST_BATCH_SIZE': 1,
'TRAIN_BATCH_SIZE': 1,
'WEIGHTS': './ckpt/best-ckpt.pth.tar'},
'DATA': {'COLOR_JITTER': [0.2, 0.15, 0.3, 0.1],
'CROP_IMG_SIZE': [320, 448],
'GAUSSIAN': [0, 0.0001],
'MEAN': [0.0, 0.0, 0.0],
'SEQ_LENGTH': 20,
'STD': [255.0, 255.0, 255.0]},
'DATASET': {'DATASET_NAME': 'VideoDeblur'},
'DIR': {'DATASET_JSON_FILE_PATH': './datasets/VideoDeblur.json',
'DATASET_ROOT': './datasets/DeepVideoDeblurring_Dataset/DeepVideoDeblurring',
'IMAGE_BLUR_PATH': './datasets/DeepVideoDeblurring_Dataset/DeepVideoDeblurring/%s/%s/input/%s.jpg',
'IMAGE_CLEAR_PATH': './datasets/DeepVideoDeblurring_Dataset/DeepVideoDeblurring/%s/%s/GT/%s.jpg',
'OUT_PATH': './result'},
'LOSS': {'MULTISCALE_WEIGHTS': [0.3, 0.3, 0.2, 0.1, 0.1]},
'NETWORK': {'BATCHNORM': False,
'DEBLURNETARCH': 'DeblurNet',
'LEAKY_VALUE': 0.1,
'PHASE': 'test'},
'TEST': {'PRINT_FREQ': 5, 'VISUALIZATION_NUM': 10},
'TRAIN': {'BETA': 0.999,
'BIAS_DECAY': 0.0,
'LEARNING_RATE': 0.0001,
'LR_DECAY': 0.1,
'LR_MILESTONES': [80, 160, 250],
'MOMENTUM': 0.9,
'NUM_EPOCHES': 400,
'PRINT_FREQ': 10,
'SAVE_FREQ': 10,
'USE_PERCET_LOSS': True,
'WEIGHT_DECAY': 0.0}}
CUDA DEVICES NUMBER: 8
[DEBUG] 2020-09-13 01:20:04.316014 Parameters in DeblurNet: 5372547.
[INFO] 2020-09-13 01:20:09.133808 Recovering from ./ckpt/best-ckpt.pth.tar ...
[INFO] 2020-09-13 01:20:09.171222 Recover complete. Current epoch #379, Best_Img_PSNR = 31.241976697921753 at epoch #378.
[INFO] Output_dir: ./result/2020-09-13T01:20:09.171343_DeblurNet/
[INFO] 2020-09-13 01:20:09.177601 Collecting files of Taxonomy [Name = 720p_240fps_2: 5]
[INFO] 2020-09-13 01:20:09.178387 Collecting files of Taxonomy [Name = IMG_0003: 5]
[INFO] 2020-09-13 01:20:09.179175 Collecting files of Taxonomy [Name = IMG_0021: 5]
[INFO] 2020-09-13 01:20:09.179935 Collecting files of Taxonomy [Name = IMG_0030: 5]
[INFO] 2020-09-13 01:20:09.181372 Collecting files of Taxonomy [Name = IMG_0031: 5]
[INFO] 2020-09-13 01:20:09.182820 Collecting files of Taxonomy [Name = IMG_0032: 5]
[INFO] 2020-09-13 01:20:09.184221 Collecting files of Taxonomy [Name = IMG_0033: 5]
[INFO] 2020-09-13 01:20:09.185616 Collecting files of Taxonomy [Name = IMG_0037: 5]
[INFO] 2020-09-13 01:20:09.186999 Collecting files of Taxonomy [Name = IMG_0039: 5]
[INFO] 2020-09-13 01:20:09.188434 Collecting files of Taxonomy [Name = IMG_0049: 5]
[INFO] 2020-09-13 01:20:09.188446 Complete collecting files of the dataset for TEST. Seq Number: 30.
error in forward_cuda_kernel: no kernel image is available for execution on the device
Traceback (most recent call last):
File "runner.py", line 71, in <module>
main()
File "runner.py", line 67, in main
bulid_net(cfg)
File "/data1/wangpengxiao/STFAN/core/build.py", line 113, in bulid_net
test(cfg, init_epoch, dataset_loader, test_transforms, deblurnet, test_writer)
File "/data1/wangpengxiao/STFAN/core/test.py", line 84, in test
output_img, output_fea = deblurnet(img_blur, last_img_blur, output_last_img, output_last_fea)
File "/nvme/wangpengxiao/anaconda3/envs/STFAN/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/data1/wangpengxiao/STFAN/models/DeblurNet.py", line 109, in forward
conv3_d_k = self.kconv_deblur(conv3_d, kernel_deblur)
File "/nvme/wangpengxiao/anaconda3/envs/STFAN/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/data1/wangpengxiao/STFAN/models/FAC/kernelconv2d/KernelConv2D.py", line 87, in forward
return KernelConv2DFunction.apply(input_pad, kernel, self.kernel_size)
File "/data1/wangpengxiao/STFAN/models/FAC/kernelconv2d/KernelConv2D.py", line 37, in forward
kernelconv2d_cuda.forward(input, kernel, intKernelSize, output)
RuntimeError: CUDA call failed (KernelConv2D_forward_cuda at KernelConv2D_cuda.cpp:23)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fc6527a6cf5 in /nvme/wangpengxiao/anaconda3/envs/STFAN/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: KernelConv2D_forward_cuda(at::Tensor&, at::Tensor&, int, at::Tensor&) + 0xe8 (0x7fc62d22b428 in /nvme/wangpengxiao/.local/lib/python3.7/site-packages/kernelconv2d_cuda-1.0.0-py3.7-linux-x86_64.egg/kernelconv2d_cuda.cpython-37m-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x1443a (0x7fc62d23643a in /nvme/wangpengxiao/.local/lib/python3.7/site-packages/kernelconv2d_cuda-1.0.0-py3.7-linux-x86_64.egg/kernelconv2d_cuda.cpython-37m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x1454e (0x7fc62d23654e in /nvme/wangpengxiao/.local/lib/python3.7/site-packages/kernelconv2d_cuda-1.0.0-py3.7-linux-x86_64.egg/kernelconv2d_cuda.cpython-37m-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x117d3 (0x7fc62d2337d3 in /nvme/wangpengxiao/.local/lib/python3.7/site-packages/kernelconv2d_cuda-1.0.0-py3.7-linux-x86_64.egg/kernelconv2d_cuda.cpython-37m-x86_64-linux-gnu.so)
<omitting python frames>
frame #9: THPFunction_apply(_object*, _object*) + 0x5a1 (0x7fc673bc6061 in /nvme/wangpengxiao/anaconda3/envs/STFAN/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #47: __libc_start_main + 0xf0 (0x7fc686a9c830 in /lib/x86_64-linux-gnu/libc.so.6)
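
The message "no kernel image is available for execution on the device" is the CUDA runtime saying that the compiled kernelconv2d_cuda extension contains no binary (or JIT-compilable PTX) matching the GPU it was launched on. Below is a minimal diagnostic sketch in Python, not a confirmed fix. It assumes the usual CUDA compatibility rule: an `sm_XY` binary runs on devices with the same major version and an equal or higher minor version, and `compute_XY` PTX can be JIT-compiled on any device at or above X.Y. The helper names and the `EXAMPLE_BUILT_FOR` list are hypothetical. The real list would have to be read from the nvcc `-gencode` flags in the extension's setup.py, which are not shown in this report.

```python
import re


def parse_gencode(tag):
    """Split a tag such as 'sm_61' or 'compute_70' into (kind, (major, minor))."""
    match = re.fullmatch(r"(sm|compute)_(\d+)(\d)", tag)
    if match is None:
        raise ValueError(f"unrecognised architecture tag: {tag!r}")
    kind, major, minor = match.groups()
    return kind, (int(major), int(minor))


def kernel_image_available(device_cc, built_for):
    """Return True if a device with capability `device_cc` can run code built for `built_for`."""
    device_cc = tuple(device_cc)
    for tag in built_for:
        kind, (major, minor) = parse_gencode(tag)
        if kind == "sm":
            # Native binary: needs the same major version, device minor >= built minor.
            if device_cc[0] == major and device_cc[1] >= minor:
                return True
        elif device_cc >= (major, minor):
            # PTX: the driver can JIT-compile it for this or any newer device.
            return True
    return False


# Hypothetical placeholder: replace with the architectures from the extension's build flags.
EXAMPLE_BUILT_FOR = ["sm_60", "sm_61"]

if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        print("PyTorch built against CUDA", torch.version.cuda)
        for index in range(torch.cuda.device_count()):
            cc = torch.cuda.get_device_capability(index)
            ok = kernel_image_available(cc, EXAMPLE_BUILT_FOR)
            print(index, torch.cuda.get_device_name(index), "sm_%d%d" % cc, "ok" if ok else "no kernel image")
```

If a device is reported without a matching image, the extension would need to be rebuilt with that device's architecture in its build flags, using a CUDA toolkit recent enough to target it. Whether that applies to this machine depends on the GPU model, which the log does not show.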