Error when fine-tuning: RuntimeError: Error(s) in loading state_dict for Demucs (facebookresearch/denoiser)

Comments (17)

adefossez commented on June 15, 2024 1

Hey @youssefavx, you need to pass demucs.hidden=64 so that the architecture matches that of the checkpoint (sorry, this is not done automatically). The default is demucs.hidden=48, which matches the dns48 pre-trained model.
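
To see which value a checkpoint expects, one can read the width off the first encoder layer. The following is a minimal sketch, assuming (from the shapes in the traceback in this thread) that `encoder.0.0.weight` has shape `[hidden, 1, 8]`. `infer_hidden` is a hypothetical helper, not part of denoiser, and it accepts any mapping from parameter name to a shape tuple.

```python
from typing import Mapping, Sequence


def infer_hidden(shapes: Mapping[str, Sequence[int]],
                 key: str = "encoder.0.0.weight") -> int:
    """Return the output-channel count of the first encoder layer.

    Hypothetical helper: assumes the first dimension of `key` equals
    the demucs.hidden value the weights were created with.
    """
    if key not in shapes:
        raise KeyError(f"{key!r} not found in the given shapes")
    return int(shapes[key][0])


# Shapes copied from the size-mismatch messages in this thread.
checkpoint_shapes = {"encoder.0.0.weight": (64, 1, 8)}
current_model_shapes = {"encoder.0.0.weight": (48, 1, 8)}

checkpoint_hidden = infer_hidden(checkpoint_shapes)
model_hidden = infer_hidden(current_model_shapes)

if checkpoint_hidden != model_hidden:
    print(f"Mismatch: pass demucs.hidden={checkpoint_hidden} to train.py")
```

With a real PyTorch state dict, the mapping could be built as `{name: tuple(t.shape) for name, t in state_dict.items()}`.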

from denoiser.

youssefavx commented on June 15, 2024

When I try just vanilla training (not fine tuning), I get this error:

[2020-09-30 02:52:08,278][__main__][INFO] - For logs, checkpoints and samples check /Users/youssef/denoiser/outputs/exp_
[2020-09-30 02:52:09,431][denoiser.solver][INFO] - ----------------------------------------------------------------------
[2020-09-30 02:52:09,431][denoiser.solver][INFO] - Training...
[2020-09-30 02:52:09,617][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 99, in main
    _main(args)
  File "train.py", line 93, in _main
    run(args)
  File "train.py", line 76, in run
    solver.train()
  File "/Users/youssef/denoiser/denoiser/solver.py", line 139, in train
    train_loss = self._run_one_epoch(epoch)
  File "/Users/youssef/denoiser/denoiser/solver.py", line 203, in _run_one_epoch
    noisy, clean = [x.to(self.device) for x in data]
  File "/Users/youssef/denoiser/denoiser/solver.py", line 203, in <listcomp>
    noisy, clean = [x.to(self.device) for x in data]
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/__init__.py", line 149, in _lazy_init
    _check_driver()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/__init__.py", line 47, in _check_driver
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

adiyoss commented on June 15, 2024

It seems like you do not have CUDA installed.
If you have a GPU, you should install the CUDA drivers. If not, you can train your model on the laptop CPU; however, this will be extremely slow and we do not recommend it.

youssefavx commented on June 15, 2024

I'm trying to download CUDA since it seems I do have a supported graphics card. The problem is on their site: https://developer.nvidia.com/gameworksdownload#?dn=cuda-toolkit-developer-tools-for-macos-11-0

It says:

NVIDIA® CUDA Toolkit 11.0 no longer supports development or running applications on macOS. While there are no tools which use macOS as a target environment, NVIDIA is making macOS host versions of the following tools that you can launch profiling and debugging sessions on supported target platforms.

Is this true? Or are there alternative CUDA versions one can download?

youssefavx commented on June 15, 2024

When I try to run pip3 install -r requirements_cuda.txt anyway, I get this error:

ERROR: Could not find a version that satisfies the requirement torch==1.5.1+cu101 (from -r requirements_cuda.txt (line 9)) 
(from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2, 0.4.1, 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.1.0.post2, 1.2.0, 1.3.0, 1.3.0.post2, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0)
ERROR: No matching distribution found for torch==1.5.1+cu101 (from -r requirements_cuda.txt (line 9))

I wonder if this is related to CUDA or not. I don't know if it's installed on my system or not.
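
One way to read the pip output above: the requested version carries a `+cu101` local tag, while none of the candidates pip lists carry any tag. The sketch below illustrates that with plain string comparison, which is a simplification of pip's real version matching. `split_local` is a hypothetical helper written for this example.

```python
def split_local(version: str):
    """Split a version string into (public, local) around the '+' tag."""
    public, _, local = version.partition("+")
    return public, local or None


# Candidate list copied from the pip error above.
available = [
    "0.1.2", "0.1.2.post1", "0.1.2.post2", "0.4.1", "1.0.0", "1.0.1",
    "1.0.1.post2", "1.1.0", "1.1.0.post2", "1.2.0", "1.3.0", "1.3.0.post2",
    "1.3.1", "1.4.0", "1.5.0", "1.5.1", "1.6.0",
]
requested = "1.5.1+cu101"

public, local = split_local(requested)
exact_match = requested in available   # the tagged build pip was asked for
public_match = public in available     # the same release without the tag

print(f"exact match for {requested}: {exact_match}")
print(f"public version {public} offered: {public_match}")
print(f"local tag requested: {local}")
```

So the release itself is offered, but not a build with the CUDA 10.1 tag, which is why the pinned requirement cannot be satisfied from the index pip searched here.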

adiyoss commented on June 15, 2024

It seems like a version mismatch between CUDA, torch, and torchaudio.
Can you try the following:

pip install torchaudio==0.5.1
pip install torch==1.5.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

And please double check that you have an NVIDIA GPU.

youssefavx commented on June 15, 2024

pip3 install torchaudio==0.5.1

This one worked

pip3 install torch==1.5.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
Looking in links: https://download.pytorch.org/whl/torch_stable.html

This one gave me the same error:

ERROR: Could not find a version that satisfies the requirement torch==1.5.0+cu101 (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2, 0.4.1, 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.1.0.post2, 1.2.0, 1.3.0, 1.3.0.post2, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0)
ERROR: No matching distribution found for torch==1.5.0+cu101

In terms of GPUs, I have 2:

NVIDIA GeForce GT 750M 2048 MB
Intel Iris Pro 1536 MB

I checked that the Nvidia GPU is on the Cuda-capable list.

adiyoss commented on June 15, 2024

And are you using CUDA 10?

youssefavx commented on June 15, 2024

I'm having a hard time figuring out how to check that Cuda is installed on my system, if I've already installed it, or if I have to install it. Is there a way to do so from the terminal or something like that?

I'll search for Cuda 10 online to try to download.
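
For checking from the terminal, a short script along these lines may help. It is only a sketch: it looks for the `nvcc` compiler on PATH and asks the installed PyTorch build what it was compiled with. `cuda_report` is a hypothetical helper name, and a missing `nvcc` may simply mean the toolkit's bin directory is not on PATH.

```python
import importlib.util
import shutil


def cuda_report() -> dict:
    """Collect basic facts about the local CUDA / PyTorch setup."""
    report = {
        # Path to the CUDA compiler, or None if it is not on PATH.
        "nvcc_path": shutil.which("nvcc"),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "torch_cuda_build": None,
        "cuda_available": None,
    }
    if report["torch_installed"]:
        import torch

        report["torch_version"] = torch.__version__
        # torch.version.cuda is None for CPU-only builds of PyTorch.
        report["torch_cuda_build"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, value in cuda_report().items():
        print(f"{name}: {value}")
```

If `torch_cuda_build` comes back as None, the installed PyTorch is a CPU-only build, which would be consistent with the "Torch not compiled with CUDA enabled" assertion regardless of whether the CUDA toolkit itself is installed.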

youssefavx commented on June 15, 2024

Found this: https://developer.nvidia.com/cuda-10.0-download-archive?target_os=MacOSX&target_arch=x86_64&target_version=1013&target_type=dmglocal

Will try and report back.

adiyoss commented on June 15, 2024

Maybe you can try this one: https://gist.github.com/bogdan-kulynych/f64eb148eeef9696c70d485a76e42c3a
CUDA 11 won't work since PyTorch and torchaudio do not support CUDA 11.

youssefavx commented on June 15, 2024

This one seems to be for Ubuntu, I assume it wouldn't work on MacOS?

youssefavx commented on June 15, 2024

Okay so I finally installed CUDA 10.0.130 on Mac via the first link I shared.

When I run pip3 install -r requirements_cuda.txt

I get the same error:

When I try to run pip3 install torch==1.5.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

I get the same error again:

When I check the link: https://download.pytorch.org/whl/torch_stable.html - I don't find any options for those packages for MacOS.

When I try to run python3 train.py

I get:

[2020-09-30 15:58:27,985][__main__][INFO] - For logs, checkpoints and samples check /Users/josephvanrowe/denoiser/outputs/exp_
[2020-09-30 15:58:30,150][denoiser.solver][INFO] - ----------------------------------------------------------------------
[2020-09-30 15:58:30,150][denoiser.solver][INFO] - Training...
[2020-09-30 15:58:30,371][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 99, in main
    _main(args)
  File "train.py", line 93, in _main
    run(args)
  File "train.py", line 76, in run
    solver.train()
  File "/Users/josephvanrowe/denoiser/denoiser/solver.py", line 139, in train
    train_loss = self._run_one_epoch(epoch)
  File "/Users/josephvanrowe/denoiser/denoiser/solver.py", line 203, in _run_one_epoch
    noisy, clean = [x.to(self.device) for x in data]
  File "/Users/josephvanrowe/denoiser/denoiser/solver.py", line 203, in <listcomp>
    noisy, clean = [x.to(self.device) for x in data]
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/__init__.py", line 149, in _lazy_init
    _check_driver()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/__init__.py", line 47, in _check_driver
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

However, when I do conda install -c pytorch cudatoolkit pytorch

I get:

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

And CUDA's installer says that it did install the CUDA toolkit and driver.

So this seems like a mystery.

youssefavx commented on June 15, 2024

Okay, it seems I'm going to have to build PyTorch from source in order to get this working on my machine. This seems like too much of a hassle for me, so instead I tried to get this working on Colab. I had to downgrade to torchaudio 0.4.0 and PyTorch 1.4.0; otherwise I'd get an error saying that the NVIDIA driver I had on Colab was too old.

I actually got it to train properly, but surprisingly, I'm still getting the same error I got when trying to fine-tune on my machine.
(This is after I set "continue_pretrained:" to "dns64" and ran "!python3 train.py".)

[2020-09-30 20:08:57,898][__main__][INFO] - For logs, checkpoints and samples check /content/denoiser/outputs/exp_
[2020-09-30 20:09:02,677][denoiser.solver][INFO] - Loading checkpoint model: checkpoint.th
[2020-09-30 20:09:03,003][denoiser.solver][INFO] - Fine tuning from pre-trained model dns64
Downloading: "https://dl.fbaipublicfiles.com/adiyoss/denoiser/dns64-a7761ff99a7d5bb6.th" to /root/.cache/torch/checkpoints/dns64-a7761ff99a7d5bb6.th
100% 128M/128M [00:04<00:00, 30.7MB/s]
[2020-09-30 20:09:08,161][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 99, in main
    _main(args)
  File "train.py", line 93, in _main
    run(args)
  File "train.py", line 75, in run
    solver = Solver(data, model, optimizer, args)
  File "/content/denoiser/denoiser/solver.py", line 70, in __init__
    self._reset()
  File "/content/denoiser/denoiser/solver.py", line 121, in _reset
    self.model.load_state_dict(model.state_dict())
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 830, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Demucs:
	size mismatch for encoder.0.0.weight: copying a param with shape torch.Size([64, 1, 8]) from checkpoint, the shape in current model is torch.Size([48, 1, 8]).
	size mismatch for encoder.0.0.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([48]).
	size mismatch for encoder.0.2.weight: copying a param with shape torch.Size([128, 64, 1]) from checkpoint, the shape in current model is torch.Size([96, 48, 1]).
	size mismatch for encoder.0.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
	size mismatch for encoder.1.0.weight: copying a param with shape torch.Size([128, 64, 8]) from checkpoint, the shape in current model is torch.Size([96, 48, 8]).
	size mismatch for encoder.1.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
	size mismatch for encoder.1.2.weight: copying a param with shape torch.Size([256, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 96, 1]).
	size mismatch for encoder.1.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
	size mismatch for encoder.2.0.weight: copying a param with shape torch.Size([256, 128, 8]) from checkpoint, the shape in current model is torch.Size([192, 96, 8]).
	size mismatch for encoder.2.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
	size mismatch for encoder.2.2.weight: copying a param with shape torch.Size([512, 256, 1]) from checkpoint, the shape in current model is torch.Size([384, 192, 1]).
	size mismatch for encoder.2.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for encoder.3.0.weight: copying a param with shape torch.Size([512, 256, 8]) from checkpoint, the shape in current model is torch.Size([384, 192, 8]).
	size mismatch for encoder.3.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for encoder.3.2.weight: copying a param with shape torch.Size([1024, 512, 1]) from checkpoint, the shape in current model is torch.Size([768, 384, 1]).
	size mismatch for encoder.3.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for encoder.4.0.weight: copying a param with shape torch.Size([1024, 512, 8]) from checkpoint, the shape in current model is torch.Size([768, 384, 8]).
	size mismatch for encoder.4.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for encoder.4.2.weight: copying a param with shape torch.Size([2048, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1536, 768, 1]).
	size mismatch for encoder.4.2.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]).
	size mismatch for decoder.0.0.weight: copying a param with shape torch.Size([2048, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1536, 768, 1]).
	size mismatch for decoder.0.0.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]).
	size mismatch for decoder.0.2.weight: copying a param with shape torch.Size([1024, 512, 8]) from checkpoint, the shape in current model is torch.Size([768, 384, 8]).
	size mismatch for decoder.0.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for decoder.1.0.weight: copying a param with shape torch.Size([1024, 512, 1]) from checkpoint, the shape in current model is torch.Size([768, 384, 1]).
	size mismatch for decoder.1.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for decoder.1.2.weight: copying a param with shape torch.Size([512, 256, 8]) from checkpoint, the shape in current model is torch.Size([384, 192, 8]).
	size mismatch for decoder.1.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
	size mismatch for decoder.2.0.weight: copying a param with shape torch.Size([512, 256, 1]) from checkpoint, the shape in current model is torch.Size([384, 192, 1]).
	size mismatch for decoder.2.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for decoder.2.2.weight: copying a param with shape torch.Size([256, 128, 8]) from checkpoint, the shape in current model is torch.Size([192, 96, 8]).
	size mismatch for decoder.2.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
	size mismatch for decoder.3.0.weight: copying a param with shape torch.Size([256, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 96, 1]).
	size mismatch for decoder.3.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
	size mismatch for decoder.3.2.weight: copying a param with shape torch.Size([128, 64, 8]) from checkpoint, the shape in current model is torch.Size([96, 48, 8]).
	size mismatch for decoder.3.2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([48]).
	size mismatch for decoder.4.0.weight: copying a param with shape torch.Size([128, 64, 1]) from checkpoint, the shape in current model is torch.Size([96, 48, 1]).
	size mismatch for decoder.4.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
	size mismatch for decoder.4.2.weight: copying a param with shape torch.Size([64, 1, 8]) from checkpoint, the shape in current model is torch.Size([48, 1, 8]).
	size mismatch for lstm.lstm.weight_ih_l0: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
	size mismatch for lstm.lstm.weight_hh_l0: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
	size mismatch for lstm.lstm.bias_ih_l0: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).
	size mismatch for lstm.lstm.bias_hh_l0: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).
	size mismatch for lstm.lstm.weight_ih_l1: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
	size mismatch for lstm.lstm.weight_hh_l1: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
	size mismatch for lstm.lstm.bias_ih_l1: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).
	size mismatch for lstm.lstm.bias_hh_l1: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).

Is this a problem due to a torch version mismatch or something else?
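
A quick way to probe that question is to compare the two shapes in each message. The sketch below parses a few of the lines from the traceback above and collects the ratio between checkpoint and model size for every dimension that differs. `width_ratios` is a hypothetical helper written for this example.

```python
import re
from fractions import Fraction

# A few lines copied from the traceback above.
error_lines = """
size mismatch for encoder.0.0.weight: copying a param with shape torch.Size([64, 1, 8]) from checkpoint, the shape in current model is torch.Size([48, 1, 8]).
size mismatch for encoder.4.2.weight: copying a param with shape torch.Size([2048, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1536, 768, 1]).
size mismatch for lstm.lstm.weight_ih_l0: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
"""

SHAPE = re.compile(r"torch\.Size\(\[([\d, ]+)\]\)")


def width_ratios(text: str) -> set:
    """Return checkpoint/model size ratios over all differing dimensions."""
    ratios = set()
    for line in text.strip().splitlines():
        shapes = [
            [int(n) for n in match.split(",")]
            for match in SHAPE.findall(line)
        ]
        if len(shapes) != 2:
            continue  # not a size-mismatch line
        checkpoint, current = shapes
        for a, b in zip(checkpoint, current):
            if a != b:
                ratios.add(Fraction(a, b))
    return ratios


ratios = width_ratios(error_lines)
print(ratios)
```

For these lines the only ratio is 4/3, that is 64/48. A single constant factor across layers suggests the model was built with a different width than the checkpoint, rather than a torch version problem.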

youssefavx commented on June 15, 2024

@adefossez Thanks so much! So is this in some variable in a particular file, like the config file? Or something to change when running train.py like: python3 train.py demucs.hidden=64 ?

youssefavx commented on June 15, 2024

@adefossez I just tried the command above, it worked! Hallelujah!

youssefavx commented on June 15, 2024

And man did you guys do a beautiful job with the logging!
