Comments (17)
Hey @youssefavx, you need to pass demucs.hidden=64 so that the architecture matches that of the checkpoint (sorry, this is not done automatically). The default is demucs.hidden=48, which matches the dns48 pre-trained model.
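As a rough sketch of how the right value could be read off a checkpoint rather than guessed: the size-mismatch messages quoted in this thread show encoder.0.0.weight with shape [64, 1, 8] for dns64 and [48, 1, 8] for the default model, so the first dimension of that tensor appears to be the hidden size. The helper name infer_hidden and the stand-in state dicts below are hypothetical; with a real checkpoint you would pass the loaded state dict instead.

```python
from types import SimpleNamespace


def infer_hidden(state_dict, key="encoder.0.0.weight"):
    """Return the hidden size implied by a Demucs-style state dict.

    Assumes the first encoder convolution has shape
    (hidden, in_channels, kernel), as in the error messages in this thread.
    """
    if key not in state_dict:
        raise KeyError(f"{key!r} not found; is this a Demucs state dict?")
    return int(state_dict[key].shape[0])


# Stand-ins for loaded checkpoints: only the .shape attribute is needed here.
fake_dns64 = {"encoder.0.0.weight": SimpleNamespace(shape=(64, 1, 8))}
fake_dns48 = {"encoder.0.0.weight": SimpleNamespace(shape=(48, 1, 8))}

print(infer_hidden(fake_dns64))  # 64
print(infer_hidden(fake_dns48))  # 48
```

The value returned is what would go after demucs.hidden= on the command line.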
from denoiser.
When I try just vanilla training (not fine tuning), I get this error:
[2020-09-30 02:52:08,278][__main__][INFO] - For logs, checkpoints and samples check /Users/youssef/denoiser/outputs/exp_
[2020-09-30 02:52:09,431][denoiser.solver][INFO] - ----------------------------------------------------------------------
[2020-09-30 02:52:09,431][denoiser.solver][INFO] - Training...
[2020-09-30 02:52:09,617][__main__][ERROR] - Some error happened
Traceback (most recent call last):
File "train.py", line 99, in main
_main(args)
File "train.py", line 93, in _main
run(args)
File "train.py", line 76, in run
solver.train()
File "/Users/youssef/denoiser/denoiser/solver.py", line 139, in train
train_loss = self._run_one_epoch(epoch)
File "/Users/youssef/denoiser/denoiser/solver.py", line 203, in _run_one_epoch
noisy, clean = [x.to(self.device) for x in data]
File "/Users/youssef/denoiser/denoiser/solver.py", line 203, in <listcomp>
noisy, clean = [x.to(self.device) for x in data]
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/__init__.py", line 149, in _lazy_init
_check_driver()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/__init__.py", line 47, in _check_driver
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
from denoiser.
It seems like you do not have CUDA installed.
If you have a GPU, you should install the CUDA drivers. If not, you can train your model on the laptop CPU; however, this will be extremely slow and we do not recommend it.
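A minimal sketch of that fallback, assuming PyTorch may or may not be installed: choose "cuda" only when torch reports a usable GPU, otherwise "cpu". The function name pick_device is made up for illustration, and how the chosen value is handed to train.py depends on the project's configuration, which is not shown in this thread.

```python
def pick_device(prefer="cuda"):
    """Return 'cuda' when a CUDA-enabled PyTorch can see a GPU, else 'cpu'."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to ask; fall back to CPU.
        return "cpu"
    if prefer == "cuda" and torch.cuda.is_available():
        return "cuda"
    return "cpu"


print(pick_device())
```

On a CPU-only build of PyTorch, torch.cuda.is_available() returns False instead of raising, so this check avoids the AssertionError shown above.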
from denoiser.
I'm trying to download CUDA since it seems I do have a supported graphics card. The problem is on their site: https://developer.nvidia.com/gameworksdownload#?dn=cuda-toolkit-developer-tools-for-macos-11-0
It says:
NVIDIA® CUDA Toolkit 11.0 no longer supports development or running applications on macOS. While there are no tools which use macOS as a target environment, NVIDIA is making macOS host versions of the following tools that you can launch profiling and debugging sessions on supported target platforms.
Is this true? Or are there alternative CUDA versions one can download?
from denoiser.
When I try to run pip3 install -r requirements_cuda.txt either way, I get this error:
ERROR: Could not find a version that satisfies the requirement torch==1.5.1+cu101 (from -r requirements_cuda.txt (line 9))
(from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2, 0.4.1, 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.1.0.post2, 1.2.0, 1.3.0, 1.3.0.post2, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0)
ERROR: No matching distribution found for torch==1.5.1+cu101 (from -r requirements_cuda.txt (line 9))
I wonder if this is related to CUDA or not. I don't know if it's installed on my system or not.
from denoiser.
It seems like a version mismatch between CUDA, torch, and torchaudio.
Can you try the following:
- pip install torchaudio==0.5.1
- pip install torch==1.5.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
And please double-check that you have an NVIDIA GPU.
from denoiser.
pip3 install torchaudio==0.5.1
This one worked
pip3 install torch==1.5.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
Looking in links: https://download.pytorch.org/whl/torch_stable.html
This one gave me the same error:
ERROR: Could not find a version that satisfies the requirement torch==1.5.0+cu101 (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2, 0.4.1, 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.1.0.post2, 1.2.0, 1.3.0, 1.3.0.post2, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0)
ERROR: No matching distribution found for torch==1.5.0+cu101
In terms of GPUs, I have 2:
- NVIDIA GeForce GT 750M 2048 MB
- Intel Iris Pro 1536 MB
I checked that the NVIDIA GPU is on the CUDA-capable list.
from denoiser.
And are you using CUDA10?
from denoiser.
I'm having a hard time figuring out how to check whether CUDA is already installed on my system or whether I still have to install it. Is there a way to do so from the terminal or something like that?
I'll search for CUDA 10 online to try to download it.
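One possible way to check, sketched here with only the standard library plus an optional PyTorch import: look for the CUDA compiler and the NVIDIA driver utility on PATH, then ask PyTorch which CUDA version it was built against. The function name cuda_report is hypothetical, and a tool missing from PATH does not prove that CUDA is absent, only that it was not found there.

```python
import shutil


def cuda_report():
    """Collect a few hints about whether CUDA is usable on this machine."""
    report = {
        # Path to the CUDA compiler, or None if it is not on PATH.
        "nvcc": shutil.which("nvcc"),
        # Path to the driver utility, or None if it is not on PATH.
        "nvidia_smi": shutil.which("nvidia-smi"),
        "torch_cuda_build": None,
        "torch_cuda_available": False,
    }
    try:
        import torch
    except ImportError:
        return report
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    report["torch_cuda_build"] = torch.version.cuda
    report["torch_cuda_available"] = torch.cuda.is_available()
    return report


for name, value in cuda_report().items():
    print(f"{name}: {value}")
```

If torch_cuda_build prints None, the installed PyTorch wheel is a CPU-only build, regardless of what the CUDA installer reports.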
from denoiser.
Will try and report back.
from denoiser.
Maybe you can try this one: https://gist.github.com/bogdan-kulynych/f64eb148eeef9696c70d485a76e42c3a
CUDA 11 won't work since PyTorch and torchaudio do not support CUDA 11.
from denoiser.
This one seems to be for Ubuntu; I assume it wouldn't work on macOS?
from denoiser.
Okay so I finally installed CUDA 10.0.130 on Mac via the first link I shared.
When I run pip3 install -r requirements_cuda.txt
I get the same error:
ERROR: Could not find a version that satisfies the requirement torch==1.5.0+cu101 (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2, 0.4.1, 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.1.0.post2, 1.2.0, 1.3.0, 1.3.0.post2, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0)
ERROR: No matching distribution found for torch==1.5.0+cu101
When I try to run pip3 install torch==1.5.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
I get the same error again:
ERROR: Could not find a version that satisfies the requirement torch==1.5.0+cu101 (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2, 0.4.1, 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.1.0.post2, 1.2.0, 1.3.0, 1.3.0.post2, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0)
ERROR: No matching distribution found for torch==1.5.0+cu101
When I check the link: https://download.pytorch.org/whl/torch_stable.html - I don't find any options for those packages for macOS.
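To illustrate why the pin fails, here is a small sketch using the version list from the error message. A requirement such as torch==1.5.0+cu101 carries a local version label (the part after "+"), and a pin that includes a local label only matches a candidate carrying that same label. The helpers split_local and can_satisfy are hypothetical, and exact string comparison is a simplification of what pip actually does.

```python
def split_local(version):
    """Split 'X.Y.Z+tag' into ('X.Y.Z', 'tag'); tag is None if absent."""
    public, _, local = version.partition("+")
    return public, (local or None)


def can_satisfy(pinned, available):
    """True if an exact '==' pin matches one of the listed versions."""
    return pinned in available


# Version list copied from the pip error message in this thread.
available = [
    "0.1.2", "0.1.2.post1", "0.1.2.post2", "0.4.1", "1.0.0", "1.0.1",
    "1.0.1.post2", "1.1.0", "1.1.0.post2", "1.2.0", "1.3.0", "1.3.0.post2",
    "1.3.1", "1.4.0", "1.5.0", "1.5.1", "1.6.0",
]

print(split_local("1.5.0+cu101"))             # ('1.5.0', 'cu101')
print(can_satisfy("1.5.0+cu101", available))  # False
print(can_satisfy("1.5.0", available))        # True
```

None of the versions pip found for this platform has a +cu101 label, which is consistent with there being no CUDA builds listed for macOS.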
When I try to run python3 train.py
I get:
[2020-09-30 15:58:27,985][__main__][INFO] - For logs, checkpoints and samples check /Users/josephvanrowe/denoiser/outputs/exp_
[2020-09-30 15:58:30,150][denoiser.solver][INFO] - ----------------------------------------------------------------------
[2020-09-30 15:58:30,150][denoiser.solver][INFO] - Training...
[2020-09-30 15:58:30,371][__main__][ERROR] - Some error happened
Traceback (most recent call last):
File "train.py", line 99, in main
_main(args)
File "train.py", line 93, in _main
run(args)
File "train.py", line 76, in run
solver.train()
File "/Users/josephvanrowe/denoiser/denoiser/solver.py", line 139, in train
train_loss = self._run_one_epoch(epoch)
File "/Users/josephvanrowe/denoiser/denoiser/solver.py", line 203, in _run_one_epoch
noisy, clean = [x.to(self.device) for x in data]
File "/Users/josephvanrowe/denoiser/denoiser/solver.py", line 203, in <listcomp>
noisy, clean = [x.to(self.device) for x in data]
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/__init__.py", line 149, in _lazy_init
_check_driver()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/__init__.py", line 47, in _check_driver
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
However, when I do conda install -c pytorch cudatoolkit pytorch
I get:
Collecting package metadata (current_repodata.json): done
Solving environment: done
# All requested packages already installed.
And CUDA's installer says that it did install the CUDA toolkit and driver.
So this seems like a mystery.
from denoiser.
Okay, it seems I'm going to have to build PyTorch from source in order to get this working on my machine. This seems like too much of a hassle for me, so instead I tried to get this working on Colab. I had to downgrade to torchaudio 0.4.0 and PyTorch 1.4.0; otherwise I'd get an error saying that the NVIDIA driver I had on Colab was too old.
I actually got it to train right, but surprisingly, I'm still getting the same error I got when trying to fine tune on my machine:
(This is after I set "continue_pretrained:" to "dns64" and ran "!python3 train.py".)
[2020-09-30 20:08:57,898][__main__][INFO] - For logs, checkpoints and samples check /content/denoiser/outputs/exp_
[2020-09-30 20:09:02,677][denoiser.solver][INFO] - Loading checkpoint model: checkpoint.th
[2020-09-30 20:09:03,003][denoiser.solver][INFO] - Fine tuning from pre-trained model dns64
Downloading: "https://dl.fbaipublicfiles.com/adiyoss/denoiser/dns64-a7761ff99a7d5bb6.th" to /root/.cache/torch/checkpoints/dns64-a7761ff99a7d5bb6.th
100% 128M/128M [00:04<00:00, 30.7MB/s]
[2020-09-30 20:09:08,161][__main__][ERROR] - Some error happened
Traceback (most recent call last):
File "train.py", line 99, in main
_main(args)
File "train.py", line 93, in _main
run(args)
File "train.py", line 75, in run
solver = Solver(data, model, optimizer, args)
File "/content/denoiser/denoiser/solver.py", line 70, in __init__
self._reset()
File "/content/denoiser/denoiser/solver.py", line 121, in _reset
self.model.load_state_dict(model.state_dict())
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 830, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Demucs:
size mismatch for encoder.0.0.weight: copying a param with shape torch.Size([64, 1, 8]) from checkpoint, the shape in current model is torch.Size([48, 1, 8]).
size mismatch for encoder.0.0.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([48]).
size mismatch for encoder.0.2.weight: copying a param with shape torch.Size([128, 64, 1]) from checkpoint, the shape in current model is torch.Size([96, 48, 1]).
size mismatch for encoder.0.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
size mismatch for encoder.1.0.weight: copying a param with shape torch.Size([128, 64, 8]) from checkpoint, the shape in current model is torch.Size([96, 48, 8]).
size mismatch for encoder.1.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
size mismatch for encoder.1.2.weight: copying a param with shape torch.Size([256, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 96, 1]).
size mismatch for encoder.1.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
size mismatch for encoder.2.0.weight: copying a param with shape torch.Size([256, 128, 8]) from checkpoint, the shape in current model is torch.Size([192, 96, 8]).
size mismatch for encoder.2.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
size mismatch for encoder.2.2.weight: copying a param with shape torch.Size([512, 256, 1]) from checkpoint, the shape in current model is torch.Size([384, 192, 1]).
size mismatch for encoder.2.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for encoder.3.0.weight: copying a param with shape torch.Size([512, 256, 8]) from checkpoint, the shape in current model is torch.Size([384, 192, 8]).
size mismatch for encoder.3.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for encoder.3.2.weight: copying a param with shape torch.Size([1024, 512, 1]) from checkpoint, the shape in current model is torch.Size([768, 384, 1]).
size mismatch for encoder.3.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.4.0.weight: copying a param with shape torch.Size([1024, 512, 8]) from checkpoint, the shape in current model is torch.Size([768, 384, 8]).
size mismatch for encoder.4.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.4.2.weight: copying a param with shape torch.Size([2048, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1536, 768, 1]).
size mismatch for encoder.4.2.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]).
size mismatch for decoder.0.0.weight: copying a param with shape torch.Size([2048, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1536, 768, 1]).
size mismatch for decoder.0.0.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]).
size mismatch for decoder.0.2.weight: copying a param with shape torch.Size([1024, 512, 8]) from checkpoint, the shape in current model is torch.Size([768, 384, 8]).
size mismatch for decoder.0.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for decoder.1.0.weight: copying a param with shape torch.Size([1024, 512, 1]) from checkpoint, the shape in current model is torch.Size([768, 384, 1]).
size mismatch for decoder.1.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.1.2.weight: copying a param with shape torch.Size([512, 256, 8]) from checkpoint, the shape in current model is torch.Size([384, 192, 8]).
size mismatch for decoder.1.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
size mismatch for decoder.2.0.weight: copying a param with shape torch.Size([512, 256, 1]) from checkpoint, the shape in current model is torch.Size([384, 192, 1]).
size mismatch for decoder.2.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for decoder.2.2.weight: copying a param with shape torch.Size([256, 128, 8]) from checkpoint, the shape in current model is torch.Size([192, 96, 8]).
size mismatch for decoder.2.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
size mismatch for decoder.3.0.weight: copying a param with shape torch.Size([256, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 96, 1]).
size mismatch for decoder.3.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
size mismatch for decoder.3.2.weight: copying a param with shape torch.Size([128, 64, 8]) from checkpoint, the shape in current model is torch.Size([96, 48, 8]).
size mismatch for decoder.3.2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([48]).
size mismatch for decoder.4.0.weight: copying a param with shape torch.Size([128, 64, 1]) from checkpoint, the shape in current model is torch.Size([96, 48, 1]).
size mismatch for decoder.4.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
size mismatch for decoder.4.2.weight: copying a param with shape torch.Size([64, 1, 8]) from checkpoint, the shape in current model is torch.Size([48, 1, 8]).
size mismatch for lstm.lstm.weight_ih_l0: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for lstm.lstm.weight_hh_l0: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for lstm.lstm.bias_ih_l0: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for lstm.lstm.bias_hh_l0: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for lstm.lstm.weight_ih_l1: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for lstm.lstm.weight_hh_l1: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for lstm.lstm.bias_ih_l1: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for lstm.lstm.bias_hh_l1: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).
Is this a problem due to a torch version mismatch or something else?
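The shapes in the message above follow a regular pattern that points at the model width rather than the torch version. As a sketch, assuming five encoder layers whose output channels double at each layer (both numbers are read off the shapes in the error, not taken from the project's config), the channel widths for a given hidden size can be computed like this. The function name channel_widths is made up for illustration.

```python
def channel_widths(hidden, depth=5, growth=2):
    """Output channels of each encoder layer for a given hidden size."""
    return [hidden * growth ** i for i in range(depth)]


for hidden in (48, 64):
    widths = channel_widths(hidden)
    # An LSTM stacks four gates, so weight_ih has 4 * hidden_size rows.
    lstm_rows = 4 * widths[-1]
    print(hidden, widths, lstm_rows)

# 48 [48, 96, 192, 384, 768] 3072
# 64 [64, 128, 256, 512, 1024] 4096
```

The first row matches every "current model" shape in the error and the second matches every "checkpoint" shape, which suggests the model was built with hidden=48 while the dns64 checkpoint expects hidden=64.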
from denoiser.
@adefossez Thanks so much! So is this a variable in a particular file, like the config file? Or something to pass when running train.py, like: python3 train.py demucs.hidden=64?
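For intuition only, here is a sketch of how a dotted key=value argument such as demucs.hidden=64 can map onto a nested configuration. This is not how train.py is implemented; the function apply_override and the sample config dict are hypothetical, and real configuration tools handle many more cases (types, lists, validation).

```python
def apply_override(config, override):
    """Apply one 'a.b.c=value' override to a nested dict, in place."""
    key_path, sep, raw = override.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {override!r}")
    *parents, leaf = key_path.split(".")
    node = config
    for name in parents:
        # Walk down, creating intermediate sections when they are missing.
        node = node.setdefault(name, {})
    try:
        value = int(raw)  # treat whole numbers as integers
    except ValueError:
        value = raw       # keep everything else as a string
    node[leaf] = value
    return config


config = {"demucs": {"hidden": 48}}
apply_override(config, "demucs.hidden=64")
print(config)  # {'demucs': {'hidden': 64}}
```

The override changes a single value for one run and leaves the defaults in the config file untouched.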
from denoiser.
@adefossez I just tried the command above, it worked! Hallelujah!
from denoiser.
And man did you guys do a beautiful job with the logging!
from denoiser.