Comments (17)

adefossez commented on June 15, 2024

Hey @youssefavx, you need to pass demucs.hidden=64 so that the architecture matches that of the checkpoint (sorry, this is not done automatically). The default is demucs.hidden=48, which matches the dns48 pre-trained model.
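
A minimal sketch of the rule described above, using only the two pairings stated in this comment (dns48 with hidden=48, dns64 with hidden=64). The helper name `hidden_override` and the lookup table are hypothetical and not part of the denoiser code base.

```python
# Pairings taken from the comment above; any other checkpoint name is unknown here.
PRETRAINED_HIDDEN = {"dns48": 48, "dns64": 64}


def hidden_override(pretrained):
    """Return the command-line override that matches the checkpoint's width."""
    hidden = PRETRAINED_HIDDEN[pretrained]  # raises KeyError for unlisted names
    return f"demucs.hidden={hidden}"


# Build the training command for the dns64 checkpoint.
print("python3 train.py " + hidden_override("dns64"))
```

The printed command is the same form that is reported as working later in this thread.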

from denoiser.

youssefavx commented on June 15, 2024

When I try just vanilla training (not fine tuning), I get this error:

[2020-09-30 02:52:08,278][__main__][INFO] - For logs, checkpoints and samples check /Users/youssef/denoiser/outputs/exp_
[2020-09-30 02:52:09,431][denoiser.solver][INFO] - ----------------------------------------------------------------------
[2020-09-30 02:52:09,431][denoiser.solver][INFO] - Training...
[2020-09-30 02:52:09,617][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 99, in main
    _main(args)
  File "train.py", line 93, in _main
    run(args)
  File "train.py", line 76, in run
    solver.train()
  File "/Users/youssef/denoiser/denoiser/solver.py", line 139, in train
    train_loss = self._run_one_epoch(epoch)
  File "/Users/youssef/denoiser/denoiser/solver.py", line 203, in _run_one_epoch
    noisy, clean = [x.to(self.device) for x in data]
  File "/Users/youssef/denoiser/denoiser/solver.py", line 203, in <listcomp>
    noisy, clean = [x.to(self.device) for x in data]
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/__init__.py", line 149, in _lazy_init
    _check_driver()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/__init__.py", line 47, in _check_driver
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

adiyoss commented on June 15, 2024

It seems like you do not have CUDA installed.
If you have a GPU, you should install the CUDA drivers; if not, you can train your model on the laptop CPU, but this will be extremely slow and we do not recommend it.
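
A minimal sketch of checking which device the installed PyTorch build can actually use. `torch.cuda.is_available()` is a standard PyTorch call; the helper name `pick_device` is hypothetical, and passing its result to the training script through a `device` setting is an assumption suggested by the `self.device` attribute in the traceback.

```python
def pick_device():
    """Return "cuda" if the installed PyTorch build can use a GPU, else "cpu"."""
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        return "cpu"
    # False for CPU-only builds, even when an NVIDIA card is present.
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```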

youssefavx commented on June 15, 2024

I'm trying to download CUDA since it seems I do have a supported graphics card. The problem is on their site: https://developer.nvidia.com/gameworksdownload#?dn=cuda-toolkit-developer-tools-for-macos-11-0

It says:

NVIDIA® CUDA Toolkit 11.0 no longer supports development or running applications on macOS. While there are no tools which use macOS as a target environment, NVIDIA is making macOS host versions of the following tools that you can launch profiling and debugging sessions on supported target platforms.

Is this true? Or are there alternative CUDA versions one can download?

youssefavx commented on June 15, 2024

When I try to run pip3 install -r requirements_cuda.txt either way, I get this error:

ERROR: Could not find a version that satisfies the requirement torch==1.5.1+cu101 (from -r requirements_cuda.txt (line 9)) 
(from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2, 0.4.1, 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.1.0.post2, 1.2.0, 1.3.0, 1.3.0.post2, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0)
ERROR: No matching distribution found for torch==1.5.1+cu101 (from -r requirements_cuda.txt (line 9))

I wonder if this is related to CUDA or not. I don't know if it's installed on my system or not.
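
A small sketch that reads the error above more closely. The requested version carries a `+cu101` suffix (a local version label, here naming a CUDA 10.1 build), while none of the versions pip lists carry any such label. The helper `local_label` is hypothetical; the version list is copied from the error message.

```python
# Versions pip reported as available, copied from the error message above.
AVAILABLE = (
    "0.1.2, 0.1.2.post1, 0.1.2.post2, 0.4.1, 1.0.0, 1.0.1, 1.0.1.post2, "
    "1.1.0, 1.1.0.post2, 1.2.0, 1.3.0, 1.3.0.post2, 1.3.1, 1.4.0, 1.5.0, "
    "1.5.1, 1.6.0"
).split(", ")


def local_label(version):
    """Return the part after '+' (e.g. 'cu101'), or None if there is none."""
    _, sep, label = version.partition("+")
    return label if sep else None


requested = "1.5.1+cu101"
print(local_label(requested))                    # cu101
print([v for v in AVAILABLE if local_label(v)])  # []
print(requested.split("+")[0] in AVAILABLE)      # True
```

So plain 1.5.1 is on offer, but no CUDA-labelled build of it is visible to pip on this machine, which is consistent with the error being about the available packages rather than about whether the CUDA toolkit is installed.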

adiyoss commented on June 15, 2024

It seems like a version mismatch between CUDA, torch, and torchaudio.
Can you try the following:

  1. pip install torchaudio==0.5.1
  2. pip install torch==1.5.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

And please double check that you have NVIDIA GPU.

youssefavx commented on June 15, 2024

pip3 install torchaudio==0.5.1

This one worked

pip3 install torch==1.5.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
Looking in links: https://download.pytorch.org/whl/torch_stable.html

This one gave me the same error:

ERROR: Could not find a version that satisfies the requirement torch==1.5.0+cu101 (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2, 0.4.1, 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.1.0.post2, 1.2.0, 1.3.0, 1.3.0.post2, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0)
ERROR: No matching distribution found for torch==1.5.0+cu101

In terms of GPUs, I have 2:

  1. NVIDIA GeForce GT 750M 2048 MB
  2. Intel Iris Pro 1536 MB

I checked that the Nvidia GPU is on the Cuda-capable list.

adiyoss commented on June 15, 2024

And are you using CUDA10?

youssefavx commented on June 15, 2024

I'm having a hard time figuring out how to check that Cuda is installed on my system, if I've already installed it, or if I have to install it. Is there a way to do so from the terminal or something like that?
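
One way to check from a script, sketched under assumptions: `nvcc --version` is the CUDA compiler's version command, and `torch.version.cuda` reports the CUDA version a PyTorch build was compiled against (it is `None` for CPU-only builds). The helper names `nvcc_version` and `torch_cuda_build` are hypothetical.

```python
import shutil
import subprocess


def nvcc_version():
    """Return the output of `nvcc --version`, or None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return result.stdout.strip() or None


def torch_cuda_build():
    """Return the CUDA version PyTorch was built with, or None."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda  # None means a CPU-only build


print("nvcc:", nvcc_version())
print("torch built with CUDA:", torch_cuda_build())
```

A `None` from `nvcc_version` only means that `nvcc` was not found on `PATH`; the toolkit may still be installed in a directory that is not on it. A `None` from `torch_cuda_build` matches the "Torch not compiled with CUDA enabled" error regardless of what is installed system-wide.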

I'll search for Cuda 10 online to try to download.

youssefavx commented on June 15, 2024

Found this: https://developer.nvidia.com/cuda-10.0-download-archive?target_os=MacOSX&target_arch=x86_64&target_version=1013&target_type=dmglocal

Will try and report back.

adiyoss commented on June 15, 2024

Maybe you can try this one: https://gist.github.com/bogdan-kulynych/f64eb148eeef9696c70d485a76e42c3a
CUDA 11 won't work, since PyTorch and torchaudio do not support CUDA 11.

youssefavx commented on June 15, 2024

This one seems to be for Ubuntu; I assume it wouldn't work on macOS?

youssefavx commented on June 15, 2024

Okay so I finally installed CUDA 10.0.130 on Mac via the first link I shared.

When I run pip3 install -r requirements_cuda.txt

I get the same error:

ERROR: Could not find a version that satisfies the requirement torch==1.5.0+cu101 (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2, 0.4.1, 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.1.0.post2, 1.2.0, 1.3.0, 1.3.0.post2, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0)
ERROR: No matching distribution found for torch==1.5.0+cu101

When I try to run pip3 install torch==1.5.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

I get the same error again:

ERROR: Could not find a version that satisfies the requirement torch==1.5.0+cu101 (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2, 0.4.1, 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.1.0.post2, 1.2.0, 1.3.0, 1.3.0.post2, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0)
ERROR: No matching distribution found for torch==1.5.0+cu101

When I check the link: https://download.pytorch.org/whl/torch_stable.html - I don't find any options for those packages for MacOS.

When I try to run python3 train.py

I get:

[2020-09-30 15:58:27,985][__main__][INFO] - For logs, checkpoints and samples check /Users/josephvanrowe/denoiser/outputs/exp_
[2020-09-30 15:58:30,150][denoiser.solver][INFO] - ----------------------------------------------------------------------
[2020-09-30 15:58:30,150][denoiser.solver][INFO] - Training...
[2020-09-30 15:58:30,371][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 99, in main
    _main(args)
  File "train.py", line 93, in _main
    run(args)
  File "train.py", line 76, in run
    solver.train()
  File "/Users/josephvanrowe/denoiser/denoiser/solver.py", line 139, in train
    train_loss = self._run_one_epoch(epoch)
  File "/Users/josephvanrowe/denoiser/denoiser/solver.py", line 203, in _run_one_epoch
    noisy, clean = [x.to(self.device) for x in data]
  File "/Users/josephvanrowe/denoiser/denoiser/solver.py", line 203, in <listcomp>
    noisy, clean = [x.to(self.device) for x in data]
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/__init__.py", line 149, in _lazy_init
    _check_driver()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/__init__.py", line 47, in _check_driver
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

However, when I do conda install -c pytorch cudatoolkit pytorch

I get:

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

And CUDA's installer says that it did install the CUDA toolkit and driver.

So this seems like a mystery.

youssefavx commented on June 15, 2024

Okay, it seems I'm going to have to build PyTorch from source in order to get this working on my machine. This seems like too much of a hassle for me, so instead I tried to get this working on Colab. I had to downgrade to torchaudio 0.4.0 and PyTorch 1.4.0; otherwise I'd get an error saying that the NVIDIA driver I had on Colab was too old.

I actually got it to train right, but surprisingly, I'm still getting the same error I got when trying to fine tune on my machine:
(This is after I set "continue_pretrained:" to "dns64" and ran "!python3 train.py".)

[2020-09-30 20:08:57,898][__main__][INFO] - For logs, checkpoints and samples check /content/denoiser/outputs/exp_
[2020-09-30 20:09:02,677][denoiser.solver][INFO] - Loading checkpoint model: checkpoint.th
[2020-09-30 20:09:03,003][denoiser.solver][INFO] - Fine tuning from pre-trained model dns64
Downloading: "https://dl.fbaipublicfiles.com/adiyoss/denoiser/dns64-a7761ff99a7d5bb6.th" to /root/.cache/torch/checkpoints/dns64-a7761ff99a7d5bb6.th
100% 128M/128M [00:04<00:00, 30.7MB/s]
[2020-09-30 20:09:08,161][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 99, in main
    _main(args)
  File "train.py", line 93, in _main
    run(args)
  File "train.py", line 75, in run
    solver = Solver(data, model, optimizer, args)
  File "/content/denoiser/denoiser/solver.py", line 70, in __init__
    self._reset()
  File "/content/denoiser/denoiser/solver.py", line 121, in _reset
    self.model.load_state_dict(model.state_dict())
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 830, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Demucs:
	size mismatch for encoder.0.0.weight: copying a param with shape torch.Size([64, 1, 8]) from checkpoint, the shape in current model is torch.Size([48, 1, 8]).
	size mismatch for encoder.0.0.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([48]).
	size mismatch for encoder.0.2.weight: copying a param with shape torch.Size([128, 64, 1]) from checkpoint, the shape in current model is torch.Size([96, 48, 1]).
	size mismatch for encoder.0.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
	size mismatch for encoder.1.0.weight: copying a param with shape torch.Size([128, 64, 8]) from checkpoint, the shape in current model is torch.Size([96, 48, 8]).
	size mismatch for encoder.1.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
	size mismatch for encoder.1.2.weight: copying a param with shape torch.Size([256, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 96, 1]).
	size mismatch for encoder.1.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
	size mismatch for encoder.2.0.weight: copying a param with shape torch.Size([256, 128, 8]) from checkpoint, the shape in current model is torch.Size([192, 96, 8]).
	size mismatch for encoder.2.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
	size mismatch for encoder.2.2.weight: copying a param with shape torch.Size([512, 256, 1]) from checkpoint, the shape in current model is torch.Size([384, 192, 1]).
	size mismatch for encoder.2.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for encoder.3.0.weight: copying a param with shape torch.Size([512, 256, 8]) from checkpoint, the shape in current model is torch.Size([384, 192, 8]).
	size mismatch for encoder.3.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for encoder.3.2.weight: copying a param with shape torch.Size([1024, 512, 1]) from checkpoint, the shape in current model is torch.Size([768, 384, 1]).
	size mismatch for encoder.3.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for encoder.4.0.weight: copying a param with shape torch.Size([1024, 512, 8]) from checkpoint, the shape in current model is torch.Size([768, 384, 8]).
	size mismatch for encoder.4.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for encoder.4.2.weight: copying a param with shape torch.Size([2048, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1536, 768, 1]).
	size mismatch for encoder.4.2.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]).
	size mismatch for decoder.0.0.weight: copying a param with shape torch.Size([2048, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1536, 768, 1]).
	size mismatch for decoder.0.0.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]).
	size mismatch for decoder.0.2.weight: copying a param with shape torch.Size([1024, 512, 8]) from checkpoint, the shape in current model is torch.Size([768, 384, 8]).
	size mismatch for decoder.0.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for decoder.1.0.weight: copying a param with shape torch.Size([1024, 512, 1]) from checkpoint, the shape in current model is torch.Size([768, 384, 1]).
	size mismatch for decoder.1.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for decoder.1.2.weight: copying a param with shape torch.Size([512, 256, 8]) from checkpoint, the shape in current model is torch.Size([384, 192, 8]).
	size mismatch for decoder.1.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
	size mismatch for decoder.2.0.weight: copying a param with shape torch.Size([512, 256, 1]) from checkpoint, the shape in current model is torch.Size([384, 192, 1]).
	size mismatch for decoder.2.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for decoder.2.2.weight: copying a param with shape torch.Size([256, 128, 8]) from checkpoint, the shape in current model is torch.Size([192, 96, 8]).
	size mismatch for decoder.2.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
	size mismatch for decoder.3.0.weight: copying a param with shape torch.Size([256, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 96, 1]).
	size mismatch for decoder.3.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
	size mismatch for decoder.3.2.weight: copying a param with shape torch.Size([128, 64, 8]) from checkpoint, the shape in current model is torch.Size([96, 48, 8]).
	size mismatch for decoder.3.2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([48]).
	size mismatch for decoder.4.0.weight: copying a param with shape torch.Size([128, 64, 1]) from checkpoint, the shape in current model is torch.Size([96, 48, 1]).
	size mismatch for decoder.4.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
	size mismatch for decoder.4.2.weight: copying a param with shape torch.Size([64, 1, 8]) from checkpoint, the shape in current model is torch.Size([48, 1, 8]).
	size mismatch for lstm.lstm.weight_ih_l0: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
	size mismatch for lstm.lstm.weight_hh_l0: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
	size mismatch for lstm.lstm.bias_ih_l0: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).
	size mismatch for lstm.lstm.bias_hh_l0: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).
	size mismatch for lstm.lstm.weight_ih_l1: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
	size mismatch for lstm.lstm.weight_hh_l1: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
	size mismatch for lstm.lstm.bias_ih_l1: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).
	size mismatch for lstm.lstm.bias_hh_l1: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).

Is this a problem due to a torch version mismatch or something else?
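
A short sketch of how to read the message above, using shapes copied from its first three lines. Every mismatch differs in the channel dimensions (64 versus 48 and their multiples), which points to a model-width difference rather than a library version problem. The helper names are hypothetical, and reading the width from the first dimension of `encoder.0.0.weight` is an inference from this log, not a documented rule.

```python
# Shapes copied from the first three "size mismatch" lines of the error above.
CHECKPOINT = {
    "encoder.0.0.weight": (64, 1, 8),
    "encoder.0.0.bias": (64,),
    "encoder.0.2.weight": (128, 64, 1),
}
CURRENT_MODEL = {
    "encoder.0.0.weight": (48, 1, 8),
    "encoder.0.0.bias": (48,),
    "encoder.0.2.weight": (96, 48, 1),
}


def mismatched(checkpoint, model):
    """Names of parameters present in both whose shapes differ."""
    return sorted(
        name for name in checkpoint
        if name in model and checkpoint[name] != model[name]
    )


def first_layer_width(shapes):
    """Output channels of the first encoder convolution."""
    return shapes["encoder.0.0.weight"][0]


print(mismatched(CHECKPOINT, CURRENT_MODEL))
print(first_layer_width(CHECKPOINT), first_layer_width(CURRENT_MODEL))  # 64 48
```

The widths 64 and 48 line up with the dns64 checkpoint and the default demucs.hidden=48 mentioned elsewhere in this thread.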

youssefavx commented on June 15, 2024

@adefossez Thanks so much! So is this in some variable in a particular file, like the config file? Or something to change when running train.py like: python3 train.py demucs.hidden=64 ?

youssefavx commented on June 15, 2024

@adefossez I just tried the command above, it worked! Hallelujah!

youssefavx commented on June 15, 2024

And man did you guys do a beautiful job with the logging!
