Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)

We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noises, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly on the raw waveform which further improve the model's performance and its generalization abilities.

Audio samples can be found here: Samples

Schema representing the structure of Demucs, with a convolutional encoder, an LSTM, and a decoder based on transposed convolutions.

The proposed model is based on the Demucs architecture, originally proposed for music source separation (Paper, Code).

Colab

If you want to play with the pretrained model, for instance inside Colab, start from this Colab Example for Denoiser.

Installation

First, install Python 3.7 (recommended with Anaconda).

Through pip (if you just want to use the pre-trained models out of the box)

Just run

pip install denoiser
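
Once installed, a quick smoke test of the Python API might look like the following minimal sketch. It assumes pretrained.dns64 downloads its weights on first use and that the model expects a (batch, channels, time) tensor; verify both against your installed version.

import torch
from denoiser import pretrained

model = pretrained.dns64()                     # pre-trained H=64 model
noisy = torch.randn(1, 1, model.sample_rate)   # one second of dummy audio
with torch.no_grad():
    clean = model(noisy)
print(clean.shape)                             # same shape as the input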

Development (if you want to train or hack around)

Clone this repository and install the dependencies. We recommend using a fresh virtualenv or Conda environment.

git clone https://github.com/facebookresearch/denoiser
cd denoiser
pip install -r requirements.txt  # If you don't have CUDA
pip install -r requirements_cuda.txt  # If you have CUDA

Live Speech Enhancement

If you want to use denoiser live (for a Skype call for instance), you will need a specific loopback audio interface.

Mac OS X

On Mac OS X, this is provided by Soundflower. First install Soundflower, and then you can just run

python -m denoiser.live

In your favorite video conference call application, just select "Soundflower (2ch)" as input to enjoy your denoised speech.

Watch our live demo presentation at the following link: Demo.

Linux (tested on Ubuntu 20.04)

You can use the pacmd command and the pavucontrol tool:

  • run the following commands:
pacmd load-module module-null-sink sink_name=denoiser
pacmd update-sink-proplist denoiser device.description=denoiser

This will add a Monitor of Null Output to the list of microphones to use. Select it as input in your software.

  • Launch the pavucontrol tool. In the Playback tab, after starting python -m denoiser.live --out INDEX_OR_NAME_OF_LOOPBACK_IFACE and the software you want to denoise for (here an in-browser call), you should see both applications. Select the "denoiser" null sink we created as the Playback destination for the denoiser process; it will output the processed audio stream to that sink.

pavucontrol window and parameters to use.

Other platforms

At the moment, we do not provide official support for other OSes. However, if you have a sound card that supports loopback (for instance Steinberg products), you can try to make it work. You can list the available audio interfaces with python -m sounddevice. Then once you have spotted your loopback interface, just run

python -m denoiser.live --out INDEX_OR_NAME_OF_LOOPBACK_IFACE

By default, denoiser will use the default audio input. You can change that with the --in flag.

Note that on Windows you will need to replace python by python.exe.

Troubleshooting bad quality in separation

denoiser can introduce distortions for very high noise levels. Audio can also become crunchy if your computer is not fast enough to process it in real time; in that case, you will see an error message in your terminal warning you that denoiser is not processing audio fast enough. You can try closing all non-essential applications.

denoiser was tested on a MacBook Pro with a 2 GHz quad-core Intel i5 and DDR4 memory. You might experience issues with DDR3 memory. In that case, you can trade overall latency for speed by processing multiple frames at once. To do so, run

python -m denoiser.live -f 2

You can increase to -f 3 or more if needed, but each increase adds 16 ms (one stride) of extra latency.

Denoising received speech

You can also denoise received speech, but you won't be able to both denoise your own speech and the received speech (unless you have a really beefy computer and enough loopback audio interfaces). This can be achieved by selecting the loopback interface as the audio output of your VC software and then running

python -m denoiser.live --in "Soundflower (2ch)" --out "NAME OF OUT IFACE"

Training and evaluation

Quick Start with Toy Example

  1. Run sh make_debug.sh to generate json files for the toy dataset.
  2. Run python train.py

Configuration

We use Hydra to control all the training configurations. If you are not familiar with Hydra we recommend visiting the Hydra website. Generally, Hydra is an open-source framework that simplifies the development of research applications by providing the ability to create a hierarchical configuration dynamically.

The config file with all relevant arguments for training our model can be found in the conf folder. Note that the dset subfolder contains the configuration files for the different datasets. You should see a file named debug.yaml with the relevant configuration for the debug sample set.

You can pass options through the command line, for instance ./train.py demucs.hidden=32. Please refer to conf/config.yaml for a reference of the possible options. You can also directly edit the config.yaml file, although this is not recommended due to the way experiments are automatically named, as explained hereafter.

Checkpointing

Each experiment will get a unique name based on the command line options you passed. Restarting the same command will reuse the existing folder and automatically start from a previous checkpoint if possible. In order to ignore previous checkpoints, you must pass the restart=1 option. Note that options like device, num_workers, etc. have no influence on the experiment name.

Setting up a new dataset

If you want to train using a new dataset, you can:

  1. Create a separate config file for it.
  2. Place the new config files under the dset folder. Check conf/dset/debug.yaml for more details on configuring your dataset.
  3. Point to it either in the general config file or via the command line, e.g. ./train.py dset=name_of_dset.

You also need to generate the relevant .json files in the egs/ folder. For that purpose you can use the python -m denoiser.audio command, which will scan the given folders and output the required metadata as JSON. For instance, if your noisy files are located in $noisy and the clean files in $clean, you can do

out=egs/mydataset/tr
mkdir -p $out
python -m denoiser.audio $noisy > $out/noisy.json
python -m denoiser.audio $clean > $out/clean.json

Usage

1. Data Structure

The data loader reads both clean and noisy JSON files, named clean.json and noisy.json. These files should contain the paths to all the wav files used to optimize and test the model, along with their size (in frames). You can use python -m denoiser.audio FOLDER_WITH_WAV1 [FOLDER_WITH_WAV2 ...] > OUTPUT.json to generate those files. You should generate the above files for both the training and test sets (and the validation set if provided). Once this is done, you should create a yaml file (similar to conf/dset/debug.yaml) with the updated paths to the dataset folders. Please check conf/dset/debug.yaml for more details.
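
For reference, here is a hedged sketch of how to inspect the generated metadata, assuming (as the description above suggests) that each entry is a [path, length_in_frames] pair; the dataset path is a placeholder.

import json

with open("egs/mydataset/tr/noisy.json") as f:
    files = json.load(f)
print(len(files), "files")
for path, frames in files[:3]:   # each entry: [wav path, size in frames]
    print(path, frames)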

2. Training

Training is simply done by launching the train.py script:

./train.py

This script reads all the configurations from the conf/config.yaml file.

Distributed Training

To launch distributed training you should turn on the distributed training flag. This can be done as follows:

./train.py ddp=1

Logs

Logs are stored by default in the outputs folder. Look for the matching experiment name. In the experiment folder you will find the best.th serialized model, the training checkpoint checkpoint.th, as well as the log with the metrics, trainer.log. All metrics are also extracted to the history.json file for easier parsing. Enhanced samples are stored in the samples folder (if noisy_dir or noisy_json is set in the dataset).
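
If you want to consume the metrics programmatically, a small sketch along these lines should work, assuming history.json holds one dict of metrics per epoch (adjust the keys to whatever your run actually logs; the experiment folder name is a placeholder).

import json

with open("outputs/exp_/history.json") as f:
    history = json.load(f)
for epoch, metrics in enumerate(history, 1):
    print("epoch", epoch, metrics)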

Fine tuning

You can fine-tune one of the 3 pre-trained models: dns48, dns64, and master64. To do so:

./train.py continue_pretrained=dns48
./train.py continue_pretrained=dns64 demucs.hidden=64
./train.py continue_pretrained=master64 demucs.hidden=64

3. Evaluating

Evaluating the models can be done by:

python -m denoiser.evaluate --model_path=<path to the model> --data_dir=<path to folder containing noisy.json and clean.json>

Note that the path given to --model_path should point to a best.th file, not checkpoint.th. It is also possible to use a pre-trained model, using either --dns48, --dns64, or --master64. For more details regarding possible arguments, please see:

usage: denoiser.evaluate [-h] [-m MODEL_PATH | --dns48 | --dns64 | --master64]
                         [--device DEVICE] [--dry DRY]
                         [--num_workers NUM_WORKERS] [--streaming]
                         [--data_dir DATA_DIR] [--matching MATCHING]
                         [--no_pesq] [-v]

Speech enhancement using Demucs - Evaluate model performance

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL_PATH, --model_path MODEL_PATH
                        Path to local trained model.
  --dns48               Use pre-trained real time H=48 model trained on DNS.
  --dns64               Use pre-trained real time H=64 model trained on DNS.
  --master64            Use pre-trained real time H=64 model trained on DNS
                        and Valentini.
  --device DEVICE
  --dry DRY             dry/wet knob coefficient. 0 is only input signal, 1
                        only denoised.
  --num_workers NUM_WORKERS
  --streaming           true streaming evaluation for Demucs
  --data_dir DATA_DIR   directory including noisy.json and clean.json files
  --matching MATCHING   set this to dns for the dns dataset.
  --no_pesq             Don't compute PESQ.
  -v, --verbose         More logging

4. Denoising

Generating the enhanced files can be done by:

python -m denoiser.enhance --model_path=<path to the model> --noisy_dir=<path to the dir with the noisy files> --out_dir=<path to store enhanced files>

Note that you can provide either noisy_dir or noisy_json for the test data. The path given to --model_path should point to a best.th file, not checkpoint.th. It is also possible to use a pre-trained model, using either --dns48, --dns64, or --master64. For more details regarding possible arguments, please see:

usage: denoiser.enhance [-h] [-m MODEL_PATH | --dns48 | --dns64 | --master64]
                        [--device DEVICE] [--dry DRY]
                        [--num_workers NUM_WORKERS] [--streaming]
                        [--out_dir OUT_DIR] [--batch_size BATCH_SIZE] [-v]
                        [--noisy_dir NOISY_DIR | --noisy_json NOISY_JSON]

Speech enhancement using Demucs - Generate enhanced files

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL_PATH, --model_path MODEL_PATH
                        Path to local trained model.
  --dns48               Use pre-trained real time H=48 model trained on DNS.
  --dns64               Use pre-trained real time H=64 model trained on DNS.
  --master64            Use pre-trained real time H=64 model trained on DNS
                        and Valentini.
  --device DEVICE
  --dry DRY             dry/wet knob coefficient. 0 is only input signal, 1
                        only denoised.
  --num_workers NUM_WORKERS
  --streaming           true streaming evaluation for Demucs
  --out_dir OUT_DIR     directory to write the enhanced wav files
  --batch_size BATCH_SIZE
                        batch size
  -v, --verbose         more logging
  --noisy_dir NOISY_DIR
                        directory including noisy wav files
  --noisy_json NOISY_JSON
                        json file including noisy wav files

5. Reproduce Results

Here we provide a detailed description of how to reproduce the results from the paper:

Valentini dataset

  1. Download the Valentini dataset.
  2. Adapt the Valentini config file and run the processing script.
  3. Generate the egs/ files as explained hereafter.
  4. Launch the training using the launch_valentini.sh (or launch_valentini_nc.sh for the non-causal version) script.

Important: unlike what we stated in the paper, the causal models were trained with a weight of 0.1 for the STFT loss, not 0.5.

To create the egs/ files, adapt and run the following code:

noisy_train=path to valentini
clean_train=path to valentini
noisy_test=path to valentini
clean_test=path to valentini
noisy_dev=path to valentini
clean_dev=path to valentini

mkdir -p egs/val/tr
mkdir -p egs/val/cv
mkdir -p egs/val/tt

python -m denoiser.audio $noisy_train > egs/val/tr/noisy.json
python -m denoiser.audio $clean_train > egs/val/tr/clean.json

python -m denoiser.audio $noisy_test > egs/val/tt/noisy.json
python -m denoiser.audio $clean_test > egs/val/tt/clean.json

python -m denoiser.audio $noisy_dev > egs/val/cv/noisy.json
python -m denoiser.audio $clean_dev > egs/val/cv/clean.json

DNS dataset

  1. Download the DNS dataset, making sure to use the interspeech2020 branch.
  2. Set up the paths in the DNS config file to suit your setup and run the processing script.
  3. Generate the egs/ files as explained hereafter.
  4. Launch the training using the launch_dns.sh script.

To create the egs/ files, adapt and run the following code:

dns=path to dns
noisy=path to processed noisy
clean=path to processed clean
testset=$dns/datasets/test_set
mkdir -p egs/dns/tr
python -m denoiser.audio $noisy > egs/dns/tr/noisy.json
python -m denoiser.audio $clean > egs/dns/tr/clean.json

mkdir -p egs/dns/tt
python -m denoiser.audio $testset/synthetic/no_reverb/noisy $testset/synthetic/with_reverb/noisy > egs/dns/tt/noisy.json
python -m denoiser.audio $testset/synthetic/no_reverb/clean $testset/synthetic/with_reverb/clean > egs/dns/tt/clean.json

Online Evaluation

Our online implementation is based on pure Python code, with some optimization of the streaming convolutions and transposed convolutions. We benchmarked this implementation on a quad-core Intel i5 CPU at 2 GHz. The Real-Time Factors (RTF) of the proposed models are:

Model   Threads   RTF
H=48    1         0.8
H=64    1         1.2
H=48    4         0.6
H=64    4         1.0

To compute the RTF on your own CPU, launch the following command:

python -m denoiser.demucs --hidden=48 --num_threads=1

The output should be something like this:

total lag: 41.3ms, stride: 16.0ms, time per frame: 12.2ms, delta: 0.21%, RTF: 0.8

Feel free to explore different settings, e.g. bigger models and more CPU cores. Note that the RTF is the time per frame divided by the stride (here 12.2 ms / 16.0 ms ≈ 0.8); values below 1 mean the model keeps up with real time.

Citation

If you use the code in your paper, then please cite it as:

@inproceedings{defossez2020real,
  title={Real Time Speech Enhancement in the Waveform Domain},
  author={Defossez, Alexandre and Synnaeve, Gabriel and Adi, Yossi},
  booktitle={Interspeech},
  year={2020}
}

License

This repository is released under the CC-BY-NC 4.0 license, as found in the LICENSE file.

The file denoiser/stft_loss.py was adapted from the kan-bayashi/ParallelWaveGAN repository. It is an unofficial implementation of the ParallelWaveGAN paper, released under the MIT License. The file scripts/matlab_eval.py was adapted from the santi-pdp/segan_pytorch repository. It is an unofficial implementation of the SEGAN paper, released under the MIT License.

denoiser's People

Contributors: adefossez, adiyoss, hoyaaaa, kventinel, mpariente, rdmn, syhw, wesbz


denoiser's Issues

inference denoiser

Tell me in a nutshell, if possible, which functions from which files I should use to implement the following pipeline:
input wav -> denoiser -> output wav
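
A minimal sketch of such a pipeline, using the package's pretrained helpers (treat convert_audio's exact signature and the model attributes as assumptions to verify against your installed version):

import torch
import torchaudio
from denoiser import pretrained
from denoiser.dsp import convert_audio

model = pretrained.dns64()                                   # pre-trained H=64 model
wav, sr = torchaudio.load("noisy.wav")                       # placeholder input path
wav = convert_audio(wav, sr, model.sample_rate, model.chin)  # match rate and channels
with torch.no_grad():
    denoised = model(wav[None])[0]                           # add, then drop, batch dim
torchaudio.save("enhanced.wav", denoised.cpu(), model.sample_rate)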

Very long audio files : sub_iter.strides(0)[0] == 0 INTERNAL ASSERT FAILED

Hi there!

Thanks for your work! I've been applying your model to short audio files with success, and the results are very impressive!
I'd like to go one step further and enhance 16-hour-long audio files.

When I launch:

python -m denoiser.enhance $PRETRAINED_MODEL --noisy_dir=${DATA_DIR} --out_dir=${DATA_DIR}_enhanced_by_${SUFFIX} --verbose --device cuda

I get:

Traceback (most recent call last):
  File "/gpfswork/rech/xdz/uow84uh/.conda/envs/denoiser/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/gpfswork/rech/xdz/uow84uh/.conda/envs/denoiser/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/gpfswork/rech/xdz/uow84uh/.conda/envs/denoiser/lib/python3.7/site-packages/denoiser/enhance.py", line 138, in <module>
    enhance(args, local_out_dir=args.out_dir)
  File "/gpfswork/rech/xdz/uow84uh/.conda/envs/denoiser/lib/python3.7/site-packages/denoiser/enhance.py", line 130, in enhance
    estimate = get_estimate(model, noisy_signals, args)
  File "/gpfswork/rech/xdz/uow84uh/.conda/envs/denoiser/lib/python3.7/site-packages/denoiser/enhance.py", line 67, in get_estimate
    estimate = model(noisy)
  File "/gpfswork/rech/xdz/uow84uh/.conda/envs/denoiser/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/gpfswork/rech/xdz/uow84uh/.conda/envs/denoiser/lib/python3.7/site-packages/denoiser/demucs.py", line 161, in forward
    mono = mix.mean(dim=1, keepdim=True)
RuntimeError: sub_iter.strides(0)[0] == 0 INTERNAL ASSERT FAILED at "/pytorch/aten/src/ATen/native/cuda/Reduce.cuh":928, please report a bug to PyTorch.

I tried to run the model on CPU, with and without the --streaming flag, but without success.
According to this thread, it seems that the error occurs when calling the sum function on very large tensors.

Here's the error I get on CPU:

/var/spool/slurmd/job1202815/slurm_script: line 40: 10526 Floating point exception(core dumped) python -m denoiser.enhance $PRETRAINED_MODEL --noisy_dir=${DATA_DIR} --out_dir=${DATA_DIR}_enhanced_by_${SUFFIX} --num_workers 10 --verbose

Does it seem unrealistic to you to enhance such long audio files? Can you think of any workaround?
I could cut my long audio files into multiple smaller chunks (see the sketch after this post), but that would create artifacts, and I'd prefer to avoid this pain :)

Thanks a lot :)
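
A hedged sketch of that chunking workaround, using overlapping windows with linear cross-fades to limit boundary artifacts; the chunk and overlap sizes are arbitrary assumptions to tune, and this is not the repository's own streaming code.

import torch

def enhance_long(model, wav, sr=16_000, chunk_s=30.0, overlap_s=1.0):
    # wav: (channels, time) tensor, typically mono (1, time) at the model's rate;
    # assumes the model's output length equals its input length
    chunk, overlap = int(chunk_s * sr), int(overlap_s * sr)
    hop = chunk - overlap
    out = torch.zeros_like(wav)
    weight = torch.zeros(wav.shape[-1])
    ramp = torch.linspace(0.0, 1.0, overlap + 2)[1:-1]   # strictly inside (0, 1)
    fade = torch.ones(chunk)
    fade[:overlap] = ramp                                # fade in
    fade[-overlap:] = ramp.flip(0)                       # fade out
    for start in range(0, wav.shape[-1], hop):
        piece = wav[..., start:start + chunk]
        with torch.no_grad():
            est = model(piece[None])[0]
        w = fade[:piece.shape[-1]]
        out[..., start:start + piece.shape[-1]] += est * w
        weight[start:start + piece.shape[-1]] += w
    return out / weight.clamp(min=1e-8)

# usage sketch: wav, sr = torchaudio.load("long.wav"); out = enhance_long(model, wav, sr)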

During eval: ValueError: fs (sampling frequency) should be either 8000 or 16000

Hey guys, while training on Colab with a sample rate of 32kHz, the training was going fine up until it started the evaluation process.

I first got this warning multiple times:

 Run model on reference ref and degraded deg
       Sample rate (fs) - No default. Must select either 8000 or 16000.
       Note there is narrow band (nb) mode only when sampling rate is 8000Hz.

[2021-01-10 11:40:29,138][denoiser.evaluate][INFO] - Eval estimates | 116/148 | 12.1 it/sec


And so on...

And then I got this error:

[2021-01-10 11:40:31,883][__main__][ERROR] - Some error happened
concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/concurrent/futures/process.py", line 175, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/content/denoiser/denoiser/evaluate.py", line 91, in _run_metrics
    pesq_i = get_pesq(clean, estimate, sr=args.sample_rate)
  File "/content/denoiser/denoiser/evaluate.py", line 108, in get_pesq
    pesq_val += pesq(sr, ref_sig[i], out_sig[i], 'wb')
  File "/usr/local/lib/python3.6/dist-packages/pesq/__init__.py", line 28, in pesq
    raise ValueError("fs (sampling frequency) should be either 8000 or 16000")
ValueError: fs (sampling frequency) should be either 8000 or 16000
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 104, in main
    _main(args)
  File "train.py", line 98, in _main
    run(args)
  File "train.py", line 79, in run
    solver.train()
  File "/content/denoiser/denoiser/solver.py", line 170, in train
    pesq, stoi = evaluate(self.args, self.model, self.tt_loader)
  File "/content/denoiser/denoiser/evaluate.py", line 72, in evaluate
    pesq_i, stoi_i = pending.result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
ValueError: fs (sampling frequency) should be either 8000 or 16000

For the test set, should I convert the files to 16 kHz? That wouldn't make much sense, since I'm training at 32 kHz.

Why is bandmask applied to both the input (noisy) and the target (clean)?

Hi, thanks for the repo! I have a question: why is bandmask applied to both the input (noisy) and the target (clean)? Would it make sense to apply bandmask only to the input data and let the model reconstruct the missing frequencies? Sorry if this question is obvious and I just lack some fundamental understanding! I would greatly appreciate any insight on this.

Error when fine-tuning: RuntimeError: Error(s) in loading state_dict for Demucs

After running python3 train.py I get:

[2020-09-30 00:18:43,333][__main__][INFO] - For logs, checkpoints and samples check /Users/youssef/denoiser/outputs/exp_
[2020-09-30 00:18:44,810][denoiser.solver][INFO] - Fine tuning from pre-trained model dns64
Downloading: "https://dl.fbaipublicfiles.com/adiyoss/denoiser/dns64-a7761ff99a7d5bb6.th" to /Users/youssef/.cache/torch/checkpoints/dns64-a7761ff99a7d5bb6.th
100%|#################################################################################################################################| 128M/128M [02:00<00:00, 1.12MB/s]

[2020-09-30 00:20:47,197][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 99, in main
    _main(args)
  File "train.py", line 93, in _main
    run(args)
  File "train.py", line 75, in run
    solver = Solver(data, model, optimizer, args)
  File "/Users/youssef/denoiser/denoiser/solver.py", line 70, in __init__
    self._reset()
  File "/Users/youssef/denoiser/denoiser/solver.py", line 123, in _reset
    self.model.load_state_dict(model.state_dict())
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 847, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Demucs:
	size mismatch for encoder.0.0.weight: copying a param with shape torch.Size([64, 1, 8]) from checkpoint, the shape in current model is torch.Size([48, 1, 8]).
	size mismatch for encoder.0.0.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([48]).
	size mismatch for encoder.0.2.weight: copying a param with shape torch.Size([128, 64, 1]) from checkpoint, the shape in current model is torch.Size([96, 48, 1]).
	size mismatch for encoder.0.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
	size mismatch for encoder.1.0.weight: copying a param with shape torch.Size([128, 64, 8]) from checkpoint, the shape in current model is torch.Size([96, 48, 8]).
	size mismatch for encoder.1.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
	size mismatch for encoder.1.2.weight: copying a param with shape torch.Size([256, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 96, 1]).
	size mismatch for encoder.1.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
	size mismatch for encoder.2.0.weight: copying a param with shape torch.Size([256, 128, 8]) from checkpoint, the shape in current model is torch.Size([192, 96, 8]).
	size mismatch for encoder.2.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
	size mismatch for encoder.2.2.weight: copying a param with shape torch.Size([512, 256, 1]) from checkpoint, the shape in current model is torch.Size([384, 192, 1]).
	size mismatch for encoder.2.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for encoder.3.0.weight: copying a param with shape torch.Size([512, 256, 8]) from checkpoint, the shape in current model is torch.Size([384, 192, 8]).
	size mismatch for encoder.3.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for encoder.3.2.weight: copying a param with shape torch.Size([1024, 512, 1]) from checkpoint, the shape in current model is torch.Size([768, 384, 1]).
	size mismatch for encoder.3.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for encoder.4.0.weight: copying a param with shape torch.Size([1024, 512, 8]) from checkpoint, the shape in current model is torch.Size([768, 384, 8]).
	size mismatch for encoder.4.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for encoder.4.2.weight: copying a param with shape torch.Size([2048, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1536, 768, 1]).
	size mismatch for encoder.4.2.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]).
	size mismatch for decoder.0.0.weight: copying a param with shape torch.Size([2048, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1536, 768, 1]).
	size mismatch for decoder.0.0.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]).
	size mismatch for decoder.0.2.weight: copying a param with shape torch.Size([1024, 512, 8]) from checkpoint, the shape in current model is torch.Size([768, 384, 8]).
	size mismatch for decoder.0.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for decoder.1.0.weight: copying a param with shape torch.Size([1024, 512, 1]) from checkpoint, the shape in current model is torch.Size([768, 384, 1]).
	size mismatch for decoder.1.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for decoder.1.2.weight: copying a param with shape torch.Size([512, 256, 8]) from checkpoint, the shape in current model is torch.Size([384, 192, 8]).
	size mismatch for decoder.1.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
	size mismatch for decoder.2.0.weight: copying a param with shape torch.Size([512, 256, 1]) from checkpoint, the shape in current model is torch.Size([384, 192, 1]).
	size mismatch for decoder.2.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for decoder.2.2.weight: copying a param with shape torch.Size([256, 128, 8]) from checkpoint, the shape in current model is torch.Size([192, 96, 8]).
	size mismatch for decoder.2.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
	size mismatch for decoder.3.0.weight: copying a param with shape torch.Size([256, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 96, 1]).
	size mismatch for decoder.3.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
	size mismatch for decoder.3.2.weight: copying a param with shape torch.Size([128, 64, 8]) from checkpoint, the shape in current model is torch.Size([96, 48, 8]).
	size mismatch for decoder.3.2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([48]).
	size mismatch for decoder.4.0.weight: copying a param with shape torch.Size([128, 64, 1]) from checkpoint, the shape in current model is torch.Size([96, 48, 1]).
	size mismatch for decoder.4.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
	size mismatch for decoder.4.2.weight: copying a param with shape torch.Size([64, 1, 8]) from checkpoint, the shape in current model is torch.Size([48, 1, 8]).
	size mismatch for lstm.lstm.weight_ih_l0: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
	size mismatch for lstm.lstm.weight_hh_l0: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
	size mismatch for lstm.lstm.bias_ih_l0: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).
	size mismatch for lstm.lstm.bias_hh_l0: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).
	size mismatch for lstm.lstm.weight_ih_l1: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
	size mismatch for lstm.lstm.weight_hh_l1: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
	size mismatch for lstm.lstm.bias_ih_l1: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).
	size mismatch for lstm.lstm.bias_hh_l1: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).

Before running this command, I changed the config.yaml file from:

continue_pretrained:

to

continue_pretrained: dns64

FineTuning on custom dataset using pre-trained models

Hi!
The pre-trained model is performing really well on my dataset. I wanted to fine-tune this model, that is, master64 (trained on the Valentini and DNS datasets), on my dataset and then check the results. Could you let me know how I can do that?
On a tangential note, in the config files there is a parameter called matching, which is set to sort. If I have understood correctly, as long as the clean and noisy files are named similarly, so that they have the same order when sorted, the matching is successful; am I correct? (See the sketch below.)
Thanks in advance for the clarification!
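
A quick sanity-check sketch of that reading of matching=sort; the paths are placeholders, and this is an illustration, not the loader's actual code.

import json

with open("egs/val/tr/noisy.json") as f:
    noisy = sorted(path for path, _ in json.load(f))
with open("egs/val/tr/clean.json") as f:
    clean = sorted(path for path, _ in json.load(f))
for n, c in zip(noisy[:5], clean[:5]):
    print(n, "<->", c)   # corresponding pairs should line up by name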

Very small loss but bad performance

First off, I should say it is an excellent project.

I cloned the code and trained on the sample dataset (debug) without any problem: the validation loss is around 0.05 and the enhanced noisy file sounds good. Then I tried to replace the debug dataset with my own dataset:

  1. move my dataset to dataset
  2. generate the clean and noisy .json files with make_debug.sh
  3. modify conf/config.yaml and build a new conf/dset/mydata.yaml for my dataset

The new training process has no errors and the validation loss is around 0.0008. But when I check the enhanced files in output/ex/samples, the level of the enhanced wav files is almost zero, while the level of the input noisy files is around 500~1000, so there must be a mistake somewhere. I don't know whether I missed something or whether my dataset has some problem.

Thanks in advance.

solver.py:211: UserWarning: Using a target size (torch.Size([1, 1, 112543])) that is different to the input size (torch.Size([1, 1, 112499])).

In the config yaml file I set depth to 6 and hidden to 96, and I got this error during cross-validation (training was going fine up until then):

[2021-02-05 00:17:19,682][__main__][INFO] - For logs, checkpoints and samples check /content/drive/My Drive/outputs/exp_demucs.hidden=96
[2021-02-05 00:17:26,292][denoiser.solver][INFO] - ----------------------------------------------------------------------
[2021-02-05 00:17:26,292][denoiser.solver][INFO] - Training...
[2021-02-05 00:39:43,596][denoiser.solver][INFO] - Train | Epoch 1 | 822/4114 | 0.6 it/sec | Loss 0.00932
[2021-02-05 01:02:00,280][denoiser.solver][INFO] - Train | Epoch 1 | 1644/4114 | 0.6 it/sec | Loss 0.00815
[2021-02-05 01:24:16,911][denoiser.solver][INFO] - Train | Epoch 1 | 2466/4114 | 0.6 it/sec | Loss 0.00757
[2021-02-05 01:46:33,454][denoiser.solver][INFO] - Train | Epoch 1 | 3288/4114 | 0.6 it/sec | Loss 0.00715
[2021-02-05 02:08:49,911][denoiser.solver][INFO] - Train | Epoch 1 | 4110/4114 | 0.6 it/sec | Loss 0.00687
[2021-02-05 02:08:56,199][denoiser.solver][INFO] - Train Summary | End of Epoch 1 | Time 6689.91s | Train Loss 0.00686
[2021-02-05 02:08:56,200][denoiser.solver][INFO] - ----------------------------------------------------------------------
[2021-02-05 02:08:56,200][denoiser.solver][INFO] - Cross validation...
/content/denoiser/denoiser/solver.py:211: UserWarning: Using a target size (torch.Size([1, 1, 112543])) that is different to the input size (torch.Size([1, 1, 112499])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  loss = F.l1_loss(clean, estimate)
[2021-02-05 02:09:00,276][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 104, in main
    _main(args)
  File "train.py", line 98, in _main
    run(args)
  File "train.py", line 79, in run
    solver.train()
  File "/content/denoiser/denoiser/solver.py", line 148, in train
    valid_loss = self._run_one_epoch(epoch, cross_valid=True)
  File "/content/denoiser/denoiser/solver.py", line 211, in _run_one_epoch
    loss = F.l1_loss(clean, estimate)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 2190, in l1_loss
    expanded_input, expanded_target = torch.broadcast_tensors(input, target)
  File "/usr/local/lib/python3.6/dist-packages/torch/functional.py", line 52, in broadcast_tensors
    return torch._C._VariableFunctions.broadcast_tensors(tensors)
RuntimeError: The size of tensor a (112499) must match the size of tensor b (112543) at non-singleton dimension 2

I wonder if this is related to my settings or to something else. The batch size was set to 6.

STFT Loss device issues

Hi,
When fine-tuning on a GPU machine and setting the STFT loss to true in the config file, I get an error:

    solver.train()
  File "/home/wscuser/denoiser/denoiser/solver.py", line 143, in train
    train_loss = self._run_one_epoch(epoch)
  File "/home/wscuser/denoiser/denoiser/solver.py", line 50, in _run_one_epoch
    sc_loss, mag_loss = self.mrstftloss(estimate.squeeze(1), clean.squeeze(1))
  File "/anaconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wscuser/denoiser/denoiser/stft_loss.py", line 138, in forward
    sc_l, mag_l = f(x, y)
  File "/anaconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wscuser/denoiser/denoiser/stft_loss.py", line 94, in forward
    x_mag = stft(x, self.fft_size, self.shift_size, self.win_length, self.window)
  File "/home/wscuser/denoiser/denoiser/stft_loss.py", line 28, in stft
    x_stft = torch.stft(x, fft_size, hop_size, win_length, window)
  File "/anaconda/lib/python3.7/site-packages/torch/functional.py", line 516, in stft
    normalized, onesided, return_complex)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!



Any idea why this is happening?

Thanks in advance!

Downsample audio files

Hi,
Does your repo have a script to downsample .wav files?
Or can you share with us the code you used?
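
A minimal sketch of downsampling with torchaudio's standard resampling API; the paths and the 16 kHz target rate are placeholders.

import torchaudio

wav, sr = torchaudio.load("input_44k.wav")
resample = torchaudio.transforms.Resample(orig_freq=sr, new_freq=16_000)
torchaudio.save("output_16k.wav", resample(wav), 16_000)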

Higher sample rate

Hello, thanks for this repo! Is there a pre-trained model for a higher sample rate? Something like 44.1 kHz, which is the most common in audio?

master64 same model as reported in the paper?

Hi,
Thanks for the great repo. I ran the master64 pretrained model (trained on VCTK and DNS together) and evaluated it on the VCTK valset. I am getting a PESQ score of 3.019 and a STOI of 95.00 (averaged across all 824 files in the valset). This model corresponds to H=64, U=4, S=4, so I looked at the paper, and the objective scores mentioned there (for the same parameter model) are PESQ=2.94 and STOI=95. Does that look correct, or am I doing something wrong here?

Another question I had was regarding access to the non-causal model reported in the paper. Are you planning to release it soon? Thanks in advance for your response!

RuntimeError: CUDA out of memory

Hey guys, in trying to get a first 'hello world' of training / fine-tuning this model, I basically replaced the debug files noisy.json and clean.json with my own JSON content pointing to my own dataset. The dataset contains around 2.5K files and is around 1 GB at 44 kHz (and smaller at 16 kHz, as expected).

The problem is that when trying to run this on Colab (which worked with the original toy dataset provided), I'm now getting this unexpected error:

[2020-10-01 19:57:18,722][__main__][INFO] - For logs, checkpoints and samples check /content/denoiser/outputs/exp_demucs.hidden=64
[2020-10-01 19:57:23,614][denoiser.solver][INFO] - ----------------------------------------------------------------------
[2020-10-01 19:57:23,615][denoiser.solver][INFO] - Training...
[2020-10-01 19:57:26,054][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 99, in main
    _main(args)
  File "train.py", line 93, in _main
    run(args)
  File "train.py", line 76, in run
    solver.train()
  File "/content/denoiser/denoiser/solver.py", line 137, in train
    train_loss = self._run_one_epoch(epoch)
  File "/content/denoiser/denoiser/solver.py", line 207, in _run_one_epoch
    estimate = self.dmodel(noisy)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/denoiser/denoiser/demucs.py", line 184, in forward
    x = decode(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/activation.py", line 94, in forward
    return F.relu(input, inplace=self.inplace)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 914, in relu
    result = torch.relu(input)
RuntimeError: CUDA out of memory. Tried to allocate 502.00 MiB (GPU 0; 14.73 GiB total capacity; 13.03 GiB already allocated; 467.88 MiB free; 13.49 GiB reserved in total by PyTorch)

I have different versions of my dataset, at 16 kHz, 22 kHz, 32 kHz, and 44.1 kHz. Every time I try one, I get a variant of the same error above. For example, when I try 44.1 kHz:

!python3 train.py demucs.hidden=64 sample_rate=44100

I get:

[2020-10-01 20:00:59,516][__main__][INFO] - For logs, checkpoints and samples check /content/denoiser/outputs/exp_demucs.hidden=64,sample_rate=44100
[2020-10-01 20:01:04,753][denoiser.solver][INFO] - ----------------------------------------------------------------------
[2020-10-01 20:01:04,754][denoiser.solver][INFO] - Training...
[2020-10-01 20:01:09,170][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 99, in main
    _main(args)
  File "train.py", line 93, in _main
    run(args)
  File "train.py", line 76, in run
    solver.train()
  File "/content/denoiser/denoiser/solver.py", line 137, in train
    train_loss = self._run_one_epoch(epoch)
  File "/content/denoiser/denoiser/solver.py", line 207, in _run_one_epoch
    estimate = self.dmodel(noisy)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/denoiser/denoiser/demucs.py", line 176, in forward
    x = encode(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/activation.py", line 448, in forward
    return F.glu(input, self.dim)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 946, in glu
    return torch._C._nn.glu(input, dim)
RuntimeError: CUDA out of memory. Tried to allocate 2.69 GiB (GPU 0; 14.73 GiB total capacity; 11.28 GiB already allocated; 2.65 GiB free; 11.30 GiB reserved in total by PyTorch)

Whereas when I try 16kHz, I get:

(the exact same traceback as the first one above, again failing to allocate 502.00 MiB)

And the amount it 'tried to allocate' just varies; the free memory is always just under the available memory.

The fact that the free memory varies when I change datasets (which have different sizes) makes me think it's something entirely different from CUDA actually being out of memory, though I could be wrong.

The PyTorch version I'm running is 1.4.0, with 0.4.0 for torchaudio, because otherwise I get an error saying the CUDA driver is out of date.

I get this error both when I try to train and when I try to fine-tune.

Am I doing something wrong in setting all this up? Should I be arranging my files differently from the debug ones? I tried to place everything in its correct directory, and to point everything to its correct directory.

hubconf models update

Hi! Thank you for your work! Could you add the existing models to hubconf.py? It seems there are some mistakes in the current model names:

from denoiser.pretrained import demucs_rt48, demucs_rt64 # noqa

Thank you!

Linux-like support

Hey, really nice project, and cool things in the code, thanks for sharing!

I didn't know about audio loopback, so I checked whether we can emulate it on Linux, but this doesn't seem completely obvious.

At the moment, we do not provide official support for other OSes.

Do you have plans about that?

Cheers,
Manu

Training details

Hi!

First, thank you for the awesome work!

I was interested in getting more details about how you trained your SOTA model. If I understand correctly, you first trained the model on the Valentini dataset, but how did you partition the data into train, validation and test set (there is no such partition when I download it)? Then you used the best model from this first training as a restart for the training with the DNS dataset, right? Did you then use both of the datasets together or only DNS in this second stage? Also for DNS, how did you build the noisy audio samples and how did you partition into train and test set?

Thank you in advance!

Recommended architectural parameter changes for higher sample rate

Any recommendations on architecture changes if we want to train on higher sample rate data?

For example, the number of channels, kernel size, strides, etc. from this line of code, or even changes to the multi-resolution STFT loss found here? Or maybe even the Bandmasking parameters found here.

My intuition is that with a higher sample rate you want larger filter and kernel sizes, to be able to model more frequency bands. Is that a correct assumption? Thanks!

torch error loading checkpoint + best.th

This seems like a new development. After training a new model and trying to run inference on my computer, I get this (a new warning paired with the following error):

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torchaudio/backend/utils.py:54: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
  '"sox" backend is being deprecated. '
INFO:denoiser.pretrained:Loading model from /Users/yousseavx/Downloads/checkpoint.th
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/yousseavx/denoiser/denoiser/enhance.py", line 155, in <module>
    enhance(args, local_out_dir=args.out_dir)
  File "/Users/yousseavx/denoiser/denoiser/enhance.py", line 113, in enhance
    model = pretrained.get_model(args).to(args.device)
  File "/Users/yousseavx/denoiser/denoiser/pretrained.py", line 59, in get_model
    pkg = torch.load(args.model_path)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 595, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 774, in _legacy_load
    result = unpickler.load()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 730, in persistent_load
    deserialized_objects[root_key] = restore_location(obj, location)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 151, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 135, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

(This is in fact with the new modification / updated code that solved the error a while back). So when I tried to modify the line:

pkg = torch.load(args.model_path)

to:

pkg = torch.load(args.model_path, map_location=torch.device('cpu'))

I now get this:

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torchaudio/backend/utils.py:54: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
  '"sox" backend is being deprecated. '
INFO:denoiser.pretrained:Loading model from /Users/yousseavx/Downloads/best.th

But now the program just terminates in the terminal immediately after printing that last line. I've tried this with both the checkpoint.th and best.th models.

The model was dns48, trained from scratch, training only, without cross-validation or testing.

I tried this with pip3 install -U denoiser,

and I also tried running the same command from inside the current git-cloned repo: same error / behavior.

Denoising .wav files using a pre-trained model

Hi!
I have a couple of .wav files that I want to denoise using the model you have provided, without any training involved. Could you suggest how that can be done?
Thanks in advance.

Question about the frame length and frame shift .

Hello, sorry to disturb you.
I read the paper and the code, but I am still confused about the frame length and frame shift of the audio.

In the Training paragraph, it says: "With this setup, the causal DEMUCS processes audio with a frame size of 37 ms and a stride of 16 ms."

Why are the frame length and frame shift 37 ms and 16 ms? How are they calculated?

Hoping to hear from you.
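
A rough back-of-the-envelope sketch of where those numbers plausibly come from, assuming the default causal configuration (resample factor U=4, depth 5, kernel size K=8, stride S=4, at 16 kHz); this reconstruction is an assumption, not a statement from the authors.

U, depth, K, S, sr = 4, 5, 8, 4, 16_000
stride = S ** depth / U          # total hop of the encoder, in input samples
frame = 1
for _ in range(depth):           # receptive field of the encoder stack
    frame = frame * S + (K - S)
frame /= U                       # back to the 16 kHz input rate
print(f"stride: {1000 * stride / sr:.1f} ms, frame: {1000 * frame / sr:.1f} ms")
# prints roughly: stride: 16.0 ms, frame: 37.3 ms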

There are some errors when using "python -m denoiser.live -i 5 --out 0"

I installed denoiser using "pip install denoiser".
OS: Ubuntu 16.04 LTS
Error log:
File "/home/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
File "/home/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
File "/home/anaconda3/lib/python3.7/site-packages/denoiser/live.py", line 142, in <module>
    main()
File "/home/anaconda3/lib/python3.7/site-packages/denoiser/live.py", line 80, in main
    model = get_model(args).to(args.device)
File "/home/anaconda3/lib/python3.7/site-packages/denoiser/pretrained.py", line 69, in get_model
    model = dns48()
File "/home/anaconda3/lib/python3.7/site-packages/denoiser/pretrained.py", line 31, in dns48
    return _demucs(pretrained, DNS_48_URL, hidden=48)
File "/home/anaconda3/lib/python3.7/site-packages/denoiser/pretrained.py", line 25, in _demucs
    state_dict = torch.hub.load_state_dict_from_url(url, map_location='cpu')
File "/home/anaconda3/lib/python3.7/site-packages/torch/hub.py", line 495, in load_state_dict_from_url
    return torch.load(cached_file, map_location=map_location)
File "/home/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 585, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 772, in _legacy_load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 4906173 more bytes. The file might be corrupted.

It seems that something went wrong during model loading.

Resample kernel

Hi, in the resample kernel part, when generating the Hann window for the truncated sinc function, I am confused about why the code creates a Hann window of length 4 × zeros (+1) and then selects the odd part. Is there any difference from directly creating a Hann window of length 2 × zeros?

win = th.hann_window(4 * zeros + 1, periodic=False)
winodd = win[1::2]
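
A small numeric check suggests the two are not equivalent: the odd samples of the length-(4*zeros+1) window evaluate the Hann function at half-sample offsets, which is where the taps of a 2x upsampling filter sit, whereas a directly built window of length 2*zeros samples different points. This is a hedged illustration of a plausible reason, not an answer from the authors.

import torch as th

zeros = 56
win = th.hann_window(4 * zeros + 1, periodic=False)
winodd = win[1::2]                                  # taps at half-integer offsets
direct = th.hann_window(2 * zeros, periodic=False)  # taps at integer offsets
print((winodd - direct).abs().max())                # non-zero: the windows differ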

your computer configuration

Hello, may I ask about your computer configuration?
My video card is a 2080 Ti; is it suitable?
Thank you very much.

Not processing audio fast enough on i7

Hello,
I'm running denoiser on Mac OS X

ProductName:	Mac OS X
ProductVersion:	10.14.5
BuildVersion:	18F132

on a MacBook Pro, 2.6 GHz Intel Core i7, 16 GB 2133 MHz LPDDR3

^@ip-192-168-178-22:denoiser loretoparisi$ python3 -m denoiser.live
Model loaded.
Ready to process audio.
Not processing audio fast enough, time per frame is 43.6ms !
Not processing audio fast enough, time per frame is 17.2ms !
Not processing audio fast enough, time per frame is 16.8ms !
Not processing audio fast enough, time per frame is 16.5ms !
Not processing audio fast enough, time per frame is 16.4ms !
Not processing audio fast enough, time per frame is 16.0ms !
Not processing audio fast enough, time per frame is 16.0ms !
Not processing audio fast enough, time per frame is 16.1ms !

No additional audio-processing application is running. This is top while running denoiser:

Processes: 353 total, 4 running, 349 sleeping, 2030 threads                                                                                                   00:53:09
Load Avg: 1.50, 1.53, 1.75  CPU usage: 50.53% user, 5.12% sys, 44.33% idle  SharedLibs: 314M resident, 60M data, 119M linkedit.
MemRegions: 117477 total, 3538M resident, 140M private, 2092M shared. PhysMem: 13G used (2551M wired), 3477M unused.
VM: 1779G vsize, 1371M framework vsize, 17855884(0) swapins, 22605506(0) swapouts. Networks: packets: 8744381/7632M in, 10377030/7433M out.
Disks: 2747195/188G read, 2354610/124G written.

This is without denoiser running:

Processes: 351 total, 2 running, 349 sleeping, 1905 threads                                                                                                   00:54:44
Load Avg: 2.00, 1.72, 1.80  CPU usage: 2.99% user, 3.35% sys, 93.64% idle  SharedLibs: 314M resident, 60M data, 119M linkedit.
MemRegions: 117174 total, 3539M resident, 139M private, 2078M shared. PhysMem: 12G used (2550M wired), 3649M unused.
VM: 1770G vsize, 1370M framework vsize, 17855884(0) swapins, 22605506(0) swapouts. Networks: packets: 8744977/7632M in, 10377512/7433M out.
Disks: 2747324/188G read, 2355710/124G written.

If you need additional system info, I'm happy to provide.

Thank you.

RuntimeError: Offset past EOF

Hi,

I'm trying to reproduce your model.
I got an error when I started training on GPUs with launch_valentini.sh.
The error was 'Offset past EOF', but I'm not familiar with it.
I didn't change conf/config.yaml except for the output directory of the logs.
Can you give me any advice on what to check next?

Thank you.

Script output:

$ bash launch_valentini.sh
[2021-02-19 21:30:45,199][__main__][INFO] - For logs, checkpoints and samples check /data/workspace/ntyoshi/outputs/exp_bandmask=0.2,demucs.causal=1,demucs.hidden=48,demucs.resample=4,dset=valentini,remix=1,segment=4.5,shift=8000,shift_same=True,stft_loss=True,stride=0.5
[2021-02-19 21:30:45,719][denoiser.executor][INFO] - Starting 1 worker processes for DDP.
[2021-02-19 21:30:46,017][__main__][INFO] - For logs, checkpoints and samples check /data/workspace/ntyoshi/outputs/exp_bandmask=0.2,demucs.causal=1,demucs.hidden=48,demucs.resample=4,dset=valentini,remix=1,segment=4.5,shift=8000,shift_same=True,stft_loss=True,stride=0.5
[2021-02-19 21:30:49,350][denoiser.solver][INFO] - ----------------------------------------------------------------------
[2021-02-19 21:30:49,351][denoiser.solver][INFO] - Training...
[2021-02-19 21:30:49,483][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 104, in main
    _main(args)
  File "train.py", line 98, in _main
    run(args)
  File "train.py", line 79, in run
    solver.train()
  File "/data/home/ntyoshi/denoiser/denoiser/solver.py", line 137, in train
    train_loss = self._run_one_epoch(epoch)
  File "/data/home/ntyoshi/denoiser/denoiser/solver.py", line 200, in _run_one_epoch
    for i, data in enumerate(logprog):
  File "/data/home/ntyoshi/denoiser/denoiser/utils.py", line 126, in __next__
    value = next(self._iterator)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/data/home/ntyoshi/denoiser/denoiser/data.py", line 96, in __getitem__
    return self.noisy_set[index], self.clean_set[index]
  File "/data/home/ntyoshi/denoiser/denoiser/audio.py", line 72, in __getitem__
    out, sr = torchaudio.load(str(file), offset=offset, num_frames=num_frames)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torchaudio/__init__.py", line 85, in load
    filetype=filetype,
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torchaudio/_sox_backend.py", line 47, in load
    filetype
RuntimeError: Offset past EOF

[2021-02-19 21:30:49,532][denoiser.executor][ERROR] - Worker 0 died, killing all workers

Got response "Killed%"

$ python -m denoiser.enhance --noisy_dir="/home/16k_1" --out_dir="/home/op1"


    INFO:denoiser.pretrained:Loading pre-trained real time H=48 model trained on DNS.
    Killed%

My files have a sample rate of 16000 and a single channel, but I still get the "Killed%" response.
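
A "Killed" message from the shell usually means the OS out-of-memory killer stopped the process, which can happen when long files go through the model in one forward pass. A lower-memory sketch using the streaming path (mirroring the get_estimate excerpt quoted further down this page; the file names are hypothetical):

import torch
import torchaudio
from denoiser import pretrained
from denoiser.demucs import DemucsStreamer

model = pretrained.dns48()  # 16 kHz model, matching the files described above
wav, sr = torchaudio.load("noisy_example.wav")  # hypothetical input file

# Feed the waveform chunk by chunk instead of one whole-file forward pass,
# which bounds the peak memory used by the network.
streamer = DemucsStreamer(model, dry=0)
with torch.no_grad():
    enhanced = torch.cat([streamer.feed(wav), streamer.flush()], dim=1)
torchaudio.save("enhanced_example.wav", enhanced, sr)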

KeyError: 'model' when trying to continue from a previously trained model 'best.th'

Yesterday I fine-tuned dns64 at 32 kHz on Colab. I exported the best.th model file to Google Drive and then tested it; it worked fine. However, I wanted to continue training from this model's checkpoint, so in the config file I set continue_from to /content/drive/MyDrive/best.th

Then when running, I got this error:

[2021-01-11 03:50:57,731][denoiser.solver][INFO] - Loading checkpoint model: /content/drive/MyDrive/best.th
[2021-01-11 03:50:59,057][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 104, in main
    _main(args)
  File "train.py", line 98, in _main
    run(args)
  File "train.py", line 78, in run
    solver = Solver(data, model, optimizer, args)
  File "/content/denoiser/denoiser/solver.py", line 70, in __init__
    self._reset()
  File "/content/denoiser/denoiser/solver.py", line 111, in _reset
    self.model.load_state_dict(package['model']['state'])
KeyError: 'model'

Would love any help whatsoever. It would be really cool if I could continue training from a previous checkpoint, since Colab doesn't give continuous access.
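
Judging from the traceback, continue_from expects a full training package containing a 'model' entry, while best.th appears to hold only the bare model package. A hedged diagnostic (the key layout is an assumption read off the tracebacks on this page, not a repo statement):

import torch

pkg = torch.load("/content/drive/MyDrive/best.th", map_location="cpu")
# If 'model' is absent, this file is the bare model package; continue_from
# would then need the full checkpoint.th written during training instead.
print(sorted(pkg.keys()))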

KeyError: 'class'

No need to respond as I know I'm doing some weird experimental stuff here...

I trained a model with demucs.hidden set to 96, and a depth of 6.

I downloaded the checkpoint.th which was around 4.5 GB.

I ran a command to test it on an audio file, and I got this error:

INFO:denoiser.pretrained:Loading model from /Volumes/Transcend/checkpoint_44K_96.th
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/denoiser/enhance.py", line 138, in <module>
    enhance(args, local_out_dir=args.out_dir)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/denoiser/enhance.py", line 107, in enhance
    model = pretrained.get_model(args).to(args.device)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/denoiser/pretrained.py", line 62, in get_model
    pkg = torch.load(args.model_path, map_location='cuda:0')
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 585, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 765, in _legacy_load
    result = unpickler.load()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 721, in persistent_load
    deserialized_objects[root_key] = restore_location(obj, location)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 800, in restore_location
    return default_restore_location(storage, map_location)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 174, in default_restore_location
    result = fn(storage, location)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 150, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 134, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

So then I edited pretrained.py

I changed:

pkg = torch.load(args.model_path)

to:

pkg = torch.load(args.model_path, map_location=torch.device('cpu'))

and I also tried:

pkg = torch.load(args.model_path, map_location='cpu')

And I get this error when I run the same command again:

INFO:denoiser.pretrained:Loading model from /Volumes/Transcend/checkpoint_44K_96.th
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/denoiser/enhance.py", line 138, in <module>
    enhance(args, local_out_dir=args.out_dir)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/denoiser/enhance.py", line 107, in enhance
    model = pretrained.get_model(args).to(args.device)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/denoiser/pretrained.py", line 65, in get_model
    model = deserialize_model(pkg)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/denoiser/utils.py", line 39, in deserialize_model
    klass = package['class']
KeyError: 'class'
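
Reading this traceback together with the KeyError: 'model' one above suggests the two file formats are complementary: checkpoint.th looks like a full training package whose 'model' entry is the serialized-model package (with the 'class' key) that deserialize_model() and --model_path want. A hedged conversion under that assumption:

import torch

pkg = torch.load("checkpoint_44K_96.th", map_location="cpu")
# Extract the serialized-model sub-package; this is what --model_path
# and deserialize_model() expect to find at the top level.
torch.save(pkg["model"], "model_only.th")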

Which Hydra version?

Hi, thanks for your repo.
When I run './train.py ddp=1' I get this bug:

Traceback (most recent call last):
  File "./train.py", line 104, in main
    _main(args)
  File "./train.py", line 96, in _main
    start_ddp_workers()
  File "/data/code/denoiser/denoiser/executor.py", line 77, in start_ddp_workers
    log = utils.HydraConfig().hydra.job_logging.handlers.file.filename
AttributeError: 'HydraConfig' object has no attribute 'hydra'
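
This AttributeError is consistent with a Hydra version mismatch: HydraConfig().hydra is the pre-1.0 style of reading the config, while hydra-core >= 1.0 exposes the singleton through HydraConfig.get() instead (an assumption about the cause, not a repo statement). Checking which version is installed:

import hydra

# If this prints 1.x, the installed hydra-core is newer than the API the
# executor uses; installing the version pinned in requirements.txt should fix it.
print(hydra.__version__)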

Usage example denoiser

Hi, I apologize for the basic question, but what commands do I need to run to remove noise from a wav file (given its path)?
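
For reference, the denoiser.enhance invocations quoted in other issues on this page do exactly this for a directory of wav files; put your file in a directory and run (the --dns64 flag picks one of the pretrained models, as in the --model_path issue below):

python -m denoiser.enhance --dns64 --noisy_dir=<dir containing the wav file> --out_dir=<output dir>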

ERROR: Command errored out with exit status 128

ERROR: Command errored out with exit status 128: git clone -q 'ssh://****@github.com/ludlows/python-pesq' /private/var/folders/_b/szqwdfn979n4fdg7f2j875_r0000gn/T/pip-install-hy9metiy/pesq Check the logs for full command output.

This happens with pip3 install -r requirements.txt, with or without python3 -m venv .venv; it does not happen when installing via pip install denoiser.

Thank you
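
Exit status 128 from git clone over ssh:// is typically an authentication failure (no GitHub SSH key available to pip's clone). A common workaround, offered here as an assumption rather than an official fix, is to install that one dependency over HTTPS first and then rerun the requirements install:

pip install git+https://github.com/ludlows/python-pesq#egg=pesq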

Why limit to a single thread?

enhance.py
torch.set_num_threads(1)

def get_estimate(model, noisy, args):
    torch.set_num_threads(1)
    if args.streaming:
        # Real-time path: feed the waveform to the streamer chunk by chunk.
        streamer = DemucsStreamer(model, dry=args.dry)
        with torch.no_grad():
            estimate = torch.cat([
                streamer.feed(noisy[0]),
                streamer.flush()], dim=1)[None]
    else:
        # Offline path: one forward pass over the whole waveform.
        with torch.no_grad():
            estimate = model(noisy)
            # Dry/wet mix: keep args.dry of the noisy input in the output.
            estimate = (1 - args.dry) * estimate + args.dry * noisy
    return estimate
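
The pin presumably keeps timing measurements comparable to a single CPU core; if you only care about offline throughput, the limit can be raised with the same torch call (a hedged tweak, not the authors' setting):

import os
import torch

# Use all available cores for intra-op parallelism during offline enhancement.
torch.set_num_threads(os.cpu_count() or 1)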

The script to generate the egs/ files

Hi, Thanks for this great work.
I'm trying to reproduce your paper results and ran the script you provide in README.md to generate the egs/ files for Valentini and DNS, but the json files weren't saved.
Please check my questions below.

  1. I took a look at denoiser/audio.py and I wonder whether json.dump(meta, sys.stdout, indent=4) at the bottom worked in your environment.
    When I changed it to json.dump(meta, sys.argv[2], indent=4), the script seemed to work in the Valentini case.
  2. When I tried the DNS script after cloning DNS-challenge (interspeech2020/master), the $testset/synthetic/reverb/noisy part seemed wrong because there is no such directory. I guess it should be $testset/synthetic/with_reverb/noisy, but is that right? Or should I change the branch of the DNS-challenge repository?

Thank you!

Implementation of CSIG, CBAK, COVL metrics

Hi!
In the original paper you evaluated your model with the PESQ, STOI, CSIG, CBAK, and COVL metrics. I found implementations of PESQ and STOI in denoiser/evaluate.py but didn't see CSIG, CBAK, or COVL. Are you going to add these metrics? That would be very helpful!

Thanks in advance!
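
Until then, the composite metrics are available in third-party packages; a sketch using pysepm, which is an assumption (it is not part of this repo, and its API may differ across versions):

import pysepm      # pip install pysepm (third-party; an assumption)
import torchaudio

clean, fs = torchaudio.load("clean.wav")        # hypothetical file names
enhanced, _ = torchaudio.load("enhanced.wav")

# composite() is expected to return the CSIG, CBAK, and COVL scores.
csig, cbak, covl = pysepm.composite(clean[0].numpy(), enhanced[0].numpy(), fs)
print(csig, cbak, covl)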

Training on DNS - training parameters?

Hi,
I was trying to train the denoiser on DNS, so I downloaded the dataset from asteroid (https://github.com/mpariente/asteroid/blob/master/egs/dns_challenge) and finished through stage 2, which preprocesses the dataset; I get around 60K paired files after preprocessing. For training the denoiser, I am unable to run the code at batch_size=128 (which is in launch_dns.sh). With a 12 GB 1080 Ti GPU I can do at most batch_size=4, which takes around 3 days for 1 epoch, which is a lot, so I was wondering if you did any additional preprocessing for DNS? Thanks!

The problem of the enhanced results while using different loss functions?

Hello. When I train the Demucs model with L1 loss, the enhanced result is very good and the speech always sounds satisfactory. But when I change the loss from L1 to L1 + 0.1 × multi-resolution STFT loss (I only set stft_loss to true in the configuration file config.yaml), the enhanced speech contains some added noise. It is clear in the frequency domain, looking like a set of single-frequency (tonal) components. I have thought about it a lot and have no idea now.
Hoping to hear from you.
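
For reference, a sketch of the combined objective as described, assuming the MultiResolutionSTFTLoss module lives in this repo's stft_loss.py and that the returned terms already carry their 0.1 factors (both assumptions from the repo's defaults):

import torch.nn.functional as F
from denoiser.stft_loss import MultiResolutionSTFTLoss

mrstft = MultiResolutionSTFTLoss()  # default factors assumed to be 0.1

def total_loss(estimate, clean):
    # Time-domain L1 term.
    loss = F.l1_loss(estimate, clean)
    # Spectral convergence + log-magnitude terms over several STFT resolutions.
    sc_loss, mag_loss = mrstft(estimate.squeeze(1), clean.squeeze(1))
    return loss + sc_loss + mag_loss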

Distributed training inconsistency in train.run(), denoiser/solver.train() and denoiser/solver._run_one_epoch()

Hi,
I noticed that across these functions, sometimes model is used and sometimes dmodel (the distributed wrapper).

examples:
train.py line 72:
optimizer = torch.optim.Adam(model.parameters(), lr=args.lr, betas=(0.9, args.beta2))

denoiser/solver.py in self.train() line 138, 153, 179, 185 :
self.model.train()
self.model.eval()
pesq, stoi = evaluate(self.args, self.model, self.tt_loader)
enhance(self.args, self.model, self.samples_dir)

denoiser/solver.py in self._run_one_epoch() line 216:
estimate = self.dmodel(noisy)

Is it still consistent with distributed training if the optimizer gets model.parameters() and not dmodel's?
The same question applies to model.train() and model.eval().

Thanks in advance!!
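
As far as standard DDP semantics go, this is consistent: DistributedDataParallel wraps the module in place, so model and dmodel share the same parameter tensors; only the forward pass has to go through dmodel so that gradients are all-reduced across workers, and train()/eval() reach the shared module either way. A minimal single-process check (assumes the gloo backend is available):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(4, 4)
dmodel = DistributedDataParallel(model)

# Same tensor objects: an optimizer over model.parameters() updates exactly
# the parameters the DDP wrapper synchronizes.
assert next(model.parameters()) is next(dmodel.parameters())
dist.destroy_process_group()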

Using `--model_path` with pretrained models returns error

Hi,
Inference runs perfectly if I specify the model using --dns48 or --dns64 or --master64. Example:
python3 -m denoiser.enhance --noisy_dir=../noisy/ --out_dir=../cleaned/ --master64 --sample_rate 16000 will do the job.
However, when I try to specify the path of the pre-trained model explicitly using --model_path or -m it will break. Example:
python3 -m denoiser.enhance --noisy_dir=../noisy/ --out_dir=../cleaned/ --model_path /root/.cache/torch/checkpoints/master64-8a5dfb4bb92753dd.th --sample_rate 16000
will give the following error:

INFO:denoiser.pretrained:Loading model from /root/.cache/torch/checkpoints/master64-8a5dfb4bb92753dd.th
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/denoiser/denoiser/enhance.py", line 138, in <module>
    enhance(args, local_out_dir=args.out_dir)
  File "/content/denoiser/denoiser/enhance.py", line 107, in enhance
    model = pretrained.get_model(args).to(args.device)
  File "/content/denoiser/denoiser/pretrained.py", line 60, in get_model
    model = deserialize_model(pkg)
  File "/content/denoiser/denoiser/utils.py", line 35, in deserialize_model
    klass = package['class']
KeyError: 'class'

What am I missing here?

Thank you for your awesome project!
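
One plausible explanation (an assumption read off pretrained.py and the traceback, not a confirmed answer): the files downloaded for --dns48/--dns64/--master64 are bare state_dicts consumed by load_state_dict_from_url, while --model_path expects the serialized package written during training, hence the missing 'class' key. Loading the cached file manually into the matching architecture sidesteps that:

import torch
from denoiser.demucs import Demucs

# hidden=64 matches the *64 models; treat this hyperparameter as an assumption.
model = Demucs(hidden=64)
state = torch.load("/root/.cache/torch/checkpoints/master64-8a5dfb4bb92753dd.th",
                   map_location="cpu")
model.load_state_dict(state)
model.eval()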

What parameters to set to train a deeper network?

Hey guys thanks for all the help so far!

I'm curious: if I wanted to train a deeper network from scratch, which parameters do I need to change in the yaml config file?

When I fine-tuned dns48 and dns64 at 44.1 kHz it worked great, but I felt that for some reason the higher frequencies weren't as present in the denoised recording as in the original, and I wonder if that has to do with the size of the network.

Ideally I'd want it to even 'fill in' a bit where the higher frequencies aren't really audible.

Although maybe this is getting into changing the actual architecture.
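
Judging from the Hydra-style overrides visible in the training logs earlier on this page (demucs.hidden=48, demucs.causal=1, ...), width and depth are plain config keys that can also be overridden on the command line; a hedged example (the demucs.depth key name is inferred from the "depth of 6" mentioned in an issue above):

./train.py demucs.hidden=96 demucs.depth=6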

training loss increases when finetuning on master64 on VCTK

Hi!
Thanks for the great repo. The work is awesome!
When I tried to fine-tune a denoising network on the VCTK dataset, I observed that the training loss increases after a few epochs. I am loading the pretrained "master64" model and then fine-tuning on VCTK. Here is my forked repo with more details: link. Please look at history.json to check the training history.

Thanks!
