asteroid-team / asteroid

The PyTorch-based audio source separation toolkit for researchers

Home Page: https://asteroid-team.github.io/

License: MIT License

Python 87.32% Shell 12.68%
audio-separation deep-learning pretrained-models pytorch source-separation speech-enhancement speech-separation

asteroid's People

Contributors

ariel12321, ben-freist, etzinis, faroit, giorgiacantisani, groadabike, hangtingchen, hihunjin, jensheit, jonashaag, joriscos, junzhejosephzhu, ldelebec, leonieborne, mcernak, mhu-coder, michelolzam, mpariente, nobel861017, osanseviero, popcornell, r-sawata, saurabh-kataria, souppuos, subhanjansaha, sunits, vitrioil, xinyi-cheng, z-wony, zmolikova


asteroid's Issues

RFC: Asteroid CLI design

Here's my draft for Asteroid CLI design. I guess it's a radical change from what we have at the moment...

Let's discuss only the design here, not the implementation. I have already given the implementation some thought and have a prototype for some parts of the design, but let's agree on a design first.

Please don't be afraid to criticise what you don't like. It is likely that I forgot or did not know about some use cases when coming up with the design.


Design goals

  • Separate models, datasets, and experiments (= a model trained on a dataset) from each other.
  • Deduplicate common code.
  • Provide a consistent and convenient interface for users.

API design

Starting from scratch

Assume you start with an empty hard disk and want to train a model from scratch.

Steps:

  • Install Asteroid
  • Create dataset config
  • Create model config
  • Run training
  • Run evaluation

Create dataset config (Download and prepare dataset)

Prepare = Create mixtures, create JSON files, etc.

Download dataset from official URL:

$ asteroid data librimix download
Downloading LibriMix dataset to /tmp/librimix-raw...

Prepare the dataset, if necessary. Some datasets don't need preparation; for those, the prepare command is absent.

$ asteroid data librimix prepare --n-speakers 2 --raw /tmp/librimix-raw --target ~/asteroid-datasets/librimix2
Found LibriMix dataset in /tmp/librimix-raw.
Creating LibriMix2 (16 kHz) in ~/asteroid-datasets/librimix2...  # "prepare" never modifies the raw downloads; always creates a copy.
Wrote dataset config to ~/asteroid-datasets/librimix2/dataset.yml.

Generated dataset.yml:

dataset: "asteroid.data.LibriMix"
n_speakers: 2
train_dir: data/tt
val_dir: data/cv
...
sample_rate: 16000

Pass options to prepare:

$ asteroid data librimix prepare --n-speakers 3 --sample-rate 8000 --raw /tmp/librimix-raw --target ~/asteroid-datasets/librimix3
Found LibriMix dataset in /tmp/librimix-raw.
Creating LibriMix3 (8 kHz) in ~/asteroid-datasets/librimix3...  # "prepare" never modifies the raw downloads; always creates a copy.
Wrote dataset config to ~/asteroid-datasets/librimix3/dataset.yml.

dataset.yml:

dataset: "asteroid.data.LibriMix"
n_speakers: 3
sample_rate: 8000
train_dir: data/tt
val_dir: data/cv
...

Create model config

Models have a separate config from datasets (and from experiments, see below). Create one with configure:

$ asteroid model convtasnet configure > ~/asteroid-models/convtasnet-default.yml
$ asteroid model convtasnet configure --n-filters 1337 > ~/asteroid-models/convtasnet-larger.yml

Generated convtasnet-default.yml:

n_filters: 512
kernel_size: 16
...

Run training

$ asteroid train --model ~/asteroid-models/convtasnet-default.yml --data ~/asteroid-datasets/librimix2/dataset.yml
Saving training parameters to exp/train_convtasnet_exp1/experiment.yml
Training epoch 0/100...

The generated experiment.yml (an experiment is a train or eval run) contains the model, dataset, and training info:

data:
  # (Copy of dataset.yml)
  dataset: "asteroid.data.librimix"
  n_speakers: 3
  sample_rate: 8000
  train_dir: data/tt
  val_dir: data/cv
  ...
model:
  # (Copy of convtasnet-default.yml)
  model: "asteroid.models.ConvTasNet"
  n_filters: 512
  kernel_size: 16
  ...
training:
  optim:
    optimizer: "adam"
    ...
  batch_size: 5
  max_epochs: 100
  ...

Override model, dataset, or training params directly on the command line:

$ asteroid train --model ~/asteroid-models/convtasnet-default.yml --data ~/asteroid-datasets/librimix2/dataset.yml --n-filters 1234 --sample-rate 8000 --batch-size 5 --max-epochs 50
Saving training parameters to exp/train_convtasnet_exp2/experiment.yml
Warning: Resampling dataset to 8 kHz.
Training epoch 0/50...
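
To make the override semantics concrete, here is a rough, purely illustrative sketch of how the CLI could merge overrides on top of the model and dataset YAML files (function and variable names are hypothetical, not part of the proposal):

# Hypothetical sketch, not Asteroid code: merge model/data/training configs, CLI flags win.
import yaml

def build_experiment_conf(model_yml, data_yml, training_defaults, cli_overrides):
    with open(model_yml) as f:
        model_conf = yaml.safe_load(f)
    with open(data_yml) as f:
        data_conf = yaml.safe_load(f)
    conf = {"model": model_conf, "data": data_conf, "training": dict(training_defaults)}
    # A flag like --n-filters 1234 is routed to whichever section defines the key.
    for key, value in cli_overrides.items():
        for section in conf.values():
            if key in section:
                section[key] = value
    return conf

The resulting dict would then be dumped as experiment.yml into the experiment folder.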

Continue training from checkpoint:

$ asteroid train --continue exp/train_convtasnet_exp1/
Creating experiment folder exp/train_convtasnet_exp3/...
Saving training parameters to exp/train_convtasnet_exp3/experiment.yml
Continuing training from checkpoint 42.
Training epoch 43/100...

Run evaluation

$ asteroid eval --experiment exp/train_convtasnet_exp3/
Saving training parameters to exp/eval_convtasnet_exp4/experiment.yml
Evaluating ConvTasNet on LibriMix2...

Can change training params for eval:

$ asteroid eval --experiment exp/train_convtasnet_exp3/ --batch-size 10
Saving training parameters to exp/eval_convtasnet_exp5/experiment.yml
Evaluating ConvTasNet on LibriMix2...

Eval on different dataset:

$ asteroid eval --experiment exp/train_convtasnet_exp3/ --data ~/asteroid-datasets/wsj0
Saving training parameters to exp/eval_convtasnet_exp6/experiment.yml
Evaluating ConvTasNet on WSJ0...

Starting from pretrained

$ asteroid download-pretrained "mpariente/DPRNN-LibriMix2-2020-08-13"
Downloading DPRNN trained on LibriMix2 to exp/pretrained_dprnn_exp7...
$ ls exp/pretrained_dprnn_exp7
- dprnn_best.pth
- experiment.yml
...

Eval pretrained:

$ asteroid eval --experiment exp/pretrained_dprnn_exp7/ --data ~/asteroid-datasets/wsj0
Saving training parameters to exp/eval_dprnn_exp8/experiment.yml
Evaluating DPRNN on WSJ0...

Finetune pretrained on custom dataset:

$ asteroid train --continue exp/pretrained_dprnn_exp7 --data /my/dataset.yml --batch-size 123
...

Call to Filterbanks fails

Running this code gives the following error:

Illegal instruction

The error happens for both enc and fenc. I am running pytorch version 1.1.0.
Similar issues while running the wham baseline.

Colab

Colab notebook please

ConvTasNet model can not handle long duration wav files

Hello,

The ConvTasNet model cannot handle long wav files as input:
if the wav has a duration greater than 2 minutes, a crash occurs during the evaluation step:

  • cuda error (if GPU is used)
  • segmentation fault (if CPU mode is used)

Post processing encoder/decoder

I want to take the log of the magnitude spectrogram before passing it to the masker, but the current post_process_inputs calls inp_func() with a predefined set of options. Perhaps it makes sense to keep this flexible with a callback function?
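
As a rough sketch of the callback idea (hypothetical names, not the current post_process_inputs API):

import torch

def post_process_inputs(tf_rep, inp_func=None):
    # Apply a user-supplied callable to the encoder output before masking.
    if inp_func is not None:
        return inp_func(tf_rep)
    return tf_rep

# Example: feed the log magnitude to the masker instead of the raw magnitude.
log_mag = lambda mag: torch.log(mag + 1e-8)
# masker_input = post_process_inputs(mag_spec, inp_func=log_mag)

This would keep the predefined options as convenience defaults while allowing arbitrary user transforms.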

Pretrained models

Do you plan to add support for pretrained models (maybe through torch.hub)? I think that would make a really nice addition.

E.g. recipes could be different for training and inference.
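
For reference, torch.hub support could look roughly like the sketch below; the repository and entry-point names are hypothetical and would require a hubconf.py in the repo:

import torch

# Hypothetical entry point, for illustration only.
model = torch.hub.load("asteroid-team/asteroid", "conv_tasnet", pretrained=True)
model.eval()

with torch.no_grad():
    mixture = torch.randn(1, 16000)   # 1 second of audio at 16 kHz
    est_sources = model(mixture)      # expected shape: (batch, n_src, time)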

About result of DPRNN in wham dataset

The commit version: 631ef15
Same as the default run.sh, the configs are as follows:
kernel size = 2 | chunk size = 250 | batch size = 3

To avoid GPU memory problems, I ran on 3 GPUs with num_workers=6.

However, during the training stage, early stopping happened at epoch 73 and the program didn't continue into the evaluation stage. I then modified model.py according to issue #84 and issue #96. Finally, I got the following result.

Overall metrics :
{'sar': 17.253943631154222,
 'sar_imp': -131.92250498640834,
 'sdr': 16.610877080941982,
 'sdr_imp': 16.459834866636488,
 'si_sdr': 16.222455347439276,
 'si_sdr_imp': 16.223606457496334,
 'sir': 26.228243298887513,
 'sir_imp': 26.077201084582004,
 'stoi': 0.9599706205908263,
  'stoi_imp': 0.22192459732239528}

The result differs from the one reported in README.md.

Do you have any idea about the issue?

Music separation datasets and recipes

A nice hello from the sigsep gang,

This looks like a very nice and ambitious approach. Would love to contribute here. Would you be interested in adding music separation things such as

  • The musdb dataset
  • Music specific augmentations
  • multichannel support throughout the package
  • Open-unmix model

set -e not working?

In ConvTasNet run.sh, there's a set -e at the start of the file, so I'd expect the run.sh script to stop for instance if training has failed. But for me it always "falls through" to the next step, e.g. evaluation, which then fails because training hasn't completed.

FurcaNeXt

Hello, thank you for your work. When will you upload the code for 'FurcaNeXt'?

Bugs in generating wsj0-2mix dataset

๐Ÿ› Bug

Should use wv1 instead of wv2 to get the wav files

To Reproduce

https://github.com/mpariente/asteroid/blob/0bdec2644f2d770d037ce804b7f70cb98bd5c9fa/egs/wsj0-mix/DeepClustering/local/convert_sphere2wav.sh#L31

The line uses both wv2 and wv1 to get wav files, but the wv1 files are overwritten by the wv2 ones, so the generated wav files end up coming from wv2. The wv1 channel is noise-free while wv2 is noisy, and the wsj0-mix dataset is expected to use wv1.

The correct code is
wav=`echo "$line" | sed "s:wv1:wav:g" | awk -v dir=$wav_dir -F'/' '{printf("%s/%s/%s/%s", dir, $(NF-2), $(NF-1), $NF)}'`

Expected behavior

We tested the datasets generated from wv1 and wv2. We observed that the former reproduces the results, while the latter is about 1-2 dB worse in SI-SNR.

With wv1, our final validation loss was about 2950 (see the attached screenshot).

I am sorry that our wv2 results were deleted; the final validation loss there was about 3500.

Environment

  • Asteroid-master
  • PyTorch 1.4.0
  • PyTorchLightning 0.7.6

Automated tests for egs

🚀 Feature

Add automated tests for egs

Motivation

Currently these are entirely untested, which means any change to the run.sh scripts etc. must be tested manually. When making changes, testing all the egs is a heavy burden on the developer; some of them even rely on datasets with commercial licenses that not everyone has access to.

It also means that refactoring egs code takes much more time than it should.

What you'd like

Add a "CI" mode to each eg:

  • Run mini dataset preprocessing (maybe with autogenerated garbage source files so you do not need the actual dataset?)
  • Run a mini epoch with a few batches
  • Run eval

Automatically run this in CI for each eg.
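
For the "mini epoch" part, a plain smoke test on random tensors would avoid needing any dataset at all. A sketch (assuming asteroid.models.ConvTasNet and the PIT SI-SDR loss keep their current signatures):

import torch
from asteroid.models import ConvTasNet
from asteroid.losses import PITLossWrapper, pairwise_neg_sisdr

def test_convtasnet_smoke():
    # One forward/backward/optimizer step on random data, small enough for CI.
    model = ConvTasNet(n_src=2, n_filters=64, n_blocks=2, n_repeats=1)
    loss_func = PITLossWrapper(pairwise_neg_sisdr, pit_from="pw_mtx")
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    mix = torch.randn(2, 8000)          # (batch, time)
    sources = torch.randn(2, 2, 8000)   # (batch, n_src, time)

    loss = loss_func(model(mix), sources)
    loss.backward()
    optimizer.step()

The dataset-preparation and eval stages would still need small per-eg CI scripts on autogenerated audio.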

Feedback welcome! I know it's a lot of work, but we could easily split it into small steps.

Models not saved during training

Question
I tried asteroid/egs/wham/DPRNN/run.sh, but an error occurred at the end of the training process.
The messages are below:

~~~
sep_clean_8kmin_7101f1a8/checkpoints/_ckpt_epoch_4.ckpt as top 5
Epoch 5: 100%|██████████| 4022/4022 [28:05<00:00,  2.39it/s, loss=-11.728, v_num=0, val_loss=-11.5]
Traceback (most recent call last):
  File "train.py", line 121, in <module>
    main(arg_dic)
  File "train.py", line 92, in main
    best_path = [b for b, v in best_k.items() if v == min(best_k.values())][0]
IndexError: list index out of range
~~~

I added some debugging code to train.py and confirmed that the length of checkpoint.best_k_models.items() is zero.
And best_k_models.json contains only {}.

Does anyone have any idea how to fix it?
Let me know if you have any comments.

Environment

  • Python 3.7.7

  • torch 1.5.1 (I've tried 1.3.0 but same result)

  • pytorch-lightning 0.7.6

  • Ubuntu 18.04 on GCP

Cleanly separating model and dataset code from egs/

Refs #180

So I just wanted to work through the similarities and differences of the models, datasets and egs. The first thing I noticed is that for some egs, the model and dataset code lives in the eg folder, and for some it lives in the asteroid package. Is there a reason for this (other than historical :-)? If not, what do you think about moving all the model and dataset code out of the egs?

The poor performance of DeepClustering

Hi,
First, thanks a lot for such an excellent tool for speech separation. I have tried the Deep Clustering recipe on wsj0-mix:
https://github.com/mpariente/asteroid/tree/master/egs/wsj0-mix/DeepClustering
My performance was poor (SI-SDR = 3.5, SDR = 4.5 after 35 epochs of training on 1 GPU). As reported here, the SDR is expected to be close to 10 dB. I am wondering about the reason for the failure. Are there any tricks for training, or are more epochs needed?

Thanks a lot.

Container issues

  1. How to handle multiple outputs from the masker? For example, the clustering embedding and the mask in the case of Deep Clustering. Right now only a single masker output is expected.
  2. Applying the mask: I need to apply the mask to the masker input (the magnitude spectrum, not the real and imaginary parts). Will have to rethink the design; see the sketch below.
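
To make point 1 concrete, one option (just a sketch, not the current Container API) is to let the masker return a tuple and have the container treat the first element as the mask, applied on the magnitude:

import torch

def apply_masker(masker, mag_spec):
    # Handle maskers with extra outputs (e.g. embeddings for Deep Clustering).
    out = masker(mag_spec)
    if isinstance(out, tuple):
        mask, *extras = out          # e.g. (mask, embedding)
    else:
        mask, extras = out, []
    masked = mask * mag_spec         # mask applied on the magnitude, not on real/imag
    return masked, extras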

Problem of Result and wham wav.

Hi, nice work on the Conv-TasNet and WSJ0 experiments!
But there are a few things I am confused about, because I can't get the 12.7 dB on WHAM that the paper reports. So I would like to know:

  1. Is there any difference between wsj0 and WHAM! when separated by Conv-TasNet?
  2. I found that the same wav file in WSJ0 and WHAM seems to have a different bit depth; the WHAM one is usually twice as big as the wsj0 one. Is there any reason for that?
    Thanks for your answer!

Yet another problem

Hi again,
Trying to download and install dns_challenge data.
After updating to your last fix, I'm in this state (see attached screenshot):
I would appreciate any assistance.

WHAMR!: sep_reverb_noisy 1dB lower than the reported performance

โ“ Questions and Help

What is your question?

Hi,
I am trying the joint separation and denoising task on WHAMR!, but my SI-SDR is about 3.94 dB, 1 dB lower than in the README.
Could you please upload the log so that I can check this myself?

Thanks a lot !

about sms_wsj

Why do wsj0 and wsj1 need to be merged? I didn't see wsj1 mentioned in https://github.com/fgnt/sms_wsj or in https://arxiv.org/abs/1910.13934,
so how can I generate the sms_wsj dataset if I only have wsj0?

Several errors when calling loss.backward with AnalyticFreeFB

I tried to use your sample code; it works standalone, but when training with it I get "not on the right device" errors, and another strange error after fixing that.

Here is the code snippet I included:

encoder = Encoder(AnalyticFreeFB(n_filters=512,
                                 kernel_size=256,
                                 stride=128))

I get this error:

  File "/workspaces/speechml/AsSteroid/asteroid/filterbanks/enc_dec.py", line 131, in forward
    return F.conv1d(waveform, filters, stride=self.stride)
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

After fixing this error with this patched code:

def forward(self, waveform):
    """ Convolve 1D torch.Tensor with filterbank."""
    filters = self.get_filters()

    filters = filters.to(waveform.device)  # <- Patched code 

    return F.conv1d(waveform, filters, stride=self.stride)

I still get this error when training with GPU device:

Traceback (most recent call last):
  File "/workspaces/speech/scripts/train.py", line 66, in train
    loss.backward()
  File "/opt/conda/lib/python3.6/site-packages/torch/tensor.py", line 166, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/opt/conda/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: expected device cpu but got device cuda:0

But if I train to CPU only, I get this error:

  File "/workspaces/speech/scripts/train.py", line 66, in train
    loss.backward()
  File "/opt/conda/lib/python3.6/site-packages/torch/tensor.py", line 166, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/opt/conda/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Trying to backward through the graph a second time, 
but the buffers have already been freed. Specify retain_graph=True 
when calling backward the first time.

Any hints why this is?

backward pass on SingleSrcPMSQE returns NaN error

Due to the sqrt of zero values in https://github.com/mpariente/asteroid/blob/master/asteroid/losses/pmsqe.py#L264, the backward pass with the PMSQE loss function gives the following error: "RuntimeError: Function 'PowBackward0' returned nan values in its 0th output." Adding a small epsilon value inside the sqrt solves the problem.
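
For reference, the usual guard looks like this (the epsilon value is just an example):

import torch

EPS = 1e-8

def safe_sqrt(x):
    # sqrt has an unbounded gradient at 0, which produces NaNs in the backward pass.
    return torch.sqrt(x + EPS)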

invalid syntax at ConvTasNet

My environment:

  • Ubuntu 14.04
  • Python 3.7.7

When I tried ConvTasNet, I got the error below at Stage 3:

Stage 3: Training
  File "train.py", line 50
    model = ConvTasNet(**conf['filterbank'], **conf['masknet'])
                                           ^
SyntaxError: invalid syntax

Basically the code should be OK because I haven't changed any code in train.py.
Does anyone have any idea of a solution?

Not creating the WHAM mixture

When I tried the ConvTasNet recipe with WHAM (link) from stage 0,
I got a problem at stage 1, 'Run python scripts to create the WHAM mixtures'.

The echoed log is below:

Run python scripts to create the WHAM mixtures
16k max dataset, tr split
Completed 500 of 20000 utterances
Completed 1000 of 20000 utterances
Completed 1500 of 20000 utterances
Completed 2000 of 20000 utterances
Completed 2500 of 20000 utterances
Completed 3000 of 20000 utterances
Traceback (most recent call last):
 File "create_wham_from_scratch.py", line 118, in <module>
   create_wham(args.wsj0_root, args.wham_noise_root, args.output_dir)
 File "create_wham_from_scratch.py", line 93, in create_wham
   s1_samples, s2_samples, noise_samples = append_or_truncate(s1_samples, s2_samples, noise_samples,
 File "/home/ttnt/venv/data/wham/wham_scripts/utils.py", line 47, in append_or_truncate
   s1_append[speech_start_sample:speech_end_sample] = s1_samples
ValueError: could not broadcast input array from shape (145927) into shape (145759)

As far as I could tell, the issue occurred while processing the 3065th utterance.
Additionally, in utils.py, s1_samples seemed to be too long to fit into s1_append[speech_start_sample:speech_end_sample].
Their lengths were respectively:

  • s1_append: 176252
  • s1_append[speech_start_sample:speech_end_sample]: 145759
  • s1_samples: 145927.

If you have a solution for this problem, or there is something I should do, please let me know.

My environment is below:
Ubuntu 20.04 on WSL(Windows 10)
Python 3.8.2

Thank you.

Asteroid paper at Interspeech 2020

Dear Asteroid contributors @popcornell @JorisCos @sunits @mhu-coder @jensheit @etzinis @mdjuamart @Ariel12321 @dditter @michelolzam (Future contributors : @faroit)

We intend to submit a paper describing Asteroid to Interspeech 2020 (the deadline is May 8th), and as contributors it seems logical that you appear as co-authors on the paper. You'll be asked to proofread the final version of the paper, but I guess this is normal. Also, you are welcome to help with the paper in any way; just let me know if you'd like to.

Could you please provide me with your full name and affiliation (if there are special characters, I'd appreciate the TeX code for them)? You can do it here or send me an email.

Thanks !

Writing best_k_models.json after every epoch

Hi!

First, I would like to thank you for providing such a great tool for speaker separation research. I love it!
However, I have a small suggestion. When using the recipes from your egs directory, the best_k_models.json file only gets written after the whole training is finished. This way you cannot stop your training early and go to the evaluation stage, because the JSON file is needed for it. I suggest modifying the code so that the JSON file is dumped after the first epoch and then updated after each epoch, so that you can interrupt your training and go straight to the evaluation stage.
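
One lightweight way to do this, assuming the recipes keep using pytorch-lightning's ModelCheckpoint and its best_k_models attribute (sketch only):

import json
import pytorch_lightning as pl

class BestKWriter(pl.Callback):
    # Dump checkpoint_callback.best_k_models to JSON after each validation run.
    def __init__(self, checkpoint_callback, json_path):
        self.checkpoint = checkpoint_callback
        self.json_path = json_path

    def on_validation_end(self, trainer, pl_module):
        best_k = {k: v.item() for k, v in self.checkpoint.best_k_models.items()}
        with open(self.json_path, "w") as f:
            json.dump(best_k, f, indent=0)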

Cheers

Peter

conf.yaml options vs. run.sh options

In the WHAM ConvTasNet scripts, you can set some options in conf.yaml and some options in run.sh. The run.sh ones seem to have precedence over the conf.yaml ones.

To me it's confusing since I do not see the reason for two places to specify these things. In practice, I never use the run.sh ones since I want to keep multiple model configurations anyways, so I'll end up having multiple conf.yaml files.

My suggestion is to remove the options from run.sh and add a new flag to run.sh, say --conf, that is a path to a conf.yaml file. This way it's obvious where the config is coming from and also you can easily switch between multiple configs.

Crash during evaluation of ConvTasNet recipe with 16000 Hz, enh_single

Hello,

I tried the ConvTasNet recipe (WHAM dataset).

The evaluation script provided (eval.py) crashes when the model has been trained with these parameters (16000 Hz, enh_single task):

data:
  mode: min
  nondefault_nsrc: null
  sample_rate: 16000
  task: enh_single
  train_dir: data/wav16k/min/tr
  valid_dir: data/wav16k/min/cv
filterbank:
  kernel_size: 32
  n_filters: 512
  stride: 16
main_args:
  exp_dir: exp/train_convtasnet__16k_enh_single_wham_v5/
  gpus: '-1'
  help: null
masknet:
  bn_chan: 128
  hid_chan: 512
  mask_act: relu
  n_blocks: 8
  n_repeats: 3
  n_src: 1
  skip_chan: 128
optim:
  lr: 0.001
  optimizer: adam
  weight_decay: 0.0
positional arguments: {}
training:
  batch_size: 4
  early_stop: true
  epochs: 200
  half_lr: true
  num_workers: 8

The evaluation step in run.sh crashes when the script tries to create wav files; an "index out of range" error message occurs:

Traceback (most recent call last):
  File "eval.py", line 118, in <module>
    main(arg_dic)
  File "eval.py", line 78, in main
    conf['sample_rate'])
  File ".../lib/python3.7/site-packages/soundfile.py", line 313, in write
    channels = data.shape[1]
IndexError: tuple index out of range

Lines in the eval.py file that trigger the bug:

            #Loop over the sources and estimates
            for src_idx, src in enumerate(sources_np):
                sf.write(local_save_dir + "s{}.wav".format(src_idx+1), src,
                         conf['sample_rate'])
            for src_idx, est_src in enumerate(est_sources_np):
                sf.write(local_save_dir + "s{}_estimate.wav".format(src_idx+1),
                         est_src, conf['sample_rate'])

Implementation of Conv-TasNet using cLN

I have some questions about the implementation of Conv-TasNet. If cLN is used, the model should be causal, so the current convolution operation should not be able to see future frames. The Conv-TasNet model you implemented does not seem to handle this; I don't know whether I missed it or it is really not there.

Unable to run evaluate step in dns challenge implementation

Stage 5: Evaluate
0%| | 0/150 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "eval_on_synthetic.py", line 194, in <module>
    main(arg_dic)
  File "eval_on_synthetic.py", line 46, in main
    save_dir=save_dir)
  File "eval_on_synthetic.py", line 110, in evaluate
    metrics_list=COMPUTE_METRICS)
  File "eval_on_synthetic.py", line 169, in get_metrics
    sample_rate=sample_rate)
  File "/exports/stuart/sagar/asteroid/venv_kevin/lib/python3.6/site-packages/pb_bss/evaluation/wrapper.py", line 87, in __init__
    self.channels = self.observation.shape[-2]
IndexError: tuple index out of range

I have trained the model with some modifications, but I am unable to evaluate it on the test set.
The log is mentioned above.

Thank you

Blocklist exp directories

Hello,
Thanks for the repo. It makes working on speech enhancement/blind source separation a lot easier.

Would it make sense to blocklist exp directories under the egs directory? This way they don't show up in git status commands after a recipe has been run. I am not sure if the exp directory name is a convention followed in every recipe though.

Adding egs/**/exp to .gitignore should do the trick. What do you think?

Cheers,
Mathieu

Bugs of reloading model

๐Ÿ› Bug

Reloading the model is buggy when best_k_models.json does not exist

To Reproduce

https://github.com/mpariente/asteroid/blob/0bdec2644f2d770d037ce804b7f70cb98bd5c9fa/egs/wsj0-mix/DeepClustering/model.py#L156-L157

These lines use sort to get the last model, which behaves incorrectly in the following situation:

>>> all_ckpt=['ckpt_epoch_99.ckpt','ckpt_epoch_100.ckpt','ckpt_epoch_101.ckpt']
>>> all_ckpt.sort()
>>> all_ckpt[-1]
'ckpt_epoch_99.ckpt'

Expected behavior

Maybe we can use the following method instead:

>>> all_ckpt=[(ckpt,int("".join(filter(str.isdigit,os.path.basename(ckpt))))) for ckpt in all_ckpt if ckpt.find('ckpt')>=0]
>>> all_ckpt.sort(key=lambda x:x[1])
>>> all_ckpt[-1][0]
'ckpt_epoch_101.ckpt'

Environment

  • Python 3.7

WHAM ConvTasnet recipe 4.5dB worse than reported numbers

I have tried the WHAM ConvTasNet recipe and the SI-SDR comes out to 11.9 dB, whereas the reported number is 16.2 dB, which is a huge gap. So I am wondering what is wrong with my setup and config.
CONFIG:
(I have tried different configs; pasting the best one here.)
data:
  mode: min
  nondefault_nsrc: null
  sample_rate: 8000
  task: sep_clean
  train_dir: data/wav8k/min/tr
  valid_dir: data/wav8k/min/cv
filterbank:
  kernel_size: 16
  n_filters: 512
  stride: 8
main_args:
  exp_dir: exp/train_convtasnet/
  help: null
masknet:
  bn_chan: 128
  hid_chan: 512
  mask_act: relu
  n_blocks: 8
  n_repeats: 3
  n_src: 2
  skip_chan: 128
optim:
  lr: 0.001
  optimizer: adam
  weight_decay: 0.0
positional arguments: {}
training:
  batch_size: 24
  early_stop: true
  epochs: 200
  half_lr: true
  num_workers: 8

pytorch-lightning deprecations

๐Ÿ› Bug

Trainer() argument name changes in 0.8:

  • max_nb_epochs -> max_epochs
  • default_save_path -> default_root_dir

Either pin pytorch-lightning<0.8 or change the names.
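
If pinning is not desired, a small compatibility shim could pick the right argument names at runtime (sketch, assuming only these two arguments changed):

import pytorch_lightning as pl
from packaging import version

def trainer_kwargs(exp_dir, epochs):
    # Return Trainer arguments under either naming scheme.
    if version.parse(pl.__version__) >= version.parse("0.8.0"):
        return {"max_epochs": epochs, "default_root_dir": exp_dir}
    return {"max_nb_epochs": epochs, "default_save_path": exp_dir}

# trainer = pl.Trainer(**trainer_kwargs("exp/train_convtasnet", 200))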

Question: training duration per sample

Would someone share training duration per sample for some of the nets?

For example, what would be the training duration for a sample for Conv-TasNet on a single P100 GPU?

DPRNN would also be interesting.

I am looking for a good compromise of training duration and separation/enhancement quality.

Thanks!

Issues with argparser on wham dataset

bash run.sh --stage 3 --python_path python
Results from the following experiment will be stored in exp/train_convtasnet_sep_clean_8kmin_009b94e6
Stage 3: Training
usage: train.py [-h] [--use_cuda USE_CUDA] [--model_path MODEL_PATH]
[--n_filters N_FILTERS] [--kernel_size KERNEL_SIZE]
[--stride STRIDE] [--n_blocks N_BLOCKS]
[--n_repeats N_REPEATS] [--mask_act MASK_ACT]
[--epochs EPOCHS] [--half_lr HALF_LR]
[--early_stop EARLY_STOP] [--max_norm MAX_NORM]
[--checkpoint CHECKPOINT] [--continue_from CONTINUE_FROM]
[--optimizer OPTIMIZER] [--lr LR]
[--weight_decay WEIGHT_DECAY] [--train_dir TRAIN_DIR]
[--valid_dir VALID_DIR] [--task TASK]
[--nondefault_nsrc NONDEFAULT_NSRC]
[--sample_rate SAMPLE_RATE] [--mode MODE]
[--batch_size BATCH_SIZE] [--num_workers NUM_WORKERS]
train.py: error: argument --continue_from: invalid NoneType value: ''

Setup retrieval based on string

As done in masknn.norms, setting up retrieval based on strings could be pretty useful for optimizers, activation functions and filterbanks as a first step.
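
Along the lines of the pattern already used in masknn.norms, a string-to-class registry could look like this (names are illustrative):

from torch import optim

# Map user-facing strings to optimizer classes.
OPTIMIZERS = {"adam": optim.Adam, "sgd": optim.SGD, "rmsprop": optim.RMSprop}

def get_optimizer(name_or_cls, params, **kwargs):
    # Accept either a string key or an optimizer class.
    if isinstance(name_or_cls, str):
        name_or_cls = OPTIMIZERS[name_or_cls.lower()]
    return name_or_cls(params, **kwargs)

# optimizer = get_optimizer("adam", model.parameters(), lr=1e-3)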

Error in Training ConvTasNet on LibriMix

๐Ÿ› Bug

To Reproduce

After having created all the mixtures and moving on to stage 1 of run.sh, the following error shows up:

Stage 1: Training
Traceback (most recent call last):
  File "train.py", line 127, in <module>
{'data': {'n_src': 3,
          'sample_rate': 16000,
          'segment': 3,
          'task': 'sep_noisy',
          'train_dir': 'data/wav8k/min/train-360',
          'valid_dir': 'data/wav8k/min/dev'},
 'filterbank': {'kernel_size': 16, 'n_filters': 512, 'stride': 8},
 'main_args': {'exp_dir': 'exp/train_convtasnet_84932317', 'help': None},
 'masknet': {'bn_chan': 128,
             'hid_chan': 512,
             'mask_act': 'relu',
             'n_blocks': 8,
             'n_repeats': 3,
             'skip_chan': 128},
 'optim': {'lr': 0.001, 'optimizer': 'adam', 'weight_decay': 0.0},
 'positional arguments': {},
 'training': {'batch_size': 24,
              'early_stop': True,
              'epochs': 200,
              'half_lr': True,
              'num_workers': 4}}
    main(arg_dic)
  File "train.py", line 33, in main
    segment=conf['data']['segment'])
  File "/home/subhanjan/asteroid/asteroid/data/librimix_dataset.py", line 52, in __init__
    md_file = [f for f in os.listdir(csv_dir) if 'both' in f][0]
FileNotFoundError: [Errno 2] No such file or directory: 'data/wav8k/min/train-360'

Expected behavior

I have been trying to train ConvTasNet with n_src=3 for a while now, and since LibriMix has this feature conveniently built in, I have been trying to use it, but I keep running into errors with the train_dir and test_dir variables in run.sh. Do they need to be changed? They're being parsed for .csv files, so what should these paths be changed to?

Environment

This is what my run.sh looks like

storage_dir=../LibriMix3spk

# After running the recipe a first time, you can run it from stage 3 directly to train new models.

# Path to the python you'll use for the experiment. Defaults to the current python
# You can run ./utils/prepare_python_env.sh to create a suitable python environment, paste the output here.
python_path=python

# Example usage
# ./run.sh --stage 3 --tag my_tag --task sep_noisy --id 0,1

# General
stage=0  # Controls from which stage to start
tag=""  # Controls the directory name associated to the experiment
# You can ask for several GPUs using id (passed to CUDA_VISIBLE_DEVICES)
id=0
out_dir=librimix # Controls the directory name associated to the evaluation results inside the experiment directory

# Network config
n_blocks=8
n_repeats=3
mask_act=relu
# Training config
epochs=200
batch_size=24
num_workers=4
half_lr=yes
early_stop=yes
# Optim config
optimizer=adam
lr=0.001
weight_decay=0.
# Data config
train_dir=data/wav8k/min/train-360
valid_dir=data/wav8k/min/dev
test_dir=data/wav8k/min/test
sample_rate=16000
n_src=3
segment=3
task=sep_noisy  # one of 'enh_single', 'enh_both', 'sep_clean', 'sep_noisy'

Kindly consider making run.sh more user-friendly and modular, since it's the user's only way of interacting with the program.

NVIDIA-SMI

Hi Manuel,
Are you planning to log the GPU usage during training?
This can be done by using https://github.com/nicolargo/nvidia-ml-py3.
This would help to see resource usage without needing to run "watch nvidia-smi" or "nvidia-smi dmon" in a separate terminal.
Thank you

Create callbacks for Solver

Create a base class for callbacks, e.g. Callback, and rewrite learning-rate halving and early stopping using it. Rewrite Solver accordingly.
This would give users more freedom to define their own callbacks.
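
A possible shape for it (sketch only; the hook names and the solver.stop_training attribute are hypothetical):

class Callback:
    # Base class: Solver calls these hooks at the appropriate times.
    def on_epoch_end(self, solver, epoch, val_loss):
        pass

class EarlyStopping(Callback):
    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.wait = 0

    def on_epoch_end(self, solver, epoch, val_loss):
        if val_loss < self.best:
            self.best, self.wait = val_loss, 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                solver.stop_training = True  # hypothetical Solver attribute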

Use pytorch-lightning for training?

This would solve #6 and #3 .

  • Check flexibility with respect to :
    • Model saving and loading (need to instantiate or not?)
    • Optimizers and callbacks
    • config.yml files in egs
    • Loss classes to handle PIT nicely
    • Unified dataset and dataloader structure?

Any feedback on that from anybody?
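
For a feel of what this would look like, a minimal wrapper might be (sketch only; the class and method bodies are illustrative, not a proposed API):

import torch
import pytorch_lightning as pl
from asteroid.losses import PITLossWrapper, pairwise_neg_sisdr

class SeparationSystem(pl.LightningModule):
    # Thin Lightning wrapper around any separation model.
    def __init__(self, model, lr=1e-3):
        super().__init__()
        self.model = model
        self.loss_func = PITLossWrapper(pairwise_neg_sisdr, pit_from="pw_mtx")
        self.lr = lr

    def training_step(self, batch, batch_idx):
        mix, sources = batch
        return self.loss_func(self.model(mix), sources)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)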
