asteroid-team / asteroid
The PyTorch-based audio source separation toolkit for researchers
Home Page: https://asteroid-team.github.io/
License: MIT License
Here's my draft for the Asteroid CLI design. I guess it's a radical change from what we have at the moment...
Let's discuss only the design here, not the implementation. I have already given the implementation some thought and have a prototype for some parts of the design, but let's agree on a design first.
Please don't be afraid to criticise what you don't like. It is likely that I forgot or did not know about some use cases when coming up with the design.
Assuming you start with an empty hard disk, and want to train a model from scratch.
Steps:
Prepare = Create mixtures, create JSON files, etc.
Download dataset from official URL:
$ asteroid data librimix download
Downloading LibriMix dataset to /tmp/librimix-raw...
Prepare the dataset, if necessary. Some datasets don't need preparation; for those, the prepare command is absent.
$ asteroid data librimix prepare --n-speakers 2 --raw /tmp/librimix-raw --target ~/asteroid-datasets/librimix2
Found LibriMix dataset in /tmp/librimix-raw.
Creating LibriMix2 (16 kHz) in ~/asteroid-datasets/librimix2... # "prepare" never modifies the raw downloads; always creates a copy.
Wrote dataset config to ~/asteroid-datasets/librimix2/dataset.yml.
Generated dataset.yml:
dataset: "asteroid.data.LibriMix"
n_speakers: 2
train_dir: data/tt
val_dir: data/cv
...
sample_rate: 16000
Pass options to prepare:
$ asteroid data librimix prepare --n-speakers 3 --sample-rate 8000 --raw /tmp/librimix-raw --target ~/asteroid-datasets/librimix3
Found LibriMix dataset in /tmp/librimix-raw.
Creating LibriMix3 (8 kHz) in ~/asteroid-datasets/librimix3... # "prepare" never modifies the raw downloads; always creates a copy.
Wrote dataset config to ~/asteroid-datasets/librimix3/dataset.yml.
Generated dataset.yml:
dataset: "asteroid.data.LibriMix"
n_speakers: 3
sample_rate: 8000
train_dir: data/tt
val_dir: data/cv
...
Models have a separate config from datasets (and from experiments, see below). Create one with configure:
$ asteroid model convtasnet configure > ~/asteroid-models/convtasnet-default.yml
$ asteroid model convtasnet configure --n-filters 1337 > ~/asteroid-models/convtasnet-larger.yml
Generated convtasnet-default.yml:
n_filters: 512
kernel_size: 16
...
$ asteroid train --model ~/asteroid-models/convtasnet-default.yml --data ~/asteroid-datasets/librimix2/dataset.yml
Saving training parameters to exp/train_convtasnet_exp1/experiment.yml
Training epoch 0/100...
Generated experiment.yml (experiment = train or eval) contains model info, dataset info, and training info:
data:
# (Copy of dataset.yml)
dataset: "asteroid.data.librimix"
n_speakers: 3
sample_rate: 8000
train_dir: data/tt
val_dir: data/cv
...
model:
# (Copy of convtasnet-default.yml)
model: "asteroid.models.ConvTasNet"
n_filters: 512
kernel_size: 16
...
training:
optim:
optimizer: "adam"
...
batch_size: 5
max_epochs: 100
...
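To make the mechanics concrete, here is a minimal sketch (an assumption about how it could work, not an excerpt from the prototype) of how experiment.yml could be assembled: merge dataset.yml, the model yml and the training defaults, then apply CLI overrides last.

import yaml

def build_experiment_config(data_yml, model_yml, training_defaults, cli_overrides):
    # Hypothetical helper: load the two configs and nest them under the
    # sections shown in experiment.yml above.
    with open(data_yml) as f:
        data_conf = yaml.safe_load(f)
    with open(model_yml) as f:
        model_conf = yaml.safe_load(f)
    exp = {"data": data_conf, "model": model_conf, "training": dict(training_defaults)}
    # CLI flags such as --n-filters or --batch-size override the matching key,
    # whichever section it lives in.
    for key, value in cli_overrides.items():
        for section in exp.values():
            if key in section:
                section[key] = value
    return exp

# e.g. asteroid train ... --n-filters 1234 --batch-size 5 would translate to
# build_experiment_config(data_yml, model_yml, defaults, {"n_filters": 1234, "batch_size": 5})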
Change model, dataset, or training params in place:
$ asteroid train --model ~/asteroid-models/convtasnet-default.yml --data ~/asteroid-datasets/librimix2/dataset.yml --n-filters 1234 --sample-rate 8000 --batch-size 5 --max-epochs 50
Saving training parameters to exp/train_convtasnet_exp2/experiment.yml
Warning: Resampling dataset to 8 kHz.
Training epoch 0/50...
Continue training from checkpoint:
$ asteroid train --continue exp/train_convtasnet_exp1/
Creating experiment folder exp/train_convtasnet_exp3/...
Saving training parameters to exp/train_convtasnet_exp3/experiment.yml
Continuing training from checkpoint 42.
Training epoch 43/100...
$ asteroid eval --experiment exp/train_convtasnet_exp3/
Saving training parameters to exp/eval_convtasnet_exp4/experiment.yml
Evaluating ConvTasNet on LibriMix2...
Can change training params for eval:
$ asteroid eval --experiment exp/train_convtasnet_exp3/ --batch-size 10
Saving training parameters to exp/eval_convtasnet_exp5/experiment.yml
Evaluating ConvTasNet on LibriMix2...
Eval on different dataset:
$ asteroid eval --experiment exp/train_convtasnet_exp3/ --data ~/asteroid-datasets/wsj0
Saving training parameters to exp/eval_convtasnet_exp6/experiment.yml
Evaluating ConvTasNet on WSJ0...
$ asteroid download-pretrained "mpariente/DPRNN-LibriMix2-2020-08-13"
Downloading DPRNN trained on LibriMix2 to exp/pretrained_dprnn_exp7...
$ ls exp/pretrained_dprnn_exp7
- dprnn_best.pth
- experiment.yml
...
Eval pretrained:
$ asteroid eval --experiment exp/pretrained_dprnn_exp7/ --data ~/asteroid-datasets/wsj0
Saving training parameters to exp/eval_dprnn_exp8/experiment.yml
Evaluating DPRNN on WSJ0...
Finetune pretrained on custom dataset:
$ asteroid train --continue exp/pretrained_dprnn_exp7 --data /my/dataset.yml --batch-size 123
...
Hi, thank you for sharing this awesome code!
Could you please tell me how to get the CSV files for LibriMix?
https://github.com/mpariente/asteroid/blob/68cb7a3a53afc4b509570e71711ed333ab938e42/asteroid/data/librimix_dataset.py#L14
Thank you!
Fei
Running this code gives the following error:
Illegal instruction
The error happens for both enc and fenc. I am running pytorch version 1.1.0.
Similar issues while running the wham baseline.
Colab notebook please
Hello,
The ConvTasNet model cannot handle long-duration wav files as input;
if the wav is longer than 2 minutes, a crash occurs during the evaluation step:
I want to take the log of the magnitude spectrogram before passing it to the masker, but the current post_process_inputs calls inp_func() with a predefined set of options. Perhaps it makes sense to keep this flexible, with a callback function?
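A rough sketch of what I mean (inp_transform and the wrapper below are made-up names, not the current API):

import torch

EPS = 1e-8

def log_mag(tf_rep):
    # Example user callback: log of the magnitude of the TF representation.
    return torch.log(tf_rep.abs() + EPS)

def post_process_inputs(tf_rep, inp_transform=None):
    # Accept any callable instead of a predefined set of options.
    return inp_transform(tf_rep) if inp_transform is not None else tf_rep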
Do you plan to add support for pretrained models (maybe through torchhub?) I think that would make a really nice addition.
E.g. recipes could be different for training and inference.
The commit version: 631ef15
The configs are the same as in the default run.sh, as follows.
kernel size=2 | chunk size=250 | batch size=3
To avoid GPU memory problems, I used 3 GPUs and num_workers=6.
However, during the training stage, early stopping happened at epoch 73 and the program didn't continue to the evaluation stage. I then modified model.py according to issue84 and issue96. Finally, I got the following result.
Overall metrics :
{'sar': 17.253943631154222,
'sar_imp': -131.92250498640834,
'sdr': 16.610877080941982,
'sdr_imp': 16.459834866636488,
'si_sdr': 16.222455347439276,
'si_sdr_imp': 16.223606457496334,
'sir': 26.228243298887513,
'sir_imp': 26.077201084582004,
'stoi': 0.9599706205908263,
'stoi_imp': 0.22192459732239528}
The result differs from the one mentioned in README.md.
Do you have any idea about the issue?
A nice hello from the sigsep gang,
This looks like a very nice and ambitious approach. Would love to contribute here. Would you be interested in adding music separation things such as
In the ConvTasNet run.sh, there's a set -e at the start of the file, so I'd expect the run.sh script to stop, for instance, if training has failed. But for me it always "falls through" to the next step, e.g. evaluation, which then fails because training hasn't completed.
Hello, thank you for your work. When will you upload the codes about 'FurcaNeXt'?
Should use wv1 instead of wv2 to get wav files
The line uses both wv2 and wv1 to get wav files, but the wv1 files get overwritten by the wv2 ones, so the generated wav files end up coming from wv2. The wv1 files are noise-free while the wv2 files are noisy; the wsj0-mix dataset is expected to use wv1.
The correct code is
wav=`echo "$line" | sed "s:wv1:wav:g" | awk -v dir=$wav_dir -F'/' '{printf("%s/%s/%s/%s", dir, $(NF-2), $(NF-1), $NF)}'`
We tested the datasets generated with wv1 and with wv2. We observed that the former reproduces the results, while the latter is around 1-2 dB worse in SI-SNR.
With wv1, our final validation loss was about 2950.
I am sorry that our wv2 results were deleted; the final validation loss there was about 3500.
Add automated tests for egs
Currently these are entirely untested, which means any changes to the run.sh
scripts etc. must be tested manually. When making changes, testing all the egs is a heavy burden on the developer; some of them even rely on datasets with commercial licenses that not everyone may have access to.
It also means that refactoring egs code takes much more time than it could.
Add a "CI" mode to each eg:
Automatically run this in CI for each eg.
Feedback welcome! I know it's a lot of work, but we could easily split it into small steps.
Question
I tried asteroid/egs/wham/DPRNN/run.sh but an error occurred at the end of the training process.
The messages are below:
~~~
sep_clean_8kmin_7101f1a8/checkpoints/_ckpt_epoch_4.ckpt as top 5
Epoch 5: 100%|██████████| 4022/4022 [28:05<00:00, 2.39it/s, loss=-11.728, v_num=0, val_loss=-11.5]
Traceback (most recent call last):
File "train.py", line 121, in <module>
main(arg_dic)
File "train.py", line 92, in main
best_path = [b for b, v in best_k.items() if v == min(best_k.values())][0]
IndexError: list index out of range
~~~
I have tried adding some code to train.py and confirmed that the length of checkpoint.best_k_models.items() is zero.
And best_k_models.json contains only {}.
Does anyone have any idea to fix it?
Let me know if you have any comments.
Environment
Python 3.7.7
torch 1.5.1 (I've tried 1.3.0 but same result)
pytorch-lightning 0.7.6
Ubuntu 18.04 on GCP
Wavesplit uses standard convolutions, not separable (separable worked worse).
Before the residual stacks there is a single convolution layer with kernel size 4, 512 filters, no dilation, no stride.
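In PyTorch terms, that front-end layer would look roughly like this (the input channel count is an assumption on my side):

import torch.nn as nn

# Single convolution before the residual stacks: 512 filters, kernel size 4,
# stride 1, no dilation, assuming a mono waveform input.
front_end = nn.Conv1d(in_channels=1, out_channels=512, kernel_size=4, stride=1, dilation=1)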
Anyone interested?
Consistency constraints could be useful as well.
Refs #180
So I just wanted to work through the similarities and differences of the models, datasets and egs. The first thing I noticed is that for some egs, the model and dataset code lives in the eg folder, and for some it lives in the asteroid package. Is there a reason for this (other than historical :-))? If not, what do you think about moving all the model and dataset code out of the egs?
Hi
First thanks a lot for such an excellent tool for speech separation. I have tried the deep clustering part of wsj0-mix
https://github.com/mpariente/asteroid/tree/master/egs/wsj0-mix/DeepClustering
My performance was poor (SI-SDR = 3.5, SDR = 4.5 after 35 epochs with 1 GPU for training). As reported here, the SDR is expected to be close to 10 dB. I am wondering what the reason for the failure is. Are there any tricks for training, or are more epochs needed for improvement?
Thanks a lot.
Hi, nice work on the Conv-TasNet and WSJ0 experiment!
But there are a few questions I am confused about, because I can't get the 12.7 dB on WHAM that the paper reports. So I would like to know:
Hi,
I am trying the joint separation and denoising task on WHAMR!, but my SI-SDR is about 3.94 dB, 1 dB lower than the one in the README.
Could you please upload the log so that I can check this by myself ?
Thanks a lot !
Why do wsj0 and wsj1 need to be merged? I didn't see wsj1 mentioned in https://github.com/fgnt/sms_wsj or in https://arxiv.org/abs/1910.13934, so how can I generate the sms_wsj dataset if I only have wsj0?
I tried to use your sample code, and it works standalone, but when training with it I get "not on right device" errors, and a strange error after fixing that.
Here is the code snippet I included:
encoder = Encoder(AnalyticFreeFB(n_filters=512,
kernel_size=256,
stride=128))
I get this error:
File "/workspaces/speechml/AsSteroid/asteroid/filterbanks/enc_dec.py", line 131, in forward
return F.conv1d(waveform, filters, stride=self.stride)
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
After fixing this error with this patched code:
def forward(self, waveform):
""" Convolve 1D torch.Tensor with filterbank."""
filters = self.get_filters()
filters = filters.to(waveform.device) # <- Patched code
return F.conv1d(waveform, filters, stride=self.stride)
I still get this error when training with GPU device:
Traceback (most recent call last):
File "/workspaces/speech/scripts/train.py", line 66, in train
loss.backward()
File "/opt/conda/lib/python3.6/site-packages/torch/tensor.py", line 166, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/opt/conda/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: expected device cpu but got device cuda:0
But if I train on CPU only, I get this error:
File "/workspaces/speech/scripts/train.py", line 66, in train
loss.backward()
File "/opt/conda/lib/python3.6/site-packages/torch/tensor.py", line 166, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/opt/conda/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: Trying to backward through the graph a second time,
but the buffers have already been freed. Specify retain_graph=True
when calling backward the first time.
Any hints why this is?
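Not a diagnosis, but the generic pattern worth double-checking first: once the filters are registered as parameters or buffers, moving the module with .to(device) should carry them along, and keeping every input on that same device plus building a fresh graph each step avoids both errors above. A minimal sketch, reusing the encoder from the snippet above:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
encoder = encoder.to(device)                     # encoder from the snippet above

for _ in range(3):
    # Dummy batch created directly on the same device as the encoder.
    waveform = torch.randn(4, 1, 16000, device=device)
    tf_rep = encoder(waveform)
    loss = tf_rep.pow(2).mean()                  # placeholder loss
    loss.backward()                              # each iteration builds a fresh graph,
                                                 # so retain_graph should not be needed

The "backward through the graph a second time" error usually means part of the graph (for instance filters computed once outside the training loop) is being reused across iterations.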
Due to the sqrt of zero values in https://github.com/mpariente/asteroid/blob/master/asteroid/losses/pmsqe.py#L264, the backward pass while using the PMSQE loss function gives the following error: "RuntimeError: Function 'PowBackward0' returned nan values in its 0th output." Adding a small epsilon value to the sqrt function solves the problem.
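For reference, the fix amounts to something like this (the epsilon value here is an arbitrary choice):

import torch

EPS = 1e-8

def safe_sqrt(x):
    # d/dx sqrt(x) = 1 / (2 * sqrt(x)) blows up at x = 0, hence the epsilon.
    return torch.sqrt(x + EPS)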
As in this paper, implement the three masking strategies: Mag, ReIm, Comp.
Additionally, add the option to not mask at all.
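A rough sketch of the three strategies plus the no-mask option, assuming a complex mixture STFT and a network output with a real/imaginary channel axis (shapes and names are illustrative, not the paper's or Asteroid's API):

import torch

def apply_mask(mix_stft, mask, strategy="mag"):
    # mix_stft: complex tensor (..., freq, time).
    # mask: (..., freq, time) for "mag", (..., 2, freq, time) otherwise.
    if strategy == "mag":
        # Real-valued mask scales the magnitude; the mixture phase is kept.
        return mask * mix_stft
    if strategy == "reim":
        # Two real-valued masks applied separately to real and imaginary parts.
        return torch.complex(mask[..., 0, :, :] * mix_stft.real,
                             mask[..., 1, :, :] * mix_stft.imag)
    if strategy == "comp":
        # Complex mask: complex multiplication with the mixture STFT.
        cmask = torch.complex(mask[..., 0, :, :], mask[..., 1, :, :])
        return cmask * mix_stft
    if strategy == "none":
        # No masking: the network output is used directly as the estimate.
        return torch.complex(mask[..., 0, :, :], mask[..., 1, :, :])
    raise ValueError(f"Unknown masking strategy: {strategy}")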
It would be nice to get rid of the prints in the Solver.
My environment:
When I tried ConvTasNet, I got an error at Stage 3, shown below:
Stage 3: Training
File "train.py", line 50
model = ConvTasNet(**conf['filterbank'], **conf['masknet'])
^
SyntaxError: invalid syntax
Basically the code should be OK because I hadn't changed any code in train.py.
Does anyone have any idea of a solution?
When I tried ConvTasNet with WHAM (link) from stage 0,
I got a problem at stage 1, "Run python scripts to create the WHAM mixtures".
The echoed log is below:
Run python scripts to create the WHAM mixtures
16k max dataset, tr split
Completed 500 of 20000 utterances
Completed 1000 of 20000 utterances
Completed 1500 of 20000 utterances
Completed 2000 of 20000 utterances
Completed 2500 of 20000 utterances
Completed 3000 of 20000 utterances
Traceback (most recent call last):
File "create_wham_from_scratch.py", line 118, in <module>
create_wham(args.wsj0_root, args.wham_noise_root, args.output_dir)
File "create_wham_from_scratch.py", line 93, in create_wham
s1_samples, s2_samples, noise_samples = append_or_truncate(s1_samples, s2_samples, noise_samples,
File "/home/ttnt/venv/data/wham/wham_scripts/utils.py", line 47, in append_or_truncate
s1_append[speech_start_sample:speech_end_sample] = s1_samples
ValueError: could not broadcast input array from shape (145927) into shape (145759)
As far as I looked into it, the issue occurred while processing the 3065th utterance.
Additionally, in utils.py, s1_samples seemed to be too big to fit into s1_append[speech_start_sample:speech_end_sample].
Their lengths were, respectively,
If you have a solution for this problem, or something I should do, please let me know.
My environment is below:
Ubuntu 20.04 on WSL(Windows 10)
Python 3.8.2
Thank you.
Dear Asteroid contributors @popcornell @JorisCos @sunits @mhu-coder @jensheit @etzinis @mdjuamart @Ariel12321 @dditter @michelolzam (Future contributors : @faroit)
We intend to submit a paper describing Asteroid to Interspeech 2020 (the deadline is May 8th), and as contributors it seems logical that you appear as co-authors on the paper. You'll be asked to proof-read the final version of the paper, but I guess this is normal. Also, you are welcome to help with the paper in any way; just let me know if you'd like to.
Could you please provide me with your full name and affiliation (if there are special characters, I'd appreciate the TeX code for them)? You can do it here or send me an email.
Thanks !
Hi!
First, I would like to thank you for providing such a great tool for speaker separation research. I love it!
However, I have a slight suggestion. When using the recipes from your egs directory, the best_k_models.json file gets written only after the whole training is finished. This means you cannot just stop your training early and go to the evaluation stage, because the JSON file is needed for it. I suggest modifying the code so that the JSON file is dumped after the 1st epoch and then updated after each epoch, so that you can interrupt your training and go straight to the evaluation stage (see the sketch below).
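A minimal sketch of the suggestion (where exactly to hook this into the training loop, and how the losses are serialized, are assumptions on my side):

import json
import os

def dump_best_k(best_k_models, exp_dir):
    # Write/overwrite best_k_models.json so evaluation can run even if
    # training is interrupted.
    serializable = {path: float(loss) for path, loss in best_k_models.items()}
    with open(os.path.join(exp_dir, "best_k_models.json"), "w") as f:
        json.dump(serializable, f, indent=0)

# Called at the end of every epoch instead of only once after training finishes.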
Cheers
Peter
In the WHAM ConvTasNet scripts, you can set some options in conf.yaml and some options in run.sh. The run.sh ones seem to take precedence over the conf.yaml ones.
To me this is confusing, since I do not see the reason for having two places to specify these things. In practice, I never use the run.sh ones, since I want to keep multiple model configurations anyway, so I end up having multiple conf.yaml files.
My suggestion is to remove the options from run.sh and add a new flag to run.sh, say --conf, that is a path to a conf.yaml file. This way it's obvious where the config is coming from, and you can easily switch between multiple configs.
Hello,
I tried the ConvTasNet recipe (WHAM dataset); the current evaluation script (eval.py) crashes when the model has been trained with these parameters (16000 Hz, enh_single task):
data:
mode: min
nondefault_nsrc: null
sample_rate: 16000
task: enh_single
train_dir: data/wav16k/min/tr
valid_dir: data/wav16k/min/cv
filterbank:
kernel_size: 32
n_filters: 512
stride: 16
main_args:
exp_dir: exp/train_convtasnet__16k_enh_single_wham_v5/
gpus: '-1'
help: null
masknet:
bn_chan: 128
hid_chan: 512
mask_act: relu
n_blocks: 8
n_repeats: 3
n_src: 1
skip_chan: 128
optim:
lr: 0.001
optimizer: adam
weight_decay: 0.0
positional arguments: {}
training:
batch_size: 4
early_stop: true
epochs: 200
half_lr: true
num_workers: 8
The evaluation step in run.sh crashes when the script tries to create wav files; an "index out of range" error message occurs:
Traceback (most recent call last):
File "eval.py", line 118, in <module>
main(arg_dic)
File "eval.py", line 78, in main
conf['sample_rate'])
File ".../lib/python3.7/site-packages/soundfile.py", line 313, in write
channels = data.shape[1]
IndexError: tuple index out of range
Lines in the eval.py file that trigger the bug:
#Loop over the sources and estimates
for src_idx, src in enumerate(sources_np):
sf.write(local_save_dir + "s{}.wav".format(src_idx+1), src,
conf['sample_rate'])
for src_idx, est_src in enumerate(est_sources_np):
sf.write(local_save_dir + "s{}_estimate.wav".format(src_idx+1),
est_src, conf['sample_rate'])
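For what it's worth, a guard like the following avoids the crash as far as I can tell; the assumption is that with n_src=1 the arrays get squeezed down to 1-D, so the loop iterates over samples (scalars) instead of sources:

import numpy as np

# Inserted just before the loop in the excerpt above: keep an explicit
# (n_src, n_samples) layout even when there is a single source, so one mono
# wav is written per source instead of one value per sample.
sources_np = np.atleast_2d(sources_np)
est_sources_np = np.atleast_2d(est_sources_np)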
I have some questions about the implementation of Conv-TasNet. If cLN is used, it should be a causal model, so it should not be able to see future frames in the current convolution operation. The Conv-TasNet model you implemented does not seem to handle this. I don't know whether I missed it or whether it is not implemented.
Hi,
When will the code for Wavesplit be released?
Thanks
Stage 5 : Evaluate
0%| | 0/150 [00:01<?, ?it/s]
Traceback (most recent call last):
File "eval_on_synthetic.py", line 194, in
main(arg_dic)
File "eval_on_synthetic.py", line 46, in main
save_dir=save_dir)
File "eval_on_synthetic.py", line 110, in evaluate
metrics_list=COMPUTE_METRICS)
File "eval_on_synthetic.py", line 169, in get_metrics
sample_rate=sample_rate)
File "/exports/stuart/sagar/asteroid/venv_kevin/lib/python3.6/site-packages/pb_bss/evaluation/wrapper.py", line 87, in init
self.channels = self.observation.shape[-2]
IndexError: tuple index out of range
I have trained the model with some modifications, but I am unable to evaluate on the test set.
The log is shown above.
Thank you
Hello,
Thanks for the repo. It makes working on speech enhancement/blind source separation a lot easier.
Would it make sense to blocklist exp directories under the egs directory? This way they don't show up in git status after a recipe has been run. I am not sure whether the exp directory name is a convention followed in every recipe, though.
Adding egs/**/exp to .gitignore should do the trick. What do you think?
Cheers,
Mathieu
Bug when reloading a model when best_k_models.json does not exist
The lines use sort to get the last checkpoint, which behaves incorrectly in the following situation:
>>> all_ckpt=['ckpt_epoch_99.ckpt','ckpt_epoch_100.ckpt','ckpt_epoch_101.ckpt']
>>> all_ckpt.sort()
>>> all_ckpt[-1]
'ckpt_epoch_99.ckpt'
Maybe we can use the following instead (sort numerically by the epoch number):
>>> import os
>>> all_ckpt=[(ckpt,int("".join(filter(str.isdigit,os.path.basename(ckpt))))) for ckpt in all_ckpt if ckpt.find('ckpt')>=0]
>>> all_ckpt.sort(key=lambda x:x[1])
>>> all_ckpt[-1][0]
'ckpt_epoch_101.ckpt'
I have tried the WHAM ConvTasNet recipe and the SI-SDR comes out to be 11.9 dB, whereas the reported number is 16.2 dB, which is a huge gap. So I am wondering what is wrong with my setup and config.
CONFIG:
(I have tried different configs. Pasting the best config here)
data:
mode: min
nondefault_nsrc: null
sample_rate: 8000
task: sep_clean
train_dir: data/wav8k/min/tr
valid_dir: data/wav8k/min/cv
filterbank:
kernel_size: 16
n_filters: 512
stride: 8
main_args:
exp_dir: exp/train_convtasnet/
help: null
masknet:
bn_chan: 128
hid_chan: 512
mask_act: relu
n_blocks: 8
n_repeats: 3
n_src: 2
skip_chan: 128
optim:
lr: 0.001
optimizer: adam
weight_decay: 0.0
positional arguments: {}
training:
batch_size: 24
early_stop: true
epochs: 200
half_lr: true
num_workers: 8
Trainer() argument name changes in pytorch-lightning 0.8:
max_nb_epoch -> max_epoch
default_save_path -> default_root_dir
Either pin pytorch-lightning<0.8 or change the names.
Would someone share training duration per sample for some of the nets?
For example, what would be the training duration for a sample for Conv-TasNet on a single P100 GPU?
DPRNN would also be interesting.
I am looking for a good compromise of training duration and separation/enhancement quality.
Thanks!
bash run.sh --stage 3 --python_path python
Results from the following experiment will be stored in exp/train_convtasnet_sep_clean_8kmin_009b94e6
Stage 3: Training
usage: train.py [-h] [--use_cuda USE_CUDA] [--model_path MODEL_PATH]
[--n_filters N_FILTERS] [--kernel_size KERNEL_SIZE]
[--stride STRIDE] [--n_blocks N_BLOCKS]
[--n_repeats N_REPEATS] [--mask_act MASK_ACT]
[--epochs EPOCHS] [--half_lr HALF_LR]
[--early_stop EARLY_STOP] [--max_norm MAX_NORM]
[--checkpoint CHECKPOINT] [--continue_from CONTINUE_FROM]
[--optimizer OPTIMIZER] [--lr LR]
[--weight_decay WEIGHT_DECAY] [--train_dir TRAIN_DIR]
[--valid_dir VALID_DIR] [--task TASK]
[--nondefault_nsrc NONDEFAULT_NSRC]
[--sample_rate SAMPLE_RATE] [--mode MODE]
[--batch_size BATCH_SIZE] [--num_workers NUM_WORKERS]
train.py: error: argument --continue_from: invalid NoneType value: ''
I looked at your code and found that the default distributed backend is dp. I used ddp as the distributed backend and found that the results were much worse. Have you seen this behaviour?
As done in masknn.norms, setting up retrieval based on strings could be pretty useful for optimizers, activation functions and filterbanks in a first stage.
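A minimal sketch of what this could look like for optimizers (the registry and get_optimizer names are hypothetical, not existing Asteroid code):

import torch

_OPTIMIZERS = {
    "adam": torch.optim.Adam,
    "sgd": torch.optim.SGD,
    "rmsprop": torch.optim.RMSprop,
}

def get_optimizer(identifier):
    # Return an optimizer class from a string, or pass a class straight through.
    if isinstance(identifier, str):
        try:
            return _OPTIMIZERS[identifier.lower()]
        except KeyError:
            raise ValueError(f"Unknown optimizer: {identifier}")
    if isinstance(identifier, type) and issubclass(identifier, torch.optim.Optimizer):
        return identifier
    raise ValueError(f"Cannot interpret optimizer identifier: {identifier!r}")

# Usage with the existing configs:
# opt_cls = get_optimizer(conf["optim"]["optimizer"])
# optimizer = opt_cls(model.parameters(), lr=conf["optim"]["lr"])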
After creating all the mixtures and moving on to stage 1 of run.sh, the following errors show up:
Stage 1: Training
Traceback (most recent call last):
File "train.py", line 127, in <module>
{'data': {'n_src': 3,
'sample_rate': 16000,
'segment': 3,
'task': 'sep_noisy',
'train_dir': 'data/wav8k/min/train-360',
'valid_dir': 'data/wav8k/min/dev'},
'filterbank': {'kernel_size': 16, 'n_filters': 512, 'stride': 8},
'main_args': {'exp_dir': 'exp/train_convtasnet_84932317', 'help': None},
'masknet': {'bn_chan': 128,
'hid_chan': 512,
'mask_act': 'relu',
'n_blocks': 8,
'n_repeats': 3,
'skip_chan': 128},
'optim': {'lr': 0.001, 'optimizer': 'adam', 'weight_decay': 0.0},
'positional arguments': {},
'training': {'batch_size': 24,
'early_stop': True,
'epochs': 200,
'half_lr': True,
'num_workers': 4}}
main(arg_dic)
File "train.py", line 33, in main
segment=conf['data']['segment'])
File "/home/subhanjan/asteroid/asteroid/data/librimix_dataset.py", line 52, in __init__
md_file = [f for f in os.listdir(csv_dir) if 'both' in f][0]
FileNotFoundError: [Errno 2] No such file or directory: 'data/wav8k/min/train-360'
I have been trying to train ConvTasNet with n_src=3 for a while now, and since LibriMix conveniently has this built in, I have been trying to use it, but I keep running into errors with the train_dir and test_dir variables in run.sh. Do they need to be changed? They are being parsed as .csv files, so what should these paths be changed to?
This is what my run.sh looks like
storage_dir=../LibriMix3spk
# After running the recipe a first time, you can run it from stage 3 directly to train new models.
# Path to the python you'll use for the experiment. Defaults to the current python
# You can run ./utils/prepare_python_env.sh to create a suitable python environment, paste the output here.
python_path=python
# Example usage
# ./run.sh --stage 3 --tag my_tag --task sep_noisy --id 0,1
# General
stage=0 # Controls from which stage to start
tag="" # Controls the directory name associated to the experiment
# You can ask for several GPUs using id (passed to CUDA_VISIBLE_DEVICES)
id=0
out_dir=librimix # Controls the directory name associated to the evaluation results inside the experiment directory
# Network config
n_blocks=8
n_repeats=3
mask_act=relu
# Training config
epochs=200
batch_size=24
num_workers=4
half_lr=yes
early_stop=yes
# Optim config
optimizer=adam
lr=0.001
weight_decay=0.
# Data config
train_dir=data/wav8k/min/train-360
valid_dir=data/wav8k/min/dev
test_dir=data/wav8k/min/test
sample_rate=16000
n_src=3
segment=3
task=sep_noisy # one of 'enh_single', 'enh_both', 'sep_clean', 'sep_noisy'
Kindly consider making run.sh more user-friendly and modular, since it's the user's only way of interacting with the program.
Hi Manuel,
Are you planning to log the GPU usage during training?
This can be done by using https://github.com/nicolargo/nvidia-ml-py3.
This would help to see resource usage without needing to run "watch nvidia-smi" or "nvidia-smi dmon" in a secondary screen.
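A small sketch of what that logging could look like with nvidia-ml-py3 / pynvml (how often to log, and where to send it, are left open):

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

def gpu_stats():
    # Utilization in percent, memory in MiB.
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    return {
        "gpu_util_percent": util.gpu,
        "mem_used_mib": mem.used / 2**20,
        "mem_total_mib": mem.total / 2**20,
    }

# e.g. logged once per training step or epoch:
print(gpu_stats())
pynvml.nvmlShutdown()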
Thank you
Create a base class for callbacks, e.g. Callback, and rewrite learning rate halving and early stopping using it. Rewrite the Solver accordingly.
This would give users more freedom to define their own callbacks.
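A minimal sketch of what this could look like (hook names and the stop_training flag are assumptions, not a final design):

class Callback:
    # Base class: Solver would call these hooks at the appropriate times.
    def on_epoch_begin(self, epoch, solver):
        pass

    def on_epoch_end(self, epoch, metrics, solver):
        pass

class EarlyStopping(Callback):
    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.wait = 0

    def on_epoch_end(self, epoch, metrics, solver):
        if metrics["val_loss"] < self.best:
            self.best, self.wait = metrics["val_loss"], 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                solver.stop_training = True  # Solver checks this flag each epoch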