
freewym / espresso


Espresso: A Fast End-to-End Neural Speech Recognition Toolkit

License: Other

Python 98.20% C++ 0.45% Lua 0.09% Shell 0.09% Makefile 0.06% Cuda 0.82% Cython 0.28%
asr end-to-end fairseq kaldi python pytorch speech-recognition

espresso's People

Contributors

alexeib, cndn, davidecaroselli, dianaml0, edunov, erip, freewym, huihuifan, jhcross, jingfeidu, joshim5, kahne, kartikayk, lematt1991, liezl200, liuchen9494, louismartin, maigoakisame, mortimerp9, multipath, myleott, pipibjc, skritika, sravyapopuri388, sshleifer, tangyuq, theweiho, xu-song, xutaima, yuntang


espresso's Issues

AttributeError: 'SpeechRecognitionEspressoTask' object has no attribute 'feat_dim'

๐Ÿ› Bug

The SpeechRecognitionEspressoTask has no attribute 'feat_dim'. Issue seen when trying to evaluate the model.

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

Try to import a pretrained model using a checkpoint file.
See error

model = models.build_model(args, self)
File "git/espresso/fairseq/models/init.py", line 48, in build_model
return ARCH_MODEL_REGISTRY[args.arch].build_model(args, task)
File "git/espresso/espresso/models/speech_transformer.py", line 128, in build_model
logger.info("input feature dimension: {}, channels: {}".format(task.feat_dim, task.feat_in_channels))
AttributeError: 'SpeechRecognitionEspressoTask' object has no attribute 'feat_dim'

Code sample
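
A minimal sketch of the failing call path (checkpoint path and setup are hypothetical; task.feat_dim appears to be populated only once a dataset has been loaded on the task, so building the model straight from a checkpoint's args hits the AttributeError above):

    from fairseq import checkpoint_utils, models, tasks

    # Load a trained checkpoint on CPU (hypothetical path)
    state = checkpoint_utils.load_checkpoint_to_cpu("exp/transformer/checkpoint_best.pt")
    args = state["args"]

    task = tasks.setup_task(args)
    # task.feat_dim is not set yet at this point, so build_model() fails when
    # speech_transformer.py logs "input feature dimension: ...".
    model = models.build_model(args, task)
    # Loading a dataset first (e.g. task.load_dataset(args.valid_subset)) may
    # populate feat_dim, but that is an untested guess, not a confirmed fix.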

Expected behavior

No error

Environment

fairseq Version (e.g., 1.0 or master): 0.9.0
PyTorch Version (e.g., 1.0): 1.6.0
OS (e.g., Linux): Linux
How you installed fairseq (pip, source): pip

Verify WER by scoring with Kaldi

Hi authors,
I'm using the Librispeech run.sh recipe. I trained the acoustic model (speech_conv_lstm_librispeech) using 4 GPUs (1080 Ti), but I'm facing this error while doing Kaldi scoring.
local/score.sh data/test_clean exp/lstm/decode_test_clean_shallow_fusion
run.pl: job failed, log is in exp/lstm/decode_test_clean_shallow_fusion/scoring_kaldi/log/score.log
My second question: is there any documentation for using my pre-trained model to decode an audio wav? I would like to compare the decoding speed between ESPnet and Espresso (https://arxiv.org/abs/1909.08723).

SpecAug slows down training time

Hey there,

I am training a Librispeech transformer on 4 P100 GPUs, which works fine so far at ~1.1h/epoch. As I was now experimenting with SpecAug, I noticed that the training time roughly doubles to ~2.38h/epoch.

Is this expected behaviour?

I suspected that SpecAug might be part of dataloading, so I tried to increase num-workers during training to something > 0, but that gave me errors which seem to be caused by an insufficient shared memory size (which I unfortunately cannot change due to missing root privileges).

So is there any other way to speed up SpecAug training?

Thanks, Timo
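
For context, a rough sketch of the per-utterance work a SpecAugment-style transform does (this is not Espresso's implementation, just an illustration of why such masking typically lives in the data-loading path and adds CPU time per batch):

    import numpy as np

    def spec_augment(feat, num_freq_masks=2, F=27, num_time_masks=2, T=100, p=1.0):
        """Mask random frequency bands and time spans of a (frames x bins)
        feature matrix, roughly following the SpecAugment recipe parameters."""
        feat = feat.copy()
        n_frames, n_bins = feat.shape
        for _ in range(num_freq_masks):
            f = np.random.randint(0, F + 1)
            f0 = np.random.randint(0, max(1, n_bins - f))
            feat[:, f0:f0 + f] = 0.0
        max_t = min(T, int(p * n_frames))
        for _ in range(num_time_masks):
            t = np.random.randint(0, max_t + 1) if max_t > 0 else 0
            t0 = np.random.randint(0, max(1, n_frames - t))
            feat[t0:t0 + t, :] = 0.0
        return feat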

LM shallow fusion for the Japanese Language.

What is your question?

I'm looking for a fast speech recognition toolkit available for Japanese.
I tried to build a CSJ (Corpus of Spontaneous Japanese) recipe for Espresso, referring to the librispeech recipe.
The CSJ is a Japanese corpus also used in Kaldi and Espnet.
https://github.com/kaldi-asr/kaldi/tree/master/egs/csj/

However, LM shallow fusion does not seem to be effective in my recipe and I can't obtain sufficient results.

What have you tried?

I obtained a model with the character error rates shown below.

eval1 eval2 eval3
espresso(my recipe) 11.89% 8.30% 8.94%
kaldi(https://www.merl.com/publications/docs/TR2018-036.pdf) 9.0% 7.2% 9.6%

I tried turning the language model off (lm_shallow_fusion=false).
Unexpectedly, the character error rates improved.

eval1 eval2 eval3
espresso(lm_shallow_fusion=false) 11.53% 7.82% 8.56%

I don't know why, but the language model I built does not seem to be effective for speech recognition.
Training specifications and logs are shown below.

LM training (This is the same as the librispeech recipe)

if [ ${stage} -le 5 ]; then
  echo "Stage 5: subword LM Training"
  valid_subset=valid
  mkdir -p $lmdir/log
  log_file=$lmdir/log/train.log
  [ -f $lmdir/checkpoint_last.pt ] && log_file="-a $log_file"
  CUDA_VISIBLE_DEVICES=$free_gpu python3 ../../fairseq_cli/train.py $lmdatadir --seed 1 --user-dir espresso \
    --task language_modeling_for_asr --dict $lmdict \
    --log-interval $((16000/ngpus)) --log-format simple \
    --num-workers 0 --max-tokens 32000 --max-sentences 1024 --curriculum 1 \
    --valid-subset $valid_subset --max-sentences-valid 1536 \
    --distributed-world-size $ngpus --distributed-port $(if [ $ngpus -gt 1 ]; then echo 100; else echo -1; fi) \
    --max-epoch 30 --optimizer adam --lr 0.001 --clip-norm 1.0 \
    --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.5 \
    --save-dir $lmdir --restore-file checkpoint_last.pt --save-interval-updates $((16000/ngpus)) \
    --keep-interval-updates 3 --keep-last-epochs 5 --validate-interval 1 \
    --arch lstm_lm_librispeech --criterion cross_entropy --sample-break-mode eos 2>&1 | tee $log_file
fi

LM training log

2020-04-05 07:25:11 | INFO | fairseq.data.data_utils | loaded 1209204 examples from: data/lm_text/train
2020-04-05 07:25:12 | INFO | fairseq.trainer | NOTE: your device may support faster training with --fp16
2020-04-05 07:32:04 | INFO | train | epoch 001 | loss 8.388 | ppl 334.94 | wps 52376.4 | ups 3.06 | wpb 17090.1 | bsz 969.7 | num_updates 1247 | lr 0.001 | gnorm 0.582 | clip 14.2 | train_wall 328 | wall 413
2020-04-05 07:32:04 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 8.256 | ppl 305.75 | wps 107397 | wpb 18047.8 | bsz 800 | num_updates 1247
2020-04-05 07:32:05 | INFO | fairseq.checkpoint_utils | saved checkpoint exp/lm_lstm/checkpoint1.pt (epoch 1 @ 1247 updates, score 8.256) (writing took 1.0231214840023313 seconds)
2020-04-05 07:38:55 | INFO | train | epoch 002 | loss 7.194 | ppl 146.42 | wps 51840.6 | ups 3.03 | wpb 17090.1 | bsz 969.7 | num_updates 2494 | lr 0.001 | gnorm 0.624 | clip 12.3 | train_wall 324 | wall 824
2020-04-05 07:38:56 | INFO | valid | epoch 002 | valid on 'valid' subset | loss 6.767 | ppl 108.94 | wps 106708 | wpb 18047.8 | bsz 800 | num_updates 2494 | best_loss 6.767
...

2020-04-05 10:44:54 | INFO | train | epoch 029 | loss 4.56 | ppl 23.59 | wps 51650.9 | ups 3.02 | wpb 17090.1 | bsz 969.7 | num_updates 36163 | lr 1.52588e-08 | gnorm 0.482 | clip 6.4 | train_wall 325 | wall 11983
2020-04-05 10:44:55 | INFO | valid | epoch 029 | valid on 'valid' subset | loss 6.794 | ppl 110.97 | wps 106502 | wpb 18047.8 | bsz 800 | num_updates 36163 | best_loss 6.387
2020-04-05 10:44:58 | INFO | fairseq.checkpoint_utils | saved checkpoint exp/lm_lstm/checkpoint29.pt (epoch 29 @ 36163 updates, score 6.794) (writing took 2.9264725770044606 seconds)
2020-04-05 10:51:46 | INFO | train | epoch 030 | loss 4.56 | ppl 23.59 | wps 51649.6 | ups 3.02 | wpb 17090.1 | bsz 969.7 | num_updates 37410 | lr 1.52588e-08 | gnorm 0.482 | clip 6.4 | train_wall 325 | wall 12396
2020-04-05 10:51:47 | INFO | valid | epoch 030 | valid on 'valid' subset | loss 6.794 | ppl 110.97 | wps 106266 | wpb 18047.8 | bsz 800 | num_updates 37410 | best_loss 6.387

Model Training (This is the same as the librispeech recipe)

if [ ${stage} -le 8 ]; then
  echo "Stage 8: Model Training"
  valid_subset=valid
  mkdir -p $dir/log
  log_file=$dir/log/train.log
  [ -f $dir/checkpoint_last.pt ] && log_file="-a $log_file"
  opts=""
  if $apply_specaug; then
    opts="$opts --max-epoch 95 --lr-scheduler tri_stage --warmup-steps $((2000/ngpus)) --hold-steps $((600000/ngpus)) --decay-steps $((1040000/ngpus))"
    opts="$opts --encoder-rnn-layers 5"
    specaug_config="{'W': 80, 'F': 27, 'T': 100, 'num_freq_masks': 2, 'num_time_masks': 2, 'p': 1.0}"
  else
    opts="$opts --max-epoch 30 --lr-scheduler reduce_lr_on_plateau_v2 --lr-shrink 0.5 --start-reduce-lr-epoch 10"
  fi
  CUDA_VISIBLE_DEVICES=$free_gpu speech_train.py data --task speech_recognition_espresso --seed 1 --user-dir espresso \
    --log-interval $((8000/ngpus)) --log-format simple --print-training-sample-interval $((4000/ngpus)) \
    --num-workers 0 --max-tokens 26000 --max-sentences 24 --curriculum 1 \
    --valid-subset $valid_subset --max-sentences-valid 48 --ddp-backend no_c10d \
    --distributed-world-size $ngpus --distributed-port $(if [ $ngpus -gt 1 ]; then echo 100; else echo -1; fi) \
    --optimizer adam --lr 0.001 --weight-decay 0.0 --clip-norm 2.0 \
    --save-dir $dir --restore-file checkpoint_last.pt --save-interval-updates $((6000/ngpus)) \
    --keep-interval-updates 3 --keep-last-epochs 5 --validate-interval 1 --best-checkpoint-metric wer \
    --arch speech_conv_lstm_librispeech --criterion label_smoothed_cross_entropy_v2 \
    --label-smoothing 0.1 --smoothing-type uniform \
    --scheduled-sampling-probs 1.0 --start-scheduled-sampling-epoch 1 \
    --dict $dict --bpe sentencepiece --sentencepiece-vocab ${sentencepiece_model}.model \
    --max-source-positions 9999 --max-target-positions 999 \
    $opts --specaugment-config "$specaug_config" 2>&1 | tee $log_file
fi

if [ ${stage} -le 9 ]; then
  echo "Stage 9: Decoding"
  opts=""
  path=$dir/$checkpoint
  decode_affix=
  if $lm_shallow_fusion; then
    path="$path:$lmdir/$lm_checkpoint"
    opts="$opts --lm-weight 0.47 --eos-factor 1.5"
    if $apply_specaug; then
      # overwrite the existing opts
      opts="$opts --lm-weight 0.4"
    fi
    decode_affix=shallow_fusion
  fi
  for dataset in $test_set; do
    decode_dir=$dir/decode_$dataset${decode_affix:+_${decode_affix}}
    CUDA_VISIBLE_DEVICES=$(echo $free_gpu | sed 's/,/ /g' | awk '{print $1}') speech_recognize.py data \
      --task speech_recognition_espresso --user-dir espresso --max-tokens 15000 --max-sentences 24 \
      --num-shards 1 --shard-id 0 --dict $dict --bpe sentencepiece --sentencepiece-vocab ${sentencepiece_model}.model \
      --gen-subset $dataset --max-source-positions 9999 --max-target-positions 999 \
      --path $path --beam 60 --max-len-a 0.08 --max-len-b 0 --lenpen 1.0 \
      --results-path $decode_dir $opts

    echo "log saved in ${decode_dir}/decode.log"
    if $kaldi_scoring; then
      echo "verify WER by scoring with Kaldi..."
      local/score_e2e.sh data/$dataset $decode_dir
      cat ${decode_dir}/scoring_kaldi/wer
    fi
  done
fi

Model Training log

2020-04-05 15:29:52 | INFO | espresso.criterions.label_smoothed_cross_entropy_v2 | sample REF: え
2020-04-05 15:29:52 | INFO | espresso.criterions.label_smoothed_cross_entropy_v2 | sample PRD: 出る燿
2020-04-05 15:50:20 | INFO | espresso.criterions.label_smoothed_cross_entropy_v2 | sample REF: 喫茶店行ったり
2020-04-05 15:50:20 | INFO | espresso.criterions.label_smoothed_cross_entropy_v2 | sample PRD: 三さに
2020-04-05 16:03:00 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 12.629 | nll_loss 12.256 | wer 96.275 | cer 94.7993 | ppl 4892.8 | wps 1669.6 | wpb 699.5 | bsz 31 | num_updates 6000
2020-04-05 16:03:18 | INFO | fairseq.checkpoint_utils | saved checkpoint exp/lstm/checkpoint_1_6000.pt (epoch 1 @ 6000 updates, score 94.7993) (writing took 18.49637553100183 seconds)
2020-04-05 16:15:50 | INFO | train_inner | epoch 001:   8000 / 52726 loss=4.376, nll_loss=3.062, ppl=8.35, wps=293.8, ups=2.9, wpb=101.3, bsz=24, num_updates=8000, lr=0.001, gnorm=1.54, clip=7.7, train_wall=654, wall=3035
2020-04-05 16:15:50 | INFO | espresso.criterions.label_smoothed_cross_entropy_v2 | sample REF: ああすこで犬が
2020-04-05 16:15:50 | INFO | espresso.criterions.label_smoothed_cross_entropy_v2 | sample PRD: ああすこののが
2020-04-05 16:46:41 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 9.753 | nll_loss 9.16 | wer 95.05 | cer 89.5882 | ppl 571.93 | wps 1507.5 | wpb 699.5 | bsz 31 | num_updates 12000 | best_cer 89.5882

....

2020-04-24 08:41:54 | INFO | espresso.criterions.label_smoothed_cross_entropy_v2 | sample REF: またえー予稿集には間に合わなかったのですがえー阻害音と共鳴音の違いを表わしている
2020-04-24 08:41:54 | INFO | espresso.criterions.label_smoothed_cross_entropy_v2 | sample PRD: またえー予稿集には間に合わなかったのですがえー阻害音と共鳴音の違いを表わしている
2020-04-24 09:11:43 | INFO | train | epoch 030 | loss 1.776 | nll_loss 0.192 | ppl 1.14 | wps 381.1 | ups 0.94 | wpb 404.2 | bsz 22.9 | num_updates 1.58178e+06 | lr 1e-05 | gnorm 965926 | clip 0.1 | train_wall 19236 | wall 1.61919e+06
2020-04-24 09:13:05 | INFO | valid | epoch 030 | valid on 'valid' subset | loss 2.08 | nll_loss 0.544 | wer 53.725 | cer 8.7176 | ppl 1.46 | wps 1094.4 | wpb 699.5 | bsz 31 | num_updates 1.58178e+06 | best_cer 8.6631

Could you give me any hint?
Are there things to be careful of when applying LM shallow fusion to a language other than English?
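
For reference, shallow fusion here is just a per-step log-linear combination of ASR and external-LM scores; a minimal sketch (not Espresso's code) of what --lm-weight controls:

    import torch

    def shallow_fusion_step(asr_log_probs, lm_log_probs, lm_weight=0.47):
        """Combine per-token ASR and external-LM log-probabilities at one
        decoding step; 0.47 is the --lm-weight used in stage 9 above. If the
        subword LM is poorly matched to the acoustic model's output units or
        has high perplexity, this extra term can hurt rather than help."""
        return asr_log_probs + lm_weight * lm_log_probs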

What's your environment?

fairseq Version (e.g., 1.0 or master): 0.9.0
PyTorch Version (e.g., 1.0): 1.4.0
OS (e.g., Linux): Ubuntu 18.04.4 LTS
How you installed fairseq (pip, source): pip
Python version: 3.7
CUDA/cuDNN version: 10.0.130 / libcudnn.so.7.5.1
GPU models and configuration: Tesla K80

thanks.

issues about speech_fconv.py

What is your question?

I read your code about applying fairseq to ASR. Regarding the decoder part, I noticed that position embedding is not enabled in the default parameters. Since I don't have the librispeech dataset, I used a Chinese dataset I have to run an experiment. I found that when the position embedding is not added, the loss can be reduced to about 0.6, but decoding misbehaves: due to the lack of position information, the decoded sentence is too short or comes out empty. But when I set decoder_positional_embed to True, the loss starts to oscillate around 3. I want to ask if this phenomenon occurs because I haven't trained for enough epochs. (In my experience, the loss generally needs to drop below 1 before the decoded result can be partially correct.)

Code

(screenshot of the relevant code omitted)

Besides, I saw in the fairseq paper that the implementation of position embedding is different from the traditional sine and cosine formula. I want to ask if I can adjust the weight when adding x and pos_emb? Thanks a lot!
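
If it helps, one way to experiment with down-weighting the positional term is to scale it before the addition; a rough sketch (illustrative only, not fairseq's fconv code):

    import torch
    import torch.nn as nn

    class ScaledPositionalEmbedding(nn.Module):
        """Add learned positional embeddings to the input with a (possibly
        learnable) scale alpha, so the positional signal can be weighted."""
        def __init__(self, num_positions, embed_dim, alpha=0.1, learnable=True):
            super().__init__()
            self.pos_emb = nn.Embedding(num_positions, embed_dim)
            self.alpha = nn.Parameter(torch.tensor(alpha)) if learnable else alpha

        def forward(self, x):
            # x: (batch, time, embed_dim)
            positions = torch.arange(x.size(1), device=x.device)
            return x + self.alpha * self.pos_emb(positions)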

Getting OOM error at the middle of the training in asr_swbd recipe on lstm encoder decoder model

error message

2021-03-19 12:09:30 | WARNING | fairseq.trainer | attempting to recover from OOM in forward/backward pass
2021-03-19 12:09:30 | WARNING | fairseq.trainer | OOM: Ran out of memory with exception: CUDA out of memory. Tried to allocate 600.00 MiB (GPU 0; 10.92 GiB total capacity; 8.55 GiB already allocated; 385.56 MiB free; 9.09 GiB reserved in total by PyTorch)
2021-03-19 12:09:30 | WARNING | fairseq.trainer | |===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 51           |        cudaMalloc retries: 66        |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |    6951 MB |    9130 MB |   16478 GB |   16471 GB |
|       from large pool |    6938 MB |    9118 MB |   16447 GB |   16441 GB |
|       from small pool |      12 MB |      16 MB |      30 GB |      30 GB |
|---------------------------------------------------------------------------|
| Active memory         |    6951 MB |    9130 MB |   16478 GB |   16471 GB |
|       from large pool |    6938 MB |    9118 MB |   16447 GB |   16441 GB |
|       from small pool |      12 MB |      16 MB |      30 GB |      30 GB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |    9308 MB |    9526 MB |  396082 MB |  386774 MB |
|       from large pool |    9294 MB |    9508 MB |  395864 MB |  386570 MB |
|       from small pool |      14 MB |      18 MB |     218 MB |     204 MB |
|---------------------------------------------------------------------------|
| Non-releasable memory |  569634 KB |     770 MB |    1982 GB |    1982 GB |
|       from large pool |  568358 KB |     766 MB |    1946 GB |    1946 GB |
|       from small pool |    1276 KB |      12 MB |      36 GB |      36 GB |
|---------------------------------------------------------------------------|
| Allocations           |     441    |     688    |    1162 K  |    1162 K  |
|       from large pool |     140    |     150    |     143 K  |     143 K  |
|       from small pool |     301    |     548    |    1019 K  |    1019 K  |
|---------------------------------------------------------------------------|
| Active allocs         |     441    |     688    |    1162 K  |    1162 K  |
|       from large pool |     140    |     150    |     143 K  |     143 K  |
|       from small pool |     301    |     548    |    1019 K  |    1019 K  |
|---------------------------------------------------------------------------|
| GPU reserved segments |      99    |     102    |    1279    |    1180    |
|       from large pool |      92    |      93    |    1170    |    1078    |
|       from small pool |       7    |       9    |     109    |     102    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |     103    |     120    |  547671    |  547568    |
|       from large pool |      78    |      80    |   60015    |   59937    |
|       from small pool |      25    |      44    |  487656    |  487631    |
|===========================================================================|

nvidia Driver Version: 460.32.03

environment

blas                      1.0                         mkl  
mkl                       2020.1                      217  
mkl-service               2.3.0            py37he904b0f_0  
mkl_fft                   1.1.0            py37h23d657b_0  
mkl_random                1.1.1            py37h0573a6f_0  
torch                     1.7.1+cu101              pypi_0    pypi
torchaudio                0.7.2                    pypi_0    pypi
torchvision               0.8.2+cu101              pypi_0    pypi

I reduced the batch_size to 1 and --empty-cache-freq to 1; still, OOM happens in the middle of training.

TypeError: get_asr_dataset_from_json() got an unexpected keyword argument 'combined'

๐Ÿ› Bug

The following error was encountered while loading a checkpoint file.
TypeError: get_asr_dataset_from_json() got an unexpected keyword argument 'combined'

To Reproduce

Steps to reproduce the behavior (always include the command you ran):
See Error:

Traceback (most recent call last):
File "tests/test_export_asr.py", line 26, in test_jit_and_export_lstm
'dict':'units.txt'})
File "src/fairseq/fairseq/checkpoint_utils.py", line 273, in load_model_ensemble
state,
File "/src/fairseq/fairseq/checkpoint_utils.py", line 319, in load_model_ensemble_and_task
task = tasks.setup_task(cfg.task)
File "src/fairseq/fairseq/tasks/init.py", line 44, in setup_task
return task.setup_task(cfg, **kwargs)
File "src/fairseq/espresso/tasks/speech_recognition.py", line 271, in setup_task
src_dataset = get_asr_dataset_from_json(data_path, cfg.gen_subset, tgt_dict, combined=False).src
TypeError: get_asr_dataset_from_json() got an unexpected keyword argument 'combined'
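
A guess at what changed (unverified against the current source): the speech_recognition_hybrid traceback further down this page calls the same helper with combine=False, so the keyword may simply have been renamed from combined to combine:

    # unverified guess, mirroring speech_recognition_hybrid.py elsewhere on this page
    src_dataset = get_asr_dataset_from_json(
        data_path, cfg.gen_subset, tgt_dict, combine=False
    ).src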

Expected behavior

No error when loading the checkpoint file.

Environment

  • fairseq Version (e.g., 1.0 or master): 1.10

Using wav2vec with Espresso

Hi

Wav2vec is included under examples. Can it be used with Espresso and are there any examples where features from hdf5 files are used in Espresso?

All the best

Instructions for Training from Scratch?

Hi,
Thanks for releasing this code.
I am trying to do ASR for Gujarati (an Indian language) and have custom labelled data. It would be great if you could release a README file for:

  1. how to train models from scratch
  2. how to perform inference from scratch

Thanks,
Kalpit

I tried to run a librispeech recipe but a word error rate remains very large.

What is your question?

I tried to run a librispeech recipe (examples/asr_librispeech/run.sh), but the word error rate remains very high (around 100%) in "Stage 8: Model Training" despite 30 epochs.
I think one possible cause is a difference in the execution environment.

What's your environment?

My environment is as follows.

  • fairseq Version (e.g., 1.0 or master): 0.9.0
  • PyTorch Version (e.g., 1.0): 1.4.0
  • OS (e.g., Linux): Ubuntu 18.04.3 LTS
  • How you installed fairseq (pip, source): pip
  • Python version: 3.6.5
  • CUDA/cuDNN version: 10.0.130 / libcudnn.so.7.3.0
  • GPU models and configuration: Tesla V100-SXM2-16GB
$ python collect_env.py 
Collecting environment information...
PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.0

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration: GPU 0: Tesla V100-SXM2-16GB
Nvidia driver version: 440.33.01
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.3.0

Versions of relevant libraries:
[pip] numpy==1.18.1
[pip] torch==1.4.0
[conda] blas                      1.0                         mkl  
[conda] mkl                       2020.0                      166  
[conda] mkl-service               2.3.0            py36he904b0f_0  
[conda] mkl_fft                   1.0.15           py36ha843d7b_0  
[conda] mkl_random                1.1.0            py36hd6b4f25_0  
[conda] pytorch                   1.4.0           py3.6_cuda10.0.130_cudnn7.6.3_0    pytorch
[conda] torch                     1.4.0                    pypi_0    pypi

The commit hash of espresso is f933e8c.
An output log is as follows, but I can't find any problem in it.

2020-03-08 22:27:56 | INFO | espresso.criterions.label_smoothed_cross_entropy_v2 | sample REF: I SEE I MUST GET SETTLED QUICKLY SO THAT I SHALL HAVE THE POWER TO RESTRAIN YOU THEY ROLLICKED FORTH THEN AND BOUGHT SEVERAL THINGS A BIG STEAMER RUG FOR THE CAR A PAIR OF LONG GRAY MOCHA GLOVES TO MATCH THE HAND BAG A SILK UMBRELLA
2020-03-08 22:27:56 | INFO | espresso.criterions.label_smoothed_cross_entropy_v2 | sample PRD: AND HAVE THAT' BE A IN AND I I CAN BE TO PLEASURE OF GET THE I ARE UPED AND AND THE THERE THE OF ANDPIECEGERER ANDG AND THE SHIPS FEW OF SHOES WHITE HAIRSACKS WHICH MAN OF PAIR HANDKERCHIEF
2020-03-08 23:13:04 | INFO | valid | epoch 025 | valid on 'valid' subset | loss 6.804 | nll_loss 5.856 | wer 109.669 | cer 100.424 | ppl 57.94 | wps 822.3 | wpb 715.6 | bsz 29.1 | num_updates 414000 | best_wer 96.7413
2020-03-08 23:13:27 | INFO | fairseq.checkpoint_utils | saved checkpoint exp/lstm/checkpoint_25_414000.pt (epoch 25 @ 414000 updates, score 109.6687) (writing took 23.222585418028757 seconds)
2020-03-08 23:13:28 | INFO | espresso.criterions.label_smoothed_cross_entropy_v2 | sample REF: CONTINUED DUNCAN SPEAKING SLOWLY AND USING THE SIMPLEST FRENCH OF WHICH HE WAS THE MASTER TO BELIEVE THAT NONE OF THIS WISE AND BRAVE NATION UNDERSTAND THE LANGUAGE THAT THE GRAND MONARQUE USES WHEN HE TALKS TO HIS CHILDREN
2020-03-08 23:13:28 | INFO | espresso.criterions.label_smoothed_cross_entropy_v2 | sample PRD: AND THECAN WITH IN AND IING THE WORDSST WAY LANGUAGE THE HE WAS THE MOST OF BE THAT HE OF THE WAS AND IN MAN COULDS LANGUAGE OF HE WORLDESTITTSS HE ISS TO THE PEOPLE
2020-03-08 23:23:27 | INFO | train | epoch 025:  15999 / 16601 loss=6.624, nll_loss=5.649, ppl=50.18, wps=537.4, ups=0.75, wpb=716.6, bsz=16.9, num_updates=414424, lr=1e-05, gnorm=0.399, clip=0, oom=0, train_wall=13155, wall=554591
2020-03-08 23:34:58 | INFO | train | epoch 025 | loss 6.624 | nll_loss 5.649 | ppl 50.18 | wps 539.9 | ups 0.75 | wpb 716.2 | bsz 16.9 | num_updates 415025 | lr 1e-05 | gnorm 0.4 | clip 0 | oom 0 | train_wall 13641 | wall 555281
2020-03-08 23:37:49 | INFO | valid | epoch 025 | valid on 'valid' subset | loss 6.803 | nll_loss 5.856 | wer 108.198 | cer 99.5523 | ppl 57.92 | wps 822.3 | wpb 715.6 | bsz 29.1 | num_updates 415025 | best_wer 96.7413
2020-03-08 23:38:12 | INFO | fairseq.checkpoint_utils | saved checkpoint exp/lstm/checkpoint25.pt (epoch 25 @ 415025 updates, score 108.1984) (writing took 23.870150407077745 seconds)

Because librispeech is a very large dataset, I struggle with debugging.
Could you give me any hint?

I think that if espresso had a recipe for a small dataset, like the an4 recipe in espnet, a trial run would be easier.
Do you have any plan to implement a recipe for a small dataset?

thanks.

SIGSEGV while running train.py on a multi GPU setup

I have set up an Ubuntu 18.04 environment with 4 CPUs and 4 GPUs to run the librispeech dataset training.

The prepare step went through fine.

But when I launch the training using:
python train.py ./librispeech-workdir/preprocessed-data/ --save-dir ./librispeech-workdir/train-output/ --max-epoch 80 --task speech_recognition_e --arch vggtransformer_2 --optimizer adadelta --lr 1.0 --adadelta-eps 1e-8 --adadelta-rho 0.95 --clip-norm 10.0 --max-tokens 5000 --log-format json --log-interval 1 --criterion cross_entropy_acc --user-dir examples/speech_recognition/

I get the following error right at the outset:

| model vggtransformer_2, criterion CrossEntropyWithAccCriterion
| num. model params: 315190057 (num. trained: 315190057)
| training on 4 GPUs
| max tokens per GPU = 5000 and max sentences per GPU = None
| no existing checkpoint found ./librispeech-workdir/train-output/checkpoint_last.pt
| loading train data for epoch 0
Traceback (most recent call last):
File "train.py", line 343, in
cli_main()
File "train.py", line 335, in cli_main
nprocs=args.distributed_world_size,
File "/home/chandraka/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
while not spawn_context.join():
File "/home/chandraka/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 107, in join
(error_index, name)
Exception: process 0 terminated with signal SIGSEGV

Unable to proceed in the absence of any clues as to what might be causing it, etc.

Please help

It starts out with


| distributed init (rank 3): tcp://localhost:15160
| distributed init (rank 0): tcp://localhost:15160
| distributed init (rank 2): tcp://localhost:15160
| distributed init (rank 1): tcp://localhost:15160
| initialized host espresso-2 as rank 2
| initialized host espresso-2 as rank 1
| initialized host espresso-2 as rank 3
| initialized host espresso-2 as rank 0
Namespace(adadelta_eps=1e-08, adadelta_rho=0.95, anneal_eps=False, arch='vggtransformer_2', best_checkpoint_metric='loss', bpe=None,
bucket_cap_mb=25, clip_norm=10.0, conv_dec_config='((256, 3, True),) * 4', cpu=False, criterion='cross_entropy_acc', curriculum=0,
data='./librispeech-workdir/preprocessed-data/', dataset_impl=None, ddp_backend='c10d', device_id=0, disable_validation=False,
distributed_backend='nccl', distributed_init_method='tcp://localhost:15160', distributed_no_spawn=False, distributed_port=-1,
distributed_rank=0, distributed_world_size=4, empty_cache_freq=0, enc_output_dim=1024, fast_stat_sync=False, find_unused_parameters=False,
fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0,
fp16_scale_window=None, input_feat_per_channel=80, keep_interval_updates=-1, keep_last_epochs=-1, log_format='json', log_interval=1,
lr=[1.0], lr_scheduler='fixed', lr_shrink=0.1, max_epoch=80, max_sentences=None, max_sentences_valid=None, max_tokens=5000,
max_tokens_valid=5000, max_update=0, maximize_best_checkpoint_metric=False, memory_efficient_fp16=False, min_loss_scale=0.0001,
min_lr=-1, no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False,
num_workers=1, optimizer='adadelta', optimizer_overrides='{}', required_batch_size_multiple=8, reset_dataloader=False,
reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt',
save_dir='./librispeech-workdir/train-output/', save_interval=1, save_interval_updates=0, seed=1, sentence_avg=False, silence_token='▁',
skip_invalid_size_inputs_valid_test=False, task='speech_recognition_e', tbmf_wrapper=False, tensorboard_logdir='', tgt_embed_dim=512,
threshold_loss_scale=None, tokenizer=None, train_subset='train', transformer_dec_config='((1024, 16, 4096, True, 0.15, 0.15, 0.15),) * 6',
transformer_enc_config='((1024, 16, 4096, True, 0.15, 0.15, 0.15),) * 16', update_freq=[1], use_bmuf=False,
user_dir='examples/speech_recognition/', valid_subset='valid', validate_interval=1,
vggblock_enc_config='[(64, 3, 2, 2, True), (128, 3, 2, 2, True)]', warmup_updates=0, weight_decay=0.0)
| dictionary: 5001 types


(I have had to rename the speech_recognition task to speech_recognition_e as there is a similarly named task in fairseq directory as well)

GPU Distributed Data Parallel Error

I see that in the run.sh script for asr_swbd, in stage 6 (model training), the distributed_world_size parameter is set to ngpus (=1 in my case). This causes the following ValueError during GPU training:

File "M/espresso/fairseq/distributed_utils.py", line 73, in distributed_init                                                                                                                         raise ValueError('Cannot initialize distributed with distributed_world_size=1')                 

Is there a fix to this issue?
I am using the latest Espresso from the repo, Ubuntu 18.04, Slurm Scheduler, NVIDIA 1080 Ti if that helps.

I have set (exported) CUDA_VISIBLE_DEVICES, and the --free-gpu option.
If more details are required, I would be happy to provide them.

SWBD Recipe Error

Hi, I am trying to run the SWBD recipe on my local machine. I am getting errors at Stage 2 of the run script, building the dictionary and text tokenization. The error seems to be coming from the "tokenizing text for train/valid/test sets..." stage running spm_encode.py.

Code

This is the full shell output:

sentencepiece_trainer.cc(116) LOG(INFO) Running command: --bos_id=-1 --pad_id=0 --eos_id=1 --unk_id=2 --input=data/lang/input --vocab_size=1003 --character_coverage=1.0 --model_type=unigram --model_prefix=data/lang/train_nodup_unigram1000 --input_sentence_size=10000000 --user_defined_symbols=[laughter],[noise],[vocalized-noise]
sentencepiece_trainer.cc(49) LOG(INFO) Starts training with :
TrainerSpec {
  input: data/lang/input
  input_format:
  model_prefix: data/lang/train_nodup_unigram1000
  model_type: UNIGRAM
  vocab_size: 1003
  self_test_sample_size: 0
  character_coverage: 1
  input_sentence_size: 10000000
  shuffle_input_sentence: 1
  seed_sentencepiece_size: 1000000
  shrinking_factor: 0.75
  max_sentence_length: 4192
  num_threads: 16
  num_sub_iterations: 2
  max_sentencepiece_length: 16
  split_by_unicode_script: 1
  split_by_number: 1
  split_by_whitespace: 1
  treat_whitespace_as_suffix: 0
  user_defined_symbols: [laughter]
  user_defined_symbols: [noise]
  user_defined_symbols: [vocalized-noise]
  hard_vocab_limit: 1
  use_all_vocab: 0
  unk_id: 2
  bos_id: -1
  eos_id: 1
  pad_id: 0
  unk_piece: <unk>
  bos_piece: <s>
  eos_piece: </s>
  pad_piece: <pad>
  unk_surface:  ⁇
}
NormalizerSpec {
  name: nmt_nfkc
  add_dummy_prefix: 1
  remove_extra_whitespaces: 1
  escape_whitespaces: 1
  normalization_rule_tsv:
}

trainer_interface.cc(267) LOG(INFO) Loading corpus: data/lang/input
trainer_interface.cc(139) LOG(INFO) Loaded 1000000 lines
trainer_interface.cc(139) LOG(INFO) Loaded 2000000 lines
trainer_interface.cc(114) LOG(WARNING) Too many sentences are loaded! (2416025), which may slow down training.
trainer_interface.cc(116) LOG(WARNING) Consider using --input_sentence_size=<size> and --shuffle_input_sentence=true.
trainer_interface.cc(119) LOG(WARNING) They allow to randomly sample <size> sentences from the entire corpus.
trainer_interface.cc(315) LOG(INFO) Loaded all 2416025 sentences
trainer_interface.cc(330) LOG(INFO) Adding meta_piece: <pad>
trainer_interface.cc(330) LOG(INFO) Adding meta_piece: </s>
trainer_interface.cc(330) LOG(INFO) Adding meta_piece: <unk>
trainer_interface.cc(330) LOG(INFO) Adding meta_piece: [laughter]
trainer_interface.cc(330) LOG(INFO) Adding meta_piece: [noise]
trainer_interface.cc(330) LOG(INFO) Adding meta_piece: [vocalized-noise]
trainer_interface.cc(335) LOG(INFO) Normalizing sentences...
trainer_interface.cc(384) LOG(INFO) all chars count=120465092
trainer_interface.cc(392) LOG(INFO) Done: 100% characters are covered.
trainer_interface.cc(402) LOG(INFO) Alphabet size=43
trainer_interface.cc(403) LOG(INFO) Final character coverage=1
trainer_interface.cc(435) LOG(INFO) Done! preprocessed 2416025 sentences.
unigram_model_trainer.cc(129) LOG(INFO) Making suffix array...
unigram_model_trainer.cc(133) LOG(INFO) Extracting frequent sub strings...
unigram_model_trainer.cc(184) LOG(INFO) Initialized 166028 seed sentencepieces
trainer_interface.cc(441) LOG(INFO) Tokenizing input sentences with whitespace: 2416025
trainer_interface.cc(451) LOG(INFO) Done! 69957
unigram_model_trainer.cc(470) LOG(INFO) Using 69957 sentences for EM training
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=0 size=59852 obj=9.23769 num_tokens=130093 num_tokens/piece=2.17358
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=1 size=44412 obj=7.29956 num_tokens=132354 num_tokens/piece=2.98014
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=0 size=33308 obj=7.24442 num_tokens=141637 num_tokens/piece=4.25234
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=1 size=33303 obj=7.23651 num_tokens=141660 num_tokens/piece=4.25367
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=0 size=24977 obj=7.21871 num_tokens=158375 num_tokens/piece=6.34083
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=1 size=24977 obj=7.21644 num_tokens=158399 num_tokens/piece=6.34179
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=0 size=18732 obj=7.21162 num_tokens=175442 num_tokens/piece=9.3659
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=1 size=18732 obj=7.20821 num_tokens=175404 num_tokens/piece=9.36387
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=0 size=14049 obj=7.21798 num_tokens=192101 num_tokens/piece=13.6736
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=1 size=14049 obj=7.21295 num_tokens=192059 num_tokens/piece=13.6707
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=0 size=10536 obj=7.23918 num_tokens=207654 num_tokens/piece=19.709
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=1 size=10536 obj=7.23244 num_tokens=207609 num_tokens/piece=19.7047
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=0 size=7902 obj=7.27241 num_tokens=221580 num_tokens/piece=28.041
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=1 size=7902 obj=7.26387 num_tokens=221484 num_tokens/piece=28.0289
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=0 size=5926 obj=7.32839 num_tokens=234743 num_tokens/piece=39.6124
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=1 size=5926 obj=7.31716 num_tokens=234693 num_tokens/piece=39.6039
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=0 size=4444 obj=7.40817 num_tokens=248571 num_tokens/piece=55.9341
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=1 size=4444 obj=7.39317 num_tokens=248418 num_tokens/piece=55.8996
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=0 size=3333 obj=7.50897 num_tokens=262750 num_tokens/piece=78.8329
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=1 size=3333 obj=7.49001 num_tokens=262534 num_tokens/piece=78.7681
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=0 size=2499 obj=7.64161 num_tokens=276859 num_tokens/piece=110.788
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=1 size=2499 obj=7.61733 num_tokens=276640 num_tokens/piece=110.7
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=0 size=1874 obj=7.80273 num_tokens=292799 num_tokens/piece=156.243
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=1 size=1874 obj=7.77333 num_tokens=292543 num_tokens/piece=156.106
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=0 size=1405 obj=7.99379 num_tokens=309225 num_tokens/piece=220.089
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=1 size=1405 obj=7.95503 num_tokens=308821 num_tokens/piece=219.801
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=0 size=1103 obj=8.15973 num_tokens=321388 num_tokens/piece=291.376
unigram_model_trainer.cc(486) LOG(INFO) EM sub_iter=1 size=1103 obj=8.12422 num_tokens=321274 num_tokens/piece=291.273
trainer_interface.cc(507) LOG(INFO) Saving model: data/lang/train_nodup_unigram1000.model
trainer_interface.cc(531) LOG(INFO) Saving vocabs: data/lang/train_nodup_unigram1000.vocab
Traceback (most recent call last):
  File "../../scripts/spm_encode.py", line 99, in <module>
    main()
  File "../../scripts/spm_encode.py", line 90, in main
    print(" ".join(enc_line), file=output_h)
UnicodeEncodeError: 'ascii' codec can't encode character '\u2581' in position 0: ordinal not in range(128)

What have you tried?

My setup should be ok as I have been running the WSJ recipe without issue but I notice that a different script is used here for the tokenizing. Any help or advice would be great!
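
For what it's worth, the UnicodeEncodeError comes from printing the sentencepiece meta symbol '\u2581' ("▁") to an output handle whose default encoding is ASCII; a hedged workaround sketch (the file name is hypothetical, and exporting PYTHONIOENCODING=utf-8 or a UTF-8 locale before running the recipe is an alternative):

    import io
    import sys

    # Force UTF-8 on the output side so "\u2581" can be written regardless of locale.
    output_h = open("tokenized.txt", "w", encoding="utf-8")  # hypothetical output path
    # or, when writing to stdout:
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")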

Error in training stage of run_chain_e2e_bichar.sh: 'odict_items' object is not an iterator

I am trying to train a model on a custom dataset.
All the data preparation stages are done flawlessly (at least it seems so).
But at the beginning of the training stage (stage=6) of run_chain_e2e_bichar.sh I get the following error:

<class 'odict_items'> Traceback (most recent call last): File "../../fairseq_cli/train.py", line 510, in <module> cli_main() File "../../fairseq_cli/train.py", line 503, in cli_main distributed_utils.call_main(cfg, main) File "../../fairseq/distributed/utils.py", line 369, in call_main main(cfg, **kwargs) File "../../fairseq_cli/train.py", line 86, in main task = tasks.setup_task(cfg.task) File "../../espresso/fairseq/tasks/__init__.py", line 44, in setup_task return task.setup_task(cfg, **kwargs) File "../../espresso/espresso/tasks/speech_recognition_hybrid.py", line 432, in setup_task src_dataset = get_asr_dataset_from_json(data_path, split, dictionary, combine=False).src File "../../espresso/espresso/tasks/speech_recognition_hybrid.py", line 236, in get_asr_dataset_from_json if "feat" in next(loaded_json.items()): TypeError: 'odict_items' object is not an iterator

I also tried the biphone training (run_chain_e2e.sh) and got exactly the same error.
Any ideas on what the problem is or what I am doing wrong are appreciated.
Thank you.
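
For context, the error itself is plain Python 3 behaviour: dict.items() returns a view object, and next() needs an explicit iterator; a minimal illustration (the actual fix in espresso may look different):

    from collections import OrderedDict

    loaded_json = OrderedDict([("utt1", {"feat": "feats.ark:12", "text": "hello"})])

    # next(loaded_json.items())       # TypeError: 'odict_items' object is not an iterator
    first_key, first_entry = next(iter(loaded_json.items()))  # works: ('utt1', {...})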

  • espresso Version: master
  • PyTorch Version: 1.8.1
  • OS: CentOS Linux 7
  • Python version: 3.6

SWBD ASR Expected Results- WER

Hi, can the expected Switchboard test set WERs using the code be confirmed?

I ran the code as per the instructions with no errors and with default hyperparams, and was able to obtain WERs of 9.5% on the SWBD test set, 14.5% on Eval2000, and 19.5% on Callhome using the provided recipe, but I couldn't match the performance reported in the paper (https://arxiv.org/pdf/1909.08723.pdf).

Would it be possible to share intermediate results such as perplexity on Subword LM testing, maybe loss and WER plots?

Here are some of mine for verification:
ASR Decoding Results:

Callhm

WER=19.5%, Sub=12.6%, Ins=3.5%, Del=3.4%

Eval2000

WER=14.5%, Sub=9.2%, Ins=2.5%, Del=2.7%

LM Results:
Eval2000:

Evaluated 60377 tokens in 2.4s (24979.13 tokens/s)
Loss: 3.6210, Perplexity: 37.37

on RT03 -

Evaluated 109920 tokens in 3.9s (28378.85 tokens/s)                                                                                                                                                                    

Loss: 3.7787, Perplexity: 43.76

Thanks

tensorized_lookahead_language_model SyntaxError

Hi~ I was running the asr_wsj recipe and got SyntaxError: invalid syntax.

this is the info.

File "/share/nas165/QAQ/espresso/fairseq/models/tensorized_lookahead_language_model.py", line 61
    self.lm_decoder: FairseqIncrementalDecoder = word_lm.decoder
                   ^
SyntaxError: invalid syntax
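
For context, the flagged line is a PEP 526 annotated assignment, which is only valid syntax on Python 3.6 and newer; on older interpreters the same statement has to drop the inline annotation, e.g. (illustration only, not the actual espresso code):

    class Demo:
        def __init__(self, word_lm):
            # equivalent without PEP 526 syntax (valid on Python < 3.6)
            self.lm_decoder = word_lm.decoder  # type: FairseqIncrementalDecoder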

Could anyone help me?
tyvm

ONNX exportation of speech_lstm based model

โ“ Questions and Help

Not able to ONNX-export a speech_lstm based model.

What is your question?

Is the speech_lstm model expected to be ONNX exportable? Currently I get the error shown below:

Code

        torch.onnx.export(model, dummy_input,  
                          f.name+".onnx", verbose=True, opset_version=12,
                          input_names=input_names, output_names=output_names)

Traceback (most recent call last):
File "export_77/../test_scripts/test_export_asr.py", line 67, in _test_save_and_onnx_model
input_names=input_names, output_names=output_names)
File "export_77/.env/export/lib/python3.6/site-packages/torch/onnx/init.py", line 230, in export
custom_opsets, enable_onnx_checker, use_external_data_format)
File "export_77/.env/export/lib/python3.6/site-packages/torch/onnx/utils.py", line 91, in export
use_external_data_format=use_external_data_format)
File "export_77/.env/export/lib/python3.6/site-packages/torch/onnx/utils.py", line 639, in _export
dynamic_axes=dynamic_axes)
File "export_77/.env/export/lib/python3.6/site-packages/torch/onnx/utils.py", line 421, in _model_to_graph
dynamic_axes=dynamic_axes, input_names=input_names)
File "export_77/.env/export/lib/python3.6/site-packages/torch/onnx/utils.py", line 203, in _optimize_graph
graph = torch._C._jit_pass_onnx(graph, operator_export_type)
File "export_77/.env/export/lib/python3.6/site-packages/torch/onnx/init.py", line 263, in _run_symbolic_function
return utils._run_symbolic_function(*args, **kwargs)
File "export_77/.env/export/lib/python3.6/site-packages/torch/onnx/utils.py", line 934, in _run_symbolic_function
return symbolic_fn(g, *inputs, **attrs)
File "export_77/.env/export/lib/python3.6/site-packages/torch/onnx/symbolic_helper.py", line 133, in wrapper
return fn(g, *args, **kwargs)
File "export_77/.env/export/lib/python3.6/site-packages/torch/onnx/symbolic_opset9.py", line 441, in transpose
axes[dim0], axes[dim1] = axes[dim1], axes[dim0]
IndexError: list index out of range

What's your environment?

  • PyTorch Version (e.g., 1.0): 1.7.0
  • OS (e.g., Linux): Linux
  • How you installed fairseq (pip, source): via espresso
  • Python version: 3.6

How to train speech transformer models using wsj?

When I was trying to use espresso/espresso/models/speech_transformer.py,
espresso stopped with this log:

TypeError: __init__() missing 1 required positional argument: 'decoder'

Thus, I added "args" to line 144 like below.

return SpeechTransformerModel(args, encoder=encoder, decoder=decoder)

However, other errors continued to occur.

TypeError: forward() got an unexpected keyword argument 'epoch'
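
For context, the epoch keyword comes from the criterion: label_smoothed_cross_entropy_v2 calls the model as model(**sample["net_input"], epoch=self.epoch) (visible in a traceback in the token_text issue further down this page), so a model used with that criterion needs a forward() that accepts or swallows the extra argument. A rough sketch of a tolerant signature (illustrative only, not the actual espresso signature):

    class SpeechTransformerModelSketch:
        # Accept the extra kwarg the criterion passes (used for scheduled sampling).
        def forward(self, src_tokens, src_lengths, prev_output_tokens, epoch=1, **kwargs):
            return None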

In order to use speech_transformer,
I modified "Stage 9" in espresso/examples/asr_wsj/run.sh,

  CUDA_VISIBLE_DEVICES=$free_gpu speech_train.py data --task speech_recognition_espresso --seed 1 --user-dir espresso \
    --log-interval 400 --log-format simple --print-training-sample-interval 1000 \
    --num-workers 0 --max-tokens 24000 --max-sentences 32 --curriculum 2 \
    --valid-subset $valid_subset --max-sentences-valid 64 --ddp-backend no_c10d \
    --distributed-world-size $ngpus --distributed-port $(if [ $ngpus -gt 1 ]; then echo 100; else echo -1; fi) \
    --max-epoch 70 --optimizer adam --lr 0.001 --weight-decay 0.0 \
    --lr-scheduler reduce_lr_on_plateau_v2 --lr-shrink 0.5 --start-reduce-lr-epoch 11 \
    --save-dir $dir --restore-file checkpoint_last.pt --save-interval-updates 400 \
    --keep-interval-updates 5 --keep-last-epochs 5 --validate-interval 1 --best-checkpoint-metric wer \
    --arch speech_transformer_wsj --criterion label_smoothed_cross_entropy_v2 \
    --label-smoothing 0.05 --smoothing-type temporal \
    --no-scale-embedding \
    --dict $dict --non-lang-syms $nlsyms \
    --max-source-positions 9999 --max-target-positions 999 $opts 2>&1 | tee $log_file

And also I registered new model "speech_transformer_wsj".
/home/sephiroce/open_source/espresso/espresso/models/speech_transformer.py

@register_model_architecture('speech_transformer', 'speech_transformer_wsj')
def speech_transformer_wsj(args):
    args.encoder_conv_channels = getattr(
        args, 'encoder_conv_channels', '[64, 64]',
    )
    args.encoder_conv_kernel_sizes = getattr(
        args, 'encoder_conv_kernel_sizes', '[(3, 3), (3, 3)]',
    )
    args.encoder_conv_strides = getattr(
        args, 'encoder_conv_strides', '[(2, 2), (2, 2)]',
    )
    args.encoder_layers = getattr(args, 'encoder_layers', 12)
    args.decoder_layers = getattr(args, 'decoder_layers', 6)
    args.encoder_embed_dim = getattr(args, 'encoder_embed_dim', 256)
    args.encoder_ffn_embed_dim = getattr(args, 'encoder_ffn_embed_dim', 2048)
    args.encoder_attention_heads = getattr(args, 'encoder_attention_heads', 4)
    args.encoder_normalize_before = getattr(args, 'encoder_normalize_before', True)
    args.decoder_embed_dim = getattr(args, 'decoder_embed_dim', 256)
    args.decoder_ffn_embed_dim = getattr(args, 'decoder_ffn_embed_dim', 2048)
    args.decoder_normalize_before = getattr(args, 'decoder_normalize_before', True)
    args.decoder_attention_heads = getattr(args, 'decoder_attention_heads', 4)
    args.dropout = getattr(args, 'dropout', 0.1)
    args.attention_dropout = getattr(args, 'attention_dropout', 0.1)
    args.activation_dropout = getattr(args, 'activation_dropout', 0.1)
    base_architecture(args)

Request for recipe for Librispeech 100hr

I have tried to run the recipe with the Librispeech 960-hour train set, and it works well.
But after reducing the data to Librispeech 100 hours, I found it does not work and gives WER > 100% on the validation set.
So I would like to know if there is any recipe for Librispeech 100 hours, or another train set with a similar amount of data, as a reference?

Thanks.

CUDA OOM issue

Hi, I was running the Libri example and got a CUDA OOM issue (with either 1 or 4 V100 GPUs). I tried the --empty-cache-freq flag; OOM still occurs eventually, although it takes longer.

Has anyone seen the same issue?

My setup: centos 7.5 | cuda 9.2 | python 3.6 | pytorch 1.3.0 | nccl 2.4.6

token_text as outputs

โ“ Questions and Help

Hello,

I'm a bit confused after reading the changes from #58. I am currently using token_text for my work, and I would prefer to continue using token_text instead of text, if possible (because I use special tags similar to <space>).
After reading the changes from #58 I have the impression that it's still possible to use token_text as outputs for ASR systems, is that right?

If so, what argument should be given to the training script?
With no additional argument, when training with JSON that includes token_text instead of text, I keep getting this error:

Traceback (most recent call last):
  File "/lium/home/vpelloin/miniconda3/envs/espresso/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/export/home/lium/vpelloin/git/espresso/fairseq/distributed/utils.py", line 328, in distributed_main
    main(cfg, **kwargs)
  File "/export/home/lium/vpelloin/git/espresso/fairseq_cli/train.py", line 176, in main
    valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
  File "/lium/home/vpelloin/miniconda3/envs/espresso/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/export/home/lium/vpelloin/git/espresso/fairseq_cli/train.py", line 287, in train
    log_output = trainer.train_step(samples)
  File "/lium/home/vpelloin/miniconda3/envs/espresso/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/export/home/lium/vpelloin/git/espresso/fairseq/trainer.py", line 674, in train_step
    ignore_grad=is_dummy_batch,
  File "/export/home/lium/vpelloin/git/espresso/fairseq/tasks/fairseq_task.py", line 476, in train_step
    loss, sample_size, logging_output = criterion(model, sample)
  File "/lium/home/vpelloin/miniconda3/envs/espresso/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/export/home/lium/vpelloin/git/espresso/espresso/criterions/label_smoothed_cross_entropy_v2.py", line 150, in forward
    net_output = model(**sample["net_input"], epoch=self.epoch)
  File "/lium/home/vpelloin/miniconda3/envs/espresso/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/export/home/lium/vpelloin/git/espresso/fairseq/distributed/module_proxy_wrapper.py", line 55, in forward
    return self.module(*args, **kwargs)
  File "/lium/home/vpelloin/miniconda3/envs/espresso/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/export/home/lium/vpelloin/git/espresso/fairseq/distributed/legacy_distributed_data_parallel.py", line 74, in forward
    return self.module(*inputs, **kwargs)
  File "/lium/home/vpelloin/miniconda3/envs/espresso/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'prev_output_tokens'

Thank you so much for the incredible work you're doing with this tool!

Error found when running librispeech recipe with latest version of espresso

๐Ÿ› Bug

There are two issues after installing the latest version of espresso:

  1. The specaug parameter parsing error occurs once we enable the specaug function:
2020-11-11 12:04:42 | INFO | espresso.speech_train | --max-tokens is the maximum number of input frames in a batch
Traceback (most recent call last):
  File "/nfs/mercury-13/u20/cli/src/espresso-11112020/espresso/examples/asr_librispeech/../../espresso/speech_train.py", line 415, in <module>
    cli_main()
  File "/nfs/mercury-13/u20/cli/src/espresso-11112020/espresso/examples/asr_librispeech/../../espresso/speech_train.py", line 404, in cli_main
    cfg = convert_namespace_to_omegaconf(args)
  File "/nfs/mercury-13/u20/cli/src/espresso-11112020/espresso/fairseq/dataclass/utils.py", line 324, in convert_namespace_to_omegaconf
    composed_cfg = compose("config", overrides=overrides, strict=False)
  File "/nfs/mercury-13/u20/cli/miniconda3/envs/espresso-11112020/lib/python3.8/site-packages/hydra/experimental/compose.py", line 31, in compose
    cfg = gh.hydra.compose_config(
  File "/nfs/mercury-13/u20/cli/miniconda3/envs/espresso-11112020/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 507, in compose_config
    cfg = self.config_loader.load_configuration(
  File "/nfs/mercury-13/u20/cli/miniconda3/envs/espresso-11112020/lib/python3.8/site-packages/hydra/_internal/config_loader_impl.py", line 151, in load_configuration
    return self._load_configuration(
  File "/nfs/mercury-13/u20/cli/miniconda3/envs/espresso-11112020/lib/python3.8/site-packages/hydra/_internal/config_loader_impl.py", line 180, in _load_configuration
    parsed_overrides = parser.parse_overrides(overrides=overrides)
  File "/nfs/mercury-13/u20/cli/miniconda3/envs/espresso-11112020/lib/python3.8/site-packages/hydra/core/override_parser/overrides_parser.py", line 95, in parse_overrides
    raise OverrideParseException(
hydra.errors.OverrideParseException: mismatched input 'W' expecting <EOF>
See https://hydra.cc/docs/next/advanced/override_grammar/basic for details
  2. It crashes in the model training step (step 8) without any error message:
2020-11-11 12:38:55 | INFO | espresso.speech_train | task: SpeechRecognitionEspressoTask
2020-11-11 12:38:55 | INFO | espresso.speech_train | model: SpeechLSTMModel
2020-11-11 12:38:55 | INFO | espresso.speech_train | criterion: LabelSmoothedCrossEntropyV2Criterion)
2020-11-11 12:38:55 | INFO | espresso.speech_train | num. model params: 159660204 (num. trained: 159660204)
2020-11-11 12:38:55 | INFO | fairseq.trainer | detected shared parameter: decoder.attention.query_proj.bias <- decoder.attention.value_proj.bias
2020-11-11 12:38:55 | INFO | espresso.speech_train | training on 1 devices (GPUs/TPUs)
2020-11-11 12:38:55 | INFO | espresso.speech_train | max tokens per GPU = 26000 and batch size per GPU = 24
2020-11-11 12:38:55 | INFO | fairseq.trainer | no existing checkpoint found exp/lstm_wsj.specaug.bpe1k/checkpoint_last.pt
2020-11-11 12:38:55 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-11 12:39:05 | INFO | espresso.tasks.speech_recognition | /nfs/mercury-13/u20/cli/src/espresso.latest/espresso/examples/asr_librispeech/data-bulgarian-bpe1k/train.json 33004 examples
./run.sh: line 259:  4839 Segmentation fault      CUDA_VISIBLE_DEVICES=$free_gpu speech_train.py $data_dir --task speech_recognition_espresso --seed 1 --log-interval $((8000/ngpus/update_freq)) --log-format simple --print-training-sample-interval $((4000/ngpus/update_freq)) --num-workers 0 --data-buffer-size 0 --max-tokens 26000 --batch-size 24 --curriculum 1 --empty-cache-freq 50 --valid-subset $valid_subset --batch-size-valid 48 --ddp-backend no_c10d --update-freq $update_freq --distributed-world-size $ngpus --optimizer adam --lr 0.001 --weight-decay 0.0 --clip-norm 2.0 --save-dir $dir --restore-file checkpoint_last.pt --save-interval-updates $((6000/ngpus/update_freq)) --keep-interval-updates 3 --keep-last-epochs 5 --validate-interval 1 --best-checkpoint-metric wer --criterion label_smoothed_cross_entropy_v2 --label-smoothing 0.1 --smoothing-type uniform --dict $dict --bpe sentencepiece --sentencepiece-model ${sentencepiece_model}.model --max-source-positions 9999 --max-target-positions 999 $opts --specaugment-config "$specaug_config" 2>&1

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

  1. Run cmd: ./run.sh
  2. See error: listed above

Expected behavior

Able to train model with the recipe

Environment

  • fairseq Version (e.g., 1.0 or master): 1.0.0a0+d966482
  • PyTorch Version (e.g., 1.0): 1.4.0
  • OS (e.g., Linux): CentOS Linux release 7.7.1908 (Core)
  • How you installed fairseq (pip, source): pip install from source
  • Build command you used (if compiling from source): pip install --editable .
  • Python version: 3.8.5
  • CUDA/cuDNN version: py3.8_cuda10.0.130_cudnn7.6.3_0
  • GPU models and configuration:
  • Any other relevant information:

Additional context

Non-ASCII characters in sample PRD and REF

Hi,
While training the swbd recipe, the log on screen shows:

| sample PRD: \xe2\x96\x81maybe\xe2\x96\x81c'
| sample REF: b'\xe2\x96\x81[vocalized-noise]'

Also the WER on swbd val set is very large. Is this normal? Thanks in advance.
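
For what it's worth, the escaped bytes in those lines are just the UTF-8 encoding of the sentencepiece meta symbol U+2581 "▁":

    # Decoding the logged byte string gives back the sentencepiece pieces:
    print(b'\xe2\x96\x81maybe\xe2\x96\x81c'.decode("utf-8"))  # -> ▁maybe▁c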

Transformer model recipe for Librispeech is not working

๐Ÿ› Bug

Running the librispeech training with the speech_transformer_librispeech architecture fails.

To Reproduce
Steps to reproduce the behavior (always include the command you ran):

I change the script "run.sh" under the folder "<espresso_root>/examples/asr_librispeech" to use the arch "speech_transformer_librispeech".

I changed the "run.sh" on below lines:

# Just start training and skip other preparation process
# stage=1 
stage=8

and

# Change arch from speech_conv_lstm_librispeech to speech_transformer_librispeech
  CUDA_VISIBLE_DEVICES=$free_gpu speech_train.py data --task speech_recognition_espresso --seed 1 --user-dir espresso \
    --num-workers 0 --data-buffer-size 0 --max-tokens 26000 --max-sentences 24 --curriculum 1 \
    --valid-subset $valid_subset --max-sentences-valid 48 --ddp-backend no_c10d \
    --distributed-world-size $ngpus --distributed-port $(if [ $ngpus -gt 1 ]; then echo 100; else echo -1; fi) \
    --optimizer adam --lr 0.001 --weight-decay 0.0 --clip-norm 2.0 \
    --save-dir $dir --restore-file checkpoint_last.pt --save-interval-updates $((6000/ngpus)) \
    --keep-interval-updates 3 --keep-last-epochs 5 --validate-interval 1 --best-checkpoint-metric wer \
    --dict $dict --bpe sentencepiece --sentencepiece-vocab ${sentencepiece_model}.model \
    --max-source-positions 9999 --max-target-positions 999 \
    --log-interval $((8000/ngpus)) --log-format simple \
    --arch **speech_transformer_librispeech** --criterion cross_entropy_v2 \
    --print-training-sample-interval $((4000/ngpus)) \
    $opts --specaugment-config "$specaug_config" 2>&1 | tee $log_file

Run cmd './run.sh'
Got error

Traceback (most recent call last):
  File "/nfs/mercury-13/u20/cli/miniconda3/envs/espresso/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/nfs/mercury-13/u20/cli/src/espresso/espresso/speech_train.py", line 341, in distributed_main
    main(args, init_distributed=True)
  File "/nfs/mercury-13/u20/cli/src/espresso/espresso/speech_train.py", line 72, in main
    model = task.build_model(args)
  File "/nfs/mercury-13/u20/cli/src/espresso/espresso/tasks/speech_recognition.py", line 339, in build_model
    model = super().build_model(args)
  File "/nfs/mercury-13/u20/cli/src/espresso/fairseq/tasks/fairseq_task.py", line 211, in build_model
    model = models.build_model(args, self)
  File "/nfs/mercury-13/u20/cli/src/espresso/fairseq/models/__init__.py", line 48, in build_model
    return ARCH_MODEL_REGISTRY[args.arch].build_model(args, task)
  File "/nfs/mercury-13/u20/cli/src/espresso/espresso/models/speech_transformer.py", line 132, in build_model
    return cls(encoder, decoder)
TypeError: __init__() missing 1 required positional argument: 'decoder'

This is because the constructor is called without args; it can be fixed by adding the args parameter back in speech_transformer.py as below:

        # return cls(encoder, decoder)
        return cls(args, encoder, decoder)

But after that, a more complicated error follows, as below:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/nfs/mercury-13/u20/cli/miniconda3/envs/espresso/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/nfs/mercury-13/u20/cli/src/espresso/espresso/speech_train.py", line 341, in distributed_main
    main(args, init_distributed=True)
  File "/nfs/mercury-13/u20/cli/src/espresso/espresso/speech_train.py", line 121, in main
    valid_losses, should_stop = train(args, trainer, task, epoch_itr)
  File "/nfs/mercury-13/u20/cli/miniconda3/envs/espresso/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/nfs/mercury-13/u20/cli/src/espresso/espresso/speech_train.py", line 210, in train
    log_output = trainer.train_step(samples)
  File "/nfs/mercury-13/u20/cli/miniconda3/envs/espresso/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/nfs/mercury-13/u20/cli/src/espresso/fairseq/trainer.py", line 408, in train_step
    ignore_grad=is_dummy_batch,
  File "/nfs/mercury-13/u20/cli/src/espresso/fairseq/tasks/fairseq_task.py", line 342, in train_step
    loss, sample_size, logging_output = criterion(model, sample)
  File "/nfs/mercury-13/u20/cli/miniconda3/envs/espresso/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/nfs/mercury-13/u20/cli/src/espresso/espresso/criterions/cross_entropy_v2.py", line 49, in forward
    net_output = model(**sample["net_input"], epoch=self.epoch)
  File "/nfs/mercury-13/u20/cli/miniconda3/envs/espresso/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/nfs/mercury-13/u20/cli/src/espresso/fairseq/legacy_distributed_data_parallel.py", line 86, in forward
    return self.module(*inputs, **kwargs)
  File "/nfs/mercury-13/u20/cli/miniconda3/envs/espresso/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'epoch'
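
A small diagnostic sketch (plain Python, nothing Espresso-specific) to confirm whether a built model's forward() will tolerate the `epoch` keyword that cross_entropy_v2 forwards from the sample; the real fix is aligning the criterion with the model's forward signature rather than wrapping anything:

    import inspect

    # Returns True if `model.forward` declares `kwarg` explicitly or accepts **kwargs.
    def forward_accepts(model, kwarg: str) -> bool:
        params = inspect.signature(model.forward).parameters
        return kwarg in params or any(
            p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
        )

    # e.g. forward_accepts(model, "epoch") returning False reproduces the TypeError above.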

Which recipe involves multi-level LM training and decoding? Also, can we use word + subword for multi-level decoding?

What is your question?

As stated in the subject:

  1. Which recipe involves multi-level LM training and decoding?
  2. Can we use word + subword for multi-level decoding? If so, how?

What have you tried?

I have read the librispeech and wsj recipes, but I could not find a clear way to enable multi-level (word + subword) decoding with the LSTM (ASR) model.

What's your environment?

  • fairseq Version (e.g., 1.0 or master):
  • PyTorch Version : 1.4.0
  • OS (e.g., Linux): Centos7
  • How you installed fairseq (pip, source): pip
  • Python version: 3.7
  • CUDA/cuDNN version: 10.0

TIMIT Demo example

🚀 Feature Request

Would it be possible to upload an example for TIMIT for demonstration purposes? All the other speech recognition datasets are rather large to download when just trying out this repo. Having TIMIT would allow people new to ASR to quickly try out and appreciate the convenience of this framework. Thanks.


Support for RNNLM use while decoding?

Does Espresso support beam-search decoding with an RNNLM or another LM?

Also, is there a version requirement for Kaldi? Which Kaldi installation can be connected to Espresso?

Language Model Inference

How can I run inference with an LSTM-based language model (lstm_lm) trained on subword tokens (similar to generate.py for inference in seq2seq)?
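
In case it helps, here is a minimal sketch of scoring subword-tokenized text with a trained LM checkpoint via fairseq's checkpoint utilities. The checkpoint path and the example line are placeholders, and the saved args must still resolve to a valid dictionary file; the exact return types may differ across fairseq versions:

    import torch
    from fairseq import checkpoint_utils

    # Load the LM checkpoint together with the task it was trained with
    # (hypothetical path; adapt to your own experiment directory).
    models, _, task = checkpoint_utils.load_model_ensemble_and_task(
        ["exp/lm_lstm/checkpoint_best.pt"]
    )
    model = models[0].eval()
    d = task.target_dictionary

    line = "▁this ▁is ▁a ▁test"  # already subword-tokenized text
    tokens = d.encode_line(line, add_if_not_exist=False).long()

    with torch.no_grad():
        net_output = model(tokens.unsqueeze(0))
        lprobs = model.get_normalized_probs(net_output, log_probs=True)
    print(lprobs.shape)  # (1, num_tokens, vocab): per-position next-token log-probs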

Different WERs when decoding with different batch size (--max-sentences)

Hi,
I would like to ask why I get different results when I decode with different batch sizes (--max-sentences).

For example, with the same language model and same attention-based encoder-decoder model:

  1. by setting the batch size (--max-sentences) as 32, in WSJ, the WER is 3.46% in eval92 and 5.71% in dev93.
  2. by setting the batch size (--max-sentences) as 1, in WSJ, the WER is 3.42% in eval92 and 5.67% in dev93.

The difference is not large; I guess it is caused by beam search in batches, but I am not sure where exactly the difference comes from. If you know, I would be glad to have your answer.
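
One contributing factor worth ruling out (a generic PyTorch observation, not specific to Espresso): the same linear layer applied to a row alone versus inside a larger batch can differ in the last few bits, because the batched GEMM may use a different reduction order, and such tiny differences can flip near-tied beam-search decisions. Larger discrepancies usually point to padding handling instead. A minimal illustration:

    import torch

    torch.manual_seed(0)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    lin = torch.nn.Linear(1024, 1024).to(device)
    x = torch.randn(8, 1024, device=device)

    alone = lin(x[:1])       # the first row processed by itself
    in_batch = lin(x)[:1]    # the same row processed inside a batch of 8
    # Typically a tiny but non-zero value on GPU; may be exactly 0 on CPU.
    print((alone - in_batch).abs().max().item())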

Problem with Long Utterances for MALACH Corpus

I am trying to use Espresso to decode the MALACH Corpus. One of the characteristics of MALACH is that the training utterances are all short (< 8 seconds on the whole) but the test data contains a significant number of long utterances (> 20 seconds). I am observing that on these long utterances it produces decent output for the first 5-6 seconds, deteriorates rapidly thereafter, puts out some repeated words, and then stops decoding, resulting in many deletions. This is for a transformer model based on the wsj recipe. MALACH has about 160 hours of training data. I would welcome some suggestions/help here - it almost looks like some parameter setting would fix things.

Thanks
Michael

Slow training...

Hello,

I have spent some time comparing PyChain LF-MMI in Espresso and the pychain_example, which seems to borrow some code from Espresso. I get very slow forward passes in Espresso while they are much faster in pychain_example (I use DistributedDataParallel for both Espresso (the 'no_c10d' backend, which uses NCCL anyway?) and PyChain (with 'nccl')). I use the same TDNN model in both, with the cnn/bn/relu implementation matched from Espresso to PyChain: 6 TDNN+BN+ReLU layers, strides=(1,1,1,1,1,3), dilation=(1,1,1,3,3,3), kernels=(3,3,3,3,3,3), no residual connections. Both use curriculum learning in the first epoch and start with the shortest batches.

Espresso code:

        start= time.time()
        for i in range(len(self.tdnn)):
            if self.residual and i > 0:  # residual connection starts from the 2nd layer
                prev_x = x
            x, x_lengths, padding_mask = self.tdnn[i](x, x_lengths)
            x = self.dropout_out_module(x)
            x = x + prev_x if self.residual and i > 0 and x.size(1) == prev_x.size(1) else x
        print ('6xTDNN time %.5fs' % (time.time() - start,), 'tensor_in_size', s, 'gpu', x.get_device())

PyChain code:

        start = time.time()
        for i in range(len(self.tdnn)):
            if self.residual and i>0:
              x_prev = x
            x, x_lengths = self.tdnn[i](x, x_lengths)
            x = F.dropout(x, p=self.dropout, training=self.training)
            if self.residual and i>0 and x.size(1)==x_prev.size(1):
                x += x_prev
        print ('6xTDNN time %.5fs' % (time.time() - start,), 'tensor_in_size', s, 'gpu', x.get_device())

So, the code is almost line-by-line the same, and the architecture is the same. Yet, after using DistributedDataParallel, Espresso is much slower. This was run on the same machine, the same 2 GPUs, one experiment right after the other (so no load-change issues on the machine). I checked that computing the padding does not significantly affect the timing. Here are the timings for several forward passes of similar size.

Espresso:
6xTDNN time 2.42642s tensor_in_size torch.Size([64, 158, 40]) tensor_out_size torch.Size([64, 53, 640]) gpu 1
6xTDNN time 2.39317s tensor_in_size torch.Size([64, 177, 40]) tensor_out_size torch.Size([64, 59, 640]) gpu 1
6xTDNN time 1.95155s tensor_in_size torch.Size([64, 144, 40]) tensor_out_size torch.Size([64, 48, 640]) gpu 0
6xTDNN time 2.50637s tensor_in_size torch.Size([64, 170, 40]) tensor_out_size torch.Size([64, 57, 640]) gpu 0
6xTDNN time 1.79735s tensor_in_size torch.Size([64, 192, 40]) tensor_out_size torch.Size([64, 64, 640]) gpu 1
6xTDNN time 2.37481s tensor_in_size torch.Size([64, 186, 40]) tensor_out_size torch.Size([64, 62, 640]) gpu 0

...
PyChain:
6xTDNN time 0.07956s tensor_in_size torch.Size([64, 170, 40]) tensor_out_size torch.Size([64, 57, 640]) gpu 0
6xTDNN time 0.08923s tensor_in_size torch.Size([64, 194, 40]) tensor_out_size torch.Size([64, 65, 640]) gpu 1
6xTDNN time 0.08312s tensor_in_size torch.Size([64, 211, 40]) tensor_out_size torch.Size([64, 71, 640]) gpu 0
6xTDNN time 0.08275s tensor_in_size torch.Size([64, 224, 40]) tensor_out_size torch.Size([64, 75, 640]) gpu 1
6xTDNN time 0.08598s tensor_in_size torch.Size([64, 233, 40]) tensor_out_size torch.Size([64, 78, 640]) gpu 0
6xTDNN time 0.08788s tensor_in_size torch.Size([64, 241, 40]) tensor_out_size torch.Size([64, 81, 640]) gpu 1
...
So, PyChain is 10-20 times faster... Espresso uses 40-50% of each GPU, while PyChain uses 85-95% when put together with the LF-MMI loss. I wonder how to make Espresso train as fast as PyChain shows is possible. Is it a matter of the DistributedDataParallel implementation in fairseq? The backend? Any help is welcome.
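
One thing to double-check before comparing the two numbers (a generic CUDA-timing caveat, not a claim about either codebase): CUDA kernels are launched asynchronously, so wrapping a forward pass in time.time() without synchronization can attribute queued work from elsewhere in the step (e.g. the previous backward pass) to whichever op happens to block next. A hedged timing sketch:

    import time
    import torch

    def timed_forward(module, *inputs):
        """Time a forward pass with explicit CUDA synchronization."""
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start = time.time()
        out = module(*inputs)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        return out, time.time() - start

If the gap persists with synchronized timing, the DDP backend and gradient-bucketing settings would be the next suspects.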

Is there a list of reference papers about conv-lstm in examples/asr_wsj?

Hello,
before building speech-transformer models, I first tried to train the default model using run.sh in examples/asr_wsj. The accuracy is WER 12.35% on test_92 without an LM.
This is somewhat behind the accuracy of the speech-transformer (WER 10.92%), but it is still competitive.

Could you explain which references you followed when writing the training script?

Transformer LM in ASR

Hi,
Thank you for providing the transformer ASR recipe. Is it possible to use a transformer language model instead of an LSTM? I have looked at the recipe run script: it provides a use_transformer option for acoustic model training, but no such option for language model training.

Thank you in advance for your answer.
Martha.

โ“ Questions and Help

Before asking:

  1. search the issues.
  2. search the docs.

What is your question?

Code

What have you tried?

What's your environment?

  • fairseq Version (e.g., 1.0 or master):
  • PyTorch Version (e.g., 1.0)
  • OS (e.g., Linux):
  • How you installed fairseq (pip, source):
  • Build command you used (if compiling from source):
  • Python version:
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • Any other relevant information:

Any plan for SpecAug?

Thanks for open sourcing this great package! Is there any plan to add SpecAug or other augmentation methods into the code base?
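
For anyone who wants to experiment on extracted features in the meantime, here is a minimal SpecAugment-style sketch (frequency and time masking only, no time warping); the parameter names and defaults are illustrative, not Espresso's own configuration:

    import torch

    def spec_augment(feats: torch.Tensor, num_freq_masks: int = 2, F: int = 27,
                     num_time_masks: int = 2, T: int = 100) -> torch.Tensor:
        """feats: (time, freq) log-mel features; returns a masked copy."""
        out = feats.clone()
        t_len, f_len = out.shape
        for _ in range(num_freq_masks):       # mask a random band of frequency bins
            f = int(torch.randint(0, F + 1, (1,)))
            f0 = int(torch.randint(0, max(1, f_len - f), (1,)))
            out[:, f0:f0 + f] = 0.0
        for _ in range(num_time_masks):       # mask a random span of time frames
            t = int(torch.randint(0, min(T, t_len) + 1, (1,)))
            t0 = int(torch.randint(0, max(1, t_len - t), (1,)))
            out[t0:t0 + t, :] = 0.0
        return out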

ASR_WSJ: LM is training but no logging output?

๐Ÿ› Bug

I am running the asr_wsj recipe. It has been training the word_lm (stage 6) since last night but does not produce any output, logging or otherwise.

When I run nvtop or nvidia-smi the GPUs seem to be busy with my jobs. I am running 4 GPUs in parallel. Early on there were some OOM problems that it tried to recover from. Is it possible it is stuck in some sort of infinite loop and doing nothing?

Attached is the screen output - at the top you can see nvidia-smi is run along with the early OOM messages.

no change /misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espnet-may142020/condabin/conda
no change /misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espnet-may142020/bin/conda
no change /misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espnet-may142020/bin/conda-env
no change /misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espnet-may142020/bin/activate
no change /misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espnet-may142020/bin/deactivate
no change /misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espnet-may142020/etc/profile.d/conda.sh
no change /misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espnet-may142020/etc/fish/conf.d/conda.fish
no change /misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espnet-may142020/shell/condabin/Conda.psm1
no change /misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espnet-may142020/shell/condabin/conda-hook.ps1
no change /misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espnet-may142020/lib/python3.7/site-packages/xontrib/conda.xsh
no change /misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espnet-may142020/etc/profile.d/conda.csh
no change /home/map22/.bashrc
No action taken.
Tue Dec 8 22:30:53 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.36 Driver Version: 440.36 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... On | 00000000:02:00.0 Off | N/A |
| 23% 18C P8 9W / 250W | 1MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... On | 00000000:03:00.0 Off | N/A |
| 23% 21C P8 9W / 250W | 1MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 108... On | 00000000:82:00.0 Off | N/A |
| 23% 22C P8 8W / 250W | 1MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX 108... On | 00000000:83:00.0 Off | N/A |
| 23% 22C P8 8W / 250W | 1MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Stage 3: Text Binarization for LM Training
./run.sh: binarizing word text...
Unable to get 4 GPUs
Stage 6: word LM Training
2020-12-08 22:32:29 | INFO | fairseq.distributed_utils | distributed init (rank 0): tcp://localhost:19801
2020-12-08 22:32:29 | INFO | fairseq.distributed_utils | distributed init (rank 2): tcp://localhost:19801
2020-12-08 22:32:29 | INFO | fairseq.distributed_utils | distributed init (rank 1): tcp://localhost:19801
2020-12-08 22:32:29 | INFO | fairseq.distributed_utils | distributed init (rank 3): tcp://localhost:19801
2020-12-08 22:32:39 | INFO | fairseq.distributed_utils | initialized host lion6.cs.nyu.edu as rank 3
2020-12-08 22:32:39 | INFO | fairseq.distributed_utils | initialized host lion6.cs.nyu.edu as rank 2
2020-12-08 22:32:39 | INFO | fairseq.distributed_utils | initialized host lion6.cs.nyu.edu as rank 0
2020-12-08 22:32:39 | INFO | fairseq.distributed_utils | initialized host lion6.cs.nyu.edu as rank 1
2020-12-08 22:32:39 | INFO | fairseq_cli.train | {'_name': None, 'common': {'_name': None, 'no_progress_bar': False, 'log_interval': 1000, 'log_format': 'simple', 'tensorboard_logdir': None, 'wandb_project': None, 'seed': 1, 'cpu': False, 'tpu': False, 'bf16': False, 'memory_efficient_bf16': False, 'fp16': False, 'memory_efficient_fp16': False, 'fp16_no_flatten_grads': False, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'user_dir': None, 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'model_parallel_size': 1, 'quantization_config_path': None, 'profile': False, 'reset_logging': True}, 'common_eval': {'_name': None, 'path': None, 'post_process': None, 'quiet': False, 'model_overrides': '{}', 'results_path': None}, 'distributed_training': {'_name': None, 'distributed_world_size': 4, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': 'tcp://localhost:19801', 'distributed_port': -1, 'device_id': 0, 'distributed_no_spawn': False, 'ddp_backend': 'c10d', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': False, 'find_unused_parameters': False, 'fast_stat_sync': False, 'broadcast_buffers': False, 'distributed_wrapper': 'DDP', 'slowmo_momentum': None, 'slowmo_algorithm': 'LocalSGD', 'localsgd_frequency': 3, 'nprocs_per_node': 4, 'pipeline_model_parallel': False, 'pipeline_balance': None, 'pipeline_devices': None, 'pipeline_chunks': 0, 'pipeline_encoder_balance': None, 'pipeline_encoder_devices': None, 'pipeline_decoder_balance': None, 'pipeline_decoder_devices': None, 'pipeline_checkpoint': 'never', 'zero_sharding': 'none', 'tpu': False, 'distributed_num_procs': 4}, 'dataset': {'_name': None, 'num_workers': 0, 'skip_invalid_size_inputs_valid_test': False, 'max_tokens': 6400, 'batch_size': 256, 'required_batch_size_multiple': 8, 'required_seq_len_multiple': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'train_subset': 'train', 'valid_subset': 'valid', 'validate_interval': 1, 'validate_interval_updates': 0, 'validate_after_updates': 0, 'fixed_validation_seed': None, 'disable_validation': False, 'max_tokens_valid': 6400, 'batch_size_valid': 512, 'curriculum': 0, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0}, 'optimization': {'_name': None, 'max_epoch': 25, 'max_update': 0, 'stop_time_hours': 0.0, 'clip_norm': 0.0, 'sentence_avg': False, 'update_freq': [1], 'lr': [0.001], 'min_lr': -1.0, 'use_bmuf': False}, 'checkpoint': {'_name': None, 'save_dir': 'exp/wordlm_lstm', 'restore_file': 'checkpoint_last.pt', 'finetune_from_model': None, 'reset_dataloader': False, 'reset_lr_scheduler': False, 'reset_meters': False, 'reset_optimizer': False, 'optimizer_overrides': '{}', 'save_interval': 1, 'save_interval_updates': 1000, 'keep_interval_updates': 5, 'keep_last_epochs': 5, 'keep_best_checkpoints': -1, 'no_save': False, 'no_epoch_checkpoints': False, 'no_last_checkpoints': False, 'no_save_optimizer_state': False, 'best_checkpoint_metric': 'loss', 'maximize_best_checkpoint_metric': False, 'patience': -1, 'checkpoint_suffix': '', 'checkpoint_shard_count': 1, 'model_parallel_size': 1, 'distributed_rank': 0}, 'bmuf': {'_name': None, 'block_lr': 1.0, 'block_momentum': 0.875, 'global_sync_iter': 50, 'warmup_iterations': 500, 'use_nbm': False, 'average_sync': False, 'distributed_world_size': 4}, 'generation': {'_name': None, 'beam': 5, 'nbest': 1, 'max_len_a': 0.0, 'max_len_b': 200, 'min_len': 1, 'match_source_len': False, 'unnormalized': False, 'no_early_stop': False, 'no_beamable_mm': 
False, 'lenpen': 1.0, 'unkpen': 0.0, 'replace_unk': None, 'sacrebleu': False, 'score_reference': False, 'prefix_size': 0, 'no_repeat_ngram_size': 0, 'sampling': False, 'sampling_topk': -1, 'sampling_topp': -1.0, 'constraints': None, 'temperature': 1.0, 'diverse_beam_groups': -1, 'diverse_beam_strength': 0.5, 'diversity_rate': -1.0, 'print_alignment': False, 'print_step': False, 'lm_path': None, 'lm_weight': 0.0, 'iter_decode_eos_penalty': 0.0, 'iter_decode_max_iter': 10, 'iter_decode_force_max_iter': False, 'iter_decode_with_beam': 1, 'iter_decode_with_external_reranker': False, 'retain_iter_history': False, 'retain_dropout': False, 'retain_dropout_modules': None, 'decoding_format': None, 'no_seed_provided': False, 'eos_factor': None, 'subwordlm_weight': 0.8, 'oov_penalty': 0.0001, 'disable_open_vocab': False, 'apply_log_softmax': False, 'state_prior_file': None}, 'eval_lm': {'_name': None, 'output_word_probs': False, 'output_word_stats': False, 'context_window': 0, 'softmax_batch': 9223372036854775807}, 'interactive': {'_name': None, 'buffer_size': 0, 'input': '-'}, 'model': Namespace(_name='lstm_wordlm_wsj', adam_betas='(0.9, 0.999)', adam_eps=1e-08, adaptive_softmax_cutoff=None, add_bos_token=False, all_gather_list_size=16384, arch='lstm_wordlm_wsj', batch_size=256, batch_size_valid='512', best_checkpoint_metric='loss', bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_shard_count=1, checkpoint_suffix='', clip_norm=0.0, cpu=False, criterion='cross_entropy', curriculum=0, data='data/wordlm_text', data_buffer_size=10, dataset_impl=None, ddp_backend='c10d', decoder_dropout_in=0.35, decoder_dropout_out=0.35, decoder_embed_dim=1200, decoder_embed_path=None, decoder_freeze_embed=False, decoder_hidden_size=1200, decoder_layers=3, decoder_out_embed_dim=1200, decoder_rnn_residual=False, device_id=0, dict='data/lang/wordlist_65000.txt', disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=4, distributed_wrapper='DDP', dropout=0.35, empty_cache_freq=0, eos=2, fast_stat_sync=False, find_unused_parameters=False, finetune_from_model=None, fix_batches_to_gpus=False, fixed_validation_seed=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, future_target=False, gen_subset='test', is_wordlm=True, keep_best_checkpoints=-1, keep_interval_updates=5, keep_last_epochs=5, localsgd_frequency=3, log_format='simple', log_interval=1000, lr=[0.001], lr_patience=0, lr_scheduler='reduce_lr_on_plateau', lr_shrink=0.5, lr_threshold=0.0001, max_epoch=25, max_target_positions=None, max_tokens=6400, max_tokens_valid=6400, max_update=0, maximize_best_checkpoint_metric=False, memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1.0, model_parallel_size=1, no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_seed_provided=False, nprocs_per_node=4, num_shards=1, num_workers=0, optimizer='adam', optimizer_overrides='{}', output_dictionary_size=-1, pad=1, past_target=False, patience=-1, pipeline_balance=None, pipeline_checkpoint='never', pipeline_chunks=0, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_devices=None, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pipeline_model_parallel=False, profile=False, quantization_config_path=None, required_batch_size_multiple=8, 
required_seq_len_multiple=1, reset_dataloader=False, reset_logging=True, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', sample_break_mode='eos', save_dir='exp/wordlm_lstm', save_interval=1, save_interval_updates=1000, scoring='bleu', seed=1, self_target=False, sentence_avg=False, shard_id=0, share_embed=True, shorten_data_split_list='', shorten_method='none', skip_invalid_size_inputs_valid_test=False, slowmo_algorithm='LocalSGD', slowmo_momentum=None, stop_time_hours=0, task='language_modeling_for_asr', tensorboard_logdir=None, threshold_loss_scale=None, tokenizer=None, tokens_per_sample=1024, tpu=False, train_subset='train', unk=3, update_freq=[1], use_bmuf=False, use_old_adam=False, user_dir=None, valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, wandb_project=None, warmup_init_lr=-1, warmup_updates=0, weight_decay=0.0, zero_sharding='none'), 'task': {'_name': 'language_modeling_for_asr', 'data': 'data/wordlm_text', 'sample_break_mode': 'eos', 'tokens_per_sample': 1024, 'output_dictionary_size': -1, 'self_target': False, 'future_target': False, 'past_target': False, 'add_bos_token': False, 'max_target_positions': None, 'shorten_method': 'none', 'shorten_data_split_list': '', 'seed': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'tpu': False, 'dict': 'data/lang/wordlist_65000.txt'}, 'criterion': {'_name': 'cross_entropy', 'sentence_avg': False}, 'optimizer': {'_name': 'adam', 'adam_betas': '(0.9, 0.999)', 'adam_eps': 1e-08, 'weight_decay': 0.0, 'use_old_adam': False, 'tpu': False, 'lr': [0.001]}, 'lr_scheduler': {'_name': 'reduce_lr_on_plateau', 'lr_shrink': 0.5, 'lr_threshold': 0.0001, 'lr_patience': 0, 'warmup_updates': 0, 'warmup_init_lr': -1.0, 'lr': [0.001], 'maximize_best_checkpoint_metric': False}, 'scoring': {'_name': 'bleu', 'pad': 1, 'eos': 2, 'unk': 3}, 'bpe': None, 'tokenizer': None}
2020-12-08 22:32:39 | INFO | espresso.tasks.language_modeling_for_asr | dictionary: 65003 types
2020-12-08 22:32:39 | INFO | fairseq.data.data_utils | loaded 503 examples from: data/wordlm_text/valid
2020-12-08 22:32:42 | INFO | fairseq_cli.train | LSTMLanguageModelEspresso(
(decoder): SpeechLSTMDecoder(
(dropout_in_module): FairseqDropout()
(dropout_out_module): FairseqDropout()
(embed_tokens): Embedding(65003, 1200, padding_idx=0)
(layers): ModuleList(
(0): LSTMCell(1200, 1200)
(1): LSTMCell(1200, 1200)
(2): LSTMCell(1200, 1200)
)
)
)
2020-12-08 22:32:42 | INFO | fairseq_cli.train | task: LanguageModelingForASRTask
2020-12-08 22:32:42 | INFO | fairseq_cli.train | model: LSTMLanguageModelEspresso
2020-12-08 22:32:42 | INFO | fairseq_cli.train | criterion: CrossEntropyCriterion)
2020-12-08 22:32:42 | INFO | fairseq_cli.train | num. model params: 112592400 (num. trained: 112592400)
2020-12-08 22:32:43 | INFO | fairseq.utils | CUDA enviroments for all 4 workers
2020-12-08 22:32:43 | INFO | fairseq.utils | rank 0: capabilities = 6.1 ; total memory = 10.917 GB ; name = GeForce GTX 1080 Ti
2020-12-08 22:32:43 | INFO | fairseq.utils | rank 1: capabilities = 6.1 ; total memory = 10.917 GB ; name = GeForce GTX 1080 Ti
2020-12-08 22:32:43 | INFO | fairseq.utils | rank 2: capabilities = 6.1 ; total memory = 10.917 GB ; name = GeForce GTX 1080 Ti
2020-12-08 22:32:43 | INFO | fairseq.utils | rank 3: capabilities = 6.1 ; total memory = 10.917 GB ; name = GeForce GTX 1080 Ti
2020-12-08 22:32:43 | INFO | fairseq.utils | CUDA enviroments for all 4 workers
2020-12-08 22:32:43 | INFO | fairseq_cli.train | training on 4 devices (GPUs/TPUs)
2020-12-08 22:32:43 | INFO | fairseq_cli.train | max tokens per GPU = 6400 and batch size per GPU = 256
2020-12-08 22:32:43 | INFO | fairseq.trainer | no existing checkpoint found exp/wordlm_lstm/checkpoint_last.pt
2020-12-08 22:32:43 | INFO | fairseq.trainer | loading train data for epoch 1
/misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espresso-dec082020/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:398: UserWarning: The check_reduction argument in DistributedDataParallel module is deprecated. Please avoid using it.
"The check_reduction argument in DistributedDataParallel "
2020-12-08 22:41:58 | INFO | fairseq.data.data_utils | loaded 1662964 examples from: data/wordlm_text/train
/misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espresso-dec082020/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:398: UserWarning: The check_reduction argument in DistributedDataParallel module is deprecated. Please avoid using it.
"The check_reduction argument in DistributedDataParallel "
/misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espresso-dec082020/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:398: UserWarning: The check_reduction argument in DistributedDataParallel module is deprecated. Please avoid using it.
"The check_reduction argument in DistributedDataParallel "
/misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espresso-dec082020/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:398: UserWarning: The check_reduction argument in DistributedDataParallel module is deprecated. Please avoid using it.
"The check_reduction argument in DistributedDataParallel "
2020-12-08 22:42:06 | INFO | fairseq.trainer | begin training epoch 1
/misc/vlgscratch5/PichenyGroup/picheny/espresso/fairseq/utils.py:347: UserWarning: amp_C fused kernels unavailable, disabling multi_tensor_l2norm; you may get better performance by installing NVIDIA's apex library
"amp_C fused kernels unavailable, disabling multi_tensor_l2norm; "
/misc/vlgscratch5/PichenyGroup/picheny/espresso/fairseq/utils.py:347: UserWarning: amp_C fused kernels unavailable, disabling multi_tensor_l2norm; you may get better performance by installing NVIDIA's apex library
"amp_C fused kernels unavailable, disabling multi_tensor_l2norm; "
/misc/vlgscratch5/PichenyGroup/picheny/espresso/fairseq/utils.py:347: UserWarning: amp_C fused kernels unavailable, disabling multi_tensor_l2norm; you may get better performance by installing NVIDIA's apex library
"amp_C fused kernels unavailable, disabling multi_tensor_l2norm; "
/misc/vlgscratch5/PichenyGroup/picheny/espresso/fairseq/utils.py:347: UserWarning: amp_C fused kernels unavailable, disabling multi_tensor_l2norm; you may get better performance by installing NVIDIA's apex library
"amp_C fused kernels unavailable, disabling multi_tensor_l2norm; "
2020-12-08 22:42:08 | INFO | root | Reducer buckets have been rebuilt in this iteration.
2020-12-08 22:42:14 | WARNING | fairseq.trainer | OOM: Ran out of memory with exception: CUDA out of memory. Tried to allocate 1.55 GiB (GPU 1; 10.92 GiB total capacity; 7.68 GiB already allocated; 1.37 GiB free; 8.91 GiB reserved in total by PyTorch)
2020-12-08 22:42:14 | WARNING | fairseq.trainer | [PyTorch CUDA memory summary tables for devices 0-3 followed here; the numeric columns were lost in this capture. Device 1 reports CUDA OOMs: 1, devices 0, 2 and 3 report CUDA OOMs: 0.]

2020-12-08 22:42:14 | WARNING | fairseq.trainer | attempting to recover from OOM in forward/backward pass
2020-12-08 22:42:14 | WARNING | fairseq.trainer | OOM: Ran out of memory with exception: CUDA out of memory. Tried to allocate 1.55 GiB (GPU 2; 10.92 GiB total capacity; 7.66 GiB already allocated; 945.06 MiB free; 9.36 GiB reserved in total by PyTorch)
2020-12-08 22:42:14 | WARNING | fairseq.trainer | [PyTorch CUDA memory summary tables for devices 0-3 followed here; the numeric columns were lost in this capture. Device 2 reports CUDA OOMs: 1, devices 0, 1 and 3 report CUDA OOMs: 0.]

2020-12-08 22:42:14 | WARNING | fairseq.trainer | attempting to recover from OOM in forward/backward pass

Error in fp16 training

Hi @freewym, have you had a chance to train the model with float16 precision? I experienced the following error in the swbd recipe:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "<path>/codebase/espresso/env/lib64/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "<path>/codebase/espresso/speech_train.py", line 354, in distributed_main
    main(args, init_distributed=True)
  File "<path>/codebase/espresso/speech_train.py", line 128, in main
    train(args, trainer, task, epoch_itr)
  File "<path>/codebase/espresso/speech_train.py", line 173, in train
    log_output = trainer.train_step(samples)
  File "<path>/codebase/espresso/fairseq/trainer.py", line 342, in train_step
    raise e
  File "<path>/codebase/espresso/fairseq/trainer.py", line 306, in train_step
    ignore_grad
  File "<path>/codebase/espresso/fairseq/tasks/fairseq_task.py", line 249, in train_step
    optimizer.backward(loss)
  File "<path>/codebase/espresso/fairseq/optim/fp16_optimizer.py", line 103, in backward
    loss.backward()
  File "<path>/codebase/espresso/env/lib64/python3.6/site-packages/torch/tensor.py", line 150, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "<path>/codebase/espresso/env/lib64/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: expected scalar type Float but found Half

Stream ASR

Is there any recipe for streaming/online decoding?

WER difference when decoding with different batch sizes

๐Ÿ› Bug

WER differs when decoding with different batch sizes.
As the subject says, decoding with different values of the "--batch-size" parameter results in different WERs.

To Reproduce

Just run decoding with any of the recipes (in my case no LM is used) on a test set containing only 3 utterances; the decoding results differ, i.e.:

I got the WERs below by setting batch-size to 1 (ms-1) and 3 (ms-3):

==> test.ms-1.log <==

Recognize valid_3utt with beam=4: WER=115.56%, Sub=66.67%, Ins=48.89%, Del=0.00%
WER saved in /nfs/mercury-13/u20/cli/src/espresso-11112020/espresso/expts/uke-lstm/lstm_uke/decode_valid_3utt_e_best_lp-0.9_lw-0.00/wer
                                  CER=136.92%, Sub=60.00%, Ins=75.38%, Del=1.54%
CER saved in /nfs/mercury-13/u20/cli/src/espresso-11112020/espresso/expts/uke-lstm/lstm_uke/decode_valid_3utt_e_best_lp-0.9_lw-0.00/cer

==> test.ms-3.log <==

Recognize valid_3utt with beam=4: WER=128.89%, Sub=64.44%, Ins=64.44%, Del=0.00%
WER saved in /nfs/mercury-13/u20/cli/src/espresso-11112020/espresso/expts/uke-lstm/lstm_uke/decode_valid_3utt_e_best_lp-0.9_lw-0.00/wer
                                  CER=153.85%, Sub=50.77%, Ins=100.00%, Del=3.08%

I added some debug code to dump the lprobs and found they differ with different batch sizes.

Debug code (fairseq/sequence_generator.py):

            lprobs, avg_attn_scores = self.model.forward_decoder(
                tokens[:, : step + 1],
                encoder_outs,
                incremental_states,
                self.temperature,
            )

            print("step({}) lprobs[:, :8] = {}".format(step, lprobs[:, :8]))

Debug output:

test.ms-3.log :
step(0) lprobs[:, :8] = tensor(
       [[ -9.5018,  -7.5052,  -9.4293,  -8.1257,  -9.2582,  -9.7691, -10.7347,          -8.4094],
        [ -9.5018,  -7.5052,  -9.4293,  -8.1257,  -9.2582,  -9.7691, -10.7347,          -8.4094],
        [ -9.5018,  -7.5052,  -9.4293,  -8.1257,  -9.2582,  -9.7691, -10.7347,          -8.4094],
        [ -9.5018,  -7.5052,  -9.4293,  -8.1257,  -9.2582,  -9.7691, -10.7347,          -8.4094],
        [ -9.4023,  -8.1336,  -9.3596,  -8.4404, -10.0957,  -8.6689, -11.4972,          -9.5354],
        [ -9.4023,  -8.1336,  -9.3596,  -8.4404, -10.0957,  -8.6689, -11.4972,          -9.5354],
        [ -9.4023,  -8.1336,  -9.3596,  -8.4404, -10.0957,  -8.6689, -11.4972,          -9.5354],
        [ -9.4023,  -8.1336,  -9.3596,  -8.4404, -10.0957,  -8.6689, -11.4972,          -9.5354],
        [ -8.8710,  -6.7084,  -9.0103,  -7.3384,  -8.3311,  -8.8002, -10.2661,          -8.6761],
        [ -8.8710,  -6.7084,  -9.0103,  -7.3384,  -8.3311,  -8.8002, -10.2661,          -8.6761],
        [ -8.8710,  -6.7084,  -9.0103,  -7.3384,  -8.3311,  -8.8002, -10.2661,          -8.6761],
        [ -8.8710,  -6.7084,  -9.0103,  -7.3384,  -8.3311,  -8.8002, -10.2661,          -8.6761]], device='cuda:0')
test.ms-1.log :
step(0) lprobs[:, :8] = tensor(
       [[ -8.7959,  -6.7410,  -8.9221,  -7.2738,  -8.2759,  -8.6486, -10.0568,          -8.6627],
        [ -8.7959,  -6.7410,  -8.9221,  -7.2738,  -8.2759,  -8.6486, -10.0568,          -8.6627],
        [ -8.7959,  -6.7410,  -8.9221,  -7.2738,  -8.2759,  -8.6486, -10.0568,          -8.6627],
        [ -8.7959,  -6.7410,  -8.9221,  -7.2738,  -8.2759,  -8.6486, -10.0568,          -8.6627]], device='cuda:0')
step(0) lprobs[:, :8] = tensor(
       [[ -9.4368,  -8.1758,  -9.4051,  -8.4860, -10.0284,  -8.6903, -11.5442,          -9.6263],
        [ -9.4368,  -8.1758,  -9.4051,  -8.4860, -10.0284,  -8.6903, -11.5442,          -9.6263],
        [ -9.4368,  -8.1758,  -9.4051,  -8.4860, -10.0284,  -8.6903, -11.5442,          -9.6263],
        [ -9.4368,  -8.1758,  -9.4051,  -8.4860, -10.0284,  -8.6903, -11.5442,          -9.6263]], device='cuda:0')
step(0) lprobs[:, :8] = tensor(
       [[ -9.5018,  -7.5052,  -9.4293,  -8.1257,  -9.2582,  -9.7691, -10.7347,          -8.4094],
        [ -9.5018,  -7.5052,  -9.4293,  -8.1257,  -9.2582,  -9.7691, -10.7347,          -8.4094],
        [ -9.5018,  -7.5052,  -9.4293,  -8.1257,  -9.2582,  -9.7691, -10.7347,          -8.4094],
        [ -9.5018,  -7.5052,  -9.4293,  -8.1257,  -9.2582,  -9.7691, -10.7347,          -8.4094]], device='cuda:0')

Since there are only 3 utterances in my test set and I use beam size 4 in decoding, the 12 rows of lprobs should be the same; the batch size parameter should only affect decoding throughput and how the decoder fits into GPU memory. But I found that the two outputs differ even at the first step.

Environment

  • fairseq Version (master): master
  • PyTorch Version : 1.4.0 py3.8_cuda10.0.130_cudnn7.6.3_0
  • OS (e.g., Linux): CentOS Linux release 7.8.2003 (Core)
  • Python version: 3.8.5
  • CUDA/cuDNN version: cuda10.0.130_cudnn7.6.3_0

Additional context

GPU utilization is very low

Dear All,

I'm running the asr_librispeech recipe in an Ubuntu 16.04 virtual machine with 4 V100s.

My problem is that the CPUs are all busy but the GPU utilization is always very low, so training is slow (in both the LSTM and transformer cases). How can I solve this issue?

Thanks a lot!


top - 19:31:44 up 3 days, 16:48,  1 user,  load average: 68.33, 76.75, 61.09
Tasks:  14 total,   5 running,   9 sleeping,   0 stopped,   0 zombie
%Cpu(s):  9.0 us, 25.6 sy,  0.0 ni, 65.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 52826323+total, 21278182+free, 35688764 used, 27979264+buff/cache
KiB Swap:        0 total,        0 free,        0 used. 48815801+avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                                          
34316 liao      20   0 32.798g 7.950g 541428 R 791.4  1.6 128:56.46 python3                                                                                                                                                          
34315 liao      20   0 32.940g 7.943g 532860 R 701.7  1.6 166:19.57 python3                                                                                                                                                          
34317 liao      20   0 32.643g 7.951g 542212 R 636.9  1.6 141:01.29 python3                                                                                                                                                          
34318 liao      20   0 32.706g 7.952g 542148 R 613.0  1.6 139:37.17 python3                                                                                                                                                          

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.05    Driver Version: 450.51.05    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:85:00.0 Off |                    0 |
| N/A   42C    P0    85W / 300W |   3902MiB / 16160MiB |     17%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:86:00.0 Off |                    0 |
| N/A   44C    P0    78W / 300W |   3890MiB / 16160MiB |     12%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000000:89:00.0 Off |                    0 |
| N/A   49C    P0   162W / 300W |   3902MiB / 16160MiB |      6%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
| N/A   45C    P0   169W / 300W |   3900MiB / 16160MiB |     17%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

WSJ Recipe: "wsj_data_prep.sh: Spot check of command line arguments failed"

What is your question?

I am trying to run the wsj recipe using ./run.sh but I get the following error:

Stage 0: Data Preparation
ln: failed to create symbolic link 'links/??-?.?': File exists
ln: failed to create symbolic link 'links/??-??.?': File exists
wsj_data_prep.sh: Spot check of command line arguments failed
Command line arguments must be absolute pathnames to WSJ directories
with names like 11-13.1.
Note: if you have old-style WSJ distribution,
local/cstr_wsj_data_prep.sh may work instead, see run.sh for example.

Code

./run.sh

What have you tried?

I don't see cstr_wsj_data_prep.sh in the local directory of the wsj recipe.

!ls local/cstr_wsj_data_prep.sh
ls: cannot access 'local/cstr_wsj_data_prep.sh': No such file or directory

What's your environment?

  • fairseq Version (e.g., 1.0 or master): 1.0
  • PyTorch Version (e.g., 1.0): 1.7.0+cu101
  • OS (e.g., Linux): Google Colab (Linux)
  • How you installed fairseq (pip, source): pip install --editable . in espresso source code
  • Build command you used (if compiling from source): To install Espresso commands in readme
  • Python version: 3.6
  • CUDA/cuDNN version: Not using GPU now
  • GPU models and configuration: Not using GPU now, my problem is in downloading dataset stage
  • Any other relevant information: I am using source code from master branch

Results.md request with ASR Recipes, SWBD Scores w/o LM

What is your question?

What numbers are expected with the SWBD Transformer?
I have built AM-only models with LSTM and Transformer models and get similar numbers:

LSTM: SWBD (10.4), CALLHM(20.7)
Transformer: SWBD(10.8),CALLHM(20.8)

How much improvement does SpecAug give?
I am seeing no improvement with Transformer + SpecAug compared to Transformer without SpecAug.

Could you please add a RESULTS.md file to each recipe with the current best working numbers to compare against?

Note: with an LSTM-based AM and LM, I was able to match the numbers reported in your paper.

Deprecated `AT_CHECK` in pychain module

Since I don't seem to be able to report issues in PyChain, reporting the compilation issue here:
In pychain/pytorch_binding/src/pychain.cc:23, the AT_CHECK macro seems to be already deprecated in pytorch 1.5. I had to change it to TORCH_CHECK to finish the compilation.
