
yh1008 / speech-to-text


mixlingual speech recognition system; hybrid (GMM+NNet) model; Kaldi + Keras

Home Page: http://llcao.net/cu-deeplearning17/project.html

Shell 2.42% Jupyter Notebook 91.70% Python 5.67% Perl 0.21%
kaldi dnn cnn speech-recognition speech-to-text

speech-to-text's Introduction

Mixlingual Speech Recognition

From the team:

As Chinese students studying in the United States, we found that our speaking habits had changed: English words and phrases easily slip into Chinese sentences. We feel a strong need for messaging apps that can handle multilingual speech-to-text. So in this project we develop that function: we build a model using deep learning architectures (DNN, CNN, LSTM) to correctly transcribe mixed-language audio (Chinese and English in the same sentence) into text.

- Video Demo


Directory Description

codeswitch:

Contains scripts to build our system

description:

LDC2015S04, our dataset description

notes:

Our study notes on Kaldi-related recipes, including timit and librispeech

Resources to Build the System

Data Source:

Baseline Model Paper:

Other Code-switching related Paper:

Feature Improvement related Paper:

Interesting Python Kaldi Wrapper to be examined:

Kaldi recommended recipe to be examined:

Kaldi resources:

Data Preparation:

| category | filename | pattern | format | path | source |
|---|---|---|---|---|---|
| acoustic data | spk2gender | <speakerID> <gender> | | /data/train /data/test | handmade |
| acoustic data | utt2spk | <utteranceID> <speakerID> | | /data/train /data/test | handmade |
| acoustic data | wav.scp | <utteranceID> <full_path_to_audio_file> | .scp: kaldi script file | /data/train /data/test | handmade |
| acoustic data | text | <utteranceID> <text_transcription> | .ark: kaldi archive file | /data/train /data/test | exists |
| language data | lexicon.txt | <word> <phone 1> <phone 2> ... | .ark: kaldi archive file | data/local/dict | egs/voxforge |
| language data | nonsilence_phones.txt | <phone> | | data/local/dict | unknown |
| language data | silence_phones.txt | <phone> | | data/local/dict | unknown |
| language data | optional_silence.txt | <phone> | | data/local/dict | unknown |
| tools | utils | | | / | kaldi/egs/wsj/s5 |
| tools | steps | | | / | kaldi/egs/wsj/s5 |
| tools | score.sh | | | / | kaldi/egs/voxforge/s5/local |

Language Model:

What is our language model:
3-grams trained from the transcripts of THCHS30 + LDC2015S04

directory structure taken from /egs/TIMIT/s5:

/data
  /local
    /nist_lm
      /lm_phone_bg.arpa.gz

How to build a language model:

Kaldi script utils/prepare_lang.sh

usage: utils/prepare_lang.sh <dict-src-dir> <oov-dict-entry> <tmp-dir> <lang-dir>
e.g.: utils/prepare_lang.sh data/local/dict <SPOKEN_NOISE> data/local/lang data/lang
options:
     --num-sil-states <number of states>             # default: 5, #states in silence models.
     --num-nonsil-states <number of states>          # default: 3, #states in non-silence models.
     --position-dependent-phones (true|false)        # default: true; if true, use _B, _E, _S & _I
                                                     # markers on phones to indicate word-internal positions.
     --share-silence-phones (true|false)             # default: false; if true, share pdfs of
                                                     # all non-silence phones.
     --sil-prob <probability of silence>             # default: 0.5 [must have 0 < silprob < 1]

Setting the --share-silence-phones option to true was extremely helpful for the Cantonese data of IARPA's BABEL project, where the data is very messy and has long untranscribed portions that the Kaldi developers try to align to a special phone designated for that purpose. --sil-prob may be another important option.

Preparation

  • lexicon.txt
    • The pronunciation dictionary, where every line is a word followed by its phonemic pronunciation. It only contains words (and their pronunciations) that are present in the corpus.
    • ENG: CMU dictionary
  • nonsilence_phones.txt
  • optional_silence.txt
  • silence_phones.txt
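
The table above lists the source of nonsilence_phones.txt as unknown. One possible way to produce it, sketched here as an assumption rather than as the way this project built it, is to collect every phone that appears in lexicon.txt and drop the silence phones. The function name and the toy lexicon entries are hypothetical.

```python
def phones_from_lexicon(lexicon_lines, silence_phones):
    """Collect the phones used in lexicon entries, excluding silence phones."""
    phones = set()
    for line in lexicon_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        phones.update(fields[1:])  # fields[0] is the word itself
    return sorted(phones - set(silence_phones))


# Toy lexicon in the "<word> <phone 1> <phone 2> ..." format (illustrative entries).
lexicon = [
    "hello HH AH L OW",
    "total T OW T AH L",
    "<SPOKEN_NOISE> SIL",
]

# One phone per line, as nonsilence_phones.txt expects.
print("\n".join(phones_from_lexicon(lexicon, silence_phones=["SIL"])))
```

The sketch writes one phone per line; grouping stress or tone variants of the same phone on one line is left out.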

MFCC Feature Extraction:

   echo
   echo "===== FEATURES EXTRACTION ====="
   echo
 
   # Making feats.scp files
   mfccdir=mfcc
   # Uncomment and modify arguments in scripts below if you have any problems with data sorting
   # utils/validate_data_dir.sh data/train     # script for checking prepared data - here: for data/train directory
   # utils/fix_data_dir.sh data/train          # tool for data proper sorting if needed - here: for data/train directory
   steps/make_mfcc.sh --nj $nj --cmd "$train_cmd" data/train exp/make_mfcc/train $mfccdir
   steps/make_mfcc.sh --nj $nj --cmd "$train_cmd" data/test exp/make_mfcc/test $mfccdir
  
   # Making cmvn.scp files
   steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train $mfccdir
   steps/compute_cmvn_stats.sh data/test exp/make_mfcc/test $mfccdir

MFCC-related documents

HMM - GMM

Reference

a: the transition probability from state i to state j
b: the emission probability of observation x in state j

The forward-backward algorithm fine-tunes a

The GMM provides b

HMM solves the following three problems:

  1. overall likelihood (Forward algorithm): determine the likelihood of an observation sequence X=(x1, x2, ... xT) being generated by an HMM
  2. training (Forward-backward algorithm, EM): given an observation sequence, learn the best lambda (the HMM parameters, including a and b)
  3. decoding (Viterbi algorithm): given an observation sequence, determine the most probable hidden state sequence
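
Problems 1 and 3 can be sketched on a toy HMM. This is a minimal illustration, not Kaldi's implementation: a discrete emission table b stands in for the GMM, pi (the initial state probabilities) is an added assumption, and all numbers are made up.

```python
def forward_likelihood(pi, a, b, obs):
    """Problem 1: P(X | lambda), summing over all hidden state paths."""
    n = len(pi)
    alpha = [pi[j] * b[j][obs[0]] for j in range(n)]
    for x in obs[1:]:
        alpha = [sum(alpha[i] * a[i][j] for i in range(n)) * b[j][x]
                 for j in range(n)]
    return sum(alpha)


def viterbi(pi, a, b, obs):
    """Problem 3: the single most probable hidden state sequence."""
    n = len(pi)
    delta = [pi[j] * b[j][obs[0]] for j in range(n)]
    backpointers = []
    for x in obs[1:]:
        prev = delta
        # For each state j, remember the best predecessor state.
        best = [max(range(n), key=lambda i: prev[i] * a[i][j]) for j in range(n)]
        delta = [prev[best[j]] * a[best[j]][j] * b[j][x] for j in range(n)]
        backpointers.append(best)
    state = max(range(n), key=lambda j: delta[j])
    path = [state]
    for best in reversed(backpointers):
        state = best[state]
        path.append(state)
    return path[::-1]


pi = [0.6, 0.4]                             # initial state probabilities
a = [[0.7, 0.3], [0.4, 0.6]]                # a[i][j]: transition i -> j
b = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]      # b[j][x]: emission of symbol x in state j
obs = [0, 1, 2]

print(forward_likelihood(pi, a, b, obs))
print(viterbi(pi, a, b, obs))
```

Real decoders work with log probabilities to avoid underflow on long sequences; plain products are used here only to keep the recursion readable.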

CNN and MFSC features

To train a CNN, we need to extract MFSC features from the acoustic data instead of MFCC features, because the Discrete Cosine Transform (DCT) in MFCC destroys locality. MFSC features are also called filter banks (fbank). In Kaldi, the scripts look something like the following:

steps/make_fbank.sh --nj 3 $trainDir/train_clean_fbank exp/make_fbank/train_clean_fbank feat/fbank/ || exit 1;
steps/compute_cmvn_stats.sh $trainDir/train_clean_fbank exp/make_fbank/train_clean_fbank feat/fbank/ || exit 1;

Note that fbank features don't work well with GMMs: fbank features are highly correlated, while GMMs modelled with diagonal covariance matrices assume that the feature dimensions are independent. fbank/MFSC features are okay with DNNs and best for CNNs.
Why MFSC+GMM produced high WER: see Kaldi discussion
Why DCT destroys locality: see post
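
A small numeric sketch of the locality argument, using a plain unnormalized DCT-II rather than Kaldi's MFCC pipeline (an assumption made for brevity): energy confined to a single filter-bank band ends up spread over every cepstral coefficient, so neighbouring coefficients no longer correspond to neighbouring frequencies.

```python
import math


def dct2(x):
    """Unnormalized DCT-II, the kind of transform applied to log filter-bank energies."""
    n = len(x)
    return [sum(x[m] * math.cos(math.pi * k * (m + 0.5) / n) for m in range(n))
            for k in range(n)]


# Energy localized in one band of an 8-band filter bank (illustrative values).
fbank = [0.0] * 8
fbank[2] = 1.0

cepstra = dct2(fbank)
nonzero = sum(1 for c in cepstra if abs(c) > 1e-9)

print("non-zero filter-bank bands:", sum(1 for v in fbank if v != 0.0))
print("non-zero cepstral coefficients:", nonzero)
```

A convolution filter sliding along the frequency axis can pick up the single active band in the fbank vector, but there is no comparable local pattern in the cepstral vector.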

Required Packages

tensorflow == 1.1.0
theano == 0.9.0.dev-c697eeab84e5b8a74908da654b66ec9eca4f1291
keras == 1.2

Run Kaldi on single GPU

This doesn't require Sun GridEngine. Simply download the [CUDA toolkit](https://developer.nvidia.com/cuda-downloads) and install it with

sudo sh cuda_8.0.61_375.26_linux.run

then go to kaldi/src and execute

./configure

to check whether it detects CUDA; you should also find CUDA = true in kaldi/src/kaldi.mk. Then recompile Kaldi with

make depend -j 8 # 8 for 8-core cpu
make -j 8 # 8 for 8-core cpu

Note that GMM-based training and decoding are not supported on the GPU; only nnet is. source

** If you are using an AWS g2.2xlarge instance launched before 2017-04-18 (when this note was written), its NVIDIA GPU may need a legacy 367.x driver; the default (latest) 375 driver that comes with CUDA 8 (cuda_8.0.61_375.26_linux.run) will fail. To list the driver versions available for installation, type

apt-cache search nvidia | grep -P '^nvidia-[0-9]+\s'

to install a version of your choice from the list, type

sudo apt-get install nvidia-367

You can also download a specific version from the web, for example NVIDIA-Linux-x86_64-367.18.run. Install it with

sudo sh NVIDIA-Linux-x86_64-367.18.run

Then, when you install cuda_8.0.61_375.26_linux.run, it will ask whether to install NVIDIA driver 375; make sure you choose no.

Install tensorflow-gpu

Required:

  1. install CUDA toolkit 8.0 (as of 04-18-2017)
  2. install cuDNN v5 (download); as of 04-18-2017, Tensorflow performs best with cuDNN 5.x
    Follow the commands on the Tensorflow website carefully. After installation, you can test whether tensorflow detects your gpu by typing the following:
# make sure you are outside the tensorflow git repo
python
>>> import tensorflow as tf
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

A working tensorflow will output:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:04.0
Total memory: 11.17GiB
Free memory: 11.11GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K80, pci bus id: 0000:00:04.0
I tensorflow/core/common_runtime/direct_session.cc:257] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K80, pci bus id: 0000:00:04.0

  1. During testing, if you run into an error like:
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcudnn.so.5. LD_LIBRARY_PATH: /usr/local/cuda/lib64
I tensorflow/stream_executor/cuda/cuda_dnn.cc:3517] Unable to load cuDNN DSO

in the writer's experience, the LD_LIBRARY_PATH set in the ~/.profile file is not right. Find where libcudnn.so.5 is located and move it to the expected location, most likely /usr/local/cuda. Also make sure you run source ~/.profile after modifying the file so that the change takes effect.

  2. If you are testing in a python shell and you hit the following error:
ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory

very likely you are inside the actual tensorflow git repo (source); make sure you leave it before testing.

Install Theano GPU

Keras-kaldi's LSTM training script breaks under the current tensorflow (tensorflow went through a series of API changes during the previous months), so we need to install Theano GPU and switch to the theano backend to run run_kt_LSTM.sh.
After installing Theano-gpu using miniconda, you can create .theanorc, the theano config file, with the following command:

echo -e "\n[global]\nfloatX=float32\n" >> ~/.theanorc

and add device=gpu to this file. If theano can't detect NVCC and gives you the following error:

ERROR (theano.sandbox.cuda): nvcc compiler not found on $PATH. Check your nvcc installation and try again.

(but you are sure that you installed CUDA), you can solve it by adding the following lines to ~/.profile:

export PATH=/usr/local/cuda-8.0/bin/:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH

Don't forget to source ~/.profile to enable the change.
To change the keras backend from tensorflow to theano, edit the keras config file:

vim $HOME/.keras/keras.json
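
The field to change in keras.json is "backend". As an alternative to editing by hand, here is a minimal sketch of a helper (the function name is hypothetical) that sets the field and leaves the other settings untouched. The demo runs against a throwaway file, not the real ~/.keras/keras.json.

```python
import json
import os
import tempfile


def set_keras_backend(config_path, backend="theano"):
    """Set the "backend" field of a keras.json file, keeping other fields as they are."""
    config = {}
    if os.path.exists(config_path):
        with open(config_path) as f:
            config = json.load(f)
    config["backend"] = backend
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config


# Demonstrate on a temporary copy with illustrative contents.
demo_path = os.path.join(tempfile.mkdtemp(), "keras.json")
with open(demo_path, "w") as f:
    json.dump({"backend": "tensorflow", "floatx": "float32"}, f)

print(set_keras_backend(demo_path))
```

To apply it for real, the call would be set_keras_backend(os.path.expanduser("~/.keras/keras.json")).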

to test if theano is indeed using gpu, execute the following file:

from theano import function, config, shared, tensor
import numpy
import time
vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, tensor.Elemwise) and
              ('Gpu' not in type(x.op).__name__)
              for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

Kaldi script to train nnet

  1. 3-4 hours to train, 3 hours to decode on GPU:
    local/online/run_nnet2_baseline.sh

Chinese CER (Character Error Rate)

  1. egs/hkust/s5/local/ext/score.sh

Keras-Kaldi

dspavankumar/keras-kaldi github repo
At the time we ran his code, the environment was still Keras 1.2.0. Make sure that the Keras version is the same across machines. To downgrade Keras from 2.0.3 to the older version, type

$ sudo pip3 install keras==1.2
or 
$ conda install keras==1.2.2 # if you are using conda

If there is a version inconsistency (the model is trained with 1.2.0 but decoded with 2.0.3), you will run into a problem when loading an existing model:

  File "steps_kt/nnet-forward.py", line 33, in <module>
    m = keras.models.load_model (model)
  File "/usr/local/lib/python3.5/dist-packages/keras/models.py", line 281, in load_model
    Error: “Optimizer weight shape (1024, ) not compatible with provided weight shape (429,1024)”

source

speech-to-text's People

Contributors

kailichen, wendywangwwt, yh1008


speech-to-text's Issues

local/score.sh does not exist

Not scoring because local/score.sh does not exist or not executable

I think we could cp a local/score.sh from other recipes, but in our case we have both English and Mandarin, so we will very likely need to write our own scoring method to calculate the Mix Error Rate.
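
One way such a mixed scoring method could look, as a rough sketch rather than a replacement for a real local/score.sh: count each Chinese character and each English word as one token, then compute the edit distance between reference and hypothesis. The tokenization rule and the function names are assumptions.

```python
import re

# One token per CJK character, or per run of Latin letters, digits and apostrophes.
TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9']+")


def mixed_tokens(text):
    return TOKEN.findall(text.lower())


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def mixed_error_rate(ref_text, hyp_text):
    ref = mixed_tokens(ref_text)
    if not ref:
        raise ValueError("reference has no tokens")
    return edit_distance(ref, mixed_tokens(hyp_text)) / float(len(ref))


reference = "then area five 的 total 是"
hypothesis = "then area five 的 total 事"
print(mixed_error_rate(reference, hypothesis))
```

With this rule a hyphenated word such as K-pop is split into two tokens; whether that is desirable depends on how the lexicon treats such words.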

need to trim audio files?!

In our current code-switching dataset, a single audio file has several partial transcriptions, each with a start and end time in milliseconds; for example:

given a piece of audio 01NC01FBX_0101.flac, we have its transcriptions as the following:
01NC01FBX_0101 86300 88370 then area five 的 total 是
01NC01FBX_0101 165090 167860 不懂 but official result 还没有 出 i think 出了 他们 就会

Here 01NC01FBX_0101.flac is a single audio file with multiple transcriptions for different time spans (86300-88370 ms and 165090-167860 ms).

However, it seems that Kaldi expects each utterance (one audio file) to carry exactly one transcription. If that is indeed the case, we need to trim the existing audio into separate files for Kaldi to process. In the above example, we need to create 2 files
01NC01FBX_0101_86300_88370
and
01NC01FBX_0101_165090_167860
out of the original 01NC01FBX_0101.flac
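
Instead of physically trimming the audio, Kaldi data directories can carry a segments file, where each line reads `<utterance-id> <recording-id> <start-seconds> <end-seconds>`. The sketch below turns transcript lines in the format shown above into segments and text entries. The function name is hypothetical, and the 7-digit zero padding of the millisecond offsets is an assumption chosen so that IDs sort consistently.

```python
def transcript_to_kaldi(lines):
    """Convert '<recording> <start_ms> <end_ms> <words...>' lines to segments/text entries."""
    segments, text = [], []
    for line in lines:
        recording, start_ms, end_ms, words = line.strip().split(None, 3)
        start_ms, end_ms = int(start_ms), int(end_ms)
        utt_id = "%s_%07d_%07d" % (recording, start_ms, end_ms)
        # The segments file expects times in seconds.
        segments.append("%s %s %.3f %.3f" % (utt_id, recording,
                                             start_ms / 1000.0, end_ms / 1000.0))
        text.append("%s %s" % (utt_id, words))
    return segments, text


transcript = [
    "01NC01FBX_0101 86300 88370 then area five 的 total 是",
    "01NC01FBX_0101 165090 167860 不懂 but official result 还没有 出 i think 出了 他们 就会",
]

segments, text = transcript_to_kaldi(transcript)
print("\n".join(segments))
print("\n".join(text))
```

wav.scp then needs only one entry per recording (here 01NC01FBX_0101). Prefixing the utterance ID with a speaker ID is left out of this sketch.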

WARNING: triphone decode

seen warning:
optional-silence SIL is seen only 79.5256679676% of the time at utterance begin. This may not be optimal.

Command:

steps/decode.sh --nj 8 --cmd run.pl exp/tri1/graph data/test exp/tri1/decode

Outputs:

decode.sh: feature type is delta
steps/diagnostic/analyze_lats.sh --cmd run.pl exp/tri1/graph exp/tri1/decode
analyze_phone_length_stats.py: WARNING: optional-silence SIL is seen only 79.5256679676% of the time at utterance begin.  This may not be optimal.
steps/diagnostic/analyze_lats.sh: see stats in exp/tri1/decode/log/analyze_alignments.log
Overall, lattice depth (10,50,90-percentile)=(4,36,220) and mean=87.0
steps/diagnostic/analyze_lats.sh: see stats in exp/tri1/decode/log/analyze_lattice_depth_stats.log
Not scoring because local/score.sh does not exist or not executable.

utt2spk not sorted

error: utils/validate_data_dir.sh: file data/train/utt2spk is not in sorted order or has duplicates
needs to be fixed!

utils/fix_data_dir.sh data/train
utils/fix_data_dir.sh: file data/train/utt2spk is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file data/train/spk2utt is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file data/train/text is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file data/train/segments is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file data/train/wav.scp is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file data/train/spk2gender is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: filtered data/train/segments from 36990 to 36940 lines based on filter /tmp/kaldi.VKzR/recordings.
utils/fix_data_dir.sh: filtered data/train/wav.scp from 193 to 192 lines based on filter /tmp/kaldi.VKzR/recordings.

  • data/train/utt2spk differ: char 3299, line 98

utt2spk is not in sorted order (fix this yourself)
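
Kaldi expects these files to be sorted in C-locale (byte) order with unique keys. A minimal sketch for locating the offending lines before running fix_data_dir.sh; the function names and sample lines are hypothetical.

```python
def find_sort_problems(lines):
    """Return (line_number, key) pairs where a key is not strictly greater than the previous one."""
    problems = []
    prev = None
    for number, line in enumerate(lines, 1):
        if not line.strip():
            continue
        key = line.split()[0].encode("utf-8")  # compare bytes, like LC_ALL=C sort
        if prev is not None and key <= prev:
            problems.append((number, key.decode("utf-8")))
        prev = key
    return problems


def sort_c_locale(lines):
    """Sort lines by byte order and drop exact duplicate lines."""
    return sorted(set(lines), key=lambda line: line.encode("utf-8"))


utt2spk = [
    "spkA-utt2 spkA",
    "spkA-utt1 spkA",
    "spkB-utt1 spkB",
]

print(find_sort_problems(utt2spk))
print(sort_c_locale(utt2spk))
```

A key that repeats is also reported, since the comparison requires each key to be strictly greater than the one before it.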

dataset?

  1. Where to find Mandarin and English speech dataset?

Mal-formed spk2gender file

$ utils/validate_data_dir.sh --no-feats data/test

Mal-formed spk2gender file

right now the utt2spk shows
less data/train/utt2spk | head -2
01FA-UI01FAZ_0101_0004721_0007863 01FA
01FA-UI01FAZ_0101_0008686_0012571 01FA

and spk2gender looks like
less data/train/spk2gender | head -2
01FA F
02FA F

I might need to make 01FA-UI01FAZ this entire string as speaker_id
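
Another thing that may be worth checking, stated here as an assumption to verify against the local copy of utils/validate_data_dir.sh: the validator may accept only lowercase m or f in the gender column, in which case the uppercase F above would be what triggers the message. A small sketch of such a check (function name hypothetical):

```python
def check_spk2gender(utt2spk_lines, spk2gender_lines):
    """List problems in spk2gender, assuming each line must be '<speakerID> m' or '<speakerID> f'."""
    speakers = {line.split()[1] for line in utt2spk_lines if line.strip()}
    problems = []
    seen = set()
    for line in spk2gender_lines:
        fields = line.split()
        if len(fields) != 2:
            problems.append("expected 2 fields: %r" % line)
            continue
        seen.add(fields[0])
        if fields[1] not in ("m", "f"):  # assumed rule: lowercase only
            problems.append("gender should be m or f: %r" % line)
    for speaker in sorted(speakers - seen):
        problems.append("speaker %s has no gender entry" % speaker)
    return problems


utt2spk = [
    "01FA-UI01FAZ_0101_0004721_0007863 01FA",
    "01FA-UI01FAZ_0101_0008686_0012571 01FA",
]
spk2gender = ["01FA F", "02FA F"]

for problem in check_spk2gender(utt2spk, spk2gender):
    print(problem)
```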

WARNING: tree has pdf with no stats

During triphone training, there is a long sequence of warnings complaining that Tree has pdf-id x with no stats, like the following:

WARNING (gmm-init-model[5.0.61~1-37b53]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 3 with no stats; corresponding phone list: 16 17 18 19 20 
...
WARNING (gmm-init-model[5.0.61~1-37b53]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 442 with no stats; corresponding phone list: 1939 1940 1941 1942 
** The warnings above about 'no stats' generally mean you have phones **
** (or groups of phones) in your phone set that had no corresponding data. **
** You should probably figure out whether something went wrong, **
** or whether your data just doesn't happen to have examples of those **
** phones. **

in total, there are

2777 warnings in exp/tri1/log/acc.*.*.log
6578 warnings in exp/tri1/log/align.*.*.log
211 warnings in exp/tri1/log/questions.log
7328 warnings in exp/tri1/log/update.*.log
234 warnings in exp/tri1/log/init_model.log
1 warnings in exp/tri1/log/build_tree.log

ERROR: phone symbol tables data/lang/phones.txt and exp/mono_ali/phones.txt are not compatible.

after running command:

steps/train_deltas.sh --cmd run.pl 1000 11000 data/train data/lang exp/mono_ali exp/tri1

there is this error:

ERROR (make-h-transducer[5.1.46~1-0d031]:TopologyForPhone():hmm-topology.cc:333) TopologyForPhone(), phone 2005 not covered.

full terminal output: 
steps/train_deltas.sh --cmd run.pl 1000 11000 data/train data/lang exp/mono_ali exp/tri1
utils/lang/check_phones_compatible.sh: phone symbol tables data/lang/phones.txt and exp/mono_ali/phones.txt are not compatible.
tree-info exp/tri1/tree 
tree-info exp/tri1/tree 
fsttablecompose data/lang/L_disambig.fst data/lang/G.fst 
fstdeterminizestar --use-log=true 
fstminimizeencoded 
fstpushspecial 
WARNING (fstpushspecial[5.1.46~1-0d031]:Iterate():push-special.cc:182) push-special: finished 200 iterations without converging.  Output will be inaccurate.
fstisstochastic data/lang/tmp/LG.fst 
-0.0544016 -0.0722563
[info]: LG not stochastic.
fstcomposecontext --context-size=3 --central-position=1 --read-disambig-syms=data/lang/phones/disambig.int --write-disambig-syms=data/lang/tmp/disambig_ilabels_3_1.int data/lang/tmp/ilabels_3_1.22095 
fstisstochastic data/lang/tmp/CLG_3_1.fst 
0 -0.0722563
[info]: CLG not stochastic.
make-h-transducer --disambig-syms-out=exp/tri1/graph/disambig_tid.int --transition-scale=1.0 data/lang/tmp/ilabels_3_1 exp/tri1/tree exp/tri1/final.mdl 
ERROR (make-h-transducer[5.1.46~1-0d031]:TopologyForPhone():hmm-topology.cc:333) TopologyForPhone(), phone 2005 not covered.

[ Stack-Trace: ]
make-h-transducer() [0x87532c]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::HmmTopology::TopologyForPhone(int) const
kaldi::GetHmmAsFst(std::vector<int, std::allocator<int> >, kaldi::ContextDependencyInterface const&, kaldi::TransitionModel const&, kaldi::HTransducerConfig const&, std::unordered_map<std::pair<int, std::vector<int, std::allocator<int> > >, fst::VectorFst<fst::ArcTpl<fst::TropicalWeightTpl<float> >, fst::VectorState<fst::ArcTpl<fst::TropicalWeightTpl<float> >, std::allocator<fst::ArcTpl<fst::TropicalWeightTpl<float> > > > >*, kaldi::HmmCacheHash, std::equal_to<std::pair<int, std::vector<int, std::allocator<int> > > >, std::allocator<std::pair<std::pair<int, std::vector<int, std::allocator<int> > > const, fst::VectorFst<fst::ArcTpl<fst::TropicalWeightTpl<float> >, fst::VectorState<fst::ArcTpl<fst::TropicalWeightTpl<float> >, std::allocator<fst::ArcTpl<fst::TropicalWeightTpl<float> > > > >*> > >*)
kaldi::GetHTransducer(std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, kaldi::ContextDependencyInterface const&, kaldi::TransitionModel const&, kaldi::HTransducerConfig const&, std::vector<int, std::allocator<int> >*)
main
__libc_start_main
_start
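
To see how the two phone symbol tables differ, they can be compared directly. One possible cause, which is an assumption and not confirmed in this issue, is that data/lang was regenerated after exp/mono_ali had been produced. The helper names and toy tables below are hypothetical; in practice the lines would be read from data/lang/phones.txt and exp/mono_ali/phones.txt.

```python
def read_symbol_table(lines):
    """Parse '<symbol> <integer id>' lines into a dict."""
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            table[fields[0]] = int(fields[1])
    return table


def diff_symbol_tables(left, right):
    """Return symbols only in left, symbols only in right, and symbols whose ids differ."""
    only_left = sorted(set(left) - set(right))
    only_right = sorted(set(right) - set(left))
    renumbered = sorted(s for s in set(left) & set(right) if left[s] != right[s])
    return only_left, only_right, renumbered


# Toy tables standing in for the two phones.txt files.
lang_phones = read_symbol_table(["<eps> 0", "SIL 1", "a1 2", "b 3"])
ali_phones = read_symbol_table(["<eps> 0", "SIL 1", "b 2", "zh 3"])

only_lang, only_ali, renumbered = diff_symbol_tables(lang_phones, ali_phones)
print("only in data/lang:", only_lang)
print("only in exp/mono_ali:", only_ali)
print("same symbol, different id:", renumbered)
```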

WARNING: arpa2fst

To generate G.fst I executed

arpa2fst --disambig-symbol=#0 --read-symbol-table=$lang/words.txt $local/tmp/lm.arpa $lang/G.fst

which outputs the following warning:

yh2901@instance-1:~/kaldi/egs/codeswitch$ ./make_graph.sh 

===== MAKING G.fst =====

arpa2fst --disambig-symbol=#0 --read-symbol-table=data/lang/words.txt data/local/tmp/lm.arpa data/lang/G.fst 
LOG (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:96) Reading \data\ section.
LOG (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:151) Reading \1-grams: section.
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 11 [-5.472714	-ying	-0.3005793] skipped: word '-ying' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 21 [-5.472714	Archi	-0.2663992] skipped: word 'Archi' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 23 [-5.472714	Beijing	-0.2994594] skipped: word 'Beijing' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 25 [-5.472714	Cers	-0.3004402] skipped: word 'Cers' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 28 [-5.472714	Deutsche	-0.3009956] skipped: word 'Deutsche' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 35 [-5.472714	Intel	-0.3010285] skipped: word 'Intel' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 36 [-5.472714	Inter	-0.3004702] skipped: word 'Inter' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 38 [-5.296623	J-Cs	-0.2981122] skipped: word 'J-Cs' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 40 [-4.732351	K-box	-0.281] skipped: word 'K-box' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 41 [-5.171684	K-pop	-0.2621106] skipped: word 'K-pop' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 47 [-5.171684	Malaysia	-0.2552751] skipped: word 'Malaysia' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 48 [-5.296623	Mochik	-0.2662709] skipped: word 'Mochik' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 53 [-5.472714	Psychometric	-0.3009759] skipped: word 'Psychometric' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 179 [-5.472714	Shanghai	-0.3010285] skipped: word 'Shanghai' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 180 [-5.472714	Shearwood	-0.2965806] skipped: word 'Shearwood' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 181 [-5.472714	Suzhou	-0.2997037] skipped: word 'Suzhou' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 182 [-5.472714	Swensens	-0.299923] skipped: word 'Swensens' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 184 [-5.472714	T-shirt	-0.3006591] skipped: word 'T-shirt' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 193 [-5.472714	[di]	-0.3010029] skipped: word '[di]' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 194 [-5.472714	[gi]	-0.301] skipped: word '[gi]' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 195 [-5.472714	[uh-huh]	-0.3009854] skipped: word '[uh-huh]' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 198 [-2.450492	a	-0.2567758] skipped: word 'a' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 199 [-5.472714	a-famosa	-0.2955976] skipped: word 'a-famosa' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 200 [-5.472714	aback	-0.2994594] skipped: word 'aback' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 201 [-5.472714	abalone	-0.3009656] skipped: word 'abalone' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 202 [-5.472714	abandoned	-0.3008771] skipped: word 'abandoned' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 203 [-5.472714	abduct	-0.2921268] skipped: word 'abduct' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 204 [-5.472714	abiding	-0.2994594] skipped: word 'abiding' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 205 [-5.472714	abilities	-0.3001809] skipped: word 'abilities' not in symbol table
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:219) line 206 [-5.074774	ability	-0.4649765] skipped: word 'ability' not in symbol table
LOG (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:151) Reading \2-grams: section.
LOG (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:151) Reading \3-grams: section.
WARNING (arpa2fst[5.0.61~1-37b53]:Read():arpa-file-parser.cc:259) Of 161603 parse warnings, 30 were reported. Run program with --max_warnings=-1 to see all warnings
LOG (arpa2fst[5.0.61~1-37b53]:RemoveRedundantStates():arpa-lm-compiler.cc:355) Reduced num-states from 91509 to 15997

Need to examine whether these "word not in symbol table" warnings can be fixed (or whether they matter at all)
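
A sketch for measuring the mismatch: list the unigrams in the ARPA file that are absent from words.txt, and flag the ones that would match after changing case. Because even common words such as "a" and "ability" are skipped, a case mismatch between the language model and the lexicon is one hypothesis worth testing; it is not confirmed here. Function names and sample data are hypothetical.

```python
def arpa_unigrams(lines):
    """Yield the words listed in the \\1-grams: section of an ARPA file."""
    in_unigrams = False
    for line in lines:
        line = line.strip()
        if line.startswith("\\"):
            in_unigrams = (line == "\\1-grams:")
            continue
        fields = line.split()
        if in_unigrams and len(fields) >= 2:
            yield fields[1]  # fields: logprob, word, optional backoff


def missing_words(arpa_lines, words_txt_lines):
    known = {line.split()[0] for line in words_txt_lines if line.strip()}
    missing = [w for w in arpa_unigrams(arpa_lines) if w not in known]
    case_only = [w for w in missing if w.upper() in known or w.lower() in known]
    return missing, case_only


arpa = [
    "\\data\\", "ngram 1=5", "",
    "\\1-grams:",
    "-5.472714\tBeijing\t-0.2994594",
    "-5.171684\tK-pop\t-0.2621106",
    "-2.450492\ta\t-0.2567758",
    "-1.0\t</s>",
    "-99\t<s>\t-0.3",
    "",
    "\\2-grams:", "-0.5\ta Beijing", "",
    "\\end\\",
]
words_txt = ["<eps> 0", "A 1", "BEIJING 2", "#0 3", "<s> 4", "</s> 5"]

missing, case_only = missing_words(arpa, words_txt)
print("missing:", missing)
print("would match after case change:", case_only)
```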

mkgraph.sh: expected data/lang/G.fst to exist

The above error occurs when calling utils/mkgraph.sh, like the following:

utils/mkgraph.sh data/lang exp/mono_10k exp/mono_10k/graph

or

utils/mkgraph.sh data/lang exp/tri1 exp/tri1/graph

The monophone (exp/mono_10k) and triphone (exp/tri1) systems are trained by the following scripts:

steps/train_mono.sh --boost-silence 1.25 --nj 8 --cmd run.pl \
data/train_10k data/lang exp/mono_10k

and

steps/train_deltas.sh  2000 11000 data/train data/lang exp/mono_ali exp/tri1

fail to create mfcc feature

$ steps/make_mfcc.sh --nj 8 data/train exp/make_mfcc/train mfcc

utils/validate_data_dir.sh: Successfully validated data-directory data/train
steps/make_mfcc.sh [info]: segments file exists: using that.
run.pl: 8 / 8 failed, log is in exp/make_mfcc/train/make_mfcc_train.*.log

$ less exp/make_mfcc/train/make_mfcc_train.*.log
shows

extract-segments scp,p:data/train/wav.scp exp/make_mfcc/segments.1 ark:- | compute-mfcc-feats --verbose=2 --config=conf/mfcc.conf ark:- ark:- | copy-feats --compress=true ark:- ark,scp:/home/yh2901/kaldi/egs/codeswitch/mfcc/raw_mfcc_train.1.ark,/home/yh2901/kaldi/egs/codeswitch/mfcc/raw_mfcc_train.1.scp 
Started at Mon Mar 27 18:20:09 UTC 2017

compute-mfcc-feats --verbose=2 --config=conf/mfcc.conf ark:- ark:- 
copy-feats --compress=true ark:- ark,scp:/home/yh2901/kaldi/egs/codeswitch/mfcc/raw_mfcc_train.1.ark,/home/yh2901/kaldi/egs/codeswitch/mfcc/raw_mfcc_train.1.scp 
extract-segments scp,p:data/train/wav.scp exp/make_mfcc/segments.1 ark:- 

NI02FAX_0101.flac: ERROR initializing decoder
                   init status = FLAC__STREAM_DECODER_INIT_STATUS_ERROR_OPENING_FILE

An error occurred opening the input file; it is likely that it does not exist
or is not readable.
ERROR (extract-segments[5.0.61~1-37b53]:Read4ByteTag():wave-reader.cc:75) WaveData: expected 4-byte chunk-name, got read errror

[ Stack-Trace: ]
extract-segments() [0x5193f4]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::WaveData::Read4ByteTag(std::istream&, char*)
kaldi::WaveData::Read(std::istream&, kaldi::WaveData::ReadDataType)
kaldi::WaveHolder::Read(std::istream&)
kaldi::RandomAccessTableReaderScriptImpl<kaldi::WaveHolder>::HasKeyInternal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
kaldi::RandomAccessTableReaderScriptImpl<kaldi::WaveHolder>::HasKey(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
kaldi::RandomAccessTableReader<kaldi::WaveHolder>::HasKey(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
main
__libc_start_main
_start

WARNING (extract-segments[5.0.61~1-37b53]:Read():feat/wave-reader.h:165) Exception caught in WaveHolder object (reading). 
WARNING (extract-segments[5.0.61~1-37b53]:HasKeyInternal():util/kaldi-table-inl.h:1792) Error reading object from stream 'flac -c -d -s /home/yh2901/kaldi/egs/codeswitch/interview_audio/train/NI02FAX/NI02FAX_0101.flac |'
WARNING (extract-segments[5.0.61~1-37b53]:main():extract-segments.cc:126) Could not find recording NI02FAX_0101, skipping segment NI02FAX_0101_0055711_0060021
WARNING (extract-segments[5.0.61~1-37b53]:Close():kaldi-io.cc:501) Pipe flac -c -d -s /home/yh2901/kaldi/egs/codeswitch/interview_audio/train/NI02FAX/NI02FAX_0101.flac | had nonzero return status 256
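
Since the log says the input file "does not exist or is not readable", it may help to check every wav.scp entry before extracting features. The sketch below handles both plain paths and piped commands like the flac one above. The function names are hypothetical, and paths containing spaces are not handled.

```python
import os

AUDIO_EXTENSIONS = (".flac", ".wav", ".sph")


def audio_path(wav_scp_line):
    """Return (recording_id, audio_path) for one wav.scp line; path is None if none is found."""
    rec_id, rest = wav_scp_line.strip().split(None, 1)
    if rest.endswith("|"):
        # Piped command: take the last token that looks like an audio file.
        tokens = rest.rstrip("|").split()
        candidates = [t for t in tokens if t.lower().endswith(AUDIO_EXTENSIONS)]
        return rec_id, (candidates[-1] if candidates else None)
    return rec_id, rest


def missing_recordings(wav_scp_lines, exists=os.path.exists):
    missing = []
    for line in wav_scp_lines:
        if not line.strip():
            continue
        rec_id, path = audio_path(line)
        if path is None or not exists(path):
            missing.append((rec_id, path))
    return missing


# Illustrative entries; a stand-in for os.path.exists keeps the demo filesystem-independent.
wav_scp = [
    "REC1 flac -c -d -s /data/audio/REC1.flac |",
    "REC2 flac -c -d -s /data/audio/REC2.flac |",
    "REC3 /data/audio/REC3.wav",
]
present = {"/data/audio/REC1.flac", "/data/audio/REC3.wav"}

print(missing_recordings(wav_scp, exists=lambda path: path in present))
```

Against a real data directory, calling missing_recordings(open("data/train/wav.scp")) with the default exists check would list the recordings whose audio cannot be found.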
