Hi, I want to use this demo for Chinese speech recognition. The example I am using is thchs30.


stm file and glm file, about mravanelli/pytorch-kaldi

TParcollet commented on May 17, 2024

Hi, I checked the thchs30 Kaldi recipe and saw that decoding is based on Kaldi's standard steps/decode.sh, so it should work with our toolkit. When you run the standard Kaldi recipe, do you obtain the WER / PER at the end of decoding? If so, the decoder is able to find the stm and glm files, which should therefore exist somewhere?

from pytorch-kaldi.

TParcollet commented on May 17, 2024

OK, so the scoring part of the thchs30 recipe is custom. Check local/score.sh in the Kaldi recipe. You might need to call this script for scoring instead of ours.


Johe-cqu commented on May 17, 2024

I am very glad that you could reply so quickly. Do I need to change the score.sh in your demo to follow local/score.sh? Can you give me some tips about this?


TParcollet commented on May 17, 2024

Exactly, you have to replace this call with a call to the right score.sh file.


Johe-cqu commented on May 17, 2024

Thank you, I will try it.


subash-khanal commented on May 17, 2024

Hi, I have the same problem when decoding and scoring the test dataset. The Kaldi part of my experiment only generates alignments and graphs. The data preparation I performed does not create stm and glm files. I am using TIMIT_MLP_basic as the configuration file. If I create stm and glm files only for the test dataset and store them in the test data directory, will that solve the problem, or do other files need to be created?


TParcollet commented on May 17, 2024

What is your dataset?


subash-khanal commented on May 17, 2024

My dataset is EMAMAE. It has both acoustic and articulatory features for speech. With regards, Subash


Johe-cqu commented on May 17, 2024

To solve this problem, I just replaced the default decoding script with the custom decode script provided in the thchs30 recipe.
But I think that won't solve your problem.


subash-khanal commented on May 17, 2024

I created my own stm and glm files for my test dataset (using TIMIT as a reference), then used the TIMIT_MLP_basic configuration as it is. It worked. Thanks.
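For readers wondering what such an stm file might look like, here is a minimal sketch. It assumes the segment-time-mark field order documented for NIST sclite (file, channel, speaker, begin time, end time, transcript) and one segment per utterance starting at 0.0. The function name, the dictionaries, and the example utterance are hypothetical; this is not necessarily how the files in this thread were produced, and the glm file is not covered.

```python
def build_stm_lines(transcripts, durations, utt2spk, channel="1"):
    """Build STM lines: <file> <channel> <speaker> <begin> <end> <transcript>.

    transcripts: dict utterance id -> transcript string (hypothetical input)
    durations:   dict utterance id -> duration in seconds
    utt2spk:     dict utterance id -> speaker id
    """
    lines = []
    for utt_id in sorted(transcripts):
        if utt_id not in durations or utt_id not in utt2spk:
            raise KeyError(f"missing duration or speaker for {utt_id}")
        # Collapse repeated whitespace so each token is separated by one space.
        words = " ".join(transcripts[utt_id].split())
        lines.append(
            f"{utt_id} {channel} {utt2spk[utt_id]} "
            f"0.00 {durations[utt_id]:.2f} {words}"
        )
    return lines


# Hypothetical one-utterance test set with a phone-level transcript.
example = build_stm_lines(
    {"spk1_utt1": "sil hh ax l ow sil"},
    {"spk1_utt1": 1.5},
    {"spk1_utt1": "spk1"},
)
print("\n".join(example))
```

The durations would have to come from the audio itself; how they are obtained is left out of this sketch.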


Johe-cqu commented on May 17, 2024

OK


subash-khanal commented on May 17, 2024

Hi, I apologize in advance, as the following queries may be more Kaldi related, but I am asking them here.

1. Is there any way to carry out speaker adaptation for the test speakers before running decoding and scoring on them? (My test set speakers have a different accent.)
2. My transcriptions are phone sequences like TIMIT, so the WER obtained is the PER, right? Also, what are the other numbers on the last line of the output showing the WER? Are they the insertion, deletion, and substitution error counts used to calculate the WER?
3. How does one incorporate language/pronunciation models while decoding in the pytorch-kaldi framework? Or are they incorporated during the computation of alignments and graphs in Kaldi?

Any help in understanding these concepts will be greatly appreciated.

With regards, Subash


mravanelli commented on May 17, 2024

Hi, let me try to answer your questions:

1. *Is there any way that speaker adaptation can be carried out to the test speakers before running the decoding and scoring on them? (My test set speakers have different accent)*
   One possible solution is to use i-vectors, x-vectors, or d-vectors. You can compute one of these vectors and concatenate it with the input features.
2. *My transcriptions are phone sequences like TIMIT, so the WER obtained is the PER right? Also what are these other numbers on the last line of output showing WER? Are they insertion, deletion substitution error counts used to calculate WER?*
   Yes, for phone outputs the performance reported is actually the PER. This number is extracted from the decode*/score* folders. If you look into one of these folders, the various files contain more information, such as the alignment between the recognized output and the reference text. I suggest taking a look here if you need a more detailed analysis of the performance.
3. *How do one incorporate language/pronunciation models while decoding in pytorch-kaldi framework? Or They are incorporated while computation of alignments and graphs in kaldi?*
   First of all, you have to train a language model, and then you have to compile a graph that includes both language and acoustic model information. This is done within the Kaldi recipes.

Best, Mirco
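To make the relation between the error counts and the reported rate concrete, here is a minimal sketch of the standard definition, (substitutions + deletions + insertions) / reference length. I am assuming this is what the scoring reports; the functions and the example sequences are illustrative only and are not taken from Kaldi or pytorch-kaldi.

```python
def error_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) of a minimum-edit alignment."""
    # Each cell holds (total_errors, subs, dels, ins) for ref[:i] vs hyp[:j].
    prev = [(j, 0, 0, j) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        cur = [(i, 0, i, 0)]
        for j in range(1, len(hyp) + 1):
            total, subs, dels, ins = prev[j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (total, subs, dels, ins)          # correct token
            else:
                diagonal = (total + 1, subs + 1, dels, ins)  # substitution
            total, subs, dels, ins = prev[j]
            deletion = (total + 1, subs, dels + 1, ins)      # ref token missing
            total, subs, dels, ins = cur[j - 1]
            insertion = (total + 1, subs, dels, ins + 1)     # extra hyp token
            # Tuples compare on total_errors first, so min picks a cheapest path.
            cur.append(min(diagonal, deletion, insertion))
        prev = cur
    _, subs, dels, ins = prev[-1]
    return subs, dels, ins


def error_rate(ref, hyp):
    """Error rate in percent; it is a PER when the tokens are phones."""
    if not ref:
        raise ValueError("reference must not be empty")
    subs, dels, ins = error_counts(ref, hyp)
    return 100.0 * (subs + dels + ins) / len(ref)


ref = "sil hh ax l ow sil".split()
hyp = "sil hh ah l sil".split()
print(error_counts(ref, hyp), round(error_rate(ref, hyp), 2))
```

The same computation applies to words or phones; only the token inventory changes, which is why a phone-level transcription turns the reported WER into a PER.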


subash-khanal commented on May 17, 2024

Hi, thank you for your response. I apologize in advance, as the following queries are more generic in nature.

1. I used the TIMIT_MLP_basic config file for my dataset and ended up with 62% PER. I know the config is not tailored to my dataset, but do you have any suggestions on where people generally go wrong when using these standard recipes and configurations for their own datasets? I will consider the i-vector appending tip you mentioned above. Still, any suggestions would be appreciated.
2. I am unable to get fMLLR features using scripts like "steps/nnet/make_fmllr_feats.sh" (it throws the error "Invalid feature type [UNKNOWN]"). I really need fMLLR features to see whether they would help reduce my PER. Is there an external way of computing fMLLR features if I cannot get them through Kaldi?
3. For the decoding and scoring process on the train and dev datasets, what is the error percentage reported there? Is it the WER? For the test set, I had to create stm and glm files for local/score.sh to work, but I did not create those files for the train and dev sets, so how is the error percentage computed in those cases?

As always, thank you for your help.

With best regards, Subash


TParcollet commented on May 17, 2024

Hi!

In general, for questions related to Kaldi, you had better ask in the official Google group, where you will get more detailed and precise answers.

1. First, an MLP is a poor choice for getting a good PER. How does the loss evolve during training? How many hours of data do you have? Is the data clean or very noisy? A lot of things affect decoding. You should first try a bigger network. I-vectors or speaker adaptation will help you reduce the PER further, but at 60% PER the solution is not a simple tweak: you must investigate other configurations and maybe the feature extraction.
2. It will be very hard to compute fMLLR features with another toolkit and connect them to Kaldi. You should try to solve your problem with generating them in Kaldi.
3. Only the test set should be scored. The percentages reported for train and dev come from the training of the PyTorch acoustic model. Consequently, they reflect the loss function and not the PER. Unless you forced it?
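The speaker-vector idea raised earlier in this thread (compute an i-vector, x-vector, or d-vector and concatenate it with the input features) can be sketched as follows. This assumes one fixed vector per utterance that is appended to every frame; the function name and the toy dimensions are hypothetical, and plain lists are used only to keep the sketch dependency-free.

```python
def append_speaker_vector(frames, speaker_vector):
    """Append the same speaker vector to every frame of one utterance.

    frames:         list of per-frame feature lists (hypothetical acoustic features)
    speaker_vector: list of floats, e.g. an i-vector computed elsewhere
    """
    return [list(frame) + list(speaker_vector) for frame in frames]


frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # 2 frames x 3 acoustic dimensions
ivector = [1.0, -1.0]                         # toy 2-dimensional speaker vector
augmented = append_speaker_vector(frames, ivector)
print(len(augmented), len(augmented[0]))
```

Note that the input dimension of the network grows by the length of the speaker vector, so the model has to be configured for the augmented feature size.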


subash-khanal commented on May 17, 2024

Thank you so much.


