This project forked from egillanton/yesno_ice_phones

A modified version of the Kaldi-ASR yesno example project, adjusted for Icelandic and for multiple speakers, that uses a phone-based model instead of a word-based model.

yesno_ice_phones's Introduction

Kaldi-ASR Advanced Project
T-718-ATSR - Automatic Speech Recognition, TD-MSc, 2019-1
Reykjavik University - School of Computer Science, Menntavegur 1, IS-101 Reykjavik, Iceland

Table of Contents

  1. Introduction
  2. The Dataset
  3. The Adjustments
  4. Authors
  5. License
  6. References

1 Introduction

A modified version of the Kaldi-ASR yesno example project, adjusted for Icelandic and for multiple speakers. It also uses a phone-based model instead of a word-based model.

2 The Dataset

A custom yes/no dataset for Icelandic, built from recordings of the students and staff members of the course.

Summary:

  • 10 different speakers (IDs 0 to 10; speaker 6 has no utterances)
  • 120 sentences/utterances:
      • Speaker 0: 10 utterances
      • Speaker 1: 7 utterances
      • Speaker 2: 8 utterances
      • Speaker 3: 10 utterances
      • Speaker 4: 10 utterances
      • Speaker 5: 42 utterances
      • Speaker 6: 0 utterances
      • Speaker 7: 10 utterances
      • Speaker 8: 8 utterances
      • Speaker 9: 10 utterances
      • Speaker 10: 5 utterances
  • Each utterance consists of 8 words.
  • The vocabulary:
      • Já (yes)
      • Nei (no)
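
As a quick sanity check of the summary, the per-speaker counts can be added up in the shell. This is only a sketch: the counts are copied by hand from the list above, and the helper names `sum_counts` and `active_speakers` are made up for illustration.

```shell
# Utterance counts for speakers 0..10, copied from the summary above.
counts="10 7 8 10 10 42 0 10 8 10 5"

# Sum a whitespace-separated list of integers.
sum_counts() (
  total=0
  for n in $1; do
    total=$((total + n))
  done
  echo "$total"
)

# Count the entries greater than zero, i.e. speakers with recordings.
active_speakers() (
  active=0
  for n in $1; do
    if [ "$n" -gt 0 ]; then active=$((active + 1)); fi
  done
  echo "$active"
)

total=$(sum_counts "$counts")
echo "utterances: $total"
echo "speakers with recordings: $(active_speakers "$counts")"
echo "words (8 per utterance): $((total * 8))"
```

The totals agree with the summary: 120 utterances from 10 speakers with recordings, which makes 960 words.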

3 The Adjustments

3.1 run.sh

Edit the following part of the file (remove it or comment it out):

if [ ! -d waves_yesno ]; then
  wget waves_yesno.tar.gz || exit 1;
  tar -xvzf waves_yesno.tar.gz || exit 1;
fi

We have our dataset locally, so we don't have to fetch it from a remote file host.

Remember to make the script executable again after editing it.

$ chmod 777 s5/run.sh

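Add the following line to the while loop of the script that builds the transcripts (local/create_yesno_txt.pl in the stock yesno recipe):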

    $trans =~ s/SA\d\d-//;

This removes the speaker ID from the file name when creating our data/{X}/text files.
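
The effect of this substitution can be tried out in the shell. The sketch below assumes file names of the form `SA01-0_1_1_0_0_0_1_0.wav` (a hypothetical example, inferred from the `SA\d\d-` pattern) and assumes the stock recipe's mapping of 0 to NO and 1 to YES. The function name `id_to_transcript` is made up. Note that the speaker prefix has to go first, otherwise the digits in `SA01` would be rewritten as well.

```shell
# Turn an utterance ID into its transcript, mirroring the Perl substitutions.
id_to_transcript() (
  # 1) drop the speaker prefix, 2) map 0/1 to NO/YES,
  # 3) turn underscores into spaces.
  printf '%s\n' "$1" |
    sed -e 's/SA[0-9][0-9]-//' -e 's/0/NO/g' -e 's/1/YES/g' -e 's/_/ /g'
)

id_to_transcript "SA01-0_1_1_0_0_0_1_0"
# prints: NO YES YES NO NO NO YES NO
```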

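Specify which speakers you want in the test set. In the loop below, files from speakers SA01 and SA10 go to the test list and all other files go to the training list.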

while ($l = <FL>)
{
	chomp($l);
	if (index($l, "SA01") == -1  && index($l, "SA10") == -1)
	{
		print TRAINLIST "$l\n";
	}
	else
	{
		print TESTLIST "$l\n";
	}
}
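
The same split can be reproduced outside Perl to preview which files end up in which list. This is a rough equivalent rather than the project's code: the function name `split_by_speaker` and the demo file names are hypothetical.

```shell
# Split a list of wav files into train and test lists by speaker ID.
split_by_speaker() (
  # $1: list of wav files, $2: train list (output), $3: test list (output),
  # $4: extended regex naming the test speakers
  grep -E  -- "$4" "$1" > "$3"
  grep -Ev -- "$4" "$1" > "$2"
)

# Demo with made-up file names.
dir=$(mktemp -d)
printf '%s\n' SA00-0_1.wav SA01-1_0.wav SA05-1_1.wav SA10-0_0.wav > "$dir/all.list"
split_by_speaker "$dir/all.list" "$dir/train.list" "$dir/test.list" 'SA01|SA10'
echo "train:"; cat "$dir/train.list"
echo "test:";  cat "$dir/test.list"
rm -rf "$dir"
```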

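Edit the code so that each utterance is mapped to its own speaker ID instead of a single global one. awk strings are 1-based, so the speaker ID is taken as the first four characters of the utterance ID.

  cat data/$x/text | awk '{printf("%s %s\n", $1, substr($1, 1, 4));}' > data/$x/utt2spk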
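
To see what the resulting utt2spk lines look like, the awk command can be run on a small sample. The sketch below assumes utterance IDs that start with a four-character speaker ID such as `SA01`. It uses `substr($1, 1, 4)` because awk's `substr` is 1-based and a start index of 0 is handled differently across awk implementations. The function name `make_utt2spk` and the sample lines are made up.

```shell
# Print "<utterance-id> <speaker-id>" for every line of a Kaldi text file.
make_utt2spk() (
  # $1: text file whose first field is the utterance ID
  awk '{ printf("%s %s\n", $1, substr($1, 1, 4)); }' "$1"
)

# Demo with two made-up transcript lines.
tmp=$(mktemp)
printf '%s\n' \
  'SA01-0_1_1_0_0_0_1_0 NO YES YES NO NO NO YES NO' \
  'SA10-1_1_0_0_1_0_1_1 YES YES NO NO YES NO YES YES' > "$tmp"
make_utt2spk "$tmp"
rm -f "$tmp"
```

Each utterance ID keeps its speaker ID as a prefix, which is what Kaldi's sorting checks for utt2spk expect.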

Remember to make the script executable again after editing it.

$ chmod 777 s5/local/prepare_data.sh 

Set the sample-frequency option (conf/mfcc.conf in the stock yesno recipe) to match the format of the waves_yesno .wav files.

--sample-frequency=16000 #  waves_yesno is sampled at 16kHz
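
If you are unsure about the sample rate of your recordings, it can be read from the file header. The sketch below assumes a canonical RIFF/WAVE header, where bytes 24 to 27 hold the sample rate in little-endian order. Files with extra chunks before the `fmt ` chunk will not work with it. The function name `wav_sample_rate` is made up, and the demo writes a minimal 44-byte header instead of using a real recording.

```shell
# Read the sample rate from a canonical RIFF/WAVE header.
wav_sample_rate() (
  # Dump bytes 24..27 as unsigned decimals and combine them (little-endian).
  od -An -tu1 -j24 -N4 "$1" |
    awk '{ print $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 }'
)

# Demo: a 44-byte header for 16 kHz, mono, 16-bit PCM with no samples.
tmp=$(mktemp)
printf 'RIFF\044\000\000\000WAVEfmt \020\000\000\000\001\000\001\000\200\076\000\000\000\175\000\000\002\000\020\000data\000\000\000\000' > "$tmp"
wav_sample_rate "$tmp"
# prints: 16000
rm -f "$tmp"
```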

To change the model from a word-based one to a phone-based one, we need to change the lexicon of our system.

input/lexicon.txt

<SIL> SIL
YES J A U
NO N E I

For simplicity we kept the same word labels, but changed the corresponding phonemes.

lexicon_nosil.txt

YES J A U
NO N E I

phones.txt

SIL
A
E
I
J
N
U
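
The phone list has to cover every symbol used in the lexicon. One way to cross-check this is to derive the non-silence phones from the lexicon itself. This is a sketch with a made-up function name, `phones_from_lexicon`, and the demo writes its own copy of the two lexicon lines shown above.

```shell
# List every pronunciation symbol in a lexicon once, sorted.
phones_from_lexicon() (
  # Field 1 is the word; fields 2..NF are its phones.
  awk '{ for (i = 2; i <= NF; i++) print $i }' "$1" | LC_ALL=C sort -u
)

# Demo with the two lexicon_nosil.txt entries from above.
tmp=$(mktemp)
printf '%s\n' 'YES J A U' 'NO N E I' > "$tmp"
phones_from_lexicon "$tmp"
rm -f "$tmp"
```

The output should match phones.txt without the SIL entry: A, E, I, J, N, U.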

Done

Now we should be able to run the run.sh file successfully.

$ cd s5/
$ bash run.sh

Expected output:

%WER 11.46 [ 11 / 96, 7 ins, 1 del, 3 sub ] exp/mono0a/decode_test_yesno/wer_14_1.0
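
The numbers in this line can be reproduced by hand: the word error rate is the number of insertions, deletions and substitutions divided by the number of reference words. The sketch below uses a made-up helper, `wer`. The 96 reference words are consistent with the test set if SA01 and SA10 are the speakers listed as Speaker 1 and Speaker 10 in the dataset summary, which is an assumption.

```shell
# Word error rate in percent from the error counts and the reference length.
wer() (
  # $1: insertions, $2: deletions, $3: substitutions, $4: reference words
  awk -v i="$1" -v d="$2" -v s="$3" -v n="$4" \
    'BEGIN { printf("%.2f\n", 100 * (i + d + s) / n) }'
)

# Test set size: (7 + 5) utterances with 8 words each.
echo "reference words: $(( (7 + 5) * 8 ))"
# 7 insertions, 1 deletion, 3 substitutions over 96 words.
echo "WER: $(wer 7 1 3 96)"
```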

4 Authors

5 License

This project is licensed under the MIT License - see the LICENSE file for details.

6 References

yesno_ice_phones's People

Contributors

egillanton
