GithubHelp home page GithubHelp logo

guycalledjeremy / hmm_speech_recognition_demo Goto Github PK

View Code? Open in Web Editor NEW

This project forked from wblgers/hmm_speech_recognition_demo

0.0 0.0 0.0 698 KB

A demo for simple isolated Chinese speech word recognition using GMMHMM in Python

Python 100.00%

hmm_speech_recognition_demo's Introduction

hmm_speech_recognition_demo

0. Setup Environment

This demo project is running on python2.x, please install the following required packages as well:

  • scikits.talkbox: Calculation of MFCC features on audio
  • hmmlearn: Hidden Markov Models in Python, with scikit-learn like API
  • scipy: Fundamental library for scientific computing

All the three python packages can be installed via pip install, on Python3.x, the package scikits.talkbox can't be installed correctly for me.

1. Description

1.1 Problem

By utilizing the GMMHMM in hmmlearn, we try to model the audio files in 10 categories. GMMHMM model provides easy interface to train a HMM model and to evaluate the score on test set.

Please more details in the doc of hmmlearn.

1.2 Dataset

It's a demo project for simple isolated speech word recognition. There are only 100 audio files with extention of .wav for training, and 10 audio files for testing. To be more specified:

  • Train: For digits 0-9, each with 10 sampels with Chinese pronunciation
  • Test: For digits 0-9, each with 1 sample with Chinese pronunciation

1.3 Demo running results:

In python2.x, run the script demo.py, get the result below:

Finish prepare the training data
Finish training of the GMM_HMM models for digits 0-9
Test on true label  10 : predict result label is  9
Test on true label  1 : predict result label is  1
Test on true label  3 : predict result label is  3
Test on true label  2 : predict result label is  2
Test on true label  5 : predict result label is  5
Test on true label  4 : predict result label is  4
Test on true label  7 : predict result label is  7
Test on true label  6 : predict result label is  6
Test on true label  9 : predict result label is  9
Test on true label  8 : predict result label is  9
Final recognition rate is 80.00 %

2. Tricky bug workaround in hmmlearn

In the training of HMM model, there may lead to negative value of startprob_. By checking the code of hmmlearn, I found the following code in hmmlearn/base.py is suspicious:

def _do_mstep(self, stats):
    """Performs the M-step of EM algorithm.
    Parameters
    ----------
    stats : dict
        Sufficient statistics updated from all available samples.
    """
    # The ``np.where`` calls guard against updating forbidden states
    # or transitions in e.g. a left-right HMM.
    if 's' in self.params:
        startprob_ = self.startprob_prior - 1.0 + stats['start']
        self.startprob_ = np.where(self.startprob_ == 0.0,
                                   self.startprob_, startprob_)
        normalize(self.startprob_)
    if 't' in self.params:
        transmat_ = self.transmat_prior - 1.0 + stats['trans']
        self.transmat_ = np.where(self.transmat_ == 0.0,
                                  self.transmat_, transmat_)
        normalize(self.transmat_, axis=1)

When updating the self.startprob_ and self.transmat_ in every step, it will subtract 1.0 from the old value, in this situation, it will be very likely to lead to a negative value for both self.startprob_ and self.transmat_.

Meanwhile, this issue is also submitted in the issue list of hmmlearn, but no response from the maintainer of hmmlearn:

startprob_ of ghmm is negative or nan #276

For a temporary workaround, I modify this part of code to get rid of the subtraction, since there is a normalize after the update.

def _do_mstep(self, stats):
    """Performs the M-step of EM algorithm.

    Parameters
    ----------
    stats : dict
        Sufficient statistics updated from all available samples.
    """
    # The ``np.where`` calls guard against updating forbidden states
    # or transitions in e.g. a left-right HMM.
    if 's' in self.params:
        # startprob_ = self.startprob_prior - 1.0 + stats['start']
        startprob_ = self.startprob_prior + stats['start']
        self.startprob_ = np.where(self.startprob_ == 0.0,
                                   self.startprob_, startprob_)
        normalize(self.startprob_)
    if 't' in self.params:
        # transmat_ = self.transmat_prior - 1.0 + stats['trans']
        transmat_ = self.transmat_prior + stats['trans']
        self.transmat_ = np.where(self.transmat_ == 0.0,
                                  self.transmat_, transmat_)
        normalize(self.transmat_, axis=1)

Please note: This modification is just a wordaround, official solution will be followed up if there is any update.

hmm_speech_recognition_demo's People

Contributors

wblgers avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.