Comments (8)

danijel3 commented on August 19, 2024

As I discuss in the notebook below, the "gen_list.c" file isn't strictly necessary unless you need results absolutely identical to HTK's:
https://github.com/danijel3/PyHTK/blob/master/python-notebooks/HTKFeaturesExplained.ipynb

I suggest you let the Python class generate the filter (the method "create_filter" is run automatically if you don't load the filters from the file) and simply ignore the gen_list.c program altogether.

If you still need the program for some reason, the only way to fix it is to change it manually, generate a list of filters to your spec, and use the generated filters wherever you need them.
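For reference, here is a rough sketch of the kind of matrix a filter generator produces: triangular filters equally spaced on the mel scale between the low and high cutoff frequencies. It assumes HTK's mel formula, 1127 * ln(1 + f / 700), and simple per-bin triangle weights. The names `hz_to_mel` and `mel_filterbank` are hypothetical; this is not PyHTK's `create_filter`, and its output is not guaranteed to match gen_list.c or HTK exactly.

```python
import numpy as np


def hz_to_mel(freq_hz):
    # HTK-style mel scale: 1127 * ln(1 + f / 700)
    return 1127.0 * np.log1p(np.asarray(freq_hz, dtype=float) / 700.0)


def mel_filterbank(num_filters, fft_size, samp_freq, lo_freq=0.0, hi_freq=None):
    """Return a (num_filters, fft_size // 2 + 1) matrix of triangular mel filters."""
    if hi_freq is None:
        hi_freq = samp_freq / 2.0  # default to the Nyquist frequency
    num_bins = fft_size // 2 + 1
    # num_filters + 2 points equally spaced in mel: two outer edges plus one centre per filter
    edges = np.linspace(hz_to_mel(lo_freq), hz_to_mel(hi_freq), num_filters + 2)
    # mel value of each FFT bin, from DC up to Nyquist
    bin_mels = hz_to_mel(np.arange(num_bins) * samp_freq / fft_size)
    fbank = np.zeros((num_filters, num_bins))
    for m in range(num_filters):
        left, centre, right = edges[m], edges[m + 1], edges[m + 2]
        rising = (bin_mels - left) / (centre - left)
        falling = (right - bin_mels) / (right - centre)
        # triangle peaking at 1.0 on the centre, zero outside [left, right]
        fbank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fbank
```

For example, `mel_filterbank(26, 512, 16000, lo_freq=300.0, hi_freq=3400.0)` gives 26 filters whose weights are zero for every FFT bin outside 300 to 3400 Hz.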

To get energy instead of the 0th cepstral coefficient (i.e. _E instead of _0), you simply need to set 'raw_energy' to true in the constructor. LOFREQ and HIFREQ can also be set in the constructor.

The energy normalization/scaling (ENORMALISE/ESCALE) isn't implemented here. To quote from the HTK book:

This log energy measure can be normalised to the range $-E_{min}..1.0$ by setting the Boolean configuration parameter ENORMALISE to true (default setting). This normalisation is implemented by subtracting the maximum value of $E$ in the utterance and adding $1.0$. Note that energy normalisation is incompatible with live audio input and in such circumstances the configuration variable ENORMALISE should be explicitly set false. The lowest energy in the utterance can be clamped using the configuration parameter SILFLOOR which gives the ratio between the maximum and minimum energies in the utterance in dB. Its default value is 50dB. Finally, the overall log energy can be arbitrarily scaled by the value of the configuration parameter ESCALE whose default is $0.1$.
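A minimal NumPy sketch of the normalisation described in that quote, applied to the log energies of a single utterance. The function name is hypothetical, the inputs are assumed to be natural-log frame energies, and applying ESCALE to the offset from the maximum (so the loudest frame still maps to 1.0) is an interpretation of the quote rather than a line-by-line port of HTK.

```python
import numpy as np


def normalise_log_energy(log_e, sil_floor_db=50.0, escale=0.1):
    """Per-utterance log-energy normalisation (ENORMALISE + SILFLOOR + ESCALE).

    log_e: 1-D array of natural-log frame energies for ONE whole utterance.
    """
    log_e = np.asarray(log_e, dtype=float)
    e_max = log_e.max()
    # SILFLOOR is a max/min energy ratio in dB; in natural-log units that is dB * ln(10) / 10
    e_min = e_max - sil_floor_db * np.log(10.0) / 10.0
    clamped = np.maximum(log_e, e_min)
    # subtract the maximum, scale by ESCALE, add 1.0: the loudest frame maps to 1.0
    return 1.0 - (e_max - clamped) * escale
```

Because `e_max` depends on every frame, the function needs the whole utterance up front, which is why the quote says ENORMALISE must be turned off for live audio.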

There is a slight problem with defining "the utterance": the maximum energy is only known once the whole utterance has been seen, which is why this can't work on live audio. Normalizing any kind of feature raises this problem, so it's a good idea to think it through before using such options.

Still, I can implement these two features, if you think it's worth it. Lemme know what you think should be done.

from pyhtk.

metate commented on August 19, 2024

Thank you for the information.

  • Target kind: What about support for _Z (zero mean static coefficients)?
  • Filter map: I generated a filter map using gen_filt.c because the Python code does not support omitting hifreq and lofreq (the default HTK config), but gen_filt does.
  • Energy normalization: here again, the point was to support default HTK configuration. However, I agree that this is a non-issue for most use cases.

danijel3 commented on August 19, 2024

Target kind

I didn't do that because I thought it was more appropriate to perform after feature extraction. All features need to be normalized/standardized in a consistent manner, regardless of whether they are MFCCs or something else.

For the sake of HTK support, I can do this as well.

Filter map

Not sure what you mean by "omitting". Those are parameters, just like any other. You can define them using the constructor, or use the default values, as shown in the documentation.

Energy normalization

I will leave this issue open until I implement the following features:

  • ENORMALISE
  • ESCALE
  • _Z feature

Personally, I wouldn't use those anyway, but for the sake of HTK compatibility they should be done. I'll just need a few days to find some free time for it.

Is there anything else that is missing?

Do you think we should have a feature to parse HTK config files, or is using the constructor sufficient for most people?
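As a rough idea of the scope, a minimal parser for the common `[MODULE:] PARAMETER = VALUE` lines (with `#` comments) could look like the sketch below. `parse_htk_config` and `split_target_kind` are hypothetical helpers; values are left as strings, and quoted strings, type conversion and any other HTK config features are not handled.

```python
def parse_htk_config(text):
    """Parse lines like 'TARGETKIND = MFCC_0_D_A' or 'HPARM: LOFREQ = 80' into a dict."""
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = (part.strip() for part in line.split("=", 1))
        if ":" in key:  # drop an optional module prefix such as 'HPARM:'
            key = key.split(":", 1)[1].strip()
        config[key.upper()] = value
    return config


def split_target_kind(kind):
    """Split e.g. 'MFCC_E_D_A_Z' into the base kind and its qualifiers."""
    base, *qualifiers = kind.upper().split("_")
    return base, set(qualifiers)
```

The qualifier set could then be mapped onto constructor arguments (for example, `E` in the set suggesting energy instead of C0), which is where most of the real effort would go.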

metate commented on August 19, 2024

Hi,

_Z: I tried to zero-mean the features, but that didn't work well (it resulted in a high max diff). I couldn't find good documentation about this option. What are considered static coefficients?

Filter map: If you look at the code of gen_filt, you will notice that lo/hi are allowed to have a value of -1, which is handled differently from positive values. This is the default behavior of HCopy. Passing -1 to the Python code won't behave the same way.

Parsing config: though it would be nice, I wouldn't bother with it; it would require a lot of effort. Determining the constructor parameters was straightforward (for the supported configurations, as discussed above).

I ported PyHTK to cpp, using Armadillo as a replacement for numpy, in case someone needs this functionality in a cpp codebase; I will publish it soon.

danijel3 commented on August 19, 2024

To standardize the features, you may use a different library. The simplest would probably be sklearn.preprocessing.StandardScaler. The important part is to compute the statistics on the whole corpus, rather than on individual utterances, before standardizing the features. Otherwise, you may have to use something like online normalization, but that comes with a whole different set of problems.
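A sketch of the corpus-level approach with plain NumPy, assuming each utterance is a `(frames, dims)` array: pool every frame of every utterance to get one mean and standard deviation per dimension, then apply those same statistics to each utterance. `corpus_stats` and `standardize` are hypothetical names.

```python
import numpy as np


def corpus_stats(utterances):
    """Mean and std per feature dimension, pooled over all frames of all utterances."""
    frames = np.concatenate([np.asarray(u, dtype=float) for u in utterances], axis=0)
    return frames.mean(axis=0), frames.std(axis=0)


def standardize(utt, mean, std, eps=1e-8):
    # the same corpus-wide statistics are applied to every utterance
    return (np.asarray(utt, dtype=float) - mean) / (std + eps)
```

Concatenating is the simplest option; for a corpus too large to hold in memory, StandardScaler's `partial_fit` accumulates the same statistics one utterance at a time.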

'Static coefficients' are the first features in the vector, containing the plain MFCCs/filterbank/energy, as opposed to the deltas/accelerations, which model the change in the signal. To quote from the HTK book:

When delta and acceleration coefficients are requested, they are computed for all static parameters including energy if present.
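To make the split concrete, here is a sketch assuming the static coefficients are a `(frames, n_static)` array. A _Z-style step (assumed here to be per utterance) subtracts the mean from the statics only, and HTK-style regression deltas, d_t = sum over theta of theta * (c[t+theta] - c[t-theta]) / (2 * sum of theta^2), are computed from them with edge frames replicated. Since each delta is built from differences between frames, a constant offset cancels, so the deltas come out the same whether they are computed before or after the mean is removed. The helper names are hypothetical, and whether HTK's _Z treats energy the same way is not checked here.

```python
import numpy as np


def zero_mean_static(static):
    """Subtract the per-utterance mean from each static coefficient."""
    static = np.asarray(static, dtype=float)
    return static - static.mean(axis=0)


def regression_deltas(feats, theta=2):
    """Regression deltas over a +/- theta window; edge frames are replicated."""
    feats = np.asarray(feats, dtype=float)
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((theta, theta), (0, 0)), mode="edge")
    denom = 2.0 * sum(t * t for t in range(1, theta + 1))
    deltas = np.zeros_like(feats)
    for t in range(1, theta + 1):
        ahead = padded[theta + t: theta + t + num_frames]   # c[i + t]
        behind = padded[theta - t: theta - t + num_frames]  # c[i - t]
        deltas += t * (ahead - behind)
    return deltas / denom
```

A full vector would then be something like `np.hstack([zm, regression_deltas(zm)])` with `zm = zero_mean_static(static)`; subtracting a mean from the delta columns as well would change them.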

lo/hi are allowed to have a value of -1

I think this is poor coding practice. In some situations it's necessary due to language limitations, but in Python we really don't need such "features". If someone doesn't need to modify the lo and hi frequencies, they can just use the constructor defaults.

In HTK, if you set these values to <0, it simply uses ~0 for lo and the Nyquist frequency for hi. I guess it wouldn't hurt to fall back to these values if anyone enters something out of range. So if they do put -1, it would be corrected to those values anyway.
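A sketch of that fallback, meant to run before the filterbank is built so the filters only ever see valid cutoffs. `clamp_freq_range` is a hypothetical name and not PyHTK's actual fix; it maps out-of-range lo values to 0 Hz and out-of-range hi values to Nyquist.

```python
def clamp_freq_range(lo_freq, hi_freq, samp_freq):
    """Force the filterbank cutoffs into [0, Nyquist]."""
    nyquist = samp_freq / 2.0
    # negative (e.g. -1) or too-large lower cutoffs fall back to 0 Hz
    lo = float(lo_freq) if 0.0 <= lo_freq < nyquist else 0.0
    # non-positive (e.g. -1) or too-large upper cutoffs fall back to Nyquist
    hi = float(hi_freq) if 0.0 < hi_freq <= nyquist else nyquist
    if lo >= hi:
        raise ValueError(f"lo_freq ({lo}) must be below hi_freq ({hi})")
    return lo, hi
```

With this, `clamp_freq_range(-1, -1, 16000)` gives `(0.0, 8000.0)`, mirroring the HTK defaults, while valid values pass through unchanged.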

I ported PyHTK to cpp

Just wondering: is this better than simply using HTK?

danijel3 commented on August 19, 2024

Fixed in ba64fcb

Please verify that everything is working for you.

metate commented on August 19, 2024

The code that bounds low_freq/hi_freq to the [0, Nyquist] range should run before the call to create_filter. Other than that, all is working perfectly 👍
Thanks again for all your effort.

I won't use the cpp port either, but a colleague of mine wanted it in cpp; he plans to adapt it for online feature extraction, to be used along with a custom classifier he is working on. I'm not familiar with the details yet.

danijel3 commented on August 19, 2024

Good point. Didn't test that. It's all fixed in 412bfaf now.
