
Comments (11)

forslund avatar forslund commented on June 11, 2024

I'm still reading up on the specifics of the mimic/flite voice format and pronunciation dictionary, so my opinions might be wrong, misguided, stupid or a combination of the above.

Personally I think it would be great if we could let users update the pronunciation dictionary themselves, but I would like mimic to stay as independent as possible without too many dependencies. My first thought is to include a standard "precompiled" C-code lexdict, detect whether the user/dev has added an alternative dict, and if so use that dict instead of the standard one.
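To make the fallback idea concrete, here is a minimal sketch of the detection step only. It assumes the alternative dictionary would be a file at some path; the names `choose_lexicon`, `lexicon_source_name` and the enum values are hypothetical and not part of mimic or flite, and nothing here loads or parses a dictionary.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical names for illustration; not part of the mimic/flite API. */
enum lexicon_source { LEXICON_BUILTIN, LEXICON_USER };

/* Return LEXICON_USER only if user_path names a file that can be opened
 * for reading; otherwise fall back to the compiled-in dictionary. */
enum lexicon_source choose_lexicon(const char *user_path)
{
    FILE *f;

    if (user_path == NULL || user_path[0] == '\0')
        return LEXICON_BUILTIN;

    f = fopen(user_path, "rb");
    if (f == NULL)
        return LEXICON_BUILTIN;

    fclose(f);
    return LEXICON_USER;
}

/* Human-readable label, e.g. for a log message at startup. */
const char *lexicon_source_name(enum lexicon_source source)
{
    assert(source == LEXICON_BUILTIN || source == LEXICON_USER);
    return source == LEXICON_USER ? "user" : "built-in";
}
```

A caller would pass whatever path the project settles on (an environment variable or a config entry, for instance) and only attempt to load the alternative dict when the result is `LEXICON_USER`.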

A separate project (or subproject) could track the cmudict and include a simple build script to make it easier to contribute corrections to the mimic pronunciation dictionary. As I see it, the cmudict has a licence compatible with mimic/flite, so there is no real licensing issue in providing the C-code version.

My personal opinion is that the current repo size is a bit on the heavy side, even in this age of infinite bandwidth. I would maybe cut a couple of voices and limit the default selection to one or two, with the option to download additional voices. Separating the data files (like bellbird does) into a separate repository (using git-lfs perhaps!) might be an idea to consider.

from mimic1.

rhdunn avatar rhdunn commented on June 11, 2024

For the CMU arctic voices -- awb, bdl, clb, jmk, ksp, rms and slt -- voice data is available, but the labeling is not very accurate. These have a range of errors, from misplaced phone borders (alignment errors), incorrect phoneme assignment due to accent variation (e.g. American vs Canadian vs Scottish English), or incorrect phoneme assignment due to variation between the phrase and what is actually spoken.

The http://festvox.org/11752/packed/ directory contains various example scripts (build_cg_voice, build_clunits_voice, etc.) for building voices (including flite voices) from 100 of the recordings from awb and rms. The generated voices are not good compared with the flitevox files, nor are the LPC/RES diphone voices, due to the lack of decent alignment files and sufficient diphone coverage.

NOTE: cg is clustergen, an HMM-based synthesis model based on HTS (HMM-based Speech Synthesis System) synthesis, and clunits generates LPC/RES (residual linear predictive coding) units based around diphones.

For the other voices, I don't believe the voice data is available.


LongBoolean avatar LongBoolean commented on June 11, 2024

I have been playing around with the build process a bit. I moved all files into one directory (except for the files my system doesn't need) and compiled with gcc -g -O0 -o mimic *.c -lasound -lm -lpulse-simple. I'm getting a few insights from that.

I do think that those data files (lang/cmulex/cmu_lex_entries.c, lang/cmulex/cmu_lex_num_bytes.c, lang/cmulex/cmu_lex_phones_huff_table.c, lang/cmulex/cmu_lex_data_raw.c) should be renamed, changing the extension from .c to .txt or something. Those files do not contain valid C code (most are just comma-separated data) and will give compile errors when compiled without the appropriate makefiles.


forslund avatar forslund commented on June 11, 2024

I agree that we shouldn't leave them as they are. I think by going through the build scripts we can make them produce valid C code and make them easier to use without too much trouble.

Some of those can be changed just a bit. For example, cmu_lex_data.c includes cmu_lex_data_raw.c in the middle of a table. I would prefer to alter the scripts so that they generate cmu_lex_data.c complete with the data that is in cmu_lex_data_raw.c. cmu_lex_data.c only contains four lines of code, so it wouldn't be hard at all.

cmu_lex_num_bytes.c only contains an integer. I'd rather call it cmu_lex_num_bytes.h and let the script generate
#define LEX_NUM_BYTES [generated number]
and use LEX_NUM_BYTES in cmu_lex_entries.c instead of including a C file in the middle of an assignment.
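As a rough illustration of what the generating step could emit, the sketch below formats the text of such a header. It is written in C only to keep it self-contained; the real build scripts are not C, and the function name `format_num_bytes_header` and the include-guard macro `CMU_LEX_NUM_BYTES_H` are assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes the contents of a generated
 * cmu_lex_num_bytes.h into buf. Returns 0 on success, or -1 if the
 * buffer is too small or formatting fails. */
int format_num_bytes_header(char *buf, size_t size, unsigned long num_bytes)
{
    int written;

    assert(buf != NULL && size > 0);

    written = snprintf(buf, size,
                       "#ifndef CMU_LEX_NUM_BYTES_H\n"
                       "#define CMU_LEX_NUM_BYTES_H\n"
                       "#define LEX_NUM_BYTES %lu\n"
                       "#endif\n",
                       num_bytes);

    /* snprintf returns the length it wanted to write; a value >= size
     * means the output was truncated. */
    return (written < 0 || (size_t)written >= size) ? -1 : 0;
}
```

With a header like that in place, cmu_lex_entries.c could include cmu_lex_num_bytes.h once at the top and refer to LEX_NUM_BYTES by name, so every file involved stays valid C on its own.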

As soon as I'm certain that the build scripts are working as intended, I can start modifying the output structure into something we can (hopefully) agree is a workable solution.


LongBoolean avatar LongBoolean commented on June 11, 2024

@forslund those files in question are not generated by the build scripts. My guess is they are made by an external tool. I was able to get them to work by renaming them (e.g. to cmu_lex_num_bytes.txt) and then using #include "cmu_lex_num_bytes.txt" where they are needed.


zeehio avatar zeehio commented on June 11, 2024

The pull request linked above should re-create the lexicon and the letter-to-sound rules.


forslund avatar forslund commented on June 11, 2024

@LongBoolean I'm pretty sure they are created by the make_cmulex scripts, using scripts from festvox and festival (at least the files in my example). Some extra processing to make them valid C code would not be hard.

@zeehio excellent!


zeehio avatar zeehio commented on June 11, 2024

I have changed the title of the issue to better reflect the specific issue we are dealing with.

I plan to recreate the voice models that I can, as well as the language analysis models needed as the starting point for internationalization.


forslund avatar forslund commented on June 11, 2024

Sounds good. Give me a shout if you need help from someone that doesn't know the first thing about voice models. :)

Testing, code review. Subissues that aren't that hard :)


m-toman avatar m-toman commented on June 11, 2024

Hi all,
I just came across this thread because I am watching this repository.
Some time ago I (rather) brute-forced German into flite+hts_engine.
It was quite painful and messy, so I agree with your approach of changing the method for this. Anyway, I took some notes back then:
https://sourceforge.net/p/at-flite/wiki/AddingNewLanguage/

Unfortunately I did that for flite+hts_engine, which was, as far as I know, based on flite 1.4, and there was no capability to load models from file. Still, perhaps you can make some use of my notes.

Good luck :),
Markus


zeehio avatar zeehio commented on June 11, 2024

Sorry for the delay in replying, @m-toman. I will for sure take a look at your code and notes and, if possible, merge them into mimic. Related to #5

