Comments (11)
I'm still reading up on the specifics of the mimic/flite voice format and pronunciation dictionary, so my opinions might be wrong, misguided, stupid, or some combination of the above.
Personally I think it would be great if we could let users update the pronunciation dictionary themselves, but I would like mimic to stay as independent as possible, without too many dependencies. My first thought is to include a standard "precompiled" C-code lexdict, detect whether the user/dev has added an alternative dict, and if so use that dict instead of the standard one.
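A minimal sketch of that fallback idea, assuming the alternative dict is supplied as a file path. The names `lex_source` and `choose_lexicon_source` are hypothetical and do not exist in mimic or flite; the sketch only decides which source to use and does not parse or load a dictionary.

```c
#include <stdio.h>

/* Hypothetical names: none of these identifiers come from mimic or flite. */
typedef enum { LEX_BUILTIN, LEX_USER } lex_source;

/* Return LEX_USER only if user_path names a file that can be opened for
 * reading; in every other case fall back to the compiled-in lexicon. */
lex_source choose_lexicon_source(const char *user_path)
{
    FILE *f;

    if (user_path == NULL || user_path[0] == '\0')
        return LEX_BUILTIN;

    f = fopen(user_path, "rb");
    if (f == NULL)
        return LEX_BUILTIN;

    fclose(f);
    return LEX_USER;
}
```

With this shape the built-in dict stays the default, and a missing or unreadable user file never breaks synthesis.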
A separate project (or subproject) could track the cmudict and include a simple build script to make it easier to contribute corrections to the mimic pronunciation dictionary. As I see it, the cmudict has a licence compatible with mimic/flite, so there is no real licensing issue in providing the C-code version.
My personal opinion is that the current repo size is a bit on the heavy side, even in this age of infinite bandwidth. I would maybe cut a couple of voices and limit the default selection to one or two, with the option to download additional voices. Separating the data files (like bellbird does) into a separate repository (using git-lfs perhaps!) might be an idea to consider.
from mimic1.
For the CMU arctic voices -- awb, bdl, clb, jmk, ksp, rms and slt -- voice data is available, but the labeling is not very accurate. The labels have a range of errors: misplaced phone borders (alignment errors), incorrect phoneme assignment due to accent variation (e.g. American vs Canadian vs Scottish English), and incorrect phoneme assignment due to differences between the phrase and what is actually spoken.
The http://festvox.org/11752/packed/ directory contains various example scripts (build_cg_voice, build_clunits_voice, etc.) for building voices (including flite voices) from 100 of the recordings from awb and rms. The generated voices are not good compared with the flitevox files, nor are the LPC/RES diphone voices, due to the lack of decent alignment files and sufficient diphone coverage.
NOTE: cg is clustergen, an HMM-based synthesis model based on HTS (HMM-based Speech Synthesis System) synthesis, and clunits generates LPC/RES (residual linear predictive coding) units based around diphones.
For the other voices, I don't believe the voice data is available.
from mimic1.
I have been playing around with the build process a bit. I moved all files into one directory (except for the files my system doesn't need) and compiled with gcc -g -O0 -o mimic *.c -lasound -lm -lpulse-simple. I'm getting a few insights from that.
I do think that those data files (lang/cmulex/cmu_lex_entries.c, lang/cmulex/cmu_lex_num_bytes.c, lang/cmulex/cmu_lex_phones_huff_table.c, lang/cmulex/cmu_lex_data_raw.c) should be renamed, changing the extension from .c to .txt or something. Those files do not contain valid C code (most are just comma-separated data) and will give compile errors when compiled without the appropriate makefiles.
from mimic1.
I agree that we shouldn't leave them as they are. I think by going through the build scripts we can make them produce valid c code and make them easier to use without too much trouble.
Some of those can be changed just a bit. For example, cmu_lex_data.c includes cmu_lex_data_raw.c in the middle of a table. I would prefer to alter the scripts so that they generate cmu_lex_data.c complete with the data that is in cmu_lex_data_raw.c. cmu_lex_data.c only contains four lines of code, so it wouldn't be hard at all.
cmu_lex_num_bytes.c only contains an integer. I'd rather call it cmu_lex_num_bytes.h and let the script generate
#define LEX_NUM_BYTES [generated number]
and use LEX_NUM_BYTES in cmu_lex_entries.c instead of including a C file in the middle of an assignment.
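As a rough illustration of the generator side of that proposal, the helper below formats the text of such a header. The function name `format_num_bytes_header` and the include-guard name are assumptions made for this sketch; the real output would come from the lexicon build scripts, not from this code.

```c
#include <stdio.h>

/* Sketch only: write the contents of a hypothetical cmu_lex_num_bytes.h
 * into buf. Returns what snprintf returns, i.e. the number of characters
 * the full header needs (excluding the terminating NUL). */
int format_num_bytes_header(char *buf, size_t size, long num_bytes)
{
    return snprintf(buf, size,
                    "#ifndef CMU_LEX_NUM_BYTES_H\n"
                    "#define CMU_LEX_NUM_BYTES_H\n"
                    "#define LEX_NUM_BYTES %ld\n"
                    "#endif\n",
                    num_bytes);
}
```

The consuming file would then include the header at the top and refer to LEX_NUM_BYTES by name, so every file involved is valid C on its own.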
As soon as I'm certain that the build scripts are working as they're intended I can start modifying the output structure to something we can (hopefully) agree is a workable solution.
from mimic1.
@forslund those files in question are not generated by the build scripts. My guess is they are made by an external tool. I was able to get them to work by renaming them, e.g. to cmu_lex_num_bytes.txt, and then using #include "cmu_lex_num_bytes.txt" where they are needed.
from mimic1.
The pull request linked above should re-create the lexicon and the letter to sound rules.
from mimic1.
@LongBoolean I'm pretty sure they are created by the make_cmulex scripts using scripts from festvox and festival (at least the files in my example). Some extra processing to make them valid C code would not be hard.
@zeehio excellent!
from mimic1.
I have changed the title of the issue to better reflect the specific issue we are dealing with.
I plan to recreate the voice models that I can, along with the language analysis models needed as the starting point for internationalization.
from mimic1.
Sounds good. Give me a shout if you need help from someone that doesn't know the first thing about voice models. :)
Testing, code review. Subissues that aren't that hard :)
from mimic1.
Hi all,
I just came across this thread because I am watching this repository.
Some time ago I (rather) bruteforced German into flite+hts_engine.
It was quite painful and messy, so I agree with your approach to change the method for this... anyway, I took some notes back then:
https://sourceforge.net/p/at-flite/wiki/AddingNewLanguage/
Unfortunately I did that for flite+hts_engine, which was afaik based on flite 1.4 and there was no capability to load models from file. Still, perhaps you can make some use of my notes.
Good luck :),
Markus
from mimic1.
Sorry for the delay replying, @m-toman. I will for sure take a look at your code and notes and if possible merge it into mimic. Related to #5
from mimic1.