
Comments (16)

zeehio commented on May 29, 2024

Compiling the large files in the Alan voice can eat about 300 MB of RAM per file.

If make -j 4 or similar is used on a device with 1 GB of RAM, there is a chance that those files are compiled simultaneously, causing an out-of-memory situation.
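As a rough back-of-envelope sketch of the constraint (the 300 MB per file and 1 GB figures come from this thread; the reserve for the OS is a guessed value):

```python
def max_parallel_jobs(total_ram_mb: int, per_job_mb: int, reserve_mb: int = 128) -> int:
    """Rough upper bound on N for make -j N when each compile job is memory-heavy.

    reserve_mb is a guessed allowance for the OS and other processes.
    """
    usable = total_ram_mb - reserve_mb
    return max(1, usable // per_job_mb)

# 1 GB device, ~300 MB per Alan-voice translation unit (figures from this thread):
print(max_parallel_jobs(1024, 300))  # -> 2, so make -j 4 overcommits memory
```

So on such a device, make -j 2 would be the safer default.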

It seems that the main benefit of embedding a voice is a shorter load time. Once @forslund has his pymimic module we will only need to load the voice once at the beginning so the main advantage of embedding it will disappear. We can then move to not-embedding by default.

from mimic1.

forslund commented on May 29, 2024

@zeehio, no pressure then :)


zeehio commented on May 29, 2024

A possible workaround until we have a better solution:

  1. Disable the Alan voice at compile time: ./configure --disable-vid_gb_ap
  2. Copy the Mycroft voice file from the voices directory
  3. Use mimic with the voice loaded from a file:
    mimic -voice /path/to/Mycroft.flitevox

Or maybe provide already built binaries?


aatchison commented on May 29, 2024

Hmm, the flitevox file is just huge and much slower... Pre-compiled binaries would be an option, but what about different architectures?


zeehio commented on May 29, 2024

Do you have a list of architectures/OS you would like to support?


zeehio commented on May 29, 2024

Maybe I should profile the voice loading to see where the bottleneck is and whether performance can be improved.


aatchison commented on May 29, 2024

Hmm, that might be a good idea. Go ahead and release the build process, though, if you like.


m-toman commented on May 29, 2024

I profiled the voice loading once and, if I remember correctly, the main issue was at https://github.com/MycroftAI/mimic/blob/master/src/cg/cst_cg_map.c#L93, where the mcep trees are read. There are many nested calls reading a lot of numbers value by value, with an error check per call.
fread-ing larger chunks could certainly help here.
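The actual fix would live in the C loader, but the idea can be sketched in Python: replace N tiny reads (each with its own error check) by one bulk read with a single check. The function names here are illustrative, not from the mimic source:

```python
import io
import struct

def read_floats_one_by_one(f, n):
    # Mirrors the current loader's pattern: one read and one
    # error check per value.
    out = []
    for _ in range(n):
        buf = f.read(4)
        if len(buf) != 4:
            raise IOError("short read")
        out.append(struct.unpack("<f", buf)[0])
    return out

def read_floats_chunked(f, n):
    # One read and one error check for the whole array.
    buf = f.read(4 * n)
    if len(buf) != 4 * n:
        raise IOError("short read")
    return list(struct.unpack("<%df" % n, buf))

data = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
assert read_floats_one_by_one(io.BytesIO(data), 4) == \
       read_floats_chunked(io.BytesIO(data), 4)
```

Both readers return the same values; the chunked version simply issues far fewer I/O calls and checks.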

The voice loading typically takes a few seconds on a background thread on a mobile device, once at startup, so this wasn't a huge problem (I work for VocaliD, in case you wonder).


zeehio commented on May 29, 2024

Thanks for the info @m-toman, I will try to do that.

Given that you work for VocaliD, do you know if it would be possible to train an HTS version of the Mycroft voice? Adding HTS support to mimic shouldn't be hard, as Flite+hts_engine already exists.

As you probably already know, HTS voices have a much smaller footprint (<5 MB) and, in my limited experience (a Catalan speech synthesis demo), quite good quality, which is great for embedded apps.

(In case you wonder, I am just collaborating with mimic in my spare time; I worked on speech synthesis in the past at the TALP-UPC group under the supervision of Antonio Bonafonte (a great person) and now I just spend some time on it for fun.)


forslund commented on May 29, 2024

@zeehio if you like I can take a look at optimizing the flitevox-loading.


m-toman commented on May 29, 2024

Ah, I was at SSW 2013 (http://ssw8.talp.cat) in Barcelona :).

I also trained an HTS version, but it turned out to be rather disappointing with the regular hts_engine MLSA vocoder (in research we always used STRAIGHT). Mixed excitation, as in flite, is much smoother (but we had to make some changes to the festvox training to get the 44.1 kHz version working). But yes, it is also much larger due to the random forest.

Our German voice model was also much better when trained using the regular HTS demo (3 samples here: http://m-toman.github.io/SALB/).
I suppose that is because it was recorded in a studio setting with a professional speaker and manually cleaned labels.
We can talk about this by email if you like - m dot toman at neuratec dot com :).


zeehio commented on May 29, 2024

@forslund That would be great, thanks! I am thinking that if you can move forward with pymimic once we have released a new mimic version, then maybe it is worth releasing right now and pushing on pymimic a bit more.

The main drawback we have with voice loading times is not that loading is slow (a few seconds); the main issue is that Mycroft loads the voice on each mimic call (each sentence) instead of once per session. It would be great to have pymimic, as it would let us keep the loaded voice in memory, so we would not pay the several-second delay on each sentence.

If you feel it is easier to get pymimic working than to work on optimizations here, then I suggest we release right now, focus on a pymimic release too, and adapt Mycroft to use it. It is up to you :-)
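pymimic's API does not exist yet, so the load-once pattern described above can only be sketched with a stand-in Voice class (everything here is hypothetical), but it shows the difference between paying the load cost per sentence and paying it once per session:

```python
LOAD_COUNT = 0  # counts how many times the (slow) voice load happens

class Voice:
    """Stand-in for a flitevox voice; pymimic's real API is undefined here."""
    def __init__(self, path):
        global LOAD_COUNT
        LOAD_COUNT += 1          # real loading takes a few seconds
        self.path = path

    def say(self, text):
        return "<audio for %r>" % text

def speak_per_call(path, sentences):
    # Current behaviour: one mimic invocation (and one voice load) per sentence.
    return [Voice(path).say(s) for s in sentences]

def speak_resident(voice, sentences):
    # With pymimic, the voice could stay loaded for the whole session.
    return [voice.say(s) for s in sentences]

speak_per_call("Mycroft.flitevox", ["a", "b", "c"])
print(LOAD_COUNT)                 # -> 3: the load penalty is paid per sentence

v = Voice("Mycroft.flitevox")     # one more load for the whole session
speak_resident(v, ["a", "b", "c"])
print(LOAD_COUNT)                 # -> 4: no further loads, however many sentences
```

The per-call version pays the multi-second load for every sentence; the resident version pays it once at startup.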

@m-toman I will write you an email :-) I helped with the ssw8 organization (passing microphones, etc.). It is a pity that there is no better free software vocoder implementation. I know that at TALP they have been working with both STRAIGHT and AHOcoder, both improving on the MLSA filter, but unfortunately neither of them has a free software implementation. I believe they (at TALP) are also using SALB; I am sure they are thankful for it!


m-toman commented on May 29, 2024

A bit off-topic, but perhaps this discussion is interesting for others too:
Yes, the vocoder is a big bottleneck.
I wrote a small tool to do feature extraction and resynthesis using the flite/mimic MLSA+ME vocoder, and it was actually much better than regular MLSA, but still...
If some other vocoder comes up, it would be interesting to integrate it, but I'm not sure how generic the flite parameter generation is.
The festvox training scripts for clustergen voices are also a lot messier than the HTS demo training scripts and can hardly be parameterized (well, except by sed-replacing Scheme script contents).

I've also been thinking about hybrid synthesis, i.e. replacing the vocoder with a unit selection search. In the end, a DNN will probably synthesize waveforms directly, I guess :).

Regarding SALB, yes I've been contacted with some questions on it.
Back then I decided to build around flite instead of extending it because of https://sourceforge.net/p/at-flite/wiki/AddingNewLanguage/

I've also considered ICU but it seemed a bit huge and I wanted to keep the dependencies low, so I just added special treatment for UTF-8 characters for my small German text analysis.
I've been using flite in SALB only for text analysis of English and attached hts_engine, with abstractions in-between. Probably if you build that into mimic, SALB becomes obsolete :).

The connection from flite to hts_engine is rather simple: there is a huge function converting the utterance structure to an HTS label, plus a dummy voice without a synthesis function. But I guess discussion of that belongs in a new issue (like this whole post, but I'm not sure where :)).
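For a flavour of what such a conversion produces: the phone-identity prefix of an HTS full-context label encodes the quinphone context, and real labels append many more positional features after it. A small hypothetical helper (not from flite or SALB):

```python
def quinphone_label(ll, l, c, r, rr):
    # Phone-identity prefix of an HTS full-context label:
    # prevprev^prev-current+next=nextnext. A real converter walking the
    # flite utterance structure would append dozens of positional
    # features (syllable, word and phrase positions) after this prefix.
    return "%s^%s-%s+%s=%s" % (ll, l, c, r, rr)

print(quinphone_label("x", "pau", "h", "@", "l"))  # -> x^pau-h+@=l
```

hts_engine then matches such labels against the questions in the trained decision trees.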


zeehio commented on May 29, 2024

Sorry for the off-topic discussion; if I could, I would split this issue.

After your comments I contacted Antonio Bonafonte and Asuncion Moreno, both from TALP, and Daniel Erro from AHOLAB, and Dani sent me not one but two possible alternatives:

AhoTTS is a GPL-3 speech synthesis system for Basque and Spanish based on the AHOLAB vocoder. Training the voices requires AHOLAB binaries, though; then again, if we are going to train HTS voices, HTK is also needed and it is non-free...

The other option is a free (BSD-licensed) implementation of something similar to the STRAIGHT vocoder, called WORLD. I believe it is worth looking into.

I will open a new issue and see if it is possible to move these vocoder comments there ;-)


aatchison commented on May 29, 2024

Thanks, guys. We could really use a more optimized version :D


Shallowmallow commented on May 29, 2024

if some other vocoder comes up, it would be interesting to integrate it, but I'm not sure how generic the flite parameter generation is.

STRAIGHT is now open source: https://github.com/HidekiKawahara/legacy_STRAIGHT

