
Comments (16)

zeehio commented on May 29, 2024

Compiling the large files in the Alan voice can eat about 300 MB of RAM per file.

If make -j 4 or similar is used on a device with 1 GB of RAM, there is a chance that those files are compiled simultaneously, causing an out-of-memory situation.
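As a rough back-of-envelope sketch of the constraint (the 300 MB per file and 1 GB figures come from this thread; the reserve for the OS is a guessed value):

```python
def max_parallel_jobs(total_ram_mb: int, per_job_mb: int, reserve_mb: int = 128) -> int:
    """Rough upper bound on N for make -j N when each compile job is memory-heavy.

    reserve_mb is a guessed allowance for the OS and other processes.
    """
    usable = total_ram_mb - reserve_mb
    return max(1, usable // per_job_mb)

# 1 GB device, ~300 MB per Alan-voice translation unit (figures from this thread):
print(max_parallel_jobs(1024, 300))  # -> 2, so make -j 4 overcommits memory
```

So on such a device, make -j 2 would be the safer default.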

It seems that the main benefit of embedding a voice is a shorter load time. Once @forslund has his pymimic module we will only need to load the voice once at the beginning so the main advantage of embedding it will disappear. We can then move to not-embedding by default.

from mimic1.

forslund commented on May 29, 2024

@zeehio, no pressure then :)


zeehio commented on May 29, 2024

A possible workaround until we have a better solution:

  1. Disable the Alan voice at compile time: ./configure --disable-vid_gb_ap
  2. Copy the Mycroft voice file from the voices directory
  3. Use mimic with the voice loaded from a file:
    mimic -voice /path/to/Mycroft.flitevox

Or maybe provide already built binaries?


aatchison commented on May 29, 2024

Hmm, the flitevox file is just huge and much slower... Pre-compiled binaries would be an option, but what about different architectures?


zeehio commented on May 29, 2024

Do you have a list of architectures/OS you would like to support?


zeehio commented on May 29, 2024

Maybe I should profile the voice loading to see where the bottleneck is and whether performance can be improved.


aatchison commented on May 29, 2024

Hmm, that might be a good idea. Go ahead and release the build process, though, if you like.


m-toman commented on May 29, 2024

I profiled the voice loading once and, if I remember correctly, the main issue was at https://github.com/MycroftAI/mimic/blob/master/src/cg/cst_cg_map.c#L93, where the mcep trees are read. There are many nested calls reading a lot of numbers value by value, with an error check per call.
fread-ing larger chunks could certainly help here.
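The actual fix would live in the C loader, but the idea can be sketched in Python: replace N tiny reads (each with its own error check) by one bulk read with a single check. The function names here are illustrative, not from the mimic source:

```python
import io
import struct

def read_floats_one_by_one(f, n):
    # Mirrors the current loader's pattern: one read and one
    # error check per value.
    out = []
    for _ in range(n):
        buf = f.read(4)
        if len(buf) != 4:
            raise IOError("short read")
        out.append(struct.unpack("<f", buf)[0])
    return out

def read_floats_chunked(f, n):
    # One read and one error check for the whole array.
    buf = f.read(4 * n)
    if len(buf) != 4 * n:
        raise IOError("short read")
    return list(struct.unpack("<%df" % n, buf))

data = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
assert read_floats_one_by_one(io.BytesIO(data), 4) == \
       read_floats_chunked(io.BytesIO(data), 4)
```

Both readers return the same values; the chunked version simply issues far fewer I/O calls and checks.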

The voice loading typically takes a few seconds on a background thread on a mobile device, once at startup, so this wasn't a huge problem (I work for VocaliD, in case you wonder).


zeehio commented on May 29, 2024

Thanks for the info @m-toman, I will try to do that.

Given that you work for VocaliD, do you know if it would be possible to train an HTS version of the Mycroft voice? Adding HTS support to mimic shouldn't be hard, as Flite+hts_engine already exists.

As you probably already know, HTS voices have a much smaller footprint (<5 MB) and, in my limited experience (a Catalan speech synthesis demo), quite good quality, which is great for embedded apps.

(In case you wonder, I am just collaborating with mimic in my spare time; I worked on speech synthesis in the past at the TALP-UPC group under the supervision of Antonio Bonafonte (a great person) and now I just spend some time on it for fun.)


forslund commented on May 29, 2024

@zeehio if you like I can take a look at optimizing the flitevox-loading.


m-toman commented on May 29, 2024

Ah, I was at SSW 2013 (http://ssw8.talp.cat) in Barcelona :).

I also trained an HTS version, but it turned out to be rather disappointing with the regular hts_engine MLSA vocoder (in research we always used STRAIGHT). Mixed excitation, as in flite, is much smoother (but we had to make some changes to the festvox training to get the 44.1 kHz version working). But yes, it is also much larger due to the random forest.

Our German voice model was also much better when trained using the regular HTS demo (3 samples here: http://m-toman.github.io/SALB/).
I suppose that is because it was recorded in a studio setting with a professional speaker and manually cleaned labels.
We can talk about this by email if you like - m dot toman at neuratec dot com :).


zeehio commented on May 29, 2024

@forslund That would be great, thanks! I am thinking that if you can move forward with pymimic once we have released a new mimic version, then maybe it is worth releasing right now and pushing on pymimic a bit more.

The main drawback we have with voice loading times is not that loading is slow (a few seconds); the main issue is that Mycroft loads the voice on each mimic call (each sentence) instead of once per session. It would be great to have pymimic, as it would let us keep the loaded voice in memory, so we would not pay the several-second delay on each sentence.

If you feel it is easier to get pymimic working than to work on optimizations here, then I suggest we release right now, focus on a pymimic release too, and adapt Mycroft to use it. It is up to you :-)
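pymimic's API does not exist yet, so the load-once pattern described above can only be sketched with a stand-in Voice class (everything here is hypothetical), but it shows the difference between paying the load cost per sentence and paying it once per session:

```python
LOAD_COUNT = 0  # counts how many times the (slow) voice load happens

class Voice:
    """Stand-in for a flitevox voice; pymimic's real API is undefined here."""
    def __init__(self, path):
        global LOAD_COUNT
        LOAD_COUNT += 1          # real loading takes a few seconds
        self.path = path

    def say(self, text):
        return "<audio for %r>" % text

def speak_per_call(path, sentences):
    # Current behaviour: one mimic invocation (and one voice load) per sentence.
    return [Voice(path).say(s) for s in sentences]

def speak_resident(voice, sentences):
    # With pymimic, the voice could stay loaded for the whole session.
    return [voice.say(s) for s in sentences]

speak_per_call("Mycroft.flitevox", ["a", "b", "c"])
print(LOAD_COUNT)                 # -> 3: the load penalty is paid per sentence

v = Voice("Mycroft.flitevox")     # one more load for the whole session
speak_resident(v, ["a", "b", "c"])
print(LOAD_COUNT)                 # -> 4: no further loads, however many sentences
```

The per-call version pays the multi-second load for every sentence; the resident version pays it once at startup.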

@m-toman I will write you an email :-) I helped with the ssw8 organization (passing microphones, etc.). It is a pity that there is no better free software vocoder implementation. I know that at TALP they have been working with both STRAIGHT and AHOcoder, both improving on the MLSA filter, but unfortunately neither of them has a free software implementation. I believe they (at TALP) are also using SALB; I am sure they are thankful for it!


m-toman commented on May 29, 2024

A bit off-topic, but perhaps this discussion is interesting for others too:
Yes, the vocoder is a big bottleneck.
I wrote a small tool to do feature extraction and resynthesis using the flite/mimic MLSA+ME vocoder, and it was actually much better than regular MLSA, but still...
If some other vocoder comes up, it would be interesting to integrate it, but I'm not sure how generic the flite parameter generation is.
The festvox training scripts for clustergen voices are also a lot messier than the HTS demo training scripts and can hardly be parameterized (well, except by sed-replacing Scheme script contents).

I've also been thinking about hybrid synthesis, i.e. replacing the vocoder with a unit selection search. In the end, a DNN will probably synthesize waveforms directly, I guess :).

Regarding SALB, yes I've been contacted with some questions on it.
Back then I decided to build around flite instead of extending it because of https://sourceforge.net/p/at-flite/wiki/AddingNewLanguage/

I've also considered ICU but it seemed a bit huge and I wanted to keep the dependencies low, so I just added special treatment for UTF-8 characters for my small German text analysis.
I've been using flite in SALB only for text analysis of English and attached hts_engine, with abstractions in-between. Probably if you build that into mimic, SALB becomes obsolete :).

The connection from flite to hts_engine is rather simple: there is a huge function converting the utterance structure to an HTS label, plus a dummy voice without a synthesis function. But I guess discussion of that belongs in a new issue (like this whole post, but I'm not sure where :)).
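For a flavour of what such a conversion produces: the phone-identity prefix of an HTS full-context label encodes the quinphone context, and real labels append many more positional features after it. A small hypothetical helper (not from flite or SALB):

```python
def quinphone_label(ll, l, c, r, rr):
    # Phone-identity prefix of an HTS full-context label:
    # prevprev^prev-current+next=nextnext. A real converter walking the
    # flite utterance structure would append dozens of positional
    # features (syllable, word and phrase positions) after this prefix.
    return "%s^%s-%s+%s=%s" % (ll, l, c, r, rr)

print(quinphone_label("x", "pau", "h", "@", "l"))  # -> x^pau-h+@=l
```

hts_engine then matches such labels against the questions in the trained decision trees.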


zeehio commented on May 29, 2024

Sorry for the off-topic discussion; if I could, I would split this issue.

After your comments I contacted Antonio Bonafonte and Asuncion Moreno, both from TALP, and Daniel Erro from AHOLAB, and Dani sent me not one but two possible alternatives:

AhoTTS is a GPL-3 speech synthesis system for Basque and Spanish based on the AHOLAB vocoder. Training the voices requires AHOLAB binaries, though; then again, if we are going to train HTS voices, HTK is also needed and it is non-free...

The other option is a free (BSD-licensed) implementation of something similar to the STRAIGHT vocoder, called WORLD. I believe it is worth looking into.

I will open a new issue and see if it is possible to move these vocoder comments there ;-)


aatchison commented on May 29, 2024

Thanks, guys. We could really use a more optimized version :D


Shallowmallow commented on May 29, 2024

if some other vocoder comes up, it would be interesting to integrate it, but I'm not sure how generic the flite parameter generation is.

STRAIGHT is now open source: https://github.com/HidekiKawahara/legacy_STRAIGHT

