Comments (19)
Is there any way that I can personally fix this?
from sherpa-onnx.
Could you tell us how we can reproduce it?
I have tested it at
https://huggingface.co/spaces/k2-fsa/text-to-speech
Everything is working fine.
It only happens with the Android APK version. To reproduce, install the Android APK that I linked above, set it as the default text-to-speech engine, and then synthesize speech on the Android device. I'll post a video of it.
By the way, what is the CPU on your Android phone?
Pixel 8 Pro (Tensor G3 processor)
Does only this model on your phone have such pauses? Do other models work well on your phone?
Video of my model:
https://github.com/k2-fsa/sherpa-onnx/assets/75046310/7d35659d-106d-4520-a1ff-c79d058525dd
screen-20240512-180601.2.mp4
And here it does not happen with Glados and other models:
screen-20240512-181649.mp4
sweetbbak-amy en_GB is twice as large in file size as the other models.
In other words, this model is so large that it takes a long time to synthesize a sentence on your phone.
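One way to check whether a model is too slow for a given phone is its real-time factor (RTF): the time spent synthesizing divided by the duration of the audio produced. The helper below is a hypothetical sketch, not part of sherpa-onnx; it assumes you have already timed one synthesis call and know how many samples it returned and at what sample rate (22050 Hz is used only as an illustrative value). An RTF above 1 means audio is produced more slowly than it plays back, which would be consistent with audible pauses.

```python
def real_time_factor(elapsed_seconds: float, num_samples: int, sample_rate: int) -> float:
    """Return synthesis time divided by the duration of the generated audio.

    Hypothetical helper: values below 1.0 mean faster than real time,
    values above 1.0 mean synthesis cannot keep up with playback.
    """
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return elapsed_seconds / audio_seconds


if __name__ == "__main__":
    # Illustrative numbers only: 3 s of audio at 22050 Hz that took 4.5 s to synthesize.
    rtf = real_time_factor(4.5, 3 * 22050, 22050)
    print(f"RTF = {rtf:.2f}")  # RTF = 1.50
```

Comparing the RTF of a large and a small model on the same phone and the same sentence gives a more direct signal than file size alone.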
I see, thanks for your help. I'm training a new one right now that's going to be the smallest size; hopefully I can use that instead. Maybe I can submit that instead, or potentially get some advice on building the APKs? I read the docs not that long ago and I am beyond lost; it's a little over my head.
Maybe I can submit that instead or potentially get some advice on building the APKs?
Both are fine with me.
If you want to build an APK by yourself, you can follow our doc at
https://k2-fsa.github.io/sherpa/onnx/android/build-sherpa-onnx.html
Or you can open-source your onnx model and we can build the APK and make it public.
Thank you! I'm going to open-source it for sure, but I will try to build it myself, or I'll @ you in this thread or in the thread you have in Piper TTS, if that is okay.
Actually I have one question, I'm on the last step of building an APK out of a Piper model but I'm lost at this step. What type of model is piper-tts supposed to be in this context?
OnlineRecognizer.kt
14 -> {
    val modelDir = "vits-piper-en_GB-sweetbbak-amy"
    return OnlineModelConfig(
        neMoCtc = OnlineNeMoCtcModelConfig(
            model = "$modelDir/en_GB-sweetbbak-amy.onnx",
            //model = "$modelDir/model.onnx",
        ),
        tokens = "$modelDir/tokens.txt",
    )
}
and the hint in the wiki is:
If you select a different pre-trained model, make sure that you also change the corresponding code listed in the following screenshot:
but I can't find any information on what the Piper models are supposed to be in the pre-trained models list.
What type of model is piper-tts supposed to be in this context?
You have selected the wrong file.
You should use
https://github.com/k2-fsa/sherpa-onnx/blob/master/android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/Tts.kt
The examples listed above should be straightforward to follow.
Make sure you are using the latest master branch.
By the way, are the models in
https://github.com/sweetbbak/Neural-Amy-TTS/tree/main/models
trained with piper?
If yes, I think they should be usable directly in sherpa-onnx.
Yeah, they are all trained with Piper. I just finished one and wanted to test them to see which parameters give a good mix of quality and speed on Android. I'm on commit #872 939fdd9.
So, I can build SherpaOnnxTtsEngine, but I can't get the TTS engine to switch to Kaldi on my phone.
I built the arm64 libs and copied them into jniLibs in the SherpaOnnxTtsEngine Android Studio project, inserted my model (which I named the same as the other model for convenience), downloaded espeak-ng-data, copied in the tokens.txt file, installed onnxruntime with Python, ran the Python script in the model directory, and added my model to TtsEngine.kt like this:
modelDir = "vits-piper-en_GB-sweetbbak-amy"
modelName = "en_GB-sweetbbak-amy.onnx"
dataDir = "vits-piper-en_GB-sweetbbak-amy/espeak-ng-data"
lang = "eng"
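Before rebuilding, it can help to confirm that every path referenced in the configuration above actually exists. This is a hypothetical sanity check, not a sherpa-onnx tool: it assumes the model directory is copied under the app's assets folder (we assume `app/src/main/assets`; adjust to your project) and that `tokens.txt` sits next to the `.onnx` model, as in the setup described above.

```python
from pathlib import Path


def missing_tts_assets(assets_root: str, model_dir: str, model_name: str, data_dir: str) -> list[str]:
    """Return the referenced paths that do not exist under assets_root (hypothetical check)."""
    root = Path(assets_root)
    # Relative path -> whether it must be a regular file or a directory.
    expected = {
        f"{model_dir}/{model_name}": "file",  # the .onnx model
        f"{model_dir}/tokens.txt": "file",    # generated from the model's .onnx.json
        data_dir: "dir",                      # espeak-ng-data, already prefixed with model_dir
    }
    missing = []
    for rel_path, kind in expected.items():
        path = root / rel_path
        ok = path.is_file() if kind == "file" else path.is_dir()
        if not ok:
            missing.append(rel_path)
    return missing


if __name__ == "__main__":
    # Assumed project layout; adjust assets_root to your Android Studio project.
    problems = missing_tts_assets(
        "app/src/main/assets",
        model_dir="vits-piper-en_GB-sweetbbak-amy",
        model_name="en_GB-sweetbbak-amy.onnx",
        data_dir="vits-piper-en_GB-sweetbbak-amy/espeak-ng-data",
    )
    print("missing:", problems if problems else "nothing")
```

An empty result only means the files are where the config points; it does not check that the model or tokens are valid.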
Never mind, it randomly started working after a few tries. It must be some weird bug on my phone's end. I appreciate the help. Also, one last question: where does tokens.txt come from, and is it generally safe to just reuse it for every English model, or is there some process for converting *.onnx.json into tokens.txt?
Please see https://k2-fsa.github.io/sherpa/onnx/tts/piper.html
for how tokens.txt is generated.
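import json

def generate_tokens(config):
    # Each phoneme symbol maps to a list of IDs; tokens.txt stores "<symbol> <first id>" per line.
    id_map = config["phoneme_id_map"]
    with open("tokens.txt", "w", encoding="utf-8") as f:
        for s, i in id_map.items():
            f.write(f"{s} {i[0]}\n")
    print("Generated tokens.txt")

with open("en_GB-sweetbbak-amy.onnx.json", encoding="utf-8") as f:  # your model's config file
    generate_tokens(json.load(f))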
is it generally safe to just re-use it for every english model
Please always regenerate it from your model's .json file.
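If you are unsure whether an existing tokens.txt belongs to a given model, a quick check is to compare it against that model's `phoneme_id_map`. The two helpers below are a hypothetical sketch (`read_tokens` and `tokens_match_config` are not sherpa-onnx functions); they assume tokens.txt was written in the `"<symbol> <id>"` format produced by `generate_tokens`, where the symbol itself can be a space.

```python
from pathlib import Path


def read_tokens(path: str) -> dict[str, int]:
    """Parse a tokens.txt written as '<symbol> <id>' per line (hypothetical helper)."""
    tokens = {}
    for line in Path(path).read_text(encoding="utf-8").split("\n"):
        if not line:
            continue  # skip the empty string after the final newline
        # Split on the last space only: the symbol itself may be a space character.
        symbol, token_id = line.rsplit(" ", 1)
        tokens[symbol] = int(token_id)
    return tokens


def tokens_match_config(path: str, config: dict) -> bool:
    """True if tokens.txt encodes the same phoneme_id_map as the model's .onnx.json."""
    expected = {s: ids[0] for s, ids in config["phoneme_id_map"].items()}
    return read_tokens(path) == expected
```

For example, load the model's `.onnx.json` with `json.load` and pass the resulting dict together with the path to tokens.txt; `False` means the file does not match this model and should be regenerated.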