Comments (6)
I installed onnxruntime-gpu specifically for CUDA 12.x following the instructions from
That won't affect the onnxruntime used in sherpa-onnx.
Could you try CUDA 11.8, since we are using onnxruntime 1.17.1 in sherpa-onnx?
from sherpa-onnx.
I have uninstalled CUDA 12.4 and installed CUDA 11.8.
Then I used python setup.py install again to rebuild and install sherpa-onnx for Nvidia GPU.
Then I ran the offline-tts-play.py example again. This got past the onnxruntime_providers_cuda.dll error. However, a new error appeared:
Could not locate cublasLt64_12.dll. Please make sure it is in your library path!
After some Google searching, I think this means that I need to update CUDA to a newer version?
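Since the error is about which cuBLAS DLL the loader can see, it helps to list every cublasLt64_*.dll visible on PATH before guessing at CUDA versions. A minimal diagnostic sketch (the helper name and pattern are mine, not part of sherpa-onnx or onnxruntime):

```python
import os
from pathlib import Path

def find_cuda_dlls(pattern: str = "cublasLt64_*.dll") -> list:
    """Scan every directory on PATH for CUDA runtime DLLs matching `pattern`."""
    hits = []
    for entry in os.environ.get("PATH", "").split(os.pathsep):
        d = Path(entry)
        if d.is_dir():
            hits.extend(d.glob(pattern))
    return hits

if __name__ == "__main__":
    # cublasLt64_11.dll ships with CUDA 11.x, cublasLt64_12.dll with CUDA 12.x
    for dll in find_cuda_dlls():
        print(dll)
```

If this prints only cublasLt64_11.dll while the runtime asks for cublasLt64_12.dll (or vice versa), the PATH points at the wrong CUDA toolkit for the onnxruntime build in use.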
python piper_stream_example.py --vits-model=./en_US-libritts_r-medium.onnx --vits-tokens=./tokens.txt --vits-data-dir=./espeak-ng-data --output-filename=./test.wav --provider cuda --debug True 'This is a test'
Namespace(vits_model='./en_US-libritts_r-medium.onnx', vits_lexicon='', vits_tokens='./tokens.txt', vits_data_dir='./espeak-ng-data', vits_dict_dir='', tts_rule_fsts='', output_filename='./test.wav', sid=0, debug=True, provider='cuda', num_threads=1, speed=1.0, text='This is a test')
2024-05-15 00:12:24,597 INFO [piper_stream_example.py:320] Loading model ...
2024-05-15 00:12:26.0113635 [W:onnxruntime:, transformer_memcpy.cc:74 onnxruntime::MemcpyTransformer::ApplyImpl] 28 Memcpy nodes are added to the graph torch_jit for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-05-15 00:12:26.0302070 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-15 00:12:26.0345325 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
Could not locate cublasLt64_12.dll. Please make sure it is in your library path!
CUDA 11.8 contains cublasLt64_11.dll, so I uninstalled 11.8 and installed 12.2. I did not rebuild and reinstall sherpa-onnx for GPU.
I tried running the offline-tts-play.py example and encountered the onnxruntime_providers_cuda.dll error again.
Next, I reinstalled 11.8, keeping 12.2 as well, since it is possible to have multiple installations. I updated the PATH back to 11.8.
I retried the example again and it got further. This time, it produced an error about zlibwapi.dll.
I got zlibwapi.dll from http://www.winimage.com/zLibDll/, as per the NVIDIA CUDA and cuDNN installation instructions - this is version 1.2.3.
python piper_stream_example.py --vits-model=./en_US-libritts_r-medium.onnx --vits-tokens=./tokens.txt --vits-data-dir=./espeak-ng-data --output-filename=./test.wav --provider cuda --debug True 'This is a test'
Namespace(vits_model='./en_US-libritts_r-medium.onnx', vits_lexicon='', vits_tokens='./tokens.txt', vits_data_dir='./espeak-ng-data', vits_dict_dir='', tts_rule_fsts='', output_filename='./test.wav', sid=0, debug=True, provider='cuda', num_threads=1, speed=1.0, text='This is a test')
2024-05-15 00:32:10,848 INFO [piper_stream_example.py:320] Loading model ...
2024-05-15 00:32:11.9375753 [W:onnxruntime:, transformer_memcpy.cc:74 onnxruntime::MemcpyTransformer::ApplyImpl] 28 Memcpy nodes are added to the graph torch_jit for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-05-15 00:32:11.9569339 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-15 00:32:11.9607816 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
C:\Users\T\Desktop\Code\ai\stella\sherpa-onnx\sherpa-onnx\csrc\offline-tts-vits-model.cc:Init:79 ---vits model---
model_type=vits
comment=piper
has_espeak=1
language=English
voice=en-us
n_speakers=904
sample_rate=22050
----------input names----------
0 input
1 input_lengths
2 scales
3 sid
----------output names----------
0 output
2024-05-15 00:32:12,393 INFO [piper_stream_example.py:322] Loading model done.
2024-05-15 00:32:12,394 INFO [piper_stream_example.py:330] Start generating ...
C:\Users\T\Desktop\Code\ai\stella\sherpa-onnx\sherpa-onnx/csrc/offline-tts-vits-impl.h:Generate:165 Raw text: This is a test
Could not load library zlibwapi.dll. Error code 193. Please verify that the library is built correctly for your processor architecture (32-bit, 64-bit)
I was using the precompiled DLLs for 32-bit. I downloaded the correct 64-bit ones from http://www.winimage.com/zLibDll/ and now there are no errors when running the offline-tts-play.py example.
Thank you for the help! @csukuangfj
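Error code 193 is Windows' ERROR_BAD_EXE_FORMAT, which almost always means a 32-bit DLL was loaded into a 64-bit process. The target architecture can be checked directly from the DLL's PE header rather than by trial and error. A small sketch (the function name is mine; the offsets come from the PE/COFF format):

```python
import struct

# COFF machine types, per the PE/COFF specification
MACHINES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)", 0xAA64: "ARM64"}

def dll_architecture(path: str) -> str:
    """Read the COFF header of a Windows DLL/EXE and report its target CPU."""
    with open(path, "rb") as f:
        data = f.read(4096)
    if data[:2] != b"MZ":
        raise ValueError("not a PE file (missing MZ signature)")
    # The 32-bit field at offset 0x3C points to the 'PE\0\0' signature
    pe_offset = struct.unpack_from("<I", data, 0x3C)[0]
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("invalid PE signature")
    # The machine field follows the 4-byte signature
    machine = struct.unpack_from("<H", data, pe_offset + 4)[0]
    return MACHINES.get(machine, hex(machine))
```

Running `dll_architecture("zlibwapi.dll")` on the 32-bit download would report "x86 (32-bit)", flagging the mismatch before the runtime error appears.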
By the way, is there a performance issue with onnxruntime gpu?
I am finding that cpu is faster than gpu when measuring the "time in seconds to receive the first message" for generating the tts audio.
My GPU is an RTX 3090; my CPU is an i9-14900K.
2024-05-15 00:50:46.3377129 [W:onnxruntime:, transformer_memcpy.cc:74 onnxruntime::MemcpyTransformer::ApplyImpl] 28 Memcpy nodes are added to the graph torch_jit for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-05-15 00:50:46.3555852 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-15 00:50:46.3603249 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
Glad to hear that you finally managed to run sherpa-onnx with GPU on Windows.
I am finding that cpu is faster than gpu when measuring the "time in seconds to receive the first message" for generating the tts audio.
A GPU needs warmup, and the main advantage of a GPU is parallel processing.
Moving data between the CPU and GPU also takes time. In other words, the GPU is not necessarily faster than the CPU if you only want to synthesize a single utterance.