
Comments (6)

csukuangfj commented on June 20, 2024

I installed onnxruntime-gpu specifically for CUDA 12.x following the instructions from

That won't affect the onnxruntime used in sherpa-onnx.

Could you try CUDA 11.8, since we are using onnxruntime 1.17.1 in sherpa-onnx?

from sherpa-onnx.
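The version pairing above can be made explicit in code. The sketch below is a minimal, hand-maintained lookup; the 1.17.1 → 11.8 entry comes from this thread, and the table as a whole is an assumption to verify against the onnxruntime release notes for your version, not an official API.

```python
# Hand-maintained map from onnxruntime-gpu version to the CUDA version its
# default wheels are built against. Verify entries against the onnxruntime
# release notes -- this table is illustrative, not authoritative.
ORT_DEFAULT_CUDA = {
    "1.17.1": "11.8",  # CUDA 12.x needs a separately published package
}

def required_cuda(ort_version):
    """Return the CUDA version the given onnxruntime-gpu build expects."""
    try:
        return ORT_DEFAULT_CUDA[ort_version]
    except KeyError:
        raise ValueError(f"unknown onnxruntime version: {ort_version}")

if __name__ == "__main__":
    print(required_cuda("1.17.1"))  # -> 11.8
```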

tn-17 commented on June 20, 2024

I have uninstalled CUDA 12.4 and installed CUDA 11.8.

(screenshot omitted)

Then I used python setup.py install again to rebuild and install sherpa-onnx for Nvidia GPU.

Then I ran the offline-tts-play.py example again. This got past the onnxruntime_providers_cuda.dll error. However, a new error appeared.

Could not locate cublasLt64_12.dll. Please make sure it is in your library path!

After some Google searching, I think this means I need to update CUDA to a newer version?

python piper_stream_example.py --vits-model=./en_US-libritts_r-medium.onnx --vits-tokens=./tokens.txt --vits-data-dir=./espeak-ng-data --output-filename=./test.wav --provider cuda --debug True 'This is a test'
Namespace(vits_model='./en_US-libritts_r-medium.onnx', vits_lexicon='', vits_tokens='./tokens.txt', vits_data_dir='./espeak-ng-data', vits_dict_dir='', tts_rule_fsts='', output_filename='./test.wav', sid=0, debug=True, provider='cuda', num_threads=1, speed=1.0, text='This is a test')
2024-05-15 00:12:24,597 INFO [piper_stream_example.py:320] Loading model ...
2024-05-15 00:12:26.0113635 [W:onnxruntime:, transformer_memcpy.cc:74 onnxruntime::MemcpyTransformer::ApplyImpl] 28 Memcpy nodes are added to the graph torch_jit for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-05-15 00:12:26.0302070 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.  
2024-05-15 00:12:26.0345325 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
Could not locate cublasLt64_12.dll. Please make sure it is in your library path!
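The "Could not locate cublasLt64_12.dll" message means the CUDA execution provider is searching the DLL search path for a CUDA 12 cuBLAS runtime. A quick stdlib-only way to see which cublasLt DLLs are actually reachable on PATH (a diagnostic sketch, not part of sherpa-onnx):

```python
import glob
import os

def find_dlls_on_path(pattern, path=None):
    """Search every directory on PATH (or an explicit search string) for
    files matching pattern, e.g. 'cublasLt64_*.dll', and return full paths."""
    hits = []
    for d in (path if path is not None else os.environ.get("PATH", "")).split(os.pathsep):
        if d:
            hits.extend(glob.glob(os.path.join(d, pattern)))
    return hits

if __name__ == "__main__":
    # With only CUDA 11.8 on PATH this finds cublasLt64_11.dll; a CUDA-12
    # build of onnxruntime will then fail with exactly the error above.
    for hit in find_dlls_on_path("cublasLt64_*.dll"):
        print(hit)
```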


tn-17 commented on June 20, 2024

CUDA 11.8 contains cublasLt64_11.dll, so I uninstalled 11.8 and installed 12.2.

I did not rebuild and reinstall sherpa-onnx for GPU.

I tried running the offline-tts-play.py example and encountered the onnxruntime_providers_cuda.dll error again.

Next, I reinstalled 11.8, keeping 12.2 as well, since it is possible to have multiple CUDA versions installed side by side.

I updated the path back to 11.8.
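With several toolkits installed side by side, which one onnxruntime picks up is decided by the DLL search path. Instead of editing the system PATH each time, the active toolkit can be selected per-process before importing onnxruntime. A sketch, assuming the default Windows install location (adjust `CUDA_118_BIN` for your machine):

```python
import os
import sys

# Hypothetical default install location -- adjust to your machine.
CUDA_118_BIN = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin"

def prefer_cuda_dir(cuda_bin):
    """Make this CUDA toolkit's DLLs win the search, for this process only."""
    if sys.platform == "win32":
        # Python 3.8+ on Windows: register an explicit DLL search directory
        os.add_dll_directory(cuda_bin)
    # Prepending to PATH also covers DLLs found via the normal search order
    os.environ["PATH"] = cuda_bin + os.pathsep + os.environ.get("PATH", "")

if os.path.isdir(CUDA_118_BIN):
    prefer_cuda_dir(CUDA_118_BIN)
    # import sherpa_onnx (or onnxruntime) only after the search path is set
```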

I retried the example again and it got further. This time, it produced an error about zlibwapi.dll.

I got zlibwapi.dll from http://www.winimage.com/zLibDll/ as per the NVIDIA CUDA and cuDNN installation instructions.

  • this is version 1.2.3
python piper_stream_example.py --vits-model=./en_US-libritts_r-medium.onnx --vits-tokens=./tokens.txt --vits-data-dir=./espeak-ng-data --output-filename=./test.wav --provider cuda --debug True 'This is a test'
Namespace(vits_model='./en_US-libritts_r-medium.onnx', vits_lexicon='', vits_tokens='./tokens.txt', vits_data_dir='./espeak-ng-data', vits_dict_dir='', tts_rule_fsts='', output_filename='./test.wav', sid=0, debug=True, provider='cuda', num_threads=1, speed=1.0, text='This is a test')
2024-05-15 00:32:10,848 INFO [piper_stream_example.py:320] Loading model ...
2024-05-15 00:32:11.9375753 [W:onnxruntime:, transformer_memcpy.cc:74 onnxruntime::MemcpyTransformer::ApplyImpl] 28 Memcpy nodes are added to the graph torch_jit for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-05-15 00:32:11.9569339 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-15 00:32:11.9607816 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
C:\Users\T\Desktop\Code\ai\stella\sherpa-onnx\sherpa-onnx\csrc\offline-tts-vits-model.cc:Init:79 ---vits model---
model_type=vits
comment=piper
has_espeak=1
language=English
voice=en-us
n_speakers=904
sample_rate=22050
----------input names----------
0 input
1 input_lengths
2 scales
3 sid
----------output names----------
0 output


2024-05-15 00:32:12,393 INFO [piper_stream_example.py:322] Loading model done.
2024-05-15 00:32:12,394 INFO [piper_stream_example.py:330] Start generating ...
C:\Users\T\Desktop\Code\ai\stella\sherpa-onnx\sherpa-onnx/csrc/offline-tts-vits-impl.h:Generate:165 Raw text: This is a test
Could not load library zlibwapi.dll. Error code 193. Please verify that the library is built correctly for your processor architecture (32-bit, 64-bit)
(venv)


tn-17 commented on June 20, 2024

I was using the precompiled DLLs for 32-bit. I downloaded the correct 64-bit ones from http://www.winimage.com/zLibDll/ and now there are no errors when running the offline-tts-play.py example.
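Error code 193 (ERROR_BAD_EXE_FORMAT) almost always means a 32-bit DLL was loaded into a 64-bit process. A DLL's architecture can be read straight from its PE header without loading it; a stdlib-only diagnostic sketch:

```python
import struct

MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)", 0xAA64: "ARM64"}

def dll_architecture(path):
    """Read the Machine field from a PE file's COFF header."""
    with open(path, "rb") as f:
        data = f.read(4096)
    if data[:2] != b"MZ":
        raise ValueError("not a PE file")
    # Offset 0x3C (e_lfanew) holds the offset of the 'PE\0\0' signature
    pe_offset = struct.unpack_from("<I", data, 0x3C)[0]
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    machine = struct.unpack_from("<H", data, pe_offset + 4)[0]
    return MACHINE_NAMES.get(machine, hex(machine))

# e.g. a 32-bit zlibwapi.dll reports "x86 (32-bit)" -- exactly the build that
# triggers error 193 in a 64-bit process; the x64 build reports "x64 (64-bit)".
```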

Thank you for the help! @csukuangfj

By the way, is there a performance issue with onnxruntime-gpu?

I am finding that the CPU is faster than the GPU when measuring the "time in seconds to receive the first message" for generating the TTS audio.

My GPU is an RTX 3090. My CPU is an i9-14900K.

2024-05-15 00:50:46.3377129 [W:onnxruntime:, transformer_memcpy.cc:74 onnxruntime::MemcpyTransformer::ApplyImpl] 28 Memcpy nodes are added to the graph torch_jit for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-05-15 00:50:46.3555852 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-15 00:50:46.3603249 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.


csukuangfj commented on June 20, 2024

Glad to hear that you finally managed to run sherpa-onnx with GPU on Windows.

I am finding that cpu is faster than gpu when measuring the "time in seconds to receive the first message" for generating the tts audio.

The GPU needs warmup; also, the advantage of the GPU is parallel processing.

Moving data between CPU and GPU also takes time. In other words, GPU is not necessarily faster than CPU if you want to synthesize a single utterance.
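The warmup effect is easy to account for when benchmarking: run one throwaway generation first, then time the real one. A minimal sketch, with a hypothetical `generate` callable standing in for the actual sherpa-onnx TTS call:

```python
import time

def time_to_first_result(generate, *args, warmup=1):
    """Time a call after `warmup` throwaway runs. The first GPU run pays
    one-time costs (kernel setup, caches), so it is not representative."""
    for _ in range(warmup):
        generate(*args)  # result discarded: absorbs one-time setup cost
    start = time.perf_counter()
    result = generate(*args)
    return time.perf_counter() - start, result

# Usage (hypothetical): elapsed, audio = time_to_first_result(tts.generate, "Hello")
```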

