
Comments (6)

csukuangfj commented on June 20, 2024

I installed onnxruntime-gpu specifically for CUDA 12.x following the instructions from

That won't affect the onnxruntime used in sherpa-onnx.

Could you try CUDA 11.8, since we are using onnxruntime 1.17.1 in sherpa-onnx?

from sherpa-onnx.
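The version pairing above can be made explicit in code. The sketch below is a minimal, hand-maintained lookup; the 1.17.1 → 11.8 entry comes from this thread, and the table as a whole is an assumption to verify against the onnxruntime release notes for your version, not an official API.

```python
# Hand-maintained map from onnxruntime-gpu version to the CUDA version its
# default wheels are built against. Verify entries against the onnxruntime
# release notes -- this table is illustrative, not authoritative.
ORT_DEFAULT_CUDA = {
    "1.17.1": "11.8",  # CUDA 12.x needs a separately published package
}

def required_cuda(ort_version):
    """Return the CUDA version the given onnxruntime-gpu build expects."""
    try:
        return ORT_DEFAULT_CUDA[ort_version]
    except KeyError:
        raise ValueError(f"unknown onnxruntime version: {ort_version}")

if __name__ == "__main__":
    print(required_cuda("1.17.1"))  # -> 11.8
```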

tn-17 commented on June 20, 2024

I have uninstalled CUDA 12.4 and installed CUDA 11.8.

(screenshot omitted)

Then I used python setup.py install again to rebuild and install sherpa-onnx for Nvidia GPU.

Then I ran the offline-tts-play.py example again. This got past the onnxruntime_providers_cuda.dll error. However, a new error appeared.

Could not locate cublasLt64_12.dll. Please make sure it is in your library path!

After some Google searching, I think this means I need to update CUDA to a newer version?

python piper_stream_example.py --vits-model=./en_US-libritts_r-medium.onnx --vits-tokens=./tokens.txt --vits-data-dir=./espeak-ng-data --output-filename=./test.wav --provider cuda --debug True 'This is a test'
Namespace(vits_model='./en_US-libritts_r-medium.onnx', vits_lexicon='', vits_tokens='./tokens.txt', vits_data_dir='./espeak-ng-data', vits_dict_dir='', tts_rule_fsts='', output_filename='./test.wav', sid=0, debug=True, provider='cuda', num_threads=1, speed=1.0, text='This is a test')
2024-05-15 00:12:24,597 INFO [piper_stream_example.py:320] Loading model ...
2024-05-15 00:12:26.0113635 [W:onnxruntime:, transformer_memcpy.cc:74 onnxruntime::MemcpyTransformer::ApplyImpl] 28 Memcpy nodes are added to the graph torch_jit for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-05-15 00:12:26.0302070 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.  
2024-05-15 00:12:26.0345325 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
Could not locate cublasLt64_12.dll. Please make sure it is in your library path!
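The "Could not locate cublasLt64_12.dll" message means the CUDA execution provider is searching the DLL search path for a CUDA 12 cuBLAS runtime. A quick stdlib-only way to see which cublasLt DLLs are actually reachable on PATH (a diagnostic sketch, not part of sherpa-onnx):

```python
import glob
import os

def find_dlls_on_path(pattern, path=None):
    """Search every directory on PATH (or an explicit search string) for
    files matching pattern, e.g. 'cublasLt64_*.dll', and return full paths."""
    hits = []
    for d in (path if path is not None else os.environ.get("PATH", "")).split(os.pathsep):
        if d:
            hits.extend(glob.glob(os.path.join(d, pattern)))
    return hits

if __name__ == "__main__":
    # With only CUDA 11.8 on PATH this finds cublasLt64_11.dll; a CUDA-12
    # build of onnxruntime will then fail with exactly the error above.
    for hit in find_dlls_on_path("cublasLt64_*.dll"):
        print(hit)
```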


tn-17 commented on June 20, 2024

CUDA 11.8 contains cublasLt64_11.dll, so I uninstalled 11.8 and installed 12.2.

I did not rebuild and reinstall sherpa-onnx for GPU.

I tried running the offline-tts-play.py example and encountered the onnxruntime_providers_cuda.dll error again.

Next, I reinstalled 11.8, keeping 12.2 as well, since it is possible to have multiple CUDA versions installed side by side.

I updated the path back to 11.8.
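With several toolkits installed side by side, which one onnxruntime picks up is decided by the DLL search path. Instead of editing the system PATH each time, the active toolkit can be selected per-process before importing onnxruntime. A sketch, assuming the default Windows install location (adjust `CUDA_118_BIN` for your machine):

```python
import os
import sys

# Hypothetical default install location -- adjust to your machine.
CUDA_118_BIN = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin"

def prefer_cuda_dir(cuda_bin):
    """Make this CUDA toolkit's DLLs win the search, for this process only."""
    if sys.platform == "win32":
        # Python 3.8+ on Windows: register an explicit DLL search directory
        os.add_dll_directory(cuda_bin)
    # Prepending to PATH also covers DLLs found via the normal search order
    os.environ["PATH"] = cuda_bin + os.pathsep + os.environ.get("PATH", "")

if os.path.isdir(CUDA_118_BIN):
    prefer_cuda_dir(CUDA_118_BIN)
    # import sherpa_onnx (or onnxruntime) only after the search path is set
```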

I retried the example again and it got further. This time, it produced an error about zlibwapi.dll.

I got zlibwapi.dll from http://www.winimage.com/zLibDll/ as per the NVIDIA CUDA and cuDNN installation instructions.

  • this is version 1.2.3
python piper_stream_example.py --vits-model=./en_US-libritts_r-medium.onnx --vits-tokens=./tokens.txt --vits-data-dir=./espeak-ng-data --output-filename=./test.wav --provider cuda --debug True 'This is a test'
Namespace(vits_model='./en_US-libritts_r-medium.onnx', vits_lexicon='', vits_tokens='./tokens.txt', vits_data_dir='./espeak-ng-data', vits_dict_dir='', tts_rule_fsts='', output_filename='./test.wav', sid=0, debug=True, provider='cuda', num_threads=1, speed=1.0, text='This is a test')
2024-05-15 00:32:10,848 INFO [piper_stream_example.py:320] Loading model ...
2024-05-15 00:32:11.9375753 [W:onnxruntime:, transformer_memcpy.cc:74 onnxruntime::MemcpyTransformer::ApplyImpl] 28 Memcpy nodes are added to the graph torch_jit for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-05-15 00:32:11.9569339 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-15 00:32:11.9607816 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
C:\Users\T\Desktop\Code\ai\stella\sherpa-onnx\sherpa-onnx\csrc\offline-tts-vits-model.cc:Init:79 ---vits model---
model_type=vits
comment=piper
has_espeak=1
language=English
voice=en-us
n_speakers=904
sample_rate=22050
----------input names----------
0 input
1 input_lengths
2 scales
3 sid
----------output names----------
0 output


2024-05-15 00:32:12,393 INFO [piper_stream_example.py:322] Loading model done.
2024-05-15 00:32:12,394 INFO [piper_stream_example.py:330] Start generating ...
C:\Users\T\Desktop\Code\ai\stella\sherpa-onnx\sherpa-onnx/csrc/offline-tts-vits-impl.h:Generate:165 Raw text: This is a test
Could not load library zlibwapi.dll. Error code 193. Please verify that the library is built correctly for your processor architecture (32-bit, 64-bit)
(venv)


tn-17 commented on June 20, 2024

I was using the precompiled DLLs for 32-bit. I downloaded the correct 64-bit ones from http://www.winimage.com/zLibDll/ and now there are no errors when running the offline-tts-play.py example.
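Error code 193 (ERROR_BAD_EXE_FORMAT) almost always means a 32-bit DLL was loaded into a 64-bit process. A DLL's architecture can be read straight from its PE header without loading it; a stdlib-only diagnostic sketch:

```python
import struct

MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)", 0xAA64: "ARM64"}

def dll_architecture(path):
    """Read the Machine field from a PE file's COFF header."""
    with open(path, "rb") as f:
        data = f.read(4096)
    if data[:2] != b"MZ":
        raise ValueError("not a PE file")
    # Offset 0x3C (e_lfanew) holds the offset of the 'PE\0\0' signature
    pe_offset = struct.unpack_from("<I", data, 0x3C)[0]
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    machine = struct.unpack_from("<H", data, pe_offset + 4)[0]
    return MACHINE_NAMES.get(machine, hex(machine))

# e.g. a 32-bit zlibwapi.dll reports "x86 (32-bit)" -- exactly the build that
# triggers error 193 in a 64-bit process; the x64 build reports "x64 (64-bit)".
```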

Thank you for the help! @csukuangfj

By the way, is there a performance issue with onnxruntime-gpu?

I am finding that the CPU is faster than the GPU when measuring the "time in seconds to receive the first message" for generating the TTS audio.

My GPU is an RTX 3090. My CPU is an i9-14900K.

2024-05-15 00:50:46.3377129 [W:onnxruntime:, transformer_memcpy.cc:74 onnxruntime::MemcpyTransformer::ApplyImpl] 28 Memcpy nodes are added to the graph torch_jit for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-05-15 00:50:46.3555852 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-15 00:50:46.3603249 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.


csukuangfj commented on June 20, 2024

Glad to hear that you finally managed to run sherpa-onnx with GPU on Windows.

I am finding that cpu is faster than gpu when measuring the "time in seconds to receive the first message" for generating the tts audio.

The GPU needs warmup; also, the advantage of the GPU is parallel processing.

Moving data between CPU and GPU also takes time. In other words, GPU is not necessarily faster than CPU if you want to synthesize a single utterance.
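The warmup effect is easy to account for when benchmarking: run one throwaway generation first, then time the real one. A minimal sketch, with a hypothetical `generate` callable standing in for the actual sherpa-onnx TTS call:

```python
import time

def time_to_first_result(generate, *args, warmup=1):
    """Time a call after `warmup` throwaway runs. The first GPU run pays
    one-time costs (kernel setup, caches), so it is not representative."""
    for _ in range(warmup):
        generate(*args)  # result discarded: absorbs one-time setup cost
    start = time.perf_counter()
    result = generate(*args)
    return time.perf_counter() - start, result

# Usage (hypothetical): elapsed, audio = time_to_first_result(tts.generate, "Hello")
```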

