Comments (9)

csukuangfj commented on June 20, 2024

@martinshkreli

Could you describe how you got the int8 models?

danpovey commented on June 20, 2024

Fangjun will get back to you about it, but: hi, Martin Shkreli!
We might need more hardware info and details about what differed between those two runs.

martinshkreli commented on June 20, 2024

Hi guys, thanks again for the wonderful repo. I followed this link to download the model:
https://k2-fsa.github.io/sherpa/onnx/tts/pretrained_models/vits.html#download-the-model

Then I used that file (vits-ljs.int8.onnx) for inference with the Python script (offline-tts.py). This was on an 8xA100 instance.
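For reference, here is a minimal sketch of what loading the int8 model through the sherpa-onnx Python API looks like (roughly what offline-tts.py wires up from command-line flags); the file paths and output name are assumptions, not taken from this thread:

import sherpa_onnx
import soundfile as sf

# Sketch, assuming the sherpa-onnx Python TTS API; all paths are hypothetical.
config = sherpa_onnx.OfflineTtsConfig(
    model=sherpa_onnx.OfflineTtsModelConfig(
        vits=sherpa_onnx.OfflineTtsVitsModelConfig(
            model="./vits-ljs/vits-ljs.int8.onnx",
            lexicon="./vits-ljs/lexicon.txt",
            tokens="./vits-ljs/tokens.txt",
        ),
        num_threads=1,
        provider="cpu",
    ),
)
tts = sherpa_onnx.OfflineTts(config)
audio = tts.generate("Hello from the int8 VITS model.")
sf.write("output.wav", audio.samples, samplerate=audio.sample_rate)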

martinshkreli commented on June 20, 2024

@martinshkreli

Could you describe how you got the int8 models?

Hi Fangjun, I just wanted to try to get your attention one more time. Sorry if I am being annoying!

csukuangfj commented on June 20, 2024

The int8 model is obtained via the following code:

from onnxruntime.quantization import QuantType, quantize_dynamic

# Dynamic quantization: weights are stored as 8-bit integers,
# activations remain floating point and are quantized at runtime.
quantize_dynamic(
    model_input=filename,
    model_output=filename_int8,
    weight_type=QuantType.QUInt8,
)

Note that it uses

weight_type=QuantType.QUInt8,

It is a known issue with onnxruntime that QUInt8 is slower than QInt8 on CPU.

For instance, if you search on Google, you can find similar issue reports.
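If QUInt8 is the bottleneck on CPU, one thing worth trying is re-quantizing the float32 model with signed weights; quantize_dynamic also accepts QuantType.QInt8. A hedged sketch (the paths are hypothetical, and as a later comment in this thread notes, the QInt8 conversion may not succeed for this model):

from onnxruntime.quantization import QuantType, quantize_dynamic

# Re-quantize the float32 model with signed int8 weights.
# Hypothetical paths; success is not guaranteed for this model.
quantize_dynamic(
    model_input="vits-ljs.onnx",
    model_output="vits-ljs.qint8.onnx",
    weight_type=QuantType.QInt8,
)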


csukuangfj commented on June 20, 2024

The int8 model mentioned in this issue is about 4x smaller in file size than the float32 model.

If memory matters, the int8 model is preferred.
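A quick way to check the size claim locally, assuming both files have already been downloaded (the paths are hypothetical):

import os

# Hypothetical paths; adjust to where the models were downloaded.
fp32_bytes = os.path.getsize("vits-ljs/vits-ljs.onnx")
int8_bytes = os.path.getsize("vits-ljs/vits-ljs.int8.onnx")
print(f"float32: {fp32_bytes / 1e6:.1f} MB")
print(f"int8:    {int8_bytes / 1e6:.1f} MB")
print(f"ratio:   {fp32_bytes / int8_bytes:.1f}x")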

beqabeqa473 commented on June 20, 2024

Hi @csukuangfj, do you know how to optimize the speed of an int8 model? I was experimenting with this several months ago, but I was not able to convert to QInt8, and QUInt8 is really slow on CPU.
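(Not an answer from this thread, but for anyone experimenting: generic onnxruntime session settings such as thread count and graph optimization level are sometimes worth trying for CPU inference. Whether they help this particular int8 VITS model is untested; a sketch:)

import onnxruntime as ort

# Generic CPU tuning knobs; benefit for an int8 VITS model is untested here.
so = ort.SessionOptions()
so.intra_op_num_threads = 4
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

sess = ort.InferenceSession(
    "vits-ljs.int8.onnx",  # hypothetical model path
    sess_options=so,
    providers=["CPUExecutionProvider"],
)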

nshmyrev commented on June 20, 2024

You don't need to optimize speed; you need to pick an MB-iSTFT-VITS model. They are an order of magnitude faster than raw VITS with the same quality.
