Comments (7)
You can find the performance comparison in the subsection "Performance on INT8 without quantizing residual connection" at https://github.com/NVIDIA/DeepLearningExamples/tree/master/FasterTransformer/v3.0#encoder-performance-on-t4-and-tensorflow
from fastertransformer.
That subsection shows the latency and speedup, but it doesn't show the exact match / F1 scores for INT8 the way https://github.com/NVIDIA/DeepLearningExamples/tree/master/FasterTransformer/v3.0#performance-on-application-codes-of-tensorflow does. Could you please tell me where to find the performance on application codes for INT8?
Thanks for your timely reply :)
The quantization in DeepLearningExamples/FasterTransformer/v3.0/bert-tf-quantization is fake quantization, which uses FP32 arithmetic to compute the quantized values. However, the speedup is measured with real 8-bit INT8 kernels. Are these the same thing, or is there something I misunderstand? Looking forward to your reply.
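For readers unfamiliar with the term, "fake" (simulated) quantization can be sketched in a few lines of numpy. This is a generic illustration of the concept, not FasterTransformer's implementation; the function name and scale value are hypothetical:

```python
import numpy as np

def fake_quantize(x, scale):
    # Simulated ("fake") INT8 quantization: snap each value to the
    # nearest representable INT8 level, then immediately dequantize.
    # All arithmetic stays in FP32; only the values are restricted to
    # the INT8 grid, which is what quantization-aware training needs.
    q = np.clip(np.round(x / scale), -127, 127)
    return (q * scale).astype(np.float32)

x = np.array([0.50, -1.20, 0.03], dtype=np.float32)
scale = np.float32(0.01)        # hypothetical per-tensor scale
print(fake_quantize(x, scale))  # values snapped to multiples of 0.01
```

Because the forward pass sees only INT8-representable values, the trained checkpoint already reflects the rounding error that a real INT8 kernel would introduce, even though training itself runs in FP32.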
bert-tf-quantization is only for training. You should train a checkpoint with it and then import that checkpoint with the FT TensorFlow op; the FasterTransformer op performs inference in INT8 precision. The whole workflow is described in the "Evaluate the accuracy of FasterTransformer under INT8" part of the README.
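To see why a fake-quantized FP32 checkpoint can be executed with real INT8 kernels, note that its values already lie on the INT8 grid, so the integer codes can be recovered exactly. A minimal numpy sketch of this idea (the function and scale are hypothetical illustrations, not FasterTransformer's actual conversion code):

```python
import numpy as np

def to_int8(x_fakequant, scale):
    # A fake-quantized FP32 tensor is already a multiple of the scale,
    # so dividing by the scale and rounding recovers exact INT8 codes.
    # Conceptually, this is why the same checkpoint can be run with
    # real INT8 GEMM kernels at inference time.
    return np.round(x_fakequant / scale).astype(np.int8)

scale = np.float32(0.02)                          # hypothetical scale
fake = np.array([0.04, -0.10, 0.0], np.float32)   # multiples of scale
q = to_int8(fake, scale)
print(q)                             # exact int8 codes
print(q.astype(np.float32) * scale)  # dequantizes back to fake values
```

In other words, fake quantization during training and real INT8 execution at inference are two views of the same discretized values, which is why the accuracy measured on the trained checkpoint carries over to the INT8 speedup numbers.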
Thanks for your reply.
Besides, I would appreciate it if the INT8 mechanism and the optimizations it uses were explained more clearly in the README.
Related Issues (20)
- cuSPARSELt is slower?
- Whether fastertransformer supports gpt-2 classification model, such as GPT2ForSequenceClassification?
- Supporting for expert parallelism in MoE inference
- Is llama2 70b supported? Do you know minimal configuration?
- How to serving multi-gpu inference?
- How to get started?
- Sparsity support
- repetition_penalty logic in FT has bug
- can support decoder only bart? such as MBartForCausalLM
- error You need C++17 to compile PyTorch
- Does FasterTransformer support multi-stream pipeline parallelism ?
- multi_block_mode performance issue
- Confidence is not returned in the decoding example?
- on H800 can not exec nvidia/pytorch:23.09-py3 container success
- Are `fuseQKV masked attention` and Flash Attention the same?
- what is the mean of EFF-FT?
- How to know the correspondence between versions vcr.io/nvidia/pytorch:xx.xx-py3 and pytorch?
- error: ‘CUDNN_DATA_BFLOAT16’ was not declared in this scope; did you mean ‘CUDNN_DATA_FLOAT’
- bug: memory of position_encoding_table is not malloced correctly.
- can be used in diffusion models,like sd and sdxl? how?where is the demos?tks