
Comments (6)

jfix71 avatar jfix71 commented on May 29, 2024 1

@qcolombet I'm not sure we need to profile on the backend we will use. I think we can make it work by lowering all operators for profiling on the interpreter. Doing so gives us maximum profiling information about all components of lowered operators. Then, if a backend does not want to lower something, it can reconstruct the quantization parameters for the unlowered operator from the profiles of its components.

As a simple example, right now we profile an FC unlowered. This means that when we quantize, we apply the same quantization params to the matmul and the add when lowering it. Instead, we could lower the FC to a matmul and an add for profiling, so that each gets its own quantization params. Then, if a backend wants to keep the FC unlowered, or to use the same quantization params for both components, it can still do so. This way we don't lose information.
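To make the FC example concrete, here is a minimal sketch (not Glow's actual API; the ranges and the `quant_params` helper are hypothetical) of how int8 quantization params could be derived per component from profiled min/max ranges, and how a backend that keeps the FC unlowered could reconstruct a single set of params from the union of the component ranges:

```python
# Sketch only: derive int8 (scale, offset) pairs from profiled float ranges.
# Neither the helper nor the numbers come from Glow itself.

def quant_params(vmin, vmax):
    """Map a profiled float range to an int8 (scale, offset) pair."""
    vmin, vmax = min(vmin, 0.0), max(vmax, 0.0)   # keep 0.0 representable
    scale = max((vmax - vmin) / 255.0, 1e-8)      # guard against zero range
    offset = round(-128 - vmin / scale)
    return scale, offset

# Hypothetical profiles gathered by lowering FC -> MatMul + Add on the interpreter:
matmul_profile = (-4.0, 4.0)   # observed range of the MatMul output
add_profile = (-4.5, 5.0)      # observed range of the Add output

# A backend that lowers the FC quantizes each component with its own params:
matmul_params = quant_params(*matmul_profile)
add_params = quant_params(*add_profile)

# A backend that keeps the FC unlowered reconstructs one set of params
# from the union of its components' ranges, so no information is lost:
fc_min = min(matmul_profile[0], add_profile[0])
fc_max = max(matmul_profile[1], add_profile[1])
fc_params = quant_params(fc_min, fc_max)
```

The point of the sketch is only the direction of information flow: profiling the lowered components is strictly more informative, since the whole-operator range can always be recovered as the union of component ranges, but not vice versa.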

from glow.

rdzhabarov avatar rdzhabarov commented on May 29, 2024

The idea is to run profiling once per network (using the quantization profile node) and then reuse the profiled data to quantize the network properly for different backends. I don't think it really matters which backend executes the profiling (one backend is enough). There is no plan to support profiling in other backends, and certainly not on the accelerator.

In the long term, graph profiling for quantization will be just a utility, and Glow will load pre-quantized models. But even if we profile ourselves, performance efficiency is not a big issue here.


jfix71 avatar jfix71 commented on May 29, 2024

In a long-term, graph profiling for quantization will be just a utility and Glow will load pre-quantized models.

I think we may have some precision issues here for input operators that we lower. For example, say we receive a quantized LSTM unit as input. The quantization parameters for the LSTM that are passed to Glow give us poor information about the best parameters for each individual component of the lowered LSTM.

One alternative would be to have ONNX lower the LSTM itself and give us its components already quantized. However, what if a backend does not want to lower the LSTM? We could try to pattern match and recreate it, but that may be imperfect and difficult.

Perhaps we could design this such that the quantized LSTM unit is passed unlowered into Glow along with quantization parameters for its internal components. I have no idea how easy or feasible this might be. But we could then lower the LSTM and know how to quantize its component parts, while still allowing a backend to decide not to lower it in the first place.
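The proposal above can be sketched as a node that carries both whole-operator and per-component quantization params. This is a hypothetical representation, not Glow's IR; the class names and component keys are invented for illustration:

```python
# Sketch only: an unlowered LSTM node carrying quantization params for its
# internal components, so the compiler can either lower it and quantize each
# part, or hand it to a backend whole. Names are hypothetical, not Glow's.
from dataclasses import dataclass, field

@dataclass
class QuantParams:
    scale: float
    offset: int

@dataclass
class LSTMNode:
    # Params for the node's own output, used if a backend keeps it unlowered.
    output_params: QuantParams
    # Params for each internal component, used when the node is lowered.
    component_params: dict = field(default_factory=dict)

lstm = LSTMNode(
    output_params=QuantParams(scale=0.05, offset=0),
    component_params={
        "input_gate_matmul": QuantParams(scale=0.02, offset=-3),
        "forget_gate_sigmoid": QuantParams(scale=1 / 255, offset=-128),
        "cell_tanh": QuantParams(scale=2 / 255, offset=0),
    },
)

def quantize(node, lower):
    """Pick params depending on whether the backend lowers the node."""
    if lower:
        return node.component_params       # per-component quantization
    return {"lstm": node.output_params}    # whole-node quantization
```

Either path works without pattern matching: the lowering decision stays with the backend, and the component params travel with the node instead of being reconstructed after the fact.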


qcolombet avatar qcolombet commented on May 29, 2024

There is no plan to support this in other backends, certainly not on the accelerator.

@rdzhabarov Okay, so just supporting this in the interpreter is enough.

@jfix71 raised an interesting issue and it sounds like we would want to run the profiling on the backend we want to use. What is the plan then?


qcolombet avatar qcolombet commented on May 29, 2024

Makes sense to me assuming reconstructing the information is indeed possible.


qcolombet avatar qcolombet commented on May 29, 2024

All right, there is nothing to do here.

Thanks for the feedback!

