Comments (8)
Hi @phixerino,

Thanks for your question. The QuantIdentity merely quantizes a tensor that you put in. In other words, it calculates the quantization parameters for your input and returns a QuantTensor (if you set return_quant_tensor=True).

If you specify a quantizer for the layer, e.g. QuantLinear(2, 4, input_quant=Int8ActPerTensorFloat, bias=False), you don't need to use the QuantIdentity layer. If you don't specify input_quant in the layer, then you should use the QuantIdentity layer. Algorithmically, these two options do the exact same thing (given you choose the same quantizer). We provide both options because they can make a difference when exporting the model, e.g. to ONNX format. So whether one is beneficial really depends on your use case.

You can see this in more detail here or here.

Data type should be the same :)
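To make the two options concrete, here is a minimal sketch of the equivalence; the layer sizes and the random input are arbitrary placeholders:

```python
import torch
import brevitas.nn as qnn
from brevitas.quant import Int8ActPerTensorFloat

# Option 1: the layer quantizes its own input via input_quant.
fc_a = qnn.QuantLinear(2, 4, input_quant=Int8ActPerTensorFloat, bias=False)

# Option 2: a standalone QuantIdentity quantizes the input and hands a
# QuantTensor to a layer that has no input_quant of its own.
inp = qnn.QuantIdentity(act_quant=Int8ActPerTensorFloat, return_quant_tensor=True)
fc_b = qnn.QuantLinear(2, 4, bias=False)

x = torch.randn(8, 2)
out_a = fc_a(x)       # input quantized inside the layer
out_b = fc_b(inp(x))  # input quantized by the separate activation layer
```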
Thank you, I understand. But why isn't there input_quant=Int8ActPerTensorFloat in the first layer of ResNet or any of the ImageNet examples?
Currently, when quantizing models using src/brevitas_examples/imagenet_classification/ptq/ptq_evaluate.py, these settings are applied when the method quantize_model() is called. Basically, the original layers are replaced by their quant counterparts, and input_quant is set according to the configuration you pass. You can check out this method to see how it is done; stepping through it with a debugger helps.

We are working to expose these methods and provide an easier workflow.
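Conceptually, the replacement boils down to something like the following toy sketch. This is not the actual quantize_model() implementation, just an illustration of the swap it performs; the quantizer choices here are placeholders:

```python
import torch.nn as nn
import brevitas.nn as qnn
from brevitas.quant import Int8ActPerTensorFloat, Int8WeightPerTensorFloat

def replace_convs(model: nn.Module) -> None:
    # Walk the module tree and swap each nn.Conv2d for a QuantConv2d,
    # carrying over the trained float weights.
    for name, child in model.named_children():
        if isinstance(child, nn.Conv2d):
            quant_conv = qnn.QuantConv2d(
                child.in_channels, child.out_channels, child.kernel_size,
                stride=child.stride, padding=child.padding,
                bias=child.bias is not None,
                input_quant=Int8ActPerTensorFloat,      # per your config
                weight_quant=Int8WeightPerTensorFloat)  # per your config
            quant_conv.weight.data.copy_(child.weight.data)
            if child.bias is not None:
                quant_conv.bias.data.copy_(child.bias.data)
            setattr(model, name, quant_conv)
        else:
            replace_convs(child)
```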
I see. So when I want to use QAT I need to set input_quant=Int8ActPerTensorFloat? And if I didn't, the input to my first layer would not be quantized, right? Then how does the quantization work when the weights of the layer are quantized? I am trying to figure out how that would impact the speed of model inference on an FPGA.
Usually, you would want your input to be quantized, so you need to specify input_quant or use QuantIdentity. In this example, the input is expected to be already quantized to 8 bits. If you don't quantize your input, you'll basically multiply unrestricted floating-point inputs with your quantized weights, leading to a higher bit-width output, which means intermediate outputs need a higher bit width to store. The usual workflow is to quantize your input and weights, do the forward pass, and then quantize the output again, so that it becomes the quantized input for the next layer. Hope that helps.
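As a sketch of that workflow (the bit widths and channel counts here are arbitrary placeholders):

```python
import torch
import brevitas.nn as qnn

class QuantBlock(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Quantize the floating-point input before the first layer.
        self.inp = qnn.QuantIdentity(bit_width=8, return_quant_tensor=True)
        # Quantized weights; the multiply-accumulate produces a wider output.
        self.conv = qnn.QuantConv2d(3, 16, 3, bias=False,
                                    weight_bit_width=4,
                                    return_quant_tensor=True)
        # Re-quantize the output so the next layer sees a quantized input.
        self.relu = qnn.QuantReLU(bit_width=4, return_quant_tensor=True)

    def forward(self, x):
        return self.relu(self.conv(self.inp(x)))
```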
Thank you, it does help a lot. I'm guessing that the quantized input to the next QuantConv2d layer is provided by QuantReLU, because it has return_quant_tensor=True. Is there any advantage to having return_quant_tensor=True also in the QuantConv2d layer?

Also, if I set input_quant=Int8ActPerTensorFloat, does it matter if the input is in the range 0-1 or 0-255?

In fact, it does make a difference. Compare this tutorial, especially cell 13 onwards. The results of a QuantConv layer with return_quant_tensor disabled and followed by a QuantReLU will be slightly different from a QuantConv with return_quant_tensor enabled followed by a QuantReLU. This is because one usually quantizes the output of the conv layer with an 8-bit signed quantizer. However, ReLU can exploit unsigned quantization, as its output is non-negative anyway. So if you quantize the output of the conv layer with a signed quantizer and then apply the QuantReLU, you lose half of the range, since all negative values are stripped; the output is then re-quantized using an unsigned quantizer. If we used 8 bits for both, we would basically lose 1 bit: the conv output would be in the range -127 to 127, and after ReLU it would be 0 to 127, whereas an unsigned 8-bit quantizer could have used the range 0 to 255.

Similarly, applying quantization to the range 0-1 vs. 0-255 makes a difference. I'd suggest going with the usual 0-1 range for the input data.
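A sketch of the two variants being compared; the explicit output_quant here is one way (an assumption, not necessarily the tutorial's exact setup) to get a signed quantized conv output, and the exact numbers depend on calibration:

```python
import torch
import brevitas.nn as qnn
from brevitas.quant import Int8ActPerTensorFloat

x = torch.randn(1, 3, 8, 8)

# Variant A: conv emits an unquantized float tensor; QuantReLU then
# quantizes unsigned, spending the full 8-bit grid (0..255) on the
# non-negative output.
conv_a = qnn.QuantConv2d(3, 8, 3, bias=False)
relu_a = qnn.QuantReLU(bit_width=8, return_quant_tensor=True)
out_a = relu_a(conv_a(x))

# Variant B: the conv output is first quantized signed (-127..127); ReLU
# strips the negatives, so only half of that grid (0..127) survives before
# the unsigned re-quantization, effectively costing one bit.
conv_b = qnn.QuantConv2d(3, 8, 3, bias=False,
                         output_quant=Int8ActPerTensorFloat,
                         return_quant_tensor=True)
relu_b = qnn.QuantReLU(bit_width=8, return_quant_tensor=True)
out_b = relu_b(conv_b(x))
```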
With input_quant, export to FINN doesn't work:

File /opt/conda/lib/python3.10/site-packages/brevitas/export/onnx/manager.py:121, in ONNXBaseManager.export_onnx(cls, module, args, export_path, input_shape, input_t, disable_warnings, **onnx_export_kwargs)
...
---> 30 assert not module.is_input_quant_enabled
     31 assert not module.is_output_quant_enabled
     32 if module.is_bias_quant_enabled:
AssertionError:

So instead I used qnn.QuantIdentity(bit_width=first_layer_weight_bit_width, return_quant_tensor=True) and it works.

I'm using 4 bits to quantize my weights and activations but, as in the examples, 8 bits for the first and last layer. Now that I'm using 8 bits in the QuantIdentity layer, should I still use 8 bits in my first QuantConv2d layer?

Sorry for the load of questions, but I really appreciate the answers.

Yes, the QuantIdentity quantizes the input for the first layer; however, it does not quantize the weights of the first QuantConv2d to 8 bits. So if you want to quantize your first conv layer to 8 bits, you need to use 8 bits for that layer's weights.

No worries :)
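For example, a sketch of that first-layer setup along the lines of the snippet above (channel counts are placeholders):

```python
import brevitas.nn as qnn

# 8-bit input quantization in front of the first layer; FINN-friendly,
# since the conv itself carries no input_quant.
inp = qnn.QuantIdentity(bit_width=8, return_quant_tensor=True)

# The weight bit width still has to be set on the conv itself; the
# QuantIdentity only covers the activation side.
first_conv = qnn.QuantConv2d(3, 64, 3, bias=False,
                             weight_bit_width=8,
                             return_quant_tensor=True)
```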