
Comments (8)

fabianandresgrob commented on June 24, 2024

Thank you, it does help a lot. I'm guessing that the quantization of the input to the next QuantConv2d layer is done by the QuantReLU, since it has return_quant_tensor=True. Is there any advantage to having return_quant_tensor=True in the QuantConv2d layer as well?

Also, if I set input_quant=Int8ActPerTensorFloat, does it matter whether the input is in the range 0-1 or 0-255?

In fact, it does make a difference. Compare this tutorial, especially cell 13 onwards. The results of a QuantConv layer with return_quant_tensor disabled followed by a QuantReLU will be slightly different from those of a QuantConv with return_quant_tensor enabled followed by a QuantReLU. This is because one usually quantizes the output of the conv layer with an 8-bit signed quantizer, whereas ReLU can exploit unsigned quantization, since its output is >= 0 anyway. So if you quantize the output of the conv layer with a signed quantizer and then apply the QuantReLU, you lose half of the range, as all negative values are stripped; the output is then re-quantized using an unsigned quantizer. If we had used 8 bits for both, we would effectively lose 1 bit: the conv output would be in the range -127 to 127, after ReLU it would be 0 to 127, whereas an unsigned 8-bit quantizer could have used the range 0 to 255.

Similarly, applying quantization to the range 0-1 vs. 0-255 makes a difference. I'd suggest going with the usual 0-1 range for the input data.
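A minimal sketch of the point above, with hypothetical layer sizes (the tutorial's actual cells may differ): option A lets the QuantReLU do the (unsigned) 8-bit quantization, while option B explicitly adds a signed 8-bit output quantizer to the conv, so the negative half of that signed range is wasted once ReLU is applied.

```python
import torch
import brevitas.nn as qnn
from brevitas.quant import Int8ActPerTensorFloat

# Option A: conv returns a plain tensor, QuantReLU quantizes it with an unsigned 8-bit quantizer
conv_a = qnn.QuantConv2d(3, 16, kernel_size=3, weight_bit_width=8,
                         return_quant_tensor=False)
relu_a = qnn.QuantReLU(bit_width=8, return_quant_tensor=True)

# Option B: conv output is first quantized with a signed 8-bit quantizer,
# then QuantReLU re-quantizes it; after ReLU only 0..127 of the signed range is used
conv_b = qnn.QuantConv2d(3, 16, kernel_size=3, weight_bit_width=8,
                         output_quant=Int8ActPerTensorFloat,
                         return_quant_tensor=True)
relu_b = qnn.QuantReLU(bit_width=8, return_quant_tensor=True)

x = torch.randn(1, 3, 32, 32)
out_a = relu_a(conv_a(x))  # QuantTensor, full unsigned 0..255 range available
out_b = relu_b(conv_b(x))  # slightly different values due to the extra signed step
```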

With input_quant, export to FINN doesn't work:

File /opt/conda/lib/python3.10/site-packages/brevitas/export/onnx/manager.py:121, in ONNXBaseManager.export_onnx(cls, module, args, export_path, input_shape, input_t, disable_warnings, **onnx_export_kwargs)
...
---> 30     assert not module.is_input_quant_enabled
     31     assert not module.is_output_quant_enabled
     32     if module.is_bias_quant_enabled:

AssertionError: 

So instead I used qnn.QuantIdentity(bit_width=first_layer_weight_bit_width, return_quant_tensor=True) and it works.

I'm using 4 bits to quantize my weights and activations, but, as in the examples, I'm using 8 bits for the first and last layers. Now that I'm using 8 bits in the QuantIdentity layer, should I still use 8 bits in my first QuantConv2d layer?

Sorry for loads of questions, but I really appreciate the answers.

Yes, the QuantIdentity quantizes the input for the first layer; however, it does not quantize the weights of the first QuantConv2d to 8 bits. So if you want the weights of your first conv layer quantized to 8 bits, you need to use 8 bits for that layer as well.
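A minimal sketch of that first-layer pattern, with hypothetical shapes: the 8-bit QuantIdentity only quantizes the input tensor, so the 8-bit weight width still has to be set on the conv itself.

```python
import torch
import brevitas.nn as qnn

first_layer_weight_bit_width = 8  # 8 bits for the first layer, 4 bits elsewhere

inp_quant = qnn.QuantIdentity(bit_width=first_layer_weight_bit_width,
                              return_quant_tensor=True)
first_conv = qnn.QuantConv2d(3, 32, kernel_size=3,
                             weight_bit_width=first_layer_weight_bit_width,
                             return_quant_tensor=True)

x = torch.randn(1, 3, 32, 32)
out = first_conv(inp_quant(x))  # 8-bit quantized input and 8-bit weights
```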

No worries :)

fabianandresgrob commented on June 24, 2024

Hi @phixerino,

Thanks for your question. The QuantIdentity merely quantizes the tensor you put in. In other words, it calculates the quantization parameters for your input and returns a QuantTensor (if you set return_quant_tensor=True).
If you specify an input quantizer for the layer, e.g. QuantLinear(2, 4, input_quant=Int8ActPerTensorFloat, bias=False), you don't need to use the QuantIdentity layer. If you don't specify input_quant in the layer, then you should use the QuantIdentity layer. Algorithmically, these two options do exactly the same thing (given you choose the same quantizer). We provide both options because they can make a difference when exporting the model, e.g. to ONNX format. So whether one is more beneficial than the other really depends on your use case.
You can see this in more detail here or here.
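A minimal sketch of the two equivalent options described above, reusing the QuantLinear(2, 4, ...) example from this comment:

```python
import torch
import brevitas.nn as qnn
from brevitas.quant import Int8ActPerTensorFloat

# Option 1: the layer quantizes its own input
linear_a = qnn.QuantLinear(2, 4, input_quant=Int8ActPerTensorFloat, bias=False)

# Option 2: a standalone QuantIdentity quantizes the input first
identity = qnn.QuantIdentity(act_quant=Int8ActPerTensorFloat,
                             return_quant_tensor=True)
linear_b = qnn.QuantLinear(2, 4, bias=False)

x = torch.randn(8, 2)
out_a = linear_a(x)
out_b = linear_b(identity(x))  # same quantizer, so algorithmically the same thing
```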

Data type should be the same :)

phixerino commented on June 24, 2024

Thank you, I understand. But why isn't input_quant=Int8ActPerTensorFloat used in the first layer of ResNet or any of the ImageNet examples?

fabianandresgrob commented on June 24, 2024

Currently, when quantizing models using src/brevitas_examples/imagenet_classification/ptq/ptq_evaluate.py, these settings are applied when the quantize_model() method is called. Basically, the original layers are replaced by their quant counterparts, and input_quant is set according to the configuration you pass. You can check out this method to see how it is done; stepping through it with a debugger helps.
We are working on exposing these methods and providing an easier workflow.
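A simplified, hypothetical illustration of what that replacement step does conceptually; this is not the actual quantize_model() implementation (which also carries over pretrained weights and handles many more layer types and configurations):

```python
import torch.nn as nn
import brevitas.nn as qnn
from brevitas.quant import Int8ActPerTensorFloat

def naive_quantize_model(model: nn.Module) -> nn.Module:
    """Swap float Conv2d/Linear children for their quant counterparts (sketch only)."""
    for name, child in model.named_children():
        if isinstance(child, nn.Conv2d):
            setattr(model, name, qnn.QuantConv2d(
                child.in_channels, child.out_channels, child.kernel_size,
                stride=child.stride, padding=child.padding,
                bias=child.bias is not None,
                input_quant=Int8ActPerTensorFloat,   # set from the passed configuration
                weight_bit_width=8))
        elif isinstance(child, nn.Linear):
            setattr(model, name, qnn.QuantLinear(
                child.in_features, child.out_features,
                bias=child.bias is not None,
                input_quant=Int8ActPerTensorFloat,
                weight_bit_width=8))
        else:
            naive_quantize_model(child)  # recurse into containers
    return model
```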

phixerino commented on June 24, 2024

I see. So when I want to use QAT, I need to set input_quant=Int8ActPerTensorFloat? And if I didn't, then the input to my first layer would not be quantized, right? How does the quantization work then, when the weights of the layer are quantized? I am trying to figure out how that would impact the speed of model inference on an FPGA.

fabianandresgrob commented on June 24, 2024

Usually, you would want your input to be quantized, so you need to specify input_quant or use a QuantIdentity. In this example, the input is expected to already be quantized to 8 bits. If you don't quantize your input, you basically multiply unrestricted floating-point inputs with your quantized weights, leading to a higher bit-width output; that means storing the intermediate outputs requires a higher bit width. The usual workflow is to quantize your input and weights, do the forward pass, and then quantize the output again, so that it serves as the quantized input for the next layer. Hope that helps.
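A minimal sketch of that workflow, with hypothetical shapes and bit widths: the input is quantized, the conv runs with quantized weights, and the activation re-quantizes the output for the next layer.

```python
import torch
import brevitas.nn as qnn

inp_quant = qnn.QuantIdentity(bit_width=8, return_quant_tensor=True)
conv = qnn.QuantConv2d(3, 16, kernel_size=3, weight_bit_width=4,
                       return_quant_tensor=False)
act = qnn.QuantReLU(bit_width=4, return_quant_tensor=True)

x = torch.randn(1, 3, 32, 32)
y = act(conv(inp_quant(x)))  # quantized input for the following layer

# Without inp_quant, the conv would multiply unrestricted float inputs with its
# quantized weights, so the intermediate output would not be a low-bit-width value.
y_float_in = act(conv(x))
```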

phixerino commented on June 24, 2024

Thank you, it does help a lot. I'm guessing that the quantization of the input to the next QuantConv2d layer is done by the QuantReLU, since it has return_quant_tensor=True. Is there any advantage to having return_quant_tensor=True in the QuantConv2d layer as well?

Also, if I set input_quant=Int8ActPerTensorFloat, does it matter whether the input is in the range 0-1 or 0-255?

phixerino commented on June 24, 2024

With input_quant, export to FINN doesn't work:

File /opt/conda/lib/python3.10/site-packages/brevitas/export/onnx/manager.py:121, in ONNXBaseManager.export_onnx(cls, module, args, export_path, input_shape, input_t, disable_warnings, **onnx_export_kwargs)
...
---> 30     assert not module.is_input_quant_enabled
     31     assert not module.is_output_quant_enabled
     32     if module.is_bias_quant_enabled:

AssertionError: 

So instead I used qnn.QuantIdentity(bit_width=first_layer_weight_bit_width, return_quant_tensor=True) and it works.

I'm using 4 bits to quantize my weights and activations, but, as in the examples, I'm using 8 bits for the first and last layers. Now that I'm using 8 bits in the QuantIdentity layer, should I still use 8 bits in my first QuantConv2d layer?

Sorry for loads of questions, but I really appreciate the answers.
