
Not able to use converter.py to convert a PyTorch MobileNet model to TFLite (int8-quantized) using Colab (tinyneuralnetwork, 31 comments, CLOSED)

alibaba commented on May 2, 2024

Not able to use converter.py to convert a PyTorch MobileNet model to TFLite (int8-quantized) using Colab.

Comments (31)

peterjc123 commented on May 2, 2024

@nyadla-sys Thanks for trying out our project. We will have a look soon.

nyadla-sys commented on May 2, 2024

@peterjc123 I would like to convert the PyTorch MobileNetV2 model below to TFLite (int8) using the TFLiteConverter that is implemented as part of the TinyNeuralNetwork GitHub repo:

import torchvision.models as models
model = models.mobilenet_v2(pretrained=True)
model.eval()

and I would also like to use the following x as the model input (or a tuple for multiple inputs):

import torch
x = torch.randn(1, 3, 224, 224, requires_grad=True)

peterjc123 commented on May 2, 2024

@nyadla-sys Yes, you are free to use your model. This is just the example code.

nyadla-sys commented on May 2, 2024

@peterjc123
Actually, I have also written a script that converts from PyTorch to TFLite (int8) using the Colab below. The TFLite (float32) model works as expected, but the TFLite (int8 quantized) results are not correct:
https://github.com/nyadla-sys/pytorch_2_tflite/blob/main/pytorch_to_onnx_to_tflite(quantized)_with_imagedata.ipynb

nyadla-sys commented on May 2, 2024

@peterjc123 So I am thinking of using your GitHub repo to convert from a PyTorch model to a TFLite (quantized) model.
If you could provide a Colab notebook that does something like this with your repo, that would be appreciated.

peterjc123 commented on May 2, 2024

@nyadla-sys Could you please tell me how you converted the model to int8 quantized format through TinyNeuralNetwork?

nyadla-sys commented on May 2, 2024

@peterjc123
I just changed the permissions on the Colab; please use the link below:
https://colab.research.google.com/drive/1eW-I0RDzB3L6Zbz364t5lkI4fxgvpGbI?usp=sharing

peterjc123 commented on May 2, 2024

@nyadla-sys Thanks, I will take a look.

nyadla-sys commented on May 2, 2024

> @nyadla-sys Could you please tell me how you converted the model to int8 quantized format through TinyNeuralNetwork?

I have not really started on the MobileNetV2 model yet; I am only trying the example that is given as part of TinyNeuralNetwork, which I believe converts a PyTorch MobileNetV1 model.

nyadla-sys commented on May 2, 2024

@peterjc123 It may be a good idea to add Colab notebooks that take a few PyTorch models and convert them to TFLite (quantized).

peterjc123 commented on May 2, 2024

@nyadla-sys Looks like an environment issue. In the Colab environment, the namespace examples refers to a different package instead of the module in our project. It can be resolved by using sys.path.insert(0, xxx) instead of sys.path.append. Also, some __init__.py files are missing.
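
For example, at the top of the Colab notebook (assuming the repo is cloned to /content/TinyNeuralNetwork, as in the commands later in this thread), something like:

import sys

# Prepend rather than append so the repo's own `examples` package shadows any
# other package with the same name that is already importable in Colab.
sys.path.insert(0, '/content/TinyNeuralNetwork')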

nyadla-sys commented on May 2, 2024

@peterjc123 If possible, could you please create a Colab and share it with me?

peterjc123 commented on May 2, 2024

@nyadla-sys I've updated the repo, so you can run the scripts you shared after a kernel restart.

peterjc123 commented on May 2, 2024

@nyadla-sys BTW, https://github.com/alibaba/TinyNeuralNetwork/blob/main/examples/converter/convert.py is the example for converting a PyTorch model to a float32 TFLite model. If you want quantized models, please refer to https://github.com/alibaba/TinyNeuralNetwork/blob/main/examples/qat/qat.py.
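
For reference, a minimal float32 conversion along the lines of that example might look like the sketch below (this uses the TFLiteConverter API shown later in this thread; the output path is illustrative):

import torch
import torchvision.models as models

from tinynn.converter import TFLiteConverter

model = models.mobilenet_v2(pretrained=True)
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)
converter = TFLiteConverter(model, dummy_input, tflite_path='mobilenet_v2_float.tflite')
converter.convert()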

nyadla-sys commented on May 2, 2024

@peterjc123 When I run the command below on Colab:
!python /content/TinyNeuralNetwork/examples/qat/qat.py

The same error is observed; maybe the fix needs to be added to qat.py too:
Traceback (most recent call last):
  File "/content/TinyNeuralNetwork/examples/qat/qat.py", line 7, in <module>
    from examples.models.cifar10.mobilenet import DEFAULT_STATE_DICT, Mobilenet
ModuleNotFoundError: No module named 'examples.models'

nyadla-sys commented on May 2, 2024

@peterjc123 I made the necessary changes and am getting a different error while running qat.py:
https://colab.research.google.com/drive/1eW-I0RDzB3L6Zbz364t5lkI4fxgvpGbI?usp=sharing

Traceback (most recent call last):
  File "/content/TinyNeuralNetwork/examples/qat/qat.py", line 107, in <module>
    main_worker(args)
  File "/content/TinyNeuralNetwork/examples/qat/qat.py", line 71, in main_worker
    context.train_loader, context.val_loader = get_dataloader(args.data_path, 224, args.batch_size, args.workers)
  File "/content/TinyNeuralNetwork/examples/qat/../../tinynn/util/cifar10.py", line 45, in get_dataloader
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
  File "/usr/local/lib/python3.7/dist-packages/torchvision/datasets/cifar.py", line 69, in __init__
    raise RuntimeError('Dataset not found or corrupted.' +
RuntimeError: Dataset not found or corrupted. You can use download=True to download it

peterjc123 commented on May 2, 2024

> @peterjc123 When I run the command below on Colab: !python /content/TinyNeuralNetwork/examples/qat/qat.py
> The same error is observed; maybe the fix needs to be added to qat.py too: ModuleNotFoundError: No module named 'examples.models'

@nyadla-sys Should be fixed.

nyadla-sys commented on May 2, 2024

It continues to throw some other errors.

peterjc123 commented on May 2, 2024

> https://colab.research.google.com/drive/1eW-I0RDzB3L6Zbz364t5lkI4fxgvpGbI?usp=sharing

The default parameters in this script are not designed for running in the Colab environment, since it has limited resources. You need to lower the batch size and the number of workers to suitable values.
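
For example, using the script's command-line flags (shown in the full script in the next comment):

!python /content/TinyNeuralNetwork/examples/qat/qat.py --batch-size 96 --workers 2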

peterjc123 commented on May 2, 2024

@nyadla-sys Below is a working copy of qat.py for the Colab environment with a GPU. I've updated the number of workers to 2, the batch size to 128, and the number of epochs to 1. If you use the CPU, please lower the batch size to 96; you can also set context.max_iteration to speed up training.

import argparse
import os
import sys

CURRENT_PATH = os.path.abspath(os.path.dirname(__file__))

sys.path.insert(1, os.path.join(CURRENT_PATH, '../../'))

import torch
import torch.nn as nn
import torch.optim as optim

from examples.models.cifar10.mobilenet import DEFAULT_STATE_DICT, Mobilenet
from tinynn.converter import TFLiteConverter
from tinynn.graph.quantization.quantizer import QATQuantizer
from tinynn.graph.tracer import model_tracer
from tinynn.util.cifar10 import get_dataloader, train_one_epoch, validate
from tinynn.util.train_util import DLContext, get_device, train


def main_worker(args):
    with model_tracer():
        model = Mobilenet()
        model.load_state_dict(torch.load(DEFAULT_STATE_DICT))

        # Provide a viable input for the model
        dummy_input = torch.rand((1, 3, 224, 224))

        # TinyNeuralNetwork provides a QATQuantizer class that may rewrite the graph and perform model
        # fusion for quantization. The model returned by the `quantize` function is ready for QAT.
        # By default, the rewritten model (in the format of a single file) will be generated in the working directory.
        # You may also pass some custom configuration items through the argument `config` in the following line. For
        # example, if you have a QAT-ready model (e.g. models in torchvision.models.quantization),
        # then you may use the following line.
        #   quantizer = QATQuantizer(model, dummy_input, work_dir='out', config={'rewrite_graph': False})
        # Alternatively, if you have modified the generated model description file and want the quantizer to load it
        # instead, then use the code below.
        #     quantizer = QATQuantizer(
        #         model, dummy_input, work_dir='out', config={'force_overwrite': False, 'is_input_quantized': None}
        #     )
        # The `is_input_quantized` in the previous line is a flag on the input tensors whether they are quantized or
        # not, which can be None (False for all inputs) or a list of booleans that corresponds to the inputs.
        # Also, we support multiple qschemes for quantization preparation. There are several common choices.
        #   a. Asymmetric uint8. (default) config={'asymmetric': True, 'per_tensor': True}
        #      This is the most common choice and also conforms to the legacy TFLite quantization spec.
        #   b. Asymmetric int8. config={'asymmetric': True, 'per_tensor': False}
        #      This conforms to the new TFLite quantization spec. In legacy TF versions, this is usually used in post
        #      quantization. Compared with (a), it has support for per-channel quantization in supported kernels
        #      (e.g. Conv), while (a) does not.
        #   c. Symmetric int8. config={'asymmetric': False, 'per_tensor': False}
        #      This is the same as (b) but with no offsets, which may be used on some low-end embedded chips.
        #   d. Symmetric uint8. config={'asymmetric': False, 'per_tensor': True}
        #      This is the same as (a) but with no offsets. It is rarely used and just serves as a placeholder here.

        quantizer = QATQuantizer(model, dummy_input, work_dir='out')
        qat_model = quantizer.quantize()

    print(qat_model)

    # Use DataParallel to speed up training when possible
    if torch.cuda.device_count() > 1:
        qat_model = nn.DataParallel(qat_model)

    # Move model to the appropriate device
    device = get_device()
    qat_model.to(device=device)

    context = DLContext()
    context.device = device
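    # download=True so that CIFAR-10 is fetched automatically, avoiding the
    # "Dataset not found or corrupted" error seen earlier in this thread.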
    context.train_loader, context.val_loader = get_dataloader(args.data_path, 224, args.batch_size, args.workers, download=True)
    context.max_epoch = 1
    # CIFAR-10 labels are class indices, so CrossEntropyLoss is the appropriate criterion here
    context.criterion = nn.CrossEntropyLoss()
    context.optimizer = torch.optim.SGD(qat_model.parameters(), 0.01, momentum=0.9, weight_decay=5e-4)
    context.scheduler = optim.lr_scheduler.CosineAnnealingLR(context.optimizer, T_max=context.max_epoch + 1, eta_min=0)

    # Quantization-aware training
    train(qat_model, context, train_one_epoch, validate, qat=True)

    with torch.no_grad():
        qat_model.eval()
        qat_model.cpu()

        # The step below converts the model to an actual quantized model, which uses the quantized kernels.
        qat_model = torch.quantization.convert(qat_model)

        # When converting quantized models, please ensure the quantization backend is set.
        torch.backends.quantized.engine = quantizer.backend

        # The code section below is used to convert the model to the TFLite format
        # If you need a quantized model with a specific data type (e.g. int8)
        # you may specify `quantize_target_type='int8'` in the following line.
        # If you need a quantized model with strict symmetric quantization check (with pre-defined zero points),
        # you may specify `strict_symmetric_check=True` in the following line.
        converter = TFLiteConverter(qat_model, dummy_input, tflite_path='out/qat_model.tflite')
        converter.convert()


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--data-path', metavar='DIR', default="/content/data/datasets/cifar10", help='path to dataset')
    parser.add_argument('--config', type=str, default=os.path.join(CURRENT_PATH, 'config.yml'))
    parser.add_argument('--workers', type=int, default=2)
    parser.add_argument('--batch-size', type=int, default=128)

    args = parser.parse_args()
    main_worker(args)

peterjc123 commented on May 2, 2024

> @peterjc123 Actually, I have also written a script that converts from PyTorch to TFLite (int8) using the Colab below. The TFLite (float32) model works as expected, but the TFLite (int8 quantized) results are not correct: https://github.com/nyadla-sys/pytorch_2_tflite/blob/main/pytorch_to_onnx_to_tflite(quantized)_with_imagedata.ipynb

As can be seen from the script, you are passing in an int8 tensor. But since we have quantize and dequantize nodes around the input and the output tensors, you should pass in the original float values instead. We support removing the quantize and dequantize nodes by passing in fuse_quant_dequant=True while constructing the TFLiteConverter object, but we don't support setting custom input ranges. So, if you remove them, you'll need to quantize the input tensor manually.

BTW, I really think we should provide a full code example like you do. We will take some time to figure one out when we have time.
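
As an illustration of quantizing the input manually (a sketch only; the scale and zero_point values are placeholders and must be read from the input tensor of the converted TFLite model, e.g. with Netron):

import numpy as np

def quantize_input(x_float, scale, zero_point, dtype=np.uint8):
    # Affine quantization: q = round(x / scale) + zero_point, clamped to the
    # integer range of the target dtype. scale and zero_point must match the
    # quantization params of the model's input tensor.
    info = np.iinfo(dtype)
    q = np.round(x_float / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)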

peterjc123 commented on May 2, 2024

@nyadla-sys I figured out an example of getting a post-quantized model using TinyNeuralNetwork: https://colab.research.google.com/drive/1P-lpfIcPVgfzfpCQqj3nRiNrnKR9ZoZU?usp=sharing
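
Roughly, the post-quantization flow looks like the sketch below. This is an assumption based on the repo's post-quantization example: the PostQuantizer class name and calibration details may differ, and `model`, `dummy_input`, and `calib_loader` are placeholders, so please treat the linked notebook as the authoritative version.

import torch

from tinynn.converter import TFLiteConverter
from tinynn.graph.quantization.quantizer import PostQuantizer
from tinynn.graph.tracer import model_tracer

# `model` and `dummy_input` are assumed to be defined as in the QAT example above.
with model_tracer():
    quantizer = PostQuantizer(model, dummy_input, work_dir='out')
    ptq_model = quantizer.quantize()

# Calibration: feed representative data through ptq_model in eval mode so the
# observers can collect activation ranges before conversion.
ptq_model.eval()
with torch.no_grad():
    for data, _ in calib_loader:  # calib_loader is a placeholder DataLoader
        ptq_model(data)

# Convert to an actual quantized model and then to TFLite.
with torch.no_grad():
    ptq_model.cpu()
    ptq_model = torch.quantization.convert(ptq_model)
    torch.backends.quantized.engine = quantizer.backend
    converter = TFLiteConverter(ptq_model, dummy_input, tflite_path='out/ptq_model.tflite')
    converter.convert()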

nyadla-sys commented on May 2, 2024

@peterjc123 Excellent work.
Thank you for your time; providing a working Colab helps a lot.

nyadla-sys commented on May 2, 2024

@peterjc123
While exploring the generated TFLite model, I found that it has suboptimal quantization parameters (scale, zero point) for fused activations (ReLU6), as it doesn't fully exploit ReLU6's quantization range of [0, 6.0].
So I was wondering if there are optimization/quantization settings that can be used in the PyTorch-to-TFLite conversion process to generate optimal scale/zero point values for the TFLite model.

peterjc123 commented on May 2, 2024

@nyadla-sys It's hard to say that a larger range is the optimal choice, because it leads to lower bit-wise precision. Usually, accuracy is the more important factor in quantization, so I would say wanting the whole range of [0.0, 6.0] may be your preference. To support this particular case, we would need to make the following changes.

  1. Insert QuantStub nodes after Relu6 nodes during QAT graph rewriting.
  2. Disable the observer of those nodes and hardcode their quant_min and quant_max to 0.0 and 6.0.
  3. Get rid of the additional Quantize nodes with the optimization passes while converting the model to TFLite.
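
For reference, here is what the fixed quantization parameters in step 2 would work out to for an asymmetric uint8 tensor whose float range is pinned to ReLU6's [0.0, 6.0] (a back-of-the-envelope sketch, not code from the repo):

qmin, qmax = 0, 255
fmin, fmax = 0.0, 6.0
scale = (fmax - fmin) / (qmax - qmin)    # 6.0 / 255 ≈ 0.0235
zero_point = qmin - round(fmin / scale)  # 0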

nyadla-sys commented on May 2, 2024

@peterjc123 Thanks for your prompt response.

nyadla-sys commented on May 2, 2024

@peterjc123 The idea was not to increase the range but to restrict the float range to [0.0, 6.0] at the output of Conv/DW/FC layers where ReLU6 is fused, since values beyond the [0.0, 6.0] range will be clipped anyway.

peterjc123 commented on May 2, 2024

@nyadla-sys Could you please elaborate a little bit? You may use Netron to visualize the TFLite models to help give a clearer explanation.

peterjc123 commented on May 2, 2024

@nyadla-sys Okay, I think I understand your problem.

>>> import torch
>>> a = torch.quantize_per_tensor(torch.zeros(1,3,224,224), torch.tensor(0.5), torch.tensor(128), torch.quint8)
>>> torch.nn.ReLU6()(a)
tensor([[[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.]]]], size=(1, 3, 224, 224),
       dtype=torch.quint8, quantization_scheme=torch.per_tensor_affine,
       scale=0.5, zero_point=128)

It doesn't generate a new set of quantization params after going through the ReLU6 nodes. However, the problem only exists when the activations are not fused. Consider the pattern conv-bn-relu: it gets replaced with a new module ConvBnReLU2d, so the observer is fused too. So it seems that we need to add a QuantStub node after every isolated activation node and clamping function that has a fixed value range (e.g. ReLU, ReLU6, torch.{clamp, hardtanh, minimum, maximum}).

peterjc123 commented on May 2, 2024

With 5044f77, it should try to fuse the activations (e.g. relu6) with nodes that support re-quantization. @nyadla-sys

nyadla-sys commented on May 2, 2024

thanks @peterjc123
