
Comments (5)

ViktorThink commented on May 20, 2024

Normal torch quantization works on the larger models, so anyone reading this could check it out as an alternative: https://snappishproductions.com/blog/2020/05/03/big-models-hate-this-one-weird-trick-quantization-t5--pytorch-1.4.html.html

My result was a model 4x smaller (with qint8) and 3x faster, so better than nothing, although I lost a little bit of accuracy.
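For reference, a minimal sketch of that approach, assuming a Hugging Face T5 checkpoint (the model name and dtype below are just examples, not necessarily what I ran):

    import torch
    from transformers import T5ForConditionalGeneration

    # Load the full-precision model (any T5 size; t5-3b shown as an example).
    model = T5ForConditionalGeneration.from_pretrained("t5-3b")

    # Dynamic quantization stores nn.Linear weights as qint8 and dequantizes
    # them on the fly at inference time, so no calibration data is needed.
    quantized_model = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

    # quantized_model.generate(...) then works as a drop-in replacement.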


Ki6an commented on May 20, 2024

I've not tested the library with t5-11b. I'm glad that you were able to export the model by adding use_external_data_format=True.
I suggest you do the same for quantizing as well.

https://github.com/microsoft/onnxruntime/blob/add4e4225ba69ba48a28889ff91e65bbc5f6f2ca/onnxruntime/python/tools/quantization/quantize.py#L260

and also make sure that you have enough memory.
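For context, the export side looks roughly like the sketch below. This is a bare torch.onnx.export call with the flag, not fastT5's actual export code; the wrapper class, file paths, and opset are only illustrative, and the flag itself exists only on the older torch versions fastT5 targets:

    import torch
    from transformers import T5ForConditionalGeneration

    class EncoderWrapper(torch.nn.Module):
        # Return a plain tensor so torch.onnx.export can trace the graph.
        def __init__(self, encoder):
            super().__init__()
            self.encoder = encoder

        def forward(self, input_ids, attention_mask):
            return self.encoder(input_ids=input_ids, attention_mask=attention_mask)[0]

    model = T5ForConditionalGeneration.from_pretrained("t5-11b")
    encoder = EncoderWrapper(model.encoder).eval()

    input_ids = torch.ones(1, 8, dtype=torch.long)
    attention_mask = torch.ones(1, 8, dtype=torch.long)

    torch.onnx.export(
        encoder,
        (input_ids, attention_mask),
        "t5-11b-encoder.onnx",  # placeholder output path
        input_names=["input_ids", "attention_mask"],
        output_names=["hidden_states"],
        dynamic_axes={
            "input_ids": {0: "batch", 1: "sequence"},
            "attention_mask": {0: "batch", 1: "sequence"},
        },
        opset_version=12,
        # Writes the weights to side files so the .onnx proto stays under the
        # 2GB protobuf limit (flag available on older torch versions only).
        use_external_data_format=True,
    )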


ViktorThink commented on May 20, 2024

Thank you for getting back; it's much appreciated.

I tried adding use_external_data_format=True to quantize_dynamic:

quantize_dynamic(
    model_input=model_name,
    model_output=output_model_name,
    per_channel=True,
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QUInt8,
    optimize_model=False,
    use_external_data_format=True,
)  # op_types_to_quantize=['MatMul', 'Relu', 'Add', 'Mul'],

Still get the exact same error:

ValueError                                Traceback (most recent call last)
<ipython-input-4-032d95bca1c8> in <module>
      1 os.chdir(r'/home/jupyter/models/')
----> 2 quant_model_paths = quantize(onnx_model_paths)

~/fastT5/fastT5/onnx_exporter.py in quantize(models_name_or_path)
    274             weight_type=QuantType.QUInt8,
    275             optimize_model=False,
--> 276             use_external_data_format=True
    277         )  # op_types_to_quantize=['MatMul', 'Relu', 'Add', 'Mul' ],
    278         quant_model_paths.append(output_model_name)

/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/quantize.py in quantize_dynamic(model_input, model_output, op_types_to_quantize, per_channel, reduce_range, activation_type, weight_type, nodes_to_quantize, nodes_to_exclude, optimize_model, use_external_data_format)
    278         nodes_to_quantize,
    279         nodes_to_exclude,
--> 280         op_types_to_quantize)
    281 
    282     quantizer.quantize_model()

/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/onnx_quantizer.py in __init__(self, model, per_channel, reduce_range, mode, static, weight_qType, input_qType, tensors_range, nodes_to_quantize, nodes_to_exclude, op_types_to_quantize)
     30 
     31         # run shape inference on the model
---> 32         model = onnx.shape_inference.infer_shapes(model)
     33         self.value_infos = {vi.name: vi for vi in model.graph.value_info}
     34         self.value_infos.update({ot.name: ot for ot in model.graph.output})

/opt/conda/lib/python3.7/site-packages/onnx/shape_inference.py in infer_shapes(model, check_type, strict_mode)
     34 def infer_shapes(model, check_type=False, strict_mode=False):  # type: (ModelProto, bool, bool) -> ModelProto
     35     if isinstance(model, ModelProto):
---> 36         model_str = model.SerializeToString()
     37         inferred_model_str = C.infer_shapes(model_str, check_type, strict_mode)
     38         return onnx.load_from_string(inferred_model_str)

ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 19459248612

A bit strange though, since the documentation you sent says that setting use_external_data_format=True should solve this error...


Ki6an commented on May 20, 2024

It is strange indeed! The problem seems to be in the onnxruntime library. You could follow this issue and try to solve the problem. If that does not help, I suggest you open a new issue in onnxruntime about it.


samanz commented on May 20, 2024

I'm getting this same error when trying to export t5-3b. This seems like the more relevant onnx issue: the infer_shapes method doesn't work with large models and is supposed to be replaced with infer_shapes_path, so that would need to be fixed in the onnxruntime project. I modified the code in onnx_quantizer to look like:

        # Shape-infer via the file-based API, which avoids serializing the
        # >2GB ModelProto in memory, then reload the inferred model.
        onnx.shape_inference.infer_shapes_path(model_name, model_name + ".inferred")
        model = onnx.load(model_name + ".inferred")

while passing in a model_name to the method as well. The code was able to get past the shape inference step, but now fails with this:

Quantizing... |##########                      | 1/3
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-e72945460842> in <module>
      1 # Step 2. (recommended) quantize the converted model for fast inference and to reduce model size.
----> 2 quant_model_paths = quantize(onnx_model_paths)
      3 
      4 # step 3. setup onnx runtime
      5 model_sessions = get_onnx_runtime_sessions(quant_model_paths)

~/.local/lib/python3.6/site-packages/fastT5/onnx_exporter.py in quantize(models_name_or_path)
    274             weight_type=QuantType.QUInt8,
    275             optimize_model=False,
--> 276             use_external_data_format=True,
    277         )  # op_types_to_quantize=['MatMul', 'Relu', 'Add', 'Mul' ],
    278         quant_model_paths.append(output_model_name)

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/quantize.py in quantize_dynamic(model_input, model_output, op_types_to_quantize, per_channel, reduce_range, activation_type, weight_type, nodes_to_quantize, nodes_to_exclude, optimize_model, use_external_data_format)
    281         op_types_to_quantize)
    282 
--> 283     quantizer.quantize_model()
    284     quantizer.model.save_model_to_file(model_output, use_external_data_format)
    285 

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/onnx_quantizer.py in quantize_model(self)
    195                 op_quantizer = CreateDefaultOpQuantizer(self, node)
    196 
--> 197             op_quantizer.quantize()
    198 
    199         self._dequantize_outputs()

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/operators/matmul.py in quantize(self)
     17 
     18         (quantized_input_names, zero_point_names, scale_names, nodes) = \
---> 19             self.quantizer.quantize_inputs(node, [0, 1])
     20 
     21         matmul_integer_output = node.output[0] + "_output_quantized"

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/onnx_quantizer.py in quantize_inputs(self, node, indices, initializer_use_weight_qType)
    613             if initializer is not None:
    614                 q_weight_name, zp_name, scale_name = self.quantize_weight(
--> 615                     initializer, self.weight_qType if initializer_use_weight_qType else self.input_qType)
    616 
    617                 quantized_input_names.append(q_weight_name)

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/onnx_quantizer.py in quantize_weight(self, weight, qType)
    654 
    655         # Update packed weight, zero point, and scale initializers
--> 656         weight_data = self.tensor_proto_to_array(weight)
    657         _, _, zero_point, scale, q_weight_data = quantize_data(weight_data.flatten().tolist(),
    658                                                                get_qrange_for_qType(qType, self.reduce_range), qType)

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/onnx_quantizer.py in tensor_proto_to_array(initializer)
    215     def tensor_proto_to_array(initializer):
    216         if initializer.data_type == onnx_proto.TensorProto.FLOAT:
--> 217             weights = onnx.numpy_helper.to_array(initializer)
    218         else:
    219             raise ValueError('Only float type quantization is supported. Weights {} is {}. '.format(

~/.local/lib/python3.6/site-packages/onnx/numpy_helper.py in to_array(tensor)
     52         return np.frombuffer(
     53             tensor.raw_data,
---> 54             dtype=np_dtype).reshape(dims)
     55     else:
     56         data = getattr(tensor, storage_field),  # type: Sequence[np.complex64]

ValueError: cannot reshape array of size 16777216 into shape (1024,4096)
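For anyone hitting the same 2GB protobuf limit outside of the quantizer, a minimal standalone sketch of the file-based shape inference described above (file names are placeholders; this works around the protobuf limit, not the reshape error shown in the traceback):

    import onnx

    encoder_path = "t5-3b-encoder.onnx"  # placeholder path
    inferred_path = encoder_path + ".inferred"

    # infer_shapes() serializes the whole ModelProto and fails once it exceeds
    # the 2GB protobuf limit; infer_shapes_path() works file-to-file instead.
    onnx.shape_inference.infer_shapes_path(encoder_path, inferred_path)

    # Any external weight files referenced by the model need to be reachable
    # (by default, relative to the model file) for onnx.load to resolve them.
    inferred_model = onnx.load(inferred_path)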

