Comments (5)
Normal torch quantization works on the larger models, so anyone reading this could check that out as an alternative: https://snappishproductions.com/blog/2020/05/03/big-models-hate-this-one-weird-trick-quantization-t5--pytorch-1.4.html.html
My result was 4x smaller (with qint8) and 3x faster, so better than nothing, although I lost a little bit of accuracy.
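In case it helps anyone, here is a minimal sketch of that route with PyTorch dynamic quantization on a T5 checkpoint (the model name and the generate() smoke test are just illustrative, not taken from the blog post):

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the full-precision model (any T5 size; t5-3b is only an example here).
model = T5ForConditionalGeneration.from_pretrained("t5-3b")
tokenizer = T5Tokenizer.from_pretrained("t5-3b")

# Dynamic quantization: Linear weights are stored as int8 (qint8) and
# dequantized on the fly, which is where the roughly 4x size reduction comes from.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Quick smoke test on CPU.
inputs = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt")
outputs = quantized_model.generate(**inputs, max_length=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))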
from fastt5.
I've not tested the library with t5-11b. I'm glad that you were able to export the model by adding use_external_data_format=True. I suggest you pass the same flag when quantizing as well, and also make sure that you have enough memory.
from fastt5.
Thank you for getting back, it's highly appreciated.
I tried adding use_external_data_format=True to quantize_dynamic:
quantize_dynamic(
    model_input=model_name,
    model_output=output_model_name,
    per_channel=True,
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QUInt8,
    optimize_model=False,
    use_external_data_format=True
)  # op_types_to_quantize=['MatMul', 'Relu', 'Add', 'Mul'],
Still get the exact same error:
ValueError Traceback (most recent call last)
<ipython-input-4-032d95bca1c8> in <module>
1 os.chdir(r'/home/jupyter/models/')
----> 2 quant_model_paths = quantize(onnx_model_paths)
~/fastT5/fastT5/onnx_exporter.py in quantize(models_name_or_path)
274 weight_type=QuantType.QUInt8,
275 optimize_model=False,
--> 276 use_external_data_format=True
277 ) # op_types_to_quantize=['MatMul', 'Relu', 'Add', 'Mul' ],
278 quant_model_paths.append(output_model_name)
/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/quantize.py in quantize_dynamic(model_input, model_output, op_types_to_quantize, per_channel, reduce_range, activation_type, weight_type, nodes_to_quantize, nodes_to_exclude, optimize_model, use_external_data_format)
278 nodes_to_quantize,
279 nodes_to_exclude,
--> 280 op_types_to_quantize)
281
282 quantizer.quantize_model()
/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/onnx_quantizer.py in __init__(self, model, per_channel, reduce_range, mode, static, weight_qType, input_qType, tensors_range, nodes_to_quantize, nodes_to_exclude, op_types_to_quantize)
30
31 # run shape inference on the model
---> 32 model = onnx.shape_inference.infer_shapes(model)
33 self.value_infos = {vi.name: vi for vi in model.graph.value_info}
34 self.value_infos.update({ot.name: ot for ot in model.graph.output})
/opt/conda/lib/python3.7/site-packages/onnx/shape_inference.py in infer_shapes(model, check_type, strict_mode)
34 def infer_shapes(model, check_type=False, strict_mode=False): # type: (ModelProto, bool, bool) -> ModelProto
35 if isinstance(model, ModelProto):
---> 36 model_str = model.SerializeToString()
37 inferred_model_str = C.infer_shapes(model_str, check_type, strict_mode)
38 return onnx.load_from_string(inferred_model_str)
ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 19459248612
A bit strange though, since the documentation you sent says that setting use_external_data_format=True should solve this error...
from fastt5.
It is strange indeed! The problem seems to be in the onnxruntime library. You could follow this issue and try to solve the problem that way. If that does not help, I suggest you create a new issue in onnxruntime about it.
from fastt5.
I'm getting this same error when trying to export t5-3b. This seems to be the more relevant onnx issue: the infer_shapes method doesn't work with large models and is supposed to be replaced with infer_shapes_path, so that would need to be fixed in the onnxruntime project. I modified the code in onnx_quantizer to look like:
onnx.shape_inference.infer_shapes_path(model_name, model_name + ".inferred")
model = onnx.load(model_name + ".inferred")
while also passing a model_name into the method. With that change the code got past the shape inference step, but now fails with the following (I've put a rough sketch of the patch in context at the end of this comment, after the traceback):
Quantizing... |########## | 1/3
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-e72945460842> in <module>
1 # Step 2. (recommended) quantize the converted model for fast inference and to reduce model size.
----> 2 quant_model_paths = quantize(onnx_model_paths)
3
4 # step 3. setup onnx runtime
5 model_sessions = get_onnx_runtime_sessions(quant_model_paths)
~/.local/lib/python3.6/site-packages/fastT5/onnx_exporter.py in quantize(models_name_or_path)
274 weight_type=QuantType.QUInt8,
275 optimize_model=False,
--> 276 use_external_data_format=True,
277 ) # op_types_to_quantize=['MatMul', 'Relu', 'Add', 'Mul' ],
278 quant_model_paths.append(output_model_name)
~/.local/lib/python3.6/site-packages/onnxruntime/quantization/quantize.py in quantize_dynamic(model_input, model_output, op_types_to_quantize, per_channel, reduce_range, activation_type, weight_type, nodes_to_quantize, nodes_to_exclude, optimize_model, use_external_data_format)
281 op_types_to_quantize)
282
--> 283 quantizer.quantize_model()
284 quantizer.model.save_model_to_file(model_output, use_external_data_format)
285
~/.local/lib/python3.6/site-packages/onnxruntime/quantization/onnx_quantizer.py in quantize_model(self)
195 op_quantizer = CreateDefaultOpQuantizer(self, node)
196
--> 197 op_quantizer.quantize()
198
199 self._dequantize_outputs()
~/.local/lib/python3.6/site-packages/onnxruntime/quantization/operators/matmul.py in quantize(self)
17
18 (quantized_input_names, zero_point_names, scale_names, nodes) = \
---> 19 self.quantizer.quantize_inputs(node, [0, 1])
20
21 matmul_integer_output = node.output[0] + "_output_quantized"
~/.local/lib/python3.6/site-packages/onnxruntime/quantization/onnx_quantizer.py in quantize_inputs(self, node, indices, initializer_use_weight_qType)
613 if initializer is not None:
614 q_weight_name, zp_name, scale_name = self.quantize_weight(
--> 615 initializer, self.weight_qType if initializer_use_weight_qType else self.input_qType)
616
617 quantized_input_names.append(q_weight_name)
~/.local/lib/python3.6/site-packages/onnxruntime/quantization/onnx_quantizer.py in quantize_weight(self, weight, qType)
654
655 # Update packed weight, zero point, and scale initializers
--> 656 weight_data = self.tensor_proto_to_array(weight)
657 _, _, zero_point, scale, q_weight_data = quantize_data(weight_data.flatten().tolist(),
658 get_qrange_for_qType(qType, self.reduce_range), qType)
~/.local/lib/python3.6/site-packages/onnxruntime/quantization/onnx_quantizer.py in tensor_proto_to_array(initializer)
215 def tensor_proto_to_array(initializer):
216 if initializer.data_type == onnx_proto.TensorProto.FLOAT:
--> 217 weights = onnx.numpy_helper.to_array(initializer)
218 else:
219 raise ValueError('Only float type quantization is supported. Weights {} is {}. '.format(
~/.local/lib/python3.6/site-packages/onnx/numpy_helper.py in to_array(tensor)
52 return np.frombuffer(
53 tensor.raw_data,
---> 54 dtype=np_dtype).reshape(dims)
55 else:
56 data = getattr(tensor, storage_field), # type: Sequence[np.complex64]
ValueError: cannot reshape array of size 16777216 into shape (1024,4096)
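For reference, the core of my local patch as a standalone snippet (inside onnxruntime's onnx_quantizer.py it replaces the model = onnx.shape_inference.infer_shapes(model) line shown in the first traceback; infer_shapes_on_disk is just a name I'm using here for illustration):

import onnx

# infer_shapes() serializes the whole ModelProto and hits the 2GB protobuf
# limit on large models; infer_shapes_path() works on the file on disk and
# writes the inferred model to a second file instead.
def infer_shapes_on_disk(model_path):
    inferred_path = model_path + ".inferred"
    onnx.shape_inference.infer_shapes_path(model_path, inferred_path)
    return onnx.load(inferred_path)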
from fastt5.