Comments (5)
@costigt-dev @Giuseppe5 Brevitas seems to be using Constant nodes for the int8 weights in ONNX, while the PyTorch ONNX export / ORT quantizer use an Initializer. I'm not sure whether this difference matters, but just noting it.
Can also be reproduced with daryl149/llama-2-7b-chat-hf & transformers==4.38.1.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:15<00:00, 7.82s/it]
Computing perplexity...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 128/128 [00:16<00:00, 7.58it/s]
Perplexity (original model): 14.506609916687012
/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/torch/_tensor.py:1394: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at ../c10/core/TensorImpl.h:1908.)
return super().rename(names)
Computing perplexity...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 128/128 [01:00<00:00, 2.13it/s]
Perplexity (quantized model): 34.405277252197266
Exporting the model to ONNX...
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Using framework PyTorch: 2.2.0+cu121
Overriding 1 configuration item(s)
- use_cache -> True
/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py:1057: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_length > self.causal_mask.shape[-1]:
/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/brevitas/quant_tensor/__init__.py:68: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
training = torch.tensor(training, dtype=torch.bool)
/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/brevitas/export/common/handler/qcdq.py:52: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert bools
/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/brevitas/quant_tensor/__init__.py:66: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
signed = torch.tensor(signed, dtype=torch.bool)
Saving external data to one file...
Traceback (most recent call last):
File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/onnx/serialization.py", line 100, in serialize_proto
result = proto.SerializeToString()
ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 6642034969
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/felix/optimum-amd/examples/quantization/brevitas/quantize_llm.py", line 163, in <module>
main(args)
File "/home/felix/optimum-amd/examples/quantization/brevitas/quantize_llm.py", line 82, in main
onnx_export_from_model(
File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/optimum/exporters/onnx/convert.py", line 1152, in onnx_export_from_model
_, onnx_outputs = export_models(
File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/optimum/exporters/onnx/convert.py", line 763, in export_models
export(
File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/optimum/exporters/onnx/convert.py", line 868, in export
export_output = export_pytorch(
File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/optimum/exporters/onnx/convert.py", line 607, in export_pytorch
onnx.save(
File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/onnx/__init__.py", line 326, in save_model
serialized = _get_serializer(format, model_filepath).serialize_proto(proto)
File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/onnx/serialization.py", line 103, in serialize_proto
raise ValueError(
ValueError: The proto size is larger than the 2 GB limit. Please use save_as_external_data to save tensors separately from the model file.
from brevitas.
Note: doing the export with

```python
export_manager = StdQCDQONNXManager
export_manager.change_weight_export(export_weight_q_node=True)

with torch.no_grad(), brevitas_proxy_export_mode(quantized_model, export_manager=export_manager):
```

instead of simply

```python
with torch.no_grad(), brevitas_proxy_export_mode(quantized_model, export_manager=StdQCDQONNXManager):
```

fixes the issue. But this is not a good long-term fix, as the serialized model is then ~4x bigger.
Maybe this could be relevant: onnx/onnx#5949
PyTorch 2.2 has partially fixed this issue: pytorch/pytorch#111097
The problem in PyTorch <2.2 seems to be that constants are not accounted for in the model size computation.
It would be worth investigating how to mark a value as an Initializer rather than a Constant when exporting from PyTorch to ONNX.
cc @costigt-dev
From my investigations there doesn't appear to be any straightforward way to work around this issue in PyTorch 2.1 or below.