
Comments (5)

fxmarty commented on September 25, 2024

@costigt-dev @Giuseppe5 Brevitas seems to use Constant nodes for the int8 weights in the exported ONNX, while the PyTorch ONNX export / ORT quantizer use Initializers. I'm not sure whether this difference matters, but I'm noting it here.
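
For reference, a quick way to check how the weights ended up stored is to count Constant nodes vs. initializers in the exported graph. A minimal sketch (the model path is hypothetical):

    import onnx

    # Load only the graph structure; skip any external weight files.
    model = onnx.load("model.onnx", load_external_data=False)  # hypothetical path

    n_constants = sum(1 for node in model.graph.node if node.op_type == "Constant")
    print(f"Constant nodes: {n_constants}")
    print(f"Initializers:   {len(model.graph.initializer)}")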

This can also be reproduced with daryl149/llama-2-7b-chat-hf and transformers==4.38.1:

    Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:15<00:00,  7.82s/it]
    Computing perplexity...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 128/128 [00:16<00:00,  7.58it/s]
    Perplexity (original model): 14.506609916687012
    /home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/torch/_tensor.py:1394: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at ../c10/core/TensorImpl.h:1908.)
      return super().rename(names)
    Computing perplexity...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 128/128 [01:00<00:00,  2.13it/s]
    Perplexity (quantized model): 34.405277252197266
    Exporting the model to ONNX...
    Using the export variant default. Available variants are:
        - default: The default ONNX variant.
    Using framework PyTorch: 2.2.0+cu121
    Overriding 1 configuration item(s)
            - use_cache -> True
    /home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py:1057: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
      if seq_length > self.causal_mask.shape[-1]:
    /home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/brevitas/quant_tensor/__init__.py:68: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
      training = torch.tensor(training, dtype=torch.bool)
    /home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/brevitas/export/common/handler/qcdq.py:52: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
      assert bools
    /home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/brevitas/quant_tensor/__init__.py:66: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
      signed = torch.tensor(signed, dtype=torch.bool)
    Saving external data to one file...
    Traceback (most recent call last):
      File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/onnx/serialization.py", line 100, in serialize_proto
        result = proto.SerializeToString()
    ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 6642034969

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/home/felix/optimum-amd/examples/quantization/brevitas/quantize_llm.py", line 163, in <module>
        main(args)
      File "/home/felix/optimum-amd/examples/quantization/brevitas/quantize_llm.py", line 82, in main
        onnx_export_from_model(
      File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/optimum/exporters/onnx/convert.py", line 1152, in onnx_export_from_model
        _, onnx_outputs = export_models(
      File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/optimum/exporters/onnx/convert.py", line 763, in export_models
        export(
      File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/optimum/exporters/onnx/convert.py", line 868, in export
        export_output = export_pytorch(
      File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/optimum/exporters/onnx/convert.py", line 607, in export_pytorch
        onnx.save(
      File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/onnx/__init__.py", line 326, in save_model
        serialized = _get_serializer(format, model_filepath).serialize_proto(proto)
      File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/onnx/serialization.py", line 103, in serialize_proto
        raise ValueError(
    ValueError: The proto size is larger than the 2 GB limit. Please use save_as_external_data to save tensors separately from the model file.
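
The error message itself points at save_as_external_data. A minimal sketch of what that call looks like, assuming you can intercept the ModelProto before serialization (the function name and paths are hypothetical; note that by default only initializers are externalized, which is why the Constant vs. Initializer difference above matters; tensors stored in Constant node attributes need convert_attribute=True):

    import onnx

    def save_large_model(proto: onnx.ModelProto, path: str) -> None:
        # Store big tensors outside the protobuf so the ModelProto stays under 2 GB.
        onnx.save_model(
            proto,
            path,
            save_as_external_data=True,    # keep large tensors out of the proto
            all_tensors_to_one_file=True,  # one side-car file next to the model
            location="model.onnx_data",    # side-car file name, relative to `path`
            size_threshold=1024,           # only externalize tensors above 1 KiB
            convert_attribute=True,        # also externalize attribute tensors (Constant nodes)
        )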

fxmarty commented on September 25, 2024

Note: doing the export with

    export_manager = StdQCDQONNXManager
    export_manager.change_weight_export(export_weight_q_node=True)
    with torch.no_grad(), brevitas_proxy_export_mode(quantized_model, export_manager=export_manager):

instead of simply

    with torch.no_grad(), brevitas_proxy_export_mode(quantized_model, export_manager=StdQCDQONNXManager):

fixes the issue. This is not a good long-term fix, however: the serialized model is then ~4x bigger, presumably because the weights are serialized in float32 upstream of the QuantizeLinear node instead of as int8.

Giuseppe5 commented on September 25, 2024

Maybe this could be relevant:
onnx/onnx#5949

Giuseppe5 commented on September 25, 2024

PyTorch 2.2 has partially fixed this issue: pytorch/pytorch#111097

The problem in PyTorch <2.2 seems to be that constants are not accounted for in the model size computation.
It would be worth investigating how to mark a value as an Initializer rather than a Constant when exporting from PyTorch to ONNX.
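
One possible direction, sketched below as plain onnx graph surgery rather than an existing Brevitas/PyTorch API, would be to post-process the exported model and rewrite Constant nodes that carry a value tensor as initializers:

    import onnx

    def constants_to_initializers(model: onnx.ModelProto) -> onnx.ModelProto:
        graph = model.graph
        kept_nodes = []
        for node in graph.node:
            # Handle the common case only: a Constant node whose single
            # attribute is a `value` tensor (value_floats etc. are kept as-is).
            if (node.op_type == "Constant"
                    and len(node.attribute) == 1
                    and node.attribute[0].name == "value"):
                tensor = node.attribute[0].t
                # Initializers are looked up by name, so reuse the node's output name.
                tensor.name = node.output[0]
                graph.initializer.append(tensor)
            else:
                kept_nodes.append(node)
        del graph.node[:]
        graph.node.extend(kept_nodes)
        return model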

cc @costigt-dev

costigt-dev commented on September 25, 2024

From my investigations there doesn't appear to be any straightforward way to work around this issue in PyTorch 2.1 or below.
