Comments (5)
@costigt-dev @Giuseppe5 Brevitas seems to be using Constant nodes for the int8 weights in ONNX, while the PyTorch ONNX export / ORT quantizer use an Initializer. I'm not sure whether this difference matters, but just noting it.
Can also be reproduced with daryl149/llama-2-7b-chat-hf & transformers==4.38.1.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:15<00:00, 7.82s/it]
Computing perplexity...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 128/128 [00:16<00:00, 7.58it/s]
Perplexity (original model): 14.506609916687012
/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/torch/_tensor.py:1394: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at ../c10/core/TensorImpl.h:1908.)
return super().rename(names)
Computing perplexity...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 128/128 [01:00<00:00, 2.13it/s]
Perplexity (quantized model): 34.405277252197266
Exporting the model to ONNX...
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Using framework PyTorch: 2.2.0+cu121
Overriding 1 configuration item(s)
- use_cache -> True
/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py:1057: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_length > self.causal_mask.shape[-1]:
/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/brevitas/quant_tensor/__init__.py:68: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
training = torch.tensor(training, dtype=torch.bool)
/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/brevitas/export/common/handler/qcdq.py:52: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert bools
/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/brevitas/quant_tensor/__init__.py:66: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
signed = torch.tensor(signed, dtype=torch.bool)
Saving external data to one file...
Traceback (most recent call last):
File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/onnx/serialization.py", line 100, in serialize_proto
result = proto.SerializeToString()
ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 6642034969
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/felix/optimum-amd/examples/quantization/brevitas/quantize_llm.py", line 163, in <module>
main(args)
File "/home/felix/optimum-amd/examples/quantization/brevitas/quantize_llm.py", line 82, in main
onnx_export_from_model(
File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/optimum/exporters/onnx/convert.py", line 1152, in onnx_export_from_model
_, onnx_outputs = export_models(
File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/optimum/exporters/onnx/convert.py", line 763, in export_models
export(
File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/optimum/exporters/onnx/convert.py", line 868, in export
export_output = export_pytorch(
File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/optimum/exporters/onnx/convert.py", line 607, in export_pytorch
onnx.save(
File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/onnx/__init__.py", line 326, in save_model
serialized = _get_serializer(format, model_filepath).serialize_proto(proto)
File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/onnx/serialization.py", line 103, in serialize_proto
raise ValueError(
ValueError: The proto size is larger than the 2 GB limit. Please use save_as_external_data to save tensors separately from the model file.
from brevitas.
Note: doing the export with

```python
export_manager = StdQCDQONNXManager
export_manager.change_weight_export(export_weight_q_node=True)

with torch.no_grad(), brevitas_proxy_export_mode(quantized_model, export_manager=export_manager):
```

instead of simply

```python
with torch.no_grad(), brevitas_proxy_export_mode(quantized_model, export_manager=StdQCDQONNXManager):
```

fixes the issue. But this is not a good long-term fix, as the serialized model is then ~4x bigger.
Maybe this could be relevant: onnx/onnx#5949
PyTorch 2.2 has partially fixed this issue: pytorch/pytorch#111097
The problem in PyTorch <2.2 seems to be that constants are not accounted for in the model size computation.
It would be worth investigating how to mark a value as an Initializer rather than a Constant when exporting from PyTorch to ONNX.
cc @costigt-dev
From my investigations there doesn't appear to be any straightforward way to work around this issue in PyTorch 2.1 or below.