Comments (5)
Found Intel's results from a quantization test of the lm-head. There was minimal accuracy loss:
@wenhuach21 Do you know how much ram/vram the intel llama3-8B lm_head quantized test saved vs non-quantized? Here is the untested branch that allows loading of a quantized lm-head that I plan to test: https://github.com/Qubitium/AutoGPTQ/tree/sym-false-lm-head combined with intel/auto-round#87
| Metric | BF16 | w4g128 w/o lm-head | w4g128 with lm-head qdq |
|---|---|---|---|
| Avg. | 0.6352 | 0.6312 | 0.6303 |
| mmlu | 0.6386 | 0.6306 | 0.6318 |
| winogrande | 0.7143 | 0.7238 | 0.7269 |
| truthfulqa_mc1 | 0.3623 | 0.3537 | 0.3525 |
| rte | 0.6751 | 0.6859 | 0.6679 |
| piqa | 0.7867 | 0.7797 | 0.7802 |
| openbookqa | 0.3400 | 0.3300 | 0.3320 |
| lambada_openai | 0.7182 | 0.7200 | 0.7173 |
| hellaswag | 0.5769 | 0.5699 | 0.5701 |
| boolq | 0.8297 | 0.8309 | 0.8284 |
| arc_easy | 0.8152 | 0.8089 | 0.8106 |
| arc_challenge | 0.5299 | 0.5102 | 0.5154 |
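For readers unfamiliar with the "qdq" column: it refers to quantize-dequantize evaluation, where weights are rounded to a low-bit grid per group and then mapped back to floats before running the model. A minimal sketch of an asymmetric group scheme, using my own illustrative function (not Intel's or AutoGPTQ's code), with the bit width and group size matching the w4g128 setting above:

```python
# Sketch of quantize-dequantize ("qdq"): round each weight to a 4-bit grid
# computed per group of 128 values, then map it back to a float.
def qdq(weights, bits=4, group_size=128):
    qmax = (1 << bits) - 1  # 15 levels above zero for 4-bit
    out = []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / qmax or 1.0      # guard against a constant group
        for w in group:
            q = round((w - lo) / scale)      # quantize to an int in [0, 15]
            out.append(q * scale + lo)       # dequantize back to float
    return out

w = [0.1 * i for i in range(256)]            # two groups of 128 toy weights
w_qdq = qdq(w)
err = max(abs(a - b) for a, b in zip(w, w_qdq))
print(f"max abs error: {err:.4f}")
```

The rounding error is bounded by half the group's scale, which is why the benchmark deltas above stay small.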
What I know is the model size at W4G128: 5.4G without the lm-head quantized, 4.7G with it.
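That 0.7G gap roughly matches a back-of-envelope estimate from llama3-8B's lm_head shape (128256 vocab x 4096 hidden). The arithmetic below is my own, under a simple w4g128 packing assumption (packed 4-bit weights plus an fp16 scale and a packed 4-bit zero point per group of 128), not a figure from the thread:

```python
# Rough estimate of the size saved by quantizing llama3-8B's lm_head
# from BF16 to w4g128 (assumed packing; real checkpoints add metadata).
vocab_size, hidden_size = 128256, 4096
params = vocab_size * hidden_size

bf16_bytes = params * 2                      # 2 bytes per BF16 weight
groups = params // 128                       # one scale/zero per 128 weights
w4_bytes = params // 2 + groups * 2 + groups // 2  # packed 4-bit + fp16 scales + 4-bit zeros

saved_gb = (bf16_bytes - w4_bytes) / 1e9
print(f"approx. saving: {saved_gb:.2f} GB")  # → approx. saving: 0.78 GB
```

Close to the observed 5.4G vs 4.7G difference, so nearly all of the gap is the lm_head weights themselves.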
Additionally, if act-order is not enabled or static groups are enabled, could AutoGPTQ refrain from dumping the group index into the quantized model, thus conserving some resources?
from autogptq.
@XeonKHJ Good question. I will test this tomorrow with intel/auto-round, which does offer the ability to quantize the lm-head. If there are no inference issues post-quantization, I will add it as an option in a new PR.
#648 can now load a quantized lm_head from intel/auto-round, but autogptq quantization of the lm-head is still in progress.
Additionally, if static grouping is not enabled, could AutoGPTQ refrain from dumping the group index into the quantized model, thus conserving some resources?
This is beyond my abilities right now. @fxmarty @LaaZa
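For context on why the group index is redundant here: without act-order reordering, `g_idx` is just the trivial mapping `i // group_size`, so a loader could reconstruct it instead of reading it from the checkpoint. A minimal illustration (my own sketch, not AutoGPTQ code):

```python
# Without act-order, the stored g_idx is the trivial mapping i // group_size,
# so it can be recomputed at load time instead of being saved. With act-order,
# channels are processed in a permuted order and g_idx must be kept.
def trivial_g_idx(in_features, group_size):
    return [i // group_size for i in range(in_features)]

stored = trivial_g_idx(4096, 128)          # what a no-act-order checkpoint saves
rebuilt = [i // 128 for i in range(4096)]  # what a loader could recompute
print("g_idx is reconstructible:", stored == rebuilt)  # → True
```

This is why the request above is scoped to the no-act-order / static-group case: only then does dropping `g_idx` lose nothing.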
Related Issues (20)
- [PR Ready for Review] [FEATURE] Extend Support for Phi-3
- [FEATURE] Backport vllm expanded Marlin kernel to autogptq.
- [DEPRECATION] Discussion on Fused attention and QiGEN
- Llama-3 8B Instruct quantized to 8 Bit spits out gibberish in transformers `model.generate()` but works fine in vLLM?
- [BUG] safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer
- [Question] Differences in quantization logic compared to AWQ
- [FEATURE] ADD SUPPORT DeepSeek-V2
- [BUG] ARM installation error
- [BUG] ROCm installation and building broken
- Target modules [] not found in the base model. Please check the target modules and try again.
- [BUG] Cannot install from source
- [BUG] Following the quant_with_alpaca.py example but keep getting "You shouldn't move a model that is dispatched using accelerate hooks." and the model is never saved.
- [FEATURE] Models that support MOE do GPTQ
- [FEATURE] Add marlin24 support
- How to select between different kernels?
- Question about data shape difference between quantization and forward
- [FEATURE] Added code support to 5,6,7 bits quantization can you please add me as contributor I will create a new pull request
- [BUG] Quantitative model Yi-1.5-9b-16K does not produce text output.
- How to install auto-gptq in GCC 8.5.0 environment?
- How to get a dequantized model?