
Comments (5)

wenhuach21 commented on June 17, 2024

> Found Intel's results for their lm-head quantization test. There was minimal accuracy loss:
>
> @wenhuach21 Do you know how much RAM/VRAM the Intel Llama3-8B quantized lm_head test saved vs. non-quantized? Here is the untested branch that allows loading a quantized lm-head, which I plan to test: https://github.com/Qubitium/AutoGPTQ/tree/sym-false-lm-head combined with intel/auto-round#87
>
> https://github.com/intel/auto-round/blob/8a3da144423322dfedb0b3fa702ae35d242496d8/docs/Meta-Llama-3-8B-Instruct-acc.md?plain=1#L3
>
> | Metric | BF16 | w4g128 w/o lm-head | w4g128 with lm-head qdq |
> | --- | --- | --- | --- |
> | Avg. | 0.6352 | 0.6312 | 0.6303 |
> | mmlu | 0.6386 | 0.6306 | 0.6318 |
> | winogrande | 0.7143 | 0.7238 | 0.7269 |
> | truthfulqa_mc1 | 0.3623 | 0.3537 | 0.3525 |
> | rte | 0.6751 | 0.6859 | 0.6679 |
> | piqa | 0.7867 | 0.7797 | 0.7802 |
> | openbookqa | 0.3400 | 0.3300 | 0.3320 |
> | lambada_openai | 0.7182 | 0.7200 | 0.7173 |
> | hellaswag | 0.5769 | 0.5699 | 0.5701 |
> | boolq | 0.8297 | 0.8309 | 0.8284 |
> | arc_easy | 0.8152 | 0.8089 | 0.8106 |
> | arc_challenge | 0.5299 | 0.5102 | 0.5154 |

All I know is the model size at W4G128: 5.4G with the lm-head left unquantized, 4.7G with the lm-head quantized.
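As a rough sanity check on that 0.7G difference, here is a back-of-the-envelope sketch of the lm_head size alone. It uses the Llama-3-8B vocabulary size (128256) and hidden size (4096); the storage layout (fp16 scales, packed 4-bit zero points, one int32 group index per input channel) is an assumption for illustration, not something confirmed in this thread, and the function names are hypothetical.

```python
# Rough lm_head size estimate for Llama-3-8B: BF16 vs. 4-bit with group size 128.
# Assumed layout (not confirmed by the thread): fp16 scales and packed 4-bit
# zero points per group, plus one int32 group index per input channel.

VOCAB_SIZE = 128256   # Llama-3-8B output vocabulary (rows of lm_head)
HIDDEN_SIZE = 4096    # Llama-3-8B hidden dimension (columns of lm_head)
GROUP_SIZE = 128
BITS = 4


def bf16_bytes(rows: int, cols: int) -> int:
    """Bytes needed to store a rows x cols matrix in BF16 (2 bytes per weight)."""
    return rows * cols * 2


def w4_bytes(rows: int, cols: int, group_size: int = GROUP_SIZE, bits: int = BITS) -> int:
    """Approximate bytes for the same matrix quantized to `bits` bits per weight."""
    n_groups = cols // group_size              # groups along the input dimension
    qweight = rows * cols * bits // 8          # packed quantized weights
    scales = rows * n_groups * 2               # one fp16 scale per (row, group)
    qzeros = rows * n_groups * bits // 8       # one packed zero point per (row, group)
    g_idx = cols * 4                           # one int32 group index per input channel
    return qweight + scales + qzeros + g_idx


full = bf16_bytes(VOCAB_SIZE, HIDDEN_SIZE)
quant = w4_bytes(VOCAB_SIZE, HIDDEN_SIZE)
saved = full - quant

print(f"BF16 lm_head:  {full / 1e9:.2f} GB")
print(f"W4G128 lm_head: {quant / 1e9:.2f} GB")
print(f"Saved:          {saved / 1e9:.2f} GB ({saved / 2**30:.2f} GiB)")
```

Under these assumptions the saving comes out a little under 0.8 GB (about 0.72 GiB), which is in the same ballpark as the 5.4G vs. 4.7G checkpoint sizes. It only covers weights on disk; runtime RAM/VRAM also depends on the kernel and on activations.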

Additionally, if act-order is not enabled or static groups are enabled, could AutoGPTQ refrain from dumping the group index into the quantized model, thus conserving some resources?
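To illustrate why the group index looks redundant in the no-act-order case, here is a minimal sketch. It assumes that without act-order the index simply maps input channel `i` to group `i // group_size`, so it can be rebuilt at load time from `in_features` and `group_size`. Both helper names are hypothetical and are not AutoGPTQ APIs; the static-groups-with-act-order case is not covered here.

```python
# Sketch: rebuild the group index instead of storing it.
# Assumption: without act-order, input channel i belongs to group i // group_size.
# `default_g_idx` and `is_trivial_g_idx` are hypothetical helpers, not AutoGPTQ functions.


def default_g_idx(in_features: int, group_size: int) -> list:
    """Group index for sequentially grouped input channels."""
    return [i // group_size for i in range(in_features)]


def is_trivial_g_idx(g_idx, group_size: int) -> bool:
    """True if a stored group index matches the default and could be omitted."""
    return list(g_idx) == default_g_idx(len(g_idx), group_size)


# Sequential grouping: the stored index carries no extra information.
sequential = default_g_idx(in_features=512, group_size=128)
print(is_trivial_g_idx(sequential, 128))

# A permuted index (as act-order can produce) is not reconstructible this way.
permuted = list(reversed(sequential))
print(is_trivial_g_idx(permuted, 128))
```

A saver could run a check like `is_trivial_g_idx` before writing the checkpoint and skip the tensor when it returns true, with the loader regenerating it on demand.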


Qubitium commented on June 17, 2024

@XeonKHJ Good question. I will test this tomorrow with intel/auto-round, which does offer the ability to quantize lm-head. If there are no inference issues after quantization, I will add it as an option in a new PR.


Qubitium commented on June 17, 2024

Found Intel's results for their lm-head quantization test. There was minimal accuracy loss:

@wenhuach21 Do you know how much RAM/VRAM the Intel Llama3-8B quantized lm_head test saved vs. non-quantized? Here is the untested branch that allows loading a quantized lm-head, which I plan to test: https://github.com/Qubitium/AutoGPTQ/tree/sym-false-lm-head combined with intel/auto-round#87

https://github.com/intel/auto-round/blob/8a3da144423322dfedb0b3fa702ae35d242496d8/docs/Meta-Llama-3-8B-Instruct-acc.md?plain=1#L3

| Metric | BF16 | w4g128 w/o lm-head | w4g128 with lm-head qdq |
| --- | --- | --- | --- |
| Avg. | 0.6352 | 0.6312 | 0.6303 |
| mmlu | 0.6386 | 0.6306 | 0.6318 |
| winogrande | 0.7143 | 0.7238 | 0.7269 |
| truthfulqa_mc1 | 0.3623 | 0.3537 | 0.3525 |
| rte | 0.6751 | 0.6859 | 0.6679 |
| piqa | 0.7867 | 0.7797 | 0.7802 |
| openbookqa | 0.3400 | 0.3300 | 0.3320 |
| lambada_openai | 0.7182 | 0.7200 | 0.7173 |
| hellaswag | 0.5769 | 0.5699 | 0.5701 |
| boolq | 0.8297 | 0.8309 | 0.8284 |
| arc_easy | 0.8152 | 0.8089 | 0.8106 |
| arc_challenge | 0.5299 | 0.5102 | 0.5154 |
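
To put a number on "minimal accuracy loss", the short sketch below recomputes the column averages from the eleven task scores in the table and expresses each quantized average as a relative change against BF16. The scores are copied from the table; the variable and function names are only illustrative.

```python
from statistics import mean

# Per-task scores copied from the table:
# (BF16, w4g128 w/o lm-head, w4g128 with lm-head qdq)
SCORES = {
    "mmlu":           (0.6386, 0.6306, 0.6318),
    "winogrande":     (0.7143, 0.7238, 0.7269),
    "truthfulqa_mc1": (0.3623, 0.3537, 0.3525),
    "rte":            (0.6751, 0.6859, 0.6679),
    "piqa":           (0.7867, 0.7797, 0.7802),
    "openbookqa":     (0.3400, 0.3300, 0.3320),
    "lambada_openai": (0.7182, 0.7200, 0.7173),
    "hellaswag":      (0.5769, 0.5699, 0.5701),
    "boolq":          (0.8297, 0.8309, 0.8284),
    "arc_easy":       (0.8152, 0.8089, 0.8106),
    "arc_challenge":  (0.5299, 0.5102, 0.5154),
}


def column_means(scores):
    """Average each column (one column per model variant) across all tasks."""
    return [mean(col) for col in zip(*scores.values())]


bf16_avg, w4_avg, w4_head_avg = column_means(SCORES)

# Relative change of each quantized average against the BF16 baseline, in percent.
rel_w4 = 100 * (w4_avg - bf16_avg) / bf16_avg
rel_w4_head = 100 * (w4_head_avg - bf16_avg) / bf16_avg

print(f"BF16 avg:                {bf16_avg:.4f}")
print(f"w4g128 w/o lm-head avg:  {w4_avg:.4f} ({rel_w4:+.2f}%)")
print(f"w4g128 with lm-head avg: {w4_head_avg:.4f} ({rel_w4_head:+.2f}%)")
```

The recomputed averages match the "Avg." row, and both quantized variants stay within 1% of the BF16 average; quantizing the lm-head as well costs roughly a further 0.001 in absolute average score.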


Qubitium commented on June 17, 2024

#648 can now load a quantized lm_head from intel/auto-round, but AutoGPTQ's own quantization of lm-head is still in progress.


Qubitium commented on June 17, 2024

> Additionally, if static grouping is not enabled, could AutoGPTQ refrain from dumping the group index into the quantized model, thus conserving some resources.

This is beyond my abilities right now. @fxmarty @LaaZa

