
Support MPS (hqq issue, closed)

mobiusml commented on August 16, 2024
Support MPS

from hqq.

Comments (5)

mobicham commented on August 16, 2024

Hi @benglewis,
The ATEN backend is for CUDA. Have you tried the PyTorch backend with device='mps'?

import torch
from hqq.engine.hf import HQQModelForCausalLM, AutoTokenizer
from hqq.core.quantize import BaseQuantizeConfig, HQQLinear, HQQBackend

#Model and settings
model_id      = 'meta-llama/Llama-2-7b-chat-hf'
compute_dtype = torch.float16
device        = 'mps'

#Load model on the CPU
######################
model     = HQQModelForCausalLM.from_pretrained(model_id, torch_dtype=compute_dtype)
tokenizer = AutoTokenizer.from_pretrained(model_id)

#Quantize the model
######################
quant_config = BaseQuantizeConfig(nbits=4, group_size=64)
model.quantize_model(quant_config=quant_config, compute_dtype=compute_dtype, device=device)

#Use the pure PyTorch backend (the ATEN backend targets CUDA)
HQQLinear.set_backend(HQQBackend.PYTORCH)


benglewis commented on August 16, 2024

Yes, I tried to run without hqq_aten, but I got an error where some of the code tried to call it. I will update when I'm in front of that computer.


mobicham commented on August 16, 2024

It shouldn't call hqq_aten at all if you set the backend to PYTORCH or PYTORCH_COMPILE.
Unfortunately, I don't have an M1 Mac to try it out. Let me know!
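
The advice so far (ATEN for CUDA, PYTORCH for everything else) can be sketched as a small helper. This is only an illustration: `pick_device_and_backend` and `probe` are hypothetical names that are not part of hqq, and preferring ATEN whenever CUDA is present is an assumption rather than something stated in this thread.

```python
import importlib.util


def pick_device_and_backend(mps_available: bool, cuda_available: bool) -> tuple[str, str]:
    """Return (device, backend_name) following the advice in this thread."""
    if cuda_available:
        # Assumption: use the CUDA-oriented ATEN backend when CUDA exists.
        return "cuda", "ATEN"
    if mps_available:
        # Apple Silicon GPU: stay on the pure PyTorch backend.
        return "mps", "PYTORCH"
    return "cpu", "PYTORCH"


def probe() -> tuple[bool, bool]:
    """Ask PyTorch which accelerators exist; (False, False) if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return False, False
    import torch

    return torch.backends.mps.is_available(), torch.cuda.is_available()


device, backend_name = pick_device_and_backend(*probe())
print(device, backend_name)
```

With hqq installed, the returned name could then be resolved with something like `getattr(HQQBackend, backend_name)` before calling `HQQLinear.set_backend`.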

from hqq.

benglewis commented on August 16, 2024

So that worked for quantizing (insofar as it didn't crash; I didn't wait for it to finish), but I was not able to open an existing, already-quantized model. Is that known behavior? Here's the error that I got when loading the quantized model:
.../.micromamba/envs/default/lib/python3.10/site-packages/hqq/core/bitpack.py:76: UserWarning: The operator 'aten::__rshift__.Scalar' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:13.)
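
For context on why a right-shift operator shows up in a bit-packing module, here is a minimal pure-Python sketch of packing two 4-bit values into one byte and unpacking them again. It is an illustration of the general technique only, not the code in hqq's bitpack.py; `pack_4bit` and `unpack_4bit` are hypothetical names.

```python
def pack_4bit(values: list[int]) -> bytes:
    """Pack pairs of 4-bit integers (0..15) into single bytes."""
    if len(values) % 2 != 0:
        raise ValueError("need an even number of values")
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("values must fit in 4 bits")
    # First value of each pair goes to the high nibble, second to the low nibble.
    return bytes((hi << 4) | lo for hi, lo in zip(values[0::2], values[1::2]))


def unpack_4bit(packed: bytes) -> list[int]:
    """Recover the 4-bit integers; the right shift extracts the high nibble."""
    out = []
    for byte in packed:
        out.append(byte >> 4)    # high nibble via right shift
        out.append(byte & 0x0F)  # low nibble via mask
    return out


weights = [1, 15, 0, 7]
assert unpack_4bit(pack_4bit(weights)) == weights
```

On a tensor, the same shift is the kind of operation the warning reports as `aten::__rshift__.Scalar`.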


mobicham commented on August 16, 2024

It seems the op is not implemented for the MPS GPU backend, so it falls back to the CPU. It's not an error, just a warning.
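
One way to confirm that a message like this is a warning rather than a failure is to record warnings around the call and check that a result still comes back. The sketch below uses only Python's standard `warnings` module. `fake_load` is a hypothetical stand-in for the real model-loading call, which is not shown in this thread.

```python
import warnings


def run_and_collect_warnings(fn):
    """Run fn(), returning its result plus any UserWarning messages it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        result = fn()
    messages = [str(w.message) for w in caught if issubclass(w.category, UserWarning)]
    return result, messages


def fake_load():
    # Hypothetical stand-in: emits a warning like the one above, then succeeds.
    warnings.warn(
        "The operator 'aten::__rshift__.Scalar' is not currently supported "
        "on the MPS backend and will fall back to run on the CPU.",
        UserWarning,
    )
    return "loaded"


result, messages = run_and_collect_warnings(fake_load)
print(result, len(messages))
```

If the call had raised an exception instead, `run_and_collect_warnings` would propagate it and no result would be returned, which is the distinction being drawn here.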
