Would it be possible to support Apple M1/M2/M3 hardware via the MPS backend for PyTorc

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

Support MPS about hqq HOT 5 CLOSED

mobiusml commented on August 16, 2024

Support MPS

from hqq.

Comments (5)

mobicham commented on August 16, 2024

Hi @benglewis,
The ATEN backend is for CUDA. Have you tried the PyTorch backend with device='mps'?

import torch
from hqq.engine.hf import HQQModelForCausalLM, AutoTokenizer

# Model and settings
model_id      = 'meta-llama/Llama-2-7b-chat-hf'
compute_dtype = torch.float16
device        = 'mps'

# Load model on the CPU
######################
model     = HQQModelForCausalLM.from_pretrained(model_id, torch_dtype=compute_dtype)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Quantize the model
######################
from hqq.core.quantize import *
quant_config = BaseQuantizeConfig(nbits=4, group_size=64)
model.quantize_model(quant_config=quant_config, compute_dtype=compute_dtype, device=device)

# Use the pure PyTorch backend instead of ATEN
HQQLinear.set_backend(HQQBackend.PYTORCH)
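The snippet above hard-codes device = 'mps'. As a rough sketch (not part of hqq), the device string could instead be chosen at runtime. The helper name pick_device is hypothetical; it relies only on torch.cuda.is_available() and torch.backends.mps.is_available(), and it assumes that falling back to 'cpu' is acceptable when neither accelerator is present or PyTorch is not installed.

```python
import importlib.util


def pick_device() -> str:
    """Return 'cuda', 'mps', or 'cpu' depending on what this machine offers.

    Hypothetical helper for illustration; it is not part of hqq.
    """
    # Without PyTorch installed there is nothing to probe, so report the CPU.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in PyTorch 1.12 and later.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(device)
```

The resulting string could then be passed as the device argument in the quantization call above.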


benglewis commented on August 16, 2024

Yes, I tried to work without hqq_aten, but I got an error where some of the code tried to call it. I will update when I'm in front of that computer.


mobicham commented on August 16, 2024

It shouldn't call hqq_aten at all if you set the backend to PYTORCH or PYTORCH_COMPILE.
Unfortunately, I don't have an M1 Mac to try it out. Let me know!


benglewis commented on August 16, 2024

That worked for quantizing, in so far as it didn't crash (I didn't wait for it to finish), but I was not able to open an existing, already quantized model. Is that known behavior? Here's the error that I got when loading the quantized model:
.../.micromamba/envs/default/lib/python3.10/site-packages/hqq/core/bitpack.py:76: UserWarning: The operator 'aten::__rshift__.Scalar' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:13.)


mobicham commented on August 16, 2024

It seems the op is not implemented for the GPU. It's not an error, just a warning.
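For context on why a right-shift op shows up in bitpack.py at all: unpacking 4-bit values that were stored two per byte typically takes a shift for the high nibble and a mask for the low one. The following is only a pure-Python illustration of that idea, not hqq's actual implementation. The function names pack_4bit and unpack_4bit and the nibble order (first value in the high nibble) are assumptions made for the example.

```python
def pack_4bit(values):
    """Pack an even-length list of ints in 0..15 into bytes, two per byte."""
    if len(values) % 2 != 0:
        raise ValueError("need an even number of 4-bit values")
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("values must fit in 4 bits")
    # Assumed layout: first value of each pair goes into the high nibble.
    return bytes((hi << 4) | lo for hi, lo in zip(values[0::2], values[1::2]))


def unpack_4bit(packed):
    """Recover the 4-bit values from bytes produced by pack_4bit."""
    out = []
    for byte in packed:
        out.append(byte >> 4)    # right shift extracts the high nibble
        out.append(byte & 0x0F)  # mask extracts the low nibble
    return out


weights = [1, 15, 0, 7]
packed = pack_4bit(weights)
restored = unpack_4bit(packed)
print(restored)
```

On a tensor, the same shift is applied elementwise, which is the kind of operation the warning says is falling back to the CPU.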

