
Comments (7)

ahsima1 commented on August 15, 2024

Seems like there were breaking changes recently in the Transformers branch with LLaMA support; see huggingface/transformers#21955 (comment).

Once I updated the library and converted the weights using the newest version, I was able to run the benchmark.
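
A rough sketch of that reconversion step, assuming the conversion script bundled with recent transformers releases (the paths below are placeholders):

# reconvert the original LLaMA checkpoint to the current HF format; adjust paths to your setup
python -m transformers.models.llama.convert_llama_weights_to_hf \
    --input_dir /path/to/LLaMA --model_size 7B --output_dir ./llama-7b-hf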

Median: 0.03748023509979248
PPL: 6.315393447875977
max memory(MiB): 4676.1708984375


qwopqwop200 commented on August 15, 2024

I cannot reproduce this problem.
It's probably caused by your CUDA version being too old.


ahsima1 commented on August 15, 2024

It also fails for me, but with a different error.
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=1 python llama.py decapoda-research/llama-7b-hf c4 --load llama7b-4bit.pt --benchmark 2048

Benchmarking ...
Traceback (most recent call last):
  File "/home/x/llama/GPTQ-for-LLaMa/llama.py", line 407, in <module>
    benchmark(model, input_ids, check=args.check)
  File "/home/x/llama/GPTQ-for-LLaMa/llama.py", line 306, in benchmark
    out = model(
  File "/home/x/.conda/envs/gptq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/x/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 852, in forward
    outputs = self.model.decoder(
  File "/home/x/.conda/envs/gptq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/x/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 624, in forward
    layer_outputs = decoder_layer(
  File "/home/x/.conda/envs/gptq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/home/x/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 317, in forward
    hidden_states = self.feed_forward(hidden_states)
  File "/home/x/.conda/envs/gptq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/x/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 103, in forward
    return self.w2(self.act_fn(self.w1(x)) * self.w3(x))
  File "/home/x/.conda/envs/gptq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/x/llama/GPTQ-for-LLaMa/quant.py", line 165, in forward
    y = self.bias.clone()
RuntimeError: CUDA error: an illegal memory access was encountered

I'm using CUDA 11.7, installed from Anaconda.
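
A quick sanity check that may help narrow this down, assuming a standard conda/pip setup: compare the CUDA version torch was built against with the toolkit that compiles the quant kernel.

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
nvcc --version   # toolkit used to build the CUDA extension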


qwopqwop200 commented on August 15, 2024

I have not been able to reproduce this issue either.
Running with CUDA 11.3 and torch 1.12.1+cu113 might help.
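
An untested sketch of that combination, assuming the cu113 wheel index is still available; adjust versions as needed:

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 \
    --extra-index-url https://download.pytorch.org/whl/cu113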


Starlento commented on August 15, 2024

Just a reference.
4090, WSL2, Python 3.10, gcc/g++ 9.5, CUDA 11.3.
PyTorch installed with: pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117
Seems ok.

Median: 0.015739798545837402
PPL: 6.336050987243652
max memory(MiB): 4740.1552734375


qwopqwop200 commented on August 15, 2024

Hmm... I don't know.


ItsLogic commented on August 15, 2024

The latest commits of transformers and this repo seem to work for me too. Not sure what the previous issue was, but it's gone now, so I can get to testing.
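
For reference, a rough sketch of updating both to their latest commits (one of several ways to do it; assumes a source install of transformers and that the repo's quant_cuda extension is built via setup_cuda.py):

pip install -U git+https://github.com/huggingface/transformers.git
cd GPTQ-for-LLaMa && git pull
python setup_cuda.py install   # rebuild the quant kernel after pulling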

Median: 0.01520681381225586
PPL: 6.328521251678467
max memory(MiB): 4676.1708984375

