
Comments (7)

ahsima1 commented on August 15, 2024

Seems like there were breaking changes recently in the Transformers branch with LLaMA support; see huggingface/transformers#21955 (comment).

Once I updated the library and converted the weights using the newest version, I was able to run the benchmark.
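
A rough sketch of that reconversion step, assuming the conversion script bundled with recent transformers releases (the paths below are placeholders):

# reconvert the original LLaMA checkpoint to the current HF format; adjust paths to your setup
python -m transformers.models.llama.convert_llama_weights_to_hf \
    --input_dir /path/to/LLaMA --model_size 7B --output_dir ./llama-7b-hf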

Median: 0.03748023509979248
PPL: 6.315393447875977
max memory(MiB): 4676.1708984375


qwopqwop200 commented on August 15, 2024

I cannot reproduce this problem.
It's probably caused by your CUDA version being too old.


ahsima1 commented on August 15, 2024

It also fails for me, but with a different error.
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=1 python llama.py decapoda-research/llama-7b-hf c4 --load llama7b-4bit.pt --benchmark 2048

Benchmarking ...
Traceback (most recent call last):
  File "/home/x/llama/GPTQ-for-LLaMa/llama.py", line 407, in <module>
    benchmark(model, input_ids, check=args.check)
  File "/home/x/llama/GPTQ-for-LLaMa/llama.py", line 306, in benchmark
    out = model(
  File "/home/x/.conda/envs/gptq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/x/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 852, in forward
    outputs = self.model.decoder(
  File "/home/x/.conda/envs/gptq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/x/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 624, in forward
    layer_outputs = decoder_layer(
  File "/home/x/.conda/envs/gptq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/home/x/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 317, in forward
    hidden_states = self.feed_forward(hidden_states)
  File "/home/x/.conda/envs/gptq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/x/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 103, in forward
    return self.w2(self.act_fn(self.w1(x)) * self.w3(x))
  File "/home/x/.conda/envs/gptq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/x/llama/GPTQ-for-LLaMa/quant.py", line 165, in forward
    y = self.bias.clone()
RuntimeError: CUDA error: an illegal memory access was encountered

I'm using CUDA 11.7, installed from Anaconda.
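
A quick sanity check that may help narrow this down, assuming a standard conda/pip setup: compare the CUDA version torch was built against with the toolkit that compiles the quant kernel.

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
nvcc --version   # toolkit used to build the CUDA extension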


qwopqwop200 commented on August 15, 2024

I have not been able to reproduce this issue either.
Running with CUDA 11.3 and torch 1.12.1+cu113 might help.
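
An untested sketch of that combination, assuming the cu113 wheel index is still available; adjust versions as needed:

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 \
    --extra-index-url https://download.pytorch.org/whl/cu113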


Starlento commented on August 15, 2024

Just a reference.
4090, WSL2, Python 3.10, gcc/g++ 9.5, CUDA 11.3.
PyTorch installed with: pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117
Seems ok.

Median: 0.015739798545837402
PPL: 6.336050987243652
max memory(MiB): 4740.1552734375


qwopqwop200 commented on August 15, 2024

Hmm... I don't know.


ItsLogic commented on August 15, 2024

The latest commits of transformers and this repo seem to work for me too. Not sure what the previous issue was, but it's gone now, so I can get to testing.
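
For reference, a rough sketch of updating both to their latest commits (one of several ways to do it; assumes a source install of transformers and that the repo's quant_cuda extension is built via setup_cuda.py):

pip install -U git+https://github.com/huggingface/transformers.git
cd GPTQ-for-LLaMa && git pull
python setup_cuda.py install   # rebuild the quant kernel after pulling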

Median: 0.01520681381225586
PPL: 6.328521251678467
max memory(MiB): 4676.1708984375

