Hi! When running an old AWQ quant (for example deepseek 1.3B coder), it works fine. Ho

GEMV_fast error about autoawq HOT 3 OPEN

SinanAkkoyun commented on August 17, 2024

GEMV_fast error

from autoawq.

Comments (3)

casper-hansen commented on August 17, 2024

I haven’t seen this before with 32 tokens, which is odd. I used the same benchmark script. Can you try to install from the main branch?

SinanAkkoyun commented on August 17, 2024

I did. I installed AutoAWQ today from a fresh pull with pip install -e . and also updated transformers, etc.
I am running with CUDA 12.2. Could that be the culprit?
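
To narrow down whether the CUDA version matters, it can help to print what PyTorch itself reports, since the CUDA version PyTorch was built against may differ from the system toolkit. The following is a minimal sketch, not part of AutoAWQ; the function name cuda_env_report is hypothetical, and it returns empty fields if PyTorch is not installed.

```python
import importlib.util
import platform


def cuda_env_report():
    """Collect version details that are useful when reporting a CUDA kernel crash."""
    report = {
        "python": platform.python_version(),
        "torch": None,
        "torch_cuda": None,
        "gpu": None,
    }
    # Skip the torch-specific fields if PyTorch is not installed here.
    if importlib.util.find_spec("torch") is None:
        return report

    import torch

    report["torch"] = torch.__version__
    # CUDA version the PyTorch binary was built against (None on CPU-only builds).
    report["torch_cuda"] = torch.version.cuda
    if torch.cuda.is_available():
        report["gpu"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_env_report().items():
        print(f"{key}: {value}")
```

Pasting this output alongside the traceback makes it easier to compare the failing setup with one where the same benchmark works.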

SinanAkkoyun commented on August 17, 2024

@casper-hansen I also tried it with 12.1 now (docker: pytorch/pytorch:2.2.2-cuda12.1-cudnn8-devel):

Installation:

cd AutoAWQ
pip install transformers
pip install -e .

Same error:

# python examples/benchmark.py --model_path /models/mistral/small-instruct-v0.2/awq/gemv_fast/
 -- Loading model...
Replacing layers...: 100%|████████████████████████████████████████████████████████████████████████████████████| 32/32 [00:01<00:00, 28.65it/s]
We've detected an older driver with an RTX 4000 series GPU. These drivers have issues with P2P. This can affect the multi-gpu inference when using accelerate device_map.Please make sure to update your driver to the latest version which resolves this.
Fusing layers...: 100%|██████████████████████████████████████████████████████████████████████████████████████| 32/32 [00:00<00:00, 548.60it/s]
 -- Warming up...
 -- Generating 32 tokens, 32 in context...
Traceback (most recent call last):
  File "/workspace/AutoAWQ/examples/benchmark.py", line 111, in run_round
    context_time, generate_time = generator(model, input_ids, n_generate)
  File "/workspace/AutoAWQ/examples/benchmark.py", line 54, in generate_torch
    out = model(inputs, use_cache=True)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/AutoAWQ/awq/models/base.py", line 108, in forward
    return self.model(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 1157, in forward
    outputs = self.model(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/AutoAWQ/awq/modules/fused/model.py", line 127, in forward
    h, _, _ = layer(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/AutoAWQ/awq/modules/fused/block.py", line 130, in forward
    out = h + self.mlp.forward(self.norm_2(h))
  File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 179, in forward
    return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py", line 393, in forward
    return F.silu(input, inplace=self.inplace)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/functional.py", line 2075, in silu
    return torch._C._nn.silu(input)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/AutoAWQ/examples/benchmark.py", line 210, in <module>
    main(args)
  File "/workspace/AutoAWQ/examples/benchmark.py", line 178, in main
    stats, model_version = run_round(
  File "/workspace/AutoAWQ/examples/benchmark.py", line 117, in run_round
    raise RuntimeError(ex)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
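
As the error text itself notes, CUDA kernel errors can be reported asynchronously, so the F.silu frame above may not be where the illegal access actually happened. A possible next step is to re-run the same benchmark with CUDA_LAUNCH_BLOCKING=1 so the traceback points at the kernel that faulted. The sketch below is one way to do that from Python; the helper names blocking_env and rerun_benchmark are hypothetical, and the script path and --model_path flag are taken from the command in the log above.

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def rerun_benchmark(model_path, script="examples/benchmark.py"):
    """Re-run the benchmark script in a child process with blocking kernel launches.

    Assumes it is called from the AutoAWQ checkout so that `script` resolves.
    """
    cmd = [sys.executable, script, "--model_path", model_path]
    # check=False: the run is expected to fail; the point is the new traceback.
    return subprocess.run(cmd, env=blocking_env(), check=False)
```

For example, rerun_benchmark("/models/mistral/small-instruct-v0.2/awq/gemv_fast/") would repeat the failing run. The variable is set on the child process so that it is in place before CUDA is initialized there. Blocking launches slow the run down, so this is for debugging only.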
