
Comments (8)

Andrei-Aksionov commented on June 9, 2024

Thanks a lot @fxmarty.

I just noticed that the package with the Marlin kernel is installed even on a system that doesn't support it, e.g. a T4. That is fine, but I would like the user to be notified about this incompatibility.
The error appears only when I try to run a forward pass, not during layer initialization (and the error message doesn't tell me what exactly is wrong).

import torch
from auto_gptq.nn_modules.qlinear.qlinear_marlin import QuantLinear

x = torch.rand((1, 1, 128), device="cuda").half()
layer = QuantLinear(4, 128, 128, 256, False).to("cuda")
layer(x)
>> 
RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

So perhaps it's better to add a check inside the init method, something like this:

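import subprocess

# nvidia-smi prints one compute capability per line, one line per GPU.
output = subprocess.check_output("nvidia-smi --query-gpu=compute_cap --format=csv,noheader".split())
# Take the lowest value so that multi-GPU systems don't break the float() parsing.
compute = min(float(value) for value in output.decode().split())
if compute < 8.0:
    raise NotImplementedError("[Error message]")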

Or maybe I misunderstood something? 🤷‍♂️

from autogptq.

fxmarty commented on June 9, 2024

Good call! Currently the check is in from_quantized:

if not torch.cuda.get_device_capability()[0] >= 8:
    raise ValueError(f'Can not use Marlin int4*fp16 kernel with a device of compute capability {torch.cuda.get_device_capability()}, the minimum compute capability is 8.0 for Marlin kernel. Please do not use `use_marlin=True`, or please upgrade your GPU ("The more you buy, the more you save." - Taiwanese proverb).')
but it should probably be in marlin's QuantLinear init.
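
A minimal sketch of what such an init-time guard could look like, assuming the 8.0 minimum quoted above. The helper name `ensure_marlin_supported` and the constant `MARLIN_MIN_CAPABILITY` are hypothetical and not part of AutoGPTQ; the only library call used is `torch.cuda.get_device_capability()`, which is imported lazily so the check can also be exercised with an explicit capability tuple.

```python
from typing import Optional, Tuple

# Assumption taken from the thread: Marlin needs compute capability >= 8.0.
MARLIN_MIN_CAPABILITY = (8, 0)


def ensure_marlin_supported(capability: Optional[Tuple[int, int]] = None) -> None:
    """Raise ValueError if the device is too old for the Marlin kernel.

    `capability` is a (major, minor) tuple. When omitted, it is read from
    the current CUDA device through torch.
    """
    if capability is None:
        import torch  # lazy import: only needed when querying a real device

        capability = torch.cuda.get_device_capability()

    major, minor = capability
    if (major, minor) < MARLIN_MIN_CAPABILITY:
        raise ValueError(
            f"Marlin kernel requires compute capability >= 8.0, "
            f"but the device reports {major}.{minor}."
        )


# Example with explicit values: a T4 reports (7, 5), an A100 reports (8, 0).
ensure_marlin_supported((8, 0))  # passes silently
```

Calling `ensure_marlin_supported()` with no argument at the top of a layer's `__init__` would surface the incompatibility at construction time instead of as an opaque CUDA error on the first forward pass.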


Andrei-Aksionov commented on June 9, 2024

I knew that there should be an easier way to check compute capability 🙂.

Yes, in my opinion, this check belongs in the QuantLinear class.

"The more you buy, the more you save." - Taiwanese proverb

Now I know :)


fxmarty commented on June 9, 2024

Thank you. I am planning to make a release, hopefully this week, with the Marlin kernel and built against PyTorch 2.2. I still need to add guards on __CUDA_ARCH__ in the Marlin codebase, as it can only be compiled for compute capability >= 8.0.

I'll add a fast repacking first as well.


Andrei-Aksionov commented on June 9, 2024

Awesome news!
Thanks a lot 🤗


fxmarty commented on June 9, 2024

Hi @Andrei-Aksionov, AutoGPTQ 0.7.0 is released, check out https://github.com/AutoGPTQ/AutoGPTQ/releases/tag/v0.7.0 & https://github.com/AutoGPTQ/AutoGPTQ?tab=readme-ov-file#installation!


fxmarty commented on June 9, 2024

@Andrei-Aksionov Added a check in #567, I'll likely do a patch this week.


Andrei-Aksionov commented on June 9, 2024

Thanks!

