Comments (11)

doberst commented on May 28, 2024

@shneeba - thanks for testing, and glad that at least it is up and "working" now - yes, please go ahead and close the original issue - and then open up a new one to look at improving performance for Win CUDA with GGUF. Will share a few ideas in the new thread once opened.

from llmware.

doberst commented on May 28, 2024

@shneeba - could you share some details about your Windows environment? Does it have CUDA (an Nvidia GPU)?

shneeba commented on May 28, 2024

Hey @doberst sure thing, I do indeed:

GPU - RTX 3090 Ti
Driver Version - 551.76
Processor - AMD Ryzen 9 3900X
RAM - 32 GB

OS - Windows 10 Pro
Version - 22H2
Build version - 19045.4046
Windows Feature Experience Pack 1000.19053.1000.0

doberst commented on May 28, 2024

@shneeba - OK ... I suspect it is the CUDA driver being out of date. I added an option in 0.2.4 to support CUDA on Windows for GGUF, which is loaded automatically if CUDA is detected. Could you check nvcc --version? Also, you can 'turn off' the GPU by setting GGUFConfigs().set_config("use_gpu", False) - in that case, the GGUF engine will pull the non-CUDA binary and should run on CPU ...
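The toggle described here can be sketched with a minimal stand-in. To be clear, this is illustrative only: the dict, function names, default, and binary names are assumptions pieced together from this thread, not llmware's actual internals.

```python
# Minimal stand-in for the use_gpu toggle described above.
# Names and defaults are illustrative, not llmware's actual internals.
_GGUF_CONFIGS = {"use_gpu": True}  # hypothetical default: prefer CUDA

def set_config(key, value):
    """Rough analogue of GGUFConfigs().set_config(key, value)."""
    _GGUF_CONFIGS[key] = value

def select_binary():
    """Pick the CUDA build when use_gpu is on, else the CPU-only build."""
    if _GGUF_CONFIGS["use_gpu"]:
        return "libllama_cuda_win.dll"
    return "libllama_win.dll"

set_config("use_gpu", False)
print(select_binary())  # prints libllama_win.dll
```

The point of the pattern is that the config is consulted once, at engine-load time, which is why the flag has to be set before the model is loaded.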

shneeba commented on May 28, 2024

Thanks for the quick replies and pointers. I actually didn't have the CUDA toolkit installed; I've got that sorted, however I'm still seeing the issue (for reference, this is my nvcc --version output):

(.venv) PS C:\Users\MYUSERNAME\Documents\projects\llmware> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:28:36_Pacific_Standard_Time_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0

I did try setting GGUFConfigs().set_config("use_gpu", False), however I still get the same super-vague error from Windows. For reference, I just updated agent_multistep_analysis.py with:

from llmware.setup import Setup
from llmware.gguf_configs import GGUFConfigs


def multistep_analysis():

    GGUFConfigs().set_config("use_gpu", False)

    """ In this example, our objective is to research Microsoft history and rivalry in the 1980s with IBM. """

Setting use_gpu to False directly within the gguf_configs.py file also didn't seem to fix it.

I'm still digging around but it does feel like something local to my setup somewhere. I've switched back to 0.2.2 for now whilst working on the SLIM model side of things.

doberst commented on May 28, 2024

@shneeba - could you check whether you have AVX512 enabled? On Windows machines with AVX512, the GGUF engine seems to be working as expected, but I am able to replicate your exact error on machines without AVX512 enabled. An easy way to check:

    pip install cpufeature

then, in Python:

    import cpufeature
    cpufeature.print_features()
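As a rough stdlib-only alternative (assuming a Linux machine with /proc/cpuinfo available - on Windows, cpufeature as above is the easier route), the AVX-512 foundation flag can be checked like this:

```python
def has_avx512(cpuinfo_text):
    """Return True if avx512f (the AVX-512 'foundation' subset) appears
    in the flags line of a /proc/cpuinfo dump."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return "avx512f" in line.split()
    return False

# Usage on Linux:
# with open("/proc/cpuinfo") as f:
#     print(has_avx512(f.read()))
```

avx512f is the baseline flag: every AVX-512 implementation includes it, so its absence is enough to rule out AVX-512 support.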

Working on a fix .....

shneeba commented on May 28, 2024

@doberst sounds like this is it - my CPU doesn't support AVX-512. I didn't realise it was that old! It seems the AMD Ryzen 7000 series and onwards do support it. Just for reference:

>>> import cpufeature
>>> cpufeature.print_features()
=== CPU FEATURES ===
    VendorId                : AuthenticAMD
    num_virtual_cores       : 24
    num_physical_cores      : 12
    num_threads_per_core    : 2
    num_cpus                : 1
    cache_line_size         : 64
    cache_L1_size           : 0
    cache_L2_size           : 0
    cache_L3_size           : 0
    OS_x64                  : True
    OS_AVX                  : True
    OS_AVX512               : False
    MMX                     : True
    x64                     : True
    ABM                     : True
    RDRAND                  : True
    BMI1                    : True
    BMI2                    : True
    ADX                     : True
    PREFETCHWT1             : False
    MPX                     : False
    SSE                     : True
    SSE2                    : True
    SSE3                    : True
    SSSE3                   : True
    SSE4.1                  : True
    SSE4.2                  : True
    SSE4.a                  : True
    AES                     : True
    SHA                     : True
    AVX                     : True
    XOP                     : False
    FMA3                    : True
    FMA4                    : False
    AVX2                    : True
    AVX512f                 : False
    AVX512pf                : False
    AVX512er                : False
    AVX512cd                : False
    AVX512vl                : False
    AVX512bw                : False
    AVX512dq                : False
    AVX512ifma              : False
    AVX512vbmi              : False
    AVX512vbmi2             : False
    AVX512vnni              : False
>>>

Thank you 🙇‍♂️

doberst commented on May 28, 2024

@shneeba - hope you had a nice weekend! I have recompiled the GGUF engine for Windows to use only AVX/AVX2, and not AVX512 (it seems not uncommon for Windows machines to deactivate AVX512 even when the underlying chip supports it). If you clone the main repo, you will have the fix: a small update in the gguf_configs file, plus the new libllama_win.dll binary. If your CUDA drivers are up to date, then CUDA should kick in automatically; if you get any errors from that, please set use_gpu = False and it will fall back to the CPU-only version ...

shneeba commented on May 28, 2024

@doberst I did thank you, hope you did too (and weren't too deep in this bug)!

You are awesome. This has fixed it and it's working on CPU again, no errors seen. Interesting note about AVX512 on Windows.

It doesn't seem to be picking up my GPU, though. I've tried setting use_gpu to both True and False, and sent a larger query to the dragon-yi-6b-gguf model just to be sure, but I can sadly confirm it still doesn't utilise the GPU. When you say:

If your CUDA drivers are up-to-date, then CUDA should kick-in automatically

Do you happen to know what it specifically looks for? I've got the latest drivers as per my earlier post, so I'm unsure what else it requires.

doberst commented on May 28, 2024

@shneeba - I finally got the win-cuda GGUF lib to build on CUDA 12.1 - with blazing speed, really awesome. I have merged into the main code an updated libllama_cuda_win.dll binary (no other changes). I have not yet tested whether CUDA 12.2-12.4 will work (hope so). Could you pull the new code and try again - first with 12.4 (fingers crossed), and then fall back to the 12.1 drivers if needed?

shneeba commented on May 28, 2024

@doberst thanks again for taking more time to look into this. I tried the updated binary and tested with CUDA 12.4 and 12.1 this evening, however it's still using the CPU instead of the GPU. Is there any additional logging I can enable to catch why it's not using the GPU? I turned llama_cpp_verbose to ON, but that didn't give me much info (although I'm not 100% sure what I'm looking for).

As the original error that was blocking the SLIM models from working is fixed, would it be better to open a new issue?
