Comments (11)
@shneeba - thanks for testing, and glad that at least it is up and "working" now - yes, please go ahead and close the original issue - and then open up a new one to look at improving performance for Win CUDA with GGUF. Will share a few ideas in the new thread once opened.
from llmware.
@shneeba - could you share some details about your Windows environment? Does it have CUDA (Nvidia GPU) ?
Hey @doberst sure thing, I do indeed:
GPU - RTX 3090TI
Driver Version - 551.76
Processor - AMD Ryzen 9 3900X
RAM - 32 GB
OS - Windows 10 Pro
Version - 22H2
Build version - 19045.4046
Windows Feature Experience Pack 1000.19053.1000.0
@shneeba - OK ... I suspect it is the CUDA driver being out of date. I added an option in 0.2.4 to support CUDA on Windows for GGUF, which is automatically loaded if CUDA is detected. Could you check nvcc --version? Also, you can 'turn off' GPU by setting GGUFConfigs().set_config("use_gpu", False) -> in that case, the GGUF model will pull the non-CUDA binary and should run on CPU ...
Thanks for the quick replies and pointers. I actually didn't have the specific CUDA drivers installed; I got that sorted, however I'm still seeing the issue. For reference, this is my nvcc --version output:
(.venv) PS C:\Users\MYUSERNAME\Documents\projects\llmware> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:28:36_Pacific_Standard_Time_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0
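(As an aside: if you need the CUDA release in a script rather than reading it off the console, it can be parsed out of that output with the stdlib alone. A minimal sketch; the sample string is the line pasted above, and in practice you would capture the stdout of nvcc --version yourself, e.g. via subprocess:)

```python
import re

def cuda_release(nvcc_output):
    """Extract the CUDA release (e.g. '12.4') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None

# Sample line taken from the output above.
sample = "Cuda compilation tools, release 12.4, V12.4.99"
print(cuda_release(sample))  # 12.4
```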
I did try setting GGUFConfigs().set_config("use_gpu", False), however I still seem to get the same super vague error from Windows. For reference, I just updated agent_multistep_analysis.py with:
from llmware.setup import Setup
from llmware.gguf_configs import GGUFConfigs

def multistep_analysis():
    """ In this example, our objective is to research Microsoft history and rivalry in the 1980s with IBM. """
    GGUFConfigs().set_config("use_gpu", False)
Setting use_gpu to False directly within the gguf_configs.py file also didn't seem to fix it.
I'm still digging around, but it does feel like something local to my setup somewhere. I've switched back to 0.2.2 for now whilst working on the SLIM model side of things.
@shneeba - could you check if you have AVX512 enabled? On Windows machines with AVX512, the GGUF engine seems to be working as expected, but I am able to replicate exactly your error on machines without AVX512 enabled. A good easy way to check is as follows:
pip install cpufeature
import cpufeature
cpufeature.print_features()
Working on a fix .....
@doberst sounds like this is it, my CPU doesn't support AVX-512, I didn't realise it had got that old! It seems AMD 7xxx series and onwards do. Just for reference:
>>> import cpufeature
>>> cpufeature.print_features()
=== CPU FEATURES ===
VendorId : AuthenticAMD
num_virtual_cores : 24
num_physical_cores : 12
num_threads_per_core : 2
num_cpus : 1
cache_line_size : 64
cache_L1_size : 0
cache_L2_size : 0
cache_L3_size : 0
OS_x64 : True
OS_AVX : True
OS_AVX512 : False
MMX : True
x64 : True
ABM : True
RDRAND : True
BMI1 : True
BMI2 : True
ADX : True
PREFETCHWT1 : False
MPX : False
SSE : True
SSE2 : True
SSE3 : True
SSSE3 : True
SSE4.1 : True
SSE4.2 : True
SSE4.a : True
AES : True
SHA : True
AVX : True
XOP : False
FMA3 : True
FMA4 : False
AVX2 : True
AVX512f : False
AVX512pf : False
AVX512er : False
AVX512cd : False
AVX512vl : False
AVX512bw : False
AVX512dq : False
AVX512ifma : False
AVX512vbmi : False
AVX512vbmi2 : False
AVX512vnni : False
>>>
Thank you 🙇‍♂️
@shneeba - hope you had a nice weekend! I have recompiled the gguf engine for Windows to use only AVX/AVX2 (not AVX512 - it seems not uncommon for Windows machines to deactivate AVX512 even when the underlying chip supports it). If you clone the main repo, you will have the fix: a small update in the gguf_configs file, plus the new libllama_win.dll binary. If your CUDA drivers are up-to-date, then CUDA should kick in automatically; if you are getting any errors from that, then please set use_gpu = False and it will fall back to the CPU-only version ...
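(The selection behavior described above - CUDA binary when CUDA is detected and use_gpu is on, AVX/AVX2 CPU binary otherwise - boils down to a simple decision. A hypothetical sketch, not llmware's actual code; only the two binary names come from this thread:)

```python
# Hypothetical sketch of the binary-selection logic described above.
# The .dll names are the ones mentioned in this thread; the function
# itself is illustrative, not llmware's implementation.
def select_gguf_binary(use_gpu, cuda_detected):
    if use_gpu and cuda_detected:
        return "libllama_cuda_win.dll"   # CUDA build - used when CUDA "kicks in"
    return "libllama_win.dll"            # AVX/AVX2-only CPU fallback

print(select_gguf_binary(use_gpu=True, cuda_detected=False))  # libllama_win.dll
```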
@doberst I did thank you, hope you did too (and weren't too deep in this bug)!
You are awesome. This has fixed it and it's working on CPU again, no errors seen. Interesting note about AVX512 on Windows.
It doesn't seem to be picking up my GPU. I've tried setting use_gpu to True or False and sending a larger query to the dragon-yi-6b-gguf model just to be sure, and can sadly confirm it still doesn't utilise it. When you say:
If your CUDA drivers are up-to-date, then CUDA should kick-in automatically
Do you happen to know what it's specifically looking for? I've got the latest drivers as per my earlier post, so I'm unsure what else it requires.
@shneeba - I finally got the win-cuda gguf lib to build on CUDA 12.1 - with blazing speed, really awesome. I have merged into the main code an updated libllama_cuda_win.dll binary (no other changes). I have not yet tested whether CUDA 12.2-12.4 will work (hope so). Could you pull the new code and try again - first with 12.4 (fingers crossed), and then fall back to 12.1 drivers if needed?
@doberst thanks again for taking more time looking into this. I tried the updated binary and tested with CUDA 12.4 and 12.1 this evening; however, it's still using the CPU instead of the GPU. Is there any additional logging I can enable to maybe catch why it's not using the GPU? I turned llama_cpp_verbose to ON, but that didn't give me much info (although I'm not 100% sure what I'm looking for).
As the original error that was blocking the SLIM models from working is fixed would it be better to open a new issue?