Comments (2)
alright, I was executing the command inside the repo; I moved outside of it and now get this error:
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/tesh/.local/lib/python3.10/site-packages/aphrodite/endpoints/openai/api_server.py", line 31, in <module>
from aphrodite.endpoints.openai.serving_chat import OpenAIServingChat
File "/home/tesh/.local/lib/python3.10/site-packages/aphrodite/endpoints/openai/serving_chat.py", line 16, in <module>
from aphrodite.modeling.outlines_decoding import (
File "/home/tesh/.local/lib/python3.10/site-packages/aphrodite/modeling/outlines_decoding.py", line 15, in <module>
from aphrodite.modeling.outlines_logits_processors import (
File "/home/tesh/.local/lib/python3.10/site-packages/aphrodite/modeling/outlines_logits_processors.py", line 24, in <module>
from outlines.fsm.fsm import RegexFSM
File "/home/tesh/.local/lib/python3.10/site-packages/outlines/__init__.py", line 2, in <module>
import outlines.generate
File "/home/tesh/.local/lib/python3.10/site-packages/outlines/generate/__init__.py", line 2, in <module>
from .cfg import cfg
File "/home/tesh/.local/lib/python3.10/site-packages/outlines/generate/cfg.py", line 5, in <module>
from outlines.models import OpenAI
File "/home/tesh/.local/lib/python3.10/site-packages/outlines/models/__init__.py", line 11, in <module>
from .llamacpp import LlamaCpp, llamacpp
File "/home/tesh/.local/lib/python3.10/site-packages/outlines/models/llamacpp.py", line 5, in <module>
from outlines.integrations.llamacpp import ( # noqa: F401
File "/home/tesh/.local/lib/python3.10/site-packages/outlines/integrations/llamacpp.py", line 37, in <module>
from outlines.fsm.json_schema import build_regex_from_schema
File "/home/tesh/.local/lib/python3.10/site-packages/outlines/fsm/json_schema.py", line 6, in <module>
from jsonschema.protocols import Validator
ModuleNotFoundError: No module named 'jsonschema.protocols'
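For reference, jsonschema.protocols only exists in newer jsonschema releases (4.x, as far as I can tell), so a quick way to narrow this down is to check what is actually installed:

import importlib.metadata

# the protocols module (and its Validator protocol) appeared around jsonschema 4.0
print(importlib.metadata.version("jsonschema"))

# this is the exact import that fails in the traceback above
from jsonschema.protocols import Validator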
alright, turns out the package I had was outdated, and that was enough to throw the error. After updating, I got an error saying the model or directory is not at the specified path, even though it is.
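My guess is that the relative model path gets resolved against the current working directory rather than against the script location; a quick check of where the path actually points from wherever you launch:

import os

model_path = "kunoichi-7b.Q4_K_M.gguf"   # the gguf file from the log below
print(os.getcwd())                        # relative paths get resolved from here
print(os.path.abspath(model_path))        # what the engine would actually try to open
print(os.path.isfile(model_path))         # False when launching from outside the model's directory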
I ran the command inside the directory, prefixing the model path with ./, and that seemed to make it work, but:
WARNING: gguf quantization is not fully optimized yet. The speed can be slower than non-quantized models.
INFO: Initializing the Aphrodite Engine (v0.5.1) with the following config:
INFO: Model = './kunoichi-7b.Q4_K_M.gguf'
INFO: DataType = torch.float16
INFO: Model Load Format = auto
INFO: Number of GPUs = 1
INFO: Disable Custom All-Reduce = False
INFO: Quantization Format = gguf
INFO: Context Length = 8192
INFO: Enforce Eager Mode = False
INFO: KV Cache Data Type = auto
INFO: KV Cache Params Path = None
INFO: Device = cuda
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/tesh/.local/lib/python3.10/site-packages/aphrodite/endpoints/openai/api_server.py", line 563, in <module>
engine = AsyncAphrodite.from_engine_args(engine_args)
File "/home/tesh/.local/lib/python3.10/site-packages/aphrodite/engine/async_aphrodite.py", line 676, in from_engine_args
engine = cls(parallel_config.worker_use_ray,
File "/home/tesh/.local/lib/python3.10/site-packages/aphrodite/engine/async_aphrodite.py", line 341, in __init__
self.engine = self._init_engine(*args, **kwargs)
File "/home/tesh/.local/lib/python3.10/site-packages/aphrodite/engine/async_aphrodite.py", line 410, in _init_engine
return engine_class(*args, **kwargs)
File "/home/tesh/.local/lib/python3.10/site-packages/aphrodite/engine/aphrodite_engine.py", line 115, in __init__
self._init_workers()
File "/home/tesh/.local/lib/python3.10/site-packages/aphrodite/engine/aphrodite_engine.py", line 157, in _init_workers
self._run_workers("load_model")
File "/home/tesh/.local/lib/python3.10/site-packages/aphrodite/engine/aphrodite_engine.py", line 1028, in _run_workers
driver_worker_output = getattr(self.driver_worker,
File "/home/tesh/.local/lib/python3.10/site-packages/aphrodite/task_handler/worker.py", line 112, in load_model
self.model_runner.load_model()
File "/home/tesh/.local/lib/python3.10/site-packages/aphrodite/task_handler/model_runner.py", line 121, in load_model
self.model = get_model(self.model_config, self.device_config,
File "/home/tesh/.local/lib/python3.10/site-packages/aphrodite/modeling/loader.py", line 56, in get_model
raise ValueError(
ValueError: The quantization method gguf is not supported for the current GPU. Minimum capability: 61. Current capability: 60.
I watched it load on my P100 and not on my GPU 0; is there a specific reason why it picked that GPU?
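In case it helps anyone debugging the same thing: the ValueError comes from the compute-capability check in the loader (minimum 6.1, and the P100 reports 6.0). A quick way to see how the devices are ordered and what capability each one reports, using the PyTorch install that Aphrodite already depends on:

import torch

for i in range(torch.cuda.device_count()):
    name = torch.cuda.get_device_name(i)
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {name}, compute capability {major}.{minor}")

# Which card shows up as device 0 depends on CUDA's device ordering; setting
# CUDA_VISIBLE_DEVICES before launching (e.g. CUDA_VISIBLE_DEVICES=0) is one way
# to pin the engine to a specific card.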
Related Issues (20)
- [Bug]: Does --trust-remote-code work? HOT 1
- [Bug]: multi GPU crashes backend HOT 6
- [Bug]: WSL Cuda out of Memory when Trying to Load GGUF Model HOT 8
- [Usage]: load-in-4bit not load after converted, and it seem not use swap well
- [Bug]: KV Cache and Max Tokens - Lack of Consistency
- [Feature]: Add support for DBRX model HOT 2
- [Bug]: Exllama v2 not working HOT 11
- [Feature]: Add support for Qwen2MoE HOT 1
- [Feature]: Add support for Command-r HOT 2
- [Feature]: actual working health endpoint HOT 2
- [Feature]: any workarounds for cc 6.0? HOT 2
- [Bug]: served-model-name is unused HOT 1
- [Crash]: Program gets terminated HOT 1
- [Bug]: Converting gguf to state_dict HOT 3
- [Feature]: Is there a reason CUDA 6.1 is the minimum? Would CUDA 6.0 on the P100 not work? HOT 5
- [Bug]: manually setting --max-model-len flag always leads to OOM, even if it is set very low HOT 2
- [Bug]: gguf loading failed. config.json? HOT 4
- [Feature]: Support hqq quantize method.
- [Bug]: Mixtral-8x22b-instruct not running with AWQ HOT 10