Comments (1)
working:
ubuntu@compute-permanent-node-171:~/h2ogpt_ops$ docker logs 7987e3d9807b
INFO 04-24 02:25:40 api_server.py:149] vLLM API server version 0.4.0.post1
INFO 04-24 02:25:40 api_server.py:150] args: Namespace(host='0.0.0.0', port=5004, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, served_model_name=None, lora_modules=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], model='databricks/dbrx-instruct', tokenizer=None, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=True, download_dir='/home/ubuntu/.cache/huggingface/hub', load_format='auto', dtype='auto', kv_cache_dtype='auto', max_model_len=None, worker_use_ray=True, pipeline_parallel_size=1, tensor_parallel_size=4, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=False, use_v2_block_manager=False, num_lookahead_slots=0, seed=1234, swap_space=4, gpu_memory_utilization=0.98, forced_num_gpu_blocks=None, max_num_batched_tokens=32768, max_num_seqs=256, max_logprobs=5, disable_log_stats=False, quantization=None, enforce_eager=True, max_context_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', max_cpu_loras=None, device='auto', image_input_type=None, image_token_id=None, image_input_shape=None, image_feature_size=None, scheduler_delay_factor=0.0, enable_chunked_prefill=False, speculative_model=None, num_speculative_tokens=None, engine_use_ray=False, disable_log_requests=False, max_log_len=100)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 1155, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 852, in __getitem__
    raise KeyError(key)
KeyError: 'dbrx'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/vllm/entrypoints/openai/api_server.py", line 157, in <module>
    engine = AsyncLLMEngine.from_engine_args(
  File "/workspace/vllm/engine/async_llm_engine.py", line 331, in from_engine_args
    engine_config = engine_args.create_engine_config()
  File "/workspace/vllm/engine/arg_utils.py", line 406, in create_engine_config
    model_config = ModelConfig(
  File "/workspace/vllm/config.py", line 121, in __init__
    self.hf_config = get_config(self.model, trust_remote_code, revision,
  File "/workspace/vllm/transformers_utils/config.py", line 37, in get_config
    raise e
  File "/workspace/vllm/transformers_utils/config.py", line 22, in get_config
    config = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 1157, in from_pretrained
    raise ValueError(
ValueError: The checkpoint you are trying to load has model type `dbrx` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
ubuntu@compute-permanent-node-171:~/h2ogpt_ops$
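The traceback shows the lookup `CONFIG_MAPPING[config_dict["model_type"]]` raising `KeyError: 'dbrx'`, and the final error suggests the installed Transformers may be out of date. A minimal sketch for checking this from inside the container, assuming `CONFIG_MAPPING` is importable from the module path shown in the traceback; the helper names `installed_version` and `is_model_type_supported` are made up for illustration and are not part of vLLM or Transformers:

```python
from importlib import metadata


def installed_version(package: str = "transformers"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_model_type_supported(model_type: str, mapping=None) -> bool:
    """Check whether `model_type` is a key of the config mapping.

    If `mapping` is None, fall back to CONFIG_MAPPING from transformers,
    the same table that raised KeyError: 'dbrx' in the traceback above.
    """
    if mapping is None:
        # Imported lazily so the helper can be used without transformers
        # when an explicit mapping is passed in.
        from transformers.models.auto.configuration_auto import CONFIG_MAPPING
        mapping = CONFIG_MAPPING
    return model_type in mapping


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("transformers is not installed")
    else:
        supported = is_model_type_supported("dbrx")
        print(f"transformers {version}: dbrx supported = {supported}")
```

If this reports that `dbrx` is not supported, upgrading Transformers in the image is the first thing to try, as the error message itself hints.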