Comments (6)

JPonsa commented on June 12, 2024

Having the same error with Mixtral-8x7B-Instruct-v0.1-GPTQ and tensor_parallel_size=2

INFO 04-25 09:50:28 api_server.py:149] vLLM API server version 0.4.0.post1
INFO 04-25 09:50:28 api_server.py:150] args: Namespace(host=None, port=8001, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, served_model_name=None, lora_modules=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], model='TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ', tokenizer=None, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='half', kv_cache_dtype='auto', max_model_len=5000, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=2, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=False, use_v2_block_manager=False, num_lookahead_slots=0, seed=0, swap_space=4, gpu_memory_utilization=0.8, forced_num_gpu_blocks=None, max_num_batched_tokens=None, max_num_seqs=256, max_logprobs=5, disable_log_stats=False, quantization='gptq', enforce_eager=True, max_context_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', max_cpu_loras=None, device='auto', image_input_type=None, image_token_id=None, image_input_shape=None, image_feature_size=None, scheduler_delay_factor=0.0, enable_chunked_prefill=False, engine_use_ray=False, disable_log_requests=False, max_log_len=None)
WARNING 04-25 09:50:29 config.py:767] Casting torch.bfloat16 to torch.float16.
WARNING 04-25 09:50:29 config.py:211] gptq quantization is not fully optimized yet. The speed can be slower than non-quantized models.
INFO 04-25 09:51:01 llm_engine.py:74] Initializing an LLM engine (v0.4.0.post1) with config: model='TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ', tokenizer='TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=5000, download_dir=None, load_format=auto, tensor_parallel_size=2, disable_custom_all_reduce=False, quantization=gptq, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, seed=0)
INFO 04-25 09:51:20 selector.py:40] Cannot use FlashAttention backend for Volta and Turing GPUs.
INFO 04-25 09:51:20 selector.py:25] Using XFormers backend.
(RayWorkerVllm pid=233183) INFO 04-25 09:51:21 selector.py:40] Cannot use FlashAttention backend for Volta and Turing GPUs.
(RayWorkerVllm pid=233183) INFO 04-25 09:51:21 selector.py:25] Using XFormers backend.
INFO 04-25 09:51:33 pynccl_utils.py:45] vLLM is using nccl==2.18.1
(RayWorkerVllm pid=233183) INFO 04-25 09:51:33 pynccl_utils.py:45] vLLM is using nccl==2.18.1
INFO 04-25 09:52:18 custom_all_reduce.py:137] NVLink detection failed with message "Not Supported". This is normal if your machine has no NVLink equipped
(RayWorkerVllm pid=233183) INFO 04-25 09:52:18 custom_all_reduce.py:137] NVLink detection failed with message "Not Supported". This is normal if your machine has no NVLink equipped
INFO 04-25 09:52:23 weight_utils.py:177] Using model weights format ['*.safetensors']
(RayWorkerVllm pid=233183) INFO 04-25 09:52:24 weight_utils.py:177] Using model weights format ['*.safetensors']
INFO 04-25 09:53:12 model_runner.py:104] Loading model weights took 11.0906 GB
I am awake
   time         mem   processes  process usage
  (secs)        (MB)  tot  actv  (sorted, %CPU)
(RayWorkerVllm pid=233183) INFO 04-25 09:53:12 model_runner.py:104] Loading model weights took 11.0906 GB
INFO 04-25 09:54:42 ray_gpu_executor.py:240] # GPU blocks: 12524, # CPU blocks: 4096
Exception in callback functools.partial(<function _raise_exception_on_finish at 0x2aad73314ae0>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x2aad836e63d0>>)
handle: <Handle functools.partial(<function _raise_exception_on_finish at 0x2aad73314ae0>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x2aad836e63d0>>)>
Traceback (most recent call last):
  File "/shared/ucl/apps/python/3.11.3/gnu-4.9.2/lib/python3.11/asyncio/tasks.py", line 490, in wait_for
    return fut.result()
           ^^^^^^^^^^^^
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 454, in engine_step
    request_outputs = await self.engine.step_async()
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 213, in step_async
    output = await self.model_executor.execute_model_async(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/executor/ray_gpu_executor.py", line 418, in execute_model_async
    all_outputs = await self._run_workers_async(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/executor/ray_gpu_executor.py", line 408, in _run_workers_async
    all_outputs = await asyncio.gather(*coros)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
    task.result()
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 480, in run_engine_loop
    has_requests_in_progress = await asyncio.wait_for(
                               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/shared/ucl/apps/python/3.11.3/gnu-4.9.2/lib/python3.11/asyncio/tasks.py", line 492, in wait_for
    raise exceptions.TimeoutError() from exc
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 45, in _raise_exception_on_finish
    raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/shared/ucl/apps/python/3.11.3/gnu-4.9.2/lib/python3.11/asyncio/tasks.py", line 490, in wait_for
    return fut.result()
           ^^^^^^^^^^^^
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 454, in engine_step
    request_outputs = await self.engine.step_async()
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 213, in step_async
    output = await self.model_executor.execute_model_async(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/executor/ray_gpu_executor.py", line 418, in execute_model_async
    all_outputs = await self._run_workers_async(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/executor/ray_gpu_executor.py", line 408, in _run_workers_async
    all_outputs = await asyncio.gather(*coros)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 103, in create_completion
    generator = await openai_serving_completion.create_completion(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_completion.py", line 178, in create_completion
    async for i, res in result_generator:
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_completion.py", line 81, in consumer
    raise item
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_completion.py", line 66, in producer
    async for item in iterator:
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 644, in generate
    raise e
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 638, in generate
    async for request_output in stream:
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 77, in __anext__
    raise result
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
    task.result()
  File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 480, in run_engine_loop
    has_requests_in_progress = await asyncio.wait_for(
                               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/shared/ucl/apps/python/3.11.3/gnu-4.9.2/lib/python3.11/asyncio/tasks.py", line 492, in wait_for
    raise exceptions.TimeoutError() from exc
TimeoutError

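For anyone trying to reproduce this: the args dump at the top of the log implies a launch command roughly like the one below (a reconstruction from the logged Namespace with default values omitted; the exact invocation may have differed):

python -m vllm.entrypoints.openai.api_server \
    --model TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ \
    --port 8001 \
    --dtype half \
    --max-model-len 5000 \
    --tensor-parallel-size 2 \
    --gpu-memory-utilization 0.8 \
    --quantization gptq \
    --enforce-eager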

ywang96 commented on June 12, 2024

@ericzhou571 @JPonsa @blackblue9 @supdizh Could you try --disable-custom-all-reduce when you launch the server and see if this issue persists?

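A minimal sketch of that suggestion, keeping the other flags from the log above (abbreviated here). Disabling the custom all-reduce kernel should make tensor-parallel communication fall back to NCCL, which helps rule that kernel out as the source of the hang:

python -m vllm.entrypoints.openai.api_server \
    --model TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ \
    --tensor-parallel-size 2 \
    --quantization gptq \
    --dtype half \
    --enforce-eager \
    --disable-custom-all-reduce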

supdizh commented on June 12, 2024

Looks like the same issue as #4135; it emerged after 0.4.0.


ericzhou571 commented on June 12, 2024

> Having the same error with Mixtral-8x7B-Instruct-v0.1-GPTQ and tensor_parallel_size=2

Could you share the specifications of the GPU you were using when you hit this issue? Was it also an A800-80G?


JPonsa commented on June 12, 2024

@ywang96 the issue persists when launching the server with --disable-custom-all-reduce


Warrior0x1 commented on June 12, 2024

I encountered a similar issue in version 0.4.2.

