Comments (6)
Having the same error with Mixtral-8x7B-Instruct-v0.1-GPTQ and tensor_parallel_size=2
INFO 04-25 09:50:28 api_server.py:149] vLLM API server version 0.4.0.post1
INFO 04-25 09:50:28 api_server.py:150] args: Namespace(host=None, port=8001, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, served_model_name=None, lora_modules=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], model='TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ', tokenizer=None, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='half', kv_cache_dtype='auto', max_model_len=5000, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=2, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=False, use_v2_block_manager=False, num_lookahead_slots=0, seed=0, swap_space=4, gpu_memory_utilization=0.8, forced_num_gpu_blocks=None, max_num_batched_tokens=None, max_num_seqs=256, max_logprobs=5, disable_log_stats=False, quantization='gptq', enforce_eager=True, max_context_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', max_cpu_loras=None, device='auto', image_input_type=None, image_token_id=None, image_input_shape=None, image_feature_size=None, scheduler_delay_factor=0.0, enable_chunked_prefill=False, engine_use_ray=False, disable_log_requests=False, max_log_len=None)
WARNING 04-25 09:50:29 config.py:767] Casting torch.bfloat16 to torch.float16.
WARNING 04-25 09:50:29 config.py:211] gptq quantization is not fully optimized yet. The speed can be slower than non-quantized models.
INFO 04-25 09:51:01 llm_engine.py:74] Initializing an LLM engine (v0.4.0.post1) with config: model='TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ', tokenizer='TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=5000, download_dir=None, load_format=auto, tensor_parallel_size=2, disable_custom_all_reduce=False, quantization=gptq, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, seed=0)
INFO 04-25 09:51:20 selector.py:40] Cannot use FlashAttention backend for Volta and Turing GPUs.
INFO 04-25 09:51:20 selector.py:25] Using XFormers backend.
(RayWorkerVllm pid=233183) INFO 04-25 09:51:21 selector.py:40] Cannot use FlashAttention backend for Volta and Turing GPUs.
(RayWorkerVllm pid=233183) INFO 04-25 09:51:21 selector.py:25] Using XFormers backend.
INFO 04-25 09:51:33 pynccl_utils.py:45] vLLM is using nccl==2.18.1
(RayWorkerVllm pid=233183) INFO 04-25 09:51:33 pynccl_utils.py:45] vLLM is using nccl==2.18.1
INFO 04-25 09:52:18 custom_all_reduce.py:137] NVLink detection failed with message "Not Supported". This is normal if your machine has no NVLink equipped
(RayWorkerVllm pid=233183) INFO 04-25 09:52:18 custom_all_reduce.py:137] NVLink detection failed with message "Not Supported". This is normal if your machine has no NVLink equipped
INFO 04-25 09:52:23 weight_utils.py:177] Using model weights format ['*.safetensors']
(RayWorkerVllm pid=233183) INFO 04-25 09:52:24 weight_utils.py:177] Using model weights format ['*.safetensors']
INFO 04-25 09:53:12 model_runner.py:104] Loading model weights took 11.0906 GB
(RayWorkerVllm pid=233183) INFO 04-25 09:53:12 model_runner.py:104] Loading model weights took 11.0906 GB
INFO 04-25 09:54:42 ray_gpu_executor.py:240] # GPU blocks: 12524, # CPU blocks: 4096
Exception in callback functools.partial(<function _raise_exception_on_finish at 0x2aad73314ae0>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x2aad836e63d0>>)
handle: <Handle functools.partial(<function _raise_exception_on_finish at 0x2aad73314ae0>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x2aad836e63d0>>)>
Traceback (most recent call last):
File "/shared/ucl/apps/python/3.11.3/gnu-4.9.2/lib/python3.11/asyncio/tasks.py", line 490, in wait_for
return fut.result()
^^^^^^^^^^^^
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 454, in engine_step
request_outputs = await self.engine.step_async()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 213, in step_async
output = await self.model_executor.execute_model_async(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/executor/ray_gpu_executor.py", line 418, in execute_model_async
all_outputs = await self._run_workers_async(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/executor/ray_gpu_executor.py", line 408, in _run_workers_async
all_outputs = await asyncio.gather(*coros)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
task.result()
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 480, in run_engine_loop
has_requests_in_progress = await asyncio.wait_for(
^^^^^^^^^^^^^^^^^^^^^^^
File "/shared/ucl/apps/python/3.11.3/gnu-4.9.2/lib/python3.11/asyncio/tasks.py", line 492, in wait_for
raise exceptions.TimeoutError() from exc
TimeoutError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 45, in _raise_exception_on_finish
raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/shared/ucl/apps/python/3.11.3/gnu-4.9.2/lib/python3.11/asyncio/tasks.py", line 490, in wait_for
return fut.result()
^^^^^^^^^^^^
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 454, in engine_step
request_outputs = await self.engine.step_async()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 213, in step_async
output = await self.model_executor.execute_model_async(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/executor/ray_gpu_executor.py", line 418, in execute_model_async
all_outputs = await self._run_workers_async(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/executor/ray_gpu_executor.py", line 408, in _run_workers_async
all_outputs = await asyncio.gather(*coros)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 103, in create_completion
generator = await openai_serving_completion.create_completion(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_completion.py", line 178, in create_completion
async for i, res in result_generator:
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_completion.py", line 81, in consumer
raise item
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_completion.py", line 66, in producer
async for item in iterator:
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 644, in generate
raise e
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 638, in generate
async for request_output in stream:
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 77, in __anext__
raise result
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
task.result()
File "/lustre/scratch/scratch/rmhijpo/ctgov_rag/.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 480, in run_engine_loop
has_requests_in_progress = await asyncio.wait_for(
^^^^^^^^^^^^^^^^^^^^^^^
File "/shared/ucl/apps/python/3.11.3/gnu-4.9.2/lib/python3.11/asyncio/tasks.py", line 492, in wait_for
raise exceptions.TimeoutError() from exc
TimeoutError
@ericzhou571 @JPonsa @blackblue9 @supdizh Could you try --disable-custom-all-reduce when you launch the server and see if this issue persists?
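If it helps, the full launch would then look roughly like this (model and settings copied from the log above; adjust them to your own setup):

    python -m vllm.entrypoints.openai.api_server \
        --model TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ \
        --quantization gptq \
        --dtype half \
        --max-model-len 5000 \
        --tensor-parallel-size 2 \
        --gpu-memory-utilization 0.8 \
        --enforce-eager \
        --port 8001 \
        --disable-custom-all-reduce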
Looks like the same issue as #4135; it seems to have emerged after 0.4.0.
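If it really is a post-0.4.0 regression, one quick way to confirm on your side (assuming nothing else in your stack needs 0.4.x) is to pin the last pre-0.4.0 release and retry:

    pip install vllm==0.3.3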
Could you share the specifications of the GPUs you were using when you hit these issues? Were they also A800-80G?
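If it is easier, the GPU name and memory reported by nvidia-smi would be enough, for example:

    nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv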
@ywang96 The issue persists when launching the server with --disable-custom-all-reduce.
I encountered a similar issue in version 0.4.2.
Related Issues (20)
- [Usage]: how should I do data parallelism using vLLM?
- [Bug]: torch.cuda.OutOfMemoryError: CUDA out of memory when Handle inference requests
- [Misc]: Should inference with temperature 0 generate the same results for a lora adapter and equivalent merged model?
- [Bug] [spec decode] [flash_attn]: CUDA illegal memory access when calling flash_attn_cuda.fwd_kvcache
- [Bug]: The openai deployment model takes twice as long to deploy as fastapi's approach to offline inference.
- [Feature]: Linear adapter support for Mixtral
- [Feature]: VLLM suport for function calling in Mistral-7B-Instruct-v0.3
- [Bug]: Issue with Token Processing Efficiency and Key-Value Cache Utilization in AsyncLLMEngine
- [Bug]: WSL2(Including Docker) 2 GPU problem --tensor-parallel-size 2
- [Bug]: Unable to Use Prefix Caching in AsyncLLMEngine
- [Performance]: What can we learn from OctoAI
- [Bug]: Model Launch Hangs with 16+ Ranks in vLLM
- [Usage]: Prefix caching in VLLM
- [Bug]: Incorrect Example for the Inference with Prefix
- [Feature]: BERT models for embeddings
- [Bug]: The Offline Inference Embedding Example Fails
- [Bug]: Offline Inference with the OpenAI Batch file format yields unnecessary `asyncio.exceptions.CancelledError`
- [Feature]: MoE kernels (Mixtral-8x22B-Instruct-v0.1) are not yet supported on CPU only ?
- [Bug]: vLLM api_server.py when using with prompt_token_ids causes error.
- [Bug]: loading squeezellm model