:job_id:04000000
:actor_name:ServeReplica:default:opencsg--csg-wukong-1B
[INFO 2024-04-30 03:46:04,636] __init__.py: 14 Import vllm related stuff failed, please make sure 'vllm' is installed.
INFO 2024-04-30 03:46:04,723 default_opencsg--csg-wukong-1B IULCpr app.py:95 - LLM Deployment initialize
[INFO 2024-04-30 03:46:04,723] predictor.py: 27 LLM Predictor Initialize
INFO 2024-04-30 03:46:04,724 default_opencsg--csg-wukong-1B IULCpr app.py:145 - LLM Deployment Reconfiguring...
INFO 2024-04-30 03:46:04,724 default_opencsg--csg-wukong-1B IULCpr app.py:103 - LLM Deployment _should_reinit_worker_group
[INFO 2024-04-30 03:46:04,724] predictor.py: 48 Initializing new worker group ScalingConfig(trainer_resources={'CPU': 0}, num_workers=1, use_gpu=True, resources_per_worker={'CPU': 1.0, 'GPU': 1.0})
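The worker group in the line above is described by a Ray `ScalingConfig`; a minimal sketch of the equivalent object, with the field values copied from the log (the import path is an assumption and has moved across Ray versions, e.g. `ray.air` vs. `ray.train`):

```python
from ray.air import ScalingConfig  # newer Ray releases: from ray.train import ScalingConfig

# One GPU prediction worker and no resources reserved for a trainer process,
# matching the "Initializing new worker group ScalingConfig(...)" log line.
scaling_config = ScalingConfig(
    trainer_resources={"CPU": 0},
    num_workers=1,
    use_gpu=True,
    resources_per_worker={"CPU": 1.0, "GPU": 1.0},
)
```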
[INFO 2024-04-30 03:46:04,724] predictor.py: 59 Engine name is generic
[INFO 2024-04-30 03:46:04,724] predictor.py: 83 LLM Predictor creating a new worker group
[INFO 2024-04-30 03:46:04,818] predictor.py: 100 Build Prediction Worker with runtime_env:
[INFO 2024-04-30 03:46:04,819] predictor.py: 101 None
[INFO 2024-04-30 03:46:04,819] predictor.py: 109 Waiting for placement group to be ready...
[INFO 2024-04-30 03:46:04,887] predictor.py: 113 Starting initialize_node tasks...
[INFO 2024-04-30 03:46:06,970] predictor.py: 124 get version: [None]
[INFO 2024-04-30 03:46:06,970] generic.py: 351 Creating prediction workers...
[INFO 2024-04-30 03:46:06,975] generic.py: 358 Initializing torch_dist process group on workers...
[INFO 2024-04-30 03:46:09,210] generic.py: 368 Initializing model on workers with local_ranks: [0]
[INFO 2024-04-30 03:46:10,294] predictor.py: 68 Rolling over to new worker group [Actor(PredictionWorker, efd48e82c51a27d83f8078f604000000)]
INFO 2024-04-30 03:46:10,377 default_opencsg--csg-wukong-1B IULCpr app.py:236 - new_max_batch_size is 1
INFO 2024-04-30 03:46:10,377 default_opencsg--csg-wukong-1B IULCpr app.py:237 - new_batch_wait_timeout_s is 0
INFO 2024-04-30 03:46:10,377 default_opencsg--csg-wukong-1B IULCpr app.py:162 - LLM Deployment Reconfigured.
/home/yons/llm-inference/llmserve/backend/llm/predictor.py:212: RuntimeWarning: coroutine 'GenericEngine.check_health' was never awaited
self.engine.check_health()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
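The RuntimeWarning above means `predictor.py:212` calls `check_health()` on an async engine without awaiting it, so the health check never actually runs. A minimal sketch of the bug and its fix, assuming `GenericEngine.check_health` is declared `async def` (the names come from the warning; the surrounding method bodies are hypothetical):

```python
import asyncio

class GenericEngine:
    async def check_health(self) -> None:
        await asyncio.sleep(0)  # placeholder for a real health probe

class Predictor:
    def __init__(self, engine: GenericEngine) -> None:
        self.engine = engine

    async def check_health(self) -> None:
        # Buggy form: self.engine.check_health() only creates the coroutine
        # object and discards it, which triggers the warning above.
        # Fixed form: await it so the probe actually executes.
        await self.engine.check_health()

asyncio.run(Predictor(GenericEngine()).check_health())
```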
INFO 2024-04-30 03:47:11,008 default_opencsg--csg-wukong-1B IULCpr 0dd6808d-68b3-42fe-ac35-8f4ce6fb6d21 /api/v1/default/opencsg--csg-wukong-1B/run/predict app.py:210 - batch_generate_text prompts: [Prompt(prompt='What can I do', use_prompt_format=False)]
INFO 2024-04-30 03:47:11,008 default_opencsg--csg-wukong-1B IULCpr 0dd6808d-68b3-42fe-ac35-8f4ce6fb6d21 /api/v1/default/opencsg--csg-wukong-1B/run/predict app.py:273 - Received 1 prompts [Prompt(prompt='What can I do', use_prompt_format=False)]. start_timestamp None timeout_s 100
[INFO 2024-04-30 03:47:11,008] generic.py: 416 LLM GenericEngine do async predict
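For reference, a hedged reconstruction of the client call behind the request lines above. The route is taken verbatim from the log; the host, port, and JSON schema are assumptions inferred from `Prompt(prompt='What can I do', use_prompt_format=False)` and may not match llm-inference's actual API contract:

```python
import requests

resp = requests.post(
    "http://127.0.0.1:8000/api/v1/default/opencsg--csg-wukong-1B/run/predict",
    json={"prompt": "What can I do", "use_prompt_format": False},
    timeout=100,  # mirrors "timeout_s 100" in the log
)
print(resp.status_code, resp.text)
```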
ERROR 2024-04-30 03:47:11,135 default_opencsg--csg-wukong-1B IULCpr 0dd6808d-68b3-42fe-ac35-8f4ce6fb6d21 /api/v1/default/opencsg--csg-wukong-1B/run/predict replica.py:756 - Request failed due to RayTaskError:
Traceback (most recent call last):
File "/home/yons/.conda/envs/abc/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 753, in wrap_user_method_call
yield
File "/home/yons/.conda/envs/abc/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 914, in call_user_method
raise e from None
ray.exceptions.RayTaskError: ray::ServeReplica:default:opencsg--csg-wukong-1B.handle_request() (pid=1492889, ip=192.168.80.2)
File "/home/yons/.conda/envs/abc/lib/python3.10/site-packages/ray/serve/_private/utils.py", line 165, in wrap_to_ray_error
raise exception
File "/home/yons/.conda/envs/abc/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 895, in call_user_method
result = await method_to_call(*request_args, **request_kwargs)
File "/home/yons/llm-inference/llmserve/backend/server/app.py", line 217, in batch_generate_text
texts = await asyncio.gather(
File "/home/yons/.conda/envs/abc/lib/python3.10/site-packages/ray/serve/batching.py", line 498, in batch_wrapper
return await enqueue_request(args, kwargs)
File "/home/yons/.conda/envs/abc/lib/python3.10/site-packages/ray/serve/batching.py", line 228, in _process_batches
results = await func_future
File "/home/yons/llm-inference/llmserve/backend/server/app.py", line 285, in generate_text_batch
prediction = await self._predict_async(
File "/home/yons/llm-inference/llmserve/backend/llm/predictor.py", line 183, in _predict_async
prediction = await self.engine.predict(prompts, generate, timeout_s=timeout_s, start_timestamp=start_timestamp, lock=self._base_worker_group_lock)
File "/home/yons/llm-inference/llmserve/backend/llm/engines/generic.py", line 443, in predict
await asyncio.gather(
File "/home/yons/.conda/envs/abc/lib/python3.10/asyncio/tasks.py", line 650, in _wrap_awaitable
return (yield from awaitable.__await__())
ray.exceptions.RayTaskError(RuntimeError): ray::PredictionWorker.generate() (pid=1493087, ip=192.168.80.2, actor_id=efd48e82c51a27d83f8078f604000000, repr=PredictionWorker:opencsg/csg-wukong-1B)
File "/home/yons/llm-inference/llmserve/backend/llm/engines/generic.py", line 268, in generate
return generate(
File "/home/yons/llm-inference/llmserve/backend/llm/utils.py", line 161, in inner
ret = func(*args, **kwargs)
File "/home/yons/llm-inference/llmserve/backend/llm/engines/generic.py", line 169, in generate
outputs = pipeline(
File "/home/yons/.conda/envs/abc/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/yons/llm-inference/llmserve/backend/llm/pipelines/default_transformers_pipeline.py", line 77, in __call__
model_outputs = self.forward(model_inputs, **forward_params)
File "/home/yons/llm-inference/llmserve/backend/llm/pipelines/default_transformers_pipeline.py", line 208, in forward
generated_sequence = self.pipeline(**prompt_text, **generate_kwargs)
File "/home/yons/.conda/envs/abc/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 240, in __call__
return super().__call__(text_inputs, **kwargs)
File "/home/yons/.conda/envs/abc/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1187, in __call__
outputs = list(final_iterator)
File "/home/yons/.conda/envs/abc/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
item = next(self.iterator)
File "/home/yons/.conda/envs/abc/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 125, in __next__
processed = self.infer(item, **self.params)
File "/home/yons/.conda/envs/abc/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1112, in forward
model_outputs = self._forward(model_inputs, **forward_params)
File "/home/yons/.conda/envs/abc/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 327, in _forward
generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
File "/home/yons/.conda/envs/abc/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/yons/.conda/envs/abc/lib/python3.10/site-packages/transformers/generation/utils.py", line 1575, in generate
result = self._sample(
File "/home/yons/.conda/envs/abc/lib/python3.10/site-packages/transformers/generation/utils.py", line 2735, in _sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
INFO 2024-04-30 03:47:11,135 default_opencsg--csg-wukong-1B IULCpr 0dd6808d-68b3-42fe-ac35-8f4ce6fb6d21 /api/v1/default/opencsg--csg-wukong-1B/run/predict replica.py:772 - BATCH_GENERATE_TEXT ERROR 127.1ms
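The root failure is at the bottom of the traceback: `torch.multinomial` received a probability tensor containing `inf`, `nan`, or a negative entry, which typically points at overflowing half-precision logits or degenerate sampling parameters rather than at Ray or the serving layer. A standalone sketch that reproduces the error and lists common mitigations (the mitigations are assumptions about this deployment, not a confirmed fix):

```python
import torch

# torch.multinomial rejects distributions with inf, nan, or negative entries.
bad_probs = torch.tensor([[0.6, float("nan"), 0.4]])
try:
    torch.multinomial(bad_probs, num_samples=1)
except RuntimeError as err:
    print(err)  # e.g. "probability tensor contains either `inf`, `nan` or element < 0"

# Typical workarounds when transformers' sampling path hits this error:
#   - load the model in full precision, e.g.
#     AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float32)
#   - or disable sampling so multinomial is never reached:
#     model.generate(input_ids, do_sample=False)
```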