Comments (9)

xwu99 commented on September 14, 2024

@yutianchen666 Could you help reproduce the issue? I am not sure if it is the OpenAI version causing the API break.

dkiran1 commented on September 14, 2024

I used openai==0.28, since the latest version gave an error and recommended using this version.

yutianchen666 commented on September 14, 2024

@yutianchen666 Could you help reproduce the issue? I am not sure if it is the OpenAI version causing the API break.

ok, I'll reproduce it soon

KepingYan commented on September 14, 2024

@dkiran1 Thank you for your report. If you want to use the OpenAI-compatible SDK, please remove the --simple parameter. After serving, set ENDPOINT_URL=http://localhost:8000/v1 when running query_http_requests.py, or set OPENAI_API_BASE=http://localhost:8000/v1 when running query_openai_sdk.py. See serve.md for more details.
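For reference, a minimal sketch of the SDK path described above (assuming openai==0.28 as used elsewhere in this thread, and a served model named "falcon-7b"; the key value is a placeholder, on the assumption that the local endpoint does not validate it):

import os
import openai

# Point the 0.28-era SDK at the local llm-on-ray endpoint instead of api.openai.com.
openai.api_base = os.environ.get("OPENAI_API_BASE", "http://localhost:8000/v1")
openai.api_key = "placeholder"  # assumed unused by the local server

response = openai.ChatCompletion.create(
    model="falcon-7b",  # must match the name the model was served under
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response["choices"][0]["message"]["content"])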

dkiran1 commented on September 14, 2024

Hi Yan, thanks for the details. I tried the above-mentioned steps and could run the inference server with the falcon model, but on running

python examples/inference/api_server_openai/query_openai_sdk.py --model_name="falcon-7b"

it waits a long time for a response and none arrives. I also tried the neural-chat model; it was working yesterday, but after upgrading the transformers library it gives this error:

(ServeController pid=11891) ERROR 2024-01-19 05:35:26,615 controller 11891 deployment_state.py:672 - Exception in replica 'neural-chat-7b-v3-1#PredictorDeployment#3jmxrf36', the replica will be stopped.
(ServeController pid=11891) Traceback (most recent call last):
(ServeController pid=11891) File "/usr/local/lib/python3.10/dist-packages/ray/serve/_private/deployment_state.py", line 670, in check_ready
(ServeController pid=11891) _, self._version = ray.get(self._ready_obj_ref)
(ServeController pid=11891) File "/usr/local/lib/python3.10/dist-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
(ServeController pid=11891) return fn(*args, **kwargs)
(ServeController pid=11891) File "/usr/local/lib/python3.10/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
(ServeController pid=11891) return func(*args, **kwargs)
(ServeController pid=11891) File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 2656, in get
(ServeController pid=11891) values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
(ServeController pid=11891) File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 869, in get_objects
(ServeController pid=11891) raise value.as_instanceof_cause()
(ServeController pid=11891) ray.exceptions.RayTaskError(RuntimeError): ray::ServeReplica:neural-chat-7b-v3-1:PredictorDeployment.initialize_and_get_metadata() (pid=18013, ip=172.17.0.2, actor_id=685216a503325bcc4e3c3c7701000000, repr=<ray.serve._private.replica.ServeReplica:neural-chat-7b-v3-1:PredictorDeployment object at 0x7fabd93efd00>)
(ServeController pid=11891) File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
(ServeController pid=11891) return self.__get_result()
(ServeController pid=11891) File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
(ServeController pid=11891) raise self._exception
(ServeController pid=11891) File "/usr/local/lib/python3.10/dist-packages/ray/serve/_private/replica.py", line 570, in initialize_and_get_metadata
(ServeController pid=11891) raise RuntimeError(traceback.format_exc()) from None
(ServeController pid=11891) RuntimeError: Traceback (most recent call last):
(ServeController pid=11891) File "/usr/local/lib/python3.10/dist-packages/ray/serve/_private/replica.py", line 554, in initialize_and_get_metadata
(ServeController pid=11891) await self._user_callable_wrapper.initialize_callable()
(ServeController pid=11891) File "/usr/local/lib/python3.10/dist-packages/ray/serve/_private/replica.py", line 778, in initialize_callable
(ServeController pid=11891) await self._call_func_or_gen(
(ServeController pid=11891) result = callable(*args, **kwargs)
(ServeController pid=11891) File "/root/llm-ray/inference/predictor_deployment.py", line 64, in __init__
(ServeController pid=11891) self.predictor = TransformerPredictor(infer_conf)
(ServeController pid=11891) File "/root/llm-ray/inference/transformer_predictor.py", line 22, in __init__
(ServeController pid=11891) from optimum.habana.transformers.modeling_utils import (
(ServeController pid=11891) File "/root/optimum-habana/optimum/habana/transformers/modeling_utils.py", line 19, in <module>
(ServeController pid=11891) from .models import (
(ServeController pid=11891) File "/root/optimum-habana/optimum/habana/transformers/models/__init__.py", line 59, in <module>
(ServeController pid=11891) from .mpt import (
(ServeController pid=11891) File "/root/optimum-habana/optimum/habana/transformers/models/mpt/__init__.py", line 1, in <module>
(ServeController pid=11891) from .modeling_mpt import (
(ServeController pid=11891) File "/root/optimum-habana/optimum/habana/transformers/models/mpt/modeling_mpt.py", line 24, in <module>
(ServeController pid=11891) from transformers.models.mpt.modeling_mpt import MptForCausalLM, MptModel, _expand_mask, _make_causal_mask
(ServeController pid=11891) ImportError: cannot import name '_expand_mask' from 'transformers.models.mpt.modeling_mpt' (/usr/local/lib/python3.10/dist-packages/transformers/models/mpt/modeling_mpt.py)
(ServeController pid=11891) INFO 2024-01-19 05:35:27,338 controller 11891 deployment_state.py:2188 - Replica neural-chat-7b-v3-1#PredictorDeployment#3jmxrf36 is stopped.
(ServeController pid=11891) INFO 2024-01-19 05:35:27,339 controller 11891 deployment_state.py:1850 - Adding 1 replica to deployment PredictorDeployment in application 'neural-chat-7b-v3-1'.
(ServeReplica:router:PredictorDeployment pid=18206) /usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
(ServeReplica:router:PredictorDeployment pid=18206) warnings.warn(
(ServeReplica:neural-chat-7b-v3-1:PredictorDeployment pid=18013) [WARNING|utils.py:190] 2024-01-19 05:35:26,443 >> optimum-habana v1.8.0.dev0 has been validated for SynapseAI v1.11.0 but the driver version is v1.13.0, this could lead to undefined behavior!

kira-lin commented on September 14, 2024

Hi @dkiran1, we currently have limited bandwidth and hardware to test on Gaudi, and the Gaudi-related part is not up to date. I just tested in Docker, in the vault.habana.ai/gaudi-docker/1.13.0/ubuntu22.04/habanalabs/pytorch-installer-2.1.0 container; you only need to:

# install llm-on-ray, assume mounted
pip install -e .
# install latest optimum[habana]
pip install optimum[habana]

Make sure the transformers version is 4.34.1, which is what optimum[habana] requires; your newer transformers is what caused the error. In addition, inference on Gaudi does not require IPEX.
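A quick sanity check after reinstalling (a sketch; adapt_transformers_to_gaudi is an optimum-habana entry point used here as a stand-in, since the exact symbols transformer_predictor.py imports are not shown above):

import transformers

# optimum[habana] pins transformers to 4.34.1; the ImportError above comes
# from a newer transformers that removed the private _expand_mask helper.
print(transformers.__version__)  # expect 4.34.1

# The import chain that failed in the traceback should now resolve:
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi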

dkiran1 commented on September 14, 2024

Hi Lin, thanks a lot. After doing pip install optimum[habana], the neural-chat model works fine with query_openai_sdk.py. I will test the other models and post the status.

dkiran1 commented on September 14, 2024

I tested the falcon-7b, mpt-7b, mistral-7b, and neural-chat models. I could run the inference server for all of them, and I get responses for neural-chat and mistral-7b with query_openai_sdk.py, but it keeps waiting for a response with the mpt-7b and falcon models.

kira-lin commented on September 14, 2024

Hi @dkiran1 ,
When you use OpenAI serving, try adding the --max_new_tokens config. It seems optimum-habana requires this config. I'll look into why and how to fix this later.
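From the client side, the corresponding knob in the openai==0.28 SDK is max_tokens; a hedged sketch (that the server forwards this as max_new_tokens to optimum-habana is an assumption based on the comment above):

import openai

openai.api_base = "http://localhost:8000/v1"
openai.api_key = "placeholder"  # assumed unused by the local server

response = openai.ChatCompletion.create(
    model="mpt-7b",  # one of the models that hung without an explicit limit
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=128,  # bounds the generation length
)
print(response["choices"][0]["message"]["content"])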
