Comments (8)

XBeg9 commented on July 23, 2024

I previously raised a question in the Slack community channel about ongoing support for this project. About a month ago, there was a discussion promising continued development and updates. However, I have not seen any changes or updates since then.

Specifically, I am eager to see support for the new vllm/transformer packages, which are crucial for my current use cases. Could we get an update on the progress towards integrating these packages? Any timeline or roadmap would be greatly appreciated, as it would help us plan our projects accordingly.

from ray-llm.

nkwangleiGIT commented on July 23, 2024

I was using FastChat previously, and now plan to use vLLM and Ray Serve for LLM inference; it seems to be working well too.
So ray-llm is no longer a dependency for my project :-)

leiwen83 commented on July 23, 2024

I was using FastChat previously, and now plan to use vLLM and Ray Serve for LLM inference; it seems to be working well too. So ray-llm is no longer a dependency for my project :-)

I am also interested in finding a FastChat replacement, but I wonder how to implement a model registry, dynamic autoscaling, and a single entry URL with Ray? ;)

nkwangleiGIT commented on July 23, 2024

I think a Ray Serve ingress can act as the model registry, Ray autoscaling can handle scaling, and deploying multiple applications may achieve the single entry URL.
I will write a document about how to do this once these are tested. So far I have only tested Ray Serve with vLLM serving, and I can scale manually using serveConfig like below:

  serveConfigV2: |
    applications:
      - name: llm-serving-app
        import_path: llm-serving:deployment
        route_prefix: /
        runtime_env:
          working_dir: FILE:///vllm-workspace/llm-app.zip
        deployments:
          - name: VLLMPredictDeployment
            num_replicas: 2
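
The config above pins the deployment to a fixed `num_replicas: 2`. For the "dynamic autoscaling" part of the question, the core idea can be sketched in plain Python. This is a simplified illustration, not Ray Serve's actual implementation: it ignores upscale/downscale delays and smoothing. The field names are modeled on Ray Serve's `autoscaling_config` (`min_replicas`, `max_replicas`, `target_ongoing_requests`; older Ray versions use a different name for the last one), and `desired_replicas` is a hypothetical helper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoscalingConfig:
    """Field names mirror Ray Serve's autoscaling_config (assumed; check your Ray version)."""
    min_replicas: int = 1
    max_replicas: int = 4
    target_ongoing_requests: float = 2.0

    def __post_init__(self) -> None:
        if self.target_ongoing_requests <= 0:
            raise ValueError("target_ongoing_requests must be positive")
        if self.min_replicas > self.max_replicas:
            raise ValueError("min_replicas must not exceed max_replicas")


def desired_replicas(total_ongoing_requests: int, cfg: AutoscalingConfig) -> int:
    """Hypothetical helper: pick a replica count so that each replica handles
    roughly target_ongoing_requests, clamped to [min_replicas, max_replicas]."""
    raw = math.ceil(total_ongoing_requests / cfg.target_ongoing_requests)
    return max(cfg.min_replicas, min(cfg.max_replicas, raw))
```

With the defaults above, an idle deployment stays at the minimum, moderate load scales out proportionally, and heavy load is capped at `max_replicas`.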

nkwangleiGIT commented on July 23, 2024

@leiwen83 here is the doc on how to run Ray Serve with autoscaling:
http://kubeagi.k8s.com.cn/docs/Configuration/DistributedInference/deploy-using-rary-serve/

For the model registry or a single entry URL/ingress, I need to take a further look; it may require customizing the FastAPI layer?
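
One way to picture the single entry URL with multiple applications: each application gets its own `route_prefix`, and a proxy in front picks the application whose prefix is the longest match for the request path. The sketch below only illustrates that matching idea in plain Python; `match_route` and the application names are hypothetical, and it is not a substitute for Ray Serve's own proxy.

```python
from __future__ import annotations


def match_route(path: str, routes: dict[str, str]) -> str | None:
    """Return the application whose route_prefix is the longest prefix of `path`.

    `routes` maps route_prefix -> application name (both hypothetical here).
    A prefix matches only on whole path segments, so "/llama" does not
    match "/llama2".
    """
    best: str | None = None
    for prefix in routes:
        normalized = prefix.rstrip("/")  # "/" becomes "", which matches every path
        if path == normalized or path.startswith(normalized + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return routes[best] if best is not None else None


# Usage: three applications behind one host, distinguished by route_prefix.
routes = {
    "/": "default-app",
    "/llama": "llama-app",
    "/llama/chat": "llama-chat-app",
}
selected = match_route("/llama/chat/completions", routes)
```

Clients then only need to know one host; which model serves a request is decided by the path.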

leiwen83 commented on July 23, 2024

A FastAPI change may not be enough... FastChat implements a controller that tracks the status of all workers, which is what makes the registry possible.
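
The controller pattern mentioned here can be sketched in a few lines of plain Python: workers register the models they serve and send periodic heartbeats, and the controller resolves a model name to a live worker. This is a minimal sketch of the idea only, not FastChat's actual code; the class and method names, the 30-second timeout, and the "first live worker wins" selection are all assumptions made for illustration.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class WorkerInfo:
    address: str
    models: list[str]
    last_heartbeat: float


class Controller:
    """Hypothetical registry: tracks workers and which models they serve."""

    def __init__(self, heartbeat_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = heartbeat_timeout_s
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._workers: dict[str, WorkerInfo] = {}

    def register_worker(self, address: str, models: list[str]) -> None:
        self._workers[address] = WorkerInfo(address, list(models), self._clock())

    def heartbeat(self, address: str) -> bool:
        """Refresh a worker's liveness; returns False if it never registered."""
        worker = self._workers.get(address)
        if worker is None:
            return False
        worker.last_heartbeat = self._clock()
        return True

    def _is_alive(self, worker: WorkerInfo) -> bool:
        return self._clock() - worker.last_heartbeat <= self._timeout

    def list_models(self) -> list[str]:
        """Models served by at least one live worker, sorted by name."""
        live = {m for w in self._workers.values() if self._is_alive(w) for m in w.models}
        return sorted(live)

    def get_worker_address(self, model: str) -> str | None:
        """Return the first live worker serving `model`, or None if there is none."""
        for worker in self._workers.values():
            if model in worker.models and self._is_alive(worker):
                return worker.address
        return None
```

A single ingress could consult such a registry on each request to decide where to forward it, which is the piece a FastAPI-only change would not provide by itself.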

XBeg9 commented on July 23, 2024

@xwu99 is working hard on updates; let's 🤞 and follow the progress in #149

depenglee1707 commented on July 23, 2024

I have upgraded vLLM to 0.4.1 in an earlier version of my fork; check the details if you are interested ^_^: https://github.com/OpenCSGs/llm-inference/tree/main/llmserve/backend/llm/engines/vllm
