Comments (8)
I previously raised a question in the Slack community channel about ongoing support for this project. About a month ago, there was a discussion promising continued development and updates. However, I have not seen any changes or updates since then.
Specifically, I am eager to see support for the new vllm/transformer packages, which are crucial for my current use cases. Could we get an update on the progress towards integrating these packages? Any timeline or roadmap would be greatly appreciated, as it would help us plan our projects accordingly.
from ray-llm.
I was using fastchat previously, and now plan to use vllm and Ray Serve for LLM inference; it seems to be working well.
So ray-llm is no longer a dependency for me :-)
I am also interested in finding a fastchat replacement, but I wonder how to implement a model registry, dynamic autoscaling, and a unique entry URL with Ray? ;)
I think a Ray Serve ingress can handle the model registry, Ray autoscaling can handle scaling, and deploying multiple applications may achieve the unique entry URL.
I will write a document about how to do this once it is tested. For now, I have only tested Ray Serve with vllm serving, and can scale manually using a serveConfig like the one below:
serveConfigV2: |
  applications:
    - name: llm-serving-app
      import_path: llm-serving:deployment
      route_prefix: /
      runtime_env:
        working_dir: FILE:///vllm-workspace/llm-app.zip
      deployments:
        - name: VLLMPredictDeployment
          num_replicas: 2
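The "one ingress, many models" idea mentioned above can be sketched in plain Python without any Ray dependency. This is only an illustration of the routing logic, not Ray Serve code: `ModelRouter`, the request shape, and the model names are all hypothetical, and in a real setup each handler would presumably be backed by a deployment rather than a lambda.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical handler type: takes a prompt string, returns generated text.
Handler = Callable[[str], str]


@dataclass
class ModelRouter:
    """Single entry point that dispatches requests by model name (illustrative only)."""

    handlers: Dict[str, Handler] = field(default_factory=dict)

    def register(self, model: str, handler: Handler) -> None:
        # Adding an entry here is what makes a model visible behind the shared URL.
        self.handlers[model] = handler

    def list_models(self) -> List[str]:
        # The registry view: every model currently reachable through this router.
        return sorted(self.handlers)

    def handle(self, request: dict) -> dict:
        # Look up the handler by the "model" field of the request body.
        model = request.get("model")
        handler = self.handlers.get(model)
        if handler is None:
            return {"status": 404, "error": f"unknown model: {model}"}
        text = handler(request.get("prompt", ""))
        return {"status": 200, "model": model, "text": text}


# Usage: two stand-in models behind one router.
router = ModelRouter()
router.register("model-a", lambda prompt: f"[model-a] {prompt}")
router.register("model-b", lambda prompt: f"[model-b] {prompt}")
```

Keeping the registry as a plain mapping means the list of served models and the routing table cannot drift apart, which is the property a shared entry URL needs.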
@leiwen83 here is the doc on how to run Ray Serve with autoscaling:
http://kubeagi.k8s.com.cn/docs/Configuration/DistributedInference/deploy-using-rary-serve/
For the model registry or the unique entry URL/ingress, I need to take a further look; it may require customizing FastAPI?
A FastAPI change may not be enough... fastchat implements a controller that tracks the status of all workers, which is what makes the registry possible.
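To make the controller idea concrete, here is a minimal sketch of a registry that tracks worker status through heartbeats. It is loosely modeled on the pattern described above and is not fastchat's actual code: the `Controller` class, its method names, the timeout value, and the worker addresses are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class WorkerInfo:
    address: str
    models: List[str]
    last_heartbeat: float


class Controller:
    """Tracks live workers and the models they serve (hypothetical sketch)."""

    def __init__(self, heartbeat_timeout: float = 30.0) -> None:
        self.heartbeat_timeout = heartbeat_timeout
        self.workers: Dict[str, WorkerInfo] = {}

    def register_worker(self, address: str, models: List[str],
                        now: Optional[float] = None) -> None:
        # `now` can be injected for testing; otherwise use a monotonic clock.
        ts = time.monotonic() if now is None else now
        self.workers[address] = WorkerInfo(address, list(models), ts)

    def heartbeat(self, address: str, now: Optional[float] = None) -> bool:
        worker = self.workers.get(address)
        if worker is None:
            return False  # unknown worker: it has to register again
        worker.last_heartbeat = time.monotonic() if now is None else now
        return True

    def remove_stale_workers(self, now: Optional[float] = None) -> List[str]:
        # Drop every worker whose last heartbeat is older than the timeout.
        ts = time.monotonic() if now is None else now
        stale = [addr for addr, w in self.workers.items()
                 if ts - w.last_heartbeat > self.heartbeat_timeout]
        for addr in stale:
            del self.workers[addr]
        return stale

    def list_models(self) -> List[str]:
        # The model registry is derived from whichever workers are still alive.
        return sorted({m for w in self.workers.values() for m in w.models})

    def get_worker_address(self, model: str) -> Optional[str]:
        # Return the first live worker serving the model, or None if there is none.
        for worker in self.workers.values():
            if model in worker.models:
                return worker.address
        return None
```

Because the model list is computed from live workers rather than stored separately, a worker that stops sending heartbeats takes its models out of the registry automatically once `remove_stale_workers` runs.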
@xwu99 is working heavily on updates; let's 🤞 and follow the progress here: #149
I upgraded vllm to 0.4.1 in an earlier version of my fork; check the details if you are interested ^_^: https://github.com/OpenCSGs/llm-inference/tree/main/llmserve/backend/llm/engines/vllm