Comments (15)
I suggest treating this feature as high priority. We (Caicloud) have an internal version of a Serving CRD, which is based on Kubernetes Ingress/Service and Istio. I think we should support different serving backends to be more general.
How about defining an interface for serving backends and using annotations or CLI flags in the controller to control which backend (KNative, Istio, Linkerd, Kubernetes native Ingress/Service) is used?
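A minimal sketch of what such a pluggable backend interface might look like, assuming the backend is chosen by an annotation on the serving resource. The annotation key `serving.example.org/backend`, the class names, and the resource labels returned by `reconcile` are all hypothetical; nothing in this thread names them.

```python
from __future__ import annotations

from abc import ABC, abstractmethod

# Hypothetical annotation key and default; the thread does not specify either.
BACKEND_ANNOTATION = "serving.example.org/backend"
DEFAULT_BACKEND = "knative"


class ServingBackend(ABC):
    """Interface that every serving backend would implement."""

    name: str

    @abstractmethod
    def reconcile(self, service: dict) -> list[str]:
        """Return labels for the resources this backend would create."""


class KnativeBackend(ServingBackend):
    name = "knative"

    def reconcile(self, service: dict) -> list[str]:
        # Illustrative label only, not a real API call.
        return ["knative Service"]


class RawKubernetesBackend(ServingBackend):
    name = "kubernetes"

    def reconcile(self, service: dict) -> list[str]:
        # Illustrative labels only, not real API calls.
        return ["Deployment", "Service"]


# Registry of available backends, keyed by the value used in the annotation.
BACKENDS = {b.name: b for b in (KnativeBackend(), RawKubernetesBackend())}


def select_backend(service: dict) -> ServingBackend:
    """Pick a backend from the resource's annotations, falling back to the default."""
    annotations = service.get("metadata", {}).get("annotations") or {}
    name = annotations.get(BACKEND_ANNOTATION, DEFAULT_BACKEND)
    try:
        return BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown serving backend: {name!r}") from None
```

With this shape, a controller-level CLI flag could simply change `DEFAULT_BACKEND`, while the annotation overrides it per resource.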
from kserve.
/kind feature
from kserve.
We actually discussed the annotation approach in our WG meeting today.
from kserve.
@gaocegege That sounds good. To handle canaries and blue/green routing the annotation would indeed need to specify the routing resources to utilize:
- KNative (default?)
- Istio
- Linkerd
- Envoy
from kserve.
@cliveseldon KNative by default SGTM. Using Kubernetes native resources to handle canaries and blue/green routing may be hard, while it is feasible to use istio/linkerd/knative. Thus the list LGTM.
@yuzisun Is there any calendar for the SIG meeting? I am interested in it, too.
from kserve.
@gaocegege, I've added you to the meetings.
from kserve.
I want to minimize the number of implementations. I'm all for pluggability, but we need to make sure we're delivering clear customer value. Knative itself is working on the pluggability question, so we may be able to simply rely on their improvements.
Short term, as we discussed in our KFSWG meeting, the best value for effort will be:
- Default Knative impl w/ canarying & all the bells
- Annotated raw k8s impl w/o canarying
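The short-term split above (Knative with canarying, raw Kubernetes without it) could be enforced with a small capability check at validation time. This is only a sketch: the backend names, the `canary` spec key, and the `validate` helper are hypothetical and not taken from the project.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BackendCapabilities:
    """Features a backend is able to honour (hypothetical model)."""

    canary: bool


# Mirrors the short-term plan in the comment: only the Knative
# implementation supports canarying.
CAPABILITIES = {
    "knative": BackendCapabilities(canary=True),
    "kubernetes": BackendCapabilities(canary=False),
}


def validate(backend: str, spec: dict) -> list[str]:
    """Return human-readable errors; an empty list means the spec is acceptable."""
    caps = CAPABILITIES[backend]
    errors = []
    # Reject a canary section when the chosen backend cannot route traffic by weight.
    if "canary" in spec and not caps.canary:
        errors.append(f"backend {backend!r} does not support canary rollouts")
    return errors
```

Rejecting the spec up front, rather than silently ignoring the canary section, keeps the behaviour of the raw Kubernetes path explicit to users.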
from kserve.
Also @gaocegege, in our discussions so far, we've limited the networking responsibilities of KFServing to in-cluster communication. If you have Knative/Istio installed, you can use their Loadbalancer Service for Ingress, but we're not providing an Ingress solution ourselves.
I'm not sure we want to be creating ingress resources, as they don't map 1:1 with a KFService. If there's some disagreement here, this might be a great topic for a KFSWG Special Topics meeting.
from kserve.
To clarify, what does "ingress resource" include? I assume you mean any setup of ingress load balancers, external IPs, etc., right? I also assume it doesn't include ingress rules. In #6 we discussed an internal and an external URL as part of the status field. I assumed that, to get an external URL, the implementation would have to write some sort of implementation-specific ingress rule?
from kserve.
I think having raw k8s resources (svcs,deployments) created only makes sense if the whole spec for kfserving can be satisfied. If canaries, blue/green, scaling can't be handled then the spec is broken for people who don't want to use KNative.
Just looking at istio, it seems they don't have Go types yet. See here. KNative uses their own versions which is probably ok to reuse for now.
We should decide whether to:
- Make this issue a non-goal and make the top-level README clear that this is a KNative-only implementation
- Leave it for later incorporation
- Handle full functionality, which might need to include istio and linkerd as well as HPAs for autoscaling.
from kserve.
The value of implementing the spec with istio is still a bit unclear to me; correct me if I am wrong:
- to have canary rollout you need to create an istio virtual service, set up istio networking, and have some way to clean up the deployments, like knative revision gc. Not sure if it is worth the effort to reimplement exactly what knative serving has done.
- knative serving is pretty lightweight and you get autoscaling down to zero for free.
- future integration with kf pipeline would probably need knative eventing
I think knative is the serverless layer on top of istio, and kfserving needs a serverless solution. Whether the mesh underneath is istio or linkerd can be knative's choice, and I know there is an ongoing conversation about knative supporting other service meshes.
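For a sense of the routing piece alone, here is a rough sketch of the Istio VirtualService a controller would have to generate for a canary split. The `-default` and `-canary` destination host names and the `canary_virtual_service` helper are hypothetical. The sketch covers only the weighted route and none of the networking setup or cleanup mentioned above.

```python
def canary_virtual_service(name: str, canary_percent: int) -> dict:
    """Build a VirtualService manifest that splits traffic by weight."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": name},
        "spec": {
            "hosts": [name],
            "http": [
                {
                    # The two weights always sum to 100.
                    "route": [
                        {
                            "destination": {"host": f"{name}-default"},
                            "weight": 100 - canary_percent,
                        },
                        {
                            "destination": {"host": f"{name}-canary"},
                            "weight": canary_percent,
                        },
                    ]
                }
            ],
        },
    }
```

Generating the manifest is the easy part; the lifecycle work around it (creating and garbage-collecting the two underlying deployments) is what the comment above argues Knative already provides.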
from kserve.
@yuzisun knative is built on top of istio and provides a higher-level abstraction. However, AFAIK, few companies in China run knative in production. They may have istio installed but do not use knative, so they cannot use our Serving CRD.
At this early stage, I agree that we could focus on the knative implementation, but I hope we can keep the design extensible so that more backends can be supported to meet the needs of most applications.
from kserve.
Agreed @gaocegege . Given how young this project is, I see multiple implementations and pluggability as much lower priority. When we do consider where and how we make this system pluggable, we need to be really thoughtful about what value we're delivering (unless of course, someone wants to provide the resources themselves).
Knative is in Beta and growing massively. I could imagine that in a year it will be as standard as Istio, so we wouldn't necessarily want to spend too much effort. Let's make the core functionality available and see where the customer gaps are and address them then.
from kserve.
SGTM. thanks.
from kserve.
Given the direction of this project, its growing feature set, and our impending reliance on eventing, I'm closing this as a non-starter.
from kserve.