kserve / kserve
Standardized Serverless ML Inference Platform on Kubernetes
Home Page: https://kserve.github.io/website/
License: Apache License 2.0
Data scientists find it hard to figure out resource limits. KFServing should make it easy to set optimal limits automatically (within user-set max bounds).
Possibly a VPA plus something that fires off a set of warm-up or test queries on the canary would suffice?
The Knative runtime contract also indicates that the serverless platform (Knative) is allowed to adjust resource limits, so maybe KFServing gets this for free by virtue of relying on Knative?
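As a sketch of the VPA half of this idea: a VerticalPodAutoscaler could target the deployment Knative creates for the default revision, with maxAllowed acting as the user-set bound. The target name below is hypothetical and this assumes the VPA CRDs are installed:
apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  name: mymodel-default-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mymodel-default-deployment   # hypothetical: the deployment Knative created for the default revision
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      maxAllowed:                      # the user-set max bounds
        cpu: "2"
        memory: 4Gi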
Allow the creation of custom components initially provided by a v1.Container specification.
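A sketch of what that might look like on a KFService, assuming a custom field that simply embeds a v1.Container (the exact field layout is illustrative, not finalized):
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "KFService"
metadata:
  name: "my-custom-service"
spec:
  default:
    custom:
      container:
        image: myorg/my-model-server:latest   # any image that exposes the prediction endpoint
        ports:
        - containerPort: 8080
        env:
        - name: MODEL_URI
          value: "gs://mybucket/mymodel"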
It would be useful for developers to have an architecture diagram to assist in the development and debugging of KFServices.
/kind bug
What steps did you take and what happened:
$ make
go generate ./pkg/... ./cmd/...
go fmt ./pkg/... ./cmd/...
go vet ./pkg/... ./cmd/...
hack/verify-golint.sh
Can not find golint, install with:
go get -u github.com/golang/lint/golint
Makefile:52: recipe for target 'lint' failed
make: *** [lint] Error 1
What did you expect to happen:
go get golang.org/x/lint/golint
per golang/lint#415
Anything else you would like to add:
Environment:
Kubernetes version: (use kubectl version)
OS (e.g. from /etc/os-release):
/kind feature
Describe the solution you'd like
Ability to configure the "domain" configmap
"myorg.mycluster.com" is the prefix of all serving uris
people need a wildcard domain and a cert to match
Knative is planning on autoprovisioning certs in Beta
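For reference, in a stock Knative install the serving domain is driven by the config-domain ConfigMap, so KFServing mostly needs to let users set something like:
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-domain
  namespace: knative-serving
data:
  myorg.mycluster.com: ""   # routes get hostnames under this domain, e.g. <service>.<namespace>.myorg.mycluster.com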
Currently, when the user promotes the canary spec to default, a new deployment is created under the default configuration, even though the same deployment was already created under the canary configuration and is serving traffic; the traffic then has to switch over to the newly created deployment. Although this is automatic, it may take some time to launch the deployment, and if minReplicas is set low, Knative will spend another few seconds scaling up the pods to serve 100% of the traffic. One proposal to make canary promotion seamless is to add configuration=default and configuration=canary labels to the two configurations we create for the KFService. The controller can then use the label, instead of the name, to select a configuration; upon canary promotion the controller flips the two labels, so once the canary deployment reaches 100% it effectively becomes the default.
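A sketch of the labeling scheme on the two Knative Configurations (the label key is illustrative):
apiVersion: serving.knative.dev/v1alpha1
kind: Configuration
metadata:
  name: mymodel-default
  labels:
    serving.kubeflow.org/configuration: default   # flipped to "canary" on promotion
---
apiVersion: serving.knative.dev/v1alpha1
kind: Configuration
metadata:
  name: mymodel-canary
  labels:
    serving.kubeflow.org/configuration: canary    # flipped to "default" on promotion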
Any reason why this commit deleted the manager_image_patch.yaml? Master seems broken now.
@yuzisun The current implementation of CreateKnativeService will not sync cluster state to spec desired state in the following cases (admittedly off the happy path):
I think these issues stem from the fact that this logic depends on cluster state (revision state). Can we do something else on the Knative side to resolve this? Happy to try to fix it if you can point me in the right Knative direction!
https://github.com/kubeflow/kfserving/blob/master/pkg/reconciler/ksvc/resources/knative_service.go
if kfsvc.Spec.Canary == nil || kfsvc.Spec.Canary.TrafficPercent == 0 {
	// TODO(@yuzisun) should we add model name to the spec, can be different than service name?
	container = CreateModelServingContainer(kfsvc.Name, &kfsvc.Spec.Default)
	revisions = []string{knservingv1alpha1.ReleaseLatestRevisionKeyword}
	routingPercent = 0
} else {
	container = CreateModelServingContainer(kfsvc.Name, &kfsvc.Spec.Canary.ModelSpec)
	revisions = []string{kfsvc.Status.Default.Name, knservingv1alpha1.ReleaseLatestRevisionKeyword}
	routingPercent = kfsvc.Spec.Canary.TrafficPercent
}
/kind feature
Describe the solution you'd like
We should just have:
status.url
Anything else you would like to add:
The external URI should be delegated to the cloud provider. KFServing assumes an Istio/Knative fabric and will follow the domain provided. We don't need to reinvent the wheel here.
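A sketch of the resulting status stanza (the hostname is illustrative and follows whatever domain the cluster is configured with):
status:
  url: http://mymodel.default.myorg.mycluster.com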
/kind feature
Describe the solution you'd like
The user needs to provide only one YAML file for model deployment; KFServing takes care of the rest. Possibly align with MLSpec.
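For illustration, a single manifest along these lines would be all the user provides (field names follow the v1alpha1 examples elsewhere in this document):
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "KFService"
metadata:
  name: "mymodel"
spec:
  minReplicas: 1
  maxReplicas: 2
  tensorflow:
    modelUri: "gs://mybucket/mymodel"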
Right now, our output is only Name and Age. We should have better object summaries.
Add a handler for the scikit-learn model type. Controller work to hand off to servers that support these model types. Related to #18.
Currently, min/maxReplicas are set outside of the default/canary specification. This makes for strange scenarios where replica changes cannot be rolled out cleanly.
Should we move these into the canary/default specs? More generally, given that our CRD is responsible for two backend configurations, does it make sense to, by default, add new fields to canary or default instead of at the top level (e.g. auth)?
This is also somewhat challenging for any features triggered off of annotations, as the annotations would need to specify default/canary as well, or be subject to the same weirdness.
Along these same lines, I've been thinking that it might make sense to move trafficPercent up a level. Our three top-level keys would then correspond to the three underlying Knative resources.
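A sketch of that layout, with replicas inside each spec and trafficPercent promoted to the top level (field names are illustrative, not final):
spec:
  default:
    minReplicas: 1
    maxReplicas: 5
    tensorflow:
      modelUri: "gs://mybucket/mymodel"
  canary:
    minReplicas: 1
    maxReplicas: 5
    tensorflow:
      modelUri: "gs://mybucket/mymodel-v2"
  canaryTrafficPercent: 10   # maps to the Knative Route; default/canary map to the two Configurations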
We need a kubebuilder webhook for validation logic.
Downloading from GCS and S3 needs to be completed.
There has been a large discussion over email about this topic. I'm opening this up to the repo so we have history and increase participation.
Some early thoughts on the data plane layer to garner feedback on directions.
The reasons to not provide an initial data plane spec are:
For the above, the control plane spec would be extended with a protocol field, e.g.
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "KFService"
metadata:
name: "myModel"
spec:
minReplicas: 1
maxReplicas: 2
tensorflow:
modelUri: "gs://mybucket/mymodel"
protocol: seldon # could also be tfserving
There are a range of payload types that can be provided each with their own advantages.
Should we provide a place for metadata as part of the payload? Metadata in the API could include:
Should there be a section of the payload providing status of the prediction?
If we wish to support multiple payload protocols and at the same time handle synchronous and asynchronous use cases we may wish to contain the payload within a meta-API that defines what schema the underlying payload contains.
Cloudevents provides a generic protocol for events. By supporting cloudevents as the top level protocol we can easily create data specifications for various ML use cases at that level rather than in 1 overarching data plane definition:
In this way we can build up a set of data plane protocols for different use cases and/or handle particular existing protocols (Seldon, TFServing). CloudEvents also fits well into the Knative ecosystem.
Example:
{
  "specversion": "0.2",
  "type": "org.mlspec.serving.prediction",
  "schemaurl": "http://mlspec.org/serving/prediction.proto",
  "source": "/myclient",
  "id": "C234-1234-1234",
  "time": "2018-04-05T17:31:05Z",
  "datacontenttype": "application/json",
  "data": {
    "meta": { "hyperParam1": "value1" },
    "tensor": {
      "values": [1.0, 0.5, 0.2],
      "shape": [1, 3]
    }
  }
}
A response could be a CloudEvent:
{
  "specversion": "0.2",
  "type": "org.mlspec.serving.prediction",
  "schemaurl": "http://mlspec.org/serving/prediction.proto",
  "source": "/mymodel1",
  "id": "C234-1234-1234",
  "time": "2018-04-05T17:31:05Z",
  "datacontenttype": "application/json",
  "data": {
    "meta": {
      "model": "resetNetv1.1",
      "status": "success"
    },
    "tensor": {
      "values": [0.9],
      "shape": [1]
    }
  }
}
This may be useful in future pipeline use cases where components of the pipeline may only be interested in some subset of events. So all components may respond to and transform prediction events, but only some to reinforcement learning feedback events.
Our framework servers don't have e2e or unit tests.
It would be great to have some automated testing in place to determine that changes do not break anything:
It might be good to start with:
And eventually enable:
For e2e tests in Kubeflow, see https://github.com/kubeflow/testing#setting-up-kubeflow-test-infrastructure
/kind feature
Describe the solution you'd like
KFServing should expose a consistent way to download models across inference servers and clouds. The current implementation depends on the features that individual inference servers expose. E.g. see #137.
Anything else you would like to add:
Proposed solution design is documented here: https://docs.google.com/document/d/1xqBOkoQ6Vzc5gv4O5MgVVNE3qILbKuMkC-DN5zp5w28/edit?usp=sharing
GKE uses nodeselectors to choose GPU type https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#multiple_gpus.
This requires a mutating admission controller to mutate the underlying Knative deployment to use a node selector. This behavior isn't currently supported directly by Knative.
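For reference, the pod-level fields such a webhook would need to inject on GKE look roughly like this (the accelerator type is just an example):
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4   # GKE GPU node label
  containers:
  - name: model-server
    resources:
      limits:
        nvidia.com/gpu: 1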
/kind feature
Describe the solution you'd like
Extend kfserver and implement the load/predict functions for a PyTorch model server.
/kind feature
This is to fix the TODOs left in kfservice_status.go to propagate configuration and route conditions to the KFService.
https://github.com/kubeflow/kfserving/blob/master/pkg/apis/serving/v1alpha1/kfservice_status.go#L51
/kind feature
Describe the solution you'd like
TensorFlow images are already built; we need to wire them up.
XGBoost may need some work in the KFService container.
/kind feature
Describe the solution you'd like
KFServing should support on-prem clusters, since there are use cases provisioned on-premise where the trained model is stored in local storage. It would be better to have an easy way to configure local storage such as a PV/PVC and then allow modelUri to point to a local path mounted from the PVC.
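One possible shape for this, assuming a pvc:// URI scheme where the controller mounts the named claim into the serving container (the scheme and names are a sketch only):
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "KFService"
metadata:
  name: "onprem-model"
spec:
  tensorflow:
    modelUri: "pvc://models-pvc/path/to/mymodel"   # <claim-name>/<path inside the volume>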
/kind feature
Describe the solution you'd like
In addition to making all images configurable with a config map, we need a local developer story. We should have something along the lines of a kustomize patch that overrides a local configmap to point to the developer's modified framework images.
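A sketch of such an overlay, assuming the framework images live in a ConfigMap (all names here are hypothetical):
# dev-overlay/kustomization.yaml
resources:
- ../default
patchesStrategicMerge:
- configmap_image_patch.yaml
# dev-overlay/configmap_image_patch.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kfserving-config          # hypothetical name of the images ConfigMap
  namespace: kfserving-system
data:
  sklearnImage: dev.local/sklearnserver:latest   # developer's locally built images
  xgboostImage: dev.local/xgbserver:latest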
/kind feature
Describe the solution you'd like
The TensorRT spec can only be used with models in GCS. It would be nice to allow models in S3 or Azure blobs, or models that can be mounted.
One possibility is to create an init container to download and expose the models to the server as a mount. This would allow us to easily add support for a range of sources in a way that would work for all servers.
Knative supports PodSpec, so this is possible, but it will require us to modify the frameworkhandler interface method CreateModelServingContainer to become CreateModelServingPod. The user interface (e.g. CustomSpec) will remain unchanged (i.e. this doesn't mean we allow users to give us pods).
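A minimal sketch of the init-container approach with a shared emptyDir volume (the downloader image, its flags, and the TensorRT server invocation are illustrative):
spec:
  volumes:
  - name: model-store
    emptyDir: {}
  initContainers:
  - name: model-downloader
    image: myorg/model-downloader:latest            # hypothetical downloader supporting gs://, s3://, https://
    args: ["s3://mybucket/mymodel", "/mnt/models"]
    volumeMounts:
    - name: model-store
      mountPath: /mnt/models
  containers:
  - name: model-server
    image: nvcr.io/nvidia/tensorrtserver:19.05-py3  # example server image
    args: ["trtserver", "--model-store=/mnt/models"]
    volumeMounts:
    - name: model-store
      mountPath: /mnt/models
      readOnly: true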
Anything else you would like to add:
Related issue opened on TensorRTIS triton-inference-server/server#324
Just wondering how many kinds of modelUri are supported?
I can see from the example that Google Cloud Storage is supported: gs://
What about an HDFS path like "hdfs://192.168.11.4:9000/testdata/saved_model_half_plus_two_cpu"?
Any others?
Thanks
Ben
/kind feature
Describe the solution you'd like
Allow users to use ONNX models in KFServing
Anything else you would like to add:
TensorRT appears to be adding support for ONNX too... so it's possible that we may be able to use the TensorRT Inference Server for this. Ref: triton-inference-server/server#293
/kind feature
Describe the solution you'd like
Use klog, the permanent fork of glog for Kubernetes, instead of the current log package.
ref:
kubernetes/kubernetes#70264
https://github.com/kubernetes/klog/blob/master/README.md
Anything else you would like to add:
/cc @ellis-bigelow
TensorFlow deployed pods take forever to die. This is probably due to signal mishandling in TF Serving. Perhaps we should be more aggressive with shutdown logic on our pods.
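Short of fixing the signal handling in TF Serving itself, one blunt instrument is to shorten the pod's termination grace period so the kubelet sends SIGKILL sooner; a sketch (the right value is debatable):
spec:
  terminationGracePeriodSeconds: 10    # kubelet force-kills after this window if SIGTERM is ignored
  containers:
  - name: tensorflow-serving
    image: tensorflow/serving:1.13.0   # example image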
The following things might be good candidates to add to config maps:
/kind feature
Describe the solution you'd like
TensorRT Inference Server is an ML serving framework for DL supporting multiple frontends (TF, ONNX, etc.), multi-model serving, GCS/NFS integration (S3 etc. still needed), and metrics.
They also have a protocol: https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/api.proto
It might make sense to support TensorRT as a top level framework. We may eventually wish to support multiple frontends (i.e., user submits a saved model.pb, we transform it to a tensorrt model in an init container, and then serve).
We may also want to consider adopting/modifying the TensorRT protocol.
There are now several inference servers, such as TensorRT Inference Server, GraphPipe, TensorFlow Serving, and so on. Different users may want to use different servers, so I think we should support multiple servers.
BTW, some servers support serving multiple models and multiple frameworks. For example, GraphPipe supports TensorFlow, PyTorch and Caffe. Maybe we should also investigate how to support co-serving multiple models in one serving CRD.
In order to read models from GCS/S3 storage, we need to support credentials.
E.g. for S3 we need to create a secret and reference it as environment variables.
kubectl create secret generic aws-creds --from-literal=awsAccessKeyID=${AWS_ACCESS_KEY_ID} \
--from-literal=awsSecretAccessKey=${AWS_SECRET_ACCESS_KEY}
env:
- name: AWS_ACCESS_KEY_ID
  valueFrom:
    secretKeyRef:
      key: awsAccessKeyID
      name: aws-creds
- name: AWS_SECRET_ACCESS_KEY
  valueFrom:
    secretKeyRef:
      key: awsSecretAccessKey
      name: aws-creds
Currently has none and just returns nil
We need to set sane defaults for resource requirements and runtime versions. We will likely need other defaults in the future. This is best implemented with an admission controller.
We need performant model servers for each of these technologies and auto-wiring via high-level specs analogous to TensorflowSpec.
When creating KFServices such as xgboost, the modelName is taken to be metadata.Name. The user then needs to specify this twice in an API call. For example, in the xgboost sample:
MODEL_NAME=xgboost-iris
curl -v -H "Host: xgboost-iris.default.svc.cluster.local" http://$CLUSTER_IP/models/$MODEL_NAME:predict -d $INPUT_PATH
Could we instead add the modelName to the path as we route the request? e.g. curl -v -H "Host: xgboost-iris.default.svc.cluster.local" http://$CLUSTER_IP/predict
On the other hand, if users can supply an arbitrary modelName, they could try other model names that might happen to be loaded on that server, which may not be what we want.
/kind feature
Describe the solution you'd like
As part of KFServing, we should optionally be able to deploy a MinIO server for dev/test needs. It's impractical to expect every user to have GCS/S3 accounts. In the future, this can also simplify our test infrastructure.
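A minimal dev/test MinIO sketch (namespace, credentials, and image tag are placeholders, not a hardened setup):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
  namespace: kfserving-system        # placeholder namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
      - name: minio
        image: minio/minio:latest
        args: ["server", "/data"]
        env:
        - name: MINIO_ACCESS_KEY
          value: minio               # dev-only credentials
        - name: MINIO_SECRET_KEY
          value: minio123
        ports:
        - containerPort: 9000
---
apiVersion: v1
kind: Service
metadata:
  name: minio-service
  namespace: kfserving-system
spec:
  selector:
    app: minio
  ports:
  - port: 9000
    targetPort: 9000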
/kind feature
Describe the solution you'd like
Roll out KFServices upon config map changes. It may sound a bit scary, but with Knative's safe rollout it may not be a problem, and we get a consistent view of the config map values; it is also simple to reason about.
Anything else you would like to add:
Generate deployable resource YAMLs for the KFServing deployment. This will simplify KFServing deployments for people wanting to try it out.
Something like here: https://knative.dev/docs/install/knative-custom-install/
https://github.com/knative/serving/releases/download/v0.6.0/serving.yaml
Hi, I am working on model serving these days and am interested in contributing to this repo. Is there any doc or design proposal for this repo? Are we trying to implement a Serving CRD?
For now https://github.com/kubeflow/kfserving/blob/master/pkg/apis/serving/v1alpha1/kfservice_types.go#L52-L54 has 3 fields for Tensorflow, XGBoost and SKLearn models; in fact, TensorflowSpec, XGBoostSpec and SKLearnSpec share the same structure. How about refactoring the type as below to make ModelSpec thin:
type ModelSpec struct {
	// Service Account Name
	ServiceAccountName string `json:"serviceAccountName,omitempty"`
	// Minimum number of replicas, pods won't scale down to 0 in case of no traffic
	MinReplicas int `json:"minReplicas,omitempty"`
	// This is the upper bound for the autoscaler to scale to
	MaxReplicas int `json:"maxReplicas,omitempty"`
	// The following fields follow a "1-of" semantic. Users must specify exactly one spec.
	Custom        *CustomSpec    `json:"custom,omitempty"`
	ModelTemplate *ModelTemplate `json:"modelTemplate,omitempty"`
}

type ModelTemplate struct {
	FrameworkName string `json:"frameworkName"`
	ModelURI      string `json:"modelUri"`
	// Defaults to latest Version.
	RuntimeVersion string `json:"runtimeVersion,omitempty"`
	// Defaults to requests and limits of 1CPU, 2Gb MEM.
	Resources v1.ResourceRequirements `json:"resources,omitempty"`
}
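If this refactor landed, a KFService using the generic template might look like the following (purely illustrative of the proposal above, not an existing API):
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "KFService"
metadata:
  name: "my-xgboost-model"
spec:
  minReplicas: 1
  maxReplicas: 3
  modelTemplate:
    frameworkName: xgboost
    modelUri: "gs://mybucket/mymodel"
    runtimeVersion: "0.82"    # example; defaults to latest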
We want to allow the user to run KFServing components without the dependency on Knative.
An initial implementation could:
➜ kfserving git:(admission) ✗ make docker-build
go generate ./pkg/... ./cmd/...
go fmt ./pkg/... ./cmd/...
go vet ./pkg/... ./cmd/...
go run vendor/sigs.k8s.io/controller-tools/cmd/controller-gen/main.go all
CRD manifests generated under '/Users/ellisbigelow/go/src/github.com/kubeflow/kfserving/config/crds'
RBAC manifests generated under '/Users/ellisbigelow/go/src/github.com/kubeflow/kfserving/config/rbac'
go test ./pkg/... ./cmd/... -coverprofile cover.out
? github.com/kubeflow/kfserving/pkg/apis [no test files]
? github.com/kubeflow/kfserving/pkg/apis/serving [no test files]
ok github.com/kubeflow/kfserving/pkg/apis/serving/v1alpha1 8.408s coverage: 42.5% of statements
? github.com/kubeflow/kfserving/pkg/controller [no test files]
--- FAIL: TestReconcile (0.29s)
kfservice_controller_test.go:185:
Unexpected error:
<*errors.StatusError | 0xc0002d8b40>: {
ErrStatus: {
TypeMeta: {Kind: "", APIVersion: ""},
ListMeta: {SelfLink: "", ResourceVersion: "", Continue: ""},
Status: "Failure",
Message: "Operation cannot be fulfilled on services.serving.knative.dev \"foo\": the object has been modified; please apply your changes to the latest version and try again",
Reason: "Conflict",
Details: {
Name: "foo",
Group: "serving.knative.dev",
Kind: "services",
UID: "",
Causes: nil,
RetryAfterSeconds: 0,
},
Code: 409,
},
}
Operation cannot be fulfilled on services.serving.knative.dev "foo": the object has been modified; please apply your changes to the latest version and try again
occurred
FAIL
coverage: 72.0% of statements
FAIL github.com/kubeflow/kfserving/pkg/controller/kfservice 7.865s
? github.com/kubeflow/kfserving/pkg/frameworks/tensorflow [no test files]
ok github.com/kubeflow/kfserving/pkg/reconciler/ksvc 7.927s coverage: 81.8% of statements
ok github.com/kubeflow/kfserving/pkg/reconciler/ksvc/resources 0.437s coverage: 100.0% of statements
? github.com/kubeflow/kfserving/pkg/webhook [no test files]
? github.com/kubeflow/kfserving/pkg/webhook/admission/kfservice [no test files]
? github.com/kubeflow/kfserving/cmd/manager [no test files]
make: *** [test] Error 1
/kind feature
Describe the solution you'd like
Similarly to abstracting Knative and Istio implementation details, KFServing could present performance telemetry in an easily digestible form. I'd like to start a discussion on acceptable naming and metrics setup for KFServing as a first step towards having detailed instrumentation.
In our use case we have defined the following latency metrics:
Anything else you would like to add:
I'm looking to involve the KFServing community in determining what a useful set of metrics would look like. If these can be standardized at the level of KFServing, it would be a great base to build on.
CC @rakelkar
I'm interested in the kfserving repo and would like a quick start for it. There seems to be no basic operation introduction in the README. Any plan to deliver that, or could you please share the design document? Thanks a lot!