kserve / kserve
Standardized Serverless ML Inference Platform on Kubernetes
Home Page: https://kserve.github.io/website/
License: Apache License 2.0
Data scientists find it hard to figure out resource limits. KFServing should make it easy to set optimal limits automatically (within user-set max bounds).
Possibly a VPA plus something that fires off a set of warm-up or test queries on the canary would suffice?
The Knative runtime contract also indicates that the serverless platform (Knative) is allowed to adjust resource limits, so maybe KFServing gets this for free by virtue of relying on Knative?
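As a sketch of the VPA half of this idea: a VerticalPodAutoscaler could target the deployment Knative creates for the default revision, with maxAllowed acting as the user-set bound. The target name below is hypothetical and this assumes the VPA CRDs are installed:
apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  name: mymodel-default-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mymodel-default-deployment   # hypothetical: the deployment Knative created for the default revision
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      maxAllowed:                      # the user-set max bounds
        cpu: "2"
        memory: 4Gi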
Allow the creation of custom components initially provided by a v1.Container specification.
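A sketch of what that might look like on a KFService, assuming a custom field that simply embeds a v1.Container (the exact field layout is illustrative, not finalized):
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "KFService"
metadata:
  name: "my-custom-service"
spec:
  default:
    custom:
      container:
        image: myorg/my-model-server:latest   # any image that exposes the prediction endpoint
        ports:
        - containerPort: 8080
        env:
        - name: MODEL_URI
          value: "gs://mybucket/mymodel"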
It would be useful for developers to have an architecture diagram to assist in the development and debugging of KFServices.
/kind bug
What steps did you take and what happened:
$ make
go generate ./pkg/... ./cmd/...
go fmt ./pkg/... ./cmd/...
go vet ./pkg/... ./cmd/...
hack/verify-golint.sh
Can not find golint, install with:
go get -u github.com/golang/lint/golint
Makefile:52: recipe for target 'lint' failed
make: *** [lint] Error 1
What did you expect to happen:
go get golang.org/x/lint/golint
per golang/lint#415
Anything else you would like to add:
Environment:
Kubernetes version: (use kubectl version)
OS (e.g. from /etc/os-release):
/kind feature
Describe the solution you'd like
Ability to configure the "domain" configmap
"myorg.mycluster.com" is the prefix of all serving uris
people need a wildcard domain and a cert to match
Knative is planning on autoprovisioning certs in Beta
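For reference, in a stock Knative install the serving domain is driven by the config-domain ConfigMap, so KFServing mostly needs to let users set something like:
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-domain
  namespace: knative-serving
data:
  myorg.mycluster.com: ""   # routes get hostnames under this domain, e.g. <service>.<namespace>.myorg.mycluster.com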
Currently, when the user promotes the canary spec to default, a new deployment is created under the default configuration, even though the same deployment was already created under the canary configuration and is serving traffic; the traffic then has to switch over to the newly created deployment. Although this is automatic, it may take some time to launch the deployment, and if minReplicas is set low, Knative will spend another few seconds scaling up the pods to serve 100% of the traffic. One proposal to make canary promotion seamless is to add configuration=default and configuration=canary labels to the two configurations we create for the KFService. The controller can then use the label, instead of the name, to select a configuration; upon canary promotion the controller flips the two labels, so once the canary deployment reaches 100% it effectively becomes the default.
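A sketch of the labeling scheme on the two Knative Configurations (the label key is illustrative):
apiVersion: serving.knative.dev/v1alpha1
kind: Configuration
metadata:
  name: mymodel-default
  labels:
    serving.kubeflow.org/configuration: default   # flipped to "canary" on promotion
---
apiVersion: serving.knative.dev/v1alpha1
kind: Configuration
metadata:
  name: mymodel-canary
  labels:
    serving.kubeflow.org/configuration: canary    # flipped to "default" on promotion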
Any reason why this commit deleted the manager_image_patch.yaml? Master seems broken now.
@yuzisun The current implementation of CreateKnativeService will not sync cluster state to spec desired state in the following cases (admittedly off the happy path):
I think these issues stem from the fact that this logic depends on cluster state (revision state). Can we do something else on the Knative side to resolve this? Happy to try to fix it if you can point me in the right Knative direction!
https://github.com/kubeflow/kfserving/blob/master/pkg/reconciler/ksvc/resources/knative_service.go
if kfsvc.Spec.Canary == nil || kfsvc.Spec.Canary.TrafficPercent == 0 {
	// TODO(@yuzisun) should we add model name to the spec, can be different than service name?
	container = CreateModelServingContainer(kfsvc.Name, &kfsvc.Spec.Default)
	revisions = []string{knservingv1alpha1.ReleaseLatestRevisionKeyword}
	routingPercent = 0
} else {
	container = CreateModelServingContainer(kfsvc.Name, &kfsvc.Spec.Canary.ModelSpec)
	revisions = []string{kfsvc.Status.Default.Name, knservingv1alpha1.ReleaseLatestRevisionKeyword}
	routingPercent = kfsvc.Spec.Canary.TrafficPercent
}
/kind feature
Describe the solution you'd like
We should just have:
status.url
Anything else you would like to add:
The external URI should be delegated to the cloud provider. KFServing assumes an Istio/Knative fabric and will follow the domain provided. We don't need to reinvent the wheel here.
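A sketch of the resulting status stanza (the hostname is illustrative and follows whatever domain the cluster is configured with):
status:
  url: http://mymodel.default.myorg.mycluster.com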
/kind feature
Describe the solution you'd like
The user needs to provide only one YAML file for model deployment; KFServing takes care of the rest. Possibly align with MLSpec.
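For illustration, a single manifest along these lines would be all the user provides (field names follow the v1alpha1 examples elsewhere in this document):
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "KFService"
metadata:
  name: "mymodel"
spec:
  minReplicas: 1
  maxReplicas: 2
  tensorflow:
    modelUri: "gs://mybucket/mymodel"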
Right now, our output is only Name and Age. We should have better object summaries.
Add a handler for the scikit-learn model type. Controller work to hand off to servers that support these model types. Related to #18.
Currently, min/maxReplicas are set outside of the default/canary specification. This makes for strange scenarios where replica changes cannot be rolled out cleanly.
Should we move these into the canary/default specs? More generally, given that our CRD is responsible for two backend configurations, does it make sense to, by default, add new fields to canary or default instead of at the top level (e.g. auth)?
This is also somewhat challenging for any features triggered off of annotations, as the annotations would need to specify default/canary as well, or be subject to the same weirdness.
Along these same lines, I've been thinking that it might make sense to move trafficPercent up a level. Our three top-level keys would then correspond to the three underlying Knative resources.
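A sketch of that layout, with replicas inside each spec and trafficPercent promoted to the top level (field names are illustrative, not final):
spec:
  default:
    minReplicas: 1
    maxReplicas: 5
    tensorflow:
      modelUri: "gs://mybucket/mymodel"
  canary:
    minReplicas: 1
    maxReplicas: 5
    tensorflow:
      modelUri: "gs://mybucket/mymodel-v2"
  canaryTrafficPercent: 10   # maps to the Knative Route; default/canary map to the two Configurations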
We need a kubebuilder webhook for validation logic.
Downloading from GCS and S3 needs to be completed.
There has been a large discussion over email about this topic. I'm opening this up to the repo so we have history and increase participation.
Some early thoughts on the data plane layer to garner feedback on directions.
The reasons to not provide an initial data plane spec are:
For the above, the control plane spec would be extended with a protocol field, e.g.
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "KFService"
metadata:
name: "myModel"
spec:
minReplicas: 1
maxReplicas: 2
tensorflow:
modelUri: "gs://mybucket/mymodel"
protocol: seldon # could also be tfserving
There are a range of payload types that can be provided each with their own advantages.
Should we provide a place for metadata as part of the payload? Metadata in the API could include:
Should there be a section of the payload providing status of the prediction?
If we wish to support multiple payload protocols and at the same time handle synchronous and asynchronous use cases we may wish to contain the payload within a meta-API that defines what schema the underlying payload contains.
Cloudevents provides a generic protocol for events. By supporting cloudevents as the top level protocol we can easily create data specifications for various ML use cases at that level rather than in 1 overarching data plane definition:
In this way we can build up a set of data plane protocols for different use cases and/or handle particular existing protocols (Seldon, TFServing). CloudEvents also fits well into the Knative ecosystem.
Example:
{
  "specversion": "0.2",
  "type": "org.mlspec.serving.prediction",
  "schemaurl": "http://mlspec.org/serving/prediction.proto",
  "source": "/myclient",
  "id": "C234-1234-1234",
  "time": "2018-04-05T17:31:05Z",
  "datacontenttype": "application/json",
  "data": {
    "meta": { "hyperParam1": "value1" },
    "tensor": {
      "values": [1.0, 0.5, 0.2],
      "shape": [1, 3]
    }
  }
}
A response could be a CloudEvent:
{
  "specversion": "0.2",
  "type": "org.mlspec.serving.prediction",
  "schemaurl": "http://mlspec.org/serving/prediction.proto",
  "source": "/mymodel1",
  "id": "C234-1234-1234",
  "time": "2018-04-05T17:31:05Z",
  "datacontenttype": "application/json",
  "data": {
    "meta": {
      "model": "resetNetv1.1",
      "status": "success"
    },
    "tensor": {
      "values": [0.9],
      "shape": [1]
    }
  }
}
This may be useful in future pipeline use cases where components of the pipeline may only be interested in some subset of events. So all components may respond to and transform prediction events, but only some to reinforcement learning feedback events.
Our framework servers don't have e2e or unit tests.
It would be great to have some automated testing in place to determine that changes do not break anything:
It might be good to start with:
And eventually enable:
For e2e tests in Kubeflow, see https://github.com/kubeflow/testing#setting-up-kubeflow-test-infrastructure
/kind feature
Describe the solution you'd like
KFServing should expose a consistent way to download models across inference servers and clouds. The current implementation depends on the features that individual inference servers expose. E.g. see #137.
Anything else you would like to add:
Proposed solution design is documented here: https://docs.google.com/document/d/1xqBOkoQ6Vzc5gv4O5MgVVNE3qILbKuMkC-DN5zp5w28/edit?usp=sharing
GKE uses nodeselectors to choose GPU type https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#multiple_gpus.
This requires a mutating admission controller to mutate the underlying Knative deployment to use a node selector. This behavior isn't currently supported directly by Knative.
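For reference, the pod-level fields such a webhook would need to inject on GKE look roughly like this (the accelerator type is just an example):
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4   # GKE GPU node label
  containers:
  - name: model-server
    resources:
      limits:
        nvidia.com/gpu: 1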
/kind feature
Describe the solution you'd like
Extend kfserver and implement the load/predict functions for a PyTorch model server.
/kind feature
This is to fix the TODOs left in kfservice_status.go to propagate configuration and route conditions to the KFService.
https://github.com/kubeflow/kfserving/blob/master/pkg/apis/serving/v1alpha1/kfservice_status.go#L51
/kind feature
Describe the solution you'd like
TensorFlow images are already built; we need to wire them up.
XGBoost may need some work in the KFService container.
/kind feature
Describe the solution you'd like
KFServing should support on-prem clusters, since there are use cases provisioned on-premise where the trained model is stored in local storage. It would be better to have an easy way to configure local storage such as a PV/PVC and then allow modelUri to point to a local path mounted from the PVC.
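One possible shape for this, assuming a pvc:// URI scheme where the controller mounts the named claim into the serving container (the scheme and names are a sketch only):
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "KFService"
metadata:
  name: "onprem-model"
spec:
  tensorflow:
    modelUri: "pvc://models-pvc/path/to/mymodel"   # <claim-name>/<path inside the volume>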
/kind feature
Describe the solution you'd like
In addition to making all images configurable with a config map, we need a local developer story. We should have something along the lines of a kustomize patch that overrides a local configmap to point to the developer's modified framework images.
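A sketch of such an overlay, assuming the framework images live in a ConfigMap (all names here are hypothetical):
# dev-overlay/kustomization.yaml
resources:
- ../default
patchesStrategicMerge:
- configmap_image_patch.yaml
# dev-overlay/configmap_image_patch.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kfserving-config          # hypothetical name of the images ConfigMap
  namespace: kfserving-system
data:
  sklearnImage: dev.local/sklearnserver:latest   # developer's locally built images
  xgboostImage: dev.local/xgbserver:latest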
/kind feature
Describe the solution you'd like
The TensorRT spec can only be used with models in GCS. It would be nice to allow models in S3 or Azure blobs, or models that can be mounted.
One possibility is to create an init container to download and expose the models to the server as a mount. This would allow us to easily add support for a range of sources in a way that would work for all servers.
Knative supports PodSpec, so this is possible, but it will require us to modify the frameworkhandler interface method CreateModelServingContainer to become CreateModelServingPod. The user interface (e.g. CustomSpec) will remain unchanged (i.e. this doesn't mean we allow users to give us pods).
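A minimal sketch of the init-container approach with a shared emptyDir volume (the downloader image, its flags, and the TensorRT server invocation are illustrative):
spec:
  volumes:
  - name: model-store
    emptyDir: {}
  initContainers:
  - name: model-downloader
    image: myorg/model-downloader:latest            # hypothetical downloader supporting gs://, s3://, https://
    args: ["s3://mybucket/mymodel", "/mnt/models"]
    volumeMounts:
    - name: model-store
      mountPath: /mnt/models
  containers:
  - name: model-server
    image: nvcr.io/nvidia/tensorrtserver:19.05-py3  # example server image
    args: ["trtserver", "--model-store=/mnt/models"]
    volumeMounts:
    - name: model-store
      mountPath: /mnt/models
      readOnly: true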
Anything else you would like to add:
Related issue opened on TensorRTIS triton-inference-server/server#324
Just wondering how many kinds of modelUri are supported?
I can see from the example that Google Cloud Storage is supported: gs://
What about an HDFS path like "hdfs://192.168.11.4:9000/testdata/saved_model_half_plus_two_cpu"?
Any others?
Thanks
Ben
/kind feature
Describe the solution you'd like
Allow users to use ONNX models in KFServing
Anything else you would like to add:
TensorRT appears to be adding support for ONNX too... so it's possible that we may be able to use the TensorRT Inference Server for this. Ref: triton-inference-server/server#293
/kind feature
Describe the solution you'd like
Use klog, the permanent fork of glog for Kubernetes, instead of the current log package.
ref:
kubernetes/kubernetes#70264
https://github.com/kubernetes/klog/blob/master/README.md
Anything else you would like to add:
/cc @ellis-bigelow
TensorFlow deployed pods take forever to die. This is probably due to signal mishandling in TF Serving. Perhaps we should be more aggressive with shutdown logic on our pods.
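Short of fixing the signal handling in TF Serving itself, one blunt instrument is to shorten the pod's termination grace period so the kubelet sends SIGKILL sooner; a sketch (the right value is debatable):
spec:
  terminationGracePeriodSeconds: 10    # kubelet force-kills after this window if SIGTERM is ignored
  containers:
  - name: tensorflow-serving
    image: tensorflow/serving:1.13.0   # example image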
The following things might be good candidates to add to config maps:
/kind feature
Describe the solution you'd like
TensorRT Inference Server is an ML serving framework for DL supporting multiple frontends (TF, ONNX, etc.), multi-model serving, GCS/NFS integration (S3 etc. still needed), and metrics.
They also have a protocol: https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/api.proto
It might make sense to support TensorRT as a top level framework. We may eventually wish to support multiple frontends (i.e., user submits a saved model.pb, we transform it to a tensorrt model in an init container, and then serve).
We may also want to consider adopting/modifying the TensorRT protocol.
There are now several inference servers, such as TensorRT Inference Server, GraphPipe, TensorFlow Serving, and so on. Different users may want to use different servers, so I think we should support multiple servers.
BTW, some servers support serving multiple models and multiple frameworks. For example, GraphPipe supports TensorFlow, PyTorch and Caffe. Maybe we should also investigate how to support co-serving multiple models in one serving CRD.
In order to read models from GCS/S3 storage, we need to support credentials.
E.g. for S3 we need to create a secret and reference it as environment variables.
kubectl create secret generic aws-creds --from-literal=awsAccessKeyID=${AWS_ACCESS_KEY_ID} \
--from-literal=awsSecretAccessKey=${AWS_SECRET_ACCESS_KEY}
env:
- name: AWS_ACCESS_KEY_ID
  valueFrom:
    secretKeyRef:
      key: awsAccessKeyID
      name: aws-creds
- name: AWS_SECRET_ACCESS_KEY
  valueFrom:
    secretKeyRef:
      key: awsSecretAccessKey
      name: aws-creds
Currently has none and just returns nil
We need to set sane defaults for resource requirements and runtime versions. We will likely need other defaults in the future. This is best implemented with an admission controller.
We need performant model servers for each of these technologies and auto-wiring via high-level specs analogous to TensorflowSpec.
When creating KFServices such as xgboost, the modelName is taken to be metadata.Name. The user then needs to specify this twice in an API call. For example, in the xgboost sample:
MODEL_NAME=xgboost-iris
curl -v -H "Host: xgboost-iris.default.svc.cluster.local" http://$CLUSTER_IP/models/$MODEL_NAME:predict -d $INPUT_PATH
Could we instead add the modelName to the path as we route the request? e.g. curl -v -H "Host: xgboost-iris.default.svc.cluster.local" http://$CLUSTER_IP/predict
On the other hand, if users can supply an arbitrary modelName, they could try other model names that might happen to be loaded on that server, which may not be what we want.
/kind feature
Describe the solution you'd like
As part of KFServing, we should optionally be able to deploy a MinIO server for dev/test needs. It's impractical to expect every user to have GCS/S3 accounts. In the future, this can also simplify our test infrastructure.
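A minimal dev/test MinIO sketch (namespace, credentials, and image tag are placeholders, not a hardened setup):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
  namespace: kfserving-system        # placeholder namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
      - name: minio
        image: minio/minio:latest
        args: ["server", "/data"]
        env:
        - name: MINIO_ACCESS_KEY
          value: minio               # dev-only credentials
        - name: MINIO_SECRET_KEY
          value: minio123
        ports:
        - containerPort: 9000
---
apiVersion: v1
kind: Service
metadata:
  name: minio-service
  namespace: kfserving-system
spec:
  selector:
    app: minio
  ports:
  - port: 9000
    targetPort: 9000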
/kind feature
Describe the solution you'd like
Roll out KFServices upon config map changes. It may sound a bit scary, but with Knative's safe rollout it may not be a problem, and we get a consistent view of the config map values; it is also simple to reason about.
Anything else you would like to add:
Generate deployable resource YAMLs for the KFServing deployment. This will simplify KFServing deployments for people wanting to try it out.
Something like here: https://knative.dev/docs/install/knative-custom-install/
https://github.com/knative/serving/releases/download/v0.6.0/serving.yaml
Hi, I am working on model serving these days and am interested in contributing to this repo. Is there any doc or design proposal for this repo? Are we trying to implement a Serving CRD?
For now https://github.com/kubeflow/kfserving/blob/master/pkg/apis/serving/v1alpha1/kfservice_types.go#L52-L54 has 3 fields for Tensorflow, XGBoost and SKLearn models; in fact, TensorflowSpec, XGBoostSpec and SKLearnSpec share the same structure. How about refactoring the type as below to make ModelSpec thin:
type ModelSpec struct {
	// Service Account Name
	ServiceAccountName string `json:"serviceAccountName,omitempty"`
	// Minimum number of replicas, pods won't scale down to 0 in case of no traffic
	MinReplicas int `json:"minReplicas,omitempty"`
	// This is the upper bound for the autoscaler to scale to
	MaxReplicas int `json:"maxReplicas,omitempty"`
	// The following fields follow a "1-of" semantic. Users must specify exactly one spec.
	Custom        *CustomSpec    `json:"custom,omitempty"`
	ModelTemplate *ModelTemplate `json:"modelTemplate,omitempty"`
}

type ModelTemplate struct {
	FrameworkName string `json:"frameworkName"`
	ModelURI      string `json:"modelUri"`
	// Defaults to latest Version.
	RuntimeVersion string `json:"runtimeVersion,omitempty"`
	// Defaults to requests and limits of 1CPU, 2Gb MEM.
	Resources v1.ResourceRequirements `json:"resources,omitempty"`
}
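If this refactor landed, a KFService using the generic template might look like the following (purely illustrative of the proposal above, not an existing API):
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "KFService"
metadata:
  name: "my-xgboost-model"
spec:
  minReplicas: 1
  maxReplicas: 3
  modelTemplate:
    frameworkName: xgboost
    modelUri: "gs://mybucket/mymodel"
    runtimeVersion: "0.82"    # example; defaults to latest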
We want to allow the user to run KFServing components without the dependency on Knative.
An initial implementation could:
➜ kfserving git:(admission) ✗ make docker-build
go generate ./pkg/... ./cmd/...
go fmt ./pkg/... ./cmd/...
go vet ./pkg/... ./cmd/...
go run vendor/sigs.k8s.io/controller-tools/cmd/controller-gen/main.go all
CRD manifests generated under '/Users/ellisbigelow/go/src/github.com/kubeflow/kfserving/config/crds'
RBAC manifests generated under '/Users/ellisbigelow/go/src/github.com/kubeflow/kfserving/config/rbac'
go test ./pkg/... ./cmd/... -coverprofile cover.out
? github.com/kubeflow/kfserving/pkg/apis [no test files]
? github.com/kubeflow/kfserving/pkg/apis/serving [no test files]
ok github.com/kubeflow/kfserving/pkg/apis/serving/v1alpha1 8.408s coverage: 42.5% of statements
? github.com/kubeflow/kfserving/pkg/controller [no test files]
--- FAIL: TestReconcile (0.29s)
kfservice_controller_test.go:185:
Unexpected error:
<*errors.StatusError | 0xc0002d8b40>: {
ErrStatus: {
TypeMeta: {Kind: "", APIVersion: ""},
ListMeta: {SelfLink: "", ResourceVersion: "", Continue: ""},
Status: "Failure",
Message: "Operation cannot be fulfilled on services.serving.knative.dev \"foo\": the object has been modified; please apply your changes to the latest version and try again",
Reason: "Conflict",
Details: {
Name: "foo",
Group: "serving.knative.dev",
Kind: "services",
UID: "",
Causes: nil,
RetryAfterSeconds: 0,
},
Code: 409,
},
}
Operation cannot be fulfilled on services.serving.knative.dev "foo": the object has been modified; please apply your changes to the latest version and try again
occurred
FAIL
coverage: 72.0% of statements
FAIL github.com/kubeflow/kfserving/pkg/controller/kfservice 7.865s
? github.com/kubeflow/kfserving/pkg/frameworks/tensorflow [no test files]
ok github.com/kubeflow/kfserving/pkg/reconciler/ksvc 7.927s coverage: 81.8% of statements
ok github.com/kubeflow/kfserving/pkg/reconciler/ksvc/resources 0.437s coverage: 100.0% of statements
? github.com/kubeflow/kfserving/pkg/webhook [no test files]
? github.com/kubeflow/kfserving/pkg/webhook/admission/kfservice [no test files]
? github.com/kubeflow/kfserving/cmd/manager [no test files]
make: *** [test] Error 1
/kind feature
Describe the solution you'd like
Similarly to abstracting Knative and Istio implementation details, KFServing could present performance telemetry in an easily digestible form. I'd like to start a discussion on acceptable naming and metrics setup for KFServing as a first step towards having detailed instrumentation.
In our use case we have defined the following latency metrics:
Anything else you would like to add:
I'm looking to involve the KFServing community in determining what a useful set of metrics would look like. If these can be standardized at the level of KFServing, it would be a great base to build on.
CC @rakelkar
I'm interested in the kfserving repo and would like a quick start for it. There seems to be no basic operation introduction in the README. Any plan to deliver that, or could you please share the design document? Thanks a lot!