kserve / kserve

Standardized Serverless ML Inference Platform on Kubernetes

Home Page: https://kserve.github.io/website/

License: Apache License 2.0

Languages: Dockerfile 0.60%, Makefile 0.43%, Go 36.11%, Python 60.94%, Shell 1.91%, Procfile 0.01%
Topics: knative, machine-learning, model-interpretability, model-serving, istio, kubeflow, kubeflow-pipelines, artificial-intelligence, tensorflow, pytorch

kserve's Issues

[feature request] Profile models to help determine optimal resource limits

Data scientists find it hard to figure out resource limits. KFServing should make it easy to set optimal limits automatically (within user-set max bounds).

Possibly a VPA, plus something that fires off a set of warm-up or test queries against the canary, would suffice?

The Knative runtime contract also indicates that the serverless platform (Knative) is allowed to adjust resource limits, so maybe KFServing gets this for free by virtue of relying on Knative?
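As a very rough sketch (names and API version are illustrative, not a committed design), a VerticalPodAutoscaler could be attached to the deployment backing a KFService to recommend or apply limits within user-set bounds:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-kfservice-vpa          # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-kfservice-default    # the deployment backing the default spec
  updatePolicy:
    updateMode: "Off"             # recommendation only; "Auto" would apply limits
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      maxAllowed:
        cpu: "4"                  # user-set max bounds
        memory: 8Gi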

Allow Custom Components

Allow the creation of custom components, initially specified via a v1.Container.
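A minimal sketch of what this might look like, following the existing KFService YAML conventions (field names are illustrative, not final):

apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "KFService"
metadata:
  name: "my-custom-model"
spec:
  default:
    custom:
      # A plain v1.Container supplied by the user.
      container:
        image: myrepo/my-model-server:latest
        ports:
        - containerPort: 8080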

Architecture Diagram

It would be useful for developers to have an architecture diagram to assist in the development and debugging of KFServices.

verify-golint recommends incorrect go get

/kind bug

What steps did you take and what happened:

$ make                                                                   
go generate ./pkg/... ./cmd/...                        
go fmt ./pkg/... ./cmd/...                                                             
go vet ./pkg/... ./cmd/...                                        
hack/verify-golint.sh                                  
Can not find golint, install with:       
go get -u github.com/golang/lint/golint                
Makefile:52: recipe for target 'lint' failed
make: *** [lint] Error 1

What did you expect to happen:
go get golang.org/x/lint/golint
per golang/lint#415

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Istio Version:
  • Knative Version:
  • KFServing Version:
  • Kubeflow version:
  • Minikube version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

Ability to configure the "domain"

/kind feature

Describe the solution you'd like
Ability to configure the "domain" configmap

"myorg.mycluster.com" is the prefix of all serving uris
people need a wildcard domain and a cert to match
Knative is planning on autoprovisioning certs in Beta
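For reference, Knative Serving reads the serving domain from the config-domain ConfigMap in the knative-serving namespace, so exposing this (or something like it) through KFServing would look roughly like:

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-domain
  namespace: knative-serving
data:
  # All KFServices would then be served under *.myorg.mycluster.com
  myorg.mycluster.com: ""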

Promote canary to default without creating new deployment

Currently, when a user promotes the canary spec to default, a new deployment is created under the default configuration even though the same deployment already exists under the canary configuration and is serving traffic; traffic then has to switch over to the newly created deployment. Although this is automatic, it may take some time to launch the deployment, and if minReplicas is set low, Knative will spend another few seconds scaling up the pods to serve 100% of the traffic. One proposal to make canary promotion seamless: add configuration=default and configuration=canary labels to the two configurations we create for the KFService. The controller can then select configurations by label instead of by name, and on canary promotion it flips the two labels; once the canary deployment reaches 100%, it effectively becomes the default. A sketch of the labeling follows below.
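Sketch of the proposed labeling (the label key is illustrative): both Knative Configurations carry a role label that the controller selects on and swaps at promotion time.

# Before promotion
metadata:
  labels:
    serving.kubeflow.org/configuration: default   # configuration currently serving default traffic
---
metadata:
  labels:
    serving.kubeflow.org/configuration: canary    # configuration serving canary traffic
# On promotion the controller swaps the two label values instead of creating a new deployment.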

CreateKnativeService does not sync to desired state in non-happy path cases

@yuzisun The current implementation of CreateKnativeService will not sync cluster state to spec desired state in the following cases (admittedly off the happy path):

  1. User modifies the default spec while a canary is active. Default revision will remain unchanged
  2. New cluster (does not have any existing revisions) syncs serving CRD with default and canary. Will only provision canary?

I think these issues stem from the fact that this logic depends on cluster state (revision state). Can we use some other Knative mechanism to resolve this? Happy to try to fix it if someone can point me in the right Knative direction!

https://github.com/kubeflow/kfserving/blob/master/pkg/reconciler/ksvc/resources/knative_service.go

	if kfsvc.Spec.Canary == nil || kfsvc.Spec.Canary.TrafficPercent == 0 {
		//TODO(@yuzisun) should we add model name to the spec, can be different than service name?
		container = CreateModelServingContainer(kfsvc.Name, &kfsvc.Spec.Default)
		revisions = []string{knservingv1alpha1.ReleaseLatestRevisionKeyword}
		routingPercent = 0
	} else {
		container = CreateModelServingContainer(kfsvc.Name, &kfsvc.Spec.Canary.ModelSpec)
		revisions = []string{kfsvc.Status.Default.Name, knservingv1alpha1.ReleaseLatestRevisionKeyword}
		routingPercent = kfsvc.Spec.Canary.TrafficPercent
	}

Proposal: Collapse URI External/Internal to just URI

/kind feature

Describe the solution you'd like
We should just have:
status.url

Anything else you would like to add:
External URI should be delegated to the cloud provider. KFServing assumes an istio/knative fabric and will follow the domain provided. We don't need to reinvent the wheel here.

Wire up scikitlearn spec

Add a handler for the scikit-learn model type, plus the controller work to hand off to servers that support this model type. Related to #18

Strangeness with top level spec fields applying to canary/default

Currently, min/maxReplicas are set outside of the default/canary specification. This allows for strange scenarios where replica changes cannot be rolled out cleanly.

Should we move this into the canary/default specs? More generally, given that our CRD is responsible for two backend configurations, does it make sense to, by default, add new fields to canary or default instead of at the top level (e.g. auth)?

This is also somewhat challenging for any features triggered based off of annotations, as the annotations would need to specify default/canary as well, or be subject to the same weirdness.

Along these same lines, I've been thinking that it might make sense to move trafficPercent up a level. Our three top-level keys would then correspond to the three underlying Knative resources; a sketch of the resulting shape follows below.
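A hypothetical shape after both changes, for discussion only: replica bounds live inside each model spec, and trafficPercent sits at the top level next to default and canary, mirroring the two Configurations and the Route.

spec:
  trafficPercent: 10
  default:
    minReplicas: 1
    maxReplicas: 5
    tensorflow:
      modelUri: "gs://mybucket/mymodel"
  canary:
    minReplicas: 1
    maxReplicas: 2
    tensorflow:
      modelUri: "gs://mybucket/mymodel-v2"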

Revisions Concept Breaks GitOps

There has been a large discussion over email about this topic. I'm opening this up to the repo so we have history and increase participation.

Data Plane Specification - Initial Thoughts

Some early thoughts on the data plane layer to garner feedback on directions.

Goals

  • Provide a data payload that a range of predefined ML servers (including TFServing, XGBoost, SKLearn) can accept in order to handle prediction requests and return responses to an external client.
  • Allow custom predict functions to be built that can serve requests using the API.

Initial Design Questions

  1. Should we provide a new data plane spec or allow the control plane components to specify which data plane they will handle?

The reasons to not provide an initial data plane spec are:

  • We want to provide flexibility.
  • We want to utilize existing payload specifications, which will allow for speedy implementation using existing components such as Seldon's prediction payload or the TFServing payload.
  • We expect data science models to cover a wide variety of use cases, so it's unlikely that a single spec can be both generic enough and simple enough.

Given the above, the control plane specs would be extended with a protocol field, e.g.:

apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "KFService"
metadata:
  name: "myModel"
  spec:
    minReplicas: 1
    maxReplicas: 2 
    tensorflow:
      modelUri: "gs://mybucket/mymodel"
      protocol: seldon # could also be tfserving

Considerations for a New Data Plane Spec

Payloads

There are a range of payload types that can be provided each with their own advantages.

  • TensorProto
    • Standard for Tensorflow Serving
  • NDArray
    • Useful for multi-typed data and REST JSON transport
  • Simplified Tensor (shape and array of doubles)
    • Easier for REST JSON use by beginners

Metadata

Should we provide a place for metadata as part of the payload? Metadata in the API could include:

  • Unique Prediction ID
  • Arbitrary map of key/value tags
  • Metadata about the model that is serving the request
  • Metrics created

Status

Should there be a section of the payload providing status of the prediction?

  • Success/Failure
  • Error codes and human-readable descriptions

Support for CloudEvents

If we wish to support multiple payload protocols and at the same time handle synchronous and asynchronous use cases, we may wish to wrap the payload in a meta-API that declares which schema the underlying payload uses.

CloudEvents provides a generic protocol for events. By supporting CloudEvents as the top-level protocol, we can easily create data specifications for various ML use cases at that level rather than in one overarching data plane definition:

  • Prediction (existing concern for current discussion)
  • Reinforcement learning
    • Send feedback on previous predictions as to whether they were correct
    • Provide current world state and get new action

In this way we can build up a set of data plane protocols for different use cases and/or handle particular existing protocols (Seldon, TFServing). CloudEvents also fits well into the Knative ecosystem.

Example:

{
  "specversion" : "0.2",
  "type" : "org.mlspec.serving.prediction",
  "schemaurl" : "http://mlspec.org/serving/prediction.proto",
  "source" : "/myclient",
  "id" : "C234-1234-1234",
  "time" : "2018-04-05T17:31:05Z",
  "datacontenttype" : "application/json",
  "data" : {
    "meta" : { "hyperParam1" : "value1" },
    "tensor" : {
      "values" : [1.0, 0.5, 0.2],
      "shape" : [1, 3]
    }
  }
}

A response could also be a CloudEvent:

{
  "specversion" : "0.2",
  "type" : "org.mlspec.serving.prediction",
  "schemaurl" : "http://mlspec.org/serving/prediction.proto",
  "source" : "/mymodel1",
  "id" : "C234-1234-1234",
  "time" : "2018-04-05T17:31:05Z",
  "datacontenttype" : "application/json",
  "data" : {
    "meta" : {
      "model" : "resetNetv1.1",
      "status" : "success"
    },
    "tensor" : {
      "values" : [0.9],
      "shape" : [1]
    }
  }
}

This may be useful in future pipeline use cases where components of the pipeline are only interested in some subset of events: all components might respond to and transform prediction events, but only some would handle reinforcement-learning feedback events.

KFServing should have a consistent way of supporting model download across inference server implementations

/kind feature

Describe the solution you'd like
KFServing should expose a consistent way to download models across inference servers and clouds. The current implementation depends on the features that individual inference servers expose; e.g. see #137.

Anything else you would like to add:
Proposed solution design is documented here: https://docs.google.com/document/d/1xqBOkoQ6Vzc5gv4O5MgVVNE3qILbKuMkC-DN5zp5w28/edit?usp=sharing

Enable GPUs for Tensorflow and XGBoost

/kind feature

Describe the solution you'd like
TensorFlow images are already built; they just need to be wired up.
XGBoost may need some work in the KFService container.

Support loading model from local storage

/kind feature

Describe the solution you'd like
KFServing should support on-prem clusters: there are use cases provisioned on-premise where the trained model is stored in local storage. It would be better to have an easy way to configure local storage such as a PV/PVC and then allow modelUri to point to a local path mounted from the PVC. A possible shape is sketched below.
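One possible shape, purely illustrative (the pvc:// scheme and field layout are assumptions, not an agreed design):

apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "KFService"
metadata:
  name: "my-onprem-model"
spec:
  default:
    tensorflow:
      # Points at a path on an existing PersistentVolumeClaim named model-pvc.
      modelUri: "pvc://model-pvc/models/mymodel"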

Allow developer overrides for framework images via ConfigMap

/kind feature

Describe the solution you'd like
In addition to making all images configurable with a ConfigMap, we need a local developer story. We should have something along the lines of a kustomize patch that overrides a local ConfigMap to point to the developer's modified framework images.
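A rough sketch of such a developer overlay (the ConfigMap name, namespace, and data layout are illustrative, not the actual config):

# kustomization.yaml in a local dev overlay
resources:
- ../../config/default
patchesStrategicMerge:
- framework-images-patch.yaml

# framework-images-patch.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kfservice-config        # hypothetical ConfigMap holding framework image settings
  namespace: kfserving-system
data:
  frameworks: |
    {
      "tensorflow": {"image": "dev.local/tfserving"},
      "xgboost":    {"image": "dev.local/xgbserver"}
    }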

Support downloading models from S3/Blob or mounts for TensorRT

/kind feature

Describe the solution you'd like
The TensorRT spec can currently only be used with models in GCS. It would be nice to allow models in S3 or Azure Blob Storage, or models that can be mounted.

One possibility is to use an init container to download the models and expose them to the server as a mount. This would let us add support for a range of sources in a way that works for all servers (see the sketch below).

Knative supports PodSpec, so this is possible, but it will require us to change the frameworkhandler interface method CreateModelServingContainer into CreateModelServingPod. The user interface (e.g. CustomSpec) will remain unchanged (i.e. this doesn't mean we allow users to give us pods).
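A sketch of the generated pod shape under this approach (image names, args, and flags are illustrative only):

spec:
  initContainers:
  - name: model-downloader
    image: myrepo/model-downloader:latest        # hypothetical downloader image
    args: ["s3://mybucket/mymodel", "/mnt/models"]
    volumeMounts:
    - name: model-store
      mountPath: /mnt/models
  containers:
  - name: model-server
    image: nvcr.io/nvidia/tensorrtserver:latest  # or any other framework server
    args: ["trtserver", "--model-store=/mnt/models"]
    volumeMounts:
    - name: model-store
      mountPath: /mnt/models
  volumes:
  - name: model-store
    emptyDir: {}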

Anything else you would like to add:
Related issue opened on TensorRTIS triton-inference-server/server#324

How many modelUri are supported right now?

Just wondering how many kinds of modelUri are supported?

I can see from the example that Google Cloud Storage is supported: gs://

What about an HDFS path like "hdfs://192.168.11.4:9000/testdata/saved_model_half_plus_two_cpu"?

Any others?

Thanks
Ben

TFServing takes forever to shutdown

TensorFlow serving pods take forever to die. This is probably due to signal mishandling in TFServing. Perhaps we should be more aggressive with the shutdown logic on our pods.

Support for TensorRT as a first class framework

/kind feature

Describe the solution you'd like

What is TensorRT?

TensorRT is an ML serving framework for deep learning that supports multiple frontends (TF, ONNX, etc.), multi-model serving, GCS/NFS integration (S3 etc. still needed), and metrics.

They also have a protocol: https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/api.proto

How does it relate to KFServing?

It might make sense to support TensorRT as a top-level framework. We may eventually wish to support multiple frontends (i.e., the user submits a saved model.pb, we transform it to a TensorRT model in an init container, and then serve).

We may also want to consider adopting/modifying the TensorRT protocol.

Thoughts?

[feature request] Support different serving servers

There are now several inference servers, such as TensorRT Inference Server, GraphPipe, and TensorFlow Serving. Different users may want to use different servers, so I think we should support them.

BTW, some servers support serving multiple models and multiple frameworks; for example, GraphPipe supports TensorFlow, PyTorch and Caffe. Maybe we should also investigate how to support co-serving multiple models in one serving CRD.

Support credentials to load model from GCS/S3 storage

In order to read models from GCS/S3 storage, we need to support credentials.
E.g. for S3 we need to create a secret and reference it as environment variables:

kubectl create secret generic aws-creds \
  --from-literal=awsAccessKeyID=${AWS_ACCESS_KEY_ID} \
  --from-literal=awsSecretAccessKey=${AWS_SECRET_ACCESS_KEY}

env:
- name: AWS_ACCESS_KEY_ID
  valueFrom:
    secretKeyRef:
      key: awsAccessKeyID
      name: aws-creds
- name: AWS_SECRET_ACCESS_KEY
  valueFrom:
    secretKeyRef:
      key: awsSecretAccessKey
      name: aws-creds

ModelName requirement in API calls

When creating KFServices such as XGBoost, the modelName is taken to be metadata.Name. The user then needs to specify this twice in an API call. For example, in the XGBoost sample:

MODEL_NAME=xgboost-iris
curl -v -H "Host: xgboost-iris.default.svc.cluster.local" http://$CLUSTER_IP/models/$MODEL_NAME:predict -d $INPUT_PATH

  • Should we keep this requirement, or allow users to just curl their endpoint and add the modelName to the path behind the scenes as we route the request? E.g.

curl -v -H "Host: xgboost-iris.default.svc.cluster.local" http://$CLUSTER_IP/predict

  • If users can specify the modelName, they could try other model names that happen to be loaded on that server, which may not be what we want.

Add an optional deployment provision for 'minio' server

/kind feature

Describe the solution you'd like
As part of KFServing, we should optionally be able to deploy a MinIO server for dev/test needs. It's impractical to expect every user to have GCS/S3 accounts. In the future, this can also simplify our test infrastructure.
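A minimal dev/test sketch of such an optional deployment (names and credentials are placeholders, not a proposed default):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
spec:
  replicas: 1
  selector:
    matchLabels: {app: minio}
  template:
    metadata:
      labels: {app: minio}
    spec:
      containers:
      - name: minio
        image: minio/minio
        args: ["server", "/data"]
        env:
        - name: MINIO_ACCESS_KEY
          value: "minio"          # placeholder dev credentials
        - name: MINIO_SECRET_KEY
          value: "minio123"
        ports:
        - containerPort: 9000
---
apiVersion: v1
kind: Service
metadata:
  name: minio-service
spec:
  selector: {app: minio}
  ports:
  - port: 9000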

Orchestrator component

We need an orchestrator to connect serving features and impedance match between different backend interfaces.

DataPlane

Apply ConfigMap on admission webhook path and sync KFServices upon config change

/kind feature

Describe the solution you'd like

  • Load the ConfigMap on the admission webhook path. This will be useful when we add custom configurations for resources and readiness/liveness probes, and when we validate the container spec.
  • Sync all KFServices upon ConfigMap changes. It may sound a bit scary, but with Knative's safe rollout it may not be a problem; we get a consistent view of the ConfigMap values, and it is simple to reason about.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

question: The scope of the repo

Hi, I am working on model serving these days and am interested in contributing to this repo. Is there any doc or design proposal for this repo? Are we trying to implement a Serving CRD?

ModelSpec definition - thinking

Right now https://github.com/kubeflow/kfserving/blob/master/pkg/apis/serving/v1alpha1/kfservice_types.go#L52-L54 has three fields for the TensorFlow, XGBoost and SKLearn models, yet TensorflowSpec, XGBoostSpec and SKLearnSpec share the same structure. How about refactoring the type as below to keep ModelSpec thin:

type ModelSpec struct {
	// Service Account Name
	ServiceAccountName string `json:"serviceAccountName,omitempty"`
	// Minimum number of replicas; pods won't scale down to 0 in case of no traffic
	MinReplicas int `json:"minReplicas,omitempty"`
	// This is the upper bound for the autoscaler to scale to
	MaxReplicas int `json:"maxReplicas,omitempty"`
	// The following fields follow a "1-of" semantic. Users must specify exactly one spec.
	Custom        *CustomSpec    `json:"custom,omitempty"`
	ModelTemplate *ModelTemplate `json:"modelTemplate,omitempty"`
}

type ModelTemplate struct {
	FrameworkName string `json:"frameworkName"`
	ModelURI      string `json:"modelUri"`
	// Defaults to latest version.
	RuntimeVersion string `json:"runtimeVersion,omitempty"`
	// Defaults to requests and limits of 1 CPU, 2Gb memory.
	Resources v1.ResourceRequirements `json:"resources,omitempty"`
}
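For comparison, a KFService using the proposed ModelTemplate might then look like this (purely illustrative):

spec:
  default:
    minReplicas: 1
    maxReplicas: 3
    modelTemplate:
      frameworkName: tensorflow
      modelUri: "gs://mybucket/mymodel"
      runtimeVersion: "1.13.0"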

Non-KNative Resource Option

We want to allow users to run KFServing components without the dependency on Knative.
An initial implementation could:

  • Allow a non-Knative option via annotations (see the sketch below).
  • Create raw k8s Deployments and Services for the model serving components.
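A sketch of the annotation route (the annotation key is hypothetical):

apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "KFService"
metadata:
  name: "my-model"
  annotations:
    serving.kubeflow.org/raw-deployment: "true"   # hypothetical: ask the controller to emit plain Deployments/Services
spec:
  default:
    tensorflow:
      modelUri: "gs://mybucket/mymodel"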

Race condition on kfservice_controller_test

➜  kfserving git:(admission) ✗ make docker-build
go generate ./pkg/... ./cmd/...
go fmt ./pkg/... ./cmd/...
go vet ./pkg/... ./cmd/...
go run vendor/sigs.k8s.io/controller-tools/cmd/controller-gen/main.go all
CRD manifests generated under '/Users/ellisbigelow/go/src/github.com/kubeflow/kfserving/config/crds'
RBAC manifests generated under '/Users/ellisbigelow/go/src/github.com/kubeflow/kfserving/config/rbac'
go test ./pkg/... ./cmd/... -coverprofile cover.out
?   	github.com/kubeflow/kfserving/pkg/apis	[no test files]
?   	github.com/kubeflow/kfserving/pkg/apis/serving	[no test files]
ok  	github.com/kubeflow/kfserving/pkg/apis/serving/v1alpha1	8.408s	coverage: 42.5% of statements
?   	github.com/kubeflow/kfserving/pkg/controller	[no test files]
--- FAIL: TestReconcile (0.29s)
    kfservice_controller_test.go:185:
        Unexpected error:
            <*errors.StatusError | 0xc0002d8b40>: {
                ErrStatus: {
                    TypeMeta: {Kind: "", APIVersion: ""},
                    ListMeta: {SelfLink: "", ResourceVersion: "", Continue: ""},
                    Status: "Failure",
                    Message: "Operation cannot be fulfilled on services.serving.knative.dev \"foo\": the object has been modified; please apply your changes to the latest version and try again",
                    Reason: "Conflict",
                    Details: {
                        Name: "foo",
                        Group: "serving.knative.dev",
                        Kind: "services",
                        UID: "",
                        Causes: nil,
                        RetryAfterSeconds: 0,
                    },
                    Code: 409,
                },
            }
            Operation cannot be fulfilled on services.serving.knative.dev "foo": the object has been modified; please apply your changes to the latest version and try again
        occurred
FAIL
coverage: 72.0% of statements
FAIL	github.com/kubeflow/kfserving/pkg/controller/kfservice	7.865s
?   	github.com/kubeflow/kfserving/pkg/frameworks/tensorflow	[no test files]
ok  	github.com/kubeflow/kfserving/pkg/reconciler/ksvc	7.927s	coverage: 81.8% of statements
ok  	github.com/kubeflow/kfserving/pkg/reconciler/ksvc/resources	0.437s	coverage: 100.0% of statements
?   	github.com/kubeflow/kfserving/pkg/webhook	[no test files]
?   	github.com/kubeflow/kfserving/pkg/webhook/admission/kfservice	[no test files]
?   	github.com/kubeflow/kfserving/cmd/manager	[no test files]
make: *** [test] Error 1

Standardize KF Serving Performance Metrics Naming

/kind feature

Describe the solution you'd like
Similar to abstracting away Knative and Istio implementation details, KF-Serving could present performance telemetry in an easily digestible form. I'd like to start a discussion on acceptable naming and metrics setup for KF-Serving as a first step towards detailed instrumentation.

In our use case we have defined the following latency metrics:

  • End-To-End Latency – request round-trip time measured from cluster ingress
  • Routing latency – request round-trip time measured from istio ingress
  • Invoke (Knative Autoscaler) Latency – request round-trip time as measured from the Istio sidecar including Knative Autoscaler latency
  • Model (Knative Revision) Latency - request round-trip time as measured from the Istio sidecar excluding Knative Autoscaler latency

Anything else you would like to add:
I'm looking to involve the KFServing community in determining what a useful set of metrics would look like. If these can be standardized at the KFServing level, it would be a great base to build on.

CC @rakelkar

Readme file or design document

I'm interested in the kfserving repo and want a quick start. There seems to be no basic operational introduction in the README; is there any plan to deliver one? Or could you please share the design document? Thanks a lot!
