Comments (14)

ellistarn commented on July 26, 2024

Great start here, Clive.

There's a lot to respond to and a lot to research for myself (cloudevents).

  1. It looks like there might be some overlap between HTTP headers and GRPC protocol. How do we intend to model this? Should we avoid all HTTP headers and do everything in the payload? That seems a little odd
  2. Istio returns some general-purpose HTTP headers (x-envoy-backend-time). Should we view these as extraneous and not part of the spec?
  3. Have you looked at https://github.com/grpc-ecosystem/grpc-gateway? It would be great to define once in proto and get GRPC + HTTP + Events
  4. We should probably include a section to turn on/off features like:
    1. explanation
    2. skew detection
  5. What about request batching? TFServing offers instances, which can really improve performance.

ellistarn commented on July 26, 2024

Also -- does this resonate with you at all? I think we'll need the "orchestrator" to impedance match different servers (e.g. tf serving). We'll also need it for enabling other features. If I understand Seldon correctly, this is what's going on under the hood.

DataPlane

ukclivecox commented on July 26, 2024

Interesting questions.

1. It looks like there might be some overlap between HTTP headers and GRPC protocol. How do we intend to model this? Should we avoid all HTTP headers and do everything in the payload? That seems a little odd

I would suggest everything is in the payload for clarity unless there is some reason to provide both?

2. Istio returns some general-purpose HTTP headers (x-envoy-backend-time). Should we view these as extraneous and not part of the spec?

I think this is the same issue as 1 above, no? If we allow headers in HTTP then other protocols such as gRPC will be different, as will payloads entering via Kafka, etc.

3. Have you looked at https://github.com/grpc-ecosystem/grpc-gateway? It would be great to define once in proto and get GRPC + HTTP + Events

Sounds like that could be a good way to go with everything defined in protos.

4. We should probably include a section to turn on/off features like:

Not sure what you are referring to by "section" - do you mean in the control plane? If so, yes, I assume we should discuss elsewhere how to add these extra features to a kfservice, either via a new CRD or extensions to the kfservice.

   1. explanation

I would assume a particular proto for explanations. At Seldon we have quite a few ideas on this and can open an issue for this?

   2. skew detection

Similar to above.

5. What about request batching? TFServing offers `instances`, which can really improve performance.

I think that would be good, though maybe a later consideration? This sort of meta-prediction-server algorithm could be useful across all the underlying servers.

Batch predictions I would assume would be handled by requiring a shape of rank at least 2, so the first dimension is the batch size; the smallest batch shape would be (1, X)?
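
For illustration, a minimal numpy sketch of that convention (the arrays and shapes here are made up):

# Requests always carry a leading batch dimension, even for a single item.
import numpy as np

single = np.array([[0.5, 1.2, 3.4]])         # shape (1, 3): batch of one
batch = np.vstack([single, single, single])  # shape (3, 3): batch of three

assert single.shape == (1, 3)
assert batch.shape == (3, 3)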

ukclivecox commented on July 26, 2024

Also -- does this resonate with you at all? I think we'll need the "orchestrator" to impedance match different servers (e.g. tf serving). We'll also need it for enabling other features. If I understand Seldon correctly, this is what's going on under the hood.

DataPlane

Sounds interesting. A "routing" orchestrator that then merged the results into a single final message could be a good logical component.

One thing I'm less sure of is whether "explanations" can be treated in the same way, as usually I would see that as an external component that calls the model server for a given request. Most recent black-box model explanation algorithms will need to call the model maybe 1000s of times per request. That model could be a copy of the target model, I suppose. Something to discuss. @arnaudvl

Skew/concept drift detection may also depend on a source of truth for whether particular predictions are correct or not. I suggest again that we create a separate issue for the techniques here. @jklaise at Seldon has done work on this.

yuzisun commented on July 26, 2024
1. It looks like there might be some overlap between HTTP headers and GRPC protocol. How do we intend to model this? Should we avoid all HTTP headers and do everything in the payload? That seems a little odd

I would suggest everything is in the payload for clarity unless there is some reason to provide both?

2. Istio returns some general-purpose HTTP headers (x-envoy-backend-time). Should we view these as extraneous and not part of the spec?

I think this is the same issue as 1 above, no? If we allow headers in HTTP then other protocols such as gRPC will be different, as will payloads entering via Kafka, etc.

I think some use cases may rely on HTTP headers, such as authentication and distributed tracing. Can HTTP headers get translated to gRPC metadata? I know for Kafka the headers can be translated to record headers.
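
As a rough sketch of that translation (the allowlist of forwarded headers here is hypothetical, not part of any spec):

# Forward selected HTTP headers as gRPC metadata key/value pairs.
FORWARD_HEADERS = {"authorization", "x-request-id", "x-b3-traceid"}

def http_headers_to_grpc_metadata(headers: dict) -> list:
    # gRPC metadata keys must be lowercase; values pass through unchanged.
    return [(k.lower(), v) for k, v in headers.items()
            if k.lower() in FORWARD_HEADERS]

metadata = http_headers_to_grpc_metadata({
    "Authorization": "Bearer abc123",
    "X-Request-Id": "42",
    "Content-Type": "application/json",  # dropped: not in the allowlist
})
# metadata == [("authorization", "Bearer abc123"), ("x-request-id", "42")]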

rakelkar commented on July 26, 2024

Cool! Lots to learn - cloudevents, kn queue and orchestrator!

Have a very basic question to start - what does the first goal mean exactly?

Provide a data payload that a range of predefined ML Servers including TFServing, XGBoost, SkLearn can interface to handle prediction requests and return responses to an external client.

If I am writing a new model service that is, say, TF, but I'm using Python and have my own pre- and post-processing logic, what schemas should I use? Will this spec (and associated tools or protos) guide me in some direction? If not, and I invent my own request/response schema, what do I need to do to play nicely with KFServing, and what benefits can I expect to see for my efforts?

ukclivecox commented on July 26, 2024

@rakelkar

To answer in general: the user should understand how they can deliver their features for prediction in a flexible payload that provides some guidance on the structure - e.g. TensorProto or NDArray.

For the core (XGBoost, SKlearn, TFServing) servers I assume they would accept a small set of standard core payloads as suggested - but that's up for discussion.

For custom components the user could choose the schema we offer or optionally use their own, but with less built-in functionality if they opt for their own custom schema. So Explainers, Outlier Detectors etc. provided out of the box would only work with the defined schema.

Future chaining of components into a pipeline would only work if all components understand the same schema.
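
For illustration, a hypothetical NDArray-style payload written as plain dicts (the field names are made up, not the agreed spec):

# A standard core payload might carry named features plus an ndarray whose
# first dimension is the batch.
request = {
    "data": {
        "names": ["f0", "f1", "f2"],   # optional feature names
        "ndarray": [[1.0, 2.0, 3.0]],  # shape (1, 3): batch of one
    }
}

response = {
    "data": {
        "names": ["proba"],
        "ndarray": [[0.92]],
    }
}

Out-of-the-box Explainers or Outlier Detectors could then assume this structure, while a custom schema would opt out of them.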

DavidLangworthy commented on July 26, 2024

Following up on @rakelkar's question and @cliveseldon's response, how would this concrete example work? A data scientist has provided me with a text classifier trained in TensorFlow. The pre-processing logic is written in Python. It takes the input string, tokenizes it, matches it against a vocabulary dictionary loaded from a CSV file, does something with UNKs and stop words, turns this into a vector (1xN tensor), then passes that to a TFServing instance with the embedding and RNN. It then takes the resulting logits and does some magic to map them into categories expressed as strings, e.g. dog, cat, meeting, question, angry, happy, and returns a vector of these strings. It is fine to pass the input string and the output strings as raw strings, strings wrapped in JSON, or strings through protobuf.
How would this work?
I have been assuming that the input and output processing is some kind of transformer and the TensorFlow model is a model. Does this make sense? Can it be supported in the data plane spec, or does this require some kind of high-level pipeline spec?
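
To make the shape of that pipeline concrete, a toy sketch of the transformer/model split (every name, vocabulary, and payload here is hypothetical; the TFServing call is stubbed out):

# Transformer: string -> 1xN tensor; Model: tensor -> logits; then back.
VOCAB = {"hello": 1, "meeting": 2}  # loaded from a CSV file in practice
UNK = 0
LABELS = ["dog", "cat", "meeting", "question", "angry", "happy"]

def preprocess(text: str) -> list:
    # Tokenize and map to vocabulary ids; unknown tokens become UNK.
    return [[VOCAB.get(tok, UNK) for tok in text.lower().split()]]

def predict(tensor: list) -> list:
    # Stand-in for the TFServing call; returns fake logits per batch item.
    return [[0.1, 0.2, 0.9, 0.0, 0.0, 0.0] for _ in tensor]

def postprocess(logits: list) -> list:
    # Map each row of logits to its highest-scoring label string.
    return [LABELS[row.index(max(row))] for row in logits]

print(postprocess(predict(preprocess("hello meeting"))))  # ['meeting']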

ukclivecox commented on July 26, 2024

Seldon Core's present way of doing this is

  1. To have a generic enough data plane that those data types can be transferred (binary, string, TensorProto).
  2. To have a pipeline of connected components between which these payloads are passed.

So this would be part of the discussion for the control plane next steps: how KFService components can be combined, either with Seldon or a new Knative Pipeline CRD, to provide routing.

yuzisun commented on July 26, 2024

A few of my thoughts here.

  • Pre-processing/transformer and predictor (e.g. TFServing) components expect different data schemas: for a data transformer the input could be raw event data and the output features, while for a predictor the input would be features such as a fixed-shape tensor or sparse vectors. When raw events come in we need to convert them to cloud events and deliver them to the target channel or service defined in the pipeline.
  • I am wondering if we can have a schema/eventType registry for all the inference pipeline event types and schemas. For example:
eventType: imagePrediction
schemaUrl: http://github.com/test/image.proto
dataType: application/json
dataSchema:
  "x": {
    "dtype": "DT_FLOAT",
    "tensor_shape": {
      "dim": [
        { "size": "28", "name": "" },
        { "size": "28", "name": "" }
      ]
    }
  }

For an ML inference pipeline there can be event types such as raw events, event features, feedback, etc., and we can use the event type for filtering, since a particular ML service such as concept drift detection may only be interested in feedback events.
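
A toy sketch of such a registry and event-type filtering (all entries and structures here are hypothetical):

# Registry keyed by eventType; entries mirror the example above.
REGISTRY = {
    "imagePrediction": {
        "schemaUrl": "http://github.com/test/image.proto",
        "dataType": "application/json",
    },
    "feedback": {
        "schemaUrl": "http://github.com/test/feedback.proto",
        "dataType": "application/json",
    },
}

def interested(event: dict, wanted_types: set) -> bool:
    # A service (e.g. concept drift detection) subscribes by event type.
    return event.get("eventType") in wanted_types

events = [{"eventType": "imagePrediction"}, {"eventType": "feedback"}]
drift_events = [e for e in events if interested(e, {"feedback"})]
# drift_events == [{"eventType": "feedback"}]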

ukclivecox commented on July 26, 2024

@yuzisun This sounds good. So, a registry for event types and schemas. A few questions:

  • Where would we see common metadata going? In each schema, or something at the transport layer - cloudevents/HTTP headers for example?
  • For the out-of-the-box servers (XGBoost, SKLearn) would we expect their event types/schemas to be at quite a general level, like eventType: prediction, schemaUrl: http://tensorflow.org/tensor.proto?
  • I assume application-specific servers like image classification would take something as specific as you have in your example?

ellistarn commented on July 26, 2024

I feel like we may be conflating too many things here.

I think we have 3 high level protocols:

  • Events
  • REST
  • GRPC

I also expect that these will have pretty significant divergence. It may be easier to separate the definitions of these and not try to unify them. I feel like munging a cloudevents syntax into an HTTP request, or dropping HTTP headers in favor of gRPC, will have an impact on the functionality.

HTTP has standards for "cacheable" responses: https://developer.mozilla.org/en-US/docs/Glossary/cacheable. We want to make sure we conform to these HTTP standards, even if they don't make sense for GRPC.

Perhaps we can first align on a high-level structure that all protocols can agree on, like:

{
  metadata: MetadataSpec,
  data: DataSpec
}

and then determine what MetadataSpec and DataSpec are independently (e.g., if we need different dataspecs for tensor.proto, image.proto, etc).

If we need GRPC or Events or HTTP specific fields, they can be protocol specific and layered on top.
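
As a sketch of that layering (a Python stand-in for the eventual protos; all class and field names here are hypothetical):

from dataclasses import dataclass, field

@dataclass
class MetadataSpec:
    model_name: str
    request_id: str = ""

@dataclass
class DataSpec:
    content_type: str = "tensor.proto"  # or image.proto, etc.
    payload: bytes = b""

@dataclass
class InferenceRequest:
    # The shared core every protocol agrees on.
    metadata: MetadataSpec
    data: DataSpec
    # Protocol-specific fields layered on top, outside the shared core.
    extensions: dict = field(default_factory=dict)

req = InferenceRequest(
    MetadataSpec("mnist"), DataSpec(),
    extensions={"http": {"cache-control": "no-cache"}},
)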

Also -- maybe this is a crazy idea, but is it worth looking to the Kubernetes resource model for our schema (e.g., metadata, spec, labels, annotations, apiVersion, etc.)?

ukclivecox commented on July 26, 2024

@ellis-bigelow I agree, it makes sense to come to some consensus on the core parts of any schema.

However, I think we should also consider how control-plane components advertise what schemas they support - though this may be a separate issue for the control plane resource definitions, connecting with the wider Kubeflow story on metadata storage, usage and enforcement.

ellistarn commented on July 26, 2024

Yeah absolutely. It would be great to be able to poll something like
/protocolz/openapi
/protocolz/grpc
/protocolz/cloudevents

This is the type of thing we need to implement in the orchestrator component imo.
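
A minimal sketch of such a discovery endpoint using only the Python standard library (the paths follow the suggestion above; the response payloads are hypothetical):

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

PROTOCOLS = {
    "/protocolz/openapi": {"spec": "openapi", "version": "3.0"},
    "/protocolz/grpc": {"spec": "grpc", "protoUrl": "http://example.com/inference.proto"},
    "/protocolz/cloudevents": {"spec": "cloudevents", "version": "1.0"},
}

class ProtocolzHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the advertised protocol description, or 404 for unknown paths.
        body = PROTOCOLS.get(self.path)
        self.send_response(200 if body else 404)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(body or {}).encode())

# HTTPServer(("", 8080), ProtocolzHandler).serve_forever()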
