servicemeshinterface / smi-spec

Service Mesh Interface

Home Page: https://smi-spec.io

License: Apache License 2.0

Makefile 100.00%

service-mesh cncf servicemesh smi

smi-spec's Issues

Canary and namespace interactions

Opening this issue to track discussion raised by @grampelberg in #2

How should this interact with namespaces? One of the downsides to the current workflow is that deployment names end up changing and require a tool such as helm or kustomize. By allowing canaries between namespaces, it would be possible to keep names identical and simply clean up namespaces as new versions come out.

Recommended Best Practices

#26 has got me thinking about documentation for the spec and how to use it as a user. There are definitely best practices emerging as we start in on implementation. How you go about actually doing a canary rollout with traffic split is a good one. Another example is the implications around service accounts, identity and access control.

What do folks think is the best way to go about that? Add docs to the repo? Do something on the website? Add them on a per-implementation basis?

Some of this should obviously be managed by code, such as an OPA-based admission controller that enforces best practices. We should probably open some separate issues to work on those types of solutions as well.

Transfer priorities from agenda to project board

https://docs.google.com/document/d/1NTBaJf6LhUBlF8_lfvBBt_MbyPvT-6CZNg6Ckpm_yCo/edit#heading=h.wpaq1r94m39s

Conformance Test Tool

Like other specifications, SMI needs a set of conformance tests, conformance utility and dashboard.

Meshery used as the underlying technology for conformance tests with these goals and acknowledgments:

A public, third-party compatibility matrix identifying the SMI features that are supported per service mesh.
Definition of what behavior is expected and conforms to spec. vs not. Partial conformance could be defined by a minimum requirement. It may be that partial conformance allows for / encourages extensibility.
Difference between conformance and full implementation, given that some meshes may consciously never fully implement some functions.

Meshery is soon to be used in the release process of each of the major service mesh projects for performance testing and as such can perform SMI validation testing at the same time.

Design spec

TrafficSplit Question: Are weights in resource.Quantity?

apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: my-weights
spec:
  # The root service that clients use to connect to the destination application.
  service: numbers
  # Services inside the namespace with their own selectors, endpoints and configuration.
  backends:
  - service: one
    # Identical to resources, 1 = 1000m
    weight: 10m
  - service: two
    weight: 100m
  - service: three
    weight: 1500m

Are the weights defined in the above TrafficSplit object Kubernetes resource.Quantity values?
Also, does weight support all the various suffixes supported by resource.Quantity, or does the spec only support m (a.k.a. milli) right now?

If it's full-fledged Quantity then I think servicemeshinterface/smi-sdk-go#6 makes sense.
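To make the question concrete, here is a minimal Python sketch of how the example weights could be read if only plain numbers and the m (milli) suffix are accepted. That restriction is an assumption for illustration; the full resource.Quantity grammar has more suffixes and is not covered here. The function names are hypothetical, and the normalisation (each weight divided by the sum of all weights) is one plausible reading, not something the spec text above confirms.

```python
from fractions import Fraction


def parse_weight(value):
    """Parse a weight such as 1, "1", "10m" or "1500m" into exact units.

    Assumption: only plain numbers and the milli suffix "m" are handled,
    so 1 == 1000m. This is not the full resource.Quantity grammar.
    """
    text = str(value).strip()
    if text.endswith("m"):
        return Fraction(text[:-1]) / 1000
    return Fraction(text)


def traffic_shares(backends):
    """Return each backend's fraction of traffic: weight / sum of weights."""
    weights = {name: parse_weight(w) for name, w in backends.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one backend needs a non-zero weight")
    return {name: w / total for name, w in weights.items()}


# Weights taken from the TrafficSplit example above.
shares = traffic_shares({"one": "10m", "two": "100m", "three": "1500m"})
for name, share in shares.items():
    print(f"{name}: {float(share):.2%}")
# one: 0.62%
# two: 6.21%
# three: 93.17%
```

Under this reading the three weights sum to 1610m, so they only become percentages after dividing by that total.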

Metrics spec mentions pods as entity to work with

The specification
https://github.com/deislabs/smi-spec/blob/master/traffic-metrics.md#specification

seems to talk about the possibility of defining metrics on specific pods.
Isn't it a bit "anti k8s" to work with specific pods that can go down and be redeployed elsewhere (with a different identifier) at any moment? What about scaling? How can someone define metrics on future scaled instances?

rename `specs` to `routes`

https://github.com/deislabs/smi-spec/blob/master/traffic-specs.md

If you replace specs with routes in the traffic access example, it becomes more readable and intuitive.

should we move smi-spec and related projects to its own github org?

ValidatingAdmissionWebhook

At launch, not all meshes will have implementations for all the APIs. It doesn't feel like the APIs will be fully implemented either. Should the specification explicitly call out providing a ValidatingAdmissionWebhook that messages users they've configured a feature that is not supported by the underlying implementation?
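As a rough illustration of what such a webhook could return, the sketch below builds an admission.k8s.io/v1 AdmissionReview response that rejects kinds the mesh does not implement. The function name, the supported_kinds set and the message wording are hypothetical; only the AdmissionReview request and response field names come from the Kubernetes admission API. Serving this over HTTPS and registering the webhook are left out.

```python
def review_response(admission_review, supported_kinds):
    """Build an AdmissionReview response for a single admission request.

    The request is allowed only if its kind is in supported_kinds
    (a hypothetical set describing what the underlying mesh implements).
    """
    request = admission_review["request"]
    kind = request["kind"]["kind"]
    response = {"uid": request["uid"], "allowed": kind in supported_kinds}
    if not response["allowed"]:
        # The status message is shown to the user by kubectl on rejection.
        response["status"] = {
            "code": 400,
            "message": f"{kind} is not supported by the installed service mesh",
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


example_review = {
    "apiVersion": "admission.k8s.io/v1",
    "kind": "AdmissionReview",
    "request": {
        "uid": "abc-123",
        "kind": {"group": "split.smi-spec.io", "version": "v1alpha1", "kind": "TrafficSplit"},
    },
}
# A mesh that (hypothetically) only implements TrafficTarget rejects the TrafficSplit.
rejected = review_response(example_review, {"TrafficTarget"})
accepted = review_response(example_review, {"TrafficSplit"})
```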

Virtual Service/Router/Balancer type concept

I don't see any type of ability to create a virtual service/router/balancer. Something that physically doesn't exist but has routing rules to get to something that does. Basically this ends up being something that looks a lot like Ingress. You might say then use ingress but the service mesh basically has to subsume ingress functionality. The reason being that ingress is north-south only but users need the same functionality east west and logically the same ACL, routing, metrics rules of north south should apply to east west.

So what I'd like to see is: I create a service foo in k8s that has no selector, then create a resource like

kind: Router
spec:
  service: foo
  routes:
  - match:
    # something referring to httpRouteGroup
    dest:
    # something to target a service (that can have traffic splitting applied)

Additional comments/questions on Traffic Access Control

Had the following questions/comments

There is a note that says authentication is handled by underlying implementation. But we need a way to explicitly say that mTLS is needed to communicate with certain services and also specify exceptions. Some of this was there in the first version of the spec.
Specifying the destination
a. Selection is happening based on service account. Multiple Pods across multiple services may have the same service account. It should be possible to also specify service names in the destination. It would be the name that customer uses to create the service.

b. The spec currently says the following

Allowing destination traffic should only be possible with permission of the service owner. Therefore, RBAC rules should be configured to control the pods which are allowed to assign the ServiceAccount defined in the TrafficTarget destination.

This addresses the concern that we don't want spurious services illegally using a service account. But when we allow a service name to be specified in the destination, client-side enforcement will be required, so it would be good to also include service-account info in the destination.

Specifying the source
a. We need to be able to specify the service names here too (that are allowed) along with the service account info
Out of scope section
Egress Policy - I think we can't keep this out of scope for too long. Istio supports Pods accessing resources outside the cluster. We may need some policies to restrict what can be accessed. Do we use Network Policies for that?
Ingress Policy - is this about accessing services from outside the cluster?

HPA

It is pretty valuable to base HPA decisions off some of the data in TrafficMetrics. What is the integration story there? How could implementations be integrated?

Clarify traffic-split weighting semantics

Consider the following:

We have a blue and green service and deployment.
The blue deployment has 1 replica.
The green deployment has 9 replicas.
There is a service, target, with a traffic split:

---
apiVersion: smi-spec.io/v1beta1
kind: TrafficSplit
metadata:
  name: target-split
spec:
  service: target
  backends:
  - service: blue
    weight: 1000m
  - service: green
    weight: 1000m

The behavior can either be:

10% of the traffic is sent to each pod; OR
50% of the traffic is sent to the blue pod and roughly 5.6% of the traffic is sent to each green pod.

I believe that the spec intends the latter, but as far as I can tell, the handling of weights is not described explicitly.
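The two readings can be written down as a small Python sketch. Both function names are made up for illustration, and the first formula is just one way to formalise "weights apply per endpoint"; neither is taken from the spec.

```python
def per_pod_share_endpoint_weighted(weights, replicas):
    """Reading 1: every pod carries its service's weight.

    Share per pod = w_s / sum over services of (w_t * replicas_t).
    """
    total = sum(weights[svc] * replicas[svc] for svc in weights)
    return {svc: weights[svc] / total for svc in weights}


def per_pod_share_service_weighted(weights, replicas):
    """Reading 2: split by service weight first, then evenly across that service's pods."""
    total = sum(weights.values())
    return {svc: (weights[svc] / total) / replicas[svc] for svc in weights}


weights = {"blue": 1000, "green": 1000}   # both backends at 1000m
replicas = {"blue": 1, "green": 9}

by_endpoint = per_pod_share_endpoint_weighted(weights, replicas)
by_service = per_pod_share_service_weighted(weights, replicas)
print(by_endpoint)  # each pod gets 0.1
print(by_service)   # blue pod gets 0.5, each green pod gets 0.5 / 9
```

With equal weights, the first reading makes the split depend entirely on replica counts, which is why the choice matters for canary rollouts.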

weights should be whole numbers

Hey folks,

In the traffic split example yaml, there are instances where weights are strings with an ending m for milli and there are other instances where weights are just whole numbers like 1 or 0. I was pretty confused when I first encountered the milli notation like many others and I couldn't find great documentation on how to use this.

I understand now that 1 == 1000m and that you'll want to take the sum of all weights in milli and then divide each service's weight by the sum to get the amount of traffic to send to a service. However, this is still pretty confusing to calculate when thinking about Istio for example. Istio doesn't use the milli notation. Weights are whole numbers where the weight must be 0-100 and the sum of the weights must be 100.

I'd like to see us standardize on what weights are and give better documentation here, so I have two points I'd like feedback on:

I'd like to propose that weights be whole numbers rather than strings. It will make weight calculation logic much easier.
I'd like for us to standardize on how we calculate weights so that a TrafficSplit resource means the same thing regardless of the service mesh implementation running under the hood. We can go the Istio route with percentage based weights if that makes sense. @grampelberg has a different point of view here to consider.

#62 converts all weight strings to whole numbers and adds a line constraining weights to whole numbers.

cc/ @nicholasjackson @lachie83 @grampelberg
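For anyone mapping whole-number weights onto a mesh that wants percentages summing to exactly 100 (as described for Istio above), here is a hedged Python sketch using the largest-remainder method. The function name is hypothetical and this is only one possible rounding strategy, not something the spec or any adapter is known to prescribe.

```python
def to_percentages(weights):
    """Convert non-negative integer weights to integer percentages summing to 100.

    Uses the largest-remainder method: floor every share, then hand the
    leftover points to the backends with the largest remainders.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    scaled = {name: w * 100 for name, w in weights.items()}
    result = {name: s // total for name, s in scaled.items()}
    leftover = 100 - sum(result.values())
    # sorted() is stable, so ties keep the original backend order.
    by_remainder = sorted(scaled, key=lambda name: scaled[name] % total, reverse=True)
    for name in by_remainder[:leftover]:
        result[name] += 1
    return result


print(to_percentages({"one": 10, "two": 100, "three": 1500}))
# {'one': 1, 'two': 6, 'three': 93}
```

Note that rounding changes the effective split for very small weights: 10 out of 1610 is about 0.6%, but becomes 1% here.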

Should SMI have an Ingress API?

I keep hearing from members of the community that "ingress and service-mesh is hard". From a quick sweep of ingress documentation across the different implementers of SMI, users either have a bunch of CRDs or a fully annotated Kubernetes Ingress in order to achieve ingress traffic to the service mesh.

Firstly - When I say Ingress, I mean North/South traffic onto the service mesh. I'm not yet concerned about East/West.

The question I would like to propose to the community is "Should SMI have an Ingress API"?

It is my position that SMI shouldn't have its own Ingress API but should instead see if we can leverage and integrate with the ongoing work in Kubernetes sig-network on what's currently known as ingress v2. The current goals of that project can be found in the API draft document. Specifically, this new set of APIs is designed to be generic and extensible.

I would like to propose that we take the time to review the ingress v2 API draft document keeping the following questions in mind:

Does this sound implementable?
Is there any other feedback we can provide on the draft document, given the perspective this group already has from implementing the SMI APIs?
Although not a primary goal, could ingress v2 provide a pathway to implementing an East/West interface? See #37

If there is sufficient interest in collaborating with the goal of eventually implementing ingress v2, I would be more than happy to represent the SMI community in the upstream ingress v2 working group.

cc @grampelberg @olix0r @nicholasjackson @michelleN @ilevine @ibuildthecloud @aanandr

add info about community meetings to readme

need a definition for weight

Streaming in SMI Metrics

Part of #69

As we add more and more APIs, we need to be able to stream those responses through WebSockets, etc. This would be really necessary for building dashboards, etc.

A common streaming format that can be specified across all APIs would be useful IMO.

Feel free to add suggestions/questions.

@grampelberg @michelleN

Implement metrics into Kiali

Solicit feedback from the Kiali team regarding the SMI metrics API to determine if there is anything missing before moving to beta.

confusing that we have 4 APIs in the README but always talk about spec having 3 parts

@bridgetkromhout brought this up

Change the README so we describe the 3 high level parts and how they map to the 4 APIs

Clarifications on traffic split

I'm referring to this document:
https://github.com/deislabs/smi-spec/blob/master/traffic-split.md

Can someone clarify this statement?
"Weighting traffic between various services is also more generally useful than driving canary releases."
what is the difference between traffic weights and canary releases that is meant here?
why is one more useful over another?
"The resource is associated with a root service" - what is a root service?
In this example:
https://github.com/deislabs/smi-spec/blob/master/traffic-split.md#specification
3 services receive the following traffic distribution: 10m, 100m, 1500m.
What is "m"? what do 10,100, 1500 mean in relation to each other? How do they form 100% of the traffic?
In this section:
https://github.com/deislabs/smi-spec/blob/master/traffic-split.md#workflow
there is an example with traffic split of 1 and 0m. what does 1 mean? all traffic? it also doesn't have the "m" suffix while zero does have it
"Weights vs percentages - the primary reason for weights is in failure situations. For example, if 50% of traffic is being sent to a service that has no healthy endpoints - what happens? Weights are simpler to reason about when the underlying applications are changing." - can you elaborate more? I still don't see the benefit in using weights despite the example.
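One way to picture the "weights vs percentages" argument is to renormalise weights over the backends that still have healthy endpoints. The Python sketch below shows that reading only; the function name and the renormalisation rule are assumptions, and the quoted spec text does not state that implementations behave this way.

```python
def effective_shares(weights, healthy):
    """Share of traffic per backend, counting only backends with healthy endpoints.

    Assumption: traffic for an unhealthy backend is redistributed to the
    remaining backends in proportion to their weights.
    """
    live = {name: w for name, w in weights.items() if name in healthy and w > 0}
    total = sum(live.values())
    if total == 0:
        return {}
    return {name: w / total for name, w in live.items()}


weights = {"a": 500, "b": 250, "c": 250}
print(effective_shares(weights, {"a", "b", "c"}))  # {'a': 0.5, 'b': 0.25, 'c': 0.25}
print(effective_shares(weights, {"a", "c"}))       # b is down: a gets 2/3, c gets 1/3
```

With fixed percentages the same failure leaves 25% of traffic with nowhere to go, while relative weights still describe a complete split.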

trafficsplit: Behavior with conflicting TrafficSplit instances

What is the expected behavior when multiple traffic split resources refer to the same apex service?

For example, given an apex service:

---
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: blue-green
spec:
  service: apex
  backends:
  - service: blue
    weight: 10m
  - service: green
    weight: 90m

---
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: red-yellow
spec:
  service: apex
  backends:
  - service: red
    weight: 10m
  - service: yellow
    weight: 90m

An implementation could choose to merge these traffic splits so it becomes effectively:

spec:
  service: apex
  backends:
  - service: blue
    weight: 10m
  - service: green
    weight: 90m
  - service: red
    weight: 10m
  - service: yellow
    weight: 90m

Or, we could dictate that this is illegal and require a validating admission controller to enforce service uniqueness.

Or, we could modify the spec to require that the TS resource name must match the apex service name...

This behavior should not remain undefined, however.
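The first two options can be sketched in a few lines of Python. The resources are plain dicts mirroring the YAML above, and both helper names are hypothetical. find_conflicts is the check a validating admission controller could run, and merge_backends is the "merge" behaviour an implementation could choose instead.

```python
from collections import defaultdict


def find_conflicts(traffic_splits):
    """Return apex services that are referenced by more than one TrafficSplit."""
    by_service = defaultdict(list)
    for ts in traffic_splits:
        by_service[ts["spec"]["service"]].append(ts["metadata"]["name"])
    return {svc: names for svc, names in by_service.items() if len(names) > 1}


def merge_backends(traffic_splits, service):
    """Concatenate the backends of every split that targets the given apex service."""
    merged = []
    for ts in traffic_splits:
        if ts["spec"]["service"] == service:
            merged.extend(ts["spec"]["backends"])
    return merged


splits = [
    {"metadata": {"name": "blue-green"},
     "spec": {"service": "apex",
              "backends": [{"service": "blue", "weight": "10m"},
                           {"service": "green", "weight": "90m"}]}},
    {"metadata": {"name": "red-yellow"},
     "spec": {"service": "apex",
              "backends": [{"service": "red", "weight": "10m"},
                           {"service": "yellow", "weight": "90m"}]}},
]

print(find_conflicts(splits))          # {'apex': ['blue-green', 'red-yellow']}
print(merge_backends(splits, "apex"))  # four backends: blue, green, red, yellow
```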

traffic metrics check in

At this point, there are both linkerd and istio implementations of the traffic metrics piece of the spec in the smi-metric repo.

@grampelberg @Pothulapati - do y'all have any thoughts or want to propose any changes at this point? Would love to hear feedback from any other folks who have implemented it as well.

common library for iptables bootstrapping logic and windows equivalent

TrafficTarget: Allow traffic from everyone

Right now a typical TrafficTarget example looks like following:

kind: TrafficTarget
apiVersion: access.smi-spec.io/v1alpha1
metadata:
  name: api-service-api
  namespace: default
destination:
  kind: ServiceAccount
  name: api-service
  namespace: default
  port: 8080
specs:
- kind: HTTPRouteGroup
  name: api-service-routes
  matches:
  - api
sources:
- kind: ServiceAccount
  name: website-service
  namespace: default
- kind: ServiceAccount
  name: payments-service
  namespace: default

src: https://github.com/deislabs/smi-spec/blob/master/traffic-access-control.md

So the above allows traffic from pods with the ServiceAccounts website-service and payments-service to destination pods with the ServiceAccount api-service. How do I allow traffic from everyone to the pods with ServiceAccount api-service? Is there a way to specify a wildcard entry in the sources section that allows traffic from anyone, without caring who is sending it?

Specify service port in TrafficSplit

A Kubernetes service can expose more than one port. To support such a service the TrafficSplit could have an optional field in the backend spec:

ClusterIP with multiple ports:

apiVersion: v1
kind: Service
metadata:
  name: frontend-primary
spec:
  type: ClusterIP
  selector:
    app: frontend-primary
  ports:
    - name: http
      port: 8080
      protocol: TCP
      targetPort: http
    - name: grpc
      port: 9090
      protocol: TCP
      targetPort: grpc

TrafficSplit for HTTP:

apiVersion: v1beta1
kind: TrafficSplit
metadata:
  name: frontend
spec:
  service: frontend
  backends:
  - service: frontend-primary
    port: 8080
    weight: 900m
  - service: frontend-canary
    port: 8080
    weight: 100m

HPA scaling on metrics

It is valuable to scale workloads on metrics other than cpu/memory. To do this today, you must use the custom metrics API and the prometheus adapter. With the SMI metrics API it should be possible to get HPA working with that API. This project should integrate HPA and SMI metrics to allow for scaling on rps as well as latency.

HTTP traffic policy

What problem are you trying to solve?

I would like to have policy for things other than access control. In particular, it would be nice to describe policy around some HTTP specific behavior such as retries, timeouts and rate limits. These all should be associated with routes and identities.

How should the problem be solved?

There should be a new policy object (ex HTTPPolicy) that associates identity, routes and policy specific to HTTP. This will be both on the client side (retries, timeouts) and server side (rate limits). It might be beneficial to have different objects for each of these behaviors to make it more clear where the policy is being applied.

clarify versioning of APIs

Having version in the header might be confusing. @bridgetkromhout brought this up as well.

Let's add a description of how versioning works in the README
Let's add that as a note/section under the header on the actual spec that describes the versions.

need to make a youtube playlist to post the meeting recordings

Clarification on backends section

It seems that there is a conceptual change here compared to Istio for example - every version of a service is a separate service with a completely separate name
check https://github.com/deislabs/smi-spec/blob/master/traffic-split.md#example-implementation

web-current and web-next - so the differentiation is merely textual in service names? so there is no one service with multiple versions, but several services with separate names?
Also, if we look at the required work process in this example, when a user wants to deploy "web-next", the current version should already be running under the name web-current, or be renamed to it.
And then web-next will at some point need to be renamed to web-current? Isn't this more maintenance? It also means fewer filter/search abilities, since those can be done on labels rather than by searching inside the strings of a name.

add appendix section on how to use weights

The CNAB spec has an Appendix section. I think we should have one too that includes a page on describing weights, what they are, where they come from, and how to use them.

fix CI on this repo

Was working before. Dunno what happened. I'll take a look.

resolving destination service accounts to workload

Currently the traffic targets define access control rules along a set of routes between a source and destination (https://github.com/deislabs/smi-spec/blob/master/traffic-access-control.md). Since we are trying to define rules such as "A can talk to B", I understand why A translates to an identity. However, I'm curious why we're using service accounts as destinations for B, instead of the service or workload itself?

I think this becomes problematic when trying to write an adapter for istio 1.4+, which has now deprecated the destination.user constraint on ServiceRole and the guidance is to migrate to workload selectors. These are simply pod selectors that may only conventionally encode service account info. In figuring out how to implement the controller for the SMI istio adapter on istio 1.4+, we'll need to either translate the service account in the target destination to a service, or translate it to a label selector -- I think this may be challenging in general. Here's how I discovered the issue, and the corresponding istio issue:

SMI adapter issue:
servicemeshinterface/smi-adapter-istio#70

Istio bug fix + discussion about path forward:
istio/istio#17430

Istio workload selector:
https://github.com/istio/api/blob/1187adbd148251b20e7cd8e91f73ebcc09ac7ef1/type/v1beta1/selector.proto

TrafficMap API in SMI Metrics

Part of #71

There is a real interest in smi-metrics directly having an end-point where a service graph is returned.
The discussion around this can be done here

Initial Questions

What should the response graph format be?
What should be the metrics that should be present at the edges level?
What should the granularity of the graph be? Service level, pod level? How should we represent multiple levels?

Feel free to add questions/suggestions in the comments.
@michelleN

Is there something like a traffic metrics adapter?

Hello,

I'm trying to understand how a consumer could use SMI to get metrics names and labels available for a given pod (or list of pods or deployment etc.), in order to later query directly those metrics using PromQL.

If I understand correctly, this is not what the TrafficMetrics kind is about, correct? Because TrafficMetrics is actually filled with queried data, so I understand it performs the prom query. If I want to get more control on how to query prometheus, I would prefer to get just the metrics names/labels mapping and build my own promQL query out of them.

Is it out of scope for SMI?

EDIT: I see this is commented in the tradeoff section. So I change my question a little bit: is there any plan to add this kind of adapter in the future?

Traffic (access) policy should support https/tcp/...

Currently the spec only mentions http.
One may want to have different policies for http and https. E.g. block all http and only allow https to pass.
Likewise for what Istio names 'tcp' - arbitrary protocols on top of tcp.
And then later also differentiations by other (recognised) protocols. E.g. allow AMQP-based messaging but block telnet.

Policy

On yesterday's call we discussed the possibility of changing the TLS spec to a higher-level Policy spec. Rather than assigning identity, we would have a specification that controls which services are allowed to communicate.

Adding this here to start a discussion, the following example would define a policy which would allow service.a to communicate with service.b and service.c. All other communication would be denied. It is assumed that allow rules would have a higher priority than deny rules, however, this would be based on the concrete implementation.

apiVersion: v1beta1
kind: Policy
name: my-policy
spec:
   allow:
   - source: service.a
     destination: service.b
   - source: service.b
     destination: service.c
   deny:
   - source: "*"
     destination: "*"

Questions:

Should the spec encourage a convention to create a consistent user experience, e.g. allow has a higher priority than deny? It could be confusing if two different control planes implemented different conventions.
Currently the spec is an MVP and should be extensible in case policy needs to consider HTTP paths, headers, verbs, etc. Should this be outlined now, or is a mention that this could be extended at a later date sufficient?

cc. @grampelberg
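To ground the priority question, here is a minimal Python sketch of the "allow beats deny" convention applied to the example policy. The function names, the use of shell-style wildcards for `*`, and the default when no rule matches are all assumptions for illustration; the proposal leaves these to the concrete implementation.

```python
from fnmatch import fnmatchcase


def rule_matches(rule, source, destination):
    """A rule matches when both its source and destination patterns match."""
    return fnmatchcase(source, rule["source"]) and fnmatchcase(destination, rule["destination"])


def is_allowed(policy, source, destination, default=False):
    """Evaluate a policy with allow rules taking priority over deny rules."""
    if any(rule_matches(r, source, destination) for r in policy.get("allow", [])):
        return True
    if any(rule_matches(r, source, destination) for r in policy.get("deny", [])):
        return False
    return default  # assumption: deny when no rule matches


# The rules from the YAML example above, as a plain dict.
policy = {
    "allow": [
        {"source": "service.a", "destination": "service.b"},
        {"source": "service.b", "destination": "service.c"},
    ],
    "deny": [{"source": "*", "destination": "*"}],
}

print(is_allowed(policy, "service.a", "service.b"))  # True
print(is_allowed(policy, "service.a", "service.c"))  # False
```

If a control plane evaluated deny first, the catch-all deny rule would block everything, which is the kind of divergence the first question is about.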

schedule TrafficPolicy and Identity call for next week

Best practice Mutating Webhook

There's a set of best practices that are pretty important when working with SMI:

separate service accounts for each resource.
validate RBAC for modification of access control policies.

It would be awesome to have either a mutating webhook to apply these best practices for users automatically or a validating admission controller that warns users they're not using best practices. This can be something that all the service meshes use as a component.
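As a starting point for the validating variant, the first best practice could be checked with something like the Python sketch below. It takes Deployment manifests as plain dicts and reports service accounts shared by more than one workload. The function name is hypothetical, and treating a missing serviceAccountName as the namespace's default account is how Kubernetes behaves, but the check itself is only an illustration.

```python
from collections import defaultdict


def shared_service_accounts(deployments):
    """Return {(namespace, service account): [deployment names]} for accounts used more than once."""
    users = defaultdict(list)
    for deployment in deployments:
        namespace = deployment["metadata"].get("namespace", "default")
        pod_spec = deployment["spec"]["template"]["spec"]
        # Pods without an explicit account run as the namespace's "default" account.
        account = pod_spec.get("serviceAccountName", "default")
        users[(namespace, account)].append(deployment["metadata"]["name"])
    return {key: names for key, names in users.items() if len(names) > 1}


def make_deployment(name, account=None):
    """Build a minimal Deployment-shaped dict for the example below."""
    pod_spec = {} if account is None else {"serviceAccountName": account}
    return {"metadata": {"name": name, "namespace": "default"},
            "spec": {"template": {"spec": pod_spec}}}


deployments = [
    make_deployment("website", "website-service"),
    make_deployment("payments", "payments-service"),
    make_deployment("reports"),   # falls back to "default"
    make_deployment("exports"),   # also "default", so it is flagged
]
print(shared_service_accounts(deployments))
# {('default', 'default'): ['reports', 'exports']}
```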

SMI slack link in readme is no longer active

Mocked impl playground

The SMI backend varies depending on the provider (Istio, Linkerd). A mocked backend would be valuable for playing with and providing feedback on the proposed API. With this we wouldn't need an actual mesh deployed, nor a fully implemented SMI impl for a specific provider. It would help potential consumers of the API (and provider devs, I think) while SMI impls are coming online.

For me personally, I'd like to see the Traffic Metrics ASAP.

This would likely also relieve pressure on providing full API specs via swagger or something like that.

Terminology section is missing

There are many different verbs and entities mentioned through the spec. However there is no terminology section to actually define them.

For example:

https://github.com/deislabs/smi-spec/blob/560631fa09e12e75d6a00a09eb2787311c0572fd/traffic-specs.md#httproutegroup
"It enumerates the routes that can be served by an application."
What is an application in this example?
"This resource allows users to incrementally direct percentages of traffic between various services"
What is a service? A Kubernetes service? the pods that have the same label that a service selects? anything else?
"Integrations can use this resource to orchestrate canary releases for new versions of software"
https://github.com/deislabs/smi-spec/blob/master/traffic-split.md#traffic-split
Software should probably be replaced with a different word here - service/app (pending the definition for those too).
But what is missing IMO is a definition for multiple versions - how SMI represents several versions of the same app/service?
Define backends
What are referential services mentioned in traffic split spec?

Demos for kubecon

Demo 1 Traffic Routing:

SMI TrafficSplit + flagger
Mesh support: Istio

Demo 2 Traffic Policy:

SMI Traffic*
Mesh support: Consul Connect

Demo 3 Billing (via Metrics):

SMI Metrics + kubecost (spinning an rps meter)
Mesh support: Linkerd

Language inconsistency in traffic metrics

Talking about here

The language here is inconsistent with the rest of the README. A direction is described as the flow of traffic from resource to edge resource. However, the to keyword here is being used to describe flow from all resources to the target resource.

Is this intended, or an oversight?

I propose:

"Finally, resource can be as general or specific as desired. For example, with
a direction of to and an empty resource, the metrics are observed at
the foo-775b9cbd88-ntxsl pod and represent all traffic to other resources."

how do we manage traffic split resources over time

How do people manage TrafficSplit resources over time? Do you just update the same TrafficSplit resource forever? Is there some pattern people have come up with that we could document?

Traffic split based on HTTP headers match conditions

In order to support A/B testing scenarios, the TrafficSplit could be extended with HTTP headers match conditions.

E.g. route Chrome users with a canary=enabled cookie or those with a X-Canary header to the canary while all the others will be routed to the primary:

kind: TrafficSplit
metadata:
  name: website
spec:
  service: website
  http:
  - backends:
    - service: website-primary
      weight: 0
    - service: website-canary
      weight: 100
    match:
    - headers:
      - user-agent: ".*Chrome.*"
      - cookie: "^(.*?;)?(canary=enabled)(;.*)?$"
    - headers:
      - x-canary: "enabled"
  - backends:
    - service: website-primary
      weight: 100
    - service: website-canary
      weight: 0
    match: {}
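A hedged Python sketch of how a proxy might evaluate this proposed resource: routes are tried in order, the entries under match are ORed, and the headers inside one entry are ANDed. That interpretation, the function name, and the choice of lower-cased header names are assumptions; the proposal above does not spell out the matching semantics.

```python
import re


def select_backends(routes, headers):
    """Return the backends of the first route whose match conditions accept the headers."""
    for route in routes:
        conditions = route.get("match") or []
        if not conditions:
            # An empty match ({} in the YAML) is treated as a catch-all.
            return route["backends"]
        for condition in conditions:
            if all(
                name in headers and re.search(pattern, headers[name])
                for entry in condition["headers"]
                for name, pattern in entry.items()
            ):
                return route["backends"]
    return []


canary = [{"service": "website-primary", "weight": 0},
          {"service": "website-canary", "weight": 100}]
primary = [{"service": "website-primary", "weight": 100},
           {"service": "website-canary", "weight": 0}]
routes = [
    {"backends": canary,
     "match": [
         {"headers": [{"user-agent": ".*Chrome.*"},
                      {"cookie": "^(.*?;)?(canary=enabled)(;.*)?$"}]},
         {"headers": [{"x-canary": "enabled"}]},
     ]},
    {"backends": primary, "match": {}},
]

chrome_canary = {"user-agent": "Mozilla/5.0 Chrome/80.0", "cookie": "canary=enabled"}
print(select_backends(routes, chrome_canary) is canary)               # True
print(select_backends(routes, {"user-agent": "Firefox"}) is primary)  # True
```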

rate limiting and circuit breaking in SMI

During the calls, multiple people have mentioned interest in seeing how we can define rate limiting and circuit breaking in the spec part of Traffic Split or in another object referenced by TrafficSplit. @grampelberg brought up the point that we need to define circuit breaking because there are multiple definitions as a first step. That's something I can take on. Does anyone have any additional thoughts on how they want to see these two things defined in SMI?

versioning the spec

Does anyone have any thoughts on versioning the spec? I think we should consider versioning each part of the spec (each API) described in the spec independently. For example, traffic split may be pretty stable at this point and ready to move on to whatever version signals stability but traffic access may still have some work. If each API is able to signal it's ready to graduate to the next marker of stability, it would give us all a better idea of where we are in terms of each part of the spec.

If this were a regular Go project like the Go SDK, I would recommend we version everything at v0.1.0 (referring to it as alpha maturity) until we reach the point of wanting to cut a v1.0.0-beta, and then graduate to v1.0.0, like we're doing with the smi-sdk-go project. However, CNAB called its pre-1.0.0 version of the spec a Working Draft until it was ready, and that's also a pretty common way to describe the state of a spec.

prior art:

cnab's versioning process as guided by the Joint Development Foundation
OCI's release process described here

Update apiVersion

All references to apiVersion: v1beta1 need to be updated to smi-spec.io/v1beta1.
