servicemeshinterface / smi-spec
Service Mesh Interface
Home Page: https://smi-spec.io
License: Apache License 2.0
Opening this issue to track discussion raised by @grampelberg in #2
How should this interact with namespaces? One of the downsides to the current workflow is that deployment names end up changing and require a tool such as helm or kustomize. By allowing canaries between namespaces, it would be possible to keep names identical and simply clean up namespaces as new versions come out.
#26 has got me thinking about documentation for the spec and how to use it as a user. There are definitely best practices emerging as we start in on implementation. How you go about actually doing a canary rollout with traffic split is a good one. Another example is the implications around service accounts, identity and access control.
What do folks think is the best way to go about that? Add docs to the repo? Do something on the website? Add them on a per-implementation basis?
Some of this should be managed by code, obviously, such as an OPA-based admission controller to enforce best practices. We should probably open up some separate issues to work on those types of solutions as well.
Like other specifications, SMI needs a set of conformance tests, a conformance utility and a dashboard.
Meshery used as the underlying technology for conformance tests with these goals and acknowledgments:
Meshery is soon to be used in the release process of each of the major service mesh projects for performance testing and as such can perform SMI validation testing at the same time.
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: my-weights
spec:
  # The root service that clients use to connect to the destination application.
  service: numbers
  # Services inside the namespace with their own selectors, endpoints and configuration.
  backends:
  - service: one
    # Identical to resources, 1 = 1000m
    weight: 10m
  - service: two
    weight: 100m
  - service: three
    weight: 1500m
In the TrafficSplit object, is the weight a full Kubernetes resource.Quantity? Or is it that the spec just supports m (a.k.a. milli) right now?
If it's a full-fledged Quantity then I think servicemeshinterface/smi-sdk-go#6 makes sense.
The specification (https://github.com/deislabs/smi-spec/blob/master/traffic-metrics.md#specification) seems to talk about the possibility of defining metrics on specific pods.
Isn't this a bit "anti k8s", working with specific pods that can go down and be redeployed elsewhere (with a different identifier) at any moment? What about scaling? How can someone define metrics on future scaled instances?
https://github.com/deislabs/smi-spec/blob/master/traffic-specs.md
If you replace specs with routes in the traffic access example, it becomes more readable and intuitive.
At launch, not all meshes will have implementations for all the APIs. It doesn't feel like the APIs will be fully implemented either. Should the specification explicitly call out providing a ValidatingAdmissionWebhook that tells users they've configured a feature that is not supported by the underlying implementation?
I don't see any type of ability to create a virtual service/router/balancer. Something that physically doesn't exist but has routing rules to get to something that does. Basically this ends up being something that looks a lot like Ingress. You might say then use ingress but the service mesh basically has to subsume ingress functionality. The reason being that ingress is north-south only but users need the same functionality east west and logically the same ACL, routing, metrics rules of north south should apply to east west.
So what I'd like to see is: I create a service foo in k8s that has no selector. Then I create a resource like
kind: Router
spec:
  service: foo
  routes:
  - match:
      # something referring to httpRouteGroup
    dest:
      # something to target a service (that can have traffic splitting applied)
Had the following questions/comments
There is a note that says authentication is handled by underlying implementation. But we need a way to explicitly say that mTLS is needed to communicate with certain services and also specify exceptions. Some of this was there in the first version of the spec.
Specifying the destination
a. Selection is happening based on service account. Multiple Pods across multiple services may have the same service account. It should be possible to also specify service names in the destination. It would be the name that customer uses to create the service.
b. The spec currently says the following
Allowing destination traffic should only be possible with permission of the service owner. Therefore, RBAC rules should be configured to control the pods which are allowed to assign the ServiceAccount defined in the TrafficTarget destination.
This addresses the concern that we don't want spurious services illegally using a service account. But when we allow a service name to be specified in the destination, client-side enforcement will be required, so it would be good to also include service account info in the destination.
Specifying the source
a. We need to be able to specify the service names here too (that are allowed) along with the service account info
Out of scope section
Egress Policy - I think we can't keep this out of scope for too long. Istio supports Pods accessing resources outside the cluster. We may need some policies to restrict what can be accessed. Do we use Network Policies for that?
Ingress Policy - is this about accessing services from outside the cluster?
It is pretty valuable to base HPA decisions off some of the data in TrafficMetrics. What is the integration story there? How could implementations be integrated?
Consider the following:
A blue and a green service and deployment.
The blue deployment has 1 replica.
The green deployment has 9 replicas.
A service, target, with a traffic split:
apiVersion: smi-spec.io/v1beta1
kind: TrafficSplit
metadata:
  name: target-split
spec:
  service: target
  backends:
  - service: blue
    weight: 1000m
  - service: green
    weight: 1000m
The behavior can either be:
I believe that the spec intends the latter, but as far as I can tell, the handling of weights is not described explicitly.
Hey folks,
In the traffic split example yaml, there are instances where weights are strings with an ending m for milli and there are other instances where weights are just whole numbers like 1 or 0. I was pretty confused when I first encountered the milli notation like many others and I couldn't find great documentation on how to use this.
I understand now that 1 == 1000m and that you'll want to take the sum of all weights in milli and then divide each service's weight by the sum to get the amount of traffic to send to a service. However, this is still pretty confusing to calculate when thinking about Istio for example. Istio doesn't use the milli notation. Weights are whole numbers where the weight must be 0-100 and the sum of the weights must be 100.
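To make the Istio comparison concrete, here is a minimal sketch of how an adapter might turn SMI-style weights into whole-number percentages that sum to 100. The function names (parse_weight, to_percentages) and the largest-remainder rounding are assumptions made for illustration; neither the spec nor any adapter is known to prescribe them.

```python
from fractions import Fraction


def parse_weight(weight):
    """Parse an SMI-style weight: '500m' is 500 milli-units, and 1 == 1000m."""
    text = str(weight).strip()
    if text.endswith("m"):
        return Fraction(int(text[:-1]), 1000)
    return Fraction(text)


def to_percentages(weights):
    """Map {service: weight} to whole-number percentages that sum to 100.

    Hypothetical rounding rule (largest remainder): floor every share, then
    give the leftover points to the services with the largest fractional part.
    """
    parsed = {name: parse_weight(w) for name, w in weights.items()}
    total = sum(parsed.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    exact = {name: w * 100 / total for name, w in parsed.items()}
    result = {name: int(share) for name, share in exact.items()}
    leftover = 100 - sum(result.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - result[n], reverse=True)
    for name in by_remainder[:leftover]:
        result[name] += 1
    return result
```

For example, to_percentages({"v1": "900m", "v2": "100m"}) gives 90 and 10, and mixing notations as in {"v1": 1, "v2": "0m"} gives 100 and 0.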
I'd like to see us standardize on what weights are and give better documentation here, so I have two points I'd like feedback on:
#62 converts all weight strings to whole numbers and adds a line constraining weights to whole numbers.
I keep hearing from members of the community that "ingress and service-mesh is hard". From a quick sweep of ingress documentation across the different implementers of SMI, users either have a bunch of CRDs or a fully annotated Kubernetes Ingress in order to achieve ingress traffic to the service mesh.
Firstly - When I say Ingress, I mean North/South traffic onto the service mesh. I'm not yet concerned about East/West.
The question I would like to propose to the community is "Should SMI have an Ingress API"?
It is my position that SMI shouldn't have its own Ingress API but instead see if we can leverage and integrate with the ongoing work that's happening in Kubernetes sig-network regarding what's currently known as ingress v2. The current goals of that project can be found in the API draft document. Specifically, this new set of APIs is designed to be generic and extensible.
I would like to propose that we take the time to review the ingress v2 API draft document keeping the following questions in mind:
Does ingress v2 provide a pathway to implementing an East/West interface? See #37
If there is sufficient interest in collaborating with the goal of eventually implementing ingress v2, I would be more than happy to represent the SMI community in the upstream ingress v2 working group.
cc @grampelberg @olix0r @nicholasjackson @michelleN @ilevine @ibuildthecloud @aanandr
Part of #69
As we are adding more and more APIs, we need to be able to stream those responses through web-sockets, etc. This would be really necessary for building dashboards, etc.
A common streaming format that can be specified across all APIs would be useful IMO.
Feel free to add suggestions/questions.
Solicit feedback from the Kiali team regarding the SMI metrics API to determine if there is anything missing before moving to beta.
@bridgetkromhout brought this up
Change the README so we describe the 3 high level parts and how they map to the 4 APIs
I'm referring to this document:
https://github.com/deislabs/smi-spec/blob/master/traffic-split.md
Can someone clarify this statement?
"Weighting traffic between various services is also more generally useful than driving canary releases."
What is the difference between traffic weights and canary releases that is meant here?
Why is one more useful than the other?
"The resource is associated with a root service" - what is a root service?
In this example:
https://github.com/deislabs/smi-spec/blob/master/traffic-split.md#specification
3 services receive the following traffic distribution: 10m, 100m, 1500m.
What is "m"? What do 10, 100 and 1500 mean in relation to each other? How do they form 100% of the traffic?
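One way to read these numbers, assuming weights are relative shares (each backend receives its weight divided by the sum of all weights) and that m means milli so that 1 == 1000m, is sketched below. The helper names to_milli and traffic_shares are hypothetical, and the proportional reading itself is an assumption rather than something the spec text quoted here states outright.

```python
def to_milli(weight):
    """Return a weight in integer milli-units: '10m' -> 10, 1 -> 1000.

    Only whole numbers and whole milli values are handled in this sketch.
    """
    text = str(weight).strip()
    if text.endswith("m"):
        return int(text[:-1])
    return int(text) * 1000


def traffic_shares(backends):
    """Return {service: fraction of traffic}, proportional to each weight."""
    milli = {b["service"]: to_milli(b["weight"]) for b in backends}
    total = sum(milli.values())
    if total == 0:
        raise ValueError("all weights are zero, so the split is undefined")
    return {service: value / total for service, value in milli.items()}


# The three backends from the specification example.
example = [
    {"service": "one", "weight": "10m"},
    {"service": "two", "weight": "100m"},
    {"service": "three", "weight": "1500m"},
]
shares = traffic_shares(example)
```

Under that reading the weights sum to 1610m, so the three services receive roughly 0.62%, 6.21% and 93.17% of the traffic; the weights never need to add up to any particular total.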
In this section:
https://github.com/deislabs/smi-spec/blob/master/traffic-split.md#workflow
there is an example with a traffic split of 1 and 0m. What does 1 mean? All traffic? It also doesn't have the "m" suffix, while zero does.
"Weights vs percentages - the primary reason for weights is in failure situations. For example, if 50% of traffic is being sent to a service that has no healthy endpoints - what happens? Weights are simpler to reason about when the underlying applications are changing." - can you elaborate more? I still don't see the benefit in using weights despite the example.
What is the expected behavior when multiple traffic split resources refer to the same apex service?
For example, given an apex service:
---
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: blue-green
spec:
  service: apex
  backends:
  - service: blue
    weight: 10m
  - service: green
    weight: 90m
---
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: red-yellow
spec:
  service: apex
  backends:
  - service: red
    weight: 10m
  - service: yellow
    weight: 90m
An implementation could choose to merge these traffic splits so it becomes effectively:
spec:
  service: apex
  backends:
  - service: blue
    weight: 10m
  - service: green
    weight: 90m
  - service: red
    weight: 10m
  - service: yellow
    weight: 90m
Or, we could dictate that this is illegal and require a validating admission controller to enforce service uniqueness.
Or, we could modify the spec to require that the TS resource name must match the apex service name...
This behavior should not remain undefined, however.
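To compare the first two options, here is a small sketch of what each would compute for the two resources above. The functions find_conflicts and merge_backends are hypothetical; the first stands in for the check a validating admission controller might run, the second for the merge behavior, and neither reflects an existing implementation.

```python
from collections import defaultdict

# The two TrafficSplit resources from the example, as plain dictionaries.
SPLITS = [
    {
        "metadata": {"name": "blue-green"},
        "spec": {
            "service": "apex",
            "backends": [
                {"service": "blue", "weight": "10m"},
                {"service": "green", "weight": "90m"},
            ],
        },
    },
    {
        "metadata": {"name": "red-yellow"},
        "spec": {
            "service": "apex",
            "backends": [
                {"service": "red", "weight": "10m"},
                {"service": "yellow", "weight": "90m"},
            ],
        },
    },
]


def group_by_apex(splits):
    """Group TrafficSplit resources by the apex service they refer to."""
    groups = defaultdict(list)
    for split in splits:
        groups[split["spec"]["service"]].append(split)
    return groups


def find_conflicts(splits):
    """Option 'illegal': report apex services claimed by more than one split."""
    return {
        apex: [s["metadata"]["name"] for s in group]
        for apex, group in group_by_apex(splits).items()
        if len(group) > 1
    }


def merge_backends(splits):
    """Option 'merge': concatenate the backends of all splits per apex."""
    return {
        apex: [backend for s in group for backend in s["spec"]["backends"]]
        for apex, group in group_by_apex(splits).items()
    }
```

With the example data, find_conflicts reports that apex is claimed by both blue-green and red-yellow, while merge_backends yields the four-backend list shown above. Note that merging silently changes each backend's share, since the sum of weights doubles.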
At this point, there are both linkerd and istio implementations of the traffic metrics piece of the spec in the smi-metric repo.
@grampelberg @Pothulapati - do y'all have any thoughts or want to propose any changes at this point? Would love to hear feedback from any other folks who have implemented it as well.
Right now a typical TrafficTarget example looks like the following:
kind: TrafficTarget
apiVersion: access.smi-spec.io/v1alpha1
metadata:
  name: api-service-api
  namespace: default
destination:
  kind: ServiceAccount
  name: api-service
  namespace: default
  port: 8080
specs:
- kind: HTTPRouteGroup
  name: api-service-routes
  matches:
  - api
sources:
- kind: ServiceAccount
  name: website-service
  namespace: default
- kind: ServiceAccount
  name: payments-service
  namespace: default
src: https://github.com/deislabs/smi-spec/blob/master/traffic-access-control.md
So the above says to allow traffic from pods with ServiceAccount website-service & payments-service to destination pods with ServiceAccount api-service. How do I allow traffic from everyone to the pods with ServiceAccount api-service? Is there a way to specify a wildcard entry in the sources section that allows traffic from anyone and everyone, without worrying about who is sending the traffic?
A Kubernetes service can expose more than one port. To support such a service the TrafficSplit could have an optional field in the backend spec:
ClusterIP with multiple ports:
apiVersion: v1
kind: Service
metadata:
  name: frontend-primary
spec:
  type: ClusterIP
  selector:
    app: frontend-primary
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: http
  - name: grpc
    port: 9090
    protocol: TCP
    targetPort: grpc
TrafficSplit for HTTP:
apiVersion: v1beta1
kind: TrafficSplit
metadata:
  name: frontend
spec:
  service: frontend
  backends:
  - service: frontend-primary
    port: 8080
    weight: 900m
  - service: frontend-canary
    port: 8080
    weight: 100m
It is valuable to scale workloads on metrics other than cpu/memory. To do this today, you must use the custom metrics API and the prometheus adapter. With the SMI metrics API it should be possible to get HPA working with that API. This project should integrate HPA and SMI metrics to allow for scaling on rps as well as latency.
I would like to have policy for things other than access control. In particular, it would be nice to describe policy around some HTTP specific behavior such as retries, timeouts and rate limits. These all should be associated with routes and identities.
There should be a new policy object (e.g. HTTPPolicy) that associates identity, routes and policy specific to HTTP. This will be both on the client side (retries, timeouts) and server side (rate limits). It might be beneficial to have different objects for each of these behaviors to make it more clear where the policy is being applied.
Having version in the header might be confusing. @bridgetkromhout brought this up as well.
It seems that there is a conceptual change here compared to Istio, for example: every version of a service is a separate service with a completely separate name.
See https://github.com/deislabs/smi-spec/blob/master/traffic-split.md#example-implementation
web-current and web-next: so the differentiation is merely textual, in the service names? So there is no single service with multiple versions, but several services with separate names?
Also, if we look at the required work process in this example, when a user wants to deploy "web-next", the current version should already be running under the name web-current, or be renamed to it.
And then web-next will at some point need to be renamed to web-current? Isn't this more maintenance? And fewer filter/search abilities, since filtering can be done on labels rather than by searching inside the strings of a name?
The CNAB spec has an Appendix section. I think we should have one too that includes a page on describing weights, what they are, where they come from, and how to use them.
Was working before. Dunno what happened. I'll take a look.
Currently the traffic targets define access control rules along a set of routes between a source and destination (https://github.com/deislabs/smi-spec/blob/master/traffic-access-control.md). Since we are trying to define rules such as "A can talk to B", I understand why A translates to an identity. However, I'm curious why we're using service accounts as destinations for B, instead of the service or workload itself?
I think this becomes problematic when trying to write an adapter for istio 1.4+, which has now deprecated the destination.user constraint on ServiceRole and the guidance is to migrate to workload selectors. These are simply pod selectors that may only conventionally encode service account info. In figuring out how to implement the controller for the SMI istio adapter on istio 1.4+, we'll need to either translate the service account in the target destination to a service, or translate it to a label selector -- I think this may be challenging in general. Here's how I discovered the issue, and the corresponding istio issue:
SMI adapter issue:
servicemeshinterface/smi-adapter-istio#70
Istio bug fix + discussion about path forward:
istio/istio#17430
Istio workload selector:
https://github.com/istio/api/blob/1187adbd148251b20e7cd8e91f73ebcc09ac7ef1/type/v1beta1/selector.proto
Part of #71
There is real interest in smi-metrics directly having an endpoint where a service graph is returned.
The discussion around this can be done here
Initial Questions
Feel free to add questions/suggestions in the comments.
@michelleN
Hello,
I'm trying to understand how a consumer could use SMI to get the metric names and labels available for a given pod (or list of pods or deployment etc.), in order to later query those metrics directly using PromQL.
If I understand correctly, this is not what the TrafficMetrics kind is about, correct? Because TrafficMetrics is actually filled with queried data, so I understand it performs the Prometheus query. If I want more control over how to query Prometheus, I would prefer to get just the metric names/labels mapping and build my own PromQL query out of them.
Is it out of scope for SMI?
EDIT: I see this is commented in the tradeoff section. So I change my question a little bit: is there any plan to add this kind of adapter in the future?
Currently the spec only mentions http.
One may want to have different policies for http and https. E.g. block all http and only allow https to pass.
Likewise for what Istio names 'tcp' - arbitrary protocols on top of tcp.
And then later also differentiations by other (recognised) protocols. E.g. allow AMQP-based messaging but block telnet.
Following yesterday's call, where we discussed the possibility of changing the TLS spec to a higher-level Policy spec: rather than assigned identity, we would have a specification which controls which services are allowed to communicate.
Adding this here to start a discussion, the following example would define a policy which would allow service.a to communicate with service.b and service.c. All other communication would be denied. It is assumed that allow rules would have a higher priority than deny rules, however, this would be based on the concrete implementation.
apiVersion: v1beta1
kind: Policy
name: my-policy
spec:
  allow:
  - source: service.a
    destination: service.b
  - source: service.b
    destination: service.c
  deny:
  # "*" must be quoted, otherwise YAML parses it as an alias
  - source: "*"
    destination: "*"
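As a discussion aid, the evaluation order described above (allow rules win over deny rules, everything else is denied) could look like the sketch below. The decide function, the glob-style handling of * and the "default-deny" label are all assumptions for illustration; the rules are taken exactly as written in the YAML example.

```python
from fnmatch import fnmatchcase

# The policy from the example, as a plain dictionary.
POLICY = {
    "allow": [
        {"source": "service.a", "destination": "service.b"},
        {"source": "service.b", "destination": "service.c"},
    ],
    "deny": [
        {"source": "*", "destination": "*"},
    ],
}


def rule_matches(rule, source, destination):
    """True when both ends match the rule; '*' is treated as a glob wildcard."""
    return fnmatchcase(source, rule["source"]) and fnmatchcase(
        destination, rule["destination"]
    )


def decide(policy, source, destination):
    """Return 'allow', 'deny' or 'default-deny' for one connection attempt.

    Allow rules are checked first, so they take priority over deny rules.
    """
    for rule in policy.get("allow", []):
        if rule_matches(rule, source, destination):
            return "allow"
    for rule in policy.get("deny", []):
        if rule_matches(rule, source, destination):
            return "deny"
    return "default-deny"
```

With this ordering, decide(POLICY, "service.a", "service.b") is allowed even though the catch-all deny rule also matches it, while service.a to service.c falls through to the deny rule.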
Questions:
cc. @grampelberg
There's a set of best practices that are pretty important when working with SMI:
It would be awesome to have either a mutating webhook to apply these best practices for users automatically or a validating admission controller that warns users they're not using best practices. This can be something that all the service meshes use as a component.
The SMI backend varies depending on the provider (istio, linkerd). A mocked backend would be valuable for playing with and providing feedback on the proposed API. With this we wouldn't need an actual mesh deployed, nor a fully implemented SMI impl for a specific provider. It would help potential consumers of the API (and provider devs, I think) while SMI impls are coming online.
For me personally, I'd like to see the Traffic Metrics ASAP.
This would likely also relieve pressure on providing full API specs via swagger or something like that.
There are many different verbs and entities mentioned throughout the spec. However, there is no terminology section to actually define them.
For example:
https://github.com/deislabs/smi-spec/blob/560631fa09e12e75d6a00a09eb2787311c0572fd/traffic-specs.md#httproutegroup
"It enumerates the routes that can be served by an application."
What is an application in this example?
"This resource allows users to incrementally direct percentages of traffic between various services"
What is a service? A Kubernetes service? the pods that have the same label that a service selects? anything else?
"Integrations can use this resource to orchestrate canary releases for new versions of software"
https://github.com/deislabs/smi-spec/blob/master/traffic-split.md#traffic-split
Software should probably be replaced with a different word here - service/app (pending the definition for those too).
But what is missing IMO is a definition for multiple versions: how does SMI represent several versions of the same app/service?
Define backends
What are referential services mentioned in traffic split spec?
Demo 1 Traffic Routing:
Demo 2 Traffic Policy:
Demo 3 Billing (via Metrics):
Talking about here
The language here is inconsistent with the rest of the README. A direction is described as the flow of traffic from resource to edge resource. However, the to keyword here was being used to describe flow from all resources to the target resource.
Is this intended, or an oversight?
I propose:
"Finally, resource can be as general or specific as desired. For example, with a direction of to and an empty resource, the metrics are observed at the foo-775b9cbd88-ntxsl pod and represent all traffic to other resources."
How do people manage TrafficSplit resources over time? Do you just update the same TrafficSplit resource forever? Is there some pattern people have come up with that we could document?
In order to support A/B testing scenarios, the TrafficSplit could be extended with HTTP header match conditions.
E.g. route Chrome users with a canary=enabled cookie or those with an X-Canary header to the canary while all the others will be routed to the primary:
kind: TrafficSplit
metadata:
  name: website
spec:
  service: website
  http:
  - backends:
    - service: website-primary
      weight: 0
    - service: website-canary
      weight: 100
    match:
    - headers:
      - user-agent: ".*Chrome.*"
      - cookie: "^(.*?;)?(canary=enabled)(;.*)?$"
    - headers:
      - x-canary: "enabled"
  - backends:
    - service: website-primary
      weight: 100
    - service: website-canary
      weight: 0
    match: {}
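The matching semantics implied by the description can be sketched as follows. This assumes that headers listed under one headers entry must all match (AND), that separate entries are alternatives (OR), that routes are tried in order, and that an empty match catches everything. The ROUTES structure and function names are hypothetical simplifications of the YAML above, not a proposed API.

```python
import re

# Simplified form of the two http entries: a list of alternative header
# conditions per route, plus the backend weights to use when one matches.
ROUTES = [
    {
        "match": [
            {
                "user-agent": ".*Chrome.*",
                "cookie": "^(.*?;)?(canary=enabled)(;.*)?$",
            },
            {"x-canary": "enabled"},
        ],
        "backends": {"website-primary": 0, "website-canary": 100},
    },
    {
        "match": [],  # empty match: catch-all route
        "backends": {"website-primary": 100, "website-canary": 0},
    },
]


def condition_matches(condition, headers):
    """All listed headers must be present and fully match their regex (AND)."""
    return all(
        name in headers and re.fullmatch(pattern, headers[name]) is not None
        for name, pattern in condition.items()
    )


def select_backends(routes, request_headers):
    """Return the backend weights of the first route whose match applies."""
    headers = {name.lower(): value for name, value in request_headers.items()}
    for route in routes:
        conditions = route["match"]
        if not conditions or any(condition_matches(c, headers) for c in conditions):
            return route["backends"]
    return None
```

A Chrome request carrying the canary=enabled cookie, or any request with X-Canary: enabled, is sent to website-canary; a Chrome request without the cookie falls through to the catch-all route and goes to website-primary.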
During the calls, multiple people have mentioned interest in seeing how we can define rate limiting and circuit breaking in the spec, either as part of TrafficSplit or in another object referenced by TrafficSplit. @grampelberg brought up the point that, as a first step, we need to define circuit breaking because there are multiple definitions. That's something I can take on. Does anyone have any additional thoughts on how they want to see these two things defined in SMI?
Does anyone have any thoughts on versioning the spec? I think we should consider versioning each part of the spec (each API) described in the spec independently. For example, traffic split may be pretty stable at this point and ready to move on to whatever version signals stability but traffic access may still have some work. If each API is able to signal it's ready to graduate to the next marker of stability, it would give us all a better idea of where we are in terms of each part of the spec.
If this were a regular Go project like the Go SDK, I would recommend we version everything at v0.1.0 (referring to it as alpha maturity) until we reach the point of wanting to cut a v1.0.0-beta and then graduate to v1.0.0, like we're doing with the smi-sdk-go project. However, CNAB called its pre-1.0.0 version of the spec a Working Draft until it was ready, and that's also a pretty common way to describe the state of a spec.
prior art:
All references to apiVersion: v1beta1 need to be updated to smi-spec.io/v1beta1.