zalando-incubator / stackset-controller
Opinionated StackSet resource for managing application life cycle and traffic switching in Kubernetes
License: MIT License
Currently the stackset controller may decide to pre-scale the deployment to a number of pods higher than what is specified in maxReplicas. Since the HPA cannot go beyond maxReplicas, the stackset controller keeps waiting and never completes the switch.
The proposed change is to cap the prescaling based on maxReplicas.
I am trying to get stackset-controller functional. So far all I have done is take the files in the docs necessary for the controller - rbac, deployment, crds - and wrap them in a simple Helm chart.
I then tried to deploy the example StackSet, and while it was created, I see neither any other objects being created nor any log entries from the controller. I am running the controller with the --debug flag and even that output is blank.
I do see some errors in the API server, however. The first one I suspect is irrelevant:
E1116 20:06:57.456838 1 naming_controller.go:316] stacksets.zalando.org failed with: Operation cannot be fulfilled on customresourcedefinitions.apiextensions.k8s.io "stacksets.zalando.org": the object has been modified; please apply your changes to the latest version and try again
E1116 20:07:02.727043 1 establishing_controller.go:105] stacksets.zalando.org failed with: Operation cannot be fulfilled on customresourcedefinitions.apiextensions.k8s.io "stacksets.zalando.org": the object has been modified; please apply your changes to the latest version and try again
I1116 20:13:01.464879 1 controller.go:597] quota admission added evaluator for: {zalando.org stacksets}
E1116 20:21:46.876114 1 naming_controller.go:316] stacksets.zalando.org failed with: Operation cannot be fulfilled on customresourcedefinitions.apiextensions.k8s.io "stacksets.zalando.org": the object has been modified; please apply your changes to the latest version and try again
E1116 20:21:51.930774 1 establishing_controller.go:105] stacksets.zalando.org failed with: Operation cannot be fulfilled on customresourcedefinitions.apiextensions.k8s.io "stacksets.zalando.org": the object has been modified; please apply your changes to the latest version and try again
I am also seeing a bunch of:
I1116 20:28:28.654757 1 get.go:245] Starting watch for /apis/zalando.org/v1/stacksets, rv=67688687 labels= fields= timeout=8m22s
I1116 20:36:50.655642 1 get.go:245] Starting watch for /apis/zalando.org/v1/stacksets, rv=67688687 labels= fields= timeout=9m24s
I1116 20:46:14.659708 1 get.go:245] Starting watch for /apis/zalando.org/v1/stacksets, rv=67688687 labels= fields= timeout=8m22s
I1116 20:54:36.663718 1 get.go:245] Starting watch for /apis/zalando.org/v1/stacksets, rv=67688687 labels= fields= timeout=8m44s
I1116 21:03:20.665079 1 get.go:245] Starting watch for /apis/zalando.org/v1/stacksets, rv=67688687 labels= fields= timeout=7m16s
I1116 21:10:36.665996 1 get.go:245] Starting watch for /apis/zalando.org/v1/stacksets, rv=67688687 labels= fields= timeout=8m28s
Here is my StackSet out of the API server:
items:
- apiVersion: zalando.org/v1
kind: StackSet
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"zalando.org/v1","kind":"StackSet","metadata":{"annotations":{},"name":"my-app","namespace":"kube-system"},"spec":{"ingress":{"backendPort":80,"hosts":["nginx.dev.domain.com"]},"stackLifecycle":{"limit":5,"scaledownTTLSeconds":300},"stackTemplate":{"spec":{"horizontalPodAutoscaler":{"maxReplicas":10,"metrics":[{"resource":{"name":"cpu","targetAverageUtilization":50},"type":"Resource"}],"minReplicas":3},"podTemplate":{"spec":{"containers":[{"image":"nginx","name":"nginx","ports":[{"containerPort":80,"name":"ingress"}],"resources":{"limits":{"cpu":"10m","memory":"50Mi"},"requests":{"cpu":"10m","memory":"50Mi"}}}]}},"replicas":3,"version":"v2"}}}}
creationTimestamp: 2018-11-16T20:23:01Z
generation: 1
name: my-app
namespace: kube-system
resourceVersion: "67688687"
selfLink: /apis/zalando.org/v1/namespaces/kube-system/stacksets/my-app
uid: 6a0442aa-e9dd-11e8-a879-0a6c577faf70
spec:
ingress:
backendPort: 80
hosts:
- nginx.dev.domain.com
stackLifecycle:
limit: 5
scaledownTTLSeconds: 300
stackTemplate:
spec:
horizontalPodAutoscaler:
maxReplicas: 10
metrics:
- resource:
name: cpu
targetAverageUtilization: 50
type: Resource
minReplicas: 3
podTemplate:
spec:
containers:
- image: nginx
name: nginx
ports:
- containerPort: 80
name: ingress
resources:
limits:
cpu: 10m
memory: 50Mi
requests:
cpu: 10m
memory: 50Mi
replicas: 3
version: v2
kind: List
metadata:
resourceVersion: ""
selfLink: ""
and my Deployment spec:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "2"
creationTimestamp: 2018-11-16T20:21:46Z
generation: 2
labels:
app.kubernetes.io/instance: stackset-controller
app.kubernetes.io/managed-by: Tiller
app.kubernetes.io/name: stackset-controller
helm.sh/chart: stackset-controller-0.1.4
name: stackset-controller
namespace: kube-system
resourceVersion: "67702210"
selfLink: /apis/extensions/v1beta1/namespaces/kube-system/deployments/stackset-controller
uid: 3dcc6f23-e9dd-11e8-8400-06fe3d4492c6
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app.kubernetes.io/instance: stackset-controller
app.kubernetes.io/name: stackset-controller
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ""
creationTimestamp: null
labels:
app.kubernetes.io/instance: stackset-controller
app.kubernetes.io/name: stackset-controller
spec:
containers:
- args:
- --debug
image: registry.opensource.zalan.do/teapot/stackset-controller:latest
imagePullPolicy: Always
name: stackset-controller
resources:
limits:
cpu: 10m
memory: 128Mi
requests:
cpu: 10m
memory: 128Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: stackset-controller
serviceAccountName: stackset-controller
terminationGracePeriodSeconds: 30
status:
availableReplicas: 1
conditions:
- lastTransitionTime: 2018-11-16T20:21:50Z
lastUpdateTime: 2018-11-16T20:21:50Z
message: Deployment has minimum availability.
status: "True"
type: Available
- lastTransitionTime: 2018-11-16T20:21:46Z
lastUpdateTime: 2018-11-16T21:08:43Z
message: ReplicaSet "stackset-controller-84cb74f794" has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
observedGeneration: 2
readyReplicas: 1
replicas: 1
updatedReplicas: 1
Implement the field stackLifecycle
apiVersion: zalando.org/v1
kind: StackSet
metadata:
name: my-app
spec:
# optional Ingress definition.
ingress:
hosts: [my-app.example.org, alt.name.org]
stackLifecycle:
scaledownTTLSeconds: 300
limit: 5
stackTemplate:
....
The stackLifecycle definition tells when the stackset-controller should clean up the resources of an old stack that no longer gets any traffic.
It might make sense to extract the information about the desired traffic policy into StackSet, Stack or a dedicated resource.
Currently, the desired traffic policy is stored as an annotation on the Ingress object (for historical reasons). We are already using a different annotation from what skipper needs [0], so there's really no reason to have the new annotation on the Ingress.
It should either be a real field on StackSet, since it already defines the corresponding DNS name whose traffic is switched, or a dedicated TrafficPolicy resource that describes how traffic is shaped and doesn't live in the StackSet.
[0] we set zalando.org/stack-traffic-weights and skipper reads zalando.org/backend-weights
Setting stackLifecycle.limit to 1 causes the stackset controller to constantly delete the newly created stack in GC (because it doesn't have traffic) and then recreate it on the next iteration. This should either work correctly as a special case or be disallowed by validation.
Currently, if you delete an application stack with $ kubectl delete stack <stack-name>, Kubernetes will automatically clean up all dependent resources, including the Deployment and Service resources.
The resource could be "protected" with a finalizer so the controller can halt deletion if the stack is still getting traffic. If this happens, it should issue events about it (#12).
It is unclear how an all-zero traffic weight is handled and interpreted.
Handled (currently):
Interpretation (currently):
Clarify what's the desired behaviour:
We have 2 stacks running:
We discovered an issue with rendering-engine-master-358, so we wanted to revert back to rendering-engine-master-356.
zkubectl traffic rendering-engine rendering-engine-master-356 100
Stack rendering-engine-master-356 was prescaled again - but to a very high number of pods: 250, which is much more than the sum of all the pods handling traffic before the traffic switch (75+75=150).
Please also note that 250 is currently our maxReplicas limit, so I wonder what values would have been picked without this limit.
Two problems arise from this unbounded and broken traffic switch:
Hi,
I couldn't find a way to add annotations to a deployment. For service, ingress and even pods it's straightforward. The annotation is used at Zalando to define custom log parsers for Scalyr.
Maybe I just missed the way to achieve it.
The ScaledownTTLSeconds field was intended to mean scaling down a stack that has not been getting traffic for ScaledownTTLSeconds:
stackset-controller/controller/stack.go
Line 173 in b528bc9
However, in the stack garbage collection code it means deleting a stack that is not getting traffic and whose CREATION_TIME is older than ScaledownTTLSeconds: https://github.com/zalando-incubator/stackset-controller/blob/master/controller/stackset.go#L668
We should ensure that stacks are only deleted if they have not been getting traffic for ScaledownTTLSeconds, independently of their age.
traffic) or write a small kubectl plugin stack
#71 implemented a hacky solution to a problem where the controller scaled down stacks before the deployment even had a chance to switch traffic to those. However, this behaviour is not exactly intuitive, and could lead to deployments being kept alive for longer than necessary if the user quickly switches traffic back and forth. A cleaner approach would be to mark newly created stacks with an annotation that's erased when they get traffic, and to introduce a separate TTL for stacks in this state.
TODO
kubectl delete stack <foo> seems to hang and never deletes the stack.
This seems to be by design due to the finalizer. Let's deal with it later.
Right now we assume servicePort == targetPort, but it should be optionally configurable.
It's impossible to figure out whether the controller has processed the changes and updated the resources or not. The following information needs to be exposed so it's possible to figure out if the updates are done or still in progress:
- observedGeneration in StackSet.status, updated only after the changes to the stacks are done
- observedGeneration on Stack.status, updated only after the subresources have been changed
- currentStackName in StackSet.status, to avoid duplicating the weird default and name generation logic
TBD --- just a placeholder
Currently there are no schemas defined for the Stack and StackSet CRDs, making it very easy to make mistakes, e.g. in the PodTemplateSpec, which would render the resource unusable.
Validation should be added to prevent user mistakes.
StackSet and Stack resources are inspired by the relationship between ReplicaSet and Pods (and also Deployment and ReplicaSet, I would say).
When I modify a Deployment object (an in-place update), Kubernetes will look for ReplicaSets matching the desired podTemplate spec and create one for the new version if not present.
Similarly, when I update a StackSet in-place and change the stackTemplate spec, a new Stack is created for that version. However, to make this really work, one also has to increase the stackVersion field, which is transparent in the Deployment/ReplicaSet case (presumably via the pod-template-hash label).
The controller should send relevant events to the Stack
and StackSet
resources so users know what it's doing.
At the moment the stackset-controller just supports the Horizontal Pod Autoscaler spec as-is under the horizontalPodAutoscaler key. Therefore, users also have the responsibility to set sensible defaults. The format can also get particularly messy when configuring the HPA with custom metrics.
It might make sense for us to provide a more minimalistic key, e.g. Autoscaler, with sensible defaults and a cleaner way to provide custom scaling metrics.
When deployments reference a ConfigMap or Secret, these get updated in place when a new blue/green rollout is triggered. This can already have unintended effects on the old deployment.
We could include ConfigMaps as a subsection of the spec, similar to ingress, so that the controller can transparently create and reference dedicated config maps per version (similar to kustomize, which creates dedicated config maps for each overlay: https://github.com/kubernetes-sigs/kustomize/blob/37f03b4d018235d1a26dda1f031d374776b381a7/examples/ldap/base/kustomization.yaml#L4-L7).
Alternatively, we could use the CDP_BUILD_VERSION as part of the ConfigMap's name and its reference in the podTemplateSpec. This would create a new ConfigMap with immutable content for each rollout. Old versions could be cleaned up by leveraging an ownerReference from the Deployment to the ConfigMap, so that once the Stack is deleted the old ConfigMap gets cleaned up too (assuming ownerReferences work for ConfigMaps).
See https://github.bus.zalan.do/teapot/tokeninfo-router/pull/10/files for an example of a Deployment using a ConfigMap.
When I deploy two versions of an app it is not possible to delete the latest stack even though it has no traffic. If it is in a crash loop or smoke tests failed I would like to get rid of that stack. Currently the stack is being recreated. This behavior might be surprising and confusing.
Should we adopt this? Any thoughts?
These error messages come from glog. We should drop these log entries, because the behaviour is intended: the client-go timeout works for WATCH, but not for all other API calls.
...
ERROR: logging before flag.Parse: E0808 14:07:03.579958 1 streamwatcher.go:109] Unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body)
ERROR: logging before flag.Parse: E0808 14:07:33.581389 1 streamwatcher.go:109] Unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body)
% kubectl logs stackset-controller-56449647cf-rtjt8
Error from server (BadRequest): container "stackset-controller" in pod "stackset-controller-56449647cf-rtjt8" is waiting to start: trying and failing to pull image
zsh: exit 1 kubectl logs stackset-controller-56449647cf-rtjt8
But delivery.yaml builds IMAGE=registry-write.opensource.zalan.do/teapot/stackset-controller-test with "-test"
apiVersion: zalando.org/v1
kind: StackSet
metadata:
name: cluster-registry
spec:
ingress:
annotations:
zalando.org/skipper-filter: ...
Custom annotations specified by users should be passed-on to the generated Ingress object.
The prometheus endpoint should expose more metrics that make it possible to detect whether the stackset controller is running well.
Some possibilities:
We see the following errors in the skipper logs, because we clean up a Service object but not the reference to the service within Ingress objects, which we should also clean up:
[APP]time="2018-12-12T22:39:50Z" level=error msg="convertPathRule: Failed to get service default, cdp-e2e-cdp-cd-robot-e5eb20050a35b3d718d8fd603a20b6bf6f8934ef, ingress"
ingress
metadata:
creationTimestamp: 2018-09-04T16:04:59Z
generation: 1
labels:
deployment-id: d-2mctyfio8r3y2f535k2zocoupk
stack-version: e5eb20050a35b3d718d8fd603a20b6bf6f8934ef
stackset: cdp-e2e-cdp-cd-robot
name: cdp-e2e-cdp-cd-robot-e5eb20050a35b3d718d8fd603a20b6bf6f8934ef
namespace: default
ownerReferences:
- apiVersion: zalando.org/v1
kind: Stack
name: cdp-e2e-cdp-cd-robot-e5eb20050a35b3d718d8fd603a20b6bf6f8934ef
uid: 3fc086e8-b05c-11e8-96bd-022ad150ebe8
resourceVersion: "210581283"
spec:
rules:
- host: cdp-e2e-cdp-cd-robot-e5eb20050a35b3d718d8fd603a20b6bf6f8934ef.example.org
http:
paths:
- backend:
serviceName: cdp-e2e-cdp-cd-robot-e5eb20050a35b3d718d8fd603a20b6bf6f8934ef
servicePort: ingress
svc
% kubectl get svc cdp-e2e-cdp-cd-robot-e5eb20050a35b3d718d8fd603a20b6bf6f8934ef
Error from server (NotFound): services "cdp-e2e-cdp-cd-robot-e5eb20050a35b3d718d8fd603a20b6bf6f8934ef" not found
stackset
spec:
ingress:
hosts:
- cdp-e2e-cdp-cd-robot.example.org
stackLifecycle:
limit: 5
scaledownTTLSeconds: 300
stackTemplate:
spec:
podTemplate:
metadata:
labels:
application: cdp-e2e-cdp-cd-robot
spec:
containers:
- env:
- name: VERSION
value: e5eb20050a35b3d718d8fd603a20b6bf6f8934ef
image:___
lifecycle:
preStop:
exec:
command:
- sleep
- "5"
name: cdp-e2e-cdp-cd-robot
ports:
- containerPort: 8080
name: ingress
readinessProbe:
httpGet:
path: /health
port: 8080
replicas: 1
version: e5eb20050a35b3d718d8fd603a20b6bf6f8934ef
An interesting feature: instead of starting with a canary and directly shifting traffic to your new stack, it would be nice to first send some shadow traffic to the new stack, and then decide to switch real traffic.
https://www.functionize.com/blog/what-is-canary-testing-and-dark-launching/
I created a custom autoscaler using skipper requests per second like this in my stackset:
horizontalPodAutoscaler:
minReplicas: 2
maxReplicas: 10
metrics:
- type: Object
object:
metricName: requests-per-second
target:
apiVersion: extensions/v1beta1
kind: Ingress
name: "{{{APPLICATION}}}"
targetValue: 100
The created stack contains a wrong targetValue:
horizontalPodAutoscaler:
maxReplicas: 10
metadata:
creationTimestamp: null
metrics:
- object:
metricName: requests-per-second
target:
apiVersion: extensions/v1beta1
kind: Ingress
name: sandbox-tokeninfo-bridge
targetValue: "0"
type: Object
minReplicas: 2
This is the log I see in stackset-controller:
time="2018-10-04T15:45:38Z" level=info msg="Event(v1.ObjectReference{Kind:\"Stack\", Namespace:\"default\", Name:\"sandbox-tokeninfo-bridge-pr-22-3\", UID:\"42a39bcb-c7ea-11e8-bd06-060172d696fe\", APIVersion:\"zalando.org/v1\", ResourceVersion:\"245331352\", FieldPath:\"\"}): type: 'Normal' reason: 'CreateHPA' Creating HPA default/sandbox-tokeninfo-bridge-pr-22-3 for Deployment default/sandbox-tokeninfo-bridge-pr-22-3"
time="2018-10-04T15:45:38Z" level=error msg="Failed to manage Stack default/sandbox-tokeninfo-bridge-pr-22-3: HorizontalPodAutoscaler.autoscaling \"sandbox-tokeninfo-bridge-pr-22-3\" is invalid: spec.metrics[0].object.targetValue: Required value: must specify a positive target value"
A faulty stackset.yaml (we don't validate the content yet) led to this in the logs:
ERROR: logging before flag.Parse: E0904 16:04:08.607962 1 reflector.go:205] github.com/zalando-incubator/stackset-controller/controller/stackset.go:482: Failed to list *v1.StackSet: v1.StackSetList.Items: []v1.StackSet: v1.StackSet.Spec: v1.StackSetSpec.StackTemplate: v1.StackTemplate.Spec: v1.StackSpecTemplate.StackSpec: Service: v1.StackServiceSpec.Ports: []v1.ServicePort: v1.ServicePort.Port: readUint32: unexpected character: ๏ฟฝ, error found in #10 byte of ...|","port":"80","proto|..., bigger context ...|cas":2,"service":{"ports":[{"name":"http","port":"80","protocol":"TCP","targetPort":"8080"}]},"versi|...
ERROR: logging before flag.Parse: E0904 16:04:09.635793 1 reflector.go:205] github.com/zalando-incubator/stackset-controller/controller/stackset.go:482: Failed to list *v1.StackSet: v1.StackSetList.Items: []v1.StackSet: v1.StackSet.Spec: v1.StackSetSpec.StackTemplate: v1.StackTemplate.Spec: v1.StackSpecTemplate.StackSpec: Service: v1.StackServiceSpec.Ports: []v1.ServicePort: v1.ServicePort.Port: readUint32: unexpected character: ๏ฟฝ, error found in #10 byte of ...|","port":"80","proto|..., bigger context ...|cas":2,"service":{"ports":[{"name":"http","port":"80","protocol":"TCP","targetPort":"8080"}]},"versi|...
ERROR: logging before flag.Parse: E0904 16:04:10.656678 1 reflector.go:205] github.com/zalando-incubator/stackset-controller/controller/stackset.go:482: Failed to list *v1.StackSet: v1.StackSetList.Items: []v1.StackSet: v1.StackSet.Spec: v1.StackSetSpec.StackTemplate: v1.StackTemplate.Spec: v1.StackSpecTemplate.StackSpec: Service: v1.StackServiceSpec.Ports: []v1.ServicePort: v1.ServicePort.Port: readUint32: unexpected character: ๏ฟฝ, error found in #10 byte of ...|","port":"80","proto|..., bigger context ...|cas":2,"service":{"ports":[{"name":"http","port":"80","protocol":"TCP","targetPort":"8080"}]},"versi|...
...
These log entries should rather be considered debug logs:
time="2018-09-10T07:24:35Z" level=info msg="Updating Deployment default/cluster-registry-master-82 for StackSet stack default/cluster-registry-master-82" controller=stacks namespace=default stackset=cluster-registry
time="2018-09-10T07:24:35Z" level=info msg="Updating Deployment default/cluster-registry-master-79 for StackSet stack default/cluster-registry-master-79" controller=stacks namespace=default stackset=cluster-registry
time="2018-09-10T07:24:35Z" level=info msg="Updating Deployment default/cluster-registry-master-80 for StackSet stack default/cluster-registry-master-80" controller=stacks namespace=default stackset=cluster-registry
time="2018-09-10T07:24:35Z" level=info msg="Updating Deployment default/cluster-registry-master-81 for StackSet stack default/cluster-registry-master-81" controller=stacks namespace=default stackset=cluster-registry
time="2018-09-10T07:24:45Z" level=info msg="Updating Deployment default/cluster-registry-master-79 for StackSet stack default/cluster-registry-master-79" controller=stacks namespace=default stackset=cluster-registry
This is more client-go library internal noise:
ERROR: logging before flag.Parse: E0910 07:24:31.764057 1 streamwatcher.go:109] Unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body)
We have an app running around 30 pods.
When we switch traffic, we have noticed 2 issues which appeared after we added RPS scaling:
In the following example, we are running stackset rendering-engine-986 and trying to deploy stackset rendering-engine-990:
Fortunately, after a while everything goes back to normal, but these issues should be investigated.
Zalando internal issue with more context: https://github.bus.zalan.do/teapot/issues/issues/1548
I tried retiring my test application in stups-test by deleting my stackset:
$ zk delete stackset my-app-jylipoti
stackset.zalando.org "my-app-jylipoti" deleted
However, the delete did not cascade to other resources, i.e. stack my-app-jylipoti-pr-1-1 is still present and so is deployment my-app-jylipoti-pr-1-1.
This will not heal itself and gets into a forever crash loop:
panic: cannot handle unexported field: {*v2beta1.HorizontalPodAutoscaler}.Spec.Metrics[0].Object.TargetValue.i
consider using AllowUnexported or cmpopts.IgnoreUnexported
goroutine 138 [running]:
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.invalid.apply(0xc000e7c930, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/options.go:208 +0xf5
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).tryOptions(0xc000e7c930, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x13a44c0, 0x11acfc0, 0x0)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:307 +0x149
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareAny(0xc000e7c930, 0x11acfc0, 0xc000b301f0, 0x1b9, 0x11acfc0, 0xc0006161f0, 0x1b9)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:202 +0x336
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareStruct(0xc000e7c930, 0x11a7820, 0xc000b301f0, 0x199, 0x11a7820, 0xc0006161f0, 0x199, 0x13a44c0, 0x11a7820)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:503 +0x169
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareAny(0xc000e7c930, 0x11a7820, 0xc000b301f0, 0x199, 0x11a7820, 0xc0006161f0, 0x199)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:272 +0x2522
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareStruct(0xc000e7c930, 0x1185b20, 0xc000b301b0, 0x199, 0x1185b20, 0xc0006161b0, 0x199, 0x13a44c0, 0x1185b20)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:503 +0x169
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareAny(0xc000e7c930, 0x1185b20, 0xc000b301b0, 0x199, 0x1185b20, 0xc0006161b0, 0x199)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:272 +0x2522
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareAny(0xc000e7c930, 0x1193b20, 0xc000e1a010, 0x196, 0x1193b20, 0xc0004bc580, 0x196)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:244 +0x1af5
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareStruct(0xc000e7c930, 0x1185920, 0xc000e1a000, 0x199, 0x1185920, 0xc0004bc570, 0x199, 0x13a44c0, 0x1185920)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:503 +0x169
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareAny(0xc000e7c930, 0x1185920, 0xc000e1a000, 0x199, 0x1185920, 0xc0004bc570, 0x199)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:272 +0x2522
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).statelessCompare(0xc000e7c930, 0x1185920, 0xc000e1a000, 0x199, 0x1185920, 0xc0004bc570, 0x199, 0xc00022e128, 0xc000215ec0)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:176 +0xc9
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareArray.func1(0x0, 0x0, 0x0, 0x1e878c0)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:398 +0x108
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/internal/diff.Difference(0x1, 0x1, 0xc0002160b0, 0x1, 0x0, 0x0)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/internal/diff/diff.go:212 +0x244
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareArray(0xc000e7c930, 0x1052b80, 0xc00022e148, 0x197, 0x1052b80, 0xc0001c0bc8, 0x197, 0x13a44c0, 0x1052b80)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:396 +0x23d
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareAny(0xc000e7c930, 0x1052b80, 0xc00022e148, 0x197, 0x1052b80, 0xc0001c0bc8, 0x197)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:266 +0xd10
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareStruct(0xc000e7c930, 0x1171100, 0xc00022e108, 0x199, 0x1171100, 0xc0001c0b88, 0x199, 0x13a44c0, 0x1171100)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:503 +0x169
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareAny(0xc000e7c930, 0x1171100, 0xc00022e108, 0x199, 0x1171100, 0xc0001c0b88, 0x199)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:272 +0x2522
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareStruct(0xc000e7c930, 0x1171020, 0xc00022e000, 0x199, 0x1171020, 0xc0001c0a80, 0x199, 0x13a44c0, 0x1171020)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:503 +0x169
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareAny(0xc000e7c930, 0x1171020, 0xc00022e000, 0x199, 0x1171020, 0xc0001c0a80, 0x199)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:272 +0x2522
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareAny(0xc000e7c930, 0x1208a60, 0xc00022e000, 0x16, 0x1208a60, 0xc0001c0a80, 0x16)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:244 +0x1af5
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.Equal(0x1208a60, 0xc00022e000, 0x1208a60, 0xc0001c0a80, 0xc000b9e240, 0x2, 0x2, 0x0)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:86 +0x16b
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.Diff(0x1208a60, 0xc00022e000, 0x1208a60, 0xc0001c0a80, 0x0, 0x0, 0x0, 0x0, 0x0)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:100 +0x129
github.com/zalando-incubator/stackset-controller/controller.(*stacksReconciler).manageAutoscaling(0xc0002198a0, 0xc0006d9de4, 0x5, 0xc0006d9df0, 0xe, 0xc0003de200, 0x20, 0x0, 0x0, 0xc0006d9e60, ...)
/go/src/github.com/zalando-incubator/stackset-controller/controller/stack.go:339 +0x25d
github.com/zalando-incubator/stackset-controller/controller.(*stacksReconciler).manageDeployment(0xc0002198a0, 0xc0006d9de4, 0x5, 0xc0006d9df0, 0xe, 0xc0003de200, 0x20, 0x0, 0x0, 0xc0006d9e60, ...)
/go/src/github.com/zalando-incubator/stackset-controller/controller/stack.go:213 +0x4dd
github.com/zalando-incubator/stackset-controller/controller.(*stacksReconciler).manageStack(0xc0002198a0, 0xc0006d9de4, 0x5, 0xc0006d9df0, 0xe, 0xc0003de200, 0x20, 0x0, 0x0, 0xc0006d9e60, ...)
/go/src/github.com/zalando-incubator/stackset-controller/controller/stack.go:54 +0x100
github.com/zalando-incubator/stackset-controller/controller.(*StackSetController).ReconcileStacks(0xc0001ed2c0, 0x1223a9d, 0x8, 0x12284fe, 0xe, 0xc000053700, 0x18, 0x0, 0x0, 0xc0000ed359, ...)
/go/src/github.com/zalando-incubator/stackset-controller/controller/stack.go:49 +0x2ac
github.com/zalando-incubator/stackset-controller/controller.(*StackSetController).Run.func1(0x8, 0x12a2af0)
/go/src/github.com/zalando-incubator/stackset-controller/controller/stackset.go:105 +0x19d
github.com/zalando-incubator/stackset-controller/vendor/golang.org/x/sync/errgroup.(*Group).Go.func1(0xc00071f4a0, 0xc00071f7a0)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/golang.org/x/sync/errgroup/errgroup.go:58 +0x57
created by github.com/zalando-incubator/stackset-controller/vendor/golang.org/x/sync/errgroup.(*Group).Go
/go/src/github.com/zalando-incubator/stackset-controller/vendor/golang.org/x/sync/errgroup/errgroup.go:55 +0x66
Currently I can set traffic on a version that doesn't exist. The client correctly reduces the desired traffic on the existing versions, but does not add the new version to the annotation (since it doesn't exist).
The actual traffic is not changed, so it doesn't do much harm.
Still, the client should probably fail, or at least print a warning.
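A minimal sketch of the client-side validation being suggested. All names here are hypothetical (the real client stores weights in an ingress annotation); the point is just to reject unknown versions up front:

```go
package main

import "fmt"

// setDesiredTraffic validates that every referenced stack version exists
// before any desired-traffic annotation would be written. Hypothetical
// helper; it only illustrates the proposed failure mode.
func setDesiredTraffic(existing map[string]bool, weights map[string]float64) error {
	for version := range weights {
		if !existing[version] {
			return fmt.Errorf("stack version %q does not exist", version)
		}
	}
	return nil
}

func main() {
	existing := map[string]bool{"my-app-v1": true}
	fmt.Println(setDesiredTraffic(existing, map[string]float64{"my-app-v1": 100})) // <nil>
	fmt.Println(setDesiredTraffic(existing, map[string]float64{"my-app-v2": 100})) // error
}
```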
Description:
Prior to kubernetes-training/pull/32, when using the provided stackset-template as-is, CDP builds would complain about missing application labels, even though one is provided at the StackSet level (see screenshot):
Expected behavior:
Metadata (application) labels should be inherited by the stackTemplate's podTemplate without the need for manual additions (or the example should include said change; see the PR mentioned earlier).
Actual Behavior:
A user has to apply this patch to the stackset.yaml file to make the warning go away:
diff --git a/deploy/apply/stackset.yaml b/deploy/apply/stackset.yaml
index fbbf351..a3fe84d 100644
--- a/deploy/apply/stackset.yaml
+++ b/deploy/apply/stackset.yaml
@@ -19,6 +19,9 @@ spec:
replicas: {{{REPLICAS}}}
# full Pod template.
podTemplate:
+ metadata:
+ labels:
+ application: "{{{APPLICATION}}}"
spec:
containers:
- name: "{{{APPLICATION}}}"
mikkeloscar [22 minutes ago]
Hey, can I ask you a somewhat related question? Do you know of any functions for injecting default PodTemplateSpec values? I.e. the defaults you get when you submit a pod spec where not everything is defined. Currently I do it like this, but maybe there's a better way? stackset-controller/controller/stack.go, lines 482 to 567 in 2097b1d

liggitt [22 minutes ago]
For what purpose?

liggitt [22 minutes ago]
The defaulting is part of the apiserver and isn't available to external clients.

mikkeloscar [21 minutes ago]
What I'm trying to achieve is a way to determine whether I need to update a resource or not. I have a CRD which "wraps" a deployment and I want to check if the deployment matches the pod spec defined in the CRD.

liggitt [14 minutes ago]
Attempting to match exactly with client-side defaulting isn't a reliable way to do that.

liggitt [14 minutes ago]
You will drift as new fields are added server side

liggitt [14 minutes ago]
and not be able to account for changes made by admission plugins, etc.

liggitt [13 minutes ago]
A more reliable way is to use metadata.generation.

liggitt [12 minutes ago]
When your custom resource spec changes, it bumps metadata.generation (if you've enabled spec/status for your CRD). (edited)

liggitt [11 minutes ago]
Your controller can note the observed generation it has reacted to in your custom resource's status.observedGeneration

liggitt [10 minutes ago]
and record the generation of the resulting deployment it creates in your custom resource as status.deploymentGeneration.

liggitt [9 minutes ago]
Then, whenever your custom resource's metadata.generation != status.observedGeneration, or the deployment's metadata.generation != your custom resource's status.deploymentGeneration, your controller can update the deployment spec and update your custom resource's status.deploymentGeneration. (edited)

mikkeloscar [4 minutes ago]
I didn't consider the admission plugins, that's a good point. I will look into the generation idea. Thanks a lot once again!
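The generation-based check liggitt describes can be sketched in a few lines of plain Go. The status struct and field names below follow the suggestion in the transcript but are hypothetical, not the controller's actual types:

```go
package main

import "fmt"

// stackStatus mirrors the suggested status fields: the generations the
// controller last observed for the custom resource and for the deployment
// it created. Hypothetical names, per the transcript.
type stackStatus struct {
	ObservedGeneration   int64
	DeploymentGeneration int64
}

// needsSync reports whether the controller must (re)apply the deployment:
// either the custom resource spec changed (generation bump) or the
// deployment was modified out from under us.
func needsSync(crGeneration, deployGeneration int64, st stackStatus) bool {
	return crGeneration != st.ObservedGeneration ||
		deployGeneration != st.DeploymentGeneration
}

func main() {
	st := stackStatus{ObservedGeneration: 3, DeploymentGeneration: 7}
	fmt.Println(needsSync(3, 7, st)) // in sync: false
	fmt.Println(needsSync(4, 7, st)) // CR spec changed: true
	fmt.Println(needsSync(3, 8, st)) // deployment drifted: true
}
```

Unlike deep-diffing pod specs, this approach is immune to server-side defaulting and admission-plugin mutations, because it never compares field values at all.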
The current release workflow for this project does not include any release notes.
To document releases for ourselves and the public, we should use GitHub releases and set up release notes similar to what we do in skipper, for example: https://github.com/zalando/skipper/releases/tag/v0.10.122
The status section of StackSets contains useful information about the readiness of the StackSet. However, there's more information that could be exposed there so that tools probing for readiness would have a single, reliable place to look.
Useful information includes, but isn't limited to:
kubectl delete stackset <foo> removes the StackSet but doesn't clean up the Stack objects.
Once the finalizer is removed manually, they get cleaned up. So I guess the deletion handler currently just doesn't process the finalizers.
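For reference, the manual workaround amounts to the following list surgery on metadata.finalizers, which the deletion handler should presumably be doing itself once the child Stacks are gone. The finalizer name here is hypothetical:

```go
package main

import "fmt"

// removeFinalizer returns metadata.finalizers without the given entry.
// A deletion handler would run this (and then update the object) after
// cleaning up the child Stack objects, letting the API server finish
// the delete.
func removeFinalizer(finalizers []string, name string) []string {
	out := finalizers[:0:0] // new backing array; don't mutate the input
	for _, f := range finalizers {
		if f != name {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	fs := []string{"stacks.zalando.org/cleanup", "other-finalizer"}
	fmt.Println(removeFinalizer(fs, "stacks.zalando.org/cleanup")) // [other-finalizer]
}
```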
upstream issue kubernetes/client-go#374
One fix that would work is wrapping the call with an external timeout, which is not nice but works:
done := make(chan bool, 1) // NB: buffered, so the goroutine can't leak blocked on send
go func() {
	operationWithoutTimeout()
	done <- true
}()
// bound how long we wait for the operation
select {
case <-done:
case <-time.After(3 * time.Millisecond):
}
When I deploy for the first time and my stackset is broken in some way (e.g. a typo in the image name), desired traffic is set to 100% on that initial stack. Subsequent successful deployments will have actual traffic evenly distributed among them.
Is this actually the desired behavior?
I would expect 100% of the traffic to be assigned to the very first successful deployment.
Currently the controller is tied to Skipper as the ingress controller; the dependency is the traffic-switching annotation. The controller should support other ingress providers as well, such as Traefik, which works exactly like Skipper via the annotation traefik.ingress.kubernetes.io/service-weights. Nginx behaves a little differently because it doesn't define multiple backends on the same ingress, but it allows multiple ingresses with the same host.
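One way to decouple the controller from Skipper is a small provider interface: each provider renders the backend weights into its own annotation. The Traefik annotation key is the real one mentioned above; the interface shape itself is just a sketch, not the controller's API:

```go
package main

import "fmt"

// trafficProvider renders desired backend weights into the annotation
// a given ingress controller understands. Hypothetical interface.
type trafficProvider interface {
	Annotation(weights map[string]int) (key, value string)
}

type traefikProvider struct{}

// Annotation renders weights in Traefik's "backend: N%" line format.
func (traefikProvider) Annotation(weights map[string]int) (string, string) {
	value := ""
	for backend, w := range weights { // note: map iteration order is random
		value += fmt.Sprintf("%s: %d%%\n", backend, w)
	}
	return "traefik.ingress.kubernetes.io/service-weights", value
}

func main() {
	var p trafficProvider = traefikProvider{}
	key, value := p.Annotation(map[string]int{"my-app-v1": 100})
	fmt.Printf("%s=%q\n", key, value)
}
```

Nginx would need a different implementation entirely (one ingress per backend with the same host) rather than a single annotation, which is why an interface boundary helps here.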
https://github.bus.zalan.do/teapot/cluster-registry-deploy/pull/39
The stackset controller manages the Service object, so users shouldn't mess with it. However, it currently creates a Service for each version, with a corresponding name. As a result, cluster-local client applications that want to use the Service's DNS name no longer have a stable access point, because the name always includes the version.
Let's also set up an unversioned Service object that points to some of the application's pods. It can't be traffic-switched, so the question is: what should the selector be?
When testing changes to the stackset-controller, it's hard to do so in a cluster where it's already deployed, because the deployed controller will own ALL stacksets in the cluster.
We should have a concept of ownership, where an annotation on the StackSet lets the controller know whether it owns it or not, similar to the ingress class annotation.
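The ownership check could work like the ingress class convention: a controller started with some ID only reconciles StackSets annotated with that ID, and the default controller picks up everything unannotated. The annotation name below is hypothetical:

```go
package main

import "fmt"

// Hypothetical annotation key, modelled on kubernetes.io/ingress.class.
const ownerAnnotation = "stackset-controller.zalando.org/controller-id"

// ownedBy reports whether a controller running with controllerID should
// reconcile a StackSet with the given annotations. Unannotated StackSets
// belong to the default controller (empty ID).
func ownedBy(annotations map[string]string, controllerID string) bool {
	id, ok := annotations[ownerAnnotation]
	if !ok {
		return controllerID == ""
	}
	return id == controllerID
}

func main() {
	ann := map[string]string{ownerAnnotation: "test"}
	fmt.Println(ownedBy(ann, "test")) // true: the test controller owns it
	fmt.Println(ownedBy(ann, ""))     // false: default controller skips it
	fmt.Println(ownedBy(nil, ""))     // true: unannotated goes to the default
}
```

With this, a development build run with a unique ID can be tested in a live cluster without fighting the production controller over the same StackSets.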
We don't have many tests yet. Since the stackset-controller is now in production, we might receive feature requests and bug reports; adding tests will help us deliver fixes more quickly. Making the code more "testable" should also improve its quality.
Broken StackSets are sometimes considered ready. Traffic is therefore switched to them, resulting in downtime.
/cc @dryewo