zalando-incubator / stackset-controller
Opinionated StackSet resource for managing application life cycle and traffic switching in Kubernetes
License: MIT License
Currently the stackset controller may decide to pre-scale the deployment to a number of pods higher than what is specified in maxReplicas. Since the HPA cannot go beyond maxReplicas, the stackset controller keeps waiting and never completes the switch.
The proposed change is to cap the prescaling based on maxReplicas.
I am trying to get stackset-controller functional. So far all I have done is take the files in the docs necessary for the controller - rbac, deployment, crds - and wrap them in a simple Helm chart.
I then tried to deploy the example StackSet, and while it was created, I see neither any other objects being created nor any log entries from the controller. I am running the controller with the --debug flag and even that output is blank.
I do see some errors in the API server, however. The first one I suspect is irrelevant:
E1116 20:06:57.456838 1 naming_controller.go:316] stacksets.zalando.org failed with: Operation cannot be fulfilled on customresourcedefinitions.apiextensions.k8s.io "stacksets.zalando.org": the object has been modified; please apply your changes to the latest version and try again
E1116 20:07:02.727043 1 establishing_controller.go:105] stacksets.zalando.org failed with: Operation cannot be fulfilled on customresourcedefinitions.apiextensions.k8s.io "stacksets.zalando.org": the object has been modified; please apply your changes to the latest version and try again
I1116 20:13:01.464879 1 controller.go:597] quota admission added evaluator for: {zalando.org stacksets}
E1116 20:21:46.876114 1 naming_controller.go:316] stacksets.zalando.org failed with: Operation cannot be fulfilled on customresourcedefinitions.apiextensions.k8s.io "stacksets.zalando.org": the object has been modified; please apply your changes to the latest version and try again
E1116 20:21:51.930774 1 establishing_controller.go:105] stacksets.zalando.org failed with: Operation cannot be fulfilled on customresourcedefinitions.apiextensions.k8s.io "stacksets.zalando.org": the object has been modified; please apply your changes to the latest version and try again
I am also seeing a bunch of:
I1116 20:28:28.654757 1 get.go:245] Starting watch for /apis/zalando.org/v1/stacksets, rv=67688687 labels= fields= timeout=8m22s
I1116 20:36:50.655642 1 get.go:245] Starting watch for /apis/zalando.org/v1/stacksets, rv=67688687 labels= fields= timeout=9m24s
I1116 20:46:14.659708 1 get.go:245] Starting watch for /apis/zalando.org/v1/stacksets, rv=67688687 labels= fields= timeout=8m22s
I1116 20:54:36.663718 1 get.go:245] Starting watch for /apis/zalando.org/v1/stacksets, rv=67688687 labels= fields= timeout=8m44s
I1116 21:03:20.665079 1 get.go:245] Starting watch for /apis/zalando.org/v1/stacksets, rv=67688687 labels= fields= timeout=7m16s
I1116 21:10:36.665996 1 get.go:245] Starting watch for /apis/zalando.org/v1/stacksets, rv=67688687 labels= fields= timeout=8m28s
Here is my StackSet out of the API server:
items:
- apiVersion: zalando.org/v1
kind: StackSet
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"zalando.org/v1","kind":"StackSet","metadata":{"annotations":{},"name":"my-app","namespace":"kube-system"},"spec":{"ingress":{"backendPort":80,"hosts":["nginx.dev.domain.com"]},"stackLifecycle":{"limit":5,"scaledownTTLSeconds":300},"stackTemplate":{"spec":{"horizontalPodAutoscaler":{"maxReplicas":10,"metrics":[{"resource":{"name":"cpu","targetAverageUtilization":50},"type":"Resource"}],"minReplicas":3},"podTemplate":{"spec":{"containers":[{"image":"nginx","name":"nginx","ports":[{"containerPort":80,"name":"ingress"}],"resources":{"limits":{"cpu":"10m","memory":"50Mi"},"requests":{"cpu":"10m","memory":"50Mi"}}}]}},"replicas":3,"version":"v2"}}}}
creationTimestamp: 2018-11-16T20:23:01Z
generation: 1
name: my-app
namespace: kube-system
resourceVersion: "67688687"
selfLink: /apis/zalando.org/v1/namespaces/kube-system/stacksets/my-app
uid: 6a0442aa-e9dd-11e8-a879-0a6c577faf70
spec:
ingress:
backendPort: 80
hosts:
- nginx.dev.domain.com
stackLifecycle:
limit: 5
scaledownTTLSeconds: 300
stackTemplate:
spec:
horizontalPodAutoscaler:
maxReplicas: 10
metrics:
- resource:
name: cpu
targetAverageUtilization: 50
type: Resource
minReplicas: 3
podTemplate:
spec:
containers:
- image: nginx
name: nginx
ports:
- containerPort: 80
name: ingress
resources:
limits:
cpu: 10m
memory: 50Mi
requests:
cpu: 10m
memory: 50Mi
replicas: 3
version: v2
kind: List
metadata:
resourceVersion: ""
selfLink: ""
and my Deployment spec:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "2"
creationTimestamp: 2018-11-16T20:21:46Z
generation: 2
labels:
app.kubernetes.io/instance: stackset-controller
app.kubernetes.io/managed-by: Tiller
app.kubernetes.io/name: stackset-controller
helm.sh/chart: stackset-controller-0.1.4
name: stackset-controller
namespace: kube-system
resourceVersion: "67702210"
selfLink: /apis/extensions/v1beta1/namespaces/kube-system/deployments/stackset-controller
uid: 3dcc6f23-e9dd-11e8-8400-06fe3d4492c6
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app.kubernetes.io/instance: stackset-controller
app.kubernetes.io/name: stackset-controller
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ""
creationTimestamp: null
labels:
app.kubernetes.io/instance: stackset-controller
app.kubernetes.io/name: stackset-controller
spec:
containers:
- args:
- --debug
image: registry.opensource.zalan.do/teapot/stackset-controller:latest
imagePullPolicy: Always
name: stackset-controller
resources:
limits:
cpu: 10m
memory: 128Mi
requests:
cpu: 10m
memory: 128Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: stackset-controller
serviceAccountName: stackset-controller
terminationGracePeriodSeconds: 30
status:
availableReplicas: 1
conditions:
- lastTransitionTime: 2018-11-16T20:21:50Z
lastUpdateTime: 2018-11-16T20:21:50Z
message: Deployment has minimum availability.
status: "True"
type: Available
- lastTransitionTime: 2018-11-16T20:21:46Z
lastUpdateTime: 2018-11-16T21:08:43Z
message: ReplicaSet "stackset-controller-84cb74f794" has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
observedGeneration: 2
readyReplicas: 1
replicas: 1
updatedReplicas: 1
Implement the field stackLifecycle
apiVersion: zalando.org/v1
kind: StackSet
metadata:
name: my-app
spec:
# optional Ingress definition.
ingress:
hosts: [my-app.example.org, alt.name.org]
stackLifecycle:
scaledownTTLSeconds: 300
limit: 5
stackTemplate:
....
The stackLifecycle definition tells when the stackset-controller should clean up the resources of an old stack that no longer gets any traffic.
It might make sense to extract the information about the desired traffic policy into StackSet, Stack or a dedicated resource.
Currently, the desired traffic policy is stored as an annotation on the Ingress object (for historical reasons). We are already using a different annotation from what skipper needs [0], so there's really no reason to have the new annotation on the Ingress.
It should either be a real field on StackSet, since it already defines the corresponding DNS name whose traffic is switched, or a dedicated TrafficPolicy resource that describes how traffic is shaped and doesn't live in the StackSet.
[0] we set zalando.org/stack-traffic-weights and skipper reads zalando.org/backend-weights
Setting stackLifecycle.limit to 1 causes the stackset controller to constantly delete the newly created stack in GC (because it doesn't have traffic) and then recreate it on the next iteration. This should either work correctly as a special case or be disallowed by validation.
Currently, if you delete an application stack with $ kubectl delete stack <stack-name>, Kubernetes will automatically clean up all dependent resources, including the Deployment and Service resources.
The resource could be "protected" with a finalizer so the controller can halt deletion if the stack is still getting traffic. If this happens, it should issue events about it (#12).
It is unclear how an all-zero traffic weight is handled and interpreted.
Handled (currently):
Interpretation (currently):
Clarify what's the desired behaviour:
We have 2 stacks running:
We discovered an issue with rendering-engine-master-358, so we wanted to revert back to rendering-engine-master-356.
zkubectl traffic rendering-engine rendering-engine-master-356 100
Stack rendering-engine-master-356 was prescaled again - but to a very high number of pods: 250, which is much more than the sum of all the pods handling traffic before the traffic switch (75+75=150).
Please also note that 250 is currently our maxReplicas limit, so I wonder what values would have been picked without this limit.
Two problems arise from this unbounded and broken traffic switch:
Hi,
I couldn't find a way to add annotations to a deployment. For service, ingress and even pods it's straightforward. The annotation is used at Zalando to define custom log parsers for Scalyr.
Maybe I just missed the way to achieve it.
The ScaledownTTLSeconds field was intended to mean scaling down a stack that has not been getting traffic for ScaledownTTLSeconds:
stackset-controller/controller/stack.go
Line 173 in b528bc9
However, in the stack garbage collection code it means deleting a stack that is not getting traffic and whose CREATION_TIME is older than ScaledownTTLSeconds: https://github.com/zalando-incubator/stackset-controller/blob/master/controller/stackset.go#L668
We should ensure that stacks are only deleted if they have not been getting traffic for ScaledownTTLSeconds, independently of their age.
traffic) or write a small kubectl plugin stack
#71 implemented a hacky solution to a problem where the controller scaled down stacks before the deployment even had a chance to switch traffic to those. However, this behaviour is not exactly intuitive, and could lead to deployments being kept alive for longer than necessary if the user quickly switches traffic back and forth. A cleaner approach would be to mark newly created stacks with an annotation that's erased when they get traffic, and to introduce a separate TTL for stacks in this state.
TODO
kubectl delete stack <foo> seems to hang and never deletes the stack.
This seems to be by design due to the finalizer. Let's deal with it later.
Right now we assume servicePort == targetPort, but it should be optionally configurable.
It's impossible to figure out whether the controller has processed the changes and updated the resources or not. The following information needs to be exposed so it's possible to figure out if the updates are done or still in progress:
- observedGeneration in StackSet.status, updated only after the changes to the stacks are done
- observedGeneration on Stack.status, updated only after the subresources have been changed
- currentStackName in StackSet.status, to avoid duplicating the weird default and name generation logic
TBD --- just a placeholder
Currently there are no schemas defined for the Stack and StackSet CRDs, making it very easy to make mistakes, e.g. in the PodTemplateSpec, which would render the resource unusable.
Validation should be added to prevent user mistakes.
StackSet and Stack resources are inspired by the relationship between ReplicaSet and Pods (and also Deployment and ReplicaSet, I would say).
When I modify a Deployment object (an in-place update), Kubernetes will look for ReplicaSets matching the desired podTemplate spec and create one for the new version if not present.
Similarly, when I update a StackSet in-place and change the stackTemplate spec, a new Stack is created for that version. However, to make this really work, one also has to increase the stackVersion field, which is transparent in the Deployment/ReplicaSet case (presumably via the pod-template-hash label).
The controller should send relevant events to the Stack
and StackSet
resources so users know what it's doing.
At the moment the stackset-controller just supports the Horizontal Pod Autoscaler spec as-is under the horizontalPodAutoscaler key. Therefore, users also have the responsibility to set sensible defaults. The format can also get particularly messy when configuring the HPA with custom metrics.
It might make sense for us to provide a more minimalistic key, e.g. Autoscaler, with sensible defaults and a cleaner way to provide custom scaling metrics.
When deployments reference a ConfigMap or Secret, these get updated in place when a new blue/green rollout is triggered. This can already have unintended effects on the old deployment.
We could include ConfigMaps as a subsection of the spec, similar to ingress, so that the controller can transparently create and reference dedicated config maps per version (similar to kustomize, which creates dedicated config maps for each overlay: https://github.com/kubernetes-sigs/kustomize/blob/37f03b4d018235d1a26dda1f031d374776b381a7/examples/ldap/base/kustomization.yaml#L4-L7).
Alternatively, we could use the CDP_BUILD_VERSION as part of the ConfigMap's name and its reference in the podTemplateSpec. This would create a new ConfigMap with immutable content for each rollout. Old versions could be cleaned up by leveraging an ownerReference from the Deployment to the ConfigMap, so that once the Stack is deleted the old ConfigMap gets cleaned up too (assuming ownerReferences work for ConfigMaps).
See https://github.bus.zalan.do/teapot/tokeninfo-router/pull/10/files for an example of a Deployment using a ConfigMap.
When I deploy two versions of an app it is not possible to delete the latest stack even though it has no traffic. If it is in a crash loop or smoke tests failed I would like to get rid of that stack. Currently the stack is being recreated. This behavior might be surprising and confusing.
Should we adopt this? Any thoughts?
These error messages come from glog. We should drop these log entries, because the behaviour is intended: the client-go timeout works for WATCH, but not for all other API calls.
...
ERROR: logging before flag.Parse: E0808 14:07:03.579958 1 streamwatcher.go:109] Unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body)
ERROR: logging before flag.Parse: E0808 14:07:33.581389 1 streamwatcher.go:109] Unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body)
% kubectl logs stackset-controller-56449647cf-rtjt8
Error from server (BadRequest): container "stackset-controller" in pod "stackset-controller-56449647cf-rtjt8" is waiting to start: trying and failing to pull image
zsh: exit 1 kubectl logs stackset-controller-56449647cf-rtjt8
But delivery.yaml builds IMAGE=registry-write.opensource.zalan.do/teapot/stackset-controller-test with "-test"
apiVersion: zalando.org/v1
kind: StackSet
metadata:
name: cluster-registry
spec:
ingress:
annotations:
zalando.org/skipper-filter: ...
Custom annotations specified by users should be passed-on to the generated Ingress object.
The prometheus endpoint should expose more metrics that make it possible to detect whether the stackset controller is running well.
Some possibilities:
We see the following errors in the skipper logs, because we clean up a Service object but not the reference to the service within Ingress objects, which we should also clean up:
[APP]time="2018-12-12T22:39:50Z" level=error msg="convertPathRule: Failed to get service default, cdp-e2e-cdp-cd-robot-e5eb20050a35b3d718d8fd603a20b6bf6f8934ef, ingress"
ingress
metadata:
creationTimestamp: 2018-09-04T16:04:59Z
generation: 1
labels:
deployment-id: d-2mctyfio8r3y2f535k2zocoupk
stack-version: e5eb20050a35b3d718d8fd603a20b6bf6f8934ef
stackset: cdp-e2e-cdp-cd-robot
name: cdp-e2e-cdp-cd-robot-e5eb20050a35b3d718d8fd603a20b6bf6f8934ef
namespace: default
ownerReferences:
- apiVersion: zalando.org/v1
kind: Stack
name: cdp-e2e-cdp-cd-robot-e5eb20050a35b3d718d8fd603a20b6bf6f8934ef
uid: 3fc086e8-b05c-11e8-96bd-022ad150ebe8
resourceVersion: "210581283"
spec:
rules:
- host: cdp-e2e-cdp-cd-robot-e5eb20050a35b3d718d8fd603a20b6bf6f8934ef.example.org
http:
paths:
- backend:
serviceName: cdp-e2e-cdp-cd-robot-e5eb20050a35b3d718d8fd603a20b6bf6f8934ef
servicePort: ingress
svc
% kubectl get svc cdp-e2e-cdp-cd-robot-e5eb20050a35b3d718d8fd603a20b6bf6f8934ef
Error from server (NotFound): services "cdp-e2e-cdp-cd-robot-e5eb20050a35b3d718d8fd603a20b6bf6f8934ef" not found
stackset
spec:
ingress:
hosts:
- cdp-e2e-cdp-cd-robot.example.org
stackLifecycle:
limit: 5
scaledownTTLSeconds: 300
stackTemplate:
spec:
podTemplate:
metadata:
labels:
application: cdp-e2e-cdp-cd-robot
spec:
containers:
- env:
- name: VERSION
value: e5eb20050a35b3d718d8fd603a20b6bf6f8934ef
image:___
lifecycle:
preStop:
exec:
command:
- sleep
- "5"
name: cdp-e2e-cdp-cd-robot
ports:
- containerPort: 8080
name: ingress
readinessProbe:
httpGet:
path: /health
port: 8080
replicas: 1
version: e5eb20050a35b3d718d8fd603a20b6bf6f8934ef
An interesting feature: instead of starting with a canary and directly shifting traffic to your new stack, it would be nice to first send some shadow traffic to the new stack, and then decide to switch real traffic.
https://www.functionize.com/blog/what-is-canary-testing-and-dark-launching/
I created a custom autoscaler using skipper requests per second like this in my stackset:
horizontalPodAutoscaler:
minReplicas: 2
maxReplicas: 10
metrics:
- type: Object
object:
metricName: requests-per-second
target:
apiVersion: extensions/v1beta1
kind: Ingress
name: "{{{APPLICATION}}}"
targetValue: 100
The created stack contains a wrong targetValue:
horizontalPodAutoscaler:
maxReplicas: 10
metadata:
creationTimestamp: null
metrics:
- object:
metricName: requests-per-second
target:
apiVersion: extensions/v1beta1
kind: Ingress
name: sandbox-tokeninfo-bridge
targetValue: "0"
type: Object
minReplicas: 2
This is the log I see in stackset-controller:
time="2018-10-04T15:45:38Z" level=info msg="Event(v1.ObjectReference{Kind:\"Stack\", Namespace:\"default\", Name:\"sandbox-tokeninfo-bridge-pr-22-3\", UID:\"42a39bcb-c7ea-11e8-bd06-060172d696fe\", APIVersion:\"zalando.org/v1\", ResourceVersion:\"245331352\", FieldPath:\"\"}): type: 'Normal' reason: 'CreateHPA' Creating HPA default/sandbox-tokeninfo-bridge-pr-22-3 for Deployment default/sandbox-tokeninfo-bridge-pr-22-3"
time="2018-10-04T15:45:38Z" level=error msg="Failed to manage Stack default/sandbox-tokeninfo-bridge-pr-22-3: HorizontalPodAutoscaler.autoscaling \"sandbox-tokeninfo-bridge-pr-22-3\" is invalid: spec.metrics[0].object.targetValue: Required value: must specify a positive target value"
A faulty stackset.yaml (we don't validate the content yet) led to this in the logs:
ERROR: logging before flag.Parse: E0904 16:04:08.607962 1 reflector.go:205] github.com/zalando-incubator/stackset-controller/controller/stackset.go:482: Failed to list *v1.StackSet: v1.StackSetList.Items: []v1.StackSet: v1.StackSet.Spec: v1.StackSetSpec.StackTemplate: v1.StackTemplate.Spec: v1.StackSpecTemplate.StackSpec: Service: v1.StackServiceSpec.Ports: []v1.ServicePort: v1.ServicePort.Port: readUint32: unexpected character: ๏ฟฝ, error found in #10 byte of ...|","port":"80","proto|..., bigger context ...|cas":2,"service":{"ports":[{"name":"http","port":"80","protocol":"TCP","targetPort":"8080"}]},"versi|...
ERROR: logging before flag.Parse: E0904 16:04:09.635793 1 reflector.go:205] github.com/zalando-incubator/stackset-controller/controller/stackset.go:482: Failed to list *v1.StackSet: v1.StackSetList.Items: []v1.StackSet: v1.StackSet.Spec: v1.StackSetSpec.StackTemplate: v1.StackTemplate.Spec: v1.StackSpecTemplate.StackSpec: Service: v1.StackServiceSpec.Ports: []v1.ServicePort: v1.ServicePort.Port: readUint32: unexpected character: ๏ฟฝ, error found in #10 byte of ...|","port":"80","proto|..., bigger context ...|cas":2,"service":{"ports":[{"name":"http","port":"80","protocol":"TCP","targetPort":"8080"}]},"versi|...
ERROR: logging before flag.Parse: E0904 16:04:10.656678 1 reflector.go:205] github.com/zalando-incubator/stackset-controller/controller/stackset.go:482: Failed to list *v1.StackSet: v1.StackSetList.Items: []v1.StackSet: v1.StackSet.Spec: v1.StackSetSpec.StackTemplate: v1.StackTemplate.Spec: v1.StackSpecTemplate.StackSpec: Service: v1.StackServiceSpec.Ports: []v1.ServicePort: v1.ServicePort.Port: readUint32: unexpected character: ๏ฟฝ, error found in #10 byte of ...|","port":"80","proto|..., bigger context ...|cas":2,"service":{"ports":[{"name":"http","port":"80","protocol":"TCP","targetPort":"8080"}]},"versi|...
...
These log entries should rather be considered debug logs:
time="2018-09-10T07:24:35Z" level=info msg="Updating Deployment default/cluster-registry-master-82 for StackSet stack default/cluster-registry-master-82" controller=stacks namespace=default stackset=cluster-registry
time="2018-09-10T07:24:35Z" level=info msg="Updating Deployment default/cluster-registry-master-79 for StackSet stack default/cluster-registry-master-79" controller=stacks namespace=default stackset=cluster-registry
time="2018-09-10T07:24:35Z" level=info msg="Updating Deployment default/cluster-registry-master-80 for StackSet stack default/cluster-registry-master-80" controller=stacks namespace=default stackset=cluster-registry
time="2018-09-10T07:24:35Z" level=info msg="Updating Deployment default/cluster-registry-master-81 for StackSet stack default/cluster-registry-master-81" controller=stacks namespace=default stackset=cluster-registry
time="2018-09-10T07:24:45Z" level=info msg="Updating Deployment default/cluster-registry-master-79 for StackSet stack default/cluster-registry-master-79" controller=stacks namespace=default stackset=cluster-registry
This is more client-go library internal noise:
ERROR: logging before flag.Parse: E0910 07:24:31.764057 1 streamwatcher.go:109] Unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body)
We have an app running around 30 pods.
When we switch traffic, we have noticed 2 issues which appeared after we added RPS scaling:
In the following example, we are running stackset rendering-engine-986 and trying to deploy stackset rendering-engine-990:
Fortunately, after a while everything goes back to normal, but these issues should be investigated.
Zalando internal issue with more context: https://github.bus.zalan.do/teapot/issues/issues/1548
I tried retiring my test application in stups-test by deleting my stackset:
$ zk delete stackset my-app-jylipoti
stackset.zalando.org "my-app-jylipoti" deleted
However, the delete did not cascade to other resources, i.e. stack my-app-jylipoti-pr-1-1 is still present and so is deployment my-app-jylipoti-pr-1-1.
This will not heal itself and gets into a forever crash loop:
panic: cannot handle unexported field: {*v2beta1.HorizontalPodAutoscaler}.Spec.Metrics[0].Object.TargetValue.i
consider using AllowUnexported or cmpopts.IgnoreUnexported
goroutine 138 [running]:
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.invalid.apply(0xc000e7c930, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/options.go:208 +0xf5
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).tryOptions(0xc000e7c930, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x13a44c0, 0x11acfc0, 0x0)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:307 +0x149
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareAny(0xc000e7c930, 0x11acfc0, 0xc000b301f0, 0x1b9, 0x11acfc0, 0xc0006161f0, 0x1b9)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:202 +0x336
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareStruct(0xc000e7c930, 0x11a7820, 0xc000b301f0, 0x199, 0x11a7820, 0xc0006161f0, 0x199, 0x13a44c0, 0x11a7820)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:503 +0x169
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareAny(0xc000e7c930, 0x11a7820, 0xc000b301f0, 0x199, 0x11a7820, 0xc0006161f0, 0x199)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:272 +0x2522
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareStruct(0xc000e7c930, 0x1185b20, 0xc000b301b0, 0x199, 0x1185b20, 0xc0006161b0, 0x199, 0x13a44c0, 0x1185b20)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:503 +0x169
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareAny(0xc000e7c930, 0x1185b20, 0xc000b301b0, 0x199, 0x1185b20, 0xc0006161b0, 0x199)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:272 +0x2522
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareAny(0xc000e7c930, 0x1193b20, 0xc000e1a010, 0x196, 0x1193b20, 0xc0004bc580, 0x196)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:244 +0x1af5
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareStruct(0xc000e7c930, 0x1185920, 0xc000e1a000, 0x199, 0x1185920, 0xc0004bc570, 0x199, 0x13a44c0, 0x1185920)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:503 +0x169
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareAny(0xc000e7c930, 0x1185920, 0xc000e1a000, 0x199, 0x1185920, 0xc0004bc570, 0x199)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:272 +0x2522
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).statelessCompare(0xc000e7c930, 0x1185920, 0xc000e1a000, 0x199, 0x1185920, 0xc0004bc570, 0x199, 0xc00022e128, 0xc000215ec0)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:176 +0xc9
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareArray.func1(0x0, 0x0, 0x0, 0x1e878c0)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:398 +0x108
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/internal/diff.Difference(0x1, 0x1, 0xc0002160b0, 0x1, 0x0, 0x0)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/internal/diff/diff.go:212 +0x244
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareArray(0xc000e7c930, 0x1052b80, 0xc00022e148, 0x197, 0x1052b80, 0xc0001c0bc8, 0x197, 0x13a44c0, 0x1052b80)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:396 +0x23d
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareAny(0xc000e7c930, 0x1052b80, 0xc00022e148, 0x197, 0x1052b80, 0xc0001c0bc8, 0x197)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:266 +0xd10
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareStruct(0xc000e7c930, 0x1171100, 0xc00022e108, 0x199, 0x1171100, 0xc0001c0b88, 0x199, 0x13a44c0, 0x1171100)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:503 +0x169
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareAny(0xc000e7c930, 0x1171100, 0xc00022e108, 0x199, 0x1171100, 0xc0001c0b88, 0x199)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:272 +0x2522
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareStruct(0xc000e7c930, 0x1171020, 0xc00022e000, 0x199, 0x1171020, 0xc0001c0a80, 0x199, 0x13a44c0, 0x1171020)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:503 +0x169
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareAny(0xc000e7c930, 0x1171020, 0xc00022e000, 0x199, 0x1171020, 0xc0001c0a80, 0x199)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:272 +0x2522
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.(*state).compareAny(0xc000e7c930, 0x1208a60, 0xc00022e000, 0x16, 0x1208a60, 0xc0001c0a80, 0x16)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:244 +0x1af5
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.Equal(0x1208a60, 0xc00022e000, 0x1208a60, 0xc0001c0a80, 0xc000b9e240, 0x2, 0x2, 0x0)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:86 +0x16b
github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp.Diff(0x1208a60, 0xc00022e000, 0x1208a60, 0xc0001c0a80, 0x0, 0x0, 0x0, 0x0, 0x0)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/github.com/google/go-cmp/cmp/compare.go:100 +0x129
github.com/zalando-incubator/stackset-controller/controller.(*stacksReconciler).manageAutoscaling(0xc0002198a0, 0xc0006d9de4, 0x5, 0xc0006d9df0, 0xe, 0xc0003de200, 0x20, 0x0, 0x0, 0xc0006d9e60, ...)
/go/src/github.com/zalando-incubator/stackset-controller/controller/stack.go:339 +0x25d
github.com/zalando-incubator/stackset-controller/controller.(*stacksReconciler).manageDeployment(0xc0002198a0, 0xc0006d9de4, 0x5, 0xc0006d9df0, 0xe, 0xc0003de200, 0x20, 0x0, 0x0, 0xc0006d9e60, ...)
/go/src/github.com/zalando-incubator/stackset-controller/controller/stack.go:213 +0x4dd
github.com/zalando-incubator/stackset-controller/controller.(*stacksReconciler).manageStack(0xc0002198a0, 0xc0006d9de4, 0x5, 0xc0006d9df0, 0xe, 0xc0003de200, 0x20, 0x0, 0x0, 0xc0006d9e60, ...)
/go/src/github.com/zalando-incubator/stackset-controller/controller/stack.go:54 +0x100
github.com/zalando-incubator/stackset-controller/controller.(*StackSetController).ReconcileStacks(0xc0001ed2c0, 0x1223a9d, 0x8, 0x12284fe, 0xe, 0xc000053700, 0x18, 0x0, 0x0, 0xc0000ed359, ...)
/go/src/github.com/zalando-incubator/stackset-controller/controller/stack.go:49 +0x2ac
github.com/zalando-incubator/stackset-controller/controller.(*StackSetController).Run.func1(0x8, 0x12a2af0)
/go/src/github.com/zalando-incubator/stackset-controller/controller/stackset.go:105 +0x19d
github.com/zalando-incubator/stackset-controller/vendor/golang.org/x/sync/errgroup.(*Group).Go.func1(0xc00071f4a0, 0xc00071f7a0)
/go/src/github.com/zalando-incubator/stackset-controller/vendor/golang.org/x/sync/errgroup/errgroup.go:58 +0x57
created by github.com/zalando-incubator/stackset-controller/vendor/golang.org/x/sync/errgroup.(*Group).Go
/go/src/github.com/zalando-incubator/stackset-controller/vendor/golang.org/x/sync/errgroup/errgroup.go:55 +0x66
Currently I can set traffic on a version that doesn't exist. The client correctly reduces the desired traffic on the existing versions, but does not add the new version to the annotation (since it doesn't exist).
The actual traffic is not changed, so it doesn't do much harm.
Still, the client should probably fail, or at least print a warning.
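A minimal sketch of the client-side validation being suggested. All names here are hypothetical (the real client stores weights in an ingress annotation); the point is just to reject unknown versions up front:

```go
package main

import "fmt"

// setDesiredTraffic validates that every referenced stack version exists
// before any desired-traffic annotation would be written. Hypothetical
// helper; it only illustrates the proposed failure mode.
func setDesiredTraffic(existing map[string]bool, weights map[string]float64) error {
	for version := range weights {
		if !existing[version] {
			return fmt.Errorf("stack version %q does not exist", version)
		}
	}
	return nil
}

func main() {
	existing := map[string]bool{"my-app-v1": true}
	fmt.Println(setDesiredTraffic(existing, map[string]float64{"my-app-v1": 100})) // <nil>
	fmt.Println(setDesiredTraffic(existing, map[string]float64{"my-app-v2": 100})) // error
}
```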
Description:
Prior to kubernetes-training/pull/32, when using the provided stackset-template as-is, CDP builds would complain about missing application labels, even though one is provided at the StackSet level (see screenshot):
Expected behavior:
Metadata (application) labels should be inherited by the stackTemplate's podTemplate without the need for manual additions (or the example should include said change; see the PR mentioned earlier).
Actual Behavior:
A user has to apply this patch to the stackset.yaml file to make the warning go away:
diff --git a/deploy/apply/stackset.yaml b/deploy/apply/stackset.yaml
index fbbf351..a3fe84d 100644
--- a/deploy/apply/stackset.yaml
+++ b/deploy/apply/stackset.yaml
@@ -19,6 +19,9 @@ spec:
replicas: {{{REPLICAS}}}
# full Pod template.
podTemplate:
+ metadata:
+ labels:
+ application: "{{{APPLICATION}}}"
spec:
containers:
- name: "{{{APPLICATION}}}"
mikkeloscar [22 minutes ago]
Hey, can I ask you a somewhat related question? Do you know of any functions for injecting default PodTemplateSpec values? I.e. the defaults you get when you submit a pod spec where not everything is defined. Currently I do it like this, but maybe there's a better way? stackset-controller/controller/stack.go, lines 482 to 567 in 2097b1d

liggitt [22 minutes ago]
For what purpose?

liggitt [22 minutes ago]
The defaulting is part of the apiserver and isn't available to external clients.

mikkeloscar [21 minutes ago]
What I'm trying to achieve is a way to determine whether I need to update a resource or not. I have a CRD which "wraps" a deployment and I want to check if the deployment matches the pod spec defined in the CRD.

liggitt [14 minutes ago]
Attempting to match exactly with client-side defaulting isn't a reliable way to do that.

liggitt [14 minutes ago]
You will drift as new fields are added server side

liggitt [14 minutes ago]
and not be able to account for changes made by admission plugins, etc.

liggitt [13 minutes ago]
A more reliable way is to use metadata.generation.

liggitt [12 minutes ago]
When your custom resource spec changes, it bumps metadata.generation (if you've enabled spec/status for your CRD). (edited)

liggitt [11 minutes ago]
Your controller can note the observed generation it has reacted to in your custom resource's status.observedGeneration

liggitt [10 minutes ago]
and record the generation of the resulting deployment it creates in your custom resource as status.deploymentGeneration.

liggitt [9 minutes ago]
Then, whenever your custom resource's metadata.generation != status.observedGeneration, or the deployment's metadata.generation != your custom resource's status.deploymentGeneration, your controller can update the deployment spec and update your custom resource's status.deploymentGeneration. (edited)

mikkeloscar [4 minutes ago]
I didn't consider the admission plugins, that's a good point. I will look into the generation idea. Thanks a lot once again!
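The generation-based check liggitt describes can be sketched in a few lines of plain Go. The status struct and field names below follow the suggestion in the transcript but are hypothetical, not the controller's actual types:

```go
package main

import "fmt"

// stackStatus mirrors the suggested status fields: the generations the
// controller last observed for the custom resource and for the deployment
// it created. Hypothetical names, per the transcript.
type stackStatus struct {
	ObservedGeneration   int64
	DeploymentGeneration int64
}

// needsSync reports whether the controller must (re)apply the deployment:
// either the custom resource spec changed (generation bump) or the
// deployment was modified out from under us.
func needsSync(crGeneration, deployGeneration int64, st stackStatus) bool {
	return crGeneration != st.ObservedGeneration ||
		deployGeneration != st.DeploymentGeneration
}

func main() {
	st := stackStatus{ObservedGeneration: 3, DeploymentGeneration: 7}
	fmt.Println(needsSync(3, 7, st)) // in sync: false
	fmt.Println(needsSync(4, 7, st)) // CR spec changed: true
	fmt.Println(needsSync(3, 8, st)) // deployment drifted: true
}
```

Unlike deep-diffing pod specs, this approach is immune to server-side defaulting and admission-plugin mutations, because it never compares field values at all.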
The current release workflow for this project does not include any release notes.
To document releases for ourselves and the public, we should use GitHub releases and set up release notes similar to what we do in skipper, for example: https://github.com/zalando/skipper/releases/tag/v0.10.122
The status section of StackSets contains useful information about the readiness of the StackSet. However, there's more information that could be exposed there so that tools probing for readiness would have a single, reliable place to look.
Useful information includes, but isn't limited to:
kubectl delete stackset <foo> removes the StackSet but doesn't clean up the Stack objects.
Once the finalizer is removed manually, they get cleaned up. So I guess the deletion handler currently just doesn't process the finalizers.
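For reference, the manual workaround amounts to the following list surgery on metadata.finalizers, which the deletion handler should presumably be doing itself once the child Stacks are gone. The finalizer name here is hypothetical:

```go
package main

import "fmt"

// removeFinalizer returns metadata.finalizers without the given entry.
// A deletion handler would run this (and then update the object) after
// cleaning up the child Stack objects, letting the API server finish
// the delete.
func removeFinalizer(finalizers []string, name string) []string {
	out := finalizers[:0:0] // new backing array; don't mutate the input
	for _, f := range finalizers {
		if f != name {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	fs := []string{"stacks.zalando.org/cleanup", "other-finalizer"}
	fmt.Println(removeFinalizer(fs, "stacks.zalando.org/cleanup")) // [other-finalizer]
}
```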
upstream issue kubernetes/client-go#374
One fix that would work is wrapping the call with an external timeout, which is not nice but works:
done := make(chan bool, 1) // NB: buffered, so the goroutine can't leak blocked on send
go func() {
	operationWithoutTimeout()
	done <- true
}()
// bound how long we wait for the operation
select {
case <-done:
case <-time.After(3 * time.Millisecond):
}
When I deploy for the first time and my stackset is broken in some way (e.g. a typo in the image name), desired traffic is set to 100% on that initial stack. Subsequent successful deployments will have actual traffic evenly distributed among them.
Is this actually the desired behavior?
I would expect 100% of the traffic to be assigned to the very first successful deployment.
Currently the controller is tied to Skipper as the ingress controller; the dependency is the traffic-switching annotation. The controller should support other ingress providers as well, such as Traefik, which works exactly like Skipper via the annotation traefik.ingress.kubernetes.io/service-weights. Nginx behaves a little differently because it doesn't define multiple backends on the same ingress, but it allows multiple ingresses with the same host.
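One way to decouple the controller from Skipper is a small provider interface: each provider renders the backend weights into its own annotation. The Traefik annotation key is the real one mentioned above; the interface shape itself is just a sketch, not the controller's API:

```go
package main

import "fmt"

// trafficProvider renders desired backend weights into the annotation
// a given ingress controller understands. Hypothetical interface.
type trafficProvider interface {
	Annotation(weights map[string]int) (key, value string)
}

type traefikProvider struct{}

// Annotation renders weights in Traefik's "backend: N%" line format.
func (traefikProvider) Annotation(weights map[string]int) (string, string) {
	value := ""
	for backend, w := range weights { // note: map iteration order is random
		value += fmt.Sprintf("%s: %d%%\n", backend, w)
	}
	return "traefik.ingress.kubernetes.io/service-weights", value
}

func main() {
	var p trafficProvider = traefikProvider{}
	key, value := p.Annotation(map[string]int{"my-app-v1": 100})
	fmt.Printf("%s=%q\n", key, value)
}
```

Nginx would need a different implementation entirely (one ingress per backend with the same host) rather than a single annotation, which is why an interface boundary helps here.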
https://github.bus.zalan.do/teapot/cluster-registry-deploy/pull/39
The stackset controller manages the Service object, so users shouldn't mess with it. However, it currently creates a Service for each version, with a corresponding name. As a result, cluster-local client applications that want to use the Service's DNS name no longer have a stable access point, because the name always includes the version.
Let's also set up an unversioned Service object that points to some of the application's pods. It can't be traffic-switched, so the question is: what should the selector be?
When testing changes to the stackset-controller, it's hard to do so in a cluster where it's already deployed, because the deployed controller will own ALL stacksets in the cluster.
We should have a concept of ownership, where an annotation on the StackSet lets the controller know whether it owns it or not, similar to the ingress class annotation.
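The ownership check could work like the ingress class convention: a controller started with some ID only reconciles StackSets annotated with that ID, and the default controller picks up everything unannotated. The annotation name below is hypothetical:

```go
package main

import "fmt"

// Hypothetical annotation key, modelled on kubernetes.io/ingress.class.
const ownerAnnotation = "stackset-controller.zalando.org/controller-id"

// ownedBy reports whether a controller running with controllerID should
// reconcile a StackSet with the given annotations. Unannotated StackSets
// belong to the default controller (empty ID).
func ownedBy(annotations map[string]string, controllerID string) bool {
	id, ok := annotations[ownerAnnotation]
	if !ok {
		return controllerID == ""
	}
	return id == controllerID
}

func main() {
	ann := map[string]string{ownerAnnotation: "test"}
	fmt.Println(ownedBy(ann, "test")) // true: the test controller owns it
	fmt.Println(ownedBy(ann, ""))     // false: default controller skips it
	fmt.Println(ownedBy(nil, ""))     // true: unannotated goes to the default
}
```

With this, a development build run with a unique ID can be tested in a live cluster without fighting the production controller over the same StackSets.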
We don't have many tests yet. Since the stackset-controller is now in production, we might receive feature requests and bug reports; adding tests will help us deliver fixes more quickly. Making the code more "testable" should also improve its quality.
Broken StackSets are sometimes considered ready. Traffic is therefore switched to them, resulting in downtime.
/cc @dryewo