
zalando-incubator / kubernetes-on-aws

Stars: 620 · Watchers: 36 · Forks: 163 · Size: 19.18 MB

Deploying Kubernetes on AWS with CloudFormation and Ubuntu

Home Page: https://kubernetes-on-aws.readthedocs.io/

License: MIT License

Languages: Shell 7.83%, Dockerfile 0.42%, Makefile 0.84%, Go 88.57%, Python 2.35%
Topics: kubernetes, kubernetes-cluster, aws, cloud


kubernetes-on-aws's Issues

Updating a cluster will reset ASG size

./cluster.py update <stack-name> <version> will currently reset the ASG size (MinSize, MaxSize, and DesiredCapacity), as it updates the CF stack unconditionally.
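A minimal sketch of a possible fix, assuming boto3 and that cluster.py can feed the current sizes back into the stack parameters (the parameter plumbing is hypothetical):

# Sketch: read the live ASG size before the update so the CF stack update
# doesn't reset it to template defaults. Names are illustrative.
import boto3

def current_asg_size(asg_name, region='eu-central-1'):
    asg = boto3.client('autoscaling', region_name=region)
    group = asg.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name])['AutoScalingGroups'][0]
    return group['MinSize'], group['MaxSize'], group['DesiredCapacity']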

Network policies: protect pods from being accessed by all other pods in the cluster

Ensure that pods cannot connect to other pods in the cluster that they are not allowed to connect to, e.g. when running multiple Patroni clusters in the cluster.

This is not directly cluster-lifecycle related, but rather a question of how the network policies are configured inside the cluster; we may still have to run something in the cluster (e.g. the Calico node agent).

If we decide we don't care because of Autobahn, we should document that decision here.
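For illustration, a hedged sketch of what such a policy could look like (label names are made up; the networking.k8s.io/v1 API requires a recent Kubernetes, and enforcement needs a network plugin such as Calico):

# Sketch: a NetworkPolicy restricting ingress to a hypothetical "patroni"
# pod group so that only pods with the same label can connect to it.
network_policy = {
    'apiVersion': 'networking.k8s.io/v1',
    'kind': 'NetworkPolicy',
    'metadata': {'name': 'patroni-isolate', 'namespace': 'default'},
    'spec': {
        'podSelector': {'matchLabels': {'application': 'patroni'}},
        'ingress': [{
            'from': [{'podSelector': {'matchLabels': {'application': 'patroni'}}}],
        }],
    },
}
# could be applied with kubernetes.utils.create_from_dict(api_client, network_policy)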

Integrate Spot Fleet

We should be able to trade cost-efficiency vs. availability (depending on the application) by balancing between on-demand and spot instances in a "smart" way.
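As a starting point, a sketch of requesting part of the worker capacity as a spot fleet via boto3 (role ARN, AMI, price, and instance types are placeholders); the on-demand ASG would remain as the availability baseline:

# Sketch: request a spot fleet for part of the worker pool while keeping an
# on-demand ASG as the availability baseline. All values are illustrative.
import boto3

ec2 = boto3.client('ec2', region_name='eu-central-1')
response = ec2.request_spot_fleet(SpotFleetRequestConfig={
    'IamFleetRole': 'arn:aws:iam::123456789012:role/spot-fleet-role',  # assumed role
    'AllocationStrategy': 'lowestPrice',
    'TargetCapacity': 10,   # spot share of the worker pool
    'SpotPrice': '0.10',    # cap: never pay more than on-demand
    'LaunchSpecifications': [
        {'ImageId': 'ami-xxxxxxxx', 'InstanceType': t}
        for t in ('m4.large', 'm4.xlarge', 'c4.xlarge')  # diversify instance types
    ],
})
print(response['SpotFleetRequestId'])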

Use a secret to configure AppDynamics agents

follow up PR from #113

  • the AppDynamics credentials should be stored in a Kubernetes secret
  • the AppDynamics account needs to be configurable

For ease of configuration it may make sense to put them all in one secret, although e.g. the AppDynamics account name is not necessarily a secret value.
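A sketch of the single-secret variant using the Kubernetes Python client (secret name, namespace, and key names are assumptions):

# Sketch: one Secret holding both the AppDynamics access key and the account
# name; the account name is not sensitive, but co-locating it simplifies
# configuration. Names are illustrative, not the actual manifest.
from kubernetes import client, config

config.load_kube_config()
secret = client.V1Secret(
    metadata=client.V1ObjectMeta(name='appdynamics', namespace='kube-system'),
    string_data={
        'account-name': 'my-account',  # configurable per cluster
        'access-key': 'REPLACE_ME',    # the actual secret value
    },
)
client.CoreV1Api().create_namespaced_secret('kube-system', secret)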

Attaching volumes regularly gets stuck

The error message is always the same:

$ kubectl describe pod foo
...
  25m		1m		12	{kubelet ip-172-31-13-7.eu-central-1.compute.internal}			Warning		FailedSync	Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "etcd-1"/"acid". list of unattached/unmounted volumes=[data]

Observations:

  • the EBS volume is "stuck" for days in AWS (forcing a detach doesn't help)
  • rebooting the corresponding instance doesn't help either
  • happens on GKE as well
  • several failed units show up when SSHing into the box
  • the kubelet logs show corresponding error messages

There are several issues reported upstream already:
https://github.com/kubernetes/kubernetes/issues?q=is%3Aissue%20is%3Aopen%20%22list%20of%20unattached%22

More node labels

Would be good to have more node labels available for node selectors, especially: pool name, number of CPUs, memory, and GPU.
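Until the kubelet registers such labels itself (e.g. via its --node-labels flag), they could be patched onto nodes after registration; a sketch with the Kubernetes Python client, label keys being hypothetical:

# Sketch: label an existing node so workloads can target it via nodeSelector.
from kubernetes import client, config

config.load_kube_config()
client.CoreV1Api().patch_node(
    'ip-172-31-13-7.eu-central-1.compute.internal',
    {'metadata': {'labels': {'pool-name': 'worker-default', 'cpu': '4',
                             'memory': '16Gi', 'gpu': 'none'}}},
)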

Add CloudFormation SUCCESS signal (cfn-signal)

We should add a CloudFormation SUCCESS signal for master and worker nodes. The SUCCESS signal should only be sent if everything comes up successfully on the node, i.e. kubelet started etc. This allows us to change the Senza file to actually wait on the SUCCESS signal (we do this by default with Taupage, but CoreOS is missing cfn-signal).
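A sketch of what the signal step could look like with boto3 instead of the cfn-signal helper (the logical resource ID must match the Senza-generated template; the name below is made up):

# Sketch: send a CF SUCCESS signal once the node is confirmed healthy.
import boto3, requests

def signal_success(stack_name, logical_id='MasterAutoScalingGroup'):
    instance_id = requests.get(
        'http://169.254.169.254/latest/meta-data/instance-id').text
    boto3.client('cloudformation').signal_resource(
        StackName=stack_name,
        LogicalResourceId=logical_id,
        UniqueId=instance_id,
        Status='SUCCESS',  # only send after kubelet etc. are confirmed up
    )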

Figure out split of CF template

Having the entire cluster (including all pools) in a single CF stack versus having a separate CF stack for each set of nodes (masters, worker pool 1, worker pool 2, ...) is a fairly fundamental design decision that we should make as a team. /cc @mikkeloscar @Raffo @szuecs

Mint Bucket is hardcoded

The IAM policy currently has one mint bucket hardcoded; this will not work in different accounts.

Login downtime during cluster update

I just realized that during a cluster update I got a login connection timeout. It seems the master is down while the cluster size is being increased. This is odd, because new machines should only be added to the cluster, not removed.

./cluster.py update kube-aws-test ghildebrand1 --instance-type=c4.2xlarge --worker-nodes=20

Kubernetes Ingress

Using Kubernetes Ingress with Skipper and SSL termination by ALB (ELBv2):

  • kube-ingress-aws-controller
    • Deployed as a Deployment with replicas=1
    • Watches Ingress resources
    • Configures one Application Load Balancer (ALB/ELBv2) per SSL certificate in use
    • Attaches the Auto Scaling Group to the ALB (using skipper's NodePort)
  • skipper with the Kubernetes data client
    • Deployed as a DaemonSet (thus running on every node)
    • Exposes a NodePort on every EC2 instance
    • Watches Ingress resources
    • Configures routes defined by host/path matching

(diagram: kubernetes-ingress-with-skipper)
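For illustration, an Ingress resource as both components would consume it (hostname and service names are made up): the controller picks the ALB and certificate from the host, skipper builds the routes from host/path.

# Sketch of an Ingress resource driving both components.
ingress = {
    'apiVersion': 'extensions/v1beta1',
    'kind': 'Ingress',
    'metadata': {'name': 'my-app'},
    'spec': {
        'rules': [{
            'host': 'my-app.example.org',   # selects the ACM cert / ALB
            'http': {'paths': [{
                'path': '/',                # skipper matches host + path
                'backend': {'serviceName': 'my-app', 'servicePort': 80},
            }]},
        }],
    },
}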

Advantages:

  • SSL termination is done by ALB/ELB
    • No fiddling with SSL certs
    • We can use ACM
  • Skipper is battle-tested, as it already processes all Zalando customer traffic
  • Skipper allows us to add more features in the future
    • OAuth verification
    • ...

Tasks for this cluster repo:

Prevent users from exec'ing into arbitrary pods

Ensure that users cannot exec into containers that they are not allowed to exec into.

This is not directly cluster-lifecycle related, but rather a question of how access policies are configured inside the cluster; we may have to run something in the cluster, implement the functionality in our webhook, or configure our API server to disallow certain routes (there are flags for this).

If we decide we don't care because of Autobahn, we should document that decision here.

Correct cluster teardown (clean up AWS resources: ELBs, DNS, ..)

On cluster delete we should remove the resources that were created by Kubernetes and friends. This can be done as part of cluster.py delete; a tag-based approach is sketched below.

What to delete:

  • ELBs created via services of type LoadBalancer
  • DNS records created via services of type LoadBalancer (Mate)
  • EBS volumes created by Kubernetes PVs
  • ... ?

Or just everything?
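The tag-based sketch for the ELB case, assuming the AWS cloud provider tags service load balancers with a KubernetesCluster tag (worth verifying per Kubernetes version):

# Sketch: find ELBs that Kubernetes created for this cluster by tag and
# delete them on teardown.
import boto3

def delete_cluster_elbs(cluster_name, region='eu-central-1'):
    elb = boto3.client('elb', region_name=region)
    names = [lb['LoadBalancerName']
             for lb in elb.describe_load_balancers()['LoadBalancerDescriptions']]
    # describe_tags accepts at most 20 names per call, hence the chunking
    for chunk in (names[i:i + 20] for i in range(0, len(names), 20)):
        for desc in elb.describe_tags(LoadBalancerNames=chunk)['TagDescriptions']:
            tags = {t['Key']: t.get('Value') for t in desc['Tags']}
            if tags.get('KubernetesCluster') == cluster_name:
                elb.delete_load_balancer(LoadBalancerName=desc['LoadBalancerName'])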

Can't pull from private registry when setting up cluster.

Currently the appdynamics pod uses images from a private Docker registry. This fails when the cluster is set up, because the pod is created before the imagePullSecret has been set on the service account by secretary.

Theoretically this issue could also occur if the imagePullSecret is changed right after a pod is scheduled (so the pod has the old secret attached, assuming the old secret is expired).

Mate: cleanup DNS records

Mate also needs to remove DNS records (we only add/update right now).

Idea: flag "mate-managed" DNS records with an additional TXT record.
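A minimal sketch of that idea with boto3 (zone ID, record name, and the exact marker value are placeholders): alongside each record it creates, Mate would UPSERT a sibling TXT record, and only records carrying the marker would ever be deleted.

# Sketch: write the ownership marker next to a Mate-managed record.
import boto3

route53 = boto3.client('route53')
route53.change_resource_record_sets(
    HostedZoneId='Z123EXAMPLE',
    ChangeBatch={'Changes': [{
        'Action': 'UPSERT',
        'ResourceRecordSet': {
            'Name': 'my-app.example.org.',
            'Type': 'TXT',
            'TTL': 300,
            'ResourceRecords': [{'Value': '"mate-managed"'}],  # ownership marker
        },
    }]},
)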

Updating a cluster leads to API server downtime

First observation: the ELB health check leads to the ASG terminating a new instance, because HealthCheckGracePeriod defaults to 5 minutes and the master node sometimes takes longer than that to come up 😞
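A possible mitigation is simply raising the grace period on the master ASG; a boto3 sketch with a placeholder ASG name and an arbitrary 15-minute value:

# Sketch: give slow-booting masters more headroom before the ELB health
# check can cause the ASG to terminate them.
import boto3

boto3.client('autoscaling').update_auto_scaling_group(
    AutoScalingGroupName='kube-aws-test-MasterAsg',  # placeholder name
    HealthCheckGracePeriod=900,
)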

Cluster Lifecycle Management

The whole cluster lifecycle needs to be managed:

  • creating
  • updating (worker and master nodes); updating the Launch Configuration and respawning nodes could be a first step (for non-stateful apps)
  • deleting (shutting down the cluster)

sshd unit failures

When an SSH connection fails, the instance reports failed units:

● sshd@x:22-y:z.service    loaded failed failed  OpenSSH per-connection server daemon (y:z)

Web hook: Internal Server Error on missing group

We are already tracking this issue elsewhere, but just a reminder to update the webhook to fix the Internal Server Error when the user lacks the proper authorization group.

Error from server: an error on the server ("Internal Server Error: \"/api\"") has prevented the request from succeeding

Updating a cluster causes downtime for worker nodes

Updating the cluster with ./cluster.py update <stack-name> <version> will currently lead to downtime for worker nodes, as we are not waiting for node readiness (we just wait for the ASG InService state, which does not indicate whether the node is completely up and registered).

NOTE: there is no downtime for master nodes while updating, as we already correctly wait for and check the ELB status of the master API server.
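A sketch of what waiting for node readiness could look like, polling the API server via the Kubernetes Python client instead of trusting ASG InService:

# Sketch: block until the expected number of nodes report Ready.
import time
from kubernetes import client, config

def wait_for_ready_nodes(expected, timeout=600):
    config.load_kube_config()
    v1 = client.CoreV1Api()
    deadline = time.time() + timeout
    while time.time() < deadline:
        ready = sum(
            1 for node in v1.list_node().items
            for cond in (node.status.conditions or [])
            if cond.type == 'Ready' and cond.status == 'True')
        if ready >= expected:
            return
        time.sleep(10)
    raise TimeoutError('nodes did not become ready in time')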

install-kube-system sometimes fails

journalctl -u install-kube-system
...
Dec 08 12:39:50 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: Waiting for API server..
Dec 08 12:40:04 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: daemonset "appdynamics-agent" created
Dec 08 12:40:05 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: daemonset "kube2iam" created
Dec 08 12:40:07 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: daemonset "prometheus-node-exporter" created
Dec 08 12:40:09 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: deployment "cluster-autoscaler" created
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: Error from server: error when retrieving current configuration of:
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: &{0xc420d50900 0xc4204eaf50 kube-system heapster /srv/kubernetes/manifests/deployments/heapster.yaml &Deployment{ObjectMeta:k8s_io_kubernetes_pkg_api
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: 0000 --estimator=exponential] []  [] [{MY_POD_NAME  &EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:,FieldPath:metadata.name,},ResourceFieldRe
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: from server for: "/srv/kubernetes/manifests/deployments/heapster.yaml": client: etcd cluster is unavailable or misconfigured
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal systemd[1]: install-kube-system.service: Main process exited, code=exited, status=1/FAILURE
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal systemd[1]: Failed to start install-kube-system.service.
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal systemd[1]: install-kube-system.service: Unit entered failed state.
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal systemd[1]: install-kube-system.service: Failed with result 'exit-code'.
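The failure looks transient (etcd briefly unavailable while the control plane settles). Under that assumption, one mitigation is to retry the apply step rather than letting the unit fail; a sketch, assuming kubectl is on the PATH:

# Sketch: retry applying the kube-system manifests until the API server and
# etcd are actually able to serve the request.
import subprocess, time

def apply_with_retry(manifest_dir, attempts=10, delay=15):
    for attempt in range(attempts):
        result = subprocess.run(['kubectl', 'apply', '-f', manifest_dir])
        if result.returncode == 0:
            return
        time.sleep(delay)  # transient etcd/API server errors: back off, retry
    raise RuntimeError('failed to install kube-system manifests')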

Prevent users from modifying arbitrary pods

Ensure that users cannot modify deployments that they are not allowed to modify.

This is not directly cluster-lifecycle related, but rather a question of how access policies are configured inside the cluster; we may have to run something in the cluster or implement the functionality in our webhook.

If we decide we don't care because of Autobahn, we should document that decision here.

Support node pools

Support having multiple node pools as defined in the cluster registry:

NodePool:
    type: object
    properties:
      name:
        type: string
        example: pool-1
        description: Name of the node pool
      profile:
        type: string
        example: worker/default
        description: Profile used for the node pool. Possible values are "worker/default", "worker/database", "worker/gpu", "master". The "master" profile identifies the pool containing the cluster master
      instance_type:
        type: string
        example: m4.medium
        description: Type of the instance to use for the nodes in the pool. All the nodes in the pool share the same instance type
      discount_strategy:
        type: string
        example: none
        description: |
          A discount strategy indicates the type of discount to be associated with the node pool. This might affect the availability of the nodes in the pools in case of preemptible or spot instances.
          Possible values are "none", "aggressive", "moderate", "reasonable" #TODO naming should be "reasonable" :-D

Proposal: use Senza effectively for better cluster management

TL;DR

Embrace all of Senza's features for better cluster management.

Problem

I'm wondering why we don't use Senza's blue/green versioning properly.

Current setup

  • different clusters are considered the same app from Senza's point of view
  • different Senza versions represent entirely different clusters for us
  • senza traffic doesn't make sense at all, as it would balance traffic between two entirely different clusters
  • senza delete without a version would affect all clusters of an AWS account
  • upgrading the stack happens outside of Senza

Proposal (simplified to only consider the masters)

  • create a senza app called "my-cluster" (senza init)
  • deploy the first set of masters as "v1" (senza create)
  • use senza traffic to point 100% of traffic to the v1 masters
  • when deploying a new kubernetes version:
    • deploy the second set of masters as "v2" (senza create)
    • use senza traffic to slowly migrate traffic to the v2 masters
    • drop the v1 masters (senza delete)

Additional worker pools would be additional Senza stacks. Instead of blue/green deployments we could do in-place updates, but we don't have to. senza traffic would be useless for workers.

Advantages

  • use all the goodness of senza
  • be safer
  • use blue/green deployments for master components

Challenges

  • etcd prefix needs to be /registry-stack-name instead of /registry-stack-version
  • zack: create a new role for each new cluster
    • proposal:
      • a PowerUser role for a specific aws account allows access to all clusters in that aws account
      • OR make a specific Kubernetes-Admin role that allows access to clusters but not to the entire aws account
      • using zack to protect different clusters inside the same AWS account is more difficult (we don't want a separate role for each cluster)

Consider adopting kube-aws

This project keeps growing, and we're solving problems that are already solved by similar projects.

We could have a lot of synergistic effects by adopting and contributing to https://github.com/coreos/kube-aws.

One of the main issues we had before has been fixed in the v0.9.0 release of kube-aws, so it may make sense to reconsider it:

  • Discrete (and HA) etcd cluster

Similar features of kube-aws and this tool

  • based on AWS / CloudFormation / CoreOS
  • supports cluster upgrades and node draining
  • support for multi-zone worker nodes
  • supports e2e tests
  • allows specifying an AMI for specialized cluster nodes

Desired by us and already in or planned for kube-aws

  • initial spot fleet support
  • initial node pools support
  • self-hosted kubernetes deployment
  • dedicated subnets for controller nodes
  • secured via client certs/tls (worker->master->etcd)
  • private node IPs
  • correctly tainted nodes allow scheduling on masters
  • can be used as a library
  • YAML-based definition
  • written in Go

Things to check that could block adopting kube-aws

  • security based on client certs must be manageable

Please feel free to comment and iterate on the points above.
