
zalando-incubator / kubernetes-on-aws

Stars: 620 · Watchers: 36 · Forks: 163 · Size: 19.18 MB

Deploying Kubernetes on AWS with CloudFormation and Ubuntu

Home Page: https://kubernetes-on-aws.readthedocs.io/

License: MIT License

Languages: Shell 7.83%, Dockerfile 0.42%, Makefile 0.84%, Go 88.57%, Python 2.35%
Topics: kubernetes, kubernetes-cluster, aws, cloud


kubernetes-on-aws's Issues

Updating a cluster will reset ASG size

./cluster.py update <stack-name> <version> will currently reset the ASG size (MinSize, MaxSize, and DesiredCapacity), as it updates the CF stack unconditionally.
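A minimal sketch of a possible fix, assuming boto3 and that cluster.py can feed the current sizes back into the stack parameters (the parameter plumbing is hypothetical):

# Sketch: read the live ASG size before the update so the CF stack update
# doesn't reset it to template defaults. Names are illustrative.
import boto3

def current_asg_size(asg_name, region='eu-central-1'):
    asg = boto3.client('autoscaling', region_name=region)
    group = asg.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name])['AutoScalingGroups'][0]
    return group['MinSize'], group['MaxSize'], group['DesiredCapacity']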

Network policies: protect pods from being accessed by all other pods in the cluster

Ensure that pods cannot connect to other pods in the cluster that they are not allowed to connect to, e.g. when running multiple Patroni clusters in the cluster.

This is not directly cluster-lifecycle related, but rather a question of how the network policies are configured inside the cluster; we may still have to run something in the cluster (e.g. the Calico node agent).

If we decide we don't care because of Autobahn, we should document that decision here.
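For illustration, a hedged sketch of what such a policy could look like (label names are made up; the networking.k8s.io/v1 API requires a recent Kubernetes, and enforcement needs a network plugin such as Calico):

# Sketch: a NetworkPolicy restricting ingress to a hypothetical "patroni"
# pod group so that only pods with the same label can connect to it.
network_policy = {
    'apiVersion': 'networking.k8s.io/v1',
    'kind': 'NetworkPolicy',
    'metadata': {'name': 'patroni-isolate', 'namespace': 'default'},
    'spec': {
        'podSelector': {'matchLabels': {'application': 'patroni'}},
        'ingress': [{
            'from': [{'podSelector': {'matchLabels': {'application': 'patroni'}}}],
        }],
    },
}
# could be applied with kubernetes.utils.create_from_dict(api_client, network_policy)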

Integrate Spot Fleet

We should be able to trade cost-efficiency vs. availability (depending on the application) by balancing between on-demand and spot instances in a "smart" way.
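As a starting point, a sketch of requesting part of the worker capacity as a spot fleet via boto3 (role ARN, AMI, price, and instance types are placeholders); the on-demand ASG would remain as the availability baseline:

# Sketch: request a spot fleet for part of the worker pool while keeping an
# on-demand ASG as the availability baseline. All values are illustrative.
import boto3

ec2 = boto3.client('ec2', region_name='eu-central-1')
response = ec2.request_spot_fleet(SpotFleetRequestConfig={
    'IamFleetRole': 'arn:aws:iam::123456789012:role/spot-fleet-role',  # assumed role
    'AllocationStrategy': 'lowestPrice',
    'TargetCapacity': 10,   # spot share of the worker pool
    'SpotPrice': '0.10',    # cap: never pay more than on-demand
    'LaunchSpecifications': [
        {'ImageId': 'ami-xxxxxxxx', 'InstanceType': t}
        for t in ('m4.large', 'm4.xlarge', 'c4.xlarge')  # diversify instance types
    ],
})
print(response['SpotFleetRequestId'])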

Use a secret to configure AppDynamics agents

follow up PR from #113

  • the AppDynamics credentials should be stored in a Kubernetes secret
  • the AppDynamics account needs to be configurable

For ease of configuration it may make sense to put them all in one secret, although e.g. the AppDynamics account name is not necessarily a secret value.
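A sketch of the single-secret variant using the Kubernetes Python client (secret name, namespace, and key names are assumptions):

# Sketch: one Secret holding both the AppDynamics access key and the account
# name; the account name is not sensitive, but co-locating it simplifies
# configuration. Names are illustrative, not the actual manifest.
from kubernetes import client, config

config.load_kube_config()
secret = client.V1Secret(
    metadata=client.V1ObjectMeta(name='appdynamics', namespace='kube-system'),
    string_data={
        'account-name': 'my-account',  # configurable per cluster
        'access-key': 'REPLACE_ME',    # the actual secret value
    },
)
client.CoreV1Api().create_namespaced_secret('kube-system', secret)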

Attaching volumes regularly gets stuck

The error message is always the same:

$ kubectl describe pod foo
...
  25m		1m		12	{kubelet ip-172-31-13-7.eu-central-1.compute.internal}			Warning		FailedSync	Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "etcd-1"/"acid". list of unattached/unmounted volumes=[data]

Observations:

  • the EBS volume is "stuck" for days in AWS (forcing a detach doesn't help)
  • rebooting the corresponding instance doesn't help either
  • happens on GKE as well
  • several failed units show up when SSHing into the box
  • the kubelet logs show corresponding error messages

There are several issues reported upstream already:
https://github.com/kubernetes/kubernetes/issues?q=is%3Aissue%20is%3Aopen%20%22list%20of%20unattached%22

More node labels

Would be good to have more node labels available for node selectors, especially: pool name, number of CPUs, memory, and GPU.
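Until the kubelet registers such labels itself (e.g. via its --node-labels flag), they could be patched onto nodes after registration; a sketch with the Kubernetes Python client, label keys being hypothetical:

# Sketch: label an existing node so workloads can target it via nodeSelector.
from kubernetes import client, config

config.load_kube_config()
client.CoreV1Api().patch_node(
    'ip-172-31-13-7.eu-central-1.compute.internal',
    {'metadata': {'labels': {'pool-name': 'worker-default', 'cpu': '4',
                             'memory': '16Gi', 'gpu': 'none'}}},
)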

Add CloudFormation SUCCESS signal (cfn-signal)

We should add a CloudFormation SUCCESS signal for master and worker nodes. The SUCCESS signal should only be sent if everything comes up successfully on the node, i.e. kubelet started etc. This allows us to change the Senza file to actually wait on the SUCCESS signal (we do this by default with Taupage, but CoreOS is missing cfn-signal).
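A sketch of what the signal step could look like with boto3 instead of the cfn-signal helper (the logical resource ID must match the Senza-generated template; the name below is made up):

# Sketch: send a CF SUCCESS signal once the node is confirmed healthy.
import boto3, requests

def signal_success(stack_name, logical_id='MasterAutoScalingGroup'):
    instance_id = requests.get(
        'http://169.254.169.254/latest/meta-data/instance-id').text
    boto3.client('cloudformation').signal_resource(
        StackName=stack_name,
        LogicalResourceId=logical_id,
        UniqueId=instance_id,
        Status='SUCCESS',  # only send after kubelet etc. are confirmed up
    )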

Figure out split of CF template

Having the entire cluster (including all pools) in a single CF stack versus having a separate CF stack for each set of nodes (masters, worker pool 1, worker pool 2, ...) is a fairly fundamental design decision that we should make as a team. /cc @mikkeloscar @Raffo @szuecs

Mint Bucket is hardcoded

The IAM policy currently has one mint bucket hardcoded; this will not work in different accounts.

Login downtime during cluster update

I just realized that during a cluster update I got a login connection timeout. It seems the master is down while the cluster size is being increased. This is odd, because new machines should only be added to the cluster, not removed.

./cluster.py update kube-aws-test ghildebrand1 --instance-type=c4.2xlarge --worker-nodes=20

Kubernetes Ingress

Using Kubernetes Ingress with Skipper and SSL termination by ALB (ELBv2):

  • kube-ingress-aws-controller
    • Deployed as a Deployment with replicas=1
    • Watches Ingress resources
    • Configures one Application Load Balancer (ALB/ELBv2) per SSL certificate in use
    • Attaches the Auto Scaling Group to the ALB (using skipper's NodePort)
  • skipper with the Kubernetes data client
    • Deployed as a DaemonSet (thus running on every node)
    • Exposes a NodePort on every EC2 instance
    • Watches Ingress resources
    • Configures routes defined by host/path matching

(diagram: kubernetes-ingress-with-skipper)
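For illustration, an Ingress resource as both components would consume it (hostname and service names are made up): the controller picks the ALB and certificate from the host, skipper builds the routes from host/path.

# Sketch of an Ingress resource driving both components.
ingress = {
    'apiVersion': 'extensions/v1beta1',
    'kind': 'Ingress',
    'metadata': {'name': 'my-app'},
    'spec': {
        'rules': [{
            'host': 'my-app.example.org',   # selects the ACM cert / ALB
            'http': {'paths': [{
                'path': '/',                # skipper matches host + path
                'backend': {'serviceName': 'my-app', 'servicePort': 80},
            }]},
        }],
    },
}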

Advantages:

  • SSL termination is done by ALB/ELB
    • No fiddling with SSL certs
    • We can use ACM
  • Skipper is battle-tested, as it already processes all Zalando customer traffic
  • Skipper allows us to add more features in the future
    • OAuth verification
    • ...

Tasks for this cluster repo:

Prevent users from exec'ing into arbitrary pods

Ensure that users cannot exec into containers that they are not allowed to exec into.

This is not directly cluster-lifecycle related, but rather a question of how access policies are configured inside the cluster; we may have to run something in the cluster, implement the functionality in our webhook, or configure our API server to disallow certain routes (there are flags for this).

If we decide we don't care because of Autobahn, we should document that decision here.

Correct cluster teardown (clean up AWS resources: ELBs, DNS, ..)

On cluster delete we should remove the resources that were created by Kubernetes and friends. This can be done as part of cluster.py delete; a tag-based approach is sketched below.

What to delete:

  • ELBs created via services of type LoadBalancer
  • DNS records created via services of type LoadBalancer (Mate)
  • EBS volumes created by Kubernetes PVs
  • ... ?

Or just everything?
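The tag-based sketch for the ELB case, assuming the AWS cloud provider tags service load balancers with a KubernetesCluster tag (worth verifying per Kubernetes version):

# Sketch: find ELBs that Kubernetes created for this cluster by tag and
# delete them on teardown.
import boto3

def delete_cluster_elbs(cluster_name, region='eu-central-1'):
    elb = boto3.client('elb', region_name=region)
    names = [lb['LoadBalancerName']
             for lb in elb.describe_load_balancers()['LoadBalancerDescriptions']]
    # describe_tags accepts at most 20 names per call, hence the chunking
    for chunk in (names[i:i + 20] for i in range(0, len(names), 20)):
        for desc in elb.describe_tags(LoadBalancerNames=chunk)['TagDescriptions']:
            tags = {t['Key']: t.get('Value') for t in desc['Tags']}
            if tags.get('KubernetesCluster') == cluster_name:
                elb.delete_load_balancer(LoadBalancerName=desc['LoadBalancerName'])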

Can't pull from private registry when setting up cluster.

Currently the appdynamics pod uses images from a private Docker registry. This fails when the cluster is set up, because the pod is created before the imagePullSecret has been set on the service account by secretary.

Theoretically this issue could also occur if the imagePullSecret is changed right after a pod is scheduled (so the pod has the old secret attached, assuming the old secret is expired).

Mate: cleanup DNS records

Mate also needs to remove DNS records (we only add/update right now).

Idea: flag "mate-managed" DNS records with an additional TXT record.
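A minimal sketch of that idea with boto3 (zone ID, record name, and the exact marker value are placeholders): alongside each record it creates, Mate would UPSERT a sibling TXT record, and only records carrying the marker would ever be deleted.

# Sketch: write the ownership marker next to a Mate-managed record.
import boto3

route53 = boto3.client('route53')
route53.change_resource_record_sets(
    HostedZoneId='Z123EXAMPLE',
    ChangeBatch={'Changes': [{
        'Action': 'UPSERT',
        'ResourceRecordSet': {
            'Name': 'my-app.example.org.',
            'Type': 'TXT',
            'TTL': 300,
            'ResourceRecords': [{'Value': '"mate-managed"'}],  # ownership marker
        },
    }]},
)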

Updating a cluster leads to API server downtime

First observation: the ELB health check leads to the ASG terminating a new instance, because HealthCheckGracePeriod defaults to 5 minutes and the master node sometimes takes longer than that to come up 😞
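A possible mitigation is simply raising the grace period on the master ASG; a boto3 sketch with a placeholder ASG name and an arbitrary 15-minute value:

# Sketch: give slow-booting masters more headroom before the ELB health
# check can cause the ASG to terminate them.
import boto3

boto3.client('autoscaling').update_auto_scaling_group(
    AutoScalingGroupName='kube-aws-test-MasterAsg',  # placeholder name
    HealthCheckGracePeriod=900,
)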

Cluster Lifecycle Management

The whole cluster lifecycle needs to be managed:

  • creating
  • updating (worker and master nodes); updating the Launch Configuration and respawning nodes could be a first step (for non-stateful apps)
  • deleting (shutting down the cluster)

sshd unit failures

When an SSH connection fails, the instance reports failed units:

● sshd@x:22-y:z.service    loaded failed failed  OpenSSH per-connection server daemon (y:z)

Web hook: Internal Server Error on missing group

We are already tracking this issue elsewhere, but just a reminder to update the webhook to fix the Internal Server Error when the user lacks the proper authorization group.

Error from server: an error on the server ("Internal Server Error: \"/api\"") has prevented the request from succeeding

Updating a cluster causes downtime for worker nodes

Updating the cluster with ./cluster.py update <stack-name> <version> will currently lead to downtime for worker nodes, as we are not waiting for node readiness (we just wait for the ASG InService state, which does not indicate whether the node is completely up and registered).

NOTE: there is no downtime for master nodes while updating, as we already correctly wait for and check the ELB status of the master API server.
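A sketch of what waiting for node readiness could look like, polling the API server via the Kubernetes Python client instead of trusting ASG InService:

# Sketch: block until the expected number of nodes report Ready.
import time
from kubernetes import client, config

def wait_for_ready_nodes(expected, timeout=600):
    config.load_kube_config()
    v1 = client.CoreV1Api()
    deadline = time.time() + timeout
    while time.time() < deadline:
        ready = sum(
            1 for node in v1.list_node().items
            for cond in (node.status.conditions or [])
            if cond.type == 'Ready' and cond.status == 'True')
        if ready >= expected:
            return
        time.sleep(10)
    raise TimeoutError('nodes did not become ready in time')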

install-kube-system sometimes fails

journalctl -u install-kube-system
...
Dec 08 12:39:50 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: Waiting for API server..
Dec 08 12:40:04 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: daemonset "appdynamics-agent" created
Dec 08 12:40:05 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: daemonset "kube2iam" created
Dec 08 12:40:07 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: daemonset "prometheus-node-exporter" created
Dec 08 12:40:09 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: deployment "cluster-autoscaler" created
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: Error from server: error when retrieving current configuration of:
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: &{0xc420d50900 0xc4204eaf50 kube-system heapster /srv/kubernetes/manifests/deployments/heapster.yaml &Deployment{ObjectMeta:k8s_io_kubernetes_pkg_api
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: 0000 --estimator=exponential] []  [] [{MY_POD_NAME  &EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:,FieldPath:metadata.name,},ResourceFieldRe
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: from server for: "/srv/kubernetes/manifests/deployments/heapster.yaml": client: etcd cluster is unavailable or misconfigured
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal systemd[1]: install-kube-system.service: Main process exited, code=exited, status=1/FAILURE
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal systemd[1]: Failed to start install-kube-system.service.
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal systemd[1]: install-kube-system.service: Unit entered failed state.
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal systemd[1]: install-kube-system.service: Failed with result 'exit-code'.
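The failure looks transient (etcd briefly unavailable while the control plane settles). Under that assumption, one mitigation is to retry the apply step rather than letting the unit fail; a sketch, assuming kubectl is on the PATH:

# Sketch: retry applying the kube-system manifests until the API server and
# etcd are actually able to serve the request.
import subprocess, time

def apply_with_retry(manifest_dir, attempts=10, delay=15):
    for attempt in range(attempts):
        result = subprocess.run(['kubectl', 'apply', '-f', manifest_dir])
        if result.returncode == 0:
            return
        time.sleep(delay)  # transient etcd/API server errors: back off, retry
    raise RuntimeError('failed to install kube-system manifests')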

Prevent users from modifying arbitrary pods

Ensure that users cannot modify deployments that they are not allowed to modify.

This is not directly cluster-lifecycle related, but rather a question of how access policies are configured inside the cluster; we may have to run something in the cluster or implement the functionality in our webhook.

If we decide we don't care because of Autobahn, we should document that decision here.

Support node pools

Support having multiple node pools as defined in the cluster registry:

NodePool:
    type: object
    properties:
      name:
        type: string
        example: pool-1
        description: Name of the node pool
      profile:
        type: string
        example: worker/default
        description: Profile used for the node pool. Possible values are "worker/default", "worker/database", "worker/gpu", "master". The "master" profile identifies the pool containing the cluster master
      instance_type:
        type: string
        example: m4.medium
        description: Type of the instance to use for the nodes in the pool. All the nodes in the pool share the same instance type
      discount_strategy:
        type: string
        example: none
        description: |
          A discount strategy indicates the type of discount to be associated with the node pool. This might affect the availability of the nodes in the pools in case of preemptible or spot instances.
          Possible values are "none", "aggressive", "moderate", "reasonable" #TODO naming should be "reasonable" :-D

Proposal: use Senza effectively for better cluster management

TL;DR

Embrace all of Senza's features for better cluster management.

Problem

I'm wondering why we don't use Senza's blue/green versioning properly.

Current setup

  • different clusters are considered the same app from Senza's point of view
  • different Senza versions represent entirely different clusters for us
  • senza traffic doesn't make sense at all, as it would balance traffic between two entirely different clusters
  • senza delete without a version would affect all clusters of an AWS account
  • upgrading the stack happens outside of Senza

Proposal (simplified to only consider the masters)

  • create a senza app called "my-cluster" (senza init)
  • deploy the first set of masters as "v1" (senza create)
  • use senza traffic to point 100% of traffic to the v1 masters
  • when deploying a new kubernetes version:
    • deploy the second set of masters as "v2" (senza create)
    • use senza traffic to slowly migrate traffic to the v2 masters
    • drop the v1 masters (senza delete)

Additional worker pools would be additional Senza stacks. Instead of blue/green deployments we could do in-place updates, but we don't have to. senza traffic would be useless for workers.

Advantages

  • use all the goodness of senza
  • be safer
  • use blue/green deployments for master components

Challenges

  • etcd prefix needs to be /registry-stack-name instead of /registry-stack-version
  • zack: create a new role for each new cluster
    • proposal:
      • a PowerUser role for a specific aws account allows access to all clusters in that aws account
      • OR make a specific Kubernetes-Admin role that allows access to clusters but not to the entire aws account
      • using zack to protect different clusters inside the same AWS account is more difficult (we don't want a separate role for each cluster)

Consider adopting kube-aws

This project keeps growing, and we're solving problems that are already solved by similar projects.

We could have a lot of synergistic effects by adopting and contributing to https://github.com/coreos/kube-aws.

One of the main issues we had before has been fixed in the v0.9.0 release of kube-aws, so it may make sense to reconsider it:

  • Discrete (and HA) etcd cluster

Similar features of kube-aws and this tool

  • based on AWS / CloudFormation / CoreOS
  • supports cluster upgrades and node draining
  • support for multi-zone worker nodes
  • supports e2e tests
  • allows specifying an AMI for specialized cluster nodes

Desired by us and already in or planned for kube-aws

  • initial spot fleet support
  • initial node pools support
  • self-hosted kubernetes deployment
  • dedicated subnets for controller nodes
  • secured via client certs/tls (worker->master->etcd)
  • private node IPs
  • correctly tainted nodes allow scheduling on masters
  • can be used as a library
  • YAML-based definition
  • written in Go

Things to check that could block adopting kube-aws

  • security based on client certs must be manageable

Please feel free to comment and iterate on the points above.
