zalando-incubator / kubernetes-on-aws

Deploying Kubernetes on AWS with CloudFormation and Ubuntu

Home Page: https://kubernetes-on-aws.readthedocs.io/
License: MIT License
gerry needs to have access to the mint bucket.
`./cluster.py update <stack-name> <version>` will currently reset the ASG size (min, max and `DesiredCapacity`) as it updates the CF stack unconditionally.
I am using STUPS installations via the conda package manager. We should add a backlog item to have proper install instructions for that as well.
Needs investigation (larger timeout?)
Ensure that pods cannot connect to other pods in the cluster that they are not allowed to connect to, e.g. when running multiple Patroni clusters in the cluster.

This is not directly cluster lifecycle related, but more about how you configure the network policies inside. However, we may have to run something in the cluster (e.g. the Calico node agent). If we don't care because of autobahn, then we should write that down here.
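As a rough sketch of what such a restriction could look like (all names and labels below are hypothetical; this assumes a NetworkPolicy-capable backend such as Calico is running and the namespace is switched to DefaultDeny isolation):

```yaml
# Sketch: only pods of the same Patroni cluster may talk to its members.
# Names/labels are illustrative, not from this repo.
apiVersion: extensions/v1beta1
kind: NetworkPolicy
metadata:
  name: patroni-isolation
  namespace: acid
spec:
  podSelector:
    matchLabels:
      application: patroni
      cluster: etcd-1
  ingress:
  - from:
    - podSelector:
        matchLabels:
          application: patroni
          cluster: etcd-1
```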
We should be able to trade cost-efficiency vs. availability (depending on the application) by balancing between on-demand and spot instances in a "smart" way.
Test volumes during e2e testing.
Integrate https://github.com/kubernetes/contrib/tree/master/cluster-autoscaler (and patch it first to autodetect ASGs and min/max).
Minor annoyance: only two pods have an (unnecessary) version in their name. We should remove those version suffixes (or alternatively have version suffixes for all pods).
I realized this while looking at the screenshot on https://twitter.com/try_except_/status/793537061374623744
Sometimes the master `kubelet` fails to start the API server (i.e. does not start any pod defined in `/etc/kubernetes/manifests`).
Follow-up PR from #113.

For ease of configuration it may make sense to put them all in one secret, although e.g. the AppDynamics account name is not necessarily a secret value.
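For illustration, a minimal sketch of such a combined secret (all names and values below are made up):

```yaml
# Sketch: one Secret holding all agent configuration values
apiVersion: v1
kind: Secret
metadata:
  name: appdynamics-config        # hypothetical name
  namespace: kube-system
type: Opaque
data:
  account-name: YWNtZQ==          # base64("acme"); not strictly secret, kept here for convenience
  access-key: c2VjcmV0LWtleQ==    # base64("secret-key")
```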
We have some "legacy" manifests (e.g. `kube-dns`) using `ReplicationController`; we should replace them with `Deployment`s. This is low prio.
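The change is mostly mechanical; a minimal sketch of the target shape (names and image tag are placeholders, not our actual kube-dns manifest):

```yaml
# Sketch: same pod template as before, but managed by a Deployment,
# which adds rolling updates and rollback on top of ReplicationController.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kube-dns
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: kube-dns
    spec:
      containers:
      - name: kubedns
        image: gcr.io/google_containers/kubedns-amd64:1.8  # placeholder tag
```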
I think it would make sense to have everything regarding the cluster lifecycle in one repo, i.e. including e2e cluster tests (not our internal Jenkins config, but at least the test cases).
Some nice features we definitely want (e.g. displaying more information, HPA support):
https://github.com/kubernetes/dashboard/releases/tag/v1.5.0
The error message is always the same:

```
$ kubectl describe pod foo
...
25m 1m 12 {kubelet ip-172-31-13-7.eu-central-1.compute.internal} Warning FailedSync Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "etcd-1"/"acid". list of unattached/unmounted volumes=[data]
```

Observations: there are several issues reported upstream already:
https://github.com/kubernetes/kubernetes/issues?q=is%3Aissue%20is%3Aopen%20%22list%20of%20unattached%22
It would be good to have more node labels available for node selectors. Especially: pool name, number of CPUs, memory, GPU.
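Assuming such labels were attached at registration time (e.g. via the kubelet's `--node-labels` flag; the label key below is hypothetical), a pod could then target a pool like this:

```yaml
# Sketch: schedule a pod onto the GPU worker pool via a hypothetical label
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  nodeSelector:
    pool-name: worker-gpu
  containers:
  - name: main
    image: example.org/gpu-app:latest  # placeholder image
```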
We agreed on using `app` and `version` as labels for all pods to identify the application ID and version. We are using this for our log shipping, so we should set these labels on all pods (including `kube-system` ones).
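In manifest terms this just means every pod carries the two agreed labels, for example (names and values illustrative):

```yaml
# Sketch: the agreed app/version labels on a pod
apiVersion: v1
kind: Pod
metadata:
  name: example
  labels:
    app: kube-dns
    version: v1.8
spec:
  containers:
  - name: main
    image: example.org/app:v1.8  # placeholder
```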
We should add a CloudFormation SUCCESS signal for master and worker nodes. The SUCCESS signal should only be sent if everything comes up (on this node) successfully, i.e. kubelet started etc. This allows us to change the Senza file to actually wait on the SUCCESS signal (we do this by default with Taupage, but CoreOS is missing cfn-signal).
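On the CloudFormation side this would look roughly like the following (resource name and timeout are assumptions); each node would then emit the signal once kubelet is up:

```yaml
# Sketch: make stack creation/update wait for an explicit SUCCESS signal per node
MasterAutoScalingGroup:            # hypothetical resource name
  Type: AWS::AutoScaling::AutoScalingGroup
  CreationPolicy:
    ResourceSignal:
      Count: 1
      Timeout: PT15M               # assumed; tune to actual boot time
  # (other required ASG properties omitted)
```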
Having the entire cluster (including all pools) in a single CF stack or having different CF stacks for each set of nodes (masters, worker pool 1, worker pool 2, ...) is a quite fundamental design decision which we should make as a team. /cc @mikkeloscar @Raffo @szuecs
The IAM policy currently has one mint bucket hardcoded, this will not work in different accounts.
Maybe a dry run should print `user_data_master` and the worker user data as well?
I just realized that during an update of the cluster I got a login connection timeout. So it seems that the master is down while increasing the size of the cluster. This is somewhat weird because new machines should be added to the cluster, not removed.

```
./cluster.py update kube-aws-test ghildebrand1 --instance-type=c4.2xlarge --worker-nodes=20
```
Using Kubernetes Ingress with Skipper and SSL termination by ALB (ELBv2):
Advantages:
Tasks for this cluster repo: `Ingress` resources.

Ensure that users cannot exec into containers that they are not allowed to exec into.
This is not directly cluster lifecycle related, but more about how you configure the access policies inside. However, we may have to run something in the cluster, implement the functionality in our webhook, or configure our API server to not allow certain routes (there are flags for it). If we don't care because of autobahn, then we should write that down here.
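One hedged option, besides the webhook, would be RBAC (still alpha/beta upstream, so the API group version below depends on the cluster version): grant `pods/exec` only to the groups that should have it; everything not granted is denied.

```yaml
# Sketch: exec access is only possible where a rule grants pods/exec
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: pod-exec
  namespace: team-a        # hypothetical namespace
rules:
- apiGroups: [""]
  resources: ["pods/exec"]
  verbs: ["create"]
```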
I often see the connection drop when following the logs of a pod or when exec'ing into a pod.
This usually happens after 30-60 secs.
On cluster delete we should delete resources that were created by Kubernetes and friends. This can be done as part of `cluster.py delete`. Open question: what to delete, or just everything?
Currently the `appdynamics` pod is using images from a private Docker registry. This fails when the cluster is set up, because the pod is created before the `imagePullSecret` has been set on the service account by `secretary`.

Theoretically this issue could also occur if the `imagePullSecret` is changed right after a pod is scheduled (so the pod has the old secret attached, assuming the old secret is expired).
Changing the manifests in the master user data will not actually update the pods. We probably need to change the `curl -X POST` call.
To add another layer of security (worker shared secret appears in user data).
As attaching classic ELB + ALB on the same ASG leads to "undefined" behavior.
Mate also needs to remove DNS records (we only add/update right now). Idea: flag "mate-managed" DNS records with an additional `TXT` record.
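For illustration, the marker could look like this (record names and the marker payload are made up):

```yaml
# Sketch: every Mate-managed record gets a sibling TXT "ownership" record,
# so Mate only ever deletes records it created itself.
- name: myapp.example.org.
  type: A
  value: 52.58.0.1
- name: myapp.example.org.
  type: TXT
  value: "heritage=mate"
```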
First observation: the ELB healthcheck leads to the ASG terminating a new instance, because the `HealthCheckGracePeriod` defaults to 5 minutes and the master node sometimes takes longer than 5 minutes to come up.
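A straightforward mitigation would be to raise the grace period in the stack template (the value below is a guess, not a measured boot time):

```yaml
# Sketch: give the master more time before ELB health checks can kill it
MasterAutoScalingGroup:          # hypothetical resource name
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    HealthCheckType: ELB
    HealthCheckGracePeriod: 900  # seconds; was effectively 300
  # (other required ASG properties omitted)
```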
The whole cluster lifecycle needs to be managed:
When the ssh connection fails, the instance reports failed units:

```
● sshd@x:22-y:z.service loaded failed failed OpenSSH per-connection server daemon (y:z)
```
In order to use `"service.alpha.kubernetes.io/external-traffic": "OnlyLocal"` (kubernetes/kubernetes#29409).
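For reference, the annotation goes on the Service (names below are placeholders):

```yaml
# Sketch: route external traffic only to endpoints on the receiving node
apiVersion: v1
kind: Service
metadata:
  name: my-service   # placeholder
  annotations:
    service.alpha.kubernetes.io/external-traffic: OnlyLocal
spec:
  type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: my-app      # placeholder
```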
We are already tracking this issue elsewhere, but just a reminder to update the webhook to fix the Internal Server Error when the user misses the proper authorization group:

```
Error from server: an error on the server ("Internal Server Error: \"/api\"") has prevented the request from succeeding
```
Updating the cluster with `./cluster.py update <stack-name> <version>` will currently lead to a downtime for worker nodes, as we are not waiting for node readiness (we just wait for ASG `InService`, which does not indicate whether the node is completely up and registered).

NOTE: there is no downtime for master nodes while updating, as we already correctly wait/check the ELB status for the master API server.
```
$ journalctl -u install-kube-system
...
Dec 08 12:39:50 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: Waiting for API server..
Dec 08 12:40:04 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: daemonset "appdynamics-agent" created
Dec 08 12:40:05 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: daemonset "kube2iam" created
Dec 08 12:40:07 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: daemonset "prometheus-node-exporter" created
Dec 08 12:40:09 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: deployment "cluster-autoscaler" created
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: Error from server: error when retrieving current configuration of:
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: &{0xc420d50900 0xc4204eaf50 kube-system heapster /srv/kubernetes/manifests/deployments/heapster.yaml &Deployment{ObjectMeta:k8s_io_kubernetes_pkg_api
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: 0000 --estimator=exponential] [] [] [{MY_POD_NAME &EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:,FieldPath:metadata.name,},ResourceFieldRe
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal install-kube-system[1029]: from server for: "/srv/kubernetes/manifests/deployments/heapster.yaml": client: etcd cluster is unavailable or misconfigured
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal systemd[1]: install-kube-system.service: Main process exited, code=exited, status=1/FAILURE
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal systemd[1]: Failed to start install-kube-system.service.
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal systemd[1]: install-kube-system.service: Unit entered failed state.
Dec 08 12:40:13 ip-172-31-2-18.eu-central-1.compute.internal systemd[1]: install-kube-system.service: Failed with result 'exit-code'.
```
Follow-up to #164.
We installed Docker 1.13 RC to fix our "docker hangs" (moby/moby#28889) problem. We should get rid of our custom Docker install as soon as 1.13 is released and CoreOS includes it.
Ensure that users cannot modify deployments that they are not allowed to modify.

This is not directly cluster lifecycle related, but more about how you configure the access policies inside. However, we may have to run something in the cluster or implement the functionality in our webhook. If we don't care because of autobahn, then we should write that down here.
```
W1020 16:30:11.032723       1 aws.go:2280] No tagged subnets found; will fall-back to the current subnet only. This is likely to be an error in a future version of k8s.
```

(from `kube-controller-manager.yaml`)
Support having multiple node pools as defined in the cluster registry:

```yaml
NodePool:
  type: object
  properties:
    name:
      type: string
      example: pool-1
      description: Name of the node pool
    profile:
      type: string
      example: worker/default
      description: |
        Profile used for the node pool. Possible values are "worker/default",
        "worker/database", "worker/gpu", "master". The "master" profile
        identifies the pool containing the cluster master.
    instance_type:
      type: string
      example: m4.medium
      description: |
        Type of the instance to use for the nodes in the pool. All the nodes
        in the pool share the same instance type.
    discount_strategy:
      type: string
      example: none
      description: |
        A discount strategy indicates the type of discount to be associated
        with the node pool. This might affect the availability of the nodes
        in the pools in case of preemptible or spot instances. Possible
        values are "none", "aggressive", "moderate", "reasonable"
        # TODO naming should be "reasonable" :-D
```
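A concrete registry entry conforming to this schema would then look like (values illustrative):

```yaml
# Sketch: one node pool entry describing a default worker pool
name: pool-1
profile: worker/default
instance_type: m4.medium
discount_strategy: none
```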
The e2e tests should clean up all resources, e.g. ELBs. This should be done in all cases (success and error).
Actually it's quite easy for ELBs: we can remove all ELBs without instances (as they are removed when deleting the stack).
It is a systemd unit that runs on each worker and "drain[s] this k8s node to make running pods time to gracefully shut down before stopping kubelet".

It looks like it moves draining from the client to the server side, thereby reducing client complexity; it might also work nicely with node autoscaling and spot instances.
Embrace all the Senza features for better cluster management. I'm wondering why we don't use the blue/green version stuff from Senza properly: additional worker pools would be additional Senza stacks. Instead of blue/green deployments we could do in-place updates, but we don't have to. The `senza traffic` feature would be useless for workers.
This project is getting bigger and we're solving problems that are already solved by similar projects. We could have a lot of synergistic effects by adopting and contributing to https://github.com/coreos/kube-aws.
One of the main issues we had before has been fixed in the `v0.9.0` release of `kube-aws`, so it may make sense to reconsider the choice between `kube-aws` and this tool. Please feel free to comment and iterate on the points above.
We are using 1.4.0, but 1.4.2 is available: https://github.com/kubernetes/dashboard/releases/tag/v1.4.2