rancher / rke

Rancher Kubernetes Engine (RKE), an extremely simple, lightning fast Kubernetes distribution that runs entirely within containers.

License: Apache License 2.0

Makefile 0.03% Go 97.58% Shell 2.38%

rke's Issues

"rke up" got "ssh: rejected: administratively prohibited (open failed)"

rke version: v0.0.7-dev

problem:
When running "./rke -d up" with the cluster.yml file in the same folder, I got:

INFO[0000] [certificates] Generating kubernetes certificates
INFO[0000] [certificates] Generating CA kubernetes certificates
...
INFO[0003][certificates] Deploying kubernetes certificates to Cluster nodes
DEBU[0003] [certificates] Pulling Certificate downloader Image on host [node1]
FATA[0008] Can't pull Docker image rancher/rke-cert-deployer:0.1.0 for host [node1]: error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.24/images/create?fromImage=rancher%2Frke-cert-deployer&tag=0.1.0: Error connecting to Docker socket on host [node1]: ssh: rejected: administratively prohibited (open failed)

I tried to fix it by setting AllowTcpForwarding yes in the sshd config, but that did not help.

Any suggestions?
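
For reference, a rough sketch of the sshd settings that usually have to be in place on each node for rke's SSH tunnel to the Docker socket (standard OpenSSH directives; the restart command and service name vary by distro):

# /etc/ssh/sshd_config on every node listed in cluster.yml
AllowTcpForwarding yes
# unix-socket forwarding can be restricted separately on newer OpenSSH
AllowStreamLocalForwarding yes

# reload sshd so the change takes effect
sudo systemctl restart sshd

# the SSH user also needs access to /var/run/docker.sock
sudo usermod -aG docker <ssh-user>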

Not able to launch the dashboard after one of the existing master nodes is powered down.

rke version v0.0.3-dev

Steps to reproduce the problem:
Create a cluster with 1 node in controlplane, 1 in etcd, and 1 in worker.
Also have the dashboard installed as an addon.
Add 1 more node to controlplane and 1 more node to worker.
Make sure the dashboard can be launched successfully.

Power off the controlplane node that has the dashboard container running.

After some time, the dashboard gets scheduled on the new controlplane node, but it does not come up correctly.

root@sangee-now-rke-4:~# docker logs bc27c
2017/11/22 21:25:10 Starting overwatch
2017/11/22 21:25:10 Using in-cluster config to connect to apiserver
2017/11/22 21:25:10 Using service account token for csrf signing
2017/11/22 21:25:10 No request provided. Skipping authorization
2017/11/22 21:25:10 Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration) or the --apiserver-host param points to a server that does not exist. Reason: the server has asked for the client to provide credentials
Refer to the troubleshooting guide for more information: https://github.com/kubernetes/dashboard/blob/master/docs/user-guide/troubleshooting.md

Custom SSH Port

It would be great if there were a way to specify a custom SSH port in the cluster.yml file.
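
A sketch of what such an option might look like in cluster.yml; the port key here is hypothetical and only illustrates the request:

hosts:
  - advertised_hostname: server1
    ip: 1.1.1.1
    port: 2222                 # hypothetical per-host SSH port
    user: ubuntu
    role: [controlplane, etcd]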

Kubernetes cluster upgrade fails with error " FATA[00XX] Failed to set cordonded state for node"

rke version v0.0.2-dev

Steps:

  1. Create a cluster (v1.7.5_coreos.0) with one master [controlplane, etcd] node and one worker node using a cluster.yml file. Use the command ./rke_darwin-amd64 cluster up --cluster-file cluster.yml
    Cluster.yml file example:
auth:
  strategy: x509
network:
  plugin: flannel
hosts:
  - advertised_hostname: server1
    ip: 1.1.1.1
    user: ubuntu
    role: [controlplane, etcd]
    docker_socket: /var/run/docker.sock
    advertise_address: 10.1.1.1
  - advertised_hostname: server2
    ip: 2.2.2.2
    user: ubuntu
    role: [worker]
    advertise_address: 10.2.2.2

services:
  etcd:
    image: quay.io/coreos/etcd:latest
  kube-api:
    image: quay.io/coreos/hyperkube:v1.7.5_coreos.0
    service_cluster_ip_range: 10.233.0.0/18
    extra_args:
      v: 4
  kube-controller:
    image: quay.io/coreos/hyperkube:v1.7.5_coreos.0
    cluster_cidr: 10.233.64.0/18
    service_cluster_ip_range: 10.233.0.0/18
  scheduler:
    image: quay.io/coreos/hyperkube:v1.7.5_coreos.0
  kubelet:
    image: quay.io/coreos/hyperkube:v1.7.5_coreos.0
    cluster_domain: cluster.local
    cluster_dns_server: 10.233.0.3
    infra_container_image: gcr.io/google_containers/pause-amd64:3.0
  kubeproxy:
    image: quay.io/coreos/hyperkube:v1.7.5_coreos.0
  2. Upgrade the cluster using a valid cluster file with version v1.7.6_coreos.0, relevant to the cluster created above. The cluster upgrade fails as below:
soumyas-MBP:rke soumya$ ./rke_darwin-amd64 cluster upgrade --cluster-file cluster.yml 
INFO[0000] Upgrading Kubernetes cluster                 
INFO[0000] [state] Fetching cluster state from Kubernetes 
INFO[0000] [state] Successfully Fetched cluster state to Kubernetes ConfigMap: cluster-state 
INFO[0000] [certificates] Getting Cluster certificates from Kubernetes 
INFO[0000] [certificates] Successfully fetched Cluster certificates from Kubernetes 
INFO[0000] [ssh] Checking private key                   
INFO[0000] [ssh] Start tunnel for host [soumyarketest1] 
INFO[0000] [ssh] Start tunnel for host [soumyarketest1] 
INFO[0000] [ssh] Start tunnel for host [soumyarketes2]  
INFO[0001] [upgrade] All nodes are Ready                
INFO[0001] [upgrade] Upgrading Control Plane Services   
INFO[0001] [controlplane] Upgrading the Controller Plane.. 
INFO[0072] [controlplane] Successfully pulled kube-api image on host [soumyarketest1] 
INFO[0073] [controlplane] Successfully started kube-api container on host [soumyarketest1] 
INFO[0074] [controlplane] Successfully pulled kube-controller image on host [soumyarketest1] 
INFO[0075] [controlplane] Successfully started kube-controller container on host [soumyarketest1] 
INFO[0077] [controlplane] Successfully pulled scheduler image on host [soumyarketest1] 
INFO[0077] [controlplane] Successfully started scheduler container on host [soumyarketest1] 
INFO[0077] [controlplane] Successfully upgraded Controller Plane.. 
INFO[0077] [upgrade] Control Plane Services updgraded successfully 
INFO[0077] [upgrade] Upgrading Worker Plane Services    
INFO[0077] [worker] Upgrading Worker Plane..            
FATA[0078] Failed to set cordonded state for node: soumyarketest1 

Note: Rerunning the same command upgrades the cluster successfully.

How to add extra PKI DNS names

We need to add a VIP or load balancer IP to the PKI DNS names, like this:

"1.2.3.4",  # apiserver loadbalance ip
"localhost",
"kubernetes",
"kubernetes.default",
"kubernetes.default.svc",

Nodes in NotReady state on RHEL/CentOS systems

rke v0.0.7-dev

rke failed when deploying the network/addon jobs because the nodes were in NotReady state. When describing the nodes, I found this error message:

kubectl describe nodes
Name:               node-01
Roles:              etcd,master,worker
..........
KubeletNotReady              Failed to start ContainerManager mkdir /sys/fs/cgroup/cpuacct,cpu: read-only file system,runtime network not ready

Adding "/sys/fs/cgroup:/sys/fs/cgroup:rw" to kubelet volumes solved the problem

Provide relevant error message when unsupported auth strategy and network plugins are used in cluster.yml file

  1. Create a cluster with one master [controlplane, etcd] node and one worker node using a cluster.yml file. Use the command ./rke_darwin-amd64 cluster up --cluster-file cluster.yml
    Cluster.yml file example:
auth:
  strategy: x200
network:
  plugin: abcd
hosts:
  - advertised_hostname: server1
    ip: 1.1.1.1
    user: ubuntu
    role: [controlplane, etcd]
    docker_socket: /var/run/docker.sock
    advertise_address: 10.1.1.1
  - advertised_hostname: server2
    ip: 2.2.2.2
    user: ubuntu
    role: [worker]
    advertise_address: 10.2.2.2

services:
  etcd:
    image: quay.io/coreos/etcd:latest
  kube-api:
    image: quay.io/coreos/hyperkube:v1.7.5_coreos.0
    service_cluster_ip_range: 10.233.0.0/18
    extra_args:
      v: 4
  kube-controller:
    image: quay.io/coreos/hyperkube:v1.7.5_coreos.0
    cluster_cidr: 10.233.64.0/18
    service_cluster_ip_range: 10.233.0.0/18
  scheduler:
    image: quay.io/coreos/hyperkube:v1.7.5_coreos.0
  kubelet:
    image: quay.io/coreos/hyperkube:v1.7.5_coreos.0
    cluster_domain: cluster.local
    cluster_dns_server: 10.233.0.3
    infra_container_image: gcr.io/google_containers/pause-amd64:3.0
  kubeproxy:
    image: quay.io/coreos/hyperkube:v1.7.5_coreos.0

Cluster creation fails but does not provide any indication that the fields are unsupported/invalid.
It fails with the below error:

soumyas-MBP:rke soumya$ ./rke_darwin-amd64 cluster up --cluster-file clusterdigworking.yml
INFO[0000] Building Kubernetes cluster                  
INFO[0000] [ssh] Checking private key                   
INFO[0000] [ssh] Start tunnel for host [rketest1] 
INFO[0000] [ssh] Start tunnel for host [rketest1] 
INFO[0000] [ssh] Start tunnel for host [rketes2]  
INFO[0000] [state] Found local kube config file, trying to get state from cluster 
INFO[0000] [state] Fetching cluster state from Kubernetes 
INFO[0030] Timed out waiting for kubernetes cluster to get state 
INFO[0030] [etcd] Building up Etcd Plane..              
INFO[0031] [etcd] Successfully pulled etcd image on host [rketest1] 
INFO[0031] [etcd] Successfully started etcd container on host [rketest1] 
INFO[0031] [etcd] Successfully started Etcd Plane..     
INFO[0031] [controlplane] Building up Controller Plane.. 
INFO[0033] [controlplane] Successfully pulled kube-api image on host [rketest1] 
INFO[0033] [controlplane] Successfully started kube-api container on host [rketest1] 
INFO[0034] [controlplane] Successfully pulled kube-controller image on host [rketest1] 
INFO[0034] [controlplane] Successfully started kube-controller container on host [soumyarketest1] 
INFO[0034] [controlplane] Successfully pulled scheduler image on host [rketest1] 
INFO[0035] [controlplane] Successfully started scheduler container on host [rketest1] 
INFO[0035] [controlplane] Successfully started Controller Plane.. 
INFO[0035] [worker] Building up Worker Plane..          
INFO[0035] [worker] Successfully pulled kubelet image on host [rketest1] 
INFO[0035] [worker] Successfully started kubelet container on host [rketest1] 
INFO[0036] [worker] Successfully pulled kube-proxy image on host [rketest1] 
INFO[0036] [worker] Successfully started kube-proxy container on host [rketest1] 
INFO[0038] [worker] Successfully pulled nginx-proxy image on host [rketes2] 
INFO[0038] [worker] Successfully started nginx-proxy container on host [rketes2] 
INFO[0038] [worker] Successfully pulled kubelet image on host [rketes2] 
INFO[0039] [worker] Successfully started kubelet container on host [rketes2] 
INFO[0039] [worker] Successfully pulled kube-proxy image on host [rketes2] 
INFO[0040] [worker] Successfully started kube-proxy container on host [rketes2] 
INFO[0040] [worker] Successfully started Worker Plane.. 
INFO[0040] [reconcile] Reconciling cluster state        
INFO[0040] [reconcile] This is newly generated cluster  
INFO[0040] [certificates] Save kubernetes certificates as secrets 
INFO[0040] [certificates] Successfuly saved certificates as kubernetes secret [k8s-certs] 
INFO[0040] [state] Saving cluster state to Kubernetes   
FATA[0070] [state] Failed to save configuration state: [state] Timeout waiting for kubernetes to be ready 

Relevant error messages must be provided to help the user fix the issue.

Role for worker nodes is reported as "none".

rke version v0.0.7-dev

Steps to reproduce the problem:
Create a K8s cluster with nodes in the control plane, worker, and etcd planes.
The role of the worker nodes is reported as "none".

Sangeethas-MBP:~ sangeethahariharan1$ ./kubectl get nodes
NAME                                                 STATUS    ROLES     AGE       VERSION
<node-name1>                                        Ready     <none>    15h       v1.8.3-rancher1
<node-name2>                                        Ready     master    15h       v1.8.3-rancher1
<node-name3>                                        Ready     <none>    15h       v1.8.3-rancher1
Sangeethas-MBP:~ sangeethahariharan1$ 

Rename "ip" to "host" and make "advertised_hostname" optional.

host config parameter requirements:

advertised_hostname: Optional - if provided, the name should be resolvable
host: Required - can be an IP or FQDN that is fully resolvable externally
advertise_address: Optional - internally resolvable IPs

Default resolution order (in kube-api):
advertised_hostname
advertise_address
host

This can be altered by configuring the following in kube api:

  extra_args:
    kubelet-preferred-address-types: "InternalIP,ExternalIP,Hostname"

This keeps hostname resolution from taking effect and hence allows "advertised_hostname" to be a non-resolvable name.

Failed to get job complete status when using Calico network plugin

Using the Calico network plugin, I got the message that rke cluster up failed:

INFO[0280] [addons] Successfully Saved addon to Kubernetes ConfigMap: rke-netwok-plugin
INFO[0280] [addons] Executing deploy job..
Failed to get job complete status:
FATA[0286] Failed to deploy addon execute job: Failed to get job complete status:

The Calico pods were up and running, though:

Traviss-MacBook-Pro:Downloads tcordingly$ kubectl --kubeconfig .kube_config_cluster.yaml get pods -n kube-system
NAME                                       READY     STATUS    RESTARTS   AGE
calico-kube-controllers-6f455dcc5b-nrsx4   1/1       Running   0          17m
calico-node-2lwdb                          2/2       Running   0          17m
calico-node-l8tt9                          2/2       Running   0          17m
calico-node-lbvw8                          2/2       Running   0          17m

When an existing worker node is made a control plane node, the node gets removed from the cluster.

rke version v0.0.7-dev

Steps to reproduce the problem:
Created a k8s cluster with 1 controlplane node, 1 etcd node, and 3 worker nodes.
Updated one of the nodes in the worker plane to also become a controlplane node.

When controlplane nodes are evaluated:
The updated host becomes a controlplane node (I do see the control plane system containers being deployed on this host).
The nginx-proxy containers on all existing worker nodes are also updated with this new controlplane host via a rolling update, as expected.

When worker nodes are evaluated:
This host gets removed from the k8s cluster.

Expected behavior:
The host should continue to be part of the K8s cluster, but as a controlplane node.

Should there be more retries when trying to reach the nodes, to prevent rke from giving up too early when there are network glitches?

rke version v0.0.7-dev
When attempting "up" command to alter existing k8s cluster for adding/removing nodes , rke sometimes errors out when an attempt to reach one of the nodes fails.

Most of the time, re-running the same command will succeed.

INFO[0023] [certificates] Successfully deployed kubernetes certificates to Cluster nodes 
INFO[0023] [etcd] Building up Etcd Plane..              
DEBU[0023] Checking if container etcd is running on host [ec2-18-216-255-43.us-east-2.compute.amazonaws.com] 
FATA[0039] [etcd] Failed to bring up Etcd Plane: Can't get Docker containers for host [ec2-18-216-255-43.us-east-2.compute.amazonaws.com]: error during connect: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/json?limit=0: Error establishing SSH connection: dial tcp 18.216.255.43:22: getsockopt: operation timed out 

Should there be more retries when trying to reach the nodes, to prevent rke from giving up too early when there are network glitches?

kubectl exec and logs not working.

rke version v0.0.3-dev

Steps to reproduce the problem:
Create a cluster using hosts from the Digital Ocean provider.
Create user pods.
kubectl exec and logs do not work.

Sangeethas-MBP:~ sangeethahariharan1$ kubectl exec -it k8testnew-0fbtw  bash
Error from server: error dialing backend: dial tcp: lookup sangee-now-rke-5 on 67.207.67.2:53: no such host
Sangeethas-MBP:~ sangeethahariharan1$ kubectl logs k8testnew-0qrpx
Error from server: Get https://sangee-now-rke-5:10250/containerLogs/default/k8testnew-0qrpx/testcontainer: dial tcp: lookup sangee-now-rke-5 on 67.207.67.2:53: no such host

Allow use of bastion host/jumphost

In some environments (DMZ), you can only SSH through a certain host (usually called bastion host/jump host). Would be nice to be able to configure this.
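
A sketch of what such a setting could look like in cluster.yml; the bastion_host block is hypothetical here and only illustrates the request:

bastion_host:                      # hypothetical: proxy all SSH connections through this host
  address: bastion.example.com
  user: ubuntu
  port: 22
  ssh_key_path: ~/.ssh/bastion_rsa
hosts:
  - advertised_hostname: server1
    ip: 10.0.1.10                  # only reachable from the bastion
    user: ubuntu
    role: [controlplane, etcd]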

Attempts to delete a node that is not reported as a k8s node should be ignored

rke version v0.0.7-dev

In my case, because of this (#66), my cluster.yml was left inconsistent with the actual K8s cluster.
cluster.yml had an additional control plane entry that is not present in the k8s cluster.
So, to make cluster.yml consistent with the K8s cluster, I removed that control plane entry from cluster.yml.

Attempted to run "up".
This results in a FATAL error, since the attempt to delete the control plane node (whose entry was removed from cluster.yml) fails.

INFO[0099] [reconcile] Check Control plane hosts to be deleted 
INFO[0099] [hosts] Cordoning host [<ip>]          
DEBU[0099] Error getting node <ip>: nodes "<ip>" not found 
DEBU[0104] Error getting node <ip>: nodes "<ip>" not found 
DEBU[0109] Error getting node <ip>: nodes "<ip>" not found 
DEBU[0114] Error getting node <ip>: nodes "<ip>" not found 
DEBU[0119] Error getting node <ip>: nodes "<ip>" not found 
DEBU[0124] Error getting node <ip>: nodes "<ip>" not found 
FATA[0129] Failed to delete controlplane node <ip> from cluster 

Why would there even be an attempt to delete this host, since cluster.yml is now in sync with the K8s cluster?

Host ip should accept FQDN names that are resolvable.

rke version v0.0.5-dev

When the host ip is provided as a resolvable FQDN, certificate generation has problems.

Sangeethas-MBP:~ sangeethahariharan1$ ./rke_darwin-amd64 cluster up
INFO[0000] Building Kubernetes cluster                  
INFO[0000] [ssh] Checking private key                   
INFO[0000] [ssh] Start tunnel for host [ip-172-31-38-78.us-east-2.compute.internal] 
INFO[0000] [ssh] Start tunnel for host [ip-172-31-39-84.us-east-2.compute.internal] 
INFO[0000] [ssh] Start tunnel for host [ip-172-31-37-222.us-east-2.compute.internal] 
INFO[0000] [ssh] Start tunnel for host [ip-172-31-38-78.us-east-2.compute.internal] 
INFO[0000] [ssh] Start tunnel for host [ip-172-31-39-84.us-east-2.compute.internal] 
INFO[0000] [state] Found local kube config file, trying to get state from cluster 
INFO[0000] [state] Fetching cluster state from Kubernetes 
INFO[0030] Timed out waiting for kubernetes cluster to get state 
INFO[0030] [certificates] Generating kubernetes certificates 
INFO[0030] [certificates] Generating CA kubernetes certificates 
INFO[0031] [certificates] Generating Kubernetes API server certificates 
FATA[0033] Failed to generate Kubernetes certificates: Failed to generate kube-apiserver certificate: x509: certificate contained IP address of length 0 

Panic-runtime error when upgrading Kubernetes cluster using an invalid cluster file

rke version v0.0.2-dev

Steps:

  1. Create a cluster (v1.7.5_coreos.0) with one master [controlplane, etcd] node and one worker node using a cluster.yml file. Use the command ./rke_darwin-amd64 cluster up --cluster-file cluster.yml
    Cluster.yml file example:
auth:
  strategy: x509
network:
  plugin: flannel
hosts:
  - advertised_hostname: server1
    ip: 1.1.1.1
    user: ubuntu
    role: [controlplane, etcd]
    docker_socket: /var/run/docker.sock
    advertise_address: 10.1.1.1
  - advertised_hostname: server2
    ip: 2.2.2.2
    user: ubuntu
    role: [worker]
    advertise_address: 10.2.2.2

services:
  etcd:
    image: quay.io/coreos/etcd:latest
  kube-api:
    image: quay.io/coreos/hyperkube:v1.7.5_coreos.0
    service_cluster_ip_range: 10.233.0.0/18
    extra_args:
      v: 4
  kube-controller:
    image: quay.io/coreos/hyperkube:v1.7.5_coreos.0
    cluster_cidr: 10.233.64.0/18
    service_cluster_ip_range: 10.233.0.0/18
  scheduler:
    image: quay.io/coreos/hyperkube:v1.7.5_coreos.0
  kubelet:
    image: quay.io/coreos/hyperkube:v1.7.5_coreos.0
    cluster_domain: cluster.local
    cluster_dns_server: 10.233.0.3
    infra_container_image: gcr.io/google_containers/pause-amd64:3.0
  kubeproxy:
    image: quay.io/coreos/hyperkube:v1.7.5_coreos.0
  2. Upgrade the cluster using a cluster file with version v1.7.6_coreos.0 but with different hosts/IP addresses not relevant to the existing cluster. It fails with the error below:
soumyas-MBP:rke soumya$ ./rke_darwin-amd64 cluster upgrade --cluster-file cluster1.yml 
INFO[0000] Upgrading Kubernetes cluster                 
WARN[0000] Failed to initiate new Kubernetes Client: stat ./.kube_config_cluster1.yml: no such file or directory 
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x1d0 pc=0x1b41fbc]

goroutine 1 [running]:
github.com/rancher/rke/vendor/github.com/urfave/cli.HandleAction.func1(0xc420596900)
	/go/src/github.com/rancher/rke/vendor/github.com/urfave/cli/app.go:472 +0x28d
panic(0x1c499e0, 0x24b77c0)
	/usr/local/go/src/runtime/panic.go:489 +0x2cf
github.com/rancher/rke/cmd.ClusterUpgrade(0xc4203f2480, 0x440, 0x440, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/rancher/rke/cmd/cluster.go:202 +0x14c
github.com/rancher/rke/cmd.clusterUpgradeFromCli(0xc42049a000, 0x0, 0x0)
	/go/src/github.com/rancher/rke/cmd/cluster.go:179 +0xdd
reflect.Value.call(0x1c07de0, 0x1dee648, 0x13, 0x1d78844, 0x4, 0xc4205968a0, 0x1, 0x1, 0xc420596828, 0x1d53f40, ...)
	/usr/local/go/src/reflect/value.go:434 +0x91f
reflect.Value.Call(0x1c07de0, 0x1dee648, 0x13, 0xc4205968a0, 0x1, 0x1, 0x140, 0xc42049a000, 0x0)
	/usr/local/go/src/reflect/value.go:302 +0xa4
github.com/rancher/rke/vendor/github.com/urfave/cli.HandleAction(0x1c07de0, 0x1dee648, 0xc42049a000, 0x0, 0x0)
	/go/src/github.com/rancher/rke/vendor/github.com/urfave/cli/app.go:481 +0x198
github.com/rancher/rke/vendor/github.com/urfave/cli.Command.Run(0x1d7ab87, 0x7, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1d92e4a, 0x22, 0x0, ...)
	/go/src/github.com/rancher/rke/vendor/github.com/urfave/cli/command.go:186 +0xab8
github.com/rancher/rke/vendor/github.com/urfave/cli.(*App).RunAsSubcommand(0xc420498180, 0xc4201e5cc0, 0x0, 0x0)
	/go/src/github.com/rancher/rke/vendor/github.com/urfave/cli/app.go:355 +0xa1f
github.com/rancher/rke/vendor/github.com/urfave/cli.Command.startApp(0x1d7a704, 0x7, 0x1d7a704, 0x7, 0x0, 0x0, 0x0, 0x1d8a255, 0x19, 0x0, ...)
	/go/src/github.com/rancher/rke/vendor/github.com/urfave/cli/command.go:273 +0x81b
github.com/rancher/rke/vendor/github.com/urfave/cli.Command.Run(0x1d7a704, 0x7, 0x1d7a704, 0x7, 0x0, 0x0, 0x0, 0x1d8a255, 0x19, 0x0, ...)
	/go/src/github.com/rancher/rke/vendor/github.com/urfave/cli/command.go:74 +0x142c
github.com/rancher/rke/vendor/github.com/urfave/cli.(*App).Run(0xc420498000, 0xc42000e140, 0x5, 0x5, 0x0, 0x0)
	/go/src/github.com/rancher/rke/vendor/github.com/urfave/cli/app.go:235 +0x5d6
main.mainErr(0xc42025f3d0, 0xc420597f68)
	/go/src/github.com/rancher/rke/main.go:41 +0x2f8
main.main()
	/go/src/github.com/rancher/rke/main.go:14 +0x22

Failed to Save Kubernetes certificates

I tried to use rke from a Mac against RancherOS 9.2 (Docker v1.12) on AWS, and I get this message at the end of the install:

FATA[0389] [certificates] Failed to Save Kubernetes certificates: Failed to save certificate [kube-apiserver] to kubernetes: [certificates] Timeout waiting for kubernetes to be ready

I tried running rke from Ubuntu, same result. I tried with an Ubuntu host running docker-engine 1.12.6, same result...

Changes to the control plane are reflected in only one of the worker nodes' nginx-proxy containers.

rke version v0.0.5-dev

Steps to reproduce the problem:
Have a K8s cluster with 2 control nodes, 2 worker nodes, and 1 etcd node.
Remove one of the control nodes.

The control node gets removed successfully.
But the rolling update of the nginx-proxy containers is done on only one of the worker nodes.

Expected Behavior:
The rolling update of the nginx-proxy containers should be attempted on all worker nodes.

Better error message and directions on next steps when control plane node is not reachable.

Tested with rke version v0.0.5-dev

Steps to reproduce the problem:

  1. Create a k8s cluster with 1 node each in controlplane, etcd, and worker.
  2. Add another node to controlplane.
    Power down the controlplane host from step 1, which is the api-server configured in .kube_config_cluster.yml.

Running the cluster up command results in "Error establishing SSH connection".
Can this error message be improved to include a suggestion on what can be done to get around this state and connect to the cluster?

Sangeethas-MBP:~ sangeethahariharan1$ ./rke_darwin-amd64 cluster up
INFO[0000] Building Kubernetes cluster                  
INFO[0000] [ssh] Checking private key                   
INFO[0000] [ssh] Start tunnel for host [ip-***.us-east-2.compute.internal] 
INFO[0000] [ssh] Start tunnel for host [ip-***.us-east-2.compute.internal] 
INFO[0000] [ssh] Start tunnel for host [ip-***.us-east-2.compute.internal] 
INFO[0000] [ssh] Start tunnel for host [ip-***.us-east-2.compute.internal] 
INFO[0000] [ssh] Start tunnel for host [ip-***.us-east-2.compute.internal] 
INFO[0000] [ssh] Start tunnel for host [ip-***.us-east-2.compute.internal] 
INFO[0000] [state] Found local kube config file, trying to get state from cluster 
INFO[0000] [state] Fetching cluster state from Kubernetes 
INFO[0030] Timed out waiting for kubernetes cluster to get state 
INFO[0030] [certificates] Generating kubernetes certificates 
INFO[0030] [certificates] Generating CA kubernetes certificates 
INFO[0030] [certificates] Generating Kubernetes API server certificates 
INFO[0031] [certificates] Generating Kube Controller certificates 
INFO[0031] [certificates] Generating Kube Scheduler certificates 
INFO[0032] [certificates] Generating Kube Proxy certificates 
INFO[0034] [certificates] Generating Node certificate   
INFO[0034] [certificates] Generating admin certificates and kubeconfig 
INFO[0036] [certificates] Deploying kubernetes certificates to Cluster nodes 
FATA[0050] Can't pull Docker image husseingalal/crt-downloader:latest for host [ip-***.us-east-2.compute.internal]: error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.24/images/create?fromImage=husseingalal%2Fcrt-downloader&tag=latest: Error establishing SSH connection: dial tcp **:22: getsockopt: operation timed out 
Sangeethas-MBP:~ sangeethahariharan1$ 

Panic - runtime error when master node is removed from the Kubernetes cluster

rke version v0.0.2-dev

Steps:

  1. Create a cluster with one master [controlplane, etcd] node and one worker node using a cluster.yml file. Use the command ./rke_darwin-amd64 cluster up --cluster-file cluster.yml
    Cluster.yml file example:
auth:
  strategy: x509
network:
  plugin: flannel
hosts:
  - advertised_hostname: server1
    ip: 1.1.1.1
    user: ubuntu
    role: [controlplane, etcd]
    docker_socket: /var/run/docker.sock
    advertise_address: 10.1.1.1
  - advertised_hostname: server2
    ip: 2.2.2.2
    user: ubuntu
    role: [worker]
    advertise_address: 10.2.2.2

services:
  etcd:
    image: quay.io/coreos/etcd:latest
  kube-api:
    image: quay.io/coreos/hyperkube:v1.7.5_coreos.0
    service_cluster_ip_range: 10.233.0.0/18
    extra_args:
      v: 4
  kube-controller:
    image: quay.io/coreos/hyperkube:v1.7.5_coreos.0
    cluster_cidr: 10.233.64.0/18
    service_cluster_ip_range: 10.233.0.0/18
  scheduler:
    image: quay.io/coreos/hyperkube:v1.7.5_coreos.0
  kubelet:
    image: quay.io/coreos/hyperkube:v1.7.5_coreos.0
    cluster_domain: cluster.local
    cluster_dns_server: 10.233.0.3
    infra_container_image: gcr.io/google_containers/pause-amd64:3.0
  kubeproxy:
    image: quay.io/coreos/hyperkube:v1.7.5_coreos.0

  2. Remove the master node from the cluster.yml file and rerun the ./rke_darwin-amd64 cluster up command.
    It fails with a panic as below:
soumyas-MBP:rke soumya$ ./rke_darwin-amd64 cluster up --cluster-file cluster.yml 
INFO[0000] Building Kubernetes cluster                  
INFO[0000] [ssh] Checking private key                   
INFO[0000] [ssh] Start tunnel for host [soumyarketes2]  
INFO[0000] [state] Fetching cluster state from Kubernetes 
INFO[0000] [state] Successfully Fetched cluster state to Kubernetes ConfigMap: cluster-state 
INFO[0000] [certificates] Getting Cluster certificates from Kubernetes 
INFO[0000] [certificates] Successfully fetched Cluster certificates from Kubernetes 
INFO[0000] [certificates] Deploying kubernetes certificates to Cluster nodes 
INFO[0008] [certificates] Successfully deployed kubernetes certificates to Cluster nodes 
INFO[0008] [etcd] Building up Etcd Plane..              
INFO[0008] [etcd] Successfully started Etcd Plane..     
INFO[0008] [controlplane] Building up Controller Plane.. 
INFO[0008] [controlplane] Successfully started Controller Plane.. 
INFO[0008] [worker] Building up Worker Plane..          
INFO[0008] [worker] Container nginx-proxy is already running on host [soumyarketes2] 
INFO[0008] [worker] Container kubelet is already running on host [soumyarketes2] 
INFO[0008] [worker] Container kube-proxy is already running on host [soumyarketes2] 
INFO[0008] [worker] Successfully started Worker Plane.. 
INFO[0008] [reconcile] Reconciling cluster state        
INFO[0008] [reconcile] Rebuilding and update local kube config 
panic: runtime error: index out of range [recovered]
	panic: runtime error: index out of range

goroutine 1 [running]:
github.com/rancher/rke/vendor/github.com/urfave/cli.HandleAction.func1(0xc42057c900)
	/go/src/github.com/rancher/rke/vendor/github.com/urfave/cli/app.go:472 +0x28d
panic(0x1c499e0, 0x24b7800)
	/usr/local/go/src/runtime/panic.go:489 +0x2cf
github.com/rancher/rke/cluster.rebuildLocalAdminConfig(0xc4201d0000, 0x25, 0x0)
	/go/src/github.com/rancher/rke/cluster/cluster.go:165 +0x54a
github.com/rancher/rke/cluster.ReconcileCluster(0xc4201d0000, 0xc4204dc240, 0x0, 0x0)
	/go/src/github.com/rancher/rke/cluster/cluster.go:127 +0xa3
github.com/rancher/rke/cmd.ClusterUp(0xc420114480, 0x43a, 0x43a, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/rancher/rke/cmd/cluster.go:94 +0x17d
github.com/rancher/rke/cmd.clusterUpFromCli(0xc420224000, 0x0, 0x0)
	/go/src/github.com/rancher/rke/cmd/cluster.go:133 +0xdd
reflect.Value.call(0x1c07de0, 0x1dee640, 0x13, 0x1d78844, 0x4, 0xc4205828a0, 0x1, 0x1, 0xc420582828, 0x1d53f40, ...)
	/usr/local/go/src/reflect/value.go:434 +0x91f
reflect.Value.Call(0x1c07de0, 0x1dee640, 0x13, 0xc4205828a0, 0x1, 0x1, 0x140, 0xc420224000, 0x0)
	/usr/local/go/src/reflect/value.go:302 +0xa4
github.com/rancher/rke/vendor/github.com/urfave/cli.HandleAction(0x1c07de0, 0x1dee640, 0xc420224000, 0x0, 0x0)
	/go/src/github.com/rancher/rke/vendor/github.com/urfave/cli/app.go:481 +0x198
github.com/rancher/rke/vendor/github.com/urfave/cli.Command.Run(0x1d780aa, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1d85712, 0x14, 0x0, ...)
	/go/src/github.com/rancher/rke/vendor/github.com/urfave/cli/command.go:186 +0xab8
github.com/rancher/rke/vendor/github.com/urfave/cli.(*App).RunAsSubcommand(0xc4202d4180, 0xc4201e5cc0, 0x0, 0x0)
	/go/src/github.com/rancher/rke/vendor/github.com/urfave/cli/app.go:355 +0xa1f
github.com/rancher/rke/vendor/github.com/urfave/cli.Command.startApp(0x1d7a704, 0x7, 0x1d7a704, 0x7, 0x0, 0x0, 0x0, 0x1d8a255, 0x19, 0x0, ...)
	/go/src/github.com/rancher/rke/vendor/github.com/urfave/cli/command.go:273 +0x81b
github.com/rancher/rke/vendor/github.com/urfave/cli.Command.Run(0x1d7a704, 0x7, 0x1d7a704, 0x7, 0x0, 0x0, 0x0, 0x1d8a255, 0x19, 0x0, ...)
	/go/src/github.com/rancher/rke/vendor/github.com/urfave/cli/command.go:74 +0x142c
github.com/rancher/rke/vendor/github.com/urfave/cli.(*App).Run(0xc4202d4000, 0xc42000e140, 0x5, 0x5, 0x0, 0x0)
	/go/src/github.com/rancher/rke/vendor/github.com/urfave/cli/app.go:235 +0x5d6
main.mainErr(0xc4202445a0, 0xc420583f68)
	/go/src/github.com/rancher/rke/main.go:41 +0x2f8
main.main()
	/go/src/github.com/rancher/rke/main.go:14 +0x22

When cluster remove is done, kube_config_cluster.yml continues to be present.

rke version v0.0.5-dev

Steps to reproduce the problem:
Create a new K8s cluster using the cluster up command.
Remove the K8s cluster using the cluster remove command.
At this point, kube_config_cluster.yml continues to be present.

When attempting to create a new cluster using the cluster up command, rke now attempts to fetch cluster state from the removed Kubernetes cluster, and it times out before it proceeds to creating the new cluster.

Sangeethas-MBP:~ sangeethahariharan1$ ./rke_darwin-amd64 cluster up
INFO[0000] Building Kubernetes cluster                  
INFO[0000] [ssh] Checking private key                   
INFO[0000] [ssh] Start tunnel for host [ip-172-31-38-78.us-east-2.compute.internal] 
INFO[0000] [ssh] Start tunnel for host [ip-172-31-39-84.us-east-2.compute.internal] 
INFO[0000] [ssh] Start tunnel for host [ip-172-31-37-222.us-east-2.compute.internal] 
INFO[0000] [ssh] Start tunnel for host [ip-172-31-38-78.us-east-2.compute.internal] 
INFO[0000] [ssh] Start tunnel for host [ip-172-31-39-84.us-east-2.compute.internal] 
INFO[0000] [state] Found local kube config file, trying to get state from cluster 
INFO[0000] [state] Fetching cluster state from Kubernetes 
INFO[0030] Timed out waiting for kubernetes cluster to get state 
INFO[0030] [certificates] Generating kubernetes certificates 
INFO[0030] [certificates] Generating CA kubernetes certificates 
INFO[0031] [certificates] Generating Kubernetes API server certificates 

Should cluster remove also remove kube_config_cluster.yml, to indicate that rke will no longer manage this cluster?
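
Until then, a simple manual workaround (assuming the file sits next to cluster.yml, as in the runs above) is:

# run after "cluster remove" so the next "cluster up" does not spend 30s probing the old cluster
rm -f .kube_config_cluster.yml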

Arm support

The popularity of kubeadm comes (partially) from its support for the RPi, because running K8s on 4 or more Hypriot RPis is an excellent training playground (far better than minikube). I'm sure that I can change the images to point to Google's Hyperkube, but why not provide k8s:v1.8.3-rancher2 for armhf architectures?

Ability to use the real hostname value from each host

Currently hostname_override defaults to the address, and kube-api has the option:

"--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname"
which tells kube-api to skip hostname resolution for the nodes and use the internal IP address, or the external IP if the internal one is not found.

We should be able to fetch the hostname from the host, using a docker container with --net=host.

note: kubelet uses /proc/sys/kernel/hostname to get the hostname of the node.
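
A quick sketch of fetching it that way from the command line (alpine is just an example image; any image with cat would do):

# with --net=host, as described above, the container sees the host's hostname
docker run --rm --net=host alpine cat /proc/sys/kernel/hostname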

RKE fails to find the key when using ~/.ssh/id_rsa for SSH private Key path

  1. Create a Kubernetes cluster using private key path ~/.ssh/id_rsa in the cluster.yml file
    RKE fails to find the key with the below error:
soumyas-MBP:rke soumya$ ./rke_darwin-amd64 up --config testssh.yml
INFO[0000] Building Kubernetes cluster                  
INFO[0000] [ssh] Checking private key                   
FATA[0000] Failed to parse the private key: ssh: no key found 

If a config file is generated using ./rke config --name test.yml, the default ssh_key_path is ~/.ssh/id_rsa, and users who use this default key path will run into this "no key found" issue.
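
Until ~ is expanded, a workaround is to spell the path out absolutely in the generated config (the path below is only an example):

# test.yml / cluster.yml
ssh_key_path: /Users/<your-user>/.ssh/id_rsa   # absolute path instead of ~/.ssh/id_rsa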

Support for configuring cloud-provider

I ask because I am trying to figure out where to set the option:

  kube-controller:
    image: rancher/k8s:v1.8.3-rancher2
    extra_args: {--cloud-provider=aws}
[...]
  kubelet:
    image: rancher/k8s:v1.8.3-rancher2
    extra_args: {--cloud-provider=aws}
    cluster_domain: cluster.local
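
If extra_args is a map of flag names (without the leading dashes) to values, as in the v: 4 examples elsewhere on this page, the setting would presumably look like this:

  kube-controller:
    image: rancher/k8s:v1.8.3-rancher2
    extra_args:
      cloud-provider: aws     # flag name without leading dashes
  kubelet:
    image: rancher/k8s:v1.8.3-rancher2
    cluster_domain: cluster.local
    extra_args:
      cloud-provider: aws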

When nodes are removed, attempt to clean up the system containers from the host.

rke version v0.0.4-dev

Steps to reproduce the problem:
Create a cluster with 2 nodes in controlplane and 1 node each in etcd and worker.
Remove one of the nodes in controlplane.

Even after the node is removed from the cluster, the system containers continue to be present on the host.

8c9b34e1d7a4        rancher/k8s:v1.8.3-rancher1                                                                                                   "kube-apiserver --ins"   19 minutes ago      Up 19 minutes                           kube-api
aad891bd8af0        rancher/k8s:v1.8.3-rancher2                                                                                                   "kube-proxy --v=2 --h"   2 hours ago         Up 2 hours                              kube-proxy
8de5f4bf533a        rancher/k8s:v1.8.3-rancher2                                                                                                   "kubelet --v=2 --addr"   2 hours ago         Up 2 hours                              kubelet
87473c127186        rancher/k8s:v1.8.3-rancher2                                                                                                   "kube-scheduler --lea"   2 hours ago         Up 2 hours                              scheduler
a9c059b4c9a9        rancher/k8s:v1.8.3-rancher2                                                                                                   "kube-controller-mana"   2 hours ago         Up 2 hours                              kube-controller

It would be good to make an attempt to clean up system containers from the host when the hosts are reachable.
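
Until rke does this itself, a manual cleanup sketch for such a host (container names taken from the docker ps output above; run it only on a host that has been removed from the cluster):

# force-remove the leftover RKE system containers; names that do not exist on the host just print an error and are skipped
docker rm -f kube-api kube-controller scheduler kubelet kube-proxy nginx-proxy etcd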

Be on par with minikube

minikube works for the use case of standing up a local k8s. From my efforts so far, I still can't get rke to do the same for such a simple use case. I created https://github.com/flaccid/rke-vagrant to attempt to do so (nightmare).

For minikube:

minikube start
minikube dashboard

Can we attempt to do the same and have it documented? I'm not suggesting that rke manage VMs, but currently there isn't a simple example to 'spin up RKE locally' - I would have thought this would be the simplest case to cover.

Timeout waiting for kubernetes to be ready

RKE version:
rke version v0.0.8-dev

Docker version: (docker version,docker info preferred)
Client:
Version: 17.03.2-ce
API version: 1.27
Go version: go1.7.5
Git commit: f5ec1e2
Built: Tue Jun 27 03:35:14 2017
OS/Arch: linux/amd64

Server:
Version: 17.03.2-ce
API version: 1.27 (minimum version 1.12)
Go version: go1.7.5
Git commit: f5ec1e2
Built: Tue Jun 27 03:35:14 2017
OS/Arch: linux/amd64
Experimental: false

Operating system and kernel: (cat /etc/os-release, uname -r preferred)
NAME="Ubuntu"
VERSION="16.04.3 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.3 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

4.4.0-101-generic

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
bare metal

cluster.yml file:
nodes:
  - address: 172.20.27.160
    internal_address: ""
    role:
      - controlplane
      - etcd
    hostname_override: ""
    user: six6
    docker_socket: /var/run/docker.sock
  - address: 172.20.27.161
    internal_address: ""
    role:
      - worker
    hostname_override: ""
    user: six6
    docker_socket: /var/run/docker.sock
  - address: 172.20.27.162
    internal_address: ""
    role:
      - worker
    hostname_override: ""
    user: six6
    docker_socket: /var/run/docker.sock
services:
  etcd:
    image: quay.io/coreos/etcd:latest
    extra_args: {}
  kube-api:
    image: rancher/k8s:v1.8.3-rancher2
    extra_args: {}
    service_cluster_ip_range: 10.233.0.0/18
  kube-controller:
    image: rancher/k8s:v1.8.3-rancher2
    extra_args: {}
    cluster_cidr: 10.233.64.0/18
    service_cluster_ip_range: 10.233.0.0/18
  scheduler:
    image: rancher/k8s:v1.8.3-rancher2
    extra_args: {}
  kubelet:
    image: rancher/k8s:v1.8.3-rancher2
    extra_args: {}
    cluster_domain: cluster.local
    infra_container_image: gcr.io/google_containers/pause-amd64:3.0
    cluster_dns_server: 10.233.0.3
  kubeproxy:
    image: rancher/k8s:v1.8.3-rancher2
    extra_args: {}
network:
  plugin: flannel
  options: {}
auth:
  strategy: x509
  options: {}
addons: ""
ssh_key_path: /home/six6/.ssh/id_rsa

Steps to Reproduce:

rke up

Results:
...
INFO[0054] [certificates] Save kubernetes certificates as secrets
FATA[0084] [certificates] Failed to Save Kubernetes certificates: Failed to save certificate [kube-proxy] to kubernetes: [certificates] Timeout waiting for kubernetes to be ready

After running rke remove, and rke up, I still got the same error.

Thanks!

RKE fails with "no key found" error when ssh keys not named id_rsa.pub/id_rsa

rke version: 0.0.1-dev
Steps:

  1. Create ssh keys using ssh-keygen. Use names abc.pub and abc for public/private keys
  2. Add the public key to .ssh/authorized_keys file on the hosts that are used to create the cluster(specified in cluster.yml)
  3. Create a cluster using ./rke cluster up command.
    Cluster creation fails with a "no key found" error. The keys are not named id_rsa.pub and id_rsa on the system where rke is used.
soumyas-MBP:rke soumya$ ./rke_darwin-amd64 cluster up --cluster-file cluster.yml
INFO[0000] Building Kubernetes cluster                  
INFO[0000] [ssh] Start tunnel for host [soumyarketestmaster] 
INFO[0000] [ssh] Start tunnel for host [soumyarketestmaster] 
INFO[0000] [ssh] Start tunnel for host [soumyarketestworker] 
INFO[0000] [state] Fetching cluster state from Kubernetes 
WARN[0030] Timed out waiting for kubernetes cluster     
INFO[0030] [certificates] Generating kubernetes certificates 
INFO[0030] [certificates] Generating CA kubernetes certificates 
INFO[0030] [certificates] Generating Kubernetes API server certificates 
INFO[0030] [certificates] Generating Kube Controller certificates 
INFO[0030] [certificates] Generating Kube Scheduler certificates 
INFO[0031] [certificates] Generating Kube Proxy certificates 
INFO[0031] [certificates] Generating Node certificate   
INFO[0032] [certificates] Generating admin certificates and kubeconfig 
INFO[0032] [certificates] Deploying kubernetes certificates to Cluster nodes 
FATA[0032] Error configuring SSH: ssh: no key found     
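
A workaround sketch, assuming cluster.yml accepts an ssh_key_path pointing at the non-default key:

# cluster.yml
ssh_key_path: /home/<your-user>/.ssh/abc   # the private key created in step 1
hosts:
  - advertised_hostname: soumyarketestmaster
    ip: 1.1.1.1
    user: ubuntu
    role: [controlplane, etcd]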

Remove invalid RestartPolicy for cert-deployer

We specify never as the RestartPolicy for the cert-deployer image (https://github.com/rancher/rke/blob/master/pki/deploy.go#L79), but this is not a valid value. The reason it works is that it is not validated in the API version used by Docker 1.12 (1.24). We should not specify the RestartPolicy at all, as the default is "", which means do not restart.

Besides being invalid, this also blocks the use of newer Docker versions, as validation of RestartPolicy was added in API version 1.25.

Containerd Support

As Docker is not my preferred runtime engine for worker nodes, is there any chance of including support for containerd in the roadmap?

gz#6781

Check Docker version on host

As we already talk to the Docker API, we can retrieve ServerVersion and compare it against the supported Docker versions set in rke.
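
For comparison, the same information is available from the Docker CLI; a quick manual check on a host might look like:

# print only the Docker server (daemon) version
docker version --format '{{.Server.Version}}'
# e.g. 17.03.2-ce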

Provide relevant error messages when nodes are added/removed during cluster upgrade

  1. Create a k8s cluster using image rancher/k8s:v1.8.2-rancher5 in cluster.yml file.
    ./rke_darwin-amd64 cluster up --cluster-file cluster.yml
  2. Update the cluster.yml file with rancher/k8s:v1.8.3-rancher2 for all the components
    and add another node
  3. Upgrade the cluster. Adding another node during the upgrade gives an error message as below:
soumyas-MBP:rke soumya$ ./rke_darwin-amd64 cluster upgrade --cluster-file cluster.yml
INFO[0000] Upgrading Kubernetes cluster                 
INFO[0000] [state] Found local kube config file, trying to get state from cluster 
INFO[0000] [state] Fetching cluster state from Kubernetes 
INFO[0000] [state] Successfully Fetched cluster state to Kubernetes ConfigMap: cluster-state 
INFO[0000] [certificates] Getting Cluster certificates from Kubernetes 
INFO[0001] [certificates] Successfully fetched Cluster certificates from Kubernetes 
INFO[0001] [ssh] Checking private key                   
INFO[0001] [ssh] Start tunnel for host [soumyarketest1] 
INFO[0001] [ssh] Start tunnel for host [soumyarketest1] 
INFO[0001] [ssh] Start tunnel for host [soumyarketest3] 
INFO[0001] [ssh] Start tunnel for host [soumyarketes2]  
INFO[0001] [ssh] Start tunnel for host [soumyarketest4] 
INFO[0001] [upgrade] All nodes are Ready                
INFO[0001] [upgrade] Upgrading Control Plane Services   
INFO[0001] [controlplane] Upgrading the Controller Plane.. 
INFO[0003] [controlplane] Successfully pulled kube-api image on host [soumyarketest1] 
INFO[0004] [controlplane] Successfully started kube-api container on host [soumyarketest1] 
INFO[0005] [controlplane] Successfully pulled kube-controller image on host [soumyarketest1] 
INFO[0005] [controlplane] Successfully started kube-controller container on host [soumyarketest1] 
INFO[0006] [controlplane] Successfully pulled scheduler image on host [soumyarketest1] 
INFO[0007] [controlplane] Successfully started scheduler container on host [soumyarketest1] 
INFO[0009] [controlplane] Successfully pulled kube-api image on host [soumyarketest3] 
INFO[0009] [controlplane] Successfully started kube-api container on host [soumyarketest3] 
INFO[0011] [controlplane] Successfully pulled kube-controller image on host [soumyarketest3] 
INFO[0011] [controlplane] Successfully started kube-controller container on host [soumyarketest3] 
INFO[0012] [controlplane] Successfully pulled scheduler image on host [soumyarketest3] 
INFO[0013] [controlplane] Successfully started scheduler container on host [soumyarketest3] 
INFO[0013] [controlplane] Successfully upgraded Controller Plane.. 
INFO[0013] [upgrade] Control Plane Services updgraded successfully 
INFO[0013] [upgrade] Upgrading Worker Plane Services    
INFO[0013] [worker] Upgrading Worker Plane..            
INFO[0014] [worker] Successfully pulled kubelet image on host [soumyarketest1] 
INFO[0015] [worker] Successfully started kubelet container on host [soumyarketest1] 
INFO[0016] [worker] Successfully pulled kube-proxy image on host [soumyarketest1] 
INFO[0016] [worker] Successfully started kube-proxy container on host [soumyarketest1] 
INFO[0018] [worker] Successfully pulled kubelet image on host [soumyarketest3] 
INFO[0018] [worker] Successfully started kubelet container on host [soumyarketest3] 
INFO[0019] [worker] Successfully pulled kube-proxy image on host [soumyarketest3] 
INFO[0020] [worker] Successfully started kube-proxy container on host [soumyarketest3] 
INFO[0042] [worker] Successfully pulled kubelet image on host [soumyarketes2] 
INFO[0043] [worker] Successfully started kubelet container on host [soumyarketes2] 
INFO[0044] [worker] Successfully pulled kube-proxy image on host [soumyarketes2] 
INFO[0044] [worker] Successfully started kube-proxy container on host [soumyarketes2] 
FATA[0075] Failed to set cordonded state for node: soumyarketest4 

A relevant error message should be provided to inform the user that addition and removal of nodes is not supported.

DNS resolution is very slow (up to 15 seconds) and sometimes fails.

RKE version - v0.0.8-dev

Had a cluster with 3 nodes - 1 node in control plane and etcd, 1 master and 1 worker.
Added 2 more worker nodes.

kube-dns-autoscaler and kube-dns get created as expected.

Sangeethas-MBP:~ sangeethahariharan1$ ./kubectl get pods -n kube-system
NAME                                   READY     STATUS    RESTARTS   AGE
kube-dns-778977457c-6rghh              3/3       Running   0          10m
kube-dns-778977457c-vc8l6              3/3       Running   0          1h
kube-dns-autoscaler-5cb4f55974-74w7z   1/1       Running   0          1h

From within a container, DNS resolution is very slow (up to 15 seconds) and sometimes it even fails.

root@testnow-qtczp:/# ping google.com
ping: unknown host
root@testnow-qtczp:/# ping google.com
PING google.com (172.217.5.238): 56 data bytes
64 bytes from 172.217.5.238: icmp_seq=0 ttl=44 time=11.682 ms
64 bytes from 172.217.5.238: icmp_seq=1 ttl=44 time=14.199 ms
64 bytes from 172.217.5.238: icmp_seq=2 ttl=44 time=11.719 ms
64 bytes from 172.217.5.238: icmp_seq=3 ttl=44 time=11.698 ms
64 bytes from 172.217.5.238: icmp_seq=4 ttl=44 time=11.618 ms
64 bytes from 172.217.5.238: icmp_seq=5 ttl=44 time=12.764 ms
64 bytes from 172.217.5.238: icmp_seq=6 ttl=44 time=12.890 ms
64 bytes from 172.217.5.238: icmp_seq=7 ttl=44 time=12.630 ms
64 bytes from 172.217.5.238: icmp_seq=8 ttl=44 time=14.780 ms
^C--- google.com ping statistics ---
9 packets transmitted, 9 packets received, 0% packet loss
round-trip min/avg/max/stddev = 11.618/12.664/14.780/1.094 ms
root@testnow-qtczp:/# ping google.com
ping: unknown host
root@testnow-qtczp:/# 

Fatal error when creating Kubernetes cluster with hosts using encrypted ssh keys

rke version: 0.0.1-dev

  1. Create encrypted ssh keys using ssh-keygen
  2. Add these keys to .ssh/authorized_keys file on the hosts that are used to create the cluster(specified in cluster.yml)
  3. Create cluster using ./rke cluster up command.
    Cluster creation fails with a fatal error.
rke soumya$ ./rke_darwin-amd64 cluster up --cluster-file cluster.yml 
INFO[0000] Building Kubernetes cluster                  
INFO[0000] [ssh] Start tunnel for host [soumyarketest-01] 
INFO[0000] [ssh] Start tunnel for host [soumyarketest-01] 
INFO[0000] [ssh] Start tunnel for host [soumyarketest-02] 
WARN[0000] Failed to initiate new Kubernetes Client: stat admin.config: no such file or directory 
INFO[0000] [certificates] Generating kubernetes certificates 
INFO[0000] [certificates] Generating CA kubernetes certificates 
INFO[0000] [certificates] Generating Kubernetes API server certificates 
INFO[0000] [certificates] Generating Kube Controller certificates 
INFO[0000] [certificates] Generating Kube Scheduler certificates 
INFO[0001] [certificates] Generating Kube Proxy certificates 
INFO[0001] [certificates] Generating Node certificate   
INFO[0002] [certificates] Generating admin certificates and kubeconfig 
INFO[0002] [certificates] Deploying kubernetes certificates to Cluster nodes 
FATA[0002] Error configuring SSH: ssh: cannot decode encrypted private keys 
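
Until encrypted keys (or an ssh-agent) are supported, one workaround is to keep a passphrase-free copy of the key just for rke. ssh-keygen -p rewrites the key in place, so work on a copy:

# copy the key and strip its passphrase (you will be prompted for the old passphrase)
cp ~/.ssh/id_rsa ~/.ssh/id_rsa_rke
ssh-keygen -p -N "" -f ~/.ssh/id_rsa_rke
# then point ssh_key_path in cluster.yml at ~/.ssh/id_rsa_rke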

rke remove does not stop flanneld

./rke -v

rke version v0.0.7-dev

./rke remove

ps aux | grep kube

root 11276 0.0 0.5 313220 10576 ? Ssl 07:08 0:03 /opt/bin/flanneld --ip-masq --kube-subnet-mgr

Ideally, rke remove should also stop flannel.
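
Until rke remove handles it, a manual cleanup sketch (flannel.1 and cni0 are the usual flannel interface names and may differ on your hosts):

# stop the leftover flannel process and remove its interfaces on each node
sudo pkill flanneld
sudo ip link delete flannel.1
sudo ip link delete cni0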

An etcd node added to an existing cluster is not able to join the existing etcd cluster.

rke version v0.0.4-dev

Steps to reproduce the problem:
Create a cluster with following configuration:
host1 in etcd and controlplane
host2 in controlplane
host3 in worker

Update the cluster (using the cluster up command) by adding host2 to etcd as well.
The etcd service gets deployed on host2 successfully, but it is not able to join the existing cluster.

INFO[0022] [etcd] Building up Etcd Plane..              
INFO[0025] [etcd] Container etcd is already running on host [sangee-test-1] 
INFO[0029] [etcd] Successfully pulled etcd image on host [sangee-test-2] 
INFO[0029] [etcd] Successfully started etcd container on host [sangee-test-2] 
INFO[0029] [etcd] Successfully started Etcd Plane..   
/ # etcdctl member list
client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:4001: getsockopt: connection refused
; error #1: client: endpoint http://127.0.0.1:2379 exceeded header timeout

/ # 
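
For comparison, joining a new member to a running etcd cluster normally requires registering it with an existing member first; a rough etcdctl (v2 API) sketch with placeholder addresses:

# run against a healthy existing member
etcdctl --endpoints http://<host1-ip>:2379 member add etcd-host2 http://<host2-ip>:2380
# the new etcd container then has to start with the returned initial-cluster settings
# and --initial-cluster-state=existing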
