
Kubernetes cluster (for testing purposes) made easy with Vagrant and CoreOS.

License: Apache License 2.0


kubernetes-vagrant-coreos-cluster's Introduction

kubernetes-vagrant-coreos-cluster

Turnkey Kubernetes cluster setup with Vagrant 2.1.1+ and CoreOS.

If you're lazy, or in a hurry, jump to the TL;DR section.

Pre-requisites

MacOS X

On MacOS X (and assuming you have homebrew already installed) run

brew install wget

Windows

  • The vagrant-winnfsd plugin will be installed in order to enable NFS shares.
  • The project runs several bash scripts inside the virtual machines, and those scripts must use LF line endings. Git for Windows sets core.autocrlf to true at installation time, so when you clone this repository all line endings are converted to CRLF. This behavior needs to be changed before cloning the repository (or fixed afterwards for each file by hand). We recommend turning it off by running git config --global core.autocrlf false and git config --global core.eol lf before cloning. Then, after cloning, do not forget to turn the behavior back on if you want to work on other Windows projects: git config --global core.autocrlf true and git config --global core.eol crlf.
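
For example, the full sequence on a Windows host might look like this (a sketch only; the HTTPS clone URL is assumed here):

git config --global core.autocrlf false
git config --global core.eol lf
git clone https://github.com/pires/kubernetes-vagrant-coreos-cluster.git
cd kubernetes-vagrant-coreos-cluster
git config --global core.autocrlf true
git config --global core.eol crlf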

Deploy Kubernetes

The current Vagrantfile will bootstrap one VM with everything needed to become a Kubernetes master and, by default, a couple of VMs with everything needed to become Kubernetes worker nodes. You can change the number of worker nodes and/or the Kubernetes version by setting the environment variables NODES and KUBERNETES_VERSION, respectively. You can find more details below.

vagrant up
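
For instance, to bring up three worker nodes with an explicitly pinned Kubernetes version (1.10.9 is the current default), you could run instead:

NODES=3 KUBERNETES_VERSION=1.10.9 vagrant up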

Linux or MacOS host

Your Kubernetes cluster is ready. Use kubectl to manage it.

Windows host

On Windows systems, kubectl is installed on the master node, in the /opt/bin directory. To manage your Kubernetes cluster, ssh into the master node and run kubectl from there.

vagrant ssh master
kubectl cluster-info

Clean-up

vagrant destroy

If you've set NODES or any other variable when deploying, please make sure you set it in the vagrant destroy call as well, like:

NODES=3 vagrant destroy -f

Notes about hypervisors

Virtualbox

VirtualBox is the default hypervisor, and you'll probably need to disable its DHCP server

VBoxManage dhcpserver remove --netname HostInterfaceNetworking-vboxnet0

Parallels

If you are using Parallels Desktop, you need to install vagrant-parallels provider

vagrant plugin install vagrant-parallels

Then just add --provider parallels to the vagrant up invocations above.
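
For example:

vagrant up --provider parallels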

VMware

If you are using one of the VMware hypervisors you must buy the matching provider and, depending on your case, just add either --provider vmware_fusion or --provider vmware_workstation to the vagrant up invocations above.

Private Docker Repositories

If you want to use Docker private repositories, look for DOCKERCFG below.

Customization

Environment variables

Most aspects of your cluster setup can be customized with environment variables. Right now the available ones are:

  • HTTP_PROXY sets http proxy address.

    Defaults to $HTTP_PROXY of host machine if it exists.

    You may need to customize this proxy setting for the VMs when you are behind the GFW and use tools such as shadowsocks or privoxy on the host machine: the Vagrantfile uses the host machine's proxy settings by default, and those settings may not be reachable from the VMs, leaving them without Internet access. If you hit this problem, see https://www.linuxbabe.com/virtualbox/how-to-access-host-services-from-a-virtualbox-guest-os for how to customize the setting.

  • HTTPS_PROXY

    like HTTP_PROXY

  • NO_PROXY

    like HTTP_PROXY

  • NODES sets the number of nodes (workers).

    Defaults to 2.

  • CHANNEL sets the default CoreOS channel to be used in the VMs.

    Defaults to alpha.

    While, for convenience, we allow users to optionally consume CoreOS' beta or stable channels, please do note that as both Kubernetes and CoreOS are quickly evolving platforms we only expect our setup to behave reliably on top of CoreOS' alpha channel. So, before submitting a bug, either in this project or in Kubernetes or CoreOS, make sure it (also) happens in the (default) alpha channel 😄

  • COREOS_VERSION will set the specific CoreOS release (from the given channel) to be used.

    Default is to use whatever is the latest one from the given channel.

  • SERIAL_LOGGING if set to true will allow logging from the VMs' serial console.

    Defaults to false. Only use this if you really know what you are doing.

  • MASTER_MEM sets the master node VM memory.

    Defaults to 1024 (in MB)

  • MASTER_CPUS sets the number of vCPUs to be used by the master VM.

    Defaults to 2.

  • NODE_MEM sets the worker nodes VM memory.

    Defaults to 2048 (in MB)

  • NODE_CPUS sets the number of vCPUs to be used by node VMs.

    Defaults to 2.

  • DOCKERCFG sets the location of your private docker repositories (and keys) configuration. However, this is only usable if you set USE_DOCKERCFG=true.

    Defaults to "~/.dockercfg".

    You can create/update a ~/.dockercfg file at any time by running docker login <registry>.<domain>. All nodes will pick it up automatically at vagrant up whenever that file has been modified or updated.

  • DOCKER_OPTIONS sets the additional DOCKER_OPTS for docker service on both master and the nodes. Useful for adding params such as --insecure-registry.

  • KUBERNETES_VERSION defines the specific kubernetes version being used.

    Defaults to 1.10.9. Versions prior to 1.10.0 may not work with current cloud-configs and Kubernetes descriptors.

  • USE_KUBE_UI defines whether or not to deploy the Kubernetes UI.

    Defaults to false.

  • AUTHORIZATION_MODE, when set to RBAC, enables RBAC for the Kubernetes cluster.

    Defaults to AlwaysAllow.

  • CLUSTER_CIDR defines the CIDR to be used for pod networking. This CIDR must not overlap with 10.100.0.0/16.

    Defaults to 10.244.0.0/16.

So, in order to start, say, a Kubernetes cluster with 3 worker nodes, 4GB of RAM and 4 vCPUs per node, one would just run:

NODE_MEM=4096 NODE_CPUS=4 NODES=3 vagrant up

or with Kubernetes UI:

NODE_MEM=4096 NODE_CPUS=4 NODES=3 USE_KUBE_UI=true vagrant up
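
or, as a sketch, letting the nodes pull from private Docker repositories using the default ~/.dockercfg (see DOCKERCFG above):

USE_DOCKERCFG=true vagrant up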

Please do note that if you used non-default settings to start up your cluster, you must also use those exact settings when invoking vagrant {up,ssh,status,destroy} to communicate with any of the nodes in the cluster, as otherwise things may not behave as you'd expect.

Synced Folders

You can automatically mount, at startup, an arbitrary number of local folders from your host machine in your guest VMs by populating the synced_folders.yaml file in your Vagrantfile directory accordingly. For each folder you wish to mount, the allowed syntax is...

# the 'id' of this mount point. needs to be unique.
- name: foobar
# the host source directory to share with the guest(s).
  source: /foo
# the path to mount ${source} above on guest(s)
  destination: /bar
# the mount type. only NFS makes sense as, presently, we are not shipping
# hypervisor specific guest tools. defaults to `true`.
  nfs: true
# additional options to pass to the mount command on the guest(s)
# if not set the Vagrant NFS defaults will be used.
  mount_options: 'nolock,vers=3,udp,noatime'
# if the mount is enabled or disabled by default. default is `true`.
  disabled: false

ATTENTION: Don't remove /vagrant entry.

TL;DR

vagrant up

This will start one master and two worker nodes, download the Kubernetes binaries and start all needed services. A Docker mirror cache will be provisioned in the master to speed up container provisioning. This can take some time depending on your Internet connection speed.

Please do note that, at any time, you can change the number of worker nodes by setting the NODES value in subsequent vagrant up invocations.
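
For example, to grow the cluster to four workers after the initial vagrant up:

NODES=4 vagrant up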

Usage

Congratulations! You're now ready to use your Kubernetes cluster.

If you just want to test something simple, start with the Kubernetes examples (https://github.com/GoogleCloudPlatform/kubernetes/blob/master/examples/).

For a more elaborate scenario, https://github.com/pires/kubernetes-elasticsearch-cluster has all you need to get a scalable Elasticsearch cluster on top of Kubernetes in no time.

Troubleshooting

Vagrant displays a warning message when running!

Vagrant 2.1 integrated support for triggers as a core functionality. However, this change is not compatible with the vagrant-triggers community plugin we were and still are using. Since we require this plugin, Vagrant will show the following warning:

WARNING: Vagrant has detected the `vagrant-triggers` plugin. This plugin conflicts
with the internal triggers implementation. Please uninstall the `vagrant-triggers`
plugin and run the command again if you wish to use the core trigger feature. To
uninstall the plugin, run the command shown below:

  vagrant plugin uninstall vagrant-triggers

Note that the community plugin `vagrant-triggers` and the core trigger feature
in Vagrant do not have compatible syntax.

To disable this warning, set the environment variable `VAGRANT_USE_VAGRANT_TRIGGERS`.

This warning is harmless and only means that we are using the community plugin instead of the core functionality. To disable it, set the VAGRANT_USE_VAGRANT_TRIGGERS environment variable to false before running vagrant:

$ VAGRANT_USE_VAGRANT_TRIGGERS=false NODES=2 vagrant up

I'm getting errors while waiting for Kubernetes master to become ready on a MacOS host!

If you see something like this in the log:

==> master: Waiting for Kubernetes master to become ready...
error: unable to load file "temp/dns-controller.yaml": unable to connect to a server to handle "replicationcontrollers": couldn't read version from server: Get https://10.245.1.2/api: dial tcp 10.245.1.2:443: i/o timeout
error: no objects passed to create

You probably have a pre-existing Kubernetes config file on your system at ~/.kube/config. Delete or move that file and try again.
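
One way to do that, keeping a backup of the old config (the path shown is the default location), before re-running the deploy:

mv ~/.kube/config ~/.kube/config.backup
vagrant up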

I'm getting errors while waiting for mounting to /vagrant on a CentOS 7 host!

If you see something like this in the log:

mount.nfs: Connection timed out.

This might be caused by the firewall. You can check whether it is active with systemctl status firewalld and, if so, simply stop it with systemctl stop firewalld.
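
On the CentOS host, as a sketch:

systemctl status firewalld
systemctl stop firewalld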

Kubernetes Dashboard asks for either a Kubeconfig or token!

This behavior is expected in the latest versions of the Kubernetes Dashboard, since different people may need to use the Kubernetes Dashboard with different permissions. Since we deploy a service account with administrative privileges, you should just click Skip. Everything will work as expected.

Licensing

This work is open source, and is licensed under the Apache License, Version 2.0.


kubernetes-vagrant-coreos-cluster's Issues

kube-dns pod does not appear after first vagrant up

When I first create machines with

USE_DOCKERCFG=true DOCKERCFG=../containers/cloud_config/.dockercfg NUM_INSTANCES=1 MASTER_MEM=1024 MASTER_CPUS=2 NODE_MEM=4096 NODE_CPUS=2 vagrant up

then no kube-dns pod is created

$ kubectl get pods                     
POD       IP        CONTAINER(S)   IMAGE(S)   HOST      LABELS    STATUS    CREATED   MESSAGE

But when I halt and bring the machines up again, the kube-dns pod appears in the kubectl get pods output:

USE_DOCKERCFG=true DOCKERCFG=../containers/cloud_config/.dockercfg NUM_INSTANCES=1 MASTER_MEM=1024 MASTER_CPUS=2 NODE_MEM=4096 NODE_CPUS=2 vagrant halt
USE_DOCKERCFG=true DOCKERCFG=../containers/cloud_config/.dockercfg NUM_INSTANCES=1 MASTER_MEM=1024 MASTER_CPUS=2 NODE_MEM=4096 NODE_CPUS=2 vagrant up
$ kubectl get pods
POD              IP            CONTAINER(S)   IMAGE(S)                                         HOST            LABELS                                                STATUS    CREATED          MESSAGE
kube-dns-8qh8o   10.244.37.2                                                                   172.17.8.102/   k8s-app=kube-dns,kubernetes.io/cluster-service=true   Running   5 minutes        
                               skydns         gcr.io/google_containers/skydns:2015-03-11-001                                                                         Running   About a minute   
                               kube2sky       gcr.io/google_containers/kube2sky:1.5                                                                                  Running   About a minute   
                               etcd           gcr.io/google_containers/etcd:2.0.9                                                                                    Running   About a minute

There are no errors in the vagrant output, and it always reports that it's configuring the Kubernetes cluster DNS while bringing the machines up.

master, nodes hostname naming problem

@pires
the master and nodes all get the same node-02 hostname, which makes it a bit confusing :-)

vagrant ssh master
CoreOS alpha (575.0.0)
Update Strategy: No Reboots
core@node-02 ~ $ ip a
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP qlen 1000
    link/ether 08:00:27:3c:27:6e brd ff:ff:ff:ff:ff:ff
    inet 172.17.8.101/24 brd 172.17.8.255 scope global eth1
vagrant ssh node-01
CoreOS alpha (575.0.0)
Update Strategy: No Reboots
core@node-02 ~ $ ip a
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP qlen 1000
    link/ether 08:00:27:03:fb:13 brd ff:ff:ff:ff:ff:ff
    inet 172.17.8.102/24 brd 172.17.8.255 scope global eth1
vagrant ssh node-02
CoreOS alpha (575.0.0)
Update Strategy: No Reboots
core@node-02 ~ $ ip a
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP qlen 1000
    link/ether 08:00:27:0a:e9:5c brd ff:ff:ff:ff:ff:ff
    inet 172.17.8.103/24 brd 172.17.8.255 scope global eth1

I use OS X 10.10, latest VirtualBox and Vagrant versions

kube2sky failed to connect to etcd server

(screenshot attached: screen shot 2015-04-22 at 11 28 33 am)

I followed the instructions for setting up the kubernetes cluster in the README. It seems that the etcd version you're using for your kube2sky pod is not supported by kube2sky?

Failure with kubernetes/pause image on fresh minions

Hi, the first time I try to deploy some pods onto the cluster I receive failures in an endless loop. This happens until I ssh into every minion and run docker pull kubernetes/pause manually.

Wed, 18 Mar 2015 14:13:55 +0000   Wed, 18 Mar 2015 14:13:55 +0000   1                   myservice      BoundPod            implicitly required container POD   pulled              {kubelet 172.17.8.103}   Successfully pulled image "kubernetes/pause:latest"
Wed, 18 Mar 2015 14:13:55 +0000   Wed, 18 Mar 2015 14:13:55 +0000   1                   myservice      BoundPod            implicitly required container POD   pulled              {kubelet 172.17.8.103}   Successfully pulled image "kubernetes/pause:latest"
Wed, 18 Mar 2015 14:13:55 +0000   Wed, 18 Mar 2015 14:13:55 +0000   1                   myservice      BoundPod                                                failedSync          {kubelet 172.17.8.103}   Error syncing pod, skipping: no such image
Wed, 18 Mar 2015 14:13:55 +0000   Wed, 18 Mar 2015 14:13:55 +0000   1                   myservice      BoundPod            implicitly required container POD   failed              {kubelet 172.17.8.103}   Failed to create docker container with error: no such image
Wed, 18 Mar 2015 14:14:05 +0000   Wed, 18 Mar 2015 14:14:05 +0000   1                   myservice      BoundPod                                                failedSync          {kubelet 172.17.8.103}   Error syncing pod, skipping: Authentication is required.
Wed, 18 Mar 2015 14:14:05 +0000   Wed, 18 Mar 2015 14:14:05 +0000   1                   myservice      BoundPod            implicitly required container POD   failed              {kubelet 172.17.8.103}   Failed to pull image "kubernetes/pause:latest"

kube-apiserver crashlooping on master

I've been banging my head against this for hours with no luck. Running from master, unchanged. Vagrant 1.7.2, virtualbox 4.3.24_OSEr98716, on Arch Linux.

When I ssh into the master I can see that kube-apiserver is starting up, then crashing, then getting restarted over and over. The log is not enlightening. When I start the server from the command line, rather than letting systemctl do it, everything is fine.

All the kubernetes services are failing, actually, but the API server is the one I've dug into.

Perhaps also related is the fact that I couldn't figure out how to stop it from constantly restarting. I added Restart=no to the "kube-apiserver.service" file and reloaded the configs, but it just kept looping, so something else is causing it to restart (and maybe causing it to die in the first place?)

Here's the log... can you reproduce?

Mar 18 02:11:28 master systemd[1]: Starting Kubernetes API Server...
Mar 18 02:11:28 master wget[2071]: --2015-03-18 02:11:28-- https://storage.googleapis.com/kubernetes-release/release/v0.10.1/bin/linux/amd64/kube-apiserver
Mar 18 02:11:28 master wget[2071]: Resolving storage.googleapis.com... 64.233.171.132, 64.233.171.132
Mar 18 02:11:28 master wget[2071]: Connecting to storage.googleapis.com|64.233.171.132|:443... connected.
Mar 18 02:11:28 master wget[2071]: HTTP request sent, awaiting response... 200 OK
Mar 18 02:11:28 master wget[2071]: Length: 19913056 (19M) [application/octet-stream]
Mar 18 02:11:28 master wget[2071]: Server file no newer than local file '/opt/bin/kube-apiserver' -- not retrieving.
Mar 18 02:11:28 master systemd[1]: Started Kubernetes API Server.
Mar 18 02:11:28 master kube-apiserver[2119]: I0318 02:11:28.675790 2119 plugins.go:70] No cloud provider specified.
Mar 18 02:11:28 master kube-apiserver[2119]: I0318 02:11:28.676366 2119 master.go:273] Setting master service IPs based on PortalNet subnet to "10.100.0.1" (read-only) and "10.100.0.2" (read-write).
Mar 18 02:11:28 master kube-apiserver[2119]: I0318 02:11:28.700728 2119 logs.go:40] [restful/swagger] listing is available at 172.17.8.101:8080/swaggerapi/
Mar 18 02:11:28 master kube-apiserver[2119]: I0318 02:11:28.713037 2119 logs.go:40] [restful/swagger] 172.17.8.101:8080/swaggerui/ is mapped to folder /swagger-ui/
Mar 18 02:11:28 master kube-apiserver[2119]: I0318 02:11:28.714246 2119 server.go:287] Serving read-only insecurely on 172.17.8.101:7080
Mar 18 02:11:28 master kube-apiserver[2119]: I0318 02:11:28.714561 2119 server.go:314] Serving securely on 172.17.8.101:8443
Mar 18 02:11:28 master kube-apiserver[2119]: I0318 02:11:28.714922 2119 server.go:342] Serving insecurely on 0.0.0.0:8080
Mar 18 02:11:30 master kube-apiserver[2119]: I0318 02:11:30.817309 2119 server.go:324] Using self-signed cert (/var/run/kubernetes/apiserver.crt, /var/run/kubernetes/apiserver.key)
Mar 18 02:11:33 master systemd[1]: Stopping Kubernetes API Server...
Mar 18 02:11:33 master systemd[1]: kube-apiserver.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Mar 18 02:11:33 master systemd[1]: Unit kube-apiserver.service entered failed state.
Mar 18 02:11:33 master systemd[1]: kube-apiserver.service failed.
Mar 18 02:11:36 master systemd[1]: Starting Kubernetes API Server...
Mar 18 02:11:36 master wget[2199]: --2015-03-18 02:11:36-- https://storage.googleapis.com/kubernetes-release/release/v0.10.1/bin/linux/amd64/kube-apiserver
Mar 18 02:11:36 master wget[2199]: Resolving storage.googleapis.com... 64.233.171.132, 64.233.171.132
Mar 18 02:11:36 master wget[2199]: Connecting to storage.googleapis.com|64.233.171.132|:443... connected.
Mar 18 02:11:36 master systemd[1]: Started Kubernetes API Server.
Mar 18 02:11:36 master wget[2199]: HTTP request sent, awaiting response... 200 OK
Mar 18 02:11:36 master wget[2199]: Length: 19913056 (19M) [application/octet-stream]
Mar 18 02:11:36 master wget[2199]: Server file no newer than local file '/opt/bin/kube-apiserver' -- not retrieving.
Mar 18 02:11:36 master kube-apiserver[2242]: I0318 02:11:36.495671 2242 plugins.go:70] No cloud provider specified.
Mar 18 02:11:36 master kube-apiserver[2242]: I0318 02:11:36.496323 2242 master.go:273] Setting master service IPs based on PortalNet subnet to "10.100.0.1" (read-only) and "10.100.0.2" (read-write).
Mar 18 02:11:36 master kube-apiserver[2242]: I0318 02:11:36.516886 2242 logs.go:40] [restful/swagger] listing is available at 172.17.8.101:8080/swaggerapi/
Mar 18 02:11:36 master kube-apiserver[2242]: I0318 02:11:36.542710 2242 logs.go:40] [restful/swagger] 172.17.8.101:8080/swaggerui/ is mapped to folder /swagger-ui/
Mar 18 02:11:36 master kube-apiserver[2242]: I0318 02:11:36.544579 2242 server.go:287] Serving read-only insecurely on 172.17.8.101:7080
Mar 18 02:11:36 master kube-apiserver[2242]: I0318 02:11:36.545339 2242 server.go:314] Serving securely on 172.17.8.101:8443
Mar 18 02:11:36 master kube-apiserver[2242]: I0318 02:11:36.545723 2242 server.go:342] Serving insecurely on 0.0.0.0:8080
Mar 18 02:11:37 master kube-apiserver[2242]: I0318 02:11:37.296141 2242 server.go:324] Using self-signed cert (/var/run/kubernetes/apiserver.crt, /var/run/kubernetes/apiserver.key)
Mar 18 02:11:41 master systemd[1]: Stopping Kubernetes API Server...
Mar 18 02:11:41 master systemd[1]: kube-apiserver.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Mar 18 02:11:41 master systemd[1]: Unit kube-apiserver.service entered failed state.
Mar 18 02:11:41 master systemd[1]: kube-apiserver.service failed.
Mar 18 02:11:44 master systemd[1]: Starting Kubernetes API Server...

VMs use DHCP instead of static IP addresses as expected

Hi,

before the commit 7f33709 everything was OK.

Now when I deploy a cluster with the VirtualBox provider (I configured a DHCP server inside VirtualBox in the subnet 172.17.8.x-49/99), the nodes use IPs provided by the DHCP server instead of 172.17.8.101 (master) and so on for the minions.

P.S. I didn't use the sync file, so everything is at the defaults.

Tell me if you need more information to reproduce the problem.

eth1 use virtualbox dhcp

When CoreOS launches, it uses the VirtualBox DHCP before setting the static address, so we end up with multiple addresses on the interface. The one used is 192.168.X.X and not 172.17.8.101 (for the master). It can be solved by disabling the DHCP option in VirtualBox.
I think it can be solved with something like that:
VBoxManage dhcpserver remove --netname HostInterfaceNetworking-vboxnet0
but I don't really know where I can put it...

kube-apiserver wupiao not getting response from etcd2

This seems to happen whenever I've shutdown vagrant and try to restart it again with vagrant up.

  • kube-apiserver hangs on wupiao dependency endlessly
โ— kube-apiserver.service - Kubernetes API Server
   Loaded: loaded (/etc/systemd/system/kube-apiserver.service; static; vendor preset: disabled)
   Active: activating (start-pre) since Tue 2015-05-19 22:28:47 ; 1min 15s ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
  Process: 3500 ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-apiserver (code=exited, status=0/SUCCESS)
  Process: 3489 ExecStartPre=/usr/bin/wget -N -P /opt/bin https://storage.googleapis.com/kubernetes-release/release/v0.17.0/bin/linux/amd64/kube-apiserver (code=exited, status=0/SUCCESS)
  Process: 3485 ExecStartPre=/usr/bin/mkdir -p /opt/bin (code=exited, status=0/SUCCESS)
  Control: 3504 (wupiao)
   Memory: 332.0K
   CGroup: /system.slice/kube-apiserver.service
           └─control
             ├─3504 /bin/bash /opt/bin/wupiao 172.17.8.101:2379/v2/machines
             └─3664 sleep 1

May 19 22:29:53 master wupiao[3504]: .
May 19 22:29:54 master wupiao[3504]: .
May 19 22:29:55 master wupiao[3504]: .
May 19 22:29:56 master wupiao[3504]: .
May 19 22:29:57 master wupiao[3504]: .
May 19 22:29:58 master wupiao[3504]: .
May 19 22:29:59 master wupiao[3504]: .
May 19 22:30:00 master wupiao[3504]: .
May 19 22:30:01 master wupiao[3504]: .
May 19 22:30:02 master wupiao[3504]: .
  • etcd2 seems to be up and listening on http://localhost:2379 while wupiao is pinging http://172.17.8.101:2379
core@master /etc/systemd/system $ systemctl status etcd2
โ— etcd2.service - etcd2
   Loaded: loaded (/usr/lib64/systemd/system/etcd2.service; static; vendor preset: disabled)
   Active: active (running) since Tue 2015-05-19 22:18:31 ; 11min ago
 Main PID: 2066 (etcd2)
   Memory: 12.1M
   CGroup: /system.slice/etcd2.service
           └─2066 /usr/bin/etcd2

May 19 22:18:32 master etcd2[2066]: 2015/05/19 22:18:32 raft: 75263048b09334ce received vote from 75263048b09334ce at term 6
May 19 22:18:32 master etcd2[2066]: 2015/05/19 22:18:32 raft: 75263048b09334ce became leader at term 6
May 19 22:18:32 master etcd2[2066]: 2015/05/19 22:18:32 raft.node: 75263048b09334ce elected leader 75263048b09334ce at term 6
May 19 22:18:32 master etcd2[2066]: 2015/05/19 22:18:32 etcdserver: published {Name:98f562dff4b14da5a208cae0dc963fb5 ClientURLs:[http://localhost:2379 http://localhost:4001]} to cluster a741eabc2549d13f
May 19 22:20:17 master etcd2[2066]: 2015/05/19 22:20:17 etcdserver: start to snapshot (applied: 110011, lastsnap: 100010)
May 19 22:20:17 master etcd2[2066]: 2015/05/19 22:20:17 etcdserver: compacted log at index 110011
May 19 22:20:17 master etcd2[2066]: 2015/05/19 22:20:17 wal: segmented wal file /var/lib/etcd2/member/wal/000000000000000b-000000000001adbc.wal is created
May 19 22:20:17 master etcd2[2066]: 2015/05/19 22:20:17 etcdserver: saved snapshot at index 110011
May 19 22:20:31 master etcd2[2066]: 2015/05/19 22:20:31 filePurge: successfully removed file /var/lib/etcd2/member/snap/0000000000000002-000000000000ea66.snap
May 19 22:20:31 master etcd2[2066]: 2015/05/19 22:20:31 filePurge: successfully removed file /var/lib/etcd2/member/wal/0000000000000006-000000000000ea67.wal

Ethernet device named ens34 instead of eth1 on Fusion

I was able to successfully get this working on VMware Fusion along with the Kubernetes Guestbook example. Strange networking issues due to cloud-config files referring to interface eth1, which is correct on VirtualBox, but not on Fusion.

I modified master.yaml and node.yaml to refer to ens34 instead of eth1. Works.

It may be helpful to find a way to conditionally use the applicable interface name.

Private repository pull failed

Hello, I'm using a private repository to store Docker images. I have set up the cluster, also providing the docker cfg. When starting pods I receive:
myservice registry.my.com/user/myservice Waiting image pull failed for registry.my.com/user/myservice, this may be because there are no credentials on this request. details: (Authentication is required.)

However, I'm able to pull these images directly with docker on master and node.

sed: can't read : No such file or directory

Seeing a few of these when trying to run vagrant --provider=parallels up.

$ vagrant up --provider parallels
Installing the 'vagrant-triggers' plugin. This can take a few minutes...
Installed the plugin 'vagrant-triggers (0.5.0)'!
Bringing machine 'master' up with 'parallels' provider...
Bringing machine 'node-01' up with 'parallels' provider...
Bringing machine 'node-02' up with 'parallels' provider...
==> node-02: Box 'AntonioMeireles/coreos-alpha' could not be found. Attempting to find and install...
    node-02: Box Provider: parallels
    node-02: Box Version: >= 695.0.0
==> master: Running triggers before up...
==> master: Setting Kubernetes version 0.18.0
sed: can't read : No such file or directory
sed: can't read : No such file or directory
sed: can't read : No such file or directory
==> master: Configuring Kubernetes cluster DNS...
sed: can't read : No such file or directory
sed: can't read : No such file or directory
sed: can't read : No such file or directory
==> node-02: Loading metadata for box 'https://vagrantcloud.com/AntonioMeireles/coreos-alpha'

network environment fails to start

Just updated to v0.17.0. When I bring up the vm with vagrant up and ssh into the nodes I'm seeing:

core@node-01  $ systemctl status setup-network-environment.service
setup-network-environment.service - Setup Network Environment
   Loaded: loaded (/etc/systemd/system/setup-network-environment.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2015-05-18 20:49:59 ; 24s ago
     Docs: https://github.com/kelseyhightower/setup-network-environment
  Process: 28297 ExecStartPre=/usr/bin/wget -N -P /opt/bin https://github.com/kelseyhightower/setup-network-environment/releases/download/v1.0.0/setup-network-environment (code=exited, status=8)
  Process: 28296 ExecStartPre=/usr/bin/mkdir -p /opt/bin (code=exited, status=0/SUCCESS)

May 18 20:49:58 node-01 wget[28297]: Location: https://s3.amazonaws.com/github-cloud/releases/24030064/c925b05e-3c44-11e4-83a0-2e78b7fd556a?response-content-disposition=attachmen...
May 18 20:49:58 node-01 wget[28297]: --2015-05-18 20:49:58--  https://s3.amazonaws.com/github-cloud/releases/24030064/c925b05e-3c44-11e4-83a0-2e78b7fd556a?response-content-dispos...
May 18 20:49:58 node-01 wget[28297]: Resolving s3.amazonaws.com... 54.231.244.4, 54.231.244.4
May 18 20:49:58 node-01 wget[28297]: Connecting to s3.amazonaws.com|54.231.244.4|:443... connected.
May 18 20:49:59 node-01 wget[28297]: HTTP request sent, awaiting response... 403 Forbidden
May 18 20:49:59 node-01 wget[28297]: 2015-05-18 20:49:59 ERROR 403: Forbidden.
May 18 20:49:59 node-01 systemd[1]: setup-network-environment.service: control process exited, code=exited status=8
May 18 20:49:59 node-01 systemd[1]: Failed to start Setup Network Environment.
May 18 20:49:59 node-01 systemd[1]: Unit setup-network-environment.service entered failed state.
May 18 20:49:59 node-01 systemd[1]: setup-network-environment.service failed.

DNS integration - how to access DNS server or other services?

I ran your scripts, works perfectly fine so far.

However, I have a question on the DNS integration:

florian@Florians-MacBook-Pro:~/Code$ kubectl get services
NAME                LABELS                                                              SELECTOR            IP                  PORT
kube-dns            k8s-app=kube-dns,kubernetes.io/cluster-service=true,name=kube-dns   k8s-app=kube-dns    10.100.0.10         53
kubernetes          component=apiserver,provider=kubernetes                             <none>              10.100.0.2          443
kubernetes-ro       component=apiserver,provider=kubernetes                             <none>              10.100.0.1          80

I see those services running now but I have no way of accessing the DNS server for example. The following times out (assuming I have webui as a service deployed):

florian@Florians-MacBook-Pro:~/Code$ dig @10.100.0.10 +short webui.pires-kube
;; connection timed out; no servers could be reached

Actually, every ping to any of the service IPs times out. I presume this means that all k8s services are inaccessible. Is there any step I'm missing?

Document forwarding of external traffic from host to k8s services

Here's my setup:

  • I have a personal Arch Linux server for NAS, personal HTTP, backup, bitcoin node, etc.
  • I want to use k8s to make my configuration as stateless as possible. If my server died tomorrow, I should be able to restore my data and get all my programs up and running with a few shell scripts. Also, security through VMs/containerization.
  • I want to map incoming connections on my host (e.g. http) to k8s services.

My current almost-solution is to have a single minion setup, include the minion's static IP in the service definitions, and use iptables rules on the host to forward traffic coming in on certain ports to the minion. The reason I say this "almost" works is that the default Vagrantfile makes everything a "private network", so communication between the host and minions is fine, but forwarding external traffic to the minion gets blocked.

(The reason it gets blocked is that the network interface on the guest OS is configured without a gateway, so traffic from the host IP is expected on that interface, but if the source IP comes from the outside world, Linux sees it as being outside the interface's subnet and drops it on the floor.)

I worked around this hackily by adding a unit to node.yaml which sets a default gateway on the interface used to talk to the host, but it seems that solution has started interfering with etcd, so I'm looking for a more robust one.

I've considered:

  • Making the networks be public_networks, but that behaves a bit weirdly (nodes start querying my LAN's DHCP server).
  • Running kube-proxy on the host, but then I've got a not-super-mature piece of software exposed to the outside world running as root on the host which would kinda defeat the purpose security-wise.
  • Running a third VM with a public_network and kube-proxy on it, so network weirdness gets hidden from the minions, but that would require some fiddling.

Any established practices for this?

k8s services have 10.100.x.y addresses

After vagrant up the kubernetes and kubernetes-ro services IP fields are 10.100.0.2 and 10.100.0.1 respectively. Those addresses aren't pingable from either the host machine or from the master.

Error setting up minion nodes with CoreOS 695.0

Hi,

I am trying to setup my cluster with CoreOS 695.0 (latest release in the alpha channel) and I get this from journalctl:

May 29 00:23:41 node-01 etcd2[1185]: 2015/05/29 00:23:41 error verifying flags, -advertise-client-urls is required when -listen-client-urls is set explicitly. See 'etcd -help'.
May 29 00:23:41 node-01 etcd2[1185]: 2015/05/29 00:23:41 When listening on specific address(es), this etcd process must advertise accessible url(s) to each connected client.
May 29 00:23:41 node-01 systemd[1]: etcd2.service: main process exited, code=exited, status=1/FAILURE

Seems like the etcd command line has changed a bit. What's the recommended CoreOS version to go with?

Thanks,

Yoanis.

Can't resolve anything via DNS after vagrant reload

Thanks for this awesome project!
Preliminary remark: During the last days I followed up on #74 and cloned your master branch just a few hours ago.

git clone git@github.com:pires/kubernetes-vagrant-coreos-cluster.git
cd kubernetes-vagrant-coreos-cluster
vagrant up
source ~/.bash_profile
kubectl get pods,services,rc

Looks like everything is alright.

Now I spin up a stock nginx and a corresponding service from
https://gist.github.com/reasn/bfbf9425606873b58021 like so:

kubectl create -f https://gist.githubusercontent.com/reasn/bfbf9425606873b58021/raw/6e86a84c26c700c2e47dd5903d5624a1695293cb/svc-nginx.yml
kubectl create -f https://gist.githubusercontent.com/reasn/bfbf9425606873b58021/raw/5906a19ba33eb28cfeb9db088971e4bf5e03907e/rx-nginx.yml
kubectl get pods,services,rc

If I now vagrant ssh node-01 and curl 10.244.62.2 (IP of the nginx pod) I get a response from nginx.
Same with the IP of the nginx service.

But if I curl svc-nginx.default.k8s.local curl can't resolve the host.

The same holds if I docker exec into the nginx container (in my case on node-02) and try to ping svc-nginx.default.k8s.local (I chose ping because the stock nginx container has neither curl nor dig).

I could not find out what I might have done wrong.
It seems to me that something with skydns is not working properly.

Update (2015-04-30 15:10 CEST): I might've vagrant reloaded the machine in between, currently investigating this

Update (2015-04-30 15:31 CEST): The reason is the vagrant reload. That seems to kill the DNS

Drop CoreOS stable support

@AntonioMeireles just saw references to the CoreOS stable channel but we should actually not support it. For instance, the -s=overlay Docker option is not supported in Docker < 1.4.x and I had a lot of issues because of that with stable users. And yes, right now, stable is using Docker 1.4.1 but I'd really like to not hold on to that premise.

vagrant trigger failing?

Any ideas?

kubernetes-vagrant-coreos-cluster jsindy$ vagrant up
Installing the 'vagrant-triggers' plugin. This can take a few minutes...
Bundler, the underlying system Vagrant uses to install plugins,
reported an error. The error is shown below. These errors are usually
caused by misconfigured plugin installations or transient network
issues. The error from Bundler is:

An error occurred while installing nokogiri (1.6.6.2), and Bundler cannot continue.
Make sure that gem install nokogiri -v '1.6.6.2' succeeds before bundling.

failure with vmware_fusion on OS X

Hi there! I can't get this cluster working. It was working flawlessly last weekend. I'm not sure what changeset makes it fail.

Sometimes, it fails because the resolver isn't set up on any of the virtual machines so none of the services that rely on using wget to obtain binaries work.

Other times, I get this failure with setup-network-service (a 403 from github/s3)

Patricks-MacBook-Pro:kubernetes-vagrant-coreos-cluster patg$ vagrant ssh master
CoreOS valpha (681.0.0)
Update Strategy: No Reboots
Failed Units: 1
setup-network-environment.service

core@master ~ $ systemctl status setup-network-environment.service
โ— setup-network-environment.service - Setup Network Environment
Loaded: loaded (/etc/systemd/system/setup-network-environment.service; static; vendor preset: disabled)
Active: failed (Result: exit-code) since Sun 2015-05-24 16:10:05 ; 6s ago
Docs: https://github.com/kelseyhightower/setup-network-environment
Process: 2449 ExecStart=/opt/bin/setup-network-environment (code=exited, status=0/SUCCESS)
Process: 2445 ExecStartPre=/usr/bin/chmod +x /opt/bin/setup-network-environment (code=exited, status=0/SUCCESS)
Process: 2485 ExecStartPre=/usr/bin/wget -N -P /opt/bin https://github.com/kelseyhightower/setup-network-environment/releases/download/v1.0.0/setup-network-environment (code=exited, status=8)
Process: 2482 ExecStartPre=/usr/bin/mkdir -p /opt/bin (code=exited, status=0/SUCCESS)
Main PID: 2449 (code=exited, status=0/SUCCESS)

May 24 16:10:05 master wget[2485]: Location: https://s3.amazonaws.com/github-cloud/releases/24030064/c925b05e-3c44-11e4-...
May 24 16:10:05 master wget[2485]: --2015-05-24 16:10:05-- https://s3.amazonaws.com/github-cloud/releases/24030064/c925...
May 24 16:10:05 master wget[2485]: Resolving s3.amazonaws.com... 54.231.14.72
May 24 16:10:05 master wget[2485]: Connecting to s3.amazonaws.com|54.231.14.72|:443... connected.
May 24 16:10:05 master wget[2485]: HTTP request sent, awaiting response... 403 Forbidden
May 24 16:10:05 master wget[2485]: 2015-05-24 16:10:05 ERROR 403: Forbidden.
May 24 16:10:05 master systemd[1]: setup-network-environment.service: control process exited, code=exited status=8
May 24 16:10:05 master systemd[1]: Failed to start Setup Network Environment.
May 24 16:10:05 master systemd[1]: Unit setup-network-environment.service entered failed state.
May 24 16:10:05 master systemd[1]: setup-network-environment.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

core@master ~ $ /usr/bin/wget -N -P /opt/bin https://github.com/kelseyhightower/setup-network-environment/releases/download/v1.0.0/setup-network-environment
--2015-05-24 16:11:33-- https://github.com/kelseyhightower/setup-network-environment/releases/download/v1.0.0/setup-network-environment
Resolving github.com... 192.30.252.131
Connecting to github.com|192.30.252.131|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://s3.amazonaws.com/github-cloud/releases/24030064/c925b05e-3c44-11e4-83a0-2e78b7fd556a?response-content-disposition=attachment%3B%20filename%3Dsetup-network-environment&response-content-type=application/octet-stream&AWSAccessKeyId=AKIAISTNZFOVBIJMK3TQ&Expires=1432483953&Signature=WNxk5676o6Vt7JKsebVmCYzbImw%3D [following]
--2015-05-24 16:11:33-- https://s3.amazonaws.com/github-cloud/releases/24030064/c925b05e-3c44-11e4-83a0-2e78b7fd556a?response-content-disposition=attachment%3B%20filename%3Dsetup-network-environment&response-content-type=application/octet-stream&AWSAccessKeyId=AKIAISTNZFOVBIJMK3TQ&Expires=1432483953&Signature=WNxk5676o6Vt7JKsebVmCYzbImw%3D
Resolving s3.amazonaws.com... 54.231.96.120
Connecting to s3.amazonaws.com|54.231.96.120|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2015-05-24 16:11:33 ERROR 403: Forbidden.

cloud-init stuck at docker-cache.service

Hi @pires
I'm using this project to bring Kubernetes up on my local machine, and fleetctl list-machines gives the right output, but kubectl create fails; the output is

F0307 18:28:04.580672 56532 create.go:81] Post http://172.17.8.101:8080/api/v1beta1/pods?namespace=default: dial tcp 172.17.8.101:8080: connection refused

I sshed into the master and found that it didn't start the Kubernetes API server. I only find flanneld, setup-network-environment and waiter.sh in /opt/bin; also, ps -aux says coreos-cloudinit is still running with state Ssl. I couldn't find where cloud-init's log lives, so I killed cloud-init and started it manually; it seems it gets stuck at docker-cache.service.

Several lines at the end of the output:
2015/03/07 10:27:56 Calling unit command "start" on "fleet.socket"'
2015/03/07 10:27:56 Result of "start" on "fleet.socket": done
2015/03/07 10:27:56 Calling unit command "start" on "fleet.service"'
2015/03/07 10:27:56 Result of "start" on "fleet.service": done
2015/03/07 10:27:56 Calling unit command "start" on "etcd-waiter.service"'
2015/03/07 10:27:56 Result of "start" on "etcd-waiter.service": done
2015/03/07 10:27:56 Calling unit command "start" on "docker-cache.service"'

I don't know how to fix this.

problem launch flannel docker0

Hello,
I have an issue with your latest release: pods I launch are not connected to the flannel network (10.244.X.X) but to the docker0 network (10.1.X.X), so I can't connect to my pods from other nodes of the cluster.

number of nodes > 2

When I try to launch 6 nodes, for example, they all launch but only nodes 1 and 2 are added to the master. When I get minions with kubectl I only get node-01 and node-02.

Timeouts when trying to post to connect

When I install (or try and use kubectl subsequently) I get errors when trying to connect to the api:

Error: Post https://10.245.1.2/api/v1beta3/namespaces/default/replicationcontrollers: dial tcp 10.245.1.2:443: i/o timeout
Error: Post https://10.245.1.2/api/v1beta3/namespaces/default/services: dial tcp 10.245.1.2:443: i/o timeout

I made sure to run VBoxManage dhcpserver remove --netname HostInterfaceNetworking-vboxnet0
...and have tried rebuilding the whole thing multiple times.
My .bash_profile has the following added to it:

export FLEETCTL_ENDPOINT=http://172.17.8.101:4001
export KUBERNETES_MASTER=http://172.17.8.101:8080

Should those be matching? Is there a hidden config for kubectl?

Missing kubectl

Hi,

I'm probably doing something terribly wrong, but I'm using Windows as host with the latest Vagrant, VirtualBox, etc.

I run vagrant up, connect to master node, and cannot find kubectl nor kubectl.sh anywhere, nor the "cluster" directory. So far the only thing I found was
core@master / $ ls /opt/bin/
kube-apiserver kube-controller-manager kube-register kube-scheduler setup-network-environment wupiao

Should I try to install k8s again, or am I missing something else?

Using https://get.k8s.io is outdated

Why are projects such as this one still using https://get.k8s.io to derive the latest releases of Kubernetes? It looks like it hasn't been updated in a while, as kube is now 0.12.0.
Oh, and since this is my first communication - you people are doing some great work here :)

Failed to mutate etcd when using mutator

(screenshot attached: screen shot 2015-04-24 at 3 15 21 pm)

On startup the k8s-dns pod containers work fine, but usually when I delete them and the controller brings it back up, I get the above message when I enter "kubectl log kube2sky" and for "kubectl log skydns" I get:

(screenshot attached: screen shot 2015-04-24 at 3 19 15 pm)

Cluster does not work behind HTTP proxy

I'm trying to run a Kubernetes cluster behind a corporate HTTP proxy. I'm also using the vagrant-proxyconf plugin, and define the http/s proxy and no_proxy hosts there.
Cluster nodes fail to start with a "Failed to start Docker Application Container Engine." error.

vagrant status shows only first 2 nodes

In case more than 2 nodes are started, vagrant still sees only node-01 and node-02. It is not possible to ssh into the higher-numbered nodes via vagrant, nor to halt them. If issuing vagrant halt, the master and the first 2 nodes are halted but the rest stay up.

Fails to start under VMWare Fusion

I'd love to start playing with Kubernetes & CoreOS under Vagrant, but I cannot get it to work.

As far as I can see, I've narrowed it down to this:

core@master ~ $ systemctl status -n 100 setup-network-environment
โ— setup-network-environment.service - Setup Network Environment
   Loaded: loaded (/etc/systemd/system/setup-network-environment.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2015-04-10 06:14:22 UTC; 1min 58s ago
     Docs: https://github.com/kelseyhightower/setup-network-environment
  Process: 1024 ExecStartPre=/usr/bin/wget -N -P /opt/bin https://storage.googleapis.com/k8s/setup-network-environment (code=exited, status=4)
  Process: 1023 ExecStartPre=/usr/bin/mkdir -p /opt/bin (code=exited, status=0/SUCCESS)

Apr 10 06:14:22 master systemd[1]: Starting Setup Network Environment...
Apr 10 06:14:22 master wget[1024]: --2015-04-10 06:14:22--  https://storage.googleapis.com/k8s/setup-network-environment
Apr 10 06:14:22 master wget[1024]: Resolving storage.googleapis.com... failed: Temporary failure in name resolution.
Apr 10 06:14:22 master wget[1024]: wget: unable to resolve host address 'storage.googleapis.com'
Apr 10 06:14:22 master systemd[1]: setup-network-environment.service: control process exited, code=exited status=4
Apr 10 06:14:22 master systemd[1]: Failed to start Setup Network Environment.
Apr 10 06:14:22 master systemd[1]: Unit setup-network-environment.service entered failed state.
Apr 10 06:14:22 master systemd[1]: setup-network-environment.service failed.

Is there something wrong with my network setup?

`vagrant halt && vagrant up` supported?

Me again. Is vagrant halt supported? Currently it seems booting the cluster up again hangs while waiting for the master to become ready, or at least booting up is extremely slow. Or is this a Kubernetes problem?

With a fresh start:

$ vagrant up
# waiting for the cluster to become ready
$ vagrant halt
$ vagrant up
# hangs in
==> master: Waiting for Kubernetes master to become ready...

During this, I can ssh into the master with vagrant ssh master in a different shell. The master is up but the API is not accessible:

$ kubectl cluster-info
Kubernetes master is running at http://172.17.8.101:8080
$ kubectl get events
Error: Get http://http://172.17.8.101:8080/api/v1beta3/namespaces/default/events: dial tcp 172.17.8.101:8080 connection rufused

Is this a known limitation of kubernetes and vagrant? But I assume it should be possible according to
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/getting-started-guides/vagrant.md#interacting-with-your-kubernetes-cluster-with-vagrant

$ vagrant version
Installed Version: 1.7.2
$ VBoxManage --version
4.3.28r100309

Error configuring network.

Things were working fine until one time I tried to load the cluster and things went awry.
SSH failed and so I set the username manually in the Vagrantfile to core.

Now I get this error

==> master: Setting hostname...
==> master: Configuring and enabling network interfaces...
The following SSH command responded with a non-zero exit status.
Vagrant assumes that this means the command failed!
ifconfig  172.17.8.101 netmask 255.255.255.0
Stdout from the command:

Stderr from the command:
SIOCSIFNETMASK: No such device

I've seen advice along these lines:
https://muffinresearch.co.uk/vmware-siocsifaddr-no-such-device-eth0-after-cloning/
...I have not done any cloning though.

I've tried

sudo systemctl restart systemd-networkd

but no cigar.

vagrant ssh

Unable to ssh into any node

eg.:

$ vagrant ssh <any-node-name>
...
Permission denied (publickey,password,keyboard-interactive).

Vagrant key can be added to ssh-agent to fix this

$ ssh-add ~/.vagrant.d/insecure_private_key

See CoreOS blog for context

Unable to get kube2sky/skydns set up properly

I'm having some problems getting DNS resolution to work in my cluster using the dns-controller.yaml configuration. Once I have the cluster up and running, I can see that the k8s-dns pod is running, however I don't seem to be able to resolve hosts from within the containers. For lack of examples to work from, I am not really sure if I am just "doing it wrong" or if it is set up incorrectly. The skydns log seems to indicate the latter.

k8s-dns-7l30k              10.244.9.5    etcd                 quay.io/coreos/etcd:v2.0.3                       172.17.8.103/172.17.8.103   k8s-app=k8s-dns,kubernetes.io/cluster-service=true,name=k8s-dns   Running   2 hours
                                         kube2sky             gcr.io/google_containers/kube2sky:1.1
                                         skydns               gcr.io/google_containers/skydns:2015-03-11-001

skydns logs:

kubectl log -f k8s-dns-spkyd skydns                                                               2 ↵
2015-04-25T01:30:15.182769603Z 2015/04/25 01:30:15 skydns: falling back to default configuration, could not read from etcd: 100: Key not found (/skydns) [1]
2015-04-25T01:30:15.182769603Z 2015/04/25 01:30:15 skydns: ready for queries on k8s.local. for tcp://0.0.0.0:53 [rcache 0]
2015-04-25T01:30:15.182769603Z 2015/04/25 01:30:15 skydns: ready for queries on k8s.local. for udp://0.0.0.0:53 [rcache 0]

kube2sky seems to work ok:

kubectl log -f k8s-dns-7l30k kube2sky                                                             2 ↵
2015-04-24T21:48:23.378847779Z 2015/04/24 21:48:23 Etcd server found: http://127.0.0.1:4001
2015-04-24T21:48:23.378847779Z 2015/04/24 21:48:23 Using http://10.100.0.1:80 for kubernetes master
2015-04-24T21:48:23.378847779Z 2015/04/24 21:48:23 Using kubernetes API v1beta1
2015-04-25T00:36:00.921066137Z 2015/04/25 00:36:00 Setting dns record: rabbitmq-ssh.default.k8s.local. -> 10.100.31.12:8723
2015-04-25T00:36:00.937654437Z 2015/04/25 00:36:00 Setting dns record: rabbitmq-transport.default.k8s.local. -> 10.100.90.69:5672
2015-04-25T00:36:00.709896737Z 2015/04/25 00:36:00 Setting dns record: beanbag-ssh.default.k8s.local. -> 10.100.63.235:8722
2015-04-25T00:36:00.722825472Z 2015/04/25 00:36:00 Setting dns record: beanbag-transport.default.k8s.local. -> 10.100.119.3:5984
2015-04-25T00:36:00.740116168Z 2015/04/25 00:36:00 Setting dns record: elasticsearch.default.k8s.local. -> 10.100.176.104:9200

Any help would be appreciated
