
kubernetes-ansible's Introduction

kubernetes-ansible

An Ansible playbook to create a highly available Kubernetes cluster using release 1.16.9 on bare-metal systems (tested on CentOS 7 and Ubuntu 18.04).

Requirements:

  • Ansible 2.7

Download the kubernetes-ansible playbook and set the variables in the group variable file all.yml according to your needs. Please read this file carefully and modify it as required.

git clone https://github.com/pawankkamboj/kubernetes-ansible.git
cd kubernetes-ansible
cp group_vars/all.yml.example group_vars/all.yml
ansible-playbook -i inventory cluster.yml
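
A minimal inventory sketch is shown below for orientation only; the group names and host names are illustrative assumptions and must be adapted to whatever groups your copy of the playbooks actually references (the flannel role, for example, references an etcd group):

# inventory -- illustrative example; adjust group and host names to your environment
[etcd]
master1
master2
master3

[master]
master1
master2
master3

[node]
node1
node2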

Ansible roles

  • yum-proxy - installs the Squid proxy server
  • yum-repo - installs the EPEL repo
  • sslcert - creates all SSL certificates required to run a secure K8s cluster
  • runtime-env - common settings for the container runtime environment
  • docker - installs the latest Docker release on all cluster nodes
  • containerd - if you want to use the containerd runtime instead of Docker, use this role and enable it in the group variable file
  • etcd - sets up the etcd cluster, running as containers on all master nodes
  • haproxy - sets up HAProxy for API service HA, running on all master nodes
  • keepalived - uses keepalived for HA of the kube-apiserver IP address, running on all master nodes
  • master - sets up the Kubernetes master control-plane components: kube-apiserver, kube-controller-manager, kube-scheduler, and the kubectl client
  • node - sets up the Kubernetes node agent: kubelet
  • addon - creates the add-on services: flanneld, kube-proxy, kube-dns, metrics-server

Note - Addon roles should be run after the cluster is fully operational. The add-ons are defined in the addon.yml playbook; once the cluster is up and running, run addon.yml to deploy them. Included add-ons are: flannel network, kube-proxy, and CoreDNS.

ansible-playbook -i inventory addon.yml

kubernetes HA architecture

Below is a sample Kubernetes cluster architecture after a successful build using the playbook. It is just a sample; the number of servers/nodes may vary according to your setup.

(kubernetes HA architecture diagram)

kubernetes-ansible's People

Contributors

bastipaeltz, dano0b, maxkochubey, mhmxs, pawankkamboj


kubernetes-ansible's Issues

haproxy config trouble

Hi,
I would like to know which haproxy version you used when you tested this?
It seems that, with 1.7 at least, there is a problem with 'option httplog', which is not usable with the 'api-https' frontend (it says it needs 'mode http'). Even so, it doesn't seem to break anything, since it falls back to 'option tcplog'.

Flannel configuration

Hi

maybe you have two little mistakes:
in the file HA-kubernetes-ansible/roles/flannel/tasks/main.yml

line 15, for https, should be
peers: "{% for hostname in groups['etcd'] %}{{etcd_peer_url_scheme}}://{{ hostname }}:2379{% if not loop.last %},{% endif %}{% endfor %}"
You have hardcoded http there.

Another thing: you have missed a { in line 39.

Dubravko

[Errno 13] Permission denied

Hi,
I am new to Ansible. While following the code, I got the following error:

fatal: [10.0.0.141]: FAILED! => {"changed": false, "msg": "There was an issue creating /opt/kubernetes as requested: [Errno 13] Permission denied: '/opt/kubernetes'", "path": "/opt/kubernetes/pki", "state": "absent"}

I am trying to create the Kubernetes cluster using a sudo user. Can anyone help me with this?

Docker Bridge IP - Conflict with default 172.17.0.1

Hello,

I'm facing a conflict with the Docker installation: the bridge IP conflicts with an existing network.

I'm currently using the following playbook for testing with an Ubuntu 20 server. In my playbook I take the last octet of ansible_default_ipv4.address and append it to docker_bridge_network to prevent duplicate IPs.

What's missing in this playbook is handling of an already existing /etc/docker/daemon.json file, because the daemon.json entry { "bip": "{{ bip_ipaddr }}" } must be present before the Docker installation. Otherwise the Docker daemon starts with the conflicting default IP address.

What do you think?

  • Would it be better to manually define the bridge IPs for all nodes, or to stick with the last octet and accept the restriction of at most 253 nodes in the cluster?

  • Should I open a pull request?

kr
Josef

Defined in group_vars:

# Bridge Network IPv4 used as "bip" in /etc/docker/daemon.json
# The default value if not defined is 172.17.0.0/16 
docker_bridge_network: 172.140.0.0/24

Included tasks to install docker on ubuntu 20 focal:

# Install docker
#
# Parameters:
#    docker_bridge_network (optional) - ipv4/cidr e.g. 172.140.0.0/24

##########################################################################
# Initialization - prepare docker bridge network
##########################################################################

- name: Calculate Docker Bridge IP (bip_addr)
  block:
    - name: Ensure valid IP/CIDR specified in docker_bridge_network
      assert:
        msg: |
          'docker_bridge_network' must contain a valid ipv4/cidr network address!
                                  e.g. docker_bridge_network: 172.140.0.0/24
        that:
          - docker_bridge_network | ipaddr(False)
    - name: Prepare calculation arguments
      set_fact:
        ip_last_octett: "{{ ansible_default_ipv4.address.split('.')[3].split('/') | first }}"
        bip_parts: "{{ docker_bridge_network.split('.') }}"
    - name: Calculate docker bridge ip
      set_fact:
        bip_ipaddr: "{{ bip_parts[0] ~ '.' ~
                        bip_parts[1]  ~ '.' ~
                        bip_parts[2]  ~ '.' ~
                        ip_last_octett  ~ '/' ~
                        bip_parts[3].split('/')[1]
                        }}"
    - name: Check that bridge ip is a valid IP/CIDR addr
      assert:
        msg: |
          ERROR: 'bip_ipaddr' is not a valid ipv4/cidr address!
        that:
          - bip_ipaddr | ipaddr(False)
    - debug:
        msg: "Docker Bridge Address: {{ bip_ipaddr }}"
  when:
    - docker_bridge_network is defined

- name: Setup /etc/docker/daemon.json file if needed
  block:
    - name: Define filename for docker daemon initialization
      set_fact:
        daemon_filename: "/etc/docker/daemon.json"

    - name: Ensure docker directory exists
      file:
        path: /etc/docker
        state: directory
      become: true

    - name: Check whether the daemon.json file already exists
      stat:
        path: "{{ daemon_filename }}"
      register: res_daemon_filename
      become: true

    - name: Set flag daemon_file_missing
      set_fact:
        daemon_file_missing: "{{ not (res_daemon_filename.stat.isreg | default(false)) | bool }}"
    - name: Setup bridge ip for docker daemon
      copy:
        dest: "{{ daemon_filename }}"
        content: |
          { "bip": "{{ bip_ipaddr }}" }
      when:
        - daemon_file_missing
      become: true
  when:
    - docker_bridge_network is defined


##########################################################################
# Install docker-ce repository
##########################################################################

- name: Add Docker GPG key
  apt_key:
    url: https://download.docker.com/linux/ubuntu/gpg
    state: present

# //TODO: Add focal repository when available

- name: Install docker and requirements
  apt:
    pkg:
      - apt-transport-https
      - ca-certificates
      - curl
      - software-properties-common
      - docker.io
      - docker-compose
    state: present
    update_cache: true
  become: true

- name: Enable and start service docker
  service:
    name: docker
    enabled: true
    state: started
  become: true

##########################################################################
# Verify docker installation
##########################################################################

- name: Verify installation
  shell: 
    cmd: "{{ item }}"
  become: true
  loop:
    - docker version
    - docker info
    - docker network ls
    - ip link
    - bridge link
    - docker run --rm hello-world
    - docker run --rm alpine cat /etc/resolv.conf
    - docker run --rm alpine ping -c1 8.8.8.8
  changed_when: false

ABAC auth mode not working

Hey,

first of all thanks for this great repository, I was really lost in setting up HA Kubernetes until I found this, so thanks a lot.

I set up Vagrant using the provided Vagrantfile; however, after the deployment the cluster did not come up. I checked the kubelet logs and they were filled with error messages like this one:

kubelet_node_status.go:98] Unable to register node "master2" with API server: the server does not allow access to the requested resource (post nodes)

I changed the auth_mode variable to AlwaysAllow and afterwards the cluster was working. So I think this is related to ABAC.
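
For anyone hitting the same problem, the workaround described above amounts to a single variable change; auth_mode is the variable name given in the report, and group_vars/all.yml is the assumed place to set it:

# group_vars/all.yml -- workaround from this issue; AlwaysAllow disables authorization checks, so use it for testing only
auth_mode: AlwaysAllow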

Unable to run kubectl get nodes

Unable to run the kubectl get nodes command:
Nabaruns-iMac:kubernetes-ansible nabarun$ vagrant ssh master1 -c 'watch -n1 kubectl get nodes'
/etc/profile.d/lang.sh: line 19: warning: setlocale: LC_CTYPE: cannot change locale (UTF-8): No such file or directory

Every 1.0s: kubectl get nodes Thu Apr 16 09:09:40 2020

The connection to the server localhost:8080 was refused - did you specify the right host or port?

api getsockopt refused

Hi,
I'm not sure if you can help, but I will describe my context.
I'm using your Ansible scripts with some modifications (mostly because the setup needs to be offline). I am currently using Vagrant to create my servers for testing.
Main problem: when Kubernetes begins to set up, all containers come up fine, BUT if I inspect the kubelet service on the master, I can clearly see that the apiserver is not reachable because kubelet can't open a TCP connection to my IP (https), which is local:
Failed to list *api.Node: Get https://172.31.2.11/api/v1/pods? [...] : dial tcp 172.31.2.11:443: getsockopt: connection refused

Do you have any idea which part this could come from? Docker? Vagrant? kubelet itself?

PS: I can't delete this post, but I finally found that it came from haproxy, which was too old (1.4 instead of 1.7).

Unable to start service kubelet

TASK [master : start and enable kubelet] ***************************************
fatal: [10.1.52.142]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to start service kubelet: Failed to start kubelet.service: Unit not found.\n"}
fatal: [10.1.52.135]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to start service kubelet: Failed to start kubelet.service: Unit not found.\n"}
to retry, use: --limit @/root/HA-kubernetes-ansible/cluster.retry

PLAY RECAP *********************************************************************
10.1.52.135 : ok=30 changed=0 unreachable=0 failed=1
10.1.52.142 : ok=26 changed=0 unreachable=0 failed=1

Api load balancer

Hi,

Can you please explain this part of the configuration to me:
#- api secure port and api loadbalancer IP
api_secure_port: 5443
api_lb_ip: https://192.168.50.11 # it should be the haproxy host IP or a network load balancer IP # if using only one api server then set its IP
lb_ip: 192.168.50.11

#- setup haproxy for loadbalancing
haproxy: true # set false when already physical loadbalancer available
haproxy_dir: /etc/haproxy
haproxy_monitor_port: 9090
admin_user: admin

What I don't understand is: in your sample, is this IP 192.168.50.11 not related to the Kubernetes cluster?

Thanks

Master kubelet never fully started

Hi,

Did the master nodes truly start up for you? I ran the playbook and everything went through fine, but kubelet keeps looping and never fully starts. Attached is the log. It seems that the etcd container won't fully run, or something is keeping the cluster from starting up. Any help would be appreciated. Thanks.

Mar 10 23:17:50 master1 kubelet[7697]: Flag --fail-swap-on has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/admi
nister-cluster/kubelet-config-file/ for more information.                                                                                                                                                           
Mar 10 23:17:50 master1 kubelet[7697]: Flag --fail-swap-on has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/admi
nister-cluster/kubelet-config-file/ for more information.                                                                                                                                                           
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.009382    7697 server.go:408] Version: v1.12.4                                                                                                                
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.009921    7697 plugins.go:99] No cloud provider specified.                                                                                                    
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.014389    7697 certificate_store.go:144] Loading cert/key pair from ("/etc/kubernetes/pki/admin.pem", "/etc/kubernetes/pki/admin-key.pem").                   
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.128256    7697 server.go:667] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /                                                
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.128657    7697 container_manager_linux.go:247] container manager verified user specified cgroup-root exists: []                                               
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.128682    7697 container_manager_linux.go:252] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsN
ame: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupNa
me: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim
:<nil>} {Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod
:0s MinReclaim:<nil>} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManag
erReconcilePeriod:10s ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms}                                                                                                                    
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.128854    7697 container_manager_linux.go:271] Creating device plugin manager: true                                                                           
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.128907    7697 state_mem.go:36] [cpumanager] initializing new in-memory state store                                                                           
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.129081    7697 state_mem.go:84] [cpumanager] updated default cpuset: ""                                                                                       
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.129095    7697 state_mem.go:92] [cpumanager] updated cpuset assignments: "map[]"                                                                              
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.129214    7697 kubelet.go:279] Adding pod path: /etc/kubernetes/manifests                                                                                     
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.129253    7697 kubelet.go:304] Watching apiserver                                                                                                             
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.154503    7697 client.go:75] Connecting to docker on unix:///var/run/docker.sock                                                                              
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.154546    7697 client.go:104] Start docker client with request timeout=2m0s                                                                                   
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.157852    7697 docker_service.go:236] Hairpin mode set to "hairpin-veth"                                                                                      
Mar 10 23:17:51 master1 kubelet[7697]: W0310 23:17:51.158013    7697 cni.go:188] Unable to update cni config: No networks found in /etc/cni/net.d/                                                                  
Mar 10 23:17:51 master1 kubelet[7697]: W0310 23:17:51.163069    7697 hostport_manager.go:68] The binary conntrack is not installed, this can cause failures in network connection cleanup.                          
Mar 10 23:17:51 master1 kubelet[7697]: W0310 23:17:51.163165    7697 cni.go:188] Unable to update cni config: No networks found in /etc/cni/net.d/                                                                  
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.163207    7697 docker_service.go:251] Docker cri networking managed by cni                                                                                    
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.192738    7697 docker_service.go:256] Docker Info: &{ID:HK2Q:TDVZ:67SU:FL5Y:AYA4:JSO2:IAYF:N2JU:CVIP:B4CX:Q6IS:X3KQ Containers:12 ContainersRunning:6 Containe
rsPaused:0 ContainersStopped:6 Images:5 Driver:overlay2 DriverStatus:[[Backing Filesystem xfs] [Supports d_type true] [Native Overlay Diff true]] SystemStatus:[] Plugins:{Volume:[local] Network:[bridge host macvl
an null overlay] Authorization:[] Log:[awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog]} MemoryLimit:true SwapLimit:true KernelMemory:true CPUCfsPeriod:true CPUCfsQuota:true CPUShar
es:true CPUSet:true IPv4Forwarding:true BridgeNfIptables:true BridgeNfIP6tables:false Debug:false NFd:53 OomKillDisable:true NGoroutines:62 SystemTime:2019-03-10T23:17:51.165590346Z LoggingDriver:json-file Cgroup
Driver:cgroupfs NEventsListener:0 KernelVersion:3.10.0-957.1.3.el7.x86_64 OperatingSystem:CentOS Linux 7 (Core) OSType:linux Architecture:x86_64 IndexServerAddress:https://index.docker.io/v1/ RegistryConfig:0xc42
09f5b90 NCPU:1 MemTotal:1039335424 GenericResources:[] DockerRootDir:/var/lib/docker HTTPProxy: HTTPSProxy: NoProxy: Name:master1 Labels:[] ExperimentalBuild:false ServerVersion:18.09.3 ClusterStore: ClusterAdver
tise: Runtimes:map[runc:{Path:runc Args:[]}] DefaultRuntime:runc Swarm:{NodeID: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:[] Nodes:0 Managers:0 Cluster:<nil>} LiveRestoreEnabl
ed:false Isolation: InitBinary:docker-init ContainerdCommit:{ID:e6b3f5632f50dbc4e9cb6288d911bf4f5e95b18e Expected:e6b3f5632f50dbc4e9cb6288d911bf4f5e95b18e} RuncCommit:{ID:6635b4f0c6af3810594d2770f662f34ddc15b40d 
Expected:6635b4f0c6af3810594d2770f662f34ddc15b40d} InitCommit:{ID:fec3683 Expected:fec3683} SecurityOptions:[name=seccomp,profile=default]}                                                                         
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.192860    7697 docker_service.go:269] Setting cgroupDriver to cgroupfs                                                                                        
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.299058    7697 kuberuntime_manager.go:197] Container runtime docker initialized, version: 18.09.3, apiVersion: 1.39.0                                         
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.301562    7697 server.go:1013] Started kubelet                                                                                                                
Mar 10 23:17:51 master1 kubelet[7697]: E0310 23:17:51.302308    7697 kubelet.go:1287] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to f
ind data in memory cache                                                                                                                                                                                            
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.304780    7697 fs_resource_analyzer.go:66] Starting FS ResourceAnalyzer                                                                                       
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.304820    7697 status_manager.go:152] Starting to sync pod status with apiserver                                                                              
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.304838    7697 kubelet.go:1804] Starting kubelet main sync loop.                                                                                              
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.304871    7697 kubelet.go:1821] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 2562047h47m16.8547758
07s ago; threshold is 3m0s]                                                                                                                                                                                         
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.305058    7697 server.go:133] Starting to listen on 0.0.0.0:10250                                                                                             
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.308450    7697 server.go:318] Adding debug handlers to kubelet server.                                                                                        
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.312624    7697 volume_manager.go:248] Starting Kubelet Volume Manager                                                                                         
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.317598    7697 desired_state_of_world_populator.go:130] Desired state populator starts to run                                                                 
Mar 10 23:17:51 master1 kubelet[7697]: W0310 23:17:51.340896    7697 cni.go:188] Unable to update cni config: No networks found in /etc/cni/net.d/                                                                  
Mar 10 23:17:51 master1 kubelet[7697]: E0310 23:17:51.344560    7697 kubelet.go:2167] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not rea
dy: cni config uninitialized                                                                                                                                                                                        
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.411133    7697 kubelet.go:1821] skipping pod synchronization - [container runtime is down]                                                                    
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.417915    7697 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach                                                  
Mar 10 23:17:51 master1 kubelet[7697]: E0310 23:17:51.419148    7697 kubelet.go:2236] node "master1" not found                                                                                                      
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.496957    7697 kubelet_node_status.go:70] Attempting to register node master1                                                                                 
Mar 10 23:17:51 master1 kubelet[7697]: E0310 23:17:51.531716    7697 kubelet.go:2236] node "master1" not found                                                                                                      
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.612786    7697 kubelet.go:1821] skipping pod synchronization - [container runtime is down]                                                                    
Mar 10 23:17:51 master1 kubelet[7697]: E0310 23:17:51.641415    7697 kubelet.go:2236] node "master1" not found                                                                                                      
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.746779    7697 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach                                                  
Mar 10 23:17:51 master1 kubelet[7697]: E0310 23:17:51.747636    7697 kubelet.go:2236] node "master1" not found                                                                                                      
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.760143    7697 cpu_manager.go:155] [cpumanager] starting with none policy                                                                                     
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.760170    7697 cpu_manager.go:156] [cpumanager] reconciling every 10s                                                                                         
Mar 10 23:17:51 master1 kubelet[7697]: I0310 23:17:51.760188    7697 policy_none.go:42] [cpumanager] none policy: Start                                                                                             
Mar 10 23:17:51 master1 kubelet[7697]: E0310 23:17:51.763617    7697 eviction_manager.go:243] eviction manager: failed to get get summary stats: failed to get node info: node "master1" not found                  
Mar 10 23:17:51 master1 kubelet[7697]: E0310 23:17:51.847827    7697 kubelet.go:2236] node "master1" not found                                                                                                      
Mar 10 23:17:51 master1 kubelet[7697]: E0310 23:17:51.948078    7697 kubelet.go:2236] node "master1" not found                                                                                                      
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.013886    7697 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach                                                  
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.030011    7697 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.032200    7697 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.049437    7697 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "etcd-certs" (UniqueName: "kubernetes.io/host-path/cd39d
a87965adf2f51307186ba6ec5bb-etcd-certs") pod "etcd-master1" (UID: "cd39da87965adf2f51307186ba6ec5bb")
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.049562    7697 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "etcd-data" (UniqueName: "kubernetes.io/host-path/cd39da
87965adf2f51307186ba6ec5bb-etcd-data") pod "etcd-master1" (UID: "cd39da87965adf2f51307186ba6ec5bb")
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.064391    7697 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.066962    7697 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.085615    7697 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.086682    7697 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.093205    7697 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.094289    7697 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.098281    7697 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach
Mar 10 23:17:52 master1 kubelet[7697]: W0310 23:17:52.101929    7697 pod_container_deletor.go:75] Container "779534ffb99de10e36f5497052166f6b0564f66ed9f7e430b558f1200f7fb36d" not found in pod's containers
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.102004    7697 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.105588    7697 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.109625    7697 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.113146    7697 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach
Mar 10 23:17:52 master1 kubelet[7697]: W0310 23:17:52.116178    7697 pod_container_deletor.go:75] Container "8cdf30df5af4153aeb6fe1a589acffd62fa810cb47f73ae0ed2ef253353a3847" not found in pod's containers
Mar 10 23:17:52 master1 kubelet[7697]: E0310 23:17:52.148956    7697 kubelet.go:2236] node "master1" not found
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.150141    7697 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "k8s-certs" (UniqueName: "kubernetes.io/host-path/7ea761
6e6935d78903494647458aa9f6-k8s-certs") pod "kube-controller-manager-master1" (UID: "7ea7616e6935d78903494647458aa9f6")
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.150227    7697 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "k8s-certs" (UniqueName: "kubernetes.io/host-path/144724
d5e5826f9e16fc243502d83b2e-k8s-certs") pod "kube-scheduler-master1" (UID: "144724d5e5826f9e16fc243502d83b2e")
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.150365    7697 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "k8s-certs" (UniqueName: "kubernetes.io/host-path/462247
d9c09c91d02f24f872b9096f87-k8s-certs") pod "kube-apiserver-master1" (UID: "462247d9c09c91d02f24f872b9096f87")
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.150392    7697 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "flexvolume-dir" (UniqueName: "kubernetes.io/host-path/7
ea7616e6935d78903494647458aa9f6-flexvolume-dir") pod "kube-controller-manager-master1" (UID: "7ea7616e6935d78903494647458aa9f6")
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.150425    7697 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "etc-pki" (UniqueName: "kubernetes.io/host-path/7ea7616e
6935d78903494647458aa9f6-etc-pki") pod "kube-controller-manager-master1" (UID: "7ea7616e6935d78903494647458aa9f6")
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.150451    7697 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "ca-certs" (UniqueName: "kubernetes.io/host-path/7ea7616
e6935d78903494647458aa9f6-ca-certs") pod "kube-controller-manager-master1" (UID: "7ea7616e6935d78903494647458aa9f6")
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.150486    7697 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "kubeconfig" (UniqueName: "kubernetes.io/host-path/14472
4d5e5826f9e16fc243502d83b2e-kubeconfig") pod "kube-scheduler-master1" (UID: "144724d5e5826f9e16fc243502d83b2e")
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.150511    7697 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "ca-certs" (UniqueName: "kubernetes.io/host-path/462247d
9c09c91d02f24f872b9096f87-ca-certs") pod "kube-apiserver-master1" (UID: "462247d9c09c91d02f24f872b9096f87")
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.150550    7697 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "etc-pki" (UniqueName: "kubernetes.io/host-path/462247d9
c09c91d02f24f872b9096f87-etc-pki") pod "kube-apiserver-master1" (UID: "462247d9c09c91d02f24f872b9096f87")
Mar 10 23:17:52 master1 kubelet[7697]: I0310 23:17:52.150576    7697 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "kubeconfig" (UniqueName: "kubernetes.io/host-path/7ea76
16e6935d78903494647458aa9f6-kubeconfig") pod "kube-controller-manager-master1" (UID: "7ea7616e6935d78903494647458aa9f6")
Mar 10 23:17:52 master1 kubelet[7697]: E0310 23:17:52.249289    7697 kubelet.go:2236] node "master1" not found
Mar 10 23:17:52 master1 kubelet[7697]: E0310 23:17:52.349618    7697 kubelet.go:2236] node "master1" not found
Mar 10 23:17:52 master1 kubelet[7697]: E0310 23:17:52.455400    7697 kubelet.go:2236] node "master1" not found
Mar 10 23:17:52 master1 kubelet[7697]: E0310 23:17:52.556830    7697 kubelet.go:2236] node "master1" not found
Mar 10 23:17:52 master1 kubelet[7697]: E0310 23:17:52.658290    7697 kubelet.go:2236] node "master1" not found
Mar 10 23:17:52 master1 kubelet[7697]: E0310 23:17:52.759452    7697 kubelet.go:2236] node "master1" not found
Mar 10 23:17:52 master1 kubelet[7697]: E0310 23:17:52.861874    7697 kubelet.go:2236] node "master1" not found
Mar 10 23:17:52 master1 kubelet[7697]: E0310 23:17:52.963313    7697 kubelet.go:2236] node "master1" not found
Mar 10 23:17:53 master1 kubelet[7697]: E0310 23:17:53.075251    7697 kubelet.go:2236] node "master1" not found
Mar 10 23:17:53 master1 kubelet[7697]: E0310 23:17:53.180767    7697 kubelet.go:2236] node "master1" not found
Mar 10 23:17:53 master1 kubelet[7697]: E0310 23:17:53.290802    7697 kubelet.go:2236] node "master1" not found
Mar 10 23:17:53 master1 kubelet[7697]: E0310 23:17:53.401487    7697 kubelet.go:2236] node "master1" not found
Mar 10 23:17:53 master1 kubelet[7697]: I0310 23:17:53.462443    7697 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach
Mar 10 23:17:53 master1 kubelet[7697]: I0310 23:17:53.463968    7697 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach
Mar 10 23:17:53 master1 kubelet[7697]: I0310 23:17:53.503770    7697 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach
Mar 10 23:17:53 master1 kubelet[7697]: E0310 23:17:53.504457    7697 kubelet.go:2236] node "master1" not found
Mar 10 23:17:53 master1 kubelet[7697]: E0310 23:17:53.604825    7697 kubelet.go:2236] node "master1" not found
Mar 10 23:17:53 master1 kubelet[7697]: E0310 23:17:53.705884    7697 kubelet.go:2236] node "master1" not found
Mar 10 23:17:53 master1 kubelet[7697]: E0310 23:17:53.810794    7697 kubelet.go:2236] node "master1" not found
Mar 10 23:17:53 master1 kubelet[7697]: E0310 23:17:53.920871    7697 kubelet.go:2236] node "master1" not found
Mar 10 23:17:54 master1 kubelet[7697]: E0310 23:17:54.030799    7697 kubelet.go:2236] node "master1" not found
Mar 10 23:17:54 master1 kubelet[7697]: E0310 23:17:54.134751    7697 kubelet.go:2236] node "master1" not found
Mar 10 23:17:54 master1 kubelet[7697]: E0310 23:17:54.242775    7697 kubelet.go:2236] node "master1" not found
Mar 10 23:17:54 master1 kubelet[7697]: E0310 23:17:54.353762    7697 kubelet.go:2236] node "master1" not found
Mar 10 23:17:54 master1 kubelet[7697]: E0310 23:17:54.454821    7697 kubelet.go:2236] node "master1" not found
Mar 10 23:17:54 master1 kubelet[7697]: I0310 23:17:54.483188    7697 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach
Mar 10 23:17:54 master1 kubelet[7697]: I0310 23:17:54.507725    7697 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach
Mar 10 23:17:54 master1 kubelet[7697]: I0310 23:17:54.508529    7697 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach
Mar 10 23:17:54 master1 kubelet[7697]: E0310 23:17:54.536984    7697 pod_workers.go:186] Error syncing pod cd39da87965adf2f51307186ba6ec5bb ("etcd-master1_kube-system(cd39da87965adf2f51307186ba6ec5bb)"), skipping
: failed to "StartContainer" for "etcd" with CrashLoopBackOff: "Back-off 10s restarting failed container=etcd pod=etcd-master1_kube-system(cd39da87965adf2f51307186ba6ec5bb)"

etcd using https copies certs to wrong destination directory

First of all, this is an awesome project. You helped a ton. I'm only part way through it, but I really like it.

If you have https enabled, it copies the certs to the incorrect directory. This prevents the service from starting up and you get an error. BTW, if you run it in http mode and then change it to https, it looks like it works correctly; the only reason I found this is that I tore everything down and then rebuilt in https mode first. Currently, etcd/tasks/main.yml has:

- name: copy etcd certificate from ansible host
  when: etcd_peer_url_scheme == 'https'
  copy: src={{ master_cert_dir }} dest={{ kube_config_dir }}

kube_config_dir is set in all.yml to kube_config_dir: /etc/kubernetes. However, the etcd.conf.j2 template uses etcd_ca_file, which is set in all.yml to "/etc/kubernetes/pki/ca.pem".

etcd/tasks/main.yml therefore needs to change to:

- name: copy etcd certificate from ansible host
  when: etcd_peer_url_scheme == 'https'
  copy: src={{ master_cert_dir }} dest={{ cert_dir }}

That is set to cert_dir: /etc/kubernetes/pki.

Hopefully I made it clear enough what the issue is.
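
For reference, a minimal sketch of the corrected task in fully expanded YAML, assuming the variable names quoted above from all.yml (master_cert_dir, cert_dir):

- name: copy etcd certificate from ansible host
  copy:
    src: "{{ master_cert_dir }}"
    dest: "{{ cert_dir }}"   # etcd.conf.j2 points etcd_ca_file at /etc/kubernetes/pki, so the certs must land here
  when: etcd_peer_url_scheme == 'https'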

Architecture

Hi,
Can you graphically describe the architecture that you are building using Ansible?

Thanks

autoscale ?

Hello, does this role also support autoscaling?

deployment getting fail with ABAC authentication

Hi,
Thanks a lot for this project. I am using ABAC authentication; I created the policy file and the auth file, and authentication works perfectly. But when we try a deployment, no pod gets deployed. If we remove ABAC, deployments work fine. Can anyone help us here? It would be a great help.
basic_auth.txt
policy.txt

RBAC working out of the box?

Hi,
thanks for this playbook, great job.
I don't want to use ABAC, so I don't populate the related group_vars/all.yml variables; the cluster gets deployed, but the api-server container is constantly restarting.
My question is: given the way this deployment is designed, will it work with RBAC, or must it be ABAC?
If it is supposed to work with RBAC as well, then I will invest more time in investigating the problem I've got.
Thanks,
Sono

Pod not seeing flannel overlay network

I've been using your project to test a bare-metal cluster. When running a busybox pod for testing, the container's interface is tied to the docker bridge (172.17.x.x) instead of the flannel overlay interface. Looking at other people's instructions, the Docker daemon needs to be reconfigured to use the overlay network, and your Ansible roles do not perform this reconfiguration.
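
As an illustration only (this is not part of the repository's roles), a minimal Ansible sketch of the kind of reconfiguration being described, assuming flanneld writes its allocated subnet to /run/flannel/subnet.env (FLANNEL_SUBNET / FLANNEL_MTU) and that a "restart docker" handler exists:

- name: read the subnet flanneld allocated for this node
  slurp:
    src: /run/flannel/subnet.env
  register: flannel_env

- name: point the docker bridge at the flannel subnet instead of the default 172.17.0.0/16
  copy:
    dest: /etc/docker/daemon.json
    content: |
      {
        "bip": "{{ (flannel_env.content | b64decode) | regex_search('FLANNEL_SUBNET=(.*)', '\\1') | first }}",
        "mtu": {{ (flannel_env.content | b64decode) | regex_search('FLANNEL_MTU=(.*)', '\\1') | first }}
      }
  notify: restart docker   # assumed handler; docker must be restarted for the new bridge IP to take effect

Other guides achieve the same thing with flannel's mk-docker-opts.sh plus a systemd drop-in; the daemon.json route above is just the shortest way to show the idea.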

sslhost

Your sslhost needs to be the Ansible control host, given the current state of your files: you use the copy module to copy files to the other remotes, and its source is limited to the local (control) host. Maybe you should pre-fill the inventory accordingly. Or I missed something, in which case I apologize.

(Sorry for these "issues", but there isn't really a chat for this kind of thing, and I don't have time to open pull requests these days.)

error flannel configuration into etcd

TASK [flannel : Insert flannel configuration into etcd] ************************
fatal: [10.1.52.143]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'etcd' is undefined\n\nThe error appears to have been in '/root/HA-kubernetes-ansible/roles/flannel/tasks/main.yml': line 22, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Insert flannel configuration into etcd\n ^ here\n"}
fatal: [10.1.52.144]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'etcd' is undefined\n\nThe error appears to have been in '/root/HA-kubernetes-ansible/roles/flannel/tasks/main.yml': line 22, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Insert flannel configuration into etcd\n ^ here\n"}
fatal: [10.1.52.142]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'etcd' is undefined\n\nThe error appears to have been in '/root/HA-kubernetes-ansible/roles/flannel/tasks/main.yml': line 22, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Insert flannel configuration into etcd\n ^ here\n"}
fatal: [10.1.52.135]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'etcd' is undefined\n\nThe error appears to have been in '/root/HA-kubernetes-ansible/roles/flannel/tasks/main.yml': line 22, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Insert flannel configuration into etcd\n ^ here\n"}
to retry, use: --limit @/root/HA-kubernetes-ansible/cluster.retry

PLAY RECAP *********************************************************************
10.1.52.135 : ok=38 changed=25 unreachable=0 failed=1
10.1.52.142 : ok=33 changed=21 unreachable=0 failed=1
10.1.52.143 : ok=25 changed=17 unreachable=0 failed=1
10.1.52.144 : ok=25 changed=17 unreachable=0 failed=1

centos atomic

Hi,
is this role just for CentOS, or can it also work on CentOS Atomic hosts?

remote nodes not connecting to flannel

The three master nodes are fine, but the other three are at different universities on different networks.

This is one of them failing:

I1202 19:42:59.910850       1 main.go:475] Determining IP address of default interface
I1202 19:42:59.911938       1 main.go:488] Using interface with name eno1 and address 130.191.49.227
I1202 19:42:59.911982       1 main.go:505] Defaulting external address to interface address (130.191.49.227)
I1202 19:43:00.306259       1 kube.go:131] Waiting 10m0s for node controller to sync
I1202 19:43:00.306347       1 kube.go:294] Starting kube subnet manager
E1202 19:53:00.306574       1 main.go:232] Failed to create SubnetManager: error waiting for nodeController to sync state: timed out waiting for the condition

Kubernetes HA

Hello Pawan,

It seems you are installing HAProxy on all master nodes. In the var file (group_vars/all.yml), which IP address do I have to choose as api_lb_ip, lb_ip, weaveui_ip, grafana_ip, influx_ip, kube_dash_ip?

As per your example, you are selecting the first master's IP address as the load balancer IP. But when the first master goes down, the dashboard and the APIs will be unavailable. This won't give full redundancy.

Why don't you use the attached HA model?
(k8s-ha diagram)

Variable kubeadminconfig not defined

I tested your playbook, but it gives an error because the variable kubeadminconfig hasn't been defined.

I set it to "/etc/kubernetes/kubeadminconfig" and the playbook worked.
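
As a workaround, the value described above can be added to the group variables; the variable name comes from the error and the path is the one the reporter used, so treat this as illustrative:

# group_vars/all.yml -- workaround from this issue
kubeadminconfig: /etc/kubernetes/kubeadminconfig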

kube-dns in CrashLoopBackOff

Hi.
I'm trying to set up a Kubernetes cluster using the provided Ansible playbooks. However, the kube-dns addon does not work. The following error is shown in the logs:

`I0530 10:15:33.126625 1 dns.go:172] Ignoring error while waiting for service default/kubernetes: Get https://192.168.14.119/api/v1/namespaces/default/services/kubernetes: x509: failed to load system roots and no roots provided. Sleeping 1s before retrying.

E0530 10:15:33.140844 1 reflector.go:214] pkg/dns/dns.go:155: Failed to list *api.Service: Get https://192.168.14.119/api/v1/services?resourceVersion=0: x509: failed to load system roots and no roots provided
`

Is this a problem with the scripts? Something that I'm missing?

Thanks in advance.

Ansible version

Hi,
Just asking if you could put your version of Ansible in the README? Since 'succeeded' and the like are only available in the newest versions of it.

ansible_hostname not valid

Hi,

I'm using your playbooks and I'm getting the error below. I traced it to the file etcd/templates/etcd.yaml.j2. Can this be fixed, or can you suggest an alternative solution? My Ansible version is 2.7.7. Thanks.

TASK [etcd : Write etcd static pods file] **************************************************************************************
fatal: [192.168.50.11]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_hostname'"} 
