kubeadm-ansible's Introduction

This project is no longer supported, and there is no plan to support any new features. The successor is open-hand/kubeadm-ha.

Kubeadm Ansible (中文)

Kubeadm Ansible is a toolkit for simply and quickly installing a Kubernetes (k8s) cluster.

1. Environment preparation

Note: Currently only CentOS 7.2+ is supported.

Install the Ansible runtime environment on the machine that will execute the Ansible playbooks:

sudo yum install epel-release -y 
sudo yum install git python36 sshpass -y
sudo python3.6 -m ensurepip
sudo /usr/local/bin/pip3 install --no-cache-dir ansible==2.7.5 netaddr
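
As an optional sanity check, verify that the pinned Ansible version installed by pip above is available (pip3 from /usr/local/bin typically places the binary alongside it):

/usr/local/bin/ansible --version   # should report ansible 2.7.5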

Clone project:

git clone https://github.com/choerodon/kubeadm-ansible.git
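
The playbook commands in the following steps assume you are working from the cloned project directory:

cd kubeadm-ansible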

2. Modify hosts

Edit the inventory/hosts file in the toolkit: set the access address, user name, and password for each machine, and map each node to its role(s). The leading name on each line is the machine's hostname. The user must have root privileges.

Note: The etcd node and the master node need to be on the same machine.

For example, to deploy a single-node cluster, configure it as follows (for reference):

[all]
node1 ansible_host=192.168.56.11 ansible_user=root ansible_ssh_pass=change_it ansible_become=true

[kube-master]
node1

[etcd]
node1

[kube-node]
node1
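
Before deploying, an optional connectivity check (a minimal sketch, relying on the sshpass package installed above) verifies that Ansible can reach every host in the inventory:

# ping every host defined in inventory/hosts
ansible -i inventory/hosts all -m ping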

3. Modify the variables

Edit the inventory/vars file in the toolkit and change the value of k8s_interface to the name of the machine's IPv4 network interface (CentOS defaults to eth0). If you are not sure, check it with the ifconfig command.

k8s_interface: "eth0"
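
If ifconfig is not available on a minimal install, the interface names and their IPv4 addresses can also be listed with iproute2, for example:

# print interface name and IPv4 address pairs
ip -4 -o addr show | awk '{print $2, $4}'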

Note: If the network interface names differ between machines, delete the k8s_interface variable from the inventory/vars file and add an ip address to each machine in the inventory/hosts file. For example:

[all]
node1 ansible_host=192.168.56.11 ip=192.168.56.11 ansible_user=root ansible_ssh_pass=change_it ansible_become=true
...
...

If all machines access the external network through a proxy, configure the following variables; otherwise leave them unset:

http_proxy: http://1.2.3.4:3128
https_proxy: http://1.2.3.4:3128
no_proxy: localhost,127.0.0.0/8
docker_proxy_enable: true
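
Before running the playbook, you may want to confirm from one of the nodes that the proxy actually forwards HTTPS traffic. This is an optional check using the placeholder proxy address from above and the Docker registry as an example endpoint; any HTTP response (including 401) means the proxy is reachable:

curl -sSI -x http://1.2.3.4:3128 https://registry-1.docker.io/v2/ | head -n 1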

4. Deploy

If deploying on Alibaba Cloud, please read the Alibaba Cloud Deployment section on this page first.

Execute:

ansible-playbook -i inventory/hosts -e @inventory/vars cluster.yml

Watch the pod status and wait for all pods to reach the Running state:

kubectl get po -n kube-system
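
Once the system pods are Running, a quick way to confirm the cluster came up correctly is to check that every node reports Ready:

kubectl get nodes -o wide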

If the deployment fails and you want to reset the cluster (this wipes all cluster data), execute:

ansible-playbook -i inventory/hosts reset.yml

5. Ingress TLS configuration

Reference: [TLS Configuration Notes](docs/ingress-nginx.md)

6. Dashboard configuration

Reference: [Dashboard configuration instructions](docs/dashboard.md)

7. Alibaba Cloud Deployment

Modify Hostname(*)

Modify the hostname of the ECS instance in the ECS console. The name should preferably contain only lowercase letters, numbers, and dashes, and it should be consistent with the name in `inventory/hosts` and the name shown in the ECS console. Restart the instance for the change to take effect.

Segment selection (*)

If the ECS servers use a VPC (private network), the pod and service subnets must not overlap with the VPC subnet. For example:


# If the vpc segment is `172.*`
kube_pods_subnet: 192.168.0.0/20
kube_service_addresses: 192.168.255.0/20

# If the vpc segment is `10.*`
kube_pods_subnet: 172.16.0.0/16
kube_service_addresses: 172.19.0.0/20

# If the vpc segment is `192.168.*`
kube_pods_subnet: 172.16.0.0/16
kube_service_addresses: 172.19.0.0/20

Flannel type (*)

When deploying k8s on ECS instances in a VPC, the backend type of the flannel network needs to be ali-vpc. This script uses the vxlan type by default; although vxlan does work in a VPC environment, it can be unstable, so the ali-vpc type is recommended.

Therefore, disable installation of the default flannel network by adding the following variable to the inventory/vars file:

flannel_enable: false

After running the Ansible playbook, manually install the flannel network plugin: create the configuration file kube-flannel-aliyun.yml on one of the master nodes:

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
rules:
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - nodes/status
    verbs:
      - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "type": "flannel",
      "delegate": {
        "isDefaultGateway": true
      }
    }
  net-conf.json: |
    {
      "Network": "[PodsSubnet]",
      "Backend": {
        "Type": "ali-vpc"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni
        image: registry.cn-hangzhou.aliyuncs.com/google-containers/flannel:v0.9.0
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conf
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        image: registry.cn-hangzhou.aliyuncs.com/google-containers/flannel:v0.9.0
        command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr" ]
        securityContext:
          privileged: true
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: ACCESS_KEY_ID
          value: [YOUR_ACCESS_KEY_ID]
        - name: ACCESS_KEY_SECRET
          value: [YOUR_ACCESS_KEY_SECRET]
        volumeMounts:
        - name: run
          mountPath: /run
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
        - name: run
          hostPath:
            path: /run
        - name: cni
          hostPath:
            path: /etc/cni/net.d
        - name: flannel-cfg
          configMap:
            name: kube-flannel-cfg

Be sure to modify the following parameter values in the configuration:

  • Network: the pod network segment (replaces the [PodsSubnet] placeholder).

  • ACCESS_KEY_ID: required.

  • ACCESS_KEY_SECRET: required.

The ACCESS_KEY user must have the following permissions:

  • Read-only access to Elastic Compute Service (ECS)
  • Management access to Virtual Private Cloud (VPC)

Then deploy with the kubectl command. After the deployment succeeds, multiple route entries will have been added to the routing table of the VPC, with each node's pod IP segment as the next hop.

kubectl apply -f kube-flannel-aliyun.yml
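
If the DaemonSet was created correctly, a flannel pod should be scheduled on every node (the label selector below comes from the manifest above):

kubectl -n kube-system get pods -l app=flannel -o wide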

Next, add the pod network segment to the inbound rules of the ECS security group; otherwise, pods cannot reach ports on pods running on other nodes. For example:

Authorization Policy   Protocol Type   Port Range   Authorization Type       Authorization Object   ...
Allow                  All             -1/-1        Address Segment Access   192.168.0.0/20         ...

Binding Cloud Storage

Under normal circumstances, PVs are backed by NFS, but its read/write performance is limited. For PVs with high read/write performance requirements, you can configure a cloud disk as the mounted volume.

To use Alibaba Cloud storage, you also need to deploy the aliyun-controller component.

First, execute the following commands on all nodes to copy the aliyun-flexv binary into the kubelet plugin directory:

FLEXPATH=/usr/libexec/kubernetes/kubelet-plugins/volume/exec/aliyun~flexv; 
sudo mkdir $FLEXPATH -p; 
docker run --rm -v $FLEXPATH:/opt registry.aliyuncs.com/kubeup/kube-aliyun cp /flexv /opt/
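
To verify that the binary was copied, list the plugin directory on each node (the path is the FLEXPATH defined above):

ls -l /usr/libexec/kubernetes/kubelet-plugins/volume/exec/aliyun~flexv/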

Then, modify the aliyun-controller.yml file under roles/addons/kubeup in this project and fill in the variables listed below. If you are not sure of a value, log in to the corresponding Alibaba Cloud management console, or query the instance metadata service from the server with curl --retry 5 -sSL http://100.100.100.200/latest/meta-data/{{META_ID}} (see the example after the list below).

--cluster-cidr: the pod network CIDR

ALIYUN_ACCESS_KEY: the Alibaba Cloud API Access Key ID

ALIYUN_ACCESS_KEY_SECRET: the Alibaba Cloud API Access Key Secret

ALIYUN_ZONE: the availability zone ID of the ECS instances

ALIYUN_ROUTER: the router ID of the VPC

ALIYUN_ROUTE_TABLE: the route table ID of the VPC

ALIYUN_REGION: the region ID of the ECS instances

ALIYUN_VPC: the ID of the VPC

ALIYUN_VSWITCH: the vSwitch ID of the VPC
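
Several of these values can be read directly from the metadata service mentioned above; for example, assuming the standard Alibaba Cloud metadata keys are exposed on your image:

# region and zone of this ECS instance (ALIYUN_REGION / ALIYUN_ZONE)
curl --retry 5 -sSL http://100.100.100.200/latest/meta-data/region-id
curl --retry 5 -sSL http://100.100.100.200/latest/meta-data/zone-id
# VPC and vSwitch this instance is attached to (ALIYUN_VPC / ALIYUN_VSWITCH)
curl --retry 5 -sSL http://100.100.100.200/latest/meta-data/vpc-id
curl --retry 5 -sSL http://100.100.100.200/latest/meta-data/vswitch-id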

After filling in the variables, copy the file to /etc/kubernetes/manifests/ on all master nodes.

The ACCESS_KEY user must have the following permissions:

  • Read-only access to Elastic Compute Service (ECS)
  • Management access to Virtual Private Cloud (VPC)

Edit the /etc/kubernetes/manifests/kube-controller-manager.yaml file on all master nodes. Add the following two flags to the command section and add the environment variables shown below:

command:
- --allocate-node-cidrs=true
- --configure-cloud-routes=false

Environment variables:

env:
- name: ALIYUN_ACCESS_KEY
  value: [YOUR_ALIYUN_ACCESS_KEY]
- name: ALIYUN_ACCESS_KEY_SECRET
  value: [YOUR_ALIYUN_ACCESS_KEY_SECRET]

Restart all the kubelets of the master node:

systemctl restart kubelet

Check whether the aliyun-controller pod is healthy:

kubectl get po -n kube-system | grep aliyun-controller

Examples of binding a cloud disk (each cloud disk can only be bound once):

# Using pv binding, diskId is the id of the cloud disk
kind: PersistentVolume
apiVersion: v1
metadata:
  name: test-pv-volume
  labels:
    type: flexVolume
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  flexVolume:
    driver: "aliyun/flexv"
    fsType: "ext4"
    options:
      diskId: "d-bp1i23j39i30if"

# Directly bind the disk in a pod

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    volumeMounts:
    - name: test
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: test
    flexVolume:
      driver: "aliyun/flexv"
      fsType: "ext4"
      options:
        diskId: "d-1ierokwer8234jowe"
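
After saving the PV manifest above and applying it, you can check that the volume exists and watch its binding status. The file name here is just an illustrative placeholder, not part of the project; the resource name comes from the manifest:

kubectl apply -f test-pv-volume.yaml
kubectl get pv test-pv-volume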

Reporting issues

If you find any shortcomings or bugs, please describe them in an issue.

How to contribute

Pull requests are welcome! Follow this link for more information on how to contribute.

8. Refresh cluster certificate

The prerequisite for refreshing the certificates is that the CA root certificate exists. After the certificates are refreshed, the kubelet on each master node is restarted to apply the new certificates. During this time the cluster may be unavailable for operations for 1-2 minutes, but business applications are not affected.

ansible-playbook -i inventory/hosts -e @inventory/vars renew-certs.yml
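
To confirm that the certificates were actually renewed, you can check the new expiry date on a master node (the path assumes the standard kubeadm certificate layout):

openssl x509 -noout -enddate -in /etc/kubernetes/pki/apiserver.crt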

kubeadm-ansible's People

Contributors

carllhw, swufexlm, timebye, vinkdong


kubeadm-ansible's Issues

Node status keeps on NotReady

Using the latest kubeadm-ansible scripts to deploy a 3-node k8s cluster several times, the other 2 worker nodes keep ending up in NotReady status.

Node Status

[root@k8s-master ~] ○ kubectl get node
NAME         STATUS     ROLES     AGE       VERSION
k8s-master   Ready      master    1d        v1.8.5
k8s-node01   NotReady   <none>    1d        v1.8.5
k8s-node02   NotReady   <none>    1d        v1.8.5

k8s-node01 details:

[root@k8s-master ~] ○ kubectl describe node k8s-node01
Name:               k8s-node01
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/hostname=k8s-node01
Annotations:        flannel.alpha.coreos.com/backend-data={"VtepMAC":"0a:12:15:65:69:3e"}
                    flannel.alpha.coreos.com/backend-type=vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager=true
                    flannel.alpha.coreos.com/public-ip=192.168.123.156
                    node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:             <none>
CreationTimestamp:  Mon, 24 Sep 2018 00:46:05 +0800
Conditions:
  Type             Status    LastHeartbeatTime                 LastTransitionTime                Reason                     Message
  ----             ------    -----------------                 ------------------                ------                     -------
  OutOfDisk        False     Mon, 24 Sep 2018 00:47:12 +0800   Mon, 24 Sep 2018 00:46:03 +0800   KubeletHasSufficientDisk   kubelet has sufficient disk space available
  MemoryPressure   Unknown   Mon, 24 Sep 2018 00:47:12 +0800   Mon, 24 Sep 2018 00:47:58 +0800   NodeStatusUnknown          Kubelet stopped posting node status.
  DiskPressure     Unknown   Mon, 24 Sep 2018 00:47:12 +0800   Mon, 24 Sep 2018 00:47:58 +0800   NodeStatusUnknown          Kubelet stopped posting node status.
  Ready            Unknown   Mon, 24 Sep 2018 00:47:12 +0800   Mon, 24 Sep 2018 00:47:58 +0800   NodeStatusUnknown          Kubelet stopped posting node status.
Addresses:
  InternalIP:  192.168.123.156
  Hostname:    k8s-node01
Capacity:
 cpu:     4
 memory:  7912008Ki
 pods:    110
Allocatable:
 cpu:     3900m
 memory:  6339144Ki
 pods:    110
System Info:
 Machine ID:                 61a1ab8e69e54e74859435e52c8fa778
 System UUID:                499EB1E3-BB93-498F-BE2C-AFAF3B44EF72
 Boot ID:                    def4deec-e4c3-40b4-aad1-cb8f6bd81f87
 Kernel Version:             3.10.0-693.5.2.el7.x86_64
 OS Image:                   CentOS Linux 7 (Core)
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://Unknown
 Kubelet Version:            v1.8.5
 Kube-Proxy Version:         v1.8.5
PodCIDR:                     10.233.65.0/24
ExternalID:                  k8s-node01
Non-terminated Pods:         (3 in total)
  Namespace                  Name                      CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                  ----                      ------------  ----------  ---------------  -------------
  kube-system                kube-flannel-zqvkw        150m (3%)     300m (7%)   64M (0%)         500M (7%)
  kube-system                kube-proxy-899nh          0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system                nginx-proxy-k8s-node01    25m (0%)      300m (7%)   32M (0%)         512M (7%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  175m (4%)     600m (15%)  96M (1%)         1012M (15%)
Events:         <none>

kubelet status on k8s-node01

[root@k8s-node01 ~] ○ systemctl status kubelet -l
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf, 20-kubelet-override.conf
   Active: active (running) since 一 2018-09-24 00:46:05 CST; 1 day 8h ago
     Docs: http://kubernetes.io/docs/
 Main PID: 61743 (kubelet)
    Tasks: 19
   Memory: 42.1M
   CGroup: /system.slice/kubelet.service
           └─61743 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin --cluster-dns=10.233.0.10 --cluster-domain=cluster.local --authorization-mode=Webhook --client-ca-file=/etc/kubernetes/pki/ca.crt --cadvisor-port=4194 --cgroup-driver=systemd --rotate-certificates=true --cert-dir=/var/lib/kubelet/pki --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/choerodon-tools/pause-amd64:3.0 --fail-swap-on=false --hostname-override=k8s-node01 --eviction-hard=memory.available<512Mi,nodefs.available<10Gi,imagefs.available<10Gi --eviction-minimum-reclaim=memory.available=500Mi,nodefs.available=5Gi,imagefs.available=5Gi --eviction-pressure-transition-period=5m0s --system-reserved=cpu=100m,memory=1Gi

9月 25 09:38:17 k8s-node01 kubelet[61743]: E0925 09:38:17.328812   61743 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:422: Failed to list *v1.Node: Get https://localhost:6443/api/v1/nodes?fieldSelector=metadata.name%3Dk8s-node01&resourceVersion=0: dial tcp 127.0.0.1:6443: getsockopt: connection refused
9月 25 09:38:17 k8s-node01 kubelet[61743]: W0925 09:38:17.912621   61743 eviction_manager.go:332] eviction manager: attempting to reclaim nodefs
9月 25 09:38:17 k8s-node01 kubelet[61743]: I0925 09:38:17.912688   61743 eviction_manager.go:346] eviction manager: must evict pod(s) to reclaim nodefs
9月 25 09:38:17 k8s-node01 kubelet[61743]: E0925 09:38:17.912704   61743 eviction_manager.go:357] eviction manager: eviction thresholds have been met, but no pods are active to evict
9月 25 09:38:18 k8s-node01 kubelet[61743]: E0925 09:38:18.328171   61743 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:413: Failed to list *v1.Service: Get https://localhost:6443/api/v1/services?resourceVersion=0: dial tcp 127.0.0.1:6443: getsockopt: connection refused
9月 25 09:38:18 k8s-node01 kubelet[61743]: E0925 09:38:18.329065   61743 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://localhost:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dk8s-node01&resourceVersion=0: dial tcp 127.0.0.1:6443: getsockopt: connection refused
9月 25 09:38:18 k8s-node01 kubelet[61743]: E0925 09:38:18.330004   61743 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:422: Failed to list *v1.Node: Get https://localhost:6443/api/v1/nodes?fieldSelector=metadata.name%3Dk8s-node01&resourceVersion=0: dial tcp 127.0.0.1:6443: getsockopt: connection refused
9月 25 09:38:19 k8s-node01 kubelet[61743]: E0925 09:38:19.329183   61743 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:413: Failed to list *v1.Service: Get https://localhost:6443/api/v1/services?resourceVersion=0: dial tcp 127.0.0.1:6443: getsockopt: connection refused
9月 25 09:38:19 k8s-node01 kubelet[61743]: E0925 09:38:19.329978   61743 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://localhost:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dk8s-node01&resourceVersion=0: dial tcp 127.0.0.1:6443: getsockopt: connection refused
9月 25 09:38:19 k8s-node01 kubelet[61743]: E0925 09:38:19.331145   61743 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:422: Failed to list *v1.Node: Get https://localhost:6443/api/v1/nodes?fieldSelector=metadata.name%3Dk8s-node01&resourceVersion=0: dial tcp 127.0.0.1:6443: getsockopt: connection refused

Unable to mount shared folder /vagrant => D:/WorkSpaceVMs/kubeadm-ansible

Vagrant was unable to mount VirtualBox shared folders. This is usually
because the filesystem "vboxsf" is not available. This filesystem is
made available via the VirtualBox Guest Additions and kernel module.
Please verify that these guest additions are properly installed in the
guest. This is not a bug in Vagrant and is usually caused by a faulty
Vagrant box. For context, the command attempted was:

mount -t vboxsf -o uid=1000,gid=1000 vagrant /vagrant

The error output from the command was:

/sbin/mount.vboxsf: mounting failed with the error: No such device

Set the Docker service to start on boot

Could the Docker service be set to start automatically on boot when it is installed? For example, if the server crashes or is rebooted, Docker currently does not start automatically.

Ansible task:

---
- name: Enable docker service
  service:
    name: docker
    enabled: yes
