cbws / etcd-operator

This project is forked from coreos/etcd-operator


etcd operator creates/configures/manages etcd clusters atop Kubernetes

Home Page: https://blog.cloudbear.nl/reviving-etcd-operator/

License: Apache License 2.0

Languages: Go 93.49%, Shell 5.24%, Dockerfile 1.27%
Topics: etcd, kubernetes, go, k8s-operator

etcd-operator's Introduction

etcd operator

unit/integration: Build Status

Project status: beta

Major planned features have been completed, and while no breaking API changes are currently planned, we reserve the right to address bugs and API changes in a backwards incompatible way before the project is declared stable. See upgrade guide for a safe upgrade process.

Currently, user-facing etcd cluster objects are created as Kubernetes Custom Resources; however, we plan to take advantage of User Aggregated API Servers to improve reliability, validation and versioning. The use of Aggregated APIs should be minimally disruptive to existing users but may change what Kubernetes objects are created or how users deploy the etcd operator.

We expect to consider the etcd operator stable soon; backwards incompatible changes will not be made once the project reaches stability.

Overview

The etcd operator manages etcd clusters deployed to Kubernetes and automates tasks related to operating an etcd cluster.

There are more spec examples on setting up clusters with different configurations.

Read Best Practices for more information on how to better use etcd operator.

Read RBAC docs for how to set up RBAC rules for the etcd operator if RBAC is in place; a minimal sketch follows below.

Read Developer Guide for setting up a development environment if you want to contribute.

See the Resources and Labels doc for an overview of the resources created by the etcd-operator.
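
As a minimal sketch of the kind of rules the RBAC doc walks through (the exact rule set, role names, namespace, and service account below are assumptions; follow the RBAC doc for the authoritative manifests):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: etcd-operator
rules:
# the operator manages its own custom resources
- apiGroups: ["etcd.database.coreos.com"]
  resources: ["etcdclusters", "etcdbackups", "etcdrestores"]
  verbs: ["*"]
# and registers the CRDs on first start
- apiGroups: ["apiextensions.k8s.io"]
  resources: ["customresourcedefinitions"]
  verbs: ["*"]
# core resources created for each etcd cluster
- apiGroups: [""]
  resources: ["pods", "services", "endpoints", "persistentvolumeclaims", "events"]
  verbs: ["*"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: etcd-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: etcd-operator
subjects:
- kind: ServiceAccount
  name: etcd-operator    # assumed service account
  namespace: default     # assumed namespace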

Requirements

  • Kubernetes 1.8+
  • etcd 3.2.13+

Demo

etcd Operator demo

Getting started

Deploy etcd operator

See instructions on how to install/uninstall the etcd operator.
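
As a rough sketch (not the authoritative manifest from the install guide), the operator typically runs as a single-replica Deployment; the image reference below is an assumption and should match the release you intend to deploy:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: etcd-operator
spec:
  replicas: 1
  selector:
    matchLabels:
      name: etcd-operator
  template:
    metadata:
      labels:
        name: etcd-operator
    spec:
      containers:
      - name: etcd-operator
        image: quay.io/coreos/etcd-operator:v0.9.4   # assumed image/tag
        command:
        - etcd-operator
        env:
        # the operator reads its own pod name/namespace via the downward API
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name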

Create and destroy an etcd cluster

$ kubectl create -f example/example-etcd-cluster.yaml

A 3 member etcd cluster will be created.

$ kubectl get pods
NAME                              READY     STATUS    RESTARTS   AGE
example-etcd-cluster-gxkmr9ql7z   1/1       Running   0          1m
example-etcd-cluster-m6g62x6mwc   1/1       Running   0          1m
example-etcd-cluster-rqk62l46kw   1/1       Running   0          1m

See client service for how to access etcd clusters created by the operator.
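
For example, from inside the cluster the members are reachable through the cluster's client service; a quick smoke test could look like the following (the `<cluster-name>-client` service name, image tag, and port are assumptions):

$ kubectl run --rm -it etcdctl --image=quay.io/coreos/etcd:v3.2.13 --restart=Never \
    --env="ETCDCTL_API=3" --command -- \
    etcdctl --endpoints=http://example-etcd-cluster-client:2379 put foo bar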

If you are working with minikube locally, create a nodePort service and test that etcd is responding:

$ kubectl create -f example/example-etcd-cluster-nodeport-service.json
$ export ETCDCTL_API=3
$ export ETCDCTL_ENDPOINTS=$(minikube service example-etcd-cluster-client-service --url)
$ etcdctl put foo bar

Destroy the etcd cluster:

$ kubectl delete -f example/example-etcd-cluster.yaml

Resize an etcd cluster

Create an etcd cluster:

$ kubectl apply -f example/example-etcd-cluster.yaml

In example/example-etcd-cluster.yaml the initial cluster size is 3. Modify the file and change size from 3 to 5.

$ cat example/example-etcd-cluster.yaml
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: "example-etcd-cluster"
spec:
  size: 5
  version: "v3.2.13"

Apply the size change to the cluster CR:

$ kubectl apply -f example/example-etcd-cluster.yaml

The etcd cluster will scale to 5 members (5 pods):

$ kubectl get pods
NAME                              READY     STATUS    RESTARTS   AGE
example-etcd-cluster-cl2gpqsmsw   1/1       Running   0          5m
example-etcd-cluster-cx2t6v8w78   1/1       Running   0          5m
example-etcd-cluster-gxkmr9ql7z   1/1       Running   0          7m
example-etcd-cluster-m6g62x6mwc   1/1       Running   0          7m
example-etcd-cluster-rqk62l46kw   1/1       Running   0          7m

Similarly, we can decrease the size of the cluster from 5 back to 3 by changing the size field again and reapplying the change.

$ cat example/example-etcd-cluster.yaml
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: "example-etcd-cluster"
spec:
  size: 3
  version: "v3.2.13"
$ kubectl apply -f example/example-etcd-cluster.yaml

We should see the etcd cluster eventually reduce to 3 pods:

$ kubectl get pods
NAME                              READY     STATUS    RESTARTS   AGE
example-etcd-cluster-cl2gpqsmsw   1/1       Running   0          6m
example-etcd-cluster-gxkmr9ql7z   1/1       Running   0          8m
example-etcd-cluster-rqk62l46kw   1/1       Running   0          9m

Failover

If a minority of etcd members crashes, the etcd operator will automatically recover from the failure. Let's walk through this in the following steps.

Create an etcd cluster:

$ kubectl create -f example/example-etcd-cluster.yaml

Wait until all three members are up. Simulate a member failure by deleting a pod:

$ kubectl delete pod example-etcd-cluster-cl2gpqsmsw --now

The etcd operator will recover from the failure by creating a new pod example-etcd-cluster-n4h66wtjrg:

$ kubectl get pods
NAME                              READY     STATUS    RESTARTS   AGE
example-etcd-cluster-gxkmr9ql7z   1/1       Running   0          10m
example-etcd-cluster-n4h66wtjrg   1/1       Running   0          26s
example-etcd-cluster-rqk62l46kw   1/1       Running   0          10m

Destroy the etcd cluster:

$ kubectl delete -f example/example-etcd-cluster.yaml

etcd operator recovery

Let's walk through operator recovery in the following steps.

Create an etcd cluster:

$ kubectl create -f example/example-etcd-cluster.yaml

Wait until all three members are up. Then stop the etcd operator and delete one of the etcd pods:

$ kubectl delete -f example/deployment.yaml
deployment "etcd-operator" deleted

$ kubectl delete pod example-etcd-cluster-8gttjl679c --now
pod "example-etcd-cluster-8gttjl679c" deleted

Next, restart the etcd operator. It should recover itself and the etcd clusters it manages.

$ kubectl create -f example/deployment.yaml
deployment "etcd-operator" created

$ kubectl get pods
NAME                              READY     STATUS    RESTARTS   AGE
example-etcd-cluster-m8gk76l4ns   1/1       Running   0          3m
example-etcd-cluster-q6mff85hml   1/1       Running   0          3m
example-etcd-cluster-xnfvm7lg66   1/1       Running   0          11s

Upgrade an etcd cluster

Have the following YAML file ready:

$ cat upgrade-example.yaml
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: "example-etcd-cluster"
spec:
  size: 3
  version: "3.1.10"
  repository: "quay.io/coreos/etcd"

Create an etcd cluster with the version specified (3.1.10) in the yaml file:

$ kubectl apply -f upgrade-example.yaml
$ kubectl get pods
NAME                              READY     STATUS    RESTARTS   AGE
example-etcd-cluster-795649v9kq   1/1       Running   1          3m
example-etcd-cluster-jtp447ggnq   1/1       Running   1          4m
example-etcd-cluster-psw7sf2hhr   1/1       Running   1          4m

The container image version should be 3.1.10:

$ kubectl get pod example-etcd-cluster-795649v9kq -o yaml | grep "image:" | uniq
    image: quay.io/coreos/etcd:v3.1.10

Now modify the file upgrade-example.yaml and change the version from 3.1.10 to v3.2.13:

$ cat upgrade-example.yaml
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: "example-etcd-cluster"
spec:
  size: 3
  version: "v3.2.13"

Apply the version change to the cluster CR:

$ kubectl apply -f upgrade-example.yaml

Wait ~30 seconds. The container image version should be updated to v3.2.13:

$ kubectl get pod example-etcd-cluster-795649v9kq -o yaml | grep "image:" | uniq
    image: gcr.io/etcd-development/etcd:v3.2.13

Check the other two pods and you should see the same result.

Backup and Restore an etcd cluster

Note: The provided etcd backup/restore operators are example implementations.

Follow the etcd backup operator walkthrough to back up an etcd cluster.

Follow the etcd restore operator walkthrough to restore an etcd cluster on Kubernetes from backup.
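
As a rough sketch of the shape of a backup custom resource (field names mirror the kubectl describe EtcdBackup output quoted in the issues section below; all values are placeholders):

apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdBackup"
metadata:
  name: example-etcd-cluster-backup
spec:
  etcdEndpoints:
  - http://example-etcd-cluster-client:2379
  storageType: ABS                          # Azure Blob Storage; the walkthroughs also cover S3
  backupPolicy:
    backupIntervalInSecond: 1200            # periodic backup interval (seconds)
  abs:
    path: mycontainer/etcd.backup           # placeholder container/path
    absSecret: storage-account-credentials  # placeholder secret holding storage credentials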

Manage etcd clusters in all namespaces

See instructions on the clusterwide feature.
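
As a hedged sketch (the annotation name below is an assumption; confirm it against the clusterwide doc): the operator is started in cluster-wide mode, and each EtcdCluster that should be managed across namespaces opts in via an annotation, e.g.:

apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: "example-etcd-cluster"
  annotations:
    etcd.database.coreos.com/scope: clusterwide   # assumed annotation
spec:
  size: 3
  version: "v3.2.13"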

etcd-operator's People

Contributors

alaypatel07, avorima, bai, colhom, dependabot-preview[bot], dependabot[bot], fanminshi, feiskyer, flyer103, hasbro17, hexfusion, hongchaodeng, jamiehannaford, kapouille, lander2k2, marlinc, meyskens, narayanan, rafatio, rjtsdl, seeekr, semantic-release-bot, seth-miller, sgotti, soleblaze, tvainutis, vdice, wouter0100, xiang90, zbwright


etcd-operator's Issues

Add priorityClassName support

We would like to add a priorityClassName to etcd pods created by the operator, but it does not appear to be supported. We could contribute it, but before doing so I'm wondering what the sentiment is. Is this a feature you would be willing to accept?
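
For context, a hypothetical sketch of what this could look like if the cluster spec's pod policy grew a priorityClassName field (the field does not exist today; the PriorityClass object itself is standard Kubernetes):

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: etcd-critical
value: 1000000
globalDefault: false
description: "Priority class for etcd pods"
---
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: "example-etcd-cluster"
spec:
  size: 3
  version: "v3.2.13"
  pod:
    priorityClassName: etcd-critical   # hypothetical field, not currently supported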

etcd restore from db backup of another cluster

Hello, I'm trying to figure out how I could restore a backup from an existing etcd cluster into this operator.

We have a running etcd cluster from bitnami and a db backup created with:

kubectl exec -it etcd-0 -- etcdctl snapshot restore /tmp/db --name etcd-0 \
  --initial-cluster etcd-0=http://etcd-0.etcd-headless.default.svc.cluster.local:2380,etcd-1=http://etcd-1.etcd-headless.default.svc.cluster.local:2380,etcd-2=http://etcd-2.etcd-headless.default.svc.cluster.local:2380 \
  --initial-cluster-token etcd-cluster-k8s \
  --initial-advertise-peer-urls http://etcd-0.etcd-headless.default.svc.cluster.local:2380

I can't seem to get the new cluster to accept the old configs though. Can someone suggest a solution?

etcd backup to s3 crashing the operator

Hello, I'm currently testing the etcd operator for our production environment and running into a few issues I'm not able to resolve.
I'm currently testing backup and restore, but the backup_cr I create crashes the etcd-backup-operator.

I'm trying to back up the cluster to an S3-compatible storage service called minio.

I tried to use the new ForcePathStyle option introduced in release 0.9.4, but I'm not sure if I'm using it properly.
I also had to modify the etcd-operator manifests, since our Kubernetes version (1.17) no longer serves the extensions/v1beta1 API.

I compiled a gist of all the settings I'm using.

https://gist.github.com/haroonb/e807d4cdb65270c0051c2645e3e7f7d4
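
For reference, a hedged sketch of how an EtcdBackup CR might point the S3 source at an S3-compatible endpoint such as minio, assuming the s3 section exposes endpoint and forcePathStyle fields (field names are assumptions; check the 0.9.4 API types):

apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdBackup"
metadata:
  name: example-etcd-cluster-backup
spec:
  etcdEndpoints:
  - http://example-etcd-cluster-client:2379
  storageType: S3
  s3:
    path: mybucket/etcd.backup              # placeholder bucket/key
    awsSecret: minio-credentials            # placeholder secret with S3 credentials/config
    endpoint: http://minio.minio.svc:9000   # assumed field for custom S3 endpoints
    forcePathStyle: true                    # assumed field added in 0.9.4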

Released docker image requires auth

First off, thanks for taking the initiative on this. We rely on etcd and wanted to help improve the operator but were put off by lack of maintenance on the upstream repo.

I just tried switching to the fork in our CI environment but it fails on:

$ docker pull docker.pkg.github.com/cbws/etcd-operator/operator:v0.10.0
Error response from daemon: Get https://docker.pkg.github.com/v2/cbws/etcd-operator/operator/manifests/v0.10.0: no basic auth credentials

Apparently this is a wont-fix: https://github.community/t5/GitHub-Actions/docker-pull-from-public-GitHub-Package-Registry-fail-with-quot/m-p/32888#M1294

Do you have any plans to push to a public repo (Docker Hub would do), or would you accept a patch?

Thanks

Instructions to deploy cbws / etcd-operator

First of all, thanks a lot for maintaining etcd-operator 🎉

I already see a lot of improvements compared to the original repo and want to switch to using this one. However, I couldn't find instructions for installation. The existing instructions seem to still install coreos/etcd-operator.

I am especially interested in deploying via the Helm chart and noticed that the existing chart accepts the operator image as a parameter. However, how can we ensure that the deployment templates are still compatible?

Is there a way to inject environment variables into the etcd containers?

I'm trying to run this on arm, and while I'm able to get the operator running, the pods always fail with

$ kubectl -n etcd logs -l app=etcd
etcd on unsupported platform without ETCD_UNSUPPORTED_ARCH=arm64 set

(side note: setting the image/tag isn't working, but if you set the "version" to the quay.io tag you want, it works)

And although I've tried a few different things, I cannot for the life of me figure out how to inject the environment variable I need (other than to perhaps create a new image based on that image and just run that locally, which I'd REALLY rather not do).
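
For what it's worth, a hedged sketch of one possible route, assuming the cluster spec's pod policy exposes an etcdEnv list of standard Kubernetes EnvVar entries (verify against the operator's API types before relying on this):

apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: "example-etcd-cluster"
spec:
  size: 3
  version: "v3.2.13"
  pod:
    etcdEnv:                        # assumed PodPolicy field
    - name: ETCD_UNSUPPORTED_ARCH
      value: arm64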

Urgent: Restore is broken due to missing image `tutum/curl`

The restore operation relies on a now non-existent image, tutum/curl. Note that the image reference is hardcoded. This issue is likely to exacerbate an outage, since one must fork, fix, and publish the operator at the worst possible time.

Regarding the fix, a reasonable alternative would be bitnami/bitnami-shell:10 (ref).

Etcd backup operator seems to miss schedules if the operator pod/container is restarted

Environment:

K8s is running within Azure.
We have set up a 3-node etcd cluster and configured 3 backups (hourly, daily, weekly), backing up directly to Azure blob storage.

What is observed:
Looking at the backup history in Azure, there are gaps in the backup cycle. These gaps are mostly visible with longer backup cycles.

When I looked at the etcd-backup-operator pod logs, there were multiple restart events within the timeframe of the missing backups. If I understood correctly, the restarts were happening due to etcd leader election or something like that.

To validate my suspicions, I set up the following script to kill the backup operator pod (and later only the container) via cron every 10 minutes, and set the backup interval to every 20 minutes. As a result, no backup has completed since 04:39 UTC, when I started the experiment. After 6 restarts the pod got into an Error state. I will continue with a less aggressive restart cron schedule to see if that has an impact.

Expected result:

Backups happen according to the schedule regardless of container restarts. The schedule timer should not be tied to the container lifetime, as the container may die at any time. Or is this expected behavior due to the way Kubernetes works?

Script:

#!/bin/bash

cd /root
# log a timestamp, then send signal 5 (SIGTRAP) to PID 1 of the backup operator container;
# redirect to the log file first so stderr also ends up in the log
date +"%Y %m %d - %H:%M" >> kill-operator.log 2>&1
/usr/local/bin/kubectl -n tep-k8s-test-01 exec -c etcd-backup-operator $(/usr/local/bin/kubectl -n tep-k8s-test-01 get po -l name=etcd-backup-operator -o name) -- /bin/kill -5 1 >> kill-operator.log 2>&1
echo "----" >> kill-operator.log 2>&1

Edited backup schedule:

root@atl-cj1-m-ducx:~# kubectl  -n tep-k8s-test-01 describe  EtcdBackup etcd-cluster-backup-weekly
Name:         etcd-cluster-backup-weekly
Namespace:    tep-k8s-test-01
Labels:       <none>
Annotations:  <none>
API Version:  etcd.database.coreos.com/v1beta2
Kind:         EtcdBackup
Metadata:
  Creation Timestamp:  2020-01-15T07:54:50Z
  Finalizers:
    backup-operator-periodic
  Generation:        145
  Resource Version:  81580419
  Self Link:         /apis/etcd.database.coreos.com/v1beta2/namespaces/tep-k8s-test-01/etcdbackups/etcd-cluster-backup-weekly
  UID:               7dd4c2a7-e1e0-4fe1-ae04-100be7ff6d65
Spec:
  Abs:
    Abs Secret:  storage-account-credentials-weekly
    Path:        tep-k8s-test-01/etcd.backup
  Backup Policy:
    Backup Interval In Second:  1200
  Etcd Endpoints:
    http://etcd-cluster-client:2379
  Storage Type:  ABS
Status:
  Etcd Revision:      1098811
  Etcd Version:       3.4.3
  Last Success Date:  2020-01-27T04:39:09Z
  Succeeded:          true
Events:               <none>
root@atl-cj1-m-ducx:~# date
Mon Jan 27 09:05:37 UTC 2020

I am re-posting my colleague's issue from the original repo: etcd-operator #2152

Reimplement automatic disaster recovery

The automatic disaster recovery was removed in coreos#1629 to help split the backup/restore operators from the main operator. However, it was never restored, so we should re-add automatic disaster recovery.

Detection of failed clusters is required for this to be implemented. Detection of loss of quorum and of all pods being dead is covered in #29.
