cbws / etcd-operator


This project forked from coreos/etcd-operator


etcd operator creates/configures/manages etcd clusters atop Kubernetes

Home Page: https://blog.cloudbear.nl/reviving-etcd-operator/

License: Apache License 2.0

Languages: Go 93.49%, Shell 5.24%, Dockerfile 1.27%

Topics: etcd, kubernetes, go, k8s-operator

etcd-operator's Issues

Setup E2E tests using GitHub Actions

Trying to get this up in #19

etcd backup restore using persistent volume as local storage

This is the same issue as coreos#2107 in the original repo. Since I am not sure whether PVs work well with the operator, I was hoping to at least have the ability to back up to and restore from local storage.

BTW, I appreciate the effort to get this operator back on track.

tmpfs sizeLimit option for etcd-data volume

Along with enabling tmpfs for the etcd data volume, it would be good to add a sizeLimit option to cap the pod's memory usage.

Operator crashes when pod policy not defined

While testing the operator with the e2e test suite, I noticed that the operator currently crashes when no pod policy has been defined for a cluster. This was caused by the merge of coreos#2103.
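A crash like this is typically a nil-pointer dereference on an optional pointer field. A minimal sketch of the defensive pattern, assuming the pod policy is an optional `*PodPolicy` on the cluster spec; the types and the `nodeSelector` helper are simplified, hypothetical stand-ins, not the operator's actual code.

```go
package main

import "fmt"

// PodPolicy and ClusterSpec are simplified stand-ins for the CRD types.
type PodPolicy struct {
	NodeSelector map[string]string
}

type ClusterSpec struct {
	Size int
	Pod  *PodPolicy // optional: nil when the user omits the pod policy
}

// nodeSelector reads an optional field without dereferencing a nil Pod.
func nodeSelector(spec ClusterSpec) map[string]string {
	if spec.Pod == nil {
		return nil
	}
	return spec.Pod.NodeSelector
}

func main() {
	spec := ClusterSpec{Size: 3} // no pod policy defined
	fmt.Println(len(nodeSelector(spec)))
}
```

Defaulting `Pod` to `&PodPolicy{}` once during spec validation is an alternative that keeps nil checks out of the rest of the code.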

etcd-operator not working on k8s 1.21

The operator stopped working on k8s 1.21 because the init container could not run successfully.

Created an MR for the fix here: #379

Is there a way to inject environment variables into the etcd containers?

I'm trying to run this on ARM, and while I can get the operator running, the etcd pods always fail with:

$ kubectl -n etcd logs -l app=etcd
etcd on unsupported platform without ETCD_UNSUPPORTED_ARCH=arm64 set

(Side note: setting the image/tag isn't working, but if you set "version" to the quay.io tag you want, it works.)

Although I've tried a few different things, I cannot for the life of me figure out how to inject the environment variable I need (other than perhaps building a new image based on that image and running it locally, which I'd REALLY rather not do).

Instructions to deploy cbws / etcd-operator

First of all, thanks a lot for maintaining etcd-operator 🎉

I already see a lot of improvements compared to the original repo and want to switch to this one. However, I couldn't find installation instructions; the existing instructions still seem to install coreos/etcd-operator.

I am especially interested in deploying via the Helm chart. I noticed that the existing chart accepts the operator image as a parameter; however, how can we make sure the deployment templates are still compatible?

Add replacement nodes as learner nodes

etcd-io/etcd#9161
etcd-io/etcd#10537
etcd-io/etcd#10913
https://kubernetes.io/blog/2019/08/30/announcing-etcd-3-4/

Released docker image requires auth

First off, thanks for taking the initiative on this. We rely on etcd and wanted to help improve the operator but were put off by lack of maintenance on the upstream repo.

I just tried switching to the fork in our CI environment but it fails on:

$ docker pull docker.pkg.github.com/cbws/etcd-operator/operator:v0.10.0
Error response from daemon: Get https://docker.pkg.github.com/v2/cbws/etcd-operator/operator/manifests/v0.10.0: no basic auth credentials

Apparently this is a won't-fix: https://github.community/t5/GitHub-Actions/docker-pull-from-public-GitHub-Package-Registry-fail-with-quot/m-p/32888#M1294

Do you have any plans to push the image to a public registry (Docker Hub would do), or would you accept a patch?

Thanks

Add priorityClassName support

We would like to add a priorityClassName to etcd pods created by the operator, but it looks unsupported. We could contribute it but, before doing it, I'm wondering what's the sentiment. Is it a feature you may be willing to accept?
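`priorityClassName` is a standard field of the Kubernetes pod spec, so the change would mostly be plumbing. A minimal sketch, assuming a hypothetical `PriorityClassName` field on a simplified pod policy that is copied onto each etcd pod spec; the types, `applyPodPolicy`, and the class name are illustrative stand-ins, not the operator's code.

```go
package main

import "fmt"

// PodPolicy is a simplified stand-in; PriorityClassName is the proposed field.
type PodPolicy struct {
	PriorityClassName string
}

// PodSpec mirrors only the pod spec field used here.
type PodSpec struct {
	PriorityClassName string
}

// applyPodPolicy copies the priority class onto the pod spec when one is set,
// leaving the cluster default in place otherwise (including a nil policy).
func applyPodPolicy(spec *PodSpec, policy *PodPolicy) {
	if policy == nil || policy.PriorityClassName == "" {
		return
	}
	spec.PriorityClassName = policy.PriorityClassName
}

func main() {
	var spec PodSpec
	applyPodPolicy(&spec, &PodPolicy{PriorityClassName: "etcd-critical"})
	fmt.Println(spec.PriorityClassName) // etcd-critical
}
```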

etcd backup to s3 crashing the operator

Hello, I'm currently testing the etcd operator for our production environment and running into a few issues I'm not able to resolve.
I'm currently testing backup and restore, but the backup_cr I create crashes the etcd-backup-operator.

I'm trying to back up the cluster to an S3-compatible storage service called MinIO.

I tried to use the new ForcePathStyle option introduced in release 0.9.4, but I'm not sure I'm using it properly.
I also had to modify the etcd-operator manifests, since our Kubernetes version (1.17) no longer supports extensions/v1beta1 Deployments.

I compiled a gist of all the settings I'm using.

https://gist.github.com/haroonb/e807d4cdb65270c0051c2645e3e7f7d4
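For MinIO and similar S3-compatible stores, `ForcePathStyle` matters because of how object URLs are addressed: virtual-hosted style puts the bucket in the hostname (`http://bucket.host/key`), while path style puts it in the path (`http://host/bucket/key`), which typically works without wildcard DNS. A minimal sketch of the two forms; `objectURL` is a hypothetical helper and the endpoint and bucket names are placeholders.

```go
package main

import (
	"fmt"
	"net/url"
)

// objectURL builds the URL of an object in an S3-compatible store.
// forcePathStyle=true puts the bucket in the path instead of the hostname.
func objectURL(endpoint, bucket, key string, forcePathStyle bool) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", err
	}
	if forcePathStyle {
		u.Path = "/" + bucket + "/" + key
	} else {
		u.Host = bucket + "." + u.Host
		u.Path = "/" + key
	}
	return u.String(), nil
}

func main() {
	for _, force := range []bool{true, false} {
		s, err := objectURL("http://minio.example:9000", "etcd-backups", "cluster/etcd.backup", force)
		if err != nil {
			panic(err)
		}
		fmt.Println(s)
	}
}
```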

Etcd backup operator seems to miss its schedule if the operator pod/container is restarted

Environment:

K8s is running within Azure.
We have set up a 3-node etcd cluster and 3 backups (hourly, daily, weekly) that write directly to Azure Blob Storage.

What is observed:
Looking at the backup history in Azure, there are gaps in the backup cycle. These gaps are most visible with the longer backup cycles.

When I looked at the etcd-backup-operator pod logs, there were multiple restart events within the timeframe of the missing backups. If I understood correctly, the restarts happened due to etcd leader election or something similar.

To validate my suspicion, I set up the following script to kill the backup operator pod (and later only the container) and scheduled it via cron to run every 10 minutes, with the backup interval set to 20 minutes. As a result, no backup has been taken since 04:39 UTC, when I started the experiment. After 6 restarts the pod went into an Error state. I will continue with a less aggressive cron schedule to see whether that has an impact.

Expected result:

The backup happens according to the schedule regardless of container restarts. The schedule timer should not be tied to the container lifetime, as a container may die at any time. Or is this behavior a consequence of the way Kubernetes works?

Script:

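#!/bin/bash
# Log a timestamp, then send signal 5 (SIGTRAP on Linux) to PID 1 in the backup operator container.
cd /root || exit 1
date +"%Y %m %d - %H:%M" >> kill-operator.log 2>&1
# Redirect stdout first, then stderr, so kubectl errors also land in the log.
/usr/local/bin/kubectl -n tep-k8s-test-01 exec -c etcd-backup-operator "$(/usr/local/bin/kubectl -n tep-k8s-test-01 get po -l name=etcd-backup-operator -o name)" -- /bin/kill -5 1 >> kill-operator.log 2>&1
echo "----" >> kill-operator.log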

Edited backup schedule:

root@atl-cj1-m-ducx:~# kubectl  -n tep-k8s-test-01 describe  EtcdBackup etcd-cluster-backup-weekly
Name:         etcd-cluster-backup-weekly
Namespace:    tep-k8s-test-01
Labels:       <none>
Annotations:  <none>
API Version:  etcd.database.coreos.com/v1beta2
Kind:         EtcdBackup
Metadata:
  Creation Timestamp:  2020-01-15T07:54:50Z
  Finalizers:
    backup-operator-periodic
  Generation:        145
  Resource Version:  81580419
  Self Link:         /apis/etcd.database.coreos.com/v1beta2/namespaces/tep-k8s-test-01/etcdbackups/etcd-cluster-backup-weekly
  UID:               7dd4c2a7-e1e0-4fe1-ae04-100be7ff6d65
Spec:
  Abs:
    Abs Secret:  storage-account-credentials-weekly
    Path:        tep-k8s-test-01/etcd.backup
  Backup Policy:
    Backup Interval In Second:  1200
  Etcd Endpoints:
    http://etcd-cluster-client:2379
  Storage Type:  ABS
Status:
  Etcd Revision:      1098811
  Etcd Version:       3.4.3
  Last Success Date:  2020-01-27T04:39:09Z
  Succeeded:          true
Events:               <none>
root@atl-cj1-m-ducx:~# date
Mon Jan 27 09:05:37 UTC 2020

I am re-posting my colleague's issue from the original repo: etcd-operator #2152

Urgent: Restore is broken due to missing image `tutum/curl`

The restore operation relies on a now non-existent image, tutum/curl. Note that the image reference is hardcoded. This issue is likely to exacerbate an outage, since one must fork, fix, and publish the operator at the worst possible time.

Regarding the fix, a reasonable alternative would be bitnami/bitnami-shell:10 (ref).

etcd restore from db backup of another cluster

Hello, I'm trying to figure out how to restore a backup from an existing etcd cluster into this operator.

We have an etcd cluster from Bitnami running, and a db backup created with
kubectl exec -it etcd-0 -- etcdctl snapshot restore /tmp/db --name etcd-0 --initial-cluster etcd-0=http://etcd-0.etcd-headless.default.svc.cluster.local:2380,etcd-1=http://etcd-1.etcd-headless.default.svc.cluster.local:2380,etcd-2=http://etcd-2.etcd-headless.default.svc.cluster.local:2380 --initial-cluster-token etcd-cluster-k8s --initial-advertise-peer-urls http://etcd-0.etcd-headless.default.svc.cluster.local:2380

I can't seem to get the new cluster to accept the old configs though. Can someone suggest a solution?
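The `--initial-cluster` flag in the command above is a comma-separated list of `name=peerURL` pairs describing the members of the cluster being restored into. A minimal sketch that builds that value from member names, assuming the Bitnami headless-service DNS pattern used above; `initialCluster` and its parameters are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// initialCluster builds an etcd --initial-cluster value of the form
// name=peerURL,name=peerURL,... from a headless-service DNS pattern.
func initialCluster(names []string, service, namespace string, peerPort int) string {
	pairs := make([]string, 0, len(names))
	for _, n := range names {
		pairs = append(pairs, fmt.Sprintf("%s=http://%s.%s.%s.svc.cluster.local:%d", n, n, service, namespace, peerPort))
	}
	return strings.Join(pairs, ",")
}

func main() {
	fmt.Println(initialCluster([]string{"etcd-0", "etcd-1", "etcd-2"}, "etcd-headless", "default", 2380))
}
```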

Reimplement automatic disaster recovery

The automatic disaster recovery was removed in coreos#1629 to help split the backup/restore operators from the main operator. It was never restored, however; we should re-add automatic disaster recovery.

Detection of failed clusters is required for this to be implemented. Detection of loss of quorum and of all pods being dead is covered in #29.
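Loss of quorum is simple to state: a cluster of n voting members needs a majority, floor(n/2) + 1, of them healthy to make progress, so a 3-member cluster tolerates one failure and a 5-member cluster two. A minimal sketch of the check a detector could apply; `quorumSize` and `hasQuorum` are hypothetical helpers, and gathering the healthy-member count (for example via endpoint health checks) is left out.

```go
package main

import "fmt"

// quorumSize is the majority needed by a cluster of n voting members.
func quorumSize(n int) int { return n/2 + 1 }

// hasQuorum reports whether a majority of the size voting members is healthy;
// without one, the cluster cannot commit writes.
func hasQuorum(healthy, size int) bool {
	return size > 0 && healthy >= quorumSize(size)
}

func main() {
	for _, h := range []int{3, 2, 1, 0} {
		fmt.Printf("3-member cluster, %d healthy: quorum=%v\n", h, hasQuorum(h, 3))
	}
}
```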

