Comments (10)
@feikesteenbergen thanks for that tidbit... Today I was able to get more information (yesterday I was more focused on getting a workaround out to unblock our production).
I was remiss in not pasting/checking tiller logs when helm was stuck. Today, as I did the helm upgrade described in the ticket, I checked to see what tiller was waiting on, and sure enough, it complains that the "backup" pods are not in a "ready" state (which they aren't because the backup job terminates the pods upon completion):
$ kubectl logs tiller-deploy-xxx -n yyy
...
[kube] 2020/04/14 16:36:49 Pod is not ready: <namespace>/timescaledb-xyz-full-weekly-1586657520-bgspt
[tiller] 2020/04/14 16:36:49 warning: Upgrade "timescaledb-xyz" failed: timed out waiting for the condition
Deleting each of the pods it complained about (full-weekly and incremental-daily) resolved the issue. Before I deleted them, the jobs and their pods looked like they had run fine (though the pods are obviously in a not-ready state, since they are "Completed"):
Info on Backup Jobs and Pods
$ kubectl get cronjob,job -l release=timescaledb-live -n xyz
...
NAME COMPLETIONS DURATION AGE
job.batch/timescaledb-xyz-full-weekly-1586657520 1/1 3s 2d17h
job.batch/timescaledb-xyz-incremental-daily-1586571120 1/1 5s 3d17h
job.batch/timescaledb-xyz-incremental-daily-1586743920 1/1 3s 41h
job.batch/timescaledb-xyz-incremental-daily-1586830320 1/1 5s 17h
$ kubectl get pods...
...
timescaledb-xyz-full-weekly-1586657520-bgspt 0/1 Completed 0 2d16h
<similar for a few incremental-daily>
How Helm Deployed Successfully
While helm was upgrading today, and before it timed out, I deleted the weekly and daily backup pods (kubectl delete po timescaledb-xyz-full-weekly-1586657520-bgspt, along with the incremental-daily pods), and helm finally got the release into a "Deployed" state (rather than "Failed").
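The manual cleanup described here could be scripted so that all completed backup pods for a release are removed in one step before the upgrade. A minimal sketch, assuming the backup pods carry the same release=<name> label used in the kubectl get commands in this comment; the function name delete_completed_pods and the KUBECTL override variable are made up for illustration:

```shell
# Delete pods that have finished successfully (shown as "Completed" by
# kubectl get pods, i.e. status.phase is Succeeded) for one label selector.
# Assumption: the label selector matches the backup pods and nothing that
# should be kept. KUBECTL is a hypothetical override for dry runs.
delete_completed_pods() {
  namespace="$1"
  selector="$2"
  "${KUBECTL:-kubectl}" delete pods -n "$namespace" -l "$selector" \
    --field-selector=status.phase==Succeeded
}
```

For example, delete_completed_pods xyz release=timescaledb-xyz would be run just before helm upgrade. Setting KUBECTL=echo prints the command instead of running it.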
Questions:
- Any reason the helm upgrade has tiller complaining about those completed backup pods? It doesn't seem to complain in our test cluster (the only difference being that it uses the default namespace).
- I'll wait until the next incremental backup job runs to see if I can reproduce this by doing an upgrade again. In the meantime, any ideas?
from helm-charts.
Not sure if the fact this error occurs on clusters where it's deployed on a non-default namespace is related.
We deploy them in different namespaces pretty much always and don't run into this, so that's not the issue.
We'll try to investigate. Using a Job inside a Helm upgrade path might not be the best idea: it can fail, and I don't think failure of the Job is covered very well in these Helm charts.
The whole purpose of the Job is to do exactly what you described: up the connections from say 100 to 120, without having to restart the pods.
from helm-charts.
OK, ☝️ reproducible in my environment... I disabled and then re-enabled backup:
- The backup job was started and completed:
$ kubectl get cronjob,job -l release=timescaledb-xyz -n xyz
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
cronjob.batch/timescaledb-xyz-full-weekly 12 02 * * 0 False 0 <none> 7h
cronjob.batch/timescaledb-xyz-incremental-daily 12 02 * * 1-6 False 0 57m 7h
NAME COMPLETIONS DURATION AGE
job.batch/timescaledb-xyz-incremental-daily-1587003120 1/1 5s 57m
## And the Pod
$ kubectl get pods -n xyz
...
timescaledb-xyz-incremental-daily-1587003120-x2rwg 0/1 Completed 0 57m
...
- Did a helm upgrade ... by changing a parameter. It was stuck in "PENDING_UPGRADE", so we checked the tiller log:
$ kubectl logs tiller-deploy-6dbd4749b6-rxqx5 -n kube-system
[kube] 2020/04/16 03:00:39 beginning wait for 15 resources with timeout of 5m0s
[kube] 2020/04/16 03:00:41 Pod is not ready: xyz/timescaledb-xyz-incremental-daily-1587003120-x2rwg
[kube] 2020/04/16 03:00:43 Pod is not ready: xyz/timescaledb-xyz-incremental-daily-1587003120-x2rwg
[kube] 2020/04/16 03:00:45 Pod is not ready: xyz/timescaledb-xyz-incremental-daily-1587003120-x2rwg
- It will stay like this until I delete the pod timescaledb-xyz-incremental-daily-1587003120-x2rwg, at which point it upgrades successfully.
Any clues how we can upgrade parameters without it "waiting" on the already terminated backup job pods?
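To see at a glance which pods the wait is blocked on, the repeated "Pod is not ready" lines in the tiller log can be reduced to a unique list. A small sketch that only assumes the log format shown above; the function name pods_blocking_wait is made up for illustration:

```shell
# Read tiller log text on stdin and print each namespace/pod that was
# reported as "not ready", once per pod.
pods_blocking_wait() {
  sed -n 's/^.*Pod is not ready: //p' | sort -u
}
```

It would be fed from the log command already used in this comment, for example kubectl logs tiller-deploy-6dbd4749b6-rxqx5 -n kube-system | pods_blocking_wait.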
from helm-charts.
I've updated the title to reflect the root cause of the "PENDING_UPGRADE". @feikesteenbergen or anyone else in the timescale community: any thoughts on how we can get around this without manually deleting pods before the helm upgrade? Or are we missing something in the above?
from helm-charts.
I'll have a look; I think it may be an issue with labelling: the backup jobs may be getting labels that Helm interprets as saying these jobs belong to Helm (they don't).
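One way to check this hypothesis is to print the labels on the backup jobs and their pods and compare them with the labels on the resources the chart itself creates. A minimal sketch; the function name show_backup_labels and the KUBECTL override are made up for illustration, and the label selector is an assumption taken from the commands earlier in the thread:

```shell
# List jobs and pods matching a label selector together with all of their
# labels, so labels inherited from the release are visible.
# KUBECTL is a hypothetical override for dry runs.
show_backup_labels() {
  namespace="$1"
  selector="$2"
  "${KUBECTL:-kubectl}" get jobs,pods -n "$namespace" -l "$selector" --show-labels
}
```

For example: show_backup_labels xyz release=timescaledb-xyz.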
from helm-charts.
Somehow I cannot reproduce this; I tried with both helm 2 and helm 3.
Could you give more information:
- what is your helm version
- what is your charts version
- what parameter(s) do you change during the upgrade
Regardless of this, I'm going to be removing some labels from the jobs and the pods that are scheduled, because I think they are not useful and may actually be the cause here.
from helm-charts.
This is (probably) fixed with https://github.com/timescale/timescaledb-kubernetes/releases/tag/v0.6.1.
However, I could not reproduce it locally, so it would be great if you could verify whether or not it is fixed. @schahal
from helm-charts.
We'll give it a shot in the coming weeks! We have to schedule an activity to do the 0.5 -> 0.6 upgrade because clusters we can currently reproduce this on are being used. I'll update this ticket when complete.
from helm-charts.
Feel free to reopen if the issue persists, as it should have been fixed.
from helm-charts.
Facing exactly the same issue on a big chart I'm building. Very hard to reproduce as well :-(
Workaround is to delete the Completed pod that helm is waiting for.
from helm-charts.