Deion We had an existing timescale release. We decided to up


Upgrading helm release causes stuck at "PENDING_UPGRADE" due to backup job pods (which are completed, so not in READY state)

timescale commented on June 5, 2024

Upgrading helm release causes stuck at "PENDING_UPGRADE" due to backup job pods (which are completed, so not in READY state)

from helm-charts.

Comments (10)

schahal commented on June 5, 2024 1

@feikesteenbergen thanks for that tidbit... So today I was able to get more information (yesterday I was more focused on getting a workaround out to unblock our production).

I was remiss in not pasting/checking tiller logs when helm was stuck. Today, as I did the helm upgrade described in the ticket, I checked to see what tiller was waiting on, and sure enough, it complains that the "backup" pods are not in a "ready" state (which they aren't because the backup job terminates the pods upon completion):

$  kubectl logs tiller-deploy-xxx -n yyy
...
[kube] 2020/04/14 16:36:49 Pod is not ready: <namespace>/timescaledb-xyz-full-weekly-1586657520-bgspt
[tiller] 2020/04/14 16:36:49 warning: Upgrade "timescaledb-xyz" failed: timed out waiting for the condition
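
To see at a glance which pods Tiller is blocked on, the repeated "Pod is not ready" lines can be reduced to a unique list. This is a minimal sketch that assumes the log format shown above; `not_ready_pods` is a hypothetical helper name, not part of helm or kubectl.

```shell
# Hypothetical helper: reads a Tiller log on stdin and prints each
# <namespace>/<pod> that appears in a "Pod is not ready" line, once.
not_ready_pods() {
  sed -n 's/^.*Pod is not ready: //p' | sort -u
}

# Possible usage (deployment name and namespace are placeholders from the thread):
#   kubectl logs tiller-deploy-xxx -n yyy | not_ready_pods
```

Anything the filter prints is a pod the wait loop keeps polling, so a pod that shows up here with status "Completed" is a candidate for the manual deletion described below.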

Deleting each of the pods it complained about (full-weekly and incremental-daily) resolved the issue. Before I deleted the pods, the jobs and their pods looked like they had run fine (though obviously not in a Ready state, because they had "Completed"):

Info on Backup Jobs and Pods

$ kubectl get cronjob,job -l release=timescaledb-live -n xyz
...
NAME                                                      COMPLETIONS   DURATION   AGE
job.batch/timescaledb-xyz-full-weekly-1586657520         1/1           3s         2d17h
job.batch/timescaledb-xyz-incremental-daily-1586571120   1/1           5s         3d17h
job.batch/timescaledb-xyz-incremental-daily-1586743920   1/1           3s         41h
job.batch/timescaledb-xyz-incremental-daily-1586830320   1/1           5s         17h

$ kubectl get pods...
...
timescaledb-xyz-full-weekly-1586657520-bgspt              0/1     Completed   0          2d16h
<similar for a few incremental-daily>
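
Spotting the completed pods in a longer listing is easier with a small filter. The sketch below assumes the default `kubectl get pods` column layout (NAME, READY, STATUS, ...) seen in the listing above; `completed_pods` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and prints the names of pods whose STATUS column is "Completed".
completed_pods() {
  awk '$3 == "Completed" { print $1 }'
}

# Possible usage:
#   kubectl get pods -n xyz --no-headers | completed_pods
# A server-side filter should give a similar result, since pods shown as
# "Completed" normally have phase Succeeded:
#   kubectl get pods -n xyz --field-selector=status.phase=Succeeded
```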

How Helm Deployed Successfully

While helm was upgrading today, and before it timed out, I deleted the weekly and daily backup pods: kubectl delete po timescaledb-xyz-full-weekly-1586657520-bgspt, along with the incremental-daily pods. Helm then finally got the release into a "Deployed" state (rather than "Failed").
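
The manual cleanup described here could be scripted so that it runs right before (or during) the upgrade. This is only a sketch: `delete_completed_pods` is a hypothetical helper, and it assumes the backup pods carry the same `release=<name>` label that was used to select the jobs above, which may not hold for every chart version.

```shell
# Hypothetical helper: deletes all pods of a release that have already
# finished successfully (phase Succeeded, shown as "Completed" by kubectl).
# Assumption: the pods are labelled release=<release name>.
delete_completed_pods() {
  namespace="$1"
  release="$2"
  kubectl delete pods -n "$namespace" -l "release=$release" \
    --field-selector=status.phase=Succeeded
}

# Possible usage, with the names from this thread:
#   delete_completed_pods xyz timescaledb-xyz
```

Note that this removes every succeeded pod matching the label, not only the backup pods, so the pod logs of those runs are lost. Listing the same selection with `kubectl get pods` first is a cheap safety check.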

Questions:

Any reason the helm upgrade has tiller complaining about those completed backup pods? It doesn't seem to complain in our test cluster (with the only difference being that it uses the default namespace).
I'll wait until the next incremental backup job runs to see if I can indeed reproduce this by doing an upgrade again. In the meantime, any ideas?

feikesteenbergen commented on June 5, 2024

"Not sure if the fact this error occurs on clusters where it's deployed on a non-default namespace is related."

We deploy them in different namespaces pretty much always and don't run into this, so that's not the issue.

We'll try to investigate. Using a Job inside the Helm upgrade path might not be the best idea: it can fail, and I don't think failure of the Job is covered very well in these Helm charts.

The whole purpose of the Job is to do exactly what you described: up the connections from say 100 to 120, without having to restart the pods.

schahal commented on June 5, 2024

OK, ☝️ reproducible in my environment... I disabled and then re-enabled backup:

The backup job was started and completed:

$ kubectl get cronjob,job -l release=timescaledb-xyz -n xyz
NAME                                               SCHEDULE        SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cronjob.batch/timescaledb-xyz-full-weekly         12 02 * * 0     False     0        <none>          7h
cronjob.batch/timescaledb-xyz-incremental-daily   12 02 * * 1-6   False     0        57m             7h

NAME                                                      COMPLETIONS   DURATION   AGE
job.batch/timescaledb-xyz-incremental-daily-1587003120   1/1           5s         57m

## And the Pod
$ kubectl get pods -n xyz
...
timescaledb-xyz-incremental-daily-1587003120-x2rwg             0/1     Completed   0          57m
...

Did a helm upgrade ... by changing a parameter... It was stuck in "PENDING_UPGRADE", so we checked the tiller log:

$ kubectl logs tiller-deploy-6dbd4749b6-rxqx5 -n kube-system
[kube] 2020/04/16 03:00:39 beginning wait for 15 resources with timeout of 5m0s
[kube] 2020/04/16 03:00:41 Pod is not ready: xyz/timescaledb-xyz-incremental-daily-1587003120-x2rwg
[kube] 2020/04/16 03:00:43 Pod is not ready: xyz/timescaledb-xyz-incremental-daily-1587003120-x2rwg
[kube] 2020/04/16 03:00:45 Pod is not ready: xyz/timescaledb-xyz-incremental-daily-1587003120-x2rwg

It will stay like this until I delete the pod timescaledb-xyz-incremental-daily-1587003120-x2rwg, at which point it upgrades successfully.

Any clues how we can upgrade parameters without it "waiting" on the already terminated backup job pods?

schahal commented on June 5, 2024

I've updated the title to reflect the root cause of the "PENDING_UPGRADE". @feikesteenbergen or anyone else in the timescale community: any thoughts on how we can get around that, rather than manually deleting pods before the helm upgrade? Or are we missing something in the above?

feikesteenbergen commented on June 5, 2024

I'll have a look; I think it may be an issue with labelling: the backup jobs are getting labels that Helm interprets as saying these jobs belong to Helm (they don't).
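
One way to check the labelling theory is to look at which pods actually carry the release label. The sketch below assumes the `kubectl get pods --show-labels` layout, where the comma-separated labels are the last column; `pods_with_label` is a hypothetical helper name, and whether the backup pods really carry a `release=...` label is exactly what it would help find out.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers --show-labels`
# output on stdin and prints the names of pods whose label list contains
# the exact key=value pair given as the first argument.
pods_with_label() {
  awk -v want="$1" '{
    n = split($NF, labels, ",")
    for (i = 1; i <= n; i++) {
      if (labels[i] == want) { print $1; break }
    }
  }'
}

# Possible usage:
#   kubectl get pods -n xyz --no-headers --show-labels \
#     | pods_with_label release=timescaledb-xyz
```

If a completed backup pod shows up in that output, it shares the label with the release's other pods, which would be consistent with the explanation above.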

feikesteenbergen commented on June 5, 2024

Somehow I cannot reproduce this; I tried with both helm 2 and helm 3.

Could you give more information:

What is your helm version?
What is your charts version?
What parameter(s) do you change during the upgrade?

Regardless of this, I'm going to be removing some labels from the jobs and the pods that are scheduled, because I think they are not useful and may actually be the cause here.

feikesteenbergen commented on June 5, 2024

This should (probably) be fixed with https://github.com/timescale/timescaledb-kubernetes/releases/tag/v0.6.1.

However, I could not reproduce it locally, so it would be great if you could verify whether or not it is fixed, @schahal.

schahal commented on June 5, 2024

We'll give it a shot in the coming weeks! We have to schedule an activity to do the 0.5 -> 0.6 upgrade because the clusters we can currently reproduce this on are in use. I'll update this ticket when complete.

feikesteenbergen commented on June 5, 2024

Feel free to reopen if the issue persists, as it should have been fixed.

CVirus commented on June 5, 2024

Facing exactly the same issue on a big chart I'm building. Very hard to reproduce as well :-(
Workaround is to delete the Completed pod that helm is waiting for.
