Problem
If a backup is running when an online upgrade is performed, the backup process is interrupted but not marked as failed. The backup then remains in a running state until some other event terminates the Postgres container, changing the container ID in the instance pod; only at that point is the backup marked as failed and retried.
A similar problem affects offline upgrades, but since the pod is restarted, the failure is detected immediately.
This behavior mainly impacts large database instances, where a backup can take a long time.
Proposed solution
We want to change the backup command to follow this scheme:
- As usual, calling manager backup triggers a new backup via the local HTTP API.
- Instead of running barman-cloud-backup directly, the manager runs manager backup --foreground.
- The actual backup runs inside this new process.
- Logs from the foreground process must flow through the right channel (the JSON pipe).
Alternate solution
Alternatively, we could delay the upgrade until the running backup has finished.