Problem
If a backup is running when an online upgrade is performed, the backup process is interrupted but not marked as failed. The backup then remains in a running state until some other event terminates the Postgres container, changing the container ID in the instance pod; only at that point is the backup marked as failed and retried.
A similar problem affects offline upgrades, but since the pod is restarted, the failure is detected immediately.
This behavior mainly impacts large database instances, where a backup can take a long time.
Proposed solution
We want to change the backup command to follow this scheme:
- As usual, calling manager backup triggers a new backup via the local HTTP API.
- Instead of running barman-cloud-backup directly, the manager runs manager backup --foreground.
- The actual backup runs inside this new process.
- Logs from the foreground process must flow through the right channel (the JSON pipe).
Alternate solution
Alternatively, we could delay the upgrade until the running backup has finished.