Comments (8)
I think I've got it: it’s set to 10 from the backup, then at some point while replaying WALs it’s set to 40, but we still try to force it back to 10 because pg_controldata says so. I’ll try to reproduce it, thanks!
(Which is exactly what you said above, but now I got it too 😂)
from cloudnative-pg.
@phisco @gabriele-wolfox @mnencia
I see that this logic was contributed by you in this commit
Unfortunately, I couldn't find any explanation of why it's in place. Why do we force-override max_connections and the other settings?
After more investigation I found the culprit: max_connections was changed from 10 to 40 after the base backup was written, so that change is only present in the WAL.
When CNPG initializes the config, it uses pg_controldata to determine the max_connections setting. According to the docs, "pg_controldata prints information initialized during initdb", which is why it returned 10; CNPG then uses that value to override the setting from the postgresql.parameters stanza.
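To make the mechanism concrete, here is a minimal Go sketch of reading that value out of pg_controldata's text output, which prints lines such as "max_connections setting: 10". The helper names parseControlData and maxConnectionsFromControlData are hypothetical and this is not CNPG's actual code; it only assumes the "key: value" line layout of pg_controldata.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseControlData turns the "key: value" lines printed by pg_controldata
// into a map. Hypothetical helper, not CNPG's implementation.
func parseControlData(output string) map[string]string {
	result := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		// Split at the first colon only, so values containing colons
		// (e.g. timestamps) stay intact.
		key, value, found := strings.Cut(scanner.Text(), ":")
		if !found {
			continue
		}
		result[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return result
}

// maxConnectionsFromControlData extracts the "max_connections setting" entry,
// i.e. the value recorded in the control file (here: the base backup's).
func maxConnectionsFromControlData(output string) (int, error) {
	raw, ok := parseControlData(output)["max_connections setting"]
	if !ok {
		return 0, fmt.Errorf("max_connections setting not found in pg_controldata output")
	}
	return strconv.Atoi(raw)
}

func main() {
	// Abbreviated sample output, as it might look for the base backup in this thread.
	sample := `max_connections setting:              10
max_worker_processes setting:         8`
	n, err := maxConnectionsFromControlData(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("max_connections from pg_controldata:", n)
}
```

Because the control file in the base backup predates the WAL record that raised the value, this read returns 10 even though the primary was running with 40.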
When replay reaches the WAL record that increases max_connections, recovery pauses with the error message shown above.
The fix for this problem is to stop overwriting the max_connections setting with the value from pg_controldata. If CNPG used the user-specified max_connections instead, things would work fine, since the user can pick a high enough value.
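A minimal Go sketch of a variant of that proposal, assuming a hypothetical helper effectiveMaxConnections (not the fix that was actually merged into CNPG): the user's value from postgresql.parameters wins when it is larger, while the pg_controldata value acts only as a floor, since a standby running hot standby cannot use a lower max_connections than the primary it follows.

```go
package main

import "fmt"

// effectiveMaxConnections picks the value to start recovery with.
// Hypothetical policy: honour the user's setting, but never go below
// the value recorded in pg_controldata.
func effectiveMaxConnections(userValue, controlDataValue int) int {
	if userValue > controlDataValue {
		return userValue
	}
	return controlDataValue
}

func main() {
	// Values from this thread: user asked for 40, the backup's pg_control says 10.
	fmt.Println(effectiveMaxConnections(40, 10))
}
```

With the values from this thread the sketch starts recovery at 40, so the WAL record that raised the primary to 40 no longer pauses replay. Taking the larger of the two, rather than the user's value alone, is a design choice of this sketch: it also protects users who configure a value lower than the one in the backup.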
My solution for restoring the data was to manually start a PostgreSQL server on a clean machine, run barman-cloud-restore (I had to override restore_command and archive_command in postgresql.auto.conf), and take a pg_dump of the database. Then, in CNPG, I created a clean database using initdb and loaded the data into it with pg_restore via kubectl exec.
@gsimko can you be more explicit with the steps that brought you to this situation?
E.g.:
- created the first cluster with max_connections yy
- backed up the first cluster to bucket xxx
- set max_connections to zz on the first cluster
- created a second cluster restoring from bucket xxx
…
AFAICT the following steps led to this situation:
- created a cluster with max_connections=10
- created a base backup in bucket X
- changed max_connections to 40
- archived WAL files to bucket X
- created a second cluster restoring from bucket X
- initialization of the new cluster's primary pod failed
The reason for the failure is that the new cluster is initialized with max_connections=10, but when recovery reaches the WAL record that sets max_connections=40, it stops, because a standby cannot run with a lower value than the primary.
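The failure above can be modeled with a simplified Go sketch. The parameterChange type, the replay function, and the LSN value are hypothetical; the check only mimics, in reduced form, PostgreSQL's rule that a hot standby's max_connections must not be lower than the value the primary logged in the WAL.

```go
package main

import "fmt"

// parameterChange models a WAL record carrying the primary's new setting.
// Hypothetical type for illustration only.
type parameterChange struct {
	lsn   string
	value int
}

// replay walks the parameter changes in order and stops at the first one
// whose value exceeds the local setting.
func replay(localMaxConnections int, changes []parameterChange) error {
	for _, c := range changes {
		if localMaxConnections < c.value {
			return fmt.Errorf("at %s: max_connections = %d is a lower setting than on the primary server, where its value was %d",
				c.lsn, localMaxConnections, c.value)
		}
	}
	return nil
}

func main() {
	// Timeline from the steps above: base backup at 10, later raised to 40 in the WAL.
	changes := []parameterChange{{lsn: "0/5000028", value: 40}}
	fmt.Println(replay(10, changes)) // value forced from pg_controldata: recovery stops
	fmt.Println(replay(40, changes)) // user-specified value: prints <nil>
}
```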
Hope that helps!
But the error message you shared says: "max_connections = 10 is a lower setting than on the primary server, where its value was 40". So it looks like it’s trying to go back to 10 for some reason.
Was the second cluster created with max_connections set to 10 or 40?
On the second cluster I set it at 40.
The reason it uses 10 (and that's the actual bug) is that CNPG internally overrides the user setting with the max_connections value read from pg_controldata, which reflects the value in effect when the base backup was taken.
I can reproduce it. The problem is that during recovery we enforce the max_connections value from the backup.
Say that:
- the cluster has max_connections=100 and we create base backup A;
- later we increase max_connections to 200 without creating another base backup.
Then, when we do a full recovery, we start the server in standby mode with max_connections=100 (taken from backup A) and replay the WALs. When replay reaches the WAL record that increases max_connections, PostgreSQL in the recovery job pauses and the recovery job hangs.
If we create a base backup right after increasing max_connections, the full restore succeeds.
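The workaround can be expressed as a check over the backup catalog. A minimal Go sketch, assuming hypothetical types where "position" orders backups and setting changes along the WAL stream, and assuming (as described above) that recovery starts with the max_connections recorded in the chosen backup:

```go
package main

import "fmt"

// baseBackup is a hypothetical catalog entry.
type baseBackup struct {
	name           string
	position       int // order of the backup along the WAL stream
	maxConnections int // value recorded in the backup's pg_control
}

// settingChange is a hypothetical WAL record raising or lowering max_connections.
type settingChange struct {
	position int
	value    int
}

// restoreIsSafe reports whether no change replayed after b needs a higher
// max_connections than the value the standby starts with.
func restoreIsSafe(b baseBackup, changes []settingChange) bool {
	for _, c := range changes {
		if c.position > b.position && c.value > b.maxConnections {
			return false
		}
	}
	return true
}

func main() {
	// Scenario above: backup A at 100, raise to 200, then a fresh backup B.
	changes := []settingChange{{position: 2, value: 200}}
	backups := []baseBackup{
		{name: "A", position: 1, maxConnections: 100},
		{name: "B", position: 3, maxConnections: 200},
	}
	for _, b := range backups {
		fmt.Printf("restore from %s safe: %v\n", b.name, restoreIsSafe(b, changes))
	}
}
```

Backup A is reported unsafe because the increase to 200 is only in the WAL, while backup B, taken after the increase, is safe.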