
Comments (8)

phisco commented on June 14, 2024

Maybe I got it: it’s set to 10 from the backup, then at some point while replaying WALs it’s set to 40, but we still try to force it back to 10 because pg_controldata says so. I’ll try to reproduce it, thanks!

(Which is exactly what you said above, but now I got it too 😂)

from cloudnative-pg.

gsimko commented on June 14, 2024

@phisco @gabriele-wolfox @mnencia
I see that this logic was contributed by you in this commit
Unfortunately I couldn't find any explanation why it's in place. Why do we force-override the max_connections and other settings?


gsimko commented on June 14, 2024

After more investigation I found the culprit: max_connections was changed from 10 to 40 after the base backup was written, so that change is only present in the WAL.

When CNPG initializes the config, it uses pg_controldata to figure out the max_connections setting. According to the docs, "pg_controldata prints information initialized during initdb", so that's why it returned 10. CNPG then uses that value to override the setting from the postgresql.parameters stanza.
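To make that lookup concrete, here is a minimal Python sketch of pulling the max_connections value out of pg_controldata output. The sample text is an abbreviated, illustrative excerpt (not output captured from the affected cluster), and `parse_controldata` is a hypothetical helper, not CNPG's actual code.

```python
# Abbreviated, illustrative pg_controldata output; a real run prints many more lines.
SAMPLE_OUTPUT = """\
wal_level setting:                    replica
max_connections setting:              10
max_worker_processes setting:         8
max_wal_senders setting:              10
max_prepared_xacts setting:           0
max_locks_per_xact setting:           64
"""


def parse_controldata(text: str) -> dict:
    """Return a mapping of label -> value for every 'label: value' line."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing ':' stay intact.
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


fields = parse_controldata(SAMPLE_OUTPUT)
control_max_connections = int(fields["max_connections setting"])
print(control_max_connections)
```

With the sample above, the parsed value is the one stored in the control file of the restored backup, which is what ends up overriding the user's setting.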

When barman-cloud-restore gets to the WAL entry of increasing max_connections, it pauses with the error message shown above.

The fix to this problem is to not overwrite the max_connections setting with the value from pg_controldata. If we used the user-specified max_connections setting, things would work fine as the user can pick a high enough max_connections.
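A small sketch of the difference between the two behaviours, using the numbers from this issue. The function name and the policy labels are hypothetical and only illustrate the idea; this is not how CNPG is implemented.

```python
def effective_max_connections(user_value: int, control_value: int, policy: str) -> int:
    """Pick the max_connections value used to start the restored instance.

    policy is a hypothetical label:
      "force_controldata": behaviour described in this issue, the value
                           read from pg_controldata always wins
      "user":              proposed fix, keep the value from the
                           postgresql.parameters stanza
    """
    if policy == "force_controldata":
        return control_value
    if policy == "user":
        return user_value
    raise ValueError(f"unknown policy: {policy}")


# Values from this issue: backup taken at 10, WAL later raises it to 40,
# and the user configured 40 on the new cluster.
primary_value_in_wal = 40
for policy in ("force_controldata", "user"):
    local = effective_max_connections(user_value=40, control_value=10, policy=policy)
    print(policy, local, "ok" if local >= primary_value_in_wal else "too low")
```

With the override in place the instance starts with 10, which is lower than the 40 recorded in the WAL; with the user value it starts with 40.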

My solution for restoring the data was to manually start a postgresql server on a clean machine, do the barman-cloud-restore (had to override restore_command and archive_command in postgresql.auto.conf) and make a pg_dump of the database. Then in CNPG I created a clean db using initdb and pg_restored the data into it using kubectl exec.


phisco commented on June 14, 2024

@gsimko can you be more explicit with the steps that brought you to this situation?
E.g.:

  • created first cluster with max_connections yy
  • backed up first cluster to bucket xxx
  • set max_connections to zz on first cluster
  • created second cluster restoring from bucket xxx


gsimko commented on June 14, 2024

AFAICT the following steps led to this situation:

  • created cluster with max_connections=10
  • created base backup in bucket X
  • changed max_connections to 40
  • wrote WAL logs to bucket X
  • created second cluster restoring from bucket X
  • initialization of the cluster primary pod failed

The reason for the failure is that the new cluster is initialized with max_connections=10, but when the restoration process reaches the WAL record that sets max_connections=40, replay stops because the local setting is lower than the value recorded in the WAL.
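The sequence above can be sketched as a toy replay loop in Python. This is only a model of the behaviour described in this thread, not PostgreSQL's recovery code; `WalRecord` and `replay` are hypothetical names, and the LSNs are made-up integers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WalRecord:
    lsn: int
    # Set only for records that carry a parameter change from the primary.
    primary_max_connections: Optional[int] = None


def replay(records: List[WalRecord], local_max_connections: int) -> Optional[int]:
    """Return the LSN where replay stops, or None if every record is replayed."""
    for record in records:
        required = record.primary_max_connections
        if required is not None and local_max_connections < required:
            # Local setting is lower than the value recorded in the WAL.
            return record.lsn
    return None


# Base backup taken at max_connections=10; the change to 40 exists only in the WAL.
wal = [WalRecord(1), WalRecord(2, primary_max_connections=40), WalRecord(3)]

print(replay(wal, local_max_connections=10))  # 2: stops at the parameter change
print(replay(wal, local_max_connections=40))  # None: replay completes
```

Starting the restored instance with the user-specified 40 lets the toy replay run to the end, while the overridden 10 stops at the record that raises the setting.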

Hope that helps!


phisco commented on June 14, 2024

But the error message you shared says: "max_connections = 10 is a lower setting than on the primary server, where its value was 40". So it looks like it’s trying to go back to 10 for some reason.

Was the second cluster created with max_connections set to 10 or 40?


gsimko commented on June 14, 2024

On the second cluster I set it to 40.

The reason why it uses 10 - and that's the actual bug - is that CNPG internally overrides the user setting by reading the max_connections setting from pg_controldata, which returns the value stored in the control file of the restored base backup.


litaocdl commented on June 14, 2024

I can reproduce it. The problem is that during recovery we are forced to use the max_connections value from the backup.
Let's say:

  1. A cluster has max_connections=100 and we create base backup A.
  2. Later we increase max_connections to 200 without creating another base backup.

Then, when we do a full recovery, we start the server in standby mode with max_connections=100 from backup A and replay the WALs. When replay reaches the WAL record that increases max_connections, the postgres instance in the recovery job pauses and the recovery job hangs.
If we create a base backup right after increasing max_connections, the full restore succeeds.

