Comments (5)
If, in an architecture with 3 datanodes, performing a switchover or promotion after losing one of them is considered incorrect, then it would be good to have a mechanism that blocks such actions: report an error to the user and do not attempt to change roles, so that the situations described above are avoided.
from pg_auto_failover.
The documentation on fault tolerance, failover behaviour, and the settings you can tweak is pretty extensive and available here:
https://pg-auto-failover.readthedocs.io/en/main/fault-tolerance.html
https://pg-auto-failover.readthedocs.io/en/main/failover-state-machine.html
https://pg-auto-failover.readthedocs.io/en/main/ref/configuration.html
You probably shouldn't be manually switching over a healthy node at the same time as you're losing replicas.
If you're removing replicas from the cluster, or you expect them to be out for a long while, you should probably either drop them from the cluster on the monitor side or enable maintenance for them.
It is clear that if we lose another datanode, then primary will not commit transactions, since (in our case) one synchronous replica is required:
If you drop all non-functional nodes from the cluster except the working primary, it will get into the "single" state and will be available for writing.
This is an open source project. If you require extensive support, it may be a good idea to look for a company or individual that offers paid support. Other than that, adding safeguards to CLI commands seems like a good first issue if you want to have a crack at it and think that there's actually a problem.
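As a rough illustration of what such a safeguard could check before a switchover, here is a minimal sketch in Python. It is not part of pg_auto_failover: the `NodeStatus` type, its fields, and the `switchover_blockers` function are hypothetical names made up for this example, and the rule it applies (every node healthy, converged, and in the `primary` or `secondary` state) is only one possible policy. In practice the input might be built from the output of `pg_autoctl show state --json`, but no particular field names of that output are assumed here.

```python
from dataclasses import dataclass
from typing import List

# States in which a node is considered settled for the purpose of this sketch.
STEADY_STATES = {"primary", "secondary"}


@dataclass(frozen=True)
class NodeStatus:
    """Hypothetical summary of one node as seen by the monitor."""
    name: str
    reported_state: str   # state the node says it is in
    assigned_state: str   # state the monitor wants it to be in
    healthy: bool


def switchover_blockers(nodes: List[NodeStatus]) -> List[str]:
    """Return the reasons to refuse a manual switchover (empty list = OK)."""
    reasons = []
    for node in nodes:
        if not node.healthy:
            reasons.append(f"{node.name}: node is not healthy")
        if node.reported_state != node.assigned_state:
            reasons.append(
                f"{node.name}: still moving from {node.reported_state} "
                f"to {node.assigned_state}"
            )
        elif node.reported_state not in STEADY_STATES:
            reasons.append(f"{node.name}: in state {node.reported_state}")

    # Exactly one settled primary is expected before changing roles.
    primaries = [n for n in nodes if n.reported_state == "primary"]
    if len(primaries) != 1:
        reasons.append(f"expected exactly 1 primary, found {len(primaries)}")
    return reasons
```

A wrapper around the switchover command could call this first and exit with an error listing the returned reasons whenever the list is not empty, instead of asking the monitor to change roles.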
from pg_auto_failover.
You may also want to have a look at a Patroni-based solution like:
https://github.com/vitabaks/postgresql_cluster
from pg_auto_failover.
Firstly, I want to thank Dimitri Fontaine and other developers for their contribution to the development of this solution!
If one of the nodes becomes unavailable, pg_auto_failover (the witness) must of course be ready for that and have a way out of such a situation.
The hang at the report_lsn stage is not good.
I really hope that the developers of this solution will find a good way to solve this problem.
Yes, this is open source software, but not everyone can write code as good as this code.
We really hope to implement the necessary cases in the witness code to solve this problem.
from pg_auto_failover.
I may be a little rusty, as it has been a long time since I last experienced a multiple-node failure in a pg_auto_failover cluster, but in the last 4 years of running both production and testing clusters with a high volume of transactions, I haven't noticed this behaviour.
One thing to note, though, is that I let the cluster "run itself".
The only times I do manual switchovers or set maintenance mode are when I update and reboot servers.
from pg_auto_failover.