
A lightweight replication manager for PostgreSQL (Postgres)

Home Page: https://repmgr.org/

License: Other

Makefile 0.39% C 98.08% Perl 0.17% Lex 1.36%

repmgr replication postgresql postgres ha failover autofailover cluster replication-manager

repmgr's Introduction

repmgr: Replication Manager for PostgreSQL

repmgr is a suite of open-source tools to manage replication and failover within a cluster of PostgreSQL servers. It enhances PostgreSQL's built-in replication capabilities with utilities to set up standby servers, monitor replication, and perform administrative tasks such as failover or switchover operations.

The most recent repmgr version (5.3.2) supports all PostgreSQL versions from 9.5 to 15. PostgreSQL 9.4 is also supported, with some restrictions.

repmgr is distributed under the GNU GPL 3 and maintained by EnterpriseDB.

Documentation

The full repmgr documentation is available here:

repmgr documentation

Versions

For an overview of repmgr versions and PostgreSQL compatibility, see the repmgr compatibility matrix.

Files

CONTRIBUTING.md: details on how to contribute to repmgr
COPYRIGHT: Copyright information
HISTORY: Summary of changes in each repmgr release
LICENSE: GNU GPL3 details

Directories

contrib/: additional utilities
doc/: DocBook-based documentation files
expected/: expected regression test output
sql/: regression test input

Support and Assistance

EnterpriseDB provides 24x7 production support for repmgr, including configuration assistance, installation verification and training for running a robust replication cluster. For further details see:

EDB Support Services

There is a mailing list/forum to discuss contributions or issues:

https://groups.google.com/group/repmgr

The IRC channel #repmgr is registered with freenode.

Please report bugs and other issues to:

https://github.com/EnterpriseDB/repmgr

Further information is available at https://repmgr.org/

We'd love to hear from you about how you use repmgr. Case studies and news are always welcome.

Thanks from the repmgr core team.

Jaime Casanova
Abhijit Menon-Sen
Simon Riggs
Cedric Villemain


repmgr's Issues

pgpass

Hi,

repmgr did not use the .pgpass file.

root# uname -a
Linux rh1.localdomain 2.6.32-358.el6.x86_64 #1 SMP Fri Feb 22 00:31:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

root# yum list installed | grep repmgr
repmgr.x86_64 2.0-1.rhel6 @pgdg92

postgres$ psql -h aaa.aaa.aaa.aaa -U repmgr repmgr
repmgr# \q
=> connection is OK, reading the password from ~postgres/.pgpass

postgres$ repmgr -U repmgr -d repmgr standby clone aaa.aaa.aaa.aaa
[2014-09-24 21:01:28] [ERROR] Did not find the configuration file './repmgr.conf', continuing
[2014-09-24 21:01:28] [ERROR] Connection to database failed: fe_sendauth: no password supplied

But cloning a master onto a standby leads to an error: the password is missing.
I logged debug information about the connections, and no password is sent.

The table "repl_monitor" is null

config file and repmgrd usage

The comment below in repmgr.c is misleading, because the standby clone command uses rsync_options and ssh_options for custom parameters, which cannot be set without a config file:


/*
XXX Do not read config files for action where it is not required (clone
for example).
*/
parse_config(runtime_options.config_file, &options);

Similarly, the README contains probably obsolete information about repmgrd: the daemon does not terminate on a master, but starts checking the primary connection:

If run on a master, repmgrd will exit, as it has nothing to do on them yet. It is only targeted at running on standby servers currently.

repmgr fails to rsync PostgreSQL 9.4's `pg_logical` directory

Per this user report on Stack Overflow, repmgr fails to rsync pg_logical when making a copy for a new replica with repmgr standby clone.

Marco identified the bug as a missing / here:

https://github.com/2ndQuadrant/repmgr/blob/REL2_0_STABLE/repmgr.c#L1895

that causes the exclusion of pg_log to unwittingly match pg_logical too.

It's fixed in master already.
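
To illustrate why the missing / matters, here is a minimal sketch using POSIX fnmatch(). The patterns pg_log* and pg_log/* are assumptions chosen for illustration (the report does not quote the exact rsync arguments), rsync's own matching rules differ in detail from fnmatch(), and excluded_by is a hypothetical helper, not a repmgr function.

```c
#include <fnmatch.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true when `path` would be excluded by the
 * glob `pattern`. Flags are 0, so '*' may also match '/' characters.
 */
bool
excluded_by(const char *pattern, const char *path)
{
    return fnmatch(pattern, path, 0) == 0;
}
```

With this helper, the pattern pg_log* excludes pg_logical as well, while pg_log/* excludes only entries below pg_log/ and leaves pg_logical alone.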

Change port of witness server via config or command line

Hello, I am currently trying to set up the witness server on PostgreSQL using the default port, but there is no option to do so.
I need to create the witness server manually (which of course fails), then change the port and reboot the server.

EDIT:
Oh, and then I need to add the database manually...

CentOS 7 installation issue: version shows repmgr 3.0dev instead of repmgr 3.0

I tried to install the latest stable version, 3.0.

I cloned the repo (master branch) and installed it on a CentOS-based system:

$ sudo "PATH=/usr/pgsql-9.4/bin:$PATH" make USE_PGXS=1 install
$ repmgr --version
repmgr 3.0dev (PostgreSQL 9.4.1)

$ repmgrd --version
repmgrd 3.0dev (PostgreSQL 9.4.1)

Why does the version show 3.0dev instead of 3.0?

config.c message "unknown name/value pair provided; ignoring" triggered by whitespace?

I installed repmgr from the CentOS rpm:

yum install http://yum.postgresql.org/9.4/redhat/rhel-6-x86_64/pgdg-centos94-9.4-1.noarch.rpm
yum --enablerepo=pgdg94 install postgresql94 postgresql94-server postgresql94-libs postgresql94-devel postgresql94-contrib repmgr94-3.0.1

It seems that the default repmgr.conf has a [space]\n at line 125. (Line 6 in the sed/cat result.)

$ sed -n 120,126p repmgr.conf | cat -An 
     1  # Autofailover options$
     2  failover=automatic  # one of 'automatic', 'manual'$
     3  priority=100        # a value of zero or less prevents the node being promoted to master$
     4  promote_command='/usr/pgsql-9.4/bin/repmgr standby promote -f /etc/repmgr/9.4/repmgr.conf'$
     5  follow_command='/usr/pgsql-9.4/bin/repmgr standby follow -f /etc/repmgr/9.4/repmgr.conf'$
     6   $
     7  # monitoring interval; default is 2s$

When I commented out the Autofailover options, unaware of this space, it resulted in a cryptic error message when executing repmgr commands:

     1  # Autofailover options$
     2  # failover=automatic  # one of 'automatic', 'manual'$
     3  # priority=100        # a value of zero or less prevents the node being promoted to master$
     4  #promote_command='/usr/pgsql-9.4/bin/repmgr standby promote -f /etc/repmgr/9.4/repmgr.conf'$
     5  # follow_command='/usr/pgsql-9.4/bin/repmgr standby follow -f /etc/repmgr/9.4/repmgr.conf'$
     6   $
     7  # monitoring interval; default is 2s$

$  repmgr -f /etc/repmgr/9.4/repmgr.conf --verbose standby register
[2015-05-29 14:56:30] [NOTICE] opening configuration file: /etc/repmgr/9.4/repmgr.conf
[2015-05-29 14:56:30] [WARNING]  
//usr/pgsql-9.4/bin/repmgr standby follow -f /etc/repmgr/9.4/repmgr.conf: unknown name/value pair provided; ignoring
[2015-05-29 14:56:30] [INFO] connecting to standby database
[2015-05-29 14:56:30] [INFO] connecting to master database
...

While it may be a nit-picky request, perhaps we could update the config.c to ignore comments, blank lines (\n), and lines of all whitespace by trimming the line before the comparison and testing for null string as well as the '\n' and '#'?

/* Skip blank lines and comments */
if (buff[0] == '\n' || buff[0] == '#')
    continue;

Or, have I misdiagnosed this issue entirely?
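
The check proposed above (skip comments, blank lines, and whitespace-only lines) could be sketched roughly as follows. This is a standalone illustration, not the actual config.c code; is_skippable_line is a hypothetical name.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true for lines a config parser could skip,
 * i.e. empty lines, lines containing only whitespace, and lines whose
 * first non-blank character is '#'.
 */
bool
is_skippable_line(const char *buff)
{
    /* Advance past leading whitespace, including a trailing newline. */
    while (*buff != '\0' && isspace((unsigned char) *buff))
        buff++;

    return *buff == '\0' || *buff == '#';
}
```

A parser loop could call this helper in place of the buff[0] comparison, so that a line consisting of a single space followed by a newline is skipped instead of being parsed as a name/value pair.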

repmgrd should check if the replication slot already exists during failover

When a standby node determines that it needs to follow a new master, if the replication slot already exists then repmgrd will fail. It would be nice if repmgrd could check for the existence of the replication slot before trying to create it.

No master could be elected (repmgrd on standby starts with an SQL error)

Hello, currently I have the following setup:
3 Nodes:
database1 - db1
database2 - db2
database3 - witness

Currently, after promoting db1 to the new master and starting repmgrd on db2, I get the following error:

Apr 08 12:55:55 database2 repmgrd[2645]: [2015-04-08 12:55:55] [WARNING] Unable to create event record: ERROR: INSERT cannot be done in a read-only transaction.

The error seems to be harmless, but after a failure of db1 no new master could be elected; the witness server says:

12:58:15 database3 repmgrd[23919]: [2015-04-08 12:58:15] [ERROR] connection to database failed: could not connect to server: No route to host
Apr 08 12:58:15 database3 repmgrd[23919]: Is the server running on host "database1" (192.168.32.200) and accepting
Apr 08 12:58:15 database3 repmgrd[23919]: TCP/IP connections on port 5432?
Apr 08 12:58:15 database3 repmgrd[23919]: [2015-04-08 12:58:15] [INFO] checking role of cluster node '2'
Apr 08 12:58:15 database3 repmgrd[23919]: [2015-04-08 12:58:15] [WARNING] unable to determine a valid master server; waiting 1 seconds to retry...

So something broke here. How can I fix that?

Oh, and somehow, if I start db1 again with repmgrd, the error does not occur?!

RATS static checker reports numerous results on repmgr 2.0

Hi,

One of our customers is planning to use repmgr in their production environment and ran a security analysis against the source code, which raised some issues. The following report is based on the repmgr 2.0 code and was produced with the RATS tool:

Severity: High
Issue: umask
umask() can easily be used to create files with unsafe priviledges. It should be set to restrictive values.

File: check_dir.c, lines: 185 187 191 215

Severity: High
Issue: fixed size global buffer
Extra care should be taken to ensure that character arrays that are allocated on the stack are used safely. They are prime targets for buffer overflow attacks.

File: check_dir.c, lines: 224
File: config.c, lines: 28 29 30 95
File: dbutils.c, lines: 33 104 126 223 256 286 323 324 326 496
File: log.c, lines: 44 47
File: repmgr.c, lines: 383 384 432 486 489 490 633 634 636 637 784 792 794 795 796 797 798 799 801 802 803 804 805 806 807 808 813 1283 1284 1290 1291 1292 1294 1393 1394 1395 1400 1402 1403 1541 1543 1544 1550 1552 1804 1805 1842 1869 1870 1871 2078 2235 2286 2287 2288 2289 2290
File: repmgrd.c, lines: 79 184 481 557 558 559 560 561 739 754 1367 1437

Severity: High
Issue: sprintf
Check to be sure that the format string passed as argument 2 to this function call does not come from an untrusted source that could have added formatting characters that the code is not prepared to handle. Additionally, the format string could contain `%s' without precision that could result in a buffer overflow.

File: check_dir.c, lines: 234
File: repmgr.c, lines: 1616 1689
File: repmgrd.c, lines: 768 1324

Severity: High
Issue: system
Argument 1 to this function call should be checked to ensure that it does not come from an untrusted source without first verifying that it contains nothing dangerous.

File: check_dir.c, lines: 235
File: repmgr.c, lines: 1363 1524 1619 1691 1859 1907
File: repmgrd.c, lines: 1057 1082

Severity: High
Issue: strcpy
Check to be sure that argument 2 passed to this function call will not copy more data than can be handled, resulting in a buffer overflow.

File: config.c, lines: 303 305 308 309 310 311 312
File: dbutils.c, lines: 35
File: repmgr.c, lines: 1347 1514 1672
File: repmgrd.c, lines: 524

Severity: High
Issue: vfprintf
Check to be sure that the non-constant format string passed as argument 2 to this function call does not come from an untrusted source that could have added formatting characters that the code is not prepared to handle.

File: log.c, lines: 58

Severity: High
Issue: printf
Check to be sure that the non-constant format string passed as argument 1 to this function call does not come from an untrusted source that could have added formatting characters that the code is not prepared to handle.

File: log.c, lines: 117
File: repmgr.c, lines: 297 1752 1753 1754 1755 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1770 1771 1772 1773 1775 1776 1778 1780 1782 1783 1784 1785 1786 1787 1788 1790 1792 1793
File: repmgrd.c, lines: 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275

Severity: High
Issue: getopt_long
Truncate all input strings to a reasonable length before passing them to this function

File: repmgr.c, lines: 137
File: repmgrd.c, lines: 202

Severity: High
Issue: getenv
Environment variables are highly untrustable input. They may be of any length, and contain any data. Do not make any assumptions regarding content or length. If at all possible avoid using them, and if it is necessary, sanitize them and truncate them to a reasonable length.

File: repmgr.c, lines: 284 285 286 287 2293

Severity: High
Issue: fprintf
Check to be sure that the non-constant format string passed as argument 2 to this function call does not come from an untrusted source that could have added formatting characters that the code is not prepared to handle.

File: repmgr.c, lines: 1743 1744

Severity: Medium
Issue: stat
A potential TOCTOU (Time Of Check, Time Of Use) vulnerability exists. This is the first line where a check has occured. The following line(s) contain uses that may match up with this check: 206 (mkdir)

File: check_dir.c, lines: 194

File: repmgrd.c, lines: 1441

Best regards,
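
For the sprintf and strcpy findings in the report above, one common mitigation is to format into a fixed-size buffer with snprintf() and to treat truncation as an error. The following is only a sketch under that assumption; format_command is a hypothetical helper and does not reflect how repmgr actually builds its command lines.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats "<command> <argument>" into dest.
 * Returns false if the result did not fit (or formatting failed), so the
 * caller can refuse to run a truncated command line.
 */
bool
format_command(char *dest, size_t destlen,
               const char *command, const char *argument)
{
    int written = snprintf(dest, destlen, "%s %s", command, argument);

    /* snprintf returns the length it wanted to write, excluding the NUL. */
    return written >= 0 && (size_t) written < destlen;
}
```

Checking the return value matters here because snprintf() silently truncates, and passing a truncated string to system() could run a different command than intended.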

version.h

version.h needs to be updated to reflect release candidate 2.

repmgrd configuration questions

1- Why is repmgr_funcs.sql installed by default, even when we don't plan to use repmgrd features?
Are these functions still necessary if we ONLY use the basic repmgr features (cloning and switchover)?

2- Do we have to start the repmgrd daemon on each node of the cluster?
For the failover feature, I presume the answer is yes. Is that correct?
But if we ONLY want to use its monitoring feature, on which node must we start repmgrd:
the master, the standbys, or both?

Regards

If repmgr schema has already been created, repmgr master register fails, but doesn't ERROR out

If you try to register a master on a cluster that already has all the repmgr objects created, it fails, but doesn't inform the user about the failure:

postgres@debian7:~$ repmgr -v -f repmgr.conf master register
Opening configuration file: repmgr.conf
[2015-03-06 11:28:47] [INFO] repmgr connecting to master database
[2015-03-06 11:28:47] [INFO] repmgr connected to master, checking its state
[2015-03-06 11:28:47] [NOTICE] Schema repmgr_test already exists.
postgres@debian7:~$ psql repmgr
psql (9.4.1)
Type "help" for help.

repmgr=# select * from repmgr_test.repl_nodes ;
 id | cluster | name | conninfo | priority | witness
----+---------+------+----------+----------+---------
(0 rows)

Docker and repmgr?

Do you have any plans to containerize repmgr with Docker? I would like to propose that we undertake this as part of the effort underway at http://www.postgresql.org/list/pgsql-pkg-docker/, so we can make the repmgr container work well with the official PostgreSQL container.

Your input on the list would be appreciated!

Build on CentOS 7

I am building repmgr under CentOS 7, and the following errors pop up during the build:

DEBUG: /usr/bin/ld: cannot find -lselinux
DEBUG: /usr/bin/ld: cannot find -lxslt
DEBUG: /usr/bin/ld: cannot find -lxml2
DEBUG: /usr/bin/ld: cannot find -lpam
DEBUG: /usr/bin/ld: cannot find -lssl
DEBUG: /usr/bin/ld: cannot find -lcrypto
DEBUG: /usr/bin/ld: cannot find -lkrb5
DEBUG: /usr/bin/ld: cannot find -lcom_err
DEBUG: /usr/bin/ld: cannot find -lgssapi_krb5
DEBUG: /usr/bin/ld: cannot find -lz
DEBUG: /usr/bin/ld: cannot find -lreadline
DEBUG: collect2: error: ld returned 1 exit status
DEBUG: make: *** [repmgrd] Error 1
DEBUG: RPM build errors:

Any idea why this happens?

repmgr and createdb

Is it really mandatory to create a new database for the repmgr configuration, or can we just let it create its schema in an existing database (for example, the postgres database)? Any pros and cons?

repmgr 3.0 rpm package

The repmgr rpm package has not yet been updated in the official PostgreSQL 9.4 yum repository:

yum list repmgr*
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.nexcess.net
 * extras: bay.uchicago.edu
 * updates: ftpmirror.your.org
Available Packages
repmgr.x86_64                                                                          2.0.2-1.rhel7                                                                pgdg94
repmgr-debuginfo.x86_64                                                                2.0.2-1.rhel7                                                                pgdg94

Error while cloning master : Unable to create event record: ERROR: relation "repmgr_.repl_events" does not exist

I'm trying to configure a cluster with one master and two slaves (PostgreSQL 9.4 + repmgr 3.0).

While cloning the master, I got the following warning:

postgres@slave1~ > repmgr -D /var/lib/pgsql/9.4/data -d repmgr -p 5432 -U repmgr -R postgres --verbose standby clone master.example.com
[2015-04-01 23:52:40] [NOTICE] no configuration file provided and default file './repmgr.conf' not found - continuing with default values
[2015-04-01 23:52:40] [NOTICE] destination directory '/var/lib/pgsql/9.4/data' provided
[2015-04-01 23:52:40] [INFO] connecting to upstream node
[2015-04-01 23:52:40] [INFO] connected to upstream node, checking its state
[2015-04-01 23:52:40] [INFO] Successfully connected to upstream node. Current installation size is 27 MB
[2015-04-01 23:52:40] [NOTICE] starting backup...
[2015-04-01 23:52:40] [INFO] checking and correcting permissions on existing directory /var/lib/pgsql/9.4/data ...
[2015-04-01 23:52:40] [INFO] executing: 'pg_basebackup -l "repmgr base backup"  -h master.example.com -p 5432 -U repmgr -D /var/lib/pgsql/9.4/data '
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
[2015-04-01 23:52:41] [NOTICE] standby clone (using pg_basebackup) complete
[2015-04-01 23:52:41] [NOTICE] HINT: you can now start your PostgreSQL server
[2015-04-01 23:52:41] [NOTICE] for example : pg_ctl -D /var/lib/pgsql/9.4/data start
[2015-04-01 23:52:41] [WARNING] Unable to create event record: ERROR:  relation "repmgr_.repl_events" does not exist
LINE 1:  INSERT INTO "repmgr_".repl_events (              node_id,  ...

Repmgr and server boot sequence

I would like to know whether there is a best practice for the start/stop boot sequence of PostgreSQL servers under repmgr.
Is it OK if my servers (master and slave) automatically (re)boot in any order?

The same question applies to automatically starting the repmgrd daemon at server boot.

Regards

Do not attempt to SSH when --ignore-external-config-files is passed

In the event that pg_basebackup and --ignore-external-config-files are used together, is there any reason to SSH to the remote machine? If not, can that step be bypassed so administrators don't need to configure SSH keys and such?

quote_ident

With repmgr 2.0.2:
In the file repmgr.conf, the cluster parameter does not seem to be escaped with quote_ident.
Example: when the cluster's name contains a dash '-', repmgr cannot create the repmgr schema.
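
As an illustration of the quoting this report asks for, the sketch below wraps an identifier in double quotes and doubles any embedded double quote, which is the standard SQL rule for quoted identifiers. quote_identifier is a hypothetical standalone helper, not repmgr code; code that already holds a libpq connection could use PQescapeIdentifier() instead.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: writes `ident` into `dest` wrapped in double quotes,
 * doubling any embedded double quote. Returns false if `dest` is too small.
 * The size check is conservative and reserves space for the worst case.
 */
bool
quote_identifier(char *dest, size_t destlen, const char *ident)
{
    size_t j = 0;

    if (destlen < 3)        /* two quotes plus the terminating NUL */
        return false;

    dest[j++] = '"';
    for (; *ident != '\0'; ident++)
    {
        /* Worst case: 2 bytes for this character, closing quote, NUL. */
        if (j + 4 > destlen)
            return false;
        if (*ident == '"')
            dest[j++] = '"';
        dest[j++] = *ident;
    }
    dest[j++] = '"';
    dest[j] = '\0';
    return true;
}
```

For a cluster named my-cluster this yields "my-cluster" including the quotes, which PostgreSQL accepts as part of a schema name, whereas the unquoted form is parsed as an expression containing a minus sign.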

Document pg_bindir

I'm not quite sure how my system upgraded to the latest repmgr, but apparently pg_bindir has to be set explicitly.

It would be nice if the example documentation and configuration showed that new config variable, or if pg_bindir simply took /usr/bin as a sensible default. Otherwise, I have a feeling almost every repmgr upgrade is going to break.

repmgr --force standby register shouldn't stop with exit code 1

Currently, when the command is run from a systemd unit file, repmgr --force standby register always fails with exit code 1 because it raises the warning:

[WARNING] Unable to delete node record: ...

This is not desired, since I explicitly --force the command to proceed, even though I know that repl_nodes_ has a foreign key on the id.
Systemd will fail until I specify =-repmgr --force...

Cannot compile without Postgres 9.3+ libraries in pgdg on Ubuntu 14.04

Greetings,

After a lot of head scratching, I've figured out that repmgr 2.0.2 has a dependency on the 9.3+ version of the Postgres libraries. I'm currently restoring a 9.2 cluster before we migrate it to 9.3, and I am unable to compile repmgr when I install the postgresql-server-dev-9.2 package from the official pgdg repo. When I upgrade to the 9.3 version of the same package, it compiles normally without issue. Here is the compiler output from the failed make, so you can see the int definitions it depends on.

STDERR output:

In file included from repmgr.h:25:0,
                 from dbutils.c:24:
       /usr/include/postgresql/libpq-fe.h:547:1: error: unknown type name 'pg_int64'
        extern pg_int64 lo_lseek64(PGconn *conn, int fd, pg_int64 offset, int whence);
        ^
       /usr/include/postgresql/libpq-fe.h:547:50: error: unknown type name 'pg_int64'
        extern pg_int64 lo_lseek64(PGconn *conn, int fd, pg_int64 offset, int whence);
                                                  ^
       /usr/include/postgresql/libpq-fe.h:551:1: error: unknown type name 'pg_int64'
        extern pg_int64 lo_tell64(PGconn *conn, int fd);
        ^
       /usr/include/postgresql/libpq-fe.h:553:48: error: unknown type name 'pg_int64'
        extern int lo_truncate64(PGconn *conn, int fd, pg_int64 len);
                                                ^
       make: *** [dbutils.o] Error 1

This could be a change outside of repmgr in the included pg libs so I am not sure what could be done to resolve this to make it compatible with the previous versions. Googling around, it looks like people have had similar compile problems in other C libs, but it's not conclusive on which version this caused a break.

No way to specify target host ssh port

In my setup using Docker, I run a special SSH instance on a high port. This is the instance that has access to the PostgreSQL data directory for rsync. Is there a way to configure this port?

Example config file doesn't get parsed correctly

Hi,

The example repmgr.conf file (which is shipped as the default configuration file in the rpm package from the PostgreSQL repository) doesn't get parsed properly, as some lines contain comments after the values.

A quick look at parse_line() in config.c seems to support this conclusion: it looks like trim() removes trailing whitespace, but not a trailing comment.

Maybe instead of:

for (++i; i < MAXLEN; ++i)
    if (buff[i] == '\'')
        continue;
    else if (buff[i] != '\n')
        value[j++] = buff[i];
    else
        break;

this:

for (++i; i < MAXLEN; ++i)
    if (buff[i] == '\'')
        continue;
    else if (buff[i] != '\n' && buff[i] != '#')
        value[j++] = buff[i];
    else
        break;

This may have unintended consequences, though.
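
One such consequence is that a '#' inside a quoted value would cut the value short. A possible way around it, sketched below purely as an illustration, is to track whether the parser is inside single quotes and to treat '#' as a comment marker only outside them. extract_value is a hypothetical helper, not the actual parse_line() code.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: copies the value part of a config line into `value`,
 * dropping single quotes and stopping at a newline or at a '#' that is
 * outside quotes. Trailing spaces and tabs are trimmed afterwards (this
 * also trims trailing blanks that were inside quotes).
 */
void
extract_value(const char *src, char *value, size_t valuelen)
{
    bool   in_quotes = false;
    size_t j = 0;

    if (valuelen == 0)
        return;

    for (; *src != '\0' && *src != '\n' && j + 1 < valuelen; src++)
    {
        if (*src == '\'')
        {
            in_quotes = !in_quotes;
            continue;
        }
        if (*src == '#' && !in_quotes)
            break;              /* start of a trailing comment */
        value[j++] = *src;
    }

    while (j > 0 && (value[j - 1] == ' ' || value[j - 1] == '\t'))
        j--;
    value[j] = '\0';
}
```

Given the text after the '=' sign of the line failover=automatic  # one of 'automatic', 'manual', this yields automatic, while a quoted value such as 'echo #1' keeps its '#'.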

Support restore_command and archive_cleanup_command in repmgr config

As repmgr standby clone is creating the recovery.conf, it seems like a good idea to be able to give the recovery.conf standby related setup to repmgr so that it's automatically deployed. In particular restore_command and archive_cleanup_command are quite important.

Thanks for your consideration,

gettimeofday

Hi Greg,

$ uname -a
SunOS panix 5.11 11.0 i86pc i386 i86pc

$ make USE_PGXS=1
cc -Xa -m64 -fast -I/opt/postgres/9.3.2/include -I. -I. -I/opt/postgres/9.3.2/include/server -I/opt/postgres/9.3.2/include/internal -I/usr/local/include -I/usr/include/readline -I/usr/include/openssl -c -o dbutils.o dbutils.c
"dbutils.c", line 466: prototype mismatch: 1 arg passed, 2 expected
"dbutils.c", line 474: prototype mismatch: 1 arg passed, 2 expected
cc: acomp failed for dbutils.c

#include <sys/time.h>
int gettimeofday(struct timeval *tp, void *tzp);

It's being called as: int gettimeofday(struct timeval *tp);
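
For reference, a call that matches the two-argument prototype shown above passes NULL as the second argument. The sketch below is a standalone illustration; current_time_ms is a hypothetical helper, not a function from dbutils.c.

```c
#include <stddef.h>
#include <sys/time.h>

/*
 * Hypothetical helper: returns the current wall-clock time in milliseconds
 * since the Unix epoch, or -1 on failure. The timezone argument of
 * gettimeofday() is passed as NULL.
 */
long long
current_time_ms(void)
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) != 0)
        return -1;

    return (long long) tv.tv_sec * 1000 + tv.tv_usec / 1000;
}
```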

repmgr witness create --force

The witness create command accepts a --force command line option, but that only protects the create_pg_dir() call from failing. Calling pg_ctl init on a non-empty, already-existing directory will always fail.

I think repmgr should instead purge (as in rm -rf) the target directory first when using --force.

Actively maintained?

Hey, are you guys still actively maintaining this project? I ask because there are PRs that need to be merged for your product to work (witness creation is completely broken), and they've been sitting there for months.

If you're not, would you be ok with someone else taking it over?

copy_configuration needs 2nd

I think the current copy_configuration uses res for both the SELECT and the INSERT queries. The INSERT overwrites res, so PQntuples(res) becomes invalid on the next iteration. The version below uses a second PGresult for the INSERT and also frees res when done.

copy_configuration(PGconn *masterconn, PGconn *witnessconn)
{
    char sqlquery[MAXLEN];
    PGresult *res;
    PGresult *res2;
    int num_tuples;
    int i;

    sqlquery_snprintf(sqlquery, "TRUNCATE TABLE %s.repl_nodes", repmgr_schema);
    log_debug("copy_configuration: %s\n", sqlquery);
    res = PQexec(witnessconn, sqlquery);
    if (!res || PQresultStatus(res) != PGRES_COMMAND_OK)
    {
            fprintf(stderr, "Cannot clean node details in the witness, %s\n",
                    PQerrorMessage(witnessconn));
            PQclear(res);   /* PQclear accepts NULL */
            return false;
    }
    PQclear(res);   /* release the TRUNCATE result before res is reused */

    sqlquery_snprintf(sqlquery, "SELECT id, name, conninfo, priority, witness FROM %s.repl_nodes", repmgr_schema);
    res = PQexec(masterconn, sqlquery);
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
            fprintf(stderr, "Can't get configuration from master: %s\n",
                    PQerrorMessage(masterconn));
            PQclear(res);
            return false;
    }

    for (i = 0; i < PQntuples(res); i++)
    {
            sqlquery_snprintf(sqlquery, "INSERT INTO %s.repl_nodes(id, cluster, name, conninfo, priority, witness) "
                              "VALUES (%d, '%s', '%s', '%s', %d, '%s')",
                              repmgr_schema, atoi(PQgetvalue(res, i, 0)),
                              options.cluster_name, PQgetvalue(res, i, 1),
                              PQgetvalue(res, i, 2),
                              atoi(PQgetvalue(res, i, 3)),
                              PQgetvalue(res, i, 4));

            res2 = PQexec(witnessconn, sqlquery);
            if (!res2 || PQresultStatus(res2) != PGRES_COMMAND_OK)
            {
                    fprintf(stderr, "Cannot copy configuration to witness, %s\n",
                            PQerrorMessage(witnessconn));
                    PQclear(res2);  /* also release the failed INSERT result */
                    PQclear(res);
                    return false;
            }
            PQclear(res2);
    }

    PQclear(res);
    return true;

}

recovery.conf file does not contain database password on failover

I have a test setup with three servers (db01, db02, db03). The repmgr postgres user has a password set on it, and when a failover from db01 to db02 occurs, the recovery.conf file on db03 does not get the connection password and thus fails to replicate. The file is reproduced below:

standby_mode = 'on'
primary_conninfo = 'port=5432 host=db02.node.consul user=repmgr application_name=jupiter'

My expectation was that the primary_conninfo line would contain a password=... directive, but it does not.

Connections from repmgrd never close?

We have a cluster of two Postgres servers, using streaming replication.

We use repmgrd to monitor the replication lag. Everything is working OK: in the log of the master server, we can see repmgrd connecting every three seconds. But there is no log entry for disconnection.

$ tail -f /var/lib/pgsql/9.2/data/pg_log/postgresql-Mon.log | grep 192.168.70
2014-03-17 11:26:07 WET [28404]: [1-1] user=[unknown],db=[unknown] LOG: connection received: host=192.168.70.1 port=46885
2014-03-17 11:26:10 WET [28410]: [1-1] user=[unknown],db=[unknown] LOG: connection received: host=192.168.70.1 port=46886
2014-03-17 11:26:13 WET [28414]: [1-1] user=[unknown],db=[unknown] LOG: connection received: host=192.168.70.1 port=46887
2014-03-17 11:26:16 WET [28454]: [1-1] user=[unknown],db=[unknown] LOG: connection received: host=192.168.70.1 port=46888
.
.

We can see disconnection entries for other hosts (log_disconnections = on).

We are using the latest stable version:
$ /usr/pgsql-9.2/bin/repmgr --version
repmgr 1.2.0 (PostgreSQL 9.2beta2)

Incorrect wal_level check

Prior to 9.4, it was OK to check that wal_level was set to hot_standby. Starting with 9.4, the logical wal_level has been added, which is more than enough for repmgr to work (or, actually, for replication to work).

We need to test wal_level IN ('hot_standby', 'logical') now.
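
On the C side, the equivalent check could look like the sketch below. It only covers the two values named in this report; wal_level_allows_replication is a hypothetical helper, and the wal_level string is assumed to have been fetched from the server already.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true when the given wal_level setting is
 * sufficient for streaming replication according to this report, i.e.
 * 'hot_standby' or (from PostgreSQL 9.4) 'logical'.
 */
bool
wal_level_allows_replication(const char *wal_level)
{
    return strcmp(wal_level, "hot_standby") == 0
        || strcmp(wal_level, "logical") == 0;
}
```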

Distinguish between repmgr and replicated database connections in config file

Database connection information for each node is recorded by create_node_record calls in do_master_register and do_standby_register. However, do_standby_clone and do_standby_follow both call create_recovery_file, which in turn calls write_primary_conninfo, and write_primary_conninfo isn't fetching that connection information from the database. Instead, it extracts most of the connection parameters from runtime options and grabs the password from an undocumented PGPASSWORD environment variable. do_standby_follow partially works around this through the questionable clobbering of some runtime_options settings. Similarly, do_standby_clone is grabbing a subset of possible connection parameters from command-line arguments and using them to connect to the master database.

It makes more sense to me to explicitly store both of the following in the config file (and possibly have two sets of command-line options for folks who don't want to use a config file):

a. Connection information for this node's replicating database, to be pushed to the repmgr database for use by other nodes. This is what's currently conninfo in the config file.
b. Connection information for the repmgr database, for use by this node when figuring out how to connect to other nodes for replication.

That makes it easy to access the repmgr database regardless of the state of the local database (e.g. early in do_standby_clone without this partial copy). And once you've connected to the repmgr database it's easy to figure out the connections for any node in the replicating database. This also makes it easy to use an independent server for the repmgr database if you don't want to keep that information in the replicated database.

If this sounds reasonable, I'm happy to help with a pull request implementing it (although I'm also happy to have someone else write that pull request for me ;).

Does repmgr support postgresql-common cluster management features?

Hey there,

I've been trying to use repmgr on our cluster for the last few weeks and I've managed to build a PoC and everything is working well, so kudos to you. :)

However, it seems that I can't run multiple DB clusters on the same host. We manage our DB clusters with the tools provided by the postgresql-common package on Debian/Ubuntu and would like to use repmgr to provide replication and automatic fail-over for those clusters. FYI, the postgresql-common package provides tools to manage and run different versions and multiple clusters on the same host.

For instance, let's say I've got two DB clusters running on host A, listening on different ports, obviously.
When I use repmgr with any option, it says that it can't find a suitable default target:

$ repmgr -f /path/to/repmgr.conf master register
Error: No existing local cluster is suitable as a default target. Please see man pg_wrapper(1) how to specify one.

I understand that repmgr calls various postgresql utilities under the hood and thus its behavior is consistent with what would happen if I used a postgresql binary by myself, such as psql: it would just send me packing as it can't determine to which cluster it must connect.

So I tried to specify the DB port with the -p|--port option but repmgr kicks me out saying that it's unnecessary for a master registration:

$ repmgr -f /path/to/repmgr.conf master register -p db_port
repmgr: Replication manager 
[ERROR] master connection parameters not required when executing MASTER REGISTER
Try "repmgr --help" for more information.

When I look at the code that kicks me out, I can see that most repmgr operations would not work in such a situation either.

Even when I want to use repmgr --help I get kicked out by repmgr if I don't specify a port number.

So the question is: am I stuck in a deadlock here? Are the postgresql-common cluster management features not widely used in the community?

Thanks a lot for your help anyway.

Unable to create event record

I am trying to follow the instructions on this page:
https://github.com/2ndQuadrant/repmgr/blob/REL3_0_STABLE/QUICKSTART.md#standby-setup

Here is my server information:

$ repmgr --version
repmgr 3.0dev (PostgreSQL 9.4.1)

$ uname -a
Linux BackupServer 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

When I run repmgr standby clone, it returns an error:

postgres@BackupServer:~/9.4$ repmgr -D /var/lib/postgresql/9.4/main -d repmgr_db -U repmgr_usr --verbose standby clone 192.168.200.133
[2015-03-30 23:01:13] [NOTICE] no configuration file provided and default file './repmgr.conf' not found - continuing with default values
[2015-03-30 23:01:13] [NOTICE] destination directory '/var/lib/postgresql/9.4/main' provided
[2015-03-30 23:01:13] [INFO] connecting to upstream node
[2015-03-30 23:01:13] [INFO] connected to upstream node, checking its state
[2015-03-30 23:01:13] [INFO] Successfully connected to upstream node. Current installation size is 48 MB
[2015-03-30 23:01:13] [NOTICE] starting backup...
[2015-03-30 23:01:13] [INFO] creating directory "/var/lib/postgresql/9.4/main"...
[2015-03-30 23:01:13] [INFO] executing: 'pg_basebackup -l "repmgr base backup"  -h 192.168.200.133 -p 5432 -U repmgr_usr -D /var/lib/postgresql/9.4/main '
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
[2015-03-30 23:01:22] [NOTICE] copying configuration files from master
[2015-03-30 23:01:25] [INFO] standby clone: master config file '/etc/postgresql/9.4/main/postgresql.conf'
[2015-03-30 23:01:25] [INFO] rsync command line: 'rsync --archive --checksum --compress --progress --rsh=ssh 192.168.200.133:/etc/postgresql/9.4/main/postgresql.conf /var/lib/postgresql/9.4/main'
receiving incremental file list
postgresql.conf
         20,755 100%   19.79MB/s    0:00:00 (xfr#1, to-chk=0/1)
[2015-03-30 23:01:28] [INFO] standby clone: master hba file '/etc/postgresql/9.4/main/pg_hba.conf'
[2015-03-30 23:01:28] [INFO] rsync command line: 'rsync --archive --checksum --compress --progress --rsh=ssh 192.168.200.133:/etc/postgresql/9.4/main/pg_hba.conf /var/lib/postgresql/9.4/main'
receiving incremental file list
pg_hba.conf
          4,859 100%    4.63MB/s    0:00:00 (xfr#1, to-chk=0/1)
[2015-03-30 23:01:31] [INFO] standby clone: master ident file '/etc/postgresql/9.4/main/pg_ident.conf'
[2015-03-30 23:01:31] [INFO] rsync command line: 'rsync --archive --checksum --compress --progress --rsh=ssh 192.168.200.133:/etc/postgresql/9.4/main/pg_ident.conf /var/lib/postgresql/9.4/main'
receiving incremental file list
pg_ident.conf
          1,636 100%    1.56MB/s    0:00:00 (xfr#1, to-chk=0/1)
[2015-03-30 23:01:34] [NOTICE] standby clone (using pg_basebackup) complete
[2015-03-30 23:01:34] [NOTICE] HINT: you can now start your PostgreSQL server
[2015-03-30 23:01:34] [NOTICE] for example : pg_ctl -D /var/lib/postgresql/9.4/main start
[2015-03-30 23:01:34] [WARNING] Unable to create event record: ERROR:  relation "repmgr_.repl_events" does not exist
LINE 1:  INSERT INTO "repmgr_".repl_events (              node_id,  ...
                     ^

Do you have any idea how to solve it?

Many thanks.
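
One possible reading of this warning, offered as a sketch rather than a confirmed diagnosis: log excerpts elsewhere on this page show the schema 'repmgr_cluster0' for cluster 'cluster0' and repmgr_master for cluster 'master', which suggests the metadata schema is named repmgr_ followed by the cluster name. The run above started without a repmgr.conf, so the cluster name would be empty. The helper schema_for_cluster below is hypothetical and only illustrates that assumed naming rule.

```shell
#!/bin/sh
# Assumption: the metadata schema name is "repmgr_" followed by the cluster name.
schema_for_cluster() {
    printf 'repmgr_%s\n' "$1"
}

schema_for_cluster "cluster0"   # cluster name taken from a repmgr.conf
schema_for_cluster ""           # no repmgr.conf, so no cluster name
```

The second call prints repmgr_, the schema named in the error. If this reading is right, passing -f with a repmgr.conf whose cluster value matches the one used when the master was registered may avoid the warning.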

[Debian] repmgr's building dependencies

update_node_record_set_master doesn't set new master as active

Hello, I've discovered a problem: after a slave promotion, the new master doesn't become active, and because of that some operations that depend on 'get_master_node_id' (like 'create_node_record') fail.
According to the code here, the node never gets its status set to true; to me this looks like a bug.

[ERROR] no tablespace matching path '/opt/pg94master/ssd' found

Hello.

I set up a test system on PostgreSQL 9.4.

I killed the master, and the slave became the new master:
$ kill -KILL $(cat /opt/pg94master/data/postmaster.pid)

I decided to convert the failed master to a standby:
$ /usr/pgsql-9.4/bin/repmgr -f /etc/repmgr/9.4/repmgr-node1.conf --force --rsync-only -d repmgr -U repmgr --verbose standby clone -h 192.168.1.16 -p 5435
[2015-06-11 15:35:06] [INFO] connecting to upstream node
[2015-06-11 15:35:06] [INFO] connected to upstream node, checking its state
[2015-06-11 15:35:06] [INFO] Successfully connected to upstream node. Current installation size is 247 GB
[2015-06-11 15:35:06] [ERROR] no tablespace matching path '/opt/pg94master/ssd' found

My configs:
$ cat /etc/repmgr/9.4/repmgr-node1.conf
cluster=test
node=1
node_name=node1
conninfo='host=192.168.1.16 port=5434 user=repmgr dbname=repmgr'
rsync_options=--archive --checksum --compress --progress --rsh="ssh -o "StrictHostKeyChecking no""
ssh_options=-o "StrictHostKeyChecking no"
master_response_timeout=60
reconnect_attempts=6
reconnect_interval=10
failover=automatic
priority=100
promote_command='/usr/pgsql-9.4/bin/repmgr standby promote -f /etc/repmgr/9.4/repmgr-node1.conf'
follow_command='/usr/pgsql-9.4/bin/repmgr standby follow -f /etc/repmgr/9.4/repmgr-node1.conf'
loglevel=NOTICE
logfacility=STDERR
pg_bindir=/usr/pgsql-9.4/bin/
logfile='/var/log/repmgr/repmgr-9.4-node1.log'
tablespace_mapping=/opt/pg94master/ssd=/opt/pg94slave/ssd
use_replication_slots=1

$ cat /etc/repmgr/9.4/repmgr-node2.conf
cluster=test
node=2
node_name=node2
conninfo='host=192.168.1.16 port=5435 user=repmgr dbname=repmgr'
rsync_options=--archive --checksum --compress --progress --rsh="ssh -o "StrictHostKeyChecking no""
ssh_options=-o "StrictHostKeyChecking no"
master_response_timeout=60
reconnect_attempts=6
reconnect_interval=10
failover=automatic
priority=100
promote_command='/usr/pgsql-9.4/bin/repmgr standby promote -f /etc/repmgr/9.4/repmgr-node2.conf'
follow_command='/usr/pgsql-9.4/bin/repmgr standby follow -f /etc/repmgr/9.4/repmgr-node2.conf'
loglevel=NOTICE
logfacility=STDERR
pg_bindir=/usr/pgsql-9.4/bin/
logfile='/var/log/repmgr/repmgr-9.4-node2.log'
tablespace_mapping=/opt/pg94slave/ssd=/opt/pg94master/ssd
use_replication_slots=1

From sources:
/*

....
-T/--tablespace-mapping is not available as a pg_basebackup option for
PostgreSQL 9.3 - we can only handle that with rsync, so if --rsync-only
not set, fail with an error
*/

My command includes --rsync-only, but I still get an error:
[ERROR] no tablespace matching path '/opt/pg94master/ssd' found

What did I do wrong?

Thanks.
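
A sketch of one check that may help narrow this down, under two stated assumptions: that tablespace_mapping uses the same OLDDIR=NEWDIR form as pg_basebackup's -T option, and that the error means the OLDDIR side was not found among the upstream node's tablespace locations. The helper name mapping_source_exists and the sample location are hypothetical; the real locations would have to be read from the upstream server.

```shell
#!/bin/sh
# Check that the OLDDIR side of an OLDDIR=NEWDIR mapping is one of the
# tablespace locations reported by the upstream node.
# $1: mapping string, $2: newline-separated upstream tablespace locations
mapping_source_exists() {
    olddir=${1%%=*}
    printf '%s\n' "$2" | grep -qxF "$olddir"
}

# Locations would normally come from the upstream server, for example:
#   psql -h 192.168.1.16 -p 5435 -U repmgr -d repmgr -Atc \
#     "SELECT pg_tablespace_location(oid) FROM pg_tablespace"
# Here an assumed value stands in for that query's output.
upstream_locations="/opt/pg94slave/ssd"

for mapping in "/opt/pg94master/ssd=/opt/pg94slave/ssd" \
               "/opt/pg94slave/ssd=/opt/pg94master/ssd"; do
    if mapping_source_exists "$mapping" "$upstream_locations"; then
        echo "ok: $mapping"
    else
        echo "no tablespace matching path '${mapping%%=*}' found"
    fi
done
```

If the upstream (now node2 on port 5435) really keeps its tablespace under /opt/pg94slave/ssd, the mapping in repmgr-node1.conf would have its two sides in the wrong order for a clone in this direction. That is a guess from the paths in the post, not something the logs confirm.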

Automatic failover doesn't work

I have configured a scheme with one master, one standby and one witness.
repmgrd is running on standby and master server (as upstart service).

Example of normal repmgrd initialization log (on standby):

[2014-02-17 14:59:17] [INFO] repmgrd Connecting to database 'host=srv14 user=repmgr dbname=repmgr connect_timeout=10 keepalives_idle=10 keepalives_interval=10 keepalives_count=6'
[2014-02-17 14:59:17] [INFO] repmgrd Connected to database, checking its state
[2014-02-17 14:59:17] [INFO] repmgrd Connecting to primary for cluster 'cluster0'
[2014-02-17 14:59:17] [INFO] finding node list for cluster 'cluster0'
[2014-02-17 14:59:17] [INFO] checking role of cluster node 'host=srv11 user=repmgr dbname=repmgr connect_timeout=10 keepalives_idle=10 keepalives_interval=10 keepalives_count=6'
[2014-02-17 14:59:17] [INFO] repmgrd Checking cluster configuration with schema 'repmgr_cluster0'
[2014-02-17 14:59:17] [INFO] repmgrd Checking node 2 in cluster 'cluster0'
[2014-02-17 14:59:17] [INFO] Reloading configuration file and updating repmgr tables
[2014-02-17 14:59:17] [INFO] repmgrd Starting continuous standby node monitoring

Then I killed postgres on master server and got this:

[2014-02-17 15:06:34] [WARNING] Can't stop current query: PQcancel() -- connect() failed: Connection refused

[2014-02-17 15:06:34] [INFO] repmgrd Connecting to database 'host=srv14 user=repmgr dbname=repmgr connect_timeout=10 keepalives_idle=10 keepalives_interval=10 keepalives_count=6'
[2014-02-17 15:06:34] [INFO] repmgrd Connected to database, checking its state
[2014-02-17 15:06:34] [INFO] repmgrd Connecting to primary for cluster 'cluster0'
[2014-02-17 15:06:34] [INFO] finding node list for cluster 'cluster0'
[2014-02-17 15:06:34] [INFO] checking role of cluster node 'host=srv11 user=repmgr dbname=repmgr connect_timeout=10 keepalives_idle=10 keepalives_interval=10 keepalives_count=6'
[2014-02-17 15:06:34] [ERROR] Connection to database failed: could not connect to server: Connection refused
        Is the server running on host "srv11" (10.129.235.24) and accepting
        TCP/IP connections on port 5432?

[2014-02-17 15:06:34] [INFO] checking role of cluster node 'host=srv14 user=repmgr dbname=repmgr connect_timeout=10 keepalives_idle=10 keepalives_interval=10 keepalives_count=6'
[2014-02-17 15:06:34] [INFO] repmgrd Connecting to database 'host=srv14 user=repmgr dbname=repmgr connect_timeout=10 keepalives_idle=10 keepalives_interval=10 keepalives_count=6'
[2014-02-17 15:06:34] [INFO] repmgrd Connected to database, checking its state
[2014-02-17 15:06:34] [INFO] repmgrd Connecting to primary for cluster 'cluster0'
[2014-02-17 15:06:34] [INFO] finding node list for cluster 'cluster0'
[2014-02-17 15:06:34] [INFO] checking role of cluster node 'host=srv11 user=repmgr dbname=repmgr connect_timeout=10 keepalives_idle=10 keepalives_interval=10 keepalives_count=6'
[2014-02-17 15:06:34] [ERROR] Connection to database failed: could not connect to server: Connection refused
        Is the server running on host "srv11" (10.129.235.24) and accepting
        TCP/IP connections on port 5432?

Repmgrd can't be initialized and can't start a failover procedure.

I configured a similar setup a month ago and it worked,
so I think the problem was introduced by the latest changes (commits).

repmgr does not call update_node_record_set_upstream() during standby follow

I use repmgr 3.0.1 with postgres 9.3, on Ubuntu 14.04. I have a master and two standbys following it. I do not use repmgrd, but manually handle failover myself.

node1: master, node_id=1
node2: standby, node_id=2, upstream_node_id=1
node3: standby, node_id=3, upstream_node_id=1

I failover from node1 to node2 via:

postgres@node1 ~$ service postgresql stop
postgres@node2 ~$ repmgr standby promote
postgres@node3 ~$ repmgr standby follow

It is worth noting that the above all succeed. node3 immediately begins streaming correctly from node2; however, it does not update its upstream_node_id metadata in the repl_nodes table:

postgres@node3 ~$ psql -x -d repmgr_db -c "select * from repmgr_master.repl_nodes where name = 'node3'"
-[ RECORD 1 ]----+-----------------------------------------------------------------------------
id               | 3
type             | standby
upstream_node_id | 1
cluster          | master
name             | node3
conninfo         | host=node3 dbname=repmgr_db user=replicator password=<passwd> port=5432
slot_name        |
priority         | 100
active           | t

postgres@node2 ~$ psql -x -d repmgr_db -c "select * from repmgr_master.repl_nodes where name = 'node3'"
-[ RECORD 1 ]----+-----------------------------------------------------------------------------
id               | 3
type             | standby
upstream_node_id | 1
cluster          | master
name             | node3
conninfo         | host=node3 dbname=repmgr_db user=replicator password=<passwd> port=5432
slot_name        |
priority         | 100
active           | t

In order to recover the failed master, below, upstream_node_id should be pointing to node_id 2, but it is not. This leads to the following foreign key constraint errors:

postgres@node1 ~$ repmgr --force -w 50 --rsync-only -h node2 -d repmgr_db -U replicator standby clone
postgres@sql-1 ~$ repmgr standby register --force

May 19 13:49:57 ubuntu repmgr[16550]: connecting to standby database
May 19 13:49:57 ubuntu repmgr[16550]: connecting to master database
May 19 13:49:57 ubuntu repmgr[16550]: finding node list for cluster 'master'
May 19 13:49:57 ubuntu repmgr[16550]: checking role of cluster node '2'
May 19 13:49:57 ubuntu repmgr[16550]: checking role of cluster node '4'
May 19 13:49:57 ubuntu repmgr[16550]: checking role of cluster node '1'
May 19 13:49:57 ubuntu repmgr[16550]: registering the standby
May 19 13:49:57 ubuntu repmgr[16550]: Unable to delete node record: ERROR:  update or delete on table "repl_nodes" violates foreign key constraint "repl_nodes_upstream_node_id_fkey" on table "repl_nodes"#012DETAIL:  Key (id)=(1) is still referenced from table "repl_nodes".#012

I see the code in repmgrd.c that handles this, but nothing in repmgr.c does. Can this be supported?
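
Until repmgr handles this itself, a manual workaround might be to repoint the stale reference by hand. The sketch below only builds the statement and prints it; it has not been tested against a real cluster, and the helper name upstream_fix_sql is hypothetical. The schema, table and column names are the ones shown in the psql output above.

```shell
#!/bin/sh
# Hypothetical helper: build (but do not run) an UPDATE that repoints a
# standby's upstream_node_id. Arguments: schema, new upstream id, node id.
upstream_fix_sql() {
    printf 'UPDATE %s.repl_nodes SET upstream_node_id = %d WHERE id = %d;\n' "$1" "$2" "$3"
}

# node3 (id 3) now streams from node2 (id 2).
upstream_fix_sql repmgr_master 2 3

# To apply it, the statement would have to run on the current master
# (standbys are read-only), for example:
#   psql -d repmgr_db -c "$(upstream_fix_sql repmgr_master 2 3)"
```

Once no row references id 1 as its upstream, the delete in standby register --force should no longer hit the foreign key constraint, assuming that constraint is the only obstacle.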

Release v2.0.1 says version is 2.1dev

Downloaded release v.2.0.1. Ran

repmgr --version
repmgr 2.1dev (PostgreSQL 9.4.0)

It appears that version.h says "2.1dev". The version.h of the 2.0 stable branch says "2.0". Which version am I really running?

Documentation section about ssh link setup

Just wanted to clarify something: is anything happening in this section beyond copying an SSH key to the other host, something that wouldn't work with a plain ssh-copy-id command?

Auto Failover Incorrect delay

I've found a strange behavior when trying to trigger the auto failover
by fencing the master node:

When using the VMware fencing tool, an extra timeout is added on top of
the repmgr settings. With these settings:

reconnect_attempts=6
reconnect_interval=10

we get this kind of log:

[2015-02-12 16:21:19] [WARNING] Can't stop current query: PQcancel() -- connect() failed: Connection timed out

[2015-02-12 16:22:22] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 60 seconds before failover decision
[2015-02-12 16:23:35] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 50 seconds before failover decision
[2015-02-12 16:24:48] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 40 seconds before failover decision
[2015-02-12 16:26:01] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 30 seconds before failover decision
[2015-02-12 16:27:14] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 20 seconds before failover decision
[2015-02-12 16:28:27] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 10 seconds before failover decision
[2015-02-12 16:29:40] [ERROR] repmgrd: We couldn't reconnect for long enough, exiting...
[2015-02-12 16:30:44] [ERROR] Connection to database failed: could not connect to server: Connection timed out
Is the server running on host "10.100.41.3" and accepting
TCP/IP connections on port 5432?

The timestamps show about 60 seconds added to each loop, which is not
the intended delay.

I don't have a VMware platform to reproduce this behavior, and I don't
know anything about the VMware fencing tool, but I can reproduce it
with an LXC guest and iptables to simulate the shutdown.

On the master guest, with this command:

iptables -A INPUT -s 10.0.3.67 -j REJECT

I get this log:

[2015-03-03 09:53:02] [WARNING] Can't stop current query: PQcancel() -- connect() failed: Connection refused

[2015-03-03 09:53:05] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 60 seconds before failover decision
[2015-03-03 09:53:16] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 50 seconds before failover decision
[2015-03-03 09:53:27] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 40 seconds before failover decision

This seems OK, but with this command:

iptables -A INPUT -s 10.0.3.67 -j DROP

I get this log:

[2015-03-03 09:55:59] [WARNING] Can't stop current query: PQcancel() -- connect() failed: Connection timed out

[2015-03-03 09:58:06] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 60 seconds before failover decision
[2015-03-03 10:00:23] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 50 seconds before failover decision
[2015-03-03 10:02:41] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 40 seconds before failover decision

This is not OK, because about 2 minutes are added to each loop.

It's not exactly the same as the VMware case, because the delay is
different, but I can reproduce the incorrect behavior, I guess.
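
The extra delay can be read straight off the timestamps. The sketch below takes two consecutive WARNING lines from each excerpt above and subtracts the configured reconnect_interval. It assumes GNU date (for the -d option), and the helper name elapsed is made up for this illustration. Attributing the remainder to a blocking connection attempt is a guess based on the "Connection timed out" message, not something the logs prove.

```shell
#!/bin/sh
# Seconds elapsed between two "YYYY-MM-DD HH:MM:SS" timestamps (GNU date assumed).
elapsed() {
    start=$(date -u -d "$1" +%s)
    end=$(date -u -d "$2" +%s)
    echo $((end - start))
}

reconnect_interval=10   # from the settings quoted above

# Two consecutive WARNING lines from each log excerpt above.
for pair in \
    "fencing|2015-02-12 16:22:22|2015-02-12 16:23:35" \
    "REJECT|2015-03-03 09:53:05|2015-03-03 09:53:16" \
    "DROP|2015-03-03 09:58:06|2015-03-03 10:00:23"
do
    label=${pair%%|*}
    rest=${pair#*|}
    t1=${rest%%|*}
    t2=${rest#*|}
    total=$(elapsed "$t1" "$t2")
    echo "$label: ${total}s per attempt, $((total - reconnect_interval))s beyond reconnect_interval"
done
```

This gives 73 s per attempt for the fencing case, 11 s for REJECT and 137 s for DROP, so the overhead per attempt is roughly 63 s, 1 s and 127 s. With six attempts, the failover decision in the DROP case would come after well over ten minutes instead of the configured 60 seconds.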

Why does the repmgr role need superuser privileges?

Can you detail which features require the PostgreSQL role to have superuser privileges?
For security reasons I would like to constrain it to the minimum necessary.

Thanks in advance for any reply.

Can't compile repmgr v.3.0.1 and install the software

Hi,
I tried to compile and install repmgr on my CentOS 6.6 server but I get this error:
make USE_PGXS=1
Makefile:26: /usr/pgsql-9.4/lib/pgxs/src/makefiles/pgxs.mk: No such file or directory
make: *** No rule to make target `/usr/pgsql-9.4/lib/pgxs/src/makefiles/pgxs.mk'. Stop.

Here is my PATH:
echo $PATH
/usr/pgsql-9.4/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
Any help with my case?
Thanks and best regards,
Nhan Pham
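
A quick way to narrow this down is to ask pg_config where it expects pgxs.mk and check whether that file exists. The sketch below does that; the helper name check_pgxs is made up for this illustration. If the file is missing, a likely cause is that the PostgreSQL development files are not installed. On PGDG RPM installs these usually come in a separate package with a name like postgresql94-devel, though that package name is an assumption about this particular setup.

```shell
#!/bin/sh
# Report whether the PGXS makefile that "make USE_PGXS=1" includes is present.
check_pgxs() {
    pgxs_path=$1
    if [ -n "$pgxs_path" ] && [ -f "$pgxs_path" ]; then
        echo "found: $pgxs_path"
    else
        echo "missing: ${pgxs_path:-<pg_config returned nothing>}"
        return 1
    fi
}

# pg_config --pgxs prints the location of pgxs.mk for the PostgreSQL
# installation that comes first in PATH.
if command -v pg_config >/dev/null 2>&1; then
    check_pgxs "$(pg_config --pgxs)" ||
        echo "hint: the PostgreSQL development files may not be installed"
else
    echo "pg_config not found in PATH"
fi
```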

split brain issue

Hey

I tried to use the repmgr/repmgrd tools to configure auto-failover.
Everything seems to work very well except for the issue below.

I have nodes 1, 2 and 3 (master, standby, witness).
I shut down the network interface on node1. After a while, node2 and node3 detected the failure of the master and started the failover process; after a few minutes node2 became the new master.

However, right after that, I brought the network interface of node1 back up, and I ended up with two master nodes in the cluster. The witness does not seem to detect this change or prevent node1 from becoming master again.

Can you please comment on this issue?

Thank you for your help

Jin

Leading newlines in syslog entries

If you run this on the repmgr tree, you'll see several cases where a leading newline is added to log entries:
grep -r 'log_[^(]\+(_("\\n' .

This is undesirable for syslog entries.
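
For a self-contained check of the search, the sketch below uses an extended-regex form of the pattern and runs it on a throwaway file. The helper name find_leading_newlines and the sample file contents are invented for the demonstration; only the log_*(_(" call shape comes from the report above.

```shell
#!/bin/sh
# Hypothetical helper: list log_* calls whose message starts with an
# escaped newline.
find_leading_newlines() {
    # -E enables +; \( matches a literal parenthesis;
    # \\n matches a backslash followed by the letter n.
    grep -rnE 'log_[^(]+\(_\("\\n' "$1"
}

# Demo on a throwaway file (contents invented for illustration).
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cat > "$demo/sample.c" <<'EOF'
log_err(_("\nleading newline here\n"));
log_info(_("no leading newline\n"));
EOF

find_leading_newlines "$demo"
```

Only the log_err line is reported, since the second message has a trailing newline but no leading one.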

witness server's createuser command assumes postgres user exists

https://github.com/2ndQuadrant/repmgr/blob/master/repmgr.c#L1771

This line assumes there's a postgres user. If I run initdb as some other user, such as repmgr, there will be no postgres user and this command will fail. Moreover, there's apparently nowhere to interrupt this process to create the user before repmgr fails trying to do it on its own.