
PostgreSQL High-Availability Cluster (based on "Patroni" and DCS "etcd" or "consul"). Automating with Ansible.

License: MIT License


postgresql_cluster's Introduction

PostgreSQL High-Availability Cluster 🐘 💖


Production-ready PostgreSQL High-Availability Cluster (based on "Patroni" and DCS "etcd" or "consul"). Automating with Ansible.

The postgresql_cluster project is designed to deploy and manage high-availability PostgreSQL clusters in production environments. This solution is tailored for use on dedicated physical servers, virtual machines, and within both on-premises and cloud-based infrastructures.

This project not only facilitates the creation of new clusters but also offers support for integrating with pre-existing PostgreSQL instances. If you intend to upgrade your conventional PostgreSQL setup to a high-availability configuration, then just set postgresql_exists=true in the inventory file. Be aware that initiating cluster mode requires temporarily stopping your existing PostgreSQL service, which will lead to a brief period of database downtime. Please plan this transition accordingly.
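For example, an inventory host entry for an existing server might look like this (the IP address and hostname are illustrative):

10.128.64.140 hostname=pgnode01 postgresql_exists='true'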

🏆 Use the sponsoring program to get personalized support, or just to contribute to this project.


Supported setups of Postgres Cluster


You have three schemes available for deployment:

1. PostgreSQL High-Availability only

This is a simple scheme without load balancing (used by default).

Components of high availability:
  • Patroni is a template for you to create your own customized, high-availability solution using Python and - for maximum accessibility - a distributed configuration store like ZooKeeper, etcd, Consul or Kubernetes. It is used to automate the management of PostgreSQL instances and to perform automatic failover.

  • etcd is a distributed reliable key-value store for the most critical data of a distributed system. etcd is written in Go and uses the Raft consensus algorithm to manage a highly-available replicated log. It is used by Patroni to store information about the status of the cluster and PostgreSQL configuration parameters.

What is Distributed Consensus?

"vip-manager" is used to provide a single entry point (VIP) for database access.

  • vip-manager (optional, if the cluster_vip variable is specified) is a service that gets started on all cluster nodes and connects to the DCS. If the local node owns the leader-key, vip-manager starts the configured VIP. In case of a failover, vip-manager removes the VIP on the old leader and the corresponding service on the new leader starts it there.

  • PgBouncer (optional, if the pgbouncer_install variable is true) is a connection pooler for PostgreSQL.
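As a minimal sketch, these optional components map to the following vars/main.yml fragment (the VIP address is a hypothetical example):

cluster_vip: "10.128.64.145"  # optional: VIP for client access, managed by vip-manager
vip_interface: "{{ ansible_default_ipv4.interface }}"  # interface name for the VIP (ex. "ens32")
pgbouncer_install: true  # optional: install the PgBouncer connection pooler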

2. PostgreSQL High-Availability with HAProxy Load Balancing

To use this scheme, specify with_haproxy_load_balancing: true in the variable file vars/main.yml

This scheme makes it possible to distribute the read load. It also allows scaling out the cluster with read-only replicas.

  • port 5000 (read / write) master
  • port 5001 (read only) all replicas
if variable "synchronous_mode" is 'true' (vars/main.yml):
  • port 5002 (read only) synchronous replica only
  • port 5003 (read only) asynchronous replicas only

❗ Note: Your application must support sending read requests to the custom port 5001 and write requests to port 5000.
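For example (a sketch; the VIP address, database, and user names are hypothetical, assuming cluster_vip is specified and HAProxy listens on it):

psql "host=10.128.64.145 port=5000 dbname=mydb user=app_user"  # read/write (master)
psql "host=10.128.64.145 port=5001 dbname=mydb user=app_user"  # read-only (all replicas)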

Components of load balancing:
  • HAProxy is a free, very fast and reliable solution offering high availability, load balancing, and proxying for TCP and HTTP-based applications.

  • confd manages local application configuration files using templates and data from etcd or consul. It is used to automate HAProxy configuration file management.

  • Keepalived (optional, if the cluster_vip variable is specified) provides a virtual high-availability IP address (VIP) and a single entry point for database access. It implements VRRP (Virtual Router Redundancy Protocol) for Linux. In our configuration, keepalived checks the status of the HAProxy service and, in case of a failure, delegates the VIP to another server in the cluster.

3. PostgreSQL High-Availability with Consul Service Discovery (DNS)

To use this scheme, specify dcs_type: consul in the variable file vars/main.yml

This scheme is suitable for master-only access and for read load balancing (using DNS) across replicas. Consul Service Discovery with DNS resolution is used as the client access point to the database.

Client access point (example):

  • master.postgres-cluster.service.consul
  • replica.postgres-cluster.service.consul

In addition, this can be useful for a distributed cluster across different data centers. We can specify in advance which data center the database server is located in and then use this for applications running in the same data center.

Example: replica.postgres-cluster.service.dc1.consul, replica.postgres-cluster.service.dc2.consul

This requires installing Consul in client mode on each application server for service DNS resolution (or use DNS forwarding to a remote Consul server instead of installing a local Consul client).
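As a sketch, with a local Consul client listening on the default DNS port 8600 (see "Port requirements" below), the access points can be resolved like this:

dig @127.0.0.1 -p 8600 +short master.postgres-cluster.service.consul
dig @127.0.0.1 -p 8600 +short replica.postgres-cluster.service.consul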

Compatibility

RedHat and Debian based distros (x86_64)

Supported Linux Distributions:
  • Debian: 10, 11, 12
  • Ubuntu: 20.04, 22.04
  • CentOS: 7, 8
  • CentOS Stream: 8, 9
  • Oracle Linux: 7, 8, 9
  • Rocky Linux: 8, 9
  • AlmaLinux: 8, 9
PostgreSQL versions:

all supported PostgreSQL versions

✅ tested, works fine: PostgreSQL 10, 11, 12, 13, 14, 15, 16

Results of daily automated cluster-deployment testing are published as GitHub Actions workflow status badges for each distribution: Debian 10, Debian 11, Debian 12, Ubuntu 20.04, Ubuntu 22.04, CentOS Stream 8, CentOS Stream 9, Oracle Linux 8, Oracle Linux 9, Rocky Linux 8, Rocky Linux 9, AlmaLinux 8, AlmaLinux 9.

Ansible version

Minimum supported Ansible version: 2.11.0

Requirements

This playbook requires root privileges or sudo.

Ansible (What is Ansible?)

If dcs_type: "consul", please install the consul role requirements on the control node:

ansible-galaxy install -r roles/consul/requirements.yml

Port requirements

List of required TCP ports that must be open for the database cluster:

  • 5432 (postgresql)
  • 6432 (pgbouncer)
  • 8008 (patroni rest api)
  • 2379, 2380 (etcd)

for the scheme "[Type A] PostgreSQL High-Availability with Load Balancing":

  • 5000 (haproxy - (read/write) master)
  • 5001 (haproxy - (read only) all replicas)
  • 5002 (haproxy - (read only) synchronous replica only)
  • 5003 (haproxy - (read only) asynchronous replicas only)
  • 7000 (optional, haproxy stats)

for the scheme "[Type C] PostgreSQL High-Availability with Consul Service Discovery (DNS)":

  • 8300 (Consul Server RPC)
  • 8301 (Consul Serf LAN)
  • 8302 (Consul Serf WAN)
  • 8500 (Consul HTTP API)
  • 8600 (Consul DNS server)

Recommendations

  • Linux (Operating System):

Update your operating system on your target servers before deploying;

Make sure time synchronization (NTP) is configured. Specify ntp_enabled: 'true' and ntp_servers if you want to install and configure the NTP service.

  • DCS (Distributed Consensus Store):

Fast drives and a reliable network are the most important factors for the performance and stability of an etcd (or consul) cluster.

Avoid storing etcd (or consul) data on the same drive as other processes (such as the database) that intensively use the disk subsystem! Store etcd and PostgreSQL data on different disks (see the etcd_data_dir and consul_data_path variables), and use SSD drives if possible. See the hardware recommendations and tuning guides.
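For example, a sketch of the relevant vars/main.yml fragment (the paths are illustrative; point them at a dedicated, ideally SSD, disk):

etcd_data_dir: "/var/lib/etcd"  # if dcs_type: "etcd"
consul_data_path: "/var/lib/consul"  # if dcs_type: "consul"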

It is recommended to deploy the DCS cluster on dedicated servers, separate from the database servers.

  • Placement of cluster members in different data centers:

If you’d prefer a cross-data center setup, where the replicating databases are located in different data centers, etcd member placement becomes critical.

There are quite a lot of things to consider if you want to create a really robust etcd cluster, but there is one rule: do not place all etcd members in your primary data center. See some examples.

  • How to prevent data loss in case of autofailover (synchronous_mode):

For performance reasons, synchronous replication is disabled by default.

To minimize the risk of losing data on autofailover, you can configure settings in the following way:

  • synchronous_mode: 'true'
  • synchronous_mode_strict: 'true'
  • synchronous_commit: 'on' (or 'remote_apply')

Deployment: quick start

  1. Install Ansible on one control node (which could easily be a laptop)
sudo apt update && sudo apt install -y python3-pip sshpass git
pip3 install ansible
  2. Download or clone this repository
git clone https://github.com/vitabaks/postgresql_cluster.git
  3. Go to the playbook directory
cd postgresql_cluster/
  4. Edit the inventory file
Specify (non-public) IP addresses and connection settings (ansible_user, ansible_ssh_pass or ansible_ssh_private_key_file) for your environment; see the minimal inventory sketch after this list.
nano inventory
  5. Edit the variable file vars/main.yml
nano vars/main.yml
Minimum set of variables:
  • proxy_env # if required (for downloading packages)
  • cluster_vip # for client access to databases in the cluster (optional)
  • patroni_cluster_name
  • postgresql_version
  • postgresql_data_dir
  • with_haproxy_load_balancing: 'true' (Type A) or 'false'/default (Type B)
  • dcs_type # "etcd" (default) or "consul" (Type C)

If dcs_type: "consul", please install the consul role requirements on the control node:

ansible-galaxy install -r roles/consul/requirements.yml
  6. Try to connect to the hosts
ansible all -m ping
  7. Run the playbook:
ansible-playbook deploy_pgcluster.yml
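A minimal inventory sketch (the IP addresses are hypothetical; the group names are the ones this playbook uses):

[etcd_cluster]
10.128.64.140
10.128.64.142
10.128.64.143

[master]
10.128.64.140 hostname=pgnode01 postgresql_exists='false'

[replica]
10.128.64.142 hostname=pgnode02
10.128.64.143 hostname=pgnode03

[postgres_cluster:children]
master
replica

[all:vars]
ansible_connection='ssh'
ansible_ssh_port='22'
ansible_user='root'
ansible_ssh_pass='testpas'  # the "sshpass" package is required to use "ansible_ssh_pass"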

Deploy Cluster with TimescaleDB

To deploy a PostgreSQL High-Availability Cluster with the TimescaleDB extension, you just need to add the enable_timescale variable.

Example:

ansible-playbook deploy_pgcluster.yml -e "enable_timescale=true"



Variables

See the vars/main.yml, system.yml and (Debian.yml or RedHat.yml) files for more details.

Cluster Scaling

After you successfully deployed your PostgreSQL HA cluster, you may need to scale it further.
Use the add_pgnode.yml playbook for this.

Add new postgresql node to existing cluster

This playbook does not scale the etcd or consul cluster.

While this playbook runs, the new nodes are prepared in the same way as during the initial cluster deployment. However, unlike the initial deployment, all the necessary configuration files are copied from the master server.

Steps to add a new Postgres node:
  1. Add a new node to the inventory file with the variable new_node=true
  2. Run add_pgnode.yml playbook

In this example, we add a node with the IP address 10.128.64.144

[master]
10.128.64.140 hostname=pgnode01 postgresql_exists='true'

[replica]
10.128.64.142 hostname=pgnode02 postgresql_exists='true'
10.128.64.143 hostname=pgnode03 postgresql_exists='true'
10.128.64.144 hostname=pgnode04 postgresql_exists=false new_node=true

Run playbook:

ansible-playbook add_pgnode.yml

Add new haproxy balancer node

While this playbook runs, the new balancer node is prepared in the same way as during the initial cluster deployment. However, unlike the initial deployment, all the necessary configuration files are copied from the first server specified in the "balancers" group of the inventory file.

Steps to add a new balancer node:

Note: Used if the with_haproxy_load_balancing variable is set to true

  1. Add a new node to the inventory file with the variable new_node=true

  2. Run add_balancer.yml playbook

In this example, we add a balancer node with the IP address 10.128.64.144

[balancers]
10.128.64.140
10.128.64.142
10.128.64.143
10.128.64.144 new_node=true

Run playbook:

ansible-playbook add_balancer.yml

Restore and Cloning

Create new clusters from your existing backups with pgBackRest or WAL-G
Point-In-Time-Recovery


Create cluster with pgBackRest:
  1. Edit the main.yml variable file
patroni_cluster_bootstrap_method: "pgbackrest"

patroni_create_replica_methods:
  - pgbackrest
  - basebackup

postgresql_restore_command: "pgbackrest --stanza={{ pgbackrest_stanza }} archive-get %f %p"

pgbackrest_install: true
pgbackrest_stanza: "stanza_name"  # specify your --stanza
pgbackrest_repo_type: "posix"  # or "s3"
pgbackrest_repo_host: "ip-address"  # dedicated repository host (if repo_type: "posix")
pgbackrest_repo_user: "postgres"  # if "repo_host" is set
pgbackrest_conf:  # see more options https://pgbackrest.org/configuration.html
  global:  # [global] section
    - {option: "xxxxxxx", value: "xxxxxxx"}
    ...
  stanza:  # [stanza_name] section
    - {option: "xxxxxxx", value: "xxxxxxx"}
    ...
    
pgbackrest_patroni_cluster_restore_command:
  '/usr/bin/pgbackrest --stanza={{ pgbackrest_stanza }} --type=time "--target=2020-06-01 11:00:00+03" --delta restore'

example for S3 #40 (comment)

  2. Run the playbook:

ansible-playbook deploy_pgcluster.yml

Create cluster with WAL-G:
  1. Edit the main.yml variable file
patroni_cluster_bootstrap_method: "wal-g"

patroni_create_replica_methods:
  - wal_g
  - basebackup

postgresql_restore_command: "wal-g wal-fetch %f %p"

wal_g_install: true
wal_g_version: "2.0.1"
wal_g_json:  # config https://github.com/wal-g/wal-g#configuration
  - {option: "xxxxxxx", value: "xxxxxxx"}
  - {option: "xxxxxxx", value: "xxxxxxx"}
  ...
wal_g_patroni_cluster_bootstrap_command: "wal-g backup-fetch {{ postgresql_data_dir }} LATEST"
  2. Run the playbook:

ansible-playbook deploy_pgcluster.yml

Point-In-Time-Recovery:

You can run an automatic restore of your existing Patroni cluster.
For PITR, specify the required parameters in the main.yml variable file and run the playbook with the tag:

ansible-playbook deploy_pgcluster.yml --tags point_in_time_recovery

Recovery steps with pgBackRest:

1. Stop the patroni service on the replica servers (if running);
2. Stop the patroni service on the master server;
3. Remove the patroni cluster "xxxxxxx" from DCS (if it exists);
4. Run "/usr/bin/pgbackrest --stanza=xxxxxxx --delta restore" on the master;
5. Run "/usr/bin/pgbackrest --stanza=xxxxxxx --delta restore" on the replicas (if patroni_create_replica_methods: "pgbackrest");
6. Wait for the restore from backup to finish (timeout 24 hours);
7. Start PostgreSQL for recovery (master and replicas);
8. Wait for PostgreSQL recovery to complete (WAL apply);
9. Stop the PostgreSQL instance (if running);
10. Disable the PostgreSQL archive_command (if enabled);
11. Start the patroni service on the master server;
12. Check that PostgreSQL is started and accepting connections on the master;
13. Make sure the PostgreSQL users (superuser and replication) are present and their passwords do not differ from those specified in vars/main.yml;
14. Update the PostgreSQL authentication parameters in patroni.yml (if the superuser or replication user has changed);
15. Reload the patroni service (if patroni.yml was updated);
16. Start the patroni service on the replica servers;
17. Check that patroni is healthy on the replica servers (timeout 10 hours);
18. Check the PostgreSQL cluster health (finish).

Why disable archive_command?

This is necessary to avoid conflicts in the archived WAL storage when multiple clusters try to send WALs to the same storage, for example, when you make multiple clones of a cluster from one backup.

You can change this parameter using patronictl edit-config after the restore.
Alternatively, set disable_archive_command: false to keep archive_command enabled after the restore.
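For example, a sketch of re-enabling WAL archiving afterwards (the stanza name and command value are illustrative, and the --set syntax is an assumption to verify against your patronictl version):

patronictl -c /etc/patroni/patroni.yml edit-config -s "postgresql.parameters.archive_command='pgbackrest --stanza=stanza_name archive-push %p'"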

Maintenance

I recommend that you study the following materials for further maintenance of the cluster:

Changing PostgreSQL configuration parameters

To change the PostgreSQL configuration in a cluster using automation:

  1. Update the postgresql_parameters variable with the desired parameter changes.
    • Note: Optionally, set pending_restart: true to automatically restart PostgreSQL if a parameter change requires it.
  2. Execute the config_pgcluster.yml playbook to apply the changes.
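For example, a sketch of step 1 in vars/main.yml (the parameters and values are illustrative):

postgresql_parameters:
  - {option: "max_connections", value: "500"}
  - {option: "work_mem", value: "128MB"}

then apply the changes:

ansible-playbook config_pgcluster.yml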

Using Git for cluster configuration management (IaC/GitOps)

Infrastructure as Code (IaC) is the managing and provisioning of infrastructure through code instead of through manual processes.
GitOps automates infrastructure updates using a Git workflow with continuous integration (CI) and continuous delivery (CD). When new code is merged, the CI/CD pipeline enacts the change in the environment. Any configuration drift, such as manual changes or errors, is overwritten by GitOps automation so the environment converges on the desired state defined in Git.

Once the cluster is deployed, you can use the config_pgcluster.yml playbook to integrate with Git to manage cluster configurations.
For example, GitHub Action (link), GitLab CI/CD (link)
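As a sketch only (the workflow file name and trigger branch are hypothetical, and SSH access from the runner to the inventory hosts is assumed to be configured separately, e.g. via repository secrets), a GitHub Actions job could apply the configuration on every merge:

# .github/workflows/config_pgcluster.yml  (illustrative example)
name: apply-pgcluster-config
on:
  push:
    branches: [main]
jobs:
  config:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Ansible
        run: pip3 install ansible
      - name: Apply the cluster configuration
        run: ansible-playbook config_pgcluster.yml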

Details about IaC and GitOps:

Update the PostgreSQL HA Cluster

Use the update_pgcluster.yml playbook to update the PostgreSQL HA cluster to a new minor version (for example 15.1 -> 15.2, etc.).

Update PostgreSQL
ansible-playbook update_pgcluster.yml -e target=postgres
Update Patroni
ansible-playbook update_pgcluster.yml -e target=patroni
Update all system

includes PostgreSQL and Patroni

ansible-playbook update_pgcluster.yml -e target=system

More details here

PostgreSQL major upgrade

Use the pg_upgrade.yml playbook to upgrade PostgreSQL to a new major version (for example 14 -> 15, etc.).

Upgrade PostgreSQL
ansible-playbook pg_upgrade.yml -e "pg_old_version=14 pg_new_version=15"

More details here

Disaster Recovery

A high-availability cluster provides automatic failover but does not cover all disaster recovery scenarios. You must take care of backing up your data yourself.

etcd

Patroni nodes dump the state of the DCS options to disk on every configuration change, into the file patroni.dynamic.json located in the Postgres data directory. The master (Patroni leader) is allowed to restore these options from the on-disk dump if they are completely absent from the DCS or are invalid.

However, I recommend that you read the disaster recovery guide for the etcd cluster:

PostgreSQL (databases)

I can recommend the following backup and restore tools:

Do not forget to validate your backups (for example pgbackrest auto).

How to start from scratch

Should you need to start from the very beginning, use the playbook remove_cluster.yml.

To prevent the playbook from being run by accident in a production environment, edit remove_cluster.yml and remove the safety pin. Change these variables accordingly:

  • remove_postgres: true
  • remove_etcd: true (or remove_consul)

Run the playbook and all the data will be gone.

ansible-playbook remove_cluster.yml

A new installation can now be made from scratch.

❗ Be careful not to copy this playbook without the safety pin to the production environment.


Sponsor this project

Join our sponsorship program to directly contribute to our project's growth and gain exclusive access to personalized support. Your sponsorship is crucial for innovation and progress. Become a sponsor today!

Support our work through GitHub Sponsors


Support our work through Patreon


Support our work through a crypto wallet:

USDT (TRC20): TSTSXZzqDCUDHDjZwCpuBkdukjuDZspwjj

License

Licensed under the MIT License. See the LICENSE file for details.

Author

Vitaliy Kukharik (PostgreSQL DBA)
[email protected]

Feedback, bug-reports, requests, ...

Are welcome!

postgresql_cluster's People

Contributors

anyuta1166, artemsafiyulin, bazzzsh, chlordk, chuegel, dependabot[bot], fabianhardt, jidckii, jimnydev, jonathanspw, lavr, lorenzotaruba, mol-lux, mrsrvman, n-borges, nikolays, nwaddon, pa-decarvalho, patsevanton, rrrru, scarletblizzard, sdv109, sgremyachikh, shipiguzev, sjstoelting, thomassanson, tommasodb, vitabaks, weisscorp, wolacinio


postgresql_cluster's Issues

Postgres can't start with changed postgresql_data_dir

Hi!
I want to store all PostgreSQL data in another directory,
so I changed postgresql_data_dir: "/data/pgsql/{{ postgresql_version }}/data"
When started via systemctl, it looks for the databases in the /var/lib/pgsql/... directory.
That is the default environment in /usr/lib/systemd/system/postgresql-12.service
I solved it manually, but it should be automated, shouldn't it?

OS: RedHat 7
Cluster: 3 servers

Error: CentOS replica node in "start failed" status

Hi

  • Deploy postgres-cluster :
    [master]
    clusterProd (debian-9,postgresql-10) postgresql_exists='true'

[replica]
poolback2 (debian-9,postgresql-10) postgresql_exists='true'
TomcatSlave (CentOS Linux release 7.6, postgresql-10) postgresql_exists='true'

--> Results:
ok: [197.13.12.131] => {
"patronictl_result.stdout_lines": [
"+------------------+-------------+--------------------+--------+--------------+-----+-----------+",
"| Cluster | Member | Host | Role | State | TL | Lag in MB |",
"+------------------+-------------+--------------------+--------+--------------+-----+-----------+",
"| postgres-cluster | TomcatSlave | IP:5434 | | start failed | | unknown |",
"| postgres-cluster | clusterProd | IP:5434 | Leader | running | 119 | |",
"| postgres-cluster | poolback2 | I:5434 | | running | 119 | 0 |",
"+------------------+-------------+--------------------+--------+--------------+-----+-----------+"

---> PostgreSQL failed to start on the TomcatSlave node

[root@TomcatSlave ~]# systemctl status postgresql-10 -l
● postgresql-10.service - PostgreSQL 10 database server
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Fri 2020-01-31 00:09:52 CET; 6s ago
Docs: https://www.postgresql.org/docs/10/static/
Process: 535 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)
Process: 529 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)
Main PID: 535 (code=exited, status=1/FAILURE)

Jan 31 00:09:52 TomcatSlave systemd[1]: Starting PostgreSQL 10 database server...
Jan 31 00:09:52 TomcatSlave postmaster[535]: 2020-01-30 23:09:52.867 GMT [535] LOG: could not open configuration file "/var/lib/pgsql/10/data/postgresql.replication.conf": No such file or directory
Jan 31 00:09:52 TomcatSlave postmaster[535]: 2020-01-30 23:09:52.867 GMT [535] FATAL: configuration file "/var/lib/pgsql/10/data/postgresql.conf" contains errors
Jan 31 00:09:52 TomcatSlave systemd[1]: postgresql-10.service: main process exited, code=exited, status=1/FAILURE
Jan 31 00:09:52 TomcatSlave systemd[1]: Failed to start PostgreSQL 10 database server.
Jan 31 00:09:52 TomcatSlave systemd[1]: Unit postgresql-10.service entered failed state.
Jan 31 00:09:52 TomcatSlave systemd[1]: postgresql-10.service failed.


journalctl -xe
Jan 31 00:17:54 TomcatSlave etcd[16715]: request sent was ignored (cluster ID mismatch: peer[715efbb63c01704
Jan 31 00:17:54 TomcatSlave etcd[16715]: request sent was ignored (cluster ID mismatch: peer[715efbb63c01704
Jan 31 00:17:54 TomcatSlave etcd[16715]: request sent was ignored (cluster ID mismatch: peer[d9fb87ce1e610e2
Jan 31 00:17:54 TomcatSlave etcd[16715]: request sent was ignored (cluster ID mismatch: peer[d9fb87ce1e610e2
Jan 31 00:17:54 TomcatSlave etcd[16715]: request sent was ignored (cluster ID mismatch: peer[715efbb63c01704
Jan 31 00:17:54 TomcatSlave etcd[16715]: request sent was ignored (cluster ID mismatch: peer[715efbb63c01704
Jan 31 00:17:54 TomcatSlave etcd[16715]: request sent was ignored (cluster ID mismatch: peer[d9fb87ce1e610e2
Jan 31 00:17:54 TomcatSlave etcd[16715]: request sent was ignored (cluster ID mismatch: peer[d9fb87ce1e610e2
Jan 31 00:17:54 TomcatSlave etcd[16715]: request sent was ignored (cluster ID mismatch: peer[715efbb63c01704
Jan 31 00:17:54 TomcatSlave etcd[16715]: request sent was ignored (cluster ID mismatch: peer[715efbb63c01704
Jan 31 00:17:54 TomcatSlave etcd[16715]: request sent was ignored (cluster ID mismatch: peer[d9fb87ce1e610e2
Jan 31 00:17:54 TomcatSlave etcd[16715]: request sent was ignored (cluster ID mismatch: peer[d9fb87ce1e610e2

Best regards

A slight tweak to your code

Hi,

Excellent work BTW.

FYI, I had to add the owner, group and mode lines to the following code in roles/patroni/tasks/main.yml to overcome restrictions on a hardened CentOS 7 image I was using. It would be appreciated if you could merge this tweak into master.

# Patroni configure
- name: Create conf directory
  file:
    path: /etc/patroni
    state: directory
    owner: postgres
    group: postgres
    mode: 0750
  tags: patroni, patroni_conf

Regards
Carl

A possible implementation of callback.sh

Good afternoon!

Thank you very much for your work!

Proposal:
I suggest adding to the master branch a simple default implementation of notifications about cluster role changes.
I have attached my callback.sh.j2 file.
To send the email I use the mutt program, and to set it up I wrote a simple email role, which I also invoke during cluster setup in deploy_pgcluster.yml:

    - role: email
      when: email_install|bool

What the generated email looks like:
image

callback.sh.zip

I added a variable that specifies whom to send the email to:
patroni_callback_email

In the roles\patroni\templates directory
I added the attached callback.sh.j2 file and adjusted the patroni.yml.j2 file:

  callbacks:
    on_start: /etc/patroni/callback.sh
    on_stop: /etc/patroni/callback.sh
    on_restart: /etc/patroni/callback.sh
    on_reload: /etc/patroni/callback.sh
    on_role_change: /etc/patroni/callback.sh

In the file roles\patroni\tasks\main.yml:

- name: Generate callback file "/etc/patroni/callback.sh"
  template:
    src: templates/callback.sh.j2
    dest: /etc/patroni/callback.sh
    owner: postgres
    group: postgres
    mode: 0755
  when: existing_pgcluster is not defined or not existing_pgcluster|bool
  tags: patroni, patroni_conf

and in - block:  # for add_pgnode.yml

    - name: Fetch callback.sh file from master
      run_once: true
      fetch:
        src: /etc/patroni/callback.sh
        dest: files/callback.sh
        validate_checksum: true
        flat: true
      delegate_to: "{{ groups.master[0] }}"

    - name: Copy callback.sh conf file to replica

      copy:
        src: files/callback.sh
        dest: /etc/patroni/callback.sh
        owner: postgres
        group: postgres
        mode: 0640

Details of my email role:
vars/main.yml

# to set up mail
email_install: true                             # true if you need to configure mail
#email_realname: "db-postgres"                  # name used in the email "From" field. Comment out to use ansible_hostname
email_machine_name: "email.dvl.ru"              # mail server name
email_domen_name: "dvl.ru"                      # part of the "Hostname"
email_for_test: "[email protected]"              # address for checking the health of the mail settings

I have attached the file with the role, in case it is useful to someone, and perhaps you will like it and it will make its way into the cluster setup :)

email.yml.zip

Thanks again for your work!

A problem in patroni.service.j2

Good afternoon,

the Patroni service file contains the following line:
WorkingDirectory=~
and at startup we see this message:

Sep 10 15:15:55 sfactspostgres1 systemd[1]: [/etc/systemd/system/patroni.service:14] Not an absolute path, ignoring: ~
Sep 10 15:15:55 sfactspostgres1 systemd[1]: [/etc/systemd/system/patroni.service:14] Not an absolute path, ignoring: ~

systemd service files must use absolute paths.

Roles for Patroni maintenance mode

I propose adding two simple roles:

  • patroni_pause
  • patroni_resume

They are very convenient to use when writing your own additional cluster maintenance roles.

- name: Prepare patroni | pause patroni ...
  run_once: true
  become: true
  become_user: root
  command: "/usr/local/bin/patronictl -c /etc/patroni/patroni.yml pause --wait {{ patroni_cluster_name }}"
  ignore_errors: yes

and

- name: Prepare patroni | resume patroni ...
  run_once: true
  become: true
  become_user: root
  command: "/usr/local/bin/patronictl -c /etc/patroni/patroni.yml resume --wait {{ patroni_cluster_name }}"
  ignore_errors: yes

Can you use latest version of etcd

Hi,

Your solution uses an older version of etcd (etcd_ver: "v3.3.15"). Could you please look into using the latest version of etcd? Also, please note that the etcdctl command for checking cluster health may not work in the latest version of etcd, and a different approach might be needed.

Patroni 2.0.0 and preparing locales on the machine

The other day the new Patroni version 2.0.0 was released, and I unexpectedly ran into a problem when deploying the cluster.
My error looked like this (an excerpt from ansible.log):
TASK [deploy-finish : Check postgresql cluster health] **************************************************************************
fatal: [10.10.2.16]: FAILED! => {"changed": false, "cmd": ["patronictl", "-c", "/etc/patroni/patroni.yml", "list"], "delta": "0:00:00.379505", "end": "2020-09-11 10:09:38.744379", "msg": "non-zero return code", "rc": 1, "start": "2020-09-11 10:09:38.364874", 
"stderr": "Traceback (most recent call last):\n  File \"/usr/local/bin/patronictl\", line 11, in <module>\n
    sys.exit(ctl())\n  File \"/usr/local/lib/python3.6/site-packages/click/core.py\", line 829, in __call__\n
    return self.main(*args, **kwargs)\n
  File \"/usr/local/lib/python3.6/site-packages/click/core.py\", line 760, in main\n
    _verify_python3_env()\n
  File \"/usr/local/lib/python3.6/site-packages/click/_unicodefun.py\", line 130, in _verify_python3_env\n
    \" mitigation steps.{}\".format(extra)\nRuntimeError: Click will abort further execution because Python 3 was configured to use ASCII as encoding for the environment. Consult https://click.palletsprojects.com/python3/ for mitigation steps.\n\n
This system lists a couple of UTF-8 supporting locales that you can pick from. The following suitable locales were discovered: aa_DJ.utf8, aa_ER.utf8, aa_ET.utf8, ....

here we see a message from one of the Python libraries, something about encodings...

Click will abort further execution because Python 3 was configured to use ASCII as encoding for the environment. Consult https://click.palletsprojects.com/python3/ for mitigation steps. 

and, as it turned out, the Patroni deployment itself had a hard time :) (an excerpt from the Patroni log):

patroni[62783]: /var/run/postgresql:5432 - rejecting connections
patroni[62783]: 2020-09-11 10:07:16,494 ERROR: Can not fetch local timeline and lsn from replication connection
patroni[62783]: Traceback (most recent call last):
patroni[62783]: File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 735, in get_replica_timeline
patroni[62783]: with self.get_replication_connection_cursor(**self.config.local_replication_address) as cur:
patroni[62783]: File "/usr/lib64/python3.6/contextlib.py", line 81, in __enter__
patroni[62783]: return next(self.gen)
patroni[62783]: File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 730, in get_replication_connection_cursor
patroni[62783]: with get_connection_cursor(**conn_kwargs) as cur:
patroni[62783]: File "/usr/lib64/python3.6/contextlib.py", line 81, in __enter__
patroni[62783]: return next(self.gen)
patroni[62783]: File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/connection.py", line 43, in get_connection_cursor
patroni[62783]: with psycopg2.connect(**kwargs) as conn:
patroni[62783]: File "/usr/lib64/python3.6/site-packages/psycopg2/__init__.py", line 127, in connect
patroni[62783]: conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
patroni[62783]: psycopg2.OperationalError: could not connect to server: Connection refused
patroni[62783]: Is the server running on host "localhost" (::1) and accepting
patroni[62783]: TCP/IP connections on port 5432?
patroni[62783]: FATAL:  password authentication failed for user "replicator"

and this error repeated several times in a row until it managed to resolve itself...

patroni[62783]: 2020-09-11 10:07:19,770 ERROR: Can not fetch local timeline and lsn from replication connection
patroni[62783]: Traceback (most recent call last):
patroni[62783]: File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 735, in get_replica_timeline
patroni[62783]: with self.get_replication_connection_cursor(**self.config.local_replication_address) as cur:
patroni[62783]: File "/usr/lib64/python3.6/contextlib.py", line 81, in __enter__
patroni[62783]: return next(self.gen)
patroni[62783]: File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 730, in get_replication_connection_cursor
patroni[62783]: with get_connection_cursor(**conn_kwargs) as cur:
patroni[62783]: File "/usr/lib64/python3.6/contextlib.py", line 81, in __enter__
patroni[62783]: return next(self.gen)
patroni[62783]: File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/connection.py", line 43, in get_connection_cursor
patroni[62783]: with psycopg2.connect(**kwargs) as conn:
patroni[62783]: File "/usr/lib64/python3.6/site-packages/psycopg2/__init__.py", line 127, in connect
patroni[62783]: conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
patroni[62783]: psycopg2.OperationalError: could not connect to server: Connection refused
patroni[62783]: Is the server running on host "localhost" (::1) and accepting
patroni[62783]: TCP/IP connections on port 5432?
patroni[62783]: FATAL:  password authentication failed for user "replicator"
patroni[62783]: 2020-09-11 10:07:25,406 INFO: Lock owner: sfactspostgres1; I am sfactspostgres1
patroni[62783]: 2020-09-11 10:07:25,861 INFO: Lock owner: sfactspostgres1; I am sfactspostgres1
patroni[62783]: 2020-09-11 10:07:26,576 INFO: no action.  i am the leader with the lock
patroni[62783]: 2020-09-11 10:07:27,486 INFO: Lock owner: sfactspostgres1; I am sfactspostgres1
patroni[62783]: 2020-09-11 10:07:27,692 INFO: no action.  i am the leader with the lock

during this time the database had already switched to Timeline 4, whereas with a normal deployment I always ended up with Timeline 2.
Finding the problem took a couple of days, plus a couple more days of experiments.

In summary:
adding the active locale's language to the /etc/environment file (for English):

export LC_ALL=en_US.utf-8
export LANG=en_US.utf-8

completely solves this problem!

Patroni installs without problems or errors :)

I suggest improving the locales role to write the current locale lines into this file.

postgresql cluster (patroni) with PostgresPro

postgrespro

Compatibility with Postgres Pro Standard

❗ To use any version of Postgres Pro, you must purchase a license. You can get the version of the DBMS you are interested in for free for testing, exploring the capabilities of the DBMS, and developing application software.

Refactoring: variable names

Good afternoon!

Thank you very much for your work!

Proposal:
there is a variable install_pgbouncer; however, if you look at the names of other similar variables, you will see:
patroni_install_version
wal_g_install
pgbackrest_install
I propose renaming the variable to pgbouncer_install.

Thanks again for your work!

How to use vip-manager for the default port 5432

hi @vitabaks

Thank you for this Ansible repo for setting up HA Postgres.

I have a requirement where the VIP needs to listen on 5432, as the web server is configured to connect to Postgres only on 5432. Unfortunately, I cannot change this code to make the application use another port for Postgres.

Can you please advise the best way to set up the VIP to listen on 5432?

Missing ‘|’ in 'tasks/add-repository.yml' at line 30

TASK [Add repository] ************************************************************************************************************************************************************************************************* fatal: [192.168.56.136]: FAILED! => {"msg": "The conditional check 'yum_repository length > 0' failed. The error was: template error while templating string: expected token 'end of statement block', got 'length'. String: {% if yum_repository length > 0 %} True {% else %} False {% endif %}\n\nThe error appears to have been in '/root/postgresql_cluster/tasks/add-repository.yml': line 23, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n- block:\n - name: Add repository\n ^ here\n"}

the code is

when: yum_repository length > 0

should be

when: yum_repository | length > 0

Error init cluster from pgbackrest backup

Hi!
I ran into a problem when initializing a cluster from a pgbackrest backup:

TASK [patroni : Run "/usr/bin/pgbackrest --stanza=sdelka --delta restore" on Master] ******************************************************************************************************************************
changed: [172.30.102.110]
Friday 07 August 2020  12:17:43 +0500 (0:00:02.191)       0:03:13.031 ********* 
Friday 07 August 2020  12:17:43 +0500 (0:00:00.144)       0:03:13.176 ********* 

TASK [patroni : Waiting for restore from backup] ******************************************************************************************************************************************************************
failed: [172.30.102.110] (item=True) => {"ansible_job_id": "317881301454.4512", "ansible_loop_var": "item", "attempts": 1, "changed": false, "cmd": "/var/tmp/ansible-tmp-1596784661.27-18286-75430763482321/AnsiballZ_command.py", "finished": 1, "item": {"ansible_job_id": "317881301454.4512", "changed": true, "failed": false, "finished": 0, "results_file": "/var/lib/postgresql/.ansible_async/317881301454.4512", "started": 1}, "msg": "[Errno 13] Permission denied: '/var/tmp/ansible-tmp-1596784661.27-18286-75430763482321/AnsiballZ_command.py'", "outdata": "", "stderr": "", "stderr_lines": []}

This block looks like magic to me...
Could this piece be simplified somehow?

 - block:  # for pgbackrest only (for use --delta restore)
        - name: Run "{{ pgbackrest_patroni_cluster_restore_command }}" on Master
          command: >
            {{ pgbackrest_patroni_cluster_restore_command }}
            {{ '--target-action=promote' if pgbackrest_patroni_cluster_restore_command is search('--type=') else '' }}
          async: 86400  # timeout 24 hours
          poll: 0
          register: pgbackrest_restore_master
          when: is_master == "true"

          # if patroni_create_replica_methods: "pgbackrest"
        - name: Run "{{ pgbackrest_patroni_cluster_restore_command }}" on Replica
          command: >
            {{ pgbackrest_patroni_cluster_restore_command }}
            {{ '--target-action=shutdown' if pgbackrest_patroni_cluster_restore_command is search('--type=') else '' }}
          async: 86400  # timeout 24 hours
          poll: 0
          register: pgbackrest_restore_replica
          when: is_master != "true" and 'pgbackrest' in patroni_create_replica_methods

        - name: Waiting for restore from backup
          async_status:
            jid: "{{ item.ansible_job_id }}"
          loop:
            - "{{ pgbackrest_restore_master }}"
            - "{{ pgbackrest_restore_replica }}"
          loop_control:
            label: "{{ item.changed }}"
          register: pgbackrest_restore_jobs_result
          until: pgbackrest_restore_jobs_result.finished
          retries: 2880  # timeout 24 hours
          delay: 30
          when: item.ansible_job_id is defined

haproxy failed to start after the host restarts; the log says "Starting frontend GLOBAL: cannot bind UNIX socket [/run/haproxy/admin.sock]"

haproxy failed to start after the host restarted. I checked the log with:

journalctl -b -0 -u haproxy

and got this message:

Sep 19 21:25:11 pgnode01 haproxy[2437]: [ALERT] 261/212511 (2437) : Starting frontend GLOBAL: cannot bind UNIX socket [/run/haproxy/admin.sock]

Running ll /run/haproxy gives: cannot access /run/haproxy: No such file or directory
After mkdir /run/haproxy and systemctl start haproxy, it started.

But why did it run fine right after it was deployed?

vip-manager Destination directory /etc/patroni does not exist (CentOS 7.6)

Download

git clone https://github.com/vitabaks/postgresql_cluster

git diff

diff --git a/inventory b/inventory
index ca71651..73b8ed9 100644
--- a/inventory
+++ b/inventory
@@ -7,25 +7,23 @@
 
 # (if dcs_exists: 'false' and dcs_type: 'etcd')
 [etcd_cluster]  # recommendation: 3 or 5-7 nodes
-10.128.64.140
-10.128.64.142
-10.128.64.143
-
+172.26.9.254
+172.26.9.11
+172.26.9.10
 
 # (if with_haproxy_load_balancing: 'true')
 [balancers]
-10.128.64.140
-10.128.64.142
-10.128.64.143
-
+172.26.9.254
+172.26.9.11
+172.26.9.10
 
 # PostgreSQL nodes
 [master]
-10.128.64.140 hostname=pgnode01 postgresql_exists='false'
+172.26.9.254 hostname=pgnode01 postgresql_exists='false'
 
 [replica]
-10.128.64.142 hostname=pgnode02
-10.128.64.143 hostname=pgnode03
+172.26.9.11 hostname=pgnode02
+172.26.9.10 hostname=pgnode03
 
 [postgres_cluster:children]
 master
@@ -40,8 +38,8 @@ replica
 [all:vars]
 ansible_connection='ssh'
 ansible_ssh_port='22'
-ansible_user='root'
-ansible_ssh_pass='testpas'  # "sshpass" package is required for use "ansible_ssh_pass"
+ansible_user='centos'
+#ansible_ssh_pass='testpas'  # "sshpass" package is required for use "ansible_ssh_pass"
 #ansible_ssh_private_key_file=
 # ansible_python_interpreter='/usr/bin/python3'  # is required for use python3
 
diff --git a/vars/main.yml b/vars/main.yml
index 2602498..8e3968f 100644
--- a/vars/main.yml
+++ b/vars/main.yml
@@ -7,7 +7,7 @@ proxy_env: {}
 # -------------------------------------------
 
 # Cluster variables
-cluster_vip: "10.128.64.145" # for client access to databases in the cluster
+cluster_vip: "172.26.9.252" # for client access to databases in the cluster
 vip_interface: "{{ ansible_default_ipv4.interface }}" # interface name (ex. "ens32")
 
 patroni_cluster_name: "postgres-cluster"  # specify the cluster name

Error on CentOS 7.6:

TASK [vip-manager | generate conf file "/etc/patroni/vip-manager.yml"] *************************************************************************
fatal: [172.26.9.254]: FAILED! => {"changed": false, "checksum": "de29ab222f4365c6b4b6bb065939907042311d66", "msg": "Destination directory /etc/patroni does not exist"}
fatal: [172.26.9.11]: FAILED! => {"changed": false, "checksum": "84806428038495ebdb917ecbca97857ca4f5a2ba", "msg": "Destination directory /etc/patroni does not exist"}
fatal: [172.26.9.10]: FAILED! => {"changed": false, "checksum": "8d8073734897b6045fc164b39a97e25c5e6fdd83", "msg": "Destination directory /etc/patroni does not exist"}

NO MORE HOSTS LEFT *****************************************************************************************************************************

PLAY RECAP *************************************************************************************************************************************
172.26.9.10                : ok=54   changed=37   unreachable=0    failed=1    skipped=120  rescued=0    ignored=1   
172.26.9.11                : ok=54   changed=37   unreachable=0    failed=1    skipped=120  rescued=0    ignored=1   
172.26.9.254               : ok=56   changed=37   unreachable=0    failed=1    skipped=124  rescued=0    ignored=1   

postgresql cluster (patroni) with timescaledb.

Timescale

Question:

Could you please send across the changes that I need to make to the postgresql cluster to make it work with TimescaleDB?

Answer:

Everything is very simple!
I myself use timescaledb in production with this cluster scheme.

Please read the documentation to get started: https://docs.timescale.com/latest/getting-started/installation/

Example for:
Installation on: Debian
Install method: apt
PostgreSQL version: 15

  1. Add timescaledb repository to download packages (in the /vars/Debian.yml variable file):
apt_repository_keys:
  - key: "https://www.postgresql.org/media/keys/ACCC4CF8.asc" # postgresql repository apt key
  - key: "https://packagecloud.io/timescale/timescaledb/gpgkey" # timescaledb repository apt key

apt_repository:
  - repo: "deb http://apt.postgresql.org/pub/repos/apt/ {{ ansible_distribution_release }}-pgdg main" # postgresql apt repository
  - repo: "deb https://packagecloud.io/timescale/timescaledb/debian/ {{ ansible_distribution_release }} main" # timescaledb apt repository
  2. Add timescaledb packages for automatic installation in the postgresql_packages variable (in the /vars/Debian.yml variable file):
postgresql_packages:
  - postgresql-{{ postgresql_version }}
  - postgresql-client-{{ postgresql_version }}
  - postgresql-server-dev-{{ postgresql_version }}
  - postgresql-contrib-{{ postgresql_version }}
  - timescaledb-2-postgresql-{{ postgresql_version }}
  3. Add/edit the necessary PostgreSQL parameters for timescaledb (in the /vars/main.yml variable file):
postgresql_version: "15"

postgresql_users: # this is optional
postgresql_databases: # this is optional
postgresql_extensions:
  - {ext: "timescaledb", db: "postgres"} # or my database name

postgresql_parameters:
  - {option: "max_locks_per_transaction", value: "512"}
  - {option: "shared_preload_libraries", value: "timescaledb"}

Specify all other variables according to your personal requirements for the database cluster.

Deployment: quick start

provide gpgcheck option of yum_repository

Some repositories need gpgcheck: no.

Could you add an option for yum_repository, like this:

# Repository (optional)
yum_repository: []

and use it in 'tasks/add-repository.yml':

# RedHat/CentOS
- block:
    - name: Add repository
      yum_repository:
        name: "{{ item.name }}"
        description: "{{ item.description }}"
        baseurl: "{{ item.baseurl }}"
        gpgkey: "{{ item.gpgkey }}"
        gpgcheck: "{{ item.gpgcheck }}"
      loop: "{{ yum_repository | flatten(1) }}"
      when: yum_repository | length > 0

http vs https

Good afternoon!

Thank you very much for your work!

The code currently accesses the Internet over both http and https in different places.

Proposal:
Use https everywhere, since many companies, including mine, allow only https egress from their servers.
Alternatively, introduce parameterization via a variable.

Thanks again for your work!

ERROR! 'cluster_vip' is undefined

Very strange behavior: along the way Ansible seems to lose the variable, each time in a different place, even though the task being executed may not use the variable at all.

All the variables are in a single file here; I moved them into inventory/dev/all.yaml

everything from vars/main.yml and vars/system.yml;
essentially, almost nothing else was changed...

I run:

$ ansible-playbook -i inventory/dev/ balancers.yml 

it fails like this:

TASK [Generate conf file "/etc/confd/conf.d/haproxy.toml"] ********************************************************************************************************************************************************
Wednesday 26 August 2020  13:15:54 +0500 (0:00:01.561)       0:00:47.078 ****** 
ERROR! 'cluster_vip' is undefined

I run it again, and it fails on a different task:

TASK [Generate conf file "/etc/confd/conf.d/haproxy.toml"] ********************************************************************************************************************************************************
Wednesday 26 August 2020  13:20:42 +0500 (0:00:01.862)       0:00:42.198 ****** 
ok: [172.30.101.212]
ok: [172.30.101.210]
ok: [172.30.101.211]

TASK [Generate template "/etc/confd/templates/haproxy.tmpl"] ******************************************************************************************************************************************************
Wednesday 26 August 2020  13:20:44 +0500 (0:00:01.549)       0:00:43.747 ****** 
ERROR! 'cluster_vip' is undefined

the debug output gives no more information:

$ ansible-playbook -i inventory/dev/ balancers.yml  -vvv

https://paste2.org/gED0YdAg

What could the problem be?
This is the first time I have encountered this, and only with this playbook...

Problem with add-repository.yml task

Hi

  • With Ansible installed on CentOS 7:

$ansible --version
ansible 2.9.1
.....
python version = 2.7.5 (default, Aug 7 2019, 00:51:29) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]

  • pgnode01 and pgnode02 on Debian 9

uname -a

Linux pgnode01 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11) x86_64 GNU/Linux

python --version

Python 2.7.13

  • problem with the add-repository.yml task:
    TASK [Add repository] ***********************************************************************
    task path: /home/iset/postgresql_cluster/tasks/add-repository.yml:19
    The full traceback is:
    WARNING: The below traceback may not be related to the actual failure.
    File "/tmp/ansible_apt_repository_payload_mZNQU0/ansible_apt_repository_payload.zip/ansible/modules/packaging/os/apt_repository.py", line 548, in main
    File "/usr/lib/python2.7/dist-packages/apt/cache.py", line 464, in update
    raise FetchFailedException(e)

failed: [192.168.4.175] (item={u'repo': u'deb http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main'}) => {
"ansible_loop_var": "item",
"changed": false,
"invocation": {
"module_args": {
"codename": null,
"filename": null,
"install_python_apt": true,
"mode": null,
"repo": "deb http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main",
"state": "present",
"update_cache": true,
"validate_certs": true
}
},
"item": {
"repo": "deb http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main"
},
"msg": "apt cache update failed"
}

.......

task path: /home/iset/postgresql_cluster/tasks/add-repository.yml:19
failed: [192.168.4.175] (item={u'repo': u'deb http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main'}) => {"ansible_loop_var": "item", "changed": false, "item": {"repo": "deb http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main"}, "msg": "apt cache update failed"}
failed: [192.168.4.174] (item={u'repo': u'deb http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main'}) => {"ansible_loop_var": "item", "changed": false, "item": {"repo": "deb http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main"}, "msg": "apt cache update failed"}

Best regards

Hello, I seem to have a problem. It always fails at this point

failed: [xxx.xxx.xx.xx] (item=python36-devel) => {"ansible_loop_var": "item", "changed": false, "changes": {"installed": ["python36-devel"]}, "item": "python36-devel", "msg": "\n\nTransaction check error:\n file /etc/rpm/macros.python from install of python-rpm-macros-3-32.el7.noarch conflicts with file from package python-devel-2.7.5-80.el7_6.x86_64\n\nError Summary\n-------------\n\n", "rc": 1, "results": ["Loaded plugins: fastestmirror\nLoading mirror speeds from cached hostfile\nPackage python36-devel is obsoleted by python3-devel, trying to install python3-devel-3.6.8-10.el7.x86_64 instead\nResolving Dependencies\n--> Running transaction check\n---> Package python3-devel.x86_64 0:3.6.8-10.el7 will be installed\n--> Processing Dependency: redhat-rpm-config for package: python3-devel-3.6.8-10.el7.x86_64\n--> Processing Dependency: python3-rpm-macros for package: python3-devel-3.6.8-10.el7.x86_64\n--> Processing Dependency: python3-rpm-generators for package: python3-devel-3.6.8-10.el7.x86_64\n--> Processing Dependency: python-rpm-macros for package: python3-devel-3.6.8-10.el7.x86_64\n--> Running transaction check\n---> Package python-rpm-macros.noarch 0:3-32.el7 will be installed\n--> Processing Dependency: python-srpm-macros for package: python-rpm-macros-3-32.el7.noarch\n---> Package python3-rpm-generators.noarch 0:6-2.el7 will be installed\n---> Package python3-rpm-macros.noarch 0:3-32.el7 will be installed\n---> Package redhat-rpm-config.noarch 0:9.1.0-88.el7.centos will be installed\n--> Processing Dependency: dwz >= 0.4 for package: redhat-rpm-config-9.1.0-88.el7.centos.noarch\n--> Processing Dependency: zip for package: redhat-rpm-config-9.1.0-88.el7.centos.noarch\n--> Processing Dependency: perl-srpm-macros for package: redhat-rpm-config-9.1.0-88.el7.centos.noarch\n--> Running transaction check\n---> Package dwz.x86_64 0:0.11-3.el7 will be installed\n---> Package perl-srpm-macros.noarch 0:1-8.el7 will be installed\n---> Package python-srpm-macros.noarch 0:3-32.el7 will be installed\n---> Package zip.x86_64 0:3.0-11.el7 will be installed\n--> Finished Dependency Resolution\n\nDependencies Resolved\n\n================================================================================\n Package Arch Version Repository\n Size\n================================================================================\nInstalling:\n python3-devel x86_64 3.6.8-10.el7 base 215 k\nInstalling for dependencies:\n dwz x86_64 0.11-3.el7 base 99 k\n perl-srpm-macros noarch 1-8.el7 base 4.6 k\n python-rpm-macros noarch 3-32.el7 base 8.8 k\n python-srpm-macros noarch 3-32.el7 base 8.4 k\n python3-rpm-generators noarch 6-2.el7 base 20 k\n python3-rpm-macros noarch 3-32.el7 base 7.7 k\n redhat-rpm-config noarch 9.1.0-88.el7.centos base 81 k\n zip x86_64 3.0-11.el7 base 260 k\n\nTransaction Summary\n================================================================================\nInstall 1 Package (+8 Dependent packages)\n\nTotal download size: 704 k\nInstalled size: 1.8 M\nDownloading packages:\n--------------------------------------------------------------------------------\nTotal 411 kB/s | 704 kB 00:01 \nRunning transaction check\nRunning transaction test\n"]}

Offline installation

Good afternoon!
Please create a complete list of all packages for the packages_from_file variable. Specify the packages in the correct order. Thanks for the work you've done!


Vip-manager access?

Hi!

Thanks for such a good repo. I'm trying to deploy it to 3 nodes in a private network on DO. Everything works fine, but I can't understand how to access the DB through vip-manager (so that I always access the DB through one IP address).

What should I set for vip-manager in vars/main.yml?

Problem with postgresql cluster deployment on RHEL/CentOS 8.x (pgbouncer install failed)

FYI

An error occurred during the installation of the pgbouncer rpm package from the postgresql (pgdg) yum repository:

TASK [PgBouncer | install package] ***************************************************************
fatal: [10.128.64.144]: FAILED! => {"changed": false, "failures": [], "msg": "Depsolve Error occured: \n Problem: package pgbouncer-1.12.0-1.rhel8.x86_64 requires python-psycopg2, but none of the providers can be installed\n  - cannot install the best candidate for the job\n  - package python2-psycopg2-2.8.3-2.rhel8.x86_64 is excluded\n  - package python2-psycopg2-2.8.2-1.rhel8.x86_64 is excluded\n  - package python2-psycopg2-2.8.3-1.rhel8.x86_64 is excluded", "rc": 1, "results": []}

Cause:
The pgbouncer-1.12.0-1.rhel8.x86_64.rpm package specifies "python-psycopg2" (instead of python2-psycopg2 or python3-psycopg2) as a dependency, which is not present in the CentOS and PGDG repositories.

The problem is in the pgbouncer rpm package itself.

Issue:
pgbouncer/pgbouncer#465
https://redmine.postgresql.org/issues/5223

Workaround (RHEL8):
Deploy without pgbouncer connection pooler.

You can disable the pgbouncer installation by specifying install_pgbouncer: 'false' in the vars/main.yml variable file.

P.S.
Also, you can try installing pgbouncer from source if you need it.

VIP issue

Hi,

We have a three-node cluster (with etcd/HAProxy/Postgres on every node). We rebooted two nodes and now have the following two issues:

  1. The VIP was automatically assigned to node 2, which is not the Leader (node 3 is now the Leader).
  2. Node 1 is in a failed state.

How can we solve this? I am new to this Patroni Postgres cluster.

Thanks

Change present -> latest in roles/packages/tasks/main.yml

Good afternoon!

Thank you very much for your work!

In operation I ran into a problem where a package was not updated because a previous version was already installed; in particular, this was the python-psycopg2 package. Analysis showed that this happens because installation uses the parameter:
state: present
Because of this, in my case Patroni failed to start, since the installed package version turned out to be older than required.

Suggestion:
I propose forcing installation of the latest version: state: latest (a sketch follows)
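A minimal sketch of the proposed change (the task name and the package-list variable are illustrative, not necessarily the role's actual ones):

# roles/packages/tasks/main.yml (illustrative)
- name: Install packages
  package:
    name: "{{ item }}"
    state: latest  # was: state: present; upgrade to the newest available version
  loop: "{{ system_packages }}"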

Once again, thank you very much for your work!

Add support for pg_ident.conf

My servers use the pg_ident.conf configuration file.
However, in your implementation this file has been overlooked :)
I propose adding support for it when configuring the cluster, since it is one of the important configuration files.
I added such support by analogy with the pg_hba.conf file.
I added the following to vars/main.yml:

# specify additional hosts that will be added to the pg_ident.conf
postgresql_pg_ident:
  - {mapname: "main", system_username: "postgres", pg_username: "backup"} # default values is set for pg_probackup utility

In the roles/patroni/templates directory I added a new template, pg_ident.conf.j2, with the following content:

# MAPNAME       SYSTEM-USERNAME         PG-USERNAME
{% for client in postgresql_pg_ident %}
  {{ (client.mapname | default('main')).ljust(25) }}{{ (client.system_username | default('postgres')).ljust(25) }}{{ (client.pg_username | default('backup')).ljust(25) }}
{% endfor %}

In roles/patroni/tasks/main.yml I added the following blocks:

- block:  # when postgresql exists (master)
...
    - name: Prepare PostgreSQL | generate pg_ident.conf on Master
      template:
        src: templates/pg_ident.conf.j2
        dest: "{{ postgresql_conf_dir }}/pg_ident.conf"
        owner: postgres
        group: postgres
        mode: 0640
...
- block:  # pg_hba (using a templates/pg_hba.conf.j2)
...
    - name: Prepare PostgreSQL | generate pg_ident.conf
      template:
        src: templates/pg_ident.conf.j2
        dest: "{{ postgresql_conf_dir }}/pg_ident.conf"
        owner: postgres
        group: postgres
        mode: 0640
...
- block:  # for add_pgnode.yml
...
    - name: Prepare PostgreSQL | fetch pg_ident.conf file from master
      run_once: true
      fetch:
        src: "{{ postgresql_conf_dir }}/pg_ident.conf"
        dest: files/pg_ident.conf
        validate_checksum: true
        flat: true
      delegate_to: "{{ groups.master[0] }}"

    - name: Prepare PostgreSQL | copy pg_ident.conf file to replica
      copy:
        src: files/pg_ident.conf
        dest: "{{ postgresql_conf_dir }}/pg_ident.conf"
        owner: postgres
        group: postgres
        mode: 0640
...

Thank you for your work!

Problem with stats_temp_directory

Good afternoon!

Thank you very much for your work!

The stats_temp_directory parameter in Postgres lets you specify where the statistics collector stores its data, and this directory can be placed directly in RAM (tmpfs) to improve I/O for these operations. It is great that this implementation takes this important point into account, but a couple of things are not covered:

  1. there is no way to specify a size for such a directory in RAM
  2. there is no way to change the name of this directory
  3. there is no way to simply opt out of this feature and keep the default: a directory on disk, as usual

As a partial implementation of item 3 I used --skip-tags "pgsql_stats_tmp", but I still had to manually comment out the corresponding parameter in roles/patroni/templates/patroni.yml.j2:
# stats_temp_directory: /var/lib/pgsql_stats_tmp

I propose adding the corresponding parameters for this to vars/main.yml (a sketch follows):
stats_temp_directory_install
stats_temp_directory_path
stats_temp_directory_size
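For example, the new variables might look like this (names taken from the proposal above; the default values are illustrative):

# vars/main.yml (illustrative)
stats_temp_directory_install: true  # false keeps the default on-disk directory
stats_temp_directory_path: "/var/lib/pgsql_stats_tmp"  # tmpfs mount point
stats_temp_directory_size: "1024m"  # size of the tmpfs mount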

HAProxy port 5000 (read/write) not working

Hi,

Thank you very much for the playbook.

When I connect to the VIP address on port 5000 from Zabbix, I expect it to load-balance, but it doesn't work.

Doesn't the VIP address provide both reads and writes to the system over port 5000? I installed the system with scheme Type A.

Thanks for your help in advance

Works on Ubuntu and CentOS but fails on OracleLinux

Hi,

You have created a great asset to help people deploy Postgres. Your project is the only one that deploys everything (postgres, patroni, haproxy, keepalived, vip), and I was very thrilled to check it out. It deployed on Ubuntu 18.04, but failed on CentOS 7 and OracleLinux 7. I sent you an email with the logs; it appears that etcd is failing to start due to firewall ports or some other reason. Happy to chat with you via Zoom/WebEx if needed.

etcd error

Debian Buster (GNU/Linux 4.19.0-6-amd64 x86_64)

TASK [etcd cluster | enable and start systemd service] *******************************************************************************************************
fatal: [ip1]: FAILED! => {"changed": false, "msg": "Unable to start service etcd: Job for etcd.service failed because a timeout was exceeded.\nSee "systemctl status etcd.service" and "journalctl -xe" for details.\n"}
fatal: [ip2]: FAILED! => {"changed": false, "msg": "Unable to start service etcd: Job for etcd.service failed because a timeout was exceeded.\nSee "systemctl status etcd.service" and "journalctl -xe" for details.\n"}
fatal: [ip3]: FAILED! => {"changed": false, "msg": "Unable to start service etcd: Job for etcd.service failed because a timeout was exceeded.\nSee "systemctl status etcd.service" and "journalctl -xe" for details.\n"}

Restart
ansible-playbook deploy_pgcluster.yml

TASK [etcd cluster | wait until the etcd cluster is healthy] *************************************************************************************************
fatal: [ip1]: FAILED! => {"attempts": 10, "changed": false, "cmd": ["/usr/local/bin/etcdctl", "cluster-health"], "delta": "0:00:02.014084", "end": "2020-02-03 19:11:56.270724", "msg": "non-zero return code", "rc": 4, "start": "2020-02-03 19:11:54.256640", "stderr": "Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:4001: connect: connection refused\n; error #1: client: endpoint http://127.0.0.1:2379 exceeded header timeout\n\nerror #0: dial tcp 127.0.0.1:4001: connect: connection refused\nerror #1: client: endpoint http://127.0.0.1:2379 exceeded header timeout", "stderr_lines": ["Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:4001: connect: connection refused", "; error #1: client: endpoint http://127.0.0.1:2379 exceeded header timeout", "", "error #0: dial tcp 127.0.0.1:4001: connect: connection refused", "error #1: client: endpoint http://127.0.0.1:2379 exceeded header timeout"], "stdout": "cluster may be unhealthy: failed to list members", "stdout_lines": ["cluster may be unhealthy: failed to list members"]}

Restart
ansible-playbook deploy_pgcluster.yml

Issue with ansible-role-firewall on Ubuntu 18.04

Thank you so much for the Ansible playbooks... they give a good starting point on the steps to get an HA setup with PostgreSQL. I am trying to run this playbook on Ubuntu 18.04 (64-bit) on my VirtualBox local network, and I get the issue below in the firewall role. Can you please help with this?

TASK [ansible-role-firewall : Load the nf_conntrack_ipv4 module] ***********************************************************************************
fatal: [192.168.1.134]: FAILED! => {"changed": false, "msg": "modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.18.0-15-generic/modules.dep.bin'\nmodprobe: FATAL: Module nf_conntrack_ipv4 not found in directory /lib/modules/4.18.0-15-generic\n", "name": "nf_conntrack_ipv4", "params": "", "rc": 1, "state": "present", "stderr": "modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.18.0-15-generic/modules.dep.bin'\nmodprobe: FATAL: Module nf_conntrack_ipv4 not found in directory /lib/modules/4.18.0-15-generic\n", "stderr_lines": ["modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.18.0-15-generic/modules.dep.bin'", "modprobe: FATAL: Module nf_conntrack_ipv4 not found in directory /lib/modules/4.18.0-15-generic"], "stdout": "", "stdout_lines": []}
...ignoring
fatal: [192.168.1.136]: FAILED! => {"changed": false, "msg": "modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.18.0-15-generic/modules.dep.bin'\nmodprobe: FATAL: Module nf_conntrack_ipv4 not found in directory /lib/modules/4.18.0-15-generic\n", "name": "nf_conntrack_ipv4", "params": "", "rc": 1, "state": "present", "stderr": "modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.18.0-15-generic/modules.dep.bin'\nmodprobe: FATAL: Module nf_conntrack_ipv4 not found in directory /lib/modules/4.18.0-15-generic\n", "stderr_lines": ["modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.18.0-15-generic/modules.dep.bin'", "modprobe: FATAL: Module nf_conntrack_ipv4 not found in directory /lib/modules/4.18.0-15-generic"], "stdout": "", "stdout_lines": []}
...ignoring
changed: [192.168.1.135]

TASK [ansible-role-firewall : Configure the kernel to keep connections alive when enabling the firewall] *******************************************
fatal: [192.168.1.134]: FAILED! => {"changed": false, "msg": "Failed to reload sysctl: sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal: No such file or directory\n"}
...ignoring
changed: [192.168.1.135]
fatal: [192.168.1.136]: FAILED! => {"changed": false, "msg": "Failed to reload sysctl: sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal: No such file or directory\n"}
...ignoring

TASK [ansible-role-firewall : Configure the firewall service.] *************************************************************************************
fatal: [192.168.1.134]: FAILED! => {"changed": false, "msg": "Unable to start service firewall: Job for firewall.service failed because the control process exited with error code.\nSee "systemctl status firewall.service" and "journalctl -xe" for details.\n"}
fatal: [192.168.1.136]: FAILED! => {"changed": false, "msg": "Unable to start service firewall: Job for firewall.service failed because the control process exited with error code.\nSee "systemctl status firewall.service" and "journalctl -xe" for details.\n"}
changed: [192.168.1.135]

As a result, the firewall service fails to start and Ansible exits... any pointers would be helpful.

Add the Zabbix port 10050 to firewall_allowed_tcp_ports_for

Good afternoon!

Thank you very much for your work!

I think that many people, not only me, use Zabbix. In the default configuration the Zabbix port is not in the list of allowed ports...

Suggestion:
I propose adding the Zabbix agent port, 10050, to the list of pre-allowed ports in the master branch.

File vars/system.yml:
...
firewall_allowed_tcp_ports_for:
...
  balancers:
    ...
    - "10050" # Zabbix agent порт

Once again, thank you very much for your work!

Issue importing the GPG key from the Postgres repository and epel-release

While running the playbook, there is now an error regarding GPG key import:
"msg": "Failed to validate GPG signature for pgdg-redhat-repo-42.0-13.noarch",
This may be due to changes in the Postgres repository.
The same happens with epel-release; it had to be added manually.

OS: CentOS 8.
About a week ago this was working without any issue in my tests.

FATAL: role \"postgres\" does not exist"

I am running into an error at TASK [get postgresql database list].

Error msg:

fatal: [10.16.13.14 -> 10.16.13.14]: FAILED! => {"changed": false, "cmd": ["/usr/lib/postgresql/11/bin/psql", "-p", "5432", "-c", "SELECT d.datname as Name, pg_catalog.pg_get_userbyid(d.datdba) as Owner, pg_catalog.pg_encoding_to_char(d.encoding) as Encoding, d.datcollate as Collate, d.datctype as Ctype, CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT') THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname)) ELSE 'No Access' END as Size, t.spcname as Tablespace FROM pg_catalog.pg_database d JOIN pg_catalog.pg_tablespace t on d.dattablespace = t.oid WHERE not datistemplate ORDER BY 1"], "delta": "0:00:00.009237", "end": "2020-02-11 13:20:24.826949", "msg": "non-zero return code", "rc": 2, "start": "2020-02-11 13:20:24.817712", "stderr": "psql: FATAL: role "postgres" does not exist", "stderr_lines": ["psql: FATAL: role "postgres" does not exist"], "stdout": "", "stdout_lines": []}
...ignoring

Any suggestions?

Question about "start failed " status

Hi
I deployed a cluster with 3 members:

[master]
clusterProd (debian-9,postgresql-10)

[replica]
poolback2 (debian-9,postgresql-10)
TomcatSlave (CentOS Linux release 7.6, postgresql-10)

  • output of the task execution
    TASK [PostgreSQL Cluster health] ******************************************************
    ok: [197.13.12.131] => {
    "patronictl_result.stdout_lines": [
    "+------------------+-------------+--------------------+--------+--------------+-----+-----------+",
    "| Cluster | Member | Host | Role | State | TL | Lag in MB |",
    "+------------------+-------------+--------------------+--------+--------------+-----+-----------+",
    "| postgres-cluster | TomcatSlave |HidenIP:5434 | | start failed | | unknown |",
    "| postgres-cluster | clusterProd | HidenIP:5434 | Leader | running | 118 | |",
    "| postgres-cluster | poolback2 | HidenIP:5434 | | running | 118 | 0 |",
    "+------------------+-------------+--------------------+--------+--------------+-----+-----------+"
    ]
    }

I cannot start the PostgreSQL server on the CentOS node:

[root@TomcatSlave ~]# systemctl start postgresql-10
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.
[root@TomcatSlave ~]# systemctl status postgresql-10.service
● postgresql-10.service - PostgreSQL 10 database server
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Thu 2020-01-30 20:43:39 CET; 9s ago
Docs: https://www.postgresql.org/docs/10/static/
Process: 29512 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)
Process: 29505 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)
Main PID: 29512 (code=exited, status=1/FAILURE)

Jan 30 20:43:39 TomcatSlave systemd[1]: Starting PostgreSQL 10 database server...
Jan 30 20:43:39 TomcatSlave postmaster[29512]: 2020-01-30 19:43:39.334 GMT [29512] LOG: could not open...tory
Jan 30 20:43:39 TomcatSlave postmaster[29512]: 2020-01-30 19:43:39.334 GMT [29512] LOG: could not open...tory
Jan 30 20:43:39 TomcatSlave systemd[1]: postgresql-10.service: main process exited, code=exited, status...LURE
Jan 30 20:43:39 TomcatSlave systemd[1]: Failed to start PostgreSQL 10 database server.
Jan 30 20:43:39 TomcatSlave systemd[1]: Unit postgresql-10.service entered failed state.
Jan 30 20:43:39 TomcatSlave systemd[1]: postgresql-10.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

During the deployment, I removed the contents of the /var/lib/postgresql/10/main/* directory, because I had the following error:
TASK [fail] ***************************************************************************
fatal: [IP-OF-poolback2-NODE]: FAILED! => {"changed": false, "msg": "Whoops! data directory /var/lib/postgresql/10/main is already initialized"}
fatal: [IP-OF-TomcatSlave-NODE]: FAILED! => {"changed": false, "msg": "Whoops! data directory /var/lib/pgsql/10/data is already initialized"}

Thank you

Extra databases added to vars/main.yml are not added to pgbouncer.ini config.

Thank you so much for creating this walkthrough and the Ansible playbooks!

A small thing, but I felt other users who stumble upon this may benefit.

If I add a database in vars/main.yml, it is created as it should be, but it is not added to the pgbouncer.ini file.
This then causes the error "database doesn't exist" when trying to work with that database.
The extra line needed in pgbouncer.ini for each extra DB would be:

mydbname = host=127.0.0.1 port=5432 dbname=mydbname

I don't know at the moment whether the same omission applies to users as well. I will verify later.
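A hypothetical sketch of how the template could generate such entries, assuming the databases are described in a postgresql_databases list in vars/main.yml with a "db" key per entry (the variable layout is an assumption, not the project's actual template):

; roles/pgbouncer/templates/pgbouncer.ini.j2 (illustrative fragment)
[databases]
{% for database in postgresql_databases %}
{{ database.db }} = host=127.0.0.1 port=5432 dbname={{ database.db }}
{% endfor %}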

Thank you again! A lot to learn from!

Please provide a way to change 'shared_preload_libraries' for some extensions (e.g. timescaledb)

Some extensions need to be preloaded, e.g. timescaledb.
I added a yum_repository entry and postgresql_packages entry for timescaledb in vars/RedHat.yml, but I also need to preload the library, otherwise I get the error below. Can you provide this?

TASK [PostgreSQL extensions | add extensions to the databases] ******************************************************************************************************************************************************** An exception occurred during task execution. To see the full traceback, use -vvv. The error was: before or while processing the request. failed: [192.168.56.139] (item={u'ext': u'timescaledb', u'db': u'mydatabase'}) => {"changed": false, "item": {"db": "mydatabase", "ext": "timescaledb"}, "msg": "Database query failed: extension \"timescaledb\" must be preloaded\nHINT: Please preload the timescaledb library via shared_preload_libraries.\n\nThis can be done by editing the config file at: /var/lib/pgsql/11/data/postgresql.conf\nand adding 'timescaledb' to the list in the shared_preload_libraries config.\n\t# Modify postgresql.conf:\n\tshared_preload_libraries = 'timescaledb'\n\nAnother way to do this, if not preloading other libraries, is with the command:\n\techo \"shared_preload_libraries = 'timescaledb'\" >> /var/lib/pgsql/11/data/postgresql.conf \n\n(Will require a database restart.)\n\nIf you REALLY know what you are doing and would like to load the library without preloading, you can disable this check with: \n\tSET timescaledb.allow_install_without_preload = 'on';\nserver closed the connection unexpectedly\n\tThis probably means the server terminated abnormally\n\tbefore or while processing the request.\n"}
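One possible way to express this, assuming a postgresql_parameters list of option/value pairs in vars/main.yml through which Postgres settings are passed to Patroni (treat the exact key names as an assumption):

# vars/main.yml (illustrative)
postgresql_parameters:
  - { option: "shared_preload_libraries", value: "timescaledb" }  # takes effect only after a PostgreSQL restart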

On the Patroni 'pgpass:' setting

I noticed that the .pgpass password file was being overwritten and started investigating...

I contacted the Patroni developers and learned some interesting facts about the Patroni 'pgpass:' setting (the current default in Debian.yml and RedHat.yml is '.pgpass').
The developers replied that this file may have any name and is used for streaming replication; there is no need to name it .pgpass.
The file in question is overwritten in the following cases:

  • on entering and leaving maintenance mode, on all servers of the cluster
  • when a node with the replica role is restarted, on that replica
  • on switchover or failover, on all servers

Therefore I propose changing the initial name of this file in the YAML scripts, for example to '.pgpass_patroni' (a sketch follows).
I think that many people, like me, use the '.pgpass' file with their own credentials, so this change is very important to me.
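A minimal sketch of the corresponding Patroni configuration fragment (pgpass is Patroni's own setting under the postgresql section; the path shown is the proposed new name):

# /etc/patroni/patroni.yml (fragment)
postgresql:
  pgpass: /var/lib/postgresql/.pgpass_patroni  # Patroni rewrites this file itself, so give it a dedicated name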

What follows is just for Patroni beginners :)

I asked the developers how to correctly change Patroni settings, for example the name of this file.
If you need to change Patroni settings other than those stored in the DCS (those are edited with the edit-config command), do the following:

  • edit the configuration file (/etc/patroni/patroni.yml)
  • put the cluster into maintenance mode: patronictl -c /etc/patroni/patroni.yml pause --wait <cluster>
  • restart Patroni on all hosts: systemctl restart patroni
  • take the cluster out of maintenance mode: patronictl -c /etc/patroni/patroni.yml resume <cluster>

Thank you for your work!
