
toluaina / pgsync


Postgres to Elasticsearch/OpenSearch sync

Home Page: https://pgsync.com

License: MIT License

Dockerfile 0.08% Makefile 0.36% Python 98.01% Shell 1.56%
change-data-capture elasticsearch elasticsearch-sync etl kibana opensearch postgresql python sql

pgsync's People

Contributors

aglazek, asovanek, bartoszpijet, chokosabe, daemonsnake, densol92, echi1995, egopingvina, fdeschenes, harish-everest, hyperphoton, jacobreynolds, julescarbon, laurent-pujol, loren, meruslan, mpaolino, shogoki, sinwoobang, tacrocha, toluaina, yangyaofei


pgsync's Issues

pip or pip3?

pip install pgsync doesn't work for me but pip3 install pgsync does.

Also, the following should likely be included in the README:
sudo apt install python3-pip

Cannot install via PyPI

Error Message (if any):

(env) ~ $ pip install pgsync
ERROR: Could not find a version that satisfies the requirement pgsync (from versions: none)

How can we run pgsync as a background job?

I am able to run pgsync once and it works well.
However, it gets killed when I try with --daemon.
What is the best practice for running this as a background job in a production-like environment?
I am afraid the process will be killed when the terminal is detached, even if I manage to run it with --daemon. I tried nohup but it still gets killed.

Unable to set redis port via environment variable.

PGSync version: 1.1.8

Postgres version: 12.4

Elasticsearch version: 7.9

Redis version: 5.0.9

Python version: 3.7

Problem Description:

Hi @toluaina - I'm attempting to run pgsync in a docker container on Heroku. I'm using Heroku's redis add-on which uses a non-default port, so I must set the port in pgsync via the REDIS_PORT environment variable. While all the other env vars seem to properly set values for pgsync, REDIS_PORT doesn't seem to have an effect.

Here are the environment variables that are configured for my app:
(screenshot, 2020-08-25: the app's configured environment variables)

When the app runs, you can see that some variables (like REDIS_HOST, for example) seem to work, while REDIS_PORT does not. Rather than trying to connect to port 8629, pgsync attempts to connect to the default port 6379, and the connection is, of course, refused.
(screenshot, 2020-08-25: pgsync log showing the connection to port 6379 being refused)

Is it possible that the REDIS_PORT env var is not being passed to the redis client properly?

Add support for foreign tables

PGSync version: 1.1.10

Postgres version: 12.3

Elasticsearch version: 7.2

Redis version: 2.4.5

Python version: 3.8

Problem Description:
When trying to read data from a PostgreSQL foreign table, the table is not found.

Error Message (if any):

pgsync.exc.TableNotFoundError: 'Table "main_database_foreign_schema.contacts" not found in registry'

ValueError null

PGSync version: 1.1.1

Postgres version: 9.6

Elasticsearch version: 7.7.1

Redis version: latest

Python version: 3.7

Problem Description:

ValueError: invalid literal for int() with base 10: 'null'
is raised when syncing null values.

Full traceback:
Traceback (most recent call last):
  File "/Users/masterlexa/PycharmProjects/venv/bin/pgsync", line 7, in <module>
    sync.main()
  File "pgsync/utils.py", line 82, in pgsync.utils.timeit.timed
  File "/Users/masterlexa/PycharmProjects/venv/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/masterlexa/PycharmProjects/venv/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/masterlexa/PycharmProjects/venv/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/masterlexa/PycharmProjects/venv/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "pgsync/sync.py", line 791, in pgsync.sync.main
  File "pgsync/sync.py", line 742, in pgsync.sync.Sync.pull
  File "pgsync/sync.py", line 143, in pgsync.sync.Sync.logical_slot_changes
  File "pgsync/sync.py", line 138, in pgsync.sync.Sync.logical_slot_changes
  File "pgsync/base.py", line 577, in pgsync.base.Base.parse_logical_slot
  File "pgsync/base.py", line 515, in _parse_logical_slot
  File "pgsync/base.py", line 473, in pgsync.base.Base.parse_value
  File "pgsync/base.py", line 471, in pgsync.base.Base.parse_value
ValueError: invalid literal for int() with base 10: 'null'
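The failure is in integer parsing of logical-slot output, where SQL NULL arrives as the literal string 'null'. A minimal sketch of a tolerant parser (parse_value here is a hypothetical stand-in, not pgsync's actual implementation):

```python
# Hypothetical tolerant parser: logical decoding emits SQL NULL as the
# string 'null', which int() rejects with exactly this ValueError.
def parse_value(type_: str, value: str):
    if value is None or value == "null":
        return None
    if type_ in ("integer", "bigint", "smallint"):
        return int(value)
    return value
```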

Connect to Postgres in SSL mode?

Problem Description:

We have an encrypted PostgreSQL database on AWS RDS to which we connect via an SSL key (managed by AWS KMS).
How can we achieve the same with pgsync? I couldn't find any info regarding SSL in the documentation.

Here is our ENV:

### Elastic Search Variables
export ELASTICSEARCH_SCHEME=https
export ELASTICSEARCH_HOST=xxx.aws.cloud.es.io
export ELASTICSEARCH_PORT=9243
export ELASTICSEARCH_USER=xxx
export ELASTICSEARCH_PASSWORD=xxx
export ELASTICSEARCH_TIMEOUT=10

### PostgreSQL Variables
export PG_HOST=xxx.rds.amazonaws.com
export PG_USER=xxxx
export PG_PASSWORD=xxx
export PG_PORT=5432

### Redis Variables
export REDIS_HOST=xxx.cache.amazonaws.com
export REDIS_PORT=6379

which env variable should we set to point to the cert file?
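Whether pgsync exposes a dedicated SSL setting may depend on the version, but libpq (which the underlying Postgres driver uses) honors the sslmode and sslrootcert connection parameters, and the PGSSLMODE / PGSSLROOTCERT environment variables. A sketch of building such a DSN by hand (the helper name and paths are illustrative, not pgsync's API):

```python
# Illustrative helper: append libpq SSL parameters to a Postgres DSN.
# "verify-full" validates both the certificate chain and the server hostname.
def build_ssl_dsn(host, port, user, password, dbname, sslrootcert):
    return (
        f"postgresql://{user}:{password}@{host}:{port}/{dbname}"
        f"?sslmode=verify-full&sslrootcert={sslrootcert}"
    )
```

Exporting PGSSLMODE=verify-full and PGSSLROOTCERT=/path/to/rds-ca.pem in the environment is often enough on its own, since libpq reads those variables directly.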

Is it possible to keep multiple tables/indices synced with a single pgsync process?

Currently, it appears that multiple instances of pgsync need to be run in daemon mode if I want to sync multiple tables with their indices, with each instance managing a single schema json file.

Is it possible to specify more than one schema within a single command?

Perhaps something like:

pgsync --config_dir ./configs

Where ./configs has a bunch of schema files

./schema-1.json
./schema-2.json
...
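pgsync may not accept a directory of configs, but the effect can be approximated by launching one daemon per schema file. A workaround sketch (directory layout and flags as in the question; nothing here is pgsync's own API):

```python
# Launch one pgsync daemon per schema file found in a directory.
import glob
import subprocess

def pgsync_commands(config_dir: str):
    return [
        ["pgsync", "--config", path, "--daemon"]
        for path in sorted(glob.glob(f"{config_dir}/*.json"))
    ]

def launch_all(config_dir: str):
    # Each Popen object can later be polled or terminated.
    return [subprocess.Popen(cmd) for cmd in pgsync_commands(config_dir)]
```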

missing .env.sample file when building.

Hi @toluaina,
I don't see the .env.sample file when building.
What is special about it?

ERROR: Service 'pgsync' failed to build: ADD failed: stat /var/lib/docker/tmp/docker-builder050697468/.env.sample: no such file or directory

failed to connect redis - WRONGPASS invalid username-password pair

PGSync version: 1.1

Postgres version: 12

Elasticsearch version: 7.6.2

Redis version:

Python version: 3.7

Problem Description:
The docker-compose airbnb example is not working.

Error Message (if any):

Traceback (most recent call last):
  File "examples/airbnb/schema.py", line 130, in <module>
    main()
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "examples/airbnb/schema.py", line 125, in main
    teardown(config=config)
  File "pgsync/helper.py", line 44, in pgsync.helper.teardown
  File "pgsync/redisqueue.py", line 33, in pgsync.redisqueue.RedisQueue.__init__
  File "/usr/local/lib/python3.7/site-packages/redis/client.py", line 1351, in ping
    return self.execute_command('PING')
  File "/usr/local/lib/python3.7/site-packages/redis/client.py", line 875, in execute_command
    conn = self.connection or pool.get_connection(command_name, **options)
  File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 1185, in get_connection
    connection.connect()
  File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 561, in connect
    self.on_connect()
  File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 637, in on_connect
    auth_response = self.read_response()
  File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 752, in read_response
    raise response
redis.exceptions.ResponseError: WRONGPASS invalid username-password pair


Large Memory Demands When Initializing Sync

PGSync version: 1.1.14

Postgres version: 12-alpine

Elasticsearch version: 7.7.0

Redis version: 6.0.8

Python version: 3.7

Problem Description:

I just (10/16/20) created a fresh install using Docker and the provided docker-compose file. I am trying to sync from a large Postgres database (~600 GB) and am experiencing memory issues, particularly with large tables (~350M rows). I have found that reducing QUERY_CHUNK_SIZE from 10k to 1k helps a lot, but I am still seeing RAM usage upwards of 90 GB during the initial sync (docker exec -it pgessync_pgsync_1 --config /code/myschema.json --daemon -v). When running with -v, I noticed that the RAM utilization problem occurs during the "Query" phase. Do you have any recommendations for keeping memory utilization low when syncing from a large Postgres database, or do you know what is causing this?
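Besides lowering QUERY_CHUNK_SIZE, the general mitigation is to stream rows through a cursor in fixed-size chunks rather than materializing the whole result set. A sketch of the pattern with a DB-API cursor (sqlite3 here purely for illustration; with Postgres a server-side/named cursor gives the same bounded-memory behavior; this is not pgsync's actual query phase):

```python
import sqlite3

def stream_rows(conn, query, chunk_size=1000):
    # fetchmany keeps at most chunk_size rows in memory at a time.
    cur = conn.cursor()
    cur.execute(query)
    while True:
        rows = cur.fetchmany(chunk_size)
        if not rows:
            return
        yield from rows
```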

Resolving relationships

PGSync version: 1.0.1

Postgres version: 11.6

Elasticsearch version: 7.6.1

Redis version:

Python version: 3.7

Problem Description:
I have defined a schema which is being consumed by pgsync, but there is a field which returns null in all cases.
Here is the schema.json:

[{
    "index": "application_db",
    "nodes": [{
        "table": "product",
        "columns": ["name", "hsn", "created_date"],
        "children": [{
            "table": "product_variant",
            "columns": ["description", "quantity", "quantity_multiplier"],
            "relationship": {
                "variant": "object",
                "type": "one_to_many"
            },
            "children": [{
                "table": "code_unit",
                "columns": ["unit_name"],
                "relationship": {
                    "variant": "object",
                    "type": "one_to_one"
                }
            }, {
                "table": "stock",
                "columns": ["mrp", "selling_price"],
                "relationship": {
                    "variant": "object",
                    "type": "one_to_one"
                }
            }, {
                "table": "product_variant_images",
                "columns": ["image_url"],
                "relationship": {
                    "variant": "object",
                    "type": "one_to_one"
                }
            }]
        }, {
            "table": "category",
            "columns": ["name", "type"],
            "relationship": {
                "variant": "object",
                "type": "one_to_one"
            }
        }]
    }]
}]

category always returns null. Also, the table product has two columns which have a relationship with the category table.
Please advise.

Traceback AttributeError: id

PGSync version: 1.1.7

Postgres version: 9.8

Elasticsearch version: 7.7.1

Redis version: latest

Python version: 3.7

Problem Description:
I get a traceback during sync.

Afterwards I see the queue in Redis is full. Sync runs but doesn't load rows into Elasticsearch.

Error Message (if any):

Aug 18 00:18:50 elastics01 sync_product_shop/product_shop[19354]: 2020-08-17 21:18:50.522:ERROR:pgsync.sync: Exception id
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/_collections.py", line 210, in __getattr__
    return self._data[key]
KeyError: 'id'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pgsync/sync.py", line 572, in pgsync.sync.Sync._sync
  File "pgsync/query_builder.py", line 804, in pgsync.query_builder.QueryBuilder.build_queries
  File "pgsync/query_builder.py", line 258, in pgsync.query_builder.QueryBuilder._children
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/_collections.py", line 212, in __getattr__
    raise AttributeError(key)
AttributeError: id
2020-08-17 21:18:50.620:ERROR:pgsync.sync: Exception: id
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/_collections.py", line 210, in __getattr__
    return self._data[key]
KeyError: 'id'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pgsync/sync.py", line 644, in pgsync.sync.Sync.sync_payloads
  File "pgsync/elastichelper.py", line 59, in pgsync.elastichelper.ElasticHelper.bulk
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/helpers/actions.py", line 425, in parallel_bulk
    actions, chunk_size, max_chunk_bytes, client.transport.serializer
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 140, in _helper_reraises_exception
    raise ex
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 292, in _guarded_task_generation
    for i, x in enumerate(iterable):
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/helpers/actions.py", line 128, in _chunk_actions
    for action, data in actions:
  File "pgsync/sync.py", line 575, in _sync
  File "pgsync/sync.py", line 572, in pgsync.sync.Sync._sync
  File "pgsync/query_builder.py", line 804, in pgsync.query_builder.QueryBuilder.build_queries
  File "pgsync/query_builder.py", line 258, in pgsync.query_builder.QueryBuilder._children
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/_collections.py", line 212, in __getattr__
    raise AttributeError(key)
AttributeError: id


downloaded the zip file in docker

Problem Description:
Downloaded the zip file and ran it in Docker.

Error Message (if any):
Error at Redis authentication; the server appears not to be running, although the server connects through redis-cli.
I believe the REDIS_AUTH environment variable is not being accepted.

:ERROR:pgsync.redisqueue: Redis server is not running: Authentication required.

Explanation of schema.json?

Trying to use pgsync, but I was only able to find samples of schema.json and no document explaining the general structure, possible nodes, properties, their meanings, etc.

Also, is there a way to auto-generate schema.json from a PostgreSQL table? Or any utilities that can help map an index schema to a table definition, query, or view definition?

What if I'd like a complex query mapped onto an Elasticsearch index? E.g., say there's a one-to-many mapping of objects to categories, and I would like a column that string_agg()'s the categories into one index property. Is this possible? (E.g., object o of table T would have a property category containing categories c1, c2...cn from table C.)

More Schema Docs

There is a need for more complete docs for the behavior in schema.json. For instance:

What would configuring multiple nodes in a given database/index do?

[
    {
        "database": "mydb",
        "index": "mydb-index",
        "nodes": [
            {
                "table": "table1",
                "schema": "public",
                "columns": []
            },
            {
                "table": "table2",
                "schema": "public",
                "columns": []
            }
        ]
    }
]

What are the possible values of variant and what do they do?

In the example I see:

  • "variant": "object"
  • "variant": "scalar"

I did try "nested" once to see if it would work and it blew up 😬
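From the examples in this repo, the two supported values appear to be object and scalar: object nests the child's columns as a sub-document keyed by column name, while scalar (which expects a single-column child) inlines the bare value. A sketch of the resulting document shapes for a child "keywords" table with one column "keyword" (illustrative only, not actual pgsync output):

```python
# Rough document shapes a parent record might take under each variant,
# for a one_to_many child node (values are made up for illustration).
object_variant = {"keywords": [{"keyword": "dystopia"}, {"keyword": "classic"}]}
scalar_variant = {"keywords": ["dystopia", "classic"]}
```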

Renaming Attributes

The README describes renaming columns:

You can also configure PGSync to rename attributes via the schema config e.g

{
"isbn": "9781471331435",
"this_is_a_custom_title": "1984",
"desc": "1984 was George Orwell’s chilling prophecy about the dystopian future",
"contributors": ["George Orwell"]
}

But I cannot find a reference or docs for what the schema.json would look like to rename an attribute.
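The schema in the one-to-many issue further down in this thread does show the mechanism: a transform block with a rename map on the node. A fragment along those lines (table and column names illustrative):

```json
{
    "table": "book",
    "columns": ["isbn", "title"],
    "transform": {
        "rename": {
            "title": "this_is_a_custom_title"
        }
    }
}
```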

bootstrap on schemas

PGSync version: 1.1.5

Postgres version: 9.6

Elasticsearch version: 7.7.1

Redis version: latest

Python version: 3.7

Problem Description:
bootstrap creates triggers on different schemas for tables with the same name.
For example:

- public
  - table1
- my_schema
  - table1

It creates triggers in both public and my_schema but removes them only from public.

Error Message (if any):



Syncing some number of indices

PGSync version: 1.1.1

Postgres version: 9.6

Elasticsearch version: 7.7.1

Redis version: latest

Python version: 3.8

Problem Description:

What should I do to sync a number of indices?
For example, I want to load indices with this schema:

[
    {
        "database": "main",
        "index": "brand",
        "nodes": [
            {
                "table": "brand",
                "schema": "public",
                "columns": [
                    "id",
                    "name",
                    "approved",
                    "popularity"
                ]
            }
        ]
    },
    {
        "database": "main",
        "index": "company",
        "nodes": [
            {
                "table": "company",
                "schema": "public",
                "columns": [
                    "id",
                    "user_id",
                    "organizational_form",
                    "disabled"
                ]
            }
        ]
    }
]

pgsync.exc.ForeignKeyError: no foreign key relationship between tableA and tableB

PGSync version: 1.1.6

Postgres version: 11.7

Elasticsearch version: 7.9.2

Redis version: 5.0.7

Python version: 3.6.3

Problem Description:
It raises an error when I execute pgsync --config schema.json. Yes, there is no foreign key between the tables. Is it necessary to define a foreign key?

Error Message (if any):



pgsync.exc.ForeignKeyError: no foreign key relationship between tableA and tableB

Add a proper tutorial

It's a little hard to figure out how to get started with this. It starts off with docker-compose up (should I pull the repo first?), then it uses a random dummy schema instead of instructing users how they can build their own (instructions for which I'm unable to find anywhere).

Let's improve the README and add a proper onboarding-style tutorial with simple, clear instructions: do this for a quick sample, do these steps to get it up and running on your own PostgreSQL/Elasticsearch instances, set these configuration variables, some best practices, and off you go.

Something like this would be ideal 😁

Command 'pgsync' not found

Command 'pgsync' not found, did you mean:

  command 'pkgsync' from deb pkgsync

Try: sudo apt install <deb name>

Reproduction steps:

  1. sudo apt-get update
  2. sudo apt install python3-pip
  3. pip3 install pgsync
  4. pgsync
  5. (above output)

Enable logical decoding by setting wal_level=logical

Operating System

Mac

Problem Description:

I am using debezium/postgres:12-alpine and I keep getting the error below when I try

pgsync --config examples/airbnb/schema.json --daemon

I went into the image, and it seems the Postgres config file has
wal_level = logical

max_replication_slots = 4

Am I doing anything wrong?

Error Message (if any):

RuntimeError: Enable logical decoding by setting wal_level=logical

Docker image not working

(screenshot of the error)

My docker-compose YAML:


services:
  postgres:
    image: debezium/postgres:12-alpine
    ports:
      - "15432:5432"
    environment:
      - POSTGRES_USER=pgsync
      - POSTGRES_PASSWORD=pgsync
      - POSTGRES_DB=pgsync
  redis:
    image: redis
    command: redis-server
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.7.0
    ports:
      - "9201:9200"
      - "9301:9300"
    environment:
      - xpack.security.enabled=false
      - network.host=127.0.0.1
      - http.host=0.0.0.0
  pgsync:
    build:
      context: .
      dockerfile: Dockerfile
    command: ./runserver.sh
    labels:
      org.label-schema.name: "pgsync"
      org.label-schema.description: "Postgres to elasticsearch sync"
      com.label-schema.service-type: "daemon"
    depends_on:
      - postgres
      - redis
      - elasticsearch
    environment:
      - PG_USER=pgsync
      - PG_HOST=postgres
      - PG_PASSWORD=pgsync
      - LOG_LEVEL=INFO
      - ELASTICSEARCH_PORT=9200
      - ELASTICSEARCH_SCHEME=http
      - ELASTICSEARCH_HOST=elasticsearch
      - REDIS_HOST=redis
      - REDIS_AUTH=

One-To-Many Update or Insert AttributeError

When I try to update or insert a new row into a one-to-many table, I see an error and the updates don't show up in Elasticsearch.

When I run bootstrap again before pgsync, I don't see any error message and the sync completes successfully, but this isn't a workable approach because I want to use pgsync in daemon mode.

Could you please help me? :)

My Schema:

[
{
  "database": "postgres",
  "index": "public",
  "nodes": [
    {
      "table": "assets",
      "schema": "public",
      "columns": [
        "title",
        "description"
      ],
      "children": [
        {
          "table": "asset_keywords",
          "schema": "public",
          "columns": [
            "keyword_keyw"
          ],
          "relationship": {
            "variant": "object",
            "type": "one_to_many"
          },
          "transform": {
            "rename": {
              "keyword_keyw": "keyword"
            }
          }
        },
        {
          "table": "asset_types",
          "schema": "public",
          "columns": [
            "title"
          ],
          "transform": {
            "rename": {
              "title": "assetType"
            }
          },
          "relationship": {
            "variant": "object",
            "type": "one_to_one"
          }
        },
        {
          "table": "visual_types",
          "schema": "public",
          "columns": [
            "title"
          ],
          "transform": {
            "rename": {
              "title": "categoryName"
            }
          },
          "relationship": {
            "variant": "object",
            "type": "one_to_one"
          }
        },
        {
          "table": "users",
          "schema": "public",
          "columns": [
            "username"
          ],
          "relationship": {
            "variant": "scalar",
            "type": "one_to_one"
          }
        }
      ]
    }
  ]
}
]

Error Message:

2020-10-27 10:35:33.473:ERROR:pgsync.sync: Exception keyword_keyw
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/_collections.py", line 210, in __getattr__
    return self._data[key]
KeyError: 'keyword_keyw'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pgsync/sync.py", line 622, in pgsync.sync.Sync._sync
  File "pgsync/query_builder.py", line 789, in pgsync.query_builder.QueryBuilder.build_queries
  File "pgsync/query_builder.py", line 271, in pgsync.query_builder.QueryBuilder._children
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/_collections.py", line 212, in __getattr__
    raise AttributeError(key)
AttributeError: keyword_keyw
2020-10-27 10:35:33.480:ERROR:pgsync.sync: Exception: keyword_keyw
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/_collections.py", line 210, in __getattr__
    return self._data[key]
KeyError: 'keyword_keyw'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pgsync/sync.py", line 691, in pgsync.sync.Sync.sync_payloads
  File "pgsync/elastichelper.py", line 56, in pgsync.elastichelper.ElasticHelper.bulk
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/helpers/actions.py", line 431, in parallel_bulk
    for result in pool.imap(
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 144, in _helper_reraises_exception
    raise ex
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 388, in _guarded_task_generation
    for i, x in enumerate(iterable):
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/helpers/actions.py", line 141, in _chunk_actions
    for action, data in actions:
  File "pgsync/sync.py", line 625, in _sync
  File "pgsync/sync.py", line 622, in pgsync.sync.Sync._sync
  File "pgsync/query_builder.py", line 789, in pgsync.query_builder.QueryBuilder.build_queries
  File "pgsync/query_builder.py", line 271, in pgsync.query_builder.QueryBuilder._children
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/_collections.py", line 212, in __getattr__
    raise AttributeError(key)
AttributeError: keyword_keyw
Exception in thread Thread-16:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/_collections.py", line 210, in __getattr__
    return self._data[key]
KeyError: 'keyword_keyw'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "pgsync/sync.py", line 724, in pgsync.sync.Sync.poll_redis
  File "pgsync/sync.py", line 791, in pgsync.sync.Sync.on_publish
  File "pgsync/sync.py", line 697, in pgsync.sync.Sync.sync_payloads
  File "pgsync/sync.py", line 691, in pgsync.sync.Sync.sync_payloads
  File "pgsync/elastichelper.py", line 56, in pgsync.elastichelper.ElasticHelper.bulk
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/helpers/actions.py", line 431, in parallel_bulk
    for result in pool.imap(
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 144, in _helper_reraises_exception
    raise ex
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 388, in _guarded_task_generation
    for i, x in enumerate(iterable):
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/helpers/actions.py", line 141, in _chunk_actions
    for action, data in actions:
  File "pgsync/sync.py", line 625, in _sync
  File "pgsync/sync.py", line 622, in pgsync.sync.Sync._sync
  File "pgsync/query_builder.py", line 789, in pgsync.query_builder.QueryBuilder.build_queries
  File "pgsync/query_builder.py", line 271, in pgsync.query_builder.QueryBuilder._children
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/_collections.py", line 212, in __getattr__
    raise AttributeError(key)
AttributeError: keyword_keyw
Polling db postgres: 1 cache items 

When I tried this schema, I saw the same error message. I think it's not about whether a pivot table is used. I think Redis cannot cache the one-to-many table.

[
{
  "database": "postgres",
  "index": "public",
  "nodes": [
    {
      "table": "assets",
      "schema": "public",
      "columns": [
        "title",
        "description"
      ],
      "children": [
        {
          "table": "keywords",
          "schema": "public",
          "columns": [
            "keyw"
          ],
          "label": "keyword",
          "relationship": {
            "variant": "scalar",
            "type": "one_to_many"
          }
        },
        {
          "table": "asset_types",
          "schema": "public",
          "columns": [
            "title"
          ],
          "transform": {
            "rename": {
              "title": "assetType"
            }
          },
          "relationship": {
            "variant": "object",
            "type": "one_to_one"
          }
        },
        {
          "table": "visual_types",
          "schema": "public",
          "columns": [
            "title"
          ],
          "transform": {
            "rename": {
              "title": "categoryName"
            }
          },
          "relationship": {
            "variant": "object",
            "type": "one_to_one"
          }
        },
        {
          "table": "users",
          "schema": "public",
          "columns": [
            "username"
          ],
          "relationship": {
            "variant": "scalar",
            "type": "one_to_one"
          }
        }
      ]
    }
  ]
}
]

example not working

I just cloned this repo and ran "docker-compose" on macOS. As far as I understand, it should start a PostgreSQL instance with sample data and run against it.

Problem Description: startup fails with the following error message.

Error Message:

elasticsearch_1  | {"type": "server", "timestamp": "2020-09-09T12:27:41,668Z", "level": "INFO", "component": "o.e.x.i.a.TransportPutLifecycleAction", "cluster.name": "docker-cluster", "node.name": "afb2573497b4", "message": "adding index lifecycle policy [slm-history-ilm-policy]", "cluster.uuid": "Wt9naFgLQ7iZkdkyYYHQtw", "node.id": "8cuQXauMTSiIDbOEg8YGkQ"  }
pgsync_1         | Traceback (most recent call last):
pgsync_1         |   File "examples/airbnb/schema.py", line 130, in <module>
pgsync_1         |     main()
pgsync_1         |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
pgsync_1         |     return self.main(*args, **kwargs)
pgsync_1         |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
pgsync_1         |     rv = self.invoke(ctx)
pgsync_1         |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
pgsync_1         |     return ctx.invoke(self.callback, **ctx.params)
pgsync_1         |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
pgsync_1         |     return callback(*args, **kwargs)
pgsync_1         |   File "examples/airbnb/schema.py", line 125, in main
pgsync_1         |     teardown(config=config)
pgsync_1         |   File "pgsync/helper.py", line 28, in pgsync.helper.teardown
pgsync_1         |   File "pgsync/sync.py", line 66, in pgsync.sync.Sync.__init__
pgsync_1         |   File "pgsync/sync.py", line 121, in pgsync.sync.Sync.create_mapping
pgsync_1         |   File "pgsync/node.py", line 236, in pgsync.node.Tree.build
pgsync_1         |   File "pgsync/base.py", line 110, in pgsync.base.Base.model
pgsync_1         | pgsync.exc.TableNotFoundError: 'Table "public.users" not found in registry'
elasticsearch_1  | {"type": "server", "timestamp": "2020-09-09T12:27:41,779Z", "level": "INFO", "component": "o.e.l.LicenseService", "cluster.name": "docker-cluster", "node.name": "afb2573497b4", "message": "license [f10b15c7-635d-4dfb-a13d-3fd7e2f4d051] mode [basic] - valid", "cluster.uuid": "Wt9naFgLQ7iZkdkyYYHQtw", "node.id": "8cuQXauMTSiIDbOEg8YGkQ"  }
pgsync_1         | Traceback (most recent call last):
pgsync_1         |   File "examples/airbnb/data.py", line 209, in <module>
pgsync_1         |     main()
pgsync_1         |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
pgsync_1         |     return self.main(*args, **kwargs)
pgsync_1         |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
pgsync_1         |     rv = self.invoke(ctx)
pgsync_1         |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
pgsync_1         |     return ctx.invoke(self.callback, **ctx.params)
pgsync_1         |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
pgsync_1         |     return callback(*args, **kwargs)
pgsync_1         |   File "examples/airbnb/data.py", line 21, in main
pgsync_1         |     teardown(drop_db=False, config=config)
pgsync_1         |   File "pgsync/helper.py", line 28, in pgsync.helper.teardown
pgsync_1         |   File "pgsync/sync.py", line 66, in pgsync.sync.Sync.__init__
pgsync_1         |   File "pgsync/sync.py", line 121, in pgsync.sync.Sync.create_mapping
pgsync_1         |   File "pgsync/node.py", line 236, in pgsync.node.Tree.build
pgsync_1         |   File "pgsync/base.py", line 110, in pgsync.base.Base.model
pgsync_1         | pgsync.exc.TableNotFoundError: 'Table "public.users" not found in registry'
pgsync_1         | Traceback (most recent call last):
pgsync_1         |   File "/usr/local/bin/bootstrap", line 63, in <module>
pgsync_1         |     main()
pgsync_1         |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
pgsync_1         |     return self.main(*args, **kwargs)
pgsync_1         |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
pgsync_1         |     rv = self.invoke(ctx)
pgsync_1         |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
pgsync_1         |     return ctx.invoke(self.callback, **ctx.params)
pgsync_1         |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
pgsync_1         |     return callback(*args, **kwargs)
pgsync_1         |   File "/usr/local/bin/bootstrap", line 54, in main
pgsync_1         |     sync = Sync(document, verbose=verbose, **params)
pgsync_1         |   File "pgsync/sync.py", line 65, in pgsync.sync.Sync.__init__
pgsync_1         |   File "pgsync/sync.py", line 114, in pgsync.sync.Sync.validate
pgsync_1         |   File "pgsync/node.py", line 236, in pgsync.node.Tree.build
pgsync_1         |   File "pgsync/base.py", line 110, in pgsync.base.Base.model
pgsync_1         | pgsync.exc.TableNotFoundError: 'Table "public.users" not found in registry'
pgsync_1         | Traceback (most recent call last):
pgsync_1         |   File "/usr/local/bin/pgsync", line 7, in <module>
pgsync_1         |     sync.main()
pgsync_1         |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
pgsync_1         |     return self.main(*args, **kwargs)
pgsync_1         |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
pgsync_1         |     rv = self.invoke(ctx)
pgsync_1         |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
pgsync_1         |     return ctx.invoke(self.callback, **ctx.params)
pgsync_1         |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
pgsync_1         |     return callback(*args, **kwargs)
pgsync_1         |   File "pgsync/utils.py", line 69, in pgsync.utils.timeit.timed
pgsync_1         |   File "pgsync/sync.py", line 907, in pgsync.sync.main
pgsync_1         |   File "pgsync/sync.py", line 65, in pgsync.sync.Sync.__init__
pgsync_1         |   File "pgsync/sync.py", line 114, in pgsync.sync.Sync.validate
pgsync_1         |   File "pgsync/node.py", line 236, in pgsync.node.Tree.build
pgsync_1         |   File "pgsync/base.py", line 110, in pgsync.base.Base.model
pgsync_1         | pgsync.exc.TableNotFoundError: 'Table "public.users" not found in registry'
pg-sync_pgsync_1 exited with code 1

pip install pgsync doesn't work

Python version: 3.6

Problem Description:
When trying to install pgsync with pip, it throws an error.

Error Message (if any):

pip install pgsync
ERROR: Could not find a version that satisfies the requirement pgsync (from versions: none)
ERROR: No matching distribution found for pgsync
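One likely cause (an assumption, not confirmed by the error output) is that the published pgsync wheels declare a `python_requires` floor above the reporter's Python 3.6: pip silently filters out packages whose `python_requires` excludes the running interpreter and then reports "no matching distribution". A minimal sketch of that filtering logic, with the ">= 3.7" floor assumed:

```python
import sys

def pip_can_see_pgsync(version=None):
    # Hypothetical helper mirroring pip's python_requires filter; the
    # (3, 7) floor is an assumption inferred from the failure on 3.6.
    version = tuple(version or sys.version_info[:2])
    return version >= (3, 7)

print(pip_can_see_pgsync((3, 6)))  # False: pip reports "no matching distribution"
print(pip_can_see_pgsync((3, 8)))  # True
```

If this is the cause, upgrading the interpreter (or using the `pip3` tied to a newer Python, as the other issue above found) resolves it.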

pgsync process not reducing the transaction logs created by the postgresql setting logical_replication=1

I have been using pgsync for a couple of months and have set logical_replication to 1 as described in the pgsync documentation. I am seeing a steady decrease in free storage on my AWS RDS PostgreSQL server. Upon investigation I learned that RDS retains the transaction logs for external replication consumers, and if there is no consumer for these logs they remain there forever.

I tried the following query on my db

select slot_name, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),restart_lsn)) as replicationSlotLag,
active from pg_replication_slots ;

slot_name    replicationslotlag    active
db_name_1    643 GB                false
db_name_2    512 GB                false

We are running the pgsync process every 10 minutes via cron instead of with --daemon (--daemon was crashing under nohup). Is there a way to solve this issue?

From AWS documentation, I read the following:

Replication slots can be created as part of logical decoding feature of AWS Database Migration Service (AWS DMS). For logical replication, the slot parameter rds.logical_replication is set to 1. Replication slots retain the WAL files until the files are externally consumed by a consumer, for example, by pg_recvlogical; extract, transform, and load (ETL) jobs; or AWS DMS.

If you set the rds.logical_replication parameter value to 1, AWS DMS sets the wal_level, max_wal_senders, max_replication_slots, and max_connections parameters. Changing these parameters can increase WAL generation, so it's a best practice to set the rds.logical_replication parameter only when you are using logical slots. If this parameter is set to 1 and logical replication slots are present but there isn't a consumer for the WAL files retained by the replication slot, then you can see an increase in the transaction logs disk usage. This also results in a constant decrease in free storage space.
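When pgsync runs from cron it should consume and advance its own slot on each run, so a slot whose lag keeps growing usually belongs to a consumer that no longer exists (for example a pgsync config that was removed or renamed). The usual remedy is to drop the orphaned slot so Postgres can recycle the retained WAL. A minimal sketch, assuming psycopg2 and a DSN with sufficient privileges; only drop slots you are certain no consumer will ever read again, since dropping a slot pgsync still needs forces a full re-sync:

```python
FIND_INACTIVE_SLOTS = """
    SELECT slot_name
    FROM pg_replication_slots
    WHERE active = false
"""

def drop_inactive_slots(dsn):
    """Drop every currently inactive replication slot, releasing its WAL."""
    import psycopg2  # deferred so the module imports without the driver

    conn = psycopg2.connect(dsn)
    conn.autocommit = True  # slot management runs outside a transaction block
    dropped = []
    try:
        with conn.cursor() as cur:
            cur.execute(FIND_INACTIVE_SLOTS)
            for (slot,) in cur.fetchall():
                cur.execute("SELECT pg_drop_replication_slot(%s)", (slot,))
                dropped.append(slot)
    finally:
        conn.close()
    return dropped
```

Note that a slot shows `active = false` whenever its consumer is merely disconnected at that moment, so review the slot names before dropping anything.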

db to elasticsearch

PGSync version: latest

Postgres version: 12-alpine

Elasticsearch version: 7.7.0

Redis version: latest

Python version: 3.7

OS: Debian

Problem Description:

Hi Tolu Aina,
The system is running, but data added to the database is not visible on the Elasticsearch side. Where could I have gone wrong?

[
  {
    "database": "db",
    "index": "table",
    "nodes": [
      {
        "table": "table_name",
        "schema": "public",
        "columns": []
      }
    ]
  }
]

pgsync -c examples/db/table_name.json

  • table_name
    [==================================================] 100.0%
    main ((), {'config': 'examples/db/table_name.json', 'daemon': False, 'password': False, 'user': None, 'host': None, 'port': None, 'verbose': False}) 0.7559013366699219 secs

Postgres DB triggers:

table_name_notify -> active
table_name_truncate -> active

insert into table_name ... -> succeeds

http://localhost:9201/table/_search?q=e

{"took":2,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":0,"relation":"eq"},"max_score":null,"hits":[]}}

Works with App Search?

Will this library work with the Elastic App Search engine? Any caveats to know about?

Possible bug with UUID field/One-to-many?

PGSync version: 1.1.7

Postgres version: 12.4

Elasticsearch version: 7.7

Redis version: 5.0.7

Python version: 3.8.5

Problem Description:
I have 2 simple tables as below

ER diagram

My schema.json is as below:

[
  {
    "database": "mydb",
    "index": "rating",
    "nodes": [
      {
        "table": "users",
        "schema": "public",
        "columns": [
          "user_id"
        ],
        "transform": {
          "mapping": {
            "user_id": {
              "type": "text"
            }
          }
        },
        "children": [
          {
            "table": "ratings",
            "schema": "public",
            "columns": [
              "movie_id"
            ],
            "transform": {
              "mapping": {
                "movie_id": {
                  "type": "text"
                }
              }
            },
            "relationship": {
              "variant": "object",
              "type": "one_to_many"
            }
          }
        ]
      }
    ]
  }
]

During the first run of bootstrap and pgsync --daemon, everything is OK.
But when I insert something into the ratings table, I see the error message below and the inserted row is not synced to Elasticsearch.

2020-11-05 01:40:47.809:ERROR:pgsync.sync: Exception rating_id
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/util/_collections.py", line 210, in __getattr__
    return self._data[key]
KeyError: 'rating_id'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/sync.py", line 634, in _sync
    self.query_builder.build_queries(node)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/querybuilder.py", line 789, in build_queries
    self._children(node)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/querybuilder.py", line 254, in _children
    getattr(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/util/_collections.py", line 212, in __getattr__
    raise AttributeError(key)
AttributeError: rating_id
2020-11-05 01:40:47.810:ERROR:pgsync.sync: Exception: rating_id
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/util/_collections.py", line 210, in __getattr__
    return self._data[key]
KeyError: 'rating_id'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/sync.py", line 703, in sync_payloads
    self.es.bulk(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/elastichelper.py", line 55, in bulk
    for _ in parallel_bulk(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/elasticsearch/helpers/actions.py", line 431, in parallel_bulk
    for result in pool.imap(
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 144, in _helper_reraises_exception
    raise ex
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 388, in _guarded_task_generation
    for i, x in enumerate(iterable):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/elasticsearch/helpers/actions.py", line 141, in _chunk_actions
    for action, data in actions:
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/sync.py", line 634, in _sync
    self.query_builder.build_queries(node)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/querybuilder.py", line 789, in build_queries
    self._children(node)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/querybuilder.py", line 254, in _children
    getattr(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/util/_collections.py", line 212, in __getattr__
    raise AttributeError(key)
AttributeError: rating_id
Exception in thread Thread-16:
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/util/_collections.py", line 210, in __getattr__
    return self._data[key]
KeyError: 'rating_id'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/sync.py", line 736, in poll_redis
    self.on_publish(payloads)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/sync.py", line 817, in on_publish
    self.sync_payloads(_payloads)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/sync.py", line 703, in sync_payloads
    self.es.bulk(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/elastichelper.py", line 55, in bulk
    for _ in parallel_bulk(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/elasticsearch/helpers/actions.py", line 431, in parallel_bulk
    for result in pool.imap(
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 144, in _helper_reraises_exception
    raise ex
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 388, in _guarded_task_generation
    for i, x in enumerate(iterable):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/elasticsearch/helpers/actions.py", line 141, in _chunk_actions
    for action, data in actions:
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/sync.py", line 634, in _sync
    self.query_builder.build_queries(node)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/querybuilder.py", line 789, in build_queries
    self._children(node)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/querybuilder.py", line 254, in _children
    getattr(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/util/_collections.py", line 212, in __getattr__
    raise AttributeError(key)
AttributeError: rating_id

After that, if I run pgsync manually (without --daemon), I encounter this error:

 - users
    - ratings
 [==================================================] 100.0%
2020-11-05 01:41:26.995:ERROR:pgsync.sync: Exception: (psycopg2.errors.InvalidTextRepresentation) invalid input syntax for type uuid: "'05911e41"
LINE 4: WHERE users_1.user_id = '''05911e41'
                                ^

[SQL: SELECT JSON_BUILD_ARRAY(anon_1._keys) AS "JSON_BUILD_ARRAY_1", JSON_BUILD_OBJECT(%(JSON_BUILD_OBJECT_2)s, users_1.user_id, %(JSON_BUILD_OBJECT_3)s, anon_1.ratings) AS "JSON_BUILD_OBJECT_1", users_1.user_id
FROM public.users AS users_1 LEFT OUTER JOIN (SELECT CAST(JSON_BUILD_OBJECT(%(JSON_BUILD_OBJECT_4)s, JSON_AGG(JSON_BUILD_OBJECT(%(JSON_BUILD_OBJECT_5)s, JSON_BUILD_ARRAY(ratings_1.rating_id)))) AS JSONB) AS _keys, JSON_AGG(JSON_BUILD_OBJECT(%(JSON_BUILD_OBJECT_6)s, ratings_1.movie_id)) AS ratings, ratings_1.user_id AS user_id
FROM public.ratings AS ratings_1 GROUP BY ratings_1.user_id) AS anon_1 ON anon_1.user_id = users_1.user_id
WHERE users_1.user_id = %(user_id_1)s]
[parameters: {'JSON_BUILD_OBJECT_2': 'user_id', 'JSON_BUILD_OBJECT_3': 'ratings', 'JSON_BUILD_OBJECT_4': 'ratings', 'JSON_BUILD_OBJECT_5': 'rating_id', 'JSON_BUILD_OBJECT_6': 'movie_id', 'user_id_1': "'05911e41"}]
(Background on this error at: http://sqlalche.me/e/13/9h9h)
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type uuid: "'05911e41"
LINE 4: WHERE users_1.user_id = '''05911e41'
                                ^


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/sync.py", line 703, in sync_payloads
    self.es.bulk(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/elastichelper.py", line 55, in bulk
    for _ in parallel_bulk(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/elasticsearch/helpers/actions.py", line 431, in parallel_bulk
    for result in pool.imap(
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 144, in _helper_reraises_exception
    raise ex
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 388, in _guarded_task_generation
    for i, x in enumerate(iterable):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/elasticsearch/helpers/actions.py", line 141, in _chunk_actions
    for action, data in actions:
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/sync.py", line 642, in _sync
    row_count = self.query_count(node._subquery)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/base.py", line 637, in query_count
    return conn.execute(query).rowcount
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
    return meth(self, multiparams, params)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
    ret = self._execute_context(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
    self._handle_dbapi_exception(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
    util.raise_(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.DataError: (psycopg2.errors.InvalidTextRepresentation) invalid input syntax for type uuid: "'05911e41"
LINE 4: WHERE users_1.user_id = '''05911e41'
                                ^

[SQL: SELECT JSON_BUILD_ARRAY(anon_1._keys) AS "JSON_BUILD_ARRAY_1", JSON_BUILD_OBJECT(%(JSON_BUILD_OBJECT_2)s, users_1.user_id, %(JSON_BUILD_OBJECT_3)s, anon_1.ratings) AS "JSON_BUILD_OBJECT_1", users_1.user_id
FROM public.users AS users_1 LEFT OUTER JOIN (SELECT CAST(JSON_BUILD_OBJECT(%(JSON_BUILD_OBJECT_4)s, JSON_AGG(JSON_BUILD_OBJECT(%(JSON_BUILD_OBJECT_5)s, JSON_BUILD_ARRAY(ratings_1.rating_id)))) AS JSONB) AS _keys, JSON_AGG(JSON_BUILD_OBJECT(%(JSON_BUILD_OBJECT_6)s, ratings_1.movie_id)) AS ratings, ratings_1.user_id AS user_id
FROM public.ratings AS ratings_1 GROUP BY ratings_1.user_id) AS anon_1 ON anon_1.user_id = users_1.user_id
WHERE users_1.user_id = %(user_id_1)s]
[parameters: {'JSON_BUILD_OBJECT_2': 'user_id', 'JSON_BUILD_OBJECT_3': 'ratings', 'JSON_BUILD_OBJECT_4': 'ratings', 'JSON_BUILD_OBJECT_5': 'rating_id', 'JSON_BUILD_OBJECT_6': 'movie_id', 'user_id_1': "'05911e41"}]
(Background on this error at: http://sqlalche.me/e/13/9h9h)
 0.0:0.0:0.19710540771484375 (0.20 sec)
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type uuid: "'05911e41"
LINE 4: WHERE users_1.user_id = '''05911e41'
                                ^


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/pgsync", line 7, in <module>
    sync.main()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/sync.py", line 955, in main
    sync.pull()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/sync.py", line 839, in pull
    self.logical_slot_changes(txmin=txmin, txmax=txmax)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/sync.py", line 239, in logical_slot_changes
    self.sync_payloads(payloads)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/sync.py", line 703, in sync_payloads
    self.es.bulk(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/elastichelper.py", line 55, in bulk
    for _ in parallel_bulk(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/elasticsearch/helpers/actions.py", line 431, in parallel_bulk
    for result in pool.imap(
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 144, in _helper_reraises_exception
    raise ex
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 388, in _guarded_task_generation
    for i, x in enumerate(iterable):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/elasticsearch/helpers/actions.py", line 141, in _chunk_actions
    for action, data in actions:
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/sync.py", line 642, in _sync
    row_count = self.query_count(node._subquery)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pgsync/base.py", line 637, in query_count
    return conn.execute(query).rowcount
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
    return meth(self, multiparams, params)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
    ret = self._execute_context(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
    self._handle_dbapi_exception(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
    util.raise_(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.DataError: (psycopg2.errors.InvalidTextRepresentation) invalid input syntax for type uuid: "'05911e41"
LINE 4: WHERE users_1.user_id = '''05911e41'
                                ^

[SQL: SELECT JSON_BUILD_ARRAY(anon_1._keys) AS "JSON_BUILD_ARRAY_1", JSON_BUILD_OBJECT(%(JSON_BUILD_OBJECT_2)s, users_1.user_id, %(JSON_BUILD_OBJECT_3)s, anon_1.ratings) AS "JSON_BUILD_OBJECT_1", users_1.user_id
FROM public.users AS users_1 LEFT OUTER JOIN (SELECT CAST(JSON_BUILD_OBJECT(%(JSON_BUILD_OBJECT_4)s, JSON_AGG(JSON_BUILD_OBJECT(%(JSON_BUILD_OBJECT_5)s, JSON_BUILD_ARRAY(ratings_1.rating_id)))) AS JSONB) AS _keys, JSON_AGG(JSON_BUILD_OBJECT(%(JSON_BUILD_OBJECT_6)s, ratings_1.movie_id)) AS ratings, ratings_1.user_id AS user_id
FROM public.ratings AS ratings_1 GROUP BY ratings_1.user_id) AS anon_1 ON anon_1.user_id = users_1.user_id
WHERE users_1.user_id = %(user_id_1)s]
[parameters: {'JSON_BUILD_OBJECT_2': 'user_id', 'JSON_BUILD_OBJECT_3': 'ratings', 'JSON_BUILD_OBJECT_4': 'ratings', 'JSON_BUILD_OBJECT_5': 'rating_id', 'JSON_BUILD_OBJECT_6': 'movie_id', 'user_id_1': "'05911e41"}]
(Background on this error at: http://sqlalche.me/e/13/9h9h)

Is this something to do with the UUID primary key or the one-to-many relationship? Or did I do something wrong?
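The failing literal `'''05911e41'` in the SQL above suggests the primary key arrived with a quote already baked into the string before parameter binding (`'user_id_1': "'05911e41"` in the parameters), so Postgres rejects it as a UUID. The value in the log is truncated, so the snippet below is a purely hypothetical illustration of the symptom, not the actual fix in pgsync:

```python
import uuid

raw_key = "'05911e41"  # as it appears (truncated) in the log: note the stray quote

# A parameterized query passes the string verbatim, stray quote included,
# and the uuid cast then fails. Stripping the quote fixes the binding:
clean = raw_key.strip("'")
print(clean)  # 05911e41

# Once the quote is gone, a full well-formed value parses (hypothetical UUID):
sample = uuid.UUID("0" * 32)
print(sample)  # 00000000-0000-0000-0000-000000000000
```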


sync.py

PGSync version:
1.0.1

Hi. First of all, great project. 😃
I was browsing through the code base just to understand how it works and was trying to figure out where sync.py is.

TypeError: '<' not supported between instances of 'dict' and 'str'

PGSync version: 1.1.22
Postgres version: 10.14
Elasticsearch version: 7.9.0
Redis version: 6.0.9
Python version: 3.8

Problem Description:

I have bootstrapped the DB schema:

dropshipping

With the following config file:

[
  {
    "database": "dropshipping",
    "index": "dropshipping",
    "nodes": [
      {
        "table": "SellerProducts",
        "schema": "public",
        "columns": [
          "id",
          "name",
          "identifiers",
          "availableQuantity",
          "price",
          "url",
          "images",
          "description",
          "createdAt",
          "updatedAt"

        ],
        "children": [
          {
            "table": "Sellers",
            "columns": [
              "name",
              "createdAt",
              "updatedAt"
            ],
            "label": "Sellers",
            "relationship": {
              "variant": "object",
              "type": "one_to_many"
            }
          }
        ]
      }
    ]
  }
]

Then I ran pgsync with the following command, and it crashed:

pgsync --config elastic-search-schema.json

Error Message (if any):

 - SellerProducts
    - Sellers
 [=========================-------------------------] 50.5% 2020-11-29 16:32:14.995:ERROR:pgsync.sync: Exception '<' not supported between instances of 'dict' and 'str'
Traceback (most recent call last):
  File "/home/tomasz/.local/lib/python3.8/site-packages/pgsync/sync.py", line 731, in sync
    self.es.bulk(self.index, docs)
  File "/home/tomasz/.local/lib/python3.8/site-packages/pgsync/elastichelper.py", line 56, in bulk
    for _ in parallel_bulk(
  File "/home/tomasz/.local/lib/python3.8/site-packages/elasticsearch/helpers/actions.py", line 431, in parallel_bulk
    for result in pool.imap(
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 144, in _helper_reraises_exception
    raise ex
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 388, in _guarded_task_generation
    for i, x in enumerate(iterable):
  File "/home/tomasz/.local/lib/python3.8/site-packages/elasticsearch/helpers/actions.py", line 141, in _chunk_actions
    for action, data in actions:
  File "/home/tomasz/.local/lib/python3.8/site-packages/pgsync/sync.py", line 694, in _sync
    row = transform(self.__name, row, nodes[0])
  File "/home/tomasz/.local/lib/python3.8/site-packages/pgsync/utils.py", line 163, in transform
    return map_fields(row, structure)
  File "/home/tomasz/.local/lib/python3.8/site-packages/pgsync/utils.py", line 134, in map_fields
    value = sorted(value)
TypeError: '<' not supported between instances of 'dict' and 'str'
 0.0:0.0:3.648371696472168 (3.65 sec)
Traceback (most recent call last):
  File "/home/tomasz/.local/bin/pgsync", line 7, in <module>
    sync.main()
  File "/home/tomasz/.local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/tomasz/.local/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/tomasz/.local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/tomasz/.local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/tomasz/.local/lib/python3.8/site-packages/pgsync/sync.py", line 1012, in main
    sync.pull()
  File "/home/tomasz/.local/lib/python3.8/site-packages/pgsync/sync.py", line 881, in pull
    self.sync(txmin=txmin, txmax=txmax)
  File "/home/tomasz/.local/lib/python3.8/site-packages/pgsync/sync.py", line 731, in sync
    self.es.bulk(self.index, docs)
  File "/home/tomasz/.local/lib/python3.8/site-packages/pgsync/elastichelper.py", line 56, in bulk
    for _ in parallel_bulk(
  File "/home/tomasz/.local/lib/python3.8/site-packages/elasticsearch/helpers/actions.py", line 431, in parallel_bulk
    for result in pool.imap(
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 144, in _helper_reraises_exception
    raise ex
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 388, in _guarded_task_generation
    for i, x in enumerate(iterable):
  File "/home/tomasz/.local/lib/python3.8/site-packages/elasticsearch/helpers/actions.py", line 141, in _chunk_actions
    for action, data in actions:
  File "/home/tomasz/.local/lib/python3.8/site-packages/pgsync/sync.py", line 694, in _sync
    row = transform(self.__name, row, nodes[0])
  File "/home/tomasz/.local/lib/python3.8/site-packages/pgsync/utils.py", line 163, in transform
    return map_fields(row, structure)
  File "/home/tomasz/.local/lib/python3.8/site-packages/pgsync/utils.py", line 134, in map_fields
    value = sorted(value)
TypeError: '<' not supported between instances of 'dict' and 'str'


How can I configure analyzers

With my basic knowledge of Elasticsearch, we could configure analyzers like snowball.
How do we achieve this with pgsync?
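
Since pgsync creates the index at bootstrap time, one possible workaround (an assumption, not documented pgsync behaviour) is to create the index yourself with the desired analysis settings before running bootstrap, using standard Elasticsearch index settings. The analyzer name `english_snowball` here is hypothetical:

```json
{
    "settings": {
        "analysis": {
            "analyzer": {
                "english_snowball": {
                    "type": "snowball",
                    "language": "English"
                }
            }
        }
    }
}
```

You would PUT this to the target index before bootstrap and reference the analyzer in the field mappings; whether pgsync preserves pre-existing settings should be verified against its source.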

SQLAlchemy err: Neither 'BooleanClauseList' object nor 'Comparator' object has an attribute '_orig'

PGSync version: 1.1.6

SQLAlchemy version: 1.3.18

Postgres version: 9.6.8

Elasticsearch version: latest

Redis version: latest

Python version: 3.7

Problem Description: Syncing throws the listed error in the logs; I can't yet be sure why. Having no access to the actual sync source code makes it hard to troubleshoot.

Error Message (if any):

Polling db pmx_test: 3 items <= Cache
2020-08-13 13:59:47.819:ERROR:pgsync.sync: Exception Neither 'BooleanClauseList' object nor 'Comparator' object has an attribute '_orig'
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 744, in __getattr__
    return getattr(self.comparator, key)
AttributeError: 'Comparator' object has no attribute '_orig'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "pgsync/sync.py", line 572, in pgsync.sync.Sync._sync
  File "pgsync/query_builder.py", line 746, in pgsync.query_builder.QueryBuilder.build_queries
  File "pgsync/query_builder.py", line 249, in pgsync.query_builder.QueryBuilder._children
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 755, in __getattr__
    replace_context=err,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
    raise exception
AttributeError: Neither 'BooleanClauseList' object nor 'Comparator' object has an attribute '_orig'
2020-08-13 13:59:47.918:ERROR:pgsync.sync: Exception: Neither 'BooleanClauseList' object nor 'Comparator' object has an attribute '_orig'
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 744, in __getattr__
    return getattr(self.comparator, key)
AttributeError: 'Comparator' object has no attribute '_orig'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "pgsync/sync.py", line 644, in pgsync.sync.Sync.sync_payloads
  File "pgsync/elastichelper.py", line 59, in pgsync.elastichelper.ElasticHelper.bulk
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/helpers/actions.py", line 425, in parallel_bulk
    actions, chunk_size, max_chunk_bytes, client.transport.serializer
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 140, in _helper_reraises_exception
    raise ex
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 292, in _guarded_task_generation
    for i, x in enumerate(iterable):
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/helpers/actions.py", line 128, in _chunk_actions
    for action, data in actions:
  File "pgsync/sync.py", line 575, in _sync
  File "pgsync/sync.py", line 572, in pgsync.sync.Sync._sync
  File "pgsync/query_builder.py", line 746, in pgsync.query_builder.QueryBuilder.build_queries
  File "pgsync/query_builder.py", line 249, in pgsync.query_builder.QueryBuilder._children
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 755, in __getattr__
    replace_context=err,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
    raise exception
AttributeError: Neither 'BooleanClauseList' object nor 'Comparator' object has an attribute '_orig'
Exception in thread Thread-16:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 744, in __getattr__
    return getattr(self.comparator, key)
AttributeError: 'Comparator' object has no attribute '_orig'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "pgsync/sync.py", line 677, in pgsync.sync.Sync.poll_redis
  File "pgsync/sync.py", line 744, in pgsync.sync.Sync.on_publish
  File "pgsync/sync.py", line 650, in pgsync.sync.Sync.sync_payloads
  File "pgsync/sync.py", line 644, in pgsync.sync.Sync.sync_payloads
  File "pgsync/elastichelper.py", line 59, in pgsync.elastichelper.ElasticHelper.bulk
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/helpers/actions.py", line 425, in parallel_bulk
    actions, chunk_size, max_chunk_bytes, client.transport.serializer
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 140, in _helper_reraises_exception
    raise ex
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 292, in _guarded_task_generation
    for i, x in enumerate(iterable):
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/helpers/actions.py", line 128, in _chunk_actions
    for action, data in actions:
  File "pgsync/sync.py", line 575, in _sync
  File "pgsync/sync.py", line 572, in pgsync.sync.Sync._sync
  File "pgsync/query_builder.py", line 746, in pgsync.query_builder.QueryBuilder.build_queries
  File "pgsync/query_builder.py", line 249, in pgsync.query_builder.QueryBuilder._children
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 755, in __getattr__
    replace_context=err,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
    raise exception
AttributeError: Neither 'BooleanClauseList' object nor 'Comparator' object has an attribute '_orig'

Amazon RDS superuser permission error

PGSync version: 1.1.1

Postgres version: 11.2-R1 (AWS RDS Managed)

Elasticsearch version: 7.4

Redis version: 6.0.4

Python version: 3.7.6

Problem Description: It looks like I can connect to the DB, because I don't get an error (as I do with wrong credentials). But after starting (either through Docker or from the CLI) I see the following:

...
redis_1   | 1:M 01 Jun 2020 17:38:37.461 # Server initialized

And the system didn't show anything for about 10 minutes, then I got the following error:

Error Message:

pgsync.exc.SuperUserError: 'You need to be a superuser to perform this action: Current user: master'

I gave the following permissions to the master role:

 grant rds_superuser to master;
 grant rds_replication to master;

Also I set the rds.logical_replication static parameter to 1, as described here: To enable logical decoding for an Amazon RDS for PostgreSQL DB instance

After that, I could create replication slots as the master role.

Then I looked at the pg_catalog.pg_user table:

SELECT usesuper, userepl FROM pg_user WHERE usename = 'master';

The result is: false, false

I think this is also related to RDS's internal implementation.

I cannot give superuser or replication permissions to the master role because of RDS restrictions. When I try to perform the following query:

grant superuser to master;

I get the following error: ERROR: role "superuser" does not exist.

I get the same error for the replication permission.

I'm not sure if there is a way to check the ability to create replication slots in AWS RDS without performing the following query:

SELECT usesuper FROM pg_user WHERE usename = 'master';
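
One possible alternative check (a sketch only, not what pgsync currently runs): on RDS, privileges are granted through role membership, so `pg_has_role()` could be tested instead of the `usesuper`/`userepl` flags, which stay `false` on RDS. `master` is the role name from this report:

```sql
-- Membership in the RDS-granted roles can be detected even though
-- pg_user.usesuper and pg_user.userepl remain false on RDS.
SELECT pg_has_role('master', 'rds_superuser', 'member')   AS is_rds_superuser,
       pg_has_role('master', 'rds_replication', 'member') AS can_replicate;
```

The role names `rds_superuser` and `rds_replication` only exist on RDS, so such a check would need to fall back to the existing `pg_user` query on vanilla Postgres.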

Trouble with Trigger (pg_notify())

PGSync version: 1.1.1

Postgres version: 9.6

Elasticsearch version: 7.7.1

Redis version: latest

Python version: 3.7

Problem Description:
A problem with a column that contains a large value.
According to the official documentation, the NOTIFY payload has a limit of 8000 bytes.
https://www.postgresql.org/docs/9.0/sql-notify.html

Error Message (if any):

ERROR:  payload string too long
CONTEXT:  SQL statement "SELECT PG_NOTIFY(channel, notification::TEXT)"
PL/pgSQL function table_notify() line 37 at PERFORM
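
A common workaround for the 8000-byte limit (a sketch only, under the assumption that the consumer can re-read the row by primary key — this is not pgsync's actual trigger) is to notify with identifiers instead of full row data:

```sql
-- Hypothetical trigger function: send only the table name, the operation
-- and the key, keeping the payload far below the 8000-byte NOTIFY limit.
-- Assumes a primary key column named "id".
CREATE OR REPLACE FUNCTION table_notify() RETURNS trigger AS $$
DECLARE
    payload TEXT := json_build_object(
        'table', TG_TABLE_NAME,
        'op',    TG_OP,
        'id',    NEW.id
    )::TEXT;
BEGIN
    PERFORM PG_NOTIFY('channel', payload);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
```

The consumer then fetches the changed row itself, so arbitrarily large column values never pass through `pg_notify`.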

If I want to sync multiple tables from Postgres to Elasticsearch, do I need to run multiple instances of pgsync with a different schema file for each, or can one pgsync instance serve multiple schemas?

If I want to sync multiple tables (each with a different schema) from Postgres to Elasticsearch at the same time, do I need to run one pgsync instance per schema, or can a single pgsync daemon handle multiple schemas at once?
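
For what it's worth, the example schema files in this repository are JSON arrays, which suggests (an assumption worth verifying against the docs) that several indices can be declared in one file and served by a single pgsync daemon. All database, index, table and column names below are hypothetical:

```json
[
    {
        "database": "mydb",
        "index": "books",
        "nodes": [{"table": "book", "columns": ["isbn", "title"]}]
    },
    {
        "database": "mydb",
        "index": "authors",
        "nodes": [{"table": "author", "columns": ["id", "name"]}]
    }
]
```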



pgsync gets killed without throwing an error

I ran the following as usual

$ pgsync --config staging_schema.json
[=====================================-------------] 74.6% Killed
$

Where can I see the error? How do I debug this?
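
A diagnostic sketch, not pgsync-specific: a progress bar followed by a bare `Killed` with no Python traceback usually means the Linux kernel's OOM killer terminated the process, which the kernel log can confirm:

```shell
# The initial sync can be memory-hungry on large tables; check whether the
# OOM killer fired. Exact log locations vary by distro, so both commands
# are guarded to succeed even when unavailable.
dmesg 2>/dev/null | grep -iE 'killed process|out of memory' || true

# On systemd hosts the same record is in the kernel journal:
journalctl -k 2>/dev/null | grep -i 'oom' || true
```

If the OOM killer is the cause, the usual remedies are more memory, a smaller sync batch size, or swap.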

Bootstrap fails to execute because it tries to recognize tables from other schemas

PGSync version:

Postgres version: 11.7

Elasticsearch version: 7.9.1

Redis version: 6.0.8

Python version:

Problem Description:

PGSync is great software, but it currently fails when trying to bootstrap over an existing database.

I have multiple schemas in my database, which is called app.

Here is my simple schema.json, syncing data from public.slot:

[
    {
        "database": "app",
        "index": "slots",
        "nodes": [
            {
                "table": "slot",
                "schema": "public",
                "columns": [
                  "id",
                  "name"
                ]
            }
        ]
    }
]

This is the structure I have for public.slot:

CREATE TABLE public.slot (
  id                UUID PRIMARY KEY
                    DEFAULT uuid_generate_v4(),
  name              TEXT NOT NULL
);

insert into public.slot (name) values ('test');
insert into public.slot (name) values ('test 2'); 

But I also have another schema api with a lot of tables.

CREATE EXTENSION citext;
CREATE SCHEMA api;

BEGIN;

  CREATE TABLE api.user (
    id                UUID PRIMARY KEY
                      CONSTRAINT slot_pkey
                      DEFAULT uuid_generate_v4(),
    name         CITEXT NOT NULL
  );

COMMIT;

The message I see occurs because the table api.user has a column of type citext. Remember, this table is in the api schema, not in public.

Error Message (if any):

root@f024d67dd716:/code# bootstrap --config examples/public/schema.json
 - slot
/usr/local/lib/python3.7/site-packages/sqlalchemy/dialects/postgresql/base.py:3131: SAWarning: Did not recognize type 'citext' of column 'name'
  "Did not recognize type '%s' of column '%s'" % (attype, name)

I also see errors when running bootstrap if the TimescaleDB extension is enabled (which we apparently use).

  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/reflection.py", line 679, in reflecttable
    raise exc.NoSuchTableError(table.name)
sqlalchemy.exc.NoSuchTableError: cache_inval_hypertable

Please explain why it is necessary for bootstrap to read the structures of other schemas if they are not listed in the schema.json file.

First database integration

PGSync version: latest

Postgres version: 12.4

Elasticsearch version: 7.7.0

Redis version: 6.0.6

Python version: 3.7.9

Problem Description: first database integration.

Error Message (if any):
Hi, pgsync works with the default examples. My problem is not being able to integrate my own database.

OS: Debian

~/docker/pgsync$ docker-compose run pgsync bootstrap --config examples/db/schema.json

Starting pgsync_pg_admin_1 ... done
Starting pgsync_elasticsearch_1 ... done
Starting pgsync_postgres_1 ... done
Starting pgsync_redis_1 ... done
Traceback (most recent call last):
  File "/usr/local/bin/bootstrap", line 59, in <module>
    main()
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/bin/bootstrap", line 44, in main
    config = get_config(config)
  File "pgsync/utils.py", line 430, in pgsync.utils.get_config
OSError: Schema config "examples/db/schema.json" not found

Docs for renaming attributes

The README describes renaming columns:

You can also configure PGSync to rename attributes via the schema config e.g

  {
      "isbn": "9781471331435",
      "this_is_a_custom_title": "1984",
      "desc": "1984 was George Orwell’s chilling prophecy about the dystopian future",
      "contributors": ["George Orwell"]
  }

But I cannot find a reference or docs for what the schema.json would look like to rename an attribute.
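
For what it's worth, the example schemas in the repository use a "label" key on child nodes to rename the nested attribute in the resulting document; whether a comparable per-column rename mechanism exists is exactly what this issue is asking. A sketch of the child-node form (table and column names hypothetical):

```json
{
    "table": "publisher",
    "label": "publisher_info",
    "columns": ["name"],
    "relationship": {
        "variant": "object",
        "type": "one_to_one"
    }
}
```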
