peerdb-io / peerdb

Fast, simple, and cost-effective tool to replicate data from Postgres to data warehouses, queues, and storage

Home Page: https://peerdb.io

License: Other

Rust 14.23% Dockerfile 0.16% Go 60.48% Shell 0.12% TypeScript 24.89% CSS 0.01% JavaScript 0.02% HCL 0.09%
etl sql bigquery cloud-native distributed-systems kafka postgres postgresql realtime rust

peerdb's People

Contributors

amogh-bharadwaj, arajkumar, dependabot[bot], heavycrystal, iamkunalgupta, iskakaushik, pankaj-peerdb, saisrirampur, serprex, yasinzaehringer-paradime


peerdb's Issues

DROP MIRROR support in peerdb

DROP MIRROR mirror_name;

Delete everything on the source, catalog, Temporal, and target; a SQL sketch for the source and catalog steps is included after the lists below.

Source

  1. Replication slot
  2. Publication

Catalog

  1. Row entries in flows table for that flow name

Temporal - any workflow associated with the MIRROR:

  1. PeerFlow
  2. NormalizedFlow
  3. SyncFlow
  4. SetupFlow
    If we kill the PeerFlow, all the others should die.

Target

  1. Staging table (only for BQ)
  2. Raw table
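
For the Source and Catalog pieces above, a minimal SQL sketch of what DROP MIRROR would need to run. The publication pattern matches the peerflow_pub_* names seen elsewhere in these issues; the slot name pattern and the exact flows schema are assumptions.

-- Hypothetical cleanup for DROP MIRROR mirror_name (object names are assumptions)
-- On the source Postgres:
SELECT pg_drop_replication_slot('peerflow_slot_mirror_name');  -- replication slot
DROP PUBLICATION IF EXISTS peerflow_pub_mirror_name;           -- publication

-- On the catalog:
DELETE FROM flows WHERE name = 'mirror_name';                  -- row entries for that flow name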

Occasionally peerdb fails to start on docker compose up

I think peerdb tries to connect to flow_api before flow_api is ready, which causes peerdb not to start. I have seen this mostly happen on the first docker compose up; rerunning docker compose up fixes the issue most of the time.

peerdb-stack-peerdb-1       | 2023-06-12T17:17:21.339614Z  INFO Migration Applied -  Name: peer_connections, Version: 2
peerdb-stack-peerdb-1       | 2023-06-12T17:17:21.339617Z  INFO Migration Applied -  Name: add_workflow_id_to_flows, Version: 3
peerdb-stack-peerdb-1       | 2023-06-12T17:17:21.340716Z  INFO Listening on 0.0.0.0:9900
peerdb-stack-peerdb-1       | Error: failed to send health check request
peerdb-stack-peerdb-1       |
peerdb-stack-peerdb-1       | Caused by:
peerdb-stack-peerdb-1       |     0: error sending request for url (http://flow_api:8112/health): error trying to connect: dns error: failed to lookup address information: Name or service not known
peerdb-stack-peerdb-1       |     1: error trying to connect: dns error: failed to lookup address information: Name or service not known
peerdb-stack-peerdb-1       |     2: dns error: failed to lookup address information: Name or service not known
peerdb-stack-peerdb-1       |     3: failed to lookup address information: Name or service not known
peerdb-stack-peerdb-1 exited with code 1

Querying unknown peer crashes server

If I specify a peer that doesn't exist, peerdb doesn't report the error; it just crashes the connection. Here my_bq_bs doesn't exist:

saisrirampur=> SELECT chain FROM my_bq_bs.transactions LIMIT 8 OFFSET 0;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
psql (14.5 (Homebrew), server 13.2.0)

This happens for failed queries on the catalog too.

Postgres Peer Query Types Test

CREATE TABLE test_types_issue_1 (
    c1 UUID NOT NULL PRIMARY KEY,
    c2 UUID,
    "from" TIMESTAMP NOT NULL,
    c4 NUMERIC,
    c5 TIMESTAMP NOT NULL,
    c6 TIMESTAMP NOT NULL,
    c7 BYTEA,
    c8 VARCHAR,
    c9 UUID,
    c10 INTEGER,
    c11 INTEGER DEFAULT 0 NOT NULL,
    c12 INTEGER NOT NULL,
    c13 VARCHAR,
    c14 UUID,
    c15 UUID,
    c16 BOOLEAN DEFAULT false,
    c17 DOUBLE PRECISION,
    c18 DOUBLE PRECISION,
    c19 BOOLEAN DEFAULT false NOT NULL,
    c20 NUMERIC,
    c21 UUID,
    c22 NUMERIC NOT NULL,
    c23 INTEGER,
    c24 UUID,
    c25 TIMESTAMP,
    c26 VARCHAR,
    c27 TIMESTAMP,
    c28 INTEGER
);

When I try to query this table with SELECT * FROM postgres_peer.public.test_types_issue_1, it crashes.
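
A hypothetical narrowing step (not from the issue): query columns one at a time to find which type trips the Postgres peer; column names come from the DDL above.

SELECT c1 FROM postgres_peer.public.test_types_issue_1 LIMIT 1;     -- UUID
SELECT "from" FROM postgres_peer.public.test_types_issue_1 LIMIT 1; -- quoted identifier + TIMESTAMP
SELECT c4 FROM postgres_peer.public.test_types_issue_1 LIMIT 1;     -- NUMERIC
SELECT c7 FROM postgres_peer.public.test_types_issue_1 LIMIT 1;     -- BYTEA
SELECT c17 FROM postgres_peer.public.test_types_issue_1 LIMIT 1;    -- DOUBLE PRECISION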

Make tests fully isolated from any state in any DWH

  1. Create a folder called seed_sql.
  2. Have a seed SQL file for each data store, e.g. seed_bq.sql.

For BigQuery

  1. Create a test dataset: peerdb_test.
  2. Create all the requisite tables and data that we need for our tests as part of this dataset (a sketch of seed_bq.sql follows this list).
  3. Modify our tests to use this dataset.
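
A minimal sketch of what seed_bq.sql could look like, assuming BigQuery Standard SQL and placeholder table/column names:

-- seed_bq.sql (hypothetical): isolated test dataset plus seed data
CREATE SCHEMA IF NOT EXISTS peerdb_test;

CREATE OR REPLACE TABLE peerdb_test.test_types (
  c1 INT64,
  c2 STRING,
  c3 TIMESTAMP
);

INSERT INTO peerdb_test.test_types (c1, c2, c3)
VALUES (1, 'seed-row', CURRENT_TIMESTAMP());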

Queries returning NULLs crash the connection for the catalog and postgres peer

On catalog:

CREATE TABLE test_null(id int);
INSERT INTO test_null VALUES (NULL);

From peerdb:

admin=> SELECT * FROM test_null;
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
psql (15.2 (Debian 15.2-1.pgdg110+1), server 14)

Support array as function arguments for BigQuery Peer

SELECT i FROM my_bq_b.test_array WHERE (test_dataset.array_intersect(i, '{1,2}'::integer[]));
to
SELECT i FROM my_bq_b.test_array WHERE (test_dataset.array_intersect(i, [1,2]));

Similar to ANY on the RHS: if any argument (first, second, or nth) of a function is of array type, rewrite it as a BigQuery array literal, e.g. [1,2,3,4].

ANY(ARRAY) unsupported

SELECT * FROM my_bq_b.events WHERE id = ANY(ARRAY[1,2]);

ERROR: Internal error: Response error (error: ResponseError { error: NestedResponseError { code: 400, errors: [{"domain": "global", "reason": "invalidQuery", "location": "q", "locationType": "parameter", "message": "Syntax error: Unexpected ")" at [1:48]"}], message: "Syntax error: Unexpected ")" at [1:48]", status: "INVALID_ARGUMENT" } })

Workaround:

SELECT * FROM my_bq_b.events WHERE id = ANY('{1,2}'::bigint[]);

OR

SELECT * FROM my_bq_b.events WHERE id IN (1,2)

ARRAY of text wrong translation

Looks like SELECT * FROM my_bq_b.events WHERE os = ANY(ARRAY['mac']::text[]); is getting translated to SELECT * FROM test_dataset.events WHERE os IN ('''mac''')

There are extra quotes around the elements within the array. It should be:

SELECT * FROM test_dataset.events WHERE os IN ('mac')

Write more tests for BQ query integration

1. SELECT array_agg(c1) FROM my_bq_b.test_types LIMIT 1;
2. SELECT array_agg(c2) FROM my_bq_b.test_types LIMIT 1;
3. SELECT array_agg(c3) FROM my_bq_b.test_types LIMIT 1;
4. SELECT array_agg(c13) FROM my_bq_b.test_types LIMIT 1;
5. SELECT array_agg(c20) FROM my_bq_b.test_types LIMIT 1;
6. SELECT count(*) FROM my_bq_b.transactions where tx_timestamp > '2022-09-07 14:42:49.105761-07'::timestamp - interval '3 months'; ➡️ SELECT count(*) FROM test_dataset.transactions where tx_timestamp > DATE_SUB(CAST('2022-09-07 14:42:49.105761-07' AS TIMESTAMP),interval 90 DAY); 
7. SELECT count(*) FROM my_bq_b.transactions where tx_timestamp > '2022-09-07 14:42:49.105761-07'::timestamp - interval '3 days';
8. SELECT * FROM my_bq_b.transactions where tx_timestamp >now() - interval '3 months';➡️SELECT * FROM test_dataset.transactions where tx_timestamp >DATE_SUB(CURRENT_TIMESTAMP,interval 90 DAY);
9. SELECT date_trunc('month',tx_timestamp),count(*) FROM my_bq_b.transactions GROUP BY 1;➡️SELECT date_trunc(tx_timestamp,MONTH),count(*) FROM test_dataset.transactions GROUP BY 1;
10. SELECT array_agg(chain) FROM my_bq_b.transactions; ➡️ Same as above - SELECT array_agg(chain) FROM test_dataset.transactions;
11. SELECT COALESCE(chain,'sai'::text) FROM my_bq_b.transactions;➡️SELECT COALESCE(chain,'sai') FROM test_dataset.transactions;
12. SELECT * FROM my_bq_b.transactions UNION SELECT * FROM my_bq_b.transactions;➡️ SELECT * FROM test_dataset.transactions UNION DISTINCT SELECT * FROM test_dataset.transactions;
13. SELECT NULL FROM my_bq_b.transactions
14. SELECT count(*) FROM (my_bq_b.users r1 INNER JOIN my_bq_b.events r2 ON (((r2.user_id = 1)) AND ((r1.id = 1))));➡️ SELECT count(*) FROM (test_dataset.users r1 INNER JOIN test_dataset.events r2 ON (((r2.user_id = 1)) AND ((r1.id = 1))));

CREATE MIRROR FOR SELECT wiring + communication between nexus and FLOW

SQL Command:

CREATE MIRROR customizable_etl
FROM postgres_peer TO snowflake_peer FOR
$$SELECT user_id, country, to_json(payload)
FROM events JOIN users ON events.user_id = users.id
WHERE country = 'USA' AND events.updated_at BETWEEN {{.start}} AND {{.end}}$$
WITH OPTIONS (destination_table_name='events_denormalized', watermark_column = 'updated_at', watermark_table_name = 'events',
mode = 'upsert', unique_key_columns = ('user_id', 'id'), parallelism = 8, refresh_interval = 2, batch_size_int = 10000, sync_data_format = 'avro');

OPTION considerations:

| Config Name | Optional/Required | Default | Considerations |
| --- | --- | --- | --- |
| destination_table_name | Required | N/A | If destination_table is not schema qualified, store it as public.<tablename> for Snowflake and Postgres peers and just <tablename> for BigQuery (as the BigQuery peer is dataset specific) |
| watermark_column | Required | N/A | |
| watermark_table_name | Required | N/A | If watermark_table is not schema qualified, store it as public.<tablename>. |
| mode | Required | N/A | append or upsert |
| unique_key_columns | Required if mode is upsert, otherwise not needed | N/A | |
| parallelism | Optional | 2 | should be > 0 |
| refresh_interval | Optional (in seconds) | 10 | should be greater than or equal to 10s |
| batch_size_int | Optional | 10000 | should be greater than 0 |
| batch_duration_timestamp | Optional | 60 | should be greater than 0 |
| sync_data_format | Optional | default | avro or default |

Options mapping to the flow REST API

| Nexus | Flow REST API | Values |
| --- | --- | --- |
| destination_table_name | destination_table_identifier | Get from CREATE MIRROR |
| watermark_column | watermark_column | Get from CREATE MIRROR |
| watermark_table_name | watermark_table | Get from CREATE MIRROR |
| mode | write_mode.write_type | 0 if append, 1 if upsert |
| unique_key_columns | write_mode.upsert_key_columns | array of columns - parse comma-separated string unique_key |
| parallelism | max_parallel_workers | Get from CREATE MIRROR, default 2 |
| refresh_interval | wait_between_batches_seconds | Get from CREATE MIRROR, default 10 |
| batch_size_int | batch_size_int | Get from CREATE MIRROR, default 10000 |
| batch_duration_timestamp | batch_duration_timestamp | Get from CREATE MIRROR, default 60s |
| sync_data_format | sync_mode | 0 if avro, 1 if default |

Example REST API call:

curl --location 'http://localhost:8112/qrep/start' \
--header 'Content-Type: application/json' \
--data-raw '{
    "flow_job_name": "7_owner_flow_rat_fix",
    "source_peer": {
        "name": "test_postgres_peer",
        "type": 3,
        "postgres_config": {
            "host": "host.docker.internal",
            "port": 7132,
            "user": "postgres",
            "password": "<>",
            "database": "postgres"
        }
    },
    "destination_peer": {
        "name": "test_bq_peer",
        "bigquery_config": {
            "auth_type": "service_account",
            "project_id": "custom-program-353117",
            "private_key_id": "<>",
            "private_key": "<>",
            "client_email": "<>",
            "client_id": "<>,
            "auth_uri": "<>",
            "token_uri": "<>",
            "auth_provider_x509_cert_url": "<>",
            "client_x509_cert_url": "<>",
            "dataset_id": "<>"
        }
    },
    "destination_table_identifier": "<>", (destination_table_name)
    "watermark_table":"<>", (watermark_table_name)
    "query": "<>",
    "watermark_column": "<>",
    "inital_copy_only": false, (always false)
    "sync_mode": 1, (sync_format, 1 is avro and 0 is default)
    "wait_between_batches_seconds": <>, (get this from refresh_interval in minutes and send it in seconds)
    "max_parallel_workers":<>, (parallelism)
    "batch_size_int":<>, (batch size)
    "write_mode" : { "write_type": 0 (append) or 1 (upsert), "upsert_key_columns" : [ "k1", "k2" ] (only if write_type is 1) }
}'

Issue querying arrays in WHERE

SELECT * FROM my_bq_b.events WHERE id = ANY(CAST(ARRAY[1] AS BIGINT[]));

SELECT * FROM my_bq_b.events WHERE id = ANY(ARRAY[1]);

Queries on postgres_fdw foreign tables connected to peerdb error out

Steps to reproduce

-- connect to catalog
 CREATE DATABASE fdw_test;
 \c fdw_test
 CREATE EXTENSION postgres_fdw;
 
 CREATE SERVER bigquery FOREIGN DATA WRAPPER postgres_fdw OPTIONS(host 'host.docker.internal', port '9900');
 
 CREATE USER MAPPING FOR postgres SERVER bigquery OPTIONS(user 'peerdb',password 'peerdb');
 
 CREATE FOREIGN TABLE test_fdw (id int, t1 text, t2 text) SERVER bigquery 
 OPTIONS(schema_name '<your_peer_name>',table_name 'test'); -- assumes that the table test is present in your peer.
 
   SELECT * FROM test_fdw;
 ERROR:  could not obtain message string for remote error
CONTEXT:  remote SQL command: SELECT id, t1, t2 FROM sai_fresh_bq.test

DROP PEER support in peerdb

To drop peers, customers need to manually DELETE entries from the catalog tables:

BEGIN;
delete from peer_connections where peer_name='<peer_name>';
delete from peers where name='<peer_name>';
COMMIT;

It would be great if we could support a DROP PEER peer_name command, which would take care of removing all the necessary information from our catalog.

Refactoring and code cleanup of flow

  1. Separate the connector interface into push and pull connector interfaces [PG for pull, BQ and SF for push for now]
  2. Move the model to protos to make it autogenerated and also tightly coupled to workflow versions
  3. Refactor SQL for all connectors into separate, reusable functions to avoid an overload of string constants.
  4. Figure out a common way to write and display errors consistently across connectors.

UNION to UNION DISTINCT BQ

SELECT * FROM my_bq_b.transactions UNION SELECT * FROM my_bq_b.transactions;➡️ SELECT * FROM test_dataset.transactions UNION DISTINCT SELECT * FROM test_dataset.transactions;

Add support for S3 peer for Query Replication

Context:

When we replicate a query to Snowflake, we already stage to an S3 bucket. We write Avro files to that S3 bucket and then issue a COPY command to copy from the external S3 stage to the Snowflake table. We can think of this as a transitional stage.

code references:

  • aws.go (credentials)
  • snowflake/qrep_avro_sync.go
  • snowflake/qrep.go is the entry point for query replication.

Requirement:

Treat S3 as a first-class peer so that we can write Avro to the S3 destination as the final target.

Approach:

  1. In peers.proto, add a definition for an S3 peer. This would include the following:
    a. S3 url: s3://<bucket_name>/<prefix>
  2. Create a connector for S3, the same way we do it for Snowflake, BQ, etc. All the CDC methods will error saying not supported; only query replication will be supported.
  3. When we get a batch of records from the source, we will write them as Avro files to the given S3 url. Most of this logic already exists in snowflake/qrep_avro_sync.go; refactor to reuse.
  4. Make it so that down the line it's easy to support Parquet, etc.

Fix test failure when using prepared statements

See:

---- extended_query_protocol_no_params_catalog stdout ----
Starting server...
peerdb-server Server started
thread 'extended_query_protocol_no_params_catalog' panicked at 'Failed to prepare query: Error { kind: UnexpectedMessage, cause: None }', server/tests/server_test.rs:192:10
stack backtrace:
   0: rust_begin_unwind
             at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:579:5
   1: core::panicking::panic_fmt
             at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/panicking.rs:64:14
   2: core::result::unwrap_failed
             at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/result.rs:1750:5
   3: core::result::Result<T,E>::expect
             at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/result.rs:1047:23
   4: server_test::extended_query_protocol_no_params_catalog
             at ./tests/server_test.rs:190:16
   5: server_test::extended_query_protocol_no_params_catalog::{{closure}}
             at ./tests/server_test.rs:185:48
   6: core::ops::function::FnOnce::call_once
             at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/ops/function.rs:250:5
   7: core::ops::function::FnOnce::call_once
             at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Stopping server...
Server stopped

https://github.com/PeerDB-io/peerdb/actions/runs/5049232543/jobs/9058413988 for full failure.

Support querying interval types in BigQuery

SELECT * FROM sai_fresh.test_interval_1;
ERROR: Internal error: Request error (error: error decoding response body: unknown variant INTERVAL, expected one of STRING, BYTES, INTEGER, INT64, FLOAT, FLOAT64, NUMERIC, BIGNUMERIC, BOOLEAN, BOOL, TIMESTAMP, DATE, TIME, DATETIME, RECORD, STRUCT at line 12 column 26)

PeerDB is not "open source"

The docs say:

PeerDB is Open Source

PeerDB is fully Open Source and licensed under Elastic License 2.0 (ELv2). Here goes the link to our github repo: https://github.com/PeerDB-io/peerdb

The Elastic License isn't an OSI- or FSF-approved license.

One could say PeerDB is "source-available".

Improve and add support for more types in QValue type system.

  1. NUMERIC - needs to be handled specifically since it has different ranges across different databases.
  2. TIME - needs to be parsed into an Avro-friendly format and sent for both BQ and SF.
  3. TIMETZ - needs to be either marked as explicitly unsupported or converted to a string, since neither BQ nor SF supports TIME with a timezone.
  4. DATE - issues with ingesting dates with Avro; fix for both SF and BQ Avro.
  5. MONEY, MACADDR, XML, CIDR, INET, INTERVAL - add support or mark as unsupported [can be parsed as a string].
  6. JSON, JSONB - expand support or parse into a string. (A fixture sketch covering these types follows below.)
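
A hedged Postgres fixture sketch covering the types above, useful for exercising the QValue conversions end to end; the table and column names are made up.

-- hypothetical type-coverage fixture on the source Postgres
CREATE TABLE qvalue_type_coverage (
    id         serial PRIMARY KEY,
    c_numeric  numeric(38, 9),
    c_time     time,
    c_timetz   timetz,
    c_date     date,
    c_money    money,
    c_macaddr  macaddr,
    c_xml      xml,
    c_cidr     cidr,
    c_inet     inet,
    c_interval interval,
    c_json     json,
    c_jsonb    jsonb
);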

DROP MIRROR doesn't end if the MIRROR is stuck in setup flow

For example, if the MIRROR is stuck creating the slot/publication and I want to DROP that MIRROR, the drop doesn't finish and fails with the below error in the drop flow (a tolerant-teardown sketch follows the error):

{
  "message": "failed to cleanup source: error dropping publication: ERROR: publication \"peerflow_pub_test10\" does not exist (SQLSTATE 42704)",
  "source": "GoSDK",
  "stackTrace": "",
  "encodedAttributes": null,
  "cause": {
    "message": "error dropping publication: ERROR: publication \"peerflow_pub_test10\" does not exist (SQLSTATE 42704)",
    "source": "GoSDK",
    "stackTrace": "",
    "encodedAttributes": null,
    "cause": {
      "message": "ERROR: publication \"peerflow_pub_test10\" does not exist (SQLSTATE 42704)",
      "source": "GoSDK",
      "stackTrace": "",
      "encodedAttributes": null,
      "cause": null,
      "applicationFailureInfo": {
        "type": "PgError",
        "nonRetryable": false,
        "details": null
      }
    },
    "applicationFailureInfo": {
      "type": "wrapError",
      "nonRetryable": false,
      "details": null
    }
  },
  "applicationFailureInfo": {
    "type": "wrapError",
    "nonRetryable": false,
    "details": null
  }
}
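
One hedged way to make the teardown tolerant of partially created state on the source is to guard each drop on existence. The publication name matches the error above; the slot name pattern is an assumption.

-- hypothetical tolerant teardown on the source Postgres
DROP PUBLICATION IF EXISTS peerflow_pub_test10;

SELECT pg_drop_replication_slot(slot_name)
FROM pg_replication_slots
WHERE slot_name = 'peerflow_slot_test10';  -- drops only if the slot actually exists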

CREATE MIRROR crashes when an unknown peer is used

amogh=> CREATE MIRROR multi_mirror_test FROM pg_test TO unknown_peer WITH TABLE MAPPING (cats:cats,birds:birds);
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
psql (15.3, server 14)

The cause is an unwrap() in catalog/lib.rs (second-to-last line of the snippet below):

pub async fn get_peer_type_for_id(&self, peer_id: i32) -> anyhow::Result<DbType> {
        let stmt = self
            .pg
            .prepare_typed("SELECT type FROM peers WHERE id = $1", &[types::Type::INT4])
            .await?;

        self.pg
            .query_opt(&stmt, &[&peer_id])
            .await?
            .map(|row| row.get(0))
            .map(|r#type| DbType::from_i32(r#type).unwrap()) // if row was inserted properly, this should never fail
            .context("Failed to get peer type")
    }

r#type in this case is a None value since no such peer_id will exist. This results in the error:

thread 'tokio-runtime-worker' panicked at 'called `Option::unwrap()` on a `None` value', /root/nexus/catalog/src/lib.rs:197:52

SetupFlow should check if target peer's mirror_jobs table already has a mirror with the same name, even if catalog doesn't have metadata

This can happen if the catalog database is truncated or otherwise modified, which could happen across installs, or if multiple instances of PeerDB are running and connected to different catalog databases.

We could either error out, since we cannot be sure if another flow worker is running, or set sync_batch_id and normalize_batch_id to 0 and proceed with flow creation.
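
A hedged sketch of the proposed check against the target peer's metadata table before SetupFlow proceeds; the table name follows the peerdb_mirror_jobs naming used in a later issue, and the column name is an assumption.

-- hypothetical existence check on the target peer
SELECT 1
FROM peerdb_mirror_jobs
WHERE mirror_name = 'my_mirror';
-- if a row comes back while the catalog has no metadata, either error out
-- or reset sync_batch_id / normalize_batch_id to 0 before proceeding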

Stabilizing Query Based Replication

  1. When the source table in PG is empty, we get the below error:

{
  "message": "failed to get partitions from source: unsupported type: <nil>",
  "source": "GoSDK",
  "stackTrace": "",
  "encodedAttributes": null,
  "cause": {
    "message": "unsupported type: <nil>",
    "source": "GoSDK",
    "stackTrace": "",
    "encodedAttributes": null,
    "cause": null,
    "applicationFailureInfo": {
      "type": "",
      "nonRetryable": false,
      "details": null
    }
  },
  "applicationFailureInfo": {
    "type": "wrapError",
    "nonRetryable": false,
    "details": null
  }
}
  2. In query-based replication, we should not allow two MIRRORs with the same name. But since we removed the unique constraint on name (because of multi-table CDC), we are currently allowing it. We might need to rethink the data model for the flows table. One approach is to separate the metadata tables for CDC and Query Based Replication.

peerdb crashes for a pgbench test with high connections

pgbench "host=localhost password=peerdb port=9901" -f 1.sql -c 50 -j 50
max_connections on the catalog is 100.

2 concerns here:

  1. Why are we crashing (server crash, not even client) when we hit max_connections?
  2. Why are we hitting the max_connections limit when we are hitting peerdb with only 50 connections?

With nexus, this doesn't happen: at 100 connections, there are errors that the max_connections limit is reached, but the server doesn't crash.

Error messages in pgbench:

2023-05-18 20:44:58.593 PDT [11501] FATAL:  sorry, too many clients already
2023-05-18 20:44:58.597 PDT [11407] LOG:  could not receive data from client: Connection reset by peer
2023-05-18 20:44:58.603 PDT [11441] LOG:  could not receive data from client: Connection reset by peer
2023-05-18 20:44:58.603 PDT [11447] LOG:  could not receive data from client: Connection reset by peer
2023-05-18 20:44:58.605 PDT [11471] LOG:  could not receive data from client: Connection reset by peer
pgbench: error: client 22 script 0 aborted in command 2 query 0: ERROR:  connection closed
pgbench: pgbench: error: client 12 script 0 aborted in command 2 query 0: ERROR:  connection closed
error: client 45 script 0 aborted in command 2 query 0: ERROR:  connection closed
pgbench: error: client 41 script 0 aborted in command 2 query 0: ERROR:  connection closed
pgbench: error: client 27 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 47 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 44 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 25 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 1 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 36 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 40 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 19 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 0 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 11 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 35 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 4 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 37 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 49 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 9 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 23 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 6 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 2 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 32 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 16 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 43 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 18 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 15 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 5 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 33 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 42 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 39 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 7 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 14 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 20 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 30 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 21 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 24 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 26 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 10 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 34 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 13 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 48 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 46 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 38 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 31 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 8 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 29 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 3 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 28 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
pgbench: error: client 17 aborted in command 2 (SQL) of script 0; perhaps the backend died while processing
transaction type: 1.sql
scaling factor: 1
query mode: simple
number of clients: 50
number of threads: 50
number of transactions per client: 10
number of transactions actually processed: 0/500
pgbench: fatal: Run was aborted; the above results are incomplete.

peerdb logs:

Error: Failed to connect to catalog database

Caused by:
    0: db error: FATAL: sorry, too many clients already
    1: FATAL: sorry, too many clients already

Concurrent updates to peerdb_mirror_jobs table not allowed in BigQuery

https://cloud.google.com/bigquery/docs/transactions#transaction_concurrency

We plan to run SyncFlow and NormalizeFlow (for the same mirror) asynchronously. Both of these can issue conflicting (same-row) UPDATEs to the peerdb_mirror_jobs table, and BigQuery cancels one of the UPDATEs. This could happen often once we make the workflows run async.

I checked whether BigQuery provides the ability to explicitly acquire locks, and it doesn't seem to. @iskakaushik any ideas on how we can address this issue? Does Temporal provide a way of controlling the execution of 2 related but async workflows? Or any other design we can follow?

Snowflake might not have this issue as it does the required locking https://docs.snowflake.com/en/sql-reference/transactions#resource-locking (read UPDATE, DELETE, MERGE etc.) cc @heavycrystal
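
For concreteness, a hedged sketch of the conflicting pattern: two concurrent transactions, one from SyncFlow and one from NormalizeFlow, updating the same mirror row, which BigQuery's transaction concurrency rules will abort one of. Column names are assumptions based on the batch ids mentioned in the SetupFlow issue above.

-- hypothetical conflicting updates on the same peerdb_mirror_jobs row
-- SyncFlow:
UPDATE peerdb_mirror_jobs SET sync_batch_id = sync_batch_id + 1
WHERE mirror_name = 'my_mirror';
-- NormalizeFlow, concurrently:
UPDATE peerdb_mirror_jobs SET normalize_batch_id = normalize_batch_id + 1
WHERE mirror_name = 'my_mirror';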

Make NormalizeFlow and SyncFlow async

Currently NormalizeFlow runs right after SyncFlow. They are already independent of each other.

We need to make them run asynchronously. Also NormalizeFlow should start only after Initial Load. SyncFlow will always be running.

Not capturing ERRORs in the log file

SELECT * FROM my_bq_b.test_array ,UNNEST(i);
ERROR: Internal error: Response error (error: ResponseError { error: NestedResponseError { code: 400, errors: [{"message": "Table-valued function not found: test_dataset at [1:40]", "locationType": "parameter", "domain": "global", "reason": "invalidQuery", "location": "q"}], message: "Table-valued function not found: test_dataset at [1:40]", status: "INVALID_ARGUMENT" } })

We just capture the below in the log file:

2023-05-29T19:44:53.002505Z INFO handling peer query: my_bq_b

Another example:
admin=> SELECT * FROM test_dataset.test_array;
ERROR: Internal error: error getting schema: db error: ERROR: relation "test_dataset.test_array" does not exist

Log file:

2023-05-29T19:47:25.580027Z INFO handling catalog query
