corcelli / peerdb

This project forked from peerdb-io/peerdb

A fast, simple and cost-effective tool to replicate data from Postgres to data warehouses, queues and storage

Home Page: https://peerdb.io

License: Other

Shell 0.13% JavaScript 0.02% Go 59.87% Rust 14.56% TypeScript 25.16% CSS 0.01% HCL 0.09% Dockerfile 0.17%

Frustratingly simple ETL for Postgres

PeerDB

At PeerDB, we are building a fast, simple and the most cost-effective way to stream data from Postgres to data warehouses, queues and storage engines. If you run Postgres at the heart of your data stack and move data at scale from Postgres to any of these targets, PeerDB can provide value.

We support different modes of streaming: log-based (CDC), cursor-based (timestamp or integer) and XMIN-based. Performance-wise, we are 10x faster than existing tools. Feature-wise, we support native Postgres features such as a comprehensive set of data types (including jsonb, arrays and geospatial types), efficient streaming of TOAST columns, schema changes and so on.
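Cursor-based streaming repeatedly pulls rows whose cursor column (a timestamp or integer) is past the last value seen. The sketch below simulates that loop over an in-memory table so it runs without a database. `Row`, `pullSince` and the batch size are hypothetical names for illustration, not PeerDB's API. The comment shows the kind of SQL query the function stands in for.

```go
package main

import "sort"

// Row is a hypothetical source row with an integer cursor column
// (for example an auto-incrementing id or an updated_at epoch value).
type Row struct {
	Cursor int64
	Data   string
}

// pullSince returns up to batchSize rows whose cursor is strictly greater
// than watermark, ordered by cursor, together with the advanced watermark.
// It simulates a query such as:
//
//	SELECT ... FROM t WHERE cursor_col > $1 ORDER BY cursor_col LIMIT $2
func pullSince(table []Row, watermark int64, batchSize int) ([]Row, int64) {
	var candidates []Row
	for _, r := range table {
		if r.Cursor > watermark {
			candidates = append(candidates, r)
		}
	}
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i].Cursor < candidates[j].Cursor
	})
	if len(candidates) > batchSize {
		candidates = candidates[:batchSize]
	}
	if len(candidates) > 0 {
		// The next pull starts after the last cursor value in this batch.
		watermark = candidates[len(candidates)-1].Cursor
	}
	return candidates, watermark
}
```

The comparison is strictly greater-than. As a result, rows that share the boundary cursor value with the last row of a full batch would be skipped. A production version would typically add a unique tiebreaker column, such as the primary key, to both the ordering and the filter.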

Get started

git clone --recursive git@github.com:PeerDB-io/peerdb.git
cd peerdb

# Run docker containers: postgres as catalog, temporal, PeerDB server, PeerDB flow API + workers, PeerDB UI
# Requires docker and docker-compose installed: https://docs.docker.com/engine/install/
bash ./run-peerdb.sh
# OR for local development, images will be built locally.
# Requires docker, docker-compose as well as the buf compiler for protobuf generation
# https://buf.build/docs/installation
bash ./generate-protos.sh
bash ./dev-peerdb.sh

# connect to peerdb and query away (Use psql version >=14.0)
psql "port=9900 host=localhost password=peerdb"
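The psql command above passes a libpq keyword/value connection string. If you script against PeerDB from Go, a helper like the one below can assemble the same string. `quoteConnValue` and `buildConnString` are hypothetical names. The quoting follows libpq's documented rules: values that are empty or contain whitespace are single-quoted, and single quotes and backslashes are backslash-escaped.

```go
package main

import (
	"sort"
	"strings"
)

// quoteConnValue renders one value for a libpq keyword/value string.
// Backslashes and single quotes are escaped with a backslash; the value is
// wrapped in single quotes if it is empty or contains whitespace, a quote
// or a backslash.
func quoteConnValue(v string) string {
	escaped := strings.NewReplacer(`\`, `\\`, `'`, `\'`).Replace(v)
	if v == "" || strings.ContainsAny(v, " \t\n\r\f\v'\\") {
		return "'" + escaped + "'"
	}
	return escaped
}

// buildConnString joins keyword=value pairs with spaces, sorting the
// keywords so the output is deterministic.
func buildConnString(params map[string]string) string {
	keys := make([]string, 0, len(params))
	for k := range params {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+quoteConnValue(params[k]))
	}
	return strings.Join(parts, " ")
}
```

For the values used with psql above, the result is `host=localhost password=peerdb port=9900`. A Go Postgres driver that accepts keyword/value DSNs, such as pgx or lib/pq, can take this string.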

Follow this 5-minute Quickstart Guide to see PeerDB in action, i.e. streaming data in real time across stores.

Why PeerDB

Current data tools prioritize a wide range of connectors, often neglecting to optimize for Postgres users. This can be problematic for those storing large amounts of data in Postgres and frequently transferring it. As a result, many resort to building custom pipelines when existing tools don't meet their needs. We've developed this project to provide a straightforward and reliable solution specifically for Postgres.

Postgres-first Approach

PeerDB is an ETL/ELT tool built for PostgreSQL. We implement multiple Postgres-native and infrastructural optimizations to provide a fast, reliable and feature-rich experience for moving data in and out of PostgreSQL.

For performance, we can parallelize the initial load of a large table while still ensuring consistency, which cuts the time to sync hundreds of GB from days to minutes. Our architecture is designed for real-time syncs and implements multiple logical-replication optimizations (tuning Postgres configs, reading the replication slot in parallel, etc.). This enables 10x faster Change Data Capture with data freshness of a few tens of seconds, even at high throughput (10k+ tps).
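The sketch below shows one common way to parallelize an initial load: split the primary-key range into contiguous partitions and copy each one concurrently. It assumes a table with a dense integer primary key. Real tables may have gaps or non-integer keys, and PeerDB's actual partitioning scheme may differ. `KeyRange`, `splitRange`, `loadInParallel` and the `copyRange` callback are hypothetical. For the consistency mentioned above, each worker would read from one shared snapshot, which Postgres supports via `pg_export_snapshot()` and `SET TRANSACTION SNAPSHOT`. That step is left to the callback here.

```go
package main

import "sync"

// KeyRange is an inclusive range of primary-key values for one partition.
type KeyRange struct {
	Start, End int64
}

// splitRange divides the inclusive key range [lo, hi] into at most n
// contiguous, non-overlapping partitions whose sizes differ by at most one.
func splitRange(lo, hi int64, n int) []KeyRange {
	if n < 1 || hi < lo {
		return nil
	}
	total := hi - lo + 1
	if int64(n) > total {
		n = int(total) // never create empty partitions
	}
	size, rem := total/int64(n), total%int64(n)
	ranges := make([]KeyRange, 0, n)
	start := lo
	for i := 0; i < n; i++ {
		width := size
		if int64(i) < rem {
			width++ // spread the remainder over the first partitions
		}
		ranges = append(ranges, KeyRange{Start: start, End: start + width - 1})
		start += width
	}
	return ranges
}

// loadInParallel runs copyRange for every partition on its own goroutine
// and returns the per-partition row counts in partition order.
func loadInParallel(ranges []KeyRange, copyRange func(KeyRange) int) []int {
	counts := make([]int, len(ranges))
	var wg sync.WaitGroup
	for i, r := range ranges {
		wg.Add(1)
		go func(i int, r KeyRange) {
			defer wg.Done()
			counts[i] = copyRange(r) // each goroutine writes only its own slot
		}(i, r)
	}
	wg.Wait()
	return counts
}
```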

For reliability, we have fault-tolerance mechanisms in place: state management, automatic retries, handling of idempotency and consistency, and so on (see https://blog.peerdb.io/using-temporal-to-scale-data-synchronization-at-peerdb). Configurable batching and parallelism prevent out-of-memory errors (OOMs) and crashes.
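The fault-tolerance ideas above (retries, idempotency and bounded batches) can be illustrated with a small in-memory sketch. This is not PeerDB's implementation, which the linked post describes as built on Temporal. `Batch`, `Sink`, `chunk`, `withRetries` and `ErrTransient` are hypothetical names. Each batch carries an increasing ID and the destination records the last applied ID. A batch redelivered after a lost acknowledgement is therefore skipped instead of duplicated.

```go
package main

import "errors"

// ErrTransient is a hypothetical error standing in for a network blip or
// a lost acknowledgement.
var ErrTransient = errors.New("transient failure")

// Batch is a unit of replicated changes with a monotonically increasing ID.
type Batch struct {
	ID   int64
	Rows []string
}

// Sink is a toy destination that remembers the last batch ID it applied.
type Sink struct {
	lastApplied int64
	rows        []string
}

// Apply writes a batch once; re-applying an already seen batch is a no-op.
func (s *Sink) Apply(b Batch) {
	if b.ID <= s.lastApplied {
		return // already applied: skipping keeps the operation idempotent
	}
	s.rows = append(s.rows, b.Rows...)
	s.lastApplied = b.ID
}

// chunk splits rows into batches of at most maxRows rows, which bounds the
// memory held per batch. IDs start at firstID and increase by one.
func chunk(rows []string, maxRows int, firstID int64) []Batch {
	if maxRows < 1 {
		maxRows = 1
	}
	var batches []Batch
	for start := 0; start < len(rows); start += maxRows {
		end := start + maxRows
		if end > len(rows) {
			end = len(rows)
		}
		batches = append(batches, Batch{ID: firstID + int64(len(batches)), Rows: rows[start:end]})
	}
	return batches
}

// withRetries calls fn up to attempts times (at least once) and returns
// nil on the first success, or the last error otherwise.
func withRetries(attempts int, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
	}
	return err
}
```

A real pipeline would also sleep with backoff between attempts. It would persist `lastApplied` in the same transaction as the written rows, so that the idempotency check survives restarts.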

From a feature-richness standpoint, we support efficient syncing of tables with large (TOAST) columns. We support multiple streaming modes: log-based (CDC), query-based, etc. We provide rich data-type mapping and plan to support every data type that Postgres supports (including custom types), to the best extent possible on the target data store.
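In Postgres logical replication, an UPDATE that leaves a TOASTed column unchanged does not resend that column's value. The built-in pgoutput plugin marks it as `u` (unchanged TOASTed value) in the tuple data, so a consumer must fill those columns from somewhere else. The sketch below shows one approach: merging with the last known row. `Column` and `mergeUpdate` are hypothetical. PeerDB's actual handling may differ, for example by generating target-side updates that leave such columns untouched.

```go
package main

import "fmt"

// Column kinds as they appear in pgoutput TupleData messages.
const (
	KindNull           = 'n'
	KindUnchangedToast = 'u'
	KindText           = 't'
)

// Column is one decoded column of an UPDATE's new tuple.
type Column struct {
	Kind byte
	Text string
}

// mergeUpdate builds the full new row for an UPDATE. Columns marked as an
// unchanged TOASTed value carry no data, so their value is copied from the
// previously known row. A nil pointer represents SQL NULL.
func mergeUpdate(prev []*string, update []Column) ([]*string, error) {
	if len(prev) != len(update) {
		return nil, fmt.Errorf("column count mismatch: %d vs %d", len(prev), len(update))
	}
	merged := make([]*string, len(update))
	for i, c := range update {
		switch c.Kind {
		case KindUnchangedToast:
			merged[i] = prev[i] // value was not sent; reuse the old one
		case KindText:
			v := c.Text
			merged[i] = &v
		case KindNull:
			merged[i] = nil
		default:
			return nil, fmt.Errorf("unsupported column kind %q", c.Kind)
		}
	}
	return merged, nil
}
```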

Postgres-compatible SQL interface to do ETL

The Postgres-compatible SQL interface for ETL is unique to PeerDB and enables you to operate in a language you are familiar with. You can do ETL the same way you work with your databases.

You can use the Postgres ecosystem to manage your ETL:

  1. Client tools like pgAdmin, psql to run SQL commands.
  2. BI tools like Grafana, Tableau to visually monitor syncs and transforms.
  3. Database migration and versioning tools like Flyway to manage your ETL.
  4. Any language (Python, Go, Node.js, etc.) and scheduler (Airflow) for development.
  5. And many more

Status

We support multiple target connectors to move data from Postgres and a couple of source connectors to move data into Postgres. Check the status of connectors here

License

PeerDB is licensed under Elastic License 2.0 (ELv2). Please see the LICENSE file for additional information. If you have any licensing questions please email [email protected]

Contributors

iskakaushik, serprex, amogh-bharadwaj, heavycrystal, saisrirampur, pankaj-peerdb, iamkunalgupta, yasinzaehringer-paradime, arajkumar, dependabot[bot]