RepliByte is an application to replicate your cloud databases
from one place to another while hiding sensitive data 🕵️‍♂️
Installation via package managers is coming soon.
You need to have the pg_dump and psql binaries installed on your machine. Download PostgreSQL.
git clone https://github.com/Qovery/replibyte.git
# you need to install the Rust compiler first
cargo build --release
# feel free to move the binary elsewhere
./target/release/replibyte
Example with Postgres as both the source and destination database, and S3 as the bridge (see the configuration files below).
Backup your Postgres databases into S3
replibyte -c prod-conf.yaml backup run
Backup from local Postgres dump file into S3
cat dump.sql | replibyte -c prod-conf.yaml backup run -s postgres -i
Restore your Postgres databases from S3
replibyte -c prod-conf.yaml backup list
type name size when
PostgreSQL backup-1647706359405 154MB Yesterday at 03:00 am
PostgreSQL backup-1647731334517 152MB 2 days ago at 03:00 am
PostgreSQL backup-1647734369306 149MB 3 days ago at 03:00 am
replibyte -c prod-conf.yaml restore -v latest
OR
replibyte -c prod-conf.yaml restore -v backup-1647706359405
Create your prod-conf.yaml
configuration file to source your production database.
source:
  connection_uri: $DATABASE_URL
  transformers:
    - database: public
      table: employees
      columns:
        - name: last_name
          transformer: random
        - name: birth_date
          transformer: random-date
        - name: first_name
          transformer: first-name
bridge:
  bucket: $BUCKET_NAME
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
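The `$DATABASE_URL`, `$BUCKET_NAME`, `$ACCESS_KEY_ID`, and `$AWS_SECRET_ACCESS_KEY` placeholders are resolved from environment variables, so export them before running RepliByte. The values below are purely illustrative:

```shell
# Illustrative values only; use your real credentials and connection string.
export DATABASE_URL="postgres://user:password@prod-host:5432/mydb"
export BUCKET_NAME="my-replibyte-backups"
export ACCESS_KEY_ID="my-access-key"
export AWS_SECRET_ACCESS_KEY="my-secret-key"
```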
Run the app for the source
replibyte -c prod-conf.yaml
Create your staging-conf.yaml
configuration file to sync your production database with your staging database.
bridge:
  bucket: $BUCKET_NAME
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
destination:
  connection_uri: $DATABASE_URL
Run the app for the destination
replibyte -c staging-conf.yaml
RepliByte is built to replicate small and very large databases from one place (the source) to another (the destination), with a bridge as intermediary. Here is an example of what happens while replicating a Postgres database.
sequenceDiagram
    participant RepliByte
    participant Postgres (Source)
    participant AWS S3 (Bridge)
    Postgres (Source)->>RepliByte: 1. Dump data
    loop Transformer
        RepliByte->>RepliByte: 2. Obfuscate sensitive data
    end
    RepliByte->>AWS S3 (Bridge): 3. Upload obfuscated dump data
    RepliByte->>AWS S3 (Bridge): 4. Write index file
- RepliByte connects to the Postgres Source database and makes a full SQL dump of it.
- RepliByte receives the SQL dump, parses it, and generates random/fake information in real-time.
- RepliByte streams and uploads the modified SQL dump in real-time on AWS S3.
- RepliByte keeps track of the uploaded SQL dump by writing it into an index file.
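The steps above can be sketched as a streaming pipeline. Here is a minimal, illustrative Rust sketch of steps 1 to 3; the transformer and the upload function are stand-ins, not RepliByte's actual code:

```rust
use std::io::{BufRead, BufReader, Cursor};

// Stand-in for streaming an obfuscated chunk to the S3 bridge.
fn upload(chunk: &str) {
    println!("uploading: {}", chunk);
}

fn main() {
    // A tiny in-memory "dump" standing in for the pg_dump output stream.
    let dump = Cursor::new("INSERT INTO employees VALUES ('Doe');\n");
    for line in BufReader::new(dump).lines() {
        let line = line.unwrap();
        // Stand-in for the real transformers: replace a sensitive value.
        let obfuscated = line.replace("'Doe'", "'xxx'");
        upload(&obfuscated);
    }
}
```

Because each line is transformed and uploaded as it streams through, nothing has to be buffered or written to disk.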
Once at least one replica of the source Postgres database is available in the S3 bucket, RepliByte can use it and inject it into the destination PostgreSQL database.
sequenceDiagram
    participant RepliByte
    participant Postgres (Destination)
    participant AWS S3 (Bridge)
    AWS S3 (Bridge)->>RepliByte: 1. Read index file
    AWS S3 (Bridge)->>RepliByte: 2. Download dump SQL file
    RepliByte->>Postgres (Destination): 3. Restore dump SQL
- RepliByte connects to the S3 bucket and reads the index file to retrieve the latest SQL to download.
- RepliByte downloads the SQL dump as a stream of bytes.
- RepliByte restores the SQL dump in the destination Postgres database in real-time.
- Complete data synchronization
- Work on different VPC/network
- Generate random/fake information
- Backup TB of data (read Design)
- On-the-fly data (de)compression (Zlib)
Here are the features we plan to support:
- Incremental data synchronization
- Auto-detect sensitive fields and generate fake data
- Auto-clean up bridge data
- PostgreSQL
- MySQL (Coming Soon)
- MongoDB (Coming Soon)
- Local dump file (Yes for PostgreSQL)
A transformer changes or hides the value of a column. RepliByte provides pre-made transformers.
Check out the list of our available Transformers
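Conceptually, a transformer is a function from the original column value to a fake replacement. The sketch below is hypothetical; the names, signatures, and replacement strategies are assumptions for illustration, not RepliByte's actual API:

```rust
// Hypothetical model of a transformer, not RepliByte's actual API:
// a function from the original column value to a fake replacement.
type Transformer = fn(&str) -> String;

// "random": a real implementation would emit random characters.
fn random_transformer(original: &str) -> String {
    "x".repeat(original.len())
}

// "first-name": a real implementation would pick from a name dictionary.
fn first_name_transformer(_original: &str) -> String {
    "John".to_string()
}

fn main() {
    println!("{}", random_transformer("Doe"));       // -> xxx
    println!("{}", first_name_transformer("Alice")); // -> John
}
```

Applied to the example configuration above, `last_name` would pass through `random`, `birth_date` through `random-date`, and `first_name` through `first-name`.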
The S3 wire protocol, used by RepliByte bridge, is supported by most cloud providers. Here is a non-exhaustive list of S3 compatible services.
Cloud Service Provider | S3 service name | S3 compatible |
---|---|---|
Amazon Web Services | S3 | Yes (Original) |
Google Cloud Platform | Cloud Storage | Yes |
Microsoft Azure | Blob Storage | Yes |
Digital Ocean | Spaces | Yes |
Scaleway | Object Storage | Yes |
Minio | Object Storage | Yes |
Feel free to drop a PR to include another S3 compatible solution.
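To point the bridge at an S3-compatible service such as Minio, the bucket configuration needs the service's endpoint. A hypothetical sketch; the exact key name (`endpoint` here) is an assumption and may differ from RepliByte's actual configuration schema:

```yaml
bridge:
  bucket: $BUCKET_NAME
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
  endpoint: http://localhost:9000 # assumed key; points at a local Minio
```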
- PostgreSQL
- MySQL (Coming Soon)
- MongoDB (Coming Soon)
- Local dump file (Coming soon)
Written in Rust, RepliByte can run with 512 MB of RAM and 1 CPU to replicate 1 TB of data (we are working on a benchmark). RepliByte replicates the data as a stream of bytes and does not store anything on the local disk.
- Tested with Postgres 13 and 14. It should work with prior versions.
An index file describes the structure of your backups and lists all of them.
Here is the manifest file that you can find at the root of your target bridge
(e.g., S3).
{
  "backups": [
    {
      "size": 1024000, // in bytes
      "directory_name": "backup-{epoch timestamp}",
      "created_at": "epoch timestamp"
    }
  ]
}
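For illustration, the manifest maps naturally onto a pair of structs. A hypothetical Rust sketch, with field names following the JSON above; this is not RepliByte's actual code:

```rust
// Hypothetical structs mirroring the index (manifest) file layout.
#[derive(Debug)]
struct Backup {
    size: u64,              // in bytes
    directory_name: String, // e.g. "backup-{epoch timestamp}"
    created_at: u64,        // epoch timestamp
}

#[derive(Debug)]
struct Index {
    backups: Vec<Backup>,
}

fn main() {
    let index = Index {
        backups: vec![Backup {
            size: 1_024_000,
            directory_name: "backup-1647706359405".to_string(),
            created_at: 1_647_706_359_405,
        }],
    };
    // `backup list` amounts to iterating over the index entries.
    for b in &index.backups {
        println!("{} ({} bytes)", b.directory_name, b.size);
    }
}
```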
At Qovery (the company behind RepliByte), developers can clone their applications and databases with just one click. However, the cloning process can be tedious and time-consuming, and we end up copying the information multiple times. With RepliByte, the Qovery team wants to provide a comprehensive way to seed cloud databases from one place to another.
The long-term motivation behind RepliByte is to provide a way to clone any database in real-time. This project starts small, but has big ambition!
Scenario | Supported |
---|---|
Synchronize the whole Postgres instance | Yes |
Synchronize the whole Postgres instance and replace sensitive data with fake data | Yes |
Synchronize specific Postgres tables and replace sensitive data with fake data | WIP |
Synchronize specific Postgres databases and replace sensitive data with fake data | WIP |
Migrate from one database hosting platform to the other | Yes |
Do you want to support an additional use-case? Feel free to contribute by opening an issue or submitting a PR.
RepliByte is not an ETL like AirByte, AirFlow, or Talend, and it never will be. If you need to synchronize versatile data sources, you are better off choosing a classic ETL. RepliByte is a tool that helps software engineers synchronize data between databases of the same type. As mentioned above, its primary purpose is to duplicate data into different environments. You can see RepliByte as a specific use case of an ETL, where an ETL is more generic.
Open an issue if you have any questions; we'll pick the most common ones and answer them here.
For local development, you will need to install Docker and run docker compose -f ./docker-compose-postgres-minio.yml up
to start the local databases.
At the moment, the docker-compose file
includes two Postgres database instances (one source, one destination) and a Minio bridge. In the future, we will provide more options.
The Minio console is accessible at http://localhost:9001.
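For orientation, the compose file looks roughly like the sketch below. Service names, images, and ports are assumptions for illustration; check docker-compose-postgres-minio.yml in the repository for the actual layout:

```yaml
# Hypothetical sketch of docker-compose-postgres-minio.yml; names and
# ports are assumptions, not the repository's actual file.
services:
  postgres-source:
    image: postgres:14
    environment:
      POSTGRES_PASSWORD: password
    ports: ["5432:5432"]
  postgres-destination:
    image: postgres:14
    environment:
      POSTGRES_PASSWORD: password
    ports: ["5433:5432"]
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    ports: ["9000:9000", "9001:9001"]
```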
Once your Docker instances are running, you can run the RepliByte tests, to check if everything is configured correctly:
AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin cargo test
RepliByte is in its early stage of development and needs some time to be usable in production. We need some help, and you are welcome to contribute. To coordinate, consider joining our #replibyte channel on our Discord. Otherwise, you can pick any open issue and contribute.
Check the open issues and their priority.
3 options:
- Open an issue.
- Join our #replibyte channel on our Discord.
- Drop us an email at github+replibyte {at} qovery {dot} com.
Romaric, the main contributor to RepliByte, does live coding sessions on Twitch where you can learn more about RepliByte and how to develop in Rust. Feel free to join the sessions.
Thanks to all the people sharing their ideas to make RepliByte better; we truly appreciate it. We would also like to thank AirByte, a great product and a trustworthy source of inspiration for this project.