citusdata / citus Goto Github PK

Distributed PostgreSQL as an extension

License: GNU Affero General Public License v3.0

Makefile 0.25% Shell 0.24% C 57.61% Perl 0.87% PLpgSQL 35.01% Ruby 3.44% Python 2.26% M4 0.14% sed 0.12% GDB 0.01% Dockerfile 0.06%

database citus multi-tenant postgresql scale sharding sql distributed-database postgres citus-extension

citus's Introduction

The Citus database is 100% open source. Learn what's new in the Citus 12.1 release blog and the Citus Updates page.

What is Citus?

Citus is a PostgreSQL extension that transforms Postgres into a distributed database—so you can achieve high performance at any scale.

With Citus, you extend your PostgreSQL database with new superpowers:

Distributed tables are sharded across a cluster of PostgreSQL nodes to combine their CPU, memory, storage and I/O capacity.
References tables are replicated to all nodes for joins and foreign keys from distributed tables and maximum read performance.
Distributed query engine routes and parallelizes SELECT, DML, and other operations on distributed tables across the cluster.
Columnar storage compresses data, speeds up scans, and supports fast projections, both on regular and distributed tables.
Query from any node enables you to utilize the full capacity of your cluster for distributed queries

You can use these Citus superpowers to make your Postgres database scale-out ready on a single Citus node. Or you can build a large cluster capable of handling high transaction throughputs, especially in multi-tenant apps, run fast analytical queries, and process large amounts of time series or IoT data for real-time analytics. When your data size and volume grow, you can easily add more worker nodes to the cluster and rebalance the shards.

Our SIGMOD '21 paper Citus: Distributed PostgreSQL for Data-Intensive Applications gives a more detailed look into what Citus is, how it works, and why it works that way.

Since Citus is an extension to Postgres, you can use Citus with the latest Postgres versions. And Citus works seamlessly with the PostgreSQL tools and extensions you are already familiar with.

Why Citus?
Getting Started
Using Citus
Schema-based sharding
Setting up with High Availability
Documentation
Architecture
When to Use Citus
Need Help?
Contributing
Stay Connected

Why Citus?

Developers choose Citus for two reasons:

Your application is outgrowing a single PostgreSQL node

If the size and volume of your data increases over time, you may start seeing any number of performance and scalability problems on a single PostgreSQL node. For example: High CPU utilization and I/O wait times slow down your queries, SQL queries return out of memory errors, autovacuum cannot keep up and increases table bloat, etc.

With Citus you can distribute and optionally compress your tables to always have enough memory, CPU, and I/O capacity to achieve high performance at scale. The distributed query engine can efficiently route transactions across the cluster, while parallelizing analytical queries and batch operations across all cores. Moreover, you can still use the PostgreSQL features and tools you know and love.
PostgreSQL can do things other systems can’t

There are many data processing systems that are built to scale out, but few have as many powerful capabilities as PostgreSQL, including: Advanced joins and subqueries, user-defined functions, update/delete/upsert, constraints and foreign keys, powerful extensions (e.g. PostGIS, HyperLogLog), many types of indexes, time-partitioning, and sophisticated JSON support.

Citus makes PostgreSQL’s most powerful capabilities work at any scale, allowing you to handle complex data-intensive workloads on a single database system.

Getting Started

The quickest way to get started with Citus is to use the Azure Cosmos DB for PostgreSQL managed service in the cloud—or set up Citus locally.

Citus Managed Service on Azure

You can get a fully-managed Citus cluster in minutes through the Azure Cosmos DB for PostgreSQL portal. Azure will manage your backups, high availability through auto-failover, software updates, monitoring, and more for all of your servers. To get started Citus on Azure, use the Azure Cosmos DB for PostgreSQL Quickstart.

Running Citus using Docker

The smallest possible Citus cluster is a single PostgreSQL node with the Citus extension, which means you can try out Citus by running a single Docker container.

# run PostgreSQL with Citus on port 5500
docker run -d --name citus -p 5500:5432 -e POSTGRES_PASSWORD=mypassword citusdata/citus

# connect using psql within the Docker container
docker exec -it citus psql -U postgres

# or, connect using local psql
psql -U postgres -d postgres -h localhost -p 5500

Install Citus locally

If you already have a local PostgreSQL installation, the easiest way to install Citus is to use our packaging repo

Install packages on Ubuntu / Debian:

curl https://install.citusdata.com/community/deb.sh > add-citus-repo.sh
sudo bash add-citus-repo.sh
sudo apt-get -y install postgresql-16-citus-12.1

Install packages on CentOS / Red Hat:

curl https://install.citusdata.com/community/rpm.sh > add-citus-repo.sh
sudo bash add-citus-repo.sh
sudo yum install -y citus121_16

To add Citus to your local PostgreSQL database, add the following to postgresql.conf:

shared_preload_libraries = 'citus'

After restarting PostgreSQL, connect using psql and run:

CREATE EXTENSION citus;

You’re now ready to get started and use Citus tables on a single node.

Install Citus on multiple nodes

If you want to set up a multi-node cluster, you can also set up additional PostgreSQL nodes with the Citus extensions and add them to form a Citus cluster:

-- before adding the first worker node, tell future worker nodes how to reach the coordinator
SELECT citus_set_coordinator_host('10.0.0.1', 5432);

-- add worker nodes
SELECT citus_add_node('10.0.0.2', 5432);
SELECT citus_add_node('10.0.0.3', 5432);

-- rebalance the shards over the new worker nodes
SELECT rebalance_table_shards();

For more details, see our documentation on how to set up a multi-node Citus cluster on various operating systems.

Using Citus

Once you have your Citus cluster, you can start creating distributed tables, reference tables and use columnar storage.

Creating Distributed Tables

The create_distributed_table UDF will transparently shard your table locally or across the worker nodes:

CREATE TABLE events (
  device_id bigint,
  event_id bigserial,
  event_time timestamptz default now(),
  data jsonb not null,
  PRIMARY KEY (device_id, event_id)
);

-- distribute the events table across shards placed locally or on the worker nodes
SELECT create_distributed_table('events', 'device_id');

After this operation, queries for a specific device ID will be efficiently routed to a single worker node, while queries across device IDs will be parallelized across the cluster.

-- insert some events
INSERT INTO events (device_id, data)
SELECT s % 100, ('{"measurement":'||random()||'}')::jsonb FROM generate_series(1,1000000) s;

-- get the last 3 events for device 1, routed to a single node
SELECT * FROM events WHERE device_id = 1 ORDER BY event_time DESC, event_id DESC LIMIT 3;
┌───────────┬──────────┬───────────────────────────────┬───────────────────────────────────────┐
│ device_id │ event_id │          event_time           │                 data                  │
├───────────┼──────────┼───────────────────────────────┼───────────────────────────────────────┤
│         1 │  1999901 │ 2021-03-04 16:00:31.189963+00 │ {"measurement": 0.88722643925054}     │
│         1 │  1999801 │ 2021-03-04 16:00:31.189963+00 │ {"measurement": 0.6512231304621992}   │
│         1 │  1999701 │ 2021-03-04 16:00:31.189963+00 │ {"measurement": 0.019368766051897524} │
└───────────┴──────────┴───────────────────────────────┴───────────────────────────────────────┘
(3 rows)

Time: 4.588 ms

-- explain plan for a query that is parallelized across shards, which shows the plan for
-- a query one of the shards and how the aggregation across shards is done
EXPLAIN (VERBOSE ON) SELECT count(*) FROM events;
┌────────────────────────────────────────────────────────────────────────────────────┐
│                                     QUERY PLAN                                     │
├────────────────────────────────────────────────────────────────────────────────────┤
│ Aggregate                                                                          │
│   Output: COALESCE((pg_catalog.sum(remote_scan.count))::bigint, '0'::bigint)       │
│   ->  Custom Scan (Citus Adaptive)                                                 │
│         ...                                                                        │
│         ->  Task                                                                   │
│               Query: SELECT count(*) AS count FROM events_102008 events WHERE true │
│               Node: host=localhost port=5432 dbname=postgres                       │
│               ->  Aggregate                                                        │
│                     ->  Seq Scan on public.events_102008 events                    │
└────────────────────────────────────────────────────────────────────────────────────┘

Creating Distributed Tables with Co-location

Distributed tables that have the same distribution column can be co-located to enable high performance distributed joins and foreign keys between distributed tables. By default, distributed tables will be co-located based on the type of the distribution column, but you define co-location explicitly with the colocate_with argument in create_distributed_table.

CREATE TABLE devices (
  device_id bigint primary key,
  device_name text,
  device_type_id int
);
CREATE INDEX ON devices (device_type_id);

-- co-locate the devices table with the events table
SELECT create_distributed_table('devices', 'device_id', colocate_with := 'events');

-- insert device metadata
INSERT INTO devices (device_id, device_name, device_type_id)
SELECT s, 'device-'||s, 55 FROM generate_series(0, 99) s;

-- optionally: make sure the application can only insert events for a known device
ALTER TABLE events ADD CONSTRAINT device_id_fk
FOREIGN KEY (device_id) REFERENCES devices (device_id);

-- get the average measurement across all devices of type 55, parallelized across shards
SELECT avg((data->>'measurement')::double precision)
FROM events JOIN devices USING (device_id)
WHERE device_type_id = 55;

┌────────────────────┐
│        avg         │
├────────────────────┤
│ 0.5000191877513974 │
└────────────────────┘
(1 row)

Time: 209.961 ms

Co-location also helps you scale INSERT..SELECT, stored procedures, and distributed transactions.

Distributing Tables without interrupting the application

Some of you already start with Postgres, and decide to distribute tables later on while your application using the tables. In that case, you want to avoid downtime for both reads and writes. create_distributed_table command block writes (e.g., DML commands) on the table until the command is finished. Instead, with create_distributed_table_concurrently command, your application can continue to read and write the data even during the command.

CREATE TABLE device_logs (
  device_id bigint primary key,
  log text
);

-- insert device logs
INSERT INTO device_logs (device_id, log)
SELECT s, 'device log:'||s FROM generate_series(0, 99) s;

-- convert device_logs into a distributed table without interrupting the application
SELECT create_distributed_table_concurrently('device_logs', 'device_id', colocate_with := 'devices');


-- get the count of the logs, parallelized across shards
SELECT count(*) FROM device_logs;

┌───────┐
│ count │
├───────┤
│   100 │
└───────┘
(1 row)

Time: 48.734 ms

Creating Reference Tables

When you need fast joins or foreign keys that do not include the distribution column, you can use create_reference_table to replicate a table across all nodes in the cluster.

CREATE TABLE device_types (
  device_type_id int primary key,
  device_type_name text not null unique
);

-- replicate the table across all nodes to enable foreign keys and joins on any column
SELECT create_reference_table('device_types');

-- insert a device type
INSERT INTO device_types (device_type_id, device_type_name) VALUES (55, 'laptop');

-- optionally: make sure the application can only insert devices with known types
ALTER TABLE devices ADD CONSTRAINT device_type_fk
FOREIGN KEY (device_type_id) REFERENCES device_types (device_type_id);

-- get the last 3 events for devices whose type name starts with laptop, parallelized across shards
SELECT device_id, event_time, data->>'measurement' AS value, device_name, device_type_name
FROM events JOIN devices USING (device_id) JOIN device_types USING (device_type_id)
WHERE device_type_name LIKE 'laptop%' ORDER BY event_time DESC LIMIT 3;

┌───────────┬───────────────────────────────┬─────────────────────┬─────────────┬──────────────────┐
│ device_id │          event_time           │        value        │ device_name │ device_type_name │
├───────────┼───────────────────────────────┼─────────────────────┼─────────────┼──────────────────┤
│        60 │ 2021-03-04 16:00:31.189963+00 │ 0.28902084163415864 │ device-60   │ laptop           │
│         8 │ 2021-03-04 16:00:31.189963+00 │ 0.8723803076285073  │ device-8    │ laptop           │
│        20 │ 2021-03-04 16:00:31.189963+00 │ 0.8177634801548557  │ device-20   │ laptop           │
└───────────┴───────────────────────────────┴─────────────────────┴─────────────┴──────────────────┘
(3 rows)

Time: 146.063 ms

Reference tables enable you to scale out complex data models and take full advantage of relational database features.

Creating Tables with Columnar Storage

To use columnar storage in your PostgreSQL database, all you need to do is add USING columnar to your CREATE TABLE statements and your data will be automatically compressed using the columnar access method.

CREATE TABLE events_columnar (
  device_id bigint,
  event_id bigserial,
  event_time timestamptz default now(),
  data jsonb not null
)
USING columnar;

-- insert some data
INSERT INTO events_columnar (device_id, data)
SELECT d, '{"hello":"columnar"}' FROM generate_series(1,10000000) d;

-- create a row-based table to compare
CREATE TABLE events_row AS SELECT * FROM events_columnar;

-- see the huge size difference!
\d+
                                          List of relations
┌────────┬──────────────────────────────┬──────────┬───────┬─────────────┬────────────┬─────────────┐
│ Schema │             Name             │   Type   │ Owner │ Persistence │    Size    │ Description │
├────────┼──────────────────────────────┼──────────┼───────┼─────────────┼────────────┼─────────────┤
│ public │ events_columnar              │ table    │ marco │ permanent   │ 25 MB      │             │
│ public │ events_row                   │ table    │ marco │ permanent   │ 651 MB     │             │
└────────┴──────────────────────────────┴──────────┴───────┴─────────────┴────────────┴─────────────┘
(2 rows)

You can use columnar storage by itself, or in a distributed table to combine the benefits of compression and the distributed query engine.

When using columnar storage, you should only load data in batch using COPY or INSERT..SELECT to achieve good compression. Update, delete, and foreign keys are currently unsupported on columnar tables. However, you can use partitioned tables in which newer partitions use row-based storage, and older partitions are compressed using columnar storage.

To learn more about columnar storage, check out the columnar storage README.

Schema-based sharding

Available since Citus 12.0, schema-based sharding is the shared database, separate schema model, the schema becomes the logical shard within the database. Multi-tenant apps can a use a schema per tenant to easily shard along the tenant dimension. Query changes are not required and the application usually only needs a small modification to set the proper search_path when switching tenants. Schema-based sharding is an ideal solution for microservices, and for ISVs deploying applications that cannot undergo the changes required to onboard row-based sharding.

Creating distributed schemas

You can turn an existing schema into a distributed schema by calling citus_schema_distribute:

SELECT citus_schema_distribute('user_service');

Alternatively, you can set citus.enable_schema_based_sharding to have all newly created schemas be automatically converted into distributed schemas:

SET citus.enable_schema_based_sharding TO ON;

CREATE SCHEMA AUTHORIZATION user_service;
CREATE SCHEMA AUTHORIZATION time_service;
CREATE SCHEMA AUTHORIZATION ping_service;

Running queries

Queries will be properly routed to schemas based on search_path or by explicitly using the schema name in the query.

For microservices you would create a USER per service matching the schema name, hence the default search_path would contain the schema name. When connected the user queries would be automatically routed and no changes to the microservice would be required.

CREATE USER user_service;
CREATE SCHEMA AUTHORIZATION user_service;

For typical multi-tenant applications, you would set the search path to the tenant schema name in your application:

SET search_path = tenant_name, public;

Setting up with High Availability

One of the most popular high availability solutions for PostgreSQL, Patroni 3.0, has first class support for Citus 10.0 and above, additionally since Citus 11.2 ships with improvements for smoother node switchover in Patroni.

An example of patronictl list output for the Citus cluster:

postgres@coord1:~$ patronictl list demo

+ Citus cluster: demo ----------+--------------+---------+----+-----------+
| Group | Member  | Host        | Role         | State   | TL | Lag in MB |
+-------+---------+-------------+--------------+---------+----+-----------+
|     0 | coord1  | 172.27.0.10 | Replica      | running |  1 |         0 |
|     0 | coord2  | 172.27.0.6  | Sync Standby | running |  1 |         0 |
|     0 | coord3  | 172.27.0.4  | Leader       | running |  1 |           |
|     1 | work1-1 | 172.27.0.8  | Sync Standby | running |  1 |         0 |
|     1 | work1-2 | 172.27.0.2  | Leader       | running |  1 |           |
|     2 | work2-1 | 172.27.0.5  | Sync Standby | running |  1 |         0 |
|     2 | work2-2 | 172.27.0.7  | Leader       | running |  1 |           |
+-------+---------+-------------+--------------+---------+----+-----------+

Documentation

If you’re ready to get started with Citus or want to know more, we recommend reading the Citus open source documentation. Or, if you are using Citus on Azure, then the Azure Cosmos DB for PostgreSQL is the place to start.

Our Citus docs contain comprehensive use case guides on how to build a multi-tenant SaaS application, real-time analytics dashboard, or work with time series data.

Architecture

A Citus database cluster grows from a single PostgreSQL node into a cluster by adding worker nodes. In a Citus cluster, the original node to which the application connects is referred to as the coordinator node. The Citus coordinator contains both the metadata of distributed tables and reference tables, as well as regular (local) tables, sequences, and other database objects (e.g. foreign tables).

Data in distributed tables is stored in “shards”, which are actually just regular PostgreSQL tables on the worker nodes. When querying a distributed table on the coordinator node, Citus will send regular SQL queries to the worker nodes. That way, all the usual PostgreSQL optimizations and extensions can automatically be used with Citus.

When you send a query in which all (co-located) distributed tables have the same filter on the distribution column, Citus will automatically detect that and send the whole query to the worker node that stores the data. That way, arbitrarily complex queries are supported with minimal routing overhead, which is especially useful for scaling transactional workloads. If queries do not have a specific filter, each shard is queried in parallel, which is especially useful in analytical workloads. The Citus distributed executor is adaptive and is designed to handle both query types at the same time on the same system under high concurrency, which enables large-scale mixed workloads.

The schema and metadata of distributed tables and reference tables are automatically synchronized to all the nodes in the cluster. That way, you can connect to any node to run distributed queries. Schema changes and cluster administration still need to go through the coordinator.

Detailed descriptions of the implementation for Citus developers are provided in the Citus Technical Documentation.

When to use Citus

Citus is uniquely capable of scaling both analytical and transactional workloads with up to petabytes of data. Use cases in which Citus is commonly used:

Customer-facing analytics dashboards: Citus enables you to build analytics dashboards that simultaneously ingest and process large amounts of data in the database and give sub-second response times even with a large number of concurrent users.

The advanced parallel, distributed query engine in Citus combined with PostgreSQL features such as array types, JSONB, lateral joins, and extensions like HyperLogLog and TopN allow you to build responsive analytics dashboards no matter how many customers or how much data you have.

Example real-time analytics users: Algolia
Time series data: Citus enables you to process and analyze very large amounts of time series data. The biggest Citus clusters store well over a petabyte of time series data and ingest terabytes per day.

Citus integrates seamlessly with Postgres table partitioning and has built-in functions for partitioning by time, which can speed up queries and writes on time series tables. You can take advantage of Citus’s parallel, distributed query engine for fast analytical queries, and use the built-in columnar storage to compress old partitions.

Example users: MixRank
Software-as-a-service (SaaS) applications: SaaS and other multi-tenant applications need to be able to scale their database as the number of tenants/customers grows. Citus enables you to transparently shard a complex data model by the tenant dimension, so your database can grow along with your business.

By distributing tables along a tenant ID column and co-locating data for the same tenant, Citus can horizontally scale complex (tenant-scoped) queries, transactions, and foreign key graphs. Reference tables and distributed DDL commands make database management a breeze compared to manual sharding. On top of that, you have a built-in distributed query engine for doing cross-tenant analytics inside the database.

Example multi-tenant SaaS users: Salesloft, ConvertFlow
Microservices: Citus supports schema based sharding, which allows distributing regular database schemas across many machines. This sharding methodology fits nicely with typical Microservices architecture, where storage is fully owned by the service hence can’t share the same schema definition with other tenants. Citus allows distributing horizontally scalable state across services, solving one of the main problems of microservices.
Geospatial: Because of the powerful PostGIS extension to Postgres that adds support for geographic objects into Postgres, many people run spatial/GIS applications on top of Postgres. And since spatial location information has become part of our daily life, well, there are more geospatial applications than ever. When your Postgres database needs to scale out to handle an increased workload, Citus is a good fit.

Example geospatial users: Helsinki Regional Transportation Authority (HSL), MobilityDB.

Need Help?

Slack: Ask questions in our Citus community Slack channel.
GitHub issues: Please submit issues via GitHub issues.
Documentation: Our Citus docs have a wealth of resources, including sections on query performance tuning, useful diagnostic queries, and common error messages.
Docs issues: You can also submit documentation issues via GitHub issues for our Citus docs.
Updates & Release Notes: Learn about what's new in each Citus version on the Citus Updates page.

Contributing

Citus is built on and of open source, and we welcome your contributions. The CONTRIBUTING.md file explains how to get started developing the Citus extension itself and our code quality guidelines.

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Stay Connected

Twitter: Follow us @citusdata to track the latest posts & updates on what’s happening.
Citus Blog: Read our popular Citus Open Source Blog for posts about PostgreSQL and Citus.
Citus Newsletter: Subscribe to our monthly technical Citus Newsletter to get a curated collection of our favorite posts, videos, docs, talks, & other Postgres goodies.
Slack: Our Citus Public slack is a good way to stay connected, not just with us but with other Citus users.
Sister Blog: Read the PostgreSQL posts on the Azure Cosmos DB for PostgreSQL blog about our managed service on Azure.
Videos: Check out this YouTube playlist of some of our favorite Citus videos and demos. If you want to deep dive into how Citus extends PostgreSQL, you might want to check out Marco Slot’s talk at Carnegie Mellon titled Citus: Distributed PostgreSQL as an Extension that was part of Andy Pavlo’s Vaccination Database Talks series at CMUDB.
Our other Postgres projects: Our team also works on other awesome PostgreSQL open source extensions & projects, including: pg_cron, HyperLogLog, TopN, pg_auto_failover, activerecord-multi-tenant, and django-multitenant.

citus's People

Contributors

Stargazers

Watchers

Forkers

supr directorscut82 ticketscale eeeebbbbrrrr varver jghoman flinkt gurjeet uikit0 r1015 a320321wb dieface digoal hiproz shengxianye ajmssc daamien dailypipsgxj xluffy-fork kast0rtr0y thisiswei ezhangle codearchival kkkato xubingyue lemonhall delkyd fengshao0907 dama2010 fdr sharper bogdan-pr eydunn inverselina zoujiaqing caidongyun chinpeng eric-seekas tharanga-abeyseela drob bhanug cuulee madhawa ejchet erhuabushuo muxinc tinhol travisjeffery theory magrinya mapbased amosbird credativ dadbob payssion jnoliver2 constructagility is00hcw zhoudaqing joliny dut3062796s sbilly pombredanne fnet123 rucky2013 mingxuanchen liqingrikeiikyeong zmyer iokays ozgune ralamgstromg skyformat99 devin001 michelp stevenyao968 a520ass joerg84 segmond robin900 vlubarsky shangshujie365 number0 blunney1 slowisfast168 samjaninf stephensong teotikalki forkeer smkingsoft ictlyh qcjxberin joe2hpimn edwardbetts 7758285 liuzhenhzong tighterman eagle518 wahello vincentchen tpetmanson

citus's Issues

Task batching improvements in task-tracker executor

The task tracker executor runs into performance bottleneck when assigning and tracking a high number of tasks (1M+). @marcocitus has a change that batches task assignment and task tracking queries together and this change notably improves the task tracker executor's performance.

Subselect push down #3

We split the subselect push down project into three projects. This task refers to complex subselect queries that can be pushed down to worker nodes for human real-time queries. These complex queries are mostly applicable in the context of session and funnel analytics queries.

Distributed EXPLAIN

@marcocitus has a pull request out for distributed EXPLAIN. Marco mentioned that @samay-sharma could be a good person to review these changes. We need to document what distributed EXPLAIN covers and review the pull request.

Simpler single node install (Docker image) and/or OS X install

Most new users start with trying out unofficial Citus Docker images. We also currently don't have an install story except from source for OS X. We need to have simpler single node install (Docker image) and/or OS X install instructions.

Repartitioned subselects over multiple tables

We need to implement repartitioned subselects over multiple tables. Internally, we refer to this project as subselect #2.

Write a user-defined function to propagate DELETE commands to all shards

Write a user-defined function that extends and propagates DELETE commands to all the shards. This function doesn't need to be as safe as functionality that's built into Citus and should help in removing the need for customers to write their own scripts.

Masterless: Requirements and design doc

We need to have separate requirements and design documents. We started working on these as one document, but haven't reached a conclusion yet.

https://docs.google.com/document/d/13CNsb87A5BMwEvAdKcI9eILyWa7UYBE-cBu3iKfuoRI/edit#heading=h.2n8vrtd2ibzj

Distributed materialized views

Citus users currently create aggregate tables in the following way: they pick a distribution column, for example customer_id. They then ingest all event data related to the customer_id. They then create roll-up tables for per-hour and per-day aggregates on customer_id. Since all tables are hash partitioned on customer_id, both raw data and roll-up tables end up being co-located on the same node.

Citus users currently use PL/PgSQL or simple scripts to create these roll-up tables. We need to make this simpler. One way is by offering a UDF that propagates certain DDL commands.

This issue could also be a duplicate of #11.

Improve and test parallel copy script for hash partitioned tables

Improve and test the parallel loading script to replace copy_from_distributed_table in CitusDB 5

Outer Join Improvements: Generate replacement for prunable outer join

We need to generate replacement for prunable outer join (e.g. join with an empty table).

Upgrading from Citus 4.0 to the extension

We currently document between Citus versions here: https://www.citusdata.com/documentation/citusdb-documentation/admin_guide/upgrading_citusdb.html

We should revise these steps and test for the upgrade from 4.0 to the new extension.

Simplify data migration from PostgreSQL to Citus

Simplify data migration from PostgreSQL (local tables) to Citus (distributed tables).

This issue has several components to it and each one would be beneficial in isolation:

Migrate data from an existing PostgreSQL database to the Citus coordinator. AWS has a data migration service for Postgres that could be worth looking into.
- Do we provide this for Citus Cloud (managed), AWS, or on-prem deployments?
- Do we take any downtime when replicating the data?
- Do we follow an approach that uses logical replication (Slony, pg_logical) or physical replication?
Load data in Citus coordinator into distributed tables
- One way to do that is by running an INSERT INTO ... SELECT. #782 and #1117 provides a good workaround for this step.
- Do we take any downtime when replicating the data?
Enable schema migrations for the multi-tenant data model. During migrations, Citus may require changes to the underlying data definition. For example:
- You may need to add a tenant_id column to your tables and then backfill data. This particular item comes up frequently in engineering sessions.
- You may then need to change your primary key or foreign key declarations.
Enable schema migrations from "one schema per tenant" databases to "shared tables." The Apartment gem & corresponding blog post talks about the "one schema per tenant" approach. We could look to easily migrate prospective users to Citus' multi-tenant model.
Enable schema migrations from other relational databases to PostgreSQL. AWS has a schema migration tool that may be worth looking into.
Automate data remodeling for the multi-tenant use case. In this migration task, we'd write software to automate the following: understand the current table schema (likely in the relational database model), pick the table that's at the top of the hierarchy, and convert the relational database model into the hierarchical one while also adding the tenant_id column to the corresponding tables.

Masterless: Propagate metadata changes to all nodes

We need to propagate metadata changes to all nodes. That is, when the user creates a distributed table or creates new shards for that distributed table, we need to propagate these changes to all nodes in the cluster. For this, we could use the 2PC protocol built-into Postgres or pg_paxos.

Real-time PostgreSQL: Blog entry, webinar, and documentation

Once we pick and implement an approach for roll-up tables in #38, we need to gather feedback and create content to communicate this approach.

COPY FROM for hash partitioned tables

We currently don't have a native way to bulk insert data into hash partitioned tables. The built-in copy_to_distributed_table script only supports certain COPY arguments. More importantly, this script uses triggers to ingest data and therefore doesn't have desirable performance characteristics.

What are we building? [bulk ingest into hash-partitioned tables?]

COPY for CitusDB with the goal of:

Expanding (Postgre)SQL support
Bulk ingesting into hash and range-partitioned tables

Who are we building it for?

Users of hash-partitioned and range-partitioned tables, e.g. co-located join and key-value scenarios. Could be used for experimenting with sample data, initial data load, or loading production data. The capacity is limited to a certain number of cores in single master world (which allows tens of thousands of rows/s per core).

What is the user experience?

The current code has a configurable transaction manager, which allows for 3 models:

2PC model:

set-up: max_prepared_transactions to > 0 on all worker nodes
usage: \COPY or COPY as per PostgreSQL docs
recovery: call a UDF (probably)

regular model:

set-up: none
usage: \COPY or COPY as per PostgreSQL docs
recovery: none, data can be partially ingested

choice model:

user can choose between the former 2

superuser is required for COPY .. FROM 'file', but not \COPY .. FROM 'file'.

Performance on a typical cluster: What must/should be our throughput on a single core? On multiple cores?

Should be 100x faster than copy_to_distributed_table on a single core, scalable by the number of cores.

Failure semantics: What must/should be our behavior on (a) bad data and (b) node failures?

The current code has a configurable transaction manager which offers 2 models:

2PC model:
(a) roll back transaction
(b) worker failure before copy - mark placement as inactive
worker failure during copy - roll back transaction
worker failure during commit - roll back/forward transaction upon recovery
master failure - roll back/forward transaction upon recovery

regular model:
(a) roll back transaction
(b) worker failure before - mark placement as inactive
worker failure during copy - roll back transaction
worker failure during commit - leave partially copied data
master failure before or during copy - roll back transaction
master failure during commit - leave partially copied data

choice model:

user can choose between the former 2

Delivery mechanism: How do we deliver this to the customer? Is this a script, a binary, or something that gets checked into product?

As part of the CitusDB extension.

Better data ingest for append partitioned tables

Citus users copy data into their cluster by starting up csql and using the \stage command. This way, data doesn't flow through the master node and the master doesn't become a bottleneck.

This approach however has the usability drawback that users can't ingest data into append partitioned tables using standard PostgreSQL connectors. We need to offer a more native and also scalable alternative to \stage.

Outer Join Improvements: Integration with subqueries

Subquery push downs currently have a separate code path to enable pushing down of outer joins. Also, repartitioned subselects don't yet have support for joins. We need to integrate outer join logic with the current subquery logic.

Simpler multiple node install: Installation script that uses SSH

Users manually install our packages on each node and edit relevant config. We need to have simpler multiple node install steps. We could do this by using an installation scripts that uses SSH.

Outer Join post-5.0 Improvements

Citus 5.0 has partial support for outer join queries. We'd now like to refactor and expand on this feature's scope.

Design unified join order planner (high priority)
Implement unified join order planner (high priority)
Generate replacement for prunable outer join (e.g. join with empty table)
Do not push down filters on outer join output in re-partition joins
Create bucket(s) for non-matching values in single-repartition outer joins
Integration with subqueries

Masterless: Replication group automated fail-over: Design document

The new masterless approach plans to use replication groups. These replication groups would have PostgreSQL databases set up as primary and secondaries.

This task investigates current fail-over solutions for PostgreSQL, understands their use and popularity, and documents them.

Some of the current fail-over solutions have external dependencies, such as etcd or ZooKeeper. If we decide to incorporate these systems, this task also relates to #13 and #14.

Outer Join Improvements: Do not push down filters on outer join output in re-partition joins

We shouldn't push down filters on outer join output in re-partition joins.

Implement master_aggregate_table_shards(source table, destination table, aggregation_query)

Our documentation refers to the user-defined function master_aggregate_table_shards to help users create distributed materialized views. We need to implement this function or remove the reference to it from our documentation.

https://www.citusdata.com/documentation/citusdb-documentation/examples/id_querying_aggregated_data.html

The mechanism through which we implement this function could potentially be similar to #10

Remove master node (masterless)

Citus currently has a master node that holds authoritative metadata. This issue is to remove the master node from the Citus architecture and make sure that all nodes can ingest data and hold metadata.

The subtasks for this project are:

Requirements and design document (#18)
Propagate metadata changes to all nodes (2PC or pg_paxos) (#19)
Replication group automated fail-over (Requirements document: review existing solutions) (#20)
Replication group fail-over (#21)
Replication group initialization logic (#22)
Educating users on PostgreSQL's streaming replication (#23)
Answer compatibility questions with existing Citus deployments (#24)

Masterless: Answer compatibility questions with existing Citus deployments

Citus currently replicates incoming changes to all shard placements. If a shard placement is unavailable, Citus marks the placement as invalid.

This approach is different than having all changes go through a primary in a replication group. We need to answer compatibility questions with Citus' existing replication model.

Outer Join Improvements: Create bucket(s) for non-matching values

We need to create bucket(s) for non-matching values in single repartition outer joins.

Fast OLAP tutorial: Decide on the underlying installation platform (Docker, VM, etc.)

We need to have a tutorial that shows how to ingest time-series data into Citus and to run fast aggregations on this data. This subtask covers how we intend to distribute and have Citus installed for the tutorials (apt-get, Docker, VM, etc.).

This item is a subtask of #3

Create tutorial that shows fast aggregations and sub-second OLAP queries

We need to have a tutorial that shows how to ingest time-series data into Citus and to run fast aggregations on this data. These tutorials should enable customers to set up a Citus cluster themselves and run OLAP queries on real-time data.

A rough breakdown of these tasks include:

Decide on the underlying installation platform (Docker, VM, etc.)
Revisit the examples section in documentation
Write an example server script to help with installation and start-up
Introduce a cache in front of immutable shards (for historical data) to cache query fragment results
Write an example client script to read example data in real-time and insert it in
Install HyperLogLog and topN extensions and aggregate functions for them out of the box

Fast OLAP tutorial: Revisit the Examples section in our documentation

We need to have a tutorial that shows how to ingest time-series data into Citus and to run fast aggregations on this data. This subtask revisits the Examples section in our documentation.

This item is a subtask of #3

Fast OLAP tutorial: Write an example client script to read example data in real-time and insert it in

We need to have a tutorial that shows how to ingest time-series data into Citus and to run fast aggregations on this data. This subtask covers writing an example client script to read example data in real-time and insert it in.

This item is a subtask of #3

PostgreSQL extension that can skip faulty lines in COPY

When users load large data sets (from S3 or files), these datasets might have a few bad records. Most data warehousing solutions can be configured to skip over a predefined number of bad lines.

This has also been discussed for PostgreSQL: https://wiki.postgresql.org/wiki/Error_logging_in_COPY

This task proposes to extend COPY to skip over a configurable number of records.

Session analytics package on single node

Funnel and cohort queries in SQL (PostgreSQL) are hard to execute in human real-time. The session analytics package improves performance for funnel queries by 10-100x; it owes this to an array based execution model.

http://www.redbook.io/ch8-interactive.html ("array-based" executor)

We need to consider writing a tutorial for the session analytics package on a single node. For multiple nodes, this issue also relates to #41.

HLL, topN, histogram packaging and communication

We offer several extensions to our customers to enable human real-time queries. We could consider packaging these extensions together and communicating their benefits. The extensions I can think of are HyperLogLog (HLL), topN, histogram, and approximate percentile.

Masterless: Replication group automated fail-over

The new masterless approach plans to use replication groups. These replication groups would have PostgreSQL databases set up as primary and secondaries.

We will first investigate existing solutions in #20 and come up with a design document. This task then implements and tests the picked solution.

Some of the current fail-over solutions have external dependencies, such as etcd or ZooKeeper. If we decide to incorporate these systems, this task also relates to #13 and #14.

Masterless: Educating users on streaming replication

The new masterless approach plans to use replication groups. These replication groups would have PostgreSQL databases set up as primary and secondaries.

Most users get confused by PostgreSQL's documentation on streaming replication. This chapter communicates various alternatives and feels like a choose your own adventure guide. We need to find a way to better articulate how streaming replication works -- @anarazel had an internal presentation that was pretty insightful.

PostgreSQL "better" materialized views

PostgreSQL's materialized views don't get updated on-demand. Users need to refresh a materialized view, and the refresh command discards old contents and completely replaces the contents of a materialized view.

Commercial databases incrementally update a materialized view's contents -- this is particularly helpful for aggregate queries. This task involves improving PostgreSQL's materialized views for incremental updates.

Master node setup: Instructions for streaming replica + load balancer setup; etc.

When users have questions around master node failover, we currently point them to relevant sections in the PostgreSQL manual. We need to have more streamlined steps / scripts around setting up streaming replication and load balancer.

This task may need a requirements document.

Publish TPC-DS benchmarks on PostgreSQL

TPC-DS benchmarks are new and target mixed workloads. The TPC-DS website shows that there are currently no published benchmark results.

We considered publishing benchmark results for PostgreSQL. However, the benchmark looked too long and complex for us to prioritize this ahead of other activities.

Fast OLAP tutorial: Install HyperLogLog and topN extensions and aggregate functions for them out of the box

We need to have a tutorial that shows how to ingest time-series data into Citus and to run fast aggregations on this data. This subtask covers installing HyperLogLog and topN extensions and aggregate functions for them out of the box.

This item is a subtask of #3

Masterless: Replication group initialization logic

The new masterless approach plans to use replication groups. These replication groups would have PostgreSQL databases set up as primary and secondaries.

We need to write scripts / functions to easily set up and configure these replication groups.

When writing this logic, if we introduce new dependencies such as a new scripting language, we should also think about incorporating them into #13 and #14.

Outer Join Improvements: Implement unified join order planner

We currently have join order planning logic spread across the join order and logical planners. This item unifies these two code paths.

Propagate DDL commands to workers v2

Citus 5.0 propagates Alter Table and Create Index commands to worker nodes. We implemented this feature using Citus' current replication model. We also decided to switch to using 2PC (or pg_paxos) once the metadata propagation changes were implemented.

This issue tracks v2 of the DDL propagation changes and depends on #19.

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.

Jobs

Jooble