PipelineDB has joined Confluent; read the blog post here.

PipelineDB will not have new releases beyond 1.0.0, although critical bugs will still be fixed.

PipelineDB

Overview

PipelineDB is a PostgreSQL extension for high-performance time-series aggregation, designed to power realtime reporting and analytics applications.

PipelineDB allows you to define continuous SQL queries that perpetually aggregate time-series data and store only the aggregate output in regular, queryable tables. You can think of this concept as extremely high-throughput, incrementally updated materialized views that never need to be manually refreshed.

Raw time-series data is never written to disk, making PipelineDB extremely efficient for aggregation workloads.
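
To make the idea concrete, here is a minimal Python sketch of incremental aggregation. It is a conceptual illustration only, not PipelineDB's implementation: the CountByKey class and its consume method are hypothetical names. It mimics a COUNT(*) ... GROUP BY key query by folding each event into a running total and then discarding the event.

```python
from collections import Counter
from typing import Iterable, Tuple


class CountByKey:
    """Toy stand-in for a continuous view: SELECT key, COUNT(*) ... GROUP BY key."""

    def __init__(self) -> None:
        # Only the aggregate is kept; raw events are never stored.
        self.counts: Counter = Counter()

    def consume(self, events: Iterable[Tuple[int, int]]) -> None:
        # Fold each (key, value) event into the running aggregate, then drop it.
        for key, _value in events:
            self.counts[key] += 1


view = CountByKey()
view.consume([(0, 42)])
# A generator is consumed one event at a time, so no list of raw events exists.
view.consume((i % 3, i) for i in range(9))
print(dict(view.counts))  # {0: 4, 1: 3, 2: 3}
```

The memory used depends on the number of distinct keys, not on the number of events, which is the property that makes this style of aggregation cheap.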

Continuous queries produce their own output streams, and thus can be chained together into arbitrary networks of continuous SQL.
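
The chaining idea can be sketched in a few lines of Python. This is a hypothetical in-memory model, not PipelineDB's API: CountingView, ThresholdView, and the (key, old_count, new_count) change tuple are names invented for illustration. The upstream view publishes every change it makes, and a downstream view treats those changes as its own input stream.

```python
from collections import Counter
from typing import Callable, List, Set, Tuple

Change = Tuple[int, int, int]  # (key, old_count, new_count)


class CountingView:
    """Hypothetical upstream view that publishes every change it makes."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.subscribers: List[Callable[[Change], None]] = []

    def insert(self, key: int) -> None:
        old = self.counts[key]
        self.counts[key] = old + 1
        # The "output stream": each update is forwarded to downstream readers.
        for notify in self.subscribers:
            notify((key, old, old + 1))


class ThresholdView:
    """Hypothetical downstream view: which keys have reached `threshold` events?"""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.hot_keys: Set[int] = set()

    def on_change(self, change: Change) -> None:
        key, _old, new = change
        if new >= self.threshold:
            self.hot_keys.add(key)


upstream = CountingView()
downstream = ThresholdView(threshold=3)
upstream.subscribers.append(downstream.on_change)

for key in [1, 2, 1, 1, 2, 3]:
    upstream.insert(key)

print(sorted(downstream.hot_keys))  # [1]
```

Because the downstream view only sees changes, it never needs access to the original events, and further views could subscribe to it in the same way.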

PostgreSQL compatibility

PipelineDB runs on 64-bit architectures and currently supports the following PostgreSQL versions:

  • PostgreSQL 10: 10.1, 10.2, 10.3, 10.4, 10.5
  • PostgreSQL 11: 11.0
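
If you want to check a server against this list before building, a small helper along the following lines may be useful. It is a sketch under assumptions: is_supported is a hypothetical function, and it expects text in the form printed by pg_config --version, such as "PostgreSQL 10.5".

```python
import re

# Versions taken from the list above; this sketch treats anything else as unsupported.
SUPPORTED = {"10.1", "10.2", "10.3", "10.4", "10.5", "11.0"}


def is_supported(version_output: str) -> bool:
    """Return True if text such as 'PostgreSQL 10.5' names a listed version."""
    # Take the first major.minor pair found in the text.
    match = re.search(r"(\d+\.\d+)", version_output)
    return bool(match) and match.group(1) in SUPPORTED


print(is_supported("PostgreSQL 10.5"))   # True
print(is_supported("PostgreSQL 9.6.3"))  # False
```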

Getting started

If you just want to start using PipelineDB right away, head over to the installation docs to get going.

If you'd like to build PipelineDB from source, keep reading!

Building from source

Since PipelineDB is a PostgreSQL extension, you'll need to have the PostgreSQL development packages installed to build PipelineDB.

Next you'll have to install ZeroMQ, which PipelineDB uses for inter-process communication. Here's a gist with instructions to build and install ZeroMQ from source. If you'd like to run PipelineDB's Python test suite, you'll also need to install its Python dependencies:

pip install -r src/test/py/requirements.txt

Build PipelineDB

Once PostgreSQL is installed, you can build PipelineDB against it:

make USE_PGXS=1
make install

Test PipelineDB (optional)

Run the following command:

make test

Bootstrap the PipelineDB environment

Create PipelineDB's physical data directories, configuration files, and so on:

make bootstrap

make bootstrap only needs to be run the first time you install PipelineDB. The resources that make bootstrap creates can continue to be used as you change and rebuild PipelineDB.

Run PipelineDB

Run all of the daemons necessary for PipelineDB to operate:

make run

Enter Ctrl+C to shut down PipelineDB.

make run uses the binaries in the PipelineDB source root compiled by make, so after code changes you only need to run make, not make install, before running make run again.

The basic development flow is:

make
make run
^C

# Make some code changes...
make
make run

Send PipelineDB some data

Now let's generate some test data and stream it into a simple continuous view. First, create the stream and the continuous view that reads from it:

$ psql
=# CREATE FOREIGN TABLE test_stream (key integer, value integer) SERVER pipelinedb;
CREATE FOREIGN TABLE
=# CREATE VIEW test_view WITH (action=materialize) AS SELECT key, COUNT(*) FROM test_stream GROUP BY key;
CREATE VIEW

Events can be emitted to PipelineDB streams using regular SQL INSERTs. Because a stream is declared as a foreign table, as test_stream was above, an INSERT targets it exactly as it would a regular table. Let's emit a single event into test_stream, since our continuous view is reading from it:

$ psql
=# INSERT INTO test_stream (key, value) VALUES (0, 42);
INSERT 0 1

The 1 in the INSERT 0 1 response means that 1 event was emitted into a stream that is actually being read by a continuous query. Now let's insert some random data:

=# INSERT INTO test_stream (key, value) SELECT random() * 10, random() * 10 FROM generate_series(1, 100000);
INSERT 0 100000

Query the continuous view to verify that it was properly updated. Were all 100,001 events counted?

$ psql -c "SELECT sum(count) FROM test_view"
  sum
--------
 100001
(1 row)

What were the 10 most common randomly generated keys?

$ psql -c "SELECT * FROM test_view ORDER BY count DESC LIMIT 10"
 key | count
-----+-------
   2 | 10124
   8 | 10100
   1 | 10042
   7 |  9996
   4 |  9991
   5 |  9977
   3 |  9963
   6 |  9927
   9 |  9915
  10 |  4997
(10 rows)
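
Key 10 is counted about half as often as keys 1 through 9, and the same holds for key 0. A likely explanation, stated here as an assumption rather than something the output proves, is that random() * 10 produces a fractional value in [0, 10) that is rounded to the nearest integer when stored in the integer key column. Keys 1 through 9 then each cover an interval of width 1, while 0 and 10 each cover only 0.5. The Python sketch below reproduces that shape with round(); the seed and variable names are arbitrary.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000

# round() stands in for a round-to-nearest conversion of random() * 10 to an integer key.
counts = Counter(round(rng.random() * 10) for _ in range(n))

# Print one row per key, in key order.
for key in sorted(counts):
    print(f"{key:>3} | {counts[key]:>6}")
```

The two edge keys should each land near 5% of the events and the nine inner keys near 10% each, matching the pattern in the query output.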

