Postgres partitioning as easy as pie. Works great for both new and existing tables, with zero downtime and minimal app changes. Archive older data on a rolling basis to keep your database size under control.
🍊 Battle-tested at Instacart
pgslice is a command line tool. To install, run:
gem install pgslice
This will give you the pgslice
command.
-
Ensure the table you want to partition has been created. We’ll refer to this as
<table>
. -
Specify your database credentials
export PGSLICE_URL=postgres://localhost/myapp_development
- Create an intermediate table
pgslice prep <table> <column> <period>
Period can be day
or month
.
This creates a table named <table>_intermediate
with the appropriate trigger for partitioning.
- Add partitions
pgslice add_partitions <table> --intermediate --past 3 --future 3
This creates child tables that inherit from the intermediate table.
Use the --past
and --future
options to control the number of partitions.
- Optional, for tables with data - Fill the partitions in batches with data from the original table
pgslice fill <table>
Use the --batch-size
and --sleep
options to control the speed.
To sync data across different databases, check out pgsync.
- Analyze tables
pgslice analyze <table>
- Swap the intermediate table with the original table
pgslice swap <table>
The original table is renamed <table>_retired
and the intermediate table is renamed <table>
.
- Fill the rest (rows inserted between the first fill and the swap)
pgslice fill <table> --swapped
- Back up the retired table with a tool like pg_dump and drop it
pg_dump -c -Fc -t <table>_retired $PGSLICE_URL > <table>_retired.dump
psql -c "DROP <table>_retired" $PGSLICE_URL
pgslice prints the SQL commands that were executed on the server. To print without executing, use the --dry-run
option.
pgslice prep visits created_at month
-- Postgres 10
BEGIN;
CREATE TABLE visits_intermediate (LIKE visits INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING STORAGE INCLUDING COMMENTS) PARTITION BY RANGE (created_at);
COMMENT ON TABLE visits_intermediate is 'column:created_at,period:day';
COMMIT;
-- Postgres 9.6 and below
BEGIN;
CREATE TABLE visits_intermediate (LIKE visits INCLUDING ALL);
CREATE FUNCTION visits_insert_trigger()
RETURNS trigger AS $$
BEGIN
RAISE EXCEPTION 'Create partitions first.';
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER visits_insert_trigger
BEFORE INSERT ON visits_intermediate
FOR EACH ROW EXECUTE PROCEDURE visits_insert_trigger();
COMMENT ON TRIGGER visits_insert_trigger ON visits_intermediate is 'column:created_at,period:month';
COMMIT;
pgslice add_partitions visits --intermediate --past 1 --future 1
-- Postgres 10
BEGIN;
CREATE TABLE visits_201608 PARTITION OF visits_intermediate FOR VALUES FROM ('2016-08-01') TO ('2016-09-01');
ALTER TABLE visits_201608 ADD PRIMARY KEY (id);
CREATE INDEX ON visits_201608 USING btree (user_id);
CREATE TABLE visits_201609 PARTITION OF visits_intermediate FOR VALUES FROM ('2016-09-01') TO ('2016-10-01');
ALTER TABLE visits_201609 ADD PRIMARY KEY (id);
CREATE INDEX ON visits_201609 USING btree (user_id);
CREATE TABLE visits_201610 PARTITION OF visits_intermediate FOR VALUES FROM ('2016-10-01') TO ('2016-11-01');
ALTER TABLE visits_201610 ADD PRIMARY KEY (id);
CREATE INDEX ON visits_201610 USING btree (user_id);
COMMIT;
-- Postgres 9.6 and below
BEGIN;
CREATE TABLE visits_201608
(CHECK (created_at >= '2016-08-01'::date AND created_at < '2016-09-01'::date))
INHERITS (visits_intermediate);
ALTER TABLE visits_201608 ADD PRIMARY KEY (id);
CREATE INDEX ON visits_201608 USING btree (user_id);
CREATE TABLE visits_201609
(CHECK (created_at >= '2016-09-01'::date AND created_at < '2016-10-01'::date))
INHERITS (visits_intermediate);
ALTER TABLE visits_201609 ADD PRIMARY KEY (id);
CREATE INDEX ON visits_201609 USING btree (user_id);
CREATE TABLE visits_201610
(CHECK (created_at >= '2016-10-01'::date AND created_at < '2016-11-01'::date))
INHERITS (visits_intermediate);
ALTER TABLE visits_201610 ADD PRIMARY KEY (id);
CREATE INDEX ON visits_201610 USING btree (user_id);
CREATE OR REPLACE FUNCTION visits_insert_trigger()
RETURNS trigger AS $$
BEGIN
IF (NEW.created_at >= '2016-09-01'::date AND NEW.created_at < '2016-10-01'::date) THEN
INSERT INTO visits_201609 VALUES (NEW.*);
ELSIF (NEW.created_at >= '2016-10-01'::date AND NEW.created_at < '2016-11-01'::date) THEN
INSERT INTO visits_201610 VALUES (NEW.*);
ELSIF (NEW.created_at >= '2016-08-01'::date AND NEW.created_at < '2016-09-01'::date) THEN
INSERT INTO visits_201608 VALUES (NEW.*);
ELSE
RAISE EXCEPTION 'Date out of range. Ensure partitions are created.';
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
COMMIT;
pgslice fill visits
/* 1 of 3 */
INSERT INTO visits_intermediate ("id", "user_id", "ip", "created_at")
SELECT "id", "user_id", "ip", "created_at" FROM visits
WHERE id > 0 AND id <= 10000 AND created_at >= '2016-08-01'::date AND created_at < '2016-11-01'::date
/* 2 of 3 */
INSERT INTO visits_intermediate ("id", "user_id", "ip", "created_at")
SELECT "id", "user_id", "ip", "created_at" FROM visits
WHERE id > 10000 AND id <= 20000 AND created_at >= '2016-08-01'::date AND created_at < '2016-11-01'::date
/* 3 of 3 */
INSERT INTO visits_intermediate ("id", "user_id", "ip", "created_at")
SELECT "id", "user_id", "ip", "created_at" FROM visits
WHERE id > 20000 AND id <= 30000 AND created_at >= '2016-08-01'::date AND created_at < '2016-11-01'::date
pgslice analyze visits
ANALYZE VERBOSE visits_201608;
ANALYZE VERBOSE visits_201609;
ANALYZE VERBOSE visits_201610;
ANALYZE VERBOSE visits_intermediate;
pgslice swap visits
BEGIN;
SET LOCAL lock_timeout = '5s';
ALTER TABLE visits RENAME TO visits_retired;
ALTER TABLE visits_intermediate RENAME TO visits;
ALTER SEQUENCE visits_id_seq OWNED BY visits.id;
COMMIT;
To add partitions, use:
pgslice add_partitions <table> --future 3
Add this as a cron job to create a new partition each day or month.
# day
0 0 * * * pgslice add_partitions <table> --future 3 --url ...
# month
0 0 1 * * pgslice add_partitions <table> --future 3 --url ...
Add a monitor to ensure partitions are being created.
SELECT 1 FROM
pg_catalog.pg_class c
INNER JOIN
pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE
c.relkind = 'r' AND
n.nspname = 'public' AND
c.relname = '<table>_' || to_char(NOW() + INTERVAL '3 days', 'YYYYMMDD')
-- for months, use to_char(NOW() + INTERVAL '3 months', 'YYYYMM')
Back up and drop older partitions each day or month.
pg_dump -c -Fc -t <table>_201609 $PGSLICE_URL > <table>_201609.dump
psql -c "DROP <table>_201609" $PGSLICE_URL
If you use Amazon S3 for backups, s3cmd is a nice tool.
s3cmd put <table>_201609.dump s3://<s3-bucket>/<table>_201609.dump
To undo prep (which will delete partitions), use:
pgslice unprep <table>
To undo swap, use:
pgslice unswap <table>
This set up allows you to read and write with the original table name with no knowledge it’s partitioned. However, there are a few things to be aware of.
Before Postgres 10, if you use INSERT
statements with a RETURNING
clause (as frameworks like Rails do), you’ll no longer receive the id of the newly inserted record(s) back. If you need this, you can either:
- Insert directly into the partition
- Get value before the insert with
SELECT nextval('sequence_name')
(for multiple rows, appendFROM generate_series(1, n)
)
When possible, queries should include the column you partition on to limit the number of partitions the database needs to check. For instance, if you partition on created_at
, try to include it in queries:
SELECT * FROM
visits
WHERE
user_id = 123 AND
-- for performance
created_at >= '2016-09-01' AND created_at < '2016-09-02'
For this to be effective, ensure constraint_exclusion
is set to partition
(default value) or on
.
SHOW constraint_exclusion;
You can also use pgslice to reduce the size of a table without partitioning by creating a new table, filling it with a subset of records, and swapping it in.
pgslice prep <table> --no-partition
pgslice fill <table> --where "id > 1000" # use any conditions
pgslice swap <table>
Once a table is partitioned, here’s how to change the schema:
- To add, remove, or modify a column, make the update on the master table only
- To add or remove an index, make the update on the master table and all partitions
Postgres 10 introduces declarative partitioning. A major benefit is INSERT
statements with a RETURNING
clause work as expected. If you prefer to use trigger-based partitioning instead (not recommended), pass the --trigger-based
option to the prep
command.
Run:
gem install pgslice
To use master, run:
gem install specific_install
gem specific_install https://github.com/ankane/pgslice.git
docker build -t pgslice .
alias pgslice="docker run --rm -e PGSLICE_URL pgslice"
This will give you the pgslice
command.
- Command to sync index changes with partitions
- Disable indexing for faster
fill
- ETA for
fill
Also check out:
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To run tests, do:
git clone https://github.com/ankane/pgslice.git
cd pgslice
bundle install
createdb pgslice_test
bundle exec rake