marcusvieira88 / scaling-postgres


Scaling Postgres

This project aims to horizontally scale a Postgres database used by a Spring REST API, using the Citus extension.

Postgres can be scaled vertically and horizontally.

Vertical scaling means increasing the power of a single machine (CPU, RAM, network, etc.). On AWS this is easy to achieve: you only change the instance type of the RDS database.

Horizontal scaling is more complex because it involves more than one machine. When you scale horizontally you need to spread the operations (queries, inserts, updates, etc.) across the machines. Splitting the data across machines in this way is called sharding.
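
To make the idea of sharding concrete, here is a minimal Java sketch that maps a row key to one of several shards. It is only an illustration under simplifying assumptions: the class and method names are hypothetical, and the hash-modulo rule is a stand-in rather than the way Citus assigns rows to shards internally.

```java
// Hypothetical illustration of key-based sharding; not Citus's internal algorithm.
final class ShardRouterSketch {

    // Maps a key to a shard index in the range [0, shardCount).
    // Math.floorMod keeps the result non-negative even for negative hash codes.
    static int shardFor(long key, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        return Math.floorMod(Long.hashCode(key), shardCount);
    }

    public static void main(String[] args) {
        int shardCount = 5; // assumed number of shards for the demo
        for (long id = 1; id <= 10; id++) {
            System.out.println("product " + id + " -> shard " + shardFor(id, shardCount));
        }
    }
}
```

The same key always lands on the same shard, which is what lets a router send a query for one id to a single machine instead of all of them.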

On AWS, RDS only lets you scale reads horizontally, by adding read replicas (RDS Postgres had a limit of 5 replica instances when this was written). To scale writes you need a service such as Azure Hyperscale (Citus). It lets you add many workers and distribute a table's shards across them, and its coordinator node routes each write to the worker that holds the relevant shard.

These tools and services can also help to scale Postgres: Pgpool-II, Postgres-XL, PgBouncer and Greenplum.

Project REST API

Create product - POST http://localhost:8080/products/:

{
	"description":"Product 1",
	"units":10
}

Get product - GET http://localhost:8080/products/1

Update product - PUT http://localhost:8080/products/:

{
	"description":"Product 1 Update",
	"units":10
}

Delete product - DELETE http://localhost:8080/products/1

Get products - GET http://localhost:8080/products:

[
    {
        "id": 1,
        "description": "Product 1 Update",
        "units": 10
    }
]
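
The endpoints above can also be called from Java. The following is a rough sketch using the JDK's built-in java.net.http client (Java 11+). The class name, helper methods, and hand-written JSON escaping are assumptions made for illustration and are not part of the project; the URLs and payload fields are the ones listed above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client sketch; assumes the Spring application is running on localhost:8080.
final class ProductApiClientSketch {

    static final String BASE_URL = "http://localhost:8080/products";

    // Builds the JSON body by hand; only backslashes and quotes are escaped (a simplification).
    static String productJson(String description, int units) {
        String escaped = description.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"description\":\"" + escaped + "\",\"units\":" + units + "}";
    }

    // POST http://localhost:8080/products/ with a JSON body.
    static HttpRequest createProductRequest(String description, int units) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(productJson(description, units)))
                .build();
    }

    // GET http://localhost:8080/products/{id}
    static HttpRequest getProductRequest(long id) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/" + id)).GET().build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        HttpResponse<String> created = client.send(
                createProductRequest("Product 1", 10), HttpResponse.BodyHandlers.ofString());
        System.out.println("create: " + created.statusCode() + " " + created.body());

        HttpResponse<String> fetched = client.send(
                getProductRequest(1), HttpResponse.BodyHandlers.ofString());
        System.out.println("get: " + fetched.statusCode() + " " + fetched.body());
    }
}
```

Running main requires the application and the Citus cluster to be up; the request-building helpers work on their own.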

Run Scaling Project

Start Postgres Citus cluster:

docker-compose -p citus up

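Scale Citus workers (the scale subcommand is deprecated in later docker-compose releases, where docker-compose -p citus up --scale worker=5 -d is the equivalent):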

docker-compose -p citus scale worker=5

Connect to the Citus coordinator:

docker exec -it citus_master psql -U postgres

List all databases:

\l

List all tables:

\dt

Connect to the database:

\c postgres 

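Set the replication factor (the number of workers that will keep a copy of each shard's data). SET only affects the current session, so keep this psql session open for the create_distributed_table call below: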

SET citus.shard_replication_factor=2;

In another terminal, start the Spring application so that it creates the products table (spring.jpa.hibernate.ddl-auto=create) in the Citus cluster:

mvn spring-boot:run

Distribute the products table across the workers:

SELECT create_distributed_table('public.products', 'id');

Now we need some data to run our queries against. You can add it through the Project REST API.

After inserting some products you can check:

-- get distribution column name for products table
SELECT column_to_column_name(logicalrelid, partkey) AS dist_col_name
 FROM pg_dist_partition
 WHERE logicalrelid='products'::regclass;

-- get products table replication factor
SELECT logicalrelid AS tablename,
       count(*)/count(DISTINCT ps.shardid) AS replication_factor
  FROM pg_dist_shard_placement ps
  JOIN pg_dist_shard p ON ps.shardid=p.shardid
  GROUP BY logicalrelid;

-- get size of product table shards
SELECT *
  FROM run_command_on_shards('public.products', $cmd$
    SELECT json_build_object(
      'shard_name', '%1$s',
      'size',       pg_size_pretty(pg_table_size('%1$s'))
    );
  $cmd$);

-- find where a specific item is stored; there are 5 workers,
-- but the item is on only 2 of them because the replication factor is 2
SELECT shardid, shardstate, shardlength, nodename, nodeport, placementid
  FROM pg_dist_placement AS placement,
       pg_dist_node AS node
 WHERE placement.groupid = node.groupid
   AND node.noderole = 'primary'
   AND shardid = (
     SELECT get_shard_id_for_distribution_column('products', 1)
   );
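
The replication factor query above divides the number of shard placements by the number of distinct shards. As a small worked example of that arithmetic, the Java sketch below does the same calculation on an in-memory list. The class name and the shard ids are made up for illustration and do not come from a real cluster.

```java
import java.util.Arrays;

// Hypothetical sketch of count(*) / count(DISTINCT shardid) from the query above.
final class ReplicationFactorSketch {

    // placementShardIds holds one entry per placement row, i.e. a shard id
    // appears once for every worker that stores a copy of that shard.
    static long replicationFactor(long[] placementShardIds) {
        long distinctShards = Arrays.stream(placementShardIds).distinct().count();
        if (distinctShards == 0) {
            throw new IllegalArgumentException("no placements given");
        }
        // Integer division, matching the SQL expression on bigint counts.
        return placementShardIds.length / distinctShards;
    }

    public static void main(String[] args) {
        // Made-up shard ids: three shards, each placed on two workers.
        long[] placements = {102008L, 102008L, 102009L, 102009L, 102010L, 102010L};
        System.out.println("replication factor = " + replicationFactor(placements));
    }
}
```

With six placements over three distinct shards the result is 2, which matches the value set through citus.shard_replication_factor earlier.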

Shut down the Postgres Citus cluster:

docker-compose -p citus down

Analyze Query Cost

To analyze the cost of a distributed query, run it with the explain analyze command:

explain analyze select * from products;

