
hge-stress-test

stress test for the hasura graphql engine

this script allows you to configure and run workloads against a running graphql-engine instance, and produces an animated visualisation of memory usage. both bursty and sustained workloads are possible, and workloads can be configured to get heavier or lighter over time. in addition, it supports running workloads that do (for example) the same amount of "work" but with different amounts of concurrency.

workload structure

test workloads are of the following form, parameterised by the given options, which are all specified in the config file:

  • we run loop_count loops
    • each loop consists of bursts_per_loop bursts
      • each burst consists of requests_per_burst requests
        • the request is a graphql file located at the path given by payload
        • after each request, we wait for request_delay seconds
      • once all requests are sent, the burst is complete
      • if wait_for_bursts_to_complete is set, we wait here for graphql-engine to respond to all requests made as part of the burst
      • then we wait for burst_delay seconds
  • then we wait for all bursts to complete, so all requests made as part of the loop have been responded to by graphql-engine
  • then we wait for loop_delay before starting a new loop

requests_per_burst and bursts_per_loop can be made to ramp linearly up (or down) over time, to simulate increasingly heavy (or light) workloads, via the _min and _incr options in the config file. requests_per_burst is adjusted by its _incr after each burst and reset at the start of a new loop; bursts_per_loop is adjusted after each loop.

setting the _incr option to zero allows for workloads of constant intensity.
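
the overall driver is roughly the following sketch; the config names match the options above, but send_request and wait_for_responses are hypothetical helpers, not the script's actual API:

import time

def run_workload(cfg):
    bursts_per_loop = cfg.bursts_per_loop_min
    for _ in range(cfg.loop_count):
        requests_per_burst = cfg.requests_per_burst_min        # reset each loop
        for _ in range(bursts_per_loop):
            for _ in range(requests_per_burst):
                send_request(cfg.payload)                      # hypothetical: POST the graphql file
                time.sleep(cfg.request_delay)
            if cfg.wait_for_bursts_to_complete:
                wait_for_responses()                           # hypothetical: block until the burst is serviced
            time.sleep(cfg.burst_delay)
            requests_per_burst += cfg.requests_per_burst_incr  # ramp within the loop
        wait_for_responses()    # all of the loop's requests answered before continuing
        time.sleep(cfg.loop_delay)
        bursts_per_loop += cfg.bursts_per_loop_incr            # ramp across loops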

visualisation format

(example run: animated graph of graphql-engine memory usage over time, with bursts marked)

  • the marked regions on the graph correspond to periods in which a burst is being sent
  • each short line marker is a response received from graphql-engine
  • each green line corresponds to a burst which has been fully serviced by graphql-engine
  • the heading is a concise description of the workload parameters:
    rpb_min(+rpb_incr) reqs + req_delay
    > bpl_min(+bpl_incr) bursts + burst_delay
    > loop_count loops + loop_delay
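
for example, with the made-up values requests_per_burst_min = 16, requests_per_burst_incr = 4, request_delay = 0.05, bursts_per_loop_min = 8, bursts_per_loop_incr = 2, burst_delay = 1.0, loop_count = 3 and loop_delay = 10.0, the heading would read:

    16(+4) reqs + 0.05
    > 8(+2) bursts + 1.0
    > 3 loops + 10.0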

workload quality

bursty workloads can be simulated with fewer, larger bursts and a large burst delay. for sustained load tests, burst delay can be set to zero.

to implement concurrency scaling, one can write, e.g.:

bursts_per_loop_min = [2, 4, 8, 16]
requests_per_burst_min = [16, 8, 4, 2]

this will run the same number of requests in each loop, but with decreasing concurrency.

to avoid overloading the server with too many pending requests, set wait_for_bursts_to_complete to true.
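
putting these together, a config for a moderately bursty workload might look like the following sketch. the key names follow the options described above, but the payload path and all numeric values are made up:

# hypothetical example config
payload = "queries/simple.graphql"   # graphql file sent in each request

loop_count = 3
loop_delay = 10.0                    # delay between loops

bursts_per_loop_min = 8
bursts_per_loop_incr = 2             # two extra bursts per loop
burst_delay = 1.0                    # seconds between bursts

requests_per_burst_min = 16
requests_per_burst_incr = 0          # constant burst size
request_delay = 0.05                 # seconds between requests

wait_for_bursts_to_complete = true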

setup instructions

stress.py takes a configuration file that defines the parameters for the test, and a graphql-engine pid to monitor for memory usage. by default, it expects the graphql-engine instance to be listening on port 9080.

# install python deps
$ pip install -r requirements.txt

# set up a postgres db on port 7432
$ docker run --rm --name hge2 -e "POSTGRES_PASSWORD=password" -p 7432:5432 -d postgres -c "log_statement=all"

# import the data dump
$ PGPASSWORD=password psql -h 172.17.0.1 -U postgres -d postgres -p 7432 --single-transaction -f dump.sql

# start hge, e.g. for 1.3.2 (client repro version):
$ docker run --rm -p 9080:8080 hasura/graphql-engine:v1.3.2 graphql-engine \
  --database-url='postgresql://postgres:[email protected]:7432/postgres' serve \
  --enable-telemetry false --enable-console --query-plan-cache-size 0 --use-prepared-statements 'false'

# or for main:
$ cabal new-run -- exe:graphql-engine \
  --database-url='postgres://postgres:password@localhost:7432/postgres' \
  serve --server-port 9080 --enable-console --console-assets-dir=../console/static/dist \
  --enable-telemetry false --query-plan-cache-size 0 --use-prepared-statements 'false' \
  --enabled-apis "developer,graphql,metadata"

# now import the metadata in the console. alternatively (at least when running from main) you can do:
$ curl -XPOST -d '{"type":"replace_metadata","args":'$(cat metadata.json)'}' "http://localhost:9080/v1/metadata"

# run the tests
$ ./stress.py config/bursty.toml $(pidof graphql-engine)

single-loop configurations are also provided, for testing varying concurrency manually; each uses a total of 64 requests:

# 2 bursts, 32 requests each
$ ./stress.py config/single/2-32.toml $(pidof graphql-engine)

# 4 bursts, 16 requests each
$ ./stress.py config/single/4-16.toml $(pidof graphql-engine)

# 8 bursts, 8 requests each
$ ./stress.py config/single/8-8.toml $(pidof graphql-engine)


known issues

"burst latency" and general tail latencies

Just leaving this here for visibility.

When benchmarking constant loads, we see Zipf-like tail latencies, where the maximum seems to increase with the number of samples collected.

https://hasura.io/blog/decreasing-latency-noise-and-maximizing-performance-during-end-to-end-benchmarking/

  • is the burst throughput/latency metric just a reflection of this same tail latency phenomenon? Put another way: is the concept of "per-burst latency" another model we can use to motivate lowering tail latencies? (related to the more common example of the way that tail latencies affect UX on a web page that makes many requests to render a single view)
  • Is there reason to expect that latencies should be distributed as they are? I have no non-hand-wavy explanation for the far outliers. Maybe the tests here provide some insight (e.g. does it suggest poor scheduling in the RTS in some way?)

constant burst delay

currently, burst delay is implemented as "sleep for burst_delay after starting the burst process", which is misleading because the burst itself takes time to run: the idle gap between bursts shrinks as bursts get longer.

this should be changed, but behind an option that lets users recover the old behaviour, in case it is useful for tests that require workloads with decreasing burst delays.
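
one possible shape for the fix, as a sketch: start_burst and its join are hypothetical helpers, and legacy_burst_delay models the opt-in old behaviour.

import time

def run_bursts(bursts_per_loop, burst_delay, legacy_burst_delay=False):
    for _ in range(bursts_per_loop):
        burst = start_burst()   # hypothetical helper: fires off one burst asynchronously

        if legacy_burst_delay:
            # old behaviour: sleep burst_delay measured from the *start* of the burst,
            # so the idle gap between bursts shrinks as bursts take longer
            time.sleep(burst_delay)
        else:
            # proposed behaviour: wait for the burst to finish, then sleep a full
            # burst_delay, so the gap between bursts is genuinely constant
            burst.join()        # hypothetical: block until the burst completes
            time.sleep(burst_delay)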
