familysearch / pewpew

The HTTP load testing tool

Home Page: https://familysearch.github.io/pewpew

License: Apache License 2.0

Dockerfile 0.04% HTML 0.01% TypeScript 87.53% JavaScript 0.42% Shell 0.16% CSS 0.11% Rust 11.74%

pewpew's Introduction

Pewpew

Pewpew is an HTTP load testing tool designed for ease of use and high performance. See the guide for details on its use. Also see the examples, which run against the test-server.
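
For a quick feel of the config format, a minimal test file looks roughly like this (a sketch assembled from the schemas in the issue reports below; the URL, rate, and durations are placeholders):

# minimal.yaml
load_pattern:
  - linear:       # ramp from 1% to 100% of peak_load over one minute
      from: 1%
      to: 100%
      over: 1m
endpoints:
  - method: GET
    url: http://localhost:8080/healthcheck
    peak_load: 30hpm   # 30 hits per minute at 100%

# run with:
#   pewpew run minimal.yaml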

There is also a system which can be deployed to the cloud to dynamically spin up test clients in AWS. It consists of a Controller, written in TypeScript and Next.js, which starts tests that run on an Agent, also written in TypeScript; the Agent uploads the results to AWS S3 for viewing on the Controller.


Components

Binaries

Precompiled Pewpew binaries and WebAssemblies can be found under Releases

  1. The Pewpew executable is the base of this entire project: an HTTP load testing tool written in Rust. The code lives in the src folder, which contains the actual binary code, and the lib folder, which contains the sub-components used to build pewpew.
  2. The Config WebAssembly is a Rust WebAssembly module used to validate yaml files before they are run. It validates that the yaml syntax is correct and that all environment variables needed by the test are provided. It is used by the Controller before tests are run, and by the Agent as well.
  3. The HDR Histogram WebAssembly is a Rust WebAssembly module used by the Controller and the Guide to display results in readable charts.
  4. The Test Server is a special executable created to make bug reports easy to reproduce. It is a simple HTTP server that bounces replies back, and all of the Examples run against it.

PPaaS (PewPew as a Service)

The PPaaS system works by having an always-running Next.js Controller which can be used to start a test. Starting a test uploads the yaml file, any required provider files, and any environment variables to AWS S3 (Simple Storage Service). The Controller also puts a message on an SQS (Simple Queue Service) queue, which triggers a scaling event to bring up a TypeScript Agent. The Agent downloads the test files from S3, along with a Pewpew binary, and runs the test. During the run, the Agent periodically uploads results to Splunk and checks for updated files from the Controller. At the completion of the test, it uploads the final results to S3 and notifies the Controller via SQS that it is complete. It then removes an SQS message from the SQS Scale In queue, allowing the agent to be scaled in.

A single Controller can be attached to multiple SQS queues and corresponding autoscale groups of agents. This allows you to have different-sized instances depending on your load tests: a single Controller could have one queue/autoscale group using t3 instances for small load tests and another using c5n.18xlarge instances for very large load tests. Our recommended default is one of the network-optimized instances (currently c5n.large).

Code shared by the Controller and Agent is in the Common folder, including the code for accessing S3 and SQS. Logs are written via bunyan using a log4j format that can be ingested into a log parser such as Splunk.

AWS Resources

  • (Multiple) SQS SCALE_OUT_QUEUE - Start test messages are put on this queue to cause a scale out event and autoscale up an agent. Any new messages on the Scale Out queue will cause a new autoscale event. Each Agent autoscale group needs its own Scale out queue.
  • (Multiple) SQS SCALE_IN_QUEUE - Used for scaling in. Due to limitations in SQS, messages cannot reside hidden on the SCALE_OUT_QUEUE for more than 12 hours. To work around this limitation (and allow load tests longer than 12 hours), we use separate queues for scaling out and scaling in. Because there may be multiple agents running tests and we have no way to determine which agent will be scaled in, agents only scale in when the Scale In queue is empty. Each Agent autoscale group needs its own Scale In queue.
  • SQS COMMUNICATIONS_QUEUE - This queue is used by the agents to send update messages back to the Controller. Messages such as error, new results, completed, or failed messages can be sent. Only one Communications queue is needed by the controller that all agents and autoscale groups can use.
  • S3 Storage - All test files, results, and status files are written here. Different PewPew version binaries will also be uploaded here along with the file containing the calendar for the Controller.
  • Controller EC2 Instance/Autoscale - Due to the nature of the future scheduler, only one instance is supported at this time.
  • Agent EC2 Autoscale Group(s) - One or more autoscale groups, each with its own SCALE_OUT and SCALE_IN SQS queue. Each scale-out event creates a new instance running the agent code.
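
As a rough sketch of the resources listed above (CloudFormation is just one way to declare them; the names are placeholders, and the Controller/Agent instances and autoscale groups are omitted for brevity), the queues and bucket could look like:

Resources:
  AgentScaleOutQueue:      # one per agent autoscale group; start-test messages land here
    Type: AWS::SQS::Queue
  AgentScaleInQueue:       # one per agent autoscale group; agents scale in only when it is empty
    Type: AWS::SQS::Queue
  CommunicationsQueue:     # a single queue shared by all agents to report back to the Controller
    Type: AWS::SQS::Queue
  TestStorageBucket:       # yaml files, provider files, results, status files, pewpew binaries
    Type: AWS::S3::Bucket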

pewpew's People

Contributors

bryan-e-lopez, dependabot[bot], elisnow, jessica-yauney, olsonzacheryfs, summersjc, tkmcmaster


pewpew's Issues

pewpew 0.5.6 beta5 preview 4 exits immediately

Describe the bug

Running the new pewpew 0.5.6-beta5-preview-4 exits immediately on a 2-minute test without running anything. A try run works fine.

Expected behavior

Test should run

Config file

vars:
  rampTime: 1m
  loadTime: 1m
  totalTime: 2m
  serviceUrlAgent: ${SERVICE_URL_AGENT}
load_pattern:
  - linear:
      from: 1%
      to: 100%
      over: ${rampTime}
  - linear:
      from: 100%
      to: 100%
      over: ${loadTime}
config:
  client:
    headers:
      TestTime: '${epoch("ms")}'
      Accept: application/json
      FS-User-Agent-Chain: Performance Test
      User-Agent: Performance Test
  general:
    bucket_size: 1m
    log_provider_stats: 1m
endpoints:
  - method: GET
    url: http://${serviceUrlAgent}/healthcheck
    peak_load: 30hpm

Command to run

# export SERVICE_URL_AGENT="127.0.0.1:8080"
# pewpew run -w -f json createtest.yaml

or 

# pewpew run createtest.yaml

System info

  • Operating System: Ubuntu 18.04 and Windows 10
  • Pewpew version: pewpew 0.5.6-beta5-preview-4

Additional context

Output from run or run -f json

{"type":"start","msg":"Test will end around 10:42:20 26-Jun-2020 in approximately 2 minutes","binVersion":"0.5.6-beta5-preview-4"}

OR

Starting load test. Test will end around 10:42:59 26-Jun-2020 in approximately 2 minutes

Test Summary 10:40:00 to 10:41:00 26-Jun-2020
no data

Test Summary 10:40:59 to 10:42:59 26-Jun-2020
no data

Providers with auto_return allow a where clause

Describe the bug

Currently, auto_return supports block, force, and if_not_full, but there's no way to turn it off when there is an error. We have many cases where we need a provides that puts the value back only under certain conditions. It would be nice to have an option auto_return: where that could then take a where clause like where: response.status < 300 (on success) or where: response.body=success.

Expected behavior

We can conditionally decide whether to auto_return.

Config file

providers:
  session:
    response:
      auto_return: where
      where: response.status < 300

Additional context

This may also alleviate some of the bugs around try runs with providers that loop back to the same API.

ON_DEMAND does not work

Describe the bug

on_demand: true does not seem to be working in the latest version of pewpew.

Expected behavior

An on_demand endpoint should run as often as its provider is needed.

Config file

https://github.com/fs-eng/SystemTestTools/blob/master/pewpewtests/Membership/UnitOnly.yaml

providers:
  a:
    response: {}

load_pattern:
  - linear:
      from: 100%
      to: 100%
      over: 5s

loggers:
  test:
    to: stderr

vars:
  port: "${PORT}"


endpoints:
  - url: http://localhost:${port}
    peak_load: 1hps
    provides:
      a:
        select: 1
    on_demand: true
    
  - url: http://localhost:${port}?${a}
    logs:
      test:
        select: 1

Here is a simplified version of the issue: https://github.com/fs-eng/SystemTestTools/blob/master/pewpewtests/Membership/UnitOnlyBUGREPRODUCEONDEMAND.yaml

Command to run

./pewpew run UnitOnlyBUGREPRODUCEONDEMAND.yaml -w

System info

  • Operating System: Linux
  • Pewpew version: 0.5.8 preview 3 (from pewpew --version). Seems to work on preview 2.

Config Parser "unreachable" error when logger filename has epoch

Describe the bug

Trying to parse a yaml file that has a logger filename with an epoch expression in it causes the config parser to throw an "unreachable" error.

Expected behavior

The file works fine when run in pewpew; this is purely a bug in the config parser. The file should parse.

Config file

load_pattern:
  - linear:
      from: 100%
      to: 100%
      over: 5s
loggers:
  timeLogger:
    to: 'test-${epoch("ms")}.csv'
endpoints:
  - method: GET
    url: http://localhost:8082
    peak_load: 1hps

Command to run

This is a bug in the config parser. To reproduce, load the yaml file in the config parser.

System info

  • Operating System: Ubuntu, Windows 10
  • Pewpew version: 0.5.7 and 0.5.8-preview4

Additional context

We can currently work around this by clicking the "Bypass Config Parser" option in PPaaS.

Stack trace:

Error loading config RuntimeError: unreachable
    at __rust_start_panic (wasm-function[23713]:0x5f2664)
    at rust_panic (wasm-function[21760]:0x5e4bc9)
    at std::panicking::rust_panic_with_hook::h4f753dc70b771d8e (wasm-function[4997]:0x44ce68)
    at std::panicking::begin_panic::{{closure}}::hce5f3f65f96f2f2a (wasm-function[21731]:0x5e4765)
    at std::sys_common::backtrace::__rust_end_short_backtrace::h706c23cb5ca53ba5 (wasm-function[21065]:0x5dd646)
    at std::panicking::begin_panic::h500a2937ff20cfd7 (wasm-function[21729]:0x5e4705)
    at std::sys::wasm::time::SystemTime::now::h780c8e91d0f652ad (wasm-function[23571]:0x5f20d9)
    at std::time::SystemTime::now::h92037ca9d5c1afb7 (wasm-function[23679]:0x5f2598)
    at config::expression_functions::Epoch::evaluate::h6570f5e8899733b3 (wasm-function[508]:0x224ec8)
    at config::select_parser::FunctionCall::evaluate::h45d43120f81d82c2 (wasm-function[121]:0x126ccc)

File Bodies no longer work after 0.5.6 beta 4

Describe the bug

File bodies do not send the correct data and fail. I get timeouts or 504s from the server I'm testing, and the test-server just hangs and never responds.

Expected behavior

Using a file body in place of a string body should work seamlessly

Config file

vars:
  port: ${PORT}
load_pattern:
  - linear:
      from: 100%
      to: 100%
      over: 5m
config:
  client:
    headers:
      Accept: '*/*'
  general:
    bucket_size: 1m
    log_provider_stats: 1m
endpoints:
  ###### PUT text ######
  - method: PUT
    url: 'http://localhost:${port}/'
    tags:
      type: put-text
      status: ${response.status}
    headers:
      Content-Type: 'text/plain'
    body: 'This is only a test'
    peak_load: 5hpm
  ###### PUT text ######
  - method: PUT
    url: 'http://localhost:${port}/'
    tags:
      type: put-text-file
      status: ${response.status}
    headers:
      Content-Type: 'text/plain'
    body:
      file: 'dist.txt'
    peak_load: 5hpm

dist.txt:

This is only a test

Command to run

$ PORT=8080 test-server &
$ pewpew try FileTest.yaml -i _id=0
$ pewpew try FileTest.yaml -i _id=1

System info

  • Operating System: Windows 10
  • Pewpew version: 0.5.7 and 0.5.8 preview 1

Additional context

It worked in 0.5.5 and in 0.5.6 beta 1-4.
From 0.5.6 beta 5 on, and in 0.5.7, it just hangs. 0.5.8 preview 1 returns this error from the test-server:

Request
========================================
PUT / HTTP/1.1
accept: */*
content-type: text/plain
host: localhost
content-length: 19

<<contents of file: dist.txt>>

Response (RTT: nullms)
========================================
null
null

null

Feature request: parseInt

Describe the bug

Currently, after doing a match() we can only get strings out of the regex (even with (\d) capture groups). It would be very handy to be able to turn those back into numbers so we can use them for counters/incrementers/ranges.

Expected behavior

We should be able to convert a string that contains only numbers into a number.
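
A sketch of how the request might read in a config (parseInt is the proposed, not-yet-existing function, and the endpoint, regex, and capture-group access syntax here are only illustrative):

vars:
  port: ${PORT}
providers:
  itemId:
    response: {}
endpoints:
  - method: GET
    url: http://localhost:${port}/item
    peak_load: 1hps
    provides:
      itemId:
        # match() capture groups come back as strings today, even for (\d);
        # parseInt would turn "42" into 42 for use in counters/incrementers/ranges
        select: parseInt(match(response.body, "id=(?P<id>\d+)").id)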

File provider stats continually climb

Describe the bug

Between versions 0.5.6-beta2 and 0.5.6-beta4, the size and limit of the file providers began to grow continually. It doesn't seem to leak any more memory than before, but we do still leak memory and crash.

Expected behavior

The file size doesn't change, so I wouldn't expect the provider stats to just grow and grow.
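
For reference, a minimal file provider with the stats logging that surfaces the climbing numbers might look like this (a sketch; the file name is a placeholder, and the path/repeat fields follow the file-provider schema from the pewpew guide):

config:
  general:
    bucket_size: 1m
    log_provider_stats: 1m   # logs each provider's len/limit once a minute
providers:
  rows:
    file:
      path: data.csv   # the file never changes, so size/limit should stay flat
      repeat: true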

Config file

RmsReadOnly.yaml

Command to run

# pewpew run -f json -w RmsReadOnly.yaml

System info

  • Operating System: Amazon Linux
  • Pewpew version: 0.5.6-beta4

Additional context

0.5.6-beta2 - PewPew Agent Dashboard
0.5.6-beta4 - PewPew Agent Dashboard

I was also able to repro this with my simpler Permissions file which only has about 3 API calls
PewPew Agent Dashboard
RmsGetPermissions.yaml

Final bucket output logged as Test Summary

Describe the bug

When running a test, the final partial bucket is logged to the console as a Test Summary rather than a Bucket Summary, creating two Test Summary entries for each run. In JSON we get two {"summaryType":"test"} entries at the end rather than a {"summaryType":"bucket"} and a {"summaryType":"test"}.

Expected behavior

The final bucket should be output as a Bucket Summary, followed by the true Test Summary.

Config file

vars:
  port: ${PORT}
load_pattern:
  - linear:
      from: 100%
      to: 100%
      over: 90s
config:
  general:
    bucket_size: 1m
endpoints:
  - method: GET
    url: http://localhost:${port}/
    peak_load: 60hpm

Command to run

$ PORT=8080 test-server &

$ PORT=8080 pewpew run test.yaml -f json
# or
$ PORT=8080 pewpew run test.yaml

System info

  • Operating System: "Ubuntu 18.04" & "Windows 10"
  • Pewpew version: 0.5.6-beta5

Additional context

Console output:

Starting load test. Test will end around 09:42:50 25-Aug-2020 in approximately 1 minute and 30 seconds

Bucket Summary 09:41:00 to 09:42:00 25-Aug-2020

- GET http://localhost:8080/:
  calls made: 39
  status counts: {204: 39}
  p50: 1.228ms, p90: 1.733ms, p95: 10.767ms, p99: 329.215ms, p99.9: 329.215ms
  min: 0.333ms, max: 329.215ms, avg: 9.762ms, std. dev: 51.826ms  

Test will end around 09:42:50 25-Aug-2020 in approximately 50 seconds

Test Summary 09:42:00 to 09:43:00 25-Aug-2020

- GET http://localhost:8080/:
  calls made: 50
  status counts: {204: 50}
  p50: 0.976ms, p90: 1.469ms, p95: 1.604ms, p99: 1.765ms, p99.9: 1.765ms
  min: 0.397ms, max: 1.765ms, avg: 1.015ms, std. dev: 0.353ms     

Test Summary 09:41:20 to 09:42:50 25-Aug-2020

- GET http://localhost:8080/:
  calls made: 89
  status counts: {204: 89}
  p50: 1.072ms, p90: 1.58ms, p95: 1.733ms, p99: 329.215ms, p99.9: 329.215ms
  min: 0.333ms, max: 329.215ms, avg: 4.848ms, std. dev: 34.581ms  

OR

{"type":"start","msg":"Test will end around 09:56:41 25-Aug-2020 in approximately 1 minute and 30 seconds","binVersion":"0.5.6-beta5"}
{"type":"summary","startTime":1598370900,"timestamp":1598370960,"summaryType":"bucket","method":"GET","url":"http://localhost:8080/","callCount":48,"statusCounts":[{"status":204,"count":48}],"requestTimeouts":0,"testErrors":[],"testErrorCount":0,"p50":1.011,"p90":1.504,"p95":1.781,"p99":315.391,"p99_9":315.391,"min":0.331,"max":315.391,"mean":7.564,"stddev":44.884,"tags":{"_id":"0"}}
{"type":"summary","startTime":1598370960,"timestamp":1598371020,"summaryType":"test","method":"GET","url":"http://localhost:8080/","callCount":41,"statusCounts":[{"status":204,"count":41}],"requestTimeouts":0,"testErrors":[],"testErrorCount":0,"p50":1.161,"p90":1.637,"p95":1.649,"p99":1.879,"p99_9":1.879,"min":0.418,"max":1.879,"mean":1.126,"stddev":0.37,"tags":{"_id":"0"}}
{"type":"summary","startTime":1598370911,"timestamp":1598371001,"summaryType":"test","method":"GET","url":"http://localhost:8080/","callCount":89,"statusCounts":[{"status":204,"count":89}],"requestTimeouts":0,"testErrors":[],"testErrorCount":0,"p50":1.099,"p90":1.559,"p95":1.781,"p99":315.391,"p99_9":315.391,"min":0.331,"max":315.391,"mean":4.598,"stddev":33.119,"tags":{"_id":"0"}}

on_demand APIs exit when a provider empties rather than when dependent APIs are finished

Describe the bug

The test exits early (with no message as to why) when a provider feeding an on_demand API runs out.

Expected behavior

The test should continue to run until non-on_demand API calls can't get data. It should also log why it exited early (the provider ran out).

Config file

vars:
  port: ${PORT}
load_pattern:
  - linear:
      from: 100%
      to: 100%
      over: 5m
config:
  client:
    headers:
      Accept: 'application/json'
  general:
    bucket_size: 1m
    log_provider_stats: 1m
providers:
  counterProvider:
    range:
      start: 1
      end: 10
      repeat: false
  response1:
    response: {}
  response2:
    response: {}
endpoints:
  - method: GET
    url: 'http://localhost:${port}?count=${counterProvider}'
    on_demand: true
    peak_load: 60hpm
    provides:
      response1:
        select: counterProvider
        where: response.status <= 300
        send: force
  - method: GET
    url: 'http://localhost:${port}?count=${response1}'
    on_demand: true
    peak_load: 60hpm
    provides:
      response2:
        select: response1
        where: response.status <= 300
        send: force
  - method: GET
    url: 'http://localhost:${port}?count=${response2}'
    peak_load: 60hpm

Command to run

$ PORT=8085 pewpew run ProviderEndsEarly.yaml

System info

  • Operating System: Windows 10
  • Pewpew version: 0.5.7 and 0.5.8 preview 1

Additional context

It seems inconsistent. Occasionally I get 10 calls on each endpoint, but on most runs the final API only runs 9 times.

$ PORT=8085 pewpew run ProviderEndsEarly.yaml
Starting load test. Test will end around 10:15:32 28-Oct-2020 in approximately 5 minutes

Bucket Summary 10:10:00 to 10:11:00 28-Oct-2020

- GET http://localhost:8085?count=*:
  calls made: 10
  status counts: {204: 10}
  p50: 0.741ms, p90: 1.439ms, p95: 330.495ms, p99: 330.495ms, p99.9: 330.495ms
  min: 0.459ms, max: 330.495ms, avg: 33.744ms, std. dev: 98.875ms

- GET http://localhost:8085?count=*:
  calls made: 10
  status counts: {204: 10}
  p50: 0.878ms, p90: 1.492ms, p95: 2.133ms, p99: 2.133ms, p99.9: 2.133ms
  min: 0.662ms, max: 2.133ms, avg: 1.051ms, std. dev: 0.413ms

- GET http://localhost:8085?count=*:
  calls made: 9
  status counts: {204: 9}
  p50: 0.877ms, p90: 1.349ms, p95: 1.349ms, p99: 1.349ms, p99.9: 1.349ms
  min: 0.686ms, max: 1.349ms, avg: 0.904ms, std. dev: 0.197ms

Test will end around 10:10:42 28-Oct-2020 in approximately 0 seconds

Test Summary 10:10:32 to 10:15:32 28-Oct-2020

- GET http://localhost:8085?count=*:
  calls made: 10
  status counts: {204: 10}
  p50: 0.741ms, p90: 1.439ms, p95: 330.495ms, p99: 330.495ms, p99.9: 330.495ms
  min: 0.459ms, max: 330.495ms, avg: 33.744ms, std. dev: 98.875ms

- GET http://localhost:8085?count=*:
  calls made: 10
  status counts: {204: 10}
  p50: 0.878ms, p90: 1.492ms, p95: 2.133ms, p99: 2.133ms, p99.9: 2.133ms
  min: 0.662ms, max: 2.133ms, avg: 1.051ms, std. dev: 0.413ms

- GET http://localhost:8085?count=*:
  calls made: 9
  status counts: {204: 9}
  p50: 0.877ms, p90: 1.349ms, p95: 1.349ms, p99: 1.349ms, p99.9: 1.349ms
  min: 0.686ms, max: 1.349ms, avg: 0.904ms, std. dev: 0.197ms

pewpew 0.5.6 beta 4 consistently hangs on a test after about 30 minutes

Describe the bug

Our Ingest test (which exercises a bunch of POSTs and PUTs and uses a lot of providers to pass data) worked fine on 0.5.6 beta 2, but has been consistently hanging after 30-ish minutes: it no longer writes to stdout or stderr, and sends no network traffic.

Expected behavior

The test should run for the full duration and exit gracefully.

Config file

RmsIngest.yaml
RmsAll.yaml

Command to run

# pewpew run -f json -w RmsIngest.yaml

System info

  • Operating System: Amazon Linux
  • Pewpew version: 0.5.6-beta4

Additional context

Ingest Only Run
All Apis Run
All Apis Run

Those are the runs that all hung on beta4. Ron was trying to run the tests yesterday and kept pinging me to see what was going on. I finally suggested he try the 9xl and run it all on one box rather than try to synchronize two runs (the ingest can't be restarted on crash). Since we moved back to beta2 we haven't had ingest hang on us.

Try script never exits on long provider chains

Describe the bug

Try script never exits on long provider chains

Expected behavior

Try script should exit once all endpoints have run once (or the one specified by -i)

Config file

provider_chain.yaml

providers:
  a:
    range: {}
  b:
    response: {}
  c:
    response: {}
  d:
    response: {}
  e:
    response: {}
  f:
    response: {}

load_pattern:
  - linear:
      from: 100%
      to: 100%
      over: 5s

loggers:
  test:
    to: stderr

vars:
  port: "${PORT}"


endpoints:
  - method: POST
    url: http://localhost:${port}
    body: '{"a": ${a}}'
    provides:
      b:
        select: response.body.a
    on_demand: true

  - method: POST
    url: http://localhost:${port}
    body: '{"b": ${b}}'
    provides:
      c:
        select: response.body.b
    on_demand: true

  - method: POST
    url: http://localhost:${port}
    body: '{"c": ${c}}'
    provides:
      d:
        select: response.body.c
    on_demand: true

  - method: POST
    url: http://localhost:${port}
    body: '{"d": ${d}}'
    provides:
      e:
        select: response.body.d
    on_demand: true

  - method: POST
    url: http://localhost:${port}
    body: '{"e": ${e}}'
    provides:
      f:
        select: response.body.e
    on_demand: true

  - method: POST
    url: http://localhost:${port}
    body: '{"f": ${f}}'
    peak_load: 1hps
    logs:
      test:
        select: response.body.f

provider_loop.yaml

providers:
  a:
    range: {}
  b:
    response: {}
  c:
    response: {}
  d:
    response: {}

load_pattern:
  - linear:
      from: 100%
      to: 100%
      over: 5s

loggers:
  test:
    to: stderr

vars:
  port: "${PORT}"


endpoints:
  - method: POST
    url: http://localhost:${port}
    body: '{"a": ${a}}'
    peak_load: 1hps
    provides:
      b:
        select: response.body.a

  - method: POST
    url: http://localhost:${port}
    body: '{"b": ${b}}'
    peak_load: 5hps
    provides:
      b:
        select: b # Put it back on 'b' to reuse

  - method: POST
    url: http://localhost:${port}
    body: '{"b": ${b}}'
    peak_load: 1hps
    provides:
      c:
        select: response.body.b # take and put it on 'c'

  - method: POST
    url: http://localhost:${port}
    body: '{"c": ${c}}'
    peak_load: 5hps
    provides:
      c:
        select: c # Put it back on 'c' to reuse

  - method: POST
    url: http://localhost:${port}
    body: '{"c": ${c}}'
    peak_load: 1hps
    provides:
      d:
        select: response.body.c # take and put it on 'd'

  - method: POST
    url: http://localhost:${port}
    body: '{"d": ${d}}'
    peak_load: 1hps
    logs:
      test:
        select: response.body.d

Command to run

PORT=8084 test-server &

PORT=8084 pewpew try provider_chain.yaml
or
PORT=8084 pewpew try provider_chain.yaml -i _id=5

PORT=8084 pewpew try provider_loop.yaml
or
PORT=8084 pewpew try provider_loop.yaml -i _id=5

System info

  • Operating System: "Ubuntu 20.04"
  • Pewpew version: pewpew 0.5.12

Additional context

Checking old versions, it last worked in 0.5.7 and has been broken from 0.5.8 forward. It will run under pewpew run, but hangs on pewpew try.

Turn on log_provider_stats by default

Describe the bug

When trying to debug issues, many users do not turn on log_provider_stats. We should turn it on by default and provide an "off" option for those who need to disable it for performance reasons.

Expected behavior

log_provider_stats: 1m should be the default, and we should be able to modify it or set log_provider_stats: off or log_provider_stats: false if we don't want it on.
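
Concretely, today the stats are opt-in; the request is to make them default-on with an explicit opt-out (the off/false values below are the proposed syntax, not current behavior):

config:
  general:
    log_provider_stats: 1m   # current behavior: stats are only logged when set explicitly
    # proposed: on by default, disabled via either of
    # log_provider_stats: off
    # log_provider_stats: false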

endpoints that receive from providers with no peak_load grow to insane levels of waitingToReceive

Describe the bug

An endpoint that only processes data in rare (error) cases and has no peak_load (i.e. is "on_demand") climbs to insanely high levels of waitingToReceive: 20,000 in a matter of minutes.

Expected behavior

If an endpoint only receives "on demand", it should have maybe up to 5 waiting threads in case there's data, but there should be a cap on the number of threads when a provider (any provider) is not getting data.

Config file

vars:
  port: ${PORT}
  ingestName: 'test:loadtest-'
load_pattern:
  - linear:
      from: 100%
      to: 100%
      over: 50m
config:
  client:
    headers:
      TestTime: '${epoch("ms")}'
      Content-Type: application/json
      Accept: application/json
  general:
    bucket_size: 1m
    log_provider_stats: 1m
providers:
  imageCreateId:
    range:
      start: 1
  groupImageParentRelationship:
    response: {}
endpoints:
  ###### image POST ######
  - method: POST
    url: http://localhost:${port}/
    body: '{
        "id": "${ingestName}${start_pad(imageCreateId, 10, "0")}"
      }'
    peak_load: 3000hpm
    provides:
      groupImageParentRelationship:
        select:
          imageId: '`${ingestName}${start_pad(imageCreateId, 10, "0")}`'
        where: response.status == 409
        send: if_not_full
  ###### group put children ###### Only if we get 409s on the child
  - method: PUT
    url: http://localhost:${port}/
    body: '{
        "${groupImageParentRelationship.imageId}": ""
      }'
    # on_demand: true
    # max_parallel_requests: 10

Command to run

PORT=4000 pewpew run -f json RmsIngestWaitingToReceive.yaml

System info

  • Operating System: Amazon Linux and Windows 10
  • Pewpew version: pewpew 0.5.7

Additional context

Output after only a few minutes:

{"timestamp":1601052840,"provider":"groupImageParentRelationship","len":0,"limit":5,"receiverCount":1,"senderCount":1,"waitingToSend":0,"waitingToReceive":22975}
{"timestamp":1601052840,"provider":"imageCreateId","len":5,"limit":5,"receiverCount":1,"senderCount":1,"waitingToSend":1,"waitingToReceive":0}

The actual test (https://familysearch.splunkcloud.com/en-US/app/QA/pewpew_agent_dashboard?form.envSelector=rmsallstage20200924T164606294&form.timeSelector.earliest=1600966220&form.timeSelector.latest=1600979986) is up to 38 million waitingToReceive by the end of the test.

