smartcat-labs / berserker


Berserker is a load generator with a pluggable input source and configurable output.

License: Apache License 2.0

Java 94.94% Shell 2.59% Groovy 2.47%

berserker's People

Contributors

dependabot[bot], ericorange, mgobec, milannister, nivancevic


berserker's Issues

CSV data source support through configuration file

Currently, the CSV data source is supported only through Java code. That is not good enough, since the CSV data source cannot be used from a configuration file. This is a proposal for how CSV data source configuration could look:

data-source-configuration:
  parser:
    file: # mandatory
    delimiter: # optional, defaults to ','
    record-separator: # optional, defaults to '\n'
    trim: # optional, defaults to true
    quote: # optional, defaults to null, usually is " or ', but can be any character
    comment-marker: # optional, defaults to '#'; when it appears at the beginning of a line, that line is ignored
    skip-header-record: # optional, defaults to false, can skip first row
    ignore-empty-lines: # optional, defaults to true
    null-string: # optional, a character sequence which will be mapped to a null String, defaults to not set
  mapping:
    user:
      firstName: $c0
      lastName: $c1
      age: $c2
  output: $user

The parser section would contain parameters relevant for parsing the CSV file.
The output section would contain the final output value of this data source, similar to output in the Ranger data source.
The mapping section would contain a custom structure definition whose referenced values are the columns of the CSV: $c0 represents the first column, $c1 the second, and so on.

Additionally, the mapping section might contain Ranger constructs, which would provide rich value creation by combining Ranger's random values with values taken from the CSV file.
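As a sketch of how the $cN references could be resolved, the following hypothetical Java snippet parses a record and substitutes column references in a mapping. The class and method names are illustrative, not part of Berserker, and the parser is deliberately naive: it ignores the quote and comment-marker options and only honors the delimiter and trim defaults.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: resolve "$cN" column references against a parsed CSV record. */
public class CsvMapping {

    /** Naive split-based parse; a real implementation would honor quote and comment-marker. */
    public static String[] parseRecord(String line, char delimiter) {
        String[] columns = line.split(String.valueOf(delimiter));
        for (int i = 0; i < columns.length; i++) {
            columns[i] = columns[i].trim(); // mirrors the 'trim: true' default
        }
        return columns;
    }

    /** Replace each "$cN" placeholder in the mapping with the N-th column of the record. */
    public static Map<String, String> map(Map<String, String> mapping, String[] record) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : mapping.entrySet()) {
            String ref = entry.getValue();
            if (ref.startsWith("$c")) {
                result.put(entry.getKey(), record[Integer.parseInt(ref.substring(2))]);
            } else {
                result.put(entry.getKey(), ref);
            }
        }
        return result;
    }
}
```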

Rate generator configuration

The following is one possible way to implement rate generator configuration.

  1. The configuration is meant to be used in a YAML configuration file.
  2. Using a utility YAML class, the configuration would be relocatable, that is, an arbitrary configuration root could be used. A JSONPath could be used for selecting the configuration root (e.g. $.rate.generator[1].options).
  3. There is a list of base rate generators:
    - impulse (interval, ratio)
    - saw-like (interval, ratio, tip ratio)
    - sine (interval)
    - random (interval, distribution)
    - limiter(cap, generator)
    - bell-shape (interval, median, variance)
    - circular-discrete (domain, interval)
    - exponential
    - logarithmic
  4. All base generators have normalized output (0.0 .. 1.0).
  5. The final/output rate is specified as an expression that combines outputs of base generators, e.g.:
     2000 + 1000*sin(interval) + 50.0*random(interval2, uniform())
  6. The syntax for specifying time intervals should support expressions like: 2d30m (two days and 30 minutes), 5s (five seconds), 24h2m20s.
  7. The configuration should provide a simplified syntax for simple cases.
  8. The configuration should allow specifying a generator time offset. As all generators accept relative time (starting at 0), the time offset parameter can be used to translate the generator function as needed.
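The interval syntax above could be parsed roughly like this. This is an illustrative sketch, not existing Berserker code; IntervalParser is a hypothetical name, and fractional or millisecond units are left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of a parser for interval expressions like "2d30m", "5s" or "24h2m20s". */
public class IntervalParser {

    private static final Pattern UNIT = Pattern.compile("(\\d+)([dhms])");

    /** Returns the interval length in seconds. */
    public static long parseSeconds(String expression) {
        Matcher matcher = UNIT.matcher(expression);
        long seconds = 0;
        while (matcher.find()) {
            long amount = Long.parseLong(matcher.group(1));
            switch (matcher.group(2)) {
                case "d": seconds += amount * 86400; break;
                case "h": seconds += amount * 3600; break;
                case "m": seconds += amount * 60; break;
                case "s": seconds += amount; break;
            }
        }
        return seconds;
    }
}
```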

Rate generator examples:

rate-gen-config:
  rate: 2000 + 1000*sin(6d30m)

rate-gen-config:
  offset: 24h
  rate:
    arate: 2000 + 1000*sin(2d2h50m10s500)
    brate: 5000*bell(7d, 2d) + 50*rnd(60s)
    output: 0.6*$arate + 0.3*$brate
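A minimal sketch of how normalized base generators could be combined into a final rate, assuming a hypothetical BaseGenerator interface (none of these names exist in Berserker; the sine generator is normalized to [0, 1] as point 4 above requires):

```java
/** Sketch: base rate generators normalized to [0, 1], combined into a final rate expression. */
public class RateExpression {

    /** Functional shape of a base generator: relative time in, normalized value out. */
    public interface BaseGenerator {
        double value(long timeMillis);
    }

    /** Sine generator normalized to [0, 1] over the given period. */
    public static BaseGenerator sine(long periodMillis) {
        return t -> (Math.sin(2 * Math.PI * t / periodMillis) + 1.0) / 2.0;
    }

    /** Example combined rate: 2000 + 1000*sin(interval), with an optional time offset. */
    public static double rate(long timeMillis, long offsetMillis, BaseGenerator sin) {
        return 2000 + 1000 * sin.value(timeMillis + offsetMillis);
    }
}
```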

Potential optimizations of load generator

These are just ideas with no clear action points for now.

  1. See if the load generator is taking too much processor time when used with the async worker.
    a. Introduce a sleep by default? For how long?
    b. Introduce an adaptive sleep if necessary? How?
  2. Try to minimize unnecessary object creation and lower the memory footprint.
    a. Currently, AsyncWorker creates a large number of WorkerMeta objects; there is no idea yet how to avoid that.

Expected response status for HTTP worker

The HTTP worker's configuration should allow specifying the expected HTTP status code of the response. This is needed to support valid responses that don't have the default response code.

Load profile

This topic, and the following discussion, is expected to help shape a more generic and robust architecture for Berserker.
The main purpose of the load generator is to generate load. The total (output) load of the load generator can be described and configured as the cumulative load of simple, elementary load generators:

Ltot = Lp1 + Lp2 + Lp3

Every elementary load generator can be described by a load profile. A load profile is a pair of a load rate (produced by a rate generator) and a data source (e.g. the Ranger data generator).

This approach should provide a way to simulate total load in a multi-tenant system where the behavior of particular tenants can vary significantly. A single load profile (or several) can be used to describe a single tenant's load.

load-profiles:
  - rate:
      generator: 2000 + 1000.05*sin(2h50m)
    data:
      type: ranger
      options:
        values:
          id: uuid()
          msg:
            id: @id
            payload: rlen(100...1000)
        output: msg

  - rate:
      offset: 24h
      generator:
        a: 2000 + 1000*bell(3d, 12h)
        b: 5000*bell(7d, 2d) + 50*random(60s)
        output: @a + @b
    data:
      type: ranger
      options:
        values:
          msg:
            id: uuid()
            payload: rlen(100...1000)
        output: msg
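The summation Ltot = Lp1 + Lp2 + Lp3 could be sketched as follows. TotalLoad is a hypothetical helper, not existing code; each profile's rate is modeled as a plain time-to-rate function.

```java
import java.util.List;
import java.util.function.LongToDoubleFunction;

/** Sketch: the total load is the sum of the elementary load profiles' rates at a given time. */
public class TotalLoad {

    /** Each profile contributes its own rate: Ltot(t) = Lp1(t) + Lp2(t) + ... */
    public static double totalRate(List<LongToDoubleFunction> profileRates, long timeMillis) {
        double total = 0.0;
        for (LongToDoubleFunction rate : profileRates) {
            total += rate.applyAsDouble(timeMillis);
        }
        return total;
    }
}
```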

Revisit and design metrics gathering

Currently, AsyncWorker supports metrics acquisition with the Consumer<WorkerMeta> workerStatsGetherer interface. That might be good for someone wanting to build custom metrics reporting around it, but it is not good enough for ranger-runner and configuration-based execution. Some implementations need to be built around it. Features to support:

  • 95th percentile
  • 99th percentile
  • 99.9th percentile
  • ... maybe have the percentiles configurable
  • if packets are dropped, the percentage of dropped packets
  • wait time, service time, response time
  • report it every second or so
  • maybe have a metrics interface to collect and calculate metrics and send them to some reporter
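For the percentile features, a simple nearest-rank calculation could serve as a starting point. This is an illustrative sketch only; a real implementation would more likely use a metrics-library reservoir or an HDR histogram.

```java
import java.util.Arrays;

/** Sketch: percentile calculation over recorded latencies using the nearest-rank method. */
public class LatencyPercentiles {

    public static long percentile(long[] latencies, double percentile) {
        long[] sorted = latencies.clone();
        Arrays.sort(sorted);
        // nearest rank: the smallest value covering the requested fraction of samples
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }
}
```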

Update README.md

Please update the main README.md with the latest information about how to run berserker and with the latest configuration file changes.

Improve documentation for configuration

Berserker's configuration options are not documented properly. Each and every configuration option should have a proper explanation.

For example, the following configuration option is not clear:

rate-generator-configuration:
  rate: 1000

rate is quite ambiguous and can have different meanings.

Enable logging of request and response

It should be possible to log the actual queries (requests) that are generated and the response for each query.

This is useful for several reasons:

  1. to (sanity) check that the generated queries are what we want
  2. to (sanity) check that the returned responses are what we want

This can be implemented in two steps:

  1. enable logging of requests and responses (without linking them, just plain logging of all requests and responses)
  2. make each response reference its request somehow, so that it is possible to link a request with its response

Create HTTP Worker

The worker should support the following:

  • base url
  • port (with default to 80)
  • global headers which will be propagated with every request (this should allow for a very primitive type of authentication)
  • per-request headers
  • method type
  • url suffix, or a complete url if the base url is not specified
  • body content
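A sketch of how the worker could combine these options into a concrete request. The class and method names here are hypothetical; only URL and header composition are shown, without the actual HTTP call.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of composing an HTTP request from the worker's configuration. */
public class HttpRequestSpec {

    /** Build the full URL from base url, port (default 80) and suffix, or use a complete url. */
    public static String resolveUrl(String baseUrl, Integer port, String suffixOrCompleteUrl) {
        if (baseUrl == null) {
            return suffixOrCompleteUrl; // a complete url was given per request
        }
        int effectivePort = port != null ? port : 80; // port defaults to 80
        return baseUrl + ":" + effectivePort + suffixOrCompleteUrl;
    }

    /** Global headers are propagated to every request; per-request headers override them. */
    public static Map<String, String> mergeHeaders(Map<String, String> global,
                                                   Map<String, String> perRequest) {
        Map<String, String> merged = new LinkedHashMap<>(global);
        merged.putAll(perRequest);
        return merged;
    }
}
```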

Support empty composite value

There are situations where an empty composite value can come in handy. For example:

data:
  user:
    firstName: random("John", "Adam", "Peter")
    lastName: random("Doe", "Jackson", "Smith")
  fields: {}
output: json($data)

The generated JSON could look like:

{ "user": { "firstName": "Peter", "lastName": "Smith" }, "fields": {} }

In this case, we want to have the fields object, we just want it to be empty.

Technical approach:

This can be easily achieved by removing the values.isEmpty() check from CompositeValue, at the line:

if (values == null || values.isEmpty()) {
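A simplified sketch of the proposed behavior with the isEmpty() check removed. This is an illustration of the idea only, not the actual CompositeValue source.

```java
import java.util.Map;

/** Sketch: only a null map is rejected; an empty map is now a valid, empty composite. */
public class CompositeValueSketch {

    public static Map<String, Object> evaluate(Map<String, Object> values) {
        // Before the fix, the guard was: if (values == null || values.isEmpty()) { ... }
        if (values == null) {
            throw new IllegalArgumentException("values must not be null");
        }
        return values; // an empty map now yields an empty composite, e.g. "fields": {}
    }
}
```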

Logs Output as Bar or Line Graph

Can we direct all the stats to a log file from which some script can produce a graph of the stats, for better output? Something like YCSB, which creates HDR files that can be loaded into the hdr-histogram plotter for visualization.

Distributed load

One possible approach to implement a load generator capable of producing high load would be to have several load generators that run in parallel and produce a combined load. Each generator produces load data for itself; that is, there is no central place that prepares data and distributes it to the load generators. However, a central coordinator is necessary to orchestrate them: to distribute configuration and load profiles, to start and stop them, and to collect metrics data (e.g. for calculating coordinated omission).

Fix dependency vulnerabilities

GitHub reports dependency vulnerabilities which should be fixed. Also, a build plugin that checks for them should be introduced in order to guarantee vulnerability-free versions.

Create possibility to generate large messages

In order to test the performance of stream processing frameworks, we need messages of 1MB and 10MB with a rate of 500, 5000 and 50000 messages per second.

With the current implementation, we can use:

  • the randomContentString() function to generate a string of predefined size; with a size of 1,000,000 characters, berserker was able to create ~60 messages per second
  • a predefined string of large size; with a 1 MB string, it took 15 minutes to start producing messages, and berserker was able to produce ~300 messages per second

Two possible approaches:

  • use the content of a predefined file as the value for the field
  • generate the string of the desired length only once and reuse it
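The second approach could be sketched as follows (LargePayload is a hypothetical helper class): the expensive string generation happens once at initialization, and every message afterwards reuses the same instance.

```java
import java.util.Random;

/** Sketch: build the large payload a single time and reuse it for every message. */
public class LargePayload {

    private static final char[] ALPHABET = "abcdefghijklmnopqrstuvwxyz".toCharArray();

    private final String payload;

    public LargePayload(int sizeInChars, long seed) {
        Random random = new Random(seed);
        StringBuilder sb = new StringBuilder(sizeInChars);
        for (int i = 0; i < sizeInChars; i++) {
            sb.append(ALPHABET[random.nextInt(ALPHABET.length)]);
        }
        this.payload = sb.toString(); // generated once, at initialization
    }

    /** Every message reuses the same pre-built string instead of regenerating it. */
    public String next() {
        return payload;
    }
}
```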

Cassandra Worker

Provide a Cassandra worker implementation.

It should support the following configuration options:

  • Consistency level (still to be decided whether it will be global or per request)
  • Bootstrap commands (create keyspace, create tables); these should execute once at worker initialization
  • Cassandra hosts (servers to connect to)
  • Async/sync execution of CQL queries
  • Design-wise, it would be better to have a list of prepared statements, where the data source would provide a map of values to inject into a prepared statement together with the id of the prepared statement to use

Errors not logged when berserker is run in async mode

Usually, while a Berserker configuration is being prototyped, errors can occur that prevent Berserker from working as expected. Error messages, in the form of exceptions in the log, are visible when Berserker is running in sync mode, but when it runs in async mode, errors are not visible.
Errors should be visible in async mode as well.

Improve output of the runner

Once the load generator is started from the command line, the output of the program stops with:

11:17:49.116 [main] INFO io.smartcat.berserker.LoadGenerator - Load generator started.

The output should be more informative. For example, it could report after every 1,000 rows created.

Take a look at cassandra-stress tool for inspiration.

Revisit metrics reservoir implementation

Currently, metrics use the default reservoir, which is ExponentiallyDecayingReservoir.
Depending on the implementation of #23, this might already be taken care of. If not, usage of SlidingTimeWindowReservoir should be considered, or, even better, a configuration option to set up any of the following:

  • ExponentiallyDecayingReservoir
  • SlidingTimeWindowReservoir
  • SlidingWindowReservoir
  • UniformReservoir

Make metrics unit configurable

Currently, metrics returned by berserker are in nanoseconds. That should be made configurable to support milliseconds and seconds as well.
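A conversion from the internal nanosecond values to a configured unit could look like this (illustrative sketch using java.util.concurrent.TimeUnit; MetricsUnit is a hypothetical name):

```java
import java.util.concurrent.TimeUnit;

/** Sketch: convert a nanosecond measurement into a configured target unit. */
public class MetricsUnit {

    public static double convert(long nanos, TimeUnit target) {
        // division keeps fractional precision, unlike TimeUnit's truncating convert()
        return nanos / (double) target.toNanos(1);
    }
}
```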

Unclear berserker metrics

I don't really understand the time unit I get when monitoring Kafka performance via BerserkerMetrics.
Furthermore, if I specify a load test with a rate generator of 300, I can clearly see that my computer cannot spawn more than 41 requests. What am I doing wrong?

How to use OAuth or a similar flow with berserker?

Some APIs have an OAuth flow where it is necessary to obtain a token for a user.
In that case, the flow would go like this:

create or authenticate user(s) -> obtain token(s) from response -> use the token(s) in all requests

This is a very common flow which hasn't been taken into consideration so far.
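A sketch of how such a flow could be modeled: authenticate once per user, cache the token, and attach it to every subsequent request. All names here are hypothetical, and the authenticator function stands in for a real token-endpoint call.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Sketch: obtain a token once per user, then reuse it in the headers of every request. */
public class TokenFlow {

    private final Function<String, String> authenticator; // user -> token (stand-in)
    private final Map<String, String> tokenCache = new LinkedHashMap<>();

    public TokenFlow(Function<String, String> authenticator) {
        this.authenticator = authenticator;
    }

    /** Authenticate on first use; all subsequent requests reuse the cached token. */
    public Map<String, String> headersFor(String user) {
        String token = tokenCache.computeIfAbsent(user, authenticator);
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Authorization", "Bearer " + token);
        return headers;
    }
}
```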
