smartcat-labs / berserker


Berserker is a load generator with a pluggable input source and configurable output.

License: Apache License 2.0

Java 94.94% Shell 2.59% Groovy 2.47%

berserker's People

Contributors

dependabot[bot], ericorange, mgobec, milannister, nivancevic


berserker's Issues

CSV data source support through configuration file

Currently, the CSV data source is supported only through Java code. That is not good enough, since the CSV data source cannot be used from a configuration file. This is a proposal for how CSV data source configuration could look:

data-source-configuration:
  parser:
    file: # mandatory
    delimiter: # optional, defaults to ','
    record-separator: # optional, defaults to '\n'
    trim: # optional, defaults to true
    quote: # optional, defaults to null, usually is " or ', but can be any character
    comment-marker: # optional, defaults to '#'; when it appears at the beginning of a line, that line is ignored
    skip-header-record: # optional, defaults to false, can skip first row
    ignore-empty-lines: # optional, defaults to true
    null-string: # optional, a character sequence which will be mapped to a null String, defaults to not set
  mapping:
    user:
      firstName: $c0
      lastName: $c1
      age: $c2
  output: $user

The parser section would contain parameters relevant for parsing the CSV file.
The output section would contain the final output value of this data source, similar to output in the Ranger data source.
The mapping section would contain a custom structure definition whose referenced values are the columns of the CSV: $c0 represents the first column, $c1 the second, and so on.

Additionally, the mapping section might contain Ranger constructs, which would provide rich value creation by combining Ranger's random values with values taken from the CSV file.
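As a sketch of how the $cN references could be resolved, the following hypothetical Java snippet parses a record and substitutes column references in a mapping. The class and method names are illustrative, not part of Berserker, and the parser is deliberately naive: it ignores the quote and comment-marker options and only honors the delimiter and trim defaults.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: resolve "$cN" column references against a parsed CSV record. */
public class CsvMapping {

    /** Naive split-based parse; a real implementation would honor quote and comment-marker. */
    public static String[] parseRecord(String line, char delimiter) {
        String[] columns = line.split(String.valueOf(delimiter));
        for (int i = 0; i < columns.length; i++) {
            columns[i] = columns[i].trim(); // mirrors the 'trim: true' default
        }
        return columns;
    }

    /** Replace each "$cN" placeholder in the mapping with the N-th column of the record. */
    public static Map<String, String> map(Map<String, String> mapping, String[] record) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : mapping.entrySet()) {
            String ref = entry.getValue();
            if (ref.startsWith("$c")) {
                result.put(entry.getKey(), record[Integer.parseInt(ref.substring(2))]);
            } else {
                result.put(entry.getKey(), ref);
            }
        }
        return result;
    }
}
```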

Rate generator configuration

The following is one possible way to implement rate generator configuration.

  1. The configuration is meant to be used in a YAML configuration file.
  2. Using a utility YAML class, the configuration would be relocatable, that is, an arbitrary configuration root could be used. A JSONPath could be used for selecting the configuration root (e.g. $.rate.generator[1].options).
  3. There is a list of base rate generators:
    - impulse (interval, ratio)
    - saw-like (interval, ratio, tip ratio)
    - sine (interval)
    - random (interval, distribution)
    - limiter(cap, generator)
    - bell-shape (interval, median, variance)
    - circular-discrete (domain, interval)
    - exponential
    - logarithmic
  4. All base generators have normalized output (0.0 .. 1.0).
  5. The final/output rate is specified as an expression that combines outputs of base generators, e.g.:
     2000 + 1000*sin(interval) + 50.0*random(interval2, uniform())
  6. The syntax for specifying time intervals should support expressions like: 2d30m (two days and 30 minutes), 5s (five seconds), 24h2m20s.
  7. The configuration should provide a simplified syntax for simple cases.
  8. The configuration should allow specifying a generator time offset. As all generators accept relative time (starting at 0), the time offset parameter can be used to translate the generator function as needed.
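The interval syntax above could be parsed roughly like this. This is an illustrative sketch, not existing Berserker code; IntervalParser is a hypothetical name, and fractional or millisecond units are left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of a parser for interval expressions like "2d30m", "5s" or "24h2m20s". */
public class IntervalParser {

    private static final Pattern UNIT = Pattern.compile("(\\d+)([dhms])");

    /** Returns the interval length in seconds. */
    public static long parseSeconds(String expression) {
        Matcher matcher = UNIT.matcher(expression);
        long seconds = 0;
        while (matcher.find()) {
            long amount = Long.parseLong(matcher.group(1));
            switch (matcher.group(2)) {
                case "d": seconds += amount * 86400; break;
                case "h": seconds += amount * 3600; break;
                case "m": seconds += amount * 60; break;
                case "s": seconds += amount; break;
            }
        }
        return seconds;
    }
}
```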

Rate generator examples:

rate-gen-config:
  rate: 2000 + 1000*sin(6d30m)

rate-gen-config:
  offset: 24h
  rate:
    arate: 2000 + 1000*sin(2d2h50m10s500)
    brate: 5000*bell(7d, 2d) + 50*rnd(60s)
    output: 0.6*$arate + 0.3*$brate
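A minimal sketch of how normalized base generators could be combined into a final rate, assuming a hypothetical BaseGenerator interface (none of these names exist in Berserker; the sine generator is normalized to [0, 1] as point 4 above requires):

```java
/** Sketch: base rate generators normalized to [0, 1], combined into a final rate expression. */
public class RateExpression {

    /** Functional shape of a base generator: relative time in, normalized value out. */
    public interface BaseGenerator {
        double value(long timeMillis);
    }

    /** Sine generator normalized to [0, 1] over the given period. */
    public static BaseGenerator sine(long periodMillis) {
        return t -> (Math.sin(2 * Math.PI * t / periodMillis) + 1.0) / 2.0;
    }

    /** Example combined rate: 2000 + 1000*sin(interval), with an optional time offset. */
    public static double rate(long timeMillis, long offsetMillis, BaseGenerator sin) {
        return 2000 + 1000 * sin.value(timeMillis + offsetMillis);
    }
}
```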

Potential optimizations of load generator

These are just ideas with no clear action points for now.

  1. See if the load generator is taking too much processor time when used with the async worker.
    a. Introduce a sleep by default? For how long?
    b. Introduce an adaptive sleep if necessary? How?
  2. Try to minimize unnecessary object creation and lower the memory footprint.
    a. Currently, AsyncWorker creates a large number of WorkerMeta objects; there is no idea yet how to avoid that.

Expected response status for HTTP worker

The HTTP worker's configuration should allow specifying the expected HTTP status code of the response. This is needed to support valid responses that don't have the default response code.

Load profile

This topic, and the following discussion, is expected to help shape a more generic and robust architecture for Berserker.
The main purpose of the load generator is to generate load. The total (output) load of the load generator can be described and configured as the cumulative load of simple, elementary load generators:

Ltot = Lp1 + Lp2 + Lp3

Every elementary load generator can be described by a load profile. A load profile is a pair of a load rate (produced by a rate generator) and a data source (e.g. the Ranger data generator).

This approach should provide a way to simulate total load in a multi-tenant system where the behavior of particular tenants can vary significantly. A single load profile (or several) can be used to describe a single tenant's load.

load-profiles:
  - rate:
      generator: 2000 + 1000.05*sin(2h50m)
    data:
      type: ranger
      options:
        values:
          id: uuid()
          msg:
            id: @id
            payload: rlen(100...1000)
        output: msg

  - rate:
      offset: 24h
      generator:
        a: 2000 + 1000*bell(3d, 12h)
        b: 5000*bell(7d, 2d) + 50*random(60s)
        output: @a + @b
    data:
      type: ranger
      options:
        values:
          msg:
            id: uuid()
            payload: rlen(100...1000)
        output: msg
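The summation Ltot = Lp1 + Lp2 + Lp3 could be sketched as follows. TotalLoad is a hypothetical helper, not existing code; each profile's rate is modeled as a plain time-to-rate function.

```java
import java.util.List;
import java.util.function.LongToDoubleFunction;

/** Sketch: the total load is the sum of the elementary load profiles' rates at a given time. */
public class TotalLoad {

    /** Each profile contributes its own rate: Ltot(t) = Lp1(t) + Lp2(t) + ... */
    public static double totalRate(List<LongToDoubleFunction> profileRates, long timeMillis) {
        double total = 0.0;
        for (LongToDoubleFunction rate : profileRates) {
            total += rate.applyAsDouble(timeMillis);
        }
        return total;
    }
}
```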

Revisit and design metrics gathering

Currently, AsyncWorker supports metrics acquisition with the Consumer<WorkerMeta> workerStatsGetherer interface. That might be good for someone wanting to build custom metrics reporting around it, but it is not good enough for ranger-runner and configuration-based execution. Some implementations need to be built around it. Features to support:

  • 95th percentile
  • 99th percentile
  • 99.9th percentile
  • ... maybe have the percentiles configurable
  • if packets are dropped, the percentage of dropped packets
  • wait time, service time, response time
  • report it every second or so
  • maybe have a metrics interface to collect and calculate metrics and send them to some reporter
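For the percentile features, a simple nearest-rank calculation could serve as a starting point. This is an illustrative sketch only; a real implementation would more likely use a metrics-library reservoir or an HDR histogram.

```java
import java.util.Arrays;

/** Sketch: percentile calculation over recorded latencies using the nearest-rank method. */
public class LatencyPercentiles {

    public static long percentile(long[] latencies, double percentile) {
        long[] sorted = latencies.clone();
        Arrays.sort(sorted);
        // nearest rank: the smallest value covering the requested fraction of samples
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }
}
```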

Update README.md

Please update the main README.md with the latest information about how to run berserker and with the latest configuration file changes.

Improve documentation for configuration

Berserker's configuration options are not documented properly. Each and every configuration option should have a proper explanation.

For example, the following configuration option is not clear:

rate-generator-configuration:
  rate: 1000

rate is quite ambiguous and can have different meanings.

Enable logging of request and response

It should be possible to log the actual queries (requests) that are generated and the response for each query.

This is useful for several reasons:

  1. to (sanity) check that the generated queries are what we want
  2. to (sanity) check that the returned responses are what we want

This can be implemented in two steps:

  1. enable logging of requests and responses (without linking them, just plain logging of all requests and responses)
  2. make each response reference its request somehow, so that it is possible to link a request with its response

Create HTTP Worker

The worker should support the following:

  • base url
  • port (with default to 80)
  • global headers which will be propagated with every request (this should allow for a very primitive type of authentication)
  • per-request headers
  • method type
  • url suffix, or a complete url if the base url is not specified
  • body content
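A sketch of how the worker could combine these options into a concrete request. The class and method names here are hypothetical; only URL and header composition are shown, without the actual HTTP call.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of composing an HTTP request from the worker's configuration. */
public class HttpRequestSpec {

    /** Build the full URL from base url, port (default 80) and suffix, or use a complete url. */
    public static String resolveUrl(String baseUrl, Integer port, String suffixOrCompleteUrl) {
        if (baseUrl == null) {
            return suffixOrCompleteUrl; // a complete url was given per request
        }
        int effectivePort = port != null ? port : 80; // port defaults to 80
        return baseUrl + ":" + effectivePort + suffixOrCompleteUrl;
    }

    /** Global headers are propagated to every request; per-request headers override them. */
    public static Map<String, String> mergeHeaders(Map<String, String> global,
                                                   Map<String, String> perRequest) {
        Map<String, String> merged = new LinkedHashMap<>(global);
        merged.putAll(perRequest);
        return merged;
    }
}
```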

Support empty composite value

There are situations where an empty composite value can come in handy. For example:

data:
  user:
    firstName: random("John", "Adam", "Peter")
    lastName: random("Doe", "Jackson", "Smith")
  fields: {}
output: json($data)

The generated JSON could look like:

{ "user": { "firstName": "Peter", "lastName": "Smith" }, "fields": {} }

In this case, we want to have the fields object, we just want it to be empty.

Technical approach:

This can be easily achieved by removing the values.isEmpty() check from CompositeValue, at the line:

if (values == null || values.isEmpty()) {
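A simplified sketch of the proposed behavior with the isEmpty() check removed. This is an illustration of the idea only, not the actual CompositeValue source.

```java
import java.util.Map;

/** Sketch: only a null map is rejected; an empty map is now a valid, empty composite. */
public class CompositeValueSketch {

    public static Map<String, Object> evaluate(Map<String, Object> values) {
        // Before the fix, the guard was: if (values == null || values.isEmpty()) { ... }
        if (values == null) {
            throw new IllegalArgumentException("values must not be null");
        }
        return values; // an empty map now yields an empty composite, e.g. "fields": {}
    }
}
```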

Logs Output as Bar or Line Graph

Can we direct all the stats to a log file from which some script can produce a graph of the stats, for better output? Something like YCSB, which creates HDR files that can be loaded into the hdr-histogram plotter for visualization.

Distributed load

One possible approach to implement a load generator capable of producing high load would be to have several load generators that run in parallel and produce a combined load. Each generator produces load data for itself; that is, there is no central place that prepares data and distributes it to the load generators. However, a central coordinator is necessary to orchestrate them: to distribute configuration and load profiles, to start and stop them, and to collect metrics data (e.g. for calculating coordinated omission).

Fix dependency vulnerabilities

GitHub reports dependency vulnerabilities which should be fixed. Also, a build plugin that checks for them should be introduced in order to guarantee vulnerability-free versions.

Create possibility to generate large messages

In order to test the performance of stream processing frameworks, we need messages of 1MB and 10MB with a rate of 500, 5000 and 50000 messages per second.

With the current implementation, we can use:

  • the randomContentString() function to generate a string of predefined size; with a size of 1,000,000 characters, berserker was able to create ~60 messages per second
  • a predefined string of large size; with a 1 MB string, it took 15 minutes to start producing messages, and berserker was able to produce ~300 messages per second

Two possible approaches:

  • use the content of a predefined file as the value for the field
  • generate the string of the desired length only once and reuse it
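The second approach could be sketched as follows (LargePayload is a hypothetical helper class): the expensive string generation happens once at initialization, and every message afterwards reuses the same instance.

```java
import java.util.Random;

/** Sketch: build the large payload a single time and reuse it for every message. */
public class LargePayload {

    private static final char[] ALPHABET = "abcdefghijklmnopqrstuvwxyz".toCharArray();

    private final String payload;

    public LargePayload(int sizeInChars, long seed) {
        Random random = new Random(seed);
        StringBuilder sb = new StringBuilder(sizeInChars);
        for (int i = 0; i < sizeInChars; i++) {
            sb.append(ALPHABET[random.nextInt(ALPHABET.length)]);
        }
        this.payload = sb.toString(); // generated once, at initialization
    }

    /** Every message reuses the same pre-built string instead of regenerating it. */
    public String next() {
        return payload;
    }
}
```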

Cassandra Worker

Provide a Cassandra worker implementation.

It should support the following configuration options:

  • Consistency level (still to be decided whether it will be global or per request)
  • Bootstrap commands (create keyspace, create tables); these should execute once at worker initialization
  • Cassandra hosts (servers to connect to)
  • Async/sync execution of CQL queries
  • Design-wise, it would be better to have a list of prepared statements, where the data source would provide a map of values to inject into a prepared statement together with the id of the prepared statement to use

Errors not logged when berserker is run in async mode

Usually, while a Berserker configuration is being prototyped, errors can occur that prevent Berserker from working as expected. Error messages, in the form of exceptions in the log, are visible when Berserker is running in sync mode, but when it runs in async mode, errors are not visible.
Errors should be visible in async mode as well.

Improve output of the runner

Once the load generator is started from the command line, the output of the program stops with:

11:17:49.116 [main] INFO io.smartcat.berserker.LoadGenerator - Load generator started.

The output should be more informative. For example, it could report after every 1,000 rows created.

Take a look at cassandra-stress tool for inspiration.

Revisit metrics reservoir implementation

Currently, metrics use the default reservoir, which is ExponentiallyDecayingReservoir.
Depending on the implementation of #23, this might already be taken care of. If not, usage of SlidingTimeWindowReservoir should be considered, or, even better, a configuration option to set up any of the following:

  • ExponentiallyDecayingReservoir
  • SlidingTimeWindowReservoir
  • SlidingWindowReservoir
  • UniformReservoir

Make metrics unit configurable

Currently, metrics returned by berserker are in nanoseconds. That should be made configurable to support milliseconds and seconds as well.
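A conversion from the internal nanosecond values to a configured unit could look like this (illustrative sketch using java.util.concurrent.TimeUnit; MetricsUnit is a hypothetical name):

```java
import java.util.concurrent.TimeUnit;

/** Sketch: convert a nanosecond measurement into a configured target unit. */
public class MetricsUnit {

    public static double convert(long nanos, TimeUnit target) {
        // division keeps fractional precision, unlike TimeUnit's truncating convert()
        return nanos / (double) target.toNanos(1);
    }
}
```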

Unclear berserker metrics

I don't really understand the time unit I get when monitoring Kafka performance via BerserkerMetrics.
Furthermore, if I specify a load test with a rate generator of 300, I can clearly see that my computer cannot spawn more than 41 requests. What am I doing wrong?

How to use OAuth or a similar flow with berserker?

Some APIs have an OAuth flow where it is necessary to obtain a token for a user.
In that case, the flow would go like this:

create or authenticate user(s) -> obtain token(s) from response -> use the token(s) in all requests

This is a very common flow which hasn't been taken into consideration so far.
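A sketch of how such a flow could be modeled: authenticate once per user, cache the token, and attach it to every subsequent request. All names here are hypothetical, and the authenticator function stands in for a real token-endpoint call.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Sketch: obtain a token once per user, then reuse it in the headers of every request. */
public class TokenFlow {

    private final Function<String, String> authenticator; // user -> token (stand-in)
    private final Map<String, String> tokenCache = new LinkedHashMap<>();

    public TokenFlow(Function<String, String> authenticator) {
        this.authenticator = authenticator;
    }

    /** Authenticate on first use; all subsequent requests reuse the cached token. */
    public Map<String, String> headersFor(String user) {
        String token = tokenCache.computeIfAbsent(user, authenticator);
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Authorization", "Bearer " + token);
        return headers;
    }
}
```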
