smartcat-labs / berserker

Berserker is a load generator with a pluggable input source and configurable output.
License: Apache License 2.0
All examples should work with no extra effort on Java 8 - 12.
Currently, the CSV data source is supported only through Java code. That solution is not good enough, since the CSV data source cannot be used from a configuration file. This is a proposal for how a CSV data source could be configured:
```yaml
data-source-configuration:
  parser:
    file: # mandatory
    delimiter: # optional, defaults to ','
    record-separator: # optional, defaults to '\n'
    trim: # optional, defaults to true
    quote: # optional, defaults to null; usually " or ', but can be any character
    comment-marker: # optional, defaults to '#'; a line beginning with it is ignored
    skip-header-record: # optional, defaults to false; can skip the first row
    ignore-empty-lines: # optional, defaults to true
    null-string: # optional, char sequence which will be mapped to null; defaults to not set
  mapping:
    user:
      firstName: $c0
      lastName: $c1
      age: $c2
  output: $user
```
The `parser` section would contain parameters relevant for parsing the CSV file.
The `output` section would contain the final output value of this data source, similar to output in the Ranger data source.
The `mapping` section would contain a custom structure definition where the values that can be referenced are the columns of the CSV file: `$c0` represents the first column, `$c1` represents the second column, and so on.
Additionally, the `mapping` section might contain Ranger constructs, which would provide rich value creation by combining Ranger's random values with values taken from the CSV file.
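To illustrate the proposed mapping, here is a minimal, hypothetical sketch (stdlib only, no quoting or escaping support; class and method names are invented, not Berserker's actual implementation) of how `$c0`-style column references could be resolved against a parsed CSV record:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CsvMappingSketch {

    // Parses one CSV record with the default ',' delimiter.
    // A real implementation would also honor quote, trim, comment-marker, etc.
    static String[] parseRecord(String line) {
        return line.split(",", -1);
    }

    // Resolves a mapping of field name -> "$cN" column reference against
    // a parsed record, mimicking the proposed `mapping` section.
    static Map<String, String> applyMapping(Map<String, String> mapping, String[] record) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : mapping.entrySet()) {
            int column = Integer.parseInt(e.getValue().substring(2)); // "$c0" -> 0
            result.put(e.getKey(), record[column]);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("firstName", "$c0");
        mapping.put("lastName", "$c1");
        mapping.put("age", "$c2");
        Map<String, String> user = applyMapping(mapping, parseRecord("John,Doe,32"));
        System.out.println(user); // {firstName=John, lastName=Doe, age=32}
    }
}
```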
Test HTTP worker performance with the current Apache HTTP client library, and also with the one Gatling is using (https://github.com/AsyncHttpClient/async-http-client). Choose the faster one.
The link for downloading the Berserker Runner JAR (near the bottom of README.md) needs to be updated, as it leads to a non-existent file on Bintray.
The following is one possible way to implement rate generator configuration.
```
2000 + 1000*sin(interval) + 50.0*random(interval2, uniform())
```

Rate generator examples:

```yaml
rate-gen-config:
  rate: 2000 + 1000*sin(6d30m)
```

```yaml
rate-gen-config:
  offset: 24h
  rate:
    arate: 2000 + 1000*sin(2d2h50m10s500)
    brate: 5000*bell(7d, 2d) + 50*rnd(60s)
  output: 0.6*$arate + 0.3*$brate
```
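A hedged sketch (stdlib Java; the function signature and period handling are assumptions for illustration, not Berserker's actual expression engine) of how a sinusoidal rate such as `2000 + 1000*sin(6d30m)` could be evaluated over time:

```java
public class SinusoidalRateSketch {

    // Rate at time t (seconds), oscillating around a base rate with the
    // given amplitude over the given period (seconds).
    static double rate(double baseRate, double amplitude, double periodSeconds, double tSeconds) {
        return baseRate + amplitude * Math.sin(2 * Math.PI * tSeconds / periodSeconds);
    }

    public static void main(String[] args) {
        double period = 6 * 86400 + 30 * 60; // "6d30m" expressed in seconds
        // At t = 0 the sine term is zero, so the rate equals the base rate.
        System.out.println(rate(2000, 1000, period, 0));          // 2000.0
        // A quarter period later, the rate peaks at base + amplitude.
        System.out.println(rate(2000, 1000, period, period / 4)); // ~3000.0
    }
}
```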
These are just ideas with no clear action points for now.
The HTTP worker's configuration should allow specifying an HTTP status code for the response. This is needed to support valid responses that don't have the default response code.
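A hypothetical configuration fragment (the key name `expected-response-code` is invented here for illustration, not an existing Berserker option):

```yaml
worker-configuration:
  expected-response-code: 201 # treat 201 Created as a successful response
```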
This topic, and the following discussion, is expected to help shape a more generic and robust architecture for Berserker.
The main purpose of the load generator is to generate load. The total (output) load of the load generator can be described and configured as the cumulative load of simple, elementary load generators:

Ltot = Lp1 + Lp2 + Lp3

Every elementary load generator can be described by a load profile. A load profile is a pair of a load rate (produced by a rate generator) and a data source (e.g. the Ranger data generator).
This approach should provide a way to simulate total load in a multi-tenant system, where the behavior of particular tenants can vary significantly. A single load profile (or several of them) can be used to describe the load of a single tenant.
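The cumulative-load idea above can be sketched in a few lines of stdlib Java (the use of `DoubleUnaryOperator` as a stand-in for a rate generator is an invention for illustration):

```java
import java.util.List;
import java.util.function.DoubleUnaryOperator;

public class CumulativeLoadSketch {

    // Total load at time t is the sum of the elementary loads:
    // Ltot(t) = Lp1(t) + Lp2(t) + ...
    static double totalRate(List<DoubleUnaryOperator> profiles, double t) {
        return profiles.stream().mapToDouble(p -> p.applyAsDouble(t)).sum();
    }

    public static void main(String[] args) {
        DoubleUnaryOperator tenantA = t -> 2000 + 1000 * Math.sin(t); // sinusoidal tenant
        DoubleUnaryOperator tenantB = t -> 500;                       // constant tenant
        System.out.println(totalRate(List.of(tenantA, tenantB), 0)); // 2500.0
    }
}
```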
```yaml
load-profiles:
  - rate:
      generator: 2000 + 1000.05*sin(2h50m)
    data:
      type: ranger
      options:
        values:
          id: uuid()
          msg:
            id: @id
            payload: rlen(100...1000)
        output: msg
  - rate:
      offset: 24h
      generator:
        a: 2000 + 1000*bell(3d, 12h)
        b: 5000*bell(7d, 2d) + 50*random(60s)
      output: @a + @b
    data:
      type: ranger
      options:
        values:
          msg:
            id: uuid()
            payload: rlen(100...1000)
        output: msg
```
The current HTTP worker is asynchronous, which prevents it from measuring the response time of HTTP endpoints. The worker should support both async and sync modes.
Currently, AsyncWorker supports metrics acquisition through the Consumer<WorkerMeta> workerStatsGetherer interface. That might be good for someone who wants to build custom metrics reporting around it, but it is not good enough for ranger-runner and configuration-based execution. Some implementations need to be built around it. Features to support:
- 95th percentile
- 99th percentile
- 99.9th percentile
- ... maybe make the set of percentiles configurable
- if packets are dropped, the percentage of dropped packets
- wait time, service time, response time
- report every second or so
- maybe introduce a metrics interface to collect and calculate metrics and send them to some reporter
Please update the main README.md with the latest information about how to run Berserker and with the latest configuration file changes.
Berserker's configuration options are not documented properly. Each and every configuration option should have a proper explanation.
For example, the following configuration option is not clear:

```yaml
rate-generator-configuration:
  rate: 1000
```

`rate` is quite ambiguous and can have different meanings.
To retain the Dropwizard Metrics API and the ability to use its open source reporters, HdrHistogram needs to be integrated through gauges. Create several gauges, one for each characteristic percentile (90%, 95%, 99%, 99.9%, 99.99%, 100%), and register them within a MetricRegistry.
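As a stdlib-only sketch of that pattern (the real implementation would back the computation with HdrHistogram and register com.codahale.metrics.Gauge instances in a MetricRegistry; the names and the plain map used as a registry below are invented for illustration):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

public class PercentileGaugesSketch {

    // Returns the value at the given percentile of the sorted samples
    // (nearest-rank method; HdrHistogram does this with bounded error).
    static long valueAtPercentile(long[] sortedSamples, double percentile) {
        int rank = (int) Math.ceil(percentile / 100.0 * sortedSamples.length);
        return sortedSamples[Math.max(rank - 1, 0)];
    }

    public static void main(String[] args) {
        long[] latencies = new long[1000];
        for (int i = 0; i < latencies.length; i++) latencies[i] = i + 1; // 1..1000
        Arrays.sort(latencies);

        // One "gauge" per characteristic percentile, keyed by metric name,
        // mirroring registry.register(name, gauge) in Dropwizard Metrics.
        Map<String, LongSupplier> gauges = new LinkedHashMap<>();
        for (double p : new double[] {90, 95, 99, 99.9, 99.99, 100}) {
            gauges.put("latency.p" + p, () -> valueAtPercentile(latencies, p));
        }
        gauges.forEach((name, g) -> System.out.println(name + " = " + g.getAsLong()));
    }
}
```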
Is there a YAML configuration which can be used to push newly generated load to Kafka or Cassandra without using a data source?
The HTTP worker's configuration should support specifying an explicit timeout value.
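A hypothetical configuration fragment (the key name `request-timeout` is invented here, not an existing Berserker option):

```yaml
worker-configuration:
  request-timeout: 5s # fail the request if no response arrives within 5 seconds
```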
It should be possible to log the actual queries (requests) that are generated and the response for each query.
This is useful for several reasons:
This can be implemented in two steps:
In the configuration package, all custom Exception and Configuration classes are placed in the same directory. We should create two sub-packages, one for Exception classes and one for Configuration classes.
Worker should support the following:
There are situations where an empty composite value can come in handy. For example:

```yaml
data:
  user:
    firstName: random("John", "Adam", "Peter")
    lastName: random("Doe", "Jackson", "Smith")
  fields: {}
output: json($data)
```

Generated JSON could look like:

```json
{
  "user": {
    "firstName": "Peter",
    "lastName": "Smith"
  },
  "fields": {}
}
```

In this case, we want to have the `fields` object, we just want it to be empty.
This can be easily achieved by removing the `values.isEmpty()` check from CompositeValue at the line `if (values == null || values.isEmpty()) {`.
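A hedged stdlib sketch of the proposed change (the real class is Berserker's CompositeValue; this toy JSON renderer only illustrates why dropping the `values.isEmpty()` part of the guard lets an empty map render as `{}` instead of being rejected):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class EmptyCompositeSketch {

    // Renders a flat map of string values as a JSON object.
    // Guarding only against null (not isEmpty()) lets {} through.
    static String toJson(Map<String, String> values) {
        if (values == null) { // previously: values == null || values.isEmpty()
            throw new IllegalArgumentException("values must not be null");
        }
        return values.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":\"" + e.getValue() + "\"")
                .collect(Collectors.joining(",", "{", "}"));
    }

    public static void main(String[] args) {
        System.out.println(toJson(new LinkedHashMap<>())); // {}
        Map<String, String> user = new LinkedHashMap<>();
        user.put("firstName", "Peter");
        System.out.println(toJson(user)); // {"firstName":"Peter"}
    }
}
```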
Can we direct all the stats to a log file on which some script can produce a graph for better output? Something like YCSB, which creates HDR files that can be loaded into an hdr-histogram plotter for visualization.
One possible approach to implementing a load generator capable of producing high load would be to have several load generators that run in parallel and produce a combined load. Each generator produces load data for itself; that is, there is no central place that prepares data and distributes it to the load generators. However, a central coordinator is necessary to orchestrate them: to distribute configuration and load profiles, to start and stop them, and to collect metrics data (e.g. calculating coordinated omission).
GitHub reports dependency vulnerabilities which should be fixed. Also, a build plugin that checks for them should be introduced in order to guarantee a vulnerability-free version.
In order to test the performance of stream processing frameworks, we need messages of 1 MB and 10 MB at rates of 500, 5000 and 50000 messages per second.
With the current implementation, we can use:
Two possible approaches:
Provide a Cassandra worker implementation.
It should support the following configuration options:
Usually when a Berserker configuration is being prototyped, errors can occur that render Berserker unable to work as expected. Error messages, in the form of exceptions in the log, are visible when Berserker is running in sync mode, but when running in async mode, errors are not visible.
Errors should be visible in async mode as well.
Configuration sections for the data source, rate generator and worker can be fairly complex and big.
It might be good to have an option to keep them in separate files and just reference those files from the main configuration.
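A hypothetical fragment showing the idea (the `!include` file-reference syntax is invented here, not an existing Berserker feature):

```yaml
data-source-configuration: !include data-source.yml
rate-generator-configuration: !include rate-generator.yml
worker-configuration: !include worker.yml
```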
Once the load generator is started from the command line, the output of the program stops with:

11:17:49.116 [main] INFO io.smartcat.berserker.LoadGenerator - Load generator started.

The output should be more informative. For example, it could report on every 1k rows created.
Take a look at the cassandra-stress tool for inspiration.
Currently, only StringSerializer is supported; Avro and/or other serializers should be supported as well.
I need a sample YAML for a CSV-to-Kafka load and a bit of description of the configuration.
Currently, metrics use the default reservoir, which is ExponentiallyDecayingReservoir.
Depending on the implementation of #23, this might already be taken care of. If not, usage of SlidingTimeWindowReservoir should be considered, or, even better, a configuration option to set up any of the following:
Currently, metrics returned by Berserker are in nanoseconds. That should be made configurable to support milliseconds and seconds as well.
I don't really understand the time unit I get when monitoring Kafka performance with Berserker metrics.
Furthermore, if I specify a load test with a rate generator of 300, I can clearly see that my computer cannot spawn more than 41 requests. What am I doing wrong?
Some APIs have an OAuth flow where it is necessary to obtain a token for a user.
In that case the flow would go like this:
create or authenticate user(s) -> obtain token(s) from the response -> use the token(s) in all requests
This is a very common flow which hasn't been taken into consideration so far.
Create a configuration option for Berserker to support a time limit when generating messages, e.g. "generate messages for 15 minutes" or "generate messages until 3:20 PM".
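A hypothetical fragment (the key names `duration` and `run-until` are invented for illustration):

```yaml
load-generator-configuration:
  duration: 15m        # stop after 15 minutes
  # run-until: "15:20" # alternatively, stop at a wall-clock time
```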