
rally-tracks's Introduction

rally-tracks

This repository contains the default track specifications for the Elasticsearch benchmarking tool Rally.

Tracks are used to describe benchmarks in Rally. For each track, the README.md file documents the data used, explains its parameters and provides an example document.

You can also create your own track to ensure your benchmarks will be as realistic as possible.

Versioning Scheme

Refer to the official Rally docs for more details.

How to Contribute

If you want to contribute a track, please ensure that it works against the main version of Elasticsearch (i.e. submit PRs against the master branch). We can then check whether it's feasible to backport the track to earlier Elasticsearch versions.

See all details in the contributor guidelines.

Backporting changes

If you are a contributor with direct commit access to this repository then please backport your changes. This ensures that tracks work not only with the latest main version of Elasticsearch but also with older versions. Apply backports with cherry-picks. Below you can find a walkthrough:

Assume we've pushed commit a7e0937 to master and want to backport it. This is a change to the noaa track. Let's check what branches are available for backporting:

daniel@io:tracks/default ‹master›$ git branch -r
  origin/1
  origin/2
  origin/5
  origin/HEAD -> origin/master
  origin/master

We'll go backwards starting from branch 5, then branch 2 and finally branch 1. After applying a change, we will test whether the track works as is for an older version of Elasticsearch.

git checkout 5
git cherry-pick a7e0937

# test the change now with an Elasticsearch 5.x distribution
esrally race --track=noaa --distribution-version=5.4.3 --test-mode

# push the change
git push origin 5

This particular track uses features that are only available in Elasticsearch 5 and later, so we will stop here, but in general the process continues until we've reached the earliest branch.

Sometimes it is necessary to remove individual operations from a track because they are not supported by earlier versions. This graceful fallback is a compromise that allows running a subset of the track on older versions of Elasticsearch too. If this is necessary, it's best to make these changes in a separate commit, as sketched below. Also, don't forget to cherry-pick this separate commit to even earlier versions if necessary.
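That flow could look like this. This is purely illustrative: the commit hash b3f1c20 and the commit message are made up, and which operations to remove depends on the track.

git checkout 2
git cherry-pick a7e0937
# edit the track to drop the unsupported operations, then commit separately
git commit -am "Remove operations unsupported before 5.0"  # say this yields b3f1c20
git push origin 2

# on even earlier branches, cherry-pick both commits
git checkout 1
git cherry-pick a7e0937 b3f1c20
git push origin 1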

License

There is no single license for this repository. Licenses are chosen per track. They are typically licensed under the same terms as the source data. See the README files of each track for more details.

rally-tracks's People

Contributors

1stvamp, afoucret, b-deam, benwtrent, cavokz, craigtaverner, danielmitterdorfer, dliappis, ebadyano, favilo, gbanasiak, gizas, imotov, inqueue, javanna, jimczi, jpountz, jtibshirani, kyungeunni, luegg, martijnvg, matriv, maxhniebergall, mayya-sharipova, michaelbaamonde, nik9000, piergm, pmpailis, pquentin, salvatore-campagna


rally-tracks's Issues

Add challenges for index sorting

Index sorting is a new feature in Elasticsearch 6.x. It might be nice to benchmark some of the default tracks to measure the cost in terms of indexing throughput.
We're only interested in indexing throughput at the moment since we don't have any search/aggs feature that takes advantage of index sorting.
The default tracks are maybe not the best fit for index sorting in terms of what we can achieve with it, but they offer a variety of use cases that would allow us to measure the cost of this feature with different types of data.

Some indicators are empty

Hi,
here is my esrally output:

[Rally ASCII art banner]
Lap Metric Operation Value Unit
All Indexing time 33.2486 min
All Merge time 7.03988 min
All Refresh time 1.28523 min
All Flush time 0.0920167 min
All Merge throttle time 0.526933 min
All Median CPU usage 468.176 %
All Total Young Gen GC 15.326 s
All Total Old Gen GC 1.184 s
All Heap used for doc values 0.0941429 MB
All Heap used for terms 13.1741 MB
All Heap used for norms 0.0684814 MB
All Heap used for points 0.579043 MB
All Heap used for stored fields 0.624725 MB
All Segment count 89
All Min Throughput index-append
All Median Throughput index-append
All Max Throughput index-append
All Min Throughput force-merge
All Median Throughput force-merge
All Max Throughput force-merge
All Min Throughput index-stats
All Median Throughput index-stats
All Max Throughput index-stats
All 50.0th percentile latency index-stats 1.5946 ms
All 90.0th percentile latency index-stats 1.67269 ms
All 99.0th percentile latency index-stats 2.5369 ms
All 100.0th percentile latency index-stats 29.9786 ms
All 50.0th percentile service time index-stats 1.53625 ms
All 90.0th percentile service time index-stats 1.60196 ms
All 99.0th percentile service time index-stats 1.92153 ms
All 100.0th percentile service time index-stats 11.7068 ms
All Min Throughput node-stats
All Median Throughput node-stats
All Max Throughput node-stats
All 50.0th percentile latency node-stats 1.62292 ms
All 90.0th percentile latency node-stats 1.80209 ms
All 99.0th percentile latency node-stats 4.67785 ms
All 100.0th percentile latency node-stats 13.2262 ms
All 50.0th percentile service time node-stats 1.55853 ms
All 90.0th percentile service time node-stats 1.74619 ms
All 99.0th percentile service time node-stats 2.48444 ms
All 100.0th percentile service time node-stats 13.168 ms
All Min Throughput default ops/s
All Median Throughput default ops/s
All Max Throughput default ops/s
All Min Throughput term
All Median Throughput term
All Max Throughput term
All Min Throughput phrase 200.104
All Median Throughput phrase 200.123
All Max Throughput phrase 200.143
All Min Throughput country_agg_uncached 6.16675
All Median Throughput country_agg_uncached 6.26778
All Max Throughput country_agg_uncached 6.28171
All Min Throughput country_agg_cached 100.07
All Median Throughput country_agg_cached 100.126
All Max Throughput country_agg_cached 100.182
All Min Throughput scroll 56.8402
All Median Throughput scroll 56.8896
All Max Throughput scroll 57.0091
All Min Throughput expression 3.82189
All Median Throughput expression 3.83776
All Max Throughput expression 3.86058
All Min Throughput painless_static 2.50521
All Median Throughput painless_static 2.51079
All Max Throughput painless_static 2.51662
All Min Throughput painless_dynamic 2.23677
All Median Throughput painless_dynamic 2.24671
All Max Throughput painless_dynamic 2.26263

Why are these empty?

| All | Min Throughput | default | | ops/s |
| All | Median Throughput | default | | ops/s |
| All | Max Throughput | default | | ops/s |

Inclusion of total size metrics in the README for each dataset

When deciding what sized node is needed to hold a given dataset, it'd be helpful to see the size of each dataset in README.md. This could also help select a given track that would fit on a small node if disk size was a limiting factor.

I see the information contained in the given track.json, but in tracks like logging, you need to sum up a number of different distinct values to arrive at a total.
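As a stopgap, the totals can be computed from the track file itself. Here is a minimal sketch, assuming the track.json parses as plain JSON (some tracks use Jinja templating and would need rendering first):

import json

with open("logging/track.json") as f:
    track = json.load(f)

compressed = sum(doc.get("compressed-bytes", 0)
                 for corpus in track.get("corpora", [])
                 for doc in corpus.get("documents", []))
uncompressed = sum(doc.get("uncompressed-bytes", 0)
                   for corpus in track.get("corpora", [])
                   for doc in corpus.get("documents", []))
print(f"download: {compressed / 1024**3:.1f} GiB, on disk: {uncompressed / 1024**3:.1f} GiB")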

sparse fields track

We should consider adding a track with very sparse fields, comparing how doc_values behave over time. This is particularly interesting once we move to lucene 7.

[screenshot: indexing rate over time for the three runs, 2017-04-20 1:08 PM]

The run from 14:30-16:30 is with doc_values: false,
the one from 16:30-17:30 is with doc_values: true,
and the very last run is with doc values disabled for all fields.

The data has around 2200 fields in total, split across 30 types. The type with the most documents has 200-300 fields.
The decrease in performance between the first two runs is significant, around 30-40%.
Furthermore, the indexing rate keeps slowing down as more data gets indexed, which does not happen as much when doc_values are disabled (see run 1).

This test was run on the following hardware:
14 cores
4 SSDs, multiple data paths
30GB heap size

The cluster was CPU bound during all runs.

The merges can't keep up in the second run, and "indexing throttled" messages were showing up in the logs.

Remove size() method from parameter sources

With elastic/rally#763 we changed the parameter source API, and in #83 we changed this track to be compatible with that change. However, we left the old size() method in place for backwards compatibility. After a grace period of six months we should remove this method from all parameter sources.

Due on March 10, 2020
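For reference, a minimal sketch of the pattern in question. The class name and params are hypothetical; the shape follows the parameter source API after elastic/rally#763, where size() was superseded by the infinite / percent_completed properties:

class QueryParamSource:
    def __init__(self, track, params, **kwargs):
        self._params = params
        # newer API: progress is signalled via infinite / percent_completed
        self.infinite = True

    @property
    def percent_completed(self):
        return None  # not applicable for infinite parameter sources

    def size(self):
        # deprecated backwards-compatibility shim, slated for removal
        return None

    def partition(self, partition_index, total_partitions):
        return self

    def params(self):
        return self._params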

Add a Wikipedia track

The Wikimedia foundation provides dumps under a permissive license. As this would make for a good full-text dataset, we should consider adding it.

TermQueryParamSource and SortedTermQueryParamSource Backwards

I was looking at your nested benchmarks and noticed that the term query was slower than both the nested and the sorted term query. This did not make sense to me, so when I dug into the test I found that TermQueryParamSource and SortedTermQueryParamSource are mixed up: TermQueryParamSource has a sort in the query and SortedTermQueryParamSource has no sort. I just want to confirm that this was not intentional and is a bug in the benchmark.

https://github.com/elastic/rally-tracks/blob/master/nested/track.py#L27
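To illustrate the suspected swap (class names as in nested/track.py; the query bodies and field names below are placeholders, not the track's actual queries):

class TermQueryParamSource:
    # despite the name, this one emits a term query *with* a sort
    def params(self):
        return {"body": {"query": {"term": {"user": "u123"}},
                         "sort": [{"timestamp": "desc"}]}}

class SortedTermQueryParamSource:
    # despite the name, this one emits a plain, unsorted term query
    def params(self):
        return {"body": {"query": {"term": {"user": "u123"}}}}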

http_logs add force merge to 1 segment (#90)

Hi,

Can anyone please comment on whether the force merge works correctly? It takes close to 8,000 seconds for this operation.

The entire http_logs track now executes in 12,200 seconds, up from 3,400 earlier. In a few of my runs I got 5,700 seconds.

My median indexing throughput is about 298,000 docs/second.

Thanks
Kailas

Add total size of payload to track info when listing tracks

I used to use the tiny track to do quick/simple rally testing because I knew it was a small track to both download and store.

With the other tracks, you have no idea of the size required on disk or the network impact (size and time) without going into the track file and adding up all the payload file sizes, or without running the track once and inspecting the resulting payload files on disk after they're retrieved.

It would be neat to show a total size indicator when listing tracks with rally.

This might even be easy to add to the end of the description, but it would be nicer if it were calculated from the track file, where the payload sizes are specified.

Error reporting using my own defined test data

[2018-12-18T10:16:49,317][DEBUG][o.e.a.b.TransportShardBulkAction] [enadmin][4] failed to execute bulk item (index) BulkShardRequest [[enadmin][4]] containing [471] requests
org.elasticsearch.index.mapper.MapperParsingException: failed to parse
at org.elasticsearch.index.mapper.DocumentParser.wrapInMapperParsingException(DocumentParser.java:171) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:72) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:261) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:700) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:677) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:658) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.lambda$executeIndexRequestOnPrimary$2(TransportShardBulkAction.java:553) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.executeOnPrimaryWhileHandlingMappingUpdates(TransportShardBulkAction.java:572) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.executeIndexRequestOnPrimary(TransportShardBulkAction.java:551) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.executeIndexRequest(TransportShardBulkAction.java:142) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:248) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:125) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:112) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:74) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:1018) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:996) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:103) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:357) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:297) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:959) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:956) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:270) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:237) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:2213) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryShardReference(TransportReplicationAction.java:968) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction.access$500(TransportReplicationAction.java:98) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:318) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:293) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:280) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:246) [x-pack-security-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:304) [x-pack-security-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:656) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_172]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_172]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
Caused by: com.fasterxml.jackson.core.JsonParseException: Unexpected character ('-' (code 45)): was expecting comma to separate Object entries
at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@3a17e043; line: 1, column: 37]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1702) ~[jackson-core-2.8.10.jar:2.8.10]
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:558) ~[jackson-core-2.8.10.jar:2.8.10]
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:456) ~[jackson-core-2.8.10.jar:2.8.10]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:761) ~[jackson-core-2.8.10.jar:2.8.10]
at org.elasticsearch.common.xcontent.json.JsonXContentParser.nextToken(JsonXContentParser.java:53) ~[elasticsearch-x-content-6.3.0.jar:6.3.0]
at org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:405) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:380) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:95) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:69) ~[elasticsearch-6.3.0.jar:6.3.0]
... 38 more

I used the same index-append challenge as in geonames and only changed the data. I don't know what went wrong.
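The JsonParseException above usually means that a line in the custom data file is not a self-contained JSON object. Rally corpora are newline-delimited JSON, so a quick sanity check could look like this (a sketch; pass the uncompressed data file as the first argument):

import json
import sys

with open(sys.argv[1]) as f:
    for lineno, line in enumerate(f, start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError as e:
            print(f"line {lineno}: {e}")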

Add a Twitter track

Twitter analytics platforms are one of the most popular ELK use cases. I think we should consider adding one.

Remove explicit type definition in percolator queries

The type definition on match queries has been deprecated in Elasticsearch 6:

[2017-02-09T14:35:27,735][WARN ][o.e.d.c.ParseField       ] Deprecated field [type] used, replaced by [match_phrase and match_phrase_prefix query]

The cause of these deprecation warnings is percolator queries like this:

{
   "query": {
      "match": {
         "body": {
            "query": "yard bushes",
            "type": "boolean"
         }
      }
   }
}

We need to remove the type from all queries and will provide a new data file.
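The fixed query would then look like this (the same example as above with the deprecated type removed; match_phrase would be used where phrase semantics were intended):

{
   "query": {
      "match": {
         "body": {
            "query": "yard bushes"
         }
      }
   }
}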

Improve Geopoint Track

We discussed the need to improve the geo point track to make it more representative of the new capabilities in geo. We should revisit:

  1. The dataset. Are there better datasets available that we could exploit? The current dataset has very small documents and limited fields (geo point only). Better datasets might allow us to benchmark other geo point features in conjunction with non-geo queries.

  2. Queries. They are currently limited to a few fixed queries (bbox, polygon, distance and distanceRange) with limited diversity.

Finally, we should be cognisant of which queries are most commonly used in solutions, e.g. pew pew maps in security.

@dliappis @danielmitterdorfer

Big difference between the two measurements. Is esrally cheating?

Problems:
I ran esrally twice (each run only performs "index-append"), but the results are very different.

Steps:
1. Create the index geonames manually.
2. First test:
esrally race --pipeline=benchmark-only --target-hosts=28.28.0.4:9200 --track=geonames
The result:
|Min Throughput | index-append | 144447 | docs/s |
|Median Throughput | index-append | 209693 | docs/s |
|Max Throughput | index-append | 225118 | docs/s |

3. Delete the index geonames manually.
4. Create the index geonames manually again.
5. Second test:
esrally race --pipeline=benchmark-only --target-hosts=28.28.0.4:9200 --track=geonames
The result:
|Min Throughput | index-append | 228746 | docs/s |
|Median Throughput | index-append | 256881 | docs/s |
|Max Throughput | index-append | 263886 | docs/s |

The throughput of the two measurements is very different, by about 50,000 docs/s. I repeated the test many times, and this result keeps appearing.

Why?

The error rate is 100%

I use the geonames track with an ES cluster, and the race result is:
| All | error rate | index-append | 100 | % |
| All | Min Throughput | force-merge | 16.61 | ops/s |

The index-append failed. I am not sure of the reason; it may be the ES cluster or esrally.

Index coordinates as geopoints in geonames track

Note: migrated from the Rally repo as tracks have now their own repository.

We currently index coordinates as doubles but should really index them as geo points. This affects the mapping and also the data file (which is somewhat problematic since we obviously don't want to force users to re-download the data file every time).

Use the default store type instead of assuming one

We currently allow passing the index store type as a track parameter for some tracks and default to hybridfs if nothing is passed. However, this makes some assumptions, e.g. that memory mapping is allowed (it can be turned off explicitly with node.store.allow_mmap). There is a better default though: we can simply use Elasticsearch's own default, which is fs.
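A hypothetical sketch of what the templated index settings could look like with the safer default (the parameter name store_type is illustrative):

"settings": {
   "index.store.type": "{{ store_type | default('fs') }}"
}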

Remove fast-no-conflicts challenges

These challenges are present mostly for historic reasons and don't provide much value anymore, hence we'll remove them. As we have new index sorting benchmarks (added in #22) that use the fast-no-conflicts challenges as a baseline, we'll also need to adapt them to use a different baseline:

The new baselines for the sorted benchmarks will be:

  • geonames: append-fast-no-conflicts
  • pmc: append-no-conflicts-index-only
  • logging: append-no-conflicts-index-only

The other tracks either don't have index sorting benchmarks or they already use a "non-fast" baseline.

200s-in-range query latency is unstable

In the nightly benchmarks we've observed that latency is unstable for the query 200s-in-range:

[chart: nightly-basic-http_logs-4g-200s-in-range latency]

When we inspect latency over time in a single benchmark, we see that it is consistently rising:

[chart: 200s-in-range latency over time within a single benchmark]

Service time in a single benchmark is around 26ms:

[chart: nightly-basic-http_logs-4g-200s-in-range service time]

This suggests that the target throughput is too high for that configuration. We propose lowering it so that one request is issued every 30 ms, i.e. the target throughput should drop to 1 / 30 ms ≈ 33 ops/s.

esrally benchmarks always download geopointshape track data to a geopointshapes folder

Running an esrally benchmark always downloads the geopointshape track data to a geopointshapes folder, even after running ./download.sh geopointshape to store the track data locally. Note that the corpus name in track.json (geopointshapes) differs from the track name (geopointshape):

    {
      "name": "geopointshapes",
      "base-url": "http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/geopointshape",
      "documents": [
        {
          "source-file": "documents.json.bz2",
          "document-count": 60844404,
          "compressed-bytes": 493367095,
          "uncompressed-bytes": 2780550484
        }
      ]
    }

Remove async backwards-compatibility layer

With #97 we introduced a compatibility layer that allows using synchronous runners with older Rally versions (prior to Rally 1.5.0). After a grace period we should remove the compatibility layer and only allow async runners.
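For reference, a minimal sketch of an async-only custom runner under the newer API (the operation itself is illustrative):

async def force_merge(es, params):
    # runners are coroutines as of Rally 1.5.0; no synchronous fallback
    await es.indices.forcemerge(index=params.get("index", "_all"))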

Improve description of all challenges

Currently, it is not clear what each challenge does without looking at the track.json in detail. We should improve that by adding a better description for each challenge.

Ensure geonames corpora doesn't have negative population

We have discovered that the field population, which is used in a Math.log10() calculation, has negative values in a number of docs. Math.log10() of a negative number returns NaN, so such documents can break the calculation.

We should fix this problem. I observed the following lines in documents-2.json:

line 4826766: {"geonameid": 7576740, "name": "East Spit", "asciiname": "East Spit", "feature_class": "H", "feature_code": "RFC", "country_code": "KI", "admin1_code": "01", "admin2_code": "KU", "population": -12, "dem": "-9999", "timezone": "Pacific/Tarawa", "location": [173.47575, 0.21458]}
line 4826831: {"geonameid": 7576827, "name": "NW Shoals", "asciiname": "NW Shoals", "feature_class": "H", "feature_code": "RFC", "country_code": "KI", "admin1_code": "01", "admin2_code": "MI", "population": -3, "dem": "-9999", "timezone": "Pacific/Tarawa", "location": [172.96914, 1.01841]}
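A quick way to locate the offending documents, assuming the corpus is available uncompressed:

grep -n '"population": -' documents-2.json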

percolator track fails for ES 1.x and 2.x

Steps to reproduce:

Run esrally --pipeline=from-distribution --distribution-version=2.0.0 --track=percolator

Expected vs. actual behavior

Actual: In the "search" phase, an exception is thrown:

elasticsearch.exceptions.RequestError: TransportError(400, 'search_phase_execution_exception', 'No query registered for [percolate]')

The reason is that ES 1.x and ES 2.x use the percolator API whereas ES 5.x uses the percolator query. Rally currently only supports queries, not calling arbitrary APIs. Once Rally supports these APIs (elastic/rally#105), we can also support the percolator track on older ES versions.

Replace interval parameter with fixed_interval / calendar_interval

In our nightly benchmarks against master we see:

[2019-09-25T09:31:56,181][WARN ][o.e.d.s.a.b.h.DateHistogramAggregationBuilder] [rally-node-0] [interval] on [date_histogram] is deprecated, use [fixed_interval] or [calendar_interval] in the future.

Grepping for interval in rally-tracks, this probably affects:

  • http_logs
  • nyc_taxis
  • nested
  • pmc

We should analyze each query and change it accordingly.
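As an example of the change (the field name is illustrative), a monthly date_histogram would go from

"date_histogram": { "field": "@timestamp", "interval": "month" }

to the explicit calendar variant

"date_histogram": { "field": "@timestamp", "calendar_interval": "month" }

with fixed_interval used instead for fixed-length spans such as "30d".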
