
rally-tracks's Introduction

rally-tracks

This repository contains the default track specifications for the Elasticsearch benchmarking tool Rally.

Tracks are used to describe benchmarks in Rally. For each track, the README.md file documents the data used, explains its parameters and provides an example document.

You can also create your own track to ensure your benchmarks will be as realistic as possible.

Versioning Scheme

Refer to the official Rally docs for more details.

How to Contribute

If you want to contribute a track, please ensure that it works against the main version of Elasticsearch (i.e. submit PRs against the master branch). We can then check whether it's feasible to backport the track to earlier Elasticsearch versions.

See all details in the contributor guidelines.

Backporting changes

If you are a contributor with direct commit access to this repository then please backport your changes. This ensures that tracks work not only with the latest main version of Elasticsearch but also with older versions. Apply backports with cherry-picks. Below you can find a walkthrough:

Assume we've pushed commit a7e0937 to master and want to backport it. This is a change to the noaa track. Let's check what branches are available for backporting:

daniel@io:tracks/default ‹master›$ git branch -r
  origin/1
  origin/2
  origin/5
  origin/HEAD -> origin/master
  origin/master

We'll go backwards starting from branch 5, then branch 2 and finally branch 1. After applying a change, we will test whether the track works as is for an older version of Elasticsearch.

git checkout 5
git cherry-pick a7e0937

# test the change now with an Elasticsearch 5.x distribution
esrally race --track=noaa --distribution-version=5.4.3 --test-mode

# push the change
git push origin 5

This particular track uses features that are only available in Elasticsearch 5 and later, so we will stop here, but in general the process continues until we've reached the earliest branch.

Sometimes it is necessary to remove individual operations from a track because they are not supported by earlier versions. This graceful fallback is a compromise that allows running a subset of the track on older versions of Elasticsearch too. If this is necessary, it's best to make these changes in a separate commit, as sketched below. Also, don't forget to cherry-pick this separate commit to even earlier versions if necessary.
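That flow could look like this. This is purely illustrative: the commit hash b3f1c20 and the commit message are made up, and which operations to remove depends on the track.

git checkout 2
git cherry-pick a7e0937
# edit the track to drop the unsupported operations, then commit separately
git commit -am "Remove operations unsupported before 5.0"  # say this yields b3f1c20
git push origin 2

# on even earlier branches, cherry-pick both commits
git checkout 1
git cherry-pick a7e0937 b3f1c20
git push origin 1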

License

There is no single license for this repository. Licenses are chosen per track. They are typically licensed under the same terms as the source data. See the README files of each track for more details.

rally-tracks's People

Contributors

1stvamp, afoucret, b-deam, benwtrent, cavokz, craigtaverner, danielmitterdorfer, dliappis, ebadyano, favilo, gbanasiak, gizas, imotov, inqueue, javanna, jimczi, jpountz, jtibshirani, kyungeunni, luegg, martijnvg, matriv, maxhniebergall, mayya-sharipova, michaelbaamonde, nik9000, piergm, pmpailis, pquentin, salvatore-campagna


rally-tracks's Issues

Add challenges for index sorting

Index sorting is a new feature in Elasticsearch 6.x. It might be nice to benchmark some of the default tracks to measure the cost in terms of indexing throughput.
We're only interested in indexing throughput at the moment since we don't have any search/aggs feature that takes advantage of index sorting.
The default tracks are maybe not the best fit for index sorting in terms of what we can achieve with it, but they offer a variety of use cases that would allow us to measure the cost of this feature with different types of data.

Some indicators are empty

Hi,
here is my esrally output:

[Rally ASCII art banner]
Lap Metric Operation Value Unit
All Indexing time 33.2486 min
All Merge time 7.03988 min
All Refresh time 1.28523 min
All Flush time 0.0920167 min
All Merge throttle time 0.526933 min
All Median CPU usage 468.176 %
All Total Young Gen GC 15.326 s
All Total Old Gen GC 1.184 s
All Heap used for doc values 0.0941429 MB
All Heap used for terms 13.1741 MB
All Heap used for norms 0.0684814 MB
All Heap used for points 0.579043 MB
All Heap used for stored fields 0.624725 MB
All Segment count 89
All Min Throughput index-append
All Median Throughput index-append
All Max Throughput index-append
All Min Throughput force-merge
All Median Throughput force-merge
All Max Throughput force-merge
All Min Throughput index-stats
All Median Throughput index-stats
All Max Throughput index-stats
All 50.0th percentile latency index-stats 1.5946 ms
All 90.0th percentile latency index-stats 1.67269 ms
All 99.0th percentile latency index-stats 2.5369 ms
All 100.0th percentile latency index-stats 29.9786 ms
All 50.0th percentile service time index-stats 1.53625 ms
All 90.0th percentile service time index-stats 1.60196 ms
All 99.0th percentile service time index-stats 1.92153 ms
All 100.0th percentile service time index-stats 11.7068 ms
All Min Throughput node-stats
All Median Throughput node-stats
All Max Throughput node-stats
All 50.0th percentile latency node-stats 1.62292 ms
All 90.0th percentile latency node-stats 1.80209 ms
All 99.0th percentile latency node-stats 4.67785 ms
All 100.0th percentile latency node-stats 13.2262 ms
All 50.0th percentile service time node-stats 1.55853 ms
All 90.0th percentile service time node-stats 1.74619 ms
All 99.0th percentile service time node-stats 2.48444 ms
All 100.0th percentile service time node-stats 13.168 ms
All Min Throughput default ops/s
All Median Throughput default ops/s
All Max Throughput default ops/s
All Min Throughput term
All Median Throughput term
All Max Throughput term
All Min Throughput phrase 200.104
All Median Throughput phrase 200.123
All Max Throughput phrase 200.143
All Min Throughput country_agg_uncached 6.16675
All Median Throughput country_agg_uncached 6.26778
All Max Throughput country_agg_uncached 6.28171
All Min Throughput country_agg_cached 100.07
All Median Throughput country_agg_cached 100.126
All Max Throughput country_agg_cached 100.182
All Min Throughput scroll 56.8402
All Median Throughput scroll 56.8896
All Max Throughput scroll 57.0091
All Min Throughput expression 3.82189
All Median Throughput expression 3.83776
All Max Throughput expression 3.86058
All Min Throughput painless_static 2.50521
All Median Throughput painless_static 2.51079
All Max Throughput painless_static 2.51662
All Min Throughput painless_dynamic 2.23677
All Median Throughput painless_dynamic 2.24671
All Max Throughput painless_dynamic 2.26263

Why are these empty?

| All | Min Throughput | default | | ops/s |
| All | Median Throughput | default | | ops/s |
| All | Max Throughput | default | | ops/s |

Inclusion of total size metrics in the README for each dataset

When deciding what sized node is needed to hold a given dataset, it'd be helpful to see the size of each dataset in README.md. This could also help select a given track that would fit on a small node if disk size was a limiting factor.

I see the information contained in the given track.json, but in tracks like logging, you need to sum up a number of different distinct values to arrive at a total.
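As a stopgap, the totals can be computed from the track file itself. Here is a minimal sketch, assuming the track.json parses as plain JSON (some tracks use Jinja templating and would need rendering first):

import json

with open("logging/track.json") as f:
    track = json.load(f)

compressed = sum(doc.get("compressed-bytes", 0)
                 for corpus in track.get("corpora", [])
                 for doc in corpus.get("documents", []))
uncompressed = sum(doc.get("uncompressed-bytes", 0)
                   for corpus in track.get("corpora", [])
                   for doc in corpus.get("documents", []))
print(f"download: {compressed / 1024**3:.1f} GiB, on disk: {uncompressed / 1024**3:.1f} GiB")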

sparse fields track

We should consider adding a track with very sparse fields, comparing how doc_values behave over time. This is particularly interesting once we move to lucene 7.

[screenshot: indexing rate over time for the three runs, 2017-04-20 1:08 PM]

The run from 14:30-16:30 is with doc_values: false,
the one from 16:30-17:30 is with doc_values: true,
and the very last run is with doc values disabled for all fields.

The data has around 2200 fields in total, split across 30 types. The type with the most documents has 200-300 fields.
The decrease in performance between the first two runs is significant, around 30-40%.
Furthermore, the indexing rate keeps slowing down as more data gets indexed, which does not happen as much when doc_values are disabled (see run 1).

This test was run on the following hardware:
14 cores
4 SSDs, multiple data paths
30GB heap size

The cluster was CPU bound during all runs.

The merges can't keep up in the second run, and "indexing throttled" messages were showing up in the logs.

Remove size() method from parameter sources

With elastic/rally#763 we changed the parameter source API, and in #83 we changed this track to be compatible with that change. However, we left the old size() method in place for backwards compatibility. After a grace period of six months we should remove this method from all parameter sources.

Due on March 10, 2020
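For reference, a minimal sketch of the pattern in question. The class name and params are hypothetical; the shape follows the parameter source API after elastic/rally#763, where size() was superseded by the infinite / percent_completed properties:

class QueryParamSource:
    def __init__(self, track, params, **kwargs):
        self._params = params
        # newer API: progress is signalled via infinite / percent_completed
        self.infinite = True

    @property
    def percent_completed(self):
        return None  # not applicable for infinite parameter sources

    def size(self):
        # deprecated backwards-compatibility shim, slated for removal
        return None

    def partition(self, partition_index, total_partitions):
        return self

    def params(self):
        return self._params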

Add a Wikipedia track

The Wikimedia foundation provides dumps under a permissive license. As this would make for a good full-text dataset, we should consider adding it.

TermQueryParamSource and SortedTermQueryParamSource Backwards

I was looking at your nested benchmarks and noticed that the term query was slower than both the nested and the sorted term query. This did not make sense to me, so when I dug into the test I found that TermQueryParamSource and SortedTermQueryParamSource are mixed up: TermQueryParamSource has a sort in the query and SortedTermQueryParamSource has no sort. I just want to confirm that this was not intentional and is a bug in the benchmark.

https://github.com/elastic/rally-tracks/blob/master/nested/track.py#L27
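To illustrate the suspected swap (class names as in nested/track.py; the query bodies and field names below are placeholders, not the track's actual queries):

class TermQueryParamSource:
    # despite the name, this one emits a term query *with* a sort
    def params(self):
        return {"body": {"query": {"term": {"user": "u123"}},
                         "sort": [{"timestamp": "desc"}]}}

class SortedTermQueryParamSource:
    # despite the name, this one emits a plain, unsorted term query
    def params(self):
        return {"body": {"query": {"term": {"user": "u123"}}}}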

http_logs add force merge to 1 segment (#90)

Hi,

Can anyone please comment on whether the force merge works correctly? It takes close to 8,000 seconds for this operation.

The entire http_logs track now executes in 12,200 seconds, up from 3,400 earlier. In a few of my runs I got 5,700 seconds.

My median indexing throughput is about 298,000 docs/second.

Thanks
Kailas

Add total size of payload to track info when listing tracks

I used to use the tiny track to do quick/simple rally testing because I knew it was a small track to both download and store.

With the other tracks, you have no idea of the size required on disk or the network impact (size and time) without going into the track file and adding up all the payload file sizes, or without running the track once and inspecting the resulting payload files on disk after they're retrieved.

It would be neat to show a total size indicator when listing tracks with rally.

This might even be easy to add to the end of the description, but it would be nicer if it were calculated from the track file, where the payload sizes are specified.

Error reporting using my own defined test data

[2018-12-18T10:16:49,317][DEBUG][o.e.a.b.TransportShardBulkAction] [enadmin][4] failed to execute bulk item (index) BulkShardRequest [[enadmin][4]] containing [471] requests
org.elasticsearch.index.mapper.MapperParsingException: failed to parse
at org.elasticsearch.index.mapper.DocumentParser.wrapInMapperParsingException(DocumentParser.java:171) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:72) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:261) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:700) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:677) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:658) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.lambda$executeIndexRequestOnPrimary$2(TransportShardBulkAction.java:553) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.executeOnPrimaryWhileHandlingMappingUpdates(TransportShardBulkAction.java:572) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.executeIndexRequestOnPrimary(TransportShardBulkAction.java:551) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.executeIndexRequest(TransportShardBulkAction.java:142) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:248) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:125) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:112) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:74) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:1018) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:996) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:103) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:357) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:297) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:959) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:956) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:270) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:237) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:2213) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryShardReference(TransportReplicationAction.java:968) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction.access$500(TransportReplicationAction.java:98) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:318) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:293) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:280) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:246) [x-pack-security-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:304) [x-pack-security-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:656) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_172]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_172]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
Caused by: com.fasterxml.jackson.core.JsonParseException: Unexpected character ('-' (code 45)): was expecting comma to separate Object entries
at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@3a17e043; line: 1, column: 37]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1702) ~[jackson-core-2.8.10.jar:2.8.10]
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:558) ~[jackson-core-2.8.10.jar:2.8.10]
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:456) ~[jackson-core-2.8.10.jar:2.8.10]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:761) ~[jackson-core-2.8.10.jar:2.8.10]
at org.elasticsearch.common.xcontent.json.JsonXContentParser.nextToken(JsonXContentParser.java:53) ~[elasticsearch-x-content-6.3.0.jar:6.3.0]
at org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:405) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:380) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:95) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:69) ~[elasticsearch-6.3.0.jar:6.3.0]
... 38 more

I used the same index-append challenge as in geonames and only changed the data. I don't know what went wrong.
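The JsonParseException above usually means that a line in the custom data file is not a self-contained JSON object. Rally corpora are newline-delimited JSON, so a quick sanity check could look like this (a sketch; pass the uncompressed data file as the first argument):

import json
import sys

with open(sys.argv[1]) as f:
    for lineno, line in enumerate(f, start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError as e:
            print(f"line {lineno}: {e}")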

Add a Twitter track

Twitter analytics platforms are one of the most popular ELK use cases. I think we should consider adding one.

Remove explicit type definition in percolator queries

The type definition on match queries has been deprecated in Elasticsearch 6:

[2017-02-09T14:35:27,735][WARN ][o.e.d.c.ParseField       ] Deprecated field [type] used, replaced by [match_phrase and match_phrase_prefix query]

The cause of these deprecation warnings is percolator queries like this:

{
   "query": {
      "match": {
         "body": {
            "query": "yard bushes",
            "type": "boolean"
         }
      }
   }
}

We need to remove the type from all queries and will provide a new data file.
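The fixed query would then look like this (the same example as above with the deprecated type removed; match_phrase would be used where phrase semantics were intended):

{
   "query": {
      "match": {
         "body": {
            "query": "yard bushes"
         }
      }
   }
}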

Improve Geopoint Track

We discussed the need to improve the geo point track to make it more representative of the new capabilities in geo. We should revisit:

  1. The dataset. Are there better datasets available that we could exploit? The current dataset has very small documents and limited fields (geo point only). Better datasets might allow us to benchmark other geo point features in conjunction with non-geo queries.

  2. Queries. They are currently limited to a few fixed queries (bbox, polygon, distance and distanceRange) with limited diversity.

Finally, we should be cognisant of which queries are most commonly used in solutions, e.g. pew pew maps in security.

@dliappis @danielmitterdorfer

Big difference between the two measurements. Is esrally cheating?

Problems:
I ran esrally twice (each run only performs "index-append"), but the results are very different.

Steps:
1. Create the index geonames manually.
2. First test:
esrally race --pipeline=benchmark-only --target-hosts=28.28.0.4:9200 --track=geonames
The result:
|Min Throughput | index-append | 144447 | docs/s |
|Median Throughput | index-append | 209693 | docs/s |
|Max Throughput | index-append | 225118 | docs/s |

3. Delete the index geonames manually.
4. Create the index geonames manually again.
5. Second test:
esrally race --pipeline=benchmark-only --target-hosts=28.28.0.4:9200 --track=geonames
The result:
|Min Throughput | index-append | 228746 | docs/s |
|Median Throughput | index-append | 256881 | docs/s |
|Max Throughput | index-append | 263886 | docs/s |

The throughput of the two measurements is very different, by about 50,000 docs/s. I repeated the test many times, and this result keeps appearing.

Why?

The error rate is 100%

I use the geonames track with an ES cluster, and the race result is:
| All | error rate | index-append | 100 | % |
| All | Min Throughput | force-merge | 16.61 | ops/s |

The index-append failed. I am not sure of the reason; it may be the ES cluster or esrally.

Index coordinates as geopoints in geonames track

Note: migrated from the Rally repo as tracks have now their own repository.

We currently index coordinates as doubles but should really index them as geo points. This affects the mapping and also the data file (which is somewhat problematic since we obviously don't want to force users to re-download the data file every time).

Use the default store type instead of assuming one

We currently allow passing the index store type as a track parameter for some tracks and default to hybridfs if nothing is passed. However, this makes some assumptions, e.g. that memory mapping is allowed (it can be turned off explicitly with node.store.allow_mmap). There is a better default though: we can simply use Elasticsearch's own default, which is fs.
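A hypothetical sketch of what the templated index settings could look like with the safer default (the parameter name store_type is illustrative):

"settings": {
   "index.store.type": "{{ store_type | default('fs') }}"
}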

Remove fast-no-conflicts challenges

These challenges are present mostly for historic reasons and don't provide much value anymore, hence we'll remove them. As we have new index sorting benchmarks (added in #22) that use the fast-no-conflicts challenges as a baseline, we'll also need to adapt them to use a different baseline:

The new baselines for the sorted benchmarks will be:

  • geonames: append-fast-no-conflicts
  • pmc: append-no-conflicts-index-only
  • logging: append-no-conflicts-index-only

The other tracks either don't have index sorting benchmarks or they already use a "non-fast" baseline.

200s-in-range query latency is unstable

In the nightly benchmarks we've observed that latency is unstable for the query 200s-in-range:

[chart: nightly-basic-http_logs-4g-200s-in-range latency]

When we inspect latency over time in a single benchmark, we see that it is consistently rising:

[chart: 200s-in-range latency over time within a single benchmark]

Service time in a single benchmark is around 26ms:

[chart: nightly-basic-http_logs-4g-200s-in-range service time]

This suggests that the target throughput is too high for that configuration. We propose lowering it so that one request is issued every 30 ms, i.e. the target throughput should drop to 1 / 30 ms ≈ 33 ops/s.

esrally benchmarks always download geopointshape track data to a geopointshapes folder

Running an esrally benchmark always downloads the geopointshape track data to a geopointshapes folder, even after running ./download.sh geopointshape to store the track data locally. Note that the corpus name in track.json (geopointshapes) differs from the track name (geopointshape):

    {
      "name": "geopointshapes",
      "base-url": "http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/geopointshape",
      "documents": [
        {
          "source-file": "documents.json.bz2",
          "document-count": 60844404,
          "compressed-bytes": 493367095,
          "uncompressed-bytes": 2780550484
        }
      ]
    }

Remove async backwards-compatibility layer

With #97 we introduced a compatibility layer that allows using synchronous runners with older Rally versions (prior to Rally 1.5.0). After a grace period we should remove the compatibility layer and only allow async runners.
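For reference, a minimal sketch of an async-only custom runner under the newer API (the operation itself is illustrative):

async def force_merge(es, params):
    # runners are coroutines as of Rally 1.5.0; no synchronous fallback
    await es.indices.forcemerge(index=params.get("index", "_all"))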

Improve description of all challenges

Currently, it is not clear what each challenge does without looking at the track.json in detail. We should improve that by adding a better description for each challenge.

Ensure geonames corpora doesn't have negative population

We have discovered that the field population, which is used in a Math.log10() calculation, has negative values in a number of docs. Math.log10() of a negative number returns NaN, so such documents can break the calculation.

We should fix this problem. I observed the following lines in documents-2.json:

line 4826766: {"geonameid": 7576740, "name": "East Spit", "asciiname": "East Spit", "feature_class": "H", "feature_code": "RFC", "country_code": "KI", "admin1_code": "01", "admin2_code": "KU", "population": -12, "dem": "-9999", "timezone": "Pacific/Tarawa", "location": [173.47575, 0.21458]}
line 4826831: {"geonameid": 7576827, "name": "NW Shoals", "asciiname": "NW Shoals", "feature_class": "H", "feature_code": "RFC", "country_code": "KI", "admin1_code": "01", "admin2_code": "MI", "population": -3, "dem": "-9999", "timezone": "Pacific/Tarawa", "location": [172.96914, 1.01841]}
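A quick way to locate the offending documents, assuming the corpus is available uncompressed:

grep -n '"population": -' documents-2.json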

percolator track fails for ES 1.x and 2.x

Steps to reproduce:

Run esrally --pipeline=from-distribution --distribution-version=2.0.0 --track=percolator

Expected vs. actual behavior

Actual: In the "search" phase, an exception is thrown:

elasticsearch.exceptions.RequestError: TransportError(400, 'search_phase_execution_exception', 'No query registered for [percolate]')

The reason is that ES 1.x and ES 2.x use the percolator API whereas ES 5.x uses the percolator query. Rally currently only supports queries, not calling arbitrary APIs. Once Rally supports these APIs (elastic/rally#105), we can also support the percolator track on older ES versions.

Replace interval parameter with fixed_interval / calendar_interval

In our nightly benchmarks against master we see:

[2019-09-25T09:31:56,181][WARN ][o.e.d.s.a.b.h.DateHistogramAggregationBuilder] [rally-node-0] [interval] on [date_histogram] is deprecated, use [fixed_interval] or [calendar_interval] in the future.

Grepping for interval in rally-tracks, this probably affects:

  • http_logs
  • nyc_taxis
  • nested
  • pmc

We should analyze each query and change it accordingly.
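As an example of the change (the field name is illustrative), a monthly date_histogram would go from

"date_histogram": { "field": "@timestamp", "interval": "month" }

to the explicit calendar variant

"date_histogram": { "field": "@timestamp", "calendar_interval": "month" }

with fixed_interval used instead for fixed-length spans such as "30d".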
