Comments (15)
Thanks for reporting this, let me look into it and fix it.
from elasticsearch-test-data.
I was looking at the script, but not being a Python/Tornado expert, I could not make out how the command line arguments are parsed and their values assigned when the various actions are called. I did not find any line where index_name or index_type are assigned and passed to the index create function.
Thanks
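To answer the question above: the script uses Tornado's built-in option parser rather than explicit assignments, which is why there's no obvious line wiring --index_name to the index-creation call. Below is a minimal sketch of that pattern (not the script's exact code; the option names and defaults are taken from the --help output shown later in this thread, and tornado must be installed):

```python
# Minimal sketch of the tornado.options pattern es_test_data.py follows.
# define() registers an option; parse_command_line() fills in `options`
# from argv-style arguments (args[0] is the program name, like sys.argv).
from tornado.options import define, options, parse_command_line

define("index_name", type=str, default="test_data",
       help="Name of the index to store your messages")
define("index_type", type=str, default="test_type", help="Type")

parse_command_line(["es_test_data.py", "--index_name=test123"])

# Anywhere in the script, options.index_name now holds the parsed value,
# so functions like the index-create helper just read it from `options`.
print(options.index_name, options.index_type)
```

So there is no explicit assignment to find: every function simply reads `tornado.options.options.<name>` after parsing.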
Seeing this --help output:
oliver:elasticsearch-test-data $ ./es_test_data.py --help
Usage: ./es_test_data.py [OPTIONS]
Options:
--help show this help information
es_test_data.py options:
--batch_size Elasticsearch bulk index batch size (default
1000)
--count Number of docs to generate (default 10000)
--dict_file Name of dictionary file to use
--es_url URL of your Elasticsearch node (default
http://localhost:9200/)
--force_init_index Force deleting and re-initializing the
Elasticsearch index (default False)
--format message format (default
name:str,age:int,last_updated:ts)
--id_type Type of 'id' to use for the docs, valid
settings are int and uuid4, None is default
--index_name Name of the index to store your messages
(default test_data)
--index_type Type (default test_type)
--num_of_replicas Number of replicas for ES index (default 0)
--num_of_shards Number of shards for ES index (default 2)
--out_file If set, write test data to out_file as well.
(default False)
--set_refresh Set refresh rate to -1 before starting the
upload (default False)
In particular, it says you should use --index_name to provide the name of the index you want to populate.
Just did a few test runs and set the index value to a couple of different things and it worked as expected so closing this. Feel free to re-open or comment if you still have issues.
PS: ran it like this ./es_test_data.py --es_url=http://127.0.0.1:9200 --count=100 --index_name=test123
Thanks. It looks like a single typo in one option caused all settings to be silently ignored, without any error. I managed to fix that, but under sustained load generation the script hits the following error and crashes. It happens very quickly when the document size is large (> 2 KB) and the batch size is > 1000. Any tuning suggestions or config changes required to avoid crashes midway?
File "/usr/lib64/python2.6/site-packages/tornado/ioloop.py", line 453, in run_sync
return future_cell[0].result()
File "/usr/lib64/python2.6/site-packages/tornado/concurrent.py", line 232, in result
raise_exc_info(self._exc_info)
File "/usr/lib64/python2.6/site-packages/tornado/gen.py", line 1014, in run
yielded = self.gen.throw(_exc_info)
File "/JMETER/elasticsearch-test-data-master/es_test_data.py", line 198, in generate_test_data
yield upload_batch(upload_data_txt)
File "/usr/lib64/python2.6/site-packages/tornado/gen.py", line 1008, in run
value = future.result()
File "/usr/lib64/python2.6/site-packages/tornado/concurrent.py", line 232, in result
raise_exc_info(self._exc_info)
File "/usr/lib64/python2.6/site-packages/tornado/gen.py", line 1014, in run
yielded = self.gen.throw(_exc_info)
response = yield async_http_client.fetch(request)
File "/usr/lib64/python2.6/site-packages/tornado/gen.py", line 1008, in run
value = future.result()
File "/usr/lib64/python2.6/site-packages/tornado/concurrent.py", line 232, in result
raise_exc_info(self._exc_info)
What's the error? Looks like you're missing a few lines from the stack trace. And anything useful in the ES node logs?
Here is the full error on the Python node. The ES nodes show nothing in their error logs, and the ES cluster is in a green state. I tried increasing the timeout value in the script from 240 to 600, but with no luck.
Thanks
[I 151130 15:47:38 es_test_data:181] Generating 100000 docs, upload batch size is 5000
Traceback (most recent call last):
File "/JMETER/elasticsearch-test-data-master/es_test_data.py", line 235, in <module>
tornado.ioloop.IOLoop.instance().run_sync(generate_test_data)
File "/usr/lib64/python2.6/site-packages/tornado/ioloop.py", line 453, in run_sync
return future_cell[0].result()
File "/usr/lib64/python2.6/site-packages/tornado/concurrent.py", line 232, in result
raise_exc_info(self._exc_info)
File "/usr/lib64/python2.6/site-packages/tornado/gen.py", line 1014, in run
yielded = self.gen.throw(_exc_info)
File "/JMETER/elasticsearch-test-data-master/es_test_data.py", line 198, in generate_test_data
yield upload_batch(upload_data_txt)
File "/usr/lib64/python2.6/site-packages/tornado/gen.py", line 1008, in run
value = future.result()
File "/usr/lib64/python2.6/site-packages/tornado/concurrent.py", line 232, in result
raise_exc_info(self._exc_info)
File "/usr/lib64/python2.6/site-packages/tornado/gen.py", line 1014, in run
yielded = self.gen.throw(_exc_info)
File "/JMETER/elasticsearch-test-data-master/es_test_data.py", line 53, in upload_batch
response = yield async_http_client.fetch(request)
File "/usr/lib64/python2.6/site-packages/tornado/gen.py", line 1008, in run
value = future.result()
File "/usr/lib64/python2.6/site-packages/tornado/concurrent.py", line 232, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
tornado.httpclient.HTTPError: HTTP 599: Timeout
What's the URL of your ES server? Do you need to override the default setting (http://localhost:9200)?
Can you post the command you ran, and whether e.g. the index creation succeeded?
I am using a proxy to connect to the ES client nodes. Neither the proxy nor the client nodes show any bottleneck in terms of sockets, files, CPU, or memory. I am running the Python script from multiple nodes in parallel, but each inserts into a different "type" of the index.
Here is the starting output.
[root@cosjmeter elasticsearch-test-data-master]# [I 151130 15:47:38 es_test_data:40] Trying to create index http://haproxy:80/test_insert
[I 151130 15:47:38 es_test_data:45] Guess the index exists already
[I 151130 15:47:38 es_test_data:181] Generating 100000 docs, upload batch size is 5000
[I 151130 15:47:38 es_test_data:40] Trying to create index http://haproxy:80/test_insert
[I 151130 15:47:38 es_test_data:40] Trying to create index http://haproxy:80/test_insert
Does it ever successfully upload a batch? You could try lowering the batch size.
Does a curl to haproxy:80 work? Maybe try one process at a time only.
And why does it show the line "Trying to create index" three times? Are there still old, stale processes running?
Batch size 500 ran OK. The connection to haproxy works well from all nodes. The multiple lines are because the same script is launched on multiple nodes via ssh from a loop on the controller node. The command issued is as below.
ssh cosjmeter`printf %02d ${i}` "nohup python /JMETER/elasticsearch-test-data-master/es_test_data.py --es_url=http://haproxy:80 --index_name=test_insert --index_type=${HOST} --num_of_shards=${SHARDS} --num_of_replicas=${REPLICAS} --count=${DOC_COUNT} --batch_size=${BATCH_SIZE} --format=name:str:1:30,time:ts,recno:int:0:1000000,field1:str:512:513,field2:str:512:513,field3:str:512:513,field4:words:25:100,num1:int:100:10000 num2:int:20000:200000" &
Cool, it looks like your ES node can't handle the larger batches, but it sounds like the smaller batch size setting addressed that.
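Some rough arithmetic (my own sketch, not from the script) shows why large documents combined with large batches are the problem: a single _bulk request balloons into tens of megabytes, which a busy node may not answer within a short timeout.

```python
def bulk_payload_mb(batch_size, avg_doc_bytes, overhead_bytes=60):
    """Rough size of one _bulk request: each doc plus an assumed
    ~60-byte action/metadata line per doc (illustrative estimate)."""
    return batch_size * (avg_doc_bytes + overhead_bytes) / 1024 / 1024

# ~2 KB docs at batch size 5000 -> roughly a 10 MB request per node,
# while batch size 500 keeps each request around 1 MB.
big = bulk_payload_mb(5000, 2048)
small = bulk_payload_mb(500, 2048)
print(round(big, 1), round(small, 1))
```

With several nodes uploading in parallel through one haproxy, the cluster has to absorb that payload multiplied by the number of writers, so either smaller batches or a longer timeout is needed.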
The cause of the exit seems to be the low timeout in the line below. I increased the timeout and now a batch size of 1000 runs OK.
request = tornado.httpclient.HTTPRequest(tornado.options.options.es_url + "/_bulk", method="POST", body=upload_data_txt, request_timeout=3)
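For reference, the fix is just raising `request_timeout` (in seconds) on that request. A minimal sketch, assuming tornado is installed; the URL here is a stand-in for `options.es_url + "/_bulk"`, and 600 matches the value tried earlier in this thread:

```python
import tornado.httpclient

upload_data_txt = ""  # the bulk body the script builds elsewhere

# Same request as in the script, but with a larger request_timeout
# so big _bulk payloads don't trigger HTTP 599 before ES responds.
request = tornado.httpclient.HTTPRequest(
    "http://localhost:9200/_bulk",  # stand-in for options.es_url + "/_bulk"
    method="POST",
    body=upload_data_txt,
    request_timeout=600)
print(request.request_timeout)
```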
And these are observed sample upload times with batch size 5000.
[I 151201 09:44:03 es_test_data:60] Upload: OK - upload took: 1350ms, total docs uploaded: 140000
[I 151201 09:44:05 es_test_data:60] Upload: OK - upload took: 6147ms, total docs uploaded: 145000
[I 151201 09:44:05 es_test_data:60] Upload: OK - upload took: 6050ms, total docs uploaded: 145000
[I 151201 09:44:05 es_test_data:60] Upload: OK - upload took: 5726ms, total docs uploaded: 145000
[I 151201 09:44:05 es_test_data:60] Upload: OK - upload took: 3147ms, total docs uploaded: 145000
[I 151201 09:44:05 es_test_data:60] Upload: OK - upload took: 4563ms, total docs uploaded: 145000
[I 151201 09:44:16 es_test_data:60] Upload: OK - upload took: 3193ms, total docs uploaded: 160000
[I 151201 09:44:17 es_test_data:60] Upload: OK - upload took: 4081ms, total docs uploaded: 165000
[I 151201 09:44:17 es_test_data:60] Upload: OK - upload took: 3929ms, total docs uploaded: 160000
[I 151201 09:44:16 es_test_data:60] Upload: OK - upload took: 4389ms, total docs uploaded: 165000
[I 151201 09:44:23 es_test_data:60] Upload: OK - upload took: 5294ms, total docs uploaded: 150000
[I 151201 09:44:24 es_test_data:60] Upload: OK - upload took: 5211ms, total docs uploaded: 150000
[I 151201 09:44:23 es_test_data:60] Upload: OK - upload took: 3973ms, total docs uploaded: 150000
[I 151201 09:44:24 es_test_data:60] Upload: OK - upload took: 5016ms, total docs uploaded: 150000
Looks good!