Comments (15)
Thanks for reporting this, let me look into it and fix it.
from elasticsearch-test-data.
I was looking at the script, but not being a Python/Tornado expert, I could not make out how the command line arguments are parsed and their values assigned when the various actions are called. I did not find any line where index_name or index_type are assigned and passed to the index create function.
Thanks
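To answer the question above: the script uses Tornado's built-in option parser rather than explicit assignments, which is why there's no obvious line wiring --index_name to the index-creation call. Below is a minimal sketch of that pattern (not the script's exact code; the option names and defaults are taken from the --help output shown later in this thread, and tornado must be installed):

```python
# Minimal sketch of the tornado.options pattern es_test_data.py follows.
# define() registers an option; parse_command_line() fills in `options`
# from argv-style arguments (args[0] is the program name, like sys.argv).
from tornado.options import define, options, parse_command_line

define("index_name", type=str, default="test_data",
       help="Name of the index to store your messages")
define("index_type", type=str, default="test_type", help="Type")

parse_command_line(["es_test_data.py", "--index_name=test123"])

# Anywhere in the script, options.index_name now holds the parsed value,
# so functions like the index-create helper just read it from `options`.
print(options.index_name, options.index_type)
```

So there is no explicit assignment to find: every function simply reads `tornado.options.options.<name>` after parsing.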
Seeing this --help output:
oliver:elasticsearch-test-data $ ./es_test_data.py --help
Usage: ./es_test_data.py [OPTIONS]
Options:
--help show this help information
es_test_data.py options:
--batch_size Elasticsearch bulk index batch size (default
1000)
--count Number of docs to generate (default 10000)
--dict_file Name of dictionary file to use
--es_url URL of your Elasticsearch node (default
http://localhost:9200/)
--force_init_index Force deleting and re-initializing the
Elasticsearch index (default False)
--format message format (default
name:str,age:int,last_updated:ts)
--id_type Type of 'id' to use for the docs, valid
settings are int and uuid4, None is default
--index_name Name of the index to store your messages
(default test_data)
--index_type Type (default test_type)
--num_of_replicas Number of replicas for ES index (default 0)
--num_of_shards Number of shards for ES index (default 2)
--out_file If set, write test data to out_file as well.
(default False)
--set_refresh Set refresh rate to -1 before starting the
upload (default False)
In particular, it says you should use --index_name to provide the name of the index you want to populate.
Just did a few test runs and set the index value to a couple of different things and it worked as expected so closing this. Feel free to re-open or comment if you still have issues.
PS: ran it like this ./es_test_data.py --es_url=http://127.0.0.1:9200 --count=100 --index_name=test123
Thanks. It looks like a single typo in one option caused all settings to be silently ignored, without any error. I managed to fix that, but under sustained load generation the script hits the following error and crashes. It happens very quickly when the document size is large (> 2 KB) and the batch size is > 1000. Any tuning suggestions or config changes required to avoid crashes midway?
File "/usr/lib64/python2.6/site-packages/tornado/ioloop.py", line 453, in run_sync
return future_cell[0].result()
File "/usr/lib64/python2.6/site-packages/tornado/concurrent.py", line 232, in result
raise_exc_info(self._exc_info)
File "/usr/lib64/python2.6/site-packages/tornado/gen.py", line 1014, in run
yielded = self.gen.throw(_exc_info)
File "/JMETER/elasticsearch-test-data-master/es_test_data.py", line 198, in generate_test_data
yield upload_batch(upload_data_txt)
File "/usr/lib64/python2.6/site-packages/tornado/gen.py", line 1008, in run
value = future.result()
File "/usr/lib64/python2.6/site-packages/tornado/concurrent.py", line 232, in result
raise_exc_info(self._exc_info)
File "/usr/lib64/python2.6/site-packages/tornado/gen.py", line 1014, in run
yielded = self.gen.throw(_exc_info)
response = yield async_http_client.fetch(request)
File "/usr/lib64/python2.6/site-packages/tornado/gen.py", line 1008, in run
value = future.result()
File "/usr/lib64/python2.6/site-packages/tornado/concurrent.py", line 232, in result
raise_exc_info(self._exc_info)
What's the error? Looks like you're missing a few lines from the stack trace. And anything useful in the ES node logs?
Here is the full error on the Python node. The ES nodes show nothing in their error logs, and the ES cluster is in a green state. I tried increasing the timeout value in the script from 240 to 600, but with no luck.
Thanks
[I 151130 15:47:38 es_test_data:181] Generating 100000 docs, upload batch size is 5000
Traceback (most recent call last):
File "/JMETER/elasticsearch-test-data-master/es_test_data.py", line 235, in <module>
tornado.ioloop.IOLoop.instance().run_sync(generate_test_data)
File "/usr/lib64/python2.6/site-packages/tornado/ioloop.py", line 453, in run_sync
return future_cell[0].result()
File "/usr/lib64/python2.6/site-packages/tornado/concurrent.py", line 232, in result
raise_exc_info(self._exc_info)
File "/usr/lib64/python2.6/site-packages/tornado/gen.py", line 1014, in run
yielded = self.gen.throw(_exc_info)
File "/JMETER/elasticsearch-test-data-master/es_test_data.py", line 198, in generate_test_data
yield upload_batch(upload_data_txt)
File "/usr/lib64/python2.6/site-packages/tornado/gen.py", line 1008, in run
value = future.result()
File "/usr/lib64/python2.6/site-packages/tornado/concurrent.py", line 232, in result
raise_exc_info(self._exc_info)
File "/usr/lib64/python2.6/site-packages/tornado/gen.py", line 1014, in run
yielded = self.gen.throw(_exc_info)
File "/JMETER/elasticsearch-test-data-master/es_test_data.py", line 53, in upload_batch
response = yield async_http_client.fetch(request)
File "/usr/lib64/python2.6/site-packages/tornado/gen.py", line 1008, in run
value = future.result()
File "/usr/lib64/python2.6/site-packages/tornado/concurrent.py", line 232, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
tornado.httpclient.HTTPError: HTTP 599: Timeout
What's the URL of your ES server? Do you need to override the default setting (http://localhost:9200)?
Can you post the command you ran, and whether e.g. the index creation succeeded?
I am using a proxy to connect to the ES client nodes. Neither the proxy nor the client nodes show any bottleneck in terms of sockets, files, CPU, or memory. I am running the Python script from multiple nodes in parallel, but each inserts into a different "type" of the index.
Here is the starting output.
[root@cosjmeter elasticsearch-test-data-master]# [I 151130 15:47:38 es_test_data:40] Trying to create index http://haproxy:80/test_insert
[I 151130 15:47:38 es_test_data:45] Guess the index exists already
[I 151130 15:47:38 es_test_data:181] Generating 100000 docs, upload batch size is 5000
[I 151130 15:47:38 es_test_data:40] Trying to create index http://haproxy:80/test_insert
[I 151130 15:47:38 es_test_data:40] Trying to create index http://haproxy:80/test_insert
Does it ever successfully upload a batch? You could try lowering the batch size.
Does a curl to haproxy:80 work? Maybe try one process at a time only.
And why does it show the line "Trying to create index" three times? Are there still old, stale processes running?
Batch size 500 ran OK. The connection to haproxy works well from all nodes. The multiple lines are because the same script is launched on multiple nodes via ssh from a loop on the controller node. The command issued is as below.
ssh cosjmeter`printf %02d ${i}` "nohup python /JMETER/elasticsearch-test-data-master/es_test_data.py --es_url=http://haproxy:80 --index_name=test_insert --index_type=${HOST} --num_of_shards=${SHARDS} --num_of_replicas=${REPLICAS} --count=${DOC_COUNT} --batch_size=${BATCH_SIZE} --format=name:str:1:30,time:ts,recno:int:0:1000000,field1:str:512:513,field2:str:512:513,field3:str:512:513,field4:words:25:100,num1:int:100:10000 num2:int:20000:200000" &
Cool, it looks like your ES node can't handle the larger batches, but it sounds like the smaller batch size setting addressed that.
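Some rough arithmetic (my own sketch, not from the script) shows why large documents combined with large batches are the problem: a single _bulk request balloons into tens of megabytes, which a busy node may not answer within a short timeout.

```python
def bulk_payload_mb(batch_size, avg_doc_bytes, overhead_bytes=60):
    """Rough size of one _bulk request: each doc plus an assumed
    ~60-byte action/metadata line per doc (illustrative estimate)."""
    return batch_size * (avg_doc_bytes + overhead_bytes) / 1024 / 1024

# ~2 KB docs at batch size 5000 -> roughly a 10 MB request per node,
# while batch size 500 keeps each request around 1 MB.
big = bulk_payload_mb(5000, 2048)
small = bulk_payload_mb(500, 2048)
print(round(big, 1), round(small, 1))
```

With several nodes uploading in parallel through one haproxy, the cluster has to absorb that payload multiplied by the number of writers, so either smaller batches or a longer timeout is needed.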
The cause of the exit seems to be the low timeout in the line below. I increased the timeout and now a batch size of 1000 runs OK.
request = tornado.httpclient.HTTPRequest(tornado.options.options.es_url + "/_bulk", method="POST", body=upload_data_txt, request_timeout=3)
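For reference, the fix is just raising `request_timeout` (in seconds) on that request. A minimal sketch, assuming tornado is installed; the URL here is a stand-in for `options.es_url + "/_bulk"`, and 600 matches the value tried earlier in this thread:

```python
import tornado.httpclient

upload_data_txt = ""  # the bulk body the script builds elsewhere

# Same request as in the script, but with a larger request_timeout
# so big _bulk payloads don't trigger HTTP 599 before ES responds.
request = tornado.httpclient.HTTPRequest(
    "http://localhost:9200/_bulk",  # stand-in for options.es_url + "/_bulk"
    method="POST",
    body=upload_data_txt,
    request_timeout=600)
print(request.request_timeout)
```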
And these are observed sample upload times with batch size 5000.
[I 151201 09:44:03 es_test_data:60] Upload: OK - upload took: 1350ms, total docs uploaded: 140000
[I 151201 09:44:05 es_test_data:60] Upload: OK - upload took: 6147ms, total docs uploaded: 145000
[I 151201 09:44:05 es_test_data:60] Upload: OK - upload took: 6050ms, total docs uploaded: 145000
[I 151201 09:44:05 es_test_data:60] Upload: OK - upload took: 5726ms, total docs uploaded: 145000
[I 151201 09:44:05 es_test_data:60] Upload: OK - upload took: 3147ms, total docs uploaded: 145000
[I 151201 09:44:05 es_test_data:60] Upload: OK - upload took: 4563ms, total docs uploaded: 145000
[I 151201 09:44:16 es_test_data:60] Upload: OK - upload took: 3193ms, total docs uploaded: 160000
[I 151201 09:44:17 es_test_data:60] Upload: OK - upload took: 4081ms, total docs uploaded: 165000
[I 151201 09:44:17 es_test_data:60] Upload: OK - upload took: 3929ms, total docs uploaded: 160000
[I 151201 09:44:16 es_test_data:60] Upload: OK - upload took: 4389ms, total docs uploaded: 165000
[I 151201 09:44:23 es_test_data:60] Upload: OK - upload took: 5294ms, total docs uploaded: 150000
[I 151201 09:44:24 es_test_data:60] Upload: OK - upload took: 5211ms, total docs uploaded: 150000
[I 151201 09:44:23 es_test_data:60] Upload: OK - upload took: 3973ms, total docs uploaded: 150000
[I 151201 09:44:24 es_test_data:60] Upload: OK - upload took: 5016ms, total docs uploaded: 150000
Looks good!