Comments (10)
Hi. Which config file exactly do you mean?
from columnar.
Thanks, I mean manticore.conf.
For which benchmark exactly? The configs differ slightly between benchmarks.
Um, the one for the nginx dataset:
Elasticsearch vs Manticore with 1500MB RAM limit - Elasticsearch is 5.68x slower than Manticore Columnar Library:
Here it is:
source logs116m
{
    type = csvpipe
    csvpipe_command = cat /input/data.csv
    csvpipe_field = remote_addr
    csvpipe_field = remote_user
    csvpipe_attr_uint = runtime
    csvpipe_attr_timestamp = time_local
    csvpipe_field = request_type
    csvpipe_field_string = request_path
    csvpipe_field = request_protocol
    csvpipe_attr_uint = status
    csvpipe_attr_uint = size
    csvpipe_field = referer
    csvpipe_field = usearagent
}

index logs116m
{
    path = /var/lib/manticore/logs116m.0.9.9
    source = logs116m
    min_infix_len = 2
    columnar_attrs = id, remote_addr, remote_user, request_type, request_protocol, referer, runtime, status, size, usearagent, request_path
    stored_fields = remote_addr, remote_user, request_type, referer, usearagent, request_protocol
}

searchd
{
    listen = 9306:mysql
    listen = 9308:http
    log = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
    binlog_path =
    qcache_max_bytes = 0
    access_plain_attrs = mmap
    access_blob_attrs = mmap
}
Thank you very much.
No problem. Let me know if I can help with anything else. If you get a chance to benchmark or test the columnar library too, I'd appreciate it if you shared your results here or at [email protected].
I'm sorry to reopen the issue. Could you also share the config file for the "hacker_news_comments.csv" dataset? The headline result for that benchmark is:
"Elasticsearch vs Manticore with 1024MB RAM limit - Elasticsearch is 6.51x slower"
source hn_small
{
    type = csvpipe
    csvpipe_command = cat /input/data.csv
    csvpipe_attr_uint = story_id
    csvpipe_field = story_text
    csvpipe_field_string = story_author
    csvpipe_attr_uint = comment_id
    csvpipe_field = comment_text
    csvpipe_field_string = comment_author
    csvpipe_attr_uint = comment_ranking
    csvpipe_attr_uint = author_comment_count
    csvpipe_attr_uint = story_comment_count
}

index hn_small
{
    path = /var/lib/manticore/hn_small.0.9.9
    source = hn_small
    min_infix_len = 2
    columnar_attrs = id, story_id, comment_id, comment_ranking, author_comment_count, story_comment_count, story_author, comment_author
    stored_fields = story_text, comment_text
}

index fake
{
    type = rt
    path = /var/lib/manticore/fake
    rt_field = f
}

searchd
{
    listen = 9306:mysql41
    listen = 9308:http
    log = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
    binlog_path =
    qcache_max_bytes = 0
    access_plain_attrs = mmap
    access_blob_attrs = mmap
}
If you want to reproduce the benchmark, here are more details:
Elasticsearch init:
/usr/share/logstash/bin/logstash -f $PWD/../$test/logstash.conf --pipeline.batch.size=2000 --pipeline.workers=4
curl -XPOST "localhost:9200/$test/_forcemerge?max_num_segments=1"
curl -X PUT "localhost:9200/$test/_settings?pretty" -H 'Content-Type: application/json' -d' { "index" : { "number_of_replicas" : 0 }}'
Logstash config:
input {
    file {
        codec => multiline {
            pattern => "^\"\d+\",\"\d+\","
            negate => "true"
            what => "previous"
        }
        path => ["${PWD}/data/data.csv"]
        start_position => "beginning"
        sincedb_path => "/dev/null"
        mode => "read"
        exit_after_read => "true"
        file_completed_action => "log"
        file_completed_log_path => "/dev/null"
    }
}
filter {
    csv {
        separator => ","
        skip_header => "true"
        columns => [
            "id",
            "story_id",
            "story_text",
            "story_author",
            "comment_id",
            "comment_text",
            "comment_author",
            "comment_ranking",
            "author_comment_count",
            "story_comment_count"
        ]
    }
    mutate {
        remove_field => ["path", "host", "message", "@version", "@timestamp", "id"]
    }
}
output {
    elasticsearch {
        template => "${PWD}/template.json"
        template_overwrite => true
        hosts => ["127.0.0.1:9200"]
        index => "${test}"
    }
}
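A note on the multiline codec in the Logstash config above: with negate set to "true" and what set to "previous", any line that does not start with two quoted numeric fields is appended to the previous event, so comments containing embedded newlines stay in one record. A minimal sketch for checking how many records a CSV file would yield, assuming POSIX grep and using `[0-9]` in place of `\d` (the function name `count_record_starts` is made up for this illustration, it is not part of the benchmark scripts):

```shell
# Count the lines that start a new record, i.e. lines matching the
# multiline pattern: two quoted numeric fields at the start of the line.
# All other lines are continuations of the previous record.
count_record_starts() {
  grep -cE '^"[0-9]+","[0-9]+",'
}

# Three physical lines, but only two logical records:
printf '%s\n' \
  '"1","10","first line' \
  'continued text","alice"' \
  '"2","11","one-liner","bob"' | count_record_starts
# prints: 2
```

Running `count_record_starts < /data/data.csv` and comparing the number with the document count in Elasticsearch is one way to spot records that were split or merged during loading.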
Logstash template:
{
    "index_patterns" : "hn_small",
    "settings": {
        "number_of_replicas": 0,
        "number_of_shards": 1,
        "analysis": {
            "analyzer": "simple"
        },
        "index.max_result_window" : "100000",
        "index.queries.cache.enabled": false
    },
    "mappings": {
        "_source": {
            "enabled": true
        },
        "properties": {
            "story_id": {"type": "integer"},
            "story_text": {"type": "text"},
            "story_author": {"type": "text", "fields": {"raw": {"type":"keyword"}}},
            "comment_id": {"type": "integer"},
            "comment_text": {"type": "text"},
            "comment_author": {"type": "text", "fields": {"raw": {"type":"keyword"}}},
            "comment_ranking": {"type": "integer"},
            "author_comment_count": {"type": "integer"},
            "story_comment_count": {"type": "integer"}
        }
    }
}
CSV preparation:
[ ! -f "/data/downloaded.csv" ] && wget "https://zenodo.org/record/45901/files/hacker_news_comments.csv?download=1" -O /data/downloaded.csv
echo "Cleaning";
cat /data/downloaded.csv | tr -cd '\11\12\15\40-\176' > /data/cleaned.csv
echo "Multiplying"
for n in `seq 1 1`; do echo $n; cat /data/cleaned.csv >> /data/multiplied.csv; done;
rm /data/cleaned.csv
echo "Preparing"
rm /data/data.csv 2>/dev/null
csvcut -e utf-8 -l -c 0,3,4,5,6,7,8,9,10 -z 1073741824 /data/multiplied.csv|grep -v author_comment_count|csvformat -U1 -z 1073741824 > /data/data.csv
rm /data/multiplied.csv
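To make the "Cleaning" step above less cryptic: `tr -cd '\11\12\15\40-\176'` deletes every byte except tab, line feed, carriage return and printable ASCII (octal 40 to 176), so multi-byte UTF-8 characters and control bytes are dropped entirely. A minimal sketch on a tiny input (the wrapper name `clean_ascii` is hypothetical, and `LC_ALL=C` is added here only to keep `tr` strictly byte-oriented):

```shell
# Keep only tab (\11), LF (\12), CR (\15) and printable ASCII (\40-\176);
# every other byte is deleted, same filter as in the cleaning step.
clean_ascii() {
  LC_ALL=C tr -cd '\11\12\15\40-\176'
}

# \303\251 is the UTF-8 encoding of "é", \001 is a control byte.
printf 'caf\303\251 \001ok\n' | clean_ascii
# prints: caf ok
```

Note that non-ASCII characters are removed rather than transliterated, so the cleaned text can differ from the original dataset.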
Indexing in Manticore is much simpler:
indexer -c /path/to/manticore.conf --all
OK. Thanks for your reply.