
Comments (10)

sanikolaev commented on August 12, 2024

Hi. Which config file exactly do you mean?

from columnar.

sangensong commented on August 12, 2024

Thanks. I mean manticore.conf.


sanikolaev commented on August 12, 2024

For which benchmark exactly? They are slightly different.


sangensong commented on August 12, 2024

Hmm, the dataset about nginx. The conclusion "Elasticsearch vs Manticore with 1500MB RAM limit - Elasticsearch is 5.68x slower than Manticore Columnar Library":


sanikolaev commented on August 12, 2024

Here it is:

source logs116m
{
        type = csvpipe
        csvpipe_command = cat /input/data.csv

        csvpipe_field = remote_addr
        csvpipe_field = remote_user
        csvpipe_attr_uint = runtime
        csvpipe_attr_timestamp = time_local
        csvpipe_field = request_type
        csvpipe_field_string = request_path
        csvpipe_field = request_protocol
        csvpipe_attr_uint = status
        csvpipe_attr_uint = size
        csvpipe_field = referer
        csvpipe_field = usearagent
}

index logs116m
{
        path = /var/lib/manticore/logs116m.0.9.9
        source = logs116m
        min_infix_len = 2
        columnar_attrs = id, remote_addr, remote_user, request_type, request_protocol, referer, runtime, status, size, usearagent, request_path
        stored_fields = remote_addr, remote_user, request_type, referer, usearagent, request_protocol
}


searchd
{
        listen = 9306:mysql
        listen = 9308:http
        log = /var/log/manticore/searchd.log
        pid_file = /var/run/manticore/searchd.pid
        binlog_path =
        qcache_max_bytes = 0

        access_plain_attrs = mmap
        access_blob_attrs = mmap

}
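As a quick sanity check on the schema above: csvpipe expects the document id as the first CSV column, followed by the fields and attributes in their declaration order, so each row of /input/data.csv must carry 12 columns for this source. A minimal sketch with hypothetical values:

```shell
# Hypothetical row for the logs116m source, column order as declared:
# id, remote_addr, remote_user, runtime, time_local (unix timestamp),
# request_type, request_path, request_protocol, status, size, referer, usearagent
printf '1,127.0.0.1,-,3,1609459200,GET,/index.html,HTTP/1.1,200,512,-,curl/7.68.0\n' \
    | awk -F, '{print NF}'
# prints 12
```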


sangensong commented on August 12, 2024

Thank you very much.


sanikolaev commented on August 12, 2024

No problem. Let me know if I can help with anything else. If you get a chance to benchmark or test the columnar library too, I'd appreciate it if you shared your results here or at [email protected].


sangensong commented on August 12, 2024

I'm sorry to reopen the issue. Can you give me the config file for the "hacker_news_comments.csv" dataset? The conclusion there is
"Elasticsearch vs Manticore with 1024MB RAM limit - Elasticsearch is 6.51x slower".


sanikolaev commented on August 12, 2024

source hn_small
{
        type = csvpipe
        csvpipe_command = cat /input/data.csv
        csvpipe_attr_uint = story_id
        csvpipe_field = story_text
        csvpipe_field_string = story_author
        csvpipe_attr_uint = comment_id
        csvpipe_field = comment_text
        csvpipe_field_string = comment_author
        csvpipe_attr_uint = comment_ranking
        csvpipe_attr_uint = author_comment_count
        csvpipe_attr_uint = story_comment_count
}

index hn_small
{
        path = /var/lib/manticore/hn_small.0.9.9
        source = hn_small
        min_infix_len = 2

        columnar_attrs = id, story_id, comment_id, comment_ranking, author_comment_count, story_comment_count, story_author, comment_author

        stored_fields = story_text, comment_text
}

index fake
{
        type = rt
        path = /var/lib/manticore/fake
        rt_field = f
}


searchd
{
        listen = 9306:mysql41
        listen = 9308:http
        log = /var/log/manticore/searchd.log
        pid_file = /var/run/manticore/searchd.pid
        binlog_path =
        qcache_max_bytes = 0

        access_plain_attrs = mmap
        access_blob_attrs = mmap
}

If you want to reproduce the benchmark, here are more details:

Elasticsearch init:

/usr/share/logstash/bin/logstash -f $PWD/../$test/logstash.conf --pipeline.batch.size=2000 --pipeline.workers=4
curl -XPOST "localhost:9200/$test/_forcemerge?max_num_segments=1"
curl -X PUT "localhost:9200/$test/_settings?pretty" -H 'Content-Type: application/json' -d' { "index" : { "number_of_replicas" : 0 }}'

Logstash config:

input {
    file {
        codec => multiline {
                pattern => "^\"\d+\",\"\d+\","
                negate => "true"
                what => "previous"
        }
        path => ["${PWD}/data/data.csv"]
        start_position => "beginning"
        sincedb_path => "/dev/null"
        mode => "read"
        exit_after_read => "true"
        file_completed_action => "log"
        file_completed_log_path => "/dev/null"
    }
}

filter {
    csv {
        separator => ","
        skip_header => "true"
        columns => [
                "id",
                "story_id",
                "story_text",
                "story_author",
                "comment_id",
                "comment_text",
                "comment_author",
                "comment_ranking",
                "author_comment_count",
                "story_comment_count"
        ]
    }
    mutate {
        remove_field => ["path", "host", "message", "@version", "@timestamp", "id"]
    }

}

output {
    elasticsearch {
        template => "${PWD}/template.json"
        template_overwrite => true
        hosts => ["127.0.0.1:9200"]
        index => "${test}"
    }
}
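A note on the multiline codec in the input block above: Hacker News comments can contain embedded newlines, so the codec glues any physical line that does not start with two quoted numeric columns (the `^"\d+","\d+",` pattern) onto the previous record. A rough illustration of the pattern with grep, using [0-9] in place of \d (the sample lines are made up):

```shell
# Lines 1 and 3 start a new record; line 2 does not match the pattern
# and would be folded into the previous event by the multiline codec.
printf '"1","42","a story\nthat spans two lines"\n"2","43","next story"\n' \
    | grep -cE '^"[0-9]+","[0-9]+",'
# prints 2
```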

logstash template:

{
    "index_patterns" : "hn_small",
    "settings": {
      "number_of_replicas": 0,
      "number_of_shards": 1,
      "analysis": {
        "analyzer": "simple"
      },
      "index.max_result_window" : "100000",
      "index.queries.cache.enabled": false
    },
    "mappings": {
        "_source": {
          "enabled": true
        },
        "properties": {
           "story_id": {"type": "integer"},
           "story_text": {"type": "text"},
           "story_author": {"type": "text", "fields": {"raw": {"type":"keyword"}}},
           "comment_id": {"type": "integer"},
           "comment_text": {"type": "text"},
           "comment_author": {"type": "text", "fields": {"raw": {"type":"keyword"}}},
           "comment_ranking": {"type": "integer"},
           "author_comment_count": {"type": "integer"},
           "story_comment_count": {"type": "integer"}
        }
   }
}

csv preparation:

[ ! -f "/data/downloaded.csv" ] && wget https://zenodo.org/record/45901/files/hacker_news_comments.csv?download=1 -O /data/downloaded.csv
echo "Cleaning";
cat /data/downloaded.csv | tr -cd '\11\12\15\40-\176' > /data/cleaned.csv
echo "Multiplying"
for n in `seq 1 1`; do echo $n; cat /data/cleaned.csv >> /data/multiplied.csv; done;
rm /data/cleaned.csv
echo "Preparing"
rm /data/data.csv 2>/dev/null
csvcut -e utf-8 -l -c 0,3,4,5,6,7,8,9,10 -z 1073741824 /data/multiplied.csv|grep -v author_comment_count|csvformat -U1 -z 1073741824 > /data/data.csv
rm /data/multiplied.csv
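The tr step in the cleaning stage keeps only tab (\11), LF (\12), CR (\15) and printable ASCII (\40-\176) and deletes every other byte, which strips control characters and non-ASCII bytes that could otherwise break the CSV tooling. A small demonstration on a made-up line:

```shell
# \001 is a control byte; \303\251 and \303\266 are the UTF-8 bytes of é and ö.
# All of them fall outside \11\12\15\40-\176, so tr -cd deletes them.
printf 'id,"h\303\251llo\001 w\303\266rld"\n' | tr -cd '\11\12\15\40-\176'
# prints: id,"hllo wrld"
```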

Indexing in Manticore is much simpler:

indexer -c /path/to/manticore.conf --all


sangensong commented on August 12, 2024

OK. Thanks for your reply.

