
This project forked from manticoresoftware/columnar


Manticore Columnar Library

License: Apache License 2.0


โš ๏ธ PLEASE NOTE: This library is currently in beta and should be used in production with caution! The library is being actively developed and data formats can and will be changed.

Manticore Columnar Library is a column-oriented storage library that aims to provide decent performance with a low memory footprint at large data volumes. Used in combination with Manticore Search, it can benefit those looking for:

  1. log analytics including rich free-text search capabilities (which are missing in e.g. Clickhouse, a great tool for metrics analytics)
  2. faster log/metrics analytics with lower resource consumption. Since the library and Manticore Search are both written in C++ with low-level optimizations in mind, in many cases the performance and RAM consumption are better than in Lucene / SOLR / Elasticsearch
  3. running log/metrics analytics in docker / kubernetes. Manticore Search + the library can work with as little as 30 megabytes of RAM, which Elasticsearch / Clickhouse can't. It also starts in less than a second, or a few seconds in the worst case. Since the overhead is so small, you can afford to run more nodes of Manticore Search + the library than of Elasticsearch. More nodes and a quicker start mean higher availability and agility.
  4. powerful SQL for logs/metrics analytics and everything else Manticore Search can give you

Getting started

Installation from dev yum/apt repositories

Ubuntu, Debian:

sudo wget https://repo.manticoresearch.com/manticore-dev-repo.noarch.deb
sudo dpkg -i manticore-dev-repo.noarch.deb
sudo apt-key adv --fetch-keys 'http://repo.manticoresearch.com/GPG-KEY-manticore'
sudo apt update
sudo apt install manticore manticore-columnar-lib

CentOS:

sudo yum install https://repo.manticoresearch.com/manticore-repo.noarch.rpm
sudo yum --enablerepo manticore-dev install manticore manticore-columnar-lib

After installation, the output of searchd -v should include columnar x.y.z, e.g.:

root@srv# searchd -v
Manticore 3.6.1 70f08813c@210601 dev (columnar 1.0.1 583c36c@210528)

Basic usage:

  1. Add a plain index to Manticore - https://manual.manticoresearch.com/Creating_an_index/Local_indexes/Plain_index#Plain-index
  2. Add columnar_attrs = attr1, attr2, attr3, ..., attr4 to the index section of the plain index configuration. You can list id too.
  3. Build the index as usual - https://play.manticoresearch.com/mysql/

Benchmark "Hacker News comments"

Goal: compare Manticore Columnar Library + Manticore Search on mostly analytical queries with:

  1. Manticore Search with its traditional storage
  2. Elasticsearch version 7.9.1

Dataset: 1,165,439 Hacker News curated comments with numeric fields

Infrastructure:

  • Dedicated, otherwise idle server with no background load
  • CPU: 6*2 Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
  • RAM: 64GB
  • Storage: HDD (not SSD)
  • Software and test specifics:
    • docker in privileged mode
    • RAM limit with help of Linux cgroups
    • CPU limit to only 2 virtual cores (one physical core)
    • restarting each engine before each query, then running 10 queries one by one
    • dropping OS cache before each query
    • disabled query cache
    • capturing: slowest response time (i.e. cold OS cache) and avg(top 80% fastest) ("Fast avg", shown on the pictures)
    • one shard in Elasticsearch, one plain index in Manticore Search
    • no fine-tuning in either of the engines, just default settings + same field data types everywhere
    • heap size for Elasticsearch - 50% of RAM
    • if a query fails in either of the engines it's not accounted for in the total calculation
  • The RAM constraints are based on what Manticore Search traditional storage requires:
    • 30MB - ~1/3 of the minimum requirement for Manticore Search with the traditional storage for good performance in this case (89MB)
    • 100MB - enough for all the attributes (89MB) to be put in RAM
    • 1024MB - enough for all the index files (972MB) to be put in RAM
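The two captured metrics described above (slowest response time and "Fast avg") can be sketched as follows. This is only an illustration of the arithmetic, not the benchmark's actual tooling: the function name `summarize_runs`, the `keep_percent` parameter, and the sample timings are all hypothetical.

```python
def summarize_runs(times_ms, keep_percent=80):
    """Return (slowest, fast_avg) for one query's response times.

    slowest  - the worst run, which typically reflects the cold OS cache
    fast_avg - mean of the fastest keep_percent% of runs ("Fast avg")
    """
    if not times_ms:
        raise ValueError("need at least one measurement")
    ordered = sorted(times_ms)
    slowest = ordered[-1]
    # With 10 runs and keep_percent=80 this keeps the 8 fastest runs.
    # Integer arithmetic avoids float rounding; always keep at least one run.
    keep = max(1, len(ordered) * keep_percent // 100)
    fastest = ordered[:keep]
    fast_avg = sum(fastest) / len(fastest)
    return slowest, fast_avg


# Hypothetical timings (ms) for 10 consecutive runs of a single query.
runs = [950, 120, 118, 121, 119, 117, 122, 120, 118, 125]
slowest, fast_avg = summarize_runs(runs)
print(slowest, fast_avg)
```

Dropping the slowest 20% of runs keeps the cold-cache run and other outliers from dominating the average, which is why the slowest time is reported separately.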

Results:

Elasticsearch vs Manticore with 30MB RAM limit - Elasticsearch failed on start

[Chart: Elasticsearch vs Manticore, 30MB RAM limit]

Elasticsearch vs Manticore with 100MB RAM limit - Elasticsearch failed on start

[Chart: Elasticsearch vs Manticore, 100MB RAM limit]

Elasticsearch vs Manticore with 1024MB RAM limit - Elasticsearch is 6.51x slower

[Chart: Elasticsearch vs Manticore, 1024MB RAM limit]

Manticore GA vs Manticore + Columnar with 30MB RAM limit - the columnar lib is 129.45x faster

[Chart: Manticore GA vs Manticore + Columnar, 30MB RAM limit]

Manticore GA vs Manticore + Columnar with 100MB RAM limit - the columnar lib is 1.43x slower

[Chart: Manticore GA vs Manticore + Columnar, 100MB RAM limit]

Manticore GA vs Manticore + Columnar with 1024MB RAM limit - the columnar lib is 1.43x slower

[Chart: Manticore GA vs Manticore + Columnar, 1024MB RAM limit]

Benchmark "116M nginx log records"

Goal: compare Manticore Columnar Library + Manticore Search vs Elasticsearch 7.9.1 on typical log analysis queries.

Dataset: 116M docs generated with the help of Nginx Log Generator like this:

docker pull kscarlett/nginx-log-generator
docker run -d -e "RATE=1000000" --name nginx-log-generator kscarlett/nginx-log-generator

Same infrastructure as in the previous benchmark.

  • The RAM constraints are based on what Manticore Search traditional storage requires:
    • 1500MB - ~1/3 of the minimum requirement for Manticore Search with the traditional storage for good performance in this case
    • 4400MB - enough for all the attributes to be put in RAM
    • 36000MB - enough for all the index files to be put in RAM

Elasticsearch vs Manticore with 1500MB RAM limit - Elasticsearch is 5.68x slower than Manticore Columnar Library:

[Chart: Elasticsearch vs Manticore, 1500MB RAM limit]

Elasticsearch vs Manticore with 4400MB RAM limit - Elasticsearch is 2.71x slower than Manticore Columnar Library:

[Chart: Elasticsearch vs Manticore, 4400MB RAM limit]

Elasticsearch vs Manticore with 36000MB RAM limit - Elasticsearch is 6.7x slower than Manticore Columnar Library:

[Chart: Elasticsearch vs Manticore, 36000MB RAM limit]

Work in progress

The benchmarks reveal some problems we are working on:

  • Secondary indexes. There are no secondary indexes in the library, while in Elasticsearch every field is indexed by default. This leads to worse performance on some queries that depend heavily on filtering performance.
  • Grouping by strings. Elasticsearch does it faster in the logs benchmark.
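To show why missing secondary indexes hurt filter-heavy queries, here is a minimal conceptual sketch. It is not how the library or Elasticsearch store or index data; the function names and the sample `status` column are hypothetical, and a plain in-memory dictionary stands in for an index.

```python
from collections import defaultdict


def filter_by_scan(column, wanted):
    """Without an index: every stored value is tested, so cost grows with row count."""
    return [row for row, value in enumerate(column) if value == wanted]


def build_secondary_index(column):
    """Map each distinct value to the list of row numbers that hold it."""
    index = defaultdict(list)
    for row, value in enumerate(column):
        index[value].append(row)
    return index


def filter_by_index(index, wanted):
    """With an index: one dictionary lookup, no pass over the column."""
    return index.get(wanted, [])


# Hypothetical column of HTTP status codes, one value per row.
status = [200, 404, 200, 500, 200, 404]
index = build_secondary_index(status)
print(filter_by_scan(status, 404), filter_by_index(index, 404))
```

Both functions return the same rows. The difference is that the scan touches every value on each query, while the index pays that cost once at build time.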
