Extremely-fast Interactive Big Data Analytics
Nebula is an extremely-fast end-to-end interactive big data analytics solution.
Nebula is designed as a high-performance columnar data storage and tabular OLAP engine.
It can serve as any of the following, and more:
- Extremely Fast Data Analytics Platform.
- Column Level Access Control Storage System.
- Distributed Cache Tier For Tabular Data.
Documentation on design, internals, and stories will be shared on the project site.
- clone the repo:
git clone https://github.com/varchar-io/nebula.git
- run run.sh in source root:
cd nebula && ./run.sh
- explore nebula UI in browser:
http://localhost:8088
Please refer to the Developer Guide for building Nebula from source code. Contributors are welcome.
Configure a data source on permanent storage (a file system) and run analytics on it. AWS S3 and Azure Blob Storage are commonly used storage systems, with support for file formats such as CSV, Parquet, and ORC. These file formats and storage systems are widely used in modern big data ecosystems.
Connect Nebula to a real-time data source such as Kafka, with data in Thrift or JSON format, and run real-time data analytics.
Define a template in Nebula and load data through the Nebula API so that the data lives for a specific period. Run analytics on Nebula to serve queries during this ephemeral data's lifetime.
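The idea of data that lives only for a set period can be illustrated with a small sketch. This is plain JavaScript and not Nebula's API: the `EphemeralTable` class, its `load` and `rows` methods, and the injectable clock are all hypothetical names chosen for illustration.

```javascript
// Hypothetical sketch: EphemeralTable is illustrative only, not part of Nebula.
class EphemeralTable {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;   // how long each loaded batch stays queryable
    this.now = now;       // injectable clock, which makes expiry easy to test
    this.batches = [];    // each batch: { expiresAt, rows }
  }

  // Load a batch of rows; it expires ttlMs after load time.
  load(rows) {
    this.batches.push({ expiresAt: this.now() + this.ttlMs, rows });
  }

  // Drop expired batches, then return the rows that are still alive.
  rows() {
    const t = this.now();
    this.batches = this.batches.filter((b) => b.expiresAt > t);
    return this.batches.flatMap((b) => b.rows);
  }
}

// Usage with a fake clock so the behavior is deterministic.
let clock = 0;
const table = new EphemeralTable(1000, () => clock);
table.load([{ id: 1 }, { id: 2 }]);   // expires at t = 1000
clock = 500;
table.load([{ id: 3 }]);              // expires at t = 1500
const before = table.rows().length;   // both batches alive
clock = 1200;
const after = table.rows().length;    // only the second batch remains
console.log(before, after);
```

Queries issued while a batch is alive see its rows; once the period has passed, the batch silently drops out of results.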
Nebula breaks input data down into a huge number of small data cubes that live on Nebula nodes. A simple predicate (filter) usually prunes away most of the data to scan, which gives very low latency for your analytics.
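A minimal sketch of why small blocks make filters cheap, assuming each block keeps min/max metadata for a column. This is not Nebula's actual storage layout; `buildBlocks` and `scanGreaterThan` are hypothetical helpers written only to show the pruning idea.

```javascript
// Hypothetical sketch: not Nebula's storage format or API.
// Split rows into fixed-size blocks and record the min/max of one column per block.
function buildBlocks(rows, blockSize, column) {
  const sorted = [...rows].sort((a, b) => a[column] - b[column]);
  const blocks = [];
  for (let i = 0; i < sorted.length; i += blockSize) {
    const chunk = sorted.slice(i, i + blockSize);
    blocks.push({
      min: chunk[0][column],
      max: chunk[chunk.length - 1][column],
      rows: chunk,
    });
  }
  return blocks;
}

// Scan only blocks whose max can satisfy `column > threshold`; skip the rest entirely.
function scanGreaterThan(blocks, column, threshold) {
  const candidates = blocks.filter((b) => b.max > threshold);
  const matches = candidates.flatMap((b) =>
    b.rows.filter((r) => r[column] > threshold)
  );
  return { scannedBlocks: candidates.length, totalBlocks: blocks.length, matches };
}

// 1000 rows with ids 0..999, split into 10 blocks of 100 rows each.
const rows = Array.from({ length: 1000 }, (_, i) => ({ id: i, value: i * 3 }));
const blocks = buildBlocks(rows, 100, "id");
const result = scanGreaterThan(blocks, "id", 949);
console.log(
  `${result.scannedBlocks} of ${result.totalBlocks} blocks scanned, ` +
  `${result.matches.length} rows matched`
);
// prints "1 of 10 blocks scanned, 50 rows matched"
```

The predicate is checked against block metadata first, so nine of the ten blocks are never read. The smaller the blocks, the more precisely a filter can exclude data.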
Through the great project QuickJS, Nebula supports full ES6 programming through its simple UI code editor. Below is a code snippet that generates a pie chart from SQL-like query code written in JS.
At the top of the page, the demo video shows how the Nebula client SDK is used, with tables and charts generated in milliseconds!
// define a customized column
const colx = () => nebula.column("value") % 20;
nebula.apply("colx", nebula.Type.INT, colx);
// query a data set stored on HTTPS or S3
nebula
.source("nebula.test")
.time("2020-08-16", "2020-08-26")
.select("colx", count("id"))
.where(and(gt("id", 5), eq("flag", true)))
.sortby(nebula.Sort.DESC)
.limit(10)
.run();
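To make the query above easier to read, here is a rough emulation of what it appears to compute, written in plain JavaScript with no Nebula dependency. The sample rows are invented, and two points are assumptions rather than documented behavior: that selecting `colx` alongside `count("id")` groups by `colx`, and that the descending sort applies to the count.

```javascript
// Plain JavaScript emulation; grouping and sort semantics are assumptions.
const rows = [
  { id: 3, value: 21, flag: true },   // dropped: id is not > 5
  { id: 6, value: 21, flag: true },   // kept, colx = 21 % 20 = 1
  { id: 7, value: 41, flag: true },   // kept, colx = 41 % 20 = 1
  { id: 8, value: 45, flag: false },  // dropped: flag is false
  { id: 9, value: 45, flag: true },   // kept, colx = 45 % 20 = 5
];

// where(and(gt("id", 5), eq("flag", true))) plus the custom column value % 20
const counts = new Map();
for (const row of rows) {
  if (row.id > 5 && row.flag === true) {
    const colx = row.value % 20;
    counts.set(colx, (counts.get(colx) || 0) + 1);
  }
}

// sort by count descending (assumed), then limit(10)
const top = [...counts.entries()]
  .map(([colx, count]) => ({ colx, count }))
  .sort((a, b) => b.count - a.count)
  .slice(0, 10);

console.log(top);
// prints [ { colx: 1, count: 2 }, { colx: 5, count: 1 } ]
```

Each `{ colx, count }` pair corresponds to one slice of the pie chart.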