risinglightdb / risinglight

An educational OLAP database system.

License: Apache License 2.0

Languages: Rust 99.76%, Makefile 0.13%, Shell 0.09%, Dockerfile 0.01%
Topics: sql, rust, database, olap, education, analytics, embedded-database

risinglight's Introduction

RisingLight

RisingLight is an OLAP database system for educational purposes. It is still under rapid development and should not be used in production.

Quick Start

Currently, RisingLight only supports Linux and macOS. If you are familiar with the Rust programming language, you can start an interactive shell with:

cargo run           # start in debug mode
cargo run --release # or start in release mode

If you run into any build issues, see Install, Run, and Develop RisingLight for more information. It provides a step-by-step guide on how to compile and run RisingLight from scratch.

After successfully building RisingLight, you may import some data and run SQL queries. See Running TPC-H Queries.

Documentation

All documentation can be found in the docs folder.

Developer docs are also available via make docs (latest) or on crates.io (stable).

License

RisingLight is under the Apache 2.0 license. See the LICENSE file for details.

Community

Governance

See GOVERNANCE for more information.

Communication

The main communication channel for RisingLight developers is GitHub Discussions.

Other Messaging Apps

If you want to join our active chat groups on messaging apps, including Discord, Telegram, and WeChat, please send an email to contact at risingwave-labs.com with your user ID. We will then manually invite you to the group.

Contributing

If you have a bug report or feature request, feel free to open an issue.

If you have any questions to discuss, feel free to start a discussion on GitHub Discussions.

If you want to contribute code, see CONTRIBUTING for more information. Generally, you will need to pass the necessary checks for your changes and sign the DCO before submitting PRs. We have plenty of good first issues. Feel free to ask questions either on GitHub or in our chat groups if you run into any difficulty.

Acknowledgement

The RisingLight project was initiated by a group of college students with a special interest in building database systems using modern programming technologies. The project is generously sponsored by RisingWave Labs, a startup building next-generation database systems. RisingWave Labs is hiring top talent globally to build a cloud-native streaming database from scratch. If interested, please send your CV to hr at risingwave-labs.com.

Welcome to the RisingLight community!

risinglight's People

Contributors

adlternative, alissa-tung, arkbriar, baymaxhwy, caicancai, d2lark, eliasyaoyc, fedomn, gogim1, ice1000, kikkon, kwannoel, likg227, lokax, ludics, mingjihan99, pleiadesian, silver-ymz, skyzh, st1page, sunt-ing, tennyzhuang, unconsolable, wangqiim, wangrunji0408, xiaoyong-z, xxchan, xzhseh, yyin-dev, zzl200012

risinglight's Issues

storage: auto split memtable

Users may ingest a large amount of data into the engine, so we need to periodically flush the memtable to disk.

  • add an interface to get the current memtable's estimated on-disk size
  • add an option in StorageOptions, like target_rowset_size (see the sketch below)
  • split memtable in txn
  • support flush in the background
  • support applying add_rowset with multiple tables
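
A minimal sketch of what the option could look like, assuming a StorageOptions struct in the storage crate; the field name target_rowset_size comes from the list above, and the default value is only a placeholder:

pub struct StorageOptions {
    /// Flush the memtable into a new RowSet once its estimated
    /// on-disk size exceeds this threshold (in bytes).
    pub target_rowset_size: usize,
}

impl Default for StorageOptions {
    fn default() -> Self {
        Self {
            // Placeholder default: 256 MiB per RowSet.
            target_rowset_size: 256 * 1024 * 1024,
        }
    }
}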

executor: refine HashAgg

#69 (comment)
As mentioned earlier, constructing a visibility bitmap for every unique group key incurs high time and space complexity. Instead, if we keep the row-by-row update of the current implementation, we can avoid the cost of bitmap construction.

Originally posted by @pleiadesian in #69 (comment)
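
To illustrate the row-by-row approach, here is a minimal sketch using simplified types (the real executor operates on DataChunk and ArrayImpl, not on plain tuples):

use std::collections::HashMap;

// Illustrative types only.
type GroupKey = Vec<i64>;

#[derive(Default)]
struct AggState {
    count: u64,
    sum: i64,
}

// Row-by-row update: each input row only touches its own group's state,
// so no per-group visibility bitmap needs to be materialized.
fn hash_agg(rows: impl Iterator<Item = (GroupKey, i64)>) -> HashMap<GroupKey, AggState> {
    let mut states: HashMap<GroupKey, AggState> = HashMap::new();
    for (key, value) in rows {
        let state = states.entry(key).or_default();
        state.count += 1;
        state.sum += value;
    }
    states
}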

planner: what does each stage do?

I feel that there might be something wrong with my implementation. For example, when I was implementing sorted scan, the decision of whether SeqScanExecutor should use a sorted scan was made in the logical planner. #121

When I was discussing with @pleiadesian where to generate InputRef from ColumnRef, I also found it hard to determine where this step should happen.

Is there any spec or general convention about which stage should do what?

cc @MingjiHan99 @wangrunji0408

catalog: internal table support

It would be very helpful if we could inspect the internal state of our storage engine using SQL queries, e.g.

SELECT rowset_id, size FROM internal.storage.rowsets WHERE table_id = 1;

planner: pretty print plan tree

We need a pretty (and fancy) print of the plan tree to help with developing and debugging the optimizer.

Examples

DuckDB

D explain select v2 from t where 3 > v1;
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”β”‚
β”‚β”‚       Physical Plan       β”‚β”‚
β”‚β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         PROJECTION        β”‚
β”‚   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   β”‚
β”‚             v2            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                             
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚          SEQ_SCAN         β”‚
β”‚   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   β”‚
β”‚             t             β”‚
β”‚   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   β”‚
β”‚             v2            β”‚
β”‚             v1            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Databend:

Projection: ((1 + 2) + 3):UInt32
  Expression: 6:UInt32 (Before Projection)
    ReadDataSource: scan partitions: [1], scan schema: [dummy:UInt8], statistics: [read_rows: 1, read_bytes: 1]

discussion: which I/O API to use?

Now that we are using async everywhere in our system, we need to select a file I/O API. We need to be able to do positioned reads, which tokio does not support. We have several choices.
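
For example, one of the choices is to keep std::fs::File and call the UNIX read_at extension on tokio's blocking thread pool. This is only a sketch of one option, not a decision:

use std::os::unix::fs::FileExt;
use std::sync::Arc;

/// Positioned read executed on the tokio blocking pool (UNIX only).
async fn read_at(file: Arc<std::fs::File>, offset: u64, len: usize) -> std::io::Result<Vec<u8>> {
    tokio::task::spawn_blocking(move || {
        let mut buf = vec![0u8; len];
        file.read_exact_at(&mut buf, offset)?;
        Ok(buf)
    })
    .await
    .expect("blocking task panicked")
}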

cli: use on-disk engine by default

The only blocker might be #96, and I'll implement this soon. After that, almost all queries can be run on the on-disk engine, and we can find bugs in the storage engine prior to our release.

storage: support multiple storage backends

Currently, RisingLight cannot be compiled or run on Windows, because we are using the UNIX-only ReadAt extension. As our students might use Windows as their development environment, we need to add new storage backends (for reading).

Basically, all reads are handled in Column structure https://github.com/singularity-data/risinglight/blob/main/src/storage/secondary/column.rs

It's better to change file: Arc<std::fs::File> into an enum, e.g.

pub enum ColumnReadableFile {
  /// For `read_at`
  #[cfg(unix)]
  PositionedRead(Arc<std::fs::File>),
  /// For `file.lock().seek().read()`
  NormalRead(Arc<Mutex<tokio::fs::File>>),
  // In the future, we can even add minio / S3 file backend
}

And we should refactor the whole code path to use ColumnReadableFile instead of Arc<std::fs::File> throughout the storage system.
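
As a sketch of what the refactored read path could look like (all names here are assumptions, and the Mutex is assumed to be tokio::sync::Mutex so the guard can be held across an .await):

impl ColumnReadableFile {
    /// Sketch: read `len` bytes starting at `offset` from either backend.
    pub async fn read_at(&self, offset: u64, len: usize) -> std::io::Result<Vec<u8>> {
        let mut buf = vec![0; len];
        match self {
            #[cfg(unix)]
            ColumnReadableFile::PositionedRead(file) => {
                use std::os::unix::fs::FileExt;
                // Could also be offloaded to a blocking thread pool,
                // see the "which I/O API to use" discussion above.
                file.read_exact_at(&mut buf, offset)?;
            }
            ColumnReadableFile::NormalRead(file) => {
                use std::io::SeekFrom;
                use tokio::io::{AsyncReadExt, AsyncSeekExt};
                let mut file = file.lock().await;
                file.seek(SeekFrom::Start(offset)).await?;
                file.read_exact(&mut buf).await?;
            }
        }
        Ok(buf)
    }
}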

discussion: developer experience with `async_stream`

Developing with async_stream is unfriendly to developers: we never get suggestions from rust-analyzer, and the stack trace is also hard to read.

  13: core::iter::traits::iterator::Iterator::for_each
             at /rustc/497ee321af3b8496eaccd7af7b437f18bab81abf/library/core/src/iter/traits/iterator.rs:727:9
  14: <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
             at /rustc/497ee321af3b8496eaccd7af7b437f18bab81abf/library/alloc/src/vec/spec_extend.rs:40:17
  15: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
             at /rustc/497ee321af3b8496eaccd7af7b437f18bab81abf/library/alloc/src/vec/spec_from_iter_nested.rs:56:9
  16: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
             at /rustc/497ee321af3b8496eaccd7af7b437f18bab81abf/library/alloc/src/vec/spec_from_iter.rs:33:9
  17: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
             at /rustc/497ee321af3b8496eaccd7af7b437f18bab81abf/library/alloc/src/vec/mod.rs:2485:9
  18: core::iter::traits::iterator::Iterator::collect
             at /rustc/497ee321af3b8496eaccd7af7b437f18bab81abf/library/core/src/iter/traits/iterator.rs:1739:9
  19: risinglight::array::data_chunk::DataChunk::get_row_by_idx
             at ./src/array/data_chunk.rs:46:9
  20: risinglight::executor::nested_loop_join::NestedLoopJoinExecutor::execute::{{closure}}
             at /home/skyzh/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/async-stream-0.3.2/src/lib.rs:237:9
  21: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/497ee321af3b8496eaccd7af7b437f18bab81abf/library/core/src/future/mod.rs:80:19
  22: <async_stream::async_stream::AsyncStream<T,U> as futures_core::stream::Stream>::poll_next
             at /home/skyzh/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/async-stream-0.3.2/src/async_stream.rs:53:17
  23: <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next
             at /home/skyzh/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/futures-core-0.3.17/src/stream.rs:120:9
  24: <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next
             at /home/skyzh/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/futures-core-0.3.17/src/stream.rs:120:9
  25: <&mut S as futures_core::stream::Stream>::poll_next
             at /home/skyzh/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/futures-core-0.3.17/src/stream.rs:104:9
  26: <async_stream::next::Next<S> as core::future::future::Future>::poll
             at /home/skyzh/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/async-stream-0.3.2/src/next.rs:30:9
  27: risinglight::executor::projection::ProjectionExecutor::execute::{{closure}}
             at ./src/executor/projection.rs:13:9

The backtrace never reports the exact line where the panic happens -- the content of try_stream has been rewritten by the procedural macro!

Therefore, I propose manually expanding the try_stream macro.

async_stream provides two utilities: a thread-local channel implementation to transfer the yielded value from the stream to the caller, and an AsyncStream to synchronize between the stream generator and the receiver function. For example, the InsertExecutor expands as follows:

mod insert {
    use super::*;
    use crate::array::DataChunk;
    use crate::catalog::TableRefId;
    use crate::storage::{Storage, Table, Transaction};
    use crate::types::ColumnId;
    use std::sync::Arc;
    /// The executor of `insert` statement.
    pub struct InsertExecutor<S: Storage> {
        pub table_ref_id: TableRefId,
        pub column_ids: Vec<ColumnId>,
        pub storage: Arc<S>,
        pub child: BoxedExecutor,
    }
    impl<S: Storage> InsertExecutor<S> {
        pub fn execute(self) -> impl Stream<Item = Result<DataChunk, ExecutorError>> {
            {
                let (mut __yield_tx, __yield_rx) = ::async_stream::yielder::pair();
                ::async_stream::AsyncStream::new(__yield_rx, async move {
                    let table = match self.storage.get_table(self.table_ref_id) {
                        ::core::result::Result::Ok(v) => v,
                        ::core::result::Result::Err(e) => {
                            __yield_tx.send(::core::result::Result::Err(e.into())).await;
                            return;
                        }
                    };
                    let mut txn = match table.write().await {
                        ::core::result::Result::Ok(v) => v,
                        ::core::result::Result::Err(e) => {
                            __yield_tx.send(::core::result::Result::Err(e.into())).await;
                            return;
                        }
                    };
                    {
                        let mut __pinned = self.child;
                        let mut __pinned =
                            unsafe { ::core::pin::Pin::new_unchecked(&mut __pinned) };
                        loop {
                            let chunk = match ::async_stream::reexport::next(&mut __pinned).await {
                                ::core::option::Option::Some(e) => e,
                                ::core::option::Option::None => break,
                            };
                            {
                                match txn
                                    .append(match chunk {
                                        ::core::result::Result::Ok(v) => v,
                                        ::core::result::Result::Err(e) => {
                                            __yield_tx
                                                .send(::core::result::Result::Err(e.into()))
                                                .await;
                                            return;
                                        }
                                    })
                                    .await
                                {
                                    ::core::result::Result::Ok(v) => v,
                                    ::core::result::Result::Err(e) => {
                                        __yield_tx
                                            .send(::core::result::Result::Err(e.into()))
                                            .await;
                                        return;
                                    }
                                };
                            }
                        }
                    }
                    match txn.commit().await {
                        ::core::result::Result::Ok(v) => v,
                        ::core::result::Result::Err(e) => {
                            __yield_tx.send(::core::result::Result::Err(e.into())).await;
                            return;
                        }
                    };
                    __yield_tx
                        .send(::core::result::Result::Ok(DataChunk::single()))
                        .await;
                })
            }
        }
    }
}

This seems nearly identical to the original code.

There are two further problems to solve:

  • The internal implementation of async_stream is subject to change. Someday the AsyncStream struct might have different functionality and a different constructor, so we would need to pin async_stream to an exact version instead of relying on semver.
  • Error handling is painful. We cannot simply write ?. However, we could use a custom macro that expands error handling to Ok => get the value, Err => send the error and return (see the sketch below).
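
One possible shape for such a macro, as a sketch (the name unwrap_or_yield is made up here; it mirrors the pattern in the expanded code above):

/// Hypothetical helper: behaves like `?` inside a manually expanded stream.
/// `Ok(v)` evaluates to `v`; `Err(e)` is sent to the yielder and the stream returns.
macro_rules! unwrap_or_yield {
    ($expr:expr, $yield_tx:expr) => {
        match $expr {
            ::core::result::Result::Ok(v) => v,
            ::core::result::Result::Err(e) => {
                $yield_tx.send(::core::result::Result::Err(e.into())).await;
                return;
            }
        }
    };
}

// Usage inside the expanded `execute()` body, e.g.:
// let table = unwrap_or_yield!(self.storage.get_table(self.table_ref_id), __yield_tx);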

executor: TPC-H data generator

We can add a TPCHScanExecutor, which generates the data of a TPC-H table, e.g.

INSERT INTO my_table SELECT * FROM gen.tpch.xxxx
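
A minimal sketch of such an executor, following the stream-based executor style shown elsewhere on this page; generate_tpch_chunks is a hypothetical helper wrapping a TPC-H data generator:

use async_stream::try_stream;

/// Sketch only: an executor that yields generated TPC-H data chunk by chunk.
pub struct TpchScanExecutor {
    /// TPC-H table name, e.g. "lineitem".
    pub table: String,
    /// Scale factor for the generator.
    pub scale_factor: f64,
}

impl TpchScanExecutor {
    pub fn execute(self) -> impl Stream<Item = Result<DataChunk, ExecutorError>> {
        try_stream! {
            // `generate_tpch_chunks` is hypothetical: it would emit one
            // DataChunk at a time so memory usage stays bounded.
            for chunk in generate_tpch_chunks(&self.table, self.scale_factor) {
                yield chunk;
            }
        }
    }
}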

binder: add `primary key` constraint support

create table t(v1 int not null primary key, v2 int not null, v3 int not null) # supported
create table t(v1 int not null, v2 int not null, v3 int not null, primary key(v1)) # not supported

Migrate parser to `sqlparser`

sqlparser is a widely-used SQL parser crate in Rust.

Compared to the current postgres-parser:

  • πŸ‘ sqlparser is standalone, while postgres-parser depends on llvm and Postgres.
  • πŸ‘ sqlparser is more active and widely used. (799 vs 70 stars)
  • πŸ‘ sqlparser generates an elegant, well-documented AST, while postgres-parser generates a more verbose AST which needs additional transformation (~1.5k lines now).
  • πŸ€” postgres-parser is fully compatible with PG, but sqlparser is not. However this is not critical to an educational system.
  • πŸ™ˆ postgres-parser has a memory leak, which makes it totally unusable.

We plan to migrate our parser from postgres-parser to sqlparser.

doc: storage engine

I'll draft a detailed design doc about the storage engine and add more docstrings to our codebase.

execution: add full support of `select count(?)`

Currently, select count(x) from table acts as a row count, which is not expected, and count(*) is simply not supported. For example,

> select * from t
+------+
| 1    |
| 2    |
| NULL |
+------+

> select count(v1) from t
+---+
| 3 |
+---+

# expected: 2

> select count(*) from t
thread 'main' panicked at 'not yet implemented: bind expression: Wildcard', src/binder/expression/mod.rs:85:18
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

# expected: 3

Therefore, we need to add full count support. This could be split into two steps:

TODO List for Stage 1

Parser:

Binder:

  • Add return names and types for select statement. @MingjiHan99
  • Binding arithmetic expressions (+, -, * and /) @MingjiHan99 @wangrunji0408
  • Add necessary implicit casting when binding the expressions. For example: 1.0 + 1 -> 1.0 + (1 cast as double) @wangrunji0408

Executor:

Storage:
Implement on-disk storage system:

  • Add base definitions for table storage (Segment and Block) @MingjiHan99
  • Add disk manager @MingjiHan99
  • Add buffer pool (ongoing)

array: use macro to generate match branches

As we add more and more functions, it is getting very tedious to write so many match branches to statically dispatch them. Maybe we can use the for_all_variants macro from RisingWave.

https://github.com/singularity-data/risingwave/blob/master/rust/common/src/array/mod.rs#L190

/// `for_all_variants` includes all variants of our array types. If you added a new array
/// type inside the project, be sure to add a variant here.
///
/// Every tuple has four elements, where
/// `{ enum variant name, function suffix name, array type, builder type }`
///
/// There are typically two ways of using this macro, pass token or pass no token.
/// See the following implementations for example.
#[macro_export]
macro_rules! for_all_variants {
  ($macro:tt $(, $x:tt)*) => {
    $macro! {
      [$($x),*],
      { Int16, int16, I16Array, I16ArrayBuilder },
      { Int32, int32, I32Array, I32ArrayBuilder },
      { Int64, int64, I64Array, I64ArrayBuilder },
      { Float32, float32, F32Array, F32ArrayBuilder },
      { Float64, float64, F64Array, F64ArrayBuilder },
      { UTF8, utf8, UTF8Array, UTF8ArrayBuilder },
      { Bool, bool, BoolArray, BoolArrayBuilder },
      { Decimal, decimal, DecimalArray, DecimalArrayBuilder },
      { Interval, interval, IntervalArray, IntervalArrayBuilder }
    }
  };
}

/// Define `ArrayImpl` with macro.
macro_rules! array_impl_enum {
  ([], $( { $variant_name:ident, $suffix_name:ident, $array:ty, $builder:ty } ),*) => {
    /// `ArrayImpl` embeds all possible array in `array` module.
    #[derive(Debug)]
    pub enum ArrayImpl {
      $( $variant_name($array) ),*
    }
  };
}
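
The two macros above are then combined like this: for_all_variants! passes its full variant list to array_impl_enum!, which generates the ArrayImpl enum without listing every variant by hand.

// Generate `ArrayImpl` from the single source of truth above.
for_all_variants! { array_impl_enum }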

storage: benchmark and verification tool

As our SQL layer is missing some features and it is not easy to turn our benchmarks into SQL queries, we plan to add a new tool, "secondary-bench", to benchmark the performance and verify the correctness of our storage engine on a large dataset. It would be very similar to RocksDB's db_bench tool, with the following functionality (a CLI sketch follows the list):

  • secondary-bench filltable <table name> <schema>
  • secondary-bench scan <table name> <column>
  • secondary-bench sort-scan <table name> <column>
  • secondary-bench compact <table name>
  • etc.
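
A rough sketch of how the CLI could be structured with clap's derive API (subcommand and field names follow the list above; everything else is an assumption):

use clap::{Parser, Subcommand};

/// secondary-bench: benchmark and verify the secondary storage engine.
#[derive(Parser)]
struct Cli {
    #[clap(subcommand)]
    command: Command,
}

#[derive(Subcommand)]
enum Command {
    /// Fill a table with generated data.
    Filltable { table_name: String, schema: String },
    /// Scan a single column of a table.
    Scan { table_name: String, column: String },
    /// Scan a column in sorted order.
    SortScan { table_name: String, column: String },
    /// Compact all rowsets of a table.
    Compact { table_name: String },
}

fn main() {
    let cli = Cli::parse();
    match cli.command {
        Command::Filltable { table_name, schema } => println!("fill {table_name} with {schema}"),
        Command::Scan { table_name, column } => println!("scan {table_name}.{column}"),
        Command::SortScan { table_name, column } => println!("sort-scan {table_name}.{column}"),
        Command::Compact { table_name } => println!("compact {table_name}"),
    }
}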

binder: insert implicit cast

create table t(v1 int not null, v2 int not null, v3 double not null)
insert into t values(1,4,2.5), (2,3,3.2), (3,4,4.7), (4,3,5.1)
select sum(v1+v2),sum(v1+v3) from t;
bind error: binary operator types mismatch: Int(None) != Double
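
A minimal sketch of what the binder could do to fix this: when the two operand types of a binary expression disagree, wrap the narrower side in an implicit cast. All type and constructor names here are illustrative, not the actual RisingLight API:

// Sketch: coerce Int + Double to Double + Double by inserting a cast.
fn coerce_binary_operands(left: BoundExpr, right: BoundExpr) -> (BoundExpr, BoundExpr) {
    match (left.return_type(), right.return_type()) {
        (DataTypeKind::Int(_), DataTypeKind::Double) => {
            (BoundExpr::cast(left, DataTypeKind::Double), right)
        }
        (DataTypeKind::Double, DataTypeKind::Int(_)) => {
            (left, BoundExpr::cast(right, DataTypeKind::Double))
        }
        // Types already match (or another rule applies): leave as-is.
        _ => (left, right),
    }
}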
