
risinglightdb / risinglight

An educational OLAP database system.

License: Apache License 2.0

Rust 99.76% Makefile 0.13% Shell 0.09% Dockerfile 0.01%
sql rust database olap education analytics embedded-database

risinglight's Issues

storage: benchmark and verification tool

As our SQL layer is missing some features, and it is not easy to express our benchmarks as SQL queries, we plan to add a new tool like "secondary-bench" to benchmark the performance and verify the correctness of our storage engine on large datasets. It would be very similar to RocksDB's db_bench tool, with the following functionality (a possible CLI skeleton is sketched after this list):

  • secondary-bench filltable <table name> <schema>
  • secondary-bench scan <table name> <column>
  • secondary-bench sort-scan <table name> <column>
  • secondary-bench compact <table name>
  • etc.
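A minimal sketch of the CLI surface, assuming clap's derive API; the subcommand names follow the list above and nothing here is implemented yet.

use clap::{Parser, Subcommand};

#[derive(Parser)]
#[command(name = "secondary-bench")]
struct Args {
    #[command(subcommand)]
    cmd: Command,
}

#[derive(Subcommand)]
enum Command {
    /// Fill a table with generated data matching `schema`.
    Filltable { table_name: String, schema: String },
    /// Scan a single column and report throughput.
    Scan { table_name: String, column: String },
    /// Scan a column in sorted order.
    SortScan { table_name: String, column: String },
    /// Trigger a compaction and measure its duration.
    Compact { table_name: String },
}

fn main() {
    let args = Args::parse();
    // dispatch on args.cmd here
}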

binder: add `primary key` constraint support

create table t(v1 int not null primary key, v2 int not null, v3 int not null) # supported
create table t(v1 int not null, v2 int not null, v3 int not null, primary key(v1)) # not supported
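A hypothetical sketch of how the binder could normalize a table-level primary key(v1) into per-column flags; ColumnDesc and TableConstraint below are simplified stand-ins, not RisingLight's real types.

struct ColumnDesc {
    name: String,
    is_primary: bool,
}

enum TableConstraint {
    PrimaryKey(Vec<String>),
}

/// Fold table-level constraints into the per-column descriptors.
fn apply_constraints(columns: &mut [ColumnDesc], constraints: &[TableConstraint]) {
    for constraint in constraints {
        match constraint {
            TableConstraint::PrimaryKey(keys) => {
                for key in keys {
                    if let Some(col) = columns.iter_mut().find(|c| &c.name == key) {
                        col.is_primary = true;
                    }
                }
            }
        }
    }
}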

discussion: which I/O API to use?

Now that we are using async everywhere in our system, we need to pick a file I/O API. We need positioned reads, which tokio does not support. We have several choices.
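One candidate, sketched under the assumption that we keep std::fs::File, is to offload the Unix-only pread to tokio's blocking thread pool:

use std::os::unix::fs::FileExt;
use std::sync::Arc;

// Offload the blocking, Unix-only positioned read to tokio's blocking
// thread pool. A sketch of one option, not a decision.
async fn read_at(file: Arc<std::fs::File>, offset: u64, len: usize) -> std::io::Result<Vec<u8>> {
    tokio::task::spawn_blocking(move || -> std::io::Result<Vec<u8>> {
        let mut buf = vec![0u8; len];
        file.read_exact_at(&mut buf, offset)?;
        Ok(buf)
    })
    .await
    .expect("blocking read task panicked")
}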

storage: support multiple storage backends

Currently, RisingLight cannot be compiled or run on Windows, because we are using the Unix-only ReadAt extension. As our students might use Windows as their development environment, we need to add new storage backends (for reading).

Basically, all reads are handled in Column structure https://github.com/singularity-data/risinglight/blob/main/src/storage/secondary/column.rs

It's better to change file: Arc<std::fs::File> into an enum, e.g.

pub enum ColumnReadableFile {
  /// For `read_at`
  #[cfg(unix)]
  PositionedRead(Arc<std::fs::File>),
  /// For `file.lock().seek().read()`
  NormalRead(Arc<Mutex<tokio::fs::File>>),
  // In the future, we can even add minio / S3 file backend
}

And we should refactor the whole code path to use ColumnReadableFile instead of Arc<std::fs::File> throughout the storage system.
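A rough sketch of how reads could dispatch over the enum (assuming Mutex above is tokio::sync::Mutex; note that read_exact_at is blocking and would likely need spawn_blocking in practice):

impl ColumnReadableFile {
    pub async fn read_at(&self, offset: u64, buf: &mut [u8]) -> std::io::Result<()> {
        match self {
            #[cfg(unix)]
            ColumnReadableFile::PositionedRead(file) => {
                use std::os::unix::fs::FileExt;
                // Positioned read: no seek, no lock needed.
                file.read_exact_at(buf, offset)
            }
            ColumnReadableFile::NormalRead(file) => {
                use std::io::SeekFrom;
                use tokio::io::{AsyncReadExt, AsyncSeekExt};
                // Fallback: lock, seek, then read.
                let mut file = file.lock().await;
                file.seek(SeekFrom::Start(offset)).await?;
                file.read_exact(buf).await?;
                Ok(())
            }
        }
    }
}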

executor: TPC-H data generator

We can add a TPCHScanExecutor, which generates the data of a TPC-H table, e.g.

INSERT INTO my_table SELECT * FROM gen.tpch.xxxx

Migrate parser to `sqlparser`

sqlparser is a widely-used SQL parser crate in Rust.

Compared to the current postgres-parser:

  • πŸ‘ sqlparser is standalone, while postgres-parser depends on llvm and Postgres.
  • πŸ‘ sqlparser is more active and widely used. (799 vs 70 stars)
  • πŸ‘ sqlparser generates an elegant, well-documented AST, while postgres-parser generates a more verbose AST which needs additional transformation (~1.5k lines now).
  • πŸ€” postgres-parser is fully compatible with PG, but sqlparser is not. However this is not critical to an educational system.
  • πŸ™ˆ postgres-parser has memory leak, which makes it totally unusable.

We plan to migrate our parser from postgres-parser to sqlparser.
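For reference, parsing with sqlparser is a single call into the crate:

use sqlparser::dialect::PostgreSqlDialect;
use sqlparser::parser::Parser;

fn main() {
    // Parse a statement with sqlparser's PostgreSQL dialect.
    let sql = "SELECT count(v1) FROM t WHERE v2 > 3";
    let statements = Parser::parse_sql(&PostgreSqlDialect {}, sql).expect("parse failed");
    println!("{:#?}", statements);
}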

execution: add full support of `select count(?)`

Currently, select count(x) from table simply counts all rows, which is not the expected behavior, and count(*) is not supported at all. For example,

> select * from t
+------+
| 1    |
| 2    |
| NULL |
+------+

> select count(v1) from t
+---+
| 3 |
+---+

# expected: 2

> select count(*) from t
thread 'main' panicked at 'not yet implemented: bind expression: Wildcard', src/binder/expression/mod.rs:85:18
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

# expected: 3

Therefore, we need to add full count support. This could be split into two steps: first, make count(x) skip null values; second, support binding and executing count(*).
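The intended semantics, sketched over an Option-based column: count(x) counts non-null values, while count(*) counts rows.

fn count_column(col: &[Option<i32>]) -> usize {
    // count(x): nulls are skipped
    col.iter().filter(|v| v.is_some()).count()
}

fn count_star(num_rows: usize) -> usize {
    // count(*): every row counts
    num_rows
}

fn main() {
    let v1 = vec![Some(1), Some(2), None];
    assert_eq!(count_column(&v1), 2); // count(v1) == 2
    assert_eq!(count_star(v1.len()), 3); // count(*) == 3
}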

storage: auto split memtable

Users may ingest a large amount of data into the engine, so we need to periodically flush the memtable to disk. A sketch of the size-based flush check follows the list.

  • add an interface to get the memtable's estimated on-disk size
  • add an option in StorageOptions, e.g. target_rowset_size
  • split the memtable in a txn
  • support flushing in the background
  • support applying add_rowset to multiple tables
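A hypothetical sketch of the size check on the write path; estimated_on_disk_size and target_rowset_size follow the list above and are not existing APIs.

struct StorageOptions {
    /// Flush a memtable once its estimated on-disk size exceeds this.
    target_rowset_size: usize,
}

struct MemTable {
    estimated_size: usize,
}

impl MemTable {
    fn estimated_on_disk_size(&self) -> usize {
        self.estimated_size
    }
}

/// Decide whether the current memtable should be flushed to a new rowset.
fn should_flush(memtable: &MemTable, options: &StorageOptions) -> bool {
    memtable.estimated_on_disk_size() >= options.target_rowset_size
}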

cli: use on-disk engine by default

The only blocker might be #96, and I'll implement this soon. After that, almost all queries can run on the disk engine, and we can find bugs in the storage engine prior to our release.

planner: what does each stage do?

I feel that there might be something wrong in my implementation. For example, when I was implementing sorted scan, the logic that decides whether SeqScanExecutor should use a sorted scan ended up in the logical planner. #121

When I was discussing with @pleiadesian where to generate InputRef from ColumnRef, I also found it hard to determine which stage this step belongs to.

Is there any spec or general convention about which stage should do what?

cc @MingjiHan99 @wangrunji0408

binder: insert implicit cast

create table t(v1 int not null, v2 int not null, v3 double not null)
insert into t values(1,4,2.5), (2,3,3.2), (3,4,4.7), (4,3,5.1)
select sum(v1+v2),sum(v1+v3) from t;
bind error: binary operator types mismatch: Int(None) != Double
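A simplified sketch of the fix: when binding a binary expression whose operands are Int and Double, wrap the Int side in a cast. BoundExpr and DataType here are stand-ins for the binder's real types.

#[derive(Clone, Copy, PartialEq, Debug)]
enum DataType {
    Int,
    Double,
}

#[derive(Debug)]
enum BoundExpr {
    Column(String, DataType),
    Cast(Box<BoundExpr>, DataType),
}

impl BoundExpr {
    fn return_type(&self) -> DataType {
        match self {
            BoundExpr::Column(_, ty) => *ty,
            BoundExpr::Cast(_, ty) => *ty,
        }
    }
}

/// Make both operands the same type, widening Int to Double if needed.
fn unify(lhs: BoundExpr, rhs: BoundExpr) -> (BoundExpr, BoundExpr) {
    match (lhs.return_type(), rhs.return_type()) {
        (DataType::Int, DataType::Double) => {
            (BoundExpr::Cast(Box::new(lhs), DataType::Double), rhs)
        }
        (DataType::Double, DataType::Int) => {
            (lhs, BoundExpr::Cast(Box::new(rhs), DataType::Double))
        }
        _ => (lhs, rhs),
    }
}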

TODO List for Stage 1

Parser:

Binder:

  • Add return names and types for select statement. @MingjiHan99
  • Binding arithmetic expressions (+, -, * and /) @MingjiHan99 @wangrunji0408
  • Add necessary implicit casting when binding the expressions. For example: 1.0 + 1 -> 1.0 + (1 cast as double) @wangrunji0408

Executor:

Storage:
Implement on-disk storage system:

  • Add base definitions for table storage (Segment and Block) @MingjiHan99
  • Add disk manager @MingjiHan99
  • Add buffer pool (Ongoing)

array: use macro to generate match branches

As we are adding more and more functions, it is getting very tedious to write so many match arms to statically dispatch them. Maybe we can use the for_all_variants macro from RisingWave.

https://github.com/singularity-data/risingwave/blob/master/rust/common/src/array/mod.rs#L190

/// `for_all_variants` includes all variants of our array types. If you added a new array
/// type inside the project, be sure to add a variant here.
///
/// Every tuple has four elements, where
/// `{ enum variant name, function suffix name, array type, builder type }`
///
/// There are typically two ways of using this macro, pass token or pass no token.
/// See the following implementations for example.
#[macro_export]
macro_rules! for_all_variants {
  ($macro:tt $(, $x:tt)*) => {
    $macro! {
      [$($x),*],
      { Int16, int16, I16Array, I16ArrayBuilder },
      { Int32, int32, I32Array, I32ArrayBuilder },
      { Int64, int64, I64Array, I64ArrayBuilder },
      { Float32, float32, F32Array, F32ArrayBuilder },
      { Float64, float64, F64Array, F64ArrayBuilder },
      { UTF8, utf8, UTF8Array, UTF8ArrayBuilder },
      { Bool, bool, BoolArray, BoolArrayBuilder },
      { Decimal, decimal, DecimalArray, DecimalArrayBuilder },
      { Interval, interval, IntervalArray, IntervalArrayBuilder }
    }
  };
}

/// Define `ArrayImpl` with macro.
macro_rules! array_impl_enum {
  ([], $( { $variant_name:ident, $suffix_name:ident, $array:ty, $builder:ty } ),*) => {
    /// `ArrayImpl` embeds all possible array in `array` module.
    #[derive(Debug)]
    pub enum ArrayImpl {
      $( $variant_name($array) ),*
    }
  };
}
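For reference, the "pass no token" form mentioned in the doc comment looks like this; it expands array_impl_enum with the full variant list, so ArrayImpl picks up new array types automatically.

// Expands to `array_impl_enum! { [], { Int16, ... }, ... }`, which matches
// the `([], ...)` rule above and defines `ArrayImpl` with every variant.
for_all_variants! { array_impl_enum }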

discussion: developer experience with `async_stream`

Developing with async_stream is painful: we get no suggestions from rust-analyzer, and the stack trace is hard to read.

  13: core::iter::traits::iterator::Iterator::for_each
             at /rustc/497ee321af3b8496eaccd7af7b437f18bab81abf/library/core/src/iter/traits/iterator.rs:727:9
  14: <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
             at /rustc/497ee321af3b8496eaccd7af7b437f18bab81abf/library/alloc/src/vec/spec_extend.rs:40:17
  15: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
             at /rustc/497ee321af3b8496eaccd7af7b437f18bab81abf/library/alloc/src/vec/spec_from_iter_nested.rs:56:9
  16: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
             at /rustc/497ee321af3b8496eaccd7af7b437f18bab81abf/library/alloc/src/vec/spec_from_iter.rs:33:9
  17: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
             at /rustc/497ee321af3b8496eaccd7af7b437f18bab81abf/library/alloc/src/vec/mod.rs:2485:9
  18: core::iter::traits::iterator::Iterator::collect
             at /rustc/497ee321af3b8496eaccd7af7b437f18bab81abf/library/core/src/iter/traits/iterator.rs:1739:9
  19: risinglight::array::data_chunk::DataChunk::get_row_by_idx
             at ./src/array/data_chunk.rs:46:9
  20: risinglight::executor::nested_loop_join::NestedLoopJoinExecutor::execute::{{closure}}
             at /home/skyzh/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/async-stream-0.3.2/src/lib.rs:237:9
  21: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/497ee321af3b8496eaccd7af7b437f18bab81abf/library/core/src/future/mod.rs:80:19
  22: <async_stream::async_stream::AsyncStream<T,U> as futures_core::stream::Stream>::poll_next
             at /home/skyzh/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/async-stream-0.3.2/src/async_stream.rs:53:17
  23: <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next
             at /home/skyzh/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/futures-core-0.3.17/src/stream.rs:120:9
  24: <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next
             at /home/skyzh/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/futures-core-0.3.17/src/stream.rs:120:9
  25: <&mut S as futures_core::stream::Stream>::poll_next
             at /home/skyzh/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/futures-core-0.3.17/src/stream.rs:104:9
  26: <async_stream::next::Next<S> as core::future::future::Future>::poll
             at /home/skyzh/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/async-stream-0.3.2/src/next.rs:30:9
  27: risinglight::executor::projection::ProjectionExecutor::execute::{{closure}}
             at ./src/executor/projection.rs:13:9

The backtrace never reports the exact line where the panic happens -- the content of try_stream has been rewritten by the procedural macro!

Therefore, I propose manually expanding the try_stream macro.

async_stream provides two utilities: a thread-local channel implementation to transfer yielded values from the stream to the caller, and an AsyncStream to synchronize between the stream generator and the receiver. For example, the InsertExecutor expands as follows:

mod insert {
    use super::*;
    use crate::array::DataChunk;
    use crate::catalog::TableRefId;
    use crate::storage::{Storage, Table, Transaction};
    use crate::types::ColumnId;
    use std::sync::Arc;
    /// The executor of `insert` statement.
    pub struct InsertExecutor<S: Storage> {
        pub table_ref_id: TableRefId,
        pub column_ids: Vec<ColumnId>,
        pub storage: Arc<S>,
        pub child: BoxedExecutor,
    }
    impl<S: Storage> InsertExecutor<S> {
        pub fn execute(self) -> impl Stream<Item = Result<DataChunk, ExecutorError>> {
            {
                let (mut __yield_tx, __yield_rx) = ::async_stream::yielder::pair();
                ::async_stream::AsyncStream::new(__yield_rx, async move {
                    let table = match self.storage.get_table(self.table_ref_id) {
                        ::core::result::Result::Ok(v) => v,
                        ::core::result::Result::Err(e) => {
                            __yield_tx.send(::core::result::Result::Err(e.into())).await;
                            return;
                        }
                    };
                    let mut txn = match table.write().await {
                        ::core::result::Result::Ok(v) => v,
                        ::core::result::Result::Err(e) => {
                            __yield_tx.send(::core::result::Result::Err(e.into())).await;
                            return;
                        }
                    };
                    {
                        let mut __pinned = self.child;
                        let mut __pinned =
                            unsafe { ::core::pin::Pin::new_unchecked(&mut __pinned) };
                        loop {
                            let chunk = match ::async_stream::reexport::next(&mut __pinned).await {
                                ::core::option::Option::Some(e) => e,
                                ::core::option::Option::None => break,
                            };
                            {
                                match txn
                                    .append(match chunk {
                                        ::core::result::Result::Ok(v) => v,
                                        ::core::result::Result::Err(e) => {
                                            __yield_tx
                                                .send(::core::result::Result::Err(e.into()))
                                                .await;
                                            return;
                                        }
                                    })
                                    .await
                                {
                                    ::core::result::Result::Ok(v) => v,
                                    ::core::result::Result::Err(e) => {
                                        __yield_tx
                                            .send(::core::result::Result::Err(e.into()))
                                            .await;
                                        return;
                                    }
                                };
                            }
                        }
                    }
                    match txn.commit().await {
                        ::core::result::Result::Ok(v) => v,
                        ::core::result::Result::Err(e) => {
                            __yield_tx.send(::core::result::Result::Err(e.into())).await;
                            return;
                        }
                    };
                    __yield_tx
                        .send(::core::result::Result::Ok(DataChunk::single()))
                        .await;
                })
            }
        }
    }
}

The expansion is nearly identical to the original code.

There are two further problems to solve:

  • The internal implementation of async_stream is subject to change. Someday the AsyncStream struct might gain different functionality and a different constructor, so we would need to pin async_stream to an exact version instead of relying on semver.
  • Error handling is painful. We cannot simply write ?. However, we could use a custom macro that expands error handling into "Ok => take the value, Err => send the error and return"; see the sketch below.
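A hypothetical try_yield! macro along those lines; it mirrors what the procedural macro generates for ? and is not an existing API:

// On Err, send the error through the yielder and end the stream, just like
// the hand-expanded `match` blocks in `execute` above.
macro_rules! try_yield {
    ($yield_tx:expr, $e:expr) => {
        match $e {
            ::core::result::Result::Ok(v) => v,
            ::core::result::Result::Err(e) => {
                $yield_tx.send(::core::result::Result::Err(e.into())).await;
                return;
            }
        }
    };
}

// Usage inside the hand-expanded `execute` above:
// let table = try_yield!(__yield_tx, self.storage.get_table(self.table_ref_id));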

doc: storage engine

I'll draft a detailed design doc for the storage engine and add more docstrings to our codebase.

executor: refine HashAgg

#69 (comment)
As mentioned earlier, constructing a visibility bitmap for every unique group key incurs high time and space cost. Instead, if we keep the row-by-row update of the current implementation, we avoid the cost of bitmap construction; a sketch is given below.

Originally posted by @pleiadesian in #69 (comment)
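The row-by-row alternative sketched with a HashMap: each input row looks up (or creates) its group's state and updates it in place, with no per-group visibility bitmap. Types here are simplified stand-ins, not the executor's real ones.

use std::collections::HashMap;

#[derive(Default)]
struct AggState {
    count: u64,
    sum: i64,
}

fn hash_agg(rows: impl Iterator<Item = (String, i64)>) -> HashMap<String, AggState> {
    let mut groups: HashMap<String, AggState> = HashMap::new();
    for (key, value) in rows {
        // Row-by-row update: no visibility bitmap per group key.
        let state = groups.entry(key).or_default();
        state.count += 1;
        state.sum += value;
    }
    groups
}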

catalog: internal table support

It would be very helpful if we could inspect the internal state of our storage engine via SQL queries, e.g.

SELECT rowset_id, size FROM internal.storage.rowsets WHERE table_id = 1;

planner: pretty print plan tree

We need a pretty (and fancy) printout of the plan tree to help with developing and debugging the optimizer; a minimal sketch follows the examples below.

Examples

DuckDB:

D explain select v2 from t where 3 > v1;
┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             v2            │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│          SEQ_SCAN         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             t             │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             v2            │
│             v1            │
└───────────────────────────┘

Databend:

Projection: ((1 + 2) + 3):UInt32
  Expression: 6:UInt32 (Before Projection)
    ReadDataSource: scan partitions: [1], scan schema: [dummy:UInt8], statistics: [read_rows: 1, read_bytes: 1]
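A minimal sketch of indentation-based printing in the Databend style; PlanNode is a simplified stand-in for the real plan types.

struct PlanNode {
    description: String,
    children: Vec<PlanNode>,
}

/// Print a node at `depth`, then recurse into its children one level deeper.
fn explain(node: &PlanNode, depth: usize, out: &mut String) {
    out.push_str(&"  ".repeat(depth));
    out.push_str(&node.description);
    out.push('\n');
    for child in &node.children {
        explain(child, depth + 1, out);
    }
}

fn main() {
    let plan = PlanNode {
        description: "Projection: v2".into(),
        children: vec![PlanNode {
            description: "SeqScan: t [v2, v1], filter: 3 > v1".into(),
            children: vec![],
        }],
    };
    let mut out = String::new();
    explain(&plan, 0, &mut out);
    print!("{}", out);
}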
