risinglightdb / risinglight
An educational OLAP database system.
License: Apache License 2.0
We need a pretty (and fancy) printout of the plan tree to help develop and debug the optimizer.
D explain select v2 from t where 3 > v1;
┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             v2            │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│          SEQ_SCAN         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             t             │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             v2            │
│             v1            │
└───────────────────────────┘
Projection: ((1 + 2) + 3):UInt32
Expression: 6:UInt32 (Before Projection)
ReadDataSource: scan partitions: [1], scan schema: [dummy:UInt8], statistics: [read_rows: 1, read_bytes: 1]
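Both outputs above boil down to a pre-order walk of the plan tree. A minimal sketch of such a printer, using a toy PlanNode type (not RisingLight's actual plan representation) and plain indentation instead of boxes:

/// A toy plan node, for illustration only.
struct PlanNode {
    name: &'static str,
    fields: Vec<String>,
    children: Vec<PlanNode>,
}

/// Recursively print a node, its fields, and its children,
/// one indentation level deeper per child.
fn explain(node: &PlanNode, level: usize, out: &mut String) {
    let indent = "  ".repeat(level);
    out.push_str(&format!("{}{}\n", indent, node.name));
    for field in &node.fields {
        out.push_str(&format!("{}  {}\n", indent, field));
    }
    for child in &node.children {
        explain(child, level + 1, out);
    }
}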
Developing with async_stream is unfriendly to developers: we never get suggestions from rust-analyzer, and the stack trace is hard to read.
13: core::iter::traits::iterator::Iterator::for_each
at /rustc/497ee321af3b8496eaccd7af7b437f18bab81abf/library/core/src/iter/traits/iterator.rs:727:9
14: <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
at /rustc/497ee321af3b8496eaccd7af7b437f18bab81abf/library/alloc/src/vec/spec_extend.rs:40:17
15: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
at /rustc/497ee321af3b8496eaccd7af7b437f18bab81abf/library/alloc/src/vec/spec_from_iter_nested.rs:56:9
16: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
at /rustc/497ee321af3b8496eaccd7af7b437f18bab81abf/library/alloc/src/vec/spec_from_iter.rs:33:9
17: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
at /rustc/497ee321af3b8496eaccd7af7b437f18bab81abf/library/alloc/src/vec/mod.rs:2485:9
18: core::iter::traits::iterator::Iterator::collect
at /rustc/497ee321af3b8496eaccd7af7b437f18bab81abf/library/core/src/iter/traits/iterator.rs:1739:9
19: risinglight::array::data_chunk::DataChunk::get_row_by_idx
at ./src/array/data_chunk.rs:46:9
20: risinglight::executor::nested_loop_join::NestedLoopJoinExecutor::execute::{{closure}}
at /home/skyzh/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/async-stream-0.3.2/src/lib.rs:237:9
21: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
at /rustc/497ee321af3b8496eaccd7af7b437f18bab81abf/library/core/src/future/mod.rs:80:19
22: <async_stream::async_stream::AsyncStream<T,U> as futures_core::stream::Stream>::poll_next
at /home/skyzh/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/async-stream-0.3.2/src/async_stream.rs:53:17
23: <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next
at /home/skyzh/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/futures-core-0.3.17/src/stream.rs:120:9
24: <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next
at /home/skyzh/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/futures-core-0.3.17/src/stream.rs:120:9
25: <&mut S as futures_core::stream::Stream>::poll_next
at /home/skyzh/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/futures-core-0.3.17/src/stream.rs:104:9
26: <async_stream::next::Next<S> as core::future::future::Future>::poll
at /home/skyzh/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/async-stream-0.3.2/src/next.rs:30:9
27: risinglight::executor::projection::ProjectionExecutor::execute::{{closure}}
at ./src/executor/projection.rs:13:9
The backtrace never reports the exact line where the panic happens -- the content of try_stream has been rewritten by the procedural macro! Therefore, I propose manually expanding the try_stream macro.
async_stream provides two utilities: a thread-local channel implementation to transfer the yielded value from the stream to the caller, and an AsyncStream to synchronize between the stream generator and the receiver function. For example, the InsertExecutor is expanded as follows:
mod insert {
    use super::*;
    use crate::array::DataChunk;
    use crate::catalog::TableRefId;
    use crate::storage::{Storage, Table, Transaction};
    use crate::types::ColumnId;
    use std::sync::Arc;

    /// The executor of `insert` statement.
    pub struct InsertExecutor<S: Storage> {
        pub table_ref_id: TableRefId,
        pub column_ids: Vec<ColumnId>,
        pub storage: Arc<S>,
        pub child: BoxedExecutor,
    }

    impl<S: Storage> InsertExecutor<S> {
        pub fn execute(self) -> impl Stream<Item = Result<DataChunk, ExecutorError>> {
            {
                let (mut __yield_tx, __yield_rx) = ::async_stream::yielder::pair();
                ::async_stream::AsyncStream::new(__yield_rx, async move {
                    let table = match self.storage.get_table(self.table_ref_id) {
                        ::core::result::Result::Ok(v) => v,
                        ::core::result::Result::Err(e) => {
                            __yield_tx.send(::core::result::Result::Err(e.into())).await;
                            return;
                        }
                    };
                    let mut txn = match table.write().await {
                        ::core::result::Result::Ok(v) => v,
                        ::core::result::Result::Err(e) => {
                            __yield_tx.send(::core::result::Result::Err(e.into())).await;
                            return;
                        }
                    };
                    {
                        let mut __pinned = self.child;
                        let mut __pinned =
                            unsafe { ::core::pin::Pin::new_unchecked(&mut __pinned) };
                        loop {
                            let chunk = match ::async_stream::reexport::next(&mut __pinned).await {
                                ::core::option::Option::Some(e) => e,
                                ::core::option::Option::None => break,
                            };
                            {
                                match txn
                                    .append(match chunk {
                                        ::core::result::Result::Ok(v) => v,
                                        ::core::result::Result::Err(e) => {
                                            __yield_tx
                                                .send(::core::result::Result::Err(e.into()))
                                                .await;
                                            return;
                                        }
                                    })
                                    .await
                                {
                                    ::core::result::Result::Ok(v) => v,
                                    ::core::result::Result::Err(e) => {
                                        __yield_tx
                                            .send(::core::result::Result::Err(e.into()))
                                            .await;
                                        return;
                                    }
                                };
                            }
                        }
                    }
                    match txn.commit().await {
                        ::core::result::Result::Ok(v) => v,
                        ::core::result::Result::Err(e) => {
                            __yield_tx.send(::core::result::Result::Err(e.into())).await;
                            return;
                        }
                    };
                    __yield_tx
                        .send(::core::result::Result::Ok(DataChunk::single()))
                        .await;
                })
            }
        }
    }
}
This looks nearly identical to the original code.
There are two further problems to solve:
- async_stream is subject to change. Someday the AsyncStream struct might have different functionality and a different constructor, so we would need to pin async_stream to the exact version we want, instead of relying on semver.
- The ? operator no longer works in the manually expanded code. However, we might use a custom macro to expand error handling into Ok => get value, Err => send message and return (see the sketch below).
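As a sketch of the second point: a hypothetical unwrap_or_yield macro (an illustrative name, not an existing async_stream API) could replace the ? operator inside the manually expanded stream, mirroring the Ok/Err pattern in the expansion above:

macro_rules! unwrap_or_yield {
    // Expand to the same Ok/Err match the procedural macro generates.
    // Must be invoked inside the async block, so that `.await` and
    // `return` are valid in the expansion.
    ($yield_tx:expr, $result:expr) => {
        match $result {
            Ok(v) => v,
            Err(e) => {
                $yield_tx.send(Err(e.into())).await;
                return;
            }
        }
    };
}

// Usage inside the expanded `execute`:
// let table = unwrap_or_yield!(__yield_tx, self.storage.get_table(self.table_ref_id));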
The only blocker might be #96, and I'll implement this soon. After that, almost all queries can be run on the disk engine, and we can find bugs in the storage engine prior to our release. For example:
create table t(v1 int not null primary key, v2 int not null, v3 int not null) # supported
create table t(v1 int not null, v2 int not null, v3 int not null, primary key(v1)) # not supported
Currently, everything is ingested in a single batch. We should ingest and yield DataChunks little by little.
sqlparser is a widely-used SQL parser crate in Rust. Compared to the current postgres-parser:
- sqlparser is standalone, while postgres-parser depends on llvm and Postgres.
- sqlparser is more active and widely used (799 vs 70 stars).
- sqlparser generates an elegant, well-documented AST, while postgres-parser generates a more verbose AST which needs additional transformation (~1.5k lines now).
- postgres-parser is fully compatible with PG, but sqlparser is not. However, this is not critical for an educational system.
- postgres-parser has a memory leak, which makes it totally unusable.
We plan to migrate our parser from postgres-parser to sqlparser.
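For illustration, parsing with sqlparser takes only a few lines (assuming the PostgreSqlDialect; which dialect we settle on is an open question):

use sqlparser::dialect::PostgreSqlDialect;
use sqlparser::parser::Parser;

fn main() {
    let sql = "SELECT v2 FROM t WHERE 3 > v1";
    // `parse_sql` returns one `Statement` AST node per statement.
    let ast = Parser::parse_sql(&PostgreSqlDialect {}, sql).unwrap();
    println!("{:#?}", ast);
}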
We can make a TPCHScanExecutor which generates the data of a TPC-H table, e.g.
INSERT INTO my_table SELECT * FROM gen.tpch.xxxx
... and we need a new RowSetWriter to write what's inside the builder to disk.
Parser:
- Parsing arithmetic expressions (+, -, * and /) @wangrunji0408
- Use sqlparser as the new parser @wangrunji0408
Binder:
- Binding arithmetic expressions (+, -, * and /) @MingjiHan99 @wangrunji0408
- 1.0 + 1 -> 1.0 + (1 cast as double) @wangrunji0408
Executor:
- ProjectionExecutor
- CreateTableExecutor and InsertExecutor @MingjiHan99
Storage:
- Implement the on-disk storage system (Segment and Block) @MingjiHan99

As we are adding more and more functions, it is getting very tedious to have so many match branches for statically dispatching functions. Maybe we can use the for_all_variants macro from RisingWave:
https://github.com/singularity-data/risingwave/blob/master/rust/common/src/array/mod.rs#L190
/// `for_all_variants` includes all variants of our array types. If you added a new array
/// type inside the project, be sure to add a variant here.
///
/// Every tuple has four elements, where
/// `{ enum variant name, function suffix name, array type, builder type }`
///
/// There are typically two ways of using this macro, pass token or pass no token.
/// See the following implementations for example.
#[macro_export]
macro_rules! for_all_variants {
    ($macro:tt $(, $x:tt)*) => {
        $macro! {
            [$($x),*],
            { Int16, int16, I16Array, I16ArrayBuilder },
            { Int32, int32, I32Array, I32ArrayBuilder },
            { Int64, int64, I64Array, I64ArrayBuilder },
            { Float32, float32, F32Array, F32ArrayBuilder },
            { Float64, float64, F64Array, F64ArrayBuilder },
            { UTF8, utf8, UTF8Array, UTF8ArrayBuilder },
            { Bool, bool, BoolArray, BoolArrayBuilder },
            { Decimal, decimal, DecimalArray, DecimalArrayBuilder },
            { Interval, interval, IntervalArray, IntervalArrayBuilder }
        }
    };
}

/// Define `ArrayImpl` with macro.
macro_rules! array_impl_enum {
    ([], $( { $variant_name:ident, $suffix_name:ident, $array:ty, $builder:ty } ),*) => {
        /// `ArrayImpl` embeds all possible array in `array` module.
        #[derive(Debug)]
        pub enum ArrayImpl {
            $( $variant_name($array) ),*
        }
    };
}
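For example, defining ArrayImpl then becomes a single invocation (the "pass no token" way mentioned in the doc comment):

for_all_variants! { array_impl_enum }

// which expands to:
// pub enum ArrayImpl {
//     Int16(I16Array),
//     Int32(I32Array),
//     ...
//     Interval(IntervalArray),
// }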
It would be very helpful if we could inspect the internal state of our storage engine using SQL queries, e.g.
SELECT rowset_id, size FROM internal.storage.rowsets WHERE table_id = 1;
I'll draft a detailed design doc about the storage engine, and add more docstrings to our codebase.
As our SQL layer is missing some features, and it's relatively hard to turn our benchmarks into SQL queries, we plan to add a new tool like "secondary-bench" to benchmark the performance and verify our storage engine's correctness on a large dataset. It would be very similar to RocksDB's db_bench tool, with the following functionality:
secondary-bench filltable <table name> <schema>
secondary-bench scan <table name> <column>
secondary-bench sort-scan <table name> <column>
secondary-bench compact <table name>
Now that we are using async everywhere in our system, we need to select a file I/O API. We need to be able to do positioned reads, which tokio does not support. We have several choices:
- Call std's read_at (https://doc.rust-lang.org/std/os/unix/fs/trait.FileExt.html) on a blocking thread, and send the result back through a channel (see the sketch after this list).
- Put a Mutex over Tokio's File, which means we can never concurrently read several blocks from a single file.
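A minimal sketch of the first option (read_block is a hypothetical helper, not our current API):

use std::os::unix::fs::FileExt;
use std::sync::Arc;

/// Offload the positioned read to tokio's blocking pool. `read_exact_at`
/// takes an absolute offset and does not move a cursor, so several blocks
/// of one file can be read concurrently.
async fn read_block(file: Arc<std::fs::File>, offset: u64, len: usize) -> std::io::Result<Vec<u8>> {
    tokio::task::spawn_blocking(move || {
        let mut buf = vec![0u8; len];
        file.read_exact_at(&mut buf, offset)?;
        Ok(buf)
    })
    .await
    .expect("blocking read task panicked")
}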
Currently, select count(x) from table acts as counting rows, which is not expected, and count(*) is simply not supported. For example,
> select * from t
+------+
| 1 |
| 2 |
| NULL |
+------+
> select count(v1) from t
+---+
| 3 |
+---+
# expected: 2
> select count(*) from t
thread 'main' panicked at 'not yet implemented: bind expression: Wildcard', src/binder/expression/mod.rs:85:18
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
# expected: 3
Therefore, we need to add full count support. This could be split into two steps (see the sketch below for the intended semantics):
- Implement the Count aggregator in https://github.com/singularity-data/risinglight/tree/main/src/executor/aggregation, and replace RowCount with Count in https://github.com/singularity-data/risinglight/blob/main/src/binder/expression/agg_call.rs, so that select count(x) will work correctly.
- Support select count(*).
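A minimal sketch of the intended semantics (assuming DataValue::Null is how our types module represents NULL):

struct Count {
    result: i64,
}

impl Count {
    /// count(x): only non-NULL inputs are counted.
    fn update(&mut self, value: &DataValue) {
        if !matches!(value, DataValue::Null) {
            self.result += 1;
        }
    }

    /// count(*): every row is counted, regardless of NULLs.
    fn update_row(&mut self) {
        self.result += 1;
    }
}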
After supporting InputRef, we can implement the following queries:
select avg(a) from t
select sum(a) + sum(b) from t
I feel that there might be something wrong in my implementation. For example, when I was implementing sorted scan, the logic that decides whether SeqScanExecutor should use a sorted scan ended up in the logical planner (#121).
When I was discussing with @pleiadesian where to generate InputRef from ColumnRef, I also found it hard to determine which stage this step belongs to.
Is there any spec or general convention about which stage should do what?
The overhead of clone will be greatly reduced.
Currently, RisingLight cannot be compiled or run on Windows, because we are using the Unix-only read_at extension. As our students might use Windows as their development environment, we need to add new storage backends (for reading).
Basically, all reads are handled in the Column structure: https://github.com/singularity-data/risinglight/blob/main/src/storage/secondary/column.rs
It's better to change file: Arc<std::fs::File> into an enum, e.g.
pub enum ColumnReadableFile {
    /// For `read_at`
    #[cfg(unix)]
    PositionedRead(Arc<std::fs::File>),
    /// For `file.lock().seek().read()`
    NormalRead(Arc<Mutex<tokio::fs::File>>),
    // In the future, we can even add minio / S3 file backends
}
And we should refactor the whole code path to use ColumnReadableFile instead of Arc<std::fs::File> throughout the storage system.
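A hedged sketch of how the read path could dispatch on the enum (read_block is a hypothetical helper, and tokio's Mutex is assumed for NormalRead):

impl ColumnReadableFile {
    pub async fn read_block(&self, offset: u64, len: usize) -> std::io::Result<Vec<u8>> {
        match self {
            #[cfg(unix)]
            Self::PositionedRead(file) => {
                use std::os::unix::fs::FileExt;
                let file = file.clone();
                // Positioned reads need no lock, so they can run concurrently.
                tokio::task::spawn_blocking(move || {
                    let mut buf = vec![0u8; len];
                    file.read_exact_at(&mut buf, offset)?;
                    Ok(buf)
                })
                .await
                .expect("blocking read task panicked")
            }
            Self::NormalRead(file) => {
                use tokio::io::{AsyncReadExt, AsyncSeekExt};
                // Seek + read must happen under one lock, so reads serialize.
                let mut file = file.lock().await;
                file.seek(std::io::SeekFrom::Start(offset)).await?;
                let mut buf = vec![0u8; len];
                file.read_exact(&mut buf).await?;
                Ok(buf)
            }
        }
    }
}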
This is useful for batch writes and data import. We can later simply:
INSERT INTO my_table SELECT * FROM 'table.tbl'
As discussed in https://singularity-data.larksuite.com/docs/docusneKe7PxHGG96UrUPwqZo6e, we plan to use a unified trait for all storage engines.
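A hedged sketch of what such a trait could look like, inferred purely from how InsertExecutor calls into storage in the expansion above (get_table, write, append, commit); the names and error type are illustrative, and the authoritative design is in the linked doc:

use async_trait::async_trait;

pub trait Storage: Send + Sync + 'static {
    type TableType: Table;
    fn get_table(&self, id: TableRefId) -> Result<Self::TableType, StorageError>;
}

#[async_trait]
pub trait Table: Send + Sync {
    type TransactionType: Transaction;
    async fn write(&self) -> Result<Self::TransactionType, StorageError>;
}

#[async_trait]
pub trait Transaction: Send {
    async fn append(&mut self, chunk: DataChunk) -> Result<(), StorageError>;
    async fn commit(self) -> Result<(), StorageError>;
}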
Importing a file might take some time to complete. It's good to show the progress.
create table t(v1 int not null, v2 int not null, v3 double not null)
insert into t values(1,4,2.5), (2,3,3.2), (3,4,4.7), (4,3,5.1)
select sum(v1+v2),sum(v1+v3) from t;
bind error: binary operator types mismatch: Int(None) != Double
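The fix likely mirrors the 1.0 + 1 -> 1.0 + (1 cast as double) rule from the binder work above: when binding a binary expression, insert an implicit cast so both operands agree. A self-contained sketch with toy types (not RisingLight's actual binder API):

#[derive(Clone, Copy, PartialEq, Debug)]
enum TypeKind { Int, Double }

#[derive(Debug)]
enum BoundExpr {
    Column { ty: TypeKind },
    Cast { ty: TypeKind, child: Box<BoundExpr> },
}

impl BoundExpr {
    fn ty(&self) -> TypeKind {
        match self {
            BoundExpr::Column { ty } | BoundExpr::Cast { ty, .. } => *ty,
        }
    }
}

/// Unify operand types: Int op Double binds as Double op Double.
fn unify(lhs: BoundExpr, rhs: BoundExpr) -> (BoundExpr, BoundExpr) {
    let cast = |e: BoundExpr| BoundExpr::Cast { ty: TypeKind::Double, child: Box::new(e) };
    match (lhs.ty(), rhs.ty()) {
        (TypeKind::Int, TypeKind::Double) => (cast(lhs), rhs),
        (TypeKind::Double, TypeKind::Int) => (lhs, cast(rhs)),
        _ => (lhs, rhs),
    }
}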
#69 (comment)
As mentioned earlier, constructing a visibility bitmap for every unique group key incurs high time and space complexity. Instead, if we use the row-by-row update in the current implementation, we can avoid the cost of bitmap construction.
Originally posted by @pleiadesian in #69 (comment)
It is possible that users will ingest a large amount of data into the engine, so we need to periodically flush the memtable to disk. The flush threshold could be configured through StorageOptions, like target_rowset_size.
Currently, the in-memory representation of RisingLight's data is simply a vector of data chunks. When doing updates and deletions, this could be highly inefficient. We should find a way to optimize this.