mit-pdos / noria

Fast web applications through dynamic, partially-stateful dataflow

License: Apache License 2.0

Rust 99.18% R 0.32% Shell 0.43% HTML 0.05% TSQL 0.02%

noria's People

Contributors: 0xflotus, alanamarzoev, alexsnaps, benesch, doumanash, dtolnay, ekmartin, fintelia, glittershark, jmbredenberg, jmftrindade, jonathangb, jonhoo, jsegaran, justusadam, larat7, ms705, omegablitz, ptravers, samyuyagati, spl, taiki-e, udoprog

noria's Issues

Some additional data types

I would like to request some additional data types:

  • Timezone aware timestamps.
  • Binary blobs.
  • Compound data types, allowing nested sub-objects: lists of other objects, or objects themselves. This would let Noria work with more document-oriented data where fields hold sub-documents or arrays of values. One could represent this with highly-normalized tables, but it would still be useful to create views that aggregate values into lists, like PostgreSQL's array_agg function. What I would like is for my app's state, as represented in materialized views, to contain such arrays. I have found this to be much more efficient for common cases like 1:N relations: in regular SQL, many rows are duplicated (N rows for each parent row), which duplicates all those values both in memory and on the wire between server and client. Representing them as an array is both more efficient and more natural. (See the sketch after this list.)

Consistency between new write and materialized view

I learned about Noria from the TwoSigma talk. I find Noria extremely interesting, and it could potentially be a great fit for my use case.

If I understand correctly, the materialized view (cache) is eventually consistent with new writes rather than atomically updated, i.e. the re-calculation of the materialized view happens asynchronously to the write operation (on a best-effort basis)? If so, may I ask whether every write operation (or transaction) triggers a re-calculation of the cache? Or does the re-calculation inside Noria run on its own interval (and if so, how long?), checking at its own pace whether a re-calculation is required, so as to avoid the backlog that can build up (e.g. at peak time) when writes arrive more frequently than the materialized view can be re-calculated?

I understand from another GitHub issue that Noria is at the moment still a research prototype and probably not as mature as MySQL etc. for general production use, but may I ask which subset of the system/features is actually mature enough to use with production data? Thank you very much!

Allow trading off consistency for performance

Currently, we "swap" the Reader after every batch we process. This ensures that views are always as fresh as we can make them (subject to write propagation delays), but also means we incur the cost of waiting for readers to exit the current map for every batch. Users would likely see significant speedups if we amortized the cost of swaps by doing them less often, though this would come at the cost of an increase in the delay before writes become visible.

This should probably be an option that is exposed to users, with sufficient warning labels about "eventualness". Care must also be taken to ensure that we eventually swap the maps, even if there are no more writes, so that reads don't remain eternally stale.
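
For illustration, a minimal sketch (not the current implementation) of what such an amortized policy could look like:

use std::time::{Duration, Instant};

// Swap at most once per `swap_every` batches, but never let reads go
// staler than `max_staleness`. Both knobs are hypothetical names.
struct SwapPolicy {
    swap_every: usize,
    max_staleness: Duration,
    batches_since_swap: usize,
    last_swap: Instant,
}

impl SwapPolicy {
    // Called after each processed batch; returns true when the writer
    // should publish (swap) the reader map.
    fn batch_processed(&mut self) -> bool {
        self.batches_since_swap += 1;
        if self.batches_since_swap >= self.swap_every
            || self.last_swap.elapsed() >= self.max_staleness
        {
            self.batches_since_swap = 0;
            self.last_swap = Instant::now();
            true
        } else {
            false
        }
    }
}

A timer would still have to fire when writes stop entirely, per the note above about not leaving reads eternally stale.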

Support online re-sharding

While we can currently remove all shards of an operator (and its children) and re-create it with a different sharding, this is pretty inefficient. Instead, we should support dynamically adding or removing a shard through a migration, which should cause the appropriate state to be transferred from/to the existing shards. We do have to be careful about concurrent writes and backfills though!

Be smarter about n-way joins

We currently implement N-way joins by doing a chain of two-way joins. While this does help somewhat with re-use, we could be smarter about it. For example:

A JOIN B JOIN C JOIN D

will currently be turned into a chain of

((A JOIN B) JOIN C) JOIN D

whereas we could turn it into

(A JOIN B) JOIN (C JOIN D)

which allows more parallelism in the data-flow. We could also take into account the sizes of the tables involved (and even key density) to determine which join order does the least work.

This work could extend to adding direct support for an N-way join, as one may be possible to execute more efficiently than a long chain of 2-way joins. We may want to keep a separate N-way join operator though, as its code will likely be more complicated, and slower, than the 2-way join code.
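
As a minimal illustration of building the bushy alternative (ignoring table sizes and key density, which a real planner would also weigh), inputs can be paired level by level:

// Sketch only: pair join inputs into a balanced tree instead of a
// left-deep chain, so (A JOIN B) and (C JOIN D) can proceed in parallel.
#[derive(Debug)]
enum JoinTree {
    Leaf(&'static str),
    Join(Box<JoinTree>, Box<JoinTree>),
}

fn balanced(mut nodes: Vec<JoinTree>) -> JoinTree {
    while nodes.len() > 1 {
        let mut next = Vec::with_capacity((nodes.len() + 1) / 2);
        let mut it = nodes.into_iter();
        while let Some(left) = it.next() {
            match it.next() {
                Some(right) => next.push(JoinTree::Join(Box::new(left), Box::new(right))),
                None => next.push(left), // odd input carries over to the next level
            }
        }
        nodes = next;
    }
    nodes.into_iter().next().expect("at least one join input")
}

// balanced(vec![Leaf("A"), Leaf("B"), Leaf("C"), Leaf("D")])
// yields Join(Join(A, B), Join(C, D)).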

Having multiple aggregates in a select errors

Noria seems not to allow for a select query with two simultaneous aggregations.

I set up this very simple table:

CREATE TABLE tab (x int, y int, PRIMARY KEY(x));

And a query with two aggregations:

VIEW test: 
    SELECT count(y), sum(y) 
    FROM tab 
    WHERE x = ? 
    GROUP BY x;

This throws cannot group by aggregation column at
noria-server/dataflow/src/ops/grouped/aggregate.rs:27:9.

By contrast, if the select contains only count(x) or sum(x), it works just fine.
I should also note that it also errors on count(*), sum(y).

My question is whether this query is actually correct and should work.

My guess is that it doesn't due to how the query graph is constructed. (One
aggregate becomes the successor of the other, and so does not get the initial
set of records / does not have its automatic group/state key include the
computed column from the ancestor.)
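
Until this works, a possible (untested) workaround sketch in the same recipe syntax, assuming named queries can be reused as view inputs, is to compute each aggregate in its own view and join the results:

counts: SELECT x, count(y) AS c FROM tab GROUP BY x;
sums:   SELECT x, sum(y) AS s FROM tab GROUP BY x;
VIEW test:
    SELECT counts.c, sums.s
    FROM counts, sums
    WHERE counts.x = sums.x AND counts.x = ?;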

SQL support

Sorry if I missed an obvious doc link somewhere, but I wasn't able to find comprehensive documentation on the extent of SQL support in either noria or noria-mysql.

I'm looking to replace part of a 1bn+ row (Postgres-based) query system with Noria, since a lot of it is written in Rust and most queries already use materialized views, so here goes my feature list:

  • recursive and non-recursive CTEs, where side-effect-free non-recursive CTEs are not treated as optimization fences (as in Postgres 12)
  • subqueries
  • UPSERT-type queries with a lot of rows (1-15 million)
  • all kinds of JOINs, though standard SQL JOINs would be fine for now

Can't compile on fresh debian install?

I've been having trouble building noria-server on my fresh Debian stretch install. I've installed libclang-dev and build-essential, and I have libstdc++ on my system, but I'm having trouble building rocksdb-sys:

# rustc -V
rustc 1.32.0-nightly (00e03ee57 2018-11-22)
# cargo build --release --bin noria-server
   Compiling librocksdb-sys v5.14.2 (https://github.com/ekmartin/rust-rocksdb.git?branch=custom#20f8595a)
error: failed to run custom build command for `librocksdb-sys v5.14.2 (https://github.com/ekmartin/rust-rocksdb.git?branch=custom#20f8595a)`
process didn't exit successfully: `/app/target/release/build/librocksdb-sys-86702df300c1afd1/build-script-build` (exit code: 101)
--- stdout
cargo:rerun-if-changed=build.rs
cargo:rerun-if-changed=rocksdb/
cargo:rustc-link-lib=lz4
cargo:rerun-if-changed=snappy/

--- stderr
rocksdb/include/rocksdb/c.h:48:9: warning: #pragma once in main file [-Wpragma-once-outside-header]
rocksdb/include/rocksdb/c.h:68:10: fatal error: 'stdarg.h' file not found
rocksdb/include/rocksdb/c.h:48:9: warning: #pragma once in main file [-Wpragma-once-outside-header], err: false
rocksdb/include/rocksdb/c.h:68:10: fatal error: 'stdarg.h' file not found, err: true
thread 'main' panicked at 'unable to generate rocksdb bindings: ()', src/libcore/result.rs:1009:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.
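
For what it's worth, this particular failure ('stdarg.h' file not found, raised while bindgen parses the RocksDB headers) usually means libclang cannot locate the compiler's builtin headers. Installing the full clang package (e.g. apt-get install clang) in addition to libclang-dev typically resolves it; this is a guess based on similar bindgen reports, not a confirmed fix for this repository.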

Trouble getting started using mysql endpoint

Hi Noria people,

I've built Kafka / Kafka Streams-based replication systems to increase write performance for database systems, and Noria seems very promising.
My first goal is to start a simple Noria cluster, but I've run into some trouble.
What I have done:

I've successfully built the following projects

  • noria
  • noria-mysql
  • noria-ui

First, I start a simple zookeeper instance:
docker run -p 2181:2181 zookeeper

Then a noria server:
noria-server --deployment myapp --no-reuse --address 127.0.0.1 --shards 0 --zookeeper 127.0.0.1:2181 -v

Then a mysql endpoint:
noria-mysql --zookeeper-address 127.0.0.1:2181 --deployment myapp -v

Then I connect a mysql client:
mysql -h 127.0.0.1
(version Ver 8.0.15 for osx10.14 on x86_64, don't know if that matters)

Connects.
Then I enter a query I've found in the unit test:
mysql> CREATE TABLE Cats (id int PRIMARY KEY, name VARCHAR(255), PRIMARY KEY(id));
Query OK, 0 rows affected (0.09 sec)

Seems ok, then:
mysql> INSERT INTO Cats (id, name) VALUES (1, 'Bob');
ERROR 2013 (HY000): Lost connection to MySQL server during query

... and after that the endpoint seems non-responsive until I restart.

I'm pretty sure I'm doing something wrong, but I can't find much info.

Fix replays across same-shard domain boundaries

If two adjacent domains are sharded the same way, but have a replay path across them that uses a different key from the sharding key, this currently leads to sadness. The code has some asserts that try to figure out if this is happening and crash appropriately, but we should really support that use-case. It will likely require a cross-shard replay, but routing the responses to only the appropriate shard will be a challenge (since the replay wouldn't go through a sharder).

Support directly sharded shuffles

If a shuffle is needed, we currently inject a "merge-then-shard" sequence. This produces correct results, but also creates an artificial bottleneck. In theory, the leaves in the shards of the source domain should be able to directly send their output to the appropriate shards instead. Where this gets tricky is with ancestor queries. If an ancestor query has to ask multiple shards of the source domain, there is no longer a union in place that can buffer and merge the resulting backfills correctly.

Add support for range indices and operations

We currently only have HashMap-based indices, which support only single-key lookups. This precludes support for things like a < ? predicates, or doing multi-backfills over key ranges. We should fix this.
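
For illustration, here is a sketch of the kind of lookup a BTreeMap-based index would enable; the types are simplified stand-ins for Noria's state types:

use std::collections::BTreeMap;
use std::ops::Bound;

type Row = Vec<i64>;

// All rows whose key is strictly less than `k`, i.e. a `col < ?` lookup.
// A HashMap index cannot answer this without scanning every key.
fn less_than(index: &BTreeMap<i64, Vec<Row>>, k: i64) -> impl Iterator<Item = &Row> {
    index
        .range((Bound::Unbounded, Bound::Excluded(k)))
        .flat_map(|(_, rows)| rows.iter())
}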

Trouble Embedding noria_server

I am currently trying to embed noria_server into a testing project, and I can't figure out what needs to go into Cargo.toml to pull in the noria_server code. Including only the noria crate in Cargo.toml yields an error that the noria_server crate cannot be found, while trying to pull it from GitHub directly throws a 404 error, presumably because noria_server lives in a subfolder of the noria repo. Advice on this would be much appreciated.
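
A hedged guess at the Cargo.toml entries: cargo can resolve a workspace member by its package name when pointed at the repository root, so (assuming the crate really is named noria-server) something like this may work:

[dependencies]
noria = { git = "https://github.com/mit-pdos/noria.git" }
noria-server = { git = "https://github.com/mit-pdos/noria.git" }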

Support ancestor queries directly through unions and joins

We currently implement query_through only on simple stateless operators like filters and projections. However, in theory it should also be possible to query through multi-ancestor stateless operators such as unions and joins. This would significantly decrease the system's memory footprint, as we would no longer need to materialize intermediate join outputs for nested joins. We do need to be a little careful when implementing it however, as there may be strange interactions with concurrent backfills, but that shouldn't be too complicated. The majority of the work would be to implement "backwards" join processing (i.e., query_through) for unions and joins (they currently only implement "forward" processing through on_input).

Separate migration planning and execution

Currently, the planning for a migration is interleaved with its actual execution. In addition to being unfortunate from a code-cleanliness perspective, this makes the code very hard to reason about with regard to fault tolerance. In fact, there are almost surely critical sections during which a failure could cause the entire system to deadlock.

JDBC

Any plan for JDBC support?

noria-mysql said:

Jul 28 03:58:09.153 ERRO query can't be parsed: "set autocommit=1, sql_mode = concat(@@sql_mode,',STRICT_TRANS_TABLES')"
Jul 28 03:58:09.158 ERRO query can't be parsed: "SHOW VARIABLES WHERE Variable_name in ('max_allowed_packet','system_time_zone','time_zone','auto_increment_increment')"

Java said:

java.lang.NullPointerException
	at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.requestSessionDataWithShow(AbstractConnectProtocol.java:642)
	at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.readPipelineAdditionalData(AbstractConnectProtocol.java:623)
	at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connect(AbstractConnectProtocol.java:476)
	at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connectWithoutProxy(AbstractConnectProtocol.java:1084)
	at org.mariadb.jdbc.internal.util.Utils.retrieveProxy(Utils.java:493)
	at org.mariadb.jdbc.MariaDbConnection.newConnection(MariaDbConnection.java:156)
	at org.mariadb.jdbc.Driver.connect(Driver.java:90)
	at java.sql/java.sql.DriverManager.getConnection(Unknown Source)
	at java.sql/java.sql.DriverManager.getConnection(Unknown Source)
...

Improve system profiling tooling

The system currently outputs very little information that is useful for profiling. It would help to have information such as:

  • Time spent in different parts of domain processing.
  • Rate of backfills and record processing.
  • Time between domain wakeups.
  • Number of packets received/processed.
  • Number of domain timeouts handled.

This would be hugely helpful for nailing down performance problems (in addition to #90). A sketch of such counters follows below.
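
As a rough illustration (the struct and field names here are made up, not an existing Noria type), these could live in a per-domain stats struct that is periodically logged or exported:

use std::time::Duration;

#[derive(Default, Debug)]
struct DomainStats {
    time_processing: Duration, // time spent in domain processing
    backfills: u64,            // backfill (replay) messages handled
    records_processed: u64,
    packets_received: u64,
    packets_processed: u64,
    timeouts_handled: u64,
    time_idle: Duration,       // time between domain wakeups
}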

Tests fail: "Too many open files"

System Information

  • cargo 1.36.0-nightly (b6581d383 2019-04-16)
  • rustc 1.36.0-nightly (6d599337f 2019-04-22)
  • Linux 4.18.0-18-generic 18.04.1-Ubuntu SMP x86_64 x86_64 x86_64 GNU/Linux

Description

Command:

cargo test

Test output (unrelated parts before it omitted):

Several tests FAILED, and the test run seems to just hang.

thread 'tokio-runtime-worker-5' panicked at 'not yet implemented', noria-server/src/controller/migrate/materialization/mod.rs:674:29
test controller::sql::tests::it_incorporates_join_projecting_join_columns ... FAILED (allowed)
thread 'tokio-runtime-worker-7' panicked at 'called `Result::unwrap()` on an `Err` value: Error { message: "IO error: While open a file for appending: /tmp/.tmpMfRmmZ/it_incorporates_explicit_multi_join-votes-1.db/MANIFEST-000005: Too many open files" }', src/libcore/result.rs:999:5
test controller::sql::tests::it_incorporates_compound_selection ... ok
thread 'tokio-runtime-worker-12' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Other, message: "Too many open files" }', src/libcore/result.rs:999:5
thread 'tokio-runtime-worker-5' panicked at 'ErrorMessage { msg: "could not add new domain SendError(\"...\")" }', noria-server/src/worker/mod.rs:181:22
test controller::sql::tests::it_incorporates_literal_projection ... ok
test controller::sql::tests::it_incorporates_self_join ... ok
test controller::sql::tests::it_incorporates_arithmetic_projection ... ok
test controller::sql::tests::it_incorporates_aggregation_no_group_by ... ok
test controller::sql::tests::it_queries_over_aliased_view ... ignored
test controller::sql::tests::it_distinguishes_predicates ... ok
test controller::sql::tests::it_incorporates_aggregation_count_star ... ok
test controller::sql::tests::it_incorporates_aggregation ... ok
test controller::sql::tests::it_does_not_reuse_if_disabled ... ok
test controller::sql::tests::it_incorporates_join_with_nested_query ... ok
test controller::sql::tests::it_incorporates_simple_selection ... ok
test controller::sql::tests::it_orders_parameter_columns ... ok
test controller::sql::tests::it_parses ... ok
test handle::tests::limit_mutator_creation ... ok
test controller::sql::tests::it_reuses_identical_query ... ok
test integration::correct_nested_view_schema ... ok
test integration::add_columns ... ok
test controller::sql::tests::it_incorporates_implicit_multi_join ... ok
test controller::sql::tests::it_reuses_by_extending_existing_query ... ok
test controller::sql::tests::it_reuses_with_different_parameter ... ok
thread 'tokio-runtime-worker-9' panicked at 'called `Result::unwrap()` on an `Err` value: Error { message: "IO error: While open a file for appending: /tmp/.tmpeILAWl/crossing_migration-b-0.db/000006.dbtmp: Too many open files" }', src/libcore/result.rs:999:5
thread 'io-worker-1' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Other, message: "Too many open files" }', src/libcore/result.rs:999:5
test integration::albums ... FAILED
thread 'tokio-runtime-worker-3' panicked at 'called `Result::unwrap()` on an `Err` value: Error { message: "IO error: While open directory: /tmp/.tmpxqLtwd/double_shuffle-Car-1.db: Too many open files" }', src/libcore/result.rs:999:5
thread 'tokio-runtime-worker-10' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Other, message: "Too many open files" }', src/libcore/result.rs:999:5
thread 'tokio-runtime-worker-13' panicked at 'called `Result::unwrap()` on an `Err` value: Error { message: "IO error: While open directory: /tmp/.tmpeKlgDV/forced_shuffle_despite_same_shard-Car-0.db: Too many open files" }', src/libcore/result.rs:999:5
test integration::cascading_replays_with_sharding ... FAILED
test integration::base_mutation ... ok
thread 'tokio-runtime-worker-4' panicked at 'called `Result::unwrap()` on an `Err` value: Error { message: "IO error: While open directory: /tmp/.tmpITov0z/do_full_vote_migration_false-article-0.db: Too many open files" }', src/libcore/result.rs:999:5
test integration::domain_amend_migration ... FAILED
test integration::empty_migration ... ok
thread 'tokio-runtime-worker-3' panicked at 'called `Result::unwrap()` on an `Err` value: Error { message: "IO error: While open a file for appending: /tmp/.tmpf1SJKJ/it_recovers_persisted_bases-Car-1.db/MANIFEST-000005: Too many open files" }', src/libcore/result.rs:999:5
thread 'tokio-runtime-worker-4' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Other, message: "Too many open files" }', src/libcore/result.rs:999:5
test integration::full_vote_migration_new_and_old_unsharded ... FAILED
thread 'tokio-runtime-worker-9' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Other, message: "Too many open files" }', src/libcore/result.rs:999:5
thread 'tokio-runtime-worker-5' panicked at 'SendError("...")', noria-server/src/startup.rs:155:30
test integration::independent_domain_migration ... ok
test integration::it_works_basic ... ok
thread 'tokio-runtime-worker-9' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Other, message: "Too many open files" }', src/libcore/result.rs:999:5
test integration::full_vote_migration_new_and_old ... FAILED
test integration::it_works_w_mat ... ok
test integration::it_works_deletion ... ok
test integration::it_works_w_partial_mat ... ok
test integration::it_works_w_partial_mat_below_empty ... ok
test integration::it_works_with_arithmetic_aliases ... ok
test integration::it_works_with_double_query_through ... ok
thread 'tokio-runtime-worker-12' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Other, message: "Too many open files" }', src/libcore/result.rs:999:5
thread 'tokio-runtime-worker-11' panicked at 'SendError("...")', noria-server/src/startup.rs:155:30
test integration::it_works_with_function_arithmetic ... ok
test integration::it_works_with_multiple_arithmetic_expressions ... ok
test integration::it_works_with_reads_before_writes ... ok
test integration::it_works_with_simple_arithmetic ... ok
test integration::it_works_with_sql_recipe ... ok
test integration::it_works_with_vote ... ok
test integration::key_on_added ... ok
test integration::live_writes ... ok
thread 'tokio-runtime-worker-3' panicked at 'called `Result::unwrap()` on an `Err` value: Error { message: "IO error: While open directory: /tmp/.tmpQcZ9JE/lobsters-hats-0.db: Too many open files" }', src/libcore/result.rs:999:5
test controller::sql::tests::it_incorporates_explicit_multi_join ... test controller::sql::tests::it_incorporates_explicit_multi_join has been running for over 60 seconds
test controller::sql::tests::it_incorporates_simple_join ... test controller::sql::tests::it_incorporates_simple_join has been running for over 60 seconds
test integration::crossing_migration ... test integration::crossing_migration has been running for over 60 seconds
test integration::double_shuffle ... test integration::double_shuffle has been running for over 60 seconds
test integration::finkelstein1982_queries ... test integration::finkelstein1982_queries has been running for over 60 seconds
test integration::forced_shuffle_despite_same_shard ... test integration::forced_shuffle_despite_same_shard has been running for over 60 seconds
test integration::full_aggregation_with_bogokey ... test integration::full_aggregation_with_bogokey has been running for over 60 seconds
test integration::full_vote_migration_only_new ... test integration::full_vote_migration_only_new has been running for over 60 seconds
test integration::it_recovers_persisted_bases ... test integration::it_recovers_persisted_bases has been running for over 60 seconds
test integration::it_recovers_persisted_bases_w_multiple_nodes ... test integration::it_recovers_persisted_bases_w_multiple_nodes has been running for over 60 seconds
test integration::it_works_with_join_arithmetic ... test integration::it_works_with_join_arithmetic has been running for over 60 seconds
test integration::lobsters ... test integration::lobsters has been running for over 60 seconds

Rust binding example missing

Hi there, it looks like the example in the Rust binding section is 404ing. Do you have a link to the source? I'm vetting Noria for an R&D project and would like to use it as a reference.

Also, does Noria have a license declared yet? If so, I'd recommend putting it at the bottom of the README :-)

Thanks!

Dynamic load profiling

It'd be really cool if the system kept stats about how "busy" different parts of the graph are; for example, how many records they're processing per second, or how long they sit idle between wakeups. This would help identify performance regressions and opportunities for optimization, but could also have a wider impact by allowing the controller to re-shard or replicate nodes in response to increased/decreased load.

Add stateless reader domains

It came up in a discussion today that it'd be neat to have stateless domains below Reader nodes, through which reads could "process" data they extract from Reader state. Since such a domain is stateless, readers could all execute this data-flow in parallel over combinations of data they've read, without worrying about synchronization with other executors. This would allow us to move certain compute to the readers, which might let us materialize a less verbose Reader further "up" the graph (e.g., above an expensive join).

Sit infront of existing MySQL server

Hi, would something like this be able to act as a MySQL pass-through, where Noria acts as the cache layer in front of MySQL but the storage and data processing are all done by MySQL itself (or MariaDB)?

Or does the way the dataflow works invalidate this as a concept?

Windows build failed

error[E0412]: cannot find type `clockid_t` in module `libc`
  --> C:\Users\xxxku\.cargo\registry\src\github.com-1ecc6299db9ec823\timekeeper-0.3.0\src\source.rs:26:31
   |
26 | fn clock_gettime(clock: libc::clockid_t) -> Result<libc::timespec, ()> {
   |                               ^^^^^^^^^ help: a type alias with a similar name exists: `clock_t`

error[E0425]: cannot find function `clock_gettime` in module `libc`
  --> C:\Users\xxxku\.cargo\registry\src\github.com-1ecc6299db9ec823\timekeeper-0.3.0\src\source.rs:33:15
   |
33 |         libc::clock_gettime(clock, &mut tp)
   |               ^^^^^^^^^^^^^ not found in `libc`
help: possible candidate is found in another module, you can import it into scope
   |
2  | use source::clock_gettime;
   |

error[E0425]: cannot find value `CLOCK_PROCESS_CPUTIME_ID` in module `libc`
  --> C:\Users\xxxku\.cargo\registry\src\github.com-1ecc6299db9ec823\timekeeper-0.3.0\src\source.rs:47:40
   |
47 |         let time = clock_gettime(libc::CLOCK_PROCESS_CPUTIME_ID).unwrap();
   |                                        ^^^^^^^^^^^^^^^^^^^^^^^^ not found in `libc`

error[E0425]: cannot find value `CLOCK_THREAD_CPUTIME_ID` in module `libc`
  --> C:\Users\xxxku\.cargo\registry\src\github.com-1ecc6299db9ec823\timekeeper-0.3.0\src\source.rs:56:40
   |
56 |         let time = clock_gettime(libc::CLOCK_THREAD_CPUTIME_ID).unwrap();
   |                                        ^^^^^^^^^^^^^^^^^^^^^^^ not found in `libc`

error: aborting due to 4 previous errors

Some errors occurred: E0412, E0425.
For more information about an error, try `rustc --explain E0412`.
error: Could not compile `timekeeper`.
warning: build failed, waiting for other jobs to finish...
error: build failed

Run without ZooKeeper

Would it be possible to run without ZooKeeper for much simpler/basic deployments, so it's in line with other databases, e.g. MySQL/PostgreSQL?

Send deltas instead of negative + positives when single column changes

Aggregations currently produce two records whenever an aggregated value changes: a negative for the old value, and a positive for the new one. This causes unfortunate extra computation downstream in the data-flow. For example, a following join must still only do one join-key lookup for those two rows, but it has to allocate and produce two rows instead of one, which is costly.

We should probably introduce a new Record variant along the lines of

struct Delta {
    row: Vec<DataType>,
    diff: Vec<Option<DataType>>,
}

The values in diff, when they are Some, indicate that the given value should be added to the current value (in row; the old negative) to produce the (new) positive. This would allow the aggregation to produce a single Delta row for each output, the join to produce a single Delta row for each incoming Delta row, and so on. Since diff is a Vec, the values could even pass through multiple operators that update diffs without turning into multiple rows (does this happen in reality though? maybe (usize, DataType) is sufficient?).
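
To make the arithmetic concrete, here is a minimal sketch of applying such a delta, with plain i64 standing in for DataType:

// A count for key 7 going from 3 to 4 becomes one Delta
//   Delta { row: vec![7, 3], diff: vec![None, Some(1)] }
// instead of a negative [7, 3] followed by a positive [7, 4].
struct Delta {
    row: Vec<i64>,          // the current row (the old "negative")
    diff: Vec<Option<i64>>, // per-column additive change; None = unchanged
}

fn apply(delta: &Delta) -> Vec<i64> {
    delta
        .row
        .iter()
        .zip(&delta.diff)
        .map(|(v, d)| v + d.unwrap_or(0))
        .collect() // the new "positive" row, e.g. [7, 4]
}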

Live queries (push notifications)

I am moving this out of the broader issue #111 and into its own issue.

@mjjansen mentioned push notifications. And @jonhoo replied:

Push notifications (basically, pushing parts of the data-flow to the client) is something that's definitely on our radar, and was actually one of the motivations for using data-flow in the first place. Data-flow is so amenable to distribution that in theory this should just be a matter of moving some of the data-flow nodes to a client machine. In practice it gets a little more tricky though. We don't have an implementation of it currently, and it's not at the top of our roadmap, but it is a feature we'd love to see!

I commented:

So +1 for push notifications (or, I would say, live queries; I think that is the more common term). I do not think Noria has to provide any web API here; it could just expose things through the Rust API, and users can then hook in their own logic in Rust to push updates to websockets or whatever.

And @jonhoo replied:

So, push notifications are tricky because they imply full materialization everywhere, which comes at a steep cost. There might be a good way to register interest in keys and then subscribe to updates for those keys, but that's not something we're actively working on. Might be a neat additional feature to add eventually though — it shouldn't be too hard, as most of the infrastructure is already there.

Avoiding having both size and time limits for base batching

Base tables (well, GroupCommitQueue really) currently use two values for batching: flush_timeout and queue_capacity. The former dictates that a batch may not stay in the buffer longer than the given time, and the latter dictates that we always flush once we have accumulated a given number of packets (note that it's packets, not rows!). To further complicate this story, the timer is only checked when the input stream lets up for a short while, or once every 64 packets.

This approach leads to a lot of parameter fiddling to get the results you want. Since we're primarily focused on high throughput with reasonable latency, we should really just have a single parameter that enforces one core property: "a write is never buffered for longer than X". This mirrors what we do in the various benchmarkers too. Basically, when the first row of a write is inserted into the buffer (which will become a batch), we note the time. If at any point (when adding another record, or when a timeout occurs) we notice that more than batching_interval has passed since that time, we flush the batch. A sketch follows below.
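
A minimal sketch of that single-knob policy (hypothetical names; not the current GroupCommitQueue code):

use std::time::{Duration, Instant};

struct Buffer<T> {
    batching_interval: Duration, // the single knob: max time buffered
    first: Option<Instant>,      // arrival time of the oldest buffered write
    pending: Vec<T>,
}

impl<T> Buffer<T> {
    fn push(&mut self, pkt: T) -> Option<Vec<T>> {
        self.first.get_or_insert_with(Instant::now);
        self.pending.push(pkt);
        self.maybe_flush()
    }

    // Also called on timer wakeups, so a batch still flushes when the
    // input stream goes quiet.
    fn maybe_flush(&mut self) -> Option<Vec<T>> {
        match self.first {
            Some(t0) if t0.elapsed() >= self.batching_interval => {
                self.first = None;
                Some(std::mem::take(&mut self.pending))
            }
            _ => None,
        }
    }
}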

Detect key subsumption to reduce evictions and index duplications

If an operator n sources a replay path on [0, 1], and also one on [0], it should be able to satisfy both with a single compound index on [0, 1]. Today, it creates two indices instead, since all indices are multi-keyed and not nested. Recognizing this kind of key subsumption also has the potential to reduce the number of join evictions we need to issue: if a given value is present in the index for [0], we know we haven't discarded any rows for any value of [1] under that [0], and thus don't need to evict them downstream. Or, said differently, if we miss on a given value for [0, 1] but [0] is present in the index, we know that [0, 1] must be empty.
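
A sketch of the idea with an ordered map: a single compound index on (c0, c1) serves exact [0, 1] lookups directly, and [0] lookups as a prefix range scan, so the separate [0] index becomes unnecessary:

use std::collections::BTreeMap;

type Key2 = (i64, i64);
type Row = Vec<i64>;

// Exact lookup on [0, 1].
fn lookup_full(idx: &BTreeMap<Key2, Vec<Row>>, k: Key2) -> Option<&Vec<Row>> {
    idx.get(&k)
}

// Lookup on [0] alone: scan the (k0, *) prefix of the same index,
// mirroring the subsumption argument above.
fn lookup_prefix(idx: &BTreeMap<Key2, Vec<Row>>, k0: i64) -> impl Iterator<Item = &Row> {
    idx.range((k0, i64::MIN)..=(k0, i64::MAX))
        .flat_map(|(_, rows)| rows.iter())
}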

Readme typo?

Shouldn't souplet be noria-server in the command below? Or am I missing something?

  • $ cargo r --release --bin **souplet** -- --deployment myapp --no-reuse --address 172.16.0.19 --shards 0

Automatic failover for `View` and `Table` handles

Currently, the system can successfully recover from individual machine failures. However, user code must manually re-create any View and Table handles that still reference resources on the old machine. This could instead be avoided by having the handles themselves detect lost connections and transparently re-establish new ones. The controller already knows the relevant network addresses, so it should just be a matter of asking it for them whenever a connection breaks.

Running `basic-recipe` example more than once causes segfault

First off, I just want to say that this is a really cool project!

When I tried running the basic-recipe example under noria-server, it worked the first time:

Oct 09 23:24:14.093 INFO found initial leader
Oct 09 23:24:14.094 INFO leader listening on external address V4(127.0.0.1:6033)
Oct 09 23:24:14.094 DEBG leader's internal listen address: V4(127.0.0.1:64841)
Oct 09 23:24:14.094 WARN Connected to new leader
Oct 09 23:24:14.095 INFO new worker registered from V4(127.0.0.1:64843), which listens on V4(127.0.0.1:64841)
Oct 09 23:24:14.112 DEBG 0 duplicate queries, version: 0
Oct 09 23:24:14.112 INFO starting migration
Oct 09 23:24:14.112 DEBG 4 queries, 2 of which are named, version: 1
Oct 09 23:24:14.112 INFO Schema version advanced from 0 to 1
Oct 09 23:24:14.113 DEBG Assigning primary key (aid) for base Article
Oct 09 23:24:14.113 INFO adding new base, node: 1
Oct 09 23:24:14.114 DEBG registering query "Article"
Oct 09 23:24:14.114 INFO adding new base, node: 2
Oct 09 23:24:14.114 DEBG registering query "Vote"
Oct 09 23:24:14.114 DEBG Making QG for "VoteCount"
Oct 09 23:24:14.115 INFO No reuse opportunity, adding fresh query
Oct 09 23:24:14.115 DEBG Added final MIR node for query named "VoteCount"
Oct 09 23:24:14.116 INFO adding new node, type: internal Distinct node, node: 3
Oct 09 23:24:14.116 INFO adding new node, type: internal |*| γ[0] node, node: 4
Oct 09 23:24:14.116 INFO adding new node, type: internal π[0, 1] node, node: 5
Oct 09 23:24:14.116 DEBG registering query "VoteCount"
Oct 09 23:24:14.117 DEBG Making QG for "ArticleWithVoteCount"
Oct 09 23:24:14.117 INFO No reuse opportunity, adding fresh query
Oct 09 23:24:14.118 DEBG Added final MIR node for query named "ArticleWithVoteCount"
Oct 09 23:24:14.118 INFO adding new node, type: internal [1:0, 1:1, 1:2, 5:1] 1:0 ⋈ 5:0 node, node: 6
Oct 09 23:24:14.119 INFO adding new node, type: internal π[0, 1, 2, 3] node, node: 7
Oct 09 23:24:14.119 DEBG registering query "ArticleWithVoteCount"
Oct 09 23:24:14.119 INFO finalizing migration, #nodes: 7
Oct 09 23:24:14.120 DEBG node added to domain, domain: 0, type: B, node: 1
Oct 09 23:24:14.120 DEBG node added to domain, domain: 0, type: B, node: 2
Oct 09 23:24:14.120 DEBG node added to domain, domain: 0, type: internal Distinct node, node: 3
Oct 09 23:24:14.120 DEBG node added to domain, domain: 0, type: internal |*| γ[0] node, node: 4
Oct 09 23:24:14.120 DEBG node added to domain, domain: 0, type: internal π[0, 1] node, node: 5
Oct 09 23:24:14.121 DEBG node added to domain, domain: 0, type: internal [1:0, 1:1, 1:2, 5:1] 1:0 ⋈ 5:0 node, node: 6
Oct 09 23:24:14.121 DEBG node added to domain, domain: 0, type: internal π[0, 1, 2, 3] node, node: 7
Oct 09 23:24:14.121 DEBG node added to domain, domain: 0, type: reader node, node: 8
Oct 09 23:24:14.121 DEBG assigning local index, local: 0, node: 1, type: B, domain: 0
Oct 09 23:24:14.122 DEBG assigning local index, local: 1, node: 2, type: B, domain: 0
Oct 09 23:24:14.122 DEBG assigning local index, local: 2, node: 3, type: internal Distinct node, domain: 0
Oct 09 23:24:14.122 DEBG assigning local index, local: 3, node: 4, type: internal |*| γ[0] node, domain: 0
Oct 09 23:24:14.123 DEBG assigning local index, local: 4, node: 5, type: internal π[0, 1] node, domain: 0
Oct 09 23:24:14.123 DEBG assigning local index, local: 5, node: 6, type: internal [1:0, 1:1, 1:2, 5:1] 1:0 ⋈ 5:0 node, domain: 0
Oct 09 23:24:14.123 DEBG assigning local index, local: 6, node: 7, type: internal π[0, 1, 2, 3] node, domain: 0
Oct 09 23:24:14.123 DEBG assigning local index, local: 7, node: 8, type: reader node, domain: 0
Oct 09 23:24:14.130 DEBG booting new domains
Oct 09 23:24:14.131 INFO sending domain 0.0 to worker Ok(V4(127.0.0.1:64841))
Oct 09 23:24:14.133 INFO booted domain, nodes: 8, shard: 0, domain: 0
Oct 09 23:24:14.134 DEBG accepted new connection, base: false, id: 0.0
Oct 09 23:24:14.134 DEBG mutating existing domains
Oct 09 23:24:14.134 INFO bringing up inter-domain connections
Oct 09 23:24:14.134 INFO initializing new materializations
Oct 09 23:24:14.134 INFO adding lookup index to view, columns: [0], node: 1
Oct 09 23:24:14.135 INFO adding lookup index to view, columns: [0, 1], node: 3
Oct 09 23:24:14.135 INFO adding lookup index to view, columns: [0], node: 5
Oct 09 23:24:14.135 INFO adding lookup index to view, columns: [0], node: 4
Oct 09 23:24:14.135 INFO adding lookup index to view, columns: [0], node: 2
Oct 09 23:24:14.136 WARN using partial materialization for 8
Oct 09 23:24:14.136 WARN using partial materialization for 4
Oct 09 23:24:14.136 WARN full because required, node: 3
Oct 09 23:24:14.136 INFO adding index to view to enable partial, columns: [0], on: 3
Oct 09 23:24:14.136 WARN full because descendant is full, child: 3, node: 2
Oct 09 23:24:14.137 DEBG new fully-materalized node: B
Oct 09 23:24:14.137 INFO no need to replay empty new base, node: 1
Oct 09 23:24:14.534 DEBG new fully-materalized node: B
Oct 09 23:24:14.534 INFO no need to replay empty new base, node: 2
Oct 09 23:24:14.618 DEBG new fully-materalized node: internal Distinct node
Oct 09 23:24:14.618 INFO beginning reconstruction of internal Distinct node
Oct 09 23:24:14.619 INFO domain replay path is [(DomainIndex(0), [(NodeIndex(2), None), (NodeIndex(3), None)])], tag: 0, node: 3
Oct 09 23:24:14.670 INFO told about terminating replay path [ReplayPathSegment { node: LocalNodeIndex { id: 2 }, partial_key: None }], tag: 0, shard: 0, domain: 0
Oct 09 23:24:14.673 INFO told to prepare full state, key: [0], shard: 0, domain: 0
Oct 09 23:24:14.673 INFO told to prepare full state, key: [0, 1], shard: 0, domain: 0
Oct 09 23:24:14.673 INFO starting replay, shard: 0, domain: 0
Oct 09 23:24:14.733 DEBG current state cloned for replay, μs: 59883, shard: 0, domain: 0
Oct 09 23:24:14.734 DEBG replaying batch, #: 0, shard: 0, domain: 0
Oct 09 23:24:14.734 DEBG last batch processed, terminal: true, shard: 0, domain: 0
Oct 09 23:24:14.734 DEBG last batch received, local: 2, shard: 0, domain: 0
Oct 09 23:24:14.734 DEBG node is fully up-to-date, passes: 1, local: 2, shard: 0, domain: 0
Oct 09 23:24:14.734 INFO acknowledging replay completed, node: 2, shard: 0, domain: 0
Oct 09 23:24:15.038 INFO reconstruction completed, node: 3, ms: 420
Oct 09 23:24:15.038 DEBG new partially-materialized node: internal |*| γ[0] node
Oct 09 23:24:15.038 INFO beginning reconstruction of internal |*| γ[0] node
Oct 09 23:24:15.039 INFO domain replay path is [(DomainIndex(0), [(NodeIndex(3), Some([0])), (NodeIndex(4), Some([0]))])], tag: 1, node: 4
Oct 09 23:24:15.088 INFO told about replay path [ReplayPathSegment { node: LocalNodeIndex { id: 3 }, partial_key: Some([0]) }], tag: 1, shard: 0, domain: 0
Oct 09 23:24:15.090 INFO told to prepare partial state, tags: [Tag(1)], key: [0], shard: 0, domain: 0
Oct 09 23:24:15.147 INFO reconstruction completed, node: 4, ms: 109
Oct 09 23:24:15.148 DEBG new stateless node: internal π[0, 1] node
Oct 09 23:24:15.148 DEBG no need to replay non-materialized view, node: 5
Oct 09 23:24:15.469 INFO reconstruction completed, node: 5, ms: 320
Oct 09 23:24:15.469 DEBG new stateless node: internal [1:0, 1:1, 1:2, 5:1] 1:0 ⋈ 5:0 node
Oct 09 23:24:15.469 DEBG no need to replay non-materialized view, node: 6
Oct 09 23:24:15.513 INFO reconstruction completed, node: 6, ms: 43
Oct 09 23:24:15.513 DEBG new stateless node: internal π[0, 1, 2, 3] node
Oct 09 23:24:15.513 DEBG no need to replay non-materialized view, node: 7
Oct 09 23:24:15.566 INFO reconstruction completed, node: 7, ms: 52
Oct 09 23:24:15.566 DEBG new stateless node: reader node
Oct 09 23:24:15.566 INFO beginning reconstruction of reader node
Oct 09 23:24:15.567 INFO domain replay path is [(DomainIndex(0), [(NodeIndex(1), Some([0])), (NodeIndex(6), Some([0])), (NodeIndex(7), Some([0])), (NodeIndex(8), Some([0]))])], tag: 2, node: 8
Oct 09 23:24:15.620 INFO told about replay path [ReplayPathSegment { node: LocalNodeIndex { id: 5 }, partial_key: Some([0]) }, ReplayPathSegment { node: LocalNodeIndex { id: 6 }, partial_key: Some([0]) }, ReplayPathSegment { node: LocalNodeIndex { id: 7 }, partial_key: Some([0]) }], tag: 2, shard: 0, domain: 0
Oct 09 23:24:15.679 INFO reconstruction completed, node: 8, ms: 112
Oct 09 23:24:15.679 WARN migration completed, ms: 1566
digraph {{
    node [shape=record, fontsize=10]
    n0 [style="filled", fillcolor=white, label="(source)"]
    n1 [style="filled", fillcolor="/set312/1", label="{ { 1 / l0 / Article | B | █ } | aid, \ntitle, \nurl | unsharded }"]
    n2 [style="filled", fillcolor="/set312/1", label="{ { 2 / l1 / Vote | B | █ } | aid, \nuid | unsharded }"]
    n3 [style="filled", fillcolor="/set312/1", label="{{ 3 / l2 / q_f7dd1eab44a4f019_n0_distinct | Distinct | █ } | aid, \nuid | unsharded }"]
    n4 [style="filled", fillcolor="/set312/1", label="{{ 4 / l3 / q_f7dd1eab44a4f019_n0 | \|*\| γ[0] | ░ } | aid, \nvotes | unsharded }"]
    n5 [style="filled", fillcolor="/set312/1", label="{{ 5 / l4 / VoteCount | π[0, 1]  } | aid, \nvotes | unsharded }"]
    n6 [style="filled", fillcolor="/set312/1", label="{{ 6 / l5 / q_3697f743a02ca52_n0 | [1:0, 1:1, 1:2, 5:1] 1:0 ⋈ 5:0  } | aid, \ntitle, \nurl, \nvotes | unsharded }"]
    n7 [style="filled", fillcolor="/set312/1", label="{{ 7 / l6 / q_3697f743a02ca52_n1 | π[0, 1, 2, 3]  } | aid, \ntitle, \nurl, \nvotes | unsharded }"]
    n8 [style="filled", fillcolor="/set312/1", label="{ { 8 / l7 / ArticleWithVoteCount | ░ } | (reader / ⚷: [0]) | unsharded }"]
    n0 -> n1 [ style=invis ]
    n0 -> n2 [ style=invis ]
    n2 -> n3 [  ]
    n3 -> n4 [  ]
    n4 -> n5 [  ]
    n1 -> n6 [  ]
    n5 -> n6 [  ]
    n6 -> n7 [  ]
    n7 -> n8 [  ]
}}
Oct 09 23:24:15.694 DEBG accepted new connection, base: true, id: 0.0
Creating article...
Creating new article...
Casting vote...
Finished writing! Let's wait for things to propagate...
Reading...
Ok(
    [
        [
            Int(1),
            TinyText("test title"),
            Text("http://pdos.csail.mit.edu"),
            BigInt(1)
        ]
    ]
)

However, when I ran it a second time, I got a segfault:

Oct 09 23:24:43.345 INFO found initial leader
Oct 09 23:24:43.348 INFO leader listening on external address V4(127.0.0.1:6033)
Oct 09 23:24:43.348 DEBG leader's internal listen address: V4(127.0.0.1:64865)
Oct 09 23:24:43.349 WARN Connected to new leader
Oct 09 23:24:43.352 INFO new worker registered from V4(127.0.0.1:64867), which listens on V4(127.0.0.1:64865)
Oct 09 23:24:43.491 DEBG 0 duplicate queries, version: 0
Oct 09 23:24:43.492 INFO starting migration
Oct 09 23:24:43.492 DEBG 4 queries, 2 of which are named, version: 1
Oct 09 23:24:43.492 INFO Schema version advanced from 0 to 1
Oct 09 23:24:43.493 DEBG Assigning primary key (aid) for base Article
Oct 09 23:24:43.494 INFO adding new base, node: 1
Oct 09 23:24:43.495 DEBG registering query "Article"
Oct 09 23:24:43.497 INFO adding new base, node: 2
Oct 09 23:24:43.497 DEBG registering query "Vote"
Oct 09 23:24:43.499 DEBG Making QG for "VoteCount"
Oct 09 23:24:43.500 INFO No reuse opportunity, adding fresh query
Oct 09 23:24:43.501 DEBG Added final MIR node for query named "VoteCount"
Oct 09 23:24:43.502 INFO adding new node, type: internal Distinct node, node: 3
Oct 09 23:24:43.503 INFO adding new node, type: internal |*| γ[0] node, node: 4
Oct 09 23:24:43.503 INFO adding new node, type: internal π[0, 1] node, node: 5
Oct 09 23:24:43.504 DEBG registering query "VoteCount"
Oct 09 23:24:43.504 DEBG Making QG for "ArticleWithVoteCount"
Oct 09 23:24:43.505 INFO No reuse opportunity, adding fresh query
Oct 09 23:24:43.505 DEBG Added final MIR node for query named "ArticleWithVoteCount"
Oct 09 23:24:43.506 INFO adding new node, type: internal [1:0, 1:1, 1:2, 5:1] 1:0 ⋈ 5:0 node, node: 6
Oct 09 23:24:43.507 INFO adding new node, type: internal π[0, 1, 2, 3] node, node: 7
Oct 09 23:24:43.507 DEBG registering query "ArticleWithVoteCount"
Oct 09 23:24:43.507 INFO finalizing migration, #nodes: 7
Oct 09 23:24:43.508 DEBG node added to domain, domain: 0, type: B, node: 1
Oct 09 23:24:43.508 DEBG node added to domain, domain: 0, type: B, node: 2
Oct 09 23:24:43.508 DEBG node added to domain, domain: 0, type: internal Distinct node, node: 3
Oct 09 23:24:43.509 DEBG node added to domain, domain: 0, type: internal |*| γ[0] node, node: 4
Oct 09 23:24:43.509 DEBG node added to domain, domain: 0, type: internal π[0, 1] node, node: 5
Oct 09 23:24:43.509 DEBG node added to domain, domain: 0, type: internal [1:0, 1:1, 1:2, 5:1] 1:0 ⋈ 5:0 node, node: 6
Oct 09 23:24:43.510 DEBG node added to domain, domain: 0, type: internal π[0, 1, 2, 3] node, node: 7
Oct 09 23:24:43.511 DEBG node added to domain, domain: 0, type: reader node, node: 8
Oct 09 23:24:43.513 DEBG assigning local index, local: 0, node: 1, type: B, domain: 0
Oct 09 23:24:43.514 DEBG assigning local index, local: 1, node: 2, type: B, domain: 0
Oct 09 23:24:43.515 DEBG assigning local index, local: 2, node: 3, type: internal Distinct node, domain: 0
Oct 09 23:24:43.516 DEBG assigning local index, local: 3, node: 4, type: internal |*| γ[0] node, domain: 0
Oct 09 23:24:43.517 DEBG assigning local index, local: 4, node: 5, type: internal π[0, 1] node, domain: 0
Oct 09 23:24:43.517 DEBG assigning local index, local: 5, node: 6, type: internal [1:0, 1:1, 1:2, 5:1] 1:0 ⋈ 5:0 node, domain: 0
Oct 09 23:24:43.518 DEBG assigning local index, local: 6, node: 7, type: internal π[0, 1, 2, 3] node, domain: 0
Oct 09 23:24:43.519 DEBG assigning local index, local: 7, node: 8, type: reader node, domain: 0
Oct 09 23:24:43.520 DEBG booting new domains
Oct 09 23:24:43.521 INFO sending domain 0.0 to worker Ok(V4(127.0.0.1:64865))
Oct 09 23:24:43.524 INFO booted domain, nodes: 8, shard: 0, domain: 0
Oct 09 23:24:43.525 DEBG accepted new connection, base: false, id: 0.0
Oct 09 23:24:43.525 DEBG mutating existing domains
Oct 09 23:24:43.525 INFO bringing up inter-domain connections
Oct 09 23:24:43.526 INFO initializing new materializations
Oct 09 23:24:43.527 INFO adding lookup index to view, columns: [0], node: 2
Oct 09 23:24:43.529 INFO adding lookup index to view, columns: [0], node: 4
Oct 09 23:24:43.531 INFO adding lookup index to view, columns: [0], node: 5
Oct 09 23:24:43.532 INFO adding lookup index to view, columns: [0, 1], node: 3
Oct 09 23:24:43.533 INFO adding lookup index to view, columns: [0], node: 1
Oct 09 23:24:43.533 WARN using partial materialization for 8
Oct 09 23:24:43.534 WARN using partial materialization for 4
Oct 09 23:24:43.534 WARN full because required, node: 3
Oct 09 23:24:43.534 INFO adding index to view to enable partial, columns: [0], on: 3
Oct 09 23:24:43.535 WARN full because descendant is full, child: 3, node: 2
Oct 09 23:24:43.536 DEBG new fully-materalized node: B
Oct 09 23:24:43.536 INFO no need to replay empty new base, node: 1
Oct 09 23:24:44.108 DEBG new fully-materalized node: B
Oct 09 23:24:44.108 INFO no need to replay empty new base, node: 2
Oct 09 23:24:44.283 DEBG new fully-materalized node: internal Distinct node
Oct 09 23:24:44.283 INFO beginning reconstruction of internal Distinct node
Oct 09 23:24:44.284 INFO domain replay path is [(DomainIndex(0), [(NodeIndex(2), None), (NodeIndex(3), None)])], tag: 0, node: 3
Oct 09 23:24:44.672 INFO told about terminating replay path [ReplayPathSegment { node: LocalNodeIndex { id: 2 }, partial_key: None }], tag: 0, shard: 0, domain: 0
Segmentation fault: 11

I am running macOS High Sierra.

I am not sure if this is an actual issue, but I wanted to let you all know!

Automatically detect aggregation subsumption

For example, a COUNT operator can be re-used for a SUM. Take the following two queries:

SELECT story_id, COUNT(*) AS vcount FROM votes GROUP BY story_id;
SELECT story_id, SUM(rating) AS score FROM
  (SELECT story_id, rating FROM ratings
   UNION
   SELECT story_id, 1 FROM votes) AS r
GROUP BY story_id;

The second query can re-use the first like this:

SELECT story_id, SUM(rating) AS score FROM
  (SELECT story_id, rating FROM ratings
   UNION
   (SELECT story_id, COUNT(*) FROM votes GROUP BY story_id)) AS r
GROUP BY story_id;

to avoid re-counting all of the votes.

Be live during state clone for full replays

When we need to do a full replay, we currently do a synchronous clone of the source operator's state. This holds up the processing of other writes and backfills through that operator, which is sad. If we could find some kind of data structure that allowed us to efficiently capture a snapshot we could then use for the backfill, that'd be awesome!
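
One candidate is a persistent (structurally shared) map, where cloning is O(1). A sketch using the third-party im crate, which is not something Noria currently uses:

use im::HashMap;

fn snapshot_and_continue(state: &mut HashMap<i64, Vec<i64>>) -> HashMap<i64, Vec<i64>> {
    // Cheap: the snapshot shares structure with `state` instead of
    // deep-copying it, so writers are barely held up.
    let snapshot = state.clone();
    // The writer can keep mutating `state`; `snapshot` stays frozen
    // and can feed the backfill.
    state.insert(42, vec![1, 2, 3]);
    snapshot
}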

Improve logging consistency throughout the codebase

Currently, logging is pretty ad hoc: different modules use logging levels fairly arbitrarily, and there is important information at the trace level and unimportant information at higher levels. We should take a pass over the code base and figure out what information we usually end up wanting, what information we want for debugging purposes, and what we never even look at.

How to install/setup?

I tried to follow the documentation but got stuck at ZooKeeper. How do I set up ZooKeeper? Could you point me to some resources or tutorials to follow?

Replace local TCP connections with in-memory channels

Now that all the I/O is tokio-based, it should be relatively straightforward to replace the communication channel used for domain-to-domain sends with in-memory channels when the source and destination are in the same process. This should significantly speed up certain workloads, as we would no longer have to go through serialization and syscalls.
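
A sketch of the shape this could take, assuming tokio's mpsc channels for the local case (the type and variant names are made up):

use tokio::sync::mpsc;

enum Outbound<T> {
    // Same process: a queue push and a task wakeup, no serialization.
    Local(mpsc::UnboundedSender<T>),
    // Different process: serialize and write to the TCP connection.
    Remote(tokio::net::TcpStream),
}

impl<T> Outbound<T> {
    fn send(&mut self, pkt: T) {
        match self {
            Outbound::Local(tx) => {
                let _ = tx.send(pkt);
            }
            Outbound::Remote(_conn) => {
                // serialize `pkt` (e.g. with bincode) and write the framed
                // bytes to `_conn`; omitted in this sketch
            }
        }
    }
}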

Development roadmap?

I assume this is a prototype: many things still have to be implemented (and/or researched), so the system will change over time.
Do you have plans to make Noria production-ready?
Is there a publicly accessible development roadmap or feature plan people could follow?

Is the project looking for any new contributors?

I was looking at #111 and am assuming it's still current. I am interested in contributing, and was wondering if you have made any progress on the roadmap since 2018 and/or have a contribution guide.

I was specifically thinking about the availability and fault-tolerance considerations mentioned in the above issue. Was there any dev work done on either that or #105 that already covers this use case?

Please let me know if there's anything I can do!

Delete base table or view

I've been playing with Noria in a private little CMS project, and noticed that
there seems to be no way to delete a view or table once it has been installed.
Or is there, and I simply overlooked it?
Is this perhaps intentional design?

Decide on terminology

As we move to distributed, fault-tolerant Soup, the terms we're using have become increasingly ambiguous and overloaded. I don't love all these names, but we should have terminology for all of the following concepts:

  • domain: the collection of all shards and replicas that do computation for the same nodes in the graph.
  • replica set: the part of a domain that manages a single shard.
  • replica: single running instantiation of the nodes in a domain.

And for these higher level components...

  • controller: component that coordinates control plane operations. Runs in the same address space as one worker, but typically interacts with more over the network.
  • controller stub: component that participates in leader election. Becomes a controller if it wins.
  • worker: component that manages data plane operations.
  • worker thread pool: the collection of threads on a worker that do the actual processing.
  • worker thread: single thread in the worker thread pool.

We should also probably have distinct DomainIndex, ReplicaSetIndex, and ReplicaIndex types, as well as a WorkerIndex type to identify individual processes. Depending on how they're implemented, worker indexes may have to be randomly generated to avoid collisions.
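
A sketch of those newtypes (the names come from the list above; the rest is illustrative):

// Distinct index types prevent accidentally passing, say, a replica
// index where a domain index is expected.
#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
struct DomainIndex(usize);

#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
struct ReplicaSetIndex(usize);

#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
struct ReplicaIndex(usize);

// Randomly generated (e.g. from a PRNG at startup) to avoid
// collisions between independently started processes.
#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
struct WorkerIndex(u64);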

Add support for multi-key joins

We currently only support joins with a single join column. Adding support for multi-key joins should be relatively straightforward, though we should take care not to introduce performance regressions due to additional bookkeeping that may be necessary.
