peermaps / eyros
interval database
License: Other
Right now the delete feature marks documents as deleted but it doesn't overwrite them with data later. There should also be an option to write over the data with zeros or noise when deleted.
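A rough picture of what such an option could look like, using a hypothetical in-memory record slot rather than the actual eyros storage format. `Slot`, `WipeMode`, and `delete` are names invented for this sketch; a noise mode would work the same way with a random byte source.

```rust
/// How to treat the payload bytes of a deleted record (hypothetical).
#[derive(Clone, Copy)]
enum WipeMode {
    /// Only set the deleted flag and leave the bytes in place.
    MarkOnly,
    /// Set the flag and overwrite the payload with zeros.
    Zeros,
}

/// Stand-in for one stored record; not the eyros on-disk layout.
struct Slot {
    deleted: bool,
    payload: Vec<u8>,
}

fn delete(slot: &mut Slot, mode: WipeMode) {
    slot.deleted = true;
    if let WipeMode::Zeros = mode {
        for b in slot.payload.iter_mut() {
            *b = 0;
        }
    }
}

fn main() {
    let mut kept = Slot { deleted: false, payload: vec![1, 2, 3, 4] };
    delete(&mut kept, WipeMode::MarkOnly);
    println!("marked only: deleted={} payload={:?}", kept.deleted, kept.payload);

    let mut wiped = Slot { deleted: false, payload: vec![1, 2, 3, 4] };
    delete(&mut wiped, WipeMode::Zeros);
    println!("zeroed: deleted={} payload={:?}", wiped.deleted, wiped.payload);
}
```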
however rust does it, plus the same info in the readme
Include upstream optimizations from the peermaps-ingest optimize phase into eyros directly and investigate strategies for incremental optimization.
Support variable-sized payloads. Eventually it would be good to have variable-sized points too but value payloads are more important for the peermaps roadmap.
Under some conditions the database hangs while writing and one of the tree files grows unbounded in size. I suspect this is some edge case in determining whether to build a new branch.
This database already ought to compile to wasm, but we should also have a good API for using eyros in the browser using browser storage.
After stress-testing by generating a db with 70 million records, queries get really sluggish. The branches are pretty balanced, but the size of some blocks (found by logging calls to read_block()) gets really large. Probably this is because data fragments aren't ever rebuilt into branches while merging.
Finish the block cache for reads (LRU) + writes (hash map). Hopefully this will speed up both cold reads and queries once the cache is primed.
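A minimal sketch of the two-part cache described here, assuming blocks are keyed by a `u64` file offset. `BlockCache` and its methods are hypothetical names, not the eyros internals, and the LRU bookkeeping uses a plain `VecDeque` for brevity rather than an O(1) structure.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical block cache: bounded LRU for reads, unbounded map for pending writes.
struct BlockCache {
    capacity: usize,
    reads: HashMap<u64, Vec<u8>>,
    order: VecDeque<u64>, // front = least recently used
    writes: HashMap<u64, Vec<u8>>,
}

impl BlockCache {
    fn new(capacity: usize) -> Self {
        BlockCache {
            capacity,
            reads: HashMap::new(),
            order: VecDeque::new(),
            writes: HashMap::new(),
        }
    }

    /// Move `offset` to the most-recently-used position.
    fn touch(&mut self, offset: u64) {
        self.order.retain(|o| *o != offset);
        self.order.push_back(offset);
    }

    /// Remember a block that was just read from storage.
    fn put_read(&mut self, offset: u64, block: Vec<u8>) {
        if self.capacity == 0 {
            return;
        }
        if !self.reads.contains_key(&offset) && self.reads.len() >= self.capacity {
            // Evict the least recently used block.
            if let Some(old) = self.order.pop_front() {
                self.reads.remove(&old);
            }
        }
        self.reads.insert(offset, block);
        self.touch(offset);
    }

    /// Buffer a block that has not been flushed yet; drop any stale read copy.
    fn put_write(&mut self, offset: u64, block: Vec<u8>) {
        self.reads.remove(&offset);
        self.order.retain(|o| *o != offset);
        self.writes.insert(offset, block);
    }

    /// Pending writes win over cached reads.
    fn get(&mut self, offset: u64) -> Option<Vec<u8>> {
        if let Some(block) = self.writes.get(&offset) {
            return Some(block.clone());
        }
        if self.reads.contains_key(&offset) {
            self.touch(offset);
            return self.reads.get(&offset).cloned();
        }
        None
    }
}

fn main() {
    let mut cache = BlockCache::new(2);
    cache.put_read(0, vec![0]);
    cache.put_read(1, vec![1]);
    cache.get(0); // block 0 becomes most recently used
    cache.put_read(2, vec![2]); // evicts block 1
    println!("block 1 cached: {}", cache.get(1).is_some());
}
```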
Hello,
I get 3 errors while compiling eyros.
It appears that available_concurrency has been renamed to available_parallelism: rust-lang/rust@6cc91cb
Compiling eyros v4.6.0
error[E0425]: cannot find function `available_concurrency` in module `std::thread`
--> C:\...\eyros-4.6.0\src\tree.rs:530:34
|
530 | let nproc = std::thread::available_concurrency().map(|n| n.get()).unwrap_or(1);
| ^^^^^^^^^^^^^^^^^^^^^ not found in `std::thread`
...
719 | #[cfg(feature="2d")] impl_tree![Tree2,Branch2,Node2,MState2,get_bounds2,build_data2,
| ______________________-
720 | | (P0,P1),(0,1),(usize,usize),(None,None),2
721 | | ];
| |__- in this macro invocation
|
= note: this error originates in the macro `impl_tree` (in Nightly builds, run with -Z macro-backtrace for more info)
error[E0425]: cannot find function `available_concurrency` in module `std::thread`
--> C:\...\eyros-4.6.0\src\tree.rs:530:34
|
530 | let nproc = std::thread::available_concurrency().map(|n| n.get()).unwrap_or(1);
| ^^^^^^^^^^^^^^^^^^^^^ not found in `std::thread`
...
722 | #[cfg(feature="3d")] impl_tree![Tree3,Branch3,Node3,MState3,get_bounds3,build_data3,
| ______________________-
723 | | (P0,P1,P2),(0,1,2),(usize,usize,usize),(None,None,None),3
724 | | ];
| |__- in this macro invocation
|
= note: this error originates in the macro `impl_tree` (in Nightly builds, run with -Z macro-backtrace for more info)
error[E0425]: cannot find function `available_concurrency` in module `std::thread`
--> C:\...\eyros-4.6.0\src\tree.rs:530:34
|
530 | let nproc = std::thread::available_concurrency().map(|n| n.get()).unwrap_or(1);
| ^^^^^^^^^^^^^^^^^^^^^ not found in `std::thread`
...
725 | #[cfg(feature="4d")] impl_tree![Tree4,Branch4,Node4,Mstate4,get_bounds4,build_data4,
| ______________________-
726 | | (P0,P1,P2,P3),(0,1,2,3),(usize,usize,usize,usize),(None,None,None,None),4
727 | | ];
| |__- in this macro invocation
|
= note: this error originates in the macro `impl_tree` (in Nightly builds, run with -Z macro-backtrace for more info)
For more information about this error, try `rustc --explain E0425`.
error: could not compile `eyros` due to 3 previous errors
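Going by the rename mentioned above, the failing line could be adapted roughly as follows. This is a sketch rather than the actual patch; `worker_count` is a hypothetical helper name, while `std::thread::available_parallelism` is the standard library function (stable since Rust 1.59).

```rust
use std::thread;

/// Number of worker threads to use, falling back to 1 if it cannot be determined.
fn worker_count() -> usize {
    thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1)
}

fn main() {
    let nproc = worker_count();
    println!("nproc = {}", nproc);
}
```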
implement them
This one is tricky and requires some research.
When incoming data has poor locality, the resulting data blocks (groups of records) tend to span overly large intervals, significantly reducing the quality of the partitioning for each level of the tree. Pre-filtering and post-write optimization steps can improve the quality of the block intervals at the expense of some write performance.
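As one possible pre-filtering step (an assumption on my part, not something the issue prescribes), a batch could be sorted along a Z-order (Morton) curve before writing, so that records close in space end up close in the batch. The sketch below uses 16-bit integer coordinates for simplicity; `spread`, `morton`, and `sort_by_locality` are hypothetical names.

```rust
/// Spread the 16 bits of `v` so they occupy the even bit positions of a u32.
fn spread(v: u16) -> u32 {
    let mut x = v as u32;
    x = (x | (x << 8)) & 0x00FF_00FF;
    x = (x | (x << 4)) & 0x0F0F_0F0F;
    x = (x | (x << 2)) & 0x3333_3333;
    x = (x | (x << 1)) & 0x5555_5555;
    x
}

/// Interleave the bits of x (even positions) and y (odd positions).
fn morton(x: u16, y: u16) -> u32 {
    spread(x) | (spread(y) << 1)
}

/// Reorder a batch of points so that spatial neighbors tend to be adjacent.
fn sort_by_locality(points: &mut Vec<(u16, u16)>) {
    points.sort_by_key(|&(x, y)| morton(x, y));
}

fn main() {
    let mut batch = vec![(3, 3), (0, 0), (1, 1), (2, 0)];
    sort_by_locality(&mut batch);
    println!("{:?}", batch);
}
```

Whether this actually tightens the block intervals enough to justify the extra sort would need measuring against real data.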
Unfortunately I think this has crept into nightly recently and now Eyros no longer builds:
error[E0407]: method `backtrace` is not a member of trait `std::error::Error`
--> /Users/alex/.cargo/registry/src/github.com-1ecc6299db9ec823/eyros-4.6.2/src/error.rs:31:3
|
31 | / fn backtrace(&'_ self) -> Option<&'_ Backtrace> {
32 | | Some(&self.backtrace)
33 | | }
| |___^ not a member of trait `std::error::Error`
For more information about this error, try `rustc --explain E0407`.
error: could not compile `eyros` due to previous error
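Assuming the trait method is simply unavailable on the toolchain in use (as the error suggests), one way around it is to expose the backtrace through an inherent method instead of the `Error` impl. This is a sketch with a hypothetical `DbError` type, not the actual eyros error type or its fix; `std::backtrace::Backtrace` is stable since Rust 1.65.

```rust
use std::backtrace::Backtrace;
use std::error::Error;
use std::fmt;

/// Hypothetical error type that carries its own backtrace.
#[derive(Debug)]
struct DbError {
    msg: String,
    backtrace: Backtrace,
}

impl DbError {
    fn new(msg: &str) -> Self {
        DbError {
            msg: msg.to_string(),
            // Captured only when RUST_BACKTRACE or RUST_LIB_BACKTRACE enables it.
            backtrace: Backtrace::capture(),
        }
    }

    /// Inherent accessor, so nothing depends on a `backtrace` method of the trait.
    fn backtrace(&self) -> &Backtrace {
        &self.backtrace
    }
}

impl fmt::Display for DbError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.msg)
    }
}

// No trait methods overridden: the default `Error` impl is enough.
impl Error for DbError {}

fn main() {
    let err = DbError::new("example failure");
    println!("{}", err);
    println!("backtrace status: {:?}", err.backtrace().status());
}
```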
When opening the current example the map fails with the following error in Brave browser. Tried with both:
https://ipfs.io/ipfs/QmS24zmgDz2jFdakvd6aT6sRXSGRXWJaB62aPTbvmpguBB/
ipfs://bafybeibwvqwjcptcsl5zm4gedug5mlfsltvcirxffm2zgjr5hudxct6jmy/
Uncaught (in promise) RangeError: WebAssembly.Compile is disallowed on the main thread, if the buffer size is larger than 4KB. Use WebAssembly.compile, or compile on a worker thread.
at module.exports ((index):3208)
at module.exports ((index):2701)
at onDone ((index):6630)
at notifyProgress ((index):11787)
at onReadyStateChange ((index):11521)
at XMLHttpRequest.xhr.onreadystatechange ((index):11471)
The batch write perf in the wasm build of eyros could be improved but the query performance seems to diverge even more from the rust version. Perhaps this is because batches get sent over the wasm bridge in larger chunks than query results, which stream in one by one?
It's not obvious how to delete rows. It would be nice to have an example of deletion.
P.S. Thank you for the lib
This is likely already possible with a custom serialization implementation, but it would be nice to have some examples. 8 bytes of length data is overkill for variable-sized payloads that have a known maximum size.
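To illustrate the size argument, here is a sketch of a 2-byte length prefix for payloads known to be at most 65535 bytes. It is standalone and not wired into the eyros serialization traits; `encode` and `decode` are hypothetical helper names, and big-endian is an arbitrary choice.

```rust
/// Prefix `payload` with a 2-byte big-endian length.
/// Returns None if the payload does not fit in a u16 length.
fn encode(payload: &[u8]) -> Option<Vec<u8>> {
    if payload.len() > u16::MAX as usize {
        return None;
    }
    let len = payload.len() as u16;
    let mut out = Vec::with_capacity(2 + payload.len());
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(payload);
    Some(out)
}

/// Read one length-prefixed payload from the front of `buf`.
/// Returns the payload and the number of bytes consumed, or None if `buf` is too short.
fn decode(buf: &[u8]) -> Option<(&[u8], usize)> {
    let header = buf.get(0..2)?;
    let len = u16::from_be_bytes([header[0], header[1]]) as usize;
    let payload = buf.get(2..2 + len)?;
    Some((payload, 2 + len))
}

fn main() {
    let bytes = encode(b"abc").expect("payload fits in u16");
    println!("encoded: {:?}", bytes);
    println!("decoded: {:?}", decode(&bytes));
}
```

Compared with an 8-byte length field, this saves 6 bytes per record, which adds up for small payloads.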
use the new romio async implementation. this should add some extra perf for parallel i/o
I've noticed some of the other projects in peermaps have an MIT License, but this project doesn't have any licensing information. Maybe a License should be added?
I don't know if it's a bug or if I'm doing something wrong. I create an eyros database with ingest like this (both are on eyros 4.6.1):
cargo +nightly run --release -- ingest --pbf ../../data/berlin-latest.osm.pbf --edb ../edb/
And then try to query it via:
cargo +nightly run --release --example query -- ../edb -180,-90,180,90
I get this error:
thread 'main' panicked at 'range start index 559651500 out of range for slice of length 24054', /eyros/src/bytes/from.rs:154:22
Batch writes may take a second or more or even longer for very large datasets. Instead the writes could be written immediately to durable storage and then in a background thread pushed into the LSM forest.
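The idea could be sketched like this, with a `Vec` standing in for the durable log and a sorted `Vec` standing in for the LSM forest. All names here (`Batch`, `spawn_indexer`, `write`) are hypothetical, and real durability would need an fsync'd append rather than an in-memory push.

```rust
use std::sync::mpsc;
use std::thread;

/// Stand-in for a batch of rows: (key, value) pairs.
type Batch = Vec<(u32, u32)>;

/// Start a background thread that folds batches into a sorted index.
fn spawn_indexer() -> (mpsc::Sender<Batch>, thread::JoinHandle<Batch>) {
    let (tx, rx) = mpsc::channel::<Batch>();
    let handle = thread::spawn(move || {
        let mut index: Batch = Vec::new();
        // The loop ends once every sender has been dropped.
        for batch in rx {
            index.extend(batch);
            index.sort();
        }
        index
    });
    (tx, handle)
}

/// Record the batch in the log first, then hand it to the background indexer.
fn write(log: &mut Vec<Batch>, tx: &mpsc::Sender<Batch>, batch: Batch) {
    log.push(batch.clone()); // stand-in for a durable append
    tx.send(batch).expect("indexer thread stopped");
}

fn main() {
    let mut log: Vec<Batch> = Vec::new();
    let (tx, handle) = spawn_indexer();
    write(&mut log, &tx, vec![(3, 0), (1, 0)]);
    write(&mut log, &tx, vec![(2, 0)]);
    drop(tx); // closing the channel lets the worker finish
    let index = handle.join().expect("indexer panicked");
    println!("{} batches logged, {} records indexed", log.len(), index.len());
}
```

The caller returns as soon as the log append and channel send complete; queries would have to consult both the index and any not-yet-indexed log entries.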
Provide a default implementation for storing scalar and interval types alongside each other in the same db. An example use-case is storing points and polygons without needing to create separate databases.
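One way to model this is an enum per dimension that is either a single value or a range, with a scalar treated as a zero-width interval during queries. The sketch below is a standalone illustration; `Extent` and `matches` are hypothetical names, not the eyros API.

```rust
/// One dimension of a record: a single coordinate or a closed range.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Extent {
    Scalar(f32),
    Interval(f32, f32),
}

impl Extent {
    /// A scalar is treated as a zero-width interval.
    fn bounds(&self) -> (f32, f32) {
        match *self {
            Extent::Scalar(x) => (x, x),
            Extent::Interval(lo, hi) => (lo, hi),
        }
    }

    /// True if this extent intersects the closed query range [min, max].
    fn overlaps(&self, min: f32, max: f32) -> bool {
        let (lo, hi) = self.bounds();
        lo <= max && hi >= min
    }
}

/// Test a 2d record against a bounding box ((min_x, min_y), (max_x, max_y)).
fn matches(record: &[Extent; 2], bbox: ((f32, f32), (f32, f32))) -> bool {
    let ((x0, y0), (x1, y1)) = bbox;
    record[0].overlaps(x0, x1) && record[1].overlaps(y0, y1)
}

fn main() {
    let point = [Extent::Scalar(1.0), Extent::Scalar(1.0)];
    let polygon_bounds = [Extent::Interval(3.0, 5.0), Extent::Interval(3.0, 5.0)];
    let bbox = ((0.0, 0.0), (2.0, 2.0));
    println!("point matches: {}", matches(&point, bbox));
    println!("polygon matches: {}", matches(&polygon_bounds, bbox));
}
```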
but this needs to be done in a way that will work with browser storage in wasm
i see that the crate for eyros is at 4.6.2 but this repository reflects 4.6.1. the ./pkg/package.json also appears out of sync with npm. will this repo be seeing any further commits?
i've been following the project for a while and trying things out as they become available on github & npm. if there is another place to follow along, i would be interested to know where it is!
Merge multiple databases together. This should work without requiring the presence of the raw data so that the results from multiple computers can be combined together without transferring the data file (big), only the tree data (small) plus ranges for the data blocks (unknown size, probably smaller than big but bigger than small).
Hi!
Great work with the project :)
I have been using it for a personal project and I'm facing the following error when trying to open the DB after about 1 GB of data has been inserted:
Compat { error: ErrorMessage { msg: "block too small for length field" }
About the usage, I don't have any special calls to close the DB. I just reopen the same DB as the app gets launched with the DB::open_from_path method.
The P, V in the db inserts are in the following format:
type P = (f64, f64, u8, i64);
type V = (u8, u8);
Row::Insert(point, value);
The data is just test data, not important, but I would like to know whether I'm doing something wrong and how this can be prevented.
Br,
J
You have a special patch with a local path in
https://github.com/peermaps/eyros/blob/master/Cargo.toml#L21
Do you have any local patches that differ from master https://github.com/datrs/random-access-disk ? Looks like your fork is 14 commits behind.