
eyros's Introduction

peermaps

peer to peer cartography

This tool streams raw OpenStreetMap data from p2p networks so that you can perform ad-hoc extracts for arbitrary bounding boxes. Because you are pulling the data from a p2p network (and helping to host it), you also don't need to worry about http quotas or rate limiting.

example

Stream data inside arbitrary WSEN extents from the network:

$ peermaps data -155.064270 18.9136925 -154.8093872 19.9 | head
<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6" generator="osmconvert 0.8.4" timestamp="2016-11-28T01:59:58Z">
  <bounds minlat="18.9136925" minlon="-155.06427" maxlat="19.9" maxlon="-154.8093872"/>
  <node id="88994815" lat="19.7317131" lon="-155.0533157" version="3" timestamp="2012-01-19T21:23:51Z" changeset="10441415" uid="574654" user="Tom_Holland"/>
  <node id="88994817" lat="19.7312758" lon="-155.0533179" version="3" timestamp="2012-01-19T21:23:51Z" changeset="10441415" uid="574654" user="Tom_Holland"/>
  <node id="88994826" lat="19.7319167" lon="-155.0460457" version="3" timestamp="2012-01-19T21:23:51Z" changeset="10441415" uid="574654" user="Tom_Holland"/>
  <node id="88994829" lat="19.7329599" lon="-155.0463189" version="3" timestamp="2012-01-19T21:23:51Z" changeset="10441415" uid="574654" user="Tom_Holland"/>
  <node id="88994832" lat="19.7333033" lon="-155.0454221" version="3" timestamp="2012-01-19T21:23:51Z" changeset="10441415" uid="574654" user="Tom_Holland"/>
  <node id="88994836" lat="19.7336513" lon="-155.0450981" version="4" timestamp="2012-01-20T23:02:03Z" changeset="10451586" uid="574654" user="Tom_Holland"/>
  <node id="88994868" lat="19.7341231" lon="-155.0447835" version="3" timestamp="2012-01-20T23:02:03Z" changeset="10451586" uid="574654" user="Tom_Holland"/>

install

requirements:

  • node and npm
  • ipfs

Install the prerequisites, then install the peermaps command:

npm install -g peermaps

Run the ipfs daemon somewhere (in a screen for example):

ipfs daemon

Now you can use the peermaps command.

usage

peermaps data W,S,E,N {OPTIONS}

  Print all data inside the W,S,E,N extents.

  -f      Output format: osm (default), o5m, pbf, csv.
  -n      Network: ipfs (default)
  --show  Print the generated command instead of running it.

peermaps files W,S,E,N

  Print the files from the archive that overlap with the W,S,E,N extents.

  -n      Network: ipfs (default)

peermaps read FILE

  Print the content of FILE from the archive.

  -n      Network: ipfs (default)

peermaps address

  Print the address of the peermaps archive for the given network.

  -n      Network: ipfs (default)

peermaps generate INFILE {OPTIONS}

  Generate a peermaps archive at OUTDIR for INFILE.

  -o OUTDIR   Default: ./mapdata
  -t MAXSIZE  Files must be no greater than MAXSIZE. Default: 1M
  --xmin      Minimum longitude (west). Default: -180
  --xmax      Maximum longitude (east). Default: 180
  --ymin      Minimum latitude (south). Default: -90
  --ymax      Maximum latitude (north). Default: 90
  --xcount    Number of longitude divisions per branch. Default: 4
  --ycount    Number of latitude divisions per branch. Default: 4
  --nproc     Number of converter processes to spawn. Default: (`nproc`-1)

  Example:
    peermaps generate planet-latest.osm.pbf -o ~/data/planet -t 5M

  Note: this operation may take days for planet-sized inputs.

mirror

Help us mirror the archive! If you have a computer with ~38 GB of storage and some bandwidth to spare, you can run:

ipfs pin add QmXJ8KkgKyjRxTrEDvmZWZMNGq1dk3t97AVhF1Xeov3kB4

For now there is only one archive hash. In the future, there will be more archives and an update mechanism.

todo

  • generate and host vector tiles on p2p networks
  • dat/hyperdrive support
  • archive update mechanism
  • torrent/webtorrent support?
  • p2p web tile viewer
  • make the generate step much faster by patching osmconvert.c

eyros's People

Contributors

yoshuawuyts


eyros's Issues

latency spikes during batch write

Batch writes may take a second or longer for very large datasets. Instead, writes could be committed immediately to durable storage and then pushed into the LSM forest by a background thread, as in the sketch below.
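
A minimal sketch of that write path (illustrative only, not eyros's actual internals), using std threads and an mpsc channel: the caller appends the batch to a durable log and returns, while a background thread later folds it into the tree.

use std::fs::OpenOptions;
use std::io::Write;
use std::sync::mpsc;
use std::thread;

// One serialized batch of records; a stand-in for the real batch type.
struct Batch(Vec<u8>);

fn main() -> std::io::Result<()> {
    let (tx, rx) = mpsc::channel::<Batch>();

    // Background thread: drain batches and fold them into the LSM forest,
    // off the writer's critical path (merge_into_forest is a placeholder).
    let merger = thread::spawn(move || {
        for batch in rx {
            merge_into_forest(&batch.0);
        }
    });

    // Writer path: append to a durable log, sync, hand off, return fast.
    let mut log = OpenOptions::new().create(true).append(true).open("batch.log")?;
    let batch = Batch(b"...serialized records...".to_vec());
    log.write_all(&batch.0)?;
    log.sync_data()?;
    tx.send(batch).expect("merger thread alive");

    drop(tx);
    merger.join().unwrap();
    Ok(())
}

fn merge_into_forest(_bytes: &[u8]) {
    // In the real database this would push the records into the LSM forest.
}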

Query database created with ingest

I don't know if it's a bug or if I'm doing something wrong. I create an eyros database with ingest like this (both commands use eyros 4.6.1):

cargo +nightly run --release -- ingest --pbf ../../data/berlin-latest.osm.pbf --edb ../edb/

And then try to query it via:

cargo +nightly run --release --example query -- ../edb -180,-90,180,90

I get this error:

thread 'main' panicked at 'range start index 559651500 out of range for slice of length 24054', /eyros/src/bytes/from.rs:154:22

overwrite deleted data

Right now the delete feature marks documents as deleted, but it never overwrites them later. There should also be an option to overwrite the data with zeros or noise on deletion, roughly like the sketch below.
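
A rough sketch at the file level, with a hypothetical helper name (not part of eyros's API):

use std::fs::OpenOptions;
use std::io::{Seek, SeekFrom, Write};

// Overwrite `len` bytes at `offset` in `path` with zeros once the record
// there has been marked deleted. Hypothetical helper for illustration only.
fn scrub(path: &str, offset: u64, len: usize) -> std::io::Result<()> {
    let mut f = OpenOptions::new().write(true).open(path)?;
    f.seek(SeekFrom::Start(offset))?;
    f.write_all(&vec![0u8; len])?;
    f.flush()
}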

slow query times and unbounded growth in block sizes

After stress-testing by generating a db with 70 million records, queries get really sluggish. The branches are fairly balanced, but some blocks (found by logging calls to read_block()) get really large. This is probably because data fragments are never rebuilt into branches while merging.

Missing License?

I've noticed some of the other projects in peermaps have an MIT License, but this project doesn't have any licensing information. Maybe a license should be added?

mixed types

Provide a default implementation for storing scalar and interval types alongside each other in the same db. An example use-case is storing points and polygons without needing to create separate databases.
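
As a sketch of the idea (hypothetical types, not the crate's actual API), each axis of a coordinate could be either a scalar or an interval, so points and bounding boxes can live in the same tree:

// Hypothetical coordinate type: each axis is either a point (scalar)
// or an interval (min, max).
#[derive(Clone, Debug)]
enum Coord<T> {
    Scalar(T),
    Interval(T, T),
}

impl<T: PartialOrd + Copy> Coord<T> {
    // Does this coordinate overlap a query range [min, max] on the same axis?
    fn overlaps(&self, min: T, max: T) -> bool {
        match self {
            Coord::Scalar(x) => *x >= min && *x <= max,
            Coord::Interval(lo, hi) => *hi >= min && *lo <= max,
        }
    }
}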

Errors while compiling: error[E0425]: cannot find function `available_concurrency` in module `std::thread`

Hello,

I get 3 errors while compiling eyros.
It appears that available_concurrency has been renamed to available_parallelism: rust-lang/rust@6cc91cb

   Compiling eyros v4.6.0
error[E0425]: cannot find function `available_concurrency` in module `std::thread`
   --> C:\...\eyros-4.6.0\src\tree.rs:530:34
    |
530 |           let nproc = std::thread::available_concurrency().map(|n| n.get()).unwrap_or(1);
    |                                    ^^^^^^^^^^^^^^^^^^^^^ not found in `std::thread`
...
719 |   #[cfg(feature="2d")] impl_tree![Tree2,Branch2,Node2,MState2,get_bounds2,build_data2,
    |  ______________________-
720 | |   (P0,P1),(0,1),(usize,usize),(None,None),2
721 | | ];
    | |__- in this macro invocation
    |
    = note: this error originates in the macro `impl_tree` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0425]: cannot find function `available_concurrency` in module `std::thread`
   --> C:\...\eyros-4.6.0\src\tree.rs:530:34
    |
530 |           let nproc = std::thread::available_concurrency().map(|n| n.get()).unwrap_or(1);
    |                                    ^^^^^^^^^^^^^^^^^^^^^ not found in `std::thread`
...
722 |   #[cfg(feature="3d")] impl_tree![Tree3,Branch3,Node3,MState3,get_bounds3,build_data3,
    |  ______________________-
723 | |   (P0,P1,P2),(0,1,2),(usize,usize,usize),(None,None,None),3
724 | | ];
    | |__- in this macro invocation
    |
    = note: this error originates in the macro `impl_tree` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0425]: cannot find function `available_concurrency` in module `std::thread`
   --> C:\...\eyros-4.6.0\src\tree.rs:530:34
    |
530 |           let nproc = std::thread::available_concurrency().map(|n| n.get()).unwrap_or(1);
    |                                    ^^^^^^^^^^^^^^^^^^^^^ not found in `std::thread`
...
725 |   #[cfg(feature="4d")] impl_tree![Tree4,Branch4,Node4,Mstate4,get_bounds4,build_data4,
    |  ______________________-
726 | |   (P0,P1,P2,P3),(0,1,2,3),(usize,usize,usize,usize),(None,None,None,None),4
727 | | ];
    | |__- in this macro invocation
    |
    = note: this error originates in the macro `impl_tree` (in Nightly builds, run with -Z macro-backtrace for more info)

For more information about this error, try `rustc --explain E0425`.
error: could not compile `eyros` due to 3 previous errors
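
The fix is a one-line rename at src/tree.rs:530; available_parallelism() returns the same Result<NonZeroUsize, _> shape, so the rest of the expression can stay as it is:

// src/tree.rs:530 — available_concurrency() was stabilized under the name
// available_parallelism(); the surrounding code is unchanged.
let nproc = std::thread::available_parallelism().map(|n| n.get()).unwrap_or(1);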

async

Use the new romio async implementation. This should add some extra performance for parallel I/O.

variable sized values

Support variable-sized payloads. Eventually it would be good to have variable-sized points too but value payloads are more important for the peermaps roadmap.

variable sized payloads with u16 or u32 length

This is likely already possible with a custom serialization implementation, but it would be nice to have some examples. 8 bytes of length data is overkill for variable-sized payloads that have a known maximum size.
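
A hedged sketch of what such a serialization could look like, as standalone helpers (not wired into eyros's value traits), using a 2-byte little-endian length prefix:

// Encode a payload with a u16 little-endian length prefix instead of a
// full 8-byte length, for values with a known maximum size of 65535 bytes.
fn encode_u16_len(payload: &[u8]) -> Vec<u8> {
    assert!(payload.len() <= u16::MAX as usize, "payload too large for u16 length");
    let mut out = Vec::with_capacity(2 + payload.len());
    out.extend_from_slice(&(payload.len() as u16).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

// Decode a u16-length-prefixed payload, returning the payload slice and the
// number of bytes consumed, or None if the buffer is too short.
fn decode_u16_len(buf: &[u8]) -> Option<(&[u8], usize)> {
    if buf.len() < 2 { return None; }
    let len = u16::from_le_bytes([buf[0], buf[1]]) as usize;
    if buf.len() < 2 + len { return None; }
    Some((&buf[2..2 + len], 2 + len))
}

With a u32 prefix the same scheme covers payloads up to 4 GiB at the cost of two extra bytes per value.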

investigate query perf from js

The batch write performance in the wasm build of eyros could be improved, but the query performance seems to diverge even more from the Rust version. Perhaps this is because batches get sent over the wasm bridge in larger chunks than query results, which stream in one by one?

data block compactness optimization

When incoming data has poor locality, the resulting data blocks (groups of records) tend to span overly large intervals, significantly reducing the quality of the partitioning for each level of the tree. Pre-filtering and post-write optimization steps can improve the quality of the block intervals at the expense of some write performance.
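
One illustrative pre-filtering step (an example of the idea, not necessarily what the project would adopt) is to sort each incoming batch by a Z-order (Morton) key so that records landing in the same data block are spatially close:

// Interleave the bits of two 32-bit grid coordinates into a 64-bit Z-order key.
fn morton_key(x: u32, y: u32) -> u64 {
    fn spread(v: u32) -> u64 {
        let mut v = v as u64;
        v = (v | (v << 16)) & 0x0000_ffff_0000_ffff;
        v = (v | (v << 8))  & 0x00ff_00ff_00ff_00ff;
        v = (v | (v << 4))  & 0x0f0f_0f0f_0f0f_0f0f;
        v = (v | (v << 2))  & 0x3333_3333_3333_3333;
        v = (v | (v << 1))  & 0x5555_5555_5555_5555;
        v
    }
    spread(x) | (spread(y) << 1)
}

// Sort a batch of (x, y, payload) records so nearby records end up in the
// same data block when written out sequentially.
fn sort_batch(records: &mut Vec<(u32, u32, Vec<u8>)>) {
    records.sort_by_key(|(x, y, _)| morton_key(*x, *y));
}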

Error trying to read/open db: block too small for length field (rust integration)

Hi!

Great work with the project :)

I have been using it for a personal project and I'm facing the following error when trying to open the DB after about 1 GB of data has been inserted:

Compat { error: ErrorMessage { msg: "block too small for length field" }

About the usage: I don't make any special calls to close the DB. I just reopen the same DB with the DB::open_from_path method each time the app is launched.

The P, V in the db inserts are in the following format:

type P = (f64, f64, u8, i64);
type V = (u8, u8);
Row::Insert(point, value);

The data is just test data, nothing important, but I would like to know whether I'm doing something wrong and how it can be prevented.

Br,
J

block cache

Finish the block cache for reads (LRU) + writes (hash map). Hopefully this will speed up both cold reads and queries once the cache is primed.
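
A toy sketch of that layout, assuming the third-party lru crate for the read side; the names and the cloning in get() are purely for illustration:

use lru::LruCache;               // assumes the `lru` crate (0.12-style API)
use std::collections::HashMap;
use std::num::NonZeroUsize;

// Illustrative block cache: an LRU for blocks read from storage, plus a
// plain HashMap for dirty blocks waiting to be flushed.
struct BlockCache {
    reads: LruCache<u64, Vec<u8>>, // offset -> block bytes
    writes: HashMap<u64, Vec<u8>>, // dirty blocks keyed by offset
}

impl BlockCache {
    fn new(capacity: usize) -> Self {
        Self {
            reads: LruCache::new(NonZeroUsize::new(capacity).expect("nonzero capacity")),
            writes: HashMap::new(),
        }
    }

    // Prefer unflushed writes over the read cache; blocks are cloned here
    // only to keep the sketch simple.
    fn get(&mut self, offset: u64) -> Option<Vec<u8>> {
        if let Some(block) = self.writes.get(&offset) {
            return Some(block.clone());
        }
        self.reads.get(&offset).cloned()
    }

    // Record a block fetched from storage.
    fn insert_read(&mut self, offset: u64, block: Vec<u8>) {
        self.reads.put(offset, block);
    }

    // Stage a block write; it stays here until flushed to storage.
    fn insert_write(&mut self, offset: u64, block: Vec<u8>) {
        self.writes.insert(offset, block);
    }
}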

Deletion example

It's not obvious how to delete rows. It would be nice to have an example of deletion.
P.S. Thank you for the lib

wasm version

This database already ought to compile to wasm, but we should also have a good API for using eyros in the browser on top of browser storage.

No longer compiles on Nightly

Unfortunately I think a recent nightly change removed the unstable backtrace method from std::error::Error, and now eyros no longer builds:

error[E0407]: method `backtrace` is not a member of trait `std::error::Error`
  --> /Users/alex/.cargo/registry/src/github.com-1ecc6299db9ec823/eyros-4.6.2/src/error.rs:31:3
   |
31 | /   fn backtrace(&'_ self) -> Option<&'_ Backtrace> {
32 | |     Some(&self.backtrace)
33 | |   }
   | |___^ not a member of trait `std::error::Error`

For more information about this error, try `rustc --explain E0407`.
error: could not compile `eyros` due to previous error

merge databases

Merge multiple databases together. This should work without requiring the presence of the raw data, so that results from multiple computers can be combined without transferring the data file (big); only the tree data (small) plus ranges for the data blocks (unknown size, probably somewhere in between) need to be transferred.

Error when opening current example

When opening the current example, the map fails with the following error in the Brave browser. Tried with both URLs:
https://ipfs.io/ipfs/QmS24zmgDz2jFdakvd6aT6sRXSGRXWJaB62aPTbvmpguBB/
ipfs://bafybeibwvqwjcptcsl5zm4gedug5mlfsltvcirxffm2zgjr5hudxct6jmy/

Uncaught (in promise) RangeError: WebAssembly.Compile is disallowed on the main thread, if the buffer size is larger than 4KB. Use WebAssembly.compile, or compile on a worker thread.
    at module.exports ((index):3208)
    at module.exports ((index):2701)
    at onDone ((index):6630)
    at notifyProgress ((index):11787)
    at onReadyStateChange ((index):11521)
    at XMLHttpRequest.xhr.onreadystatechange ((index):11471)

documentation

However Rust normally does it (rustdoc), plus the same info in the readme.

stuck in a loop

Under some conditions the database hangs while writing and one of the tree files grows unbounded in size. I suspect this is some edge case in determining whether to build a new branch.

Question: Will this repository be updated?

I see that the crate for eyros is at 4.6.2 but this repository reflects 4.6.1. The ./pkg/package.json also appears out of sync with npm. Will this repo be seeing any further commits?

I've been following the project for a while and trying things out as they become available on GitHub & npm. If there is another place to follow along, I would be interested to know where it is!
