cardboard's Introduction

cardboard

Cardboard is a JavaScript library for managing the storage of GeoJSON features on an AWS backend. It relies on DynamoDB for indexing and small-feature storage, and on S3 for large-feature storage. Cardboard provides functions to create, read, update, and delete features, singly or in batch, as well as simple bounding-box spatial queries.

Installation

npm install cardboard
# or globally
npm install -g cardboard

Configuration

Generate a client by passing the following configuration options to cardboard:

option           required  description
mainTable        X         the name of the DynamoDB table to use
region           X         the region containing the given DynamoDB table
accessKeyId                AWS credentials
secretAccessKey            AWS credentials
sessionToken               AWS credentials
dyno                       a pre-configured dyno client to use for DynamoDB interactions

Providing AWS credentials is optional. Cardboard depends on the AWS SDK for JavaScript, and so credentials can be provided in any way supported by that library. See configuring the SDK in Node.js for more configuration options.

If you provide a preconfigured dyno client, you do not need to specify table and region when initializing cardboard.

Example

var Cardboard = require('cardboard');
var cardboard = Cardboard({
    mainTable: 'my-cardboard-table',
    region: 'us-east-1',
});

Creating a Cardboard table

Once you've initialized the client, you can use it to create a table for you:

cardboard.createTable(callback);

You don't have to create the table each time; you can provide the name of a pre-existing table in your configuration options to use that table.

API documentation

See api.md.

Concepts

Datasets

Most cardboard functions require you to specify a dataset. This is a way of grouping sets of features within a single Cardboard table. It is similar in concept to "layers" in many other GIS systems, but there are no restrictions on the types of features that can be associated with each other in a single dataset. Each feature managed by cardboard can only belong to one dataset.

Identifiers

Features within a single dataset must each have a unique id. Cardboard uses a GeoJSON feature's top-level id property to determine and persist the feature's identifier. If you provide a cardboard function with a GeoJSON feature that does not have an id property, it will assign one for you; otherwise, it will use the id that you provide. Be aware that inserting two features with the same id value into a single dataset will result in only the last feature being persisted in cardboard.
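
The identifier rules above can be illustrated with a minimal sketch. It does not call cardboard: ensureId, put, and the in-memory store are hypothetical stand-ins, and generating ids from random bytes is an assumption made for illustration, not a description of how cardboard assigns them.

```javascript
var crypto = require('crypto');

// Hypothetical helper: keep a caller-supplied id, otherwise assign one.
function ensureId(feature) {
    if (feature.id !== undefined && feature.id !== null) return feature;
    var copy = Object.assign({}, feature);
    copy.id = crypto.randomBytes(16).toString('hex');
    return copy;
}

// Hypothetical in-memory stand-in for a table, keyed by dataset and id.
var store = {};
function put(feature, dataset) {
    var withId = ensureId(feature);
    store[dataset + '!' + withId.id] = withId;
    return withId;
}

var point = { type: 'Point', coordinates: [0, 0] };

// Two features share the id 'a', so the second overwrites the first.
put({ type: 'Feature', id: 'a', properties: { v: 1 }, geometry: point }, 'demo');
put({ type: 'Feature', id: 'a', properties: { v: 2 }, geometry: point }, 'demo');

// A feature without an id gets one assigned.
var generated = put({ type: 'Feature', properties: {}, geometry: point }, 'demo');
```

After these three calls the store holds two entries: the second version of feature 'a' and the feature that was given a generated id.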

Collections

Whenever dealing with individual GeoJSON features, cardboard will expect or return a GeoJSON object of type Feature. In batch situations, or in any request that returns multiple features, cardboard will expect/return a FeatureCollection.

Precision

Cardboard retains the precision of a feature's coordinates to six decimal places.
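
To show what six decimal places means in practice, here is a small sketch that rounds a nested coordinates array. roundCoordinates is a hypothetical helper written for this example; whether cardboard rounds or truncates internally is not stated above, so rounding is an assumption.

```javascript
// Hypothetical helper: round every number in a nested coordinates array.
function roundCoordinates(coords, places) {
    var factor = Math.pow(10, places);
    if (typeof coords === 'number') {
        return Math.round(coords * factor) / factor;
    }
    return coords.map(function(c) {
        return roundCoordinates(c, places);
    });
}

var line = [[-106.52343712345, 39.5718221], [1.00000049, 2.5]];
var rounded = roundCoordinates(line, 6);
```

At the equator, six decimal places of a degree is roughly 0.11 m, which is why this precision is commonly considered sufficient for most mapping data.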

cardboard's People

Contributors

ericcarlschwartz, kapadia, mapsam, mcwhittemore, mick, morganherlocker, rclark, sgillies, springmeyer, tmcw, waldyrious, willwhite, yhahn

cardboard's Issues

generate id

Have cardboard generate a UUID for each feature. Either add the user-supplied id to the dynamo doc and add a global secondary index, or create another entry like we do for featureid.

query types

We now have only a bbox query. Obviously we want a polygon query. Do we want point and line queries? Do they make sense?

Store multiple data entries at each key level

As discussed with @DennisOSRM - instead of storing data like

cell!CELLID!PRIMARYKEY ⇢ geometry
cell!CELLID!PRIMARYKEY2 ⇢ geometry2

We should store it as

cell!CELLID⇢ geometry, geometry2

Implementation details:

  • How do we separate chunks of geometry in this scheme?
  • How do we indicate primary keys for each geometry so that they're quickly unique-able

Performance implications:

  • Fewer queries, which is good
  • Updating data will require downloading and uploading the chunk, which will get larger as more features overlap

flat index in s3

started in the s3 branch

The idea is to use a flat index in S3, but to track the contents of each cell in dynamo, to make them easier to update. A hybrid of the s3 branch and master.

Zero features returned when bboxQuery crosses prime meridian

cardboard.bboxQuery() returns an empty set of features if the provided bounding box crosses the prime meridian.

Test case:

var Cardboard = require('cardboard');

var c = new Cardboard({
    region: 'us-east-1',
    table: 'cardboard-staging'
});

c.bboxQuery([ -180, -85.05112877980659, 0, 85.0511287798066 ], '1409021191288.1dfc169f', function(err, data) {
    if (err) return console.error(err);
    console.log('not crossing prime: %d features', data.length);
});

c.bboxQuery([ -180, -85.05112877980659, 1, 85.0511287798066 ], '1409021191288.1dfc169f', function(err, data) {
    if (err) return console.error(err);
    console.log('crossing prime: %d features', data.length);
});

Output:

crossing prime: 0 features
not crossing prime: 47 features

Version 0.4.4. Need to see if simply upgrading will just fix this.

Use AttributesToGet to limit initial query to getting unique cells

For polygons, this would mean that initial queries would return, for instance,

cell!fdjsaklfdjsa!id:1
cell!fdjsaklfdjsa!id:2
cell!fdjsaklfdjsa!id:1

We could then run de-duplication based on cell ids alone, and run another query that grabs the data with a BatchGetItem. This would bring more queries under the 64KB limit, I reckon.
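
A rough sketch of the de-duplication step described above, run on the example keys. uniqueKeys and chunk are hypothetical helpers; the only outside fact relied on is that DynamoDB's BatchGetItem accepts at most 100 items per request, so the unique keys are split into groups of that size.

```javascript
// Keep the first occurrence of each index key, preserving order.
function uniqueKeys(keys) {
    var seen = Object.create(null);
    return keys.filter(function(key) {
        if (seen[key]) return false;
        seen[key] = true;
        return true;
    });
}

// Split keys into groups no larger than the given size.
function chunk(keys, size) {
    var groups = [];
    for (var i = 0; i < keys.length; i += size) {
        groups.push(keys.slice(i, i + size));
    }
    return groups;
}

var keys = uniqueKeys([
    'cell!fdjsaklfdjsa!id:1',
    'cell!fdjsaklfdjsa!id:2',
    'cell!fdjsaklfdjsa!id:1'
]);

// Each group would become one BatchGetItem request.
var groups = chunk(keys, 100);
```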

Remove unneeded getParent calls in bboxQuery

Given a qkey like '2111111' we can get its parent cells by qkey = qkey.slice(0, -1).

@mick there's no "IN" operator or OR conditional for key conditions, is there? Something like cell: {'IN': ['2111111', '211111', '21111',...]} seems ideal. I could be ignorant about Dynamo's constraints.
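
For reference, the slice trick gives the full list of ancestors in a few lines. This is only a sketch: parentCells is a hypothetical name, and stopping at a single-character key is an assumption about where the index hierarchy ends.

```javascript
// List every ancestor of a quadkey-style cell key, nearest parent first.
function parentCells(qkey) {
    var parents = [];
    while (qkey.length > 1) {
        qkey = qkey.slice(0, -1);
        parents.push(qkey);
    }
    return parents;
}

var parents = parentCells('2111111');
// parents: ['211111', '21111', '2111', '211', '21', '2']
```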

Deduplicate results near the prime meridian

As I said in today's scrum, I'm adding tests near the prime meridian in an issue54 branch. We're getting duplicate features.

I'm thinking deduplication is the immediate fix with work in the near future to avoid duplicate results. @rclark @mick cardboard is where to dedupe, right? It's only dyno between us and AWS from cardboard?

Details below:

# query for line crossing 0 lon
ok 120 inserted
ok 121 {"type":"FeatureCollection","features":[{"type":"Feature","properties":{},"geometry":{"coordinates":[[-1,1],[1,1]],"type":"LineString"},"id":"ci079jsy70012rm2ha1ooft6e"}]}
not ok 122 {"type":"FeatureCollection","features":[{"type":"Feature","properties":{},"geometry":{"coordinates":[[-1,1],[1,1]],"type":"LineString"},"id":"ci079jsy70012rm2ha1ooft6e"},{"type":"Feature","properties":{},"geometry":{"coordinates":[[-1,1],[1,1]],"type":"LineString"},"id":"ci079jsy70012rm2ha1ooft6e"},{"type":"Feature","properties":{},"geometry":{"coordinates":[[-1,1],[1,1]],"type":"LineString"},"id":"ci079jsy70012rm2ha1ooft6e"},{"type":"Feature","properties":{},"geometry":{"coordinates":[[-1,1],[1,1]],"type":"LineString"},"id":"ci079jsy70012rm2ha1ooft6e"}]}
  ---
    file:   /Users/sean/code/cardboard/node_modules/queue-async/queue.js
    line:   46
    column: 21
    stack:
      - getCaller (/Users/sean/code/cardboard/node_modules/tap/lib/tap-assert.js:418:17)
      - assert (/Users/sean/code/cardboard/node_modules/tap/lib/tap-assert.js:21:16)
      - Function.equal (/Users/sean/code/cardboard/node_modules/tap/lib/tap-assert.js:162:10)
      - Test._testAssert [as equal] (/Users/sean/code/cardboard/node_modules/tap/lib/tap-test.js:87:16)
      - /Users/sean/code/cardboard/test/index.js:520:23
      - /Users/sean/code/cardboard/index.js:262:17
      - notify (/Users/sean/code/cardboard/node_modules/queue-async/queue.js:46:21)
      - Object.q.awaitAll (/Users/sean/code/cardboard/node_modules/queue-async/queue.js:68:25)
      - resolveFeatures (/Users/sean/code/cardboard/index.js:323:11)
      - /Users/sean/code/cardboard/index.js:260:13
    found:  4
    wanted: 1
  ...
ok 123 {"type":"FeatureCollection","features":[{"type":"Feature","properties":{},"geometry":{"coordinates":[[-1,1],[1,1]],"type":"LineString"},"id":"ci079jsy70012rm2ha1ooft6e"}]}
not ok 124 {"type":"FeatureCollection","features":[{"type":"Feature","properties":{},"geometry":{"coordinates":[[-1,1],[1,1]],"type":"LineString"},"id":"ci079jsy70012rm2ha1ooft6e"},{"type":"Feature","properties":{},"geometry":{"coordinates":[[-1,1],[1,1]],"type":"LineString"},"id":"ci079jsy70012rm2ha1ooft6e"},{"type":"Feature","properties":{},"geometry":{"coordinates":[[-1,1],[1,1]],"type":"LineString"},"id":"ci079jsy70012rm2ha1ooft6e"},{"type":"Feature","properties":{},"geometry":{"coordinates":[[-1,1],[1,1]],"type":"LineString"},"id":"ci079jsy70012rm2ha1ooft6e"}]}
  ---
    file:   /Users/sean/code/cardboard/node_modules/queue-async/queue.js
    line:   46
    column: 21
    stack:
      - getCaller (/Users/sean/code/cardboard/node_modules/tap/lib/tap-assert.js:418:17)
      - assert (/Users/sean/code/cardboard/node_modules/tap/lib/tap-assert.js:21:16)
      - Function.equal (/Users/sean/code/cardboard/node_modules/tap/lib/tap-assert.js:162:10)
      - Test._testAssert [as equal] (/Users/sean/code/cardboard/node_modules/tap/lib/tap-test.js:87:16)
      - /Users/sean/code/cardboard/test/index.js:520:23
      - /Users/sean/code/cardboard/index.js:262:17
      - notify (/Users/sean/code/cardboard/node_modules/queue-async/queue.js:46:21)
      - Object.q.awaitAll (/Users/sean/code/cardboard/node_modules/queue-async/queue.js:68:25)
      - resolveFeatures (/Users/sean/code/cardboard/index.js:323:11)
      - /Users/sean/code/cardboard/index.js:260:13
    found:  4
    wanted: 1
  ...
ok 125 passed queries
# teardown
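
A minimal sketch of the immediate fix, de-duplicating a result collection by feature id before returning it. dedupeFeatures is a hypothetical helper, not existing cardboard code, and it assumes every feature in the result carries a top-level id.

```javascript
// Return a FeatureCollection with only the first feature seen for each id.
function dedupeFeatures(collection) {
    var seen = Object.create(null);
    var features = collection.features.filter(function(feature) {
        if (seen[feature.id]) return false;
        seen[feature.id] = true;
        return true;
    });
    return { type: 'FeatureCollection', features: features };
}

// The failing case from the log: the same LineString returned four times.
var line = {
    type: 'Feature',
    properties: {},
    geometry: { type: 'LineString', coordinates: [[-1, 1], [1, 1]] },
    id: 'ci079jsy70012rm2ha1ooft6e'
};
var result = dedupeFeatures({
    type: 'FeatureCollection',
    features: [line, line, line, line]
});
```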

return feature collections

cardboard.bboxQuery
cardboard.get
cardboard.getBySecondaryId

Should all return valid geojson feature collections.

operation: Delete Feature

  1. Get feature geometry from feature id index
  2. Recompute cover
  3. Issue delete requests for each cell id with batchWriteItem

For S3: same technique, except with s3.deleteObjects
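
Step 3 could be sketched as below. BatchWriteItem accepts at most 25 requests per call, so the delete requests are grouped accordingly. deleteBatches is a hypothetical helper, and the key layout (a single string attribute named id holding cell!CELLID!FEATUREID) is an assumption made for illustration; the real table schema may differ.

```javascript
// Build BatchWriteItem parameter objects that delete one item per cell.
function deleteBatches(tableName, cellIds, featureId) {
    var requests = cellIds.map(function(cell) {
        return {
            DeleteRequest: {
                // Hypothetical key attribute and format.
                Key: { id: { S: 'cell!' + cell + '!' + featureId } }
            }
        };
    });

    var batches = [];
    for (var i = 0; i < requests.length; i += 25) {
        var items = {};
        items[tableName] = requests.slice(i, i + 25);
        batches.push({ RequestItems: items });
    }
    return batches;
}

var batches = deleteBatches('my-cardboard-table', ['2111', '2112', '2113'], 'feature-1');
```

Each element of batches is shaped like the params object for a BatchWriteItem call; any UnprocessedItems in a response would still need to be retried by the caller.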

Cost for deletion:


Footnote

If deleting things is onerous in terms of performance or cost, we could defer by using a journal - in the id-keyed record for a feature, we'd record a deleted flag and early-abort any requests / decodes of that feature.

Or: we can defer deletes to a different server. Anyway, need to implement it first.

Eliminate primary key argument in cardboard.insert

Since we are standardizing around geojson and the top-level id property, is there any problem with removing the first argument of cardboard.insert and instead just validating that the feature object contains a top-level id property?

/cc @mick

dynamodb transition

@mick i'm currently looking around in dynamodb land for how we should angle this

  • intuition would be that simple 'get a ton of exact keys' would be faster than any range queries, but dynamodb charges read units for misses, so that seems inefficient
  • otherwise, we could use lots of range queries using the Query type

as far as how to abstract this, it's either finishing #10 or writing simple-ish 'wrappers' for each backend, like i've started with dynamodb. not sure if leveldown is a decent abstraction for dynamodb

Findability of features at the dateline

Write tests that query for short dateline-crossing linestrings. I've a hunch that there's a lot of undefined behavior here. GeoJSON itself provides no guidance (yet) in this case.

Big things index

As @mick has been noticing, indexing big stuff like countries is tough with our default index levels.

  • Index big things at a different level than other things
  • Query this index simultaneously with our normal-sized-things index?

More bboxQuery tests around prime meridian and equator

There are some corner cases to test:

  • queries that barely touch feature bboxen along edges
  • queries that barely touch feature bboxen at their corners

Queries get nudged a bit at 0,0 and so that's the spot at which to focus.

"Global secondary index cell does not project [geometryid]"

Query in the cardboard script isn't working for my new table. The fio program below is the Fiona CLI (replacement for ogrinfo).

$ ./cardboard sgillies-shade --export | fio info
endpoint undefined
your table is ready sgillies-shade
{"count": 288, "crs": "+datum=WGS84 +no_defs +proj=longlat", "driver": "GeoJSON", "bounds": [-106.523437, 39.571822, -106.435546, 39.639537], "schema": {"geometry": "Polygon", "properties": {"val": "int", "id": "str"}}}
$ ./cardboard sgillies-shade --query="-107,39,-106,40"
endpoint undefined
your table is ready sgillies-shade
{ [ValidationException: One or more parameter values were invalid: Global secondary index cell does not project [geometryid]]
  message: 'One or more parameter values were invalid: Global secondary index cell does not project [geometryid]',
  code: 'ValidationException',
  time: Thu Sep 18 2014 13:46:44 GMT-0600 (MDT),
  statusCode: 400,
  retryable: false }

Removing 'geometryid' from the query options in cardboard.bboxQuery() doesn't break the tests, @mick, but then I get an empty result collection.

Flat index mode

This will be a mode that disables merging by modifying min and max level constants to be the same, and replacing range queries with direct GET queries. This will test out @DennisOSRM's idea that avoiding range queries will be faster and simpler than trying to use ranges.

Make "export", "dump", "query" sub-commands of cardboard

The cardboard script is going to be a useful tool for the Satellite team, which uses a lot of bash and Python programs, and I think it's worthwhile to change from cardboard table --export to cardboard export table. It's git-ish, which is a plus, a match for Satellite tools, and also an opportunity for me to learn another side of Node. Not high priority atm, but something I want to track.

Expose dynamoAdapter in module

Right now this is necessary to interact with Cardboard via the nodejs API.

@mick do you think it makes sense to continue down the

databaseAdapter(dbConfig, function(database) {
  new Cardboard(database);
});

path, or should we wrap this sort of asyncness inside?

var cardboard = new Cardboard(dbConfig);

cardboard.on('ready', function() {
  // blahhh
});

Index by id

Required for #29 - this is a simple index keyed on primary id rather than s2 cover

Using dynamo wrong

@mick hey - trying to use dynalite & aws-sdk: 3a6a754

and getting

{ '0':
   { [TimeoutError: Could not load credentials from any providers]
     message: 'Could not load credentials from any providers',
     code: 'CredentialsError',
     time: Mon May 19 2014 18:35:38 GMT-0400 (EDT),
     originalError:
      { message: 'Connection timed out after 1000ms',
        code: 'TimeoutError',
        time: Mon May 19 2014 18:35:38 GMT-0400 (EDT) },
     _willRetry: false },
  '1': null }

/Users/tmcw/src/cardboard/node_modules/aws-sdk/lib/sequential_executor.js:117
          if (err._hardError) throw err;
                                    ^
TypeError: Object #<Object> has no method 'call'
    at Request.<anonymous> (/Users/tmcw/src/cardboard/node_modules/aws-sdk/lib/request.js:347:20)
    at Request.callListeners (/Users/tmcw/src/cardboard/node_modules/aws-sdk/lib/sequential_executor.js:114:20)
    at Request.emit (/Users/tmcw/src/cardboard/node_modules/aws-sdk/lib/sequential_executor.js:81:10)
    at Request.emit (/Users/tmcw/src/cardboard/node_modules/aws-sdk/lib/request.js:578:14)
    at Request.transition (/Users/tmcw/src/cardboard/node_modules/aws-sdk/lib/request.js:12:12)
    at AcceptorStateMachine.runTo (/Users/tmcw/src/cardboard/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /Users/tmcw/src/cardboard/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/Users/tmcw/src/cardboard/node_modules/aws-sdk/lib/request.js:28:9)
    at Request.<anonymous> (/Users/tmcw/src/cardboard/node_modules/aws-sdk/lib/request.js:580:12)
    at Request.callListeners (/Users/tmcw/src/cardboard/node_modules/aws-sdk/lib/sequential_executor.js:90:20)

Any idea what I might be doing wrong in my testing setup? 3a6a754#diff-6015c9f6e4f7700bf6800946c7f61984R3

Benchmarking

/cc @morganherlocker have you used benchmark at all?

  • How can we test implementations against each other and find the bottlenecks in this implementation?
  • How should this interact with dynamodb so we can test real world numbers but also not spend a billion buckaroos?
