mapbox / osm-comments-parser

Parsers to read Notes and Changeset XML files and save them in a Postgres DB

License: ISC License

JavaScript 99.70% Shell 0.30%

osm-comments-parser's Introduction

Notes and comments parser

Reads OSM notes and changeset XML files and saves them into a Postgres database.

Setup

Run npm install

Set up the database:

createdb <db_name>
psql <db_name> < scripts/create_tables.sql

The following command adds indexes to the database, which makes queries much faster. You can add them right after setting up the database, or wait until after the initial data load (see below) for better performance during the import.

psql <db_name> < scripts/create_indexes.sql

Set up the environment variables required for the project:

export OSM_COMMENTS_POSTGRES_URL='postgres://<username>@localhost/osm-comments'
export OSM_COMMENTS_TEST_POSTGRES_URL='postgres://<username>@localhost/osm-comments-test'

Run

In a node shell:

var notesParser = require('./notes');
notesParser({filename: '/path/to/notes-xml'});

var changesetParser = require('./changesets');
changesetParser({filename: '/path/to/changeset-xml'});

From the terminal:

node index.js <notes|changesets> --filename=/path/to/xml/file

Test

Run npm test

Initial load of changesets

When starting out with an empty database, there is an optimized way to load the initial backlog of changesets. Create an empty folder called csv in the project root and pass the option initial=true to changesetParser. After the parser has run, run psql <db_name> < changesets/post_initial.sql to load the CSVs into the database. FIXME: this should be scripted.
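
The README does not show what passing initial=true looks like in practice. The sketch below is one possible shape. It assumes `initial` is accepted as a key in the same options object as `filename`, which is not confirmed here, and `prepareInitialLoad` is a hypothetical helper that is not part of the project.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: creates the csv folder and builds the parser options.
// ASSUMPTION: `initial` lives in the same options object as `filename`.
function prepareInitialLoad(projectRoot, xmlPath) {
  const csvDir = path.join(projectRoot, 'csv');
  // Create the folder if it is missing; no error if it already exists.
  fs.mkdirSync(csvDir, { recursive: true });
  return { filename: xmlPath, initial: true };
}
```

From a node shell in the project root, this could be used as changesetParser(prepareInitialLoad(process.cwd(), '/path/to/changeset-xml')), followed by the psql step above.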

osm-comments-parser's People

Contributors

geohacker, kepta, tyrasd

osm-comments-parser's Issues

Optimize initial load of changesets

While the current parser code works well for incremental changeset updates, it is not ideally suited to the initial load of ~40 million changeset records. This can be quite slow and we would ideally load data into postgres via ogr2ogr or a \COPY directly from a CSV rather than making multiple INSERTs.

The approach would be for the changeset parser to take an option like initial=true, which would make it generate CSV files instead of writing to the database. The CSV files would then be loaded via \COPY or ogr2ogr.
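
The CSV generation step could look roughly like the sketch below. The record fields are hypothetical, and the quoting follows the usual convention that Postgres's CSV mode for COPY accepts: fields wrapped in double quotes, embedded quotes doubled, and an unquoted empty field for NULL. The project's actual output format may differ.

```javascript
// Format one value for a CSV file meant for Postgres COPY ... (FORMAT csv).
function csvField(value) {
  // An unquoted empty field is read as NULL by COPY's CSV mode by default.
  if (value === null || value === undefined) return '';
  // Quote every non-null field and double any embedded quotes.
  return '"' + String(value).replace(/"/g, '""') + '"';
}

// Join the formatted fields into one newline-terminated CSV line.
function csvRow(values) {
  return values.map(csvField).join(',') + '\n';
}

// Hypothetical changeset record: id, username, comment, source.
const row = csvRow([123, 'alice', 'Fixed "Main St" name', null]);
```

Quoting every non-null field keeps commas and newlines inside comments from breaking the row, at the cost of slightly larger files.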

/cc @geohacker

Changes to DB schema

The current table schema is highly normalized and has worked well, but it does not work well for some queries we want to support.

Make the following changes to schema:

  • Flatten users table - add username everywhere there is a user id
  • Add comment and source as fields in the changeset table
  • Add is_unreplied boolean field to changeset table that will be filled in by parser
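
How the parser would fill in is_unreplied is not spelled out here. As an illustration only, the sketch below assumes a changeset counts as unreplied when it has at least one comment from another user and none from the changeset's own author. Both that definition and the field names (`userId`, `comments`) are assumptions.

```javascript
// ASSUMED definition: other users have commented and the author never replied.
function isUnreplied(changeset) {
  const comments = changeset.comments || [];
  const fromOthers = comments.some((c) => c.userId !== changeset.userId);
  const fromAuthor = comments.some((c) => c.userId === changeset.userId);
  return fromOthers && !fromAuthor;
}
```

Because the value depends on the full comment list, it would need to be recomputed whenever a changeset's comments are updated.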

cc @geohacker

Parser for .osc files

I think we can extend the scope of this project to also parse .osc files. These are minutely, hourly or daily replication files that contain the changes that happened to features during that period. The goal is:

  • to move the osm-edit-report parser to this project
  • and consolidate parsing and prepare reliable metrics for osm edits.

@batpad - let's outline steps for the parser here.

cc @Rub21 @ajithranka

initial load

Initial load of changesets
When starting out with an empty database, there is an optimized way to load the initial backlog of changesets. Create an empty folder called csv in the project root and pass the option: initial=true changesetParser. After this command is run, run psql <db_name> < changesets/post_initial.sql to load the CSVs into the database. FIXME: this should be scripted

Could you tell me what the command would look like for passing initial=true?
I can't seem to figure out where I would pass that option.
Thanks

Cleanup README

The README is possibly slightly dated and a little confusing, especially with the back-fill step. Clean up documentation.

Unreplied boolean not set correctly

The unreplied boolean doesn't seem to be getting set correctly when comments on changesets are updated. This seems to be a recent regression.

Tests silently failing

The current npm script gives a thumbs up even if some tests fail.
So we get a green light on most of the PRs.
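
The cause is not stated in this issue. One common way to avoid this class of problem is to have the test entry point count failing commands and exit non-zero if there are any. A minimal sketch, where `countFailures` and the file path in the usage comment are hypothetical:

```javascript
const { spawnSync } = require('child_process');

// Run each [command, args] pair in turn and count those that did not exit with 0.
function countFailures(commands) {
  let failures = 0;
  for (const [cmd, args] of commands) {
    const result = spawnSync(cmd, args, { stdio: 'inherit' });
    // status is null when the process was killed by a signal; count that too.
    if (result.status !== 0) failures += 1;
  }
  return failures;
}

// Hypothetical usage in a test entry point:
// process.exitCode = countFailures([['node', ['test/notes.js']]]) > 0 ? 1 : 0;
```

With the exit code propagated this way, CI would mark the build red whenever any test command fails.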

duplicate entries of stats table

If we do a backfill and data already exists, the codebase simply inserts the data into the stats table instead of overwriting the existing rows with the new data.

  • We need to define a PK for the stats table
  • We need to handle the case where such a row already exists: if it does, we overwrite it; otherwise we create a new row in the stats table.
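
Once a primary key exists, the overwrite-or-create behaviour could be expressed as a single upsert. The sketch below builds a parameterized INSERT ... ON CONFLICT statement, which requires Postgres 9.5 or later. The key and column names are illustrative assumptions, not the real stats schema, and `buildUpsert` is a hypothetical helper.

```javascript
// Build a parameterized upsert for Postgres 9.5+ (INSERT ... ON CONFLICT).
// ASSUMPTION: table, key and column names are illustrative, not the real schema.
// Identifiers are not escaped, so they must come from trusted code, and the row
// must contain at least one non-key column for the SET clause to be valid.
function buildUpsert(table, keyColumns, row) {
  const columns = Object.keys(row);
  const placeholders = columns.map((_, i) => '$' + (i + 1));
  const updates = columns
    .filter((c) => !keyColumns.includes(c))
    .map((c) => c + ' = EXCLUDED.' + c);
  const text =
    'INSERT INTO ' + table + ' (' + columns.join(', ') + ') ' +
    'VALUES (' + placeholders.join(', ') + ') ' +
    'ON CONFLICT (' + keyColumns.join(', ') + ') ' +
    'DO UPDATE SET ' + updates.join(', ');
  return { text, values: columns.map((c) => row[c]) };
}

const query = buildUpsert('stats', ['change_at', 'uid'], {
  change_at: '2016-01-01',
  uid: 42,
  nodes: 10
});
```

The returned object separates the SQL text from the values so that the values can be sent as query parameters rather than interpolated into the statement.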

cc @geohacker @batpad

Use a single environment variable for db url

Right now, the app requires two environment variables to be set:

  • OSM_COMMENTS_POSTGRES_URL for postgres url for main db
  • OSM_COMMENTS_TEST_POSTGRES_URL for postgres url for test db

Ideally, we would require only one environment variable, something like COMMENTS_PG_URL.
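
One way this could work, as a sketch only: derive the test database URL from the single variable by appending a suffix to the database name. The "-test" suffix mirrors the example URLs in the README; `resolveDbUrl` and the use of NODE_ENV to detect a test run are assumptions.

```javascript
// ASSUMPTION: the test database is named after the main one with a "-test"
// suffix, as in the README's example URLs.
function resolveDbUrl(baseUrl, isTest) {
  if (!baseUrl) throw new Error('COMMENTS_PG_URL is not set');
  if (!isTest) return baseUrl;
  const url = new URL(baseUrl);
  // The pathname holds the database name, e.g. "/osm-comments".
  url.pathname = url.pathname + '-test';
  return url.toString();
}

// Hypothetical wiring: pick the test database when NODE_ENV is "test".
// const dbUrl = resolveDbUrl(process.env.COMMENTS_PG_URL, process.env.NODE_ENV === 'test');
```

Deriving the name keeps the two databases on the same server and credentials, which may or may not be what the project wants.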

/cc @rclark, might poke you for help on the most elegant way to do this.

failed install

npm install failed for me, I guess because node-pre-gyp could not download the pre-built binary?

>node-pre-gyp ERR! Tried to download(403): https://mapbox-node-binary.s3.amazonaws.com/osmium/v0.5.6/node-v59-darwin-x64.tar.gz

/usr/local/bin/node /usr/local/lib/node_modules/npm/bin/npm-cli.js install --scripts-prepend-node-path=auto

osmium@0.5.6 preinstall /Users/antoine/Documents/osm-comments-parser/node_modules/osmium
npm install node-pre-gyp

+ node-pre-gyp@0.6.39
added 113 packages in 2.847s
npm notice created a lockfile as package-lock.json. You should commit this file.

osmium@0.5.6 install /Users/antoine/Documents/osm-comments-parser/node_modules/osmium
node-pre-gyp install --fallback-to-build

node-pre-gyp ERR! Tried to download(403): https://mapbox-node-binary.s3.amazonaws.com/osmium/v0.5.6/node-v59-darwin-x64.tar.gz
node-pre-gyp ERR! Pre-built binaries not found for osmium@0.5.6 and node@9.6.1 (node-v59 ABI, unknown) (falling back to source compile with node-gyp)
CXX(target) Release/obj.target/osmium/src/apply.o
In file included from ../src/apply.cpp:6:
In file included from ../../nan/nan.h:192:
../../nan/nan_maybe_43_inl.h:112:15: warning: 'ForceSet' is deprecated [-Wdeprecated-declarations]
return obj->ForceSet(isolate->GetCurrentContext(), key, value, attribs);
^
/Users/antoine/.node-gyp/9.6.1/include/node/v8.h:3164:3: note: 'ForceSet' has been explicitly marked deprecated here
V8_DEPRECATED("Use CreateDataProperty / DefineOwnProperty",
^
/Users/antoine/.node-gyp/9.6.1/include/node/v8config.h:321:29: note: expanded from macro 'V8_DEPRECATED'
declarator __attribute__((deprecated))
^
In file included from ../src/apply.cpp:17:
In file included from ../src/location_handler_wrap.hpp:11:
In file included from ../../libosmium/include/osmium/index/map/all.hpp:43:
../../libosmium/include/osmium/index/map/sparse_mem_table.hpp:42:10: fatal error: 'google/sparsetable' file not found
#include <google/sparsetable>
^~~~~~~~~~~~~~~~~~~~
1 warning and 1 error generated.
make: *** [Release/obj.target/osmium/src/apply.o] Error 1
gyp ERR! build error
gyp ERR! stack Error: make failed with exit code: 2
gyp ERR! stack at ChildProcess.onExit (/usr/local/lib/node_modules/npm/node_modules/node-gyp/lib/build.js:258:23)
gyp ERR! stack at ChildProcess.emit (events.js:127:13)
gyp ERR! stack at Process.ChildProcess._handle.onexit (internal/child_process.js:209:12)
gyp ERR! System Darwin 17.4.0
gyp ERR! command "/usr/local/Cellar/node/9.6.1/bin/node" "/usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "build" "--fallback-to-build" "--module=/Users/antoine/Documents/osm-comments-parser/node_modules/osmium/lib/binding/osmium.node" "--module_name=osmium" "--module_path=/Users/antoine/Documents/osm-comments-parser/node_modules/osmium/lib/binding"
gyp ERR! cwd /Users/antoine/Documents/osm-comments-parser/node_modules/osmium
gyp ERR! node -v v9.6.1
gyp ERR! node-gyp -v v3.6.2
gyp ERR! not ok
node-pre-gyp ERR! build error
node-pre-gyp ERR! stack Error: Failed to execute '/usr/local/Cellar/node/9.6.1/bin/node /usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js build --fallback-to-build --module=/Users/antoine/Documents/osm-comments-parser/node_modules/osmium/lib/binding/osmium.node --module_name=osmium --module_path=/Users/antoine/Documents/osm-comments-parser/node_modules/osmium/lib/binding' (1)
node-pre-gyp ERR! stack at ChildProcess.<anonymous> (/Users/antoine/Documents/osm-comments-parser/node_modules/osmium/node_modules/node-pre-gyp/lib/util/compile.js:83:29)
node-pre-gyp ERR! stack at ChildProcess.emit (events.js:127:13)
node-pre-gyp ERR! stack at maybeClose (internal/child_process.js:933:16)
node-pre-gyp ERR! stack at Process.ChildProcess._handle.onexit (internal/child_process.js:220:5)
node-pre-gyp ERR! System Darwin 17.4.0
node-pre-gyp ERR! command "/usr/local/Cellar/node/9.6.1/bin/node" "/Users/antoine/Documents/osm-comments-parser/node_modules/osmium/node_modules/.bin/node-pre-gyp" "install" "--fallback-to-build"
node-pre-gyp ERR! cwd /Users/antoine/Documents/osm-comments-parser/node_modules/osmium
node-pre-gyp ERR! node -v v9.6.1
node-pre-gyp ERR! node-pre-gyp -v v0.6.39
node-pre-gyp ERR! not ok
Failed to execute '/usr/local/Cellar/node/9.6.1/bin/node /usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js build --fallback-to-build --module=/Users/antoine/Documents/osm-comments-parser/node_modules/osmium/lib/binding/osmium.node --module_name=osmium --module_path=/Users/antoine/Documents/osm-comments-parser/node_modules/osmium/lib/binding' (1)
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! osmium@0.5.6 install: node-pre-gyp install --fallback-to-build
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the osmium@0.5.6 install script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR! /Users/antoine/.npm/_logs/2018-03-01T06_44_47_805Z-debug.log

Process finished with exit code 1
