GithubHelp home page GithubHelp logo

isabella232 / skynet-data Goto Github PK

View Code? Open in Web Editor NEW

This project forked from developmentseed/skynet-data

0.0 0.0 0.0 136 KB

[DEPRECATED] Data pipeline for machine learning with OpenStreetMap

License: ISC License

Makefile 12.91% JavaScript 64.51% HTML 20.44% Dockerfile 2.14%

skynet-data's Introduction

skynet-data

A pipeline to simplify building a set of training data for aerial-imagery- and OpenStreetMap- based machine learning. The idea is to use OSM QA Tiles to generate "ground truth" images where each color represents some category derived from OSM features. Being map tiles, it's then pretty easy to match these up with the desired input imagery.

This repository is no longer under active development. We recommend using Label Maker to prepare data instead. That repo contains utility scripts which can be used to replicate the workflow needed to prepare data for skynet-train.

Quick Start

Pre-built docker image

The easiest way to use this is via the developmentseed/skynet-data docker image:

First, create a docker.env file with the contents including your MapboxAccessToken:

MapboxAccessToken=YOUR_TOKEN

Then run:

docker run -v /path/to/output/dir:/workdir/data --env-file docker.env developmentseed/skynet-data download-osm-tiles

docker run -v /path/to/output/dir:/workdir/data --env-file docker.env developmentseed/skynet-data

The first line downloads the OSM QA tiles to /path/to/output/dir/osm/planet.mbtiles. If you've already got that file on your machine, you can skip this.

The second builds a training set using the default options (Roads features from OSM QA tiles, images from Mapbox Satellite). To change the data sources, training set size and other options, add the relevant environment variables to the docker.env file , one per line.

Local docker image

You can also create the docker images yourself using docker-compose. Similarly to the quick-start above, make sure your docker.env file has your MapboxAccessToken and any other environment variables you want to set. Then run:

docker-compose build

to build your local docker image, and

docker-compose run data download-osm-tiles
docker-compose run data 

to download the OSM QA tiles, and run the data collection as specified in docker.env. By default the collected data will be saved into the data directory, but you can overide it by using -v /path/to/output/dir:/workdir/data after docker-compose run data similar to the pre-built instructions above.

Variables

The make commands below work off the following variables (with defaults as listed):

# location of image files
IMAGE_TILES ?= "tilejson+https://a.tiles.mapbox.com/v4/mapbox.satellite.json?access_token=$(MapboxAccessToken)"
# which osm-qa tiles extract to download; e.g. united_states_of_america
QA_TILES=planet
# location of data tiles to use for rendering labels; defaults to osm-qa tiles extract specified by QA_TILES
DATA_TILES ?= mbtiles://./data/osm/$(QA_TILES).mbtiles
# filter to this bbox
BBOX ?= '-180,-85,180,85'
# number of images (tiles) to sample
TRAIN_SIZE=1000
# define label classes output
CLASSES=classes/roads-buildings.json
# Filter out tiles whose ratio of labeled to unlabeled pixels is less than or
# equal to the given ratio.  Useful for excluding images that are all background, for example.
LABEL_RATIO ?= 0
# set this to a zoom higher than the data tiles' max zoom to get overzoomed label images
ZOOM_LEVEL ?= 17

You can override any of these parameters in your docker.env and make a full training set using the instructions above.

Details

Install

  • Install NodeJS v4.6.2
  • Install tippecanoe
  • Install GNU Parallel
  • Install shuf
  • Clone this repo and run npm install. (Note that this includes a node-mapnik install, which sometimes has trouble building in bleeding-edge versions of node.)

Sample available tiles

make data/sample.txt

This just does a simple random sample of the available tiles in the given mbtiles set, using tippecanoe-enumerate. For more intelligent filtering, consider using tippecanoe-decode to examine (geojson) contents of each tile.

Labels

Build label images: make data/labels/color or make data/labels/grayscale. Uses the CLASSES json file to set up the rendering of OSM data to images that represent per-pixel category labels. See classes/water-roads-buildings.json for an example. Rendering is with mapnik; see the docs for more on filter syntax.

Images

Download aerial images from a tiled source: make data/images

Heads up: the default, Mapbox Satellite, will need you to set the MapboxAccessToken variable, and will cost you map views!

Preview

Preview the generated data by opening up preview.html?accessToken=<mapbox access token>&prefix=/path/to/data in a local web server.

skynet-data's People

Contributors

anandthakker avatar dbdean avatar drewbo avatar ellielitwack avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.