osoc19 / best

Best@: Belgian Streets & Addresses

Home Page: https://osoc19.github.io/best/

License: MIT License

Jupyter Notebook 98.00% Python 1.96% Dockerfile 0.05%

best's Introduction

BeSt

Various tools to use and convert the BeST streets and addresses open data

Table of contents

Overview

Components overview

Scripts

This repository contains a collection of scripts to perform various operations with the BeST streets and addresses open data.

Downloader

The download script downloads the dataset and unzips it in the specified directory.

View the documentation
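The download-then-extract flow can be sketched with the standard library alone. This is a minimal illustration, not the repository's downloader.py. The functions download and unzip_nested are hypothetical names, and the sketch assumes the archive may contain nested zip files that also need extracting.

```python
import os
import urllib.request
import zipfile


def download(url: str, target_path: str) -> str:
    """Download url to target_path and return the local path."""
    urllib.request.urlretrieve(url, target_path)
    return target_path


def unzip_nested(zip_path: str, output_dir: str) -> list:
    """Extract zip_path into output_dir, then extract any .zip files it contained.

    Returns the paths of all extracted files that are not themselves archives.
    """
    extracted = []
    pending = [zip_path]
    while pending:
        current = pending.pop()
        with zipfile.ZipFile(current) as archive:
            names = archive.namelist()
            archive.extractall(output_dir)  # creates output_dir if needed
        for name in names:
            path = os.path.join(output_dir, name)
            if name.lower().endswith(".zip"):
                pending.append(path)  # nested archive: extract it in a later pass
            elif not name.endswith("/"):
                extracted.append(path)
        if current != zip_path:
            os.remove(current)  # drop intermediate archives once unpacked
    return extracted
```

A worklist is used instead of recursion so that deeply nested archives cannot hit Python's recursion limit. Inner archives are deleted after extraction, and the downloaded archive is kept.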

Converter

The convert script converts the XML files in the dataset into one large CSV file.

View the documentation
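Converting large XML files to CSV is usually done by streaming, so that the whole document never sits in memory. The sketch below shows that technique with xml.etree.ElementTree.iterparse. It is not the converter's actual logic. The record tag and field names are placeholders, because the real BeST element names are not given here, and xml_to_csv is a hypothetical helper.

```python
import csv
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace: '{ns}Address' -> 'Address'."""
    return tag.rsplit("}", 1)[-1]


def xml_to_csv(xml_source, csv_file, record_tag: str, fields: list) -> int:
    """Stream record_tag elements from xml_source into csv_file.

    Each record becomes one row. A field's value is the text of the first
    descendant whose local tag name equals the field name. Returns the row count.
    """
    writer = csv.DictWriter(csv_file, fieldnames=fields)
    writer.writeheader()
    rows = 0
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if local_name(elem.tag) != record_tag:
            continue
        row = dict.fromkeys(fields, "")
        for child in elem.iter():
            name = local_name(child.tag)
            if name in row and not row[name] and child.text:
                row[name] = child.text.strip()
        writer.writerow(row)
        rows += 1
        elem.clear()  # free the memory held by the processed record
    return rows
```

Usage, with hypothetical tag names: open the output with open("out.csv", "w", newline="", encoding="utf-8") and call xml_to_csv("Addresses.xml", f, "Address", ["id", "streetname", "housenumber"]).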

Filter

The filter script can filter the CSV file on postcode and bounding box, and can output the result in various formats.

View the documentation
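Filtering on postcode and bounding box amounts to two per-row tests. The sketch below is a minimal stand-in for the filter script, not its code. It assumes columns named postcode, lat and lon holding WGS84 degrees, so check the header of your converted file. The function filter_rows is hypothetical.

```python
def filter_rows(rows, postcodes=None, bbox=None,
                postcode_col="postcode", lat_col="lat", lon_col="lon"):
    """Yield rows matching any of the postcodes and lying inside bbox.

    bbox is (min_lon, min_lat, max_lon, max_lat) in WGS84 degrees.
    Either filter may be None (or empty) to skip it.
    """
    wanted = {str(p) for p in postcodes} if postcodes else None
    for row in rows:
        if wanted is not None and row[postcode_col] not in wanted:
            continue
        if bbox is not None:
            min_lon, min_lat, max_lon, max_lat = bbox
            try:
                lat, lon = float(row[lat_col]), float(row[lon_col])
            except ValueError:
                continue  # skip rows without usable coordinates
            if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
                continue
        yield row
```

Usage: pass csv.DictReader(open("data.csv", newline="", encoding="utf-8")) as rows. Because filter_rows is a generator, rows are streamed one at a time instead of being loaded all at once.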

Matching

The matching script matches the addresses of one file to the addresses of another file and fills in the official address id and GPS coordinates.

View the documentation
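Exact matching can be reduced to building a lookup index on a normalized key. The sketch below shows that approach, not the repository's matching algorithm. Only address_id comes from this page. The column names streetname, house_number, postcode, lat and lon are assumptions, and the helper functions are hypothetical.

```python
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents and collapse whitespace for comparison."""
    decomposed = unicodedata.normalize("NFKD", text or "")
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def match_key(street, number, postcode):
    return (normalize(street), normalize(number), str(postcode).strip())


def match_addresses(reference_rows, input_rows):
    """Fill address_id, lat and lon in input_rows from exact matches in reference_rows."""
    index = {
        match_key(r["streetname"], r["house_number"], r["postcode"]): r
        for r in reference_rows
    }
    for row in input_rows:
        ref = index.get(match_key(row["streetname"], row["house_number"], row["postcode"]))
        out = dict(row)  # leave the caller's row untouched
        out["address_id"] = ref["address_id"] if ref else ""
        out["lat"] = ref["lat"] if ref else ""
        out["lon"] = ref["lon"] if ref else ""
        yield out
```

Normalizing before comparing lets "Rue de l'Église" match "rue de l'eglise". Rows without a match keep empty fields, so you can still count them afterwards.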

Count

The Count scripts count the occurrences of street names in the file.

View the documentation
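Counting street names is a natural fit for collections.Counter. This sketch is an illustration, not the Count script itself. The column names streetname and postcode are assumptions. The per_postcode option is a hypothetical addition showing the difference between counting address rows and counting distinct streets.

```python
from collections import Counter


def count_street_names(rows, column="streetname", per_postcode=False, postcode_col="postcode"):
    """Count street-name occurrences, skipping rows without a name.

    per_postcode=False counts address rows per name. per_postcode=True counts
    each (name, postcode) pair once, i.e. how many postcodes use that name.
    """
    if per_postcode:
        pairs = {(row[column], row[postcode_col]) for row in rows if row.get(column)}
        return Counter(name for name, _postcode in pairs)
    return Counter(row[column] for row in rows if row.get(column))
```

counter.most_common(10) then lists the ten most frequent names.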

Compare

The compare script compares the street names of two groups of postal codes and returns the common ones.

View the documentation
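Finding the street names shared by two postcode groups is a set intersection. The sketch below assumes streetname and postcode columns and uses hypothetical function names. It is not the compare script's source.

```python
def street_names(rows, postcodes, column="streetname", postcode_col="postcode"):
    """Set of non-empty street names found in the given postcodes."""
    wanted = {str(p) for p in postcodes}
    return {r[column] for r in rows if r[postcode_col] in wanted and r[column]}


def common_street_names(rows, group_a, group_b, **columns):
    """Sorted street names that occur in both postcode groups.

    rows is scanned twice, so pass a list rather than a one-shot iterator.
    """
    names_a = street_names(rows, group_a, **columns)
    names_b = street_names(rows, group_b, **columns)
    return sorted(names_a & names_b)
```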

Docker

Openaddresses

Contains the metadata for our upload to the openaddresses.io global open address repository, and some more information on the process.

View the documentation.

Pelias

Contains information on how to set up a Pelias geocoding service that uses the BeSt data.

View the documentation
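Once a Pelias instance is running, you query it over HTTP. The sketch below only builds request URLs for the Pelias /v1/search and /v1/reverse endpoints. The base URL assumes a local Pelias API on its default port 4000; adjust it for your setup. The helper names are hypothetical.

```python
from urllib.parse import urlencode

PELIAS_BASE = "http://localhost:4000/v1"  # assumption: local Pelias API, default port


def search_url(text: str, base: str = PELIAS_BASE, size: int = 10) -> str:
    """Forward geocoding: free-text address to coordinates, restricted to Belgium."""
    return f"{base}/search?" + urlencode({"text": text, "boundary.country": "BE", "size": size})


def reverse_url(lat: float, lon: float, base: str = PELIAS_BASE) -> str:
    """Reverse geocoding: coordinates to the nearest known places and addresses."""
    return f"{base}/reverse?" + urlencode({"point.lat": lat, "point.lon": lon})
```

Usage: json.load(urllib.request.urlopen(search_url("Rue Neuve 1, 1000 Bruxelles"))) returns the GeoJSON FeatureCollection that Pelias responds with.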

Marketing

A collection of marketing and communication assets has been prepared to promote the dataset, and similar datasets in the future.

They can be found here

Notebooks

A few notebooks were created that use the CSV data in fun and interesting ways.

View the documentation

Interactive map

An interactive map visualization of all addresses in Belgium can be found here, along with information on how to run it.

best's People

Contributors

barthanssens, blaaat, jbelien, jeborsel, jossevandelm, lotte089, theodedeken

best's Issues

Compare two address files

Compare addresses between two files and report the address_id and coordinates if a match is found.

"File is not a zip file"

When I try to use the project I keep running into the following error when running this command: python /app/best/downloader/downloader.py /app/best/input --url https://opendata.bosa.be/download/best/best-full-latest.zip:

Traceback (most recent call last):
  File "/app/best/downloader/downloader.py", line 105, in <module>
    unzip_recursive(args.file_name,args.output_dir,False)
  File "/app/best/downloader/downloader.py", line 32, in unzip_recursive
    with zipfile.ZipFile(zipped_file, 'r') as zfile:
  File "/usr/local/lib/python3.7/zipfile.py", line 1225, in __init__
    self._RealGetContents()
  File "/usr/local/lib/python3.7/zipfile.py", line 1292, in _RealGetContents
    raise BadZipFile("File is not a zip file")

I've tried to run it locally on macOS and in Docker, but both give the same result. Is there something I might be overlooking here?
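The issue does not state the cause. One common reason for "File is not a zip file" is that the saved file is not an archive at all, for example an HTML error page or a redirect body. A small check like the sketch below makes that visible before extraction. The function check_archive is hypothetical and not part of downloader.py.

```python
import zipfile


def check_archive(path: str) -> None:
    """Raise a descriptive error if path is not a zip archive.

    The first bytes of a non-zip file usually reveal what was saved instead,
    e.g. b'<html>' when the server returned a web page.
    """
    if zipfile.is_zipfile(path):
        return
    with open(path, "rb") as f:
        head = f.read(64)
    raise ValueError(f"{path} is not a zip archive; it starts with {head!r}")
```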

DtypeWarning: Specify dtype option on import or set low_memory=False

We have been trying to retrieve streets from the data but keep getting the same error when making the attempt:

Error: sys:1: DtypeWarning: Columns (5,8,9,10,12,13,15,16,17) have mixed types. Specify dtype option on import or set low_memory=False.

The command we are using is:

python /app/best/filter/filter.py /app/best/data/data.csv /app/best/data/streets.csv --output_type street --postcode 1020

We have tried several postcodes, without success.

Side note: is it possible to omit the --postcode flag so that we extract all available streets? We actually need a list of streets instead of a list of addresses.

edit: We are using FROM python:3.7.4-alpine as base, the base of our Docker setup is pretty much the same as the one included in this repo.
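DtypeWarning is a pandas warning, not an exception. It is raised when read_csv, reading in chunks, infers different types for the same column. The message itself offers two remedies: pass dtype or set low_memory=False. The sketch below reads every column as text with dtype=str and reduces addresses to distinct streets, with the postcode filter optional. Column names and the function unique_streets are assumptions, not the filter script's interface.

```python
import pandas as pd


def unique_streets(csv_path, postcode=None, postcode_col="postcode", street_cols=("streetname",)):
    """Return one row per distinct street, optionally limited to one postcode.

    dtype=str reads every column as text, so pandas does not guess types per
    chunk (the cause of the mixed-types DtypeWarning), and values such as
    postcodes or box numbers keep their exact spelling.
    """
    cols = [postcode_col, *street_cols]
    df = pd.read_csv(csv_path, dtype=str, usecols=cols)
    if postcode is not None:
        df = df[df[postcode_col] == str(postcode)]
    return df.drop_duplicates().sort_values(cols).reset_index(drop=True)
```

Calling unique_streets("data.csv") without a postcode returns every street in the file. Reading only the needed columns with usecols also reduces memory use.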

Remove geography object from json files

Since an ISO 3166 code is enough to get the data on the openaddresses.io map, we no longer need the GeoJSON geography object.
It can therefore be omitted.
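Dropping the object is a small JSON edit. The sketch below assumes that the source files keep both the "ISO 3166" entry and the GeoJSON "geometry" entry inside a "coverage" block. Verify this against the actual metadata files. The helpers drop_geometry and clean_file are hypothetical.

```python
import copy
import json


def drop_geometry(source: dict) -> dict:
    """Return a copy of an OpenAddresses source dict without coverage.geometry.

    Other keys, including the ISO 3166 entry, are left untouched.
    """
    cleaned = copy.deepcopy(source)
    cleaned.get("coverage", {}).pop("geometry", None)
    return cleaned


def clean_file(path: str) -> None:
    """Rewrite a JSON source file in place without the geometry object."""
    with open(path, encoding="utf-8") as f:
        source = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(drop_geometry(source), f, indent=4, ensure_ascii=False)
        f.write("\n")
```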

PHP Geocoder Provider

There is the great PHP Geocoder library. It's an abstraction layer on top of all the most used geocoding services.
I'm one of the maintainers of this project.

I'll create a Pelias provider (= abstraction layer) ASAP so people can easily query any Pelias-based geocoding API, including our own!

Download script does not work inside the Docker container

Downloading the files inside the Docker container does not work.
#77
Try https://github.com/oSoc19/best/blob/docker/DOCKER.md#examples and see it fail for yourself:

~/best/out $ python ../downloader/downloader.py . --verbose --log_name ../logstuff
2019-07-24 14:08:17,684 - __main__ : INFO - Start download
2019-07-24 14:15:14,674 - __main__ : INFO - Download done
2019-07-24 14:15:15,266 - __main__ : INFO - Start extraction
2019-07-24 14:15:15,266 - __main__ : CRITICAL - Output directory already contains files. To automatically remove all files in the output directory, use --force
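One possible reading of this log is that the archive was downloaded into the output directory itself ("."). The emptiness check then trips on the downloaded file. That is an inference from the log, not a confirmed cause. The sketch below shows an output-directory check that ignores named files and only clears the rest when forced. The function prepare_output_dir and its ignore parameter are hypothetical.

```python
import os
import shutil


def prepare_output_dir(path: str, force: bool = False, ignore=()) -> None:
    """Create path if needed and refuse to reuse it if it holds unexpected entries.

    Entries listed in ignore (e.g. the downloaded archive) are never counted
    or deleted. With force=True, all other entries are removed.
    """
    os.makedirs(path, exist_ok=True)
    entries = [e for e in os.listdir(path) if e not in ignore]
    if not entries:
        return
    if not force:
        raise FileExistsError(
            f"{path} already contains {len(entries)} entries; use force to clear them")
    for name in entries:
        full = os.path.join(path, name)
        if os.path.isdir(full) and not os.path.islink(full):
            shutil.rmtree(full)
        else:
            os.remove(full)
```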

Docker image for tools

Create a Docker image for all the BeST Python tools (not the Pelias ones) plus dependencies like pandas and other required libraries.

Challenge: make it as small as possible, e.g. by starting from the official Python Docker image 3.7.4-alpine3.10.

One probably has to add gcc, make, etc., but they should be removed afterwards in the same RUN command (see also https://developers.redhat.com/blog/2016/03/09/more-about-docker-images-size/).

Downloading and file processing should be done outside the Docker container (i.e. add a VOLUME so the Docker guest can mount a directory on the Docker host).

Investigate use of an Ubuntu base image for Docker

Right now the Dockerfile uses Alpine Linux as its base image.
Alpine is great for reducing the size of the container image, but it makes build times a lot longer.
We should try the same configuration with the Ubuntu base image and see if it builds faster.
The trade-off is image size, but the minimal Ubuntu base image now available on Docker Hub is only 29 MB, just 25 MB larger than the Alpine image.
https://ubuntu.com/blog/minimal-ubuntu-released

Request for address output improvement

Hey, small feature request here 😄 For the reverse geocoding, could it be possible to put the house number after the street name instead of before it? That's more the Belgian way to write addresses I think? 😊 (It's like this in both the "name" and "label" fields.)
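The requested format puts the street first and the house number after it. The sketch below shows only that target format. It reads the street, housenumber, postalcode and locality properties from a Pelias result feature. It does not show how to change Pelias's own label generation, and the function belgian_label is hypothetical.

```python
def belgian_label(props: dict) -> str:
    """Format 'Street Number, Postcode Locality', skipping missing parts."""
    street_part = " ".join(p for p in (props.get("street"), props.get("housenumber")) if p)
    place_part = " ".join(p for p in (props.get("postalcode"), props.get("locality")) if p)
    return ", ".join(p for p in (street_part, place_part) if p)
```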

Add exception handling for non-existing directories to file_read.py

osoc19@oSoc19:~/best$ python3 converter/file_read.py out/ out2/

Traceback (most recent call last):
  File "converter/file_read.py", line 252, in <module>
    converter(args)
  File "converter/file_read.py", line 21, in converter
    args.output_dir, '%s_addresses.csv' % args.region))
  File "/home/osoc19/best/converter/writer.py", line 8, in __init__
    self.output = open(path, 'w')
FileNotFoundError: [Errno 2] No such file or directory: 'out2/belgium_addresses.csv'

Catch FileNotFoundError
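The traceback shows writer.py calling open(path, 'w') in a directory that does not exist. A minimal sketch of catching that case, or creating the directory up front, could look like this. The helper open_output and its create parameter are hypothetical and not the converter's code.

```python
import os
import sys


def open_output(output_dir: str, filename: str, create: bool = False):
    """Open output_dir/filename for writing, exiting cleanly if the directory is missing."""
    if create:
        os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, filename)
    try:
        return open(path, "w", newline="", encoding="utf-8")
    except FileNotFoundError:
        # sys.exit with a message prints it to stderr and exits with status 1
        sys.exit(f"Output directory '{output_dir}' does not exist; create it first.")
```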

Issue with converter

(base) C:\Users\marc.bruyland\OneDrive - GCloud Belgium\Documents\PRJ_BEST\Data\OpenSummerOfCode\best-master\converter>python converter.py C:\Test brussels_addresses.csv --region brussels
converter.py:66: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  'The contents of file %s can not be read by this script', file)
2019-07-26 09:09:25,745 - __main__ : WARNING - The contents of file BrusselsAddress.xml can not be read by this script
Traceback (most recent call last):
  File "converter.py", line 293, in <module>
    converter(args)
  File "converter.py", line 38, in converter
    paths = find_xml_files(args.input_dir)
  File "converter.py", line 67, in find_xml_files
    if key in keys:
UnboundLocalError: local variable 'key' referenced before assignment
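The traceback suggests that key is assigned only when a file name matches a known pattern, while the code after the warning still uses it. That is inferred from the traceback, not from the converter source. The usual fix is to give the variable a default and skip unrecognized files explicitly. In the sketch below, the mapping KNOWN_FILES and the function classify_xml_files are hypothetical. It also uses logger.warning, which addresses the DeprecationWarning about warn.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical mapping from a file-name fragment to a dataset key.
KNOWN_FILES = {"Address": "addresses", "Streetname": "streets", "Municipality": "municipalities"}


def classify_xml_files(file_names):
    """Map dataset keys to file names, skipping files that match no known pattern."""
    found = {}
    for file_name in file_names:
        key = None  # reset per file so a stale or missing value is never used
        for fragment, candidate in KNOWN_FILES.items():
            if fragment in file_name:
                key = candidate
                break
        if key is None:
            logger.warning("The contents of file %s can not be read by this script", file_name)
            continue  # without this, the code below would read an unassigned key
        found.setdefault(key, []).append(file_name)
    return found
```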

Upload marketing materials to GitHub

Make sure that all marketing material (pitches/presentations, logos, banners...) is uploaded to GitHub.

This includes the original files (if available) for artwork, e.g. Illustrator/SVG files used to create banners/postcards, editable files in case a PDF would be produced, etc.

I can't install pyproj.

If I install it with pip install, I get:

Getting requirements to build wheel ... error
ERROR: Command errored out with exit status 1:
command: 'c:\users\duongngh\appdata\local\programs\python\python37\python.exe' 'c:\users\duongngh\appdata\local\programs\python\python37\lib\site-packages\pip\_vendor\pep517\_in_process.py' get_requires_for_build_wheel 'C:\Users\duongngh\AppData\Local\Temp\tmpxjumq7n8'
cwd: C:\Users\duongngh\AppData\Local\Temp\pip-install-y6cywpcs\pyproj
Complete output (1 lines):
Proj executable not found. Please set PROJ_DIR variable.


ERROR: Command errored out with exit status 1: 'c:\users\duongngh\appdata\local\programs\python\python37\python.exe' 'c:\users\duongngh\appdata\local\programs\python\python37\lib\site-packages\pip\_vendor\pep517\_in_process.py' get_requires_for_build_wheel 'C:\Users\duongngh\AppData\Local\Temp\tmpxjumq7n8' Check the logs for full command output.

I can install it via conda, but then the project doesn't recognize pyproj.
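When a conda-installed package is "not recognized", a frequent cause is that the scripts run under a different interpreter than the conda environment. The pip log above uses c:\users\duongngh\appdata\local\programs\python\python37\python.exe, which does not look like a conda environment path. That is an observation, not a confirmed diagnosis. Running a small check like the sketch below with the same command used to start the tools shows which interpreter runs and whether it can see pyproj. The function pyproj_status is hypothetical.

```python
import importlib.util
import sys


def pyproj_status() -> str:
    """Report which interpreter is running and where (if anywhere) it finds pyproj."""
    spec = importlib.util.find_spec("pyproj")
    where = spec.origin if spec else "not importable"
    return f"interpreter: {sys.executable}\npyproj: {where}"


if __name__ == "__main__":
    print(pyproj_status())
```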

Update flowchart to include more components?

The README points to a flowchart that mentions OpenAddresses, but it could be extended to include Pelias and/or other components (Pelias leaflets?) or services that use the OpenAddresses files as a source.

Fix error in Dockerfile

Collecting pyproj (from -r requirements.txt (line 2))
  Downloading https://files.pythonhosted.org/packages/7d/41/f0c8e63d20cd56596bfd88a455738629734faa26f26bbdf4d301330748ad/pyproj-2.2.1.tar.gz (382kB)
  Installing build dependencies: started
  Installing build dependencies: still running...
  Installing build dependencies: still running...
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'error'
  ERROR: Complete output from command /usr/local/bin/python /usr/local/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpq05d4lu0:
  ERROR: Proj executable not found. Please set PROJ_DIR variable.
  ----------------------------------------
ERROR: Command "/usr/local/bin/python /usr/local/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpq05d4lu0" failed with error code 1 in /tmp/pip-install-all75d2r/pyproj

Solve this issue by taking a look at:

https://github.com/pyproj4/pyproj/blob/master/docs/installation.rst

Add Docker documentation

Explain:

  • What the Docker image is and does
  • How to build it
  • How to use it
  • What the current issues are

Create non-tech story page/template

Story: a person works for company X, in the BI / invoicing department. Thanks to BeST, the internal address database is better, so there are fewer errors when sending invoices / letters.

Probably with a nice picture of someone in a business environment

Deliverables

  • create command line conversion tool to convert XML to other formats (csv, geojson...) => OK (shapefile, csv...)
  • a query tool to filter the data (e.g. based on frequency, phonetics...) => based on postal code, tool for occurrences of street names, geo box
  • a diff tool to compare data with other data sets (e.g. OSM, other address data sets..) => compare (exact) tool, fuzzy match in progress
  • document the Best-add data, data-recipes, also for non-technical users => pitch
  • data-storytelling (creative and reusable way of promoting the dataset) => in progress
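The fuzzy match listed as in progress could start from the standard library. The sketch below uses difflib.get_close_matches, which scores candidates with a sequence-similarity ratio between 0 and 1. The cutoff of 0.85 is an assumed threshold, not a tuned value, and fuzzy_street_match is a hypothetical helper, not the project's planned implementation.

```python
import difflib


def fuzzy_street_match(name, candidates, cutoff=0.85):
    """Return the closest candidate street name, or None if none is similar enough.

    Comparison is case-insensitive; the original spelling of the match is returned.
    """
    lowered = {c.lower(): c for c in candidates}
    matches = difflib.get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None
```

A misspelling such as "Kerkstrat" then maps to "Kerkstraat", while unrelated names return None instead of a forced match.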
