OpenAddresses-Metrics

This repository contains a Python file that runs from the command line.

It takes two inputs: a regional OpenAddresses zip file (point the script at the us folder inside it) and a .csv file that it will write data into.

For example: python OpenAddresses_metrics.py openaddr-collected-us_northeast/us/ data.csv

It then writes 11 fields into that file:

  1. State: The state abbreviation or file name
  2. Total Rows: The number of rows in the file
  3. Good: The number of good rows in the file. A row is good when the lat, lon, number, and street fields are not blank, the row contains no quotation marks, the number field has at least one digit, the number field is not 0 or a negative number, and the row is not a row of field descriptors (a header row).
  4. City: The number of good rows in the file with a city.
  5. Zip: The number of good rows in the file with a zip code.
  6. Both: The number of good rows in the file with both a city and zip code.
  7. Parsing: The number of rows in the file with a quotation mark as a proxy for parsing problems.
  8. 'PO': The number of rows in the file with no digits in the number field as a proxy for parsing and data problems.
  9. '-9s': The number of rows in the file with a negative number in the number field.
  10. Missing Fields: The number of rows in the file with fewer than 9 fields.
  11. Bad Zip: The number of rows in the file that have a zip code with fewer than five digits.
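
The "Good" definition and the City, Zip, and Both counts can be sketched in a few lines of Python. This is a minimal illustration, not the code in OpenAddresses_metrics.py: the function names is_good_row and tally, the dictionary keys (lat, lon, number, street, city, zip), and the choice to look for quotation marks in every field are all assumptions made for the example.

```python
import re


def _is_zero_or_negative(number):
    """True when the number field parses as a number that is <= 0."""
    try:
        return float(number) <= 0
    except ValueError:
        # Values such as "12A" are not plain numbers; do not reject them here.
        return False


def is_good_row(row):
    """Apply the 'Good' criteria to one row (a hypothetical dict of strings)."""
    # lat, lon, number, and street must all be non-blank.
    for key in ("lat", "lon", "number", "street"):
        if not row.get(key, "").strip():
            return False
    # Assumption: a quotation mark in any field marks a parsing problem.
    if any('"' in value for value in row.values()):
        return False
    number = row["number"].strip()
    # The number field needs at least one digit. A header row whose number
    # field holds a column name without digits is rejected here as well.
    if not re.search(r"\d", number):
        return False
    return not _is_zero_or_negative(number)


def tally(rows):
    """Count total rows, good rows, and good rows with a city, a zip, or both."""
    counts = {"Total Rows": 0, "Good": 0, "City": 0, "Zip": 0, "Both": 0}
    for row in rows:
        counts["Total Rows"] += 1
        if not is_good_row(row):
            continue
        counts["Good"] += 1
        has_city = bool(row.get("city", "").strip())
        has_zip = bool(row.get("zip", "").strip())
        if has_city:
            counts["City"] += 1
        if has_zip:
            counts["Zip"] += 1
        if has_city and has_zip:
            counts["Both"] += 1
    return counts
```

In this sketch a value such as 12A still counts as a valid number field, because it contains a digit and is not a plain zero or negative number. Whether the real script treats such values the same way is not stated above.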

The script also has one optional output: Summary ('-s', '--summary'). If the summary flag is turned on, the script takes another file as input, where the summary data will be written. It then returns:

  1. The number of good rows in a statewide file
  2. The number of good rows in other files
  3. The number of good rows with zips, taken from either the statewide file or the other files, whichever has more good rows with zips
  4. The number of good rows with cities, taken from either the statewide file or the other files, whichever has more good rows with cities
  5. The number of good rows with both zips and cities, taken from either the statewide file or the other files, whichever has more good rows with both
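
The summary logic above can be sketched as follows. This is only an illustration under stated assumptions: the function name summarize, the is_statewide marker, and the output labels are hypothetical, and the sketch assumes that counts from all non-statewide files are added together before being compared with the statewide file.

```python
def summarize(file_metrics):
    """Build the five summary values from per-file counts.

    file_metrics is a hypothetical list of dicts, one per input file, each
    holding 'Good', 'City', 'Zip', and 'Both' counts plus an 'is_statewide'
    flag. How the real script recognises a statewide file is not shown here.
    """
    keys = ("Good", "City", "Zip", "Both")
    statewide = dict.fromkeys(keys, 0)
    other = dict.fromkeys(keys, 0)
    for metrics in file_metrics:
        bucket = statewide if metrics["is_statewide"] else other
        for key in keys:
            bucket[key] += metrics[key]
    return {
        "Good (statewide)": statewide["Good"],
        "Good (other)": other["Good"],
        # For each of these, keep whichever source has the larger count.
        "Zip": max(statewide["Zip"], other["Zip"]),
        "City": max(statewide["City"], other["City"]),
        "Both": max(statewide["Both"], other["Both"]),
    }
```

Taking the larger of the two counts, rather than their sum, avoids counting the same address twice when a statewide file and a local file cover the same area.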

Contributors

kgudel
