optionalg / irma-scrapers

This project forked from simonw/irma-scrapers

Screen scrapers relating to natural disasters. See their output in https://github.com/simonw/disaster-data/

Python 100.00%

irma-scrapers

Screen scrapers relating to Hurricane Irma. See their output in https://github.com/simonw/disaster-data/

Irma Response

The Irma Response project at https://www.irmaresponse.org/ is a team of volunteers working together to make information available during and after the storm. There is a huge amount of information out there, spread across many different websites. The Irma API at https://irma-api.herokuapp.com/ is an attempt to gather key information in one place, verify it and publish it in a reusable way.

To aid this effort, I've built a collection of screen scrapers that pull data from a number of different websites and APIs. That data is then stored in a Git repository, providing a clear history of changes made to the various sources that are being tracked.

Some of the scrapers also publish their findings to Slack in a format designed to make it obvious when key events happen, such as new shelters being added or removed from public listings.

Tracking changes over time

A key goal of this screen scraping mechanism is to allow changes to the underlying data sources to be tracked over time. This is achieved using git, via the GitHub API. Each scraper pulls down data from a source (an API or a website) and reformats that data into a sanitized JSON format. That JSON is then written to the git repository. If the data has changed since the last time the scraper ran, those changes will be captured by git and made available in the commit log.

Recent changes tracked by the scraper collection can be seen here: https://github.com/simonw/disaster-data/commits/master
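
The write step described above can be sketched roughly as follows. This is an illustration, not the scrapers' actual code: the helper names `to_stable_json` and `build_update_payload` are hypothetical, and it assumes the file is written through GitHub's "create or update file contents" endpoint, which takes a commit message, the base64-encoded content and (for updates) the SHA of the existing file. Serializing with sorted keys and a fixed record order keeps the diff limited to real changes in the data.

```python
import base64
import json


def to_stable_json(records):
    """Serialize records so that identical data always yields identical text."""
    # Sort records and keys so that ordering noise from the source
    # never shows up as a change in the commit log.
    ordered = sorted(records, key=lambda record: record["name"])
    return json.dumps(ordered, indent=2, sort_keys=True) + "\n"


def build_update_payload(new_text, previous_text, previous_sha, message):
    """Return a request body for the contents endpoint, or None if unchanged."""
    if new_text == previous_text:
        return None  # nothing changed, so no commit is needed
    payload = {
        "message": message,
        "content": base64.b64encode(new_text.encode("utf-8")).decode("ascii"),
    }
    if previous_sha is not None:
        # Assumption: updating an existing file requires its current blob SHA.
        payload["sha"] = previous_sha
    return payload
```

A scraper built this way would send the payload only when `build_update_payload` returns something other than `None`, so runs that find no change leave the commit log untouched.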

Generating useful commit messages

The most complex code for most of the scrapers isn't in fetching the data: it's in generating useful, human-readable commit messages that summarize the underlying change. For example, here is a commit message generated by the scraper that tracks the http://www.floridadisaster.org/shelters/summary.aspx page:

florida-shelters.json: 2 shelters added

Added shelter: Atwater Elementary School (Sarasota County)
Added shelter: DEBARY ELEMENTARY SCHOOL (Volusia County)
Change detected on http://www.floridadisaster.org/shelters/summary.aspx

The full commit also shows the changes to the underlying JSON, but the human-readable message provides enough information that people who are not JSON-literate programmers can still derive value from the commit.

https://github.com/simonw/disaster-data/commit/7919aeff0913ec26d1bea8dc
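
A minimal sketch of how such a message could be produced by comparing the previous and current shelter lists. The function name `build_commit_message`, the `name` and `county` fields and the "Removed shelter" wording are assumptions made for illustration; only the "Added shelter" format is taken from the example above.

```python
def _count(number):
    """Return '1 shelter' or 'N shelters' as appropriate."""
    return "1 shelter" if number == 1 else f"{number} shelters"


def build_commit_message(filename, source_url, old, new):
    """Summarize added and removed shelters, or return None if nothing changed."""

    def key(shelter):
        # Assumption: name plus county is enough to identify a shelter.
        return (shelter["name"], shelter["county"])

    old_keys = {key(s) for s in old}
    new_keys = {key(s) for s in new}
    added = [s for s in new if key(s) not in old_keys]
    removed = [s for s in old if key(s) not in new_keys]
    if not added and not removed:
        return None

    summary = []
    if added:
        summary.append(f"{_count(len(added))} added")
    if removed:
        summary.append(f"{_count(len(removed))} removed")

    # First line is the short summary, followed by a blank line and the details.
    lines = [f"{filename}: {', '.join(summary)}", ""]
    lines += [f"Added shelter: {s['name']} ({s['county']} County)" for s in added]
    lines += [f"Removed shelter: {s['name']} ({s['county']} County)" for s in removed]
    lines.append(f"Change detected on {source_url}")
    return "\n".join(lines)
```

Keeping the summary on the first line follows the usual Git convention, so the one-line log view stays readable while the body carries the detail.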

Publishing to Slack

The Irma Response team use Slack to co-ordinate their efforts. You can join their Slack here: https://irma-response-slack.herokuapp.com/

Some of the scrapers publish detected changes in their data source to Slack, as links to the commits generated for each change. The human-readable message is posted directly to the channel.
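
One way this could work is sketched below, assuming a Slack incoming webhook is used; the README does not say which Slack integration the scrapers rely on, and the names `build_slack_payload` and `post_to_slack` are hypothetical. The first line of the commit message becomes a link to the commit, and the rest of the message is posted as plain text.

```python
import json
import urllib.request


def build_slack_payload(commit_url, commit_message):
    """Build a message body linking the commit summary to the commit itself."""
    summary, _, details = commit_message.partition("\n")
    # Slack's message markup writes links as <url|label>.
    text = f"<{commit_url}|{summary}>"
    details = details.strip()
    if details:
        text += "\n" + details
    return {"text": text}


def post_to_slack(webhook_url, payload):
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Separating payload construction from the network call makes the formatting easy to check without a real webhook.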
