
mfreed / covid19-time-series-utilities


This project forked from dathere/covid19-time-series-utilities


several utilities to help wrangle COVID-19 data into a time-series format

License: Creative Commons Attribution Share Alike 4.0 International


COVID-19 - time-series utilities

This repo contains several utilities for wrangling COVID-19 data from the Johns Hopkins University (JHU) COVID-19 repository.

Requirements

Cloning

A note on cloning this repo, since the COVID19 directory is a git submodule:

  • after cloning, you must initialize the submodule. In the project's top-level directory, run git submodule init and then git submodule update to clone the JHU repo as a submodule.
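
The two submodule commands can be wrapped so they only run when the COVID19 directory is still empty. This is a minimal sketch, not part of the repo: the function names submodule_is_populated and ensure_submodule are hypothetical, and the only facts taken from this README are the directory name COVID19 and the two git commands.

```shell
# Return success if the given directory exists and contains at least one entry.
submodule_is_populated() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Initialize and fetch the JHU submodule only when COVID19/ is still empty.
# $1: path to the top-level directory of this project (hypothetical argument)
ensure_submodule() {
  if ! submodule_is_populated "$1/COVID19"; then
    git -C "$1" submodule init &&
    git -C "$1" submodule update
  fi
}

# Example (run from the top-level directory):
#   ensure_submodule "$PWD"
```

Checking for an empty directory first keeps the call cheap to repeat, for instance at the top of a wrapper script.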

Content

The files in this directory and how they're used:

  • covid-19_ingest.sh: script that converts the JHU COVID-19 daily-report data to a time-series database using TimescaleDB.
  • covid-refine: OpenRefine automation script that converts JHU COVID-19 time-series data into a normalized, enriched format and uploads it to TimescaleDB.
  • schema.sql: Data definition (DDL) to create the necessary tables & hypertables.

Using the Timescale covid19-ingest script

  1. Create a TimescaleDB instance: download or sign up
  2. Create a database named covid_19 and an application user covid_19_user

    psql
    create database covid_19;
    create user covid_19_user WITH PASSWORD 'your-password-here';
    alter database covid_19 OWNER TO covid_19_user;
    \quit

  3. Run schema.sql as covid_19_user (VACUUM/ANALYZE require owner privileges)

    psql -U covid_19_user -h <the.server.hostname> -f schema.sql covid_19

  4. Install csvkit

    • Ubuntu: sudo apt-get install csvkit
    • macOS: using Homebrew, run brew install csvkit
  5. Using a text editor, replace the values of the PGHOST, PGUSER and PGPASSWORD environment variables in covid-19_ingest.sh

  6. Run the script

    bash covid-19_ingest.sh

  7. (OPTIONAL) Add the shell script to crontab to run daily

  8. Slice and dice the data using the full power of PostgreSQL along with Timescale's time-series capabilities!
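
For the optional crontab step, one possible setup is sketched below. The function names, the 06:00 schedule, and the log file are assumptions for illustration, not something the repo ships. The sketch also assumes the script should be started from the project directory, as in the run step above, and that the paths contain no spaces.

```shell
# Print a crontab line that runs the ingest script once a day.
# $1: absolute path of the project checkout (hypothetical)
# $2: hour of the day, 0-23
# $3: log file that collects the script's output (hypothetical)
ingest_cron_line() {
  printf '0 %s * * * cd %s && bash covid-19_ingest.sh >> %s 2>&1\n' "$2" "$1" "$3"
}

# Append that line to the current user's crontab, keeping existing entries.
install_ingest_cron() {
  { crontab -l 2>/dev/null; ingest_cron_line "$@"; } | crontab -
}

# Example:
#   install_ingest_cron /opt/covid19-time-series-utilities 6 /tmp/covid-19_ingest.log
```

Calling install_ingest_cron twice adds the line twice, so check crontab -l before re-running it.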

Using COVIDrefine

See the detailed README.

NOTES

  • the JHU COVID-19 repository is a git submodule. This was done to automate getting the latest data from their repo.
  • the scripts only work in a *nix environment (Linux, Unix, macOS)
  • both scripts maintain a hidden directory called ~/.covid-19 in your home directory.
    • covid-19_ingest.sh checks the lastcsvprocessed file. Delete that file to process all daily-report files from the beginning, or change the date in the file to start processing files AFTER the entered date.
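
The two ways of steering the ingest script through lastcsvprocessed could look like the sketch below. It assumes the file lives directly inside ~/.covid-19, which the note above implies but does not state. The helper names are hypothetical, and the date format is whatever the script itself writes, so inspect the existing file before changing it.

```shell
# Print the assumed path of the state file.
# $1: state directory; defaults to ~/.covid-19 when empty or omitted
state_file() {
  printf '%s/lastcsvprocessed\n' "${1:-$HOME/.covid-19}"
}

# Delete the state file so the next run processes all daily reports from the beginning.
reset_ingest_state() {
  rm -f "$(state_file "$1")"
}

# Overwrite the stored date; the next run processes files AFTER this date.
# $1: state directory, $2: date string in the same format the script writes
set_last_processed() {
  mkdir -p "$1" && printf '%s\n' "$2" > "$(state_file "$1")"
}

# Example:
#   cat "$(state_file)"     # look at the current value and its format first
#   reset_ingest_state      # reprocess everything on the next run
```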

TODO

  • use postgREST to add a REST API in front of TimescaleDB database
  • create a Grafana dashboard
  • create a Carto visualization
  • create a Superset visualization

ACKNOWLEDGEMENTS

  • thanks to Avtar Sewrathan (@avthars), Prashant Sridharan (@CoolAssPuppy) and Mike Freedman (@mfreed) at Timescale for their help & support in taking this project from idea to implementation in 5 days!
  • thanks to Julian Simioni (@orangejulius) at Geocode.earth for allowing us to use the Geocode.earth API!


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Contributors

jqnatividad
