privatekit / safepaths-datascience

Repository with various data science projects in the branches (More coming soon)

License: MIT License

R 10.46% Python 89.54%

safepaths-datascience's Introduction

safepaths-datascience

Data files

Links to several repositories of publicly available data are stored in data_sources.json, which has the following structure:

data_sources.json:
    |--Global:
        |--source:link
    |--Nations:
        |--Regions:
            |--Provinces:
                |--source:link
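
As a minimal sketch of how such a nested file could be read, the snippet below walks the tree and yields every source/link pair together with its path. The function name `iter_sources`, the keys and the URLs are all hypothetical placeholders, and the sketch assumes that every leaf is a `source: link` string pair as in the layout above.

```python
import json


def iter_sources(node, path=()):
    """Yield (path, source_name, link) for every leaf in the nested mapping."""
    for key, value in node.items():
        if isinstance(value, dict):
            # Descend one level (nation, region, province, ...).
            yield from iter_sources(value, path + (key,))
        else:
            # A leaf is assumed to be a source name mapped to its link.
            yield path, key, value


# Hypothetical content mirroring the layout above; names and URLs are placeholders.
example = json.loads("""
{
  "Global": {"some_source": "https://example.org/global.csv"},
  "SomeNation": {
    "SomeRegion": {
      "SomeProvince": {"another_source": "https://example.org/province.json"}
    }
  }
}
""")

sources = list(iter_sources(example))
for path, name, link in sources:
    print(" / ".join(path), name, link)
```

Because the walker recurses on any dictionary, it does not depend on the exact number of levels between the top of the file and the links.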

Ingesting algorithms

Ingestion pipelines need to:

  • detect the file format: csv, json, etc.
  • based on the file format, select the longitude, latitude and timestamp values where present (in progress)
  • log errors (in progress)
  • build a harmonized file containing only those three columns
  • harmonize values, avoiding double counting when the same data is present in different sources
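
The first four steps could be sketched with the standard library alone, as below. This is not the repository's utils.py: the function names, the `ALIASES` table and the sample data are hypothetical, and the sketch assumes that JSON sources are arrays of flat objects.

```python
import csv
import io
import json
import logging

logger = logging.getLogger("ingest")

# Hypothetical column aliases; real datasets will need their own entries.
ALIASES = {
    "latitude": ("latitude", "lat"),
    "longitude": ("longitude", "lon", "long", "lng"),
    "timestamp": ("timestamp", "time", "date"),
}


def parse_records(text, file_format):
    """Return a list of dict rows parsed from CSV or JSON text."""
    if file_format == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    if file_format == "json":
        data = json.loads(text)
        if not isinstance(data, list):
            raise ValueError("expected a JSON array of objects")
        return data
    raise ValueError(f"unsupported file format: {file_format}")


def harmonize(rows):
    """Keep only latitude, longitude and timestamp; log and skip bad rows."""
    harmonized = []
    for index, row in enumerate(rows):
        lowered = {str(k).strip().lower(): v for k, v in row.items()}

        def pick(target):
            # First alias found in the row wins; None if no alias is present.
            return next((lowered[n] for n in ALIASES[target] if n in lowered), None)

        try:
            lat = float(pick("latitude"))
            lon = float(pick("longitude"))
        except (TypeError, ValueError):
            logger.warning("row %d skipped: missing or invalid coordinates", index)
            continue
        harmonized.append(
            {"latitude": lat, "longitude": lon, "timestamp": pick("timestamp")}
        )
    return harmonized


csv_text = "Lat,Long,Date\n45.07,7.69,2020-03-20\n,,2020-03-21\n"
csv_rows = harmonize(parse_records(csv_text, "csv"))

json_text = '[{"lat": 1.5, "lng": 2.5}]'
json_rows = harmonize(parse_records(json_text, "json"))
```

Rows without usable coordinates are logged and dropped rather than raising, so one malformed line does not stop the ingestion of a whole file. The timestamp is passed through unchanged here.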

safepaths-datascience's People

Contributors

andreanuzzo, johnpalmer, ray-dedhia, tremblerz

safepaths-datascience's Issues

Date format

At the moment the data ingestion script does not ingest timestamps or similar fields. The problem is that public datasets use different date formats.

  1. We need something that recognizes the date/timestamp format in each dataset we ingest (without loading heavy packages such as pandas, if I understand correctly)
  2. Which format do we eventually implement? POSIX, I imagine.
  3. This will be fundamental for periodically clearing the data every 15 days
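
One possible approach to the three points above, using only the standard library, is to try a short list of known formats in order, convert the first match to POSIX seconds, and then filter on a 15-day cutoff. The format list, the function names and the UTC assumption are all hypothetical choices for this sketch. Note that an ambiguous day/month order (for example 03/04/2020) cannot be resolved this way and would have to be configured per dataset.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical list of candidate formats, tried in order.
CANDIDATE_FORMATS = (
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
    "%m/%d/%Y",
)

RETENTION = timedelta(days=15)


def to_posix(value):
    """Convert a date/timestamp to POSIX seconds, or None if unrecognized.

    Naive date strings are assumed to be in UTC; numbers are assumed to be
    POSIX seconds already.
    """
    if isinstance(value, (int, float)):
        return float(value)
    text = str(value).strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        return parsed.replace(tzinfo=timezone.utc).timestamp()
    return None


def drop_expired(records, now):
    """Keep records whose POSIX 'timestamp' falls within 15 days before `now`."""
    cutoff = now - RETENTION.total_seconds()
    return [
        r for r in records
        if r["timestamp"] is not None and r["timestamp"] >= cutoff
    ]


now = to_posix("2020-04-10")
records = [
    {"timestamp": to_posix("2020-04-01")},   # 9 days old: kept
    {"timestamp": to_posix("03/20/2020")},   # 21 days old: dropped
    {"timestamp": to_posix("not a date")},   # unrecognized: dropped
]
fresh = drop_expired(records, now)
```

Storing POSIX seconds makes the periodic clean-up a plain numeric comparison, independent of the format each source originally used.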

Google doc

At the moment utils.py only recognizes .csv and .json. While I think those will account for most datasets, we might want to implement Google API calls for Google spreadsheets as well.

Harmonization of records

How do we make sure that public data from different sources do not overlap and that numbers are summed correctly?

To be taken into consideration: the public data I am currently pulling do not include mobile GPS data; they refer to reference pinpoint coordinates (e.g. one central point for each state/province, or the coordinates of hospitals).
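
One way to approach this question, offered only as a sketch, is to treat two records as the same observation when they share a timestamp and their coordinates agree after rounding. The function name, the record layout (`latitude`, `longitude`, `timestamp` keys) and the rounding precision are hypothetical. Whether rounding is appropriate depends on how far apart the reference points of different sources actually are.

```python
def deduplicate(records, precision=3):
    """Keep the first record per (rounded lat, rounded lon, timestamp) key."""
    seen = set()
    unique = []
    for rec in records:
        key = (
            round(rec["latitude"], precision),
            round(rec["longitude"], precision),
            rec["timestamp"],
        )
        if key in seen:
            # Same place and time already ingested from another source.
            continue
        seen.add(key)
        unique.append(rec)
    return unique


# Hypothetical records from two sources describing the same reference point.
merged = deduplicate([
    {"latitude": 45.0703, "longitude": 7.6869, "timestamp": 1584662400.0},
    {"latitude": 45.0704, "longitude": 7.6868, "timestamp": 1584662400.0},
    {"latitude": 45.0703, "longitude": 7.6869, "timestamp": 1584748800.0},
])
```

With three decimal places the first two records collapse into one, while the third is kept because its timestamp differs.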
