Comments (14)

dataf3l commented on May 18, 2024

@wilschmidtt, I suggest that in addition to a safety_measures column, you include a safety_measures_start_date column, so that the model stays useful as new countries adopt measures, given that so many countries have different measures.

Also, we could run a poll asking people from every country to provide information, so that we can fill in this data easily. If you write a Google Forms poll, I can send it to friends in Nepal, India, the US, Colombia, Chile, Mexico, Australia, Peru, Belgium, and France, and I can also translate it into Spanish to share it with people in the Latin American region.

All you need is one nerd per country and you are set: that person can become an information source. The poll should also ask people for the source of their data.

If the problem is data collection, I think we can find the people to help.

Just send the questions in English, and I'll send back the data in CSV or whatever format you want.

Remember, the fewer questions, the more data points.

owahltinez commented on May 18, 2024

Hey William, thanks for sharing, this is pretty cool! I think adding all the columns you propose might bloat the main dataset a bit, but I'd love to add some of them if we can find a reliable source. Specifically, I'd like to better understand where the SafetyMeasures data came from. If we can find a reliable source for it, we could add a column to the dataset with these values:

  • Unknown (null)
  • No measures ("none")
  • International travel restricted ("international_travel")
  • Local travel restricted ("local_travel")
  • Shelter in place enacted ("shelter_in_place")
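A minimal sketch of how such a categorical column could be validated when reading a metadata CSV, assuming an empty cell encodes the unknown (null) case. The class and function names are hypothetical, not part of the repo.

```python
from enum import Enum
from typing import Optional


class SafetyMeasure(Enum):
    """Allowed values for the proposed column (hypothetical class name)."""
    NONE = "none"
    INTERNATIONAL_TRAVEL = "international_travel"
    LOCAL_TRAVEL = "local_travel"
    SHELTER_IN_PLACE = "shelter_in_place"


def parse_safety_measure(cell: str) -> Optional[SafetyMeasure]:
    """Map a raw CSV cell to a SafetyMeasure.

    Assumption: an empty cell means "unknown" and is returned as None.
    Any other unrecognized value raises ValueError, so typos are caught early.
    """
    value = cell.strip()
    if not value:
        return None
    return SafetyMeasure(value)


print(parse_safety_measure("shelter_in_place"))  # SafetyMeasure.SHELTER_IN_PLACE
print(parse_safety_measure(""))                  # None
```

Using an enum rather than free text means a misspelled value fails loudly at load time instead of silently becoming a new category.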

If you want to, you can open a PR that edits the relevant metadata_*.csv files and fills in the Population and SafetyMeasures columns. Unless I missed something, the other columns you mentioned can be inferred from the data itself.

wilschmidtt commented on May 18, 2024

The SafetyMeasures column wasn't fetched from any online source. I looked for a site that reported this information but couldn't find anything useful. Instead, I populated the column by dividing the number of confirmed cases by the population: once confirmed cases exceeded 0.002% of the population, I changed SafetyMeasures from 0 to 1. This method is a bit arbitrary, so I can see why it might not be the best feature to include. I chose 0.002% by observing at what point different locations started to take action, which came right around when 0.002% of a location's population had confirmed infections.
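The heuristic above reduces to one comparison per row. A minimal sketch, assuming cumulative confirmed counts and a fixed population per location; the function name and sample numbers are hypothetical, and this only reproduces the 0.002% rule, it does not detect actual policy changes.

```python
from fractions import Fraction

# 0.002% of the population as an exact fraction (0.002 / 100 = 2 / 100_000).
THRESHOLD = Fraction(2, 100_000)


def safety_measures_flag(confirmed: int, population: int) -> int:
    """Return 1 once confirmed cases exceed 0.002% of the population, else 0."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Fraction keeps the comparison exact, so a count sitting exactly on the
    # threshold is not flipped by floating-point rounding.
    return 1 if Fraction(confirmed, population) > THRESHOLD else 0


# Hypothetical location with 1,000,000 people: the threshold is 20 cases.
daily_confirmed = [5, 12, 20, 21, 40]
print([safety_measures_flag(c, 1_000_000) for c in daily_confirmed])  # [0, 0, 0, 1, 1]
```

Because "exceeded" is read as strictly greater, a count of exactly 20 in this example still yields 0.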

I agree that international_travel, local_travel, and shelter_in_place would all be much more reliable features. The only problem is that I am not sure where such data would be available.

I will open a PR to edit the metadata populations in the meantime.

dataf3l commented on May 18, 2024

Actually, I just noticed there are dates in the dataset, so never mind; my suggestion doesn't make sense.

owahltinez commented on May 18, 2024

@dataf3l I think your idea is still valid: we can put the safety measures in their own CSV table and then merge it during the data processing stage. In my opinion, the biggest difficulty would be keeping it up to date, since measures are changing very fast across countries.
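A minimal sketch of keeping measures in their own CSV and merging them during processing, using only the standard library. The column names (key, start_date, measure), the sample rows, and the rule that a newer measure replaces the previous one are all assumptions for illustration, not real data or the repo's schema.

```python
import csv
import io
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical measures table; the key "XX" and all values are illustrative.
MEASURES_CSV = """key,start_date,measure
XX,2020-03-10,international_travel
XX,2020-03-20,shelter_in_place
"""


def load_measures(text: str) -> Dict[str, List[Tuple[date, str]]]:
    """Group measures by location key, sorted by start date."""
    by_key: Dict[str, List[Tuple[date, str]]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        start = date.fromisoformat(row["start_date"])
        by_key.setdefault(row["key"], []).append((start, row["measure"]))
    for entries in by_key.values():
        entries.sort()
    return by_key


def measure_on(measures, key: str, day: date) -> Optional[str]:
    """Return the most recent measure in force on `day`, or None if none yet."""
    current = None
    for start, measure in measures.get(key, []):
        if start > day:
            break
        current = measure
    return current


def merge(rows: List[dict], measures) -> List[dict]:
    """Attach a safety_measures value to each (key, date) row of the main table."""
    return [
        {**row, "safety_measures": measure_on(measures, row["key"], date.fromisoformat(row["date"]))}
        for row in rows
    ]


measures = load_measures(MEASURES_CSV)
rows = [{"key": "XX", "date": d} for d in ("2020-03-05", "2020-03-15", "2020-03-25")]
for row in merge(rows, measures):
    print(row["date"], row["safety_measures"])
# 2020-03-05 None
# 2020-03-15 international_travel
# 2020-03-25 shelter_in_place
```

This is an "as-of" join: each daily row picks up the latest measure that started on or before its date. With pandas, `merge_asof` (with its `by` parameter for the location key) does the same thing; updating the data then only means appending rows to the small measures table.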

wilschmidtt commented on May 18, 2024

@dataf3l, this could still be a good idea. Like I said, the 'SafetyMeasures' column is pretty arbitrarily chosen at this point. I couldn't find a good source of data indicating when each location started issuing quarantines. I had to search all over the web, and each piece of information I found covered only one location, so filling it in for every location would take far too long.

From what I observed, governments started to feel the pressure and issue warnings to the public right around when 0.002% of the population had confirmed cases. I tried to use this to infer the date on which preventative measures were put into place, but actual sources that could verify this date would be even better.
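Inferring a start date from the same rule amounts to finding the first day the threshold is crossed. A minimal sketch, assuming (date, cumulative confirmed) pairs per location; the function name and the sample records are hypothetical.

```python
from datetime import date
from fractions import Fraction
from typing import Iterable, Optional, Tuple

THRESHOLD = Fraction(2, 100_000)  # 0.002% of the population, kept exact


def estimated_measures_date(
    records: Iterable[Tuple[date, int]], population: int
) -> Optional[date]:
    """Return the first date on which cumulative confirmed cases exceed 0.002%
    of the population, or None if the threshold is never crossed."""
    for day, confirmed in sorted(records):  # process days in chronological order
        if Fraction(confirmed, population) > THRESHOLD:
            return day
    return None


# Made-up counts for a location of 1,000,000 people (threshold: 20 cases).
records = [
    (date(2020, 3, 12), 8),
    (date(2020, 3, 10), 3),
    (date(2020, 3, 14), 25),
    (date(2020, 3, 13), 19),
]
print(estimated_measures_date(records, 1_000_000))  # 2020-03-14
```

The result is only an estimate produced by the heuristic; a sourced start date, where one exists, should take precedence.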

wilschmidtt commented on May 18, 2024

@dataf3l, there is also the problem of keeping it up to date. The nice thing about the 0.002% threshold is that it automates the process and doesn't require anyone to update the data by hand.

dataf3l commented on May 18, 2024

I think that's interesting. What about renaming the column HasPassed2PercentSoWeGuesstimateMeasureHaveBeenTakenButHaveNoRealDataSoIt'sJustAGuess :p

dataf3l commented on May 18, 2024

I'm merely joking; I see that having no data is clearly an issue. Keeping the data up to date will also be an issue.

wilschmidtt commented on May 18, 2024

@dataf3l this is a decent suggestion. But I was thinking something more along the lines of ArbitrarilyChosen2PercentBecauseImTooLazyToFindRealSourcesAndUpdateTheDataEachDaySoThisIsAllWeGot

dataf3l commented on May 18, 2024

Here is what the dataset could look like:

CO:2020-03-19:https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_Colombia
PE:2020-03-22:https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_Bolivia
BR:????:https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_Brazil
CL:2020-03-22:https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_Chile
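The format above is KEY:DATE:URL, with ???? marking a date that is still unknown. A minimal parsing sketch, assuming one record per line and ISO dates; the record type and function names are hypothetical.

```python
from datetime import date
from typing import NamedTuple, Optional


class MeasureRecord(NamedTuple):
    key: str                    # two-letter country code, e.g. "CO"
    start_date: Optional[date]  # None when the date is still unknown ("????")
    source: str                 # URL the date was taken from


def parse_line(line: str) -> MeasureRecord:
    """Parse a KEY:DATE:URL line.

    maxsplit=2 matters: the URL itself contains a colon ("https:"), so a
    plain split(":") would cut it in half.
    """
    key, raw_date, source = (part.strip() for part in line.split(":", 2))
    start = None if raw_date.strip("?") == "" else date.fromisoformat(raw_date)
    return MeasureRecord(key, start, source)


record = parse_line("CO:2020-03-19:https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_Colombia")
print(record.key, record.start_date)  # CO 2020-03-19
print(parse_line("BR:????:https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_Brazil").start_date)  # None
```

Stripping each field also tolerates stray spaces after a colon, which are easy to introduce when entries are typed by hand.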

Here is where I got the data from:

Other countries:
https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_South_America#Argentina
Other continents:
https://en.wikipedia.org/wiki/2019%E2%80%9320_coronavirus_pandemic_by_country_and_territory

I think as people spend more time on it, it is likely that we'll be able to improve the dataset.
Let's make this happen.

If you make a Google Forms doc, I'll send it around :)

owahltinez commented on May 18, 2024

@dataf3l thank you for those links, that makes me wonder if a better approach would be to propose the creation of a new table in the Wikipedia page rather than trying to collect that data in this repo. That way, the data will be made available to a lot more people and we can still scrape it from Wikipedia ourselves.

Personally, I would prefer to keep the efforts in this repo focused towards (automated) data aggregation rather than the creation of crowd-sourced data -- even though crowd-sourced data was the original intent of this repo!

dataf3l commented on May 18, 2024

Should mankind build an app that tracks movements and lets people self-report symptoms, so that others can avoid routes taken by people with symptoms?

owahltinez commented on May 18, 2024

FYI I have added mobility and government measures datasets which are relevant to this discussion.
