data's Introduction

Proposal

Through this project, we seek to understand the relationship between internet access and inequality on a global scale. We will use data from the United Nations to compare selected socio-economic indexes to international communication measurements.

We have outlined our questions, data sources, and folder structure below for those interested in replicating our work.

Questions

Which socio-economic indicators (race, occupation, community poverty rate) are most strongly correlated with internet access rates? Can we build a model that accurately predicts those rates?

Are internet access rates a stronger predictor of poverty rates than other forms of social investment (i.e., roads, schools, hospitals)?

Do these effects extend across internet technologies (cell phones and broadband internet)? If not, which type of infrastructure investment is better?

Motivation

We are interested in this problem as data scientists because our field is a mixed bag. On one hand, big data can be used to influence elections, spread hateful propaganda, and track every purchase and decision we make. These political consequences are well known. On the other hand, the Internet has a history of advancing economies, and those without it tend to be left behind. To address this side of the question, we need to investigate how internet access influences occupational outlook while controlling for confounding factors such as geography, race, and infrastructure investment more generally.

Data

American Community Survey
Annual Survey of State Finances

Methodology

We will build several models for predicting poverty rate, using both logistic and linear regression (two generalized linear models). These models will show how factors such as internet access and infrastructure investment influence poverty rates. The American Community Survey includes internet access rates, poverty, race, industry, language, occupation, place of birth, and familial origin. Using this data alone, we should be able to see whether race or occupation is a better indicator of aggregate poverty than internet access rates.
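As a rough illustration of the linear-model half of this plan, the sketch below fits a one-predictor least-squares line and reports R-squared, so that different predictors of poverty rate could be compared on the same footing. It is not the project's code (the analysis itself lives in the Rmd files); the function name `fit_simple_ols` and the sample numbers are made up for illustration and are not ACS figures.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Fit y ~ intercept + slope * x by ordinary least squares.

    Returns (intercept, slope, r_squared).
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R-squared: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Hypothetical values for five regions (NOT real survey data).
internet_access = [0.60, 0.70, 0.75, 0.80, 0.90]
poverty_rate = [0.22, 0.18, 0.17, 0.14, 0.10]

intercept, slope, r_squared = fit_simple_ols(internet_access, poverty_rate)
print(f"poverty ~ {intercept:.3f} + {slope:.3f} * access (R^2 = {r_squared:.3f})")
```

Running the same function with another predictor (for example, a spending measure) in place of `internet_access` and comparing the R-squared values gives a first, crude sense of which indicator tracks poverty more closely. A full analysis would fit all predictors jointly to control for confounders.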

Hypothesis

Pew Research says that 20% of teens are unable to finish their homework due to the digital divide. The end result of this is likely low-skill careers and lower incomes. In fact, the internet tends to raise the tide for all, as a broad study (also by Pew) showed that per capita income and access rates are highly correlated. We'd like to investigate the relationship between technology and the economy and see if we can build models resilient to the particular type of device. Previous work has built logistic models for poverty from infrastructure investment, using satellite images of infrastructure. It is also well known that poverty and broadband access rates are highly correlated. However, it is unknown whether there is an underlying causal factor or whether internet access can, by itself, lift people out of poverty.

The McKinsey Global Institute did a massive study on the economic potential of internet investment in China that will inform our approach in this matter. Finally, the Internet Society, a global organization that builds internet infrastructure (mostly in the developing world), has compiled a list of internet penetration rates and other such metrics by country across the world. However, due to data collection limitations and the uneven quality of data sources across continents, it would be impossible to investigate these things with respect to more generic features like race and infrastructure. Because income distribution varies across U.S. states, state-level data should let us draw on a breadth of circumstances.

Due to the multiplicative effects of the education, business, and spending opportunities available on the Internet, we suspect that governmental investment in digital infrastructure will have at least as much effect as road or school spending. Additionally, we suspect that this multiplier is reduced for cellular infrastructure relative to fixed (broadband) infrastructure because of the productivity gains associated with PCs over smartphones. This research will reveal to governments (both local and national) what kinds of infrastructure investment yield the most economic gains in the digital age. To our knowledge, this particular question has not been answered.

Filesystem

  • old/: old paper
  • docs/: contains the github page of this project, viewable at thenextbilliononline.github.io/data
  • data: contains the data and Rmd files
    • Community: Contains social indicators from ACS
    • Finance: Contains state spending data from ACS
    • Time: Contains an incomplete attempt at using the Census API to scrape time-series data
    • Cleaned...csv: cleaned data used in the report file
  • Presentation.Rpres: R slides document
  • Presentation-cache: holds pre-modelled data for slides document
  • Presentation-figure: holds images for slides document
  • Presentation.md: Markdown version of slides document
  • Report.pdf: The final report
  • Rpubs Presentation
  • Readme.md: this document

data's People

Contributors

jemceach, mgroysman, simplymathematics

data's Issues

Final Document

  • statistical analysis
  • validate data with a graphic
  • graphic for conclusion
  • one extra thing (put it on a webpage)
  • describe one challenge
  • motivation
  • insight
  • reproducibility
  • on-time
  • drop extraneous tables
  • plot for each indicator
  • presentation

Social/Economic Indicator

  • Retool it so it only has data 2000-2016 in line with the Internet data.
  • Fix api call to grab all data. The original calls stopped at Marshall Islands due to request size
  • arrange each indicator by year (check out my newly updated "internet" file for an example). I see that you tried to avoid doing this, but I really, really need these to have the same basis (year).

If you can do that, I got the last bit. I just need a vector that looks like my internet over time one. You can look at the internet.Rmd or the main file.
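The reshaping requested in this issue can be sketched roughly as follows. This is only an illustration in Python, not the code in internet.Rmd: it assumes each indicator arrives as one row per country with one column per year, and the names `wide_to_long`, `country`, and `value` are hypothetical. The 2000-2016 window comes from the first item in the list above.

```python
def wide_to_long(rows, id_key="country", first_year=2000, last_year=2016):
    """Turn one-column-per-year rows into one record per (year, id) pair.

    Years outside [first_year, last_year] and missing values are dropped,
    and the result is ordered by year so every indicator shares that basis.
    """
    records = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue
            year = int(key)  # assumes every non-id column name is a year
            if first_year <= year <= last_year and value is not None:
                records.append({"year": year, id_key: row[id_key], "value": value})
    return sorted(records, key=lambda r: (r["year"], r[id_key]))


# Hypothetical indicator table (made-up numbers).
wide = [
    {"country": "B", "1999": 1.0, "2000": 2.0, "2016": 3.0},
    {"country": "A", "2000": 4.0, "2016": None},
]
long_records = wide_to_long(wide)
for record in long_records:
    print(record)
```

With every indicator in this long, year-ordered shape, the social and internet tables can be joined on year (and country) without per-file special cases.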

Internet Indicators

  • Mobile Penetration Rates
  • Fixed broadband penetration rates
  • Users per Capita
  • Number of Servers
  • Bandwidth per person
  • Mobile investment
  • Total number of users
  • Include previously scraped data

Further work

  • Include previously scraped data
  • By country analysis

Proposal

  • Motivation
  • Likely Data Sources
  • Two of These (CSV, API, Scraped, relational db)
  • One of these data sources must undergo a transformation (gather, spread, etc)
