
kashenfelter / insight_fellow_background_analysis


This project forked from macyeh/insight_fellow_background_analysis


Insight Data Science Fellow Background Investigation


Insight_Fellow_Background_Analysis

Insight Data Science Fellow background investigation, including data cleaning, wrangling, and analysis.

Getting Started

This project uses Python and IPython Notebook.

Prerequisites

The following Python libraries are used in this investigation:


1. Regular expressions (re)
2. Requests
3. Pandas
4. BeautifulSoup
5. Folium and its plugins
6. Seaborn

End to End Flow


1. Web scraping for data collection
2. Data wrangling and retrieving the geolocation of fellows
3. API and mapping (using a free API)
4. Background investigation

Web Scraping

I navigated the Insight Data Science Fellows page on Insight's website to retrieve background information on Insight's fellows. The site is built in a structured way, which makes it convenient to scrape the structured data.

The basic workflow in this section is as follows:

  1. Request the HTML data page by page => Use Requests
  2. Scrape the HTML data from the pages in a structured way => BeautifulSoup is recommended for HTML data
  3. Manipulate strings with regular expressions
  4. Store the scraped data in a DataFrame for further wrangling => Use Pandas
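
The four steps above can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: the class names (`fellow`, `name`, `background`), the "degree, major, school" text layout, and the helper names `fetch_page`, `parse_fellows`, and `to_dataframe` are all assumptions made for illustration, and the real page structure will differ.

```python
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Assumed layout of the background text: "PhD, Physics, Stanford University".
BACKGROUND_RE = re.compile(
    r"^\s*(?P<degree>[^,]+),\s*(?P<major>[^,]+),\s*(?P<school>.+?)\s*$"
)


def fetch_page(url):
    """Step 1: request one HTML page (the URL is supplied by the caller)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def parse_fellows(html):
    """Steps 2-3: pull each fellow entry out of the HTML, then split it with a regex."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="fellow"):  # assumed class name
        name = card.find(class_="name")
        background = card.find(class_="background")
        match = BACKGROUND_RE.match(background.get_text()) if background else None
        rows.append({
            "name": name.get_text(strip=True) if name else None,
            "degree": match.group("degree").strip() if match else None,
            "major": match.group("major").strip() if match else None,
            "school": match.group("school") if match else None,
        })
    return rows


def to_dataframe(pages):
    """Step 4: collect the rows from every page into one DataFrame."""
    rows = [row for html in pages for row in parse_fellows(html)]
    return pd.DataFrame(rows, columns=["name", "degree", "major", "school"])
```

Entries whose text does not match the assumed pattern are kept with empty fields rather than dropped, so they can be inspected later during wrangling.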

Data Wrangling

Even after the web data is successfully loaded into a Pandas DataFrame, the scraped data is sometimes very raw. The raw data in this scrape comes from two sources:

  1. Missing data, null data, or a non-PhD background
  2. Misplaced data in the HTML

The raw data requires further cleaning, and some of it can reasonably be repaired. The basic cleaning workflow is as follows:

  1. Identify the null data in the DataFrame
  2. Check whether the misplaced data can be fixed or manipulated
  3. Save the data to a new CSV file

The workflow can be implemented with Pandas.
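
As a rough illustration of the cleaning steps, the sketch below flags null rows, repairs one kind of misplaced value, and writes a new CSV. The column names, the repair rule (a school name that landed in the `major` column), the sample rows, and the output file name are all assumptions for illustration and are not taken from the notebook.

```python
import pandas as pd


def clean_fellows(df):
    """Flag rows with nulls and repair one assumed kind of misplaced value."""
    df = df.copy()

    # Step 1: identify rows with a missing major or school.
    df["has_null"] = df[["major", "school"]].isna().any(axis=1)

    # Step 2: assumed repair rule. If "school" is empty and "major" looks like
    # a school name, the value was probably scraped into the wrong column.
    misplaced = df["school"].isna() & df["major"].str.contains(
        "University|Institute|College", na=False
    )
    df.loc[misplaced, "school"] = df.loc[misplaced, "major"]
    df.loc[misplaced, "major"] = None
    return df


# Made-up rows that stand in for the scraped data.
raw = pd.DataFrame({
    "name": ["A", "B", "C"],
    "major": ["Physics", "Harvard University", None],
    "school": ["MIT", None, None],
})
cleaned = clean_fellows(raw)

# Step 3: save the cleaned data to a new CSV (file name is illustrative).
cleaned.to_csv("fellows_clean.csv", index=False)
```

The `has_null` flag is computed before the repair, so it records which rows were incomplete as scraped.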

API and Mapping (Free API)

In this section, we use a Folium map to display interactive markers. We also use the MarkerCluster plugin to show an interactive, aggregated count for each regional cluster on the map.

Results

Mapping

Major Background Investigation

In this section, we use the following functions to examine the fellows' backgrounds:

  1. Pandas apply => Similar to SQL CASE WHEN; used to categorize the data
  2. Visualize the distribution of majors => Use Seaborn
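
The two steps above might look like the sketch below. The category names, the keyword rules in the hypothetical `categorize_major` helper, the sample majors, and the output file name are assumptions for illustration; the notebook's actual categories may differ.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display

import pandas as pd
import seaborn as sns


def categorize_major(major):
    """Map a free-text major to a broad category, much like SQL CASE WHEN."""
    if pd.isna(major):
        return "Unknown"
    text = major.lower()
    if "physic" in text or "astro" in text:
        return "Physics"
    if "computer" in text or "electrical" in text:
        return "CS / EE"
    if "bio" in text or "neuro" in text:
        return "Life Sciences"
    return "Other"


# Made-up majors that stand in for the cleaned data.
fellows = pd.DataFrame({
    "major": ["Astrophysics", "Computer Science", "Neuroscience",
              "Applied Physics", None, "Economics"]
})
fellows["category"] = fellows["major"].apply(categorize_major)

# One bar per category, with bar length equal to the number of fellows.
ax = sns.countplot(data=fellows, y="category")
ax.set_xlabel("Number of fellows")
ax.figure.savefig("major_distribution.png", bbox_inches="tight")
```

The branches are checked in order and the first match wins, which mirrors how CASE WHEN evaluates its conditions.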

Results

Background

Contributors

macyeh
