mta's Introduction

MTA Exploratory Data Analysis

This was the first and only group project assigned at Metis. The prompt is as follows:

"WomenTechWomenYes (WTWY) has an annual gala at the beginning of the summer each year. As we are new and inclusive organization, we try to do double duty with the gala both to fill our event space with individuals passionate about increasing the participation of women in technology, and to concurrently build awareness and reach.

To this end we place street teams at entrances to subway stations. The street teams collect email addresses and those who sign up are sent free tickets to our gala.

Where we’d like to solicit your engagement is to use MTA subway data, which as I’m sure you know is available freely from the city, to help us optimize the placement of our street teams, such that we can gather the most signatures, ideally from those who will attend the gala and contribute to our cause.

The ball is in your court now—do you think this is something that would be feasible for your group? From there we can explore what kind of an engagement would make sense for all of us."

The purpose was to meet some fellow students and become familiar with some Python tools used for data scraping, exploratory data analysis, and visualization. I was partnered with fellow students Adi Guar and Genevieve McGuire on Monday, April 1st, with a due date of Friday, April 5th. Following the presentation, I refactored our code and shelved it, leaving behind just enough notes to reconstruct what I did.

After having worked in industry for a year, I thought it would be appropriate to revisit this project to see how far I've come. The first time I did this project, I thought it was a herculean effort, requiring hundreds of lines of code. However, having now done many similar EDA projects, I found that I was able to simplify all the code to five easy-to-follow Jupyter notebooks and store the data in just a few CSV files.

  1. get-data-from-mta.ipynb

This automatically downloads the turnstile_YYMMDD.txt files from the MTA website and saves them as CSV files in the data/downloads folder.
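A minimal sketch of this download step, not the notebook's actual code. The base URL, the helper names `weekly_filenames` and `download`, and the choice to save each file under a .csv name are assumptions for illustration; the turnstile files are already comma-separated, so no conversion is attempted here.

```python
from datetime import date, timedelta
from pathlib import Path
from urllib.request import urlretrieve

# Assumed URL pattern for the weekly turnstile files; verify it against the
# MTA website before relying on it.
BASE_URL = "http://web.mta.info/developers/data/nyct/turnstile/"


def weekly_filenames(start, weeks):
    """Return turnstile_YYMMDD.txt names for `weeks` consecutive weeks from `start`."""
    return [
        f"turnstile_{(start + timedelta(weeks=i)):%y%m%d}.txt"
        for i in range(weeks)
    ]


def download(names, out_dir="data/downloads"):
    """Fetch each file once and store it with a .csv extension (hypothetical helper)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name in names:
        target = out / name.replace(".txt", ".csv")
        if not target.exists():  # skip files downloaded on a previous run
            urlretrieve(BASE_URL + name, str(target))
```

Calling `download(weekly_filenames(date(2018, 3, 24), 52))` would then request one file per week for a year, starting from the week used later in the plots.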

  2. get-coordinates-from-google.ipynb

This opens one of the files in the data/downloads folder and uses the station names to grab the latitude and longitude data using Google's Geocoding API, then exports the data to data/station-coordinates.csv.
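As a rough sketch of the lookup, the Geocoding API can be queried over HTTPS and the first result's `geometry.location` read from the JSON response. The helper names, the "station, New York, NY" suffix added to each query, and the use of `urllib` rather than a client library are assumptions, not the notebook's actual approach.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_query(station, api_key):
    """Build the request URL; the address suffix is an assumption to aid matching."""
    params = {"address": f"{station} station, New York, NY", "key": api_key}
    return GEOCODE_URL + "?" + urlencode(params)


def extract_coordinates(payload):
    """Return (lat, lng) from a Geocoding API response, or None if nothing matched."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def geocode(station, api_key):
    """Look up one station name (performs a network request)."""
    with urlopen(build_query(station, api_key)) as response:
        return extract_coordinates(json.load(response))
```

Keeping the parsing in `extract_coordinates` separate from the network call makes it possible to check the logic against a saved response without spending API quota.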

  3. count-people.ipynb

This reads in the CSV files in the data/downloads folder and reshapes them into a new table in which the index is the timestamp and the columns are stations. It then concatenates the data for an entire year and computes the count differences between consecutive timestamps. Finally, it exports the data to station-counts.csv.
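The reshape-and-difference step might look like the following sketch, assuming pandas and the column names STATION, DATE, TIME, and ENTRIES as they appear in the raw turnstile files. The function name `station_counts` is hypothetical, and summing the cumulative counters per station before differencing is one possible approach, not necessarily the one used in the notebook.

```python
import pandas as pd


def station_counts(frames):
    """Turn weekly turnstile tables into per-interval counts per station."""
    raw = pd.concat(frames, ignore_index=True)
    raw["timestamp"] = pd.to_datetime(
        raw["DATE"] + " " + raw["TIME"], format="%m/%d/%Y %H:%M:%S"
    )
    # Index = timestamp, columns = stations; the cumulative counters of all
    # turnstiles in a station are summed.
    table = raw.pivot_table(
        index="timestamp", columns="STATION", values="ENTRIES", aggfunc="sum"
    )
    # Counters are cumulative, so the difference between consecutive
    # timestamps gives the number of people in each interval.
    return table.sort_index().diff().dropna(how="all")
```

This sketch ignores counter resets and turnstiles that report at irregular times, both of which can produce negative or implausibly large differences in real data.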

  4. plot-counts.ipynb

This gets data from station-counts.csv and plots a histogram of number of people per week and a time-series graph for the week of 20180324 to 20180331. For historical reasons, I decided to keep it this way rather than going back and computing mean statistics across all the weeks obtained.
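A sketch of how the week could be selected and plotted, assuming pandas and matplotlib and a timestamp-indexed table with one column per station. The names `week_slice` and `plot_week` are hypothetical, and reading "number of people per week" as one weekly total per station is an interpretation.

```python
import pandas as pd


def week_slice(counts, start="2018-03-24", end="2018-03-31"):
    """Rows of a timestamp-indexed table that fall within [start, end)."""
    in_week = (counts.index >= pd.Timestamp(start)) & (counts.index < pd.Timestamp(end))
    return counts.loc[in_week]


def plot_week(counts):
    """Histogram of weekly totals per station, plus a system-wide time series."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    week = week_slice(counts)
    fig, (ax_hist, ax_ts) = plt.subplots(1, 2, figsize=(12, 4))
    week.sum().plot.hist(bins=30, ax=ax_hist)
    ax_hist.set_xlabel("People per station during the week")
    week.sum(axis=1).plot(ax=ax_ts)
    ax_ts.set_ylabel("People per interval, all stations")
    return fig
```

The table itself could be loaded with `pd.read_csv("station-counts.csv", index_col=0, parse_dates=True)`, assuming the timestamp was written as the first column.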

  5. plot-gmaps.ipynb

This accesses the data for '2018-03-24 00:00:00' and plots a heatmap of the counts at the 10 busiest stations. It also imports tech-hubs.csv from the data/extra folder and plots those locations using the gmaps API.
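Selecting the top stations and pairing them with coordinates can be sketched in plain Python as below. The function name `top_stations` and the dictionary inputs are assumptions for illustration; the resulting lists are in the shape that the gmaps package's `heatmap_layer(locations, weights=...)` accepts.

```python
import heapq


def top_stations(counts_at_time, coordinates, n=10):
    """Return (locations, weights) for the n busiest stations.

    counts_at_time: {station: count} for a single timestamp
    coordinates:    {station: (lat, lng)}
    Stations without known coordinates are dropped.
    """
    ranked = heapq.nlargest(n, counts_at_time.items(), key=lambda item: item[1])
    located = [(name, count) for name, count in ranked if name in coordinates]
    locations = [coordinates[name] for name, _ in located]
    weights = [count for _, count in located]
    return locations, weights


# In a notebook with the gmaps package installed and an API key configured,
# the result could be drawn roughly like this (not executed here):
#   fig = gmaps.figure()
#   fig.add_layer(gmaps.heatmap_layer(locations, weights=weights))
```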

  6. zip.ipynb

This zips all the files in the data folder to save space. You may also use this to unzip all the files.
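One way to do this with the standard library is sketched below. The helper names, the one-archive-per-file layout, and the choice to delete each original after compressing it are assumptions; the notebook may organize its archives differently.

```python
import zipfile
from pathlib import Path


def zip_files(folder, pattern="*.csv"):
    """Compress each matching file into its own .zip and remove the original."""
    for path in sorted(Path(folder).rglob(pattern)):
        archive = path.parent / (path.name + ".zip")
        with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.write(path, arcname=path.name)
        path.unlink()


def unzip_files(folder):
    """Restore every .zip under `folder` next to where it sits, then remove it."""
    for archive in sorted(Path(folder).rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
        archive.unlink()
```

Running `zip_files("data")` followed by `unzip_files("data")` should leave the CSV files as they were.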

Revisiting this project is a reminder that clarity in code is clarity in thoughts, and good code is always written to be modular and adaptable. If you have any questions, feel free to reach out to me at [email protected].

mta's People

Contributors

harrisonized
