tsdataclinic / subwaycrowds

A tool to estimate how crowded your subway trip is likely to be.

Home Page: https://subwaycrowds.tsdataclinic.com/

License: Apache License 2.0

HTML 1.61% CSS 0.28% TypeScript 48.09% Python 33.34% JavaScript 0.42% SCSS 2.84% Jupyter Notebook 13.42%
nyc nyc-opendata mta subway

subwaycrowds's Introduction

Subway Crowds

Plan your commute better

We built SubwayCrowds to estimate how crowded your subway trip is likely to be.

As the city continues to adjust to the new normal and people begin heading back to work and school, a central question is how this will work given NYC commuters' reliance on public transportation. Is it possible to move so many people while maintaining social distancing? To help answer this question, SubwayCrowds identifies, for specific trips, when subway cars are likely to be most crowded so that individuals can alter their travel time or route.

Methodology

1. Cleaning Schedule Data

  • Concatenate GTFS data pulled every minute for a given time range
  • Drop duplicates (on start date, time, trip_id, station) so we don't count the same train at the same station twice, keeping the latest available observation for each train reaching each station
  • Infer the starting time of a trip where it's missing by identifying the earliest available information on that trip
  • Define new unique ID for trips (as trip IDs used in raw data repeat across days)
  • For each trip, if the starting station is missing (this happens a lot), fill it in with a time 2 minutes before the first recorded stop on the route
  • Calculate the length of each trip and exclude trips shorter than 25% of the maximum trip length
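
The de-duplication and short-trip filter above can be sketched in plain Python. This is only an illustration, not the code in scripts/gtfs.py: the record layout (dicts with `start_date`, `trip_id`, `station`, `observed_at`, `arrival`) and both function names are hypothetical, and trip length is assumed here to mean the number of stops.

```python
from collections import defaultdict


def dedupe_latest(records):
    """Keep one record per (start_date, trip_id, station): the latest observation.

    Each record is a dict with the hypothetical keys start_date, trip_id,
    station, observed_at (when the feed was pulled) and arrival.
    """
    latest = {}
    for rec in records:
        key = (rec["start_date"], rec["trip_id"], rec["station"])
        if key not in latest or rec["observed_at"] > latest[key]["observed_at"]:
            latest[key] = rec
    return list(latest.values())


def drop_short_trips(records, min_fraction=0.25):
    """Drop trips with fewer stops than min_fraction of the longest trip.

    (start_date, trip_id) serves as the unique trip ID, because raw trip IDs
    repeat across days. Length is assumed to be the number of stops.
    """
    stops = defaultdict(int)
    for rec in records:
        stops[(rec["start_date"], rec["trip_id"])] += 1
    if not stops:
        return []
    cutoff = min_fraction * max(stops.values())
    return [r for r in records if stops[(r["start_date"], r["trip_id"])] >= cutoff]
```

Running `drop_short_trips(dedupe_latest(records))` applies the two steps in the order listed above.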

2. Cleaning & merging Turnstile Data

  • Aggregate total entries and exits by station and timestamp, consolidating counts for stations with multiple turnstiles (e.g. Times Square, Penn Station)
  • Exclude rows with wild jumps in counts (negative, or more than 10,000 in 4 hours)
  • Quadratic interpolation of cumulative counts to every minute
  • Correct for interpolation bias around peak and lean hours
    • Set count at 6am to be 10% of the count from 4am-8am
    • Set count at 9am to be 40% of the count from 8am-12pm
  • Merge with schedule data
  • For each train arrival, calculate total entries and exits since the last train at the station
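
A minimal sketch of three of the steps above: dropping wild jumps, interpolating cumulative counts to every minute, and counting entries between train arrivals. All function names and the `(minute, cumulative_count)` layout are hypothetical. Linear interpolation is used as a plainly simpler stand-in for the quadratic interpolation described above, and the peak-hour corrections are not included.

```python
def interval_counts(readings, max_jump=10000):
    """Convert cumulative readings into per-interval counts, dropping wild jumps.

    readings: list of (minute, cumulative_count) sorted by minute, roughly
    4 hours apart. Intervals with a negative count or a count above max_jump
    are excluded.
    """
    out = []
    for (t0, c0), (t1, c1) in zip(readings, readings[1:]):
        delta = c1 - c0
        if 0 <= delta <= max_jump:
            out.append((t0, t1, delta))
    return out


def interpolate_per_minute(readings):
    """Linear stand-in for the quadratic interpolation: cumulative count per minute.

    Assumes strictly increasing minutes in readings.
    """
    series = {}
    for (t0, c0), (t1, c1) in zip(readings, readings[1:]):
        for t in range(t0, t1 + 1):
            series[t] = c0 + (c1 - c0) * (t - t0) / (t1 - t0)
    return series


def entries_since_last_train(series, arrivals):
    """For each arrival after the first, the count accumulated since the previous one.

    series: per-minute cumulative counts; arrivals: sorted arrival minutes.
    """
    return [(b, series[b] - series[a]) for a, b in zip(arrivals, arrivals[1:])]
```

The same `entries_since_last_train` helper would work for exits by passing the cumulative exit series instead.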

3. Trip assignment & heuristics

  • To approximate which direction a person travels after entering a station, we compute the following for each line and direction at a station in a given hour:
    • Entry weight = 1 - cumulative exits along route after this station at this hour / total exits along the route in either direction at this hour
    • Exit weight = 1 - entry weight
    • Normalize weights as a proportion of all the lines in the station
    • Find service changes in the schedule and impute weights for these as the average for that station (to handle cases like C train running along F line)
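
The weight formulas above can be written out as a short sketch. It follows the bullets literally and is not taken from scripts/heuristics.py: the function names are hypothetical, "total exits along the route" is assumed to be the sum over every station on the route, and a route with no recorded exits is assumed to get a weight of 0.

```python
def entry_weight(exits_along_route, station_index):
    """Entry weight for one line and direction at one station in one hour.

    exits_along_route: exits at this hour for each station on the route,
    ordered in the direction of travel. Implements
    1 - (exits after this station / total exits along the route).
    """
    total = sum(exits_along_route)
    if total == 0:
        return 0.0  # assumption: no exit data means no weight
    downstream = sum(exits_along_route[station_index + 1:])
    return 1 - downstream / total


def exit_weight(exits_along_route, station_index):
    """Exit weight is defined as 1 - entry weight."""
    return 1 - entry_weight(exits_along_route, station_index)


def normalize(weights):
    """Scale the weights of all lines and directions at a station to sum to 1."""
    total = sum(weights.values())
    if total == 0:
        return {key: 0.0 for key in weights}
    return {key: w / total for key, w in weights.items()}
```

For example, with exits of 10, 20, 30 and 40 along a four-station route, the second station has 70 of 100 exits downstream, so its entry weight is about 0.3 and its exit weight about 0.7.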

4. Crowding Estimation

  • For the first train of the day (around 5 am), we set the number of people waiting at the station to 0 (stations are meant to be closed between 1 and 5 am, yet we see a few entries between these hours)
  • We define entry_exit_ratio as the average daily ratio between overall entries and exits (typically between 1.2 and 1.4) to account for individuals exiting the station through the emergency exits (we use 1.25 currently)
  • For each stop, we calculate the following (all quantities are initialized to 0):
    • waiting[t] = waiting[t-1] - train_entries[t-1] + total_entries_since_last_train
    • train_entries[t] = waiting[t] × entry_weight
    • train_exits[t] = min(total_exits_before_next_train × entry_exit_ratio × exit_weight, crowd[t-1])
    • crowd[t] = crowd[t-1] - train_exits[t] + train_entries[t]
  • Aggregate estimates for each hour for each line and station
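
The recurrence above can be sketched as a single loop. This is an illustration under stated assumptions rather than the pipeline in scripts/crowding.py: `simulate_crowd` and the dict keys are hypothetical names, and `t` is treated as one running index exactly as the bullets are written, whereas the real pipeline may track waiting passengers per station and crowd per train.

```python
def simulate_crowd(stops, entry_exit_ratio=1.25):
    """Apply the waiting / train_entries / train_exits / crowd recurrence.

    stops: list of dicts with the hypothetical keys entries_since_last_train,
    exits_before_next_train, entry_weight and exit_weight.
    Returns the estimated crowd on the train after each stop.
    """
    waiting = 0.0
    train_entries = 0.0
    crowd = 0.0
    estimates = []
    for stop in stops:
        # People left on the platform by the previous step, plus new entries.
        waiting = waiting - train_entries + stop["entries_since_last_train"]
        train_entries = waiting * stop["entry_weight"]
        # Exits are scaled up by entry_exit_ratio and capped at the crowd on board.
        train_exits = min(
            stop["exits_before_next_train"] * entry_exit_ratio * stop["exit_weight"],
            crowd,
        )
        crowd = crowd - train_exits + train_entries
        estimates.append(crowd)
    return estimates
```

The `min(...)` cap keeps the estimate from going negative when scaled turnstile exits exceed the number of people the model believes are on the train.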

Developing

Thank you for developing this tool with us! Before you start, please check out our Roadmap and Contributing guidelines.

To develop the web-app locally, run

yarn
yarn start 

To set up the python environment, run

conda config --append channels conda-forge
conda create -n {env_name} --file scripts/requirements.txt

## To have the environment show up as a kernel in Jupyter
python -m ipykernel install --user --name {env_name} --display-name "Python ({env_name})"

To generate crowd estimates, edit the global variables at the top of the script and run

python scripts/crowding.py

Directory Structure

SubwayCrowds/
├── LICENSE
├── README.md               <- The top-level README for developers using this project
│
├── scripts
│   ├── data                <- Other data used for crowding estimation such as crosswalks, GTFS static schedule, etc.
│   ├── gcs_utils.py        <- Utility functions for accessing data from Google Cloud Storage bucket 
│   ├── gtfs.py             <- Processing real-time gtfs data
│   ├── tunrstile.py        <- Cleaning and interpolating turnstile data
│   ├── heuristics.py       <- Logic for trip assignment and crowd estimation
│   ├── crowding.py         <- Pipeline for generating subway crowd estimates
│   └── Crowding.ipynb      <- Notebook version of crowding.py 
│
├── public                  <- Static files used by the application
│
├── requirements.txt        <- Packages to build the python environment
│
├── src                     <- React front-end application structure
│   ├── Context             
│   ├── Hooks               
│   └── components

Datasets used

The links to the data used to generate crowd estimates are below:

subwaycrowds's People

Contributors

kaushik12, stuartlynn

subwaycrowds's Issues

Rachael's comments

MTA crowding app tweaks:

  • I had to zoom out to 50% to see the top graph in chrome

top graph:

  • likely is misspelled in graph title
  • we might need a hedge like "estimated maximum # of ppl..."

bottom graph:

  • there should be a prompt for user to change the slider to pick the time they are interested in seeing and a space separation between the top graph and the time slider for the second graph (as is they appear as one - like the slider is for the top graph)
  • change red to another color - orange? red is a bit jarring

Twitter share not working

When attempting to share via Twitter, we get redirected and the tweet is pre-populated, but for some reason it doesn't get tweeted. Are there any URL restrictions on Twitter's side causing this?

Presenting more accurate information

From the analysis, we have crowd estimates for each train in a given direction. The current version shows an average for the train (over both directions in an hour on a weekday). Given that we've moved to a trip-based presentation of the results, it would be better if the estimates were represented more accurately for the direction of travel. @stuartlynn, @caohoangha126 let's figure out the nuances of this sometime.

Add header and modals

Simple header of Data Clinic logo on white background and Links to an "About" and "Feedback" page that open up as Modals.

Mobile UI tweaks

Final check of mobile UI after below changes affected it

  • Header seems too big
  • Starting station select text box overlaps with first station in the dropdown making it hard to select
  • date-range text in the footer too big and overflows
  • additional padding needed between slider and by stops chart
  • If possible About and Feedback modal made 100% width only on mobile
  • Data Clinic and Contribute sections look bad on small screens
  • Methodology png seems to shrink a lot on smaller screens

Text edits

  • Add space after "line." at the top
  • Add period after the name of the terminus at the top
  • Change the key to sentence case
  • Axes labels on top graph
  • Change 12am to 12pm in the slider
  • Add space between number and am/pm in the slider
  • Might be more intuitive for people if we put the time interval down for the slider rather than a single # like 1pm
  • Title for bar graph
  • "Share this graph" - just applies to second graph? Or change to "Share these graphs"

Toggle for Weekday/Weekend

Similar to the slider for time, a simple toggle for Weekday/Weekend might be a nice-to-have. It is easy to get the data into the form needed for it.

Standardize css

As we built this quickly, the styling specifications are a bit all over the place: some in Styled Components, some in app.css, some inline (this is my fault).

Post beta, we should standardize these into a single system, probably Styled Components.

Sort stations list for a line

A pre-sorted list would be easier for someone who does not want to type (on mobile). Also, some stations are incorrect or missing. To be investigated in the crosswalk.

Find a GTFS-RT processing pipeline

We've had trouble processing GTFS-RT data consistently. Try to find work done by others that we could use to make the estimation run on real-time data rather than the static schedule.

Incorrect routes

The ordered routes being generated are incorrect for several of the lettered lines (F, G, N, A, M, etc.).

Reduce modal max-width

With limited text, modals on large screens seem to really stretch everything out. Having a max-width for them might make them look a little nicer.
