

Welcome to Open Data Day 2019!

Today, we're going to give you raw and processed GPS data for Muni buses in San Francisco.

Your challenge is to clean, analyze, and/or visualize that data and build something useful for San Franciscans. Along the way, we hope you'll learn how the whole open data analysis process works.

UPDATE after Open Data Day

Thanks to everyone who participated! We've merged in your pull requests, which are now in the master branch.

The challenges

We have three challenges for you today, each one very similar to a challenge our team of coders and analysts has had to tackle. For each, we'll give you data to work with and sample code. Your job is to build something great with the data.

At the end of the day, we encourage you to make a pull request with your code, and we'll incorporate the best code into the data visualization prototype we're building.

Challenge 1: Processing

We've recorded the GPS position of every Muni bus every 15 seconds for over a year. That adds up to more than 230 MB of data per day.

Our first challenge was to convert this raw data into "arrival times" -- so we could know when each bus reached each stop. Only then could we start making interesting analyses and visualizations.

In this challenge, we'll give you a subset of our raw GPS data for a day (trust us, analyzing all 230 MB of data takes way too long). We'll also give you an example of the algorithm we use to calculate arrival times, which we call the "eclipses" algorithm.

Your job is to write an arrival-time algorithm that does better than "eclipses." We'll give you the raw data and our example code; can you produce more accurate arrival times?
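The eclipses algorithm itself is in the 1-processing folder and isn't reproduced here. As a point of comparison, here is a minimal sketch of one naive baseline, assuming each bus's pings are (unix_seconds, lat, lon) tuples and each stop is a (lat, lon) pair: for every stop, take the ping that comes closest to it, as long as that ping is within a radius. The data shapes, the 50 m radius, and the function names are hypothetical illustrations, not the project's format or method.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical input shapes (not the repository's actual format):
# a ping is (unix_seconds, lat, lon) for ONE bus; a stop is (lat, lon).
Ping = Tuple[float, float, float]
Stop = Tuple[float, float]

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def estimate_arrivals(pings: List[Ping], stops: Dict[str, Stop],
                      radius_m: float = 50.0) -> Dict[str, Optional[float]]:
    """Naive baseline: for each stop, the timestamp of the closest ping
    within radius_m (50 m is an arbitrary, hypothetical default), else None."""
    arrivals: Dict[str, Optional[float]] = {}
    for stop_id, (stop_lat, stop_lon) in stops.items():
        best_time, best_dist = None, radius_m
        for t, lat, lon in pings:
            d = haversine_m(lat, lon, stop_lat, stop_lon)
            if d <= best_dist:
                best_time, best_dist = t, d
        arrivals[stop_id] = best_time
    return arrivals


pings = [(0, 37.7749, -122.4194), (15, 37.7752, -122.4194), (30, 37.7760, -122.4194)]
stops = {"A": (37.7752, -122.4194), "B": (37.80, -122.40)}
print(estimate_arrivals(pings, stops))  # {'A': 15, 'B': None}
```

Because it keeps only one arrival per stop, this baseline breaks down when a bus passes the same stop several times in a day. Splitting pings into individual trips first, or interpolating between the 15-second pings, are natural places to improve on it.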

To get started with this challenge, check out the 1-processing folder.

Challenge 2: Analysis

Once we had computed the arrival times, we needed to start analyzing the processed data to extract useful transit metrics.

In this challenge, we'll give you a sample of our pre-processed data, which shows when each bus reached each stop (e.g. bus #123 on route #1 reached stop #456 at 3:45pm). The metrics we've calculated so far include average waiting time at a given stop and average transit time between two stops.
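Both metrics mentioned above can be defined in more than one way. Here is a minimal sketch of one common formulation, assuming arrival records shaped as hypothetical (trip_id, stop_id, unix_seconds) tuples. Average wait assumes riders show up at uniformly random times, which gives sum(h^2) / (2 * sum(h)) over the headways h between consecutive buses. Average transit time is the mean of (arrival at the second stop minus arrival at the first stop) over trips that visit both stops in that order. The project's notebooks may define these metrics differently.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical arrival record: (trip_id, stop_id, unix_seconds).
Arrival = Tuple[str, str, float]


def average_wait_seconds(arrivals: List[Arrival], stop_id: str) -> float:
    """Expected wait for a rider arriving at a uniformly random moment between
    the first and last bus at stop_id: sum(h^2) / (2 * sum(h)) over headways h."""
    times = sorted(t for _, s, t in arrivals if s == stop_id)
    headways = [later - earlier for earlier, later in zip(times, times[1:])]
    if not headways or sum(headways) == 0:
        raise ValueError("need at least two distinct arrivals at the stop")
    return sum(h * h for h in headways) / (2 * sum(headways))


def average_transit_seconds(arrivals: List[Arrival], from_stop: str, to_stop: str) -> float:
    """Mean travel time from from_stop to to_stop over trips that reach to_stop
    after from_stop. A trip visiting a stop twice keeps only its last visit."""
    by_trip: Dict[str, Dict[str, float]] = defaultdict(dict)
    for trip_id, stop_id, t in arrivals:
        by_trip[trip_id][stop_id] = t
    durations = [
        visits[to_stop] - visits[from_stop]
        for visits in by_trip.values()
        if from_stop in visits and to_stop in visits and visits[to_stop] > visits[from_stop]
    ]
    if not durations:
        raise ValueError("no trip visited both stops in order")
    return mean(durations)


arrivals = [
    ("a", "S1", 0), ("a", "S2", 300),
    ("b", "S1", 100), ("b", "S2", 500),
    ("c", "S1", 1200),
]
print(round(average_wait_seconds(arrivals, "S1"), 1))  # 508.3
print(average_transit_seconds(arrivals, "S1", "S2"))
```

The wait formula penalizes bunching: with buses at 0, 100, and 1,200 seconds, riders wait about 508 seconds on average, versus 300 seconds if the same three buses were spaced 600 seconds apart.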

Can you think up some more interesting metrics or find some interesting patterns? Some questions you could try answering:

  • At which time of the day are buses the latest?
  • At which time of the day are buses the slowest?
  • On which days of the week are buses faster or slower than average?
  • When does there seem to be the most traffic?
  • Did the bus ever take a detour or miss a stop?
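For the "slowest" and "which days of the week" questions, one simple approach is to bucket travel times by departure hour (or weekday) and compare medians, which are less sensitive than means to a few trips stuck in unusual delays. A minimal sketch, assuming you have already turned arrival data into hypothetical (local departure datetime, travel seconds) pairs for one stop pair; the function names and sample values are illustrative only.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median
from typing import Callable, Dict, Hashable, List, Tuple

# Hypothetical sample: (local departure time, travel time in seconds) for one stop pair.
Sample = Tuple[datetime, float]


def median_travel_by(samples: List[Sample],
                     bucket: Callable[[datetime], Hashable] = lambda dt: dt.hour
                     ) -> Dict[Hashable, float]:
    """Group travel times by bucket(departure) and return each group's median."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for departed, seconds in samples:
        groups[bucket(departed)].append(seconds)
    return {key: median(values) for key, values in sorted(groups.items())}


def slowest_bucket(samples: List[Sample],
                   bucket: Callable[[datetime], Hashable] = lambda dt: dt.hour) -> Hashable:
    """Return the bucket (e.g. hour of day) with the highest median travel time."""
    medians = median_travel_by(samples, bucket)
    return max(medians, key=medians.get)


samples = [
    (datetime(2019, 3, 4, 8, 0), 300),
    (datetime(2019, 3, 4, 8, 30), 340),
    (datetime(2019, 3, 4, 12, 0), 280),
    (datetime(2019, 3, 4, 17, 10), 500),
    (datetime(2019, 3, 4, 17, 40), 460),
]
print(median_travel_by(samples))  # {8: 320.0, 12: 280, 17: 480.0}
print(slowest_bucket(samples))    # 17
# Day-of-week variant (Monday == 0):
print(median_travel_by(samples, bucket=lambda dt: dt.weekday()))
```

Passing a different bucket function (weekday, weekday plus hour, and so on) answers the other time-based questions without changing the grouping code.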

To get started with this challenge, check out the 2-analysis folder.

Challenge 3: Visualization

The final challenge is to take our data and metrics and present them in a useful, informative way.

We'll give you the raw GPS data and our processed arrival time data, as well as a sample Jupyter notebook with some demo visualizations.

What's the coolest and most interesting visualization you can make? For an added challenge, try adding in geographic, traffic, or economic data, which you might be able to get from other open data portals.

To get started with this challenge, check out the 3-visualization folder.

Which challenge should I do?

If you love data science and algorithms, or want to dig deep into IPython notebooks, try Challenge 1.

If you enjoy statistical analysis and want to draw in geographic, economic, or other datasets, try Challenge 2.

If you like graphs, want to make user-facing products, or want to try your hand at React and D3, try Challenge 3.

Getting ready to code

Fork this repository and choose the folder that corresponds to the challenge you want to tackle. Each folder contains the full 230 MB dataset for one day, a light version with a subset of that data, and a Jupyter notebook with our reference implementation of the code.

We suggest installing Anaconda to get IPython and Jupyter on your machine.

Next, clone your fork and cd into it. Then run jupyter notebook in your terminal to browse, run, and play with our interactive Python notebooks.

Submission

If you want our project to use your code, just make a pull request of your fork! We'll review it and incorporate the best submissions into our prototype.

More about OpenTransit

Check out a (very early-stage) demo of our app! And check out the GitHub repo.

Here's a (very rough) mockup of our main app:

OpenTransit mockup

Contributors

hathix, jtanquil, manezinho, joannabate, khmccurdy, rileypredum

