The 500 foot cab ride
The data was too large to upload to GitHub, but it's all available on the NYC TLC website. I really should be using more sophisticated tools to handle data this size, but for now it's all simple Python scripts. taxi_distances.py divides all the trips into distance buckets; the results are in taxi_distances.csv. The locations of all those microrides are extracted by taxi_map.py and logged in taxi_coordinates.csv. Finally, the hard-to-explain cash-paid and negotiated rides are picked out by shady.py and logged in shady.csv. All these datasets are fed into taxi.R to create the graphs.
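For readers who want a feel for what the distance bucketing involves, here is a minimal standard-library sketch. It is not the actual taxi_distances.py. It assumes each trip row has a `trip_distance` column in miles, which is how the TLC yellow-cab files label it, though some older files spell or pad the header differently. The bucket edges (500, 1000, 2640 and 5280 feet) and the helper names are hypothetical choices made for illustration.

```python
import csv
import io
from collections import Counter

FEET_PER_MILE = 5280
# Hypothetical bucket upper edges in feet; 500 ft is roughly 0.095 miles.
BUCKET_EDGES_FT = [500, 1000, 2640, 5280]
# Labels in display order, plus an open-ended bucket for longer trips.
LABELS = [f"<{edge} ft" for edge in BUCKET_EDGES_FT] + [f">={BUCKET_EDGES_FT[-1]} ft"]


def bucket_label(distance_miles):
    """Map a trip distance in miles to the label of its bucket."""
    feet = distance_miles * FEET_PER_MILE
    for edge in BUCKET_EDGES_FT:
        if feet < edge:
            return f"<{edge} ft"
    return LABELS[-1]


def count_buckets(rows):
    """Count trips per bucket, skipping rows with a missing, non-numeric or negative distance."""
    counts = Counter()
    for row in rows:
        try:
            distance = float(row["trip_distance"])  # assumed column name, in miles
        except (KeyError, TypeError, ValueError):
            continue
        if distance < 0:
            continue
        counts[bucket_label(distance)] += 1
    return counts


def write_counts(counts, out):
    """Write one 'bucket,trips' row per bucket, in LABELS order."""
    writer = csv.writer(out)
    writer.writerow(["bucket", "trips"])
    for label in LABELS:
        writer.writerow([label, counts.get(label, 0)])


if __name__ == "__main__":
    # Tiny in-memory sample standing in for a TLC trip file.
    sample = "trip_distance\n0.05\n0.1\n0.3\n2.5\nabc\n"
    counts = count_buckets(csv.DictReader(io.StringIO(sample)))
    out = io.StringIO()
    write_counts(counts, out)
    print(out.getvalue())
```

Streaming the rows through `csv.DictReader` keeps memory flat regardless of file size, which is in the spirit of "simple Python scripts" over big files. Reading the data with a dataframe library would be faster but needs the whole month in memory.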