GithubHelp home page GithubHelp logo

nyc_taxi_trip's Introduction

This project involves predicting the yellow taxi trip duration between any two locations X and Y within NYC.

Instructions on creating the virtual environment for this project

  • Create Conda environment
    • Assuming you already have VSCode and anaconda installed. If not, please find the links to install those first. If you don’t worry about storage space and do not want to install the basic dependencies yourself, you can download the Anaconda distribution, otherwise, just go with Miniconda.
    • See the `environment.yml' file for a list of all dependencies. If you need an additional package to do additional work, please add it here.
    • Create a new Conda environment using the environment.yml file. Add it to Jupyter kernel. And install Jupyter lab (you can also replace this with Jupyter notebook if you prefer that)
    > conda activate <env-name>
    > python -m ipykernel install --user --name=<custom-kernel-name>
    > jupyter lab 
    
    • Open jupyter notebook and select the above Python interpreter
    • If you need to update the environment at any stage, say you need to add a new dependency or change an existing one, edit the environment.yml file and do;
    > conda env update --file environment.yml --prune
    
  • For my dev work, I prefer to edit the docs, scripts, etc. files in VSCode, and also run the .py files in the terminal. I use Jupyter lab only for jupyter notebooks.

The project covers the following topics:

EDA

  • Raw Data import
  • External data imports
  • Data cleaning
  • Visualizations
  • Data wranging based on EDA observations
  • Inferences
  • Preparing data attributes for modeling

[ML] -- to be added in the next blog post

Introduction:

This project is inspired by the NYC taxi trip duration challenge hosted on Kaggle. But instead of using the train and test data from Kaggle taxi data was queried directly from the city of NY's website. Only Yellow taxi trips were considered for this project. More information about different taxi options available in NYC can be found here.

The motive of the project is to identify the main factors influencing the daily taxi trips of New Yorkers. The taxi trips data is taken from the NYC Taxi and Limousine Commission (TLC), and it includes pickup time, pickup and dropoff geo-coordinates, number of passengers, and several other variables. The taxi trips considered in this project are only for the year 2016 and only those trips will be considered whose exact pickup and dropoff geo-coordinates are available.

Policy researchers at the TLC and NYC can use the project observations and ML models to observe changing trends in the industry and make informed decisions regarding transportation planning within the city.

File descriptions:

-- data/raw_data/nyc_2016_raw_sql.csv. Contains around 4.2M trips within New York City (NYC) taken in 2016. NOTE: None of the data files were uploaded in this repo because of their large size but if you run the EDA notebook by cloning this repositories all the data files will be populated in the correct sub-folders.

-- The data includes the first 6 months of the year 2016 only because the exact pickup and dropoff locations were available for the first 6 months only. It was pulled from the new york city's website using a query.

Data fields:

  • vendor_id - a code indicating the provider associated with the trip record

  • pickup_datetime - date and time when the meter was engaged

  • dropoff_datetime - date and time when the meter was disengaged

  • passenger_count - the number of passengers in the vehicle (driver entered value)

  • pickup_longitude - the longitude where the meter was engaged

  • pickup_latitude - the latitude where the meter was engaged

  • dropoff_longitude - the longitude where the meter was disengaged

  • dropoff_latitude - the latitude where the meter was disengaged

  • store_and_fwd_flag - This flag indicates whether the trip record was held in vehicle memory before sending to the vendor because the vehicle did not have a connection to the server

  • Y=store and forward; N=not a store and forward trip

  • trip_distance - distance of the trip recorded in miles (was removed because this field won't be available for unseen trips/test data)

nyc_taxi_trip's People

Watchers

Prathamesh Pawar avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.