
sbavon / kaggle-nyc-taxi-fare-prediction


My solution for Kaggle NYC Taxi Fare Prediction (ranked 21st/1463)

Python 100.00%
kaggle regression machinelearning hdbscan lightgbm datacleaning dataexploration


Kaggle NYC Taxi Fare Prediction Solution (Top 2%, Ranked 21st/1478)

This repository contains the solution that placed in the top 2% of the NYC Taxi Fare Prediction competition on Kaggle.

Data cleaning (refer to data_cleaning.py)

  • remove records with null values
  • remove records whose coordinates fall outside the range covered by the test data
  • remove data points located in the sea
  • eliminate outliers based on the fare distribution
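The four cleaning steps can be sketched with pandas as follows. This is not the author's data_cleaning.py. It assumes the competition's column names (`fare_amount`, `pickup_longitude`, and so on) and uses per-column min/max of the test set as the "range provided in test data". The quantile cut-offs are placeholders, and `is_on_land` is a hypothetical helper (for example, one backed by a land/water mask image) that you would have to supply yourself.

```python
import numpy as np
import pandas as pd

# Assumed column names, following the competition's train.csv / test.csv layout
COORD_COLS = ["pickup_longitude", "pickup_latitude",
              "dropoff_longitude", "dropoff_latitude"]


def clean(train, test, is_on_land=None, fare_quantiles=(0.001, 0.999)):
    """Return a cleaned copy of `train`, using `test` to define the coordinate range."""
    # 1. remove records with null values
    df = train.dropna()

    # 2. keep records whose coordinates lie within the per-column range of the test data
    in_range = pd.Series(True, index=df.index)
    for col in COORD_COLS:
        in_range &= df[col].between(test[col].min(), test[col].max())
    df = df[in_range]

    # 3. remove points in the sea; `is_on_land(lon, lat) -> bool array` is a
    #    hypothetical, user-supplied helper, not part of pandas or numpy
    if is_on_land is not None:
        on_land = (
            np.asarray(is_on_land(df["pickup_longitude"].to_numpy(),
                                  df["pickup_latitude"].to_numpy()), dtype=bool)
            & np.asarray(is_on_land(df["dropoff_longitude"].to_numpy(),
                                    df["dropoff_latitude"].to_numpy()), dtype=bool)
        )
        df = df[on_land]

    # 4. drop fare outliers: non-positive fares and fares outside the chosen quantiles
    lo, hi = df["fare_amount"].quantile(list(fare_quantiles))
    fare = df["fare_amount"]
    return df[(fare > 0) & fare.between(lo, hi)]
```

The order matters: the fare quantiles are computed only after the coordinate filters, so far-off or sea records do not distort the outlier thresholds.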

Data preprocessing (refer to Data_preprocessing.py)

  • a new feature, cluster, is added
    • during data exploration, I found that the fare/distance ratio varies with location, so I added a new categorical feature that identifies the area of the pickup location and of the dropoff location
    • I used HDBSCAN to build the clustering model, then used this model to predict the area (cluster) of each record
  • a new feature, distance, is added
  • a new feature, distance to airport, is added
  • categorical features are converted to float32 to prevent a memory surge in the LightGBM Python package (the library converts all data to float, so integer data would be copied into a newly created array)
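The distance and distance-to-airport features can be sketched with the haversine (great-circle) formula. The README does not say which distance formula or which airports were used. The choice of haversine and of JFK, LaGuardia, and Newark, with approximate coordinates, is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Approximate (lat, lon) of the three NYC-area airports; which airports the
# original solution used is an assumption
AIRPORTS = {
    "jfk": (40.6413, -73.7781),
    "lga": (40.7769, -73.8740),
    "ewr": (40.6895, -74.1745),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees, scalars or arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def add_distance_features(df):
    """Return a copy of df with trip-distance and distance-to-airport columns."""
    out = df.copy()
    out["distance"] = haversine_km(out["pickup_latitude"], out["pickup_longitude"],
                                   out["dropoff_latitude"], out["dropoff_longitude"])
    for name, (lat, lon) in AIRPORTS.items():
        for end in ("pickup", "dropoff"):
            # distance from this trip end to the airport
            out[f"{end}_dist_{name}"] = haversine_km(
                out[f"{end}_latitude"], out[f"{end}_longitude"], lat, lon)
    return out
```

Separate pickup and dropoff airport distances let a tree model pick up flat-rate or surcharge effects for trips that start or end at an airport.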

Train and predict (refer to train_predict.py)

  • LightGBM is used, and the model was trained on an Amazon EC2 instance
  • with this model, the test score (RMSE, the competition's evaluation metric) is 2.85311
