```{r setup, include=FALSE, echo = FALSE}
knitr::opts_chunk$set(echo = FALSE)
```
## Introduction to and review of the Instacart Competition

**Feedback**
---
My friend Sam Lin and I both love exploring data science, so we decided to take on this competition as a team. I really appreciate his suggestions and viewpoints, and I learned a lot from the experience. The exchange of ideas helped me considerably in reaching a higher score on the leaderboard. If you love data science and would like to team up with someone, please feel free to contact me.

**Review**
---
This competition is different from the ones I had participated in before. It does not simply offer two datasets, a training set and a test set. Instead, it provides several CSV files and asks participants to combine them into a single well-structured table. At first this was a challenge, since I did not really understand what to do. With Sam's help, I generated a table with roughly tens of millions of rows. Compared with truly massive projects this is a tiny dataset, yet my computer could not handle it. That reminded me of the importance of memory usage. In R, the *data.table* package lets you make good use of memory and avoids unnecessary copies when you manipulate your dataset.
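
As a rough illustration of the memory point above, the sketch below joins two small made-up tables and then adds a column by reference with *data.table*. The table and column names (`orders`, `order_products`, `order_id`, `user_id`, `product_id`, `n_items`) are assumptions chosen for illustration; they are not necessarily the ones used in this project.

```r
library(data.table)

# Hypothetical stand-ins for two of the competition's CSV files.
# Real files would typically be loaded with fread().
orders <- data.table(
  order_id = c(1L, 2L, 3L),
  user_id  = c(10L, 10L, 20L)
)
order_products <- data.table(
  order_id   = c(1L, 1L, 2L, 3L),
  product_id = c(100L, 200L, 100L, 300L)
)

# Join the order-level columns onto the product-level rows.
full <- merge(order_products, orders, by = "order_id")

# `:=` adds the new column by reference, so the table is modified
# in place instead of being copied and reassigned.
full[, n_items := .N, by = order_id]

print(full)
```

With tens of millions of rows, avoiding an extra copy on every new column can make the difference between fitting in memory and not.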

After solving the dataset issue, I started the project by using logistic regression on the original features to predict the items. Not surprisingly, it yielded an F-value of roughly 0.26. To my surprise, purchase hour and purchase weekday did not have a significant impact on the outcome, which does not seem to match what we commonly believe. Both features showed extremely low importance in predicting the results. I am not sure whether we made correct use of them.

We generated many strong features through feature engineering. You can check these newly generated features in my project code. After completing this step, we achieved an F-value of roughly 0.36. It was not a remarkable score at the time, so we switched to other state-of-the-art models, such as GBM, GLM, and XGBoost. Fortunately, this yielded an F-value of 0.38, which would have ranked in the top 20% at the time. Since I did not have enough time to keep competing, I stopped after achieving this score. It was an amazing experience that taught me how to tackle more challenging problems.
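
For readers unfamiliar with the metric, here is a minimal sketch of how an F-value can be computed for predicted baskets. It assumes that "F-value" means the F1 score (the harmonic mean of precision and recall) and that scores are averaged per order; the function name `f1_score` and the product ids are made up for illustration and are not taken from the project code.

```r
# F1 for a single order: `predicted` and `actual` are vectors of product ids.
f1_score <- function(predicted, actual) {
  true_pos <- length(intersect(predicted, actual))
  if (true_pos == 0) return(0)  # no overlap: define F1 as 0
  precision <- true_pos / length(unique(predicted))
  recall    <- true_pos / length(unique(actual))
  2 * precision * recall / (precision + recall)
}

# Hypothetical order: 2 of the 3 predicted items were actually bought,
# and 2 of the 4 bought items were predicted.
predicted <- c(100, 200, 300)
actual    <- c(100, 200, 400, 500)
score_one <- f1_score(predicted, actual)  # precision 2/3, recall 1/2, F1 = 4/7

# Averaging over several orders (here the second order has no overlap).
preds   <- list(predicted, c(1))
actuals <- list(actual, c(2))
mean_f1 <- mean(mapply(f1_score, preds, actuals))
```

Because F1 penalises both missed items and wrong guesses, the probability threshold applied to a model's output affects the score as much as the model itself.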






Contributors

avalonliberty
