avalonliberty / instacart-competition Goto Github PK
View Code? Open in Web Editor NEWThis repository is created to make a record of my experience in the Instacart Competition
This repository is created to make a record of my experience in the Instacart Competition
```{r setup, include=FALSE, echo = FALSE} knitr::opts_chunk$set(echo = FALSE) ``` ## Introduction and review to the Instacart Competition **Feedback** --- Since I and my friend, Sam Lin, both love to explore data science field, we decided to conquer this competition as a team. I am really appreciate his suggestions and viewpoints, I learn a lot from the experience in this contest. The exchange of ideas seriously help me to achieve higher score on the leaderboard. If you are a data science lover and tend to team up with someome, please feel free to contact me. **Review** --- This competition is different those competitions which I participated before. It do not offer merely two datasets, training set and testing set. Instead, it offers several csv file and ask to manipulate them into a well-structured file. At first, I faced some challenges in dealing with this issue, since I do not really understand what to do. With Sam's help, I generated a table with roughly tens millions rows. Compared to those massive projects, it is a tiny dataset, my computer, though, is unable to affort this tremendous dataset. This reminds me the importance of memory usage. In R, the package *Datatable* allows you to make good use of memory and prevent from executing unnecessary copy when you handle your dataset. After solving the dataset issue, I started my project by using logistic regression and origianl features to predict the items. Not surprisingly, it yield roughly 0.26 F-value. To my surprise, purchase hour and purchase weekday do not have significant impacts on the consequence, it seems not match what we commonly beleive. These two features reflect extremely low importance in predicting the results. I am not sure whether we make correct use of these two features. We generate a lot of strong features through feature engineering. You could check these newly-generated features in my project code. After completing this procedure, we achieved roughly 0.36 F-value. It was not a remarkable score then; therefore we switched our model to other state-of-art models, like GBM, GLM, and XGBOOST. Fortunately, this yields 0.38 F-value, which could ranked among top 20% then. Since I did not have enough time to compete with other competitors, I did quit after achieved this score. It was an amazing experience that allowed me to learn how to conquer more challenging problem.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.