Santader_kaggle_stacking

Santander classification Kaggle competition: https://www.kaggle.com/c/santander-customer-satisfaction

I used this competition to build model ensembles, which usually improve the score over any single model. Here is a nice explanation: http://mlwave.com/kaggle-ensembling-guide/

I practiced stacking; here is the procedure I followed:

2-fold stacking:

  • Split the train set into two parts: train_a and train_b.
  • Fit a first-stage model on train_a and create predictions for train_b.
  • Fit the same model on train_b and create predictions for train_a.
  • Finally, fit the model on the entire train set and create predictions for the test set.
  • Now train a second-stage stacker model on the train-set probabilities from the first-stage model(s), and use it to predict from their test-set probabilities.
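
The steps above can be sketched with scikit-learn as follows. This is a minimal illustration, not the code from this repository: the helper name `two_fold_stack`, the synthetic data from `make_classification`, and all hyperparameters are assumptions chosen only to keep the example small.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def two_fold_stack(model, X_train, y_train, X_test, seed=0):
    """Return (out-of-fold train probabilities, test probabilities) for one model."""
    idx = np.arange(len(X_train))
    # Split the train set into two halves: train_a and train_b.
    idx_a, idx_b = train_test_split(
        idx, test_size=0.5, random_state=seed, stratify=y_train
    )
    oof = np.zeros(len(X_train))
    # Fit on train_a, predict train_b.
    fitted_a = clone(model).fit(X_train[idx_a], y_train[idx_a])
    oof[idx_b] = fitted_a.predict_proba(X_train[idx_b])[:, 1]
    # Fit on train_b, predict train_a.
    fitted_b = clone(model).fit(X_train[idx_b], y_train[idx_b])
    oof[idx_a] = fitted_b.predict_proba(X_train[idx_a])[:, 1]
    # Fit on the entire train set, predict the test set.
    fitted_full = clone(model).fit(X_train, y_train)
    test_pred = fitted_full.predict_proba(X_test)[:, 1]
    return oof, test_pred


# Synthetic stand-in for the competition data (assumption).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Two illustrative first-stage models.
first_stage = [
    RandomForestClassifier(n_estimators=50, criterion="gini", random_state=0),
    RandomForestClassifier(n_estimators=50, criterion="entropy", random_state=0),
]
stacked = [two_fold_stack(m, X_train, y_train, X_test) for m in first_stage]
train_meta = np.column_stack([oof for oof, _ in stacked])
test_meta = np.column_stack([test_pred for _, test_pred in stacked])

# Second-stage stacker trained on the first-stage probabilities.
stacker = LogisticRegression().fit(train_meta, y_train)
final_pred = stacker.predict_proba(test_meta)[:, 1]
```

Each row of `train_meta` is predicted by a model that never saw that row during fitting, which is what keeps the stacker from simply learning the first-stage models' overfitting.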

More detailed info:

  • First, I do some basic feature engineering and then feature selection to remove noise (keeping the best features according to Gradient Boosting).
  • For the first-stage models I used RF Gini, RF Entropy, 2 x Gradient Boosting, and AdaBoost.
  • For the second-stage model I tried three options: 1) logistic regression, 2) RF, 3) a weighted combination of the first-stage models.
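
A rough sketch of the feature selection and the first-stage model set described above, using scikit-learn. The number of features kept, the synthetic data, the dictionary keys, and every hyperparameter here are assumptions for illustration; the source does not state the values actually used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in for the engineered feature matrix (assumption).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# Keep the 10 features with the highest Gradient Boosting importances.
# threshold=-np.inf makes max_features the only selection criterion.
selector = SelectFromModel(
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    max_features=10,
    threshold=-np.inf,
)
X_selected = selector.fit_transform(X, y)

# First-stage models: RF Gini, RF Entropy, 2 x Gradient Boosting, AdaBoost.
# The two Gradient Boosting variants differ only in illustrative settings.
first_stage_models = {
    "rf_gini": RandomForestClassifier(criterion="gini", random_state=0),
    "rf_entropy": RandomForestClassifier(criterion="entropy", random_state=0),
    "gb_shallow": GradientBoostingClassifier(max_depth=2, random_state=0),
    "gb_deep": GradientBoostingClassifier(max_depth=5, random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
}
```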

Conclusions and next steps:

  • Stacking gave me a small improvement, but not a big one. Throwing more models into the mix will probably help.
  • Use scikit-learn to find the weights for the second stage. I tuned them manually, which is very expensive.
  • Save ensembles for the last step of your analysis, after you have your best single-model score and have tried different features. Evaluate whether the gain is worth the effort.
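
One possible way to let scikit-learn find the second-stage weights instead of tuning them by hand is a non-negative least-squares fit on the first-stage probabilities. This is a sketch of that idea under stated assumptions, not what this repository does: the probabilities below are simulated, and `LinearRegression(positive=True)` requires scikit-learn 0.24 or newer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)

# Simulated out-of-fold probabilities from three first-stage models of
# differing quality (assumption: real ones would come from the stacking step).
noise_levels = [0.2, 0.4, 0.8]
oof = np.column_stack(
    [np.clip(y + rng.normal(0, s, size=y.size), 0, 1) for s in noise_levels]
)

# Fit non-negative coefficients without an intercept, then rescale them
# so the weights sum to 1 and the blend stays a probability.
reg = LinearRegression(positive=True, fit_intercept=False).fit(oof, y)
weights = reg.coef_ / reg.coef_.sum()

# Weighted combination of the first-stage predictions.
blend = oof @ weights
```

The weights should be fitted on out-of-fold predictions only, and the same `weights` vector is then applied to the first-stage test-set predictions.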

TODO:

  • Add comments! I'm sorry for not doing this beforehand; I know it makes the project hard to read. Lack of time is to blame, but I'll add them as soon as I can.

Contributors:

  • nachocarracedo
