### These are my scripts used for the Allstate Challenge on "How severe is an insurance claim?"
This is my first (serious) attempt at Kaggle and I managed to reach 46/3055 on the private leaderboard with this submission. You can find my Kaggle profile here.
There is a separate blog post available here, which talks about what I have learned and some pitfalls.
The folders are ordered as follows:
Hyperopt Scripts and Results are also included here.
- Fork this repo and make a directory named `input`.
- Download the train & test set from here and unzip them.
- Run the scripts `power2.py`, `power3.py` and `fourm_1106_prep.py`.
- Each script should produce 3 additional files.
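The exact transforms live in the scripts above; as an illustration of the pattern (the script names suggest power-expanded features, which is an assumption on my part), a `power2`-style prep step might look like:

```python
import numpy as np

def label_encode(col):
    """Map each distinct category string to an integer code."""
    cats = {c: i for i, c in enumerate(sorted(set(col)))}
    return np.array([cats[c] for c in col], dtype=float)

def add_power_features(X, power=2):
    """Append element-wise powers of the columns as new features."""
    return np.hstack([X, X ** power])

# toy data standing in for the Allstate categorical/continuous columns
cat_col = ["A", "B", "A", "C"]
cont = np.array([[0.5, 1.0],
                 [0.2, 2.0],
                 [0.9, 0.5],
                 [0.4, 1.5]])

X = np.column_stack([label_encode(cat_col), cont])
X2 = add_power_features(X, power=2)  # what a "power2"-style script might emit
print(X2.shape)  # (4, 6): original 3 columns plus their squares
```

The real scripts read from `input/` and write the feature files consumed by the first-level models.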
The 3 scripts that generate the xgboost out-of-bag predictions and test-set predictions are
The outputs and params are available in the notebook. Note that not all outputs end up being used for second-level modelling; refer to the Second level modelling section.
The motivation comes from this script, which is in the GitHub repo found here.
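The out-of-bag outputs follow the standard stacking recipe: split the training set into folds, predict each held-out fold with a model trained on the rest, and average the per-fold test predictions. A minimal sketch, with a ridge regression standing in for the actual xgboost learner:

```python
import numpy as np

def oof_predictions(X, y, X_test, fit_predict, n_folds=5, seed=0):
    """Out-of-fold train predictions plus fold-averaged test predictions,
    the first-level outputs fed into the second-level model."""
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, n_folds)
    oof = np.zeros(len(X))
    test_pred = np.zeros(len(X_test))
    for val_idx in folds:
        tr_idx = np.setdiff1d(idx, val_idx)
        model = fit_predict(X[tr_idx], y[tr_idx])
        oof[val_idx] = model(X[val_idx])       # held-out predictions
        test_pred += model(X_test) / n_folds   # average over the folds
    return oof, test_pred

def ridge_fit(X, y, lam=1.0):
    """Stand-in learner: ridge regression via the normal equations."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    w = np.linalg.solve(A, X.T @ y)
    return lambda Xn: Xn @ w

X = np.random.RandomState(1).rand(100, 3)
y = X @ np.array([1.0, 2.0, 3.0]) + 0.01
oof, test_pred = oof_predictions(X, y, X[:10], ridge_fit)
```

Swapping `ridge_fit` for an xgboost training call (with the forum parameters) recovers the shape of the actual scripts.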
I used 4 different sets of parameters:
- The one found in the forum.
- The same model with a different seed.
- The same model with the second layer widened from 200 to 250 nodes.
- The same model trained without the log transform.
The outputs can be found in the CSVs in the keras folder.
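The log transform mentioned in the last variant refers to training on a shifted log of the loss, which tames its heavy right tail for an MAE objective. A minimal sketch; the shift of 200 follows the common forum recipe, so treat the exact value as an assumption:

```python
import numpy as np

SHIFT = 200  # shift widely used on the competition forums; value is a tuning choice

def to_log(y):
    """Train the network on log(y + SHIFT) so MAE behaves better on skewed losses."""
    return np.log(y + SHIFT)

def from_log(p):
    """Invert the transform on the network's predictions before scoring."""
    return np.exp(p) - SHIFT

y = np.array([10.0, 500.0, 12000.0])  # toy claim losses
round_trip = from_log(to_log(y))
print(np.allclose(round_trip, y))  # True: the transform is exactly invertible
```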
- Everything is here
- Submission file `allstate1118.34514548.csv`, which would have ranked 102nd on the private leaderboard (top 5%).
- Combine all first-level models into one data frame with `combine_data.ipynb`, which outputs the training set and test set.
- Run a single 5-fold with early stopping on the out-of-bag data set with `keras_stacking_single_fold.ipynb`.
- You may use the pre-trained models included in that folder to skip the training.
- Bag the above model 5 times with `keras_stacking_bagged.ipynb`.
- Take the second-level model output and feed it back with the first-level models to find the optimum weights with `fmin_second_level.ipynb`.
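The last step searches for blend weights that minimise MAE on the out-of-bag targets. The notebook presumably uses a scipy-style `fmin` over several model outputs; a one-dimensional grid search over two models shows the same idea:

```python
import numpy as np

def mae(y, p):
    return np.mean(np.abs(y - p))

def best_blend_weight(y, p1, p2, grid=np.linspace(0.0, 1.0, 101)):
    """Grid-search the convex blend w*p1 + (1-w)*p2 that minimises MAE
    on the out-of-bag targets (a simple stand-in for the fmin search)."""
    scores = [mae(y, w * p1 + (1 - w) * p2) for w in grid]
    return grid[int(np.argmin(scores))]

# synthetic out-of-bag predictions: p1 is the stronger model here
rng = np.random.RandomState(0)
y = rng.rand(1000)
p1 = y + rng.normal(0, 0.05, 1000)
p2 = y + rng.normal(0, 0.20, 1000)
w = best_blend_weight(y, p1, p2)
```

With more than two models the same objective is handed to an optimiser instead of a grid, which is what the `fmin` name suggests.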
- Final submission: `allstate1117.71816974.csv`.