
poojasgandhi / social-impact-women-india




social-impact-women-india's Introduction

Financial uplift of women in India

This repository contains my submission for the 2018 WIDS (Women In Data Science) Challenge, held in Jan-Mar 2018. It was an invitation-only competition based on data from InterMedia Survey Institute, a grant recipient of the Bill & Melinda Gates Foundation in their Financial Services for the Poor program.

Participants were required to predict gender based on demographic and behavioral information of survey respondents from India and their usage of traditional financial and mobile financial services.

The idea behind predicting gender was to explore the key differences in the behavior patterns of men and women, and how those differences may affect their use of new financial services. Ideally, these findings would influence plans to reach women in developing economies and encourage them to adopt new financial tools that would help lift them and their families out of poverty.

Approach

My main objective in this competition was to learn model parameter tuning in Python. I approached the challenge with two different training datasets:

Using only the columns present in the data dictionary

All the column names in the data were coded and not descriptive of the data. So, my first assumption was that the organizers probably didn't want participants to use variables that were not present in the data dictionary. Accordingly, I started with a subset of the original dataset: only the columns present in the dictionary.
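As a minimal sketch of this subsetting step, assuming the data is loaded into a pandas DataFrame and the dictionary's column names are available as a list. The column names (`target`, `col_a`, and so on) and the helper `keep_dictionary_columns` are hypothetical and are not taken from the notebooks:

```python
import pandas as pd


def keep_dictionary_columns(df, dictionary_columns, always_keep=()):
    """Return a copy of df restricted to columns listed in the data dictionary.

    Columns named in the dictionary but missing from df are skipped silently.
    """
    # dict.fromkeys removes duplicates while preserving order
    wanted = list(dict.fromkeys([*always_keep, *dictionary_columns]))
    present = [col for col in wanted if col in df.columns]
    return df[present].copy()


# Toy stand-in for the competition data (hypothetical column names)
train = pd.DataFrame(
    {
        "target": [1, 0, 1],
        "col_a": [1, 2, 3],
        "col_b": [10, 20, 30],
        "undocumented": [0, 1, 0],
    }
)
dictionary_columns = ["col_a", "col_b", "col_missing"]

subset = keep_dictionary_columns(train, dictionary_columns, always_keep=["target"])
print(subset.columns.tolist())
```

Keeping the target column explicitly avoids dropping it by accident when the dictionary only documents the predictors.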

I decided to start with the Gradient Boosting Model (GBM), since it is often a winning solution in Kaggle challenges. Starting with the base model (with default parameters), I used Grid Search to methodically tune the parameters of the model as below:

  • Fix the learning rate and optimize the number of trees
  • Optimize the max depth and the minimum number of samples required to split a node (Tree parameter)
  • Optimize the minimum number of samples in a leaf node (Tree parameter)
  • Optimize the maximum number of features in a tree (Tree parameter)
  • Optimize the subsample proportion (Tuning parameter)
  • Optimize the learning rate (Tuning parameter)
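The staged tuning above can be sketched with scikit-learn's `GridSearchCV`, carrying the best values from each stage into the next. This is an illustration under assumptions, not the notebook's code: the synthetic data, the grid values, the `roc_auc` scoring, and the helper `tune_stage` are all hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the survey data (assumption)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)


def tune_stage(fixed_params, grid):
    """Grid-search one group of parameters while holding the others fixed."""
    search = GridSearchCV(
        GradientBoostingClassifier(random_state=0, **fixed_params),
        param_grid=grid,
        scoring="roc_auc",  # assumed metric, chosen for illustration
        cv=3,
    )
    search.fit(X, y)
    # Merge this stage's winners into the running parameter set
    return {**fixed_params, **search.best_params_}


# Stage order mirrors the list above; grid values are illustrative only
params = {"learning_rate": 0.1}
stages = [
    {"n_estimators": [20, 40]},
    {"max_depth": [2, 3], "min_samples_split": [2, 10]},
    {"min_samples_leaf": [1, 5]},
    {"max_features": ["sqrt", None]},
    {"subsample": [0.8, 1.0]},
    {"learning_rate": [0.05, 0.1]},
]
for grid in stages:
    params = tune_stage(params, grid)

print(params)
```

Tuning one group at a time keeps the number of fits small compared with a single grid over all parameters, at the cost of possibly missing interactions between groups.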

Using all the columns

GBM: For this I used only the base model with default parameter values and a model with the same parameters as finalized in the first approach.

I also tried the default XGB (Extreme Gradient Boosting) and Random Forest models.
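A rough sketch of comparing default models with cross-validation follows. The synthetic data and the `roc_auc` scoring are assumptions, and XGBoost is included only if the `xgboost` package happens to be installed:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the survey data (assumption)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

models = {
    "GBM (defaults)": GradientBoostingClassifier(random_state=0),
    "Random Forest (defaults)": RandomForestClassifier(random_state=0),
}

# XGBoost is an optional third-party dependency; skip it when unavailable
try:
    from xgboost import XGBClassifier

    models["XGB (defaults)"] = XGBClassifier(random_state=0)
except ImportError:
    pass

# Mean 3-fold cross-validated score per model (metric assumed for illustration)
scores = {
    name: cross_val_score(model, X, y, scoring="roc_auc", cv=3).mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```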

Instead of tuning these models further, I figured there was a higher gain in building ensembles of the models built so far.

Final submission and result

The final submission was an ensemble of the GBM with all variables (parameters tuned based on approach 1) and the tuned GBM with variables from the data dictionary only.

Our team's leaderboard standing was as follows:
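One simple way to combine two such models is to average their predicted probabilities. The sketch below assumes plain averaging, synthetic data, and a hypothetical index set `dictionary_idx` standing in for the data dictionary columns; the combination rule actually used in the submission is not stated here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the survey data (assumption)
X, y = make_classification(n_samples=400, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical: pretend the first 6 features are the data dictionary columns
dictionary_idx = np.arange(6)

# Model 1: all variables; Model 2: dictionary variables only
gbm_all = GradientBoostingClassifier(n_estimators=50, random_state=0)
gbm_all.fit(X_train, y_train)
gbm_dict = GradientBoostingClassifier(n_estimators=50, random_state=0)
gbm_dict.fit(X_train[:, dictionary_idx], y_train)

# Probability of the positive class from each model
p_all = gbm_all.predict_proba(X_test)[:, 1]
p_dict = gbm_dict.predict_proba(X_test[:, dictionary_idx])[:, 1]

# Equal-weight average of the two probability vectors (assumed rule)
ensemble = (p_all + p_dict) / 2
print(ensemble[:5])
```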

  • Private leaderboard score - 0.97233
  • Public leaderboard score - 0.97314
  • Rank - 39 (top 17%)

Project structure

  • WIDS_data_prep - Importing data, missing value imputation, feature engineering and other cleaning
  • WIDS_GBM_tuning - GBM model building and tuning (Approach 1 and 2)
  • WIDS_XGB_RF - XGB and RF model building (Approach 2)
