GithubHelp home page GithubHelp logo

anishjohnson / credit_card_default_prediction Goto Github PK

View Code? Open in Web Editor NEW
2.0 1.0 0.0 10.15 MB

In this project we predict credit card defaults using classification models.

Jupyter Notebook 100.00%
machine-learning threshold-moving classification-algorithims xgboost-classifier random-forest-classifier logistic-regression knearest-neighbor-classifier shapley-additive-explanations

credit_card_default_prediction's Introduction

Credit_Card_Default_Prediction

The best travel credit cards of February 2022 - The Points Guy

Project Objective : Predicting whether a customer will default on his/her credit card.

Problem Description:

This project is aimed at predicting the case of customers default payments in Taiwan. From the perspective of risk management, the result of predictive accuracy of the estimated probability of default will be more valuable than the binary result of classification - credible or not credible clients. We can use the K-S chart to evaluate which customers will default on their credit card payments.

Data Description:

Basic User Data.

  • ID : Unique ID of each client.
  • LIMIT_BAL : Amount of the given credit (NT dollar) : it includes both the individual consumer credit and his/her family (supplementary) credit.
  • SEX : Gender (1 = male; 2 = female).
  • EDUCATION : Qualifications (1 = graduate school; 2 = university; 3 = high school; 4 = others).
  • MARRIAGE : Marital status (1 = married; 2 = single; 3 = others).
  • AGE : Age of the client (years)

History of Past Payment.

Scale for PAY_0 to PAY_6 : (-2 = No consumption, -1 = paid in full, 0 = use of revolving credit (paid minimum only), 1 = payment delay for one month, 2 = payment delay for two months, ... 8 = payment delay for eight months, 9 = payment delay for nine months and above)

  • PAY_0 : Repayment status in September, 2005 (scale same as above)
  • PAY_2 : Repayment status in August, 2005 (scale same as above)
  • PAY_3 : Repayment status in July, 2005 (scale same as above)
  • PAY_4 : Repayment status in June, 2005 (scale same as above)
  • PAY_5 : Repayment status in May, 2005 (scale same as above)
  • PAY_6 : Repayment status in April, 2005 (scale same as above)

Amount of Bill Statement.

  • BILL_AMT1 : Amount of bill statement in September, 2005 (NT dollar)
  • BILL_AMT2 : Amount of bill statement in August, 2005 (NT dollar)
  • BILL_AMT3 : Amount of bill statement in July, 2005 (NT dollar)
  • BILL_AMT4 : Amount of bill statement in June, 2005 (NT dollar)
  • BILL_AMT5 : Amount of bill statement in May, 2005 (NT dollar)
  • BILL_AMT6 : Amount of bill statement in April, 2005 (NT dollar)

Amount of Previous Payment.

  • PAY_AMT1 : Amount of previous payment in September, 2005 (NT dollar)
  • PAY_AMT2 : Amount of previous payment in September, 2005 (NT dollar)
  • PAY_AMT3 : Amount of previous payment in September, 2005 (NT dollar)
  • PAY_AMT4 : Amount of previous payment in September, 2005 (NT dollar)
  • PAY_AMT5 : Amount of previous payment in September, 2005 (NT dollar)
  • PAY_AMT6 : Amount of previous payment in September, 2005 (NT dollar)

Response Variable.

  • default payment next month : Default payment (1=yes, 0=no)

Project summary and results.

As more and more consumers rely on credit cards to pay their everyday purchases in an online and physical retail store, the amount of issued credit cards and the overwhelming amount of credit card debt by the cardholders have rapidly increased. Therefore, most financial institutions must deal with the issues of credit card default in addition to credit card fraud.

Our objective is to conduct quantitative analysis on credit card default risk by using machine learning models with accessible customer data to assist in predicting the case of customers' default payments in Taiwan.

After cleaning the data and renaming a few variables, in the next step, we began EDA (Exploratory Data Analysis) to understand the relationship between the dependent and the independent variables better and identify the necessary trends in them. We then split the data into two sets, the train (70%) and the test (30%), and standardized it using Standard-Scaler.

Then we implemented five machine-learning approaches (Logistic Regression, XGB Classifier, KNeighbors Classifier, Random Forest Classifier, ExtraTrees Classifier) to predict the default cases on the provided data; however, most models encountered challenges to resolve the imbalance problem of default cases in data sets. To avoid this, we oversampled the data using SMOTE Tomek and observed an increase in the overall performance of the models.

XG Boost Classifier with SMOTE Tomek gave the best accuracy of 82% but lacked Recall which is crucial in classifying the defaulters, whereas XG Boost with imbalanced data while applying the scale_pos_weight=3.521 provided the best Recall of 64% approx. In terms of overall balance, Random Forest Classifier provided the best results.

We also achieved a higher Recall of 80% (approx.) for XG Boost (on imbalanced data) by moving the threshold value (default threshold value=0.5) to 0.3605385.

credit_card_default_prediction's People

Contributors

anishjohnson avatar

Stargazers

 avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.