sriki18 / credit_spark

Jupyter Notebook 100.00%

Predicting credit card defaulters

Background

The dataset comes from the credit default prediction use case described at https://developer.ibm.com/blogs/snap-ml-use-cases-blog/. Quoting the text:

The task in this use case is to predict whether a person who has credit will default (not be able to repay his credit). The data scientist is provided with a data set of 10 million transactions, each of which is characterized by 18 features (including account age, account type, credit history, owns car, transaction amount, and transaction category). Also provided are the labels of these transactions, default or not. The task is to build a model to predict whether transactions will default in an unseen data set, that is, a data set that has not been used to train the model and does not have labels.

The associated .csv data file is also available in this repository.

Goal

Binary classification : Predict whether a person who has credit will default, that is, fail to repay their credit.
Interpretation : Use a regression framework to identify the influential variables and quantify their influence.
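
One common way to quantify influence in a logistic regression is to turn each coefficient into an odds ratio, exp(coefficient): a value above 1 raises the odds of default, a value below 1 lowers them. The sketch below is only an illustration of that idea. The feature names and coefficient values are made up and are not results from this repository, and ranking by absolute coefficient is only meaningful when the features are on comparable scales.

```python
import math


def odds_ratios(coefficients):
    """Map each feature's log-odds coefficient to its odds ratio, exp(coef)."""
    return {name: math.exp(beta) for name, beta in coefficients.items()}


def rank_by_influence(coefficients):
    """Return feature names sorted by absolute coefficient, largest first."""
    return sorted(coefficients, key=lambda name: abs(coefficients[name]), reverse=True)


# Hypothetical coefficients, for illustration only.
example = {"account_age": -0.40, "owns_car": 0.10, "transaction_amount": 0.75}
ratios = odds_ratios(example)
ranking = rank_by_influence(example)
```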

Main challenges

Size of dataset : 1 million rows × 19 columns.
- To favor speed over ease of use, Apache Spark was chosen instead of Pandas and scikit-learn.
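
A Spark-based workflow for this kind of task might look like the sketch below. It is not the notebook's actual code. The label column name `IS_DEFAULT`, the 80/20 split, and the assumption that every feature column is already numeric are all hypothetical choices made for illustration. The function expects an existing `SparkSession`.

```python
def feature_columns(all_columns, label_col):
    """Treat every column except the label as a feature."""
    return [c for c in all_columns if c != label_col]


def train_logistic_regression(spark, csv_path, label_col="IS_DEFAULT"):
    """Fit a logistic regression on raw features; return (model, test accuracy).

    `label_col` is a hypothetical name; all features are assumed numeric.
    """
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator
    from pyspark.ml.feature import VectorAssembler

    df = spark.read.csv(csv_path, header=True, inferSchema=True)
    df = df.withColumn(label_col, df[label_col].cast("double"))

    # Pack the raw feature columns into the single vector column Spark ML expects.
    assembler = VectorAssembler(
        inputCols=feature_columns(df.columns, label_col), outputCol="features"
    )
    data = assembler.transform(df).select("features", label_col)
    train, test = data.randomSplit([0.8, 0.2], seed=42)

    model = LogisticRegression(featuresCol="features", labelCol=label_col).fit(train)
    evaluator = MulticlassClassificationEvaluator(
        labelCol=label_col, predictionCol="prediction", metricName="accuracy"
    )
    return model, evaluator.evaluate(model.transform(test))
```

Categorical string columns would need encoding (for example with `StringIndexer`) before `VectorAssembler` can accept them.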

Software/Packages/Programs

Major

Name	Version
Python	2.7
Apache Spark (`pyspark`)	2.1.3

Minor

Name	Version
matplotlib	1.5.0
seaborn	0.7.1
NumPy	1.13.1
Pandas	0.17.1

Service

IBM Watson Studio Notebooks with Spark Service

Status

Logistic regression on the untransformed features gives 97.7% accuracy.
- Pending : Test the statistical significance of the coefficients.
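
One standard option for the pending significance check is a Wald test, which compares each coefficient to its standard error. The sketch below assumes the standard errors are available from whichever fitting routine is used; the numbers in the usage line are hypothetical, not estimates from this project.

```python
import math


def wald_test(coefficient, standard_error):
    """Return (z, two-sided p-value) for the null hypothesis coefficient == 0."""
    z = coefficient / standard_error
    # Two-sided standard normal tail: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical estimate and standard error, for illustration only.
z, p = wald_test(0.75, 0.25)
```

A small p-value suggests the coefficient differs from zero. With a very large number of rows, even tiny effects can appear significant, so the size of the coefficient still matters.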
