
loan-fraud-prediction's Introduction

To Loan or not to Loan

A data mining project that helps bank managers avoid granting loans to non-compliant clients by predicting whether a loan will end successfully, based on data about the clients and previous loans. This project was developed for the Machine Learning course at FEUP.

Compilation

Database

From the src folder of the repository:

Create MySQL database:

1. mysql -u root -p
    1. CREATE DATABASE bank_database;
    2. SET GLOBAL local_infile = true;
    3. quit;
2. mysql -u root -p --local-infile=1 bank_database < database/database.sql

Graphviz

To plot the trees, you must also install Graphviz on your system.

https://graphviz.org/download/

Create the virtual environment

Ubuntu

1. python3 -m venv env
2. source env/bin/activate
3. pip3 install -r ../requirements.txt

Windows

1. py -m venv env
2. .\env\Scripts\activate.bat
3. pip install -r ..\requirements.txt

Run

  1. Clean: Generate train and test CSV files with clean data and save them to the clean_data folder

make clean <submission_name>

  • outputs clean_data/<submission_name>-train.csv and clean_data/<submission_name>-test.csv
  • e.g. make clean sub2 will generate the files sub2-train.csv and sub2-test.csv in the clean_data folder
  2. Train: Train the model on the clean data using a specific classifier, compute the AUC and store the model in the models folder

make train <classifier> <submission_name>

  • outputs models/<classifier>-<submission_name>.sav
  • e.g. make train logistic_regression sub2 will take the file sub2-train.csv from the clean_data folder as input, apply the Logistic Regression classifier to the data and store the resulting model in the models folder as logistic_regression-sub2.sav
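The AUC reported by the train step can be read as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The sketch below computes it directly from that definition in plain Python. It is an illustration only, not the project's code: the function name `roc_auc`, the toy data and the choice of `1` as the positive label are assumptions.

```python
def roc_auc(labels, scores):
    """Return the area under the ROC curve for binary labels (1 = positive).

    Computed as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half a pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y != 1]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 marks the class treated as positive.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
print(roc_auc(labels, scores))
```

The pairwise loop is quadratic in the number of examples, which is fine for a small check like this one; a library implementation would normally be used on the real data.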
  3. Test: Test a model with the test data and store the result in the results folder

make test <classifier> <submission_name>

  • outputs results/<classifier>-<submission_name>.csv
  • e.g. make test logistic_regression sub2 will apply the model models/logistic_regression-sub2.sav to the data from clean_data/sub2-test.csv and store the result in results/logistic_regression-sub2.csv
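The clean, train and test steps are tied together by their file names. As a rough sketch of that convention, the helper below maps a classifier and a submission name to the four paths mentioned above. The function `submission_paths` and its `root` parameter are hypothetical and are not part of the repository.

```python
from pathlib import Path


def submission_paths(classifier, submission_name, root="."):
    """Map a classifier and submission name to the files named in the README."""
    root = Path(root)
    return {
        "train": root / "clean_data" / f"{submission_name}-train.csv",
        "test": root / "clean_data" / f"{submission_name}-test.csv",
        "model": root / "models" / f"{classifier}-{submission_name}.sav",
        "result": root / "results" / f"{classifier}-{submission_name}.csv",
    }


paths = submission_paths("logistic_regression", "sub2")
for step, path in paths.items():
    print(f"{step}: {path.as_posix()}")
```

Keeping the submission name as the shared key is what lets a model trained on one cleaned dataset be tested against the matching test file.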
  4. Explore: Explore the various datasets by printing some statistics and generating some plots

make explore <table>

  • outputs generated plots in the folder data_understanding/plots
  • Available tables: account, card, client, disp, district, loan, trans
  • e.g. make explore account will perform data exploration on the account table, saving some plots in the folders data_understanding/plots/distribution/account and data_understanding/plots/correlation/account
  5. Clustering: Solve the descriptive problem by generating some graphs that describe the clustering approach used to distinguish between different client types

make clustering

  • outputs generated graphs that are opened in the browser
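This README does not say which clustering algorithm the project uses. To give an idea of how clients could be split into groups, here is a minimal k-means sketch (Lloyd's algorithm) in plain Python. The algorithm choice, the function `kmeans`, the value `k=2` and the two toy features are all assumptions made for illustration.

```python
import math
import random


def kmeans(points, k=2, iterations=20, seed=0):
    """Cluster points (tuples of floats) into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids


# Hypothetical client features: (average balance, transactions per month).
clients = [(1.0, 2.0), (1.2, 1.8), (0.9, 2.1), (8.0, 9.0), (8.3, 9.1), (7.9, 8.7)]
cluster_ids, centers = kmeans(clients, k=2)
print(cluster_ids)
```

Which group receives id 0 or 1 depends on the random initialisation, so only the grouping itself is meaningful. Real features would also need scaling before distances are compared.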
  6. Clean Models: Empty the models folder, which contains the trained models

make clean_models

  7. Clean Cache: Empty the Python cache folders (__pycache__)

Collaborators

  1. Diana Freitas
  2. Mariana Ramos
  3. Paulo Ribeiro

loan-fraud-prediction's People

Contributors

dianaamfr, marianaramos37, paulinho-16


loan-fraud-prediction's Issues

Clustering

  1. clients only (very individual profiles)
  2. join districts (sociodemographic profiles)
  3. join transactions and other financial data (financial behaviour profiles)

2 Clusters
Bonus: use the clusters to build the client profile
