
tahira-h / colsh_bank_analysis


This analysis uses a dataset from a bank services company called Colsh Tech to compare different classification models and find which one predicts with the better accuracy.

Jupyter Notebook 100.00%


Colsh_Bank_Analysis

Instructions:

Apply different classification models for the prediction and find out which one has the better accuracy.

Language:

Python

Tools:

Excel

Overview

Purpose:

This analysis uses a dataset from a bank services company called Colsh Tech Bank (C.T. Bank). Analysts apply different classification models to the dataset for prediction and compare their accuracy to find the better model.

Results

Dataset Analysis:

For this analysis, "Train.csv" is the dataset used to make predictions and measure accuracy.

As shown in the image below, "data" holds the feature names (columns) of the dataset. There are 13 columns in the dataset: Loan_ID, Gender, Married, Dependents, Education, Self_Employed, ApplicantIncome, CoapplicantIncome, LoanAmount, Loan_Amount_Term, Credit_History, Property_Area, and Loan_Status.

There is one "target" in the dataset, Loan_Status. The Loan_Status column indicates whether each individual listed in the dataset has a loan with the bank: "Y" represents individuals who have a loan, and "N" represents individuals who do not.

[Image: data_target]

[Image: data_set]
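
A minimal sketch of how the feature columns and the Loan_Status target could be separated with pandas. The sample rows and Loan_ID values below are hypothetical, and both dropping rows with missing values and one-hot encoding with get_dummies are assumptions about the preprocessing, not steps confirmed by the notebook.

```python
import io

import pandas as pd

# Hypothetical sample rows using the 13 column names listed above.
# The real analysis would call pd.read_csv("Train.csv") instead.
SAMPLE_CSV = """Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP000001,Male,No,0,Graduate,No,5000,0,120,360,1,Urban,Y
LP000002,Female,Yes,1,Graduate,No,3000,1500,100,360,1,Rural,N
LP000003,Male,Yes,2,Not Graduate,Yes,4000,0,,360,0,Semiurban,N
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Assumption: rows with missing values are dropped (imputation is another option).
df = df.dropna()

# Loan_Status is the target; every other column is a candidate feature.
target = df["Loan_Status"]
features = df.drop(columns=["Loan_Status"])

# One-hot encode the text columns so scikit-learn models can consume them.
features_encoded = pd.get_dummies(features)

print(features_encoded.shape)
print(target.value_counts().to_dict())
```

One-hot encoding Loan_ID produces one column per applicant (Loan_ID_LP000001 in this sketch). Because Loan_ID is an identifier rather than a predictive attribute, dropping it before encoding may be worth considering.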

Training and Testing:

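The data is split into a training set and a testing set. The model is fit on the training set, and its predictions on the testing set (the predicted values) are compared against the known labels (the actual values).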

[Image: training_and_testing]
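
The split described in this section can be sketched with scikit-learn's train_test_split. The data here is a hypothetical stand-in, and the test_size, random_state, and stratify settings are assumed values rather than the ones used in the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 20 rows, 3 numeric features, a Y/N target.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = np.array(["Y", "N"] * 10)

# Assumed settings: hold out 25% of the rows and keep the Y/N ratio
# the same in both parts (stratify=y).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# y_test holds the "actual" values; a fitted model's predict(X_test)
# would supply the "predicted" values to compare against them.
print(X_train.shape, X_test.shape)
```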

Decision Tree Classifier:

After the split, the DecisionTreeClassifier model is imported, fit, and scored. The results show 21 misclassified data points, giving an accuracy of 79% (0.79).

[Image: decision_tree_classifier]
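
A hedged sketch of how the mistake count and accuracy relate for a DecisionTreeClassifier. It uses synthetic data from make_classification as a stand-in for the loan features, so its numbers will not match the 21 mistakes and 0.79 accuracy reported above; the hyperparameters are also assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data as a hypothetical stand-in for the encoded loan features.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows of the confusion matrix are actual classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

# Off-diagonal cells are the misclassified test points (the "mistakes").
mistakes = int(cm.sum() - cm.trace())
accuracy = accuracy_score(y_test, y_pred)

print(cm)
print(f"mistakes={mistakes}, accuracy={accuracy:.2f}")
```

Accuracy is the share of test points on the diagonal, so it always equals 1 - mistakes / (number of test points).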

RandomForestClassifier:

Next, the second model, RandomForestClassifier, is imported and its confusion matrix is computed. For comparison, the first model, DecisionTreeClassifier, made 21 mistakes with an accuracy score of 79% (0.79). The RandomForestClassifier makes 18 mistakes with an accuracy score of 82% (0.82), so the number of mistakes decreases from 21 to 18.

[Image: random_forest_classifier]

As shown in the image above, the classification report for the RandomForestClassifier model shows an accuracy of 82% (0.82). For a Loan_Status of No, the precision is 85% (0.85) and the recall is 49% (0.49). For a Loan_Status of Yes, the precision is 82% (0.82) and the recall is 96% (0.96).
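
To make the precision and recall figures concrete, here is a small pure-Python sketch of how both are derived from a confusion matrix. The counts are hypothetical and are not the notebook's confusion matrix; the helper name precision_recall is also made up for this illustration.

```python
def precision_recall(cm, labels):
    """Return {label: (precision, recall)} from a confusion matrix.

    cm[i][j] is the number of samples whose actual label is labels[i]
    and whose predicted label is labels[j] (the scikit-learn layout).
    """
    report = {}
    for k, label in enumerate(labels):
        true_positive = cm[k][k]
        predicted_as_label = sum(row[k] for row in cm)  # column total
        actually_label = sum(cm[k])                     # row total
        precision = true_positive / predicted_as_label if predicted_as_label else 0.0
        recall = true_positive / actually_label if actually_label else 0.0
        report[label] = (precision, recall)
    return report


# Hypothetical counts for illustration only.
labels = ["N", "Y"]
cm = [
    [5, 5],  # actual N: 5 predicted N, 5 predicted Y
    [1, 9],  # actual Y: 1 predicted N, 9 predicted Y
]

report = precision_recall(cm, labels)
accuracy = (cm[0][0] + cm[1][1]) / sum(map(sum, cm))

for label, (precision, recall) in report.items():
    print(f"{label}: precision={precision:.2f}, recall={recall:.2f}")
print(f"accuracy={accuracy:.2f}")
```

A high precision paired with a low recall for one class, as reported for No above, means the model seldom predicts that class but is usually right when it does.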

Tree Graph:

A decision tree graph is presented in .png format. Each node of the decision tree shows: gini, samples, value, and class.
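
One possible way to produce such a graph is scikit-learn's export_graphviz, sketched below on synthetic data. Whether the notebook used this function is an assumption, and the feature names f0 to f3 are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Synthetic stand-in data; the feature names are hypothetical.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
feature_names = ["f0", "f1", "f2", "f3"]

# A shallow tree keeps the exported graph small and readable.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# With out_file=None, export_graphviz returns the DOT source as a string.
# class_names are given in ascending order of the class labels (0, 1).
dot_source = export_graphviz(
    tree,
    out_file=None,
    feature_names=feature_names,
    class_names=["N", "Y"],
    filled=True,
)

# Each node label lists gini, samples, value, and class.
print(dot_source[:300])
```

If the DOT text is saved to a file such as tree.dot, the Graphviz command line tool can render it as an image with `dot -Tpng tree.dot -o tree.png`.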

As shown in the image below, each box (node) describes a subset of the dataset used in this analysis. The top box (root node) is where the first question is asked. The answer at the root node is either True or False, and each answer sends the data down its own branch, dividing the data into smaller subsets.

[Image: decision_tree_graph]

The root node, Loan_ID_LP001708, has a gini of 0.431, samples of 234, a value of 113, 247, and a class of N.

At the root node's True branch the gini is 0.425, and at its False branch the gini is 0.0. Gini measures the probability that a randomly chosen data point in the node would be wrongly classified if it were labeled at random according to the node's class distribution.

The True branch has 233 samples and the False branch has 1 sample.

The True branch has a value of 109, 247 and the False branch has a value of 4, 0. Value shows how the data points in a node are split between the classes. For instance, the root node values sum to 360, the True values sum to 356, and the False values sum to 4. (NOTE: 356 (True) + 4 (False) = 360 (root node))

The True branch has a class of N and the False branch has a class of Y. Class refers to the target feature, Loan_Status.
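
The gini figures quoted for these nodes can be reproduced from the value counts with the standard Gini impurity formula, 1 minus the sum of squared class proportions. The helper name gini below is chosen for this sketch only.

```python
def gini(class_counts):
    """Gini impurity of a node: 1 - sum(p_k ** 2) over the class proportions p_k."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


# Value counts quoted in the text for the root node and its two branches.
root = [113, 247]
true_branch = [109, 247]
false_branch = [4, 0]

print(round(gini(root), 3))          # 0.431
print(round(gini(true_branch), 3))   # 0.425
print(round(gini(false_branch), 3))  # 0.0

# The branch counts add back up to the root counts: 356 + 4 = 360.
print(sum(true_branch) + sum(false_branch) == sum(root))  # True
```

A gini of 0.0 on the False branch means every data point there belongs to a single class. The value counts sum to 360 while samples is 234; that pattern is common when a tree comes from a bootstrapped ensemble such as a random forest, although the text does not say which model the graph was drawn from.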

Summary

Results:

Two models, DecisionTreeClassifier and RandomForestClassifier, were used to make predictions on a dataset provided by bank services company C.T. Bank. The DecisionTreeClassifier model misclassified 21 data points, for a lower accuracy score of 79% (0.79). In contrast, the RandomForestClassifier model misclassified 18 data points, for a higher accuracy score of 82% (0.82).
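
The side-by-side comparison summarized above can be sketched as follows. This is an illustration on synthetic data with assumed hyperparameters, so the printed numbers will differ from the reported results, and the helper count_mistakes is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 400 rows, of which 100 are held out for testing.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def count_mistakes(model):
    """Fit the model and return (mistakes, accuracy) on the held-out test set."""
    model.fit(X_train, y_train)
    cm = confusion_matrix(y_test, model.predict(X_test))
    mistakes = int(cm.sum() - cm.trace())  # off-diagonal cells
    return mistakes, 1 - mistakes / int(cm.sum())


results = {
    "DecisionTreeClassifier": count_mistakes(DecisionTreeClassifier(random_state=0)),
    "RandomForestClassifier": count_mistakes(
        RandomForestClassifier(n_estimators=100, random_state=0)
    ),
}

for name, (mistakes, accuracy) in results.items():
    print(f"{name}: mistakes={mistakes}, accuracy={accuracy:.2f}")
```

Evaluating both models on the same held-out test set keeps the mistake counts directly comparable.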

Recommendation:

Because the RandomForestClassifier model has the higher accuracy score, it is recommended that bank services company C.T. Bank use the RandomForestClassifier model for predictions on the clients listed in the dataset.
