This group project assesses financial loan risk using a dataset sourced from Kaggle. The dataset is fictional and contains approximately 32,000 records and 12 columns; each row represents one loan record for one client.
- Source: Kaggle - Credit Risk Dataset
- Rows: approximately 32,000
- Columns: 12
- Removed duplicate rows.
- Deleted age values greater than 100.
- Removed employment length values over 80 years.
- Dropped rows with missing values (NaNs).
- Excluded records where the interest rate equals 0%.
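The cleaning steps above can be sketched with pandas as follows. This is a minimal illustration, not the project's actual code: the column names `person_age`, `person_emp_length`, and `loan_int_rate` are assumptions, the toy rows are invented for the example, and the sketch assumes that out-of-range values were handled by dropping the whole row.

```python
import numpy as np
import pandas as pd


def clean_loans(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the five cleaning steps to a raw loan DataFrame.

    Column names are assumed, not taken from the project code.
    """
    df = df.drop_duplicates()                # remove duplicate rows
    df = df[df["person_age"] <= 100]         # drop ages greater than 100
    df = df[df["person_emp_length"] <= 80]   # drop employment lengths over 80 years
    df = df.dropna()                         # drop rows with missing values
    df = df[df["loan_int_rate"] != 0]        # drop records with a 0% interest rate
    return df.reset_index(drop=True)


# Hypothetical toy data: one duplicate, one bad age, one bad employment
# length, one missing value, one 0% interest rate, and two valid rows.
raw = pd.DataFrame({
    "person_age": [25, 25, 144, 30, 41, 52, 35],
    "person_emp_length": [3.0, 3.0, 4.0, 123.0, np.nan, 10.0, 5.0],
    "loan_int_rate": [11.5, 11.5, 9.9, 7.2, 8.0, 0.0, 13.0],
})

cleaned = clean_loans(raw)
print(cleaned)
```

Only the two valid rows survive in this toy example. Note that comparisons against `NaN` evaluate to false, so the range filters also drop rows where the filtered column is missing.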
- Performed one-hot and label encoding on categorical data.
  - Encoded `cb_person_default_on_file` from Y/N to 0/1.
  - Transformed `loan_grade` from A to E.
  - One-hot encoded `person_home_ownership` categories (Own, Rent, Mortgage).
  - Encoded the various `loan_intent` categories.
- Split data into training and testing sets with a test size of 20% and a random state of 0.
- Applied MinMaxScaler for normalization.
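The encoding, splitting, and scaling steps above might look roughly like the sketch below. Several details are assumptions rather than facts from the project: the direction of the Y/N mapping (Y as 1), the integer codes used for `loan_grade`, the choice of one-hot encoding for `loan_intent`, the `person_income` and `loan_status` column names, and fitting the scaler on the training split only. The toy rows are invented for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data standing in for the cleaned dataset.
df = pd.DataFrame({
    "person_income": [30000, 45000, 52000, 61000, 75000,
                      28000, 39000, 88000, 56000, 47000],
    "cb_person_default_on_file": ["N", "Y", "N", "N", "Y",
                                  "N", "N", "N", "Y", "N"],
    "loan_grade": ["A", "C", "B", "A", "E", "D", "B", "A", "C", "B"],
    "person_home_ownership": ["RENT", "OWN", "MORTGAGE", "RENT", "RENT",
                              "OWN", "MORTGAGE", "MORTGAGE", "RENT", "OWN"],
    "loan_intent": ["EDUCATION", "MEDICAL", "PERSONAL", "VENTURE", "MEDICAL",
                    "EDUCATION", "PERSONAL", "VENTURE", "MEDICAL", "EDUCATION"],
    "loan_status": [0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
})

encoded = df.copy()

# Binary encoding (assumed direction: N -> 0, Y -> 1).
encoded["cb_person_default_on_file"] = (
    encoded["cb_person_default_on_file"].map({"N": 0, "Y": 1})
)

# Label encoding of the ordered grades (assumed codes: A -> 0 ... E -> 4).
grade_codes = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4}
encoded["loan_grade"] = encoded["loan_grade"].map(grade_codes)

# One-hot encoding of the unordered categories.
encoded = pd.get_dummies(
    encoded, columns=["person_home_ownership", "loan_intent"], dtype=int
)

X = encoded.drop(columns="loan_status")
y = encoded["loan_status"]

# 80/20 split with a fixed random state, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the scaler on the training data only, then reuse it on the test data.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

Fitting `MinMaxScaler` on the training split alone keeps information from the test set out of the scaling parameters; as a consequence, scaled test values can fall slightly outside the [0, 1] range.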
- Loan Status: Identified profiles of clients more or less likely to default.
- cb_person_default_on_file: Profiled historically defaulting clients.
- Loan Amount: Explored the association of loan amount with other numerical indicators.
- Employed a weighted average F1 Score for classification models.
- Used a weighted average R² Score for regression models.
- Selected models include KNN, Decision Tree, Logistic Regression, Bagging & Pasting, Random Forest, AdaBoost, and Gradient Boosting.
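As a rough illustration of the evaluation approach above, the sketch below fits the listed classifier families with scikit-learn and compares them by weighted-average F1, then scores one regressor with R². The synthetic data from `make_classification` and `make_regression` is a stand-in for the loan dataset, and all hyperparameters are library defaults or arbitrary choices, not the project's settings.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    RandomForestRegressor,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan status classification task.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

classifiers = {
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    # Bagging samples with replacement; pasting samples without replacement.
    "Bagging": BaggingClassifier(random_state=0),
    "Pasting": BaggingClassifier(bootstrap=False, max_samples=0.8, random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

f1_scores = {}
for name, model in classifiers.items():
    model.fit(X_train, y_train)
    # average="weighted" weights each class's F1 by its support.
    f1_scores[name] = f1_score(y_test, model.predict(X_test), average="weighted")

for name, score in sorted(f1_scores.items(), key=lambda item: -item[1]):
    print(f"{name}: weighted F1 = {score:.3f}")

# Synthetic stand-in for the loan amount regression task.
Xr, yr = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    Xr, yr, test_size=0.2, random_state=0
)

regressor = RandomForestRegressor(random_state=0).fit(Xr_train, yr_train)
r2 = r2_score(yr_test, regressor.predict(Xr_test))
print(f"Random Forest regressor: R2 = {r2:.3f}")
```

Weighted-average F1 is a reasonable default when the classes are imbalanced, as default and non-default loans typically are, because each class contributes in proportion to its number of test samples.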
- Less likely to default: clients with higher income, no previous defaults, a home mortgage, or home ownership.
- More likely to default: clients with a higher loan percentage of income, renters, clients with previous defaults, and clients with higher interest rates.
- Developed targeted loan products and marketing strategies based on customer segmentation by age, income, and loan reason.
- Formulated strategies for interest rate assessments based on loan amounts.
The project implemented multiple machine learning models to assess and predict financial loan risks. The Random Forest model performed best at predicting loan amounts, while the Random Patches and Gradient Boosting models performed best at classifying loan status.