This project builds and evaluates machine learning models for credit card fraud detection. The dataset contains credit card transactions, each labeled as fraudulent or non-fraudulent.
- ML.py: Python script containing the machine learning code.
- creditcard.csv: Dataset file containing transaction data.
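The project uses the following libraries:
- numpy
- pandas
- matplotlib
- seaborn
- warnings (part of the Python standard library; nothing to install)
- scikit-learn, imported as sklearn (various modules for models, preprocessing, and metrics)
- imbalanced-learn (for oversampling)
- xgboost (if used)
- streamlit (for model deployment)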
To run the project:
- Create a virtual environment to run ML.py.
- Install the required libraries using pip install -r requirements.txt.
- Ensure the dataset file (creditcard.csv) is in the same directory as ML.py.
- Run the ML.py script to train and evaluate the machine learning models.
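A minimal sketch of the data-loading step, assuming the label column in creditcard.csv is named `Class` (1 = fraud, 0 = legitimate), as in the commonly used public version of this dataset; ML.py may do this differently. The helper name `split_dataset` and the 80/20 stratified split are illustrative assumptions, not taken from ML.py.

```python
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split

# Assumption: the label column is "Class" (1 = fraud, 0 = legitimate).
LABEL_COL = "Class"


def split_dataset(df: pd.DataFrame, label_col: str = LABEL_COL,
                  test_size: float = 0.2, seed: int = 42):
    """Separate features from the label and make a stratified train/test split."""
    X = df.drop(columns=[label_col])
    y = df[label_col]
    # stratify=y keeps the small fraud ratio the same in both splits.
    return train_test_split(X, y, test_size=test_size,
                            stratify=y, random_state=seed)


if __name__ == "__main__":
    csv_path = Path("creditcard.csv")  # expected next to the script
    if csv_path.exists():
        X_train, X_test, y_train, y_test = split_dataset(pd.read_csv(csv_path))
        print("train/test sizes:", len(X_train), len(X_test))
        print("fraud rate train/test:", y_train.mean(), y_test.mean())
    else:
        print(f"{csv_path} not found; place it in the working directory.")
```

Stratifying matters here: with a plain random split, the few fraudulent rows could end up unevenly distributed between training and test data.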
The following machine learning models were implemented and evaluated:
- Logistic Regression
- Decision Tree
- Random Forest
- AdaBoost
- Gradient Boosting
- Linear Support Vector Machine (SVM)
- Support Vector Machine (SVM) with polynomial and RBF kernels
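One way to instantiate the models listed above with scikit-learn, as a sketch: the hypothetical helper `build_models` uses default hyperparameters and places a `StandardScaler` in front of the linear and kernel models. The scaling step and the settings are assumptions, not necessarily what ML.py does.

```python
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier


def build_models(seed: int = 42) -> dict:
    """Return one unfitted estimator per model in the list above (default settings)."""
    # Linear and kernel models are sensitive to feature scale, so they get a
    # StandardScaler first; tree-based models do not need one.
    return {
        "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
        "Gradient Boosting": GradientBoostingClassifier(random_state=seed),
        "Linear SVM": make_pipeline(StandardScaler(), LinearSVC(random_state=seed)),
        "SVM (polynomial kernel)": make_pipeline(StandardScaler(), SVC(kernel="poly")),
        "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    }
```

Every entry exposes the same `fit`/`predict` interface, so training and evaluation can be a single loop over `build_models().items()`.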
Precision, recall, F1-score, and accuracy were computed for each model. Performance varies across models, so the best choice depends on business requirements, such as the relative cost of a missed fraud versus a false alarm.
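The sketch below shows how these four metrics can be computed with `sklearn.metrics`, treating fraud (label 1) as the positive class. The helper `evaluate` and the toy labels are hypothetical; the second call illustrates why accuracy alone is not enough when fraud is rare.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def evaluate(y_true, y_pred) -> dict:
    """Precision, recall and F1 for the fraud class (label 1), plus overall accuracy."""
    return {
        "precision": precision_score(y_true, y_pred, pos_label=1, zero_division=0),
        "recall": recall_score(y_true, y_pred, pos_label=1, zero_division=0),
        "f1": f1_score(y_true, y_pred, pos_label=1, zero_division=0),
        "accuracy": accuracy_score(y_true, y_pred),
    }


# Toy labels: 10 transactions, 2 of them fraudulent.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# One fraud caught, one missed, one false alarm:
# precision, recall and F1 are all 0.5; accuracy is 0.8.
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
print(evaluate(y_true, y_pred))

# Never flagging fraud also gives 0.8 accuracy, but fraud recall drops to 0.
print(evaluate(y_true, [0] * 10))
```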
Possible future improvements:
- Hyperparameter tuning for each model.
- Feature engineering to enhance model performance.
- Exploration of other algorithms and ensemble methods.
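As a sketch of the first item, hyperparameter tuning could use scikit-learn's `GridSearchCV` with stratified cross-validation and F1 as the selection metric, shown here for the random forest. The grid values, the helper name `tune_random_forest`, and the synthetic stand-in data are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Illustrative grid only; these values are assumptions, not settings from ML.py.
PARAM_GRID = {
    "n_estimators": [50, 100],
    "max_depth": [None, 8],
    "class_weight": [None, "balanced"],
}


def tune_random_forest(X, y, seed: int = 42) -> GridSearchCV:
    """Grid-search a random forest, selecting by F1 on the positive (fraud) class."""
    # Stratified folds keep a similar fraud ratio in every fold.
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        PARAM_GRID,
        scoring="f1",  # accuracy would reward always predicting "legitimate"
        cv=cv,
    )
    return search.fit(X, y)  # fit returns the fitted search object


if __name__ == "__main__":
    # Synthetic stand-in for creditcard.csv with about 5% positives.
    X, y = make_classification(n_samples=600, n_features=10,
                               weights=[0.95], random_state=0)
    search = tune_random_forest(X, y)
    print(search.best_params_, round(search.best_score_, 3))
```

Including `class_weight` in the grid lets the search decide whether reweighting the rare class helps, which ties tuning to the imbalance issue noted below.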
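Notes:
- The dataset is imbalanced (fraudulent transactions are a small minority), which affects model performance and makes accuracy alone a misleading metric.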
- Consider the trade-off between precision and recall based on business needs.
- Experiment with different approaches to handle class imbalance.
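A sketch of two of these ideas, under stated assumptions: random oversampling of the minority class (a plain NumPy version of what imbalanced-learn's `RandomOverSampler` does; SMOTE is another option in that library) applied to the training data only, and picking a decision threshold that meets a target fraud recall, which makes the precision/recall trade-off explicit. The helper names, the 0.9 recall target, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split


def random_oversample(X, y, seed: int = 0):
    """Duplicate randomly chosen minority-class rows until both classes have equal counts.

    Assumes a binary label. Use on the training split only, so the evaluation
    data keeps its real class ratio.
    """
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    minority_idx = np.flatnonzero(y == classes[np.argmin(counts)])
    extra = rng.choice(minority_idx, size=counts.max() - counts.min(), replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    return X[keep], y[keep]


def threshold_for_recall(y_true, scores, min_recall: float) -> float:
    """Return the highest score threshold whose fraud recall is still >= min_recall."""
    _, recall, thresholds = precision_recall_curve(y_true, scores)
    # recall[i] belongs to thresholds[i]; the last recall value (0) has no threshold.
    # Recall never increases as the threshold rises, so the last match is the highest.
    ok = np.flatnonzero(recall[:-1] >= min_recall)
    return float(thresholds[ok[-1]])


if __name__ == "__main__":
    # Synthetic stand-in for the fraud data with roughly 3% positives.
    X, y = make_classification(n_samples=3000, n_features=8,
                               weights=[0.97], random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3,
                                                stratify=y, random_state=0)
    X_bal, y_bal = random_oversample(X_tr, y_tr)
    model = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
    scores = model.predict_proba(X_val)[:, 1]
    t = threshold_for_recall(y_val, scores, min_recall=0.9)
    print(f"Flag as fraud when score >= {t:.3f} (recall >= 0.9 on the validation split)")
```

The threshold is chosen on a validation split rather than the final test set, so the reported test metrics stay unbiased.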