This project is part of the Machine Learning Course provided by the Upskill Income Sharing Agreement program with Intelligent Machines. It uses the dataset from a completed competition in which different machine learning models were benchmarked. The data consists of real-world e-commerce transactions from Vesta and contains a wide range of features, from device type to product features. Competitors were asked to develop a machine learning model that predicts whether a transaction is fraudulent. The project aims to improve the efficacy of fraudulent transaction alerts for millions of people around the world, helping hundreds of thousands of businesses reduce their fraud loss and increase their revenue.
The main challenge of this project is the very large number of features. It is difficult to remove unnecessary features without knowing which factors to consider when choosing them. Training machine learning models on all of these features wastes a lot of time and is unlikely to yield a better score. The work should therefore start with data exploration, data cleaning, handling null values, and feature engineering.
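As a rough illustration of the cleaning step described above, the sketch below drops columns that are mostly null or that hold a single distinct value. It is not taken from the project notebook: the helper name `drop_sparse_and_constant`, the `max_null_frac` threshold, and the toy data frame are all assumptions made for this example.

```python
import numpy as np
import pandas as pd


def drop_sparse_and_constant(df, max_null_frac=0.9):
    """Drop columns that are mostly null or hold at most one distinct value.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    # Fraction of missing values per column.
    null_frac = df.isnull().mean()
    mostly_null = null_frac[null_frac > max_null_frac].index.tolist()

    # Columns with at most one distinct non-null value carry no signal.
    n_unique = df.nunique(dropna=True)
    constant = n_unique[n_unique <= 1].index.tolist()

    to_drop = sorted(set(mostly_null) | set(constant))
    return df.drop(columns=to_drop), to_drop


# Toy data standing in for the real transaction table (values are made up).
toy = pd.DataFrame({
    "TransactionAmt": [10.0, 25.5, 7.0, 300.0],
    "mostly_missing": [np.nan, np.nan, np.nan, 1.0],
    "constant_flag": [1, 1, 1, 1],
    "DeviceType": ["mobile", None, "desktop", "mobile"],
})

reduced, dropped = drop_sparse_and_constant(toy, max_null_frac=0.7)
print("Dropped columns:", dropped)
print("Remaining columns:", list(reduced.columns))
```

A filter like this is only a first pass. Which threshold is appropriate depends on the data, and a column with many nulls can still be predictive if the missingness itself correlates with fraud.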
Programming language: Python
Libraries: NumPy, Pandas, Matplotlib, Seaborn, scikit-learn, XGBoost (XGBClassifier)
Environment: Kaggle Notebook
- Go to nbviewer to view the Jupyter notebook if it fails to open on GitHub
- Copy and paste the URL of the .ipynb file into the nbviewer input field: https://github.com/Abrar2652/IEEE-CIS-Fraud-Detection-Project/blob/main/ieee-cis-fraud-detection.ipynb
If you have difficulty running the model on your local machine or in a Google Colab notebook, check whether the kernel is running on CPU or GPU. If it is running on CPU, change the runtime to GPU. I ran this notebook on a machine with 4 GB RAM and a 2.4 GHz Intel(R) Core(TM) i3 CPU and faced many difficulties, including sudden shutdowns due to overheating and running out of resources. The Kaggle environment worked well for me.
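If memory is the limiting resource, one common mitigation is to downcast numeric columns to smaller dtypes before training. The sketch below is an assumption about how that could be done and is not part of the project notebook; `downcast_numeric` is a hypothetical helper name, and the toy frame is made up.

```python
import numpy as np
import pandas as pd


def downcast_numeric(df):
    """Return a copy of df with numeric columns stored in smaller dtypes.

    Hypothetical helper for illustration. Float columns may lose precision
    when stored as float32, so check that this is acceptable for your data.
    """
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            # Pick the smallest integer dtype that can hold the values.
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            # float32 is the smallest float dtype pandas downcasts to.
            out[col] = pd.to_numeric(out[col], downcast="float")
    return out


# Toy frame with default 64-bit dtypes (values are made up).
frame = pd.DataFrame({
    "count": np.array([1, 2, 3], dtype=np.int64),
    "amount": np.array([0.5, 1.5, 2.5], dtype=np.float64),
    "device": ["mobile", "desktop", "mobile"],
})

small = downcast_numeric(frame)
print(small.dtypes)
print("Bytes before:", frame.memory_usage(deep=True).sum())
print("Bytes after: ", small.memory_usage(deep=True).sum())
```

Non-numeric columns are left unchanged here. On a table with hundreds of numeric features the savings can be substantial, but the exact reduction depends on the value ranges in each column.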
Md. Abrar Jahin
This project is licensed under the Apache License 2.0. See the LICENSE.md file for details.
StackOverflow, Towards Data Science articles, Data Exploration and Feature Engineering Techniques of Kaggle Grandmasters, DataCamp