Phishing attacks are used to steal a user's confidential information. Fraudulent websites imitate genuine ones by copying their logos and graphics. This project aims to detect fraudulent (phishing) websites using machine learning techniques.
The dataset is downloaded from the UCI Machine Learning Repository. It is a collection of website URLs for 11000 websites. Each sample has 30 website parameters and a class label identifying it as a phishing website or not (1 or -1).
Before starting the ML model training, the data is split 60-40, i.e., 6600 training samples and 4400 testing samples. Because every sample carries a class label, this is a supervised machine learning task.
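The 60-40 split described above can be sketched with scikit-learn's `train_test_split`. This is only an illustration under stated assumptions: the arrays are random stand-ins shaped like the dataset (11000 samples, 30 parameters, labels 1 or -1), and the `random_state` and `stratify` settings are choices made for this sketch, not details taken from the project.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption): 11000 samples with
# 30 parameters each, and class labels of 1 or -1.
rng = np.random.default_rng(0)
X = rng.choice([-1, 0, 1], size=(11000, 30))
y = rng.choice([-1, 1], size=11000)

# Hold out 40% of the samples for testing. Stratifying on y keeps the class
# ratio similar in both parts (an assumption, not confirmed by the project).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```

With 11000 samples, a test fraction of 0.4 yields the 6600 training and 4400 testing samples mentioned above.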
More specifically, it is a classification problem: each input URL is classified as phishing (1) or legitimate (-1). The supervised classification models trained on the dataset in this project are:
- Logistic Regression
- K-Nearest Neighbour
- Decision Tree Classifier
- Random Forest
- Support Vector Machine
- AdaBoost
- XGBoost
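One way to compare classifiers like these is to fit each on the training split and score it on the held-out split. The sketch below is a minimal illustration, not the project's actual code. It assumes synthetic data from `make_classification`, default or near-default hyperparameters, and scikit-learn estimators only. It adds a Random Forest because the reported results mention one. XGBoost lives in the separate `xgboost` package and is left out so the example depends only on scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Small synthetic binary dataset with 30 features (assumption: a stand-in
# for the real phishing data, kept small so the sketch runs quickly).
X, y = make_classification(
    n_samples=1000, n_features=30, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# Hyperparameters here are illustrative, not the project's settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbour": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Support Vector Machine": SVC(),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

# Fit every model on the training split and record test-set accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Print the models from most to least accurate.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

The accuracies printed on synthetic data say nothing about the real dataset. The loop only shows how a highest and lowest score can be read off a single comparison.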
Of all the models we developed:
- Highest accuracy score - 96.5%, using the Random Forest method
- Lowest accuracy score - 91.4%, using the AdaBoost method