Phishing-Website-Detection-using-HTTP-headers-and-extra-features

This project develops a machine learning model that detects phishing websites from HTTP headers and additional features. Phishing websites are malicious sites designed to mimic legitimate ones in order to steal user data such as passwords or credit card information, so detecting them is crucial for online security.

Data

The code utilizes various features extracted from website headers and additional data to identify phishing attempts. Here's a breakdown of the features:

HTTP Headers: These headers contain information about the website's server, security settings, and content type. Examples include Content-Security-Policy, Strict-Transport-Security, and X-Frame-Options.

Extra Features: The code also considers website traffic, URL length, the presence of anchor tags, and whether the page redirects. These features can provide additional clues about a website's legitimacy.

Target: The target variable is named 'label' and marks each website as phishing (-1) or legitimate (1).
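As a rough illustration of how such features might be encoded, the sketch below turns a URL and its response headers into a flat row of numeric features. The function name `extract_features`, the header list, and the feature names are assumptions made for this example; they are not taken from the project's code.

```python
# Hypothetical feature encoding; the header list and feature names are
# illustrative assumptions, not the project's actual feature set.
SECURITY_HEADERS = (
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Frame-Options",
)


def extract_features(url, headers):
    """Encode header presence and URL length as a dict of numeric features."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    present = {name.lower() for name in headers}
    features = {
        "has_" + name.lower().replace("-", "_"): int(name.lower() in present)
        for name in SECURITY_HEADERS
    }
    features["url_length"] = len(url)
    return features


row = extract_features(
    "http://example.com/login",
    {"content-security-policy": "default-src 'self'", "Server": "nginx"},
)
print(row)
```

Each header becomes a 0/1 presence flag, which is a simple way to feed header information to a tree-based model.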

Method

The code uses a Random Forest classifier. This ensemble algorithm trains many decision trees, each on a bootstrap sample of the training data with a random subset of features considered at each split, and combines their predictions by majority vote.
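A minimal sketch of training such a classifier with scikit-learn is shown here. It assumes scikit-learn is installed and uses synthetic data in place of the real header features; the hyperparameters (`n_estimators=100`, a 25% test split) are placeholder choices, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real project uses header and URL features.
X, y01 = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
y = 2 * y01 - 1  # map {0, 1} to the document's {-1, 1} label convention

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Each of the 100 trees is fit on a bootstrap sample of the training data.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

test_accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.3f}")
```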

For evaluation, the code uses several metrics:

  1. Accuracy: Proportion of all websites (phishing and legitimate) that are classified correctly.
  2. F1 Score: Harmonic mean of precision and recall, which balances the two.
  3. Precision: Proportion of websites predicted as phishing that are actually phishing.
  4. Recall: Proportion of actual phishing websites that are correctly identified.
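The four metrics above can be computed directly from the prediction counts. The sketch below is a plain-Python illustration that treats phishing (-1) as the positive class, following the definitions in the list; the helper name `binary_metrics` and the toy labels are made up for the example.

```python
def binary_metrics(y_true, y_pred, positive=-1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels: 3 phishing (-1) and 5 legitimate (1) websites.
y_true = [-1, -1, -1, 1, 1, 1, 1, 1]
y_pred = [-1, -1, 1, -1, 1, 1, 1, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case two of the three phishing sites are caught and one legitimate site is flagged, so precision and recall are both 2/3 while accuracy is 6/8.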

The code performs stratified K-fold cross-validation to assess the model's generalizability. This technique splits the data into K folds that each preserve the overall ratio of phishing to legitimate samples. The model is then trained K times, each time on K-1 folds and evaluated on the one held-out fold, so every sample is used for evaluation exactly once.
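To make the splitting step concrete, here is a simplified plain-Python sketch of stratified fold assignment. It deals each class's samples round-robin into K folds and does not shuffle, so it is an illustration of the idea rather than a replacement for a library implementation; the function names are hypothetical.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign sample indices to k folds with similar class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal each class's indices round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def train_test_splits(labels, k):
    """Yield (train_indices, test_indices), holding out each fold once."""
    folds = stratified_folds(labels, k)
    for held_out in range(k):
        test_idx = sorted(folds[held_out])
        train_idx = sorted(
            i for f, fold in enumerate(folds) if f != held_out for i in fold
        )
        yield train_idx, test_idx


# Toy labels: 6 phishing (-1) and 9 legitimate (1) samples, split 3 ways.
labels = [-1] * 6 + [1] * 9
splits = list(train_test_splits(labels, k=3))
for train_idx, test_idx in splits:
    phishing_in_test = [labels[i] for i in test_idx].count(-1)
    print(len(train_idx), len(test_idx), phishing_in_test)
```

Every held-out fold here contains 2 phishing and 3 legitimate samples, matching the 6:9 ratio of the full toy set.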

Results

Performance on Dataset A:

              precision    recall  f1-score   support

          -1       0.99      0.99      0.99      8992
           1       0.99      0.99      0.99      9060

    accuracy                           0.99     18052
   macro avg       0.99      0.99      0.99     18052
weighted avg       0.99      0.99      0.99     18052

Performance on Dataset B:

              precision    recall  f1-score   support

          -1       1.00      1.00      1.00      9785

    accuracy                           1.00      9785
   macro avg       1.00      1.00      1.00      9785
weighted avg       1.00      1.00      1.00      9785

The code reports the model's performance on both the training and testing sets from Dataset A. It then evaluates the model's effectiveness on unseen data from Dataset B.

Training and Testing Performance: The code calculates Accuracy, F1 Score, Precision, and Recall for both the training and testing datasets. Comparing the two shows how well the model learns from the training data and how well it generalizes to unseen data.

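Unseen Data Performance: The model is applied to Dataset B, which it did not encounter during training. As the support column shows, all 9785 samples in Dataset B are phishing (-1), so this result measures how many phishing websites the model catches on new data with potentially different characteristics. It does not measure false positives on legitimate websites.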
