similarity-detection-tool's Introduction

Similarity Detection

Description

This project provides a tool for detecting phishing websites. It takes a domain as input and searches for existing domains that look similar to it. The candidate domains are generated with typosquatting techniques. If any of these candidates exist, the second component, the Website Comparison Tool, compares the target website with each of them. Finally, the tool reports a similarity score that indicates how suspicious each compared website is. Companies can use the tool to check for malicious websites that try to impersonate them.

Installation

  • clone the repository
  • run virtualenv -p python3 similarity-detection-tool to create the virtual environment
  • run cd similarity-detection-tool && source bin/activate to activate the virtualenv
  • run chmod +x requirements/requirements.sh && sudo requirements/requirements.sh to install the required packages
  • run pip3 install -r requirements/requirements.txt to install the required Python modules

Usage

Command

python3 main.py <domain> <typos>

Note that the input must be a bare domain name, not a whole URL: use google.com instead of https://google.com.
To generate as many typo domains as possible, set <typos> to -2.
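Since the tool expects a bare domain, a URL can be reduced to its host name before it is passed in. The following is a minimal sketch of that step; the helper name to_domain is hypothetical and not part of the tool.

```python
from urllib.parse import urlsplit


def to_domain(value: str) -> str:
    """Return the bare host name of a URL or domain string (hypothetical helper)."""
    value = value.strip()
    # urlsplit only recognizes the host part when a "//" separator is present,
    # so add one for inputs that are already bare domains.
    if "//" not in value:
        value = "//" + value
    host = urlsplit(value).hostname or ""
    return host.lower()


print(to_domain("https://google.com"))  # google.com
print(to_domain("google.com"))          # google.com
```

The result could then be used as the <domain> argument of main.py.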

Output

By default, the tool logs the outcome of the comparison in the logs directory, in a subdirectory named after the domain (e.g. logs/google.com). With logging enabled, it also creates a results.txt containing all tested URLs and their similarity scores.

Components

Typosquatting

Typosquatting is a simple attack based on the idea that a victim makes a mistake while typing a URL. Many mistakes are possible, such as omitting a letter of the URL or simply misspelling a word.
This project focuses on mistyped URLs and on different Top-Level Domains (TLDs), e.g. when you try to enter google.com but type hoogle.com, or type example.com instead of example.de.
It also covers similar-looking URLs, e.g. google.com and gocgle.com.
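The substitutions described above can be sketched roughly as follows. This is an illustration only: the lookup tables are tiny, hypothetical excerpts (a QWERTY layout is assumed), the function name generate_candidates is invented here, and the tool's real tables and generation logic may differ.

```python
# Hypothetical, deliberately tiny lookup tables (QWERTY layout assumed).
KEYBOARD_NEIGHBORS = {"g": "fhtyvb", "o": "ipkl"}
LOOKALIKES = {"o": "c", "v": "u"}
TLDS = ["com", "de"]


def generate_candidates(domain: str) -> set:
    """Return typo variants of a domain of the form name.tld."""
    name, _, tld = domain.rpartition(".")
    candidates = set()
    # Replace one character at a time with a neighboring key or a look-alike.
    for table in (KEYBOARD_NEIGHBORS, LOOKALIKES):
        for i, ch in enumerate(name):
            for sub in table.get(ch, ""):
                candidates.add(f"{name[:i]}{sub}{name[i + 1:]}.{tld}")
    # Keep the name and swap the Top-Level Domain.
    for other in TLDS:
        if other != tld:
            candidates.add(f"{name}.{other}")
    candidates.discard(domain)
    return candidates


candidates = generate_candidates("google.com")
print("hoogle.com" in candidates)  # True (keyboard neighbor of g)
print("gocgle.com" in candidates)  # True (o looks like c)
print("google.de" in candidates)   # True (different TLD)
```

Each candidate differs from the input in exactly one character or in its TLD, which matches the examples given above.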

Features

  • Generating new domains based on the following criteria
    • Mistyping on the keyboard, using all letters adjacent to the intended key
    • Similar-looking letters (e.g. o and c or v and u)
    • Similar-sounding letters (e.g. y and i)
    • Different Top-Level Domains
  • Checking
    • Check whether the domain exists via a DNS lookup over IPv4
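The existence check could look roughly like the sketch below, which uses the standard library's socket.getaddrinfo restricted to IPv4 (socket.AF_INET). The function name resolves_ipv4 is hypothetical, and the tool itself may use a different resolver.

```python
import socket


def resolves_ipv4(domain: str) -> bool:
    """Return True if the name resolves to at least one IPv4 address."""
    try:
        # family=AF_INET restricts the lookup to IPv4 (A records).
        socket.getaddrinfo(domain, None, family=socket.AF_INET)
        return True
    except socket.gaierror:
        # Raised when the name cannot be resolved.
        return False
```

A generated candidate such as hoogle.com would only be passed on to the Website Comparison Tool if resolves_ipv4 returns True for it.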

Website Comparison Tool

Given two URLs, the tool compares different features of the two websites. For each feature it calculates a similarity percentage, which is then used to set a score for that feature. The sum of all feature scores is the final similarity score, which states how suspicious a website looks.
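How percentages turn into a final score can be illustrated with a small sketch. The mapping from percentage to score is not documented here, so the linear scaling and the per-feature maximum scores below are assumptions made purely for illustration.

```python
# Hypothetical maximum score per feature; the tool's actual values are unknown.
MAX_SCORE = {"content": 3, "domain": 2, "links": 2, "image_urls": 1, "images": 2}


def feature_score(feature: str, percentage: float) -> float:
    """Scale a similarity percentage (0 to 100) to the feature's maximum score."""
    return MAX_SCORE[feature] * percentage / 100


def total_score(percentages: dict) -> float:
    """Sum the scores of all features to get the final similarity score."""
    return sum(feature_score(f, p) for f, p in percentages.items())


example = {"content": 50, "domain": 100, "links": 0, "image_urls": 0, "images": 100}
print(total_score(example))  # 5.5
```

Under these assumed weights, a higher total means the compared website resembles the target more closely and is therefore more suspicious.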

Features

  • Content
    • remove HTML markup
    • Similarity Percentage: line overlap between both websites
  • Domain
    • remove common parts like [.de, .com, http, https, etc.]
    • Similarity Percentage: word overlap between both domains
  • Links
    • collect all links in both websites (HTML attribute: href)
    • loop through all of them to compare every link of one website with every link of the other
    • Similarity Percentage: average word overlap
  • Image-URLs
    • collect all image links in both websites (HTML attribute: src)
    • loop through all of them to compare every link of one website with every link of the other
    • Similarity Percentage: average word overlap
  • Images
    • create a screenshot of both websites and compare them
    • use different metrics for the comparison (MSE, SSIM, SIM)
      • MSE: compare each pixel of one image to the corresponding pixel of the other image
      • SSIM: compare the images over local windows of pixels (a larger kernel) rather than pixel by pixel as MSE does
      • SIM: take the difference of both images in embedding space
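The overlap-based percentages and the MSE metric listed above can be sketched as follows. The exact formulas the tool uses are not stated, so the overlap is assumed here to be the share of distinct items common to both sides (a Jaccard index), and all function names are hypothetical. Images are represented as plain lists of grayscale pixel rows to keep the sketch dependency-free.

```python
import re
from itertools import product


def overlap_percentage(a, b) -> float:
    """Percentage of distinct items shared by two collections (assumed formula)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 100.0
    return 100.0 * len(set_a & set_b) / len(set_a | set_b)


def words(text: str) -> list:
    """Split a URL or domain into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def content_similarity(text_a: str, text_b: str) -> float:
    """Line overlap of two texts whose HTML markup was already removed."""
    lines_a = [line.strip() for line in text_a.splitlines() if line.strip()]
    lines_b = [line.strip() for line in text_b.splitlines() if line.strip()]
    return overlap_percentage(lines_a, lines_b)


def average_pairwise_similarity(urls_a, urls_b) -> float:
    """Average word overlap over every pair of links from the two websites."""
    pairs = list(product(urls_a, urls_b))
    if not pairs:
        return 0.0
    return sum(overlap_percentage(words(a), words(b)) for a, b in pairs) / len(pairs)


def mse(img_a, img_b) -> float:
    """Mean squared error of two equally sized grayscale images."""
    diffs = [
        (pa - pb) ** 2
        for row_a, row_b in zip(img_a, img_b)
        for pa, pb in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


print(average_pairwise_similarity(["/about"], ["/about", "/contact"]))  # 50.0
print(mse([[0, 0], [0, 0]], [[0, 2], [0, 0]]))  # 1.0
```

Note that for MSE a lower value means more similar images, whereas for the overlap percentages a higher value means more similar websites.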

Contributors

pilladian, sashquash
