shashankag14 / similar-images-remover

A Python tool to identify and remove similar-looking images from a dataset. Utilizes image preprocessing and hashing techniques for efficient comparison.

Similar Images Remover Tool

Description

This program finds and removes similar-looking images from a folder containing a dataset of images collected from cameras. The goal is to optimize the dataset by removing duplicate or near-duplicate images that differ only in minor ways and are non-essential for data collection and object recognition tasks.

The program uses provided functions for image comparison from imaging.py to assess the similarity between images. Specifically, it leverages the preprocess_image_change_detection function to preprocess the images and the compare_frames_change_detection function to compute a similarity score between a pair of images.
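
As a rough illustration of the compare-and-filter idea described above, the sketch below scores a pair of images by thresholded frame differencing and then keeps an image only if it is not too similar to any image already kept. It is not the code in imaging.py: the names similarity_score and select_unique, the use of plain nested lists as grayscale images, and the definition of the score as one minus the fraction of changed pixels are all assumptions made for this example. The actual tool works on OpenCV images and relies on contour analysis, which is not reproduced here.

```python
from typing import List, Sequence

Image = List[List[int]]  # grayscale image as rows of 0-255 pixel values


def similarity_score(a: Image, b: Image, change_thresh: int = 25) -> float:
    """Return 1.0 minus the fraction of pixels whose difference exceeds change_thresh.

    Hypothetical stand-in for a change-detection score; not the repository's function.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    total = sum(len(row) for row in a)
    if total == 0:
        raise ValueError("images must not be empty")
    changed = sum(
        1
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
        if abs(pa - pb) > change_thresh
    )
    return 1.0 - changed / total


def select_unique(images: Sequence[Image], threshold: float = 0.85) -> List[int]:
    """Greedy pass: keep an image only if it scores below threshold against every kept image."""
    kept: List[int] = []
    for idx, img in enumerate(images):
        if all(similarity_score(img, images[k]) < threshold for k in kept):
            kept.append(idx)
    return kept


if __name__ == "__main__":
    base = [[10, 10], [10, 10]]
    near = [[12, 10], [10, 11]]      # tiny differences, all below change_thresh
    other = [[200, 200], [200, 10]]  # three of four pixels change
    print(select_unique([base, near, other]))  # prints [0, 2]
```

In this sketch a lower threshold flags more pairs as similar, which matches the documented behaviour that lower --threshold values result in stricter removal.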

Installation

To run the program, follow these steps:

  1. Clone this GitHub repository to your local machine.
  2. Ensure you have Python installed (version 3.6 or later).
  3. Install the required libraries by running: pip install -r requirements.txt.

Usage

  1. Unzip your dataset (e.g., dataset.zip) into a folder.
  2. Execute the program by running the following command in the terminal:

python similar_images_remover.py --folder_path "/path/to/dataset_folder"

Replace "/path/to/dataset_folder" with the path to the folder containing your dataset.

Hyperparameters

| Hyperparameter | Description | Default Value |
|---|---|---|
| --threshold | Adjusts the similarity threshold for image removal. (Lower values result in stricter removal.) | 0.85 |
| --min_contour_area | Minimum contour area for image comparison. (Lower values result in stricter removal.) | 500 |
| --gaussian_blur_radius | A list of Gaussian blur radii for image preprocessing to remove high frequency features. | ["None"] |
| --black_mask | Percentage values (left, top, right, bottom) for the black mask applied to image borders. | (0, 15, 0, 0) |
| --frame_change_thresh | Threshold to convert grayscale images into binary. | 25 |
| --resize_shape | Size to reshape the images. | (200,200) |
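
For orientation, here is a minimal sketch of an argparse parser that would accept the flags and defaults listed above. The flag names and default values come from the table; the argument types, the use of nargs for the list and tuple values, and the build_parser helper are assumptions, and the repository's own parser may read these values differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented hyperparameters (types are assumed)."""
    parser = argparse.ArgumentParser(
        description="Remove similar-looking images from a dataset folder."
    )
    parser.add_argument("--folder_path", required=True,
                        help="Folder containing the dataset.")
    parser.add_argument("--threshold", type=float, default=0.85,
                        help="Similarity threshold for image removal.")
    parser.add_argument("--min_contour_area", type=int, default=500,
                        help="Minimum contour area for image comparison.")
    parser.add_argument("--gaussian_blur_radius", nargs="*", default=["None"],
                        help="Gaussian blur radii used in preprocessing.")
    parser.add_argument("--black_mask", nargs=4, type=int, default=(0, 15, 0, 0),
                        metavar=("LEFT", "TOP", "RIGHT", "BOTTOM"),
                        help="Border mask sizes as percentages.")
    parser.add_argument("--frame_change_thresh", type=int, default=25,
                        help="Threshold to convert grayscale images into binary.")
    parser.add_argument("--resize_shape", nargs=2, type=int, default=(200, 200),
                        help="Size to reshape the images.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--folder_path", "/path/to/dataset_folder", "--threshold", "0.9"]
    )
    print(args.threshold, args.black_mask)  # prints 0.9 (0, 15, 0, 0)
```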

Output

The script saves the detected similar images, along with their similarity scores, to a text file for further analysis. Similar images are not deleted outright: they are moved out of the dataset folder into the removed_images directory.
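
The output step could look roughly like the sketch below, which moves each flagged image into a removed_images directory and writes one log line per moved file. The move_and_log helper, the log file name similar_images.txt, the tab-separated line format, and placing removed_images inside the dataset folder are all assumptions made for illustration; only the directory name removed_images comes from this README.

```python
import shutil
from pathlib import Path
from typing import Iterable, Tuple


def move_and_log(
    folder: Path,
    similar: Iterable[Tuple[str, str, float]],
    removed_dir_name: str = "removed_images",
    log_name: str = "similar_images.txt",  # hypothetical file name
) -> Path:
    """Move flagged images into removed_dir_name and log why each was flagged.

    similar holds (removed_file, kept_file, score) triples, with file names
    relative to folder. Returns the path of the log file.
    """
    removed_dir = folder / removed_dir_name
    removed_dir.mkdir(exist_ok=True)
    log_path = folder / log_name
    with log_path.open("w", encoding="utf-8") as log:
        for removed_name, kept_name, score in similar:
            source = folder / removed_name
            if not source.exists():  # already moved because of an earlier pair
                continue
            shutil.move(str(source), str(removed_dir / removed_name))
            log.write(f"{removed_name}\t{kept_name}\t{score:.3f}\n")
    return log_path
```

Moving files instead of deleting them keeps the operation reversible: a wrongly flagged image can be copied back from removed_images after checking the log.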

Features

  • Automated Similarity Detection: The tool employs classical image comparison techniques to automatically detect and identify similar-looking images within the dataset. It uses a combination of preprocessing and contour analysis to ensure accurate and reliable similarity detection.

  • Adjustable Similarity Threshold: Users have the flexibility to fine-tune the similarity threshold (--threshold) to control the strictness of image removal. This allows for customization based on the specific requirements of the dataset and object recognition tasks.

  • Visual Preprocessing Analysis: The tool provides built-in functionality to visualize the preprocessing steps applied to the images (using visualize_preprocess_image). Users can experiment with different Gaussian blur radii (--gaussian_blur_radius) and black mask percentages (--black_mask) to understand their impact on image similarity scores. The Gaussian blur radius list determines the radius of the blurring kernel used to smooth the images, while the black mask percentages control the size of the black mask applied to the image borders.

  • Smart Data Optimization: By removing duplicate or nearly identical images, the Similar Images Remover Tool optimizes the dataset. This reduces storage requirements and training times, and can improve model generalization.
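
To make the black mask percentages concrete, the sketch below blanks border strips of a grayscale image given (left, top, right, bottom) percentages. It is only an illustration: apply_black_mask is a hypothetical helper, the image is a plain nested list, not an OpenCV array, and rounding the strip size down to whole pixels is an assumption. Gaussian blurring is not shown.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of 0-255 pixel values


def apply_black_mask(
    image: Image, mask: Tuple[int, int, int, int] = (0, 15, 0, 0)
) -> Image:
    """Return a copy of image with its border strips set to 0 (black).

    mask is (left, top, right, bottom), each a percentage of the image
    width (left, right) or height (top, bottom).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    left, top, right, bottom = mask
    # Convert percentages to pixel bounds of the region that stays visible.
    x0 = width * left // 100
    y0 = height * top // 100
    x1 = width - width * right // 100
    y1 = height - height * bottom // 100
    return [
        [pixel if (y0 <= y < y1 and x0 <= x < x1) else 0
         for x, pixel in enumerate(row)]
        for y, row in enumerate(image)
    ]
```

With the default (0, 15, 0, 0), a 20-row image has its top 3 rows blanked. This would suit camera frames where the top strip holds content that changes between otherwise identical frames, such as a timestamp overlay, although the README does not state the reason for the default.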
