
kwolekr / imgcmp

An image file sorter and fuzzy deduplicator that can work in real time as the user saves an image. Has a great implementation of a B+ tree database!

Languages: Shell 1.08%, C 97.45%, C++ 1.47%

imgcmp 1.0
==========

An image sorter and deduplicator that does fuzzy image matching via thumbnail pixel comparison, color histogram matching, or other methods.  It maintains an on-disk database for blazing fast matching, fast enough to make real-time deduplication possible as the user saves an image.

 - What's the problem?
That damn 4chan keeps filling up my directories with unorganized and occasionally duplicate images!  For wallpapers especially, this is pretty messy: it'd be nice to keep track of images by average color, size, and file type too.

 - What can be done to fix it/WTF does this program actually do?
When a file is being saved to a directory, either directly or through the Save Image In Folder plugin, imgcmp checks whether the same or a similar image already exists in that directory.  If so, it notifies the user that there are similar images, displays their filenames, and maybe shows the images somehow: via GTK (yuck, complicated), via the browser, or by relying on some other utility such as xv.  The user then makes a choice: cancel the save, or save in another directory.
There is also a batch deduplication feature for cleaning up entire directory structures that were created before imgcmp was in use.
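A minimal sketch of the batch pass, assuming each image has already been reduced to a 64-bit hash of its thumbnail. The `image_rec` struct and `report_duplicates` function are hypothetical names, not taken from imgcmp's source. Sorting by hash makes exact duplicates adjacent, so a single linear scan finds them; fuzzy near-duplicates would still need one of the comparison methods described below.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record: one per image found while walking the tree. */
struct image_rec {
    uint64_t hash;      /* hash of the image's thumbnail */
    const char *path;
};

/* qsort comparator: ascending by hash, without overflow from subtraction. */
static int by_hash(const void *a, const void *b)
{
    uint64_t x = ((const struct image_rec *)a)->hash;
    uint64_t y = ((const struct image_rec *)b)->hash;
    return (x > y) - (x < y);
}

/* Sort records so equal hashes become adjacent, then report each record
   that repeats the hash of the first record in its group. Pass out = NULL
   to only count. Returns the number of duplicate copies found. */
size_t report_duplicates(struct image_rec *recs, size_t n, FILE *out)
{
    size_t dups = 0;
    size_t first = 0;   /* index of the first record in the current group */

    qsort(recs, n, sizeof *recs, by_hash);
    for (size_t i = 1; i < n; i++) {
        if (recs[i].hash != recs[first].hash) {
            first = i;  /* start of a new group */
            continue;
        }
        if (out)
            fprintf(out, "%s duplicates %s\n", recs[i].path, recs[first].path);
        dups++;
    }
    return dups;
}
```

Calling `report_duplicates(recs, n, stdout)` prints one line per extra copy, naming the copy it duplicates.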

 - In summary:
	 - Compares a temporarily cached image against the others in a directory
	 - Maintains a cache of thumbs for each image
	 - Deduplicates all images in a directory
	 - TODO:  Communicate back with the browser for user interaction, if possible...

 - How does this program check for similarity?
	 - Create a 64x64 thumbnail with reduced color (according to some tolerance setting) and add it to a cache in the directory,
	   or in a specific location if one is given on the command line.
	   - Use a hash table for exact comparisons (zero tolerance for mismatched pixels)
	   - A B+ tree is maintained for very fast lookups of the nearest neighbors in average color; this makes it possible
	     to do real-time deduplication
	   - A much slower but more sensitive "deep scan" is executed instead if the option is set
	 - Use OpenCV's histogram functionality to compare images.  This might be thrown off easily by color, but handles details
	   and non-continuous segments better.  It obviously creates an additional dependency and might not be any better than the thumbnail
	   method.  What if histogram matching were used ON the thumbnails?
	   - Might be able to use edge detection or other image matching techniques that OpenCV can provide
	 - Use ImageMagick to compare images? (icky, additional dependency)
	 - The most accurate method is probably to use pHash
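The thumbnail step from the list above could look roughly like this. It is a sketch under several assumptions: the image is already decoded into a packed 8-bit RGB buffer (imgcmp uses libgd for loading, which is not shown), both dimensions are at least 64 pixels, and "reduced color" means clearing the low bits of each channel. `make_thumbnail` and `drop_bits` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define THUMB_SIZE 64

/* Downscale a packed RGB image (3 bytes per pixel, row-major) to a
   THUMB_SIZE x THUMB_SIZE thumbnail by averaging each source block, then
   reduce color by clearing the low `drop_bits` bits (0..7) of every channel.
   Assumes width >= THUMB_SIZE and height >= THUMB_SIZE, so no block is empty. */
void make_thumbnail(const uint8_t *src, size_t width, size_t height,
                    unsigned drop_bits, uint8_t *thumb)
{
    uint8_t mask = (uint8_t)(0xFFu << drop_bits);

    for (size_t ty = 0; ty < THUMB_SIZE; ty++) {
        size_t y0 = ty * height / THUMB_SIZE;
        size_t y1 = (ty + 1) * height / THUMB_SIZE;
        for (size_t tx = 0; tx < THUMB_SIZE; tx++) {
            size_t x0 = tx * width / THUMB_SIZE;
            size_t x1 = (tx + 1) * width / THUMB_SIZE;
            unsigned long sum[3] = {0, 0, 0};
            size_t count = (y1 - y0) * (x1 - x0);

            /* Sum every channel over the source block. */
            for (size_t y = y0; y < y1; y++)
                for (size_t x = x0; x < x1; x++)
                    for (int c = 0; c < 3; c++)
                        sum[c] += src[(y * width + x) * 3 + c];

            /* Store the block average with the low bits masked off. */
            for (int c = 0; c < 3; c++)
                thumb[(ty * THUMB_SIZE + tx) * 3 + c] =
                    (uint8_t)((sum[c] / count) & mask);
        }
    }
}
```

A larger `drop_bits` makes slightly recolored or recompressed copies collapse onto the same thumbnail, at the cost of more false matches.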
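For the zero-tolerance case in the list above, a hash table keyed on the thumbnail bytes finds an exact match in expected constant time. This sketch uses 64-bit FNV-1a and linear probing; the hash function, the fixed `TABLE_SLOTS` capacity, and the names `thumb_hash` and `find_or_insert` are assumptions for illustration. A hash hit is confirmed with `memcmp`, so colliding thumbnails are never reported as duplicates.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define THUMB_BYTES (64 * 64 * 3)
#define TABLE_SLOTS 1024            /* hypothetical capacity, power of two */

/* 64-bit FNV-1a hash over the thumbnail bytes. */
uint64_t thumb_hash(const uint8_t *thumb)
{
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < THUMB_BYTES; i++) {
        h ^= thumb[i];
        h *= 1099511628211ULL;
    }
    return h;
}

struct entry {
    uint64_t hash;
    const uint8_t *thumb;           /* NULL marks an empty slot */
    const char *filename;
};

/* Return the filename of a previously stored, byte-identical thumbnail,
   or store this one and return NULL. Also returns NULL if the table is full. */
const char *find_or_insert(struct entry table[TABLE_SLOTS],
                           const uint8_t *thumb, const char *filename)
{
    uint64_t hash = thumb_hash(thumb);
    size_t i = (size_t)(hash & (TABLE_SLOTS - 1));

    for (size_t probes = 0; probes < TABLE_SLOTS; probes++) {
        if (table[i].thumb == NULL) {           /* empty slot: remember image */
            table[i].hash = hash;
            table[i].thumb = thumb;
            table[i].filename = filename;
            return NULL;
        }
        if (table[i].hash == hash &&
            memcmp(table[i].thumb, thumb, THUMB_BYTES) == 0)
            return table[i].filename;           /* zero mismatched pixels */
        i = (i + 1) & (TABLE_SLOTS - 1);        /* linear probing */
    }
    return NULL;
}
```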
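The B+ tree lookup in the list above amounts to a range query on a sorted key: locate the new image's average color, then collect the neighbors within some radius. This sketch swaps in a sorted array with binary search for the tree; a B+ tree answers the same query with one root-to-leaf descent plus a scan along the linked leaves, and additionally supports cheap on-disk inserts. It assumes the key is the mean of all channel values, which may not match imgcmp's actual key layout; `avg_color_key` and `neighbor_range` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar key: mean of all channel values of an RGB thumbnail. */
unsigned avg_color_key(const uint8_t *thumb, size_t pixels)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < pixels * 3; i++)
        sum += thumb[i];
    return (unsigned)(sum / (pixels * 3));
}

/* First index whose key is >= target; keys must be sorted ascending. */
static size_t lower_bound(const unsigned *keys, size_t n, unsigned target)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (keys[mid] < target)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Candidate range [*first, *last) of entries whose key lies within
   `radius` of `key`. Only these candidates get a full thumbnail comparison. */
void neighbor_range(const unsigned *keys, size_t n, unsigned key,
                    unsigned radius, size_t *first, size_t *last)
{
    unsigned low = key > radius ? key - radius : 0;
    *first = lower_bound(keys, n, low);
    *last = lower_bound(keys, n, key + radius + 1);
}
```

Because only the handful of entries in the returned range are compared pixel by pixel, the cost of a check grows with the number of similar-colored images rather than with the size of the whole directory.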
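The list above does not say what the "deep scan" compares, so this is one plausible reading: compare the new thumbnail against every cached thumbnail and count pixels that differ by more than a per-channel tolerance. With `tol = 0` and `max_mismatch = 0` it reduces to the zero-mismatched-pixel comparison from the list. `mismatched_pixels`, `is_similar`, and both thresholds are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Count pixels where any of R, G, B differs by more than `tol`. */
size_t mismatched_pixels(const uint8_t *a, const uint8_t *b,
                         size_t pixels, int tol)
{
    size_t mismatches = 0;
    for (size_t p = 0; p < pixels; p++) {
        for (int c = 0; c < 3; c++) {
            if (abs((int)a[p * 3 + c] - (int)b[p * 3 + c]) > tol) {
                mismatches++;
                break;  /* one bad channel is enough for this pixel */
            }
        }
    }
    return mismatches;
}

/* Two thumbnails count as duplicates when few enough pixels mismatch. */
int is_similar(const uint8_t *a, const uint8_t *b, size_t pixels,
               int tol, size_t max_mismatch)
{
    return mismatched_pixels(a, b, pixels, tol) <= max_mismatch;
}
```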
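The list above also asks what would happen if histogram matching were used on the thumbnails. This plain-C sketch stands in for OpenCV (whose `cv::compareHist` offers an intersection method): it builds a coarse joint RGB histogram per thumbnail and scores two thumbnails by histogram intersection. The bin count and the function names are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define BINS_PER_CHANNEL 4
#define HIST_BINS (BINS_PER_CHANNEL * BINS_PER_CHANNEL * BINS_PER_CHANNEL)

/* Joint RGB histogram: each channel quantized to BINS_PER_CHANNEL levels. */
void color_histogram(const uint8_t *thumb, size_t pixels,
                     unsigned hist[HIST_BINS])
{
    for (size_t i = 0; i < HIST_BINS; i++)
        hist[i] = 0;
    for (size_t p = 0; p < pixels; p++) {
        unsigned rq = thumb[p * 3] * BINS_PER_CHANNEL / 256;
        unsigned gq = thumb[p * 3 + 1] * BINS_PER_CHANNEL / 256;
        unsigned bq = thumb[p * 3 + 2] * BINS_PER_CHANNEL / 256;
        hist[(rq * BINS_PER_CHANNEL + gq) * BINS_PER_CHANNEL + bq]++;
    }
}

/* Histogram intersection normalized to [0, 1]; 1.0 means identical color
   distributions. Both histograms must have been built from `pixels` pixels. */
double histogram_intersection(const unsigned h1[HIST_BINS],
                              const unsigned h2[HIST_BINS], size_t pixels)
{
    unsigned long overlap = 0;
    for (size_t i = 0; i < HIST_BINS; i++)
        overlap += h1[i] < h2[i] ? h1[i] : h2[i];
    return (double)overlap / (double)pixels;
}
```

Because histograms discard pixel positions, a horizontally mirrored copy of an image scores exactly 1.0 against the original, whereas a pixel-by-pixel comparison would flag most of its pixels as mismatched.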

 - Notes:
There is no stored configuration for this utility; all parameters, such as the path, are passed on the command line.
The configuration is to be stored in the Mozilla plugin, which executes this utility with the appropriate command line.

 - Dependencies
	 - libgd for image loading and saving
	 - OpenCV
	 - Mozzarella Foxfire
	 - ImageMagick ?
	 - pHash ?