
macro_tagger's People

Contributors

nkwsy


macro_tagger's Issues

Make image_upload.py sequentially scan folders for new pictures

Have image_upload.py check MongoDB to see whether an image has already been uploaded.

I have two suggestions:

  1. Pull all the documents from each Mongo collection and compare the paths. For example, with folder 230701 and filename 0001.jpg, list all files by iterating over all folders, then run something like `if '0001.jpg' in listdir('230701'): return True`.

  2. Store a hash of each image in the DB, then compare a new image's hash against the stored hashes to see whether that image already exists.
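
The first suggestion could look roughly like the sketch below. It assumes the folders (such as 230701) sit directly under one root directory and that the paths already stored in MongoDB have been loaded into a set of `folder/filename` strings. The names `find_new_images`, `known_paths`, and `IMAGE_EXTENSIONS` are hypothetical, not part of image_upload.py.

```python
import os

# Assumption: these are the extensions worth uploading
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def find_new_images(root_dir, known_paths):
    """Return 'folder/filename' paths under root_dir that are not in known_paths."""
    new_images = []
    # Sorted order means the folders are scanned sequentially by name
    for folder in sorted(os.listdir(root_dir)):
        folder_path = os.path.join(root_dir, folder)
        if not os.path.isdir(folder_path):
            continue
        for filename in sorted(os.listdir(folder_path)):
            # Skip anything that is not an image file
            if os.path.splitext(filename)[1].lower() not in IMAGE_EXTENSIONS:
                continue
            relative_path = f"{folder}/{filename}"
            if relative_path not in known_paths:
                new_images.append(relative_path)
    return new_images
```

Comparing against a set keeps each lookup cheap, but this only detects new paths. A copied folder with a different name would still be treated as new.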

Examples

File hash: hashes the raw bytes of a file, regardless of whether it is an image. If the file changes in any way, including its metadata, the hash changes. This is the easiest approach.

```python
import hashlib

def calculate_hash(file_path):
    # Read the entire file as bytes and hash it with SHA-256
    with open(file_path, 'rb') as file:
        file_bytes = file.read()
    return hashlib.sha256(file_bytes).hexdigest()

file_path = '/path/to/your/image.jpg'  # replace with your file path
print(f'The SHA256 hash of the file is: {calculate_hash(file_path)}')
```

Standardized hash: separates the actual image from the metadata, standardizes it, then hashes it. The upside is that changing the metadata does not alter the hash. The downsides: you must use the same process every time you check a file's hash; if the standardization is too general, visually similar images can end up with the same hash; and it takes longer to execute, which could matter when checking many files.

```python
from PIL import Image
import io
import hashlib

def calculate_image_hash(image_path):
    # Open the image file
    with Image.open(image_path) as img:
        # Convert image to RGB and resize
        # (Image.ANTIALIAS was removed in Pillow 10; LANCZOS is the same filter)
        img = img.convert('RGB').resize((8, 8), Image.LANCZOS)
        # Save resized image to a BytesIO object to get rid of any original metadata
        with io.BytesIO() as temp_file:
            img.save(temp_file, format='JPEG')
            temp_file.seek(0)  # Go to the start of the BytesIO object
            # Calculate the hash on the bytes of the standard image
            image_hash = hashlib.sha256(temp_file.read()).hexdigest()
    return image_hash

file_path = '/path/to/your/image.jpg'  # replace with your file path
print(f'The SHA256 hash of the image content is: {calculate_image_hash(file_path)}')
```

With either of these approaches, I assume one would run the hash operation on the image, search the DB for that hash, and add the image to the DB only if the hash is not already there.
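
That check-then-insert flow might be sketched as follows. This assumes `collection` behaves like a PyMongo collection (only `find_one` and `insert_one` are used) and that documents carry `path` and `hash` fields. The field names and the `upload_if_new` helper are hypothetical, and the raw file hash is used here only because it is the simpler of the two.

```python
import hashlib

def calculate_hash(file_path):
    """SHA-256 of the raw file bytes, read in chunks to limit memory use."""
    sha256 = hashlib.sha256()
    with open(file_path, "rb") as file:
        for chunk in iter(lambda: file.read(65536), b""):
            sha256.update(chunk)
    return sha256.hexdigest()

def upload_if_new(collection, file_path):
    """Insert a document for file_path unless its hash is already stored.

    Returns True if a document was inserted, False if it was a duplicate.
    """
    file_hash = calculate_hash(file_path)
    # find_one returns None when no document matches the filter
    if collection.find_one({"hash": file_hash}) is not None:
        return False
    collection.insert_one({"path": file_path, "hash": file_hash})
    return True
```

If duplicates should also be rejected at the database level, a unique index on the hash field (for example PyMongo's `create_index("hash", unique=True)`) is one option.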

A combination of the two might be good: check whether new folders have been added, then check the image hash to be sure a folder is not just a copy of another one. It could also help if someone taking the pictures accidentally takes a duplicate photo.

Create Schema for sample collection and identifying macros

I suggest creating a standard schema for sample collection and identification.

Schema collections:

  • macro_image: image info
  • sample_box: info about where the sample was placed, when it was deployed, when it was collected, etc. Steph may fill this out ahead of time before deploying in the future
  • species: info about the species, e.g. scud. May be kept more general so we can use it with other projects, e.g. wildlife tracking
  • individual_species: references macro_image._id, which user identified it, and species._id
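
To make the relationships concrete, here is a rough sketch of one document per collection as plain Python dicts. Every field name and value other than the `_id` references named above is a hypothetical placeholder, and the string ids stand in for whatever ids MongoDB would assign.

```python
from datetime import datetime, timezone

# Hypothetical example documents; the field names are assumptions
sample_box = {
    "_id": "box-001",
    "location": "example site",
    "deployed_at": datetime(2023, 6, 1, tzinfo=timezone.utc),
    "collected_at": datetime(2023, 7, 1, tzinfo=timezone.utc),
}

macro_image = {
    "_id": "img-001",
    "path": "230701/0001.jpg",
}

species = {
    "_id": "sp-001",
    "common_name": "scud",
}

# One identification: links an image to a species and records who identified it
individual_species = {
    "_id": "ident-001",
    "macro_image_id": macro_image["_id"],
    "species_id": species["_id"],
    "identified_by": "user-001",
}
```

Keeping the identification in its own collection lets several users identify the same image without changing the image document.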
