find_iphone's Introduction

Find Phone

Find the normalized coordinates of a phone in a given image

Required Python Libraries (non-built-in)

  • NumPy
  • OpenCV (cv2)
  • Matplotlib
  • roipoly (in the repo)

Algorithm Overview

Given an image, classify each pixel as "Phone" or "Non-phone" using two Gaussian classifiers, and collect the classified pixels into a binary mask. Find the contours of the mask using OpenCV and fit a minimum-area rectangle to each contour. Among the rectangles whose area lies within a set range and whose height-to-width ratio is close to that of an iPhone (~1.77), select the one with the largest area. Return the center point of that rectangle as the predicted coordinates of the phone in the image.
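The pixel classification step can be sketched as follows. This is only an illustration under stated assumptions, not the code in the repo: it assumes each class is modeled by one multivariate Gaussian over the three color channels, that both classes have equal priors, and that a pixel is assigned to whichever class gives it the higher log-likelihood. The function names `gaussian_log_likelihood` and `classify_phone_pixels` are hypothetical.

```python
import numpy as np

def gaussian_log_likelihood(pixels, mean, cov):
    """Log-density of each row of `pixels` (N x D) under N(mean, cov)."""
    d = pixels.shape[1]
    diff = pixels - mean
    inv_cov = np.linalg.inv(cov)
    # Squared Mahalanobis distance of every pixel, computed in one pass
    maha = np.einsum("ni,ij,nj->n", diff, inv_cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (maha + logdet + d * np.log(2.0 * np.pi))

def classify_phone_pixels(image, phone_mean, phone_cov, bg_mean, bg_cov):
    """Return a uint8 mask (255 = phone, 0 = non-phone) for an H x W x C image.

    Assumption: equal class priors, so the decision is a plain
    likelihood comparison between the two Gaussians.
    """
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(np.float64)
    phone_ll = gaussian_log_likelihood(pixels, phone_mean, phone_cov)
    bg_ll = gaussian_log_likelihood(pixels, bg_mean, bg_cov)
    mask = (phone_ll > bg_ll).reshape(h, w)
    return mask.astype(np.uint8) * 255

# Toy example: a bright 4 x 4 image with a dark 2 x 2 patch standing in for the screen
image = np.full((4, 4, 3), 200, dtype=np.uint8)
image[1:3, 1:3] = 10
mask = classify_phone_pixels(
    image,
    phone_mean=np.array([10.0, 10.0, 10.0]), phone_cov=np.eye(3) * 25.0,
    bg_mean=np.array([200.0, 200.0, 200.0]), bg_cov=np.eye(3) * 400.0,
)
```

A mask in this form (uint8, values 0 and 255) is what OpenCV's contour functions expect as input.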

Training

First, the training images need to be labeled with the ROI (Region of Interest), which requires the module roipoly. In this algorithm, the ROI is the phone screen. Label it by hand in each image by drawing a polygon around the screen with the mouse. Run the following script:

python2 label_images.py ./find_phone

The command line argument is the path to a folder containing the labeled images and labels.txt. The script currently loops through the images with a step size of 4. The step size can be adjusted in the script, but the same change must also be made in the training script train_phone_finder.py, otherwise the masks and images will be misaligned. After the ROI has been labeled in all the images, the script outputs a file named phone_screen_masks.npy, which is used by the training script below. Run the following training script:

python train_phone_finder.py ./find_phone

The command line argument is the path to a folder containing the labeled images and labels.txt. The script collects the labeled pixels and computes the mean and covariance of each of the two Gaussian classifiers. For convenience, these four parameters are hardcoded in the script. Using these parameters, the script runs the algorithm, predicts the coordinates for all the training images, and reports the accuracy, counting a prediction as correct when it falls within a radius of 0.05 of the labeled coordinates. The script takes a while to run because it processes every training image. It also exports the four parameters to the file trained_gaussian_params.npz, which is loaded and used by the testing script.
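As a rough illustration of the training step, the sketch below estimates a mean and covariance from a set of pixels, saves the four parameters to an .npz file, and scores predictions against the 0.05 radius. It makes several assumptions that the description above does not confirm: the radius is treated as a Euclidean distance in normalized coordinates, and the function names and the .npz key names (`phone_mean`, `phone_cov`, `bg_mean`, `bg_cov`) are hypothetical.

```python
import numpy as np

def fit_gaussian(pixels):
    """Mean vector and covariance matrix of an N x D array of pixel values."""
    pixels = np.asarray(pixels, dtype=np.float64)
    mean = pixels.mean(axis=0)
    # rowvar=False: each row is one pixel, each column one color channel
    cov = np.cov(pixels, rowvar=False)
    return mean, cov

def save_params(path, phone_mean, phone_cov, bg_mean, bg_cov):
    """Store the four parameters in one .npz file (key names are assumptions)."""
    np.savez(path, phone_mean=phone_mean, phone_cov=phone_cov,
             bg_mean=bg_mean, bg_cov=bg_cov)

def accuracy_within_radius(predicted, actual, radius=0.05):
    """Fraction of predictions within `radius` of the labeled coordinates."""
    predicted = np.asarray(predicted, dtype=np.float64)
    actual = np.asarray(actual, dtype=np.float64)
    dist = np.linalg.norm(predicted - actual, axis=1)
    return float(np.mean(dist <= radius))

# Splitting an image into the two pixel sets with a boolean ROI mask
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[1:3, 1:3] = (12, 10, 8)
roi = np.zeros((4, 4), dtype=bool)
roi[1:3, 1:3] = True
phone_pixels = image[roi]        # shape (4, 3)
background_pixels = image[~roi]  # shape (12, 3)
```

With real data, `fit_gaussian` would be called once on the phone pixels and once on the background pixels gathered across all labeled images, and the results passed to `save_params("trained_gaussian_params.npz", ...)`.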

Testing

The following test script will run the algorithm on the given image and print the predicted coordinates:

>> python find_phone.py ~/find_phone/51.jpg
0.2400 0.6035

The command line argument is the path to the JPEG image to be tested.

Utility Module

The module find_phone_utils.py contains functions that are used by both the testing and training script. The core functions are:

  • predict_phone_coordinates

Current State of the Algorithm

The current accuracy on the entire training set is 83.721%. Because the algorithm relies heavily on classifying pixel values, it is vulnerable when the phone screen shows glare or reflections, and when the image background contains many black pixels that do not belong to the phone.

The algorithm can be improved further by picking better parameters for the Gaussian classifiers and better threshold values for the rectangle area and the height/width ratio.
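To make the role of these thresholds concrete, here is a hedged sketch of the rectangle selection step. It assumes the candidate rectangles arrive in the `((cx, cy), (w, h), angle)` layout that `cv2.minAreaRect` returns. The function names, the tolerance parameter `ratio_tol`, and all numeric values other than the 1.77 ratio are placeholders chosen for illustration, not the values used in the repo.

```python
def select_phone_rect(rects, min_area, max_area, target_ratio=1.77, ratio_tol=0.5):
    """Pick the largest rectangle whose area and aspect ratio look phone-like.

    Each rect is ((cx, cy), (w, h), angle), as returned by cv2.minAreaRect.
    Returns None when no rectangle passes both filters.
    """
    best, best_area = None, 0.0
    for rect in rects:
        _, (w, h), _ = rect
        short_side, long_side = sorted((w, h))
        if short_side <= 0:
            continue  # degenerate rectangle, ratio undefined
        area = w * h
        ratio = long_side / short_side
        if not (min_area <= area <= max_area):
            continue
        if abs(ratio - target_ratio) > ratio_tol:
            continue
        if area > best_area:
            best, best_area = rect, area
    return best

def normalized_center(rect, image_width, image_height):
    """Center of the rectangle as fractions of the image width and height."""
    (cx, cy), _, _ = rect
    return cx / image_width, cy / image_height

# Hypothetical candidates: one phone-sized, one too large, one too small
candidates = [
    ((100.0, 200.0), (20.0, 35.0), 0.0),
    ((50.0, 50.0), (200.0, 210.0), 0.0),
    ((10.0, 10.0), (2.0, 3.0), 0.0),
]
best = select_phone_rect(candidates, min_area=300.0, max_area=5000.0)
center = normalized_center(best, image_width=400, image_height=400)
```

Taking the ratio as long side over short side makes the check independent of how the rectangle is rotated, which matters because a minimum-area rectangle may report its sides in either order.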

Contributors

samuelsunarjo96