
28mm / fovea


Unified CLI for various SaaS image classification APIs.

License: MIT License

Python 91.61% Shell 8.39%
clarifai google-cloud-vision imagga microsoft-cognitive-services rekognition sighthound watson-visual-recognition

fovea's People

Contributors

28mm, david-rajnoch


fovea's Issues

Language Support

Multiple-language support is needed for both labels and OCR.

  1. Which providers support which languages?
  2. Which providers support label translations?
  3. Which providers support multiple-language OCR?

--max-labels <int>

--confidence <float> provides a means of limiting output by relevance. --max-labels <int> would provide a means of getting a larger set of less relevant labels. This may be useful in very busy scenes, and also as a way of probing the catalog of concepts each service is able to classify.
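A minimal sketch of how the two flags could compose on the client side, assuming labels have already been normalized into `(score, name)` pairs with scores in 0..1. The `select_labels` and `build_parser` names are hypothetical, not existing fovea functions. Labels are ranked by score, filtered by the threshold, then capped.

```python
import argparse
from typing import Iterable, List, Optional, Tuple

Label = Tuple[float, str]  # (confidence in 0..1, label name); assumed shape


def select_labels(labels: Iterable[Label],
                  confidence: float = 0.0,
                  max_labels: Optional[int] = None) -> List[Label]:
    """Keep labels scoring at least `confidence`, best first, capped at `max_labels`."""
    ranked = sorted(labels, key=lambda label: label[0], reverse=True)
    kept = [label for label in ranked if label[0] >= confidence]
    return kept if max_labels is None else kept[:max_labels]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="fovea")
    parser.add_argument("--confidence", type=float, default=0.0)
    parser.add_argument("--max-labels", type=int, default=None)  # stored as args.max_labels
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--confidence", "0.7", "--max-labels", "2"])
    sample = [(0.8942598, "marine biology"), (0.7700345, "biology"), (0.73823947, "reef")]
    for score, name in select_labels(sample, args.confidence, args.max_labels):
        print(f"{score}\t{name}")
```

Client-side truncation alone cannot surface labels the API never returned, so for probing a service's catalog the limit would also need to be forwarded to providers that accept one (Amazon Rekognition's `DetectLabels`, for instance, takes a `MaxLabels` parameter).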

Fix confidence threshold support

  1. Review and fix confidence threshold support for tabular output of labels.
  2. How should confidence threshold support work for things like face detection, when confidence isn't consistently reported by API providers?
  3. Since --confidence <threshold> does nothing in combination with --output json or --output yaml, these flag combinations should produce a warning.
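Item 3 could be handled with a post-parse check. This is a sketch only: `check_flags` is a hypothetical name, and it uses Python's `warnings` module where a real CLI might prefer a plain message on stderr. Defaulting `--confidence` to `None` distinguishes "not given" from an explicit `0.0`.

```python
import argparse
import warnings


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="fovea")
    parser.add_argument("--confidence", type=float, default=None)
    parser.add_argument("--output", choices=["json", "yaml", "tabular"], default="tabular")
    return parser


def check_flags(args: argparse.Namespace) -> None:
    """Warn when --confidence is combined with an output mode that ignores it."""
    if args.confidence is not None and args.output in ("json", "yaml"):
        warnings.warn(
            f"--confidence has no effect with --output {args.output}; "
            "provider output is printed unfiltered",
            stacklevel=2,
        )
```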

URL support

Work out which API providers will fetch images from a provided URL, and add support for this capability. For providers that don't, fetch the image ourselves and POST it as normal.
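The fallback path can be sketched with the standard library alone. The `FETCHES_URLS` set is deliberately empty because the issue has not yet established which providers fetch URLs themselves. The `{"url": ...}` / `{"base64": ...}` payload keys are hypothetical placeholders, since each provider wraps image content in its own request format.

```python
import base64
import urllib.request
from pathlib import Path
from typing import Dict
from urllib.parse import urlparse

# Hypothetical: providers confirmed to fetch remote images themselves.
# Left empty until each provider's URL support has been verified.
FETCHES_URLS: set = set()


def is_url(source: str) -> bool:
    return urlparse(source).scheme in ("http", "https")


def load_image(source: str, timeout: float = 10.0) -> bytes:
    """Return raw image bytes from an http(s) URL or a local file path."""
    if is_url(source):
        with urllib.request.urlopen(source, timeout=timeout) as response:
            return response.read()
    return Path(source).read_bytes()


def image_payload(provider: str, source: str) -> Dict[str, str]:
    """Pass the URL through where supported; otherwise fetch and base64-encode."""
    if provider in FETCHES_URLS and is_url(source):
        return {"url": source}
    data = load_image(source)
    return {"base64": base64.b64encode(data).decode("ascii")}
```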

Stricter argument validation, parameter cleanup

Stricter Argument Validation:

  1. For IBM/Watson, --celebrities entails --faces but should only print celebrity-matched faces.
  2. For Microsoft, --celebrities entails --categories (not --faces).
  3. ...

Parameter Cleanup:

  1. Instead of --provider {google,microsoft,amazon,opencv,watson,clarifai,facebook}, or in addition to it, it should be possible to specify providers with a single argument, e.g. --google or --microsoft.
  2. Ditto for output modes: --output {json,yaml,tabular} would become --json, --yaml, and --tabular.
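One way to get both spellings with `argparse` is to point the long form and each shortcut flag at the same `dest`, inside a mutually exclusive group so conflicting combinations are rejected. The sketch below also applies the two `--celebrities` rules listed above. The function names and the `tabular` default are assumptions, not fovea's current behavior.

```python
import argparse

# Provider and output names as listed in the issue text.
PROVIDERS = ["google", "microsoft", "amazon", "opencv", "watson", "clarifai", "facebook"]
OUTPUTS = ["json", "yaml", "tabular"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="fovea")

    # --provider NAME and --NAME both write to args.provider; the group
    # rejects conflicting combinations such as "--provider watson --google".
    providers = parser.add_mutually_exclusive_group()
    providers.add_argument("--provider", choices=PROVIDERS)
    for name in PROVIDERS:
        providers.add_argument(f"--{name}", dest="provider",
                               action="store_const", const=name)

    # Same pattern for output modes: --output json is equivalent to --json.
    outputs = parser.add_mutually_exclusive_group()
    outputs.add_argument("--output", choices=OUTPUTS)
    for name in OUTPUTS:
        outputs.add_argument(f"--{name}", dest="output",
                             action="store_const", const=name)

    parser.add_argument("--faces", action="store_true")
    parser.add_argument("--categories", action="store_true")
    parser.add_argument("--celebrities", action="store_true")
    parser.add_argument("images", nargs="*")
    # Hypothetical default; set after the aliases so every action sharing
    # dest="output" picks it up.
    parser.set_defaults(output="tabular")
    return parser


def apply_implications(args: argparse.Namespace) -> argparse.Namespace:
    """Enable the features that --celebrities depends on, per provider."""
    if args.celebrities:
        if args.provider == "watson":
            args.faces = True        # Watson reports identities on detected faces
        elif args.provider == "microsoft":
            args.categories = True   # Microsoft reports celebrities via categories
    return args
```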

Celebrities support broken for microsoft provider

The (Microsoft-only) celebrity detection feature appears to be broken. Tested with a still from 8 Femmes and a couple of celebrity headshots; the response below contains categories and faces, but no celebrity matches.

https://www.microsoft.com/cognitive-services/en-us/computer-vision-api/documentation

$ fovea --provider microsoft --categories --faces --celebrities --output json travolta.jpg
{
    "categories": [
        {
            "name": "people_portrait",
            "score": 0.89453125
        }
    ],
    "requestId": "9670d412-6689-4e4c-89a9-e62fe60bcfcb",
    "metadata": {
        "width": 743,
        "height": 1000,
        "format": "Jpeg"
    },
    "faces": [
        {
            "age": 42,
            "gender": "Male",
            "faceRectangle": {
                "left": 212,
                "top": 282,
                "width": 412,
                "height": 412
            }
        }
    ]
}
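When debugging this, it may help to look for celebrities where the Computer Vision API is believed to put them. This sketch assumes the `analyze` response nests celebrity matches under `categories[].detail.celebrities`, and only when the request asks for `details=Celebrities`. That assumption should be confirmed against the documentation linked above. If it holds, the response above (no `detail` key at all) suggests the details parameter may not be reaching the API. `celebrities` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, List


def celebrities(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect celebrity matches assumed to live under categories[].detail.celebrities."""
    found: List[Dict[str, Any]] = []
    for category in response.get("categories", []):
        detail = category.get("detail") or {}
        found.extend(detail.get("celebrities", []))
    return found


if __name__ == "__main__":
    # Trimmed copy of the travolta.jpg response above.
    response = json.loads('{"categories": [{"name": "people_portrait", "score": 0.89453125}]}')
    if not celebrities(response):
        print("no celebrity details returned; was details=Celebrities sent?")
```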

Imagga Support

Imagga have a computer vision API offering:

  1. Image classification
  2. Custom image classifiers
  3. NSFW litmus
  4. Dominant color identification

Their free plan is limited to 2000 images / month and 1 image / second.
https://imagga.com/
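The 1 image / second limit suggests client-side pacing when several images are classified in one run. A minimal sketch follows; `Throttle` is a hypothetical helper, and the clock and sleep functions are injectable so the spacing logic can be tested without real delays. The monthly quota would need separate bookkeeping and is not handled here.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Space successive calls at least `interval` seconds apart (e.g. 1 image/second)."""

    def __init__(self, interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until at least `interval` seconds have passed since the last call."""
        now = self.clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now
```

Usage would be to call `throttle.wait()` immediately before each Imagga request.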

Clarifai support

Clarifai has a free tier that supports < 5000 requests / month. Clarifai offers a set of features that overlaps with other services:

  1. General image classification
  2. Face detection
  3. Celebrity recognition (also: Microsoft, Watson)
  4. Dominant Color determination (also: Microsoft)
  5. NSFW Image detection (also: Microsoft)

As well as some interesting models that offer greater specificity within a restricted domain:

  1. Travel
  2. Wedding
  3. Food
  4. Apparel

In addition, Clarifai has

  1. the ability to train custom models, which would be an interesting feature addition.
  2. multiple language support.

https://developer.clarifai.com/models
https://developer.clarifai.com/pricing

Custom classifier support and facial recognition

Several services offer the ability to train a custom classifier:

  1. Clarifai
  2. Watson
  3. Imagga

Facial recognition is offered by:

  1. Amazon Rekognition
  2. OpenCV (Eigenfaces, Fisherfaces, Local Binary Pattern Histograms)
  3. DLib http://dlib.net/

What should support for these features look like?
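One possible starting point is a capability registry that lets the CLI reject feature flags a provider cannot serve, or list the providers that can. The entries below are transcribed only from the issues on this page, so they are incomplete and unverified. The feature keys and helper names are hypothetical, and `dlib` is included as listed even though it is a library rather than a hosted API.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# Transcribed from the issues on this page; incomplete and unverified.
CAPABILITIES: Dict[str, FrozenSet[str]] = {
    "clarifai": frozenset({"labels", "faces", "celebrities", "colors", "nsfw", "custom_classifier"}),
    "microsoft": frozenset({"categories", "faces", "celebrities", "colors", "nsfw", "ocr"}),
    "watson": frozenset({"labels", "faces", "celebrities", "custom_classifier"}),
    "imagga": frozenset({"labels", "custom_classifier", "nsfw", "colors"}),
    "amazon": frozenset({"face_recognition"}),
    "opencv": frozenset({"face_recognition"}),
    "dlib": frozenset({"face_recognition"}),
    "google": frozenset({"ocr"}),
}


def providers_for(feature: str) -> List[str]:
    """Names of providers that list `feature`, sorted alphabetically."""
    return sorted(p for p, caps in CAPABILITIES.items() if feature in caps)


def unsupported(provider: str, requested: Iterable[str]) -> Set[str]:
    """Requested features the given provider does not list."""
    return set(requested) - CAPABILITIES.get(provider, frozenset())
```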

Tabular output support for Text/OCR

Of the supported API providers, only Microsoft and Google have OCR functionality. Google reports the text it finds, along with associated bounding boxes. Microsoft probably does something similar. Perhaps two tabular output modes make sense:

  1. a mode that prints recovered text in top->bottom and left->right order
  2. another mode that prints the associated nested bounding boxes.

Have yet to give this a close look.
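For the first mode, a simple heuristic is to group words into lines by vertical position, then order each line left to right. The sketch assumes each provider's OCR result has already been converted into hypothetical `Word` records with pixel `left`, `top`, and `height`. The tolerance (half a word height) is an arbitrary choice, and the approach will not handle rotated text or multi-column layouts.

```python
from typing import Iterable, List, NamedTuple


class Word(NamedTuple):
    text: str
    left: int
    top: int
    height: int


def reading_order(words: Iterable[Word], tolerance: float = 0.5) -> List[str]:
    """Group words into lines (top to bottom), then join each line left to right."""
    lines: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w.top):
        anchor = lines[-1][0] if lines else None
        # Same line if the top edge is within `tolerance` of the first word's height.
        if anchor is not None and abs(word.top - anchor.top) <= tolerance * anchor.height:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.left)) for line in lines]
```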

Experimental Facebook support

Facebook do image classification and captioning, which they expose via the <img alt=""> attribute, as well as face detection and recognition, which are exposed via a user-tagging interface. Some of this may be exposed via a public API; the rest might require scraping logic.

Implementing a Facebook provider would involve

  1. Using Facebook login credentials to build a private album
  2. Uploading images to it
  3. Retrieving detected faces and image labels via either a public API or a scraping tool like Selenium

Face++ Support

Face++ (https://www.faceplusplus.com) offers the following APIs:

  1. Face Detection
    1. landmarks
    2. attributes (age, gender, ethnicity, disposition)
    3. face token (for use with the search and comparison APIs)
  2. Face Comparison.
    1. Compare 2 image files
    2. Compare 2 base64 encoded images
    3. Compare 2 face tokens
  3. Face Search.
    1. Build a face set with face tokens
    2. From a picture or face token, find the most similar face in a face set.

Sighthound Support

Sighthound offer a free tier that allows < 5000 requests / month. https://www.sighthound.com/products/cloud

  1. Face detection

    1. Bounding box
    2. Facial landmarks
    3. Age
    4. Gender
    5. Emotion / expression
  2. Face recognition

    1. Celebrities (as a demo)
    2. Custom...
  3. Vehicle recognition

    1. Make and model
    2. License plate
    3. Color

IBM Watson Support

IBM Watson's Visual Recognition service has a free tier that supports < 250 requests / day. It does image classification and face detection, as well as supporting a category schema like Microsoft's. It appears to support celebrity detection via the same category system.

https://www.ibm.com/watson/developercloud/doc/visual-recognition/getting-started.html

Example Image Classification:

{
  "custom_classes": 0,
  "images": [
    {
        "classifiers": [
            {
                "classes": [
                    {
                        "class": "banana",
                        "score": 0.81,
                        "type_hierarchy": "/fruit/banana"
                    },
                    {
                        "class": "fruit",
                        "score": 0.922
                    },
                    {
                        "class": "mango",
                        "score": 0.554,
                        "type_hierarchy": "/fruit/mango"
                    },
                    {
                        "class": "olive color",
                        "score": 0.951
                    },
                    {
                        "class": "olive green color",
                        "score": 0.747
                    }
                ],
                "classifier_id": "default",
                "name": "default"
            }
        ],
        "image": "fruitbowl.jpg"
    }
  ],
  "images_processed": 1
}
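To feed the tabular output, the nested classification response could be flattened into rows. This sketch follows the structure of the example above (`images[].classifiers[].classes[]`, with `type_hierarchy` optional). `watson_labels` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[float, str, str]  # (score, class, type_hierarchy or "")


def watson_labels(response: Dict[str, Any]) -> List[Row]:
    """Flatten images[].classifiers[].classes[] into rows, highest score first."""
    rows: List[Row] = []
    for image in response.get("images", []):
        for classifier in image.get("classifiers", []):
            for cls in classifier.get("classes", []):
                rows.append((cls["score"], cls["class"], cls.get("type_hierarchy", "")))
    return sorted(rows, reverse=True)
```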

Example face and Identity detection:

{
  "images": [
    {
      "faces": [
        {
          "age": {
            "max": 54,
            "min": 45,
            "score": 0.364876
          },
          "face_location": {
            "height": 117,
            "left": 406,
            "top": 149,
            "width": 108
          },
          "gender": {
            "gender": "MALE",
            "score": 0.993307
          },
          "identity": {
            "name": "Barack Obama",
            "score": 0.982014,
            "type_hierarchy": "/people/politicians/democrats/barack obama"
          }
        }
      ],
      "image": "prez.jpg"
    }
  ],
  "images_processed": 1
}
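This response shape also fits the Watson validation rule raised earlier (--celebrities should print only celebrity-matched faces). A sketch that keeps only faces carrying an `identity` block, using the field names from the example above. `celebrity_faces` is a hypothetical name.

```python
from typing import Any, Dict, List, Tuple

# (name, score, left, top, width, height)
FaceRow = Tuple[str, float, int, int, int, int]


def celebrity_faces(response: Dict[str, Any]) -> List[FaceRow]:
    """Return only the faces Watson matched to an identity."""
    rows: List[FaceRow] = []
    for image in response.get("images", []):
        for face in image.get("faces", []):
            identity = face.get("identity")
            if not identity:
                continue  # detected face with no celebrity match
            loc = face["face_location"]
            rows.append((identity["name"], identity["score"],
                         loc["left"], loc["top"], loc["width"], loc["height"]))
    return rows
```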

Review and enforce image size limits.

Find the image size limits for each API provider, and start checking and enforcing them. Do these limits apply to the base64-encoded image typically used in API POST requests, or to the file as it appears on disk?
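The distinction matters because padded base64 emits 4 characters for every 3 input bytes (rounded up). A file just under a limit on disk can therefore exceed it once encoded. The check below assumes the limit values and whether each limit applies to the encoded form will be filled in per provider; `SIZE_LIMITS` and the helper names are hypothetical.

```python
from typing import Dict

# Hypothetical per-provider limits in bytes; values still to be looked up.
SIZE_LIMITS: Dict[str, int] = {}


def base64_size(raw_size: int) -> int:
    """Length of padded base64 output for `raw_size` input bytes."""
    return 4 * ((raw_size + 2) // 3)


def within_limit(raw_size: int, limit: int, limit_applies_to_base64: bool) -> bool:
    """Compare either the on-disk size or the encoded size against `limit`."""
    size = base64_size(raw_size) if limit_applies_to_base64 else raw_size
    return size <= limit
```

Usage would look like `within_limit(Path("image.jpg").stat().st_size, limit, True)`.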

--precision <int>

Confidence scores are reported by most services with preposterous precision. Fovea's tabular output mode should default to 2 significant figures, with the option to print with greater precision by supplying a --precision <int> argument.

Current:

[user@host]$ fovea http://farm1.static.flickr.com/45/139488995_bd06578562.jpg
0.8942598	marine biology
0.7700345	biology
0.73823947	reef
0.6855024	underwater
0.6713719	fish
0.6590982	aquarium

Proposed:

[user@host]$ fovea http://farm1.static.flickr.com/45/139488995_bd06578562.jpg
0.89	marine biology
0.77	biology
0.74	reef
0.69	underwater
0.67	fish
0.66	aquarium
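Python's `g` format type already rounds to a given number of significant figures. Adding the `#` flag keeps trailing zeros (0.5 prints as `0.50`), so the score column stays a consistent width. This is a sketch with hypothetical helper names. One caveat: very small scores switch to exponent notation under `g` (e.g. `1.0e-05`).

```python
import argparse
from typing import Iterable, Tuple


def format_score(score: float, precision: int = 2) -> str:
    """Format a confidence score to `precision` significant figures, keeping trailing zeros."""
    return f"{score:#.{precision}g}"


def tabulate(labels: Iterable[Tuple[float, str]], precision: int = 2) -> str:
    """Render (score, label) pairs as tab-separated lines."""
    return "\n".join(f"{format_score(score, precision)}\t{name}" for score, name in labels)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="fovea")
    parser.add_argument("--precision", type=int, default=2)
    return parser
```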
