28mm / fovea

Unified CLI for various SaaS image classification APIs.

License: MIT License

Python 91.61% Shell 8.39%

clarifai google-cloud-vision imagga microsoft-cognitive-services rekognition sighthound watson-visual-recognition

fovea's Issues

Language Support

Multiple language support needed for both labels and ocr.

Which providers support which languages?
Which providers support label translations?
Which providers support multiple language OCR?

--confidence <float> provides a means of limiting output by relevance. --max-labels <int> would provide a means of getting a larger set of less relevant labels. This may be useful in very busy scenes, and also as a way of probing the catalog of concepts each service is able to classify.
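
How the two flags could interact on the client side can be sketched as below. This is only an illustration under stated assumptions: `select_labels` and the `(score, description)` tuple shape are hypothetical and not taken from fovea's code, and a real --max-labels would probably also have to be forwarded to the provider request, which is not shown.

```python
def select_labels(labels, confidence=0.0, max_labels=None):
    """Filter (score, description) pairs by threshold, then cap the count.

    Hypothetical helper: the name and the tuple shape are assumptions.
    """
    # Keep only labels at or above the confidence threshold.
    kept = [(score, desc) for score, desc in labels if score >= confidence]
    # Most relevant labels first.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    # max_labels=None means "no cap".
    return kept if max_labels is None else kept[:max_labels]


labels = [(0.67, "fish"), (0.89, "marine biology"), (0.77, "biology")]
print(select_labels(labels, confidence=0.7))
print(select_labels(labels, max_labels=1))
```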

Fix confidence threshold support

Review and fix confidence threshold support for tabular output of labels.
How should confidence threshold support work for things like face detection, when confidence isn't consistently reported by API providers?
Since --confidence <threshold> does nothing in combination with --output json or --output yaml, these flag combinations should produce a warning.
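
One possible shape for that warning, as a sketch only: the parser below is a stand-in that declares just the two flags discussed here, and `confidence_warning` is a hypothetical helper name, not something from fovea itself.

```python
import argparse
import sys


def build_parser():
    # Stand-in parser: only the two flags relevant to this issue.
    parser = argparse.ArgumentParser(prog="fovea")
    parser.add_argument("--confidence", type=float, default=None)
    parser.add_argument("--output", choices=["tabular", "json", "yaml"],
                        default="tabular")
    return parser


def confidence_warning(args):
    """Return a warning string if --confidence would be ignored, else None."""
    if args.confidence is not None and args.output in ("json", "yaml"):
        return "warning: --confidence has no effect with --output " + args.output
    return None


args = build_parser().parse_args(["--confidence", "0.5", "--output", "json"])
message = confidence_warning(args)
if message:
    print(message, file=sys.stderr)
```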

URL support

Work out which API providers will fetch images from a provided URL, and add support for this capability. For providers that don't, fetch the image ourselves and POST it as normal.
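
The fallback described above might look roughly like this. It is a sketch under assumptions: `resolve_image`, the `provider_accepts_urls` flag, and the `("url", ...)` / `("bytes", ...)` return convention are all hypothetical; which providers actually accept URLs still has to be worked out.

```python
from urllib.parse import urlparse
from urllib.request import urlopen


def is_url(source):
    """True when the argument looks like an http(s) URL rather than a path."""
    return urlparse(source).scheme in ("http", "https")


def resolve_image(source, provider_accepts_urls, fetch=None):
    """Decide whether to hand the provider a URL or raw image bytes.

    `fetch` is an optional callable (url -> bytes), mainly for testing;
    by default the image is downloaded with urllib.
    """
    if is_url(source):
        if provider_accepts_urls:
            # Let the provider fetch the image itself.
            return ("url", source)
        if fetch is not None:
            return ("bytes", fetch(source))
        # Provider cannot fetch: download it and POST the bytes as normal.
        with urlopen(source) as response:
            return ("bytes", response.read())
    # Plain file path: read it from disk.
    with open(source, "rb") as handle:
        return ("bytes", handle.read())
```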

Stricter argument validation, parameter cleanup

Stricter Argument Validation:

For IBM/Watson, --celebrities entails --faces but should only print celebrity-matched faces.
For Microsoft --celebrities entails --categories (not --faces)
...

Parameter Cleanup:

Instead of --provider {google,microsoft,amazon,opencv,watson,clarifai,facebook}, or in addition to it, it should be possible to specify providers with a single argument, e.g. --google or --microsoft.
Ditto output modes: --output {json,yaml,tabular} would become --json, --yaml, and --tabular.
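
With argparse, the shorthand flags can share a destination with the long form by using `store_const`. The sketch below assumes that fovea parses arguments with argparse; the defaults (`google`, `tabular`) are placeholders chosen for the example, not confirmed behaviour.

```python
import argparse

PROVIDERS = ["google", "microsoft", "amazon", "opencv", "watson",
             "clarifai", "facebook"]
OUTPUTS = ["json", "yaml", "tabular"]


def build_parser():
    parser = argparse.ArgumentParser(prog="fovea")
    # Long forms keep working; the defaults here are assumptions.
    parser.add_argument("--provider", choices=PROVIDERS, default="google")
    parser.add_argument("--output", choices=OUTPUTS, default="tabular")
    # Shorthand: --google is equivalent to --provider google, and so on.
    for name in PROVIDERS:
        parser.add_argument("--" + name, dest="provider",
                            action="store_const", const=name)
    # Shorthand: --json is equivalent to --output json, and so on.
    for mode in OUTPUTS:
        parser.add_argument("--" + mode, dest="output",
                            action="store_const", const=mode)
    parser.add_argument("files", nargs="*")
    return parser


args = build_parser().parse_args(["--microsoft", "--json", "travolta.jpg"])
print(args.provider, args.output, args.files)
```

If both forms are given, the one that appears last on the command line wins, since both write to the same destination.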

Celebrities support broken for microsoft provider

The (microsoft-only) celebrity detection feature appears to be broken. Tested with a still from 8 Femme, and a couple of celebrity headshots.

https://www.microsoft.com/cognitive-services/en-us/computer-vision-api/documentation

$ fovea --provider microsoft --categories --faces --celebrities --output json travolta.jpg
{
    "categories": [
        {
            "name": "people_portrait",
            "score": 0.89453125
        }
    ],
    "requestId": "9670d412-6689-4e4c-89a9-e62fe60bcfcb",
    "metadata": {
        "width": 743,
        "height": 1000,
        "format": "Jpeg"
    },
    "faces": [
        {
            "age": 42,
            "gender": "Male",
            "faceRectangle": {
                "left": 212,
                "top": 282,
                "width": 412,
                "height": 412
            }
        }
    ]
}

Imagga Support

Imagga have a computer vision API offering:

Image classification
Custom image classifiers
NSFW litmus
Dominant color identification

Their free plan is limited to 2000 images / month, and 1 image / second
https://imagga.com/

Clarifai support

Clarifai has a free tier that supports < 5000 requests / month. Clarifai offers a set of features that overlaps with other services:

General image classification
Face detection
Celebrity recognition (also: Microsoft, Watson)
Dominant Color determination (also: Microsoft)
NSFW Image detection (also: Microsoft)

As well as some interesting models that offer greater specificity within a restricted domain:

Travel
Wedding
Food
Apparel

In addition, Clarifai has

the ability to train custom models, which would be an interesting feature addition.
multiple language support.

https://developer.clarifai.com/models
https://developer.clarifai.com/pricing

Custom classifier support, and facial recognition.

Several services offer the ability to train a custom classifier:

Clarifai
Watson
Imagga

Facial recognition is offered by:

Amazon Rekognition
OpenCV (Eigenfaces, Fisherfaces, Local Binary Pattern Histograms)
DLib http://dlib.net/

What should support for these features look like?

Tabular output support for Text/OCR

Of the supported API providers, only Microsoft and Google have OCR functionality. Google reports the text it finds, along with associated bounding boxes. Microsoft probably does something similar. Perhaps two tabular output modes make sense:

a mode that prints recovered text in top->bottom and left->right order
another mode that prints the associated nested bounding boxes.

Have yet to give this a close look.
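
As a starting point for the first mode, text fragments could be grouped into lines by vertical position and then sorted left to right. The fragment shape used here (`text`, `left`, `top` keys) and the `line_tolerance` parameter are assumptions for the sketch; the providers' actual bounding-box formats would first need to be normalised into something like it.

```python
def reading_order(fragments, line_tolerance=10):
    """Return lines of text ordered top to bottom, then left to right.

    Fragments whose `top` lies within `line_tolerance` pixels of the first
    fragment in the current row are treated as being on the same line.
    """
    rows = []
    for frag in sorted(fragments, key=lambda f: f["top"]):
        if rows and frag["top"] - rows[-1][0]["top"] <= line_tolerance:
            rows[-1].append(frag)
        else:
            rows.append([frag])
    return [
        " ".join(f["text"] for f in sorted(row, key=lambda f: f["left"]))
        for row in rows
    ]


fragments = [
    {"text": "world", "left": 60, "top": 12},
    {"text": "hello", "left": 10, "top": 10},
    {"text": "again", "left": 10, "top": 40},
]
for line in reading_order(fragments):
    print(line)
```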

Experimental Facebook support

Facebook do image classification and captioning, which they expose via the <img alt=""> attribute, as well as face detection and recognition which are exposed via a user-tagging interface. Some of this may be exposed via a public api, the rest might require scraping logic.

Implementing a Facebook provider would involve

Using Facebook login credentials to build a private album
Uploading images to it
Retrieving detected faces and image labels via either a public api or a scraping tool like Selenium

Face++ Support

Face++ (https://www.faceplusplus.com) offers the following APIs:

Face Detection
1. landmarks
2. attributes (age, gender, ethnicity, disposition)
3. face token (for use with search and comparison APIs)
Face Comparison.
1. Compare 2 image files
2. Compare 2 base64 encoded images
3. Compare 2 face tokens
Face Search.
1. Build a face set with face tokens
2. From a picture or face token, find the most similar face in a face set.

Ontology links

Implement an --ontology flag so that links to the Google Knowledge Graph or WordNet synsets are preserved and printed.

Imagga will return WordNet synset IDs if its verbose parameter is set. https://docs.imagga.com/#tagging
Google mids refer to Google Knowledge Graph https://developers.google.com/knowledge-graph/

                {
                    "mid": "/m/02vkl_w",
                    "description": "agaricomycetes",
                    "score": 0.6304753
                }
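
For tabular output, the flag could simply append the identifier as an extra column. A minimal sketch, assuming labels are dicts shaped like the Google example above; `format_label` is a hypothetical helper, and how Imagga's synset IDs would map onto the same column is left open.

```python
def format_label(label, ontology=False):
    """Render one label as a tab-separated row, optionally with its mid."""
    row = "{}\t{}".format(label["score"], label["description"])
    if ontology and label.get("mid"):
        # Preserve the ontology identifier as a third column.
        row += "\t" + label["mid"]
    return row


label = {"mid": "/m/02vkl_w", "description": "agaricomycetes",
         "score": 0.6304753}
print(format_label(label))
print(format_label(label, ontology=True))
```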

Sighthound Support

Sighthound offer a free tier that allows < 5000 requests / month. https://www.sighthound.com/products/cloud

Face detection
1. Bounding box
2. Facial landmarks
3. Age
4. Gender
5. Emotion / expression
Face recognition
1. Celebrities (as a demo)
2. Custom...
Vehicle recognition
1. Make and model
2. License plate
3. Color

IBM Watson Support

IBM Watson's Visual Recognition service has a free tier that supports < 250 requests / day. It does image classification and face detection, as well as supporting a category schema like Microsoft's. It appears to support celebrity detection via the same category system.

https://www.ibm.com/watson/developercloud/doc/visual-recognition/getting-started.html

Example Image Classification:

{
  "custom_classes": 0,
  "images": [
    {
        "classifiers": [
            {
                "classes": [
                    {
                        "class": "banana",
                        "score": 0.81,
                        "type_hierarchy": "/fruit/banana"
                    },
                    {
                        "class": "fruit",
                        "score": 0.922
                    },
                    {
                        "class": "mango",
                        "score": 0.554,
                        "type_hierarchy": "/fruit/mango"
                    },
                    {
                        "class": "olive color",
                        "score": 0.951
                    },
                    {
                        "class": "olive green color",
                        "score": 0.747
                    }
                ],
                "classifier_id": "default",
                "name": "default"
            }
        ],
        "image": "fruitbowl.jpg"
    }
  ],
  "images_processed": 1
}
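
A response of this shape could be flattened into the score/label pairs that the tabular output uses. This is a sketch only: `watson_labels` is a hypothetical name, and it relies on nothing beyond the fields visible in the example above.

```python
def watson_labels(response):
    """Flatten a classification response into (score, class) pairs, best first."""
    pairs = []
    for image in response.get("images", []):
        for classifier in image.get("classifiers", []):
            for cls in classifier.get("classes", []):
                pairs.append((cls["score"], cls["class"]))
    # Highest score first.
    return sorted(pairs, reverse=True)


response = {
    "images": [{
        "classifiers": [{
            "classes": [
                {"class": "banana", "score": 0.81},
                {"class": "fruit", "score": 0.922},
                {"class": "olive color", "score": 0.951},
            ],
            "classifier_id": "default",
            "name": "default",
        }],
        "image": "fruitbowl.jpg",
    }],
    "images_processed": 1,
}
for score, name in watson_labels(response):
    print("{}\t{}".format(score, name))
```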

Example face and identity detection:

{
  "images": [
    {
      "faces": [
        {
          "age": {
            "max": 54,
            "min": 45,
            "score": 0.364876
          },
          "face_location": {
            "height": 117,
            "left": 406,
            "top": 149,
            "width": 108
          },
          "gender": {
            "gender": "MALE",
            "score": 0.993307
          },
          "identity": {
            "name": "Barack Obama",
            "score": 0.982014,
            "type_hierarchy": "/people/politicians/democrats/barack obama"
          }
        }
      ],
      "image": "prez.jpg"
    }
  ],
  "images_processed": 1
}

[user@host]$ fovea http://farm1.static.flickr.com/45/139488995_bd06578562.jpg
0.8942598	marine biology
0.7700345	biology
0.73823947	reef
0.6855024	underwater
0.6713719	fish
0.6590982	aquarium

Proposed:

[user@host]$ fovea http://farm1.static.flickr.com/45/139488995_bd06578562.jpg
0.89	marine biology
0.77	biology
0.74	reef
0.69	underwater
0.67	fish
0.66	aquarium
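
The proposed format amounts to printing each score with two decimal places. A minimal sketch, with `format_row` as a hypothetical helper name:

```python
def format_row(score, description):
    """Tab-separated row with the score rounded to two decimal places."""
    return "{:.2f}\t{}".format(score, description)


labels = [
    (0.8942598, "marine biology"),
    (0.7700345, "biology"),
    (0.73823947, "reef"),
    (0.6855024, "underwater"),
    (0.6713719, "fish"),
    (0.6590982, "aquarium"),
]
for score, description in labels:
    print(format_row(score, description))
```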
