28mm / fovea
Unified CLI for various SaaS image classification APIs.
License: MIT License
Multiple-language support is needed for both labels and OCR.
--confidence <float> provides a means of limiting output by relevance. --max-labels <int> would provide a means of getting a larger set of less relevant labels. This may be useful in very busy scenes, and also as a way of probing the catalog of concepts each service is able to classify.
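As a rough sketch of how the two flags could interact, assuming labels have already been normalized to (score, label) pairs. The function select_labels and its parameter names are hypothetical and not part of fovea:

```python
from typing import List, Optional, Tuple

Label = Tuple[float, str]  # (confidence score, label text)


def select_labels(labels: List[Label],
                  confidence: float = 0.0,
                  max_labels: Optional[int] = None) -> List[Label]:
    """Keep labels scoring at least `confidence`, best first, at most `max_labels`."""
    kept = [pair for pair in labels if pair[0] >= confidence]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept if max_labels is None else kept[:max_labels]


if __name__ == "__main__":
    sample = [(0.89, "marine biology"), (0.67, "fish"), (0.77, "biology")]
    for score, label in select_labels(sample, confidence=0.7, max_labels=10):
        print(score, label)
```

Filtering before truncating means --max-labels only ever raises the ceiling; it cannot reintroduce labels that --confidence has excluded.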
--confidence <threshold> does nothing in combination with --output json or --output yaml; these flag combinations should produce a warning.
Work out which API providers will fetch images from a provided URL, and add support for this capability. For providers that don't, fetch the image ourselves and POST it as normal.
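One possible shape for the URL fallback, as a sketch only. Which providers accept a URL directly is the open question here, so URL_CAPABLE is left as an empty placeholder, and build_payload, fetch_bytes, and the payload keys are all hypothetical names:

```python
import urllib.request
from typing import Callable, Dict, Set

# Placeholder: the set of providers that can fetch a URL themselves
# has not been worked out yet, so it is deliberately left empty.
URL_CAPABLE: Set[str] = set()


def fetch_bytes(url: str) -> bytes:
    """Download the image ourselves."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def build_payload(provider: str, source: str,
                  fetch: Callable[[str], bytes] = fetch_bytes,
                  url_capable: Set[str] = URL_CAPABLE) -> Dict[str, object]:
    """Pass the URL through when the provider supports it, else send bytes."""
    is_url = source.startswith(("http://", "https://"))
    if is_url and provider in url_capable:
        return {"url": source}
    if is_url:
        return {"data": fetch(source)}
    # Local file: read it from disk as before.
    with open(source, "rb") as handle:
        return {"data": handle.read()}
```

Injecting the fetch function keeps the decision logic testable without network access.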
Stricter Argument Validation:
--celebrities entails --faces but should only print celebrity-matched faces.
--celebrities entails --categories (not --faces)
Parameter Cleanup:
Instead of --provider {google,microsoft,amazon,opencv,watson,clarifai,facebook}, or in addition to it, it should be possible to specify providers with a single argument, e.g. --google or --microsoft.
Likewise for --output {json,yaml,tabular}, which would become --json, --yaml, and --tabular.
The (microsoft-only) celebrity detection feature appears to be broken. Tested with a still from 8 Femmes, and a couple of celebrity headshots.
https://www.microsoft.com/cognitive-services/en-us/computer-vision-api/documentation
$ fovea --provider microsoft --categories --faces --celebrities --output json travolta.jpg
{
  "categories": [
    {
      "name": "people_portrait",
      "score": 0.89453125
    }
  ],
  "requestId": "9670d412-6689-4e4c-89a9-e62fe60bcfcb",
  "metadata": {
    "width": 743,
    "height": 1000,
    "format": "Jpeg"
  },
  "faces": [
    {
      "age": 42,
      "gender": "Male",
      "faceRectangle": {
        "left": 212,
        "top": 282,
        "width": 412,
        "height": 412
      }
    }
  ]
}
Imagga have a computer vision API offering:
Their free plan is limited to 2000 images / month, and 1 image / second
https://imagga.com/
Clarifai has a free tier that supports < 5000 requests / month. Clarifai offers a set of features that overlaps with other services:
As well as some interesting models that offer greater specificity within a restricted domain:
In addition, Clarifai has
https://developer.clarifai.com/models
https://developer.clarifai.com/pricing
Several services offer the ability to train a custom classifier:
Facial recognition is offered by:
What should support for these features look like?
Of the supported API providers, only Microsoft and Google have OCR functionality. Google reports the text it finds, along with associated bounding boxes. Microsoft probably does something similar. Perhaps two tabular output modes make sense:
Have yet to give this a close look.
Facebook do image classification and captioning, which they expose via the <img alt=""> attribute, as well as face detection and recognition, which are exposed via a user-tagging interface. Some of this may be exposed via a public API; the rest might require scraping logic.
Implementing a Facebook provider would involve
Face++ (https://www.faceplusplus.com) offers the following APIs:
Implement an --ontology flag so that links to the Google Knowledge Graph or WordNet synsets are preserved and printed.
verbose parameter is set. https://docs.imagga.com/#tagging
mids refer to Google Knowledge Graph: https://developers.google.com/knowledge-graph/
{
  "mid": "/m/02vkl_w",
  "description": "agaricomycetes",
  "score": 0.6304753
}
Sighthound offer a free tier that allows < 5000 requests / month. https://www.sighthound.com/products/cloud
Face detection
Face recognition
Vehicle recognition
IBM Watson's Visual Recognition service has a free tier that supports < 250 requests / day. It does image classification and face detection, as well as supporting a category schema like Microsoft's. It appears to support celebrity detection via the same category system.
https://www.ibm.com/watson/developercloud/doc/visual-recognition/getting-started.html
Example Image Classification:
{
  "custom_classes": 0,
  "images": [
    {
      "classifiers": [
        {
          "classes": [
            {
              "class": "banana",
              "score": 0.81,
              "type_hierarchy": "/fruit/banana"
            },
            {
              "class": "fruit",
              "score": 0.922
            },
            {
              "class": "mango",
              "score": 0.554,
              "type_hierarchy": "/fruit/mango"
            },
            {
              "class": "olive color",
              "score": 0.951
            },
            {
              "class": "olive green color",
              "score": 0.747
            }
          ],
          "classifier_id": "default",
          "name": "default"
        }
      ],
      "image": "fruitbowl.jpg"
    }
  ],
  "images_processed": 1
}
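To fit fovea's tabular output, a response like the one above would need flattening into (score, label) rows. A minimal sketch, assuming the field names shown in the example; watson_labels is a hypothetical helper, not existing fovea code:

```python
from typing import Any, Dict, List, Tuple


def watson_labels(response: Dict[str, Any]) -> List[Tuple[float, str]]:
    """Flatten images -> classifiers -> classes into (score, label) rows, best first."""
    rows = []
    for image in response.get("images", []):
        for classifier in image.get("classifiers", []):
            for entry in classifier.get("classes", []):
                rows.append((entry["score"], entry["class"]))
    return sorted(rows, reverse=True)


if __name__ == "__main__":
    example = {
        "images": [{
            "classifiers": [{
                "classes": [
                    {"class": "banana", "score": 0.81},
                    {"class": "fruit", "score": 0.922},
                ],
            }],
        }],
    }
    for score, label in watson_labels(example):
        print(score, label)
```

The type_hierarchy field is dropped here; it would be a natural candidate for whatever ontology output is eventually supported.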
Example face and Identity detection:
{
  "images": [
    {
      "faces": [
        {
          "age": {
            "max": 54,
            "min": 45,
            "score": 0.364876
          },
          "face_location": {
            "height": 117,
            "left": 406,
            "top": 149,
            "width": 108
          },
          "gender": {
            "gender": "MALE",
            "score": 0.993307
          },
          "identity": {
            "name": "Barack Obama",
            "score": 0.982014,
            "type_hierarchy": "/people/politicians/democrats/barack obama"
          }
        }
      ],
      "image": "prez.jpg"
    }
  ],
  "images_processed": 1
}
Find image size limits for each API provider, and start checking/enforcing them. Do these limits apply to the base64-encoded image typically used in API POST requests, or to the file as it appears on disk?
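The distinction matters because standard padded base64 grows the payload by roughly a third. A small sketch of a check that handles either interpretation; check_size and its parameters are hypothetical, and the actual limits per provider are still to be found:

```python
import os


def base64_length(raw_size: int) -> int:
    """Length in bytes of standard padded base64 for `raw_size` input bytes."""
    return 4 * ((raw_size + 2) // 3)


def check_size(path: str, limit_bytes: int, limit_applies_to_base64: bool) -> bool:
    """Return True if the file fits under the provider's limit."""
    raw = os.path.getsize(path)
    effective = base64_length(raw) if limit_applies_to_base64 else raw
    return effective <= limit_bytes
```

For example, a 3 MB file on disk becomes a 4 MB base64 string, so a file that passes an on-disk check can still be rejected if the provider measures the encoded body.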
Text/OCR support is not yet implemented for the Microsoft provider. This will require making a second HTTP request to a different text/OCR endpoint, and merging the resulting JSON documents.
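One way the merge could work, sketched under the assumption that both responses have already been parsed into dictionaries. The helper merge_responses and the "text" key are hypothetical choices, not something the Microsoft API or fovea defines:

```python
from typing import Any, Dict


def merge_responses(analysis: Dict[str, Any], ocr: Dict[str, Any],
                    ocr_key: str = "text") -> Dict[str, Any]:
    """Return a copy of the analysis document with the OCR document nested under `ocr_key`."""
    merged = dict(analysis)  # shallow copy; the inputs are left untouched
    merged[ocr_key] = ocr
    return merged
```

Nesting the OCR document under its own key, instead of merging the two at the top level, avoids one response silently overwriting same-named fields from the other.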
https://www.microsoft.com/cognitive-services/en-us/computer-vision-api/documentation
Add support for https://github.com/yahoo/open_nsfw.
Since the Caffe model is large-ish compared with the OpenCV Haar cascade, add a script to .../utilities/yahoo/ to check dependencies and download the model file.
Confidence scores are reported by most services with preposterous precision. Fovea's tabular output mode should default to 2 significant figures, with the option to print with greater precision by supplying a --precision <int> argument.
Current:
[user@host]$ fovea http://farm1.static.flickr.com/45/139488995_bd06578562.jpg
0.8942598 marine biology
0.7700345 biology
0.73823947 reef
0.6855024 underwater
0.6713719 fish
0.6590982 aquarium
Proposed:
[user@host]$ fovea http://farm1.static.flickr.com/45/139488995_bd06578562.jpg
0.89 marine biology
0.77 biology
0.74 reef
0.69 underwater
0.67 fish
0.66 aquarium
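The proposed rounding can be sketched with Python's built-in format specification. This is only an illustration: format_score is a hypothetical helper, and its precision parameter stands in for the proposed --precision <int> argument.

```python
def format_score(score: float, precision: int = 2) -> str:
    """Format a confidence score to `precision` significant figures.

    The '#' flag keeps trailing zeros, so 0.7 prints as 0.70 and columns line up.
    """
    return f"{score:#.{precision}g}"


if __name__ == "__main__":
    rows = [(0.8942598, "marine biology"), (0.73823947, "reef"), (0.6855024, "underwater")]
    for score, label in rows:
        print(format_score(score), label)
```

Note that the g presentation type switches to scientific notation for very small values (below 1e-4), which may or may not be acceptable for tabular output.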