Classifying Images with Vision and Core ML

Demonstrates using Vision with Core ML to preprocess images and perform image classification.

Overview

Among the many features of the Core ML framework is the ability to classify input data using a trained machine-learning model. The Vision framework works with Core ML to apply classification features to images, and to preprocess those images to make machine learning tasks easier and more reliable.

This sample app uses a model based on the public MNIST database (a collection of handwriting samples) to identify handwritten digits found on rectangular objects in the image (such as sticky notes, as seen in the image below).

[Image: a handwritten digit on a sticky note, as detected by the sample app]

Getting Started

Vision and Core ML require macOS 10.13, iOS 11, or tvOS 11. This sample project runs only on iOS 11.

Using the Sample App

Build and run the project, then use the buttons in the sample app's toolbar to take a picture or choose an image from your photo library. The sample app then:

  1. Uses Vision to detect rectangular areas in the image,
  2. Uses Core Image filters to prepare those areas for processing by the ML model,
  3. Applies the model to produce an image classification result, and
  4. Presents that result as a text label in the UI.

Detecting Rectangles and Preparing for ML Processing

The example app's ViewController class provides a UI for choosing an image with the system-provided UIImagePickerController feature. After the user chooses an image (in the imagePickerController(_:didFinishPickingMediaWithInfo:) method), the sample runs a Vision request for detecting rectangles in the image:

lazy var rectanglesRequest: VNDetectRectanglesRequest = {
    return VNDetectRectanglesRequest(completionHandler: self.handleRectangles)
}()
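The delegate callback that triggers this request isn't shown in the excerpt above. Below is a minimal sketch of how it might look, assuming the controller stores the chosen picture in an inputImage property (which handleRectangles reads later) and the Swift 4-era UIImagePickerController API used elsewhere in this sample; the actual implementation may also handle image orientation and errors differently.

func imagePickerController(_ picker: UIImagePickerController,
                           didFinishPickingMediaWithInfo info: [String: Any]) {
    picker.dismiss(animated: true)
    guard let uiImage = info[UIImagePickerControllerOriginalImage] as? UIImage,
        let ciImage = CIImage(image: uiImage)
        else { fatalError("can't create CIImage from the picked image") }

    // Keep the image around; handleRectangles(request:error:) reads it as inputImage.
    inputImage = ciImage

    // Perform Vision requests off the main queue.
    let handler = VNImageRequestHandler(ciImage: ciImage)
    DispatchQueue.global(qos: .userInteractive).async {
        do {
            try handler.perform([self.rectanglesRequest])
        } catch {
            print(error)
        }
    }
}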

Vision detects the corners of a rectangular object in the image scene. Because that object might appear in perspective in the image, the sample app uses those four corners and the Core Image CIPerspectiveCorrection filter to produce a rectangular image more appropriate for image classification:

func handleRectangles(request: VNRequest, error: Error?) {
    guard let observations = request.results as? [VNRectangleObservation]
        else { fatalError("unexpected result type from VNDetectRectanglesRequest") }
    guard let detectedRectangle = observations.first else {
        DispatchQueue.main.async {
            self.classificationLabel.text = "No rectangles detected."
        }
        return
    }
    let imageSize = inputImage.extent.size

    // Verify detected rectangle is valid.
    let boundingBox = detectedRectangle.boundingBox.scaled(to: imageSize)
    guard inputImage.extent.contains(boundingBox)
        else { print("invalid detected rectangle"); return }

    // Rectify the detected image and reduce it to inverted grayscale for applying model.
    let topLeft = detectedRectangle.topLeft.scaled(to: imageSize)
    let topRight = detectedRectangle.topRight.scaled(to: imageSize)
    let bottomLeft = detectedRectangle.bottomLeft.scaled(to: imageSize)
    let bottomRight = detectedRectangle.bottomRight.scaled(to: imageSize)
    let correctedImage = inputImage
        .cropping(to: boundingBox)
        .applyingFilter("CIPerspectiveCorrection", withInputParameters: [
            "inputTopLeft": CIVector(cgPoint: topLeft),
            "inputTopRight": CIVector(cgPoint: topRight),
            "inputBottomLeft": CIVector(cgPoint: bottomLeft),
            "inputBottomRight": CIVector(cgPoint: bottomRight)
        ])
        .applyingFilter("CIColorControls", withInputParameters: [
            kCIInputSaturationKey: 0,
            kCIInputContrastKey: 32
        ])
        .applyingFilter("CIColorInvert", withInputParameters: nil)

    // Show the pre-processed image
    DispatchQueue.main.async {
        self.correctedImageView.image = UIImage(ciImage: correctedImage)
    }

    // Run the Core ML MNIST classifier -- results in handleClassification method
    let handler = VNImageRequestHandler(ciImage: correctedImage)
    do {
        try handler.perform([classificationRequest])
    } catch {
        print(error)
    }
}
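Vision reports the bounding box and corner points in normalized image coordinates (0 through 1), so the scaled(to:) calls above convert them to pixel coordinates before cropping and perspective correction. Those helpers are small extensions not shown in this excerpt; a plausible sketch of what they look like:

extension CGPoint {
    // Convert a normalized Vision point into image (pixel) coordinates.
    func scaled(to imageSize: CGSize) -> CGPoint {
        return CGPoint(x: x * imageSize.width, y: y * imageSize.height)
    }
}

extension CGRect {
    // Convert a normalized Vision bounding box into image (pixel) coordinates.
    func scaled(to imageSize: CGSize) -> CGRect {
        return CGRect(x: origin.x * imageSize.width,
                      y: origin.y * imageSize.height,
                      width: size.width * imageSize.width,
                      height: size.height * imageSize.height)
    }
}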

Classifying the Image with an ML Model

After rectifying the image, the sample app runs a Vision request that applies the bundled Core ML model to classify the image. Setting up that model requires only loading the ML model file from the app bundle:

lazy var classificationRequest: VNCoreMLRequest = {
    // Load the ML model through its generated class and create a Vision request for it.
    do {
        let model = try VNCoreMLModel(for: MNISTClassifier().model)
        return VNCoreMLRequest(model: model, completionHandler: self.handleClassification)
    } catch {
        fatalError("can't load Vision ML model: \(error)")
    }
}()
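Here, MNISTClassifier is the interface Xcode generates automatically when the .mlmodel file is added to the app target; its model property exposes the underlying MLModel instance that VNCoreMLModel wraps for use with Vision.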

The ML model request's completion handler provides VNClassificationObservation objects, indicating what classification the model applied to the image and its confidence in that classification:

func handleClassification(request: VNRequest, error: Error?) {
    guard let observations = request.results as? [VNClassificationObservation]
        else { fatalError("unexpected result type from VNCoreMLRequest") }
    guard let best = observations.first
        else { fatalError("can't get best result") }

    DispatchQueue.main.async {
        self.classificationLabel.text = "Classification: \"\(best.identifier)\" Confidence: \(best.confidence)"
    }
}
