Detecting Objects in Still Images

Locate and demarcate rectangles, faces, barcodes, and text in images using the Vision framework.

Overview

The Vision framework can detect rectangles, faces, text, and barcodes at any orientation. This sample code shows how to create requests to detect these types of objects, and how to interpret the results of those requests. To help you visualize where an observation occurs, and how it looks, this code uses Core Animation layers to draw paths around detected features in images. For example, the following mock gift card has a QR code and rectangles that surface through the detector. The sample highlights not only text blocks (shown in red) but also individual characters within text (shown in purple):

The left side shows a sample input image that the end user feeds into the app. The right side shows the output image with the detected text and QR code.

This sample code project runs on iOS 11. However, you can also use Vision in your own apps on macOS 10.13, iOS 11, or tvOS 11.

To see this sample in action, build and run the project, then use the toggle switches to choose which kinds of objects (any combination of rectangles, faces, barcodes, and text) to detect. Tapping anywhere else prompts the sample to request a picture, which you either capture by camera or select from your photo library. The sample then applies computer vision algorithms to find the desired features in the provided image. Finally, the sample draws colored paths around observed features on Core Animation layers.

Prepare an Input Image for Vision

Vision handles still image-based requests using a VNImageRequestHandler and assumes that images are oriented upright, so pass your image with orientation in mind. CGImage, CIImage, and CVPixelBuffer objects don't carry orientation, so provide it as part of the initializer.

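You can initialize a VNImageRequestHandler from image data in the following formats:

  • CGImage: The Core Graphics image format, obtainable from any UIImage through its cgImage property.

  • CIImage: The Core Image format, a good fit if Core Image is already part of your image-processing pipeline.

  • CVPixelBuffer: The Core Video image format, typically used for frames from a live camera feed or a movie.

  • Data: Image data held in memory, such as data received over a network connection.

  • URL: A file URL to an image on disk.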

Vision may not detect sideways or upside-down features properly if it assumes the wrong orientation. Photos selected in the sample's image picker contain orientation information. Access this data through the UIImage property imageOrientation. If you acquire your photos through other means, such as from the web or other apps, be sure to check for orientation and provide it separately if it doesn't come baked into the image.
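
The conversion from a UIImage orientation to the CGImagePropertyOrientation value that the request handler accepts could be sketched as follows. The two enumerations use different raw values, so a case-by-case switch is safer than passing raw values across. The initializer below is an illustrative helper, not a framework API, and the block compiles only where UIKit is available.

```swift
#if canImport(UIKit)
import UIKit
import ImageIO

extension CGImagePropertyOrientation {
    /// Illustrative helper (an assumption, not part of UIKit or ImageIO):
    /// maps a UIKit orientation to the matching ImageIO orientation.
    init(_ uiOrientation: UIImage.Orientation) {
        switch uiOrientation {
        case .up: self = .up
        case .down: self = .down
        case .left: self = .left
        case .right: self = .right
        case .upMirrored: self = .upMirrored
        case .downMirrored: self = .downMirrored
        case .leftMirrored: self = .leftMirrored
        case .rightMirrored: self = .rightMirrored
        @unknown default: self = .up // Fall back to upright for unknown cases.
        }
    }
}
#endif
```

With a helper like this in place, an orientation for the request handler could be obtained with CGImagePropertyOrientation(uiImage.imageOrientation), where uiImage stands for whatever image the picker returned.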

Create Vision Requests

Create a VNImageRequestHandler object with the image to be processed.

// Create a request handler.
let imageRequestHandler = VNImageRequestHandler(cgImage: image,
                                                orientation: orientation,
                                                options: [:])

If you're making multiple requests from the same image (for example, detecting facial features as well as faces), create and bundle all requests to pass into the image request handler. Vision runs each request and executes its completion handler on its own thread.

You can pair each request with a completion handler to run request-specific code after Vision finishes all requests. The sample draws boxes differently based on the type of request, so this code differs from request to request. Specify your completion handler when initializing each request.

lazy var rectangleDetectionRequest: VNDetectRectanglesRequest = {
    let rectDetectRequest = VNDetectRectanglesRequest(completionHandler: self.handleDetectedRectangles)
    // Customize & configure the request to detect only certain rectangles.
    rectDetectRequest.maximumObservations = 8 // Vision currently supports up to 16.
    rectDetectRequest.minimumConfidence = 0.6 // Be confident.
    rectDetectRequest.minimumAspectRatio = 0.3 // height / width
    return rectDetectRequest
}()
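
Bundling several requests for one image might look like the sketch below. The function name and its Boolean flags are hypothetical stand-ins for the sample's toggle switches. The request classes themselves are Vision types, created here without completion handlers to keep the example short.

```swift
#if canImport(Vision)
import Vision

/// Hypothetical helper: collects the requests the user asked for into one
/// array, ready to pass to a single image request handler.
func makeDetectionRequests(rectangles: Bool, faces: Bool,
                           text: Bool, barcodes: Bool) -> [VNRequest] {
    var requests: [VNRequest] = []
    if rectangles {
        let request = VNDetectRectanglesRequest()
        request.maximumObservations = 8 // Same cap as the listing above.
        requests.append(request)
    }
    if faces { requests.append(VNDetectFaceRectanglesRequest()) }
    if text { requests.append(VNDetectTextRectanglesRequest()) }
    if barcodes { requests.append(VNDetectBarcodesRequest()) }
    return requests
}
#endif
```

Building the array in one place keeps the later call to the request handler independent of which detectors are enabled.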

After you've created all your requests, pass them as an array to the request handler's synchronous perform(_:). Vision computations may consume resources and take time, so use a background queue to avoid blocking the main queue as it executes.

// Send the requests to the request handler.
DispatchQueue.global(qos: .userInitiated).async {
    do {
        try imageRequestHandler.perform(requests)
    } catch let error as NSError {
        print("Failed to perform image request: \(error)")
        self.presentAlert("Image Request Failed", error: error)
        return
    }
}

Interpret Detection Results

In Swift, perform(_:) throws an error if the requests can't be performed; the equivalent Objective-C method returns a Boolean and reports the error by reference. If the call succeeds, each request's results property contains observation or tracking data, such as a detected object's location and bounding box.

You can access results in two ways:

  • Check each request's results property after perform(_:) returns.

  • In the request's completion handler, use the callback's request parameter to read detection information from its results property. The results may contain multiple observations, so loop through the array to process each one.
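
The first option, reading results once perform(_:) has returned, could be sketched as below. The function name is hypothetical, and rectangle detection is used only as a convenient example request. The cast mirrors the style of the sample's listings; newer SDKs already type the results array, in which case the cast is redundant but harmless.

```swift
#if canImport(Vision)
import Vision
import ImageIO

/// Hypothetical helper: runs one rectangle request synchronously and returns
/// whatever observations Vision produced.
func detectRectangles(in image: CGImage,
                      orientation: CGImagePropertyOrientation) throws -> [VNRectangleObservation] {
    let request = VNDetectRectanglesRequest()
    let handler = VNImageRequestHandler(cgImage: image,
                                        orientation: orientation,
                                        options: [:])
    // perform(_:) blocks until the request finishes, or throws on failure.
    try handler.perform([request])
    // After a successful call, the request's results property holds the observations.
    return (request.results as? [VNRectangleObservation]) ?? []
}
#endif
```

Because the call is synchronous, a function like this belongs on a background queue, just like the perform(_:) call in the sample.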

For example, the sample uses facial observations and their landmarks' bounding boxes to locate the features and draw a rectangle around them.

// Perform drawing on the main thread.
DispatchQueue.main.async {
    guard let drawLayer = self.pathLayer,
        let results = request?.results as? [VNFaceObservation] else {
            return
    }
    self.draw(faces: results, onImageWithBounds: drawLayer.bounds)
    drawLayer.setNeedsDisplay()
}

Even when Vision calls its completion handlers on a background thread, always dispatch UI calls like the path-drawing code to the main thread. Access to UIKit and AppKit resources must be serialized, so changes that affect the app's immediate appearance belong on the main thread.

CATransaction.begin()
for observation in faces {
    let faceBox = boundingBox(forRegionOfInterest: observation.boundingBox, withinImageBounds: bounds)
    let faceLayer = shapeLayer(color: .yellow, frame: faceBox)
    
    // Add to pathLayer on top of image.
    pathLayer?.addSublayer(faceLayer)
}
CATransaction.commit()
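
Vision reports each boundingBox in normalized coordinates (0 to 1) with the origin at the lower left of the image, so drawing code has to scale the rectangle and, in a top-left-origin coordinate space, flip it vertically. The sample's boundingBox(forRegionOfInterest:withinImageBounds:) helper is not shown here; the function below is a separate, minimal sketch of that kind of conversion under the assumption of a top-left origin, and its name is hypothetical.

```swift
import Foundation

/// Hypothetical helper: converts a normalized, lower-left-origin rectangle
/// into a rectangle in a top-left-origin space of the given size.
func imageRect(forNormalizedRect normalized: CGRect, imageSize: CGSize) -> CGRect {
    // Scale the normalized size and horizontal position up to image units.
    let width = normalized.size.width * imageSize.width
    let height = normalized.size.height * imageSize.height
    let x = normalized.origin.x * imageSize.width
    // Flip vertically: the rectangle's top edge sits at (y + height) in
    // lower-left coordinates, which is (1 - y - height) measured from the top.
    let y = (1 - normalized.origin.y - normalized.size.height) * imageSize.height
    return CGRect(x: x, y: y, width: width, height: height)
}
```

For example, a normalized rectangle at (0.25, 0.5) with size 0.5 by 0.25 in a 200 by 100 image maps to origin (50, 25) with size 100 by 25.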

For face landmark requests, the detector provides VNFaceObservation results with greater detail, such as facial-feature landmark regions.

For text observations, you can locate individual characters by checking the characterBoxes property, which Vision fills in only when the request's reportCharacterBoxes property is set to true.
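
A minimal sketch of requesting and collecting character boxes follows. Both function names are hypothetical; reportCharacterBoxes and characterBoxes are the Vision properties involved.

```swift
#if canImport(Vision)
import Vision

/// Hypothetical helper: a text request that also reports per-character boxes.
func makeTextRequest() -> VNDetectTextRectanglesRequest {
    let request = VNDetectTextRectanglesRequest()
    request.reportCharacterBoxes = true // Without this, characterBoxes stays nil.
    return request
}

/// Hypothetical helper: gathers the normalized bounding box of every
/// character found across all text observations.
func characterRects(in observations: [VNTextObservation]) -> [CGRect] {
    return observations.flatMap { observation in
        (observation.characterBoxes ?? []).map { $0.boundingBox }
    }
}
#endif
```

The returned rectangles are still in Vision's normalized coordinate space, so they need the same conversion as any other bounding box before drawing.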

For barcode observations, some supported symbologies contain payload information in the payloadStringValue property, allowing you to parse the content of detected barcodes. Like a supermarket scanner, barcode detection is optimized for finding one barcode per image.
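
Extracting payloads from barcode observations could be sketched as below. The function name and the tuple labels are hypothetical. Observations whose symbology carries no string payload are skipped.

```swift
#if canImport(Vision)
import Vision

/// Hypothetical helper: pairs each decodable barcode's symbology name with
/// its payload string, dropping observations that have no string payload.
func barcodePayloads(from observations: [VNBarcodeObservation]) -> [(symbology: String, payload: String)] {
    return observations.compactMap { observation in
        guard let payload = observation.payloadStringValue else { return nil }
        return (symbology: observation.symbology.rawValue, payload: payload)
    }
}
#endif
```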

It's up to your app to use or store data from the observations before exiting the completion handler. Instead of drawing paths like the sample does, write custom code to extract what your app needs from each observation.

Follow Best Practices

To reduce unnecessary computation, don't create multiple request handlers and submit them multiple times on the same image. Instead, create all your requests before querying Vision, bundle them inside a requests array, and submit that array in a single call.

To perform detection across multiple, unrelated images, create a separate image handler for each image and make requests to each handler on separate threads, so they run in parallel. Each image request handler costs additional processing time and memory, so try not to run them on the main thread. Dispatch these handlers on additional background threads, calling back to the main thread only for UI updates such as displaying images or paths.
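
One way to process several unrelated images in parallel is sketched below, with one request handler per image. The function name is hypothetical, and the use of DispatchQueue.concurrentPerform with a lock around the shared array is an assumption about how an app might structure this, not something the sample does.

```swift
import Foundation
#if canImport(Vision)
import Vision

/// Hypothetical helper: counts detected rectangles in each image file,
/// using a separate handler and request per image.
/// Blocks the calling thread, so call it from a background queue.
func countRectangles(inImagesAt urls: [URL]) -> [Int] {
    var counts = [Int](repeating: 0, count: urls.count)
    let lock = NSLock() // Guards writes to the shared counts array.
    DispatchQueue.concurrentPerform(iterations: urls.count) { index in
        let request = VNDetectRectanglesRequest()
        let handler = VNImageRequestHandler(url: urls[index], options: [:])
        do {
            try handler.perform([request])
            let found = request.results?.count ?? 0
            lock.lock()
            counts[index] = found
            lock.unlock()
        } catch {
            // A failed image keeps its count of zero.
            print("Failed to process \(urls[index].lastPathComponent): \(error)")
        }
    }
    return counts
}
#endif
```

Any UI update based on the returned counts would still need to be dispatched to the main queue.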

The image-based handler that this sample introduces works for detection across any number of images, but it doesn't track objects. To perform object tracking, use a VNSequenceRequestHandler instead. For more information about object tracking, see Tracking the User's Face in Real Time.
