ARTrailer
Identify the movie title from a poster, and use SceneKit to display the movie trailer in AR.
Overview
This app runs an ARKit world-tracking session with content displayed in a SceneKit view. It first uses the Vision framework to find regions of visible text in the camera image, then passes the detected regions to the Tesseract framework for OCR. After text recognition, it overlays the movie trailer in AR world space.
Demo
Getting Started
ARKit requires iOS 11.0 and a device with an A9 (or later) processor. ARKit is not available in iOS Simulator. Building the sample code requires Xcode 10.0 or later.
Installation
- Run pod install and open ARTrailer.xcworkspace.
- This app uses the YouTube Data API to search for movie trailers. Add the API key to Keys.plist:
<plist version="1.0">
<dict>
<key>YouTubeDataAPI</key>
<string></string>
</dict>
</plist>
Note: You need quota for the YouTube Data API to send a search request. To test the video overlay without quota, you can hard-code the videoId in the YouTube URL.
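How the key is read back out depends on the project setup; the following is only a minimal sketch, assuming Keys.plist is bundled with the app (the helper name is illustrative, not part of the sample):

// Reads the YouTube Data API key from the bundled Keys.plist.
func youTubeAPIKey() -> String? {
    guard let url = Bundle.main.url(forResource: "Keys", withExtension: "plist"),
          let data = try? Data(contentsOf: url),
          let plist = try? PropertyListSerialization.propertyList(from: data, options: [], format: nil),
          let keys = plist as? [String: Any] else {
        return nil
    }
    return keys["YouTubeDataAPI"] as? String
}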
Run the AR Session and Process Camera Images
The ViewController class manages the AR session and displays AR overlay content in a SceneKit view. ARKit captures video frames from the camera and delivers them to the view controller through the session(_:didUpdate:) method, which then calls the processCurrentImage() method to perform text recognition.
func session(_ session: ARSession, didUpdate frame: ARFrame) {
    // Do not enqueue other buffers for processing while another Vision task is still running.
    // The camera stream has only a finite number of buffers available; holding too many buffers for analysis would starve the camera.
    guard currentBuffer == nil, case .normal = frame.camera.trackingState else {
        return
    }

    // Retain the image buffer for Vision processing.
    self.currentBuffer = frame.capturedImage
    processCurrentImage()
}
Serialize Image Processing for Real-Time Performance
The processCurrentImage() method uses the view controller's currentBuffer property to track whether Vision is currently processing an image before starting another Vision task.
// Most computer vision tasks are not rotation-agnostic, so it is important to pass in the orientation of the image with respect to the device.
let orientation = CGImagePropertyOrientation(UIDevice.current.orientation)

let requestHandler = VNImageRequestHandler(cvPixelBuffer: currentBuffer!, orientation: orientation)
visionQueue.async {
    do {
        // Release the pixel buffer when done, allowing the next buffer to be processed.
        defer { self.currentBuffer = nil }
        try requestHandler.perform([self.rectangleRequest, self.textRequest])
        self.recognizeTexts(cvPixelBuffer: self.currentBuffer!)
    } catch {
        print("Error: Vision request failed with error \"\(error)\"")
    }
}
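The rectangleRequest, textRequest, and visionQueue properties used above are not shown in this excerpt. A minimal sketch of how they might be declared, assuming Vision's built-in VNDetectRectanglesRequest and VNDetectTextRectanglesRequest (with character boxes enabled, since recognizeTexts() reads characterBoxes); the queue label is illustrative:

// A serial queue keeps Vision work off the main thread and ensures only one request runs at a time.
let visionQueue = DispatchQueue(label: "com.example.ARTrailer.visionQueue")

// Detects the rectangular region of the movie poster.
lazy var rectangleRequest: VNDetectRectanglesRequest = {
    return VNDetectRectanglesRequest(completionHandler: self.rectangleDetectionHandler)
}()

// Detects regions of text; character boxes are needed to build per-word image rects for OCR.
lazy var textRequest: VNDetectTextRectanglesRequest = {
    let request = VNDetectTextRectanglesRequest(completionHandler: self.textDetectionHandler)
    request.reportCharacterBoxes = true
    return request
}()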
Implement the Text Detector
The code's rectangleDetectionHandler() and textDetectionHandler() methods detect the region of the movie poster and the regions of visible text on the poster, respectively.
func textDetectionHandler(request: VNRequest, error: Error?) {
    guard let observations = request.results else {
        print("no result")
        return
    }
    // Keep only the text observations from the request results.
    let result = observations.compactMap { $0 as? VNTextObservation }
    if result.isEmpty {
        return
    }
    textObservations = result
}
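The rectangleDetectionHandler() counterpart is not shown in this excerpt; it might store the detected poster rectangles in much the same way (the rectangleObservations property below is an assumption, mirroring textObservations):

func rectangleDetectionHandler(request: VNRequest, error: Error?) {
    guard let observations = request.results else {
        print("no result")
        return
    }
    // Keep only rectangle observations, e.g. the outline of the movie poster.
    let result = observations.compactMap { $0 as? VNRectangleObservation }
    if result.isEmpty {
        return
    }
    // `rectangleObservations` is a hypothetical property analogous to `textObservations`.
    rectangleObservations = result
}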
Perform Text Recognition
The code's recognizeTexts()
method performs text recognition on the detected text regions.
func recognizeTexts(cvPixelBuffer: CVPixelBuffer) {
    // Rotate the captured image to the right-side-up orientation before cropping text regions.
    var ciImage = CIImage(cvPixelBuffer: cvPixelBuffer)
    let transform = ciImage.orientationTransform(for: CGImagePropertyOrientation(rawValue: 6)!)
    ciImage = ciImage.transformed(by: transform)
    let size = ciImage.extent.size

    // Collect the recognized words as search keywords for the movie trailer lookup.
    var keywords = ""
    for textObservation in self.textObservations {
        guard let rects = textObservation.characterBoxes else {
            continue
        }
        let (imageRect, xMin, xMax, yMin, yMax) = createImageRect(rects: rects, size: size)
        let text = runOCRonImage(imageRect: imageRect, ciImage: ciImage, tesseract: tesseract)
        keywords += " \(text)"
    }
    createVideoAnchor()
    textObservations.removeAll()
}
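The createImageRect(rects:size:) helper is not shown here. Because Vision reports normalized bounding boxes, a plausible sketch converts the character boxes of one text observation into a pixel-space rect; the return tuple matches the call above, but the implementation details are assumptions:

func createImageRect(rects: [VNRectangleObservation], size: CGSize) -> (CGRect, CGFloat, CGFloat, CGFloat, CGFloat) {
    // Vision reports normalized coordinates (0...1) with the origin at the bottom-left.
    var xMin = CGFloat.greatestFiniteMagnitude
    var xMax: CGFloat = 0
    var yMin = CGFloat.greatestFiniteMagnitude
    var yMax: CGFloat = 0
    for rect in rects {
        xMin = min(xMin, rect.bottomLeft.x)
        xMax = max(xMax, rect.bottomRight.x)
        yMin = min(yMin, rect.bottomLeft.y)
        yMax = max(yMax, rect.topRight.y)
    }
    // Convert the combined normalized box into image coordinates.
    let imageRect = CGRect(x: xMin * size.width,
                           y: yMin * size.height,
                           width: (xMax - xMin) * size.width,
                           height: (yMax - yMin) * size.height)
    return (imageRect, xMin, xMax, yMin, yMax)
}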
The runOCRonImage() method uses the Tesseract framework to perform OCR on the preprocessed image.
func runOCRonImage(imageRect: CGRect, ciImage: CIImage, tesseract: G8Tesseract) -> String {
    // Crop the text region out of the camera image and convert it to a CGImage for preprocessing.
    let context = CIContext(options: nil)
    guard let cgImage = context.createCGImage(ciImage, from: imageRect) else {
        return ""
    }
    let uiImage = preprocessImage(image: UIImage(cgImage: cgImage))

    // Hand the preprocessed image to Tesseract and return the recognized text.
    tesseract.image = uiImage
    tesseract.recognize()
    guard let text = tesseract.recognizedText else {
        return ""
    }
    return text.trimmingCharacters(in: .newlines)
}
The code's preprocessImage() method applies image processing to optimize the camera images of the text regions for OCR.
func preprocessImage(image: UIImage) -> UIImage {
    // Convert to grayscale and black-and-white, then scale up to improve Tesseract's accuracy.
    var resultImage = image.fixOrientation().g8_grayScale()?.g8_blackAndWhite()
    resultImage = resultImage?.resizeVI(size: CGSize(width: image.size.width * 3, height: image.size.height * 3))
    return resultImage ?? image
}
- Note: The accuracy of text recognition depends on the input image. Refer to the Tesseract documentation for image preprocessing techniques. To implement other image processing methods, add them to the UIImage extension, as in the sketch below.
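As an example, an additional preprocessing step could live in the same UIImage extension; this sketch boosts contrast with Core Image, and the method name and filter parameters are illustrative rather than part of the sample:

import CoreImage
import UIKit

extension UIImage {
    // Hypothetical helper: increase contrast before handing the image to Tesseract.
    func boostedContrast(amount: CGFloat = 1.5) -> UIImage {
        guard let ciImage = CIImage(image: self) else { return self }
        let filter = CIFilter(name: "CIColorControls")
        filter?.setValue(ciImage, forKey: kCIInputImageKey)
        filter?.setValue(amount, forKey: kCIInputContrastKey)
        let context = CIContext(options: nil)
        guard let output = filter?.outputImage,
              let cgImage = context.createCGImage(output, from: output.extent) else {
            return self
        }
        return UIImage(cgImage: cgImage)
    }
}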
Add a Video in AR
The createVideoAnchor() method adds an anchor to the AR session.
// Create an anchor using the camera's current position.
if let currentFrame = sceneView.session.currentFrame {
    self.cinemaFrame = currentFrame

    // Place the anchor 1.5 meters in front of the camera.
    var translation = matrix_identity_float4x4
    translation.columns.3.z = -1.5
    let transform = matrix_multiply(currentFrame.camera.transform, translation)

    let anchor = ARAnchor(transform: transform)
    sceneView.session.add(anchor: anchor)
}
Next, after ARKit automatically creates a SceneKit node for the newly added anchor, the renderer(_:didAdd:for:) delegate method provides content for that node. In this case, the addVideoToSCNNode() method creates a SpriteKit video node, places it in an SKScene that is used as the material of a SceneKit plane, and plays the trailer at the anchor.
func addVideoToSCNNode(url: String, node: SCNNode) {
    // Play the video in a SpriteKit scene sized for 16:9 content.
    let videoNode = SKVideoNode(url: URL(string: url)!)
    let skScene = SKScene(size: CGSize(width: 1280, height: 720))
    skScene.addChild(videoNode)
    videoNode.position = CGPoint(x: skScene.size.width / 2, y: skScene.size.height / 2)
    videoNode.size = skScene.size

    // Map the SpriteKit scene onto a SceneKit plane with the same 16:9 aspect ratio.
    let tvPlane = SCNPlane(width: 1.0, height: 0.5625)
    tvPlane.firstMaterial?.diffuse.contents = skScene
    tvPlane.firstMaterial?.isDoubleSided = true

    let tvPlaneNode = SCNNode(geometry: tvPlane)
    tvPlaneNode.eulerAngles = SCNVector3(0, GLKMathDegreesToRadians(180), GLKMathDegreesToRadians(-90))
    tvPlaneNode.opacity = 0.9

    videoNode.play()
    node.addChildNode(tvPlaneNode)
}
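The delegate method itself is not shown above; it might hand the trailer URL to addVideoToSCNNode(url:node:) roughly as follows. The videoURL property holding the YouTube search result is an assumption:

func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
    // `videoURL` is a hypothetical property set after the YouTube Data API search completes.
    guard let url = videoURL else { return }
    DispatchQueue.main.async {
        self.addVideoToSCNNode(url: url, node: node)
    }
}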