iOS CoreML: Detecting faces with Vision

1. Introduction

CoreML is a framework, available on recent versions of most Apple platforms, that allows trained machine learning models to be integrated into apps. It is built on low-level primitives such as Accelerate and BNNS on the CPU, and on Metal for the GPU.

In turn, several higher-level frameworks are built on top of CoreML: Vision (image analysis), Foundation (natural language processing) and GameplayKit (evaluation of learned decision trees).

In addition, the A11 Bionic SoC that powers the latest generation of iPhone (at the time of writing) includes dedicated neural network hardware, which makes CoreML even more efficient.

2. Environment

The tutorial is written using the following environment:

  • Hardware: MacBook Pro 15" laptop (2.5 GHz Intel Core i7, 16 GB 1600 MHz DDR3)
  • Operating System: macOS High Sierra 10.13.3
  • Xcode 9.2
  • iOS 11.2.6

3. About the models

CoreML works with trained machine learning models. These models must be in a specific format (.mlmodel) so that CoreML can use them. Apple has a page for developers explaining how to build their own models, either from scratch or by importing them from third-party tools.

Apple also offers some ready-made models that you can start using directly (although almost all of them are for image analysis).

4. Image analysis

Well, let’s get down to business. In this tutorial we will use the Vision framework to analyze images and detect faces and, within each face, try to locate the eyes and mouth.

For this we will use a very simple view with a button that, when pressed, lets us take a photo, pick an existing one or use a demo image.

Once we have a UIImage, we can start the analysis. We create a request handler, in this case of type VNImageRequestHandler, to which we pass the image data, and we start the analysis by invoking its perform method:

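func startProcessingImage(image: UIImage) {
    self.selectedImage = image
    self.loadingView.isHidden = false
    // Vision needs a CGImage; perform(_:) is synchronous, so run it off the main thread
    guard let cgImage = image.cgImage else { return }
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    DispatchQueue.global(qos: .userInitiated).async {
        do { try handler.perform([self.request]) } catch { print("Vision request failed: \(error)") }
    }
}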

Through the request we tell Vision what we want to analyze in the image. Vision can analyze several things: faces, text, barcodes, the horizon, image classification, etc.

For face detection specifically, there is no need to supply a model to the request, because Vision internally uses Apple's own models, which Apple had already been using for several versions before iOS 11. If we wanted to detect cats, for example, we would have to use a request of type VNCoreMLRequest and provide a trained model capable of recognizing cats.

In our case, the request is as simple as:

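// lazy, so that the completion handler can capture self
lazy var request: VNDetectFaceLandmarksRequest = VNDetectFaceLandmarksRequest { [weak self] request, error in
    guard let results = request.results as? [VNFaceObservation] else { return }
    self?.processResults(results: results)
}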

And that’s it: with just these few lines of code we can detect faces!

5. Obtaining results

From the list of VNFaceObservation results we can obtain both the area where each face was detected and the outlines of the eyes, lips, nose, etc.

The boundingBox property gives us the face area. Be careful, because it does not use the same coordinate system as UIKit: while in UIKit the point (0,0) is the upper left corner, the point (0,0) of the boundingBox is the lower left corner.

In addition, it uses normalized coordinates instead of absolute ones (in the range [0,1]), so the center of the image is (0.5, 0.5).
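To make both differences concrete, here is a minimal standalone sketch of the conversion. The function uiKitRect(fromNormalized:imageSize:) is hypothetical (it is not part of Vision or UIKit), and the 400x300 image and the face box are made-up values. It scales the normalized rect to the image size and flips the y axis in one step:

```swift
import Foundation

// Hypothetical helper (not a Vision or UIKit API): converts a normalized rect with a
// bottom-left origin, like VNFaceObservation.boundingBox, into an absolute rect with a
// top-left origin for an image of the given size.
func uiKitRect(fromNormalized rect: CGRect, imageSize: CGSize) -> CGRect {
    // Scale each component from [0,1] to the image dimensions
    let x = rect.origin.x * imageSize.width
    let width = rect.size.width * imageSize.width
    let height = rect.size.height * imageSize.height
    // Flip the y axis: the top edge sits (origin.y * H + height) above the bottom
    let y = imageSize.height - rect.origin.y * imageSize.height - height
    return CGRect(x: x, y: y, width: width, height: height)
}

// A face box in the upper half of a 400x300 image
let face = CGRect(x: 0.25, y: 0.5, width: 0.5, height: 0.25)
let converted = uiKitRect(fromNormalized: face, imageSize: CGSize(width: 400, height: 300))
// converted: origin (100, 75), size 200 x 75
```

Scaling first and flipping with the full image height gives the same result as flipping in normalized space (height 1) and scaling afterwards: 1 - 0.5 - 0.25 = 0.25, and 0.25 * 300 = 75.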

To handle both differences, it is useful to create a CGRect extension that applies the transformations:

extension CGRect {
    // Scale a normalized rect to absolute coordinates for the given size
    static func *(left: CGRect, right: CGSize) -> CGRect {
        let width = right.width
        let height = right.height
        return CGRect(x: left.origin.x * width, y: left.origin.y * height, width: left.size.width * width, height: left.size.height * height)
    }

    // Flip the y axis: Vision's origin is bottom-left, UIKit's is top-left
    func toUIKitRect(totalHeight: CGFloat = 1) -> CGRect {
        return CGRect(x: self.origin.x, y: totalHeight - self.origin.y - self.size.height, width: self.size.width, height: self.size.height)
    }
}

Besides the boundingBox, there is also the landmarks property, which contains the detections of the different parts of the face, each one as a collection of points outlining that part.
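As we understand Apple's documentation, these landmark points (normalizedPoints) are normalized relative to the face's boundingBox rather than to the whole image, and pointsInImage(imageSize:) maps them into image coordinates (origin still at the bottom-left). A sketch of the equivalent arithmetic, using a hypothetical helper imagePoint(fromLandmark:faceBox:imageSize:) and made-up values:

```swift
import Foundation

// Hypothetical helper approximating what VNFaceLandmarkRegion2D.pointsInImage(imageSize:)
// does for one point: map a landmark point normalized to the face box into absolute
// image coordinates (bottom-left origin, as in Vision).
func imagePoint(fromLandmark point: CGPoint, faceBox: CGRect, imageSize: CGSize) -> CGPoint {
    // Position inside the face box, expressed as a fraction of the whole image
    let nx = faceBox.origin.x + point.x * faceBox.size.width
    let ny = faceBox.origin.y + point.y * faceBox.size.height
    // Scale to the image size
    return CGPoint(x: nx * imageSize.width, y: ny * imageSize.height)
}

let faceBox = CGRect(x: 0.25, y: 0.5, width: 0.5, height: 0.25)
// The centre of the face box in a 400x300 image
let p = imagePoint(fromLandmark: CGPoint(x: 0.5, y: 0.5), faceBox: faceBox, imageSize: CGSize(width: 400, height: 300))
// p: x = (0.25 + 0.25) * 400 = 200, y = (0.5 + 0.125) * 300 = 187.5
```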

To present the results, we are simply going to draw boxes around the different regions detected on the base image. The code would look like this:

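func processResults(results: [VNFaceObservation]) {
    // Face rectangles: normalized, with the origin at the bottom-left
    self.drawNormalizedRectsOverImage(rects: results.map { $0.boundingBox })
    // landmarks is optional; flatMap drops the nils (compactMap from Swift 4.1)
    let allLandmarks = results.flatMap { $0.landmarks }
    self.drawLandMarkRegion(regions: self.getEyes(landmarks: allLandmarks), color: .yellow)
    self.drawLandMarkRegion(regions: self.getMouths(landmarks: allLandmarks), color: .green)
}

// Default color chosen for illustration
func drawNormalizedRectsOverImage(rects: [CGRect], color: UIColor = .red) {
    // Flip while still normalized (height 1), then scale to the image size
    drawOverImage(rects: rects.map { $0.toUIKitRect() * self.selectedImage.size }, color: color)
}

func drawOverImage(rects: [CGRect], color: UIColor = .red) {
    // Draw into an offscreen context the size of the current image
    UIGraphicsBeginImageContextWithOptions(selectedImage.size, false, selectedImage.scale)
    let imageRect = CGRect(x: 0, y: 0, width: selectedImage.size.width, height: selectedImage.size.height)
    self.selectedImage.draw(in: imageRect)
    let ctx = UIGraphicsGetCurrentContext()
    ctx?.setStrokeColor(color.cgColor)
    ctx?.setLineWidth(2)
    rects.forEach { rect in
        ctx?.addRect(rect)
    }
    ctx?.drawPath(using: .stroke)
    let imageResult = UIGraphicsGetImageFromCurrentImageContext()
    UIGraphicsEndImageContext()
    self.selectedImage = imageResult!
    // UI updates must happen on the main thread
    DispatchQueue.main.async {
        self.loadingView.isHidden = true
        self.imageView.image = imageResult
    }
}

func drawLandMarkRegion(regions: [VNFaceLandmarkRegion2D], color: UIColor) {
    // pointsInImage(imageSize:) returns absolute points with a bottom-left origin, so flip
    // using the full image height. CGRect.boundingBox(points:) is a custom helper, not CoreGraphics
    let rects = regions.flatMap { CGRect.boundingBox(points: $0.pointsInImage(imageSize: self.selectedImage.size))?.toUIKitRect(totalHeight: self.selectedImage.size.height) }
    drawOverImage(rects: rects, color: color)
}

func getEyes(landmarks: [VNFaceLandmarks2D]) -> [VNFaceLandmarkRegion2D] {
    var eyes = landmarks.flatMap { $0.leftEye }
    eyes.append(contentsOf: landmarks.flatMap { $0.rightEye })
    return eyes
}

func getMouths(landmarks: [VNFaceLandmarks2D]) -> [VNFaceLandmarkRegion2D] {
    return landmarks.flatMap { $0.outerLips }
}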
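drawLandMarkRegion relies on CGRect.boundingBox(points:), which is not a CoreGraphics API and whose implementation is not included in the post. A minimal sketch of what such a helper could look like, assuming it should return nil for an empty point list (which would explain the optional chaining at the call site):

```swift
import Foundation

// Hypothetical implementation of the CGRect.boundingBox(points:) helper; not a CoreGraphics API.
extension CGRect {
    // Smallest rect enclosing all points, or nil when there are no points
    static func boundingBox(points: [CGPoint]) -> CGRect? {
        guard let first = points.first else { return nil }
        var lowX = first.x, highX = first.x
        var lowY = first.y, highY = first.y
        // Grow the extremes to enclose every remaining point
        for point in points.dropFirst() {
            lowX = Swift.min(lowX, point.x)
            highX = Swift.max(highX, point.x)
            lowY = Swift.min(lowY, point.y)
            highY = Swift.max(highY, point.y)
        }
        return CGRect(x: lowX, y: lowY, width: highX - lowX, height: highY - lowY)
    }
}

let outline = [CGPoint(x: 10, y: 40), CGPoint(x: 30, y: 20), CGPoint(x: 25, y: 50)]
if let box = CGRect.boundingBox(points: outline) {
    // origin (10, 20), size 20 x 30
    print(box.origin.x, box.origin.y, box.size.width, box.size.height)
}
```

A region with a single point yields a zero-size rect, which draws as a dot rather than being skipped.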

And we would have something like this as a result:

PS: Ironically, thanks to Vision, you have to write more code to “paint” the results than for the entire image analysis!

6. Conclusions

As you can see in the resulting image, the detection is obviously not perfect, far from it. The face outline is roughly right for almost all the faces in the demo image, but the position of the eyes and lips is sometimes not determined correctly. Still, the overall result is quite acceptable.

With the rise of AI and machine learning, I believe that having a simple API that lets us add a light layer of intelligence to our apps, without going deep into the implementation of neural network models, will make them increasingly capable of doing things we had never imagined until now.

The post iOS CoreML: Detecting faces with Vision appeared first on Target Veb.
