The example project of running Semantic Segmentation inference using Core ML

Overview

SemanticSegmentation-CoreML


This project demonstrates object segmentation on iOS with Core ML.
If you are interested in iOS + machine learning, visit here to see various demos.

(demo GIFs: DeepLabV3-DEMO1, FaceParsing-DEMO, DeepLabV3-DEMO-2, DeepLabV3-DEMO-3)

How it works

When using Metal

(diagram: the pipeline when using Metal)

Requirements

  • Xcode 10.2+
  • iOS 12.0+
  • Swift 5

Models

Download

Download the models from Apple's Core ML models page.

Metadata

| Name | Input | Output | Size | Minimum iOS version | Download |
| ---- | ----- | ------ | ---- | ------------------- | -------- |
| DeepLabV3 | Image (Color 513 × 513) | MultiArray (Int32 513 × 513) | 8.6 MB | iOS 12.0+ | link |
| DeepLabV3FP16 | Image (Color 513 × 513) | MultiArray (Int32 513 × 513) | 4.3 MB | iOS 12.0+ | link |
| DeepLabV3Int8LUT | Image (Color 513 × 513) | MultiArray (Int32 513 × 513) | 2.3 MB | iOS 12.0+ | link |
| FaceParsing | Image (Color 512 × 512) | MultiArray (Int32 512 × 512) | 52.7 MB | iOS 14.0+ | link |
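
For orientation, here is a minimal, hedged Swift sketch (not necessarily the app's exact code) of consuming one of these models: wrap the Xcode-generated class in Vision and read the Int32 segmentation map.

```swift
import CoreML
import Vision

// Sketch: run DeepLabV3 through Vision and read the 513 × 513 Int32 map.
// Assumes the `DeepLabV3` class that Xcode generates from the .mlmodel.
func makeSegmentationRequest() throws -> VNCoreMLRequest {
    let visionModel = try VNCoreMLModel(for: DeepLabV3().model)
    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        guard let observation = request.results?.first as? VNCoreMLFeatureValueObservation,
              let map = observation.featureValue.multiArrayValue else { return }
        // Each element of `map` is the predicted class index of one pixel.
        let topLeftClass = map[[0, 0] as [NSNumber]].int32Value
        print("shape:", map.shape, "top-left class:", topLeftClass)
    }
    request.imageCropAndScaleOption = .centerCrop  // the app crops to the square model input
    return request
}
```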

Inference Time − DeepLabV3

| Device | Inference Time | Total Time (GPU) | Total Time (CPU) |
| ------ | -------------: | ---------------: | ---------------: |
| iPhone 12 Pro | 29 ms | 29 ms | 240 ms |
| iPhone 12 Pro Max | ⏱ | ⏱ | ⏱ |
| iPhone 12 | 30 ms | 31 ms | 253 ms |
| iPhone 12 Mini | 29 ms | 30 ms | 226 ms |
| iPhone 11 Pro | 39 ms | 40 ms | 290 ms |
| iPhone 11 Pro Max | 35 ms | 36 ms | 280 ms |
| iPhone 11 | ⏱ | ⏱ | ⏱ |
| iPhone SE (2nd) | ⏱ | ⏱ | ⏱ |
| iPhone XS Max | ⏱ | ⏱ | ⏱ |
| iPhone XS | 54 ms | 55 ms | 327 ms |
| iPhone XR | 133 ms | ⏱ | 402 ms |
| iPhone X | 137 ms | 143 ms | 376 ms |
| iPhone 8+ | 140 ms | 146 ms | 420 ms |
| iPhone 8 | 189 ms | ⏱ | 529 ms |
| iPhone 7+ | 240 ms | ⏱ | 667 ms |
| iPhone 7 | 192 ms | 208 ms | 528 ms |
| iPhone 6S + | 309 ms | ⏱ | 1015 ms |

⏱: need to measure
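
For reference, a hedged sketch of how one such measurement could be taken (illustrative, not the app's exact instrumentation):

```swift
import QuartzCore
import Vision

// Time a single Vision/Core ML pass in milliseconds, roughly what the
// "Inference Time" column above reports.
func measureInferenceMs(_ request: VNRequest, on pixelBuffer: CVPixelBuffer) throws -> Double {
    let start = CACurrentMediaTime()
    try VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
    return (CACurrentMediaTime() - start) * 1000.0
}
```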

Inference Time − FaceParsing

| Device | Inference Time | Total Time (GPU) | Total Time (CPU) |
| ------ | -------------: | ---------------: | ---------------: |
| iPhone 12 Pro | ⏱ | ⏱ | ⏱ |
| iPhone 11 Pro | 37 ms | 37 ms | ⏱ |

Labels − DeepLabV3

```
# total 21
["background", "aeroplane", "bicycle", "bird", "boat",
"bottle", "bus", "car", "cat", "chair",
"cow", "diningtable", "dog", "horse", "motorbike",
"person", "pottedplant", "sheep", "sofa", "train",
"tv"]
```

Labels − FaceParsing

```
# total 19
["background", "skin", "l_brow", "r_brow", "l_eye",
"r_eye", "eye_g", "l_ear", "r_ear", "ear_r",
"nose", "mouth", "u_lip", "l_lip", "neck",
"neck_l", "cloth", "hair", "hat"]
```


Comments
  • Support face-parsing model

    Source Model Link

    https://github.com/zllrunning/face-parsing.PyTorch

    Core ML Model Download Link

    https://github.com/tucan9389/SemanticSegmentation-CoreML/releases/download/support-face-parsing/FaceParsing.mlmodel

    Model Spec

    • Input: 512x512 image
    • Output: 512x512 (Int32)
      • Category index of each pixel
      • Defined 19 categories: ['background', 'skin', 'l_brow', 'r_brow', 'l_eye', 'r_eye', 'eye_g', 'l_ear', 'r_ear', 'ear_r', 'nose', 'mouth', 'u_lip', 'l_lip', 'neck', 'neck_l', 'cloth', 'hair', 'hat']
    • Size: 52.7 MB
    • Inference time: 30-50 ms on iPhone 11 Pro

    Conversion Script

    ```python
    import torch

    import os.path as osp
    import json
    import torchvision.transforms as transforms
    from model import BiSeNet  # from the face-parsing.PyTorch repo

    import coremltools as ct

    cp = '79999_iter.pth'  # pretrained checkpoint
    device = torch.device('cpu')

    output_mlmodel_path = "FaceParsing.mlmodel"
    
    labels = ['background', 'skin', 'l_brow', 'r_brow', 'l_eye', 'r_eye', 'eye_g', 'l_ear', 'r_ear', 'ear_r',
                'nose', 'mouth', 'u_lip', 'l_lip', 'neck', 'neck_l', 'cloth', 'hair', 'hat']
    n_classes = len(labels)
    print("n_classes:", n_classes)
    
    class MyBiSeNet(torch.nn.Module):
        """Wraps BiSeNet so the traced graph ends in a per-pixel argmax."""
        def __init__(self, n_classes, pretrained_model_path):
            super(MyBiSeNet, self).__init__()
            self.model = BiSeNet(n_classes=n_classes)
            self.model.load_state_dict(torch.load(pretrained_model_path, map_location=device))
            self.model.eval()

        def forward(self, x):
            x = self.model(x)
            x = x[0]                    # keep the main head; drop the auxiliary outputs
            x = torch.argmax(x, dim=1)  # logits -> per-pixel class index
            x = torch.squeeze(x)        # (1, 512, 512) -> (512, 512)
            return x
    
    pretrained_model_path = osp.join('res/cp', cp)
    model = MyBiSeNet(n_classes=n_classes, pretrained_model_path=pretrained_model_path)
    model.eval()
    
    # Tracing with smaller inputs (e.g. 256x256) fails with a 'size mismatch'
    # error, so trace at the model's native 512x512.
    example_input = torch.rand(1, 3, 512, 512)

    # Training-time normalization, shown for reference only; an equivalent
    # scale/bias is baked into the Core ML input type below instead.
    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225],
        ),
    ])
    
    traced_model = torch.jit.trace(model, example_input)
    
    
    # Convert to Core ML using the Unified Conversion API.
    print(example_input.shape)

    # Approximate the per-channel std with a single value (0.226) so the
    # normalization becomes one scale plus per-channel biases.
    scale = 1.0 / (0.226 * 255.0)
    red_bias   = -0.485 / 0.226
    green_bias = -0.456 / 0.226
    blue_bias  = -0.406 / 0.226

    mlmodel = ct.convert(
        traced_model,
        inputs=[ct.ImageType(name="input",  # the coremltools quickstart uses the name "input_1" here
                             shape=example_input.shape,
                             scale=scale,
                             color_layout="BGR",
                             bias=[blue_bias, green_bias, red_bias])],
    )
    
    
    
    labels_json = {"labels": labels}

    # Metadata that lets Xcode's preview pane render the output as a segmentation mask.
    mlmodel.user_defined_metadata["com.apple.coreml.model.preview.type"] = "imageSegmenter"
    mlmodel.user_defined_metadata['com.apple.coreml.model.preview.params'] = json.dumps(labels_json)
    
    mlmodel.save(output_mlmodel_path)
    
    # The converted output is a generic multiarray; patch the saved spec so the
    # segmentation map is explicitly typed Int32.
    import coremltools.proto.FeatureTypes_pb2 as ft

    spec = ct.utils.load_spec(output_mlmodel_path)

    for feature in spec.description.output:
        if feature.type.HasField("multiArrayType"):
            feature.type.multiArrayType.dataType = ft.ArrayFeatureType.INT32

    ct.utils.save_spec(spec, output_mlmodel_path)
    ```
    
    opened by tucan9389 · 3 comments
  • Segment whole image

    Hi there - first off, great work on this repo! :D

    I wonder if there's a way to segment the whole image by padding the sides - as it stands, since imageCropAndScaleOption is .centerCrop, we only get the center.
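
    One possible direction, sketched here as an untested assumption: switch Vision's crop-and-scale option so the whole frame is scaled (and padded) rather than center-cropped.

    ```swift
    // Untested sketch: let Vision scale (and pad) the whole frame
    // instead of center-cropping before inference.
    func useWholeImage(_ request: VNCoreMLRequest) {
        request.imageCropAndScaleOption = .scaleFit
    }
    ```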

    opened by wuharvey · 2 comments
  • [PR] Support face-parsing semantic segmentation model

    (demo GIF of the face-parsing model: faceparsing-demo-001)

    video source: https://www.youtube.com/watch?v=D571qZzLfX4

    PR Points

    • Support the face-parsing semantic segmentation model
    • Support to visualize multi-class semantic segmentation output
      • Add MultitargetSegmentationTextureGenerater.swift
      • Add new multitarget_segmentation_render_target shader function in Shaders.metal

    Model Info

    • original model repo: https://github.com/zllrunning/face-parsing.PyTorch

    Related Issues

    • #12
    • #13
    • https://github.com/zllrunning/face-parsing.PyTorch/issues/27

    How to run the FaceParsing model?

    1. Download the mlmodel from here
    2. Import FaceParsing.mlmodel into the SemanticSegmentation-CoreML > mlmodel folder of the project
    3. Change the minimum iOS version to 14.0+
    4. Change the model class from DeepLabV3() to FaceParsing() in LiveImageViewController.swift or LiveMetalCameraViewController.swift (a sketch of this swap follows the list)
    5. Build & run the project on a real device
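
    A minimal sketch of the step-4 swap, assuming the Xcode-generated FaceParsing class and the existing Vision setup in the view controller:

    ```swift
    import CoreML
    import Vision

    // Step 4: use FaceParsing instead of DeepLabV3 as the Vision-wrapped model.
    // (`FaceParsing` is the class Xcode generates from FaceParsing.mlmodel.)
    func makeFaceParsingVisionModel() throws -> VNCoreMLModel {
        let coreMLModel = try FaceParsing(configuration: MLModelConfiguration()).model
        return try VNCoreMLModel(for: coreMLModel)
    }
    ```
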
    opened by tucan9389 · 1 comment
  • [PR] Add app tab item and the page for tracking face and segmenting it

    PR Points

    • Add apps tab based on UITableViewController
    • Add app feature for tracking a face and segmenting it with Metal
      • Use Vision framework and Core ML for face detection and face-parsing semantic segmentation
    opened by tucan9389 · 0 comments
  • [PR] Support to post-process on GPU with MetalCamera

    Super thanks @jsharp83 for your MetalCamera

    (demo GIF: segmentation-gpu-demo-001)

    Related Issue

    • #5

    Changed

    • Make post-processing for real-time segmentation dramatically faster
      • AS-IS (CPU): > 240 ms (iPhone 11 Pro)
      • TO-BE (GPU): < 1 ms (iPhone 11 Pro)
    • The main logic when generating textures (step 3 is sketched after this list):
      1. Get the current pixel buffer from the camera
      2. Run inference on the pixel buffer to get a segmentation map
      3. Generate a texture from the pixel buffer (cameraTexture) https://github.com/tucan9389/SemanticSegmentation-CoreML/pull/7#discussion_r524046755
      4. Generate a texture from the segmentation map (segmentationTexture) https://github.com/tucan9389/SemanticSegmentation-CoreML/pull/7#discussion_r524047448
      5. Merge the two textures into one (overlayedTexture) https://github.com/tucan9389/SemanticSegmentation-CoreML/pull/7#discussion_r524047448
      6. Assign the overlayedTexture to metalVideoPreview: MetalVideoView
      7. Draw the overlayedTexture
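
    As a self-contained sketch of step 3 above (CameraTextureGenerater presumably wraps this same CVMetalTextureCache pattern; the class name below is illustrative, not the PR's code):

    ```swift
    import CoreVideo
    import Metal

    // Turn a camera CVPixelBuffer into an MTLTexture without copying pixels,
    // using a CVMetalTextureCache.
    final class PixelBufferTextureMaker {
        private let device: MTLDevice
        private var textureCache: CVMetalTextureCache?

        init?(device: MTLDevice) {
            self.device = device
            guard CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &textureCache) == kCVReturnSuccess else {
                return nil
            }
        }

        func makeTexture(from pixelBuffer: CVPixelBuffer) -> MTLTexture? {
            guard let cache = textureCache else { return nil }
            var cvTexture: CVMetalTexture?
            CVMetalTextureCacheCreateTextureFromImage(kCFAllocatorDefault, cache, pixelBuffer, nil,
                                                      .bgra8Unorm,
                                                      CVPixelBufferGetWidth(pixelBuffer),
                                                      CVPixelBufferGetHeight(pixelBuffer),
                                                      0, &cvTexture)
            return cvTexture.flatMap(CVMetalTextureGetTexture)
        }
    }
    ```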

    PR Points

    • Add LiveMetalCameraViewController post-processing with MetalKit
    • Support extremely fast post-process
      • Add MetalRenderingDevice.swift from MetalCamera
      • Add MetalVideoView.swift from MetalCamera
      • Add CameraTextureGenerater.swift from MetalCamera
      • Add SegmentationTextureGenerater.swift from MetalCamera
      • Add OverlayingTexturesGenerater.swift from MetalCamera
      • Add other utils and shader code from MetalCamera

    Reference

    • https://github.com/jsharp83/MetalCamera
    • https://github.com/jsharp83/MetalCamera/blob/master/Example/MetalCamera/SegmentationSampleViewController.swift
    opened by tucan9389 · 0 comments
  • Performance Test

    Model Size (MB), Minimum iOS Version

    | Model | Size (MB) | Minimum iOS Version |
    | ----: | :----: | :----: |
    | DeepLabV3 | 8.6 | iOS12 |
    | DeepLabV3FP16 | 4.3 | iOS12 |
    | DeepLabV3Int8LUT | 2.3 | iOS12 |

    Inference Time (ms)

    | Model vs. Device | XS | X |
    | ----: | :----: | :----: |
    | DeepLabV3 | 135 | 177 |
    | DeepLabV3FP16 | 136 | 177 |
    | DeepLabV3Int8LUT | 135 | 177 |

    Total Time (ms)

    | Model vs. Device | XS | X |
    | ----: | :----: | :----: |
    | DeepLabV3 | 409 | 531 |
    | DeepLabV3FP16 | 403 | 530 |
    | DeepLabV3Int8LUT | 412 | 517 |

    FPS

    | Model vs. Device | XS | X |
    | ----: | :----: | :----: |
    | DeepLabV3 | 2 | 1 |
    | DeepLabV3FP16 | 2 | 1 |
    | DeepLabV3Int8LUT | 2 | 1 |

    opened by tucan9389 · 0 comments
  • segment cropped image

    Hi, Thanks for the great repo!

    Let's imagine the Core ML network needs a cropped image (only a sub-part of the camera feed). No matter how the crop is done (it can be hard-coded for testing purposes), I wonder if there is a way to change the DrawingSegmentationView, for example, to achieve this. Right now, if the input image is cropped, the output view resizes the image to the viewport and the result isn't well registered.

    opened by hugoliv · 0 comments
Releases

  • support-face-parsing