Mobile Text-to-Image search powered by multimodal semantic representation models (e.g., OpenAI's CLIP)

Overview

A Mobile Text-to-Image Search Powered by AI

A minimal demo of semantic multimodal text-to-image search using pretrained vision-language models.

Features

  1. Text-to-image retrieval using semantic similarity search.
  2. Support for different vector indexing strategies (linear scan and KMeans are currently implemented).
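
The linear scan strategy can be illustrated with a small sketch. This is not the app's code (the app runs on iOS); it is a minimal Python illustration that assumes embeddings are compared by cosine similarity, as is usual for CLIP-style models. The function names, image identifiers, and 3-dimensional toy vectors are all hypothetical.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def linear_scan(query, image_vectors, top_k=3):
    """Score every image embedding against the query embedding.

    image_vectors maps a (hypothetical) image identifier to its embedding.
    Returns (image_id, cosine_similarity) pairs, best match first.
    """
    q = normalize(query)
    scored = []
    for image_id, vec in image_vectors.items():
        v = normalize(vec)
        scored.append((image_id, sum(a * b for a, b in zip(q, v))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy 3-dimensional embeddings standing in for real encoder outputs.
gallery = {
    "cats.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.1, 0.9, 0.1],
    "beach.jpg": [0.0, 0.2, 0.9],
}
results = linear_scan([1.0, 0.2, 0.0], gallery, top_k=2)
print(results)
```

A linear scan touches every stored vector on each query, so its cost grows with the size of the gallery; that is the motivation for clustering-based indexes such as KMeans.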

Screenshot

  • All images in the gallery
  • Search with the query "Three cats"

Install

  1. Download the two TorchScript model files (text encoder and image encoder) into the models folder and add them to the Xcode project.
  2. Run 'pod install', then open the generated .xcworkspace file in Xcode.
pod install

Todo

  • Basic features
      • Access to a specified album or the whole photo library
      • Asynchronous model loading and vector computation
  • Indexing strategies
      • Linear indexing (persisted to file via the built-in Data type)
      • KMeans indexing (persisted to file via NSMutableDictionary)
      • Ball-Tree indexing
      • Locality-sensitive hashing indexing
  • Choices of semantic representation models
      • OpenAI's CLIP model
      • Integration of other multimodal retrieval models
  • Efficiency
      • Reducing memory consumption of models
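
The KMeans indexing strategy listed above can be sketched as follows. This is a hedged illustration in Python, not the app's implementation: it assumes the index groups image embeddings around centroids (standard Lloyd's algorithm with Euclidean distance on unit vectors) and that a query scans only the bucket of its nearest centroid. All names and the 2-dimensional toy data are hypothetical.

```python
import math
import random

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))

def assign(unit, centroids):
    """Group image ids into one bucket per centroid."""
    buckets = [[] for _ in centroids]
    for image_id, vec in unit.items():
        buckets[nearest(vec, centroids)].append(image_id)
    return buckets

def build_index(vectors, k, iterations=10, seed=0):
    """Run Lloyd's algorithm and return (unit vectors, centroids, buckets)."""
    unit = {image_id: normalize(v) for image_id, v in vectors.items()}
    rng = random.Random(seed)
    centroids = [unit[i] for i in rng.sample(sorted(unit), k)]
    for _ in range(iterations):
        buckets = assign(unit, centroids)
        # Move each centroid to the mean of its bucket; keep it if the bucket is empty.
        centroids = [
            [sum(col) / len(b) for col in zip(*(unit[i] for i in b))] if b else centroids[c]
            for c, b in enumerate(buckets)
        ]
    return unit, centroids, assign(unit, centroids)

def search(query, index, top_k=3):
    """Scan only the bucket of the nearest centroid, ranked by cosine similarity."""
    unit, centroids, buckets = index
    q = normalize(query)
    candidates = buckets[nearest(q, centroids)]
    scored = [(i, sum(a * b for a, b in zip(q, unit[i]))) for i in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy 2-dimensional embeddings forming two well-separated groups.
gallery = {
    "cat1.jpg": [0.9, 0.1], "cat2.jpg": [1.0, 0.0], "cat3.jpg": [0.8, 0.2],
    "dog1.jpg": [0.1, 0.9], "dog2.jpg": [0.0, 1.0], "dog3.jpg": [0.2, 0.8],
}
index = build_index(gallery, k=2)
results = search([1.0, 0.1], index, top_k=3)
print(results)
```

The trade-off in this sketch is that a query only sees one bucket, so a relevant image assigned to a neighbouring cluster can be missed; probing several nearest centroids is a common way to recover recall at the cost of more comparisons.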

Comments
  • Cannot match up model encodings

    Hi! Thanks for publishing this work, it's a great reference.

    I'm trying to integrate a couple of different systems, and I need the model encodings to match. So far, I haven't been able to make that work:

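    Given this Python:

    import torch
    import clip
    from PIL import Image

    # Use the GPU when available, otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)
    
    # Preprocess the image into a batch of one; tokenize a few sample prompts.
    image = preprocess(Image.open("image_1.png")).unsqueeze(0).float().to(device)
    text = clip.tokenize(["a face", "a dog", "a cat"]).to(device)
    
    # Encode the image without tracking gradients and print the embedding.
    with torch.no_grad():
        image_features = model.encode_image(image)
        print(image_features.tolist()[0])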
    

    I'm trying to get the same array of floats out using Clip.mm's - (NSArray<NSNumber*>*)test_uiimagetomat:(UIImage*)image function. Try as I might, the results always differ, and I'm not sure what causes the difference. As far as I can see, the cvt methods do the same as the Python image preprocessing, followed by normalization with the values from CLIP.

    Here's some of the initial values from the python code above:

    [0.3502497971057892, 0.0028706961311399937, -0.46749746799468994, -0.14868411421775818, -0.03139263391494751, -0.4536064863204956
    

    And from the Swift:

    [0.3193549513816833496, 0.0140316337347030640, -0.4410626888275146484, -0.0908056870102882385, -0.0415024310350418091, -0.4141347408294677734
    

    I used the Quick Look preview while debugging the iOS code to save the image from the UIImage, to ensure the same image is being used. In both cases, I'm using the original ViT-B/32 CLIP image encoder. Strangely, the numbers above are somewhat similar, but I'm not sure whether that's coincidental.

    Any advice?

    opened by wabzqem 1