Mobile Text-to-Image search powered by multimodal semantic representation models (e.g., OpenAI's CLIP)

Last update: Jan 5, 2023

Overview

A Mobile Text-to-Image Search Powered by AI

A minimal demo of semantic, multimodal text-to-image search using pretrained vision-language models.

Features

Text-to-image retrieval using semantic similarity search.
Support for different vector indexing strategies (linear scan and KMeans are currently implemented).
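
As a rough illustration of the linear-scan strategy, the sketch below scores every stored image vector against a query vector by cosine similarity and returns the best matches. It is written in Python for brevity and is not the app's implementation; the function names, the toy three-dimensional vectors, and the file names are all hypothetical (CLIP ViT-B/32 embeddings have 512 dimensions).

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def linear_scan(query, image_vectors, top_k=3):
    """Score every stored image vector against the query.

    Returns a list of (image_id, similarity) pairs, most similar first.
    """
    q = normalize(query)
    scored = []
    for image_id, vec in image_vectors.items():
        v = normalize(vec)
        similarity = sum(a * b for a, b in zip(q, v))
        scored.append((image_id, similarity))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical toy embeddings standing in for image encoder outputs.
gallery = {
    "cats.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.1, 0.9, 0.1],
    "beach.jpg": [0.0, 0.2, 0.9],
}
# Hypothetical stand-in for the text encoder output of a query such as "Three cats".
query_vector = [1.0, 0.2, 0.0]

results = linear_scan(query_vector, gallery, top_k=2)
print(results)
```

A linear scan touches every vector on each query, which is simple and exact but grows linearly with the size of the gallery.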

Screenshot

All images in the gallery
Search with the query "Three cats"

Install

Download the two TorchScript model files (text encoder and image encoder) into the models folder and add them to the Xcode project.
Run 'pod install', then open the generated .xcworkspace file in Xcode.

pod install

Todo

Basic features

Access to a specified album or the whole photo library
Asynchronous model loading and vectors computation

Indexing strategies

Linear indexing (persisted to file via built-in Data type)
KMeans indexing (persisted to file via NSMutableDictionary)
Ball-Tree indexing
Locality sensitive hashing indexing
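
To make the KMeans indexing idea concrete, here is a minimal Python sketch, not taken from the project: image vectors are grouped into clusters, and a query is compared only against the vectors in the closest cluster(s). The function names, the `n_probe` parameter, the deterministic seeding, and the toy two-dimensional data are assumptions made for illustration.

```python
import math

def normalize(vec):
    """Return a unit-length copy of vec (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_kmeans_index(vectors, k, iterations=10):
    """Cluster embeddings by cosine similarity; return (centroids, clusters).

    vectors: dict of image_id -> embedding.
    clusters: one list of image ids per centroid.
    """
    ids = list(vectors)
    unit = {i: normalize(vectors[i]) for i in ids}
    # Deterministic seeding for this sketch: the first k vectors become centroids.
    centroids = [unit[i] for i in ids[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for i in ids:
            best = max(range(k), key=lambda c: dot(unit[i], centroids[c]))
            clusters[best].append(i)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                dim = len(centroids[c])
                mean = [sum(unit[m][d] for m in members) / len(members)
                        for d in range(dim)]
                centroids[c] = normalize(mean)
    return centroids, clusters

def search(query, vectors, centroids, clusters, top_k=3, n_probe=1):
    """Scan only the n_probe clusters whose centroids are closest to the query."""
    q = normalize(query)
    order = sorted(range(len(centroids)),
                   key=lambda c: dot(q, centroids[c]), reverse=True)
    candidates = [i for c in order[:n_probe] for i in clusters[c]]
    scored = [(i, dot(q, normalize(vectors[i]))) for i in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical toy embeddings.
gallery = {
    "cat_1.jpg": [0.9, 0.1],
    "dog_1.jpg": [0.1, 0.9],
    "cat_2.jpg": [0.8, 0.3],
    "dog_2.jpg": [0.2, 0.8],
}
centroids, clusters = build_kmeans_index(gallery, k=2)
hits = search([1.0, 0.2], gallery, centroids, clusters, top_k=2)
print(clusters)
print(hits)
```

Compared with a linear scan, this trades some recall for speed: a relevant image that landed in a cluster that is not probed will be missed, which is why raising `n_probe` is the usual way to recover accuracy.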

Choices of semantic representation models

OpenAI's CLIP model
Integration of other multimodal retrieval models

Efficiency

Reducing memory consumption of models

AsyncImage before iOS 15. Lightweight, pure SwiftUI Image view, that displays an image downloaded from URL, with auxiliary views and local cache.

URLImage URLImage is a SwiftUI view that displays an image downloaded from provided URL. URLImage manages downloading remote image and caching it loca

1k Jan 4, 2023

AYImageKit is a Swift Library for Async Image Downloading, Show Name's Initials and Can View image in Separate Screen.

AYImageKit AYImageKit is a Swift Library for Async Image Downloading. Features Async Image Downloading. Can Show Text Initials. Can have Custom Styles

11 Jan 10, 2022

Convert the image to hexadecimal to send the image to e-paper

ConvertImageToHex Convert the image to hexadecimal to send the image to e-paper Conversion Order // 0. hex로 변환할 이미지 var image = UIImage(named: "sample

0 Feb 26, 2022

Twitter Image Pipeline is a robust and performant image loading and caching framework for iOS clients

Twitter Image Pipeline (a.k.a. TIP) Background The Twitter Image Pipeline is a streamlined framework for fetching and storing images in an application

1.8k Dec 17, 2022

Image-cropper - Image cropper for iOS

Image-cropper Example To run the example project, clone the repo, and run pod in

0 Jan 6, 2022

An instagram-like image editor that can apply preset filters passed to it and customized editings to a binded image.

CZImageEditor CZImageEditor is an instagram-like image editor with clean and intuitive UI. It is pure swift and can apply preset filters and customize

8 Dec 16, 2022

RadarKit - The Radar Kit allowing you to locate places, trip neary by you Or it will help you to search out the people around you with the few lines of code

RadarKit Preview Discover the world 🌎 around you..!!! The Radar Kit allowing yo

6 Sep 20, 2022

FlickrSearchPhotos - Simple search photos application which uses Flickr REST API made in Swift

1 Jun 6, 2022

A SwiftUI app to filter & search runewords for Diablo II

Runewords App This small SwiftUI app have two purposes: Making a clean, fully SwiftUI app using all the latest iOS 16 / Xcode 14 features. Browse, sea

44 Dec 18, 2022

Comments

Cannot match up model encodings
Hi! Thanks for publishing this work, it's a great reference.

I'm trying to integrate a couple of different systems, and I need the model encodings to match. So far, I haven't been able to make that work:

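Given this Python:

    import torch
    import clip
    from PIL import Image

    # Load the original ViT-B/32 CLIP model and its preprocessing pipeline.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    # Preprocess one image into a batch of size 1; tokenize some text prompts.
    image = preprocess(Image.open("image_1.png")).unsqueeze(0).float().to(device)
    text = clip.tokenize(["a face", "a dog", "a cat"]).to(device)

    # Encode the image and print the embedding as a flat list of floats.
    with torch.no_grad():
        image_features = model.encode_image(image)
        print(image_features.tolist()[0])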

I'm trying to get the same array of floats out using Clip.mm's - (NSArray<NSNumber*>*)test_uiimagetomat:(UIImage*)image function. Try as I might, they always differ - and I'm not sure what the difference is. I can see that the cvt methods do the same as the image preprocess, then the normalise with the values from clip.

Here's some of the initial values from the python code above:

[0.3502497971057892, 0.0028706961311399937, -0.46749746799468994, -0.14868411421775818, -0.03139263391494751, -0.4536064863204956

And from the Swift:

[0.3193549513816833496, 0.0140316337347030640, -0.4410626888275146484, -0.0908056870102882385, -0.0415024310350418091, -0.4141347408294677734

While debugging the iOS code, I used the Quick Look preview to save the image from the UIImage, to make sure the same image is being used. In both cases, I'm using the original ViT-B/32 CLIP image encoder. Strangely, the numbers above are kind of similar, but I'm not sure if that's coincidental.

Any advice?
opened by wabzqem 1
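
One way to put a number on "kind of similar" is to compare the two outputs by cosine similarity and by largest per-element difference. The Python sketch below does this for the six values quoted in the report, which are only the start of each vector; the helper name and variable names are hypothetical, and a real check would use the full embeddings.

```python
import math

# First six values of each embedding, copied from the report above.
# The full vectors are not shown there, so this is only a partial comparison.
python_values = [0.3502497971057892, 0.0028706961311399937,
                 -0.46749746799468994, -0.14868411421775818,
                 -0.03139263391494751, -0.4536064863204956]
ios_values = [0.3193549513816833496, 0.0140316337347030640,
              -0.4410626888275146484, -0.0908056870102882385,
              -0.0415024310350418091, -0.4141347408294677734]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

cosine = cosine_similarity(python_values, ios_values)
diffs = [abs(x - y) for x, y in zip(python_values, ios_values)]
max_diff = max(diffs)
worst_index = diffs.index(max_diff)

print(f"cosine similarity: {cosine:.4f}")
print(f"largest absolute difference: {max_diff:.4f} at index {worst_index}")
```

A cosine similarity close to 1 alongside nonzero per-element differences would suggest that the two pipelines point in nearly the same direction but are not numerically identical. Six values are too few to draw a firm conclusion, so the same comparison on the full vectors would be more informative.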

You might also like...

Converts images to a textual representation.

add text(multiple line support) to imageView, edit, rotate or resize them as you want, then render the text on image

A complete Mac App: drag an image file to the top section and the bottom section will show you the text of any QRCodes in the image.

SwiftUI Image loading and Animation framework powered by SDWebImage

A smart and easy-to-use image masking and cutout SDK for mobile apps.

Style Art library process images using COREML with a set of pre trained machine learning models and convert them to Art style.

Not Suitable for Work (NSFW) classification using deep neural network Caffe models.

📷 A composable image editor using Core Image and Metal.

An image download extension of the image view written in Swift for iOS, tvOS and macOS.
