
mazzzystar / queryable

Run OpenAI's CLIP model on iOS to search photos.

Home Page: https://queryable.app

License: MIT License

Languages: Swift 85.42%, Jupyter Notebook 14.58%
Topics: ios, photos, search, swiftui, clip-model, semantic-search, mobile, natural-language-image-search

queryable's Introduction

Queryable

download-on-the-app-store


Queryable is an open-source iOS app that leverages OpenAI's CLIP model to run offline searches over the 'Photos' album. Unlike the category-based search model built into the iOS Photos app, Queryable lets you search your album with natural-language statements, such as a brown dog sitting on a bench. Because it runs entirely offline, your album's privacy is never exposed to any company, including Apple or Google.

Blog | Website | App Store | Story (in Chinese)

How does it work?

  • Encode all album photos using the CLIP Image Encoder, compute image vectors, and save them.
  • For each new text query, compute the corresponding text vector using the Text Encoder.
  • Compare the similarity between this text vector and each image vector.
  • Rank and return the top K most similar results.

The process is as follows:

For more details, please refer to my blog: Run CLIP on iPhone to Search Photos.
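The ranking step above can be sketched in a few lines of NumPy (illustrative names only; the app itself does this in Swift with Core ML vectors):

```python
import numpy as np

def top_k_matches(text_vector, image_vectors, k=5):
    """Rank photos by cosine similarity to a text query.

    Both inputs are assumed to be L2-normalized, so a plain
    dot product equals cosine similarity.
    """
    sims = image_vectors @ text_vector      # one similarity score per photo
    order = np.argsort(-sims)[:k]           # indices of the k best matches
    return order, sims[order]

# Toy example: four "photo vectors" in a 3-d embedding space.
photos = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [0.7, 0.7, 0.0]])
photos /= np.linalg.norm(photos, axis=1, keepdims=True)

query = np.array([0.9, 0.1, 0.0])
query /= np.linalg.norm(query)

idx, scores = top_k_matches(query, photos, k=2)
print(idx, scores)   # photo 0 is the closest match
```

Precomputing and caching the image vectors is what makes per-query search cheap: each new query costs one text-encoder pass plus a matrix-vector product.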

PicQuery (Android)


The Android version (code), developed by @greyovo, supports both English and Chinese. See details in #12.

Run on Xcode

Download ImageEncoder_float32.mlmodelc and TextEncoder_float32.mlmodelc from Google Drive. Clone this repo, place the downloaded models under the CoreMLModels/ path, and run the project in Xcode; it should work.

Core ML Export

If you only want to run Queryable, you can skip this step and directly use the exported models from Google Drive. If you wish to build a version of Queryable that supports your own native language, or to do some model quantization/acceleration work, here are some guidelines.

The trick is to separate the TextEncoder and ImageEncoder at the architecture level and then load the model weights individually. Queryable uses OpenAI's ViT-B/32 model, and I wrote a Jupyter notebook to demonstrate how to separate, load, and export the Core ML model. The exported Core ML ImageEncoder has a certain level of precision error, and more appropriate normalization parameters may be needed.
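On the normalization point: CLIP's image preprocessing normalizes pixels with fixed per-channel constants, and a mismatch there is a common source of encoder drift. A minimal NumPy sketch of that step (resizing and center-cropping omitted; the constants are the ones published with OpenAI's CLIP):

```python
import numpy as np

# Per-channel normalization constants published with OpenAI's CLIP.
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)

def normalize_pixels(img_uint8):
    """Scale an HxWx3 uint8 image to [0, 1], then apply CLIP's
    per-channel normalization. Resize/center-crop are omitted here."""
    x = img_uint8.astype(np.float32) / 255.0
    return (x - CLIP_MEAN) / CLIP_STD

img = np.full((224, 224, 3), 128, dtype=np.uint8)  # dummy gray image
out = normalize_pixels(img)
print(out.shape)   # (224, 224, 3)
```

When exporting with coremltools, the equivalent bias/scale must be baked into the model input (or applied on-device) so the Core ML and PyTorch pipelines see identical tensors.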

  • Update (2023/09/22): Thanks to jxiong22 for providing the scripts to convert the HuggingFace version of clip-vit-base-patch32. This has significantly reduced the precision error in the image encoder. For more details, see #18.

Contributions

Disclaimer: I am not a professional iOS engineer, so please forgive my poor Swift code. You may want to focus only on the loading, computation, storage, and sorting of the model.

You can apply Queryable to your own product, but I don't recommend simply modifying the appearance and listing it on the App Store. If you are interested in optimizing certain aspects (such as #4, #5, #6, #10, #11, #12), feel free to submit a PR (Pull Request).

  • Thanks to Chris Buguet, the issue (#5) where devices below the iPhone 11 couldn't run the app has been fixed.
  • greyovo has completed development of the Android app (#12): Google Play. The author stated that the code will be released in the future.
  • yujinqiu has developed the macOS version, named Searchable (not open-sourced), which supports full-disk search. See #4.

Thank you for your contribution : )

If you have any questions/suggestions, here are some contact methods: Discord | Twitter | Reddit: r/Queryable.

License

MIT License

Copyright (c) 2023 Ke Fang

queryable's People

Contributors

andforce, codingstyle, hkdalex, jxiong22, mazzzystar, shrootbuck, yujinqiu


queryable's Issues

PyTorch version and coremltools version? Export failed with coremltools.

Hello,
I encountered a failure when exporting with coremltools:

Converting PyTorch Frontend ==> MIL Ops: 6%|▎ | 63/1131 [00:00<00:00, 10698.02 ops/s]
...
RuntimeError: PyTorch convert function for op 'unflatten' not implemented.

Which versions of PyTorch and coremltools are you using?

My development environment:
M1 Mac Pro
python: 3.8.17
torch: 2.0.0
coremltools: 6.3.0

Thanks~

Abnormal operation on A12 and lower chips

Currently, Queryable does not support devices below the iPhone 11 (with the A13 chip). On these devices, indexing can be built normally, but the search results for any query are the same, and I haven't debugged the problem. If you find a solution, I would greatly appreciate it if you could submit a PR.

macOS/Windows: support searching the entire disk

This is a feature that I couldn't implement during the closed-source period. Currently, the macOS version only supports querying photos from the album. I hope to expand this to search for images across the entire disk.

As for Windows users, it would have to be an entirely separate app built to run on that system.

PR submissions are welcome.

PhotoSearchModel.swift: "cosine_similarity" function

Is this function actually correct?

func cosine_similarity(A: MLShapedArray<Float32>, B: MLShapedArray<Float32>) -> Float {
    let magnitude = vDSP.rootMeanSquare(A.scalars) * vDSP.rootMeanSquare(B.scalars)
    let dotarray = vDSP.dot(A.scalars, B.scalars)
    return dotarray / magnitude
}

This line does not seem to be calculating the magnitude of a vector, which is supposed to be the square root of the sum of squares (RSS); it is actually calculating the RMS (the square root of the mean of squares):

let magnitude = vDSP.rootMeanSquare(A.scalars) * vDSP.rootMeanSquare(B.scalars)

My suggested correction would be:

let magnitude = vDSP.sumOfSquares(A.scalars).squareRoot() * vDSP.sumOfSquares(B.scalars).squareRoot()

I am not good at this stuff but am trying to figure it out.
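The suggested correction matches the standard definition. Note that for two vectors of equal length n, the RMS-based version differs from true cosine similarity by exactly a factor of n, so rankings are unaffected even though the values are wrong. A quick NumPy check (illustrative, not the app's code):

```python
import numpy as np

def cosine(a, b):
    # Standard cosine similarity: dot product over root-sum-of-squares magnitudes.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def rms_cosine(a, b):
    # Mirrors the Swift snippet: magnitudes built from RMS instead of RSS.
    rms = lambda v: np.sqrt(np.mean(v * v))
    return np.dot(a, b) / (rms(a) * rms(b))

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([4.0, 3.0, 2.0, 1.0])
n = len(a)

print(cosine(a, b), rms_cosine(a, b))  # the RMS version is exactly n times larger
```

Since every photo/query pair shares the same embedding dimension, the constant factor cancels out when sorting, which would explain why search results still look correct despite the definition being off.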

Method of calculating similarity

I checked out class PhotoSearcherModel and class PhotoSearcher, and found that when calculating similarity you end up using spherical_dist_loss, which contradicts the "calculate cosine similarity" step in the README diagram. I am a little confused and wonder if you could give some insight into that. :)

Thanks in advance.
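For unit vectors, ||x − y||² = 2 − 2·cos(x, y), so spherical distance is a strictly decreasing function of cosine similarity: sorting by smallest spherical distance yields the same ranking as sorting by largest cosine similarity, which is presumably why the two are interchangeable here. A NumPy illustration (using a common definition of spherical_dist_loss, assumed rather than taken from the app's code):

```python
import numpy as np

def spherical_dist(x, y):
    # A common definition of spherical_dist_loss (assumed here; check the
    # app's PhotoSearcherModel for the exact form it uses).
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    return 4.0 * np.arcsin(np.linalg.norm(x - y) / 2.0) ** 2

def cosine(x, y):
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

rng = np.random.default_rng(1)
query = rng.normal(size=8)
photos = rng.normal(size=(5, 8))

by_cosine = np.argsort([-cosine(query, p) for p in photos])
by_dist = np.argsort([spherical_dist(query, p) for p in photos])
print(by_cosine, by_dist)   # identical orderings
```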

To quickly locate photos in the album

There's a feature request I hope can be considered:
Currently, we can quickly search for the desired photo,
but this photo is sequential in the album.
I hope we can quickly locate this photo within the album,
or alternatively, directly display the consecutive photos in this app's interface.

Photo similarity looks a little bit low

Hi there,
I tried printing the top N similarities, with the following output; sim is less than 0.5, but the results look right to a human.
I'm not sure whether it's a bug or not.

photoID: E223B067-975A-49AF-AC42-6387EFC2C73D/L0/001, sim: 0.260636
photoID: E41F3701-3434-44E8-B40C-F7708C43EBA0/L0/001, sim: 0.257243
photoID: C0BD9F08-99D0-42E6-A1FA-275B6FCA4141/L0/001, sim: 0.256340
photoID: 287F7C32-D2B3-43D9-AB2C-A24E26CA4CD4/L0/001, sim: 0.251475
photoID: EDBFBAF6-4503-48FA-97DF-848A1F74C30A/L0/001, sim: 0.251110
photoID: AC47FDB3-4C68-4142-B0B5-40631EE0D41A/L0/001, sim: 0.249579
photoID: EB8D302D-C742-43E6-9ED0-156D669F1814/L0/001, sim: 0.249101
photoID: AF3ABD8F-4A75-4178-95C4-A7D218123BFF/L0/001, sim: 0.248550
photoID: 3412635C-AD4C-49FB-82BF-05552F964DAF/L0/001, sim: 0.247968
photoID: 78DBA56B-B2E1-4125-B746-66EF3319CC89/L0/001, sim: 0.247247

Support for some common features

We've received a lot of user feedback, among which the most frequently raised issues are:

  • Swipe left or right to view the previous/next photo
  • Select multiple photos for deletion
  • Support for indexing selected albums only / importing external albums for indexing

Perhaps someone is interested in the implementation of these features : )

Similarities seem wrong

I've adapted parts of this app to test CLIP on Mac and iOS devices, and when I evaluate the similarities of various texts to a given image, the values don't seem right. For example, given n texts, the softmax of the spherical distances or cosine similarities results in values that are all close to 1/n, even when one text should be a clear match for the image and another should not be.

Perhaps something in my adaptation is causing this error, but I wanted to ask if you've encountered this issue before.
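One thing worth checking in such an adaptation: the original CLIP model multiplies cosine similarities by a learned logit scale (roughly 100) before the softmax. Raw similarities sit in a narrow band, so without that scaling the softmax comes out close to uniform, which matches the 1/n symptom described here. An illustration:

```python
import numpy as np

def softmax(z):
    z = z - z.max()            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Raw CLIP cosine similarities typically occupy a narrow band like this:
sims = np.array([0.31, 0.24, 0.23, 0.22])

p_raw = softmax(sims)             # nearly uniform: gaps of ~0.07 barely register
p_scaled = softmax(100.0 * sims)  # with CLIP's logit scale, the match dominates
print(p_raw, p_scaled)
```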

Feature: Image Library Auto Indexing

The current version only builds an index on first import. It does not update the index database when album pictures change; that only happens after uninstalling and reinstalling. A feature for manual or automatic re-indexing is needed.
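A sketch of what incremental re-indexing could look like: diff the set of photo IDs already in the index against the current library and embed only the additions (all names here are illustrative, not the app's actual API):

```python
def update_index(index, library_ids, embed):
    """index: dict mapping photo_id -> embedding; embed: photo_id -> vector."""
    current = set(library_ids)
    for gone in set(index) - current:    # photo deleted from the library
        del index[gone]
    for new_id in current - set(index):  # photo added since the last run
        index[new_id] = embed(new_id)
    return index

index = {"a": [0.1], "b": [0.2]}
index = update_index(index, ["b", "c"], embed=lambda pid: [0.0])
print(sorted(index))   # ['b', 'c']
```

Running a pass like this on each launch (or from a photo-library change notification) would keep the index current without re-encoding existing photos.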

Add favorites and save-to-album features.

It seems impossible to select the original image when sharing photos on iOS, so I usually find the picture in the Photos app, mark it as a favorite, and then send it from WeChat via the Favorites album. I hope this feature can be added.

[Feature Request] Export calculated embeddings from the app

Thanks for making the app!

I'd like to get the CLIP embeddings for all my photos in iCloud. It would be really convenient if the app supported exporting the calculated embeddings: it would save a lot of time and compute, and would let users explore their photos' semantics in alternative ways.

Would you consider adding this feature?

Android support

Are there any plans to support Android devices in the future?

Multiple language support

Because the CLIP model is language-dependent, if we use something like Multilingual-CLIP (supporting 40+ languages), the size of the Queryable app would exceed 1GB. And because of offline limitations, it's impossible to use a translation API, which is why Queryable currently only supports English.

I have added a script to export PyTorch models to Core ML. Therefore, if you're interested, you can train a CLIP model in your own language and integrate it into Queryable. New model additions are welcome : )

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.