
mazzzystar / queryable

Run OpenAI's CLIP model on iOS to search photos.

Home Page: https://queryable.app

License: MIT License

Languages: Swift 85.42%, Jupyter Notebook 14.58%
Topics: ios, photos, search, swiftui, clip-model, semantic-search, mobile, natural-language-image-search

queryable's Introduction

Queryable

download-on-the-app-store


Queryable is an open-source iOS app that leverages OpenAI's CLIP model to run offline searches over the 'Photos' album. Unlike the category-based search model built into the iOS Photos app, Queryable lets you search your album with natural-language statements, such as a brown dog sitting on a bench. Because it runs entirely offline, your album's privacy is never exposed to any company, including Apple or Google.

Blog | Website | App Store | Story (in Chinese)

How does it work?

  • Encode all album photos using the CLIP Image Encoder, compute image vectors, and save them.
  • For each new text query, compute the corresponding text vector using the Text Encoder.
  • Compare the similarity between this text vector and each image vector.
  • Rank and return the top K most similar results.

The process is as follows:

For more details, please refer to my blog: Run CLIP on iPhone to Search Photos.
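The ranking step above can be sketched in a few lines of NumPy (illustrative names only; the app itself does this in Swift with Core ML vectors):

```python
import numpy as np

def top_k_matches(text_vector, image_vectors, k=5):
    """Rank photos by cosine similarity to a text query.

    Both inputs are assumed to be L2-normalized, so a plain
    dot product equals cosine similarity.
    """
    sims = image_vectors @ text_vector      # one similarity score per photo
    order = np.argsort(-sims)[:k]           # indices of the k best matches
    return order, sims[order]

# Toy example: four "photo vectors" in a 3-d embedding space.
photos = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [0.7, 0.7, 0.0]])
photos /= np.linalg.norm(photos, axis=1, keepdims=True)

query = np.array([0.9, 0.1, 0.0])
query /= np.linalg.norm(query)

idx, scores = top_k_matches(query, photos, k=2)
print(idx, scores)   # photo 0 is the closest match
```

Precomputing and caching the image vectors is what makes per-query search cheap: each new query costs one text-encoder pass plus a matrix-vector product.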

PicQuery (Android)


The Android version (code), developed by @greyovo, supports both English and Chinese. See details in #12.

Run on Xcode

Download ImageEncoder_float32.mlmodelc and TextEncoder_float32.mlmodelc from Google Drive. Clone this repo, place the downloaded models under the CoreMLModels/ path, and run the project in Xcode; it should work.

Core ML Export

If you only want to run Queryable, you can skip this step and directly use the exported models from Google Drive. If you wish to build a version of Queryable that supports your own native language, or to do some model quantization/acceleration work, here are some guidelines.

The trick is to separate the TextEncoder and ImageEncoder at the architecture level and then load the model weights individually. Queryable uses OpenAI's ViT-B/32 model, and I wrote a Jupyter notebook to demonstrate how to separate, load, and export the Core ML model. The exported Core ML ImageEncoder has a certain level of precision error, and more appropriate normalization parameters may be needed.
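On the normalization point: CLIP's image preprocessing normalizes pixels with fixed per-channel constants, and a mismatch there is a common source of encoder drift. A minimal NumPy sketch of that step (resizing and center-cropping omitted; the constants are the ones published with OpenAI's CLIP):

```python
import numpy as np

# Per-channel normalization constants published with OpenAI's CLIP.
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)

def normalize_pixels(img_uint8):
    """Scale an HxWx3 uint8 image to [0, 1], then apply CLIP's
    per-channel normalization. Resize/center-crop are omitted here."""
    x = img_uint8.astype(np.float32) / 255.0
    return (x - CLIP_MEAN) / CLIP_STD

img = np.full((224, 224, 3), 128, dtype=np.uint8)  # dummy gray image
out = normalize_pixels(img)
print(out.shape)   # (224, 224, 3)
```

When exporting with coremltools, the equivalent bias/scale must be baked into the model input (or applied on-device) so the Core ML and PyTorch pipelines see identical tensors.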

  • Update (2023/09/22): Thanks to jxiong22 for providing the scripts to convert the HuggingFace version of clip-vit-base-patch32. This has significantly reduced the precision error in the image encoder. For more details, see #18.

Contributions

Disclaimer: I am not a professional iOS engineer, so please forgive my poor Swift code. You may want to focus only on the loading, computation, storage, and sorting of the model.

You can apply Queryable to your own product, but I don't recommend simply modifying the appearance and listing it on the App Store. If you are interested in optimizing certain aspects (such as #4, #5, #6, #10, #11, #12), feel free to submit a PR (Pull Request).

  • Thanks to Chris Buguet, the issue (#5) where devices below the iPhone 11 couldn't run the app has been fixed.
  • greyovo has completed development of the Android app (#12): Google Play. The author stated that the code will be released in the future.
  • yujinqiu has developed the macOS version, named Searchable (not open-sourced), which supports full-disk search. See #4.

Thank you for your contribution : )

If you have any questions/suggestions, here are some contact methods: Discord | Twitter | Reddit: r/Queryable.

License

MIT License

Copyright (c) 2023 Ke Fang

queryable's People

Contributors

andforce, codingstyle, hkdalex, jxiong22, mazzzystar, shrootbuck, yujinqiu


queryable's Issues

PyTorch version and coremltools version? Export failed with coremltools.

Hello,
I encountered a failure when exporting with coremltools:

Converting PyTorch Frontend ==> MIL Ops: 6%|▎ | 63/1131 [00:00<00:00, 10698.02 ops/s]
...
RuntimeError: PyTorch convert function for op 'unflatten' not implemented.

Which versions of PyTorch and coremltools are you using?

My development environment:
M1 Mac Pro
python: 3.8.17
torch: 2.0.0
coremltools: 6.3.0

Thanks~

Abnormal operation on A12 and lower chips

Currently, Queryable does not support devices below the iPhone 11 (with the A13 chip). On these devices, indexing can be built normally, but the search results for any query are the same, and I haven't debugged the problem. If you find a solution, I would greatly appreciate it if you could submit a PR.

macOS/Windows: support searching the entire disk

This is a feature that I couldn't implement during the closed-source period. Currently, the macOS version only supports querying photos from the album. I hope to expand this to search for images across the entire disk.

As for Windows users, it would have to be an entirely separate app built to run on that system.

PR submissions are welcome.

PhotoSearchModel.swift: "cosine_similarity" function

Is this function actually correct?

func cosine_similarity(A: MLShapedArray<Float32>, B: MLShapedArray<Float32>) -> Float {
    let magnitude = vDSP.rootMeanSquare(A.scalars) * vDSP.rootMeanSquare(B.scalars)
    let dotarray = vDSP.dot(A.scalars, B.scalars)
    return dotarray / magnitude
}

This line does not seem to be calculating the magnitude of a vector, which is supposed to be the square root of the sum of squares (RSS); it is actually calculating the RMS (the square root of the mean of squares):

let magnitude = vDSP.rootMeanSquare(A.scalars) * vDSP.rootMeanSquare(B.scalars)

My suggested correction would be:

let magnitude = vDSP.sumOfSquares(A.scalars).squareRoot() * vDSP.sumOfSquares(B.scalars).squareRoot()

I am not good at this stuff but am trying to figure it out.
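The suggested correction matches the standard definition. Note that for two vectors of equal length n, the RMS-based version differs from true cosine similarity by exactly a factor of n, so rankings are unaffected even though the values are wrong. A quick NumPy check (illustrative, not the app's code):

```python
import numpy as np

def cosine(a, b):
    # Standard cosine similarity: dot product over root-sum-of-squares magnitudes.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def rms_cosine(a, b):
    # Mirrors the Swift snippet: magnitudes built from RMS instead of RSS.
    rms = lambda v: np.sqrt(np.mean(v * v))
    return np.dot(a, b) / (rms(a) * rms(b))

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([4.0, 3.0, 2.0, 1.0])
n = len(a)

print(cosine(a, b), rms_cosine(a, b))  # the RMS version is exactly n times larger
```

Since every photo/query pair shares the same embedding dimension, the constant factor cancels out when sorting, which would explain why search results still look correct despite the definition being off.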

Method of calculating similarity

I checked out class PhotoSearcherModel and class PhotoSearcher, and found that when calculating similarity you end up using spherical_dist_loss, which contradicts the "calculate cosine similarity" step in the README diagram. I am a little confused and wonder if you could give some insight into that. :)

Thanks in advance.
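For unit vectors, ||x − y||² = 2 − 2·cos(x, y), so spherical distance is a strictly decreasing function of cosine similarity: sorting by smallest spherical distance yields the same ranking as sorting by largest cosine similarity, which is presumably why the two are interchangeable here. A NumPy illustration (using a common definition of spherical_dist_loss, assumed rather than taken from the app's code):

```python
import numpy as np

def spherical_dist(x, y):
    # A common definition of spherical_dist_loss (assumed here; check the
    # app's PhotoSearcherModel for the exact form it uses).
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    return 4.0 * np.arcsin(np.linalg.norm(x - y) / 2.0) ** 2

def cosine(x, y):
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

rng = np.random.default_rng(1)
query = rng.normal(size=8)
photos = rng.normal(size=(5, 8))

by_cosine = np.argsort([-cosine(query, p) for p in photos])
by_dist = np.argsort([spherical_dist(query, p) for p in photos])
print(by_cosine, by_dist)   # identical orderings
```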

To quickly locate photos in the album

There's a feature request I hope can be considered:
Currently, we can quickly search for the desired photo,
but this photo is sequential in the album.
I hope we can quickly locate this photo within the album,
or alternatively, directly display the consecutive photos in this app's interface.

Photo similarity looks a little bit low

Hi there,
I tried printing the top N similarities, with the following output; sim is less than 0.5, but the results look right to a human.
I'm not sure whether it's a bug or not.

photoID: E223B067-975A-49AF-AC42-6387EFC2C73D/L0/001, sim: 0.260636
photoID: E41F3701-3434-44E8-B40C-F7708C43EBA0/L0/001, sim: 0.257243
photoID: C0BD9F08-99D0-42E6-A1FA-275B6FCA4141/L0/001, sim: 0.256340
photoID: 287F7C32-D2B3-43D9-AB2C-A24E26CA4CD4/L0/001, sim: 0.251475
photoID: EDBFBAF6-4503-48FA-97DF-848A1F74C30A/L0/001, sim: 0.251110
photoID: AC47FDB3-4C68-4142-B0B5-40631EE0D41A/L0/001, sim: 0.249579
photoID: EB8D302D-C742-43E6-9ED0-156D669F1814/L0/001, sim: 0.249101
photoID: AF3ABD8F-4A75-4178-95C4-A7D218123BFF/L0/001, sim: 0.248550
photoID: 3412635C-AD4C-49FB-82BF-05552F964DAF/L0/001, sim: 0.247968
photoID: 78DBA56B-B2E1-4125-B746-66EF3319CC89/L0/001, sim: 0.247247

Support for some common features

We've received a lot of user feedback, among which the most frequently raised issues are:

  • Swipe left or right to view the previous/next photo
  • Select multiple photos for deletion
  • Support for indexing selected albums only / importing external albums for indexing

Perhaps someone is interested in the implementation of these features : )

Similarities seem wrong

I've adapted parts of this app to test CLIP on Mac and iOS devices, and when I evaluate the similarities of various texts to a given image, the values don't seem right. For example, given n texts, the softmax of the spherical distances or cosine similarities results in values that are all close to 1/n, even when one text should be a clear match for the image and another should not be.

Perhaps something in my adaptation is causing this error, but I wanted to ask if you've encountered this issue before.
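One thing worth checking in such an adaptation: the original CLIP model multiplies cosine similarities by a learned logit scale (roughly 100) before the softmax. Raw similarities sit in a narrow band, so without that scaling the softmax comes out close to uniform, which matches the 1/n symptom described here. An illustration:

```python
import numpy as np

def softmax(z):
    z = z - z.max()            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Raw CLIP cosine similarities typically occupy a narrow band like this:
sims = np.array([0.31, 0.24, 0.23, 0.22])

p_raw = softmax(sims)             # nearly uniform: gaps of ~0.07 barely register
p_scaled = softmax(100.0 * sims)  # with CLIP's logit scale, the match dominates
print(p_raw, p_scaled)
```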

Feature: Image Library Auto Indexing

The current version only builds an index on first import. It does not update the index database when album pictures change; that only happens after uninstalling and reinstalling. A feature for manual or automatic re-indexing is needed.
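A sketch of what incremental re-indexing could look like: diff the set of photo IDs already in the index against the current library and embed only the additions (all names here are illustrative, not the app's actual API):

```python
def update_index(index, library_ids, embed):
    """index: dict mapping photo_id -> embedding; embed: photo_id -> vector."""
    current = set(library_ids)
    for gone in set(index) - current:    # photo deleted from the library
        del index[gone]
    for new_id in current - set(index):  # photo added since the last run
        index[new_id] = embed(new_id)
    return index

index = {"a": [0.1], "b": [0.2]}
index = update_index(index, ["b", "c"], embed=lambda pid: [0.0])
print(sorted(index))   # ['b', 'c']
```

Running a pass like this on each launch (or from a photo-library change notification) would keep the index current without re-encoding existing photos.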

Add favorites and save-to-album features.

It seems impossible to select the original image when sharing photos on iOS, so I usually find the picture in the Photos app, mark it as a favorite, and then send it from WeChat via the Favorites album. I hope this feature can be added.

[Feature Request] Export calculated embeddings from the app

Thanks for making the app!

I'd like to get the CLIP embeddings for all my photos in iCloud. It would be really convenient if the app supported exporting the calculated embeddings: it would save a lot of time and compute, and would let users explore their photos' semantics in alternative ways.

Would you consider adding this feature?

Android support

Are there any plans to support Android devices in the future?

Multiple language support

Because the CLIP model is language-dependent, if we use something like Multilingual-CLIP (supporting 40+ languages), the size of the Queryable app would exceed 1GB. And because of offline limitations, it's impossible to use a translation API, which is why Queryable currently only supports English.

I have added a script to export PyTorch models to Core ML. Therefore, if you're interested, you can train a CLIP model in your own language and integrate it into Queryable. New model additions are welcome : )

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.