trendingtechnology / mtis

This project forked from drsy/motis

Mobile text-to-image search powered by multimodal semantic representation models (e.g., OpenAI's CLIP)

Languages: Swift 81.61%, Objective-C++ 15.48%, Objective-C 2.43%, Ruby 0.47%

A Mobile Text-to-Image Search Powered by AI

A minimal demo of semantic multimodal text-to-image search using pretrained vision-language models.

Features

  1. Text-to-image retrieval using semantic similarity search.
  2. Support for different vector indexing strategies (linear scan and KMeans are currently implemented).
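
The linear-scan strategy mentioned above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `cosineSimilarity` and `linearScan` are hypothetical, and it assumes the text and image encoders produce equal-length `[Float]` embeddings that are compared by cosine similarity.

```swift
/// Cosine similarity between two equal-length vectors; returns 0 if either is a zero vector.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must have the same dimension")
    var dot: Float = 0
    var normA: Float = 0
    var normB: Float = 0
    for i in 0..<a.count {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denominator = normA.squareRoot() * normB.squareRoot()
    return denominator > 0 ? dot / denominator : 0
}

/// Linear scan (hypothetical helper): score every image vector against the
/// query vector and return the k best matches, highest similarity first.
func linearScan(query: [Float], imageVectors: [[Float]], k: Int) -> [(index: Int, score: Float)] {
    let scored = imageVectors.enumerated().map {
        (index: $0.offset, score: cosineSimilarity(query, $0.element))
    }
    let ranked = scored.sorted { $0.score > $1.score }
    return Array(ranked.prefix(max(k, 0)))
}
```

A linear scan touches every stored vector per query, so its cost grows with the size of the gallery; that is the motivation for cluster-based strategies such as KMeans.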

Screenshot

  • All images in the gallery (screenshot: all)
  • Search with the query "Three cats" (screenshot: search)

Install

  1. Download the two TorchScript model files (text encoder and image encoder) into the models folder and add them to the Xcode project.
  2. The required dependencies are defined in the Podfile and managed with CocoaPods. Run the command below, then open the generated .xcworkspace file in Xcode.
     pod install
  3. By default, this demo loads all images in the local photo gallery of your real phone or simulator. To restrict it to a specific album, set the albumName variable in the getPhotos method and replace assetResults with photoAssets in line 117 of GalleryInteractor.swift.

Todo

  • Basic features
      • Access to a specified album or the whole photo library
      • Asynchronous model loading and vector computation
  • Indexing strategies
      • Linear indexing (persisted to file via the built-in Data type)
      • KMeans indexing (persisted to file via NSMutableDictionary)
      • Ball-Tree indexing
      • Locality-sensitive hashing indexing
  • Choices of semantic representation models
      • OpenAI's CLIP model
      • Integration of other multimodal retrieval models
  • Efficiency
      • Reducing the memory consumption of models (the ViT-B/32 version of CLIP takes about 605 MB of storage and 1 GB at runtime on iPhone)
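
The KMeans indexing item above can be illustrated with the rough sketch below. It is not the project's implementation: `KMeansIndex`, `buildKMeansIndex`, and `searchKMeansIndex` are hypothetical names, centroids are seeded from the first k vectors for simplicity, and ranking uses squared Euclidean distance, which orders results the same way as cosine similarity only if the embeddings are L2-normalized (an assumption here).

```swift
/// Squared Euclidean distance between two equal-length vectors.
func squaredDistance(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must have the same dimension")
    var sum: Float = 0
    for i in 0..<a.count {
        let d = a[i] - b[i]
        sum += d * d
    }
    return sum
}

/// Index of the centroid closest to `vector` (first one wins on ties).
func nearestCentroid(to vector: [Float], in centroids: [[Float]]) -> Int {
    var best = 0
    var bestDistance = Float.infinity
    for (i, centroid) in centroids.enumerated() {
        let d = squaredDistance(vector, centroid)
        if d < bestDistance { best = i; bestDistance = d }
    }
    return best
}

/// Groups vector indices by their nearest centroid.
func assignToClusters(_ vectors: [[Float]], centroids: [[Float]]) -> [[Int]] {
    var clusters = [[Int]](repeating: [], count: centroids.count)
    for (i, v) in vectors.enumerated() {
        clusters[nearestCentroid(to: v, in: centroids)].append(i)
    }
    return clusters
}

/// Hypothetical index layout: centroids plus the member indices of each cluster.
struct KMeansIndex {
    var centroids: [[Float]]
    var clusters: [[Int]]
}

/// Builds the index with Lloyd's algorithm (alternating assignment and mean update).
func buildKMeansIndex(vectors: [[Float]], k: Int, iterations: Int = 10) -> KMeansIndex {
    precondition(k > 0 && k <= vectors.count, "k must be in 1...vectors.count")
    var centroids = Array(vectors.prefix(k))  // simplistic seeding, assumed for brevity
    for _ in 0..<max(iterations, 1) {
        let clusters = assignToClusters(vectors, centroids: centroids)
        // Move each non-empty centroid to the mean of its members.
        for (c, members) in clusters.enumerated() where !members.isEmpty {
            var mean = [Float](repeating: 0, count: vectors[0].count)
            for m in members {
                for d in 0..<mean.count { mean[d] += vectors[m][d] }
            }
            centroids[c] = mean.map { $0 / Float(members.count) }
        }
    }
    // Final assignment so the clusters match the returned centroids.
    return KMeansIndex(centroids: centroids,
                       clusters: assignToClusters(vectors, centroids: centroids))
}

/// Scans only the cluster nearest to the query and returns up to k vector indices, closest first.
func searchKMeansIndex(_ index: KMeansIndex, query: [Float], vectors: [[Float]], k: Int) -> [Int] {
    let cluster = index.clusters[nearestCentroid(to: query, in: index.centroids)]
    let ranked = cluster.sorted {
        squaredDistance(query, vectors[$0]) < squaredDistance(query, vectors[$1])
    }
    return Array(ranked.prefix(max(k, 0)))
}
```

Probing a single cluster trades recall for speed: a relevant image assigned to a neighboring cluster is missed, which is why such indexes often probe several nearby clusters in practice.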

About us

This project is maintained by the ADAPT lab at Shanghai Jiao Tong University. We expect it to keep integrating more advanced features and to offer a better cross-modal search experience.
