
dinov2-retrieval

A CLI program for image retrieval using DINOv2.

Some random results (left: query image; others: images retrieved from Caltech 256):

How to install

pip install dinov2_retrieval

Tested on:

  • macOS 13.4
  • Windows 11
  • Ubuntu 20.04

How to use

All available options:

$ dinov2_retrieval -h
usage: dinov2_retrieval [-h] [-s {small,base,large,largest}] [-p MODEL_PATH] [-o OUTPUT_ROOT] -q QUERY -d DATABASE [-n NUM] [--size SIZE]
                        [-m {0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100}] [--disable-cache] [-v]

optional arguments:
  -h, --help            show this help message and exit
  -s {small,base,large,largest}, --model-size {small,base,large,largest}
                        DinoV2 model type
  -p MODEL_PATH, --model-path MODEL_PATH
                        path to dinov2 model, useful when github is unavailable
  -o OUTPUT_ROOT, --output-root OUTPUT_ROOT
                        root folder to save output results
  -q QUERY, --query QUERY
                        path to a query image file or image folder
  -d DATABASE, --database DATABASE
                        path to the database image file or image folder
  -n NUM, --num NUM     How many images to show in retrieval results
  --size SIZE           image output size
  -m {0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100}, --margin {0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100}
                        margin size (in pixel) between concatenated images
  --disable-cache       don't cache database features, will extract features each time, quite time-consuming for large database
  -v, --verbose         show detailed logs

Generally, you only need to set -q (or --query) and -d (or --database) to run this program:

dinov2_retrieval -q /path/to/query/image -d /path/to/database

NOTE: both can be a path to a single image or to a folder of images.

The results will be saved to the output folder (configurable with -o/--output-root).
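For example, to retrieve the top 5 matches for each query image using the base model and write the results to a custom folder (the paths below are placeholders):

dinov2_retrieval -q /path/to/queries -d /path/to/database -n 5 -s base -o results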

Debug info

Add -v or --verbose to show detailed logs.

Change model size

DINOv2 provides four sizes of pretrained models. Use --model-size with one of small, base, large, or largest to choose which model to use (see the sketch below for how these names might map to the published checkpoints).
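For context, retrieval with DINOv2 boils down to embedding every image with the backbone and ranking the database by cosine similarity against the query. Below is a minimal Python sketch of that idea using the public torch.hub checkpoints. The mapping from size names to checkpoint names (small → dinov2_vits14, etc.), the preprocessing, and the file paths are assumptions for illustration, not necessarily this tool's internals:

import torch
import torchvision.transforms as T
from PIL import Image

# Assumed mapping of --model-size names to the published torch.hub checkpoints
HUB_NAMES = {
    "small": "dinov2_vits14",
    "base": "dinov2_vitb14",
    "large": "dinov2_vitl14",
    "largest": "dinov2_vitg14",
}

model = torch.hub.load("facebookresearch/dinov2", HUB_NAMES["small"]).eval()

# Standard ImageNet preprocessing; 224 is divisible by the 14-pixel patch size
preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

def embed(path):
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feat = model(img)  # (1, D) image-level embedding
    return torch.nn.functional.normalize(feat, dim=-1)

query = embed("query.jpg")  # placeholder paths
database = torch.cat([embed(p) for p in ["a.jpg", "b.jpg", "c.jpg"]])
scores = database @ query.T  # cosine similarity of unit vectors
print(scores.squeeze(1).argsort(descending=True))  # best matches first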

Use cached model

If your connection to GitHub (where the DINOv2 models are hosted) is unstable, you can set --model-path to the folder containing the model that was cached during the first run.
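For example (the model folder below is a placeholder; use wherever the checkpoint was cached on your machine, typically under your torch.hub cache directory):

dinov2_retrieval -q /path/to/query -d /path/to/database -p /path/to/cached/model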

TODO

  • [ ] Optimize the visualization results, e.g., a better background color
  • [ ] Add more technical details

License

MIT


dinov2-retrieval's Issues

size mismatch for pos_embed

Hello, I have a question about DINOv2. Could you please help me? I instantiated a vit_small ViT model and tried to load the pretrained weights using the load_pretrained_weights function from utils. Here's the code I wrote:
self.vit_model = vits.__dict__['vit_small']()
load_pretrained_weights(self.vit_model, 'https://dl.fbaipublicfiles.com/dinov2/dinov2_vits14/dinov2_vits14_pretrain.pth', None)
However, I encountered the following error:
Traceback (most recent call last):
  File "/data/PycharmProjects/train.py", line 124, in <module>
    model = model(aff_classes=args.num_classes)
  File "/data/PycharmProjects/models/locate.py", line 89, in __init__
    load_pretrained_weights(self.vit_model, pretrained_url, None)
  File "/data/PycharmProjects/models/dinov2/dinov2/utils/utils.py", line 32, in load_pretrained_weights
    msg = model.load_state_dict(state_dict, strict=False)
  File "/home/ustc/anaconda3/envs/locate/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1605, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DinoVisionTransformer:
size mismatch for pos_embed: copying a param with shape torch.Size([1, 1370, 384]) from checkpoint, the shape in current model is torch.Size([1, 257, 384]).

Could you please help me understand what might be causing this issue? Thank you for your assistance.
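For reference, the two sequence lengths in the error are consistent with a ViT using 14×14 patches at different input resolutions: the released DINOv2 checkpoints store a position embedding sized for 518×518 inputs (a 37×37 patch grid plus one class token = 1370 positions), while a model instantiated for 224×224 inputs has a 16×16 grid plus one class token = 257 positions. A quick sanity check of that arithmetic:

def num_tokens(img_size, patch_size=14):
    # patch-grid positions plus one class token
    return (img_size // patch_size) ** 2 + 1

assert num_tokens(518) == 1370  # matches the checkpoint's pos_embed length
assert num_tokens(224) == 257   # matches the instantiated model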
