amaralibey / mixvpr

MixVPR: Feature Mixing for Visual Place Recognition (WACV 2023)

image-based-localisation place-recognition visual-place-recognition localization loop-closure-detection pytorch pytorch-implementation relocalization slam visual-slam

mixvpr's Introduction

MixVPR: Feature Mixing for Visual Place Recognition


This is the official repository for the WACV 2023 paper "MixVPR: Feature Mixing for Visual Place Recognition".

Summary

This paper introduces MixVPR, a novel all-MLP feature aggregation method that addresses the challenges of large-scale Visual Place Recognition, while remaining practical for real-world scenarios with strict latency requirements. The technique leverages feature maps from pre-trained backbones as a set of global features, and integrates a global relationship between them through a cascade of feature mixing, eliminating the need for local or pyramidal aggregation. MixVPR achieves new state-of-the-art performance on multiple large-scale benchmarks, while being significantly more efficient in terms of latency and parameter count compared to existing methods.

[WACV open access] [ArXiv]

[architecture figure]

Trained models

All models have been trained on the GSV-Cities dataset (https://github.com/amaralibey/gsv-cities).

[performance comparison figure]

Weights

Backbone  | Output dim | Pitts250k-test R@1/R@5/R@10 | Pitts30k-test R@1/R@5/R@10 | MSLS-val R@1/R@5/R@10 | Download
ResNet50  | 4096       | 94.3 / 98.2 / 98.9          | 91.6 / 95.5 / 96.4         | 88.2 / 93.1 / 94.3    | LINK
ResNet50  | 512        | 93.2 / 97.9 / 98.6          | 90.7 / 95.5 / 96.3         | 84.1 / 91.8 / 93.7    | LINK
ResNet50  | 128        | 88.7 / 95.8 / 97.4          | 87.8 / 94.3 / 95.7         | 78.5 / 88.2 / 90.4    | LINK

Code to load the pretrained weights is as follows:

import torch
from main import VPRModel

# Note that images must be resized to 320x320
model = VPRModel(backbone_arch='resnet50',
                 layers_to_crop=[4],
                 agg_arch='MixVPR',
                 agg_config={'in_channels' : 1024,
                             'in_h' : 20,
                             'in_w' : 20,
                             'out_channels' : 1024,
                             'mix_depth' : 4,
                             'mlp_ratio' : 1,
                             'out_rows' : 4},
                )

state_dict = torch.load('./LOGS/resnet50_MixVPR_4096_channels(1024)_rows(4).ckpt')
model.load_state_dict(state_dict)
model.eval()
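As a quick sanity check, the loaded model can be run on a dummy batch. This is an illustrative sketch rather than code from the repo; the 320x320 input size follows the note above, and the 4096-D output corresponds to the first row of the table:

with torch.no_grad():
    dummy = torch.randn(1, 3, 320, 320)   # images must be resized to 320x320
    descriptor = model(dummy)             # global descriptor, e.g. shape (1, 4096)
print(descriptor.shape)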

Bibtex

@inproceedings{ali2023mixvpr,
  title={MixVPR: Feature Mixing for Visual Place Recognition},
  author={Ali-bey, Amar and Chaib-draa, Brahim and Gigu{\`e}re, Philippe},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={2998--3007},
  year={2023}
}

mixvpr's People

Contributors

amaralibey, zafirshi


mixvpr's Issues

Question about backbone 'Swin'

Hi there, thanks for your amazing work! I have some questions about the hyper-parameters used when training with the 'Swin' backbone.
Could you share the hyper-parameters (learning rate, etc.) you used with the Swin transformer?

I replaced the backbone in the network with a ViT, using the same learning rate as for training ResNet, but the recall is very bad. I haven't frozen any weights... Do you have any suggestions?

Just for testing

Hello, excuse me: if I only want to test, should I load your pretrained weights in ./utils/validation.py and run validation.py? I didn't see any other test code.

The failure samples are due to issues with the ground-truth annotations rather than errors in the model predictions

Hello @amaralibey, I selected the samples from the MSLS validation set, Pitts30k test set and Pitts250k test set on which MixVPR's recall@1 fails. For a large portion of these failure samples, the top-1 images returned by the model are actually correct, but they are not included in the ground truths. In other words, these failures are caused by problems in the ground truths, not by model prediction errors, which comparing image similarity cannot solve.

Optimizer step

Dear @amaralibey,

I have some questions. I would be grateful if you could answer them.

  1. In the optimizer_step() function in main.py you multiply the lr by lr_scale until 650 steps, but this part is not mentioned in the paper. Do you use the same warm-up for Adam and AdamW? (A minimal warm-up sketch follows below.)
  2. How do you select the learning rate for SGD, Adam and AdamW when you increase the batch size? For instance, some authors of self-supervised models select the learning rate with the formula lr = base_lr * batch_size / 256, where base_lr can be 0.2, 0.3 or other values.
  3. Do you use the same scheduler with the same settings for the Adam and AdamW optimizers?
  4. Did you use any framework to find the best hyper-parameters?

Thanks for your attention!
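For reference, a linear warm-up of the kind described in point 1 is usually implemented by rescaling the learning rate inside the training loop. A minimal plain-PyTorch sketch: the 650-step figure comes from the question, while base_lr, the toy model and the data are illustrative, not the repo's settings.

import torch

warmup_steps = 650                                 # from the question
base_lr = 0.05                                     # illustrative value

model = torch.nn.Linear(16, 4)                     # toy stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr)

for step in range(1000):
    if step < warmup_steps:                        # linear warm-up phase
        lr_scale = (step + 1) / warmup_steps
        for pg in optimizer.param_groups:
            pg["lr"] = base_lr * lr_scale
    x = torch.randn(8, 16)                         # dummy batch
    loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()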

Some questions

Hello, I would like to know whether mAP is available and how it should be calculated.
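For reference, the tables above report only recall@N. A generic retrieval-style mAP can be computed as below; this is a sketch of the standard definition (average precision normalized by the number of ground-truth positives), not code from this repository:

import numpy as np

def mean_average_precision(ranked_lists, positives):
    """ranked_lists[q]: database indices sorted by similarity for query q;
       positives[q]: set of ground-truth positive indices for query q."""
    aps = []
    for ranks, pos in zip(ranked_lists, positives):
        hits, precisions = 0, []
        for rank, db_idx in enumerate(ranks, start=1):
            if db_idx in pos:
                hits += 1
                precisions.append(hits / rank)     # precision at each hit
        aps.append(sum(precisions) / len(pos) if pos else 0.0)
    return float(np.mean(aps))

# Toy example: two queries over a small database.
print(mean_average_precision([[3, 1, 4], [0, 2, 1]], [{1}, {2, 4}]))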

License File

Can you please provide a license file for the repo, just to make sure it is OK to use.
Can I assume it is the MIT license?
Thanks, great work!

How to visualize the learned weights of a subset of 24 neurons from the first Feature-Mixer block?

In Figure 5 of the paper, you visualized the learned weights from a subset of 24 neurons in the first Feature-Mixer block.
This approach is very good and very convincing.
Could you share the code for visualization?

Are you visualizing the weights of this layer?

nn.Linear(in_dim, int(in_dim * mlp_ratio)),

in_dim=400
mlp_ratio=1
So there are 400 neurons in this layer, and if I'm using ResNet18, then each neuron will have 400*256 parameters. Is it correct that your approach is to extract 24 neurons from these 400 neurons, and display the parameters of each neuron as an image of size (400, 256), where positive weights are displayed as blue pixels and negative weights as red pixels?
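Not the authors' code, but one plausible way to produce such a figure: assuming the first Feature-Mixer's first Linear has weight shape (400, 400), each neuron has 400 incoming weights (one per spatial location of the 20x20 map), which can be reshaped to 20x20 and shown with a diverging colormap. The module path below is an assumption about the repo's structure:

import matplotlib.pyplot as plt

# Hypothetical path to the first Linear of the first FeatureMixerLayer.
W = model.aggregator.mix[0].mix[1].weight.detach().cpu()   # assumed shape (400, 400)

fig, axes = plt.subplots(4, 6, figsize=(12, 8))            # 24 neurons
for i, ax in enumerate(axes.flat):
    ax.imshow(W[i].reshape(20, 20), cmap="bwr")            # diverging colormap: sign -> color
    ax.axis("off")
plt.show()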

Pretrained weights

Hi @amaralibey , do you have an estimate on when the pretrained weights will be publicly available? Even only the weights of the best model would be very useful :)

How to change the query image shape?

Hi, thank you for your excellent work!
I noticed that the input image shape must be 320x320; are there any pretrained weights trained with a higher resolution?
Also, does that mean all methods mentioned in the paper were fed 320x320 images?

Problems in training?

Hi, Amar
Thanks for the great work.
When I train MixVPR, the loss decreases quickly during epoch 1, but during epochs 2 and 3 the loss doesn't fall, and in epochs 4 and 5 the loss increases quickly. Because my machine's memory is too small, I set the batch size to 20, whereas you set it to 120.

A single GPU runs fine, but multiple GPUs produce an error!

Hi, Amar Ali-bey!

Your code is so awesome. It's so concise that it appeals to me!

When I was training with multiple GPUs, I encountered an error: Recall@n results in NaN.


But I can run the project very well with a single GPU!

So I think your code doesn't run well when someone uses multiple GPUs, because the data is distributed across them.

I hope you will consider my question; please give me an answer.

Why is the skip connection executed inside the FeatureMixerLayer here?

return x + self.mix(x)

Why is the skip connection inside each FeatureMixerLayer when stacking FeatureMixerLayers, instead of after the whole stack, as in ViT?

self.mix = nn.Sequential(*[
    FeatureMixerLayer(in_dim=hw, mlp_ratio=mlp_ratio)
    for _ in range(self.mix_depth)
])

The common practice is to apply the skip connection in the forward function after executing the statement above. Why is it executed inside each FeatureMixerLayer here?
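For context, a self-contained sketch consistent with the snippets quoted above (the exact layer composition is an assumption based on them): each layer applies its own pre-norm MLP and residual, so every stacked block refines x incrementally rather than one residual being applied after the whole stack.

import torch
import torch.nn as nn

class FeatureMixerLayer(nn.Module):
    def __init__(self, in_dim, mlp_ratio=1):
        super().__init__()
        # Pre-norm MLP over the flattened spatial dimension.
        self.mix = nn.Sequential(
            nn.LayerNorm(in_dim),
            nn.Linear(in_dim, int(in_dim * mlp_ratio)),
            nn.ReLU(),
            nn.Linear(int(in_dim * mlp_ratio), in_dim),
        )

    def forward(self, x):
        # Residual applied once per layer, MLP-Mixer style.
        return x + self.mix(x)

x = torch.randn(2, 256, 400)          # (batch, channels, h*w)
layer = FeatureMixerLayer(in_dim=400)
print(layer(x).shape)                 # torch.Size([2, 256, 400])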

Multi-similarity mining on Pittsburgh30k training set

Hello Amar.

MixVPR is an amazing piece of work. We are trying to train it on the Pittsburgh30k dataset to compare it with our own approaches. How does multi-similarity mining work on the Pittsburgh30k training set, where data loading gives a list of database + query images? Could you please clarify how to load the Pittsburgh training samples so that the triplets are mined error-free? I ran the Pittsburgh data code that you provided and made a few modifications. main.py runs bug-free, but it gives loss = 0 and acc = 1 for all epochs.

Many thanks
Anu
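For context on the mining mechanics, a generic sketch using pytorch-metric-learning's multi-similarity miner and loss. The choice of library is an assumption about tooling, and epsilon, alpha, beta and the dummy data are illustrative, not the repo's settings:

import torch
from pytorch_metric_learning import losses, miners

miner = miners.MultiSimilarityMiner(epsilon=0.1)
loss_fn = losses.MultiSimilarityLoss(alpha=1.0, beta=50.0, base=0.0)

embeddings = torch.randn(8, 128)                 # dummy descriptors, one per image
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])  # place IDs: same ID = positive pair

pairs = miner(embeddings, labels)                # mined hard positive/negative pairs
loss = loss_fn(embeddings, labels, pairs)
print(loss.item())

With this scheme, labels must encode place identity so that images of the same place form positives; if every sample gets a unique label (or queries and database images are not grouped by place), no valid pairs are mined and the loss collapses to zero, which may explain the loss = 0, acc = 1 symptom.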

Dataset

Could you give me a link to download the dataset?

Public code release?

Hi @amaralibey, many thanks for your great work, your paper looks very interesting! Are you planning to make the code publicly available?

Many thanks,
Tobi

When I use ResNet18, should I manually modify MixVPR's in_channels and out_channels in the source code?

When I use ResNet18, the source code automatically adjusts the value of out_channels in the ResNet class:

self.out_channels = out_channels // 2 if self.model.layer4 is None else out_channels

However, there is no code in MixVPR that automatically adjusts in_channels and out_channels. Should I modify them manually?

self.channel_proj = nn.Linear(in_channels, out_channels)
self.row_proj = nn.Linear(hw, out_rows)

I should change in_channels to 256 and set out_channels = hw = 400, with out_rows unchanged at 4, right?
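For illustration, a hedged sketch of what a ResNet18 configuration might look like, following the reasoning in the question. All values are assumptions rather than an official configuration: with layer4 cropped, ResNet18 outputs 256-channel 20x20 feature maps for 320x320 inputs, and the final descriptor dimension is out_channels * out_rows.

from main import VPRModel

model = VPRModel(backbone_arch='resnet18',
                 layers_to_crop=[4],
                 agg_arch='MixVPR',
                 agg_config={'in_channels' : 256,   # ResNet18 channels after cropping layer4 (assumption)
                             'in_h' : 20,
                             'in_w' : 20,
                             'out_channels' : 256,  # free to choose; descriptor dim = out_channels * out_rows
                             'mix_depth' : 4,
                             'mlp_ratio' : 1,
                             'out_rows' : 4})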

Training loss and generalization during test

I tried tuning some training parameters; sometimes the training loss is lower than with the default parameters from the paper, but the model generalizes worse on our own data. It seems like overfitting, but I am not sure. Besides, the git history shows that you tried using dropout. Did it help in your experiments?

Reproducing results on MSLS

Hello, I used your trained model on the datasets that you use, and I was able to reproduce your results on Pitts250k and Pitts30k.
However, when testing on MSLS the results I achieve are considerably lower than the ones in the table: with your model I achieve an R@1 of 83.4, while you report an R@1 of 88.2.
Do you know why this could be the case? Did you perform some special pre-processing on MSLS, or perhaps did you remove queries that do not have any positives within 25 meters?

Releasing the model on torch.hub?

Hi @amaralibey! I'm finding your model quite useful for a number of projects, but it's always a bit cumbersome to insert the model's code into other codebases. It would be very useful if the model was on torch.hub, have you considered releasing it there? It is quite simple to do and allows people to use your model with two lines of code, allowing more people to use your model and helping to spread your work!

For example, I did it for CosPlace, and the trained models can be automatically downloaded from anywhere, without cloning the repo or importing the model, like this:

import torch
model = torch.hub.load("gmberton/cosplace", "get_trained_model", backbone="ResNet50", fc_output_dim=2048)

PyTorch version

Hi, can you share the versions of PyTorch and pytorch_lightning that you used?
