amaralibey / mixvpr

MixVPR: Feature Mixing for Visual Place Recognition (WACV 2023)

image-based-localisation place-recognition visual-place-recognition localization loop-closure-detection pytorch pytorch-implementation relocalization slam visual-slam

mixvpr's Introduction

MixVPR: Feature Mixing for Visual Place Recognition


This is the official repository for the WACV 2023 paper "MixVPR: Feature Mixing for Visual Place Recognition".

Summary

This paper introduces MixVPR, a novel all-MLP feature aggregation method that addresses the challenges of large-scale Visual Place Recognition, while remaining practical for real-world scenarios with strict latency requirements. The technique leverages feature maps from pre-trained backbones as a set of global features, and integrates a global relationship between them through a cascade of feature mixing, eliminating the need for local or pyramidal aggregation. MixVPR achieves new state-of-the-art performance on multiple large-scale benchmarks, while being significantly more efficient in terms of latency and parameter count compared to existing methods.

[WACV open access] [ArXiv]

[architecture figure]

Trained models

All models have been trained on the GSV-Cities dataset (https://github.com/amaralibey/gsv-cities).

[performance comparison figure]

Weights

Backbone  | Output dim | Pitts250k-test R@1/R@5/R@10 | Pitts30k-test R@1/R@5/R@10 | MSLS-val R@1/R@5/R@10 | Download
ResNet50  | 4096       | 94.3 / 98.2 / 98.9          | 91.6 / 95.5 / 96.4         | 88.2 / 93.1 / 94.3    | LINK
ResNet50  | 512        | 93.2 / 97.9 / 98.6          | 90.7 / 95.5 / 96.3         | 84.1 / 91.8 / 93.7    | LINK
ResNet50  | 128        | 88.7 / 95.8 / 97.4          | 87.8 / 94.3 / 95.7         | 78.5 / 88.2 / 90.4    | LINK

Code to load the pretrained weights is as follows:

import torch
from main import VPRModel

# Note that images must be resized to 320x320
model = VPRModel(backbone_arch='resnet50',
                 layers_to_crop=[4],
                 agg_arch='MixVPR',
                 agg_config={'in_channels' : 1024,
                             'in_h' : 20,
                             'in_w' : 20,
                             'out_channels' : 1024,
                             'mix_depth' : 4,
                             'mlp_ratio' : 1,
                             'out_rows' : 4},
                )

state_dict = torch.load('./LOGS/resnet50_MixVPR_4096_channels(1024)_rows(4).ckpt')
model.load_state_dict(state_dict)
model.eval()
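As a quick sanity check, the loaded model can be run on a dummy batch. This is an illustrative sketch rather than code from the repo; the 320x320 input size follows the note above, and the 4096-D output corresponds to the first row of the table:

with torch.no_grad():
    dummy = torch.randn(1, 3, 320, 320)   # images must be resized to 320x320
    descriptor = model(dummy)             # global descriptor, e.g. shape (1, 4096)
print(descriptor.shape)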

Bibtex

@inproceedings{ali2023mixvpr,
  title={MixVPR: Feature Mixing for Visual Place Recognition},
  author={Ali-bey, Amar and Chaib-draa, Brahim and Gigu{\`e}re, Philippe},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={2998--3007},
  year={2023}
}

mixvpr's People

Contributors

amaralibey, zafirshi


mixvpr's Issues

Question about backbone 'Swin'

Hi there, thanks for your amazing work! I have some questions about the hyper-parameters used when training with the 'Swin' backbone.
Could you share the hyper-parameters (learning rate, etc.) you used with the Swin transformer?

I replaced the backbone in the network with a ViT, using the same learning rate as for training ResNet, but the recall is very bad. I haven't frozen any weights... Do you have any suggestions?

Just for testing

Hello, excuse me: if I only want to test, should I load your pretrained weights in ./utils/validation.py and run validation.py? I didn't see any other test code.

The failure samples are due to issues with the ground-truth annotations rather than errors in the model predictions

Hello @amaralibey, I selected the samples from the MSLS validation set, Pitts30k test set and Pitts250k test set on which MixVPR's recall@1 fails. For a large portion of these failure samples, the top-1 images returned by the model are actually correct, but they are not included in the ground truths. In other words, these failures are caused by problems in the ground truths, not by model prediction errors, which comparing image similarity cannot solve.

Optimizer step

Dear @amaralibey,

I have some questions. I would be grateful if you could answer them.

  1. In the optimizer_step() function in main.py you multiply the lr by lr_scale until 650 steps, but this part is not mentioned in the paper. Do you use the same warm-up for Adam and AdamW? (A minimal warm-up sketch follows below.)
  2. How do you select the learning rate for SGD, Adam and AdamW when you increase the batch size? For instance, some authors of self-supervised models select the learning rate with the formula lr = base_lr * batch_size / 256, where base_lr can be 0.2, 0.3 or other values.
  3. Do you use the same scheduler with the same settings for the Adam and AdamW optimizers?
  4. Did you use any framework to find the best hyper-parameters?

Thanks for your attention!
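For reference, a linear warm-up of the kind described in point 1 is usually implemented by rescaling the learning rate inside the training loop. A minimal plain-PyTorch sketch: the 650-step figure comes from the question, while base_lr, the toy model and the data are illustrative, not the repo's settings.

import torch

warmup_steps = 650                                 # from the question
base_lr = 0.05                                     # illustrative value

model = torch.nn.Linear(16, 4)                     # toy stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr)

for step in range(1000):
    if step < warmup_steps:                        # linear warm-up phase
        lr_scale = (step + 1) / warmup_steps
        for pg in optimizer.param_groups:
            pg["lr"] = base_lr * lr_scale
    x = torch.randn(8, 16)                         # dummy batch
    loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()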

Some questions

Hello, I would like to know whether mAP is available and how it should be calculated.
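For reference, the tables above report only recall@N. A generic retrieval-style mAP can be computed as below; this is a sketch of the standard definition (average precision normalized by the number of ground-truth positives), not code from this repository:

import numpy as np

def mean_average_precision(ranked_lists, positives):
    """ranked_lists[q]: database indices sorted by similarity for query q;
       positives[q]: set of ground-truth positive indices for query q."""
    aps = []
    for ranks, pos in zip(ranked_lists, positives):
        hits, precisions = 0, []
        for rank, db_idx in enumerate(ranks, start=1):
            if db_idx in pos:
                hits += 1
                precisions.append(hits / rank)     # precision at each hit
        aps.append(sum(precisions) / len(pos) if pos else 0.0)
    return float(np.mean(aps))

# Toy example: two queries over a small database.
print(mean_average_precision([[3, 1, 4], [0, 2, 1]], [{1}, {2, 4}]))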

License File

Can you please provide a license file for the repo, just to make sure it is OK to use.
Can I assume it is the MIT license?
Thanks, great work!

How to visualize the learned weights of a subset of 24 neurons from the first Feature-Mixer block?

In Figure 5 of the paper, you visualized the learned weights from a subset of 24 neurons in the first Feature-Mixer block.
This approach is very good and very convincing.
Could you share the code for visualization?

Are you visualizing the weights of this layer?

nn.Linear(in_dim, int(in_dim * mlp_ratio)),

in_dim=400
mlp_ratio=1
So there are 400 neurons in this layer, and if I'm using ResNet18, then each neuron will have 400*256 parameters. Is it correct that your approach is to extract 24 neurons from these 400 neurons, and display the parameters of each neuron as an image of size (400, 256), where positive weights are displayed as blue pixels and negative weights as red pixels?
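Not the authors' code, but one plausible way to produce such a figure: assuming the first Feature-Mixer's first Linear has weight shape (400, 400), each neuron has 400 incoming weights (one per spatial location of the 20x20 map), which can be reshaped to 20x20 and shown with a diverging colormap. The module path below is an assumption about the repo's structure:

import matplotlib.pyplot as plt

# Hypothetical path to the first Linear of the first FeatureMixerLayer.
W = model.aggregator.mix[0].mix[1].weight.detach().cpu()   # assumed shape (400, 400)

fig, axes = plt.subplots(4, 6, figsize=(12, 8))            # 24 neurons
for i, ax in enumerate(axes.flat):
    ax.imshow(W[i].reshape(20, 20), cmap="bwr")            # diverging colormap: sign -> color
    ax.axis("off")
plt.show()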

Pretrained weights

Hi @amaralibey , do you have an estimate on when the pretrained weights will be publicly available? Even only the weights of the best model would be very useful :)

How to change the query image shape?

Hi, thank you for your excellent work!
I noticed that the input image shape must be 320x320; are there any pretrained weights trained with a higher resolution?
Also, does that mean all methods mentioned in the paper were fed 320x320 images?

Problems in training?

Hi, Amar
Thanks for the great work.
When I train MixVPR, the loss decreases quickly during epoch 1, but during epochs 2 and 3 the loss doesn't fall, and in epochs 4 and 5 the loss increases quickly. Because my machine's memory is too small, I set the batch size to 20, whereas you set it to 120.

A single GPU runs fine, but multiple GPUs produce an error!

Hi, Amar Ali-bey!

Your code is so awesome. It's so concise that it appeals to me!

When I was training with multiple GPUs, I encountered an error: Recall@n results in NaN.


But I can run the project very well with a single GPU!

So I think your code doesn't run well when someone uses multiple GPUs, because the data is distributed across them.

I hope you will consider my question; please give me an answer.

Why is the skip connection executed inside the FeatureMixerLayer here?

return x + self.mix(x)

Why is the skip connection inside each FeatureMixerLayer when stacking FeatureMixerLayers, instead of after the whole stack, as in ViT?

self.mix = nn.Sequential(*[
    FeatureMixerLayer(in_dim=hw, mlp_ratio=mlp_ratio)
    for _ in range(self.mix_depth)
])

The common practice is to apply the skip connection in the forward function after executing the statement above. Why is it executed inside each FeatureMixerLayer here?
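For context, a self-contained sketch consistent with the snippets quoted above (the exact layer composition is an assumption based on them): each layer applies its own pre-norm MLP and residual, so every stacked block refines x incrementally rather than one residual being applied after the whole stack.

import torch
import torch.nn as nn

class FeatureMixerLayer(nn.Module):
    def __init__(self, in_dim, mlp_ratio=1):
        super().__init__()
        # Pre-norm MLP over the flattened spatial dimension.
        self.mix = nn.Sequential(
            nn.LayerNorm(in_dim),
            nn.Linear(in_dim, int(in_dim * mlp_ratio)),
            nn.ReLU(),
            nn.Linear(int(in_dim * mlp_ratio), in_dim),
        )

    def forward(self, x):
        # Residual applied once per layer, MLP-Mixer style.
        return x + self.mix(x)

x = torch.randn(2, 256, 400)          # (batch, channels, h*w)
layer = FeatureMixerLayer(in_dim=400)
print(layer(x).shape)                 # torch.Size([2, 256, 400])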

Multi-similarity mining on Pittsburgh30k training set

Hello Amar.

MixVPR is an amazing piece of work. We are trying to train it on the Pittsburgh30k dataset to compare it with our own approaches. How does multi-similarity mining work on the Pittsburgh30k training set, where data loading gives a list of database + query images? Could you please clarify how to load the Pittsburgh training samples so that the triplets are mined error-free? I ran the Pittsburgh data code that you provided and made a few modifications. main.py runs bug-free, but it gives loss = 0 and acc = 1 for all epochs.

Many thanks
Anu
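For context on the mining mechanics, a generic sketch using pytorch-metric-learning's multi-similarity miner and loss. The choice of library is an assumption about tooling, and epsilon, alpha, beta and the dummy data are illustrative, not the repo's settings:

import torch
from pytorch_metric_learning import losses, miners

miner = miners.MultiSimilarityMiner(epsilon=0.1)
loss_fn = losses.MultiSimilarityLoss(alpha=1.0, beta=50.0, base=0.0)

embeddings = torch.randn(8, 128)                 # dummy descriptors, one per image
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])  # place IDs: same ID = positive pair

pairs = miner(embeddings, labels)                # mined hard positive/negative pairs
loss = loss_fn(embeddings, labels, pairs)
print(loss.item())

With this scheme, labels must encode place identity so that images of the same place form positives; if every sample gets a unique label (or queries and database images are not grouped by place), no valid pairs are mined and the loss collapses to zero, which may explain the loss = 0, acc = 1 symptom.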

Dataset

Could you give me a link to download the dataset?

Public code release?

Hi @amaralibey, many thanks for your great work, your paper looks very interesting! Are you planning to make the code publicly available?

Many thanks,
Tobi

When I use ResNet18, should I manually modify MixVPR's in_channels and out_channels in the source code?

When I use ResNet18, the source code automatically adjusts the value of out_channels in the ResNet class:

self.out_channels = out_channels // 2 if self.model.layer4 is None else out_channels

However, there is no code in MixVPR that automatically adjusts in_channels and out_channels. Should I modify them manually?

self.channel_proj = nn.Linear(in_channels, out_channels)
self.row_proj = nn.Linear(hw, out_rows)

I should change in_channels to 256 and set out_channels = hw = 400, with out_rows unchanged at 4, right?
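For illustration, a hedged sketch of what a ResNet18 configuration might look like, following the reasoning in the question. All values are assumptions rather than an official configuration: with layer4 cropped, ResNet18 outputs 256-channel 20x20 feature maps for 320x320 inputs, and the final descriptor dimension is out_channels * out_rows.

from main import VPRModel

model = VPRModel(backbone_arch='resnet18',
                 layers_to_crop=[4],
                 agg_arch='MixVPR',
                 agg_config={'in_channels' : 256,   # ResNet18 channels after cropping layer4 (assumption)
                             'in_h' : 20,
                             'in_w' : 20,
                             'out_channels' : 256,  # free to choose; descriptor dim = out_channels * out_rows
                             'mix_depth' : 4,
                             'mlp_ratio' : 1,
                             'out_rows' : 4})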

Training loss and generalization during test

I tried tuning some training parameters; sometimes the training loss is lower than with the default parameters from the paper, but the model generalizes worse on our own data. It seems like overfitting, but I am not sure. Besides, the git history shows that you tried using dropout. Did it help in your experiments?

Reproducing results on MSLS

Hello, I used your trained model on the datasets that you use, and I was able to reproduce your results on Pitts250k and Pitts30k.
However, when testing on MSLS the results I achieve are considerably lower than the ones in the table: with your model I achieve an R@1 of 83.4, while you report an R@1 of 88.2.
Do you know why this could be the case? Did you perform some special pre-processing on MSLS, or perhaps did you remove queries that do not have any positives within 25 meters?

Releasing the model on torch.hub?

Hi @amaralibey! I'm finding your model quite useful for a number of projects, but it's always a bit cumbersome to insert the model's code into other codebases. It would be very useful if the model was on torch.hub, have you considered releasing it there? It is quite simple to do and allows people to use your model with two lines of code, allowing more people to use your model and helping to spread your work!

For example, I did it for CosPlace, and the trained models can be automatically downloaded from anywhere, without cloning the repo or importing the model, like this:

import torch
model = torch.hub.load("gmberton/cosplace", "get_trained_model", backbone="ResNet50", fc_output_dim=2048)

PyTorch version

Hi, can you share the versions of PyTorch and pytorch_lightning that you used?
