
Comments (6)

Anuradha-Uggi commented on June 22, 2024

That's true. Solely using UTMs might force the model to learn similarity between visually dissimilar images. I will think about other approaches.

Thanks. I will check it out.

Sure. Thank you, Amar. That was a nice discussion with you. Good luck with your research!


amaralibey commented on June 22, 2024

Hello @Anuradha-Uggi,

We have discussed the loss function and online mining strategies in another paper, which can be found at https://github.com/amaralibey/gsv-cities. The motivation behind collecting the GSV-Cities dataset was the lack of precise ground truth in existing datasets.

The MS-mining strategy dynamically mines hard positive pairs and hard negative pairs during training, at the loss level. It requires precise labels: each image in the batch must carry a place ID. The Pittsburgh dataset, however, does not provide such labels.

In the Pittsburgh dataset, there are queries and their POTENTIAL positives, meaning that most of these POTENTIAL images do not actually correspond to the same location as the query (which is why the authors use weak supervision, taking the easiest positive, to guarantee that the chosen positive actually represents the same location as the query). Consequently, the presence of these potential images does not allow for effective mining of hard positives.

The provided code is not designed to train on the Pittsburgh dataset because of its requirements on the batch format and labels. The expected batch shape is (P, K, C, H, W), where P is the number of places and K the number of images per place. Additionally, each image needs a corresponding label (place ID).
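For illustration, here is a minimal sketch of what such a batch looks like (the shapes and label layout follow the description above; the tensor values and variable names are placeholders):

```python
import torch

# P places, K images per place, each image of shape (C, H, W).
P, K, C, H, W = 4, 4, 3, 320, 320

# One group of K views for each of the P places (random placeholders).
images = torch.randn(P, K, C, H, W)

# Every image carries the ID of its place, so the online miner can
# tell positives (same ID) apart from negatives (different IDs).
labels = torch.arange(P).repeat_interleave(K)  # [0,0,0,0, 1,1,1,1, ...]

# The batch is flattened to (P*K, C, H, W) before the forward pass.
flat_images = images.view(P * K, C, H, W)
```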

The results you are obtaining are a direct consequence of the modifications you have made to the code. From my understanding, you might be assigning a different ID to each image within the batch. As a result, the online miner cannot find any positive pairs and returns an empty list: there are no informative pairs in the batch, which shows up as a 1.0 accuracy (absence of hard pairs). Consequently, the loss function receives zero pairs and produces a loss value of zero.
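You can reproduce this failure mode in a few lines, assuming the pytorch-metric-learning implementation of the Multi-Similarity miner and loss (the embeddings below are random placeholders):

```python
import torch
from pytorch_metric_learning import losses, miners

miner = miners.MultiSimilarityMiner()
loss_fn = losses.MultiSimilarityLoss()
embeddings = torch.randn(16, 128)  # placeholder descriptors

# Every image gets a distinct ID: the miner cannot form a single
# positive pair, and the loss collapses to zero.
unique_labels = torch.arange(16)
pairs = miner(embeddings, unique_labels)
print(pairs[0].numel())                           # 0 mined pairs
print(loss_fn(embeddings, unique_labels, pairs))  # tensor(0.)

# With proper place IDs (4 places x 4 images), mining works as intended.
place_labels = torch.arange(4).repeat_interleave(4)
pairs = miner(embeddings, place_labels)
print(loss_fn(embeddings, place_labels, pairs))   # non-zero loss
```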

To summarize, the Multi Similarity miner cannot be directly utilized without significant modifications to its core functionality when working with the Pittsburgh dataset.


Anuradha-Uggi commented on June 22, 2024

Hi. Thanks for the explanation. If I understood correctly, in GSV-Cities we know that samples belonging to the same place are positives, and they form negatives with samples from other places. GSV-Cities already knows which pairs are positive and which are negative through the IDs assigned to the images. I think you use MS-mining to further refine the triplets/pairs based on how positive or negative they are, through thresholding.

If I want to train on Pitts30k, how should I modify the code?

One thing I can think of is to replace the NetVLAD layer in https://github.com/Nanne/pytorch-NetVlad with the MixVPR model, along with the corresponding loss functions. Do you think that would work?


amaralibey commented on June 22, 2024

It is challenging to establish a definitive way of determining the positive images for each query in Pitts30k. While the potential positive images are those within a 10-meter radius, there is no information available regarding their azimuth (heading or orientation). As a result, a significant portion of these potential positives may correspond to entirely different locations than the query itself. In light of this limitation, it becomes difficult to mine hard positives in the presence of all these false positives.

An alternative approach you can consider is modifying the MSLoss code to mine for the easiest positives while retaining the hard negatives (https://github.com/msight-tech/research-ms-loss/blob/master/ret_benchmark/losses/multi_similarity_loss.py). The positive pair mining is done at line 39: `pos_pair = pos_pair_[pos_pair_ - self.margin < max(neg_pair_)]`. Notice that mining the positives depends on the value of the hardest negative, `max(neg_pair_)`.

You'll need to mine for the easiest positives (instead of the hardest) while taking into account the similarity of the hardest negatives. I suggest you thoroughly read the MSLoss paper at https://github.com/MalongTech/research-ms-loss before doing so.
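As a rough sketch of that modification (variable names follow the linked multi_similarity_loss.py; the easiest-positive rule itself is an assumption about how to adapt the code, not part of the original file):

```python
import torch

def mine_pairs_easiest_positive(sim_mat, targets, margin=0.1, epsilon=0.1):
    """Per-anchor pair selection adapted for noisy Pitts30k labels:
    hard-negative mining is kept as in the original file, but only the
    EASIEST positive is retained per anchor (an assumption, not the
    authors' code). sim_mat is the (N, N) similarity matrix, targets
    the (N,) label vector."""
    pairs = []
    for i in range(sim_mat.size(0)):
        pos_pair_ = sim_mat[i][targets == targets[i]]
        pos_pair_ = pos_pair_[pos_pair_ < 1 - epsilon]  # drop anchor-to-itself
        neg_pair_ = sim_mat[i][targets != targets[i]]
        if pos_pair_.numel() == 0 or neg_pair_.numel() == 0:
            continue

        # Hard-negative mining, as in the original code (line 38).
        neg_pair = neg_pair_[neg_pair_ + margin > torch.min(pos_pair_)]

        # Original line 39 mined HARD positives:
        #   pos_pair = pos_pair_[pos_pair_ - margin < torch.max(neg_pair_)]
        # With Pitts30k's unreliable potential positives, keep only the
        # EASIEST positive: the one most similar to the anchor, hence the
        # one most likely to truly depict the same place.
        pos_pair = pos_pair_.max().unsqueeze(0)

        if neg_pair.numel() == 0:
            continue
        pairs.append((i, pos_pair, neg_pair))
    return pairs
```

The subsequent weighting step (the log-sum-exp terms over `pos_pair` and `neg_pair`) can remain as in the original file.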

For the NetVLAD layer, you can use the implementation you just mentioned; that's the one most researchers use when training on Pitts30k with PyTorch.

Good luck,


Anuradha-Uggi commented on June 22, 2024

The main problem, as you said, is that there is no obvious way to obtain labels for the Pitts30k dataset. We may have to rely on the UTMs provided. Please correct me if I'm wrong.

We ran the baseline MixVPR-4096 model with a ResNet-50 backbone on Nordland. It gives 76% R@1, whereas you report 58% in your paper. The test set I used has 27,592 samples in the database and the same number of queries, with seasonal (winter/summer) changes between the database and the queries. Could you please confirm?

We trained MixVPR on Pitts30k with the hard-mining strategy from Nanne's PyTorch NetVLAD code. R@1 on the Pitts30k test set dropped slightly, making our proposed approach a little better. But MixVPR trained on Pitts30k is still leading on Nordland.


amaralibey commented on June 22, 2024

@Anuradha-Uggi,

Relying solely on UTMs is not sufficient, as they do not account for bearing (orientation with respect to north). For instance, if two images are 5 meters apart based on their UTMs, that does not guarantee they depict the same location: one image could be facing north while the other faces south.
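As a toy illustration (the coordinates and headings below are made up; Pitts30k stores no heading for its potential positives):

```python
import math

# Two hypothetical images ~5 m apart by UTM, facing opposite directions.
east_a, north_a, heading_a = 589500.0, 4477000.0,   0.0   # facing north
east_b, north_b, heading_b = 589503.0, 4477004.0, 180.0   # facing south

dist = math.hypot(east_a - east_b, north_a - north_b)
print(f"UTM distance: {dist:.1f} m")  # 5.0 m -> passes a 10 m radius test

# Yet the viewing directions differ by 180 degrees, so the two images
# almost certainly show different scenes. Without stored headings, this
# check cannot actually be run on Pitts30k's potential positives.
delta = abs(heading_a - heading_b) % 360
delta = min(delta, 360 - delta)
print(f"Heading difference: {delta:.0f} degrees")
```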

Regarding Nordland, we used the dataset provided by VPR-Bench (https://github.com/MubarizZaffar/VPR-Bench). I have developed a script and a notebook that explain and demonstrate how to test on it.

As for the Pitts30k dataset, I have not personally attempted training on it. It is possible that your technique yields better results than MixVPR in cases where the data is weakly labeled (e.g., Pitts30k-train or Pitts250k-train).

