
Comments (6)

Anuradha-Uggi commented on June 22, 2024

That's true. Solely using UTMs might force the model to learn similarity between visually dissimilar images. I will think about other approaches.

Thanks. I will check it out.

Sure. Thank you, Amar. That was a nice discussion with you. Good luck with your research!


amaralibey commented on June 22, 2024

Hello @Anuradha-Uggi,

We have discussed the loss function and online mining strategies in another paper, which can be found at https://github.com/amaralibey/gsv-cities. The motivation behind collecting the GSV-Cities dataset was the lack of precise ground truth in existing datasets.

The MS-mining strategy dynamically mines hard positive pairs and hard negative pairs during training, at the loss level. It requires precise labels: each image in the batch must carry a place ID. The Pittsburgh dataset, however, does not provide such labels.

In the Pittsburgh dataset, there are queries and their POTENTIAL positives, meaning that most of these POTENTIAL images do not actually correspond to the same location as the query (which is why the authors use weak supervision, taking the easiest positive, to guarantee that the chosen positive actually represents the same location as the query). Consequently, the presence of these potential images does not allow for effective mining of hard positives.

The provided code is not designed to train on the Pittsburgh dataset because of its requirements on the batch format and labels. The expected batch shape is (P, K, C, H, W), where P is the number of places and K the number of images per place. Additionally, each image needs a corresponding label (place ID).
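For illustration, here is a minimal sketch of what such a batch looks like (the shapes and label layout follow the description above; the tensor values and variable names are placeholders):

```python
import torch

# P places, K images per place, each image of shape (C, H, W).
P, K, C, H, W = 4, 4, 3, 320, 320

# One group of K views for each of the P places (random placeholders).
images = torch.randn(P, K, C, H, W)

# Every image carries the ID of its place, so the online miner can
# tell positives (same ID) apart from negatives (different IDs).
labels = torch.arange(P).repeat_interleave(K)  # [0,0,0,0, 1,1,1,1, ...]

# The batch is flattened to (P*K, C, H, W) before the forward pass.
flat_images = images.view(P * K, C, H, W)
```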

The results you are obtaining are a direct consequence of the modifications you have made to the code. From my understanding, you might be assigning a different ID to each image within the batch. As a result, the online miner cannot find any positive pairs and returns an empty list: there are no informative pairs in the batch, which shows up as a 1.0 accuracy (absence of hard pairs). Consequently, the loss function receives zero pairs and produces a loss value of zero.
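You can reproduce this failure mode in a few lines, assuming the pytorch-metric-learning implementation of the Multi-Similarity miner and loss (the embeddings below are random placeholders):

```python
import torch
from pytorch_metric_learning import losses, miners

miner = miners.MultiSimilarityMiner()
loss_fn = losses.MultiSimilarityLoss()
embeddings = torch.randn(16, 128)  # placeholder descriptors

# Every image gets a distinct ID: the miner cannot form a single
# positive pair, and the loss collapses to zero.
unique_labels = torch.arange(16)
pairs = miner(embeddings, unique_labels)
print(pairs[0].numel())                           # 0 mined pairs
print(loss_fn(embeddings, unique_labels, pairs))  # tensor(0.)

# With proper place IDs (4 places x 4 images), mining works as intended.
place_labels = torch.arange(4).repeat_interleave(4)
pairs = miner(embeddings, place_labels)
print(loss_fn(embeddings, place_labels, pairs))   # non-zero loss
```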

To summarize, the Multi Similarity miner cannot be directly utilized without significant modifications to its core functionality when working with the Pittsburgh dataset.


Anuradha-Uggi commented on June 22, 2024

Hi. Thanks for the explanation. If I understood correctly, in GSV-Cities we know that samples belonging to the same place are positives, and they form negatives with samples from other places. GSV-Cities already knows which pairs are positive and which are negative through the IDs assigned to the images. I think you use MS-mining to further refine the triplets/pairs based on how positive or negative they are, through thresholding.

If I want to train on Pitts30k, how should I modify the code?

One thing I can think of is to replace the NetVLAD layer in https://github.com/Nanne/pytorch-NetVlad with the MixVPR model, along with the corresponding loss functions. Do you think that would work?


amaralibey commented on June 22, 2024

It is challenging to establish a definitive way of determining the positive images for each query in Pitts30k. While the potential positive images are those within a 10-meter radius, there is no information available regarding their azimuth (heading or orientation). As a result, a significant portion of these potential positives may correspond to entirely different locations than the query itself. In light of this limitation, it becomes difficult to mine hard positives in the presence of all these false positives.

An alternative approach you can consider is modifying the MSLoss code to mine for the easiest positives while retaining the hard negatives (https://github.com/msight-tech/research-ms-loss/blob/master/ret_benchmark/losses/multi_similarity_loss.py). The positive pair mining is done at line 39: `pos_pair = pos_pair_[pos_pair_ - self.margin < max(neg_pair_)]`. Notice that mining the positives depends on the value of the hardest negative, `max(neg_pair_)`.

You'll need to mine for the easiest positives (instead of the hardest) while taking into account the similarity of the hardest negatives. I suggest you thoroughly read the MSLoss paper at https://github.com/MalongTech/research-ms-loss before doing so.
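As a rough sketch of that modification (variable names follow the linked multi_similarity_loss.py; the easiest-positive rule itself is an assumption about how to adapt the code, not part of the original file):

```python
import torch

def mine_pairs_easiest_positive(sim_mat, targets, margin=0.1, epsilon=0.1):
    """Per-anchor pair selection adapted for noisy Pitts30k labels:
    hard-negative mining is kept as in the original file, but only the
    EASIEST positive is retained per anchor (an assumption, not the
    authors' code). sim_mat is the (N, N) similarity matrix, targets
    the (N,) label vector."""
    pairs = []
    for i in range(sim_mat.size(0)):
        pos_pair_ = sim_mat[i][targets == targets[i]]
        pos_pair_ = pos_pair_[pos_pair_ < 1 - epsilon]  # drop anchor-to-itself
        neg_pair_ = sim_mat[i][targets != targets[i]]
        if pos_pair_.numel() == 0 or neg_pair_.numel() == 0:
            continue

        # Hard-negative mining, as in the original code (line 38).
        neg_pair = neg_pair_[neg_pair_ + margin > torch.min(pos_pair_)]

        # Original line 39 mined HARD positives:
        #   pos_pair = pos_pair_[pos_pair_ - margin < torch.max(neg_pair_)]
        # With Pitts30k's unreliable potential positives, keep only the
        # EASIEST positive: the one most similar to the anchor, hence the
        # one most likely to truly depict the same place.
        pos_pair = pos_pair_.max().unsqueeze(0)

        if neg_pair.numel() == 0:
            continue
        pairs.append((i, pos_pair, neg_pair))
    return pairs
```

The subsequent weighting step (the log-sum-exp terms over `pos_pair` and `neg_pair`) can remain as in the original file.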

For the NetVLAD layer, you can use the implementation you just mentioned; that's the one most researchers use when training on Pitts30k with PyTorch.

Good luck,


Anuradha-Uggi commented on June 22, 2024

The main problem, as you said, is that there is no obvious way to obtain labels for the Pitts30k dataset. We may have to rely on the UTMs provided. Please correct me if I'm wrong.

We ran the baseline MixVPR-4096 model with a ResNet-50 backbone on Nordland. It gives 76% R@1, whereas you report 58% in your paper. The test set I used has 27,592 samples in the database and the same number of queries, with seasonal (winter/summer) changes between the database and the queries. Could you please confirm?

We trained MixVPR on Pitts30k with the hard-mining strategy from Nanne's PyTorch NetVLAD code. R@1 on the Pitts30k test set dropped slightly, making our proposed approach a little better. But MixVPR trained on Pitts30k is still leading on Nordland.


amaralibey commented on June 22, 2024

@Anuradha-Uggi,

Relying solely on UTMs is not sufficient, as they do not account for bearing (orientation with respect to north). For instance, if two images are 5 meters apart based on their UTMs, that does not guarantee they depict the same location: one image could be facing north while the other faces south.
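As a toy illustration (the coordinates and headings below are made up; Pitts30k stores no heading for its potential positives):

```python
import math

# Two hypothetical images ~5 m apart by UTM, facing opposite directions.
east_a, north_a, heading_a = 589500.0, 4477000.0,   0.0   # facing north
east_b, north_b, heading_b = 589503.0, 4477004.0, 180.0   # facing south

dist = math.hypot(east_a - east_b, north_a - north_b)
print(f"UTM distance: {dist:.1f} m")  # 5.0 m -> passes a 10 m radius test

# Yet the viewing directions differ by 180 degrees, so the two images
# almost certainly show different scenes. Without stored headings, this
# check cannot actually be run on Pitts30k's potential positives.
delta = abs(heading_a - heading_b) % 360
delta = min(delta, 360 - delta)
print(f"Heading difference: {delta:.0f} degrees")
```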

Regarding Nordland, we used the dataset provided by VPR-Bench (https://github.com/MubarizZaffar/VPR-Bench). I have developed a script and a notebook that explain and demonstrate how to test on it.

As for the Pitts30k dataset, I have not personally attempted training on it. It is possible that your technique yields better results than MixVPR in cases where the data is weakly labeled (e.g., Pitts30k-train or Pitts250k-train).

