
Comments (6)

amaralibey commented on September 24, 2024

Hi @wpumain,

In the context of visual place recognition, the batch accuracy metric denotes the proportion of positive pairs that the mining process has identified as uninformative.
An analogy with classification makes this easier to understand. In classification, batch accuracy refers to the ratio of correctly classified examples within a batch. Similarly, in visual place recognition, we apply this analogy by assessing how well the neural network distinguishes images of a place X from images of the other places within the mini-batch.

For example, when we use the triplet margin loss function, we form triplets of images (A, P, N), where A is an anchor image, P is a positive image that represents the same place as A, and N is a negative image that represents a different place. We want the network to generate similar representations for A and P and a dissimilar representation for N. We do so by imposing a margin-based triplet constraint that encourages the network to learn representations minimizing the distance between the positive pair (A, P) while maximizing the distance between the negative pair (A, N), BUT ONLY when the representation of the negative image N is within a certain margin m of the anchor image A. This is expressed mathematically as minimizing max(d(A,P) - d(A,N) + m, 0).
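To make the constraint concrete, here is a minimal PyTorch sketch of a margin-based triplet loss (illustrative only, not the exact MixVPR code; the margin value and descriptor sizes are placeholders):

```python
import torch
import torch.nn.functional as F

def triplet_margin_loss(a, p, n, margin=0.1):
    # d(A,P): distance between anchor and positive descriptors.
    d_ap = F.pairwise_distance(a, p)
    # d(A,N): distance between anchor and negative descriptors.
    d_an = F.pairwise_distance(a, n)
    # Hinge: when d(A,N) >= d(A,P) + m the triplet contributes zero,
    # i.e., the negative is already far enough away and is uninformative.
    return F.relu(d_ap - d_an + margin).mean()

# Toy usage with random 128-D descriptors for 8 triplets.
a, p, n = torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 128)
print(triplet_margin_loss(a, p, n))
```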

So, when the representation of the negative image N is beyond the margin m, it's deemed uninformative, which means the network already distinguishes it very well from the positive one. Therefore we don't take it into account when calculating the loss value (we don't mine it, so it's not counted in nb_mined).

That being said, nb_mined contains the number of mined samples (the number of informative samples), so nb_mined/nb_samples is the fraction of informative samples within the mini-batch (the difficult samples that will be used to compute the final value of the loss). To find the fraction of uninformative samples (the samples that the network easily distinguishes, following the classification analogy), we simply take the complement to 1, i.e., 1.0 - (nb_mined/nb_samples).
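As a sketch of that bookkeeping (assuming the pytorch-metric-learning convention where the first tensor returned by the miner holds the mined anchor indices; the exact counting in MixVPR may differ):

```python
import torch

def batch_accuracy(miner_outputs, nb_samples):
    # Unique anchors flagged by the miner as belonging to at least
    # one informative pair (assumed stored in miner_outputs[0]).
    nb_mined = torch.unique(miner_outputs[0]).numel()
    # Complement to 1: the fraction of samples the network already
    # separates easily (the "uninformative" ones).
    return 1.0 - (nb_mined / nb_samples)
```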


wpumain commented on September 24, 2024

Thank you for your help, I understand a bit better now. "Uninformative samples" are the samples that the network can easily distinguish, equivalent to correctly classified samples in classification. "Informative samples" are the difficult samples, equivalent to incorrectly classified samples in classification. Is that right?

However, I have one question. When a sample is classified as "informative", it only means that the model finds it difficult to distinguish, but it does not necessarily mean that the model cannot distinguish it correctly. Therefore, using "nb_mined" to represent the number of incorrectly predicted samples in the mini-batch may not be precise?


wpumain commented on September 24, 2024

In MixVPR/main.py, line 120 (commit 31de0c3):

```python
miner_outputs = self.miner(descriptors, labels)
```

How should I understand the return value miner_outputs of self.miner(descriptors, labels)?

During execution, I found that miner_outputs is a tuple containing four elements: the first and second are tensors of shape (598,), and the third and fourth are tensors of shape (37225,). I understand the mathematical principle of the Multi-Similarity Loss, but I can't work out what miner_outputs contains or how the loss is calculated from it. Could you help me with this?


amaralibey commented on September 24, 2024

Hello @wpumain, sorry for the late answer.

Q: "However, I have one question. When a sample is classified as "informative", it only means that the model finds it difficult to distinguish, but it does not necessarily mean that the model cannot distinguish it correctly. Therefore, using "nb_mined" to represent the number of incorrectly predicted samples in the mini-batch may not be precise?"

A: nb_mined is not utilized in the learning process, which means it does not contribute to the loss function. Instead, it provides valuable insight into the model's behavior. This is significant because if the model struggles at the batch level, it is likely to have difficulty making accurate distinctions globally. Essentially, nb_mined serves as an indicator that the model is learning something, but it does not have a direct impact on the learning process. If desired, you can come up with a different formula or simply ignore nb_mined.


amaralibey commented on September 24, 2024

Regarding your question above about the return value of self.miner(descriptors, labels) and the four tensors of shape (598,) and (37225,):

These numbers represent the numbers of positive and negative pairs, respectively, that were identified as informative by the miner. The four tensors are indices (a1, p, a2, n): the first two index the 598 informative (anchor, positive) pairs, and the last two index the 37225 informative (anchor, negative) pairs. The loss is then computed only over these mined pairs.
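For reference, a small sketch of how that tuple can be produced and consumed with pytorch-metric-learning (the hyperparameter values and tensor sizes below are illustrative, not necessarily those used in MixVPR):

```python
import torch
from pytorch_metric_learning import losses, miners

miner = miners.MultiSimilarityMiner(epsilon=0.1)
loss_fn = losses.MultiSimilarityLoss(alpha=1.0, beta=50, base=0.0)

descriptors = torch.randn(120, 512)    # toy batch of 120 descriptors
labels = torch.randint(0, 30, (120,))  # toy place IDs

# The miner returns a 4-tuple (a1, p, a2, n):
#   (a1, p) -> indices of the informative (anchor, positive) pairs
#   (a2, n) -> indices of the informative (anchor, negative) pairs
a1, p, a2, n = miner(descriptors, labels)

# Passing the tuple restricts the loss to the mined pairs only.
loss = loss_fn(descriptors, labels, (a1, p, a2, n))
```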


wpumain commented on September 24, 2024

Thank you!

