Comments (6)
Hi @wpumain,
In the context of visual place recognition, the batch accuracy metric denotes the proportion of pairs in the batch that the mining process has identified as uninformative.
It's easier to understand by analogy with classification. In classification, batch accuracy refers to the ratio of correctly classified examples within a batch. Similarly, in visual place recognition, we assess how well the neural network distinguishes between images of a place X and images of the other places within the mini-batch.
For example, when we use the triplet margin loss function, we form triplets of images (A, P, N), where A is an anchor image, P is a positive image depicting the same place as A, and N is a negative image depicting a different place. We want the network to generate similar representations for A and P and a dissimilar representation for N. We do so by imposing a margin-based triplet constraint that encourages the network to learn representations that minimize the distance of the positive pair (A, P) while maximizing the distance of the negative pair (A, N), BUT ONLY when the negative N lies within the margin m of the positive distance, i.e., when d(A,N) < d(A,P) + m. This is expressed mathematically as minimizing L = max(d(A,P) - d(A,N) + m, 0).
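The hinge behaviour of that formula can be sketched in a few lines of plain Python (a toy illustration with made-up 2-D descriptors, not the repo's actual code):

```python
# Toy sketch of the triplet margin hinge max(d(A,P) - d(A,N) + m, 0).
# The descriptors and margin value below are illustrative only.
def euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def triplet_loss(anchor, positive, negative, margin=0.1):
    # Zero (uninformative) when the negative is already farther
    # than d(A,P) + m from the anchor; positive (informative) otherwise.
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)

A, P = (0.0, 0.0), (0.1, 0.0)          # same place -> small d(A,P)
N_easy, N_hard = (5.0, 0.0), (0.15, 0.0)

print(triplet_loss(A, P, N_easy))      # 0.0 -> uninformative, not mined
print(triplet_loss(A, P, N_hard))      # > 0 -> informative, contributes to the loss
```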
So, when the representation of the negative image N is beyond the margin m, it's deemed uninformative, meaning the network already distinguishes it very well from the positive one. Therefore we don't take it into account when calculating the loss value (we don't mine it, so it's not counted in nb_mined).
Now, that being said, nb_mined contains the number of mined samples (the number of informative samples), so nb_mined/nb_samples is the fraction of informative samples within the mini-batch (the difficult samples that will be used to calculate the final value of the loss). To find the fraction of uninformative samples (the samples the network easily distinguishes, same analogy as classification), we just take the complement to 1, i.e., 1.0 - (nb_mined/nb_samples).
Thank you for your help, I understand a little better now. "Uninformative samples" refer to the samples that the network can easily distinguish, which is equivalent to correctly classified samples in classification. "Informative samples" refer to the difficult samples, which are equivalent to misclassified samples in classification. Is that right?
However, I have one question. When a sample is classified as "informative", it only means that the model finds it difficult to distinguish, but it does not necessarily mean that the model cannot distinguish it correctly. Therefore, using "nb_mined" to represent the number of incorrectly predicted samples in the mini-batch may not be precise?
miner_outputs = self.miner(descriptors, labels) (line 120 in commit 31de0c3)
How should I understand the return value "miner_outputs" of "self.miner(descriptors, labels)"?
During execution, I found that "miner_outputs" is a tuple containing four elements: the first and second are tensors of shape (598,) and the third and fourth are tensors of shape (37225,). I understand the mathematical principle of the Multi-Similarity loss, but I can't understand the contents of "miner_outputs" or how the loss is calculated from it. Could you help me with this?
Hello @wpumain,
Sorry for the late answer,
Q: "However, I have one question. When a sample is classified as "informative", it only means that the model finds it difficult to distinguish, but it does not necessarily mean that the model cannot distinguish it correctly. Therefore, using "nb_mined" to represent the number of incorrectly predicted samples in the mini-batch may not be precise?"
A: nb_mined is not utilized in the learning process, which means it does not contribute to the loss function. Instead, it provides valuable insight into the model's behavior. This is significant because if the model struggles at the batch level, it is likely to have difficulty making accurate distinctions globally. Essentially, nb_mined serves as an indicator that the model is learning something, but it has no direct impact on the learning process. If desired, you can come up with a different formula or simply ignore nb_mined.
miner_outputs = self.miner(descriptors, labels) (line 120 in commit 31de0c3)
How should I understand the return value "miner_outputs" of "self.miner(descriptors, labels)"?
During execution, I found that "miner_outputs" is a tuple containing four elements: the first and second are tensors of shape (598,) and the third and fourth are tensors of shape (37225,). I understand the mathematical principle of the Multi-Similarity loss, but I can't understand the contents of "miner_outputs" or how the loss is calculated from it. Could you help me with this?
These numbers represent the number of positive and negative pairs, respectively, that were identified as informative by the miner. The tuple has the form (a1, p, a2, n): (a1[i], p[i]) index the informative positive pairs and (a2[j], n[j]) index the informative negative pairs, which is why the first two tensors have 598 elements and the last two 37225.
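To make that tuple layout concrete, here is a toy stand-in for the miner in plain Python. It keeps every pair instead of filtering by informativeness (so it is not the real multi-similarity mining logic), but it shows where the four index tensors come from:

```python
# Toy "miner": builds the (a1, p, a2, n) index lists that a pair miner
# returns. Real mining would additionally drop uninformative pairs.
from itertools import combinations

def all_pairs_miner(labels):
    a1, p, a2, n = [], [], [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:   # same place  -> positive pair (a1[i], p[i])
            a1.append(i); p.append(j)
        else:                        # other place -> negative pair (a2[j], n[j])
            a2.append(i); n.append(j)
    return a1, p, a2, n

labels = [0, 0, 1, 1, 2]             # place IDs of 5 images in a mini-batch
a1, p, a2, n = all_pairs_miner(labels)
print(len(a1), len(a2))              # 2 8 -> 2 positive pairs, 8 negative pairs
```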
Thank you!