feature maps about netvlad (CLOSED)

relja commented on August 18, 2024
feature maps

from netvlad.

Comments (1)

Relja commented on August 18, 2024

Hi,

Please refer to the paper, as it seems you don't quite understand the gist of what NetVLAD is trying to do and how it is computed. Everything is explained there:
https://arxiv.org/pdf/1511.07247.pdf

No, the final output is not a score for each class; there are no classes (at least in this case: if you wanted, you could always add an FC layer and train it for classification). NetVLAD produces a representation of the input image (in slightly older terms, a "global descriptor"). You could produce such a representation by extracting, for example, fc7 features, by average pooling or max pooling over all spatial locations after a conv layer, or by using NetVLAD to do learnt pooling of the conv features.
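To make the simpler alternatives concrete, here is a minimal NumPy sketch of turning a conv feature map into a global descriptor by average or max pooling over all spatial locations. The function names and the toy shape (H=4, W=5, D=8) are assumptions chosen for illustration; they are not taken from the NetVLAD code.

```python
import numpy as np

def global_avg_pool(feature_map):
    """Average an (H, W, D) feature map over all spatial locations -> (D,)."""
    return feature_map.mean(axis=(0, 1))

def global_max_pool(feature_map):
    """Take the per-channel maximum of an (H, W, D) feature map -> (D,)."""
    return feature_map.max(axis=(0, 1))

# Hypothetical feature map standing in for the output of a conv layer.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((4, 5, 8))

avg_desc = global_avg_pool(feature_map)
max_desc = global_max_pool(feature_map)
print(avg_desc.shape, max_desc.shape)
```

Both pooled descriptors have dimensionality D, whereas NetVLAD pools the same local descriptors into a D*K vector.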

Equations 3 and 4 (and figure 2) of the paper define exactly what NetVLAD is doing. The input feature map is HxWxD, and you treat it as N=W*H vectors ("local descriptors"), each of size D. To get the soft assignments (equation 3), you apply a per-vector linear transformation followed by a softmax (again on a per-vector basis). The linear transformation is a 1x1 convolution from D input channels to K output channels (i.e. a 1x1xDxK conv), where K is the number of "clusters", the parameter of the method; it produces N vectors of dimensionality K.
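The soft-assignment step described above can be sketched in NumPy as follows. This is an illustration under assumptions, not the repository's implementation: the 1x1 convolution is written as a matrix product over the N flattened descriptors, the bias term follows my reading of equation 3, and the names `soft_assign`, `weights` and `biases` as well as the toy sizes are hypothetical.

```python
import numpy as np

def soft_assign(feature_map, weights, biases):
    """Soft-assign each local descriptor to K clusters.

    feature_map: (H, W, D) conv features
    weights:     (D, K) weights of the 1x1 convolution
    biases:      (K,)   biases of the 1x1 convolution
    returns:     (N, K) with N = H * W; each row sums to 1
    """
    H, W, D = feature_map.shape
    descriptors = feature_map.reshape(H * W, D)   # N local descriptors of size D
    logits = descriptors @ weights + biases       # 1x1 conv == per-vector linear map
    logits -= logits.max(axis=1, keepdims=True)   # stabilise the exponentials
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)   # per-vector softmax over K

# Toy sizes (assumptions): H=4, W=5, D=8, K=3.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((4, 5, 8))
weights = rng.standard_normal((8, 3))
biases = rng.standard_normal(3)

assignments = soft_assign(feature_map, weights, biases)
print(assignments.shape)
```

Each row of `assignments` holds the K soft-assignment weights of one local descriptor.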

Then follow equation 4 to produce the final output: for each (vector i, cluster k) pair, multiply the soft assignment weight of x_i to cluster k by the residual x_i - c_k, and for each k sum over all vectors i. I understand the last sentence is not that clear, as it is hard to explain in one line, but equation 4 tells you exactly what you need to know. The final output (don't forget the normalizations, also explained in the paper) is a vector of dimensionality D*K.
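As a rough sketch of the aggregation in equation 4, the snippet below weights each residual by its soft assignment, sums over all descriptors, and then normalizes. The normalization order (L2 per cluster, then L2 over the flattened vector) is my reading of the paper, and the flattening order, the `eps` guard, the function name and the random toy inputs are assumptions made for illustration.

```python
import numpy as np

def netvlad_aggregate(descriptors, assignments, centers, eps=1e-12):
    """Aggregate soft-assigned residuals into one global descriptor.

    descriptors: (N, D) local descriptors x_i
    assignments: (N, K) soft-assignment weights (rows sum to 1)
    centers:     (K, D) cluster centres c_k
    returns:     (K * D,) L2-normalized vector
    """
    # Residual x_i - c_k for every (descriptor, cluster) pair: (N, K, D).
    residuals = descriptors[:, None, :] - centers[None, :, :]
    # Weight each residual by its soft assignment and sum over i: (K, D).
    vlad = (assignments[:, :, None] * residuals).sum(axis=0)
    # Per-cluster L2 normalization (assumed: the paper's intra-normalization).
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + eps
    # Flatten and L2-normalize the whole vector.
    vlad = vlad.reshape(-1)
    return vlad / (np.linalg.norm(vlad) + eps)

# Toy sizes (assumptions): N=20, D=8, K=3, with random soft assignments.
rng = np.random.default_rng(0)
descriptors = rng.standard_normal((20, 8))
assignments = rng.dirichlet(np.ones(3), size=20)
centers = rng.standard_normal((3, 8))

descriptor = netvlad_aggregate(descriptors, assignments, centers)
print(descriptor.shape)
```

The result has D*K entries, matching the dimensionality stated above.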

Best,
Relja
