
Comments (10)

drewm1980 commented on July 20, 2024

Thanks Gabriele, it does!

RE the question of whether we're numerically testing the equivariance of our model, I have an anecdote for you... we once had a customer rotate the object we were analyzing and report the variation in the output of our algorithm as a bug. They didn't have a way of testing if our numbers were actually correct but it was really easy for them to test if our algorithm was rotation invariant!
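That kind of customer test is easy to automate. A minimal numpy sketch of it, with a hypothetical `measure` function standing in for the real analysis algorithm, checking invariance under the exact 90-degree grid rotations:

```python
import numpy as np

def measure(image):
    # Hypothetical stand-in for the analysis algorithm: the mean
    # intensity is a statistic that should be exactly invariant
    # under 90-degree rotations of the input image.
    return image.mean()

rng = np.random.default_rng(0)
img = rng.random((32, 32))

# The customer's test: rotate the input, check the output is unchanged.
baseline = measure(img)
for k in (1, 2, 3):
    assert np.isclose(measure(np.rot90(img, k)), baseline)
```

For rotations other than multiples of 90 degrees the check can only hold approximately, which is exactly the equivariance error discussed in the rest of this thread.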

from e2cnn.

Gabri95 commented on July 20, 2024

Hi @drewm1980

What you say is right. I think you are looking for these:

- Upsampling supports different options, though in practice we found bilinear interpolation to work better.
- We implemented different downsampling algorithms, since operations like max-pooling are not compatible with every type of representation.
- If you use scalar fields, regular fields, or quotient fields, you can use (pointwise) max-pooling, which acts as usual in PyTorch.
- Average pooling is compatible with any representation.
- For better stability, I would suggest using the antialiased versions of the downsampling methods.
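The distinction above can be checked numerically. A minimal numpy sketch (independent of e2cnn, using hypothetical 2x2 pooling helpers): average pooling is linear, so it commutes with any matrix mixing the channels of a feature field, while max pooling does not:

```python
import numpy as np

def avg_pool2x2(x):
    # x has shape (channels, H, W); pool each channel independently.
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def max_pool2x2(x):
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

# A 2-channel vector field on a single 2x2 block, and a representation
# matrix mixing the two channels (a 90-degree rotation of the vectors).
x = np.zeros((2, 2, 2))
x[0, 0, 0] = 1.0
x[1, 1, 1] = 1.0
rho = np.array([[0.0, -1.0], [1.0, 0.0]])
mix = lambda m, f: np.einsum('ij,jhw->ihw', m, f)

# Average pooling is linear, so it commutes with any channel mixing:
assert np.allclose(avg_pool2x2(mix(rho, x)), mix(rho, avg_pool2x2(x)))
# Max pooling is nonlinear, so it fails for this (non-pointwise) mixing:
assert not np.allclose(max_pool2x2(mix(rho, x)), mix(rho, max_pool2x2(x)))
```

This is why max pooling is restricted to fields whose representations support pointwise nonlinearities (scalar, regular, quotient), while average pooling works for all of them.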

Is this what you were looking for?

Best,
Gabriele


drewm1980 commented on July 20, 2024

Hi Gabriele,

Thanks for the response. I read more of the code... R2Conv seems closely related, though it uses trainable weights instead of a fixed anti-alias filter.

My first forays with this will indeed be with scalar fields, although I already have applications in mind for vector fields (as outputs).

I would like to only use equivariant operators unless there is a good reason not to. Equivariance is why I'm here, after all :) My understanding is that max pooling and average pooling (presumably over rectangular regions) break equivariance, but I will look further into the antialiased versions of the operators you mention.

Do you know of published networks doing down and up sampling based on this toolbox? I would not at all be surprised if someone got to this before me.


drewm1980 commented on July 20, 2024

PointwiseAvgPoolAntialiased looks like the go-to operator for correct (antialiased) downsampling: "Antialiased channel-wise average-pooling: each channel is treated independently. It performs strided convolution with a Gaussian blur filter." I am curious why you compute a Gaussian blur kernel rather than using the "Tri-3" or "Bin-5" filters from the "Making Convolutional Networks Shift-Invariant Again" paper. Is it closer to being a radial function at small filter sizes, since it's not constrained to integers by construction? It's certainly more generic. To do 2X downsampling (stride=2), is sigma=1 a good starting point? That results in a 7x7 kernel.
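For intuition, a strided Gaussian blur of the kind the docstring describes can be sketched in plain numpy. This is an illustrative stand-in, not e2cnn's implementation; the 3-sigma truncation is an assumption chosen to reproduce the 7x7 kernel for sigma = 1:

```python
import numpy as np

def gaussian_kernel(sigma):
    # Truncate at ceil(3 * sigma) pixels per side (an assumption; the
    # library's exact rule may differ), so sigma = 1 gives a 7x7 kernel.
    r = int(np.ceil(3 * sigma))
    ax = np.arange(-r, r + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def blur_downsample(x, sigma=1.0, stride=2):
    # Antialiased downsampling: Gaussian blur, then keep every
    # `stride`-th sample. Borders are zero-padded for brevity.
    k = gaussian_kernel(sigma)
    r = k.shape[0] // 2
    padded = np.pad(x, r)
    h, w = x.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (padded[i:i + 2 * r + 1, j:j + 2 * r + 1] * k).sum()
    return out[::stride, ::stride]

assert gaussian_kernel(1.0).shape == (7, 7)        # the 7x7 case above
assert blur_downsample(np.ones((8, 8))).shape == (4, 4)
```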


Gabri95 commented on July 20, 2024

Hi @drewm1980

Indeed, I think antialiased average pooling is what you are looking for.
In many cases, though, using a stride > 1 in the previous convolutional layer seems good enough, since the learnable convolutional filters are already quite smooth (we use a band-limited basis). This, of course, depends on the specific case and on how perfectly equivariant you want the model to be.

In the anti-aliased pooling, we use Gaussian filters for simplicity since they are perfectly rotation invariant (analytically, i.e. in the continuous domain). I guess that after discretization, there is little difference between using their filters and a Gaussian blur, though we did not experiment with them.

Regarding the downsampling, it does depend on your task. Using larger filters (larger sigma) of course gives more stable results, but also less expressive networks (you will smooth your features too much and drop too much high-frequency information). I think you should experiment with different sizes. I often found 5x5 filters (so sigma = 2/3) to be good enough to give acceptable results. If you do not care about numerically testing the equivariance of your model but only aim for higher performance, I think you can often just use strided convolution (as suggested above), as it requires fewer computations.
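The stability/expressivity trade-off can be made concrete. A small numpy sketch (assuming the Gaussian is truncated at ceil(3 * sigma) taps per side, which matches the 5x5 kernel for sigma = 2/3) measuring how much of the highest representable frequency survives blurs of increasing sigma:

```python
import numpy as np

def gaussian_1d(sigma):
    # Separable 1D Gaussian, truncated at ceil(3 * sigma) taps per side.
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def nyquist_gain(sigma):
    # Response of the blur at the highest representable frequency:
    # how much of a +1/-1 alternating signal survives the filter.
    k = gaussian_1d(sigma)
    return abs((k * (-1.0) ** np.arange(k.size)).sum())

assert gaussian_1d(2 / 3).size == 5  # the 5x5 case suggested above
# Larger sigma = more stable downsampling, but more high-frequency loss.
assert nyquist_gain(2.0) < nyquist_gain(2 / 3) < nyquist_gain(1 / 3)
```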

Hope this answers your question!

Best,
Gabriele


page200 commented on July 20, 2024

Some info from this thread might be missing from the documentation of the pooling layers. From their documentation, it is not obvious what kind of equivariance they have. Or is that implied by something?


Gabri95 commented on July 20, 2024

Hi @page200

Are you referring to my first message? In particular:

We implemented different downsampling algorithms, since operations like max-pooling are not compatible with every type of representation.
If you use scalar fields, regular fields, or quotient fields, you can use (pointwise) max-pooling, which acts as usual in PyTorch.
Average pooling is compatible with any representation.

The documentation of PointwiseMaxPooling mentions this:

Notice that not all representations support this kind of pooling. In general, only representations which support pointwise non-linearities do.

If you are referring to the comments on anti-aliasing, I agree these are not really discussed in the docs.
I will update the documentation with some additional notes; thanks for pointing this out!

Best,
Gabriele


page200 commented on July 20, 2024

On one hand yes, about anti-aliasing. On the other hand, the docstrings don't make it obvious which layers have what kind of equivariance. Maybe instead of "max-pooling" in the first sentence of each layer's description you could write something like "G-equivariant max-pooling, where G is ...". Thanks!


Gabri95 commented on July 20, 2024

In that sense, max pooling is supposed to be equivariant to any group G.
In practice, of course, this is not exactly true, since max pooling breaks equivariance to continuous rotations.
Indeed, max pooling can be perfectly equivariant only to 90-degree rotations and reflections (like all operations in the library), since these are the only exact symmetries of the pixel grid.
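That last point is easy to verify numerically. A minimal numpy sketch (a hypothetical 2x2 pooling helper, independent of e2cnn) showing that max pooling commutes exactly with the symmetries of the grid:

```python
import numpy as np

def max_pool2x2(x):
    # Plain 2x2 max pooling on a single-channel feature map.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

rng = np.random.default_rng(0)
x = rng.random((8, 8))

# Pooling a map rotated by 90 degrees equals rotating the pooled map...
assert np.allclose(max_pool2x2(np.rot90(x)), np.rot90(max_pool2x2(x)))
# ...and likewise for a reflection (horizontal flip).
assert np.allclose(max_pool2x2(np.fliplr(x)), np.fliplr(max_pool2x2(x)))
```

For rotations by other angles the input grid has to be resampled, and no such exact identity holds, which is the source of the equivariance error.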
Is this what you meant?

Best,
Gabriele


page200 commented on July 20, 2024

I meant that, and also another thing: the docstring of the layer doesn't yet state whether the layer is equivariant. And if it is equivariant under some group G, which input variable contains the info (and in what format) about what the current G is? The docstring should start with something like "G-equivariant max-pooling, where G is given by ...".

