Following the same order of experiments, with embeddings from 40 epochs, here is what we get after the final fine-tuning (without any augmentation):
We need to trace what is trainable and what is not. In the main SwAV code, all variables computed under the no_grad context are effectively non-trainable.
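As a small illustration (my own toy example, not the repo's code): PyTorch's `torch.no_grad` has `tf.stop_gradient` as the closest TensorFlow analogue, which treats a tensor as a constant with respect to the tape, so gradients do not flow back through it.

```python
import tensorflow as tf

w = tf.Variable(2.0)
with tf.GradientTape() as tape:
    # `frozen` is treated as a constant by the tape, like a value
    # produced under torch.no_grad() in the PyTorch code.
    frozen = tf.stop_gradient(w * 3.0)
    loss = frozen * w  # gradient only flows through this second `w`

grad = tape.gradient(loss, w)
print(float(grad))  # 6.0, i.e. d(loss)/dw = frozen, with frozen held constant
```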
I did not get this part - "## crossentropy loss between code and p, assuming that code is to be precited from the assigned cluster. if wrong then logits will be label and vice versa". Do you mean that the logits and labels in the criterion will be swapped? If so, how?
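For what it's worth, here is how I read the swapped-prediction idea, as a hypothetical sketch (the function name, signature, and temperature value are mine, not the repo's): the code `q` of one view acts as the soft label, and the softmax over the other view's prototype scores acts as the prediction, so "labels" and "logits" trade places between the two terms.

```python
import tensorflow as tf

def swapped_ce(scores_v1, scores_v2, q_v1, q_v2, temperature=0.1):
    # q_v* : soft cluster assignments (codes), rows sum to 1.
    # scores_v* : raw prototype similarities (the "logits").
    log_p1 = tf.nn.log_softmax(scores_v1 / temperature, axis=1)
    log_p2 = tf.nn.log_softmax(scores_v2 / temperature, axis=1)
    # Predict view 1's code from view 2's scores and vice versa:
    # cross-entropy  -sum_k q_k * log p_k  in each direction, averaged.
    return -0.5 * tf.reduce_mean(
        tf.reduce_sum(q_v1 * log_p2, axis=1)
        + tf.reduce_sum(q_v2 * log_p1, axis=1)
    )

# Toy usage with one-hot "codes" for 4 samples and 5 prototypes.
q = tf.one_hot([0, 1, 2, 3], depth=5)
s1 = tf.random.normal([4, 5])
s2 = tf.random.normal([4, 5])
loss = swapped_ce(s1, s2, q, q)
```

If you passed the codes where the scores are expected (or vice versa), the cross-entropy would be computed against the wrong distribution, which is how I understand the "if wrong then logits will be label" remark.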
I think it might be even better to just replicate the following as the authors have done in here. What do you think?
There are two normalizations. First, they L2-normalize the embeddings they get from the RN50 backbone, then pass them through a linear layer (the prototypes). While training, they also normalize the weights of this prototype layer.
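A minimal sketch of those two normalizations (names and dimensions are illustrative, not the repo's actual identifiers): with unit-norm embeddings and unit-norm prototype columns, the scores become cosine similarities.

```python
import tensorflow as tf

embedding_dim, num_prototypes = 128, 10

# Prototype layer: a plain linear layer with no bias.
prototypes = tf.keras.layers.Dense(num_prototypes, use_bias=False)
prototypes.build((None, embedding_dim))

def prototype_scores(backbone_features):
    # 1) L2-normalize the embeddings from the backbone.
    z = tf.math.l2_normalize(backbone_features, axis=1)
    # 2) L2-normalize the prototype weights (each column is one prototype)
    #    before the matmul, so scores are cosine similarities in [-1, 1].
    prototypes.kernel.assign(tf.math.l2_normalize(prototypes.kernel, axis=0))
    return prototypes(z)

x = tf.random.normal([4, embedding_dim])
scores = prototype_scores(x)  # shape (4, num_prototypes)
```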
Hi, I was looking into the notebook initial_notebooks/MultiCropDataset_Architecture.ipynb, and I am curious why, when visualizing the images from the im1, im2, im3 tensors, they contain different images. For instance, I would expect im1[0], im2[0], and im3[0] to be different augmentations of the same image, but that is not the case here. I am probably getting something wrong.
Hi, first off, thanks for the wonderful effort in converting this code from PyTorch to TF. I was running your baseline model, and it looks like after epoch 2, where the loss is around 2.4, the loss either does not go down or goes down almost negligibly until about epoch 10; even at epoch 10 it is around 2.35/2.36. In your experiments, have you seen it go down over a 40-epoch run? Logically it does not make sense that once it plateaus, more epochs can solve the problem.
I am planning to use this for a totally different domain and dataset (which, of course, I will post at a public link and reference your work) and would appreciate any thoughts.
This notebook presents a minimal implementation of optimal transport using the Sinkhorn-Knopp algorithm. In the context of SwAV, this is needed to compute the cluster assignments (codes) from the prototype scores.
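The iteration can be sketched as follows (a minimal version paraphrasing the SwAV paper's procedure; epsilon and the iteration count are assumptions, not the notebook's exact values). It turns a batch of prototype scores into soft assignments whose rows are probability distributions and whose columns are approximately balanced across prototypes.

```python
import tensorflow as tf

def sinkhorn(scores, epsilon=0.05, n_iters=3):
    # scores: (batch, prototypes) similarity matrix.
    Q = tf.exp(scores / epsilon)
    Q /= tf.reduce_sum(Q)                 # normalize the whole matrix
    B = tf.cast(tf.shape(Q)[0], Q.dtype)  # batch size
    K = tf.cast(tf.shape(Q)[1], Q.dtype)  # number of prototypes
    for _ in range(n_iters):
        # Alternately rescale columns (balance prototype usage) and
        # rows (one assignment distribution per sample).
        Q /= tf.reduce_sum(Q, axis=0, keepdims=True)
        Q /= K
        Q /= tf.reduce_sum(Q, axis=1, keepdims=True)
        Q /= B
    return Q * B  # each row now sums to 1

codes = sinkhorn(tf.random.normal([8, 4]))
```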