Following the same order of experiments, with embeddings from 40 epochs, here is what we get after the final fine-tuning (without any augmentation):
We need to trace what is trainable and what is not. In the main SwAV code, all variables computed under the no_grad context are effectively non-trainable.
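As a small illustration (my own toy example, not the repo's code): PyTorch's `torch.no_grad` has `tf.stop_gradient` as the closest TensorFlow analogue, which treats a tensor as a constant with respect to the tape, so gradients do not flow back through it.

```python
import tensorflow as tf

w = tf.Variable(2.0)
with tf.GradientTape() as tape:
    # `frozen` is treated as a constant by the tape, like a value
    # produced under torch.no_grad() in the PyTorch code.
    frozen = tf.stop_gradient(w * 3.0)
    loss = frozen * w  # gradient only flows through this second `w`

grad = tape.gradient(loss, w)
print(float(grad))  # 6.0, i.e. d(loss)/dw = frozen, with frozen held constant
```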
I did not get this part - "## crossentropy loss between code and p, assuming that code is to be precited from the assigned cluster. if wrong then logits will be label and vice versa". Do you mean that the logits and labels in the criterion will be swapped? If so, how?
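For what it's worth, here is how I read the swapped-prediction idea, as a hypothetical sketch (the function name, signature, and temperature value are mine, not the repo's): the code `q` of one view acts as the soft label, and the softmax over the other view's prototype scores acts as the prediction, so "labels" and "logits" trade places between the two terms.

```python
import tensorflow as tf

def swapped_ce(scores_v1, scores_v2, q_v1, q_v2, temperature=0.1):
    # q_v* : soft cluster assignments (codes), rows sum to 1.
    # scores_v* : raw prototype similarities (the "logits").
    log_p1 = tf.nn.log_softmax(scores_v1 / temperature, axis=1)
    log_p2 = tf.nn.log_softmax(scores_v2 / temperature, axis=1)
    # Predict view 1's code from view 2's scores and vice versa:
    # cross-entropy  -sum_k q_k * log p_k  in each direction, averaged.
    return -0.5 * tf.reduce_mean(
        tf.reduce_sum(q_v1 * log_p2, axis=1)
        + tf.reduce_sum(q_v2 * log_p1, axis=1)
    )

# Toy usage with one-hot "codes" for 4 samples and 5 prototypes.
q = tf.one_hot([0, 1, 2, 3], depth=5)
s1 = tf.random.normal([4, 5])
s2 = tf.random.normal([4, 5])
loss = swapped_ce(s1, s2, q, q)
```

If you passed the codes where the scores are expected (or vice versa), the cross-entropy would be computed against the wrong distribution, which is how I understand the "if wrong then logits will be label" remark.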
I think it might be even better to just replicate the following as the authors have done in here. What do you think?
There are two normalizations. First, they L2-normalize the embeddings they get from the RN50 backbone, then pass them through a linear layer (the prototypes). While training, they also normalize the weights of this prototype layer.
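A minimal sketch of those two normalizations (names and dimensions are illustrative, not the repo's actual identifiers): with unit-norm embeddings and unit-norm prototype columns, the scores become cosine similarities.

```python
import tensorflow as tf

embedding_dim, num_prototypes = 128, 10

# Prototype layer: a plain linear layer with no bias.
prototypes = tf.keras.layers.Dense(num_prototypes, use_bias=False)
prototypes.build((None, embedding_dim))

def prototype_scores(backbone_features):
    # 1) L2-normalize the embeddings from the backbone.
    z = tf.math.l2_normalize(backbone_features, axis=1)
    # 2) L2-normalize the prototype weights (each column is one prototype)
    #    before the matmul, so scores are cosine similarities in [-1, 1].
    prototypes.kernel.assign(tf.math.l2_normalize(prototypes.kernel, axis=0))
    return prototypes(z)

x = tf.random.normal([4, embedding_dim])
scores = prototype_scores(x)  # shape (4, num_prototypes)
```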
Hi, I was looking into the notebook initial_notebooks/MultiCropDataset_Architecture.ipynb, and I am curious why, when visualizing the images from the im1, im2, im3 tensors, they contain different images. For instance, I would expect im1[0], im2[0], and im3[0] to be different augmentations of the same image, but that is not the case here. I am probably getting something wrong.
Hi, first off, thanks for the wonderful effort in converting this code from PyTorch to TF. I was running your baseline model, and it looks like after epoch 2, where the loss is around 2.4, the loss either does not go down or goes down almost negligibly until about epoch 10; even at epoch 10 it is around 2.35/2.36. In your experiments, have you seen it go down over a 40-epoch run? Logically it does not make sense that once it plateaus, more epochs can solve the problem.
I am planning to use this for a totally different domain and dataset (which, of course, I will post at a public link and reference your work) and would appreciate any thoughts.
This notebook presents a minimal implementation of optimal transport using the Sinkhorn-Knopp algorithm. In the context of SwAV, this is needed to compute the cluster assignments (codes) from the prototype scores.
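The iteration can be sketched as follows (a minimal version paraphrasing the SwAV paper's procedure; epsilon and the iteration count are assumptions, not the notebook's exact values). It turns a batch of prototype scores into soft assignments whose rows are probability distributions and whose columns are approximately balanced across prototypes.

```python
import tensorflow as tf

def sinkhorn(scores, epsilon=0.05, n_iters=3):
    # scores: (batch, prototypes) similarity matrix.
    Q = tf.exp(scores / epsilon)
    Q /= tf.reduce_sum(Q)                 # normalize the whole matrix
    B = tf.cast(tf.shape(Q)[0], Q.dtype)  # batch size
    K = tf.cast(tf.shape(Q)[1], Q.dtype)  # number of prototypes
    for _ in range(n_iters):
        # Alternately rescale columns (balance prototype usage) and
        # rows (one assignment distribution per sample).
        Q /= tf.reduce_sum(Q, axis=0, keepdims=True)
        Q /= K
        Q /= tf.reduce_sum(Q, axis=1, keepdims=True)
        Q /= B
    return Q * B  # each row now sums to 1

codes = sinkhorn(tf.random.normal([8, 4]))
```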