gupta-abhay / pytorch-vit
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Home Page: https://arxiv.org/abs/2010.11929
License: MIT License
How do you patch the image? Any clues about the preprocessing and training steps?
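For what it's worth, the patching in the paper is just a non-overlapping split of the image into P x P blocks, each then flattened into a vector. A minimal sketch using `torch.Tensor.unfold` (illustrative only, not the repo's exact preprocessing):

```python
import torch

# Split an N,C,H,W batch into non-overlapping P x P patches and flatten each
# patch into a vector of length P*P*C (illustrative sizes).
P = 16
x = torch.randn(2, 3, 224, 224)                      # N, C, H, W

patches = (x.unfold(2, P, P)                         # N, C, H//P, W, P
             .unfold(3, P, P)                        # N, C, H//P, W//P, P, P
             .contiguous())
patches = patches.permute(0, 2, 3, 4, 5, 1)          # channels last per pixel
patches = patches.reshape(2, -1, P * P * 3)          # N, num_patches, P*P*C

assert patches.shape == (2, (224 // P) ** 2, P * P * 3)  # (2, 196, 768)
```

Each row of the result is one flattened patch, ready to be fed to the linear projection.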
In train.py the code includes "from vit.utils import ( adjust_learning_rate)", but there is no adjust_learning_rate in utils.py.
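As a workaround, a conventional step-decay helper with this name could look like the sketch below. This is an assumption about what the missing function did, not the original implementation:

```python
import torch

def adjust_learning_rate(optimizer, epoch, base_lr, decay_epochs=30, gamma=0.1):
    """Stand-in for the missing helper (an assumption, not the original):
    decay the learning rate by `gamma` every `decay_epochs` epochs."""
    lr = base_lr * (gamma ** (epoch // decay_epochs))
    for group in optimizer.param_groups:
        group["lr"] = lr
    return lr

opt = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
adjust_learning_rate(opt, epoch=30, base_lr=0.1)   # LR becomes 0.1 * 0.1 = 0.01
```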
https://github.com/gupta-abhay/ViT/blob/fcc17638d0f4d661af19128871345b01a800631c/vit/models/ViT.py#L99
I guess the self.flatten_dim in this line should be replaced with embedding_dim.
This work is very interesting and fascinating. I have a question: how did you decide on the embedding size?
Looking forward to the release of the pretrained models.
Hello Gupta!
Being new to vision tasks, can you share just a small snippet showing how we can use pytorch-vit in downstream vision tasks like image retrieval etc.? Thanks
I get the part where the image is split into P (say 16x16) smaller image patches, and then you have to flatten each 3D patch and pass it into a Linear layer to get what they call the Linear Projection. Can you please explain how the two types of embeddings work? I looked at your code too, and it looked like a maze to me. If you could just explain it in layman's terms, I'll look at the code again and understand.
Thanks
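In layman's terms, the two embeddings are: (1) the patch embedding, a Linear layer that maps each flattened P*P*C patch to a D-dimensional token, and (2) the position embedding, a learned vector added to each token so the model knows where each patch came from. A minimal sketch with illustrative sizes (not the repo's exact code):

```python
import torch
import torch.nn as nn

num_patches, flatten_dim, D = 196, 768, 512
patches = torch.randn(2, num_patches, flatten_dim)   # flattened patches

patch_embed = nn.Linear(flatten_dim, D)                        # embedding 1
pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, D))   # embedding 2
cls_token = nn.Parameter(torch.zeros(1, 1, D))                 # learnable [CLS]

tokens = patch_embed(patches)                                  # (2, 196, 512)
tokens = torch.cat([cls_token.expand(2, -1, -1), tokens], 1)   # (2, 197, 512)
tokens = tokens + pos_embed                                    # add positions

assert tokens.shape == (2, num_patches + 1, D)
```

The position embedding has one extra slot because the class token is prepended before positions are added.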
Hi, will you release any pre-trained models?
Thank you
hello,
Thanks a lot for this very interesting work!
When you unroll the tensor, you use unfold and flatten like this:
x = (x.unfold(2, self.patch_dim, self.patch_dim).
unfold(3, self.patch_dim, self.patch_dim).contiguous())
x = x.view(x.size(0), -1, self.flatten_dim)
But if x has shape N,C,H,W, unrolling ends up with N,C,H//P,W//P,P,P, so the final flatten ends up mixing data from different channels. It means your "words" come from different blocks in space. It does not really matter when training your model at one specific size, but I think it will have a hard time transferring to a different size...
Instead you could do something like this (taking b, c from x.shape):
self.flatten_dim_in = (patch_dim ** 2) * in_channels
...
b, c, h, w = x.shape
x = (x.unfold(2, self.patch_dim, self.patch_dim)
      .unfold(3, self.patch_dim, self.patch_dim)
      .contiguous())
x = x.view(b, c, -1, self.patch_dim ** 2)
x = x.permute(0, 2, 3, 1).contiguous()
x = x.view(b, -1, self.flatten_dim_in)
Just to make sure the data at the end is really what you expect: all the RGB pixels of one patch together, not a mix of patches.
Now I haven't tried your code yet, so perhaps you use a different layout than N,C,H,W for images?
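A quick sanity check of the permute-based unroll on a tiny tensor, comparing the first "word" against a directly sliced top-left patch (a sketch, assuming the N,C,H,W layout discussed above):

```python
import torch

# Tiny example: 1 image, 3 channels, 4x4 pixels, 2x2 patches.
patch_dim, b, c, h, w = 2, 1, 3, 4, 4
x = torch.arange(b * c * h * w, dtype=torch.float32).view(b, c, h, w)

# Permute-based unroll: N,C,H,W -> N, num_patches, patch_dim*patch_dim*C
u = (x.unfold(2, patch_dim, patch_dim)
      .unfold(3, patch_dim, patch_dim)
      .contiguous())
u = u.view(b, c, -1, patch_dim ** 2)
u = u.permute(0, 2, 3, 1).contiguous()
u = u.view(b, -1, patch_dim ** 2 * c)

# Reference: the top-left 2x2 block across all channels, pixels row-major,
# channels interleaved last -- exactly one patch, no mixing.
ref = (x[:, :, :patch_dim, :patch_dim]
       .reshape(b, c, -1).permute(0, 2, 1).reshape(b, -1))
assert torch.equal(u[:, 0], ref)
```

Since the assertion holds, each row really is one spatial patch with all its channels together.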
I don't understand line 74 of ViT.py:
x = self.to_cls_token(x[:, 0])
If the first dimension of x is batch, then index 0 along the second dimension should be a patch, since x has shape [batch, patch, feature]. Does it mean only the first patch is used? Could anybody help me with this? Thanks a lot.
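Index 0 is not an image patch: following BERT, ViT prepends a learnable class token to the patch sequence, and x[:, 0] selects that token (which has attended to all patches) for classification. A minimal sketch of the mechanism (illustrative names and sizes, not the repo's exact code):

```python
import torch
import torch.nn as nn

batch, num_patches, dim = 2, 4, 8
patch_tokens = torch.randn(batch, num_patches, dim)

cls_token = nn.Parameter(torch.zeros(1, 1, dim))   # learnable [CLS] token
cls = cls_token.expand(batch, -1, -1)              # one copy per sample
x = torch.cat([cls, patch_tokens], dim=1)          # shape: (2, 5, 8)

# ... the transformer encoder would run here; through self-attention,
# token 0 aggregates information from every patch ...

cls_out = x[:, 0]                                  # shape: (2, 8)
assert cls_out.shape == (batch, dim)
```

So no patch is discarded; they all feed into token 0 through attention.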
''model = models.__dict__[args.model](num_classes=args.num_classes)'' raises "'module' object is not callable".
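This error usually means `args.model` matched a submodule inside the models package (e.g. a file name) rather than a class or factory function, so `__dict__` lookup returned a module object. A self-contained sketch of the failure mode and a guarded lookup (illustrative names, not the repo's code):

```python
import types

models = types.ModuleType("models")          # stand-in for the repo's package
models.vit = types.ModuleType("models.vit")  # submodule, e.g. models/vit.py
models.ViT = lambda num_classes: f"ViT({num_classes})"  # the actual factory

def build_model(name, **kwargs):
    """Guarded lookup (illustrative): reject modules and unknown names."""
    obj = getattr(models, name, None)
    if obj is None or isinstance(obj, types.ModuleType) or not callable(obj):
        raise ValueError(f"{name!r} is not a callable model constructor")
    return obj(**kwargs)

build_model("ViT", num_classes=10)   # works: returns a model
# build_model("vit", num_classes=10)  # would raise ValueError, not TypeError
```

Checking the exact key casing of `args.model` against the package's exported names is usually the quickest fix.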