apsdehal / flava-tutorials
Tutorials for FLAVA model https://arxiv.org/abs/2112.04482
Hi, thanks for the tutorial!
Is there a source for the CLIP notebook mentioned in the README?
Hi,
Thank you for uploading these notebooks for FLAVA; they have been very helpful. I'm opening this issue because one thing confuses me. When running the winoground-flava-example.ipynb notebook, I find that the image-text matching (ITM) outputs change depending on the input length (padding), and therefore also on the batch size.
Under the header 'Look at an example from Winoground and get the image-caption scores from FLAVA', the text and image of Winoground sample 155 are processed with max_length=77, padding=True:
inputs_c0_i0 = flava_processor(text=[winoground[155]["caption_0"]], images=[winoground[155]["image_0"].convert("RGB")], return_tensors="pt", max_length=77, padding=True, return_codebook_pixels=True, return_image_mask=True).to("cuda")
According to the HuggingFace documentation, padding=True pads to the longest sequence in the batch. This means that for this sample, with a batch size of 1, no padding is applied, and inspecting the output of the processor confirms this.
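To make the difference between padding=True (pad to the longest sequence in the batch) and padding="max_length" (pad every sequence to max_length) concrete, here is a minimal pure-Python sketch of the two modes applied to already-tokenized ids. It does not call the HuggingFace tokenizer; `pad_batch`, `PAD_ID` and the toy token ids are hypothetical and only mimic the documented padding semantics, including the attention mask that marks real tokens.

```python
from typing import List, Tuple, Union

PAD_ID = 0  # hypothetical padding token id


def pad_batch(
    batch: List[List[int]], padding: Union[bool, str], max_length: int = 77
) -> Tuple[List[List[int]], List[List[int]]]:
    """Mimic the two padding modes on already-tokenized ids.

    padding=True          -> pad to the longest sequence in the batch
    padding="max_length"  -> pad every sequence to max_length
    For simplicity, sequences are first truncated to max_length.
    Returns (input_ids, attention_mask); the mask is 1 for real tokens, 0 for padding.
    """
    truncated = [seq[:max_length] for seq in batch]
    if padding == "max_length":
        target = max_length
    elif padding is True:
        target = max(len(seq) for seq in truncated)
    else:
        raise ValueError(f"unsupported padding mode: {padding!r}")
    input_ids = [seq + [PAD_ID] * (target - len(seq)) for seq in truncated]
    attention_mask = [[1] * len(seq) + [0] * (target - len(seq)) for seq in truncated]
    return input_ids, attention_mask


caption = [101, 7592, 2088, 102]  # toy token ids for one caption

ids, mask = pad_batch([caption], padding=True)
print(len(ids[0]), sum(mask[0]))  # 4 4: batch size 1, no padding added

ids, mask = pad_batch([caption], padding="max_length")
print(len(ids[0]), sum(mask[0]))  # 77 4: 73 padding positions

ids, mask = pad_batch([caption, [101, 102]], padding=True)
print(mask[1])  # [1, 1, 0, 0]: the shorter caption is padded once batched
```

The last call shows why batching matters: even with padding=True, every caption shorter than the longest one in its batch receives padding, so any sensitivity to padding also becomes a sensitivity to batch composition.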
In the next section of the notebook ('Get FLAVA image-caption scores from the whole dataset'), padding is set to padding="max_length", max_length=77, so every tokenized input has length 77. This yields the following scores:
contrastive text score: 0.2525
contrastive image score: 0.135
contrastive group score: 0.09
itm text score: 0.3225
itm image score: 0.1975
itm group score: 0.14
Changing this setting to padding=True (removing the padding and leaving everything else unchanged), the scores become:
contrastive text score: 0.2525
contrastive image score: 0.135
contrastive group score: 0.09
itm text score: 0.2125
itm image score: 0.1025
itm group score: 0.0475
The contrastive scores stay the same, but the ITM scores change. Going back to sample 155, switching from padding=True (no padding) to padding="max_length" drastically changes the ITM score for every (image, text) pair.
FLAVA itm image-text match scores (no padding):
image_0, caption_0: 0.5821857452392578
image_0, caption_1: 0.6948502063751221
image_1, caption_0: 0.5699254274368286
image_1, caption_1: 0.7856388092041016
FLAVA itm image-text match scores (padding to max_length=77):
image_0, caption_0: 0.9999473094940186
image_0, caption_1: 0.9999871253967285
image_1, caption_0: 0.9997100234031677
image_1, caption_1: 0.9999109506607056
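To see how these per-pair numbers feed into the text, image and group scores, the sketch below applies the metric definitions from the Winoground paper to sample 155. That the notebook computes its scores exactly this way is an assumption; the dataset-level scores would then be the mean of these per-example booleans. The helper names are hypothetical; the scores are copied from the two lists above.

```python
from typing import Dict, Tuple

# ITM scores for sample 155, keyed by (caption index, image index).
NO_PADDING: Dict[Tuple[int, int], float] = {
    (0, 0): 0.5821857452392578,
    (1, 0): 0.6948502063751221,
    (0, 1): 0.5699254274368286,
    (1, 1): 0.7856388092041016,
}
MAX_LENGTH_77: Dict[Tuple[int, int], float] = {
    (0, 0): 0.9999473094940186,
    (1, 0): 0.9999871253967285,
    (0, 1): 0.9997100234031677,
    (1, 1): 0.9999109506607056,
}


def text_correct(s: Dict[Tuple[int, int], float]) -> bool:
    # For each image, the matching caption must outscore the other caption.
    return s[(0, 0)] > s[(1, 0)] and s[(1, 1)] > s[(0, 1)]


def image_correct(s: Dict[Tuple[int, int], float]) -> bool:
    # For each caption, the matching image must outscore the other image.
    return s[(0, 0)] > s[(0, 1)] and s[(1, 1)] > s[(1, 0)]


def group_correct(s: Dict[Tuple[int, int], float]) -> bool:
    # Both the text and the image criteria must hold.
    return text_correct(s) and image_correct(s)


for name, s in [("no padding", NO_PADDING), ("max_length=77", MAX_LENGTH_77)]:
    print(name, text_correct(s), image_correct(s), group_correct(s))
# no padding False True False
# max_length=77 False False False
```

For this sample, padding to max_length flips the image criterion from correct to incorrect, because (image_1, caption_0) and (image_1, caption_1) are no longer ordered as before once all scores are pushed towards 1.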
What could be the cause of this, and does it mean that FLAVA cannot be used with HuggingFace for batch processing (since batching adds padding)?
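One hypothesis consistent with these numbers, which this issue does not confirm, is that padded text positions are not fully masked out somewhere on the path that produces the ITM logits, while the contrastive path is unaffected. The toy sketch below only illustrates the general mechanism in plain Python: with an attention mask, appending padding leaves the attention output unchanged; without one, it does not. It does not use FLAVA or HuggingFace code; `attend` and all numbers are hypothetical.

```python
import math
from typing import List, Optional


def attend(query: List[float], keys: List[List[float]], values: List[float],
           mask: Optional[List[int]] = None) -> float:
    """Single-query scaled dot-product attention with scalar values.

    Positions where mask == 0 get a score of -inf, so softmax gives them
    exactly zero weight. With mask=None every position is attended to.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    if mask is not None:
        scores = [s if m else float("-inf") for s, m in zip(scores, mask)]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


query = [1.0, 0.5]
real_keys = [[0.2, 0.1], [0.9, -0.3]]
real_values = [1.0, 3.0]
# Three hypothetical padding positions appended, as padding="max_length" would do.
padded_keys = real_keys + [[0.1, 0.1]] * 3
padded_values = real_values + [0.0] * 3

unpadded = attend(query, real_keys, real_values)
masked = attend(query, padded_keys, padded_values, mask=[1, 1, 0, 0, 0])
unmasked = attend(query, padded_keys, padded_values)

print(math.isclose(unpadded, masked))    # True: masked padding is invisible
print(math.isclose(unpadded, unmasked))  # False: unmasked padding shifts the output
```

If something like this is the cause, comparing ITM scores for the same sample with padding=True and padding="max_length", as done above, is a direct way to check whether a given model or library version is sensitive to padding.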
Hi, thanks for your great work and the examples. I was wondering whether there is any plan to make the 6th point available.
Thank you very much!