flava-tutorials's People

Contributors

apsdehal

flava-tutorials's Issues

Reference from CLIP

Hi, thanks for the tutorial!
Is there a source for the CLIP notebook mentioned in the README?

ITM output changes due to padding

Hi,

Thank you for uploading these notebooks for FLAVA, they have been very helpful. I'm opening this issue because there is one thing I'm a bit confused about. Running the winoground-flava-example.ipynb example, I find that the image-text match (ITM) outputs change depending on the input length (padding) and thus also on the batch size.

Under the header 'Look at an example from Winoground and get the image-caption scores from FLAVA', the text and image of Winoground sample 155 are processed with max_length=77, padding=True:

inputs_c0_i0 = flava_processor(text=[winoground[155]["caption_0"]], images=[winoground[155]["image_0"].convert("RGB")], return_tensors="pt", max_length=77, padding=True, return_codebook_pixels=True, return_image_mask=True).to("cuda")

According to the HuggingFace documentation, padding=True pads to the longest sequence in the batch. That means that for this sample, with a batch size of 1, no padding is applied. Inspecting the output of the processor confirms this.
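To make the difference between the two settings concrete, here is a minimal pure-Python sketch. It does not call the HuggingFace tokenizer; `pad_batch`, the `pad_id` value, and the token ids are hypothetical stand-ins that only mimic the documented behaviour of padding=True (pad to the longest sequence in the batch) and padding="max_length".

```python
def pad_batch(token_ids, padding, max_length=77, pad_id=0):
    """Hypothetical helper: pad a list of token-id lists and build attention masks.

    Only padding is illustrated here; sequences longer than the target
    length are left as they are (no truncation).
    """
    if padding is True:
        # Pad to the longest sequence in this batch.
        target = max(len(ids) for ids in token_ids)
    elif padding == "max_length":
        # Pad every sequence to max_length, regardless of the batch contents.
        target = max_length
    else:
        raise ValueError("padding must be True or 'max_length'")

    input_ids = [ids + [pad_id] * (target - len(ids)) for ids in token_ids]
    attention_mask = [[1] * len(ids) + [0] * (target - len(ids)) for ids in token_ids]
    return input_ids, attention_mask


# A batch of one caption (made-up token ids).
single = [[101, 2019, 2214, 102]]

ids_longest, mask_longest = pad_batch(single, padding=True)
ids_max, mask_max = pad_batch(single, padding="max_length")

print(len(ids_longest[0]), len(ids_max[0]))  # 4 77
```

With a batch of one, the "longest" mode leaves the sequence untouched, while "max_length" appends 73 padding positions whose attention mask entries are 0.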

Now in the next section of the notebook ('Get FLAVA image-caption scores from the whole dataset'), padding is set as padding="max_length", max_length=77, causing all tokenized inputs to have length 77. This yields the following scores:

contrastive text score: 0.2525
contrastive image score: 0.135
contrastive group score: 0.09
itm text score: 0.3225
itm image score: 0.1975
itm group score: 0.14

Changing the settings here to padding=True (removing padding and leaving everything else as it was), the scores now become:

contrastive text score: 0.2525
contrastive image score: 0.135
contrastive group score: 0.09
itm text score: 0.2125
itm image score: 0.1025
itm group score: 0.0475

We observe that the contrastive scores remain the same, but the itm scores have changed. Going back to sample 155, we find that going from padding=True (no padding) to padding="max_length" drastically changes the itm-score for each (image, text) pair.

FLAVA itm image-text match scores (no padding):
image_0, caption_0: 0.5821857452392578
image_0, caption_1: 0.6948502063751221
image_1, caption_0: 0.5699254274368286
image_1, caption_1: 0.7856388092041016

FLAVA itm image-text match scores (padding to max_length=77):
image_0, caption_0: 0.9999473094940186
image_0, caption_1: 0.9999871253967285
image_1, caption_0: 0.9997100234031677
image_1, caption_1: 0.9999109506607056

What could be the cause of this, and would this mean that we cannot use FLAVA with HuggingFace for batch processing (since that will add padding)?
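As a toy illustration of how this kind of sensitivity can arise in general, the sketch below computes a softmax-weighted average over a short sequence, with and without a mask on the padding positions. It does not use FLAVA and is not a diagnosis of the actual cause; the scores, values, and the `attend` helper are made up for illustration.

```python
import math


def attend(scores, values, mask=None):
    """Softmax-weighted average of values; positions with mask == 0 are ignored."""
    if mask is None:
        mask = [1] * len(scores)
    weights = [math.exp(s) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


# Three "real" token positions (made-up numbers).
scores = [0.2, 1.0, -0.5]
values = [1.0, 2.0, 3.0]
unpadded = attend(scores, values)

# The same tokens followed by four padding positions.
pad = 4
padded_scores = scores + [0.0] * pad
padded_values = values + [0.0] * pad
mask = [1] * len(scores) + [0] * pad

masked = attend(padded_scores, padded_values, mask)  # padding is ignored
unmasked = attend(padded_scores, padded_values)      # padding takes part in the softmax

print(unpadded, masked, unmasked)
```

When the mask is applied, the padded result matches the unpadded one; when it is not, the padding positions absorb part of the softmax weight and the result shifts. Whether something like this happens anywhere in the ITM path is an open question here, not something this sketch establishes.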

Finetuning example

Hi, thanks for your very good work and examples. Just wondering if there is any plan to make the 6th point available.

  6. Fine-tune on custom task [to be added soon]

Thank you very much!
