
VQA input construction about ofa HOT 1 CLOSED

ofa-sys commented on August 26, 2024
VQA input construction

from ofa.

Comments (1)

yangapku commented on August 26, 2024

We use decoder_prompts and prefix_tokens to improve VQA finetuning performance. Specifically, VQA has a hyperparameter option, --prompt-type, which determines whether the question is added before the answer in the decoder's input sequence during finetuning and evaluation. The question has already been fed to the encoder; this option controls whether it is also fed to the decoder. If --prompt-type is not none, decoder_prompts and prefix_tokens record the prepended question so that the decoder input sequence can be constructed during evaluation: decoder_prompts is used for all-candidate evaluation, and prefix_tokens is used for beam-search generative evaluation. In our experiments, concatenating the question with the answer in the decoder input sequence improved accuracy somewhat compared with no concatenation.
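To make the effect of --prompt-type concrete, here is a minimal sketch of how the decoder input could be assembled. It is not the OFA implementation: the function name build_decoder_input, the "<s>" marker, and the use of plain string tokens are all assumptions made for illustration, and any value other than "none" is treated the same way here.

```python
from typing import List, Tuple

BOS = "<s>"  # assumed beginning-of-sequence marker, for illustration only


def build_decoder_input(
    question: List[str],
    answer: List[str],
    prompt_type: str = "none",
) -> Tuple[List[str], List[str]]:
    """Return (decoder_prompt, decoder_input) for one VQA sample.

    Hypothetical helper: with prompt_type == "none" the decoder only sees
    the start marker before the answer; with any other value the question
    is prepended to the answer in the decoder input.
    """
    if prompt_type == "none":
        decoder_prompt = [BOS]
    else:
        decoder_prompt = [BOS] + question
    # During evaluation, the prompt part is what would be supplied as the
    # fixed prefix; the answer tokens are what the decoder has to produce.
    decoder_input = decoder_prompt + answer
    return decoder_prompt, decoder_input


question = ["what", "color", "is", "the", "car", "?"]
answer = ["red"]

prompt_none, input_none = build_decoder_input(question, answer, "none")
# "prepend" is a stand-in for any non-none setting, not a real option value.
prompt_q, input_q = build_decoder_input(question, answer, "prepend")
```

In this sketch, the prompt returned for a non-none setting plays the role that decoder_prompts and prefix_tokens play in the description above: it is the part of the decoder sequence that is fixed before the answer is scored or generated.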

Regarding the other question: in the original VQAv2 dataset, most questions are annotated with more than one ground-truth answer. However, OFA is a seq2seq model, which requires each source sequence (image and question) to be paired with exactly one target sequence (a ground-truth answer) during training. We therefore split each original sample, in which one question is paired with several answers, into multiple seq2seq samples, each consisting of the question paired with one of the ground-truth answers.
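The splitting step can be sketched as follows. This is only an illustration under assumed field names (image_id, question, answers, answer are hypothetical and do not reflect the actual OFA data format), and it ignores any per-answer weighting the real preprocessing may apply.

```python
from typing import Any, Dict, List


def split_multi_answer_sample(sample: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Expand one VQA sample with several answers into one sample per answer.

    Hypothetical helper: every output sample keeps the same image and
    question (the source sequence) and carries exactly one ground-truth
    answer (the target sequence).
    """
    return [
        {
            "image_id": sample["image_id"],
            "question": sample["question"],
            "answer": ans,
        }
        for ans in sample["answers"]
    ]


original = {
    "image_id": "img_0001",  # made-up identifier for the example
    "question": "what color is the car?",
    "answers": ["red", "dark red", "maroon"],
}

seq2seq_samples = split_multi_answer_sample(original)
```

Each entry of seq2seq_samples can then be treated as an ordinary one-source, one-target training pair.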
