
AttnGrounder: Talking to Cars with Attention

by Vivek Mittal.

Accepted at the ECCV'20 C4AV Workshop. The Talk2Car dataset used in this paper is available at https://talk2car.github.io/.

Model Overview

[Figure: complete model architecture]

Abstract:

We propose Attention Grounder (AttnGrounder), a single-stage, end-to-end trainable model for the task of visual grounding. Visual grounding aims to localize a specific object in an image based on a given natural language text query. Unlike previous methods that use the same text representation for every image region, we use a visual-text attention module that relates each word in the given query with every region in the corresponding image to construct a region-dependent text representation. Furthermore, to improve the localization ability of our model, we use our visual-text attention module to generate an attention mask around the referred object. The attention mask is trained as an auxiliary task using a rectangular mask generated from the provided ground-truth coordinates. We evaluate AttnGrounder on the Talk2Car dataset and show an improvement of 3.26% over existing methods.
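The word-region attention described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation; the function name, tensor shapes, and the way the per-region mask is derived are all assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def visual_text_attention(region_feats, word_feats):
    """Relate each query word to every image region (sketch).

    region_feats: (R, D) visual features for R image regions
    word_feats:   (W, D) embeddings for W query words
    Returns a region-dependent text representation (R, D) and a
    coarse per-region attention score (R,).
    """
    # Word-region affinity: each region attends over all words.
    scores = region_feats @ word_feats.T          # (R, W)
    attn = softmax(scores, axis=-1)
    # Region-dependent text representation: per-region weighted
    # sum of word features, so each region sees its own text encoding.
    text_per_region = attn @ word_feats           # (R, D)
    # A coarse per-region score (max word affinity, squashed to (0, 1)),
    # which could be supervised against a rectangular ground-truth mask.
    mask = 1.0 / (1.0 + np.exp(-scores.max(axis=-1)))  # (R,)
    return text_per_region, mask
```

The key point is that `text_per_region` differs per region, unlike methods that pair every region with one shared sentence embedding.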

Attention Map in Action

[Figure: attention maps generated for example queries]
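The rectangular target used to supervise the attention mask can be built directly from the ground-truth box coordinates. A minimal sketch; the function name and `(x1, y1, x2, y2)` box format are assumptions:

```python
import numpy as np

def rectangular_mask(height, width, box):
    """Binary mask that is 1 inside the ground-truth box, 0 elsewhere.

    box: (x1, y1, x2, y2) pixel coordinates, exclusive on the far edge.
    """
    x1, y1, x2, y2 = box
    mask = np.zeros((height, width), dtype=np.float32)
    mask[y1:y2, x1:x2] = 1.0
    return mask
```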

Usage

Preprocessed Talk2Car data is available at this link; extract it under the ln_data folder. Download the images following the instructions given at this link, and extract all of them into the ln_data/images folder. All hyperparameters are preset, so just run the following command from the working directory (if you run into memory issues, try decreasing the batch size):

python train_yolo.py --batch_size 14

Credits

Parts of the code and models are adapted from DMS, MAttNet, Yolov3, Pytorch-yolov3, and One Stage Grounding.


Known Issues

corpus file

Loading the preprocessed corpus file fails with the following error:

Traceback (most recent call last):
  File "", line 1, in
  File "/home/nour/anaconda3/envs/dataset2/lib/python3.9/site-packages/torch/serialization.py", line 607, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/home/nour/anaconda3/envs/dataset2/lib/python3.9/site-packages/torch/serialization.py", line 882, in _load
    result = unpickler.load()
  File "/home/nour/anaconda3/envs/dataset2/lib/python3.9/site-packages/torch/serialization.py", line 875, in find_class
    return super().find_class(mod_name, name)
ModuleNotFoundError: No module named 'spacy.lemmatizer'
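This error occurs because spaCy 3.x removed the `spacy.lemmatizer` module, while the corpus file was pickled under spaCy 2.x and references that module by name. Installing a 2.x release (`pip install "spacy<3.0"`) restores the module; alternatively, the sketch below registers a stub module before loading, assuming the loaded data does not actually need a working lemmatizer:

```python
import sys
import types

# spaCy 3.x removed the spacy.lemmatizer module. Pickled objects that
# reference spacy.lemmatizer.Lemmatizer by name fail to unpickle under
# spaCy >= 3. Registering a stub module with a placeholder class lets
# the unpickler resolve the reference.
stub = types.ModuleType("spacy.lemmatizer")

class Lemmatizer:
    """Placeholder standing in for the spaCy 2.x Lemmatizer class."""
    def __init__(self, *args, **kwargs):
        pass

stub.Lemmatizer = Lemmatizer
sys.modules["spacy.lemmatizer"] = stub

# With the stub registered, torch.load (which unpickles internally)
# can resolve the module reference, e.g.:
#   import torch
#   corpus = torch.load("corpus.pt")  # hypothetical path
```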
