
google / localized-narratives

Stars: 76 · Watchers: 10 · Forks: 14 · Size: 9.63 MB

Localized Narratives

Home Page: https://google.github.io/localized-narratives/

License: Apache License 2.0

Languages: HTML 81.29%, Python 18.71%
Topics: computer-vision, image-captioning, speech-analysis

localized-narratives's Introduction

Localized Narratives

Visit the project page for all the information about Localized Narratives, data downloads, visualizations, and much more.

localized-narratives's People

Contributors

jponttuset


localized-narratives's Issues

Controlled Captioning Baseline

Hi, thank you for your impressive work!
I'm currently building my captioning model on this dataset, and I'm wondering whether you could provide the code for your Controlled Image Captioning baseline. I believe that would enable fairer comparisons on this topic and boost the influence of this dataset.

Dataset Download

Hi @jponttuset

I am unable to download the raw voice recordings in the dataset. The path mentioned in the documentation does not exist.

https://storage.googleapis.com/localized-narratives
When accessing the above URL, I get the following error:

<Error>
<Code>NoSuchKey</Code>
<Message>The specified key does not exist.</Message>
<Details>No such object: localized-narratives/voice-recordings/</Details>
</Error>
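A note on this error: a plain storage.googleapis.com URL addresses a single object, so a directory-like path such as voice-recordings/ yields NoSuchKey even when objects exist under that prefix. Below is a minimal sketch of listing the bucket through the public GCS JSON API instead; it assumes the bucket permits anonymous listing and that the voice-recordings/ prefix is correct, neither of which is confirmed here.

```python
import requests

# GCS JSON API endpoint for listing objects in the (assumed public) bucket.
LIST_URL = "https://storage.googleapis.com/storage/v1/b/localized-narratives/o"

def list_objects(prefix: str, max_results: int = 10) -> list[str]:
    """Return object names under a prefix, via anonymous listing."""
    params = {"prefix": prefix, "maxResults": max_results}
    resp = requests.get(LIST_URL, params=params, timeout=30)
    resp.raise_for_status()
    return [item["name"] for item in resp.json().get("items", [])]

if __name__ == "__main__":
    # Hypothetical prefix; adjust to whatever the documentation specifies.
    for name in list_objects("voice-recordings/"):
        print(f"https://storage.googleapis.com/localized-narratives/{name}")
```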

trace data

Hi, could you explain why the 'traces' field is organized as List[List[TimePoint]] instead of List[TimePoint]?
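For context, the paper describes each inner list as one continuous mouse stroke, with a new segment starting whenever the annotator lifts or pauses the pointer. When only a single point stream is needed, the segments can be flattened; here is a minimal sketch, assuming the per-annotation dict format shown in the other issues (points as dicts with 'x', 'y', 't'):

```python
from typing import Dict, List

def flatten_traces(annotation: Dict) -> List[Dict]:
    """Merge per-segment traces (List[List[TimePoint]]) into one
    time-ordered stream of {'x', 'y', 't'} points."""
    points = [point for segment in annotation["traces"] for point in segment]
    return sorted(points, key=lambda point: point["t"])
```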

Error in COCO data

Hi,
While running the demo.py file, I found a data error in the COCO validation set.
In coco_val_localized_narratives.jsonl, line 4414, the entry lacks trace data.

`{
  "dataset_id": "mscoco_val2017",
  "image_id": "381639",
  "annotator_id": 89,
  "caption": "In the image we can see girl standing and holding a doll in her hand. These are the road cone. There are even other people who are getting into airplane, there is a building. This is a tree and sky.",
  "timed_caption": [
    {"utterance": "In the", "start_time": 0, "end_time": 0},
    {"utterance": "image", "start_time": 0, "end_time": 4.4},
    {"utterance": "we", "start_time": 4.4, "end_time": 5},
    {"utterance": "can", "start_time": 5, "end_time": 5.3},
    {"utterance": "see", "start_time": 5.3, "end_time": 5.5},
    {"utterance": "girl", "start_time": 5.5, "end_time": 6.3},
    {"utterance": "standing", "start_time": 6.3, "end_time": 6.9},
    {"utterance": "and", "start_time": 6.9, "end_time": 7.6},
    {"utterance": "holding", "start_time": 7.6, "end_time": 8.1},
    {"utterance": "a", "start_time": 8.1, "end_time": 8.2},
    {"utterance": "doll", "start_time": 8.2, "end_time": 8.5},
    {"utterance": "in", "start_time": 8.5, "end_time": 9.1},
    {"utterance": "her", "start_time": 9.1, "end_time": 9.3},
    {"utterance": "hand.", "start_time": 9.3, "end_time": 9.5},
    {"utterance": "These", "start_time": 9.5, "end_time": 10.5},
    {"utterance": "are", "start_time": 10.5, "end_time": 10.7},
    {"utterance": "the", "start_time": 10.7, "end_time": 10.9},
    {"utterance": "road", "start_time": 10.9, "end_time": 11.2},
    {"utterance": "cone.", "start_time": 11.2, "end_time": 11.6},
    {"utterance": "There", "start_time": 11.6, "end_time": 12.2},
    {"utterance": "are", "start_time": 12.2, "end_time": 12.3},
    {"utterance": "even", "start_time": 12.3, "end_time": 12.8},
    {"utterance": "other", "start_time": 12.8, "end_time": 13},
    {"utterance": "people", "start_time": 13, "end_time": 13.5},
    {"utterance": "who", "start_time": 13.5, "end_time": 13.5},
    {"utterance": "are", "start_time": 13.5, "end_time": 14},
    {"utterance": "getting", "start_time": 14, "end_time": 14.5},
    {"utterance": "into", "start_time": 14.5, "end_time": 15.2},
    {"utterance": "airplane,", "start_time": 15.2, "end_time": 15.8},
    {"utterance": "there", "start_time": 15.8, "end_time": 16.7},
    {"utterance": "is", "start_time": 16.7, "end_time": 16.9},
    {"utterance": "a", "start_time": 16.9, "end_time": 17.1},
    {"utterance": "building.", "start_time": 17.1, "end_time": 17.6},
    {"utterance": "This", "start_time": 17.6, "end_time": 18.5},
    {"utterance": "is", "start_time": 18.5, "end_time": 18.8},
    {"utterance": "a", "start_time": 18.8, "end_time": 18.9},
    {"utterance": "tree", "start_time": 18.9, "end_time": 19.4},
    {"utterance": "and", "start_time": 19.4, "end_time": 19.9},
    {"utterance": "sky.", "start_time": 19.9, "end_time": 20.2}
  ],
  "traces": [],
  "voice_recording": "coco_val/coco_val_381639_89.ogg"
}`
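For reference, a minimal sketch for locating every entry with empty traces in the file (the filename is the one mentioned above):

```python
import json

def find_empty_traces(jsonl_path: str):
    """Yield (line_number, image_id) for entries whose traces are empty."""
    with open(jsonl_path) as f:
        for line_number, line in enumerate(f, start=1):
            annotation = json.loads(line)
            if not annotation.get("traces"):
                yield line_number, annotation["image_id"]

for line_number, image_id in find_empty_traces("coco_val_localized_narratives.jsonl"):
    print(f"line {line_number}: image {image_id} has no trace data")
```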

Data download link broken

https://google.github.io/localized-narratives/

The download links for all datasets are broken.

<Error>
<Code>AccessDenied</Code>
<Message>Access denied.</Message>
<Details>Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object.</Details>
</Error>

Question about controlled caption evaluation

Hi, I have a question about the evaluation process.
Since an image in the validation set may have multiple traces according to your annotations, which trace is taken as input? If we evaluate each trace as an individual item, we get multiple hypotheses for a single image ID, which is inconsistent with the default setting of the MS COCO caption evaluation tool.
A more detailed explanation of how to reproduce the reported results would be a great help.
Many thanks for considering my request.
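One possible workaround, sketched below, is to key the evaluation on (image_id, annotator_id) pairs so that each trace contributes exactly one hypothesis per key, which matches the one-hypothesis-per-key setup the COCO tool expects. This is not the authors' confirmed protocol, just an illustration using the pycocoevalcap CIDEr scorer:

```python
from pycocoevalcap.cider.cider import Cider

def evaluate_per_trace(references, predictions):
    """references: {image_id: [reference captions]}
    predictions: [(image_id, annotator_id, hypothesis), ...]
    Keys the evaluation on (image_id, annotator_id) so each trace
    yields exactly one hypothesis per key."""
    gts, res = {}, {}
    for image_id, annotator_id, hypothesis in predictions:
        key = f"{image_id}_{annotator_id}"
        gts[key] = references[image_id]
        res[key] = [hypothesis]
    score, _ = Cider().compute_score(gts, res)
    return score
```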

Annotation Code available

Hi,

is the annotation codebase available? I would like to collect similar data for my own work.

Thanks

Every word should have an individual time span.

I quote from the paper: "Note that µ assigns each a_i to exactly one m_j, but m_j can match to zero or multiple words in a", and refer you to the formal definition of how the start_time and end_time are derived for a word m_j. It stands to reason that every individual word should have its own time span. However, in the sample data

{ dataset_id: 'mscoco_val2017', image_id: '137576', annotator_id: 93, caption: 'In this image there are group of cows standing and eating th...', timed_caption: [{'utterance': 'In this', 'start_time': 0.0, 'end_time': 0.4}, ...], traces: [[{'x': 0.2086, 'y': -0.0533, 't': 0.022}, ...], ...], voice_recording: 'coco_val/coco_val_137576_93.ogg' }

the two words 'In this' share the same time window, which is contrary to how the paper describes start_time and end_time being assigned, namely to individual words.

Yet it is correct in other parts of the dataset, such as the following:

{ dataset_id: ADE20k, image_id: ADE_val_00000175, annotator_id: 125, caption: In this image on the left side I can see a bed and a window...., timed_caption: [{'utterance': 'In', 'start_time': 0.0, 'end_time': 0.0}, {'utterance': 'this', 'start_time': 0.0, 'end_time': 0.8}, ...], traces: [[{'x': 0.6408, 'y': 0.1371, 't': 0.013}, ...], ...], voice_recording: ade20k_validation/ade20k_validation_ADE_val_00000175_125.ogg }

where 'In' and 'this' each have their own start_time and end_time.

I would appreciate it if you could help shed light on this disparity. Apologies if this is a mistake on my part, and thank you for reading my message.

Yours Sincerely,
Gordon
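For anyone wanting to quantify the disparity described above, here is a minimal sketch that flags multi-word utterances in the timed captions (the filename is assumed from the earlier issues):

```python
import json

def find_multiword_utterances(jsonl_path: str):
    """Yield (image_id, utterance) where a single timed_caption entry
    covers more than one word, i.e. one time span shared across words."""
    with open(jsonl_path) as f:
        for line in f:
            annotation = json.loads(line)
            for entry in annotation["timed_caption"]:
                if len(entry["utterance"].split()) > 1:
                    yield annotation["image_id"], entry["utterance"]

for image_id, utterance in find_multiword_utterances("coco_val_localized_narratives.jsonl"):
    print(f"image {image_id}: multi-word utterance {utterance!r}")
```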
