husthuaan / aat

Code for paper "Adaptively Aligned Image Captioning via Adaptive Attention Time". NeurIPS 2019

Home Page: https://arxiv.org/abs/1909.09060

License: MIT License

image-captioning seq2seq attention-mechanism neurips-2019 neurips

aat's Introduction

Adaptively Aligned Image Captioning via Adaptive Attention Time

This repository includes the implementation for Adaptively Aligned Image Captioning via Adaptive Attention Time.

Requirements

Training AAT

Prepare data (with Python 2)

See details in data/README.md.

(Note: set word_count_threshold in scripts/prepro_labels.py to 4 to generate a vocabulary of size 10,369.)
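The thresholding step works roughly like this — a minimal, illustrative sketch of the idea, not the actual logic in scripts/prepro_labels.py (function and variable names here are hypothetical):

```python
from collections import Counter

def build_vocab(captions, word_count_threshold=4):
    """Keep words occurring at least `word_count_threshold` times;
    all rarer words are collapsed into a single UNK token.

    `captions` is a list of tokenized captions (lists of words).
    """
    counts = Counter(w for cap in captions for w in cap)
    vocab = [w for w, n in counts.items() if n >= word_count_threshold]
    vocab.append('UNK')  # rare words map to UNK at encoding time
    return vocab

caps = [['a', 'dog', 'runs'], ['a', 'dog', 'sits'], ['a', 'cat']]
vocab = build_vocab(caps, word_count_threshold=2)
# 'a' and 'dog' clear the threshold; 'runs', 'sits', 'cat' become UNK
```

Raising the threshold shrinks the vocabulary and, with it, the output softmax of the captioning model.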

You should also preprocess the dataset to build the cached n-gram statistics used for computing the CIDEr score during SCST (self-critical sequence training):

$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train
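Conceptually, this cache stores n-gram document frequencies over the training captions, which CIDEr uses for its TF-IDF weighting. A rough sketch of that counting step (illustrative only, not the repo's actual code):

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def document_frequencies(captions, max_n=4):
    """Count, for each n-gram up to length `max_n`, how many
    captions contain it at least once (its document frequency)."""
    df = Counter()
    for cap in captions:
        seen = set()
        for n in range(1, max_n + 1):
            seen.update(ngrams(cap, n))
        df.update(seen)  # each caption contributes at most 1 per n-gram
    return df

caps = [['a', 'dog', 'runs'], ['a', 'dog', 'sits']]
df = document_frequencies(caps)
# ('a', 'dog') occurs in both captions, ('dog', 'runs') in one
```

CIDEr then down-weights n-grams with high document frequency, so generic phrases contribute less to the reward than informative ones.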

Training

$ sh train-aat.sh

See opts.py for the options.

Evaluation

$ CUDA_VISIBLE_DEVICES=0 python eval.py --model log/log_aat_rl/model.pth --infos_path log/log_aat_rl/infos_aat.pkl  --dump_images 0 --dump_json 1 --num_images -1 --language_eval 1 --beam_size 2 --batch_size 100 --split test

Reference

If you find this repo helpful, please consider citing:

@inproceedings{huang2019adaptively,
  title     = {Adaptively Aligned Image Captioning via Adaptive Attention Time},
  author    = {Huang, Lun and Wang, Wenmin and Xia, Yaxian and Chen, Jie},
  booktitle = {Advances in Neural Information Processing Systems 32},
  year      = {2019}
}

Acknowledgements

This repository is based on Ruotian Luo's self-critical.pytorch.

aat's People

Contributors

a95279527, husthuaan

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

aat's Issues

Regarding training time

Hi,

Thanks for sharing your code here.

Could you please tell us what type of GPUs you trained your model on, how long one epoch took, and how many epochs you trained for?

Regards
Deepak Mittal

Computing the SPICE score crashes with "Failed to write core dump. Core dumps have been disabled. To enable core dumping, try \"ulimit -c unlimited\" before starting"

Hello!
I am running the AoA model code inside AAT with the following command:
$ python3.6 train.py --id aoa \
    --batch_size 10 \
    --beam_size 1 \
    --max_epochs 25 \
    --caption_model aoa \
    --refine 1 \
    --refine_aoa 1 \
    --use_ff 0 \
    --decoder_type AoA \
    --use_multi_head 2 \
    --num_heads 8 \
    --multi_head_scale 1 \
    --mean_feats 1 \
    --ctx_drop 1 \
    --dropout_aoa 0.3 \
    --label_smoothing 0.2 \
    --input_json data/cocotalk.json \
    --input_label_h5 data/cocotalk_label.h5 \
    --input_fc_dir data/cocobu_fc \
    --input_att_dir data/cocobu_att \
    --input_box_dir data/cocobu_box \
    --seq_per_img 5 \
    --learning_rate 2e-4 \
    --num_layers 2 \
    --input_encoding_size 1024 \
    --rnn_size 1024 \
    --learning_rate_decay_start 0 \
    --scheduled_sampling_start 0 \
    --checkpoint_path log_aoa/log_aoa \
    --save_checkpoint_every 6000 \
    --language_eval 1 \
    --val_images_use -1 \
    --scheduled_sampling_increase_every 5 \
    --scheduled_sampling_max_prob 0.5 \
    --learning_rate_decay_every 3

During epoch 0, while computing SPICE, the code fails with the following error:

`computing SPICE score...
Parsing reference captions
Initiating Stanford parsing pipeline
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ...
done [0.3 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.0 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.4 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.5 sec].
Threads( StanfordCoreNLP ) #

Threads( StanfordCoreNLP ) #
A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x00007f0aa67f4e10, pid=12537, tid=0x00007f0a7d4b4700

JRE version: OpenJDK Runtime Environment (8.0_265-b01) (build 1.8.0_265-8u265-b01-0ubuntu2~16.04-b01)
Java VM: OpenJDK 64-Bit Server VM (25.265-b01 mixed mode linux-amd64 compressed oops)
Problematic frame:
V [libjvm.so+0x408e10]

Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

An error report file with more information is saved as:
/home/muli/myExpe--caption/AAT/coco-caption/pycocoevalcap/spice/hs_err_pid12537.log

[error occurred during error reporting , id 0xb]

If you would like to submit a bug report, please visit:
http://bugreport.java.com/bugreport/crash.jsp

Traceback (most recent call last):
  File "train.py", line 300, in <module>
    train(opt)
  File "train.py", line 244, in train
    val_loss, predictions, lang_stats = eval_utils.eval_split(dp_model, lw_model.crit, loader, eval_kwargs)
  File "/home/muli/myExpe--caption/AAT/eval_utils.py", line 173, in eval_split
    lang_stats = language_eval(dataset, predictions, eval_kwargs['id'], split)
  File "/home/muli/myExpe--caption/AAT/eval_utils.py", line 55, in language_eval
    cocoEval.evaluate()
  File "coco-caption/pycocoevalcap/eval.py", line 61, in evaluate
    score, scores = scorer.compute_score(gts, res)
  File "coco-caption/pycocoevalcap/spice/spice.py", line 79, in compute_score
    cwd=os.path.dirname(os.path.abspath(__file__)))
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['java', '-jar', '-Xmx8G', 'spice-1.0.jar', '/home/muli/myExpe--caption/AAT/coco-caption/pycocoevalcap/spice/tmp/tmpyfgzkyc4', '-cache', '/home/muli/myExpe--caption/AAT/coco-caption/pycocoevalcap/spice/cache/1601606281.2002816', '-out', '/home/muli/myExpe--caption/AAT/coco-caption/pycocoevalcap/spice/tmp/tmp_de8maf_', '-subset', '-silent']' died with <Signals.SIGABRT: 6>.
Terminating BlobFetcher`

How can I fix this?
When I run the AAT model itself, there is no error.

Confusing code

What do `(1+(1-p))` and `selector` mean in `self.att_cost += (1+(1-p)).squeeze(1)` and `selector = (p < 1 - self.epsilon).data`?
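For context, AAT's halting mechanism follows the Adaptive Computation Time idea: `p` is a per-step confidence, `selector` marks samples whose confidence is still below the halting threshold `1 - epsilon` (i.e. that should keep attending), and `(1 + (1 - p))` charges a cost for each attention step that grows when confidence is low, encouraging the model to stop early. A framework-free sketch of that loop for a single sample (illustrative, not the repo's code):

```python
def adaptive_steps(confidences, epsilon=0.01, max_steps=4):
    """Run attention steps until the halting confidence p reaches
    1 - epsilon, or `max_steps` is hit.

    `confidences` is a sequence of halting probabilities p in [0, 1],
    one per attention step. Returns (steps_taken, attention_cost):
    each step adds 1 + (1 - p) to the cost, so low-confidence steps
    are penalized more, discouraging long "pondering".
    """
    cost = 0.0
    steps = 0
    for p in confidences:
        steps += 1
        cost += 1 + (1 - p)           # analogue of self.att_cost += (1+(1-p))
        selector = p < 1 - epsilon    # analogue of selector: still running?
        if not selector or steps == max_steps:
            break
    return steps, cost

steps, cost = adaptive_steps([0.3, 0.7, 0.995], epsilon=0.01)
# halts at step 3, once p >= 1 - epsilon
```

In the actual model this runs batched over tensors, which is why `selector` is a boolean mask rather than a single flag, and why the cost term carries a `.squeeze(1)`.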
