ruotianluo / imagecaptioning.pytorch

I decided to sync this repo with self-critical.pytorch. (The old master is kept in the old master branch for archival purposes.)

License: MIT License

Python 94.77% HTML 0.57% Shell 4.66%

imagecaptioning.pytorch's Issues

Teacher Forcing

Has teacher forcing been used to train this model? And if so, to what degree? I am training my own Show, Attend and Tell model, and when I apply teacher forcing at every step it overfits terribly. What do you think I should do?
Thanks so much!

TensorBoard Problem

Dear @ruotianluo,
When I run train.py on the MS-COCO dataset, I get the following error:
Traceback (most recent call last):
  File "train.py", line 204, in <module>
    train(opt)
  File "train.py", line 152, in train
    for k,v in lang_stats.items():
AttributeError: 'NoneType' object has no attribute 'items'
Terminating BlobFetcher
The error occurs while the evaluation step is running (i.e., during the save_checkpoint step).
When I comment out the "# Write validation result into summary" part of train.py, everything runs correctly, except that I can't see any charts in TensorBoard.

some code missing?

python scripts/prepro_labels.py --input_json .../dataset_coco.json --output_json data/cocotalk.json --output_h5 data/cocotalk failed. Here are the errors:

Traceback (most recent call last):
  File "scripts/prepro_labels.py", line 192, in <module>
    main(params)
  File "scripts/prepro_labels.py", line 138, in main
    imgs = imgs['images']
TypeError: list indices must be integers, not str

It seems that some code is missing.
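
The TypeError above says the loaded JSON is a list, while the script indexes it like a dict with an 'images' key. A minimal sketch of a tolerant loader step follows, assuming the input file is either a dict wrapping the records or a bare list of records; the helper name extract_images is hypothetical and not part of the repository.

```python
def extract_images(data):
    """Return the list of image records from parsed JSON.

    Assumption: the file is either {"images": [...]} or a bare list of records.
    """
    if isinstance(data, dict):
        # Wrapped layout: the records live under the "images" key.
        return data["images"]
    if isinstance(data, list):
        # Bare layout: the parsed JSON already is the list of records.
        return data
    raise TypeError("unexpected JSON top-level type: %s" % type(data).__name__)
```

If the file turns out to be a bare list, it may also simply be a different file from the one the script expects, so checking the --input_json path is worth doing first.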

The paper states that fc is generated by Faster R-CNN, but I find you use ResNet instead.

Will it influence the result? Also, I cannot understand why att contains 14x14 in myResnet. What does it mean? Thanks

CIDEr decreases during the first 6000 iterations of self-critical training

Hi, I followed the instructions to train the model. Take the fc model for example: I trained it with cross-entropy loss for 25 epochs (336000 iterations), and the CIDEr on the validation set is 0.92. Then I further trained the model with SCST (self-critical sequence training), and its CIDEr on the validation set is 0.89 at iteration 342000. Why does CIDEr decrease during the first 6000 iterations of SCST?

Error in eval

I am encountering following error. Can somebody help me to resolve this issue?

python eval.py --model no_finetune/att2in/model-best.pth --infos_path no_finetune/att2in/infos_a2i-best.pkl --image_folder ../images/ --num_images 5
DataLoaderRaw loading images from folder: ../images/
0
listing all images in directory ../images/
DataLoaderRaw found 4 images
Traceback (most recent call last):
  File "eval.py", line 122, in <module>
    vars(opt))
  File "/home/prince/imageCaptioning/ImageCaptioning.pytorch/eval_utils.py", line 102, in eval_split
    seq, _ = model.sample(fc_feats, att_feats, eval_kwargs)
  File "/home/prince/imageCaptioning/ImageCaptioning.pytorch/models/Att2inModel.py", line 197, in sample
    return self.sample_beam(fc_feats, att_feats, opt)
  File "/home/prince/imageCaptioning/ImageCaptioning.pytorch/models/Att2inModel.py", line 186, in sample_beam
    self.done_beams[k] = self.beam_search(state, logprobs, tmp_fc_feats, tmp_att_feats, tmp_p_att_feats, opt=opt)
  File "/home/prince/imageCaptioning/ImageCaptioning.pytorch/models/CaptionModel.py", line 105, in beam_search
    state)
  File "/home/prince/imageCaptioning/ImageCaptioning.pytorch/models/CaptionModel.py", line 50, in beam_step
    candidate_logprob = beam_logprobs_sum[q] + local_logprob
RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #3 'other'

I have a GPU and I have confirmed that torch is using that GPU

Doubt in criterion

Hey, as far as I can see you are creating 1-indexed labels, and those same labels go into the criterion. Don't you need 0-indexed labels for PyTorch's tensor gather function?
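
To make the indexing question concrete, here is a plain-Python sketch of what a gather along the vocabulary dimension does. It assumes (this is an assumption, not something confirmed in this thread) that the model emits vocab_size + 1 scores per step, with column 0 reserved for the end/padding token, so that 1-indexed word labels are still valid column indices. The function name gather_rows is hypothetical.

```python
def gather_rows(logprobs, labels):
    """Pick logprobs[i][labels[i]] for every row i (a 1-D analogue of gather)."""
    return [row[label] for row, label in zip(logprobs, labels)]


# Assumed layout: vocab_size = 3 words, so each row has 4 columns (0..3).
# Column 0 is reserved; words use labels 1..3.
example_logprobs = [
    [-9.0, -0.1, -2.0, -3.0],
    [-9.0, -2.5, -0.2, -1.5],
]
example_labels = [1, 3]  # 1-indexed word ids
picked = gather_rows(example_logprobs, example_labels)
```

Under that assumption gather only needs every label to be a valid column index, so 1-indexed labels work as long as the output has one extra column.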

The performance on MS COCO Val5000

Hi, very good job. Can you give the final performance that this code can achieve on the MS COCO validation-5000 split compared to karpathy's neuraltalk2?

Is it possible to detect anomaly with neuraltalk?

Hi, @rym9005023 @gujiuxiang @ruotianluo

Is it possible to detect anomaly with neuraltalk?

I have converted my 1D signals to images.

I want to feed these images into the neuraltalk network for anomalous signal detection.

I will train neuraltalk with only the texts "Normal" and "Abnormal" as captions.

Is that possible?

Thanks in advance.

We can help each other! Looking for friends who are studying image captioning; my WeChat ID is lijingx-. Friends researching image captioning, come discuss the code together

As in the title. My English and Chinese are both OK. I think good discussion makes studying image captioning easier and better. My WeChat ID is lijingx-.

Evaluation: AttributeError: 'Namespace' object has no attribute 'caption_model'

When running eval.py on python2.7, I get this error:

  File "eval.py", line 99, in <module>
    model = models.setup(opt)
  File "/path/to/neuraltalk2.pytorch/models.py", line 16, in setup
    if opt.caption_model == 'show_tell':
AttributeError: 'Namespace' object has no attribute 'caption_model'

It looks like the "caption_model" argument is missing from the argument parser in eval.py, causing an error to be thrown when models.py attempts to access it.

I see that the model settings are in "opts.py". Are we somehow meant to import these?

performances of each model and self-critical

Dear @ruotianluo,
Thank you for your fantastic code. I have three problems:
1, I trained the top-down model on my own dataset. Would you please tell me the performance on the COCO dataset of each model in the models folder? From reading your documentation, I think the top-down model performs best.
2, I can't find where the image size is processed when using ResNet, so I resize the images to 512*512 as input.
3, What is self-critical training? I can't find the relevant parameters. I only find "Att2in model in self-critical" in Att2inModel.
Thank you again. I hope you don't mind so many questions and my poor English. I look forward to communicating with you!

Performance

Hi ruotian,
Have you tested the results on a standard benchmark? I am curious about it.
Thanks !

A question about the evaluation function

Sorry to bother you again, but I can't understand the code
in the class LanguageModelCriterion.
What does the mask do?
I don't understand the loss calculation process. Is it calculating the posterior probability?
Can you explain it or point me to a reference? Many thanks!
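
One common way a mask is used in a sequence loss can be sketched in plain Python. This is a minimal illustration, assuming the criterion averages the negative log-probability of each target word over the positions where the mask is 1, so that padded positions after the end of a caption do not contribute. The function name masked_nll and the toy numbers are hypothetical.

```python
def masked_nll(logprobs, targets, mask):
    """Average negative log-likelihood over unmasked positions.

    logprobs: [batch][time][vocab] log-probabilities
    targets:  [batch][time] target word indices
    mask:     [batch][time] 1 for real words, 0 for padding
    """
    total = 0.0
    count = 0.0
    for seq_lp, seq_t, seq_m in zip(logprobs, targets, mask):
        for step_lp, t, m in zip(seq_lp, seq_t, seq_m):
            # Padded steps are multiplied by 0 and so add nothing to the loss.
            total += -step_lp[t] * m
            count += m
    return total / count


# One caption with two time steps; the second step is padding.
toy_loss = masked_nll(
    logprobs=[[[-0.1, -2.0], [-3.0, -0.5]]],
    targets=[[0, 1]],
    mask=[[1, 0]],
)
```

With this reading, the loss is the average negative log-probability the model assigns to the ground-truth words, and the mask only decides which time steps count.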

The attention mechanism of "top-down" is different from that of "show attend and tell", but they seem the same in your code.

I met an error during training...

xw@xw:~/ImageCaptioning.pytorch-master$ python train.py --id st --caption_model show_tell --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path log_st --save_checkpoint_every 6000 --val_images_use 5000 --max_epochs 25
...
evaluating validation preformance... 4989/5000 (2.672655)
image 324313: a man is sitting on a bed with a laptop
image 46616: a man is riding a skateboard on a ramp
image 285832: a living room with a couch and a table
image 496718: a man is holding a cell phone while standing in a park
image 398209: a living room with a couch and a table
image 568041: a living room with a couch and a table
image 206596: a man is playing tennis on a tennis court
image 451949: a man is holding a skateboard in a park
image 203138: a man in a suit and tie is holding a cell phone
image 296759: a close up of a person holding a hot dog
evaluating validation preformance... -1/5000 (2.669259)
Traceback (most recent call last):
  File "train.py", line 204, in <module>
    train(opt)
  File "train.py", line 152, in train
    for k,v in lang_stats.items():
AttributeError: 'NoneType' object has no attribute 'items'
Terminating BlobFetcher

How to control the number of threads?

Hi, ruotian:

Thanks for your awesome code of reproducing the 'self-critical sequence training'.

I have a question about how to control the number of threads. When I run the code, all threads are used, and it occupies a lot of CPU resources.

Thank you!

Train on other dataset

@ruotianluo Thank you for your fantastic code. As you mentioned, anyone who wants to train your code on another dataset must create a file like dataset_coco.json. Would you please explain the format of the dataset_coco.json file?
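
For orientation, here is a sketch of what a single record in a Karpathy-style split file usually looks like. It assumes the commonly distributed layout of dataset_coco.json; every value below is invented for illustration, and which fields the preprocessing scripts actually read should be checked against scripts/prepro_labels.py.

```python
import json

# Hypothetical example record; all values are made up for illustration.
dataset = {
    "dataset": "coco",
    "images": [
        {
            "filepath": "val2014",             # sub-folder of the image
            "filename": "example_000001.jpg",  # image file name
            "split": "train",                  # typically train / val / test / restval
            "imgid": 0,
            "cocoid": 1,
            "sentids": [0],
            "sentences": [
                {
                    "imgid": 0,
                    "sentid": 0,
                    "raw": "A man is riding a skateboard.",
                    "tokens": ["a", "man", "is", "riding", "a", "skateboard"],
                }
            ],
        }
    ],
}

serialized = json.dumps(dataset)
```

A file for another dataset would follow the same shape, with one entry in "images" per image and one entry in "sentences" per caption.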

Features saved in many npy files are slow to read

On certain file systems (e.g. NFS) storing/reading thousands of files is very slow.

I can see that h5py is imported but never used. Was there any problem with HDF5? Maybe it makes sense to save the features to a single npz file?

about topdown model

Dear @ruotianluo ,
I was wondering if you could tell me whether your pretrained topdown model uses Faster R-CNN (bottom-up attention) or ResNet.

Benchmarks

Cross entropy loss (CIDEr score on validation set without beam search; 25 epochs):
fc 0.92
att2in 0.95
att2in2 0.99
topdown 1.01

(self critical training is in https://github.com/ruotianluo/self-critical.pytorch)
Self-critical training. (Self-critical after 25 epochs; Suggestion: don't start self-critical too late):
att2in 1.12
topdown 1.12

Test split (beam size 5):
cross entropy:
topdown: 1.07

self-critical:
topdown:
Bleu_1: 0.779 Bleu_2: 0.615 Bleu_3: 0.467 Bleu_4: 0.347 METEOR: 0.269 ROUGE_L: 0.561 CIDEr: 1.143
att2in2:
Bleu_1: 0.777 Bleu_2: 0.613 Bleu_3: 0.465 Bleu_4: 0.347 METEOR: 0.267 ROUGE_L: 0.560 CIDEr: 1.156

Pre trained vectors

Hi, I'm making my own version of an image captioning model, and I haven't gone through your code in detail yet. I was wondering whether you used pre-trained word vectors for this task or just a one-hot encoding representation?
And do you think the use of pre-trained word vectors makes a substantial impact on training time and accuracy?

UnboundLocalError: local variable 'resnet' referenced before assignment

python eval.py --model topdown/model-best.pth --infos_path topdown/infos_td-best.pkl --image_folder images --num_images 5

Traceback (most recent call last):
  File "eval.py", line 114, in <module>
    'cnn_model': opt.cnn_model})
  File "/home/demobin/work/github/neuraltalk2.pytorch/dataloaderraw.py", line 37, in __init__
    resnet = getattr(resnet, self.cnn_model)()
UnboundLocalError: local variable 'resnet' referenced before assignment

Cpu issue

I changed eval.py to run on the CPU but encountered some errors. Can you provide a CPU version of it? Thanks in advance.

About att_feature

Hi, I am new to image captioning. I want to know what's the att_feats here. Could anyone explain it? Because I fail to find it in original paper...

I cannot reach the score mentioned in "readme.md".

Hi, I use parameters as follows:(ShowAndTell)
CNN: resnet152
LSTM: 2 Layers
The other parameters are the same as you mentioned, but the CIDEr score is only 0.681.
And when I change resnet152 to InceptionV4, the CIDEr score is only 0.651.
Both of them are far from the 0.84 that you mentioned in the training section.
Can you give me some advice on this score? I have tried lots of different parameters, but the score is still low.

It slows down as training proceeds

When training, I notice that it slows down. It also affects other TensorFlow code (from 0.2 s per batch to 1.2 s per batch to 1.8 s per batch). But once I stop training, the other code's speed returns to 0.2 s per batch. What may cause this?

It generates the same sentence for different images, and I found that the input of the LSTM at the second time step is all zeros.

It generates the same sentence for every image during eval. I found that when sample_max == 1, the second input to the LSTM is all zeros. And if sample_max != 1, the result does not match the picture, although the sentence itself seems reasonable.

About the batch_size

If I set --batch_size=16 and seq_per_img=5, the actual batch size is 80, right?
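
The arithmetic behind the question can be written out as a small sketch. It assumes that each image's features are repeated once per caption, so that the number of (image, caption) pairs per step is the product of the two settings; the function name effective_batch_size is hypothetical.

```python
def effective_batch_size(batch_size, seq_per_img):
    """Number of (image, caption) pairs per step, assuming every image
    is paired with seq_per_img captions."""
    return batch_size * seq_per_img


pairs_per_step = effective_batch_size(batch_size=16, seq_per_img=5)
```

Under that assumption, 16 images with 5 captions each give 80 caption sequences per step.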

Multi-GPU Training Support

Does your nice code support Multi-GPU training?

If I want to train both the CNN and the LSTM, what should I do?

Potential Issue of using multi-gpu

Hey, thanks for this amazing repo.

I was reading through your code, and I think there might be a potential issue for using multiple GPU with torch.nn.DataParallel

Particularly, you break the procedure when sentences reach the end (by checking their sum)

# https://github.com/ruotianluo/neuraltalk2.pytorch/blob/master/models/AttModel.py#L90-L91
if i >= 1 and seq[:, i].data.sum() == 0:
    break

# I am using AttModel.py as an example, but it should be the same to other models

During the forward pass, the data in a mini-batch is divided and sent to the individual GPUs, so there could be a case where the output on one GPU is shorter than on the others. This will result in an error when collecting the outputs from all GPUs, since their dimensions mismatch.

Is there any particular reason why you break the process, instead of letting it run to the end (let the for loop finish)?
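
One way to avoid the length mismatch described above is to always run the loop to the maximum length and pad finished sequences with the end token. The following is a plain-Python sketch of that alternative, not the repository's implementation; step_fn, decode_fixed_length, and the use of 0 as the end token are assumptions made for illustration.

```python
def decode_fixed_length(step_fn, batch_size, max_len, end_token=0):
    """Run every decoding step up to max_len, padding finished sequences.

    step_fn(t) is a hypothetical callable returning one token per batch
    element for time step t.
    """
    seqs = [[] for _ in range(batch_size)]
    finished = [False] * batch_size
    for t in range(max_len):
        tokens = step_fn(t)
        for b in range(batch_size):
            # Once a sequence has ended, keep emitting the end token.
            tok = end_token if finished[b] else tokens[b]
            seqs[b].append(tok)
            if tok == end_token:
                finished[b] = True
    return seqs
```

Because every shard returns sequences of exactly max_len tokens, the outputs can be concatenated across devices regardless of where each caption ended; the cost is some wasted computation on already finished captions.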

AttributeError: 'Namespace' object has no attribute 'use_att'

python eval.py --dump_images 0 --num_images 5000 --model topdown/model-best.pth --infos_path topdown/infos_td-best.pkl --language_eval 1

Traceback (most recent call last):
  File "eval.py", line 109, in <module>
    loader = DataLoader(opt)
  File "/home/demobin/work/github/neuraltalk2.pytorch/dataloader.py", line 42, in __init__
    self.use_att = opt.use_att
AttributeError: 'Namespace' object has no attribute 'use_att'

One problem during the training process

Both preprocessing steps are OK, but when I am training the model, after epoch 0:

evaluating validation preformance... -1/5000 (2.649871)
Traceback (most recent call last):
  File "train.py", line 204, in <module>
    train(opt)
  File "train.py", line 152, in train
    for k,v in lang_stats.items():
AttributeError: 'NoneType' object has no attribute 'items'
Terminating BlobFetcher

It seems that lines 138-140 of eval_utils.py are where the problem is:
lang_stats = None
if lang_eval == 1:
    lang_stats = language_eval(dataset, predictions, eval_kwargs['id'], split)
I am new to pytorch
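
The snippet quoted above leaves lang_stats as None whenever lang_eval is not 1, which matches the AttributeError in the traceback. A minimal sketch of a guard on the logging side follows; the helper name collect_lang_scalars is hypothetical, and the real fix may instead be to enable language evaluation when launching training.

```python
def collect_lang_scalars(lang_stats):
    """Return (tag, value) pairs to log, or an empty list when language
    evaluation was skipped and lang_stats is None."""
    if lang_stats is None:
        return []
    return [(k, v) for k, v in lang_stats.items()]


# Example: nothing is logged when evaluation was skipped.
skipped = collect_lang_scalars(None)
logged = collect_lang_scalars({"CIDEr": 0.9})
```

Looping over the returned pairs instead of calling lang_stats.items() directly keeps training running when no language metrics are available.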

When beam_size > 1, it shows size mismatch error.

When beam_size > 1, it shows errors as follows:
Traceback (most recent call last):
  File "/home/mh/workspace/MyImageCaptioning/MyTrain.py", line 346, in <module>
    train()
  File "/home/mh/workspace/MyImageCaptioning/MyTrain.py", line 263, in train
    val_loss, predictions, lang_stats = eval_utils.eval_split(model, in_model, vb_model, jj_nn_model, crit, val_loader, cl_loss, epoch, eval_kwargs)
  File "/home/mh/workspace/MyImageCaptioning/eval_utils.py", line 148, in eval_split
    seq, _ = model.sample(_features, eval_kwargs)
  File "/home/mh/workspace/MyImageCaptioning/models/ShowTellModel.py", line 134, in sample
    return self.sample_beam(fc_feats, opt)
  File "/home/mh/workspace/MyImageCaptioning/models/ShowTellModel.py", line 113, in sample_beam
    xt = self.img_embed(fc).expand(beam_size, self.input_encoding_size)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/linear.py", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/functional.py", line 835, in linear
    return torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch at /pytorch/torch/lib/THC/generic/THCTensorMathBlas.cu:243

I checked the documentation for torch.expand() and found that the first dimension passed to expand must be the same as the tensor's own first dimension. I am not sure whether this is the main reason for the error.

A bug in 'fc' model when using GRU?

There is a bug in 'fc' model with GRU as rnn_type, the error hint is as follows. I know the bug is caused by line 35 of FCModel.py, but I don't know how to fix it. Any help will be appreciated.

neuraltalk2.pytorch/models/FCModel.py", line 35, in forward
    next_c = forget_gate * state[1][-1] + in_gate * in_transform
  File "/home//anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 76, in __getitem__
    return Index.apply(self, key)
  File "/home//anaconda2/lib/python2.7/site-packages/torch/autograd/_functions/tensor.py", line 16, in forward
    result = i.index(ctx.index)
IndexError: index 1 is out of range for dimension 0 (of size 1)

Python-3 support

Hi @ruotianluo ,
I looked at the coco-caption codebase.
It seems that we need some modifications like xrange -> range etc., to port the code to Python-3.

So, I'd like to know if you have any plans to port the pycoco tools to Python-3 so that we can use this current codebase for training models in Python-3?

questions about initializing the lstm hidden states

Here: https://github.com/ruotianluo/neuraltalk2.pytorch/blob/master/models/OldModel.py#L49
You seem to directly init the hidden states from fc_feats through a linear layer. So I want to ask: if I want to implement an attention model where the LSTM takes fc_feats as input at step 0 and the start token as input at step 1, like the figure below, how should I init the hidden states of the LSTM?

failed to generate and save fc and att features to .h5 files, for my own datasets

processing 0/279 (0.00% done)
Traceback (most recent call last):
  File "/home/jzheng/PycharmProjects/ImageCaptioning_Skyler/scripts/prepro_feats_sky.py", line 119, in <module>
    main(params)
  File "/home/jzheng/PycharmProjects/ImageCaptioning_Skyler/scripts/prepro_feats_sky.py", line 89, in main
    (2048,), dtype="float")
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/group.py", line 119, in create_dataset
    self[name] = dset
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/group.py", line 287, in __setitem__
    h5o.link(obj.id, self.id, name, lcpl=lcpl, lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 202, in h5py.h5o.link
RuntimeError: Unable to create link (name already exists)

Some confusion about adaptive attention model

First, thanks so much for contributing such great code.
However, I have some questions after reviewing the code of the adaptive attention model. According to the paper "Knowing When to Look", the LSTM only receives the word vector Xt and the previous hidden state Ht-1, not the image vector, but your code includes the image vector when building the LSTM.
Would you please explain it?

UnboundLocalError: local variable 'cnn_optimizer' referenced before assignment

Hi, ruotian. Thank you for the fantastic code. But when I try to finetune the CNN using your with_finetune branch, I get the error below:

Traceback (most recent call last):
  File "train.py", line 254, in <module>
    train(opt)
  File "train.py", line 151, in train
    cnn_optimizer.zero_grad()
UnboundLocalError: local variable 'cnn_optimizer' referenced before assignment

What do the abbreviations "fc" and "att" mean?

I can see that the abbreviations "fc" and "att" are used throughout the whole project. They seem to come from misc/resnet_utils.py, but no comment mentions their meaning.

Could someone tell me what the abbreviations "fc" and "att" mean and what the two variables do?

Thanks.

Some bugs found during use

eval.py
line 79: opt.input_fc_h5 = infos['opt'].input_fc_h5 needs to be changed to opt.input_fc_dir = infos['opt'].input_fc_dir
line 80: opt.input_att_h5 = infos['opt'].input_att_h5 needs to be changed to opt.input_att_dir = infos['opt'].input_att_dir

dataloaderraw.py
line 104: img = img.concatenate((img, img, img), axis=2) needs to be changed to img = np.concatenate((img, img, img), axis=2)

Generate soft attention pictures of each word

As the paper mentions, "As the model generates each word, its attention changes to reflect the relevant parts of the image." I'd like to generate the soft attention pictures for each word, but ran into some problems.
Can the eval.py script do this? If not, how can it be implemented?

Best regards.

Train on Flickr8k Dataset

I want to train on the Flickr8k dataset. An error occurs in 'prepro_feats.py' when I use 'dataset_flickr8k.json'. The error is about a missing 'cocoid'. How can I solve it? Do I need to generate my own '**.json'? Thanks~

where is infos_pkl

About using cpu on evaluating

Hi~
Thanks for sharing the codes. I have trained a gpu model using my own dataset. It really helps a lot.
However, now I need to evaluate the model on another machine that only has a CPU. Could you provide some code showing how to convert a GPU model to a CPU checkpoint, and how to evaluate using the CPU model? Thanks a lot!

dataloaderraw.py, line 25 - IOError: [Errno 2] No such file or directory: '/home-nfs/rluo/rluo/model/pytorch-resnet/resnet101.pth'

This access of an absolute directory throws an error unless on rluo's account.
Not difficult to manually fix for a user, but just thought I'd flag it.

ValueError: sampler should be an instance of torch.utils.data.Sampler.

The solution is found in another codebase.

import torch.utils.data

class SubsetSampler(torch.utils.data.sampler.Sampler):
    def __init__(self, indices):
        self.indices = indices

    def __iter__(self):
        return (self.indices[i] for i in range(len(self.indices)))

    def __len__(self):
        return len(self.indices)

and

sampler=SubsetSampler(self.dataloader.split_ix[self.split][self.dataloader.iterators[self.split]:])
