saahiluppal / catr Goto Github PK

View Code? Open in Web Editor NEW

252.0 4.0 54.0 3.07 MB

Image Captioning Using Transformer

License: Apache License 2.0

Python 63.72% Jupyter Notebook 36.28%

image-captioning transformer

catr's Introduction

Wanna grab these configuration files ?

📝 Latest Blog Posts

📈 GitHub Stats

catr's People

Contributors

Stargazers

Watchers

catr's Issues

What's the difference between v1, v2 and v3

Hi,

There are three pre-trained models are provided. Are they essentially the same models with same configurations?

Thanks!

Does this code requires features/box extraction?

Can I perform an end-to-end training with this code on my own dataset (that contains pairs of images and captions), or do I need to first extract features/bounding boxes for the images?

RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

Hi, I get this error while trying to run main.py

runfile('D:/COCO/imge_captioning_transform_github/1/catr-master/main.py', wdir='D:/COCO/imge_captioning_transform_github/1/catr-master')
Reloaded modules: datasets, datasets.utils, datasets.coco, configuration, engine
Initializing Device: cuda
Traceback (most recent call last):

File D:\COCO\imge_captioning_transform_github\1\catr-master\main.py:90 in
main(config)

File D:\COCO\imge_captioning_transform_github\1\catr-master\main.py:23 in main
model, criterion = caption.build_model(config)

File ~\Desktop\models\caption.py:51 in build_model

File ~\Desktop\models\backbone.py:112 in build_backbone

File ~\Desktop\models\backbone.py:85 in init

File ~\anaconda3\envs\my_envir_gpu\lib\site-packages\torchvision\models\resnet.py:342 in resnet101
return _resnet("resnet101", Bottleneck, [3, 4, 23, 3], pretrained, progress, **kwargs)

File ~\anaconda3\envs\my_envir_gpu\lib\site-packages\torchvision\models\resnet.py:296 in _resnet
state_dict = load_state_dict_from_url(model_urls[arch], progress=progress)

File ~\anaconda3\envs\my_envir_gpu\lib\site-packages\torch\hub.py:595 in load_state_dict_from_url
return torch.load(cached_file, map_location=map_location)

File ~\anaconda3\envs\my_envir_gpu\lib\site-packages\torch\serialization.py:705 in load
with _open_zipfile_reader(opened_file) as opened_zipfile:

File ~\anaconda3\envs\my_envir_gpu\lib\site-packages\torch\serialization.py:243 in init
super(_open_zipfile_reader, self).init(torch._C.PyTorchFileReader(name_or_buffer))

RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

Can you tell me where I should change in the code to fix this error?

Cannot implement data parallel training for multiple GPUs

loading model

sorry for bothering again,but when i try load the model as u suggested it gives this error
AttributeError: module 'torch' has no attribute 'load_state_dict'
and if i use model=model.load_state_dict insteade of torch.load_state dict it gives the following error
model=model.load_state_dict(torch.load('/content/drive/MyDrive/BanglalekhaDataset/model/checkpoint1.pth'))
NameError: name 'model' is not defined

Questions on using sample images

You are pulling images from .github/ such as .github/cake.png. Where is this located--in your repo? In any case, I was able to insert a reference in colab's file system i.e. /content/catr/test.jpg so all is OK.

I tried two pictures and obtained peculiar results. For the attached, I got "Baby eating a donut with a spoon". Is it trained on a particular corpus so I will know what sorts of images will work better at receiving a more accurate caption?

https://www.google.com/search?q=picture+of+kid+eating+ice+cream&client=firefox-b-1-d&sxsrf=ALeKk01x0N-Cp698EaTBLihZO1tmGKxsEQ:1624046293165&tbm=isch&source=iu&ictx=1&fir=8linmDlDQlkv2M%252CaFcoZXjfh6m-RM%252C_&vet=1&usg=AI4_-kTh7sbXkQdBOOZh72YivTDSQmnZzA&sa=X&ved=2ahUKEwjd-Zvz-6HxAhWLG80KHaxGBL0Q9QF6BAgQEAE&biw=1595&bih=1126#imgrc=c2CgWDwHJ4rJtM

No simple way to test custom trained model

It'll be good if we can provide model checkpoint file for prediction in predict.py

Can I know loss of your pretrained model?

I wonder training loss and validation loss of your pretrained model

Paper Link

Hello
Thanks for sharing the code, however i cant find the paper pdf anywhere
Can you attach it?
Thanks

Some questions about image augmentation

Hello, I'm a newbie of image captioning. I have some questions about the image augmentation in coco.py.

Flip and rotate will make the wrong position, such as "left & right" and "above & under", I think those augmentations are not good with image captioning. Could you please tell me why using flip and rotate?
Why the numerical ranges of brightness, contrast, and saturation in color jitter are [0.5, 1.3], [0.8, 1.5], and [0.2, 1.5]? Do you reference other works or papers? If yes, could you please tell me the reference?

Thank you.

Consideration of padding?

Hi! I studied your code, and got some questions.

It seems like pad token is making a loss, too.

catr/models/caption.py

Line 55 in 8a1f770

criterion = torch.nn.CrossEntropyLoss()

In my opinion, the code above need to be like:

criterion = torch.nn.CrossEntropyLoss(ignore_index=config.pad_token_id)

Otherwise, the model must predict [PAD] token, too.

Also, I wonder the reason why you used FrozenBatchNorm. Was batch size 32 not sufficient for stable learning?

Thank you!!

Beam search

I'm trying to implement beam search but I'm getting strange results. Was wondering if anybody has managed to implement beam search with this repo?

Specific words are preferred to be generated when I was training catr-model on my own dataset

prediction: [CLS] i was was was the the was was was i was i [SEP]
dataset: [CLS] you do he coming for you mother he alive not well i [SEP]

prediction: [CLS] i was was i was i of and the i the i [SEP]
dataset: [CLS] you do if you receive a letter from yourself with information only [SEP]

prediction: [CLS] i was was i i was i was was the the i [SEP]
dataset: [CLS] you do mean that she again you do even know what you [SEP]
...........

It was strange because when I used the pre-trained catr-model, he works fine. I modified my dataset format to fit in coco-dataset style, and made sure each data pairs fed successfully into training(I printed the input image and captions during training). I made a mini dataset(n<40) to make sure its convergence(at last loss=0.214xxxx, actually I thought loss should converge to 0.001 due to it's so tiny), and this phenomenon didn't disappear. What possibly happens to my procedure?

The requirements version

Thank you so much for your great work.I was trying to reproduce your work but it seems that I had a version problem.Could you please tell me the detailed version about your requirement, especially the transformers.I will be grateful .

Model Performance

Hi！Could I ask for you about model's performance ? Such as, BLEU, CIDEr and so on

Multiple images

Hi,

The prediction code is only one image each. How can i change this to predict the image in the folder at once?

thanks.

RuntimeError: CUDA error: device-side assert triggered

I got an error while trying to change the Bert Base pre-trained library. I have tried to run this model in another language.

the error is like -

` Initializing Device: cuda

Number of params: 83972666
Train: 18308
Valid: 1830
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
Start Training..
Epoch: 0
0% 0/1144 [00:00<?, ?it/s]/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py:2204: FutureWarning: The pad_to_max_length argument is deprecated and will be removed in a future version, use padding=True or padding='longest' to pad to the longest sequence in the batch, or use padding='max_length' to pad to a max length. In this case, you can give a specific length with max_length (e.g. max_length=45) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert).
FutureWarning,
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py:2204: FutureWarning: The pad_to_max_length argument is deprecated and will be removed in a future version, use padding=True or padding='longest' to pad to the longest sequence in the batch, or use padding='max_length' to pad to a max length. In this case, you can give a specific length with max_length (e.g. max_length=45) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert).
FutureWarning,
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py:2204: FutureWarning: The pad_to_max_length argument is deprecated and will be removed in a future version, use padding=True or padding='longest' to pad to the longest sequence in the batch, or use padding='max_length' to pad to a max length. In this case, you can give a specific length with max_length (e.g. max_length=45) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert).
FutureWarning,
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py:2204: FutureWarning: The pad_to_max_length argument is deprecated and will be removed in a future version, use padding=True or padding='longest' to pad to the longest sequence in the batch, or use padding='max_length' to pad to a max length. In this case, you can give a specific length with max_length (e.g. max_length=45) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert).
FutureWarning,
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py:2204: FutureWarning: The pad_to_max_length argument is deprecated and will be removed in a future version, use padding=True or padding='longest' to pad to the longest sequence in the batch, or use padding='max_length' to pad to a max length. In this case, you can give a specific length with max_length (e.g. max_length=45) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert).
FutureWarning,
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py:2204: FutureWarning: The pad_to_max_length argument is deprecated and will be removed in a future version, use padding=True or padding='longest' to pad to the longest sequence in the batch, or use padding='max_length' to pad to a max length. In this case, you can give a specific length with max_length (e.g. max_length=45) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert).
FutureWarning,
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py:2204: FutureWarning: The pad_to_max_length argument is deprecated and will be removed in a future version, use padding=True or padding='longest' to pad to the longest sequence in the batch, or use padding='max_length' to pad to a max length. In this case, you can give a specific length with max_length (e.g. max_length=45) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert).
FutureWarning,
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py:2204: FutureWarning: The pad_to_max_length argument is deprecated and will be removed in a future version, use padding=True or padding='longest' to pad to the longest sequence in the batch, or use padding='max_length' to pad to a max length. In this case, you can give a specific length with max_length (e.g. max_length=45) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert).
FutureWarning,
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [96,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [97,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [98,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [99,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [100,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [101,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [102,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [103,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [104,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [105,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [106,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [107,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [108,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [109,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [110,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [111,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [112,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [113,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [114,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [115,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [116,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [117,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [118,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [119,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [120,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [121,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [122,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [123,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [124,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [125,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [126,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [127,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [96,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [97,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [98,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [99,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [100,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [101,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [102,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [103,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [104,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [105,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [106,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [107,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [108,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [109,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [110,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [111,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [112,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [113,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [114,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [115,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [116,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [117,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [118,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [119,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [120,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [121,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [122,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [123,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [124,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [125,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [126,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [127,0,0] Assertion srcIndex < srcSelectDimSize failed.
0% 0/1144 [00:08<?, ?it/s]
Traceback (most recent call last):
File "main.py", line 98, in
main(config)
File "main.py", line 74, in main
model, criterion, data_loader_train, optimizer, device, epoch, config.clip_max_norm)
File "/content/gdrive/My Drive/image captioning research work/image captioning/engine.py", line 25, in train_one_epoch
outputs = model(samples, caps[:, :-1], cap_masks[:, :-1])
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/content/gdrive/My Drive/image captioning research work/image captioning/models/caption.py", line 29, in forward
pos[-1], target, target_mask)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/content/gdrive/My Drive/image captioning research work/image captioning/models/transformer.py", line 48, in forward
tgt = self.embeddings(tgt).permute(1, 0, 2)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/content/gdrive/My Drive/image captioning research work/image captioning/models/transformer.py", line 293, in forward
input_embeds = self.word_embeddings(x)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/sparse.py", line 160, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 2043, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered`
Can you tell me where I should change in the code to fix this error?

pretrained models issue

HI
I tried only the pre-trained models, V1 and V3 --> runnning predict.py , but prediction return only a list of "[unk] [unk] [unk] [unk] [unk] [unk] [unk] [unk] [unk] [unk]" on my images dataset;..
what can be the problem??

thanks for your clear code;

License violation

Hi,

It appears that this repository is heavily based off the DETR code, including files copied verbatim, but stripped from the copyright header.

The original code was released under the Apache license. In particular, according to section 4.c:

You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works

In order to be in compliance with the license, could you reinstate appropriate copyrights and pointers to the original work?
Thanks in advance.

Hugging Face Hub Integration

Hi there!

This project is super cool! Would you be interested in sharing the pretrained models in the Hugging Face Hub? The Hugging Face Hub offers free hosting of models (over 10,000 models have been uploaded by many research organizations) and it would make your work more accessible and visible to others. People would be able to try the model directly in the browser (we're implementing an image captioning widget at the moment). The only thing required would be to upload the models to the Hub. I'm happy to answer any questions about this.

Happy to hear your thoughts,
Omar

How to slice <eos> token with different sentence length

As I want the model to predict the end token by excluding it from the input into the model, I simply slice the token off the end of the sequence. Thus:

trg = [sos, x_1, x_2, x_3, eos]
trg[:-1] = [sos, x_1, x_2, x_3]

This is also same as your implementation.

But actually many datasets collect sentences with different length, ans thus the last elements of sentences are tokens, such as:

trg = [sos, x_1, x_2, x_3, eos, pad, pad, pad]
trg[:-1] = [sos, x_1, x_2, x_3, eos, pad, pad]

In such a case, I can’t slice the token, may I ask how can I solve this issue?

loading model

hi i was able to train a model running your code,but when i try to predict the caption using your predict.py file,it downloads your pretrained model,how to use the model i trained in the predict.py file,i am very new to this,i would really appreciate if you could help me out.Thanks in advance

From #2 , I know there is no paper, but I found similar paper with your work.

Does the figure below explain your work?

Thank you!

What the difference between v1, v2 and v3?

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.