ttengwang / pdvc

End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021)

License: MIT License

Python 78.08% Shell 5.83% C++ 1.46% Cuda 14.64%
dense-video-captioning activitynet-captions youcook2 video-paragraph-captioning

pdvc's Introduction

PDVC

Official implementation for End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021)

[paper] [VALSE paper digest (in Chinese)]

This repo supports:

  • two video captioning tasks: dense video captioning and video paragraph captioning
  • two datasets: ActivityNet Captions and YouCook2
  • three types of video features: C3D, TSN, and TSP
  • visualization of the generated captions on your own videos


Updates

  • (2021.11.19) added code for running PDVC on raw videos and visualizing the generated captions (supports Chinese and other non-English languages)
  • (2021.11.19) added pretrained models with TSP features. They achieve 9.03 METEOR (2021) and 6.05 SODA_c, very competitive results on ActivityNet Captions without self-critical sequence training.
  • (2021.08.29) added TSN pretrained models and YouCook2 support

Introduction

PDVC is a simple yet effective framework for end-to-end dense video captioning with parallel decoding. It formulates dense caption generation as a set prediction task. Without bells and whistles, extensive experiments on ActivityNet Captions and YouCook2 show that PDVC produces high-quality captioning results, surpassing state-of-the-art methods while its localization accuracy is on par with them.

(pdvc.jpg: framework overview)

Preparation

Environment: Linux, GCC>=5.4, CUDA >= 9.2, Python>=3.7, PyTorch>=1.5.1
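
A quick way to confirm the environment matches these requirements is to print the versions Python and PyTorch report (a minimal check, nothing repo-specific):

# Minimal environment check (assumes PyTorch is already installed).
import sys
import torch

print("Python :", sys.version.split()[0])   # expect >= 3.7
print("PyTorch:", torch.__version__)        # expect >= 1.5.1
print("CUDA   :", torch.version.cuda)       # expect >= 9.2
print("GPU    :", torch.cuda.is_available())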

  1. Clone the repo
git clone --recursive https://github.com/ttengwang/PDVC.git
  2. Create a virtual environment with conda
conda create -n PDVC python=3.7
source activate PDVC
conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch
conda install ffmpeg
pip install -r requirement.txt
  3. Compile the deformable attention layer (requires GCC >= 5.4).
cd pdvc/ops
sh make.sh
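
After make.sh finishes, a simple import is enough to confirm the extension is usable (a sanity-check sketch; the module name MultiScaleDeformableAttention is taken from the build logs quoted in the issues below):

# Sanity check for the compiled deformable attention op.
import torch
import MultiScaleDeformableAttention  # an ImportError here means the build failed

print("Deformable attention extension loaded; CUDA available:", torch.cuda.is_available())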

Running PDVC on Your Own Videos

Download a pretrained model (GoogleDrive) with TSP features and put it into ./save. Then run:

video_folder=visualization/videos
output_folder=visualization/output
pdvc_model_path=save/anet_tsp_pdvc/model-best.pth
output_language=en
bash test_and_visualize.sh $video_folder $output_folder $pdvc_model_path $output_language

Check the $output_folder; you will see a new video with embedded captions. Note that non-English captions are generated by translating the English captions with Google Translate. To produce Chinese captions, set output_language=zh-cn. For other languages, find the abbreviation of your language at this url, and you may also need to download a font that supports your language and put it into ./visualization.

(demo.gif: example video with embedded captions)
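
The raw captions are also written to a JSON file under the output folder: the visualization script reads visualization/output/generated_captions/dvc_results.json and its 'results' key (see the issue quoting visualization.py below). A minimal inspection sketch; the per-event keys 'timestamp' and 'sentence' are assumed field names:

# Print the generated dense captions (sketch).
import json

with open("visualization/output/generated_captions/dvc_results.json") as f:
    results = json.load(f)["results"]          # same key visualization.py reads

for video_id, events in results.items():
    print(video_id)
    for event in events:
        # 'timestamp' and 'sentence' are assumed per-event fields
        print("  ", event.get("timestamp"), event.get("sentence"))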

Training and Validation

Download Video Features

cd data/anet/features
bash download_anet_c3d.sh
# bash download_anet_tsn.sh
# bash download_i3d_vggish_features.sh
# bash download_tsp_features.sh

The preprocessed C3D features have been uploaded to a Baidu Yun drive.
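
Each video's features are stored as an individual .npy array with one row per temporal segment (the provided C3D features have 500 columns; see the C3D feature question in the issues below). A minimal sketch for inspecting one file; the path and file name here are illustrative placeholders, not the repo's exact layout:

# Peek at one downloaded feature file (replace the placeholder file name with an
# actual "v_<ActivityNet video id>.npy" produced by the download scripts).
import numpy as np

feat = np.load("data/anet/features/c3d/v_VIDEO_ID.npy")
print(feat.shape)   # (num_segments, feature_dim), e.g. feature_dim = 500 for C3D
print(feat.dtype)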

Dense Video Captioning

  1. PDVC with learnt proposals
# Training
config_path=cfgs/anet_c3d_pdvc.yml
python train.py --cfg_path ${config_path} --gpu_id ${GPU_ID}
# The script will evaluate the model for every epoch. The results and logs are saved in `./save`.

# Evaluation
eval_folder=anet_c3d_pdvc # specify the folder to be evaluated
python eval.py --eval_folder ${eval_folder} --eval_transformer_input_type queries --gpu_id ${GPU_ID}
  2. PDVC with ground-truth proposals
# Training
config_path=cfgs/anet_c3d_pdvc_gt.yml
python train.py --cfg_path ${config_path} --gpu_id ${GPU_ID}

# Evaluation
eval_folder=anet_c3d_pdvc_gt
python eval.py --eval_folder ${eval_folder} --eval_transformer_input_type gt_proposals --gpu_id ${GPU_ID}
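
Training writes the best checkpoint to model-best.pth inside the experiment folder under ./save (the folder name matches eval_folder). A quick way to inspect it (a sketch; the 'model' key is the one eval.py loads, as shown in an issue traceback below):

# Inspect a saved checkpoint (sketch).
import torch

ckpt = torch.load("save/anet_c3d_pdvc/model-best.pth", map_location="cpu")
print("checkpoint keys:", list(ckpt.keys()))
state_dict = ckpt["model"]   # eval.py calls model.load_state_dict(ckpt['model'])
print(len(state_dict), "parameter tensors in the model state dict")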

Video Paragraph Captioning

  1. PDVC with learnt proposals
# Training
config_path=cfgs/anet_c3d_pdvc.yml
python train.py --cfg_path ${config_path} --criteria_for_best_ckpt pc --gpu_id ${GPU_ID} 

# Evaluation
eval_folder=anet_c3d_pdvc # specify the folder to be evaluated
python eval.py --eval_folder ${eval_folder} --eval_transformer_input_type queries --gpu_id ${GPU_ID}
  2. PDVC with ground-truth proposals
# Training
config_path=cfgs/anet_c3d_pdvc_gt.yml
python train.py --cfg_path ${config_path} --criteria_for_best_ckpt pc --gpu_id ${GPU_ID}

# Evaluation
eval_folder=anet_c3d_pdvc_gt
python eval.py --eval_folder ${eval_folder} --eval_transformer_input_type gt_proposals --gpu_id ${GPU_ID}

Performance

Dense video captioning (with learnt proposals)

| Model | Features | config_path | Url | Recall | Precision | BLEU4 | METEOR (2018) | METEOR (2021) | CIDEr | SODA_c |
|---|---|---|---|---|---|---|---|---|---|---|
| PDVC_light | C3D | cfgs/anet_c3d_pdvcl.yml | Google Drive | 55.30 | 58.42 | 1.55 | 7.13 | 7.66 | 24.80 | 5.23 |
| PDVC | C3D | cfgs/anet_c3d_pdvc.yml | Google Drive | 55.20 | 57.36 | 1.82 | 7.48 | 8.09 | 28.16 | 5.47 |
| PDVC_light | TSN | cfgs/anet_tsn_pdvcl.yml | Google Drive | 55.34 | 57.97 | 1.66 | 7.41 | 7.97 | 27.23 | 5.51 |
| PDVC | TSN | cfgs/anet_tsn_pdvc.yml | Google Drive | 56.21 | 57.46 | 1.92 | 8.00 | 8.63 | 29.00 | 5.68 |
| PDVC_light | TSP | cfgs/anet_tsp_pdvcl.yml | Google Drive | 55.24 | 57.78 | 1.77 | 7.94 | 8.55 | 28.25 | 5.95 |
| PDVC | TSP | cfgs/anet_tsp_pdvc.yml | Google Drive | 55.79 | 57.39 | 2.17 | 8.37 | 9.03 | 31.14 | 6.05 |


Video paragraph captioning (with learnt proposals)

| Model | Features | config_path | BLEU4 | METEOR | CIDEr |
|---|---|---|---|---|---|
| PDVC | C3D | cfgs/anet_c3d_pdvc.yml | 9.67 | 14.74 | 16.43 |
| PDVC | TSN | cfgs/anet_tsn_pdvc.yml | 10.18 | 15.96 | 20.66 |
| PDVC | TSP | cfgs/anet_tsp_pdvc.yml | 10.46 | 16.42 | 20.91 |

Notes:

  • Paragraph-level scores are evaluated on the ActivityNet Entity ae-val set.

Citation

If you find this repo helpful, please consider citing:

@inproceedings{wang2021end,
  title={End-to-End Dense Video Captioning with Parallel Decoding},
  author={Wang, Teng and Zhang, Ruimao and Lu, Zhichao and Zheng, Feng and Cheng, Ran and Luo, Ping},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={6847--6857},
  year={2021}
}
@ARTICLE{wang2021echr,
  author={Wang, Teng and Zheng, Huicheng and Yu, Mingjing and Tian, Qian and Hu, Haifeng},
  journal={IEEE Transactions on Circuits and Systems for Video Technology}, 
  title={Event-Centric Hierarchical Representation for Dense Video Captioning}, 
  year={2021},
  volume={31},
  number={5},
  pages={1890-1900},
  doi={10.1109/TCSVT.2020.3014606}}

Acknowledgement

The implementation of the Deformable Transformer is mainly based on Deformable DETR. The implementation of the captioning head is based on ImageCaptioning.pytorch. We thank the authors for their efforts.

pdvc's People

Contributors

ttengwang

pdvc's Issues

BLEU4/CIDEr

I'm sorry to bother you again, professor. Regarding the BLEU4 and CIDEr metrics for the dense video captioning task reported in the paper, how can I obtain these two results?

About the experimental results

Hello, when I train PDVC with learnt proposals, should I compare my results with the "Dense video captioning (with learnt proposals)" results in the README, or with the "Predicted proposals" results in the paper? And what is the difference between predicted proposals and learnt proposals? I would appreciate it if you could clear this up for me.

A question about the experimental results

Hello, I am trying to reproduce the experimental results in the paper, but the BLEU4 and METEOR scores I get are higher than those in the paper, while the CIDEr score is much lower. Could you tell me what might cause this?

Does the code support multi-gpu training?

Hi, thanks for your great work! I use the command python train.py --cfg_path ${config_path} --gpu_id 0,1,2,3 to train the model; however, it seems that only the first GPU is working. Does the code support multi-GPU training? Could you share the multi-GPU training command? Thanks a lot.

Ablation study of auxiliary losses?

Hello,
I was wondering about the role of the auxiliary losses on each intermediate decoder layer. Do they help accelerate model convergence, or do they serve another purpose?
Thanks!

about "pred_event_count"

Thank you for the great work!

I am trying to run the model on different videos, but "pred_event_count" always seems to be 3. Is this just a coincidence, or have I done something wrong?

I am using the pretrained TSP features provided in the repo, and the model works well on the demo video ("pred_event_count" is 3 there as well).

"Running PDVC on Your Own Videos": Did i miss something?

Hi,

Thank you for your great work.

I loaded your pretrained model and ran your code on my own video dataset (SumMe, a video summarization benchmark), but the results are really strange: most captions do not reflect the visual content.

(screenshots and the sample video Cooking.mp4 attached)

I just loaded your models and ran them on the video dataset, and most of the generated captions are very strange. Did I miss something?

Thank you.

How do I train on my own dataset?

Thank you very much for your outstanding work. I have my own batch of video data and would like to annotate it and reproduce your method; how should I annotate my dataset? Looking forward to your reply!

visualization

Hello, Professor Wang,
I want to use my own model to visualize some videos from ActivityNet Captions, so that I can compare the sentences generated by the original model and by my own model.
What should I do? Looking forward to your reply. Thanks.

Error when run make.sh

running build
running build_py
running build_ext
building 'MultiScaleDeformableAttention' extension
Emitting ninja build file /home/binzheng/code/PDVC-main/pdvc/ops/build/temp.linux-x86_64-3.7/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] /usr/bin/nvcc -DWITH_CUDA -I/home/binzheng/code/PDVC-main/pdvc/ops/src -I/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/site-packages/torch/include -I/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/site-packages/torch/include/TH -I/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/site-packages/torch/include/THC -I/home/binzheng/anaconda3/envs/PDVCC/include/python3.7m -c -c /home/binzheng/code/PDVC-main/pdvc/ops/src/cuda/ms_deform_attn_cuda.cu -o /home/binzheng/code/PDVC-main/pdvc/ops/build/temp.linux-x86_64-3.7/home/binzheng/code/PDVC-main/pdvc/ops/src/cuda/ms_deform_attn_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=MultiScaleDeformableAttention -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=sm_75 -std=c++14
FAILED: /home/binzheng/code/PDVC-main/pdvc/ops/build/temp.linux-x86_64-3.7/home/binzheng/code/PDVC-main/pdvc/ops/src/cuda/ms_deform_attn_cuda.o 
/usr/bin/nvcc -DWITH_CUDA -I/home/binzheng/code/PDVC-main/pdvc/ops/src -I/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/site-packages/torch/include -I/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/site-packages/torch/include/TH -I/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/site-packages/torch/include/THC -I/home/binzheng/anaconda3/envs/PDVCC/include/python3.7m -c -c /home/binzheng/code/PDVC-main/pdvc/ops/src/cuda/ms_deform_attn_cuda.cu -o /home/binzheng/code/PDVC-main/pdvc/ops/build/temp.linux-x86_64-3.7/home/binzheng/code/PDVC-main/pdvc/ops/src/cuda/ms_deform_attn_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=MultiScaleDeformableAttention -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=sm_75 -std=c++14
nvcc fatal   : Unsupported gpu architecture 'compute_75'
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1539, in _run_ninja_build
    env=env)
  File "/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "setup.py", line 70, in <module>
    cmdclass={"build_ext": torch.utils.cpp_extension.BuildExtension},
  File "/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/distutils/command/build_ext.py", line 340, in run
    self.build_extensions()
  File "/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 670, in build_extensions
    build_ext.build_extensions(self)
  File "/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/distutils/command/build_ext.py", line 449, in build_extensions
    self._build_extensions_serial()
  File "/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/distutils/command/build_ext.py", line 474, in _build_extensions_serial
    self.build_extension(ext)
  File "/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
    _build_ext.build_extension(self, ext)
  File "/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/distutils/command/build_ext.py", line 534, in build_extension
    depends=ext.depends)
  File "/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 500, in unix_wrap_ninja_compile
    with_cuda=with_cuda)
  File "/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1255, in _write_ninja_file_and_compile_objects
    error_prefix='Error compiling objects for extension')
  File "/home/binzheng/anaconda3/envs/PDVCC/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1555, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

This error message appeared when I ran make.sh. How can I solve it? Thank you!

Some questions regarding the dataset

I would like to ask a question about the dataset. Looking at related papers, I found that some evaluate model performance on the ActivityNet Captions validation set, while others evaluate on the ActivityNet Captions test split. What is the difference between the ActivityNet Captions validation set and the test split?

Paper understanding

Hello,
What are the exact dimensions of the input to the deformable transformer encoder?
From what I understood:

  • The input video is sampled to T frames
  • Frame features are extracted with TSN/C3D, giving $\{x_t\}_{t=1}^{T}$
  • Then L convolutional layers are applied to the frame features, giving $\{f_l\}_{l=1}^{L}$

So the input has a temporal dimension of $T \times L$, right?

No such file or directory: 'visualization/output/generated_captions/dvc_results.json'

Hi, when I run the code below:

video_folder=visualization/videos
output_folder=visualization/output
pdvc_model_path=save/anet_tsp_pdvc/model-best.pth
output_language=en
bash test_and_visualize.sh $video_folder $output_folder $pdvc_model_path $output_language

the following error is generated:

from densevid_eval3.SODA.soda import SODA

ModuleNotFoundError: No module named 'densevid_eval3.SODA.soda'
START VISUALIZATION
Traceback (most recent call last):
File "visualization/visualization.py", line 154, in
d = json.load(open(opt.dvc_file))['results']
FileNotFoundError: [Errno 2] No such file or directory: 'visualization/output/generated_captions/dvc_results.json'

So where is dvc_results.json, and how can I get it?

Thanks.

About running PDVC on my own videos

Hello, I am a sophomore who is very interested in this project; thank you for your effort and generous contribution to it. I would like to ask a few questions and would be grateful for your advice.
I tried running PDVC on my own videos and succeeded by following the steps in the README, but after reading test_and_visualize.sh carefully I found that this pipeline only works with the TSP model,
while the model I have been training uses C3D features.
This leads to question 1:
I also noticed that the key step is "START Dense-Captioning", which runs eval.py in Python and uses the feature files generated in the previous steps to produce dvc_caption.json. How can I use my own trained C3D model to generate the .npy features and then use eval.py to generate captions?

I also tried the TSP model and downloaded the files referenced by download_tsp_features.sh. It says to "download the following files and reformat them into data/features/tsp/VIDEO_ID.npy where VIDEO_ID starts with 'v_'", but I don't know how to convert these TSP h5 files into .npy files; only convert_c3d_h5_to_npy.py is provided, and it cannot be used for TSP.

Question 2:
How can I convert the TSP feature files directly into .npy files? I noticed that training reads .npy files.
My guess regarding question 2:
Do I need to follow the TSP README, download the whole ActivityNet dataset, organize it with fiftyone, and then run extract_features_from_a_released_checkpoint.sh to obtain the TSP feature files for training? If so, are the .h5 files used at all?

These questions may be basic and long-winded, but your answers would really help me a lot! Thanks again.

C3D features

Hello, Professor Wang. Looking at the C3D .npy features, I found that every feature file has 500 columns, while the number of rows varies. What exactly do the columns and rows represent? Is each row the feature of one temporal segment of the video? Is there a relationship between the number of columns and rows? Looking forward to your reply; thank you.

A question about object detection

Thank you so much for this wonderful project. When I tried to run your code on my validation set, I ran into some problems. For example, in one video a cat runs out of a Christmas gift box, but the prediction is "a woman runs out of the Christmas gift box". Another of my videos shows some sheep walking, and the prediction says that some horses are walking. So the model can recognize the action but not the type of object. I think it may be a limitation of ActivityNet, because the animal categories in the dataset only contain dogs and horses. Could you please provide pretrained weights obtained by pre-training on ImageNet-22K? I think this may really help the model with object recognition. Finally, thank you for your contribution.

Some questions for your paper

Hello, Teng

I have read your PDVC paper and run the code; it is very good work! However, there are some points in the paper I can't understand; could you explain them?

  1. I can't see how the N queries in the flow chart are obtained. The paper seems to use no anchors; are the queries also produced from pre-set anchors ordered by confidence score?
  2. In row 9 of Table 3 (MT [31] with TSN features), why is it re-evaluated? Is that due to different evaluation tools or different features? Also, the METEOR score of [31] in Table 1 is 9.25, which differs from the re-evaluated value of 4.98; could you help me with this?
  3. Could you explain the difference between the PDVC_light and PDVC methods?

You can use Chinese if you prefer. Thank you very much!

A question about demo video

Thanks for sharing your wonderful work.
I haven't read your paper yet, so based on the demo video I have some questions:
1. Can your PDVC model be considered live video captioning?
2. Is the caption for each event generated directly, without reading all the video frames?
3. How long does it take to generate the caption for one event?

Results on YouCook2 varies with different Runs - Seed is same!

Hi @ttengwang !

thanks for nice work on DVC!

I am able to run the code on YouCook2 with small configuration changes; however, I get different results when running the same code multiple times with the same seed. The metric is SODA_c.

| Method | Validation (SODA_c) |
|---|---|
| Run 1 | 4.171 |
| Run 2 | 3.933 |
| Run 3 | 4.322 |
| Run 4 | 3.958 |
| Average | 4.096 ± 0.159 |

Please let me know your thoughts!

Regards
Anil

Paragraph captioning results with GT proposals are lower than expected

Hello! I am using the TSP features and the pretrained model. When testing with predicted proposals I get results close to those in the README (BLEU4: 10.46, METEOR: 16.43, CIDEr: 20.92), which are better than those in Table 4 of the paper, probably because better features are used.

However, when I test with GT proposals using the same features and model, I get (BLEU4: 11.17, METEOR: 15.58, CIDEr: 22.70), which is clearly worse than the results in Table 4 of the paper. Why is that? Is the model used for testing GT proposals different from the one used for predicted proposals?

If convenient, could you send me the model's prediction results under both the predicted-proposal and GT-proposal settings? We plan to collect results from several models for a human evaluation. My email is [email protected]. Thanks!

Few questions about training

Hello @ttengwang ,
I am trying to train your model from scratch (just for learning purposes). However, I am facing a few issues:

  1. The train_caption_file / val_caption_file does not contain the labels used in video_dataset.py (and in the class loss). Am I using the wrong file?
  2. I tried using labels from the action-proposal dataset (with the captioning-related parts removed), but loss_ce does not decrease at all, in both train and val (did you face anything like this?). Also, loss_ce is in the range of 300-400.
  3. How many epochs did you train before getting decent captions?

A question about MultiScaleDeformableAttention

Hello, after installing MultiScaleDeformableAttention following the steps in the README, I get an import error at runtime. Have you encountered a similar problem, and how can it be solved?

error in ms_deformable_im2col_cuda: invalid device function

48%|████▊ | 2358/4917 [02:40<03:01, 14.12it/s]error in ms_deformable_im2col_cuda: invalid device function
error in ms_deformable_im2col_cuda: invalid device function
error in ms_deformable_im2col_cuda: invalid device function
error in ms_deformable_im2col_cuda: invalid device function
error in ms_deformable_im2col_cuda: invalid device function
error in ms_deformable_im2col_cuda: invalid device function
error in ms_deformable_im2col_cuda: invalid device function
error in ms_deformable_im2col_cuda: invalid device function
48%|████▊ | 2360/4917 [02:40<02:55, 14.58it/s]error in ms_deformable_im2col_cuda: invalid device function
error in ms_deformable_im2col_cuda: invalid device function
error in ms_deformable_im2col_cuda: invalid device function
error in ms_deformable_im2col_cuda: invalid device function
error in ms_deformable_im2col_cuda: invalid device function
error in ms_deformable_im2col_cuda: invalid device function
error in ms_deformable_im2col_cuda: invalid device function
error in ms_deformable_im2col_cuda: invalid device function

Hello, what does the above output mean during training? The program can still run, though.

About model evaluation results

Hello, when I evaluate with the trained model you provided, the following happens. Do you know what the cause might be?

/home/yy/anaconda3/envs/DVC1/bin/python /home/yy/桌面/PDVC/eval.py --eval_folder=anet_c3d_pdvc --eval_model_path=model-best.pth
{'eval_save_dir': 'save', 'eval_mode': 'eval', 'test_video_feature_folder': None, 'test_video_meta_data_csv_path': None, 'eval_folder': 'anet_c3d_pdvc', 'eval_model_path': 'model-best.pth', 'eval_tool_version': '2018', 'eval_caption_file': 'data/anet/captiondata/val_1.json', 'eval_proposal_type': 'gt', 'eval_transformer_input_type': 'queries', 'gpu_id': ['0'], 'eval_device': 'cuda'}
load info from save/anet_c3d_pdvc/info.json
load translator, total_vocab: %d 5747
load captioning file, %d captioning loaded 4917
/home/yy/anaconda3/envs/DVC1/lib/python3.7/site-packages/torch/nn/modules/rnn.py:61: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
"num_layers={}".format(dropout, num_layers))
all decoder layers share the same caption head
Loading model from save/anet_c3d_pdvc/model-best.pth
alpha: 1.0, temp: 2.0
loss: OrderedDict([('loss_ce', 0.19), ('loss_counter', 0.116), ('loss_bbox', 0.151), ('loss_giou', 0.302), ('loss_self_iou', 0.144), ('cardinality_error', 3.56), ('loss_ce_0', 0.18), ('loss_counter_0', 0.117), ('loss_bbox_0', 0.331), ('loss_giou_0', 0.503), ('loss_self_iou_0', 0.242), ('cardinality_error_0', 3.56), ('total_loss', 4.075)])
available video number 4917
PTBTokenizer tokenized 610661 tokens at 1464671.40 tokens per second.
PTBTokenizer tokenized 583002 tokens at 1426850.56 tokens per second.
Traceback (most recent call last):
File "/home/yy/桌面/PDVC/eval.py", line 144, in
main(opt)
File "/home/yy/桌面/PDVC/eval.py", line 109, in main
logger, alpha=opt.ec_alpha, dvc_eval_version=opt.eval_tool_version, device=opt.eval_device, debug=False, skip_lang_eval=False)
File "/home/yy/桌面/PDVC/eval_utils.py", line 224, in evaluate
dvc_eval_version=dvc_eval_version
File "/home/yy/桌面/PDVC/eval_utils.py", line 124, in eval_metrics
dvc_score = eval_dvc(json_path=dvc_filename, reference=gt_filenames, version=dvc_eval_version)
File "/home/yy/桌面/PDVC/densevid_eval3/eval_dvc.py", line 13, in eval_dvc
score = eval_func(args)
File "/home/yy/桌面/PDVC/densevid_eval3/evaluate2018.py", line 261, in main
evaluator.evaluate()
File "/home/yy/桌面/PDVC/densevid_eval3/evaluate2018.py", line 113, in evaluate
scores = self.evaluate_tiou(tiou)
File "/home/yy/桌面/PDVC/densevid_eval3/evaluate2018.py", line 237, in evaluate_tiou
score, scores = scorer.compute_score(gts[vid_id], res[vid_id])
File "/home/yy/桌面/PDVC/densevid_eval3/pycocoevalcap/meteor/meteor.py", line 37, in compute_score
stat = self._stat(res[i][0], gts[i])
File "/home/yy/桌面/PDVC/densevid_eval3/pycocoevalcap/meteor/meteor.py", line 57, in _stat
self.meteor_p.stdin.flush()
BrokenPipeError: [Errno 32] Broken pipe

Comparison with Base Transformer on YouCook2

Hi @ttengwang

Appreciate you for sharing the code.

I am wondering whether you trained the base Transformer + LSTM on the YouCook2 dataset, i.e., similar to rows 1 and 2 in Table 7(a).

I am also wondering whether the current code supports training the base transformer.

Thanks

The video is shown with a white screen

In the inference stage I followed the README instructions (Running PDVC on Your Own Videos), but for different videos it always generates captions with the same sentence: "The video is shown with a white screen."
(screenshots attached)

anet_tsn_pdvc best model fails to load

Thanks for sharing your wonderful work! I want to use your best TSN model on my own video, but the following error occurred while loading the model (maybe the released model parameters and structure do not match):
Loading model from save/anet_tsn_pdvc/model-best.pth
Traceback (most recent call last):
File "eval.py", line 111, in
main(opt)
File "eval.py", line 70, in main
model.load_state_dict(loaded_pth['model'], strict=True)
File "/data11/zq/vc_envs/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for PDVC:
size mismatch for transformer.pos_trans.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 512]).

About evaluation metrics

I would like to ask whether the reproduced model only reports the video paragraph captioning metrics. If I want to obtain the dense video captioning metrics, how should I do that?

Is there any limit on batch_size?

I tried to train the model with TSN features, but it only uses 2 GB of GPU memory, so I tried training with batch_size = 8. However, I get errors like:

/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [41,0,0], thread: [0,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [41,0,0], thread: [1,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [41,0,0], thread: [2,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
  0%|                                                                                                                                                                                                   | 0/2502 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 317, in <module>
    train(opt)
  File "train.py", line 181, in train
    output, loss = model(dt, criterion, opt.transformer_input_type)
  File "/home/anaconda3/envs/PDVC-main/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/axxddzh/dat/axxddzh/PDVC-main/models/pdvc.py", line 166, in forward
    disable_iterative_refine)
  File "/media/axxddzh/dat/axxddzh/PDVC-main/models/pdvc.py", line 299, in parallel_prediction_matched
    others, self.opt.caption_decoder_type, indices)
  File "/media/axxddzh/dat/axxddzh/PDVC-main/models/pdvc.py", line 387, in caption_prediction
    cap_prob = cap_head(hs[:, feat_bigids], reference[:, feat_bigids], others, seq)
RuntimeError: CUDA error: device-side assert triggered

I run into the same problem whenever batch_size is not 1.

Why does the test-set loss barely change?

I trained directly following the README and found that the evaluation loss stays almost constant, roughly at the values below:
loss: OrderedDict([('loss_ce', 0.187), ('loss_counter', 0.114), ('loss_bbox', 0.151), ('loss_giou', 0.299), ('loss_self_iou', 0.138), ('cardinality_error', 3.56), ('loss_ce_0', 0.173), ('loss_counter_0', 0.115), ('loss_bbox_0', 0.343), ('loss_giou_0', 0.511), ('loss_self_iou_0', 0.242), ('cardinality_error_0', 3.56), ('total_loss', 4.075)])
What is going on here?

i3d+vggish results

Hello professor,

When I reproduced your 'i3d+vggish' model, I could not achieve the same results as in the original paper. I don't know if there is something wrong with my settings.

Thanks

Issue on inference

Hi,
I tried to perform inference on my own videos by simply putting those videos in the /visualization/videos folder, then running the provided scripts in this repo.

However, when loading the model (Loading model from save/anet_tsp_pdvc/model-best.pth), my terminal shows this error:

visualization/output/r2plus1d_34-tsp_on_activitynet_stride_16/sample_vid.npy not exists, use zero padding.
all feature files of video sample-vid do not exist

Then the generated captions in dvc_results.json just talk about a black screen, a white screen, or a credits scene. I assume this is due to the zero padding.
It seems there is a problem when extracting features from my videos, but I am not sure. Is there any step I might have missed, or any step that is not included in the scripts?

Any help is appreciated. Thank you~

How to further train on our own dataset?

Hello,
Thank you so much for this amazing GitHub repo. I want to use the pretrained model and further train it on my own dataset of videos and corresponding captions. Do you have any suggestions on how I can do this?

Should I change other parameters when I change batch_size?

I cloned your repository and trained the model on ActivityNet. When I changed batch_size to a higher number, the program crashed at line 387 of pdvc.py; after debugging, I found an out-of-bounds tensor index there. Should I change other parameters when I change batch_size?

caption my custom video

Hi @ttengwang ~
Thanks for sharing your wonderful work! I want to caption my own videos, but unfortunately most of the captioning code starts from extracted features, and few instructions are provided for the feature extraction step. This is inconvenient for me because I'm not familiar with the captioning task and just want to use the tool for some applications. Could you please give me detailed instructions on how to get captions from a raw video? I would appreciate it a lot!

Thanks,
Zhihong

questions about counter_class_rate

Hi!

I found that a predefined list called 'counter_class_rate' in ./pdvc/criterion.py is used as a weight in the counter loss.
I'm curious how this list was obtained. Is it the frequency of event counts in the dataset?

I'd appreciate it if you could answer my question!

Running PDVC on Your Own Videos

Hello! I'm very glad to see your excellent work!
However, I ran into a problem while testing the model. In the "Running PDVC on Your Own Videos" part, I used the pretrained model you provided and the prepared test video "xukun", but the results did not match what is shown in your README. Could something be wrong with my procedure?

Looking forward to your reply! Thank you!

Error at START FEATURE EXTRACTION

Hello, after running your code, "RuntimeError: CUDA error: unknown error" appears at the START FEATURE EXTRACTION stage; there are no errors anywhere else. How can I solve this?

Question about the result difference of video paragraph captioning

Thanks for the great work!
I notice that in Table 4 of your paper, PDVC achieves "B@4 11.80 | M 15.93 | C 27.27" on the ActivityNet Captions ae-val set, but the README reports "B@4 10.18 | M 15.96 | C 20.66" for PDVC with TSN features. Is the difference between the two datasets (ActivityNet Captions vs. ActivityNet Entity) what leads to such different results? Looking forward to your reply.

Could not reproduce the example results from the README

Hello, after running your code on the "xukun" dancing demo video, I did not get the same results as you; most of the time the caption is "the credits of the video are shown".
For other videos, this sentence is also shown most of the time. Did I make a mistake at some step? I am using the pretrained model from your cloud drive.
