
2d-tan's Introduction

VideoX - Multi-modal Video Content Understanding

This is a collection of our video understanding work:

SeqTrack (@CVPR'23): SeqTrack: Sequence to Sequence Learning for Visual Object Tracking

X-CLIP (@ECCV'22 Oral): Expanding Language-Image Pretrained Models for General Video Recognition

MS-2D-TAN (@TPAMI'21): Multi-Scale 2D Temporal Adjacent Networks for Moment Localization with Natural Language

2D-TAN (@AAAI'20): Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language

News

  • ☀️ Hiring research interns with strong coding skills at MSRA: [email protected]
  • 💥 Apr, 2023: Code for SeqTrack is now released.
  • 💥 Feb, 2023: SeqTrack was accepted to CVPR'23.
  • 💥 Sep, 2022: X-CLIP is now integrated into Hugging Face Models.
  • 💥 Aug, 2022: Code for X-CLIP is now released.
  • 💥 Jul, 2022: X-CLIP was accepted to ECCV'22 as an oral presentation.
  • 💥 Oct, 2021: Code for MS-2D-TAN is now released.
  • 💥 Sep, 2021: MS-2D-TAN was accepted to TPAMI'21.
  • 💥 Dec, 2019: Code for 2D-TAN is now released.
  • 💥 Nov, 2019: 2D-TAN was accepted to AAAI'20.

Works

In this paper, we propose a new sequence-to-sequence learning framework for visual tracking, dubbed SeqTrack. It casts visual tracking as a sequence generation problem that predicts object bounding boxes in an autoregressive fashion. SeqTrack adopts only a simple encoder-decoder transformer architecture: the encoder extracts visual features with a bidirectional transformer, while the decoder generates a sequence of bounding box values autoregressively with a causal decoder. The loss function is a plain cross-entropy. Such a sequence learning paradigm not only simplifies the tracking framework, but also achieves competitive performance on many benchmarks.

SeqTrack overview
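
For intuition, here is a minimal sketch of the sequence-generation idea: box coordinates are quantized into discrete tokens and decoded one at a time with a causal transformer decoder. All names and sizes below are illustrative, not the released SeqTrack code.

import torch

# Illustrative only (not the released SeqTrack code): quantize a box into four
# coordinate tokens and decode them autoregressively with a causal decoder.
vocab_size, d_model = 1000, 256                      # number of coordinate bins is assumed

embed = torch.nn.Embedding(vocab_size + 1, d_model)  # +1 for a start token
layer = torch.nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
decoder = torch.nn.TransformerDecoder(layer, num_layers=2)
head = torch.nn.Linear(d_model, vocab_size)

memory = torch.randn(1, 196, d_model)                # stand-in for encoder visual features
tokens = torch.tensor([[vocab_size]])                # start token

for _ in range(4):                                   # generate x, y, w, h one token at a time
    tgt = embed(tokens)
    causal = torch.triu(torch.full((tgt.size(1), tgt.size(1)), float("-inf")), diagonal=1)
    out = decoder(tgt, memory, tgt_mask=causal)
    next_tok = head(out[:, -1]).argmax(-1, keepdim=True)
    tokens = torch.cat([tokens, next_tok], dim=1)

print(tokens[:, 1:])                                 # four quantized box coordinates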

In this paper, we propose a new video recognition framework which adapts pretrained language-image models to video recognition. Specifically, to capture temporal information, we propose a cross-frame attention mechanism that explicitly exchanges information across frames. To utilize the text information in video categories, we design a video-specific prompting technique which yields instance-level discriminative textual representations. Extensive experiments demonstrate that our approach is effective and generalizes to different video recognition scenarios, including fully-supervised, few-shot, and zero-shot settings.

X-CLIP overview
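
A rough sketch of the cross-frame information exchange idea, written as a generic message-token formulation (illustrative only, not the released X-CLIP code): each frame is summarized into a message token, the message tokens attend to each other, and the temporal context is broadcast back to the frame features.

import torch

# Illustrative cross-frame communication (not the released X-CLIP code):
# one message token per frame attends over the message tokens of all frames.
B, T, N, D = 2, 8, 49, 512                          # batch, frames, patches per frame, dim (assumed)
frame_feats = torch.randn(B, T, N, D)

msg_proj = torch.nn.Linear(D, D)
cross_frame_attn = torch.nn.MultiheadAttention(D, num_heads=8, batch_first=True)

# summarize each frame into a message token (here: mean over patches)
messages = msg_proj(frame_feats.mean(dim=2))        # (B, T, D)

# frames exchange information by attending over each other's messages
messages, _ = cross_frame_attn(messages, messages, messages)   # (B, T, D)

# broadcast the temporal context back onto every patch of its frame
frame_feats = frame_feats + messages.unsqueeze(2)   # (B, T, N, D)
print(frame_feats.shape)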

In this paper, we study the problem of moment localization with natural language and extend our previously proposed 2D-TAN method to a multi-scale version. The core idea is to retrieve a moment from two-dimensional temporal maps at different temporal scales, which considers adjacent moment candidates as the temporal context. The extended version is capable of encoding adjacent temporal relations at different scales while learning discriminative features for matching video moments with referring expressions. Our model is simple in design and achieves competitive performance in comparison with the state-of-the-art methods on three benchmark datasets.

MS-2D-TAN overview
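
Conceptually, the multi-scale extension builds 2D candidate maps at several temporal resolutions. A toy sketch of the scale construction (illustrative, not the released MS-2D-TAN code): pool the clip features at different strides, so each scale yields its own candidate map with a coarser temporal granularity.

import torch

# Toy multi-scale pooling (not the released MS-2D-TAN code): each stride gives
# clip features at a different temporal resolution, and each resolution would
# get its own 2D map of moment candidates.
clip_feats = torch.randn(1, 4096, 128)               # (batch, feature dim, num clips), sizes assumed

scales = [1, 2, 4, 8]
multi_scale_feats = []
for s in scales:
    pooled = torch.nn.functional.avg_pool1d(clip_feats, kernel_size=s, stride=s)
    multi_scale_feats.append(pooled)                 # (1, 4096, 128 // s)

for s, f in zip(scales, multi_scale_feats):
    print(f"stride {s}: {f.shape[-1]} clips -> {f.shape[-1]}x{f.shape[-1]} candidate map")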

In this paper, we study the problem of moment localization with natural language and propose a novel 2D Temporal Adjacent Networks (2D-TAN) method. The core idea is to retrieve a moment on a two-dimensional temporal map, which considers adjacent moment candidates as the temporal context. 2D-TAN is capable of encoding adjacent temporal relations while learning discriminative features for matching video moments with referring expressions. Our model is simple in design and achieves competitive performance in comparison with the state-of-the-art methods on three benchmark datasets.

2D-TAN overview
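
A minimal sketch of the 2D temporal map idea (illustrative, not the released 2D-TAN code): entry (i, j) of the map represents the candidate moment spanning clips i through j, obtained by pooling the clip features over that span; a scoring head then matches every valid cell against the sentence.

import torch

# Illustrative 2D temporal map (not the released 2D-TAN code):
# tmap[i, j] pools clip features over the span [i, j]; cells with j < i are invalid.
N, D = 16, 512                                       # number of clips and feature dim (assumed)
clip_feats = torch.randn(N, D)

tmap = torch.zeros(N, N, D)
mask = torch.zeros(N, N, dtype=torch.bool)
for i in range(N):
    for j in range(i, N):
        tmap[i, j] = clip_feats[i:j + 1].mean(dim=0)   # candidate moment from clip i to clip j
        mask[i, j] = True

# a scoring head would predict, for every valid (i, j), how well that moment
# matches the sentence; the best-scoring cell gives the localized moment.
print(tmap.shape, mask.sum().item(), "valid candidates")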

Bibtex

@InProceedings{SeqTrack,
  title={SeqTrack: Sequence to Sequence Learning for Visual Object Tracking},
  author={Chen, Xin and Peng, Houwen and Wang, Dong and Lu, Huchuan and Hu, Han},
  booktitle={CVPR},
  year={2023}
}

@InProceedings{XCLIP,
  title={Expanding Language-Image Pretrained Models for General Video Recognition},
  author={Ni, Bolin and Peng, Houwen and Chen, Minghao and Zhang, Songyang and Meng, Gaofeng and Fu, Jianlong and Xiang, Shiming and Ling, Haibin},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2022}
}


@Article{Zhang2021MS2DTAN,
    author = {Zhang, Songyang and Peng, Houwen and Fu, Jianlong and Lu, Yijuan and Luo, Jiebo},
    title = {Multi-Scale 2D Temporal Adjacent Networks for Moment Localization with Natural Language},
    journal = {TPAMI},
    year = {2021}
}


@InProceedings{2DTAN_2020_AAAI,
    author = {Zhang, Songyang and Peng, Houwen and Fu, Jianlong and Luo, Jiebo},
    title = {Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language},
    booktitle = {AAAI},
    year = {2020}
}

License

Licensed under the MIT License.

2d-tan's People

Contributors

chenxin-dlut, microsoftopensource, nbl97, penghouwen, sy-zhang

2d-tan's Issues

[X-CLIP] The input of "Video-specific Prompting"

Hi, thanks for your great paper.

In Fig. 2 of the paper, it looks like the "Video-specific Prompting" module uses the output of the "Multi-frame Integration Transformer" as its visual feature input.
But in the implementation, you feed the "img_features" output of the "Cross-frame Communication Transformer" into "Video-specific Prompting".

Is the figure in the paper wrong?

Runtime Error

I met the following problem:
"RuntimeError: CUDA out of memory. Tried to allocate 1.41 GiB (GPU 0; 11.91 GiB total capacity; 11.04 GiB already allocated; 203.06 MiB free; 121.00 MiB cached)"

When I run the program as instructed, GPU 0 is available.
How can I solve this issue? Thank you.

[X-CLIP] Video demo inference code

Dear author:
Thanks for publishing your work! It's really insightful! I want to try out the open-set video recognition performance, so a simple inference example would be very helpful.

Thanks.

About activitynet dataset

The provided link only contains processed features for the TACoS and Charades-STA datasets. For the ActivityNet dataset, did you use the C3D features officially provided by the ActivityNet challenge?

Extracted features for TACoS seem to be corrupted

When using h5py.File("tall_c3d_features.hdf5", "r") to read the extracted features for the TACoS dataset, I got the following error:
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (truncated file: eof = 72876032, sblock->base_addr = 0, stored_eof = 485560344)

while I can open other HDF5 files normally, e.g. PCA_activitynet_v1-3.hdf5 can be read correctly.
Is the TACoS feature file corrupted? If so, could you please re-upload tall_c3d_features.hdf5?
Thanks a lot!
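
For anyone hitting the same error: the OSError already reports the expected size (stored_eof), so a quick size/open check tells whether the download is truncated and needs to be fetched again. A small sketch, assuming the file sits in the working directory:

import os
import h5py

# Quick sanity check for a possibly truncated download (illustrative):
# compare the on-disk size against the expected size reported in the OSError,
# and see whether the file opens at all.
path = "tall_c3d_features.hdf5"
print("on-disk size:", os.path.getsize(path), "bytes")

try:
    with h5py.File(path, "r") as f:
        print("file opens fine, first keys:", list(f.keys())[:5])
except OSError as e:
    print("file appears corrupted/truncated:", e)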

Can't find "on_start_epoch, on_sample…" in engine.py

Hello, thank you for your great work, but I have some questions.
1. In engine.py, I see the states "on_start_epoch, on_sample, on_end_epoch, on_test_sample", but they only appear once and I can't figure out what their function is. Could you explain it? (See the sketch below.)
2. I see you set MAX_EPOCH to 100, but I find that the performance on the test set stops improving noticeably after around 20 epochs; more epochs only improve the performance on the training set. Did you observe the same behavior during training?
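
For context, engine.py follows a torchnet-style hook pattern: names such as on_start_epoch are keys into a hook table and are dispatched at fixed points of the training loop, which is why each name appears only once in the file. A minimal sketch of that pattern (illustrative, not the repo's exact code):

# Minimal torchnet-style engine with named hooks (illustrative, not engine.py itself).
class Engine:
    def __init__(self):
        self.hooks = {}

    def hook(self, name, state):
        # call the user-registered callback for this point in training, if any
        if name in self.hooks:
            self.hooks[name](state)

    def train(self, iterator, maxepoch):
        state = {"epoch": 0, "t": 0}
        while state["epoch"] < maxepoch:
            self.hook("on_start_epoch", state)
            for sample in iterator:
                state["sample"] = sample
                self.hook("on_sample", state)       # e.g. move batch to GPU, log, etc.
                state["t"] += 1
            state["epoch"] += 1
            self.hook("on_end_epoch", state)        # e.g. run validation, save a checkpoint

engine = Engine()
engine.hooks["on_end_epoch"] = lambda s: print("finished epoch", s["epoch"])
engine.train(iterator=[1, 2, 3], maxepoch=2)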

Clarification

Hi
I want to ask whether the ActivityNet results in Table 2 are reported on the val or the test set?
Thanks!!!

Training errors with new features

Hello!

I am trying to train the 2D-TAN network with my own extracted features of TACoS and I got the following error:

Traceback (most recent call last):
  File "moment_localization/train.py", line 297, in <module>
    scheduler=scheduler)
  File "/home/share/wangzilong2/2D-TAN/moment_localization/../lib/core/engine.py", line 41, in train
    state['optimizer'].step(closure)
  File "/home/share/wangzilong2/home/share/wangzilong2/anaconda3/envs/2D-TAN/lib/python3.7/site-packages/torch/optim/adam.py", line 58, in step
    loss = closure()
  File "/home/share/wangzilong2/2D-TAN/moment_localization/../lib/core/engine.py", line 30, in closure
    loss, output = state['network'](state['sample'])
  File "moment_localization/train.py", line 154, in network
    prediction, map_mask = model(textual_input, textual_mask, visual_input)
  File "/home/share/wangzilong2/home/share/wangzilong2/anaconda3/envs/2D-TAN/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/share/wangzilong2/2D-TAN/moment_localization/../lib/models/tan.py", line 20, in forward
    vis_h = self.frame_layer(visual_input.transpose(1, 2))
  File "/home/share/wangzilong2/home/share/wangzilong2/anaconda3/envs/2D-TAN/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/share/wangzilong2/2D-TAN/moment_localization/../lib/models/frame_modules/frame_pool.py", line 18, in forward
    vis_h = torch.relu(self.vis_conv(visual_input))
  File "/home/share/wangzilong2/home/share/wangzilong2/anaconda3/envs/2D-TAN/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/share/wangzilong2/home/share/wangzilong2/anaconda3/envs/2D-TAN/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 196, in forward
    self.padding, self.dilation, self.groups)

RuntimeError: Expected 3-dimensional input for 3-dimensional weight 256 2048 1 94911601578208 94911602548768 94911602553032, but got 6-dimensional input of size [256, 2048, 8, 2, 7, 7] instead

My features (for each video, the feature size is 64 x 2048) partially resemble the originally provided TACoS C3D features (for each video, the feature size is x x 4096, where x is the varying number of clips used for extraction, I think; the point is that it differs per video), and both are in HDF5 format. I have no idea what is wrong even after searching the web. Please help, thank you!

Question about the computational resource requirements?

Thank you for kindly sharing! I am curious how much computational resource is needed to train the model, and what the corresponding training time is, since the model has a relatively large number of parameters as introduced in the paper.

Relation between CLIP-X and IFC (Nips 21), TEViT (cvpr22) and ReferFormer (cvpr 22)

Hi,

I would like to ask what the relation is between your proposed cross-frame attention and the one in IFC [1] and TEViT [2]; as far as I can tell, none of the above papers is cited. In addition, how does the text token relate to ReferFormer (CVPR'22)?

As the cross-frame communication transformer is considered a major contribution of the paper, I need to raise an AIV concern.

[1] Video Instance Segmentation using Inter-Frame Communication Transformers
[2] Temporally Efficient Vision Transformer for Video Instance Segmentation

about TACoS data split

Hi @Sy-Zhang!
Thanks for providing a nicely organized codebase.

I am confused about the data split regarding the TACoS dataset.
While your paper indicates that it follows the data split of TALL (Gao et al. 2017), I found they are not the same.
The data split in TALL is 50:25:25 (proportions), while your code uses 75/27/25 (actual numbers), which is obviously different.
It would be clearer if you clarified which one practitioners should follow.

Many thanks.

about rgb feature

hi,

Thank you for your excellent work! I would like to ask whether the original frame images you used are the 24 fps ones from the Charades official website (Charades dataset only)? Hope to get your reply! Thanks!

Best,
jun

GPU memory increasing in training

Thank you for your solid work.
My GPU memory slowly increases while the model is training. I wonder if there is any memory leak in the code?

Visual features

Hello, I read your paper recently and your work is so amazing that I really want to reproduce it. But I have trouble downloading the visual features from Google Drive because I don't have access. I would like to request access to the visual features.

Loss equation question

Hi

Can you explain the intuition behind scaling the IoU values between 0 and 1 in the loss function?
Thanks
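
For reference, the 2D-TAN paper rescales each candidate's IoU with the ground-truth moment into a soft label in [0, 1] using two thresholds and then applies binary cross-entropy, so candidates with very low overlap get a hard 0 and near-perfect ones get a hard 1. A small sketch of that target (the threshold and score values here are illustrative):

import torch

# Sketch of the scaled-IoU supervision described in the 2D-TAN paper:
# IoUs below tmin become 0, IoUs above tmax become 1, and values in between
# are linearly rescaled, then used as soft targets for binary cross-entropy.
tmin, tmax = 0.5, 1.0                        # threshold values are illustrative
iou = torch.tensor([0.2, 0.55, 0.8, 0.95])   # IoU of some candidate moments with the GT moment
scores = torch.tensor([0.1, 0.4, 0.7, 0.9])  # predicted matching scores (after sigmoid)

target = ((iou - tmin) / (tmax - tmin)).clamp(0, 1)
loss = torch.nn.functional.binary_cross_entropy(scores, target)
print(target, loss.item())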

Feature extractor problems

Hi, may I ask about the method of feature extraction?
The paper says the method is VGG16, but in the code the features are already provided with the dataset; taking Charades as an example, it uses Charades_v1_features_rgb.tar.gz.

For my own dataset, do I need to first train a two-stream network (like the Two-Stream features, RGB stream) to extract features from the data, and then use 2D-TAN for localization?

Adding X-CLIP to HuggingFace Transformers

Hi,

I've implemented X-CLIP as a fork of 🤗 HuggingFace Transformers, and we are planning to add it to the library soon (see huggingface/transformers#18852). Here's a notebook that illustrates inference with it: https://colab.research.google.com/drive/1upFMg-FPNP_D8dxeYWTju6lpYldZk8AJ?usp=sharing
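
For readers landing on this issue, here is a minimal sketch of zero-shot inference with the Transformers port (assuming the checkpoint ends up under the microsoft organisation on the hub, as discussed below; the random frames stand in for a real decoded clip):

import numpy as np
import torch
from transformers import XCLIPProcessor, XCLIPModel

# Minimal zero-shot inference sketch with the Transformers port of X-CLIP.
# The 8 random frames stand in for a real decoded video clip.
processor = XCLIPProcessor.from_pretrained("microsoft/xclip-base-patch32")
model = XCLIPModel.from_pretrained("microsoft/xclip-base-patch32")

video = list(np.random.randint(0, 255, (8, 224, 224, 3), dtype=np.uint8))  # 8 frames
texts = ["playing basketball", "cooking", "playing guitar"]

inputs = processor(text=texts, videos=video, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

probs = outputs.logits_per_video.softmax(dim=-1)    # similarity of the video to each prompt
print(dict(zip(texts, probs[0].tolist())))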

I really like the simplicity of X-CLIP, which is the main reason I decided to add it :)

As you may or may not know, each model on the HuggingFace hub has its own git repository. For example, the xclip-base-patch32 checkpoint can be found here. If you check the "files and versions" tab, you can find the converted weights of the model. The model hub uses git-LFS (large file storage) to use Git with large files such as model weights. This means that any model has its own Git commit history!

A model card can also be added to the repo, which is just a README.

If you haven't done so, would you be interested in joining the Microsoft organisation on the hub, such that we can store all model checkpoints there (rather than under my username)? This also enables you (and your co-authors) to have write access to the X-CLIP models on the hub, so you can edit the model cards, add new models etc.

Let me know!

Kind regards,

Niels
ML Engineer @ HuggingFace

About TACoS data annotation

Hi @Sy-Zhang
Thank you for your excellent work!
I am confused about the annotations in TACoS, because some of the annotations you provide are different from the officially provided ones.

For example, for the video s13-d21.avi in train.json, the timestamp [252, 686] appears in your annotations but not in the original annotations. I'd appreciate it if you could explain that.

No module named pathlib

Hello, have you finished updating the code? I get the following error when testing:

Traceback (most recent call last):
  File "moment_localization/test.py", line 20, in <module>
    from core.utils import AverageMeter
  File "/data/gao/2D-TAN/moment_localization/../lib/core/utils.py", line 8, in <module>
    from pathlib import Path
ImportError: No module named pathlib

What is this error? Could you tell me? Thank you.

About S-2D-TAN

Thanks for such great work. I want to know how to generate proposals for temporal action localization via the sparse 2D temporal adjacent network. Is it done by setting different strides in the original conv2? Thank you.

Are all methods in Table I using VGG features?

Thanks for the nice job. @penghouwen @Sy-Zhang

In Table I, some methods do not use VGG features in their papers; did you replace their original features with VGG features, or did you just copy the results from their papers?

Another question: is it the fc6 layer of VGG16?

Thank you again for your wonderful work!

DataSet Preparation and Use of Decord

Hi, thank you for your wonderful work. I was trying to download all the requirements to reproduce the results. I have the following trivial queries, please:

1. Where and how do we use decord in this codebase?
2. If possible, can you elaborate on the arrangement of the dataset? What I understood for option 2 is that we need to download the zipped data.
3. Where in the zipped files are the train/test label files used?

random seed

Hello, did you try fixing the random seed in the code to obtain the same results when running the same code? I made the following attempt:

import random
import numpy as np
import torch

def set_seed(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

I added the set_seed() function to train.py, but I cannot obtain the same results when I run the same code without any changes. Do you know what the problem is?
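
Beyond seeding, two things are commonly needed for run-to-run reproducibility in PyTorch and may explain the difference: deterministic cuDNN settings and seeding of DataLoader worker processes. A hedged sketch of those standard settings (not code from this repo):

import random
import numpy as np
import torch

# Standard PyTorch reproducibility settings (not from this repo): trade some
# speed for deterministic cuDNN kernels.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

def seed_worker(worker_id):
    # make each DataLoader worker process deterministic as well
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

# pass worker_init_fn=seed_worker (and a seeded torch.Generator) to the DataLoader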

About fps and time_unit of videos

As described in the paper: "Specifically, videos are decoded at 25 fps and the output of the last average pooling layer are extracted for every 16 consecutive frames. Therefore, each video clip corresponds to 0.64 second". Taking TACoS as an example, the fps is 29.4 in train.json, so I am confused about how to decode a video at 25 fps. Did you discard some frames? If we decode a video at its original fps, we get a time unit of 16/29.4 s. Looking forward to your reply, thanks!
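
For reference, the arithmetic in the quoted sentence is 16 frames / 25 fps = 0.64 s per clip. One common way to obtain a fixed 25 fps stream from a 29.4 fps video is to resample frame indices onto the 25 fps timeline, which does skip some native frames; a rough sketch of that idea (illustrative, not necessarily the authors' extraction pipeline):

import numpy as np

# Illustrative resampling of a 29.4 fps video onto a fixed 25 fps timeline,
# then grouping every 16 resampled frames into one clip (16 / 25 = 0.64 s).
native_fps, target_fps = 29.4, 25.0
duration_s = 10.0                                    # assumed video length
n_native = int(duration_s * native_fps)

# for each 25 fps time step, pick the nearest native frame (some native frames are skipped)
target_times = np.arange(0, duration_s, 1.0 / target_fps)
picked = np.clip(np.round(target_times * native_fps).astype(int), 0, n_native - 1)

clip_len = 16
print("seconds per clip:", clip_len / target_fps)    # 0.64
print("frames sampled:", len(picked), "of", n_native, "native frames")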

About your MS-2DTAN

Hi, I am interested in your multi-scale 2D-TAN model because your result on TACoS improved significantly and the paper links to this code. But it seems the two versions share the same code, and when I reproduced it I got worse performance. I wonder if you have updated this code for your MS-2D-TAN model.

ValueError: DATA_DIR not exist in config.py

Hi, Sy. I am trying to run MS-2D-TAN with python moment_localization/train.py --cfg experiments/tacos/MS-2D-TAN-G-VGG.yaml --dataDir data/ --verbose but I get the following error:

Traceback (most recent call last):
  File "moment_localization/train.py", line 77, in <module>
    args = parse_args()
  File "moment_localization/train.py", line 43, in parse_args
    update_config(args.cfg)
  File "/media/jpl/T7/MS-2D-TAN/lib/core/config.py", line 105, in update_config
    _update_dict(config[k], v)
  File "/media/jpl/T7/MS-2D-TAN/lib/core/config.py", line 96, in _update_dict
    raise ValueError("{} not exist in config.py".format(k))
ValueError: DATA_DIR not exist in config.py

For clarification, since my environment reported an error on import, I changed the code in train.py (but I don't think the problem is there):

import os
import sys
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from lib import models
from lib import datasets
from lib.core.config import config, update_config
...

about Upper Bound in Table 4

Hi, in the paper you provide the upper bound results on the ActivityNet Captions dataset. I would like to know how the upper bound results are calculated. Thanks!

ActivityNet feature corrupted?

Hi,

Thanks for your great work and code.
The ActivityNet feature file seems to be corrupted. I tried several times and it always has this problem. Could you please help?


Extract Features

Hi, on Google Drive there are only the extracted features for the TACoS and Charades-STA datasets. Do you have the extracted features for the ActivityNet dataset? Thanks.

Unable to reproduce the same results reported in the paper

Hello

I am trying to reproduce the results reported in the paper for the ActivityNet C3D features,
and I am not getting the same results. The results I am getting are as follows:
https://imgur.com/a/AYrJ4hi

I didn't edit the experiments config file at all (by the way, is the 2D-TAN-64x64-K9L4-pool.yaml file supposed to give the results reported in Table 3?).

Any idea what might be wrong?
Also, I didn't find any references for the loss function defined in https://github.com/microsoft/2D-TAN/blob/master/lib/models/loss.py,
so how exactly are you calculating the loss?

Train and test using other video features

Hello, thanks for the wonderful work!

So I have some features that I extracted myself and would like to train and test the network using those features; I wonder if you could let me know how to do this? I'm still using the ActivityNet, Charades-STA, and TACoS datasets, but with different features. It would be even better if you could explain how to train the network on a completely different dataset!

Thanks in advance!

AttributeError: 'EasyDict' object has no attribute 'TAG'

Hi Songyang, when I try to run multi-scale 2D-TAN with the command python moment_localization/run.py --cfg experiments/charades/MS-2D-TAN-G-VGG.yaml --verbose, an error occurs: AttributeError: 'EasyDict' object has no attribute 'TAG'. What's wrong?

About LSTM

Hi!

In the arXiv version of your paper, it says:

we sequentially feed the word embeddings into a three-layer bidirectional LSTM network.

However, in the experiment setting files under the experiments folder, TAN/FUSION_MODULE/PARAMS/LSTM/BIDIRECTIONAL is set to False everywhere.

May I ask the reason for this?

video feature

What model did you use to extract the visual features?

RuntimeError: DataLoader worker (pid 5423) is killed by signal: Segmentation fault.

ERROR: Unexpected segmentation fault encountered in worker.
Traceback (most recent call last):
  File "main.py", line 369, in <module>
    main(config)
  File "main.py", line 121, in main
    train_one_epoch(epoch, model, criterion, optimizer, lr_scheduler, train_loader, text_labels, config, mixup_fn)
  File "main.py", line 203, in train_one_epoch
    scaled_loss.backward()
  File "/root/miniconda3/envs/env/lib/python3.7/contextlib.py", line 119, in __exit__
    next(self.gen)
  File "/root/miniconda3/envs/env/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/handle.py", line 123, in scale_loss
  File "/root/miniconda3/envs/env/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/_process_optimizer.py", line 249, in post_backward_no_master_weights
  File "/root/miniconda3/envs/env/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/_process_optimizer.py", line 135, in post_backward_models_are_masters
  File "/root/miniconda3/envs/env/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/scaler.py", line 184, in unscale_with_stashed
  File "/root/miniconda3/envs/env/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/scaler.py", line 148, in unscale_with_stashed_python
  File "/root/miniconda3/envs/env/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/scaler.py", line 22, in axpby_check_overflow_python
  File "/root/miniconda3/envs/env/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 5423) is killed by signal: Segmentation fault.
Killing subprocess 4388

Hi there, I got the DataLoader error above during some iterations of an epoch. Do you have any idea about that?
These are some of the parameters:
python -m torch.distributed.launch --nproc_per_node=1 main.py -cfg configs/k600/16_8.yaml --output . --accumulation-steps 2 --resume /data/xxxx/xclip/VideoX-master/X-CLIP/pretrained_models/k600_16_8.pth

batch size is 8

Question about table 2

Hello
Thanks for sharing this amazing work.
I have a question regarding Table 2. The original MCN paper didn't publish results on the ActivityNet Captions dataset, so I assume you re-evaluated their model on this dataset. My question is: did you follow their setting of dividing each video into 5-second segments, so that each moment candidate is composed of some number of contiguous segments?
Best

[X-CLIP] Question about Table 3

Hi, thanks for your great paper.
In Table 3 of this paper, the zero-shot performance of ActionCLIP is 40.8% and 58.3%, but according to Figure 3 of the ActionCLIP paper, the zero-shot performance on these two datasets is about 50% and 70%.
Is there any difference in implementation?

about RuntimeError

Hello, thanks for sharing!
When I run your code, I always hit the following problem (for all three datasets):

File "moment_localization/train.py", line 295, in
scheduler=scheduler)
File "/home/zq/reproduce/2D-TAN/moment_localization/../lib/core/engine.py", line 43, in train
self.hook('on_update', state)
File "/home/zq/reproduce/2D-TAN/moment_localization/../lib/core/engine.py", line 8, in hook
self.hooksname
File "moment_localization/train.py", line 203, in on_update
val_state = engine.test(network, iterator('val'), 'val')
File "/home/zq/reproduce/2D-TAN/moment_localization/../lib/core/engine.py", line 60, in test
for sample in state['iterator']:
File "/home/zq/anaconda3/envs/2dtan/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 576, in next idx, batch = self._get_batch()
File "/home/zq/anaconda3/envs/2dtan/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 553, in _get_batch success, data = self._try_get_batch()
File "/home/zq/anaconda3/envs/2dtan/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 519, in _try_get_batch raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 33782) exited unexpectedly

My environment:
Ubuntu 16.04.6
CUDA 9.0
Python 3.7.5
torch 1.1.0

looking forward to your reply

FileNotFoundError: [Errno 2] Unable to open file (unable to open file: name = './data/Charades-STA/vgg_rgb_features.hdf5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

Greetings!
I'm currently trying to run and train this model, but I've encountered some problems relating to the dataset.
I've followed the instructions to download Charades_v1_features_rgb.tar.gz from the official website and converted it to charades_vgg_rgb.hdf5. However, an error occurred saying that I don't have the file vgg_rgb_features.hdf5.
I wonder, do I have to download it from the Box drive link in the README.md? Since that download always seems to fail, I want to know whether it will work the same if I change the output file name to vgg_rgb_features.hdf5 instead of charades_vgg_rgb.hdf5 in convert_vgg_features_to_hdf5.py?

About Activitynet dataset

Hi, as #9 said, I could download the extracted features of ActivityNet. But I want to know how to use the file pac_activitynet_v1-3.hdf5, because I want to use this dataset with the Charades-STA model. Thank you very much if you can help me.
