tencentyouturesearch / actiondetection-afsd
Code for CVPR2021 paper "Learning Salient Boundary Feature for Anchor-free Temporal Action Localization"
License: Other
Hi, I observed that the videos in the ANet dataset are trimmed to 768 frames, most likely to fit in GPU memory. My question: when feeding the data to the I3D backbone, is it sent with dimensions (batch, channel = 3, temporal = 768, height, width)? Or do you break it up into windows of 16 frames and feed them in repeatedly?
Hi @linchuming ,
Thanks for sharing your code. I wonder whether you could also upload the npy files for the ActivityNet dataset?
Thank you.
Hi,
I ran your code on very long videos and got a lot of false positives. What are the best values for conf_thresh and top_k?
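Hello, if I want to apply this framework to custom data, how should I structure the data format? For example, when each video contains a single action running from start to end.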
Hello,
Thank you for your great work,
I found that the number of classes in the config file for the thumos14 dataset is the actual number of classes + 1: thumos14 has 20 classes, while the config file is set to 21. I also tried it on my custom dataset and found that the number of classes in the config file must be set to the actual number of classes + 1; otherwise, it gives an error. So, what is that extra class? How can I find the original class indices after the action detection is complete?
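One possible reading, which this thread does not confirm, is that the extra class is a background label at index 0, so the actual classes occupy indices 1..N. Under that assumption only, mapping a predicted index back to a class name could be sketched as follows. CLASS_NAMES and index_to_name are hypothetical names used for illustration; they are not taken from the repository.

```python
# Assumption (not confirmed by the thread): index 0 is reserved for
# background, so the real classes are numbered 1..N.
CLASS_NAMES = ["BaseballPitch", "BasketballDunk", "Billiards"]  # illustrative subset


def index_to_name(pred_idx, class_names):
    """Map a predicted index (0 = background) back to a class name."""
    if pred_idx == 0:
        return "background"
    if not 1 <= pred_idx <= len(class_names):
        raise ValueError(f"index {pred_idx} outside 1..{len(class_names)}")
    return class_names[pred_idx - 1]


# The value that would go in the config: actual classes + 1.
num_classes_in_config = len(CLASS_NAMES) + 1
print(index_to_name(2, CLASS_NAMES), num_classes_in_config)
```

If the repository uses a different convention (for example, background as the last index), the offset in index_to_name would need to change accordingly.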
Thank you so much for your great work. I receive this error when I train on my custom dataset based on Thumos. I followed all of your templates for data annotations. Would you please help me?
0% 0/18218 [00:00<?, ?it/s]/home/nomad/anaconda3/envs/AFSD/lib/python3.8/site-packages/torch/nn/functional.py:3103: UserWarning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.
warnings.warn("The default behavior for interpolate/upsample with float scale_factor changed "
0% 17/18218 [00:11<2:03:32, 2.46it/s, loss=58.30155]/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [31,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [32,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [33,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [34,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [35,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [36,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [37,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [38,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [39,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [44,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [45,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [46,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [47,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [48,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [49,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [50,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [51,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [52,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [53,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [54,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [55,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [56,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
0% 17/18218 [00:12<3:36:36, 1.40it/s, loss=58.30155]
Traceback (most recent call last):
  File "AFSD/thumos14/train.py", line 281, in <module>
    run_one_epoch(i, net, optimizer, train_data_loader, len(train_dataset) // batch_size)
  File "AFSD/thumos14/train.py", line 174, in run_one_epoch
    loss_ct, loss_start, loss_end = forward_one_epoch(
  File "AFSD/thumos14/train.py", line 137, in forward_one_epoch
    loss_l, loss_c, loss_prop_l, loss_prop_c, loss_ct = CPD_Loss(
  File "/home/matthew/anaconda3/envs/AFSD/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/matthew/ActionDetection-AFSD/AFSD/thumos14/multisegment_loss.py", line 254, in forward
    N = max(pos.sum(), 1)
RuntimeError: CUDA error: device-side assert triggered
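An "index out of bounds" assertion in a scatter/gather kernel generally means that some index fell outside the valid range; with a custom dataset, one common source is a class label that is negative or not smaller than the configured number of classes. Whether that is the cause here is not confirmed by the log. A minimal sketch of a pre-training sanity check follows, assuming annotations are available as (start, end, label) tuples per video. That layout and the name find_bad_labels are assumptions for illustration, not the repository's format.

```python
def find_bad_labels(annotations, num_classes):
    """Return (video, label) pairs whose label falls outside [0, num_classes)."""
    bad = []
    for video, segments in annotations.items():
        for start, end, label in segments:
            if not 0 <= label < num_classes:
                bad.append((video, label))
    return bad


# Hypothetical annotations: video name -> list of (start, end, class label).
annotations = {
    "video_a": [(1.0, 4.5, 1), (6.0, 9.0, 3)],
    "video_b": [(0.5, 2.0, 5)],  # label 5 is invalid when num_classes is 5
}
bad = find_bad_labels(annotations, num_classes=5)
print(bad)
```

Setting the environment variable CUDA_LAUNCH_BLOCKING=1 before running makes CUDA errors surface at the call that triggered them, which can help confirm which operation received the bad index.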
Thanks for adding the code for multi-GPU training. However, changing the value of ngpu in config.py does not seem to work. For example, with ngpu=4, the program still trains only on GPU 0 instead of on the 4 GPUs 0, 1, 2, 3.
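@linchuming Hello, I ran into a problem when running python3 AFSD/anet_data/video2npy.py THREAD_NUM to generate the RGB npy input data. When the total duration of the sampled video exceeds 1 minute, ret, frame = cap.read() returns ret as False, while count = cap.get(cv2.CAP_PROP_FRAME_COUNT) is 770. For videos with the same count of 770 but a total duration under 1 minute, ret is True. I don't know what causes this; could you help me? Another strange thing is that when I download the videos that cannot be read correctly to my local laptop, they can all be read there.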
Hi,
Thank you for your open source work.
As for ActivityNet, how many GPUs are used in your paper?
Thanks!
Hello,
Thanks for your great work. Is there any code/tool to calculate the accuracy/mAP for each individual class?
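For reference, per-class average precision at a single temporal IoU threshold can be sketched in plain Python as below. This is a simplified, non-interpolated AP written for illustration; it is not the official THUMOS14 or ActivityNet evaluation code and may give slightly different numbers. The input layout and all function names are assumptions.

```python
def tiou(a, b):
    """Temporal IoU of two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, thresh):
    """AP for one class.

    preds: list of (video, start, end, score)
    gts:   list of (video, start, end)
    """
    if not gts:
        return 0.0
    matched = set()
    hits = []
    # Visit predictions from highest to lowest score.
    for video, start, end, _ in sorted(preds, key=lambda p: -p[3]):
        best, best_i = 0.0, None
        for i, (g_video, g_start, g_end) in enumerate(gts):
            if g_video != video or i in matched:
                continue
            iou = tiou((start, end), (g_start, g_end))
            if iou > best:
                best, best_i = iou, i
        if best_i is not None and best >= thresh:
            matched.add(best_i)  # each ground truth is matched at most once
            hits.append(1)
        else:
            hits.append(0)
    # Sum the precision at every true positive, divide by the number of GTs.
    tp, ap = 0, 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            tp += 1
            ap += tp / rank
    return ap / len(gts)


def per_class_ap(preds_by_class, gts_by_class, thresh=0.5):
    """Return {class name: AP} for every class that has ground truth."""
    return {
        name: average_precision(preds_by_class.get(name, []), gts, thresh)
        for name, gts in gts_by_class.items()
    }


# Hypothetical toy data for a single class.
gts = {"jump": [("v1", 0.0, 10.0), ("v1", 20.0, 30.0)]}
preds = {"jump": [("v1", 0.0, 10.0, 0.9),
                  ("v1", 50.0, 60.0, 0.8),
                  ("v1", 21.0, 30.0, 0.7)]}
result = per_class_ap(preds, gts, thresh=0.5)
print(result)
```

Averaging the returned values over classes gives an mAP at that threshold; repeating the call for several thresholds gives one column per threshold.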
Hello, do you remove the background data provided by the thumos 14 dataset during training and testing?
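Hello, on my own computer (cuda11.2) I can run setup.py and the subsequent programs, but when training on a 3090 server (cuda11.1, cuda11.4), boundary_max_pooling_cuda always fails with: cuda runtime error (209): no kernel image is available for execution on the device.
I have tried many torch and cuda versions, but it does not seem to be a version mismatch problem.
Could you help me? Thank you.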
UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified.
warnings.warn('Error checking compiler version for {}: {}'.format(compiler, error))
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe' failed with exit status 2
@linchuming
I have trained the flow model and the RGB model myself, and each one individually gives better results than the original. But when I test with the fusion method, the final results are even worse. How can this be explained?
In gen_denseflow_npy.py:
Following I3D data preprocessing, for the flow stream, we convert the videos to grayscale, and pixel values are truncated to the range [-20, 20], then rescaled between -1 and 1. We only use the first two output dimensions, and apply the same cropping as for RGB.
But I don't see the operation that rescales the flow from [-20, 20] to [-1, 1].
Thanks for your work.
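For reference, the truncate-then-rescale step in the quoted description amounts to clipping each flow value to [-20, 20] and dividing by 20. A dependency-free sketch of that arithmetic follows; it is not the code from gen_denseflow_npy.py, and rescale_flow is a hypothetical name.

```python
BOUND = 20.0  # truncation range from the quoted description: [-20, 20]


def rescale_flow(values, bound=BOUND):
    """Clip each flow value to [-bound, bound], then scale it to [-1, 1]."""
    return [max(-bound, min(bound, v)) / bound for v in values]


flow = [-35.0, -20.0, -5.0, 0.0, 10.0, 42.0]
scaled = rescale_flow(flow)
print(scaled)
```

On NumPy arrays the same operation would be np.clip(flow, -20, 20) / 20.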
Hi,
Congrats on your awesome work.
I just want to know why the UntrimmedNet result is used during post-processing. After reading your paper, it is evident that this work is a localization network (classification + proposals), so why does UntrimmedNet come in here? Isn't this network supposed to give you action classification as well?
Thanks in advance
I have trained the AFSD rgb model on THUMOS14 dataset as described in Implementation Details, and the experiment results are as follows:
0.3 | 0.4 | 0.5 | 0.6 | 0.7 | Avg.
57.7 | 52.5 | 44.6 | 35.1 | 23.4 | 42.6
However, the results are still about 1.0 lower than the value in the paper.
Could you offer help and figure out this problem?
Thanks a lot.
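Thank you very much for the open-source work! When using the code I get the error CUDA_runtime error (98).
The error location is: AFSD/prop_pooling/boundary_max_pooling_kernel.cu:110
My guess is that something went wrong with the CUDA extension.
My environment:
pytorch 1.4.0
torchvision 0.5.0
cuda: 10.0
Also: is there a CPU version of boundary_max_pooling_kernel? Thank you very much!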
Hi @linchuming,
Thanks for sharing the code. I wonder whether this code only supports single-GPU training?
Thank you very much for your work!
When I run AFSD/anet/test.py, I run into this problem.
Hope to get your answer!
How should I understand the use of bounds?
Thanks!
Hi,
Congratulations on such nice work! Also, thank you for open-sourcing the code!
We are trying to use this code on our raw untrimmed videos and want to use this framework for temporal action localization.
We have our own non-standard data: videos of 15 minutes on average, at 30 fps and a higher resolution (~500x900). We also have multiple actions in the videos.
For ActivityNet, I see that the max frames value is specified as 768.
Could you please suggest whether we need to split the videos into clips, and what the length of each clip should be? Do we need to sample 256/768 frames uniformly? Or should we split clips based on the actions? Could you please point to any starter code that we could refer to?
Thanks.
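One common way to handle long untrimmed videos is to cut the sampled frame sequence into fixed-length, overlapping clips. The sketch below only computes the clip boundaries. The clip length of 256 echoes the number mentioned in the question, while the stride and the name sliding_windows are assumptions; this is not the repository's data pipeline.

```python
def sliding_windows(num_frames, clip_length, stride):
    """Return (start, end) frame ranges covering a video with overlapping clips."""
    if num_frames <= clip_length:
        return [(0, num_frames)]
    starts = list(range(0, num_frames - clip_length + 1, stride))
    # Add a final window so the tail of the video is not dropped.
    if starts[-1] + clip_length < num_frames:
        starts.append(num_frames - clip_length)
    return [(s, s + clip_length) for s in starts]


# Hypothetical numbers: 1000 sampled frames, 256-frame clips, 50% overlap.
windows = sliding_windows(num_frames=1000, clip_length=256, stride=128)
print(windows)
```

Overlapping windows reduce the chance that an action is cut at a clip border; detections from different windows would then need to be shifted back to video time and merged, for example with non-maximum suppression.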
Thanks for sharing! When I try to convert mp4 files to .npy files, some mp4 files cannot be read. I guess it's a cv2 version problem. Could you tell us which version of opencv-python you are using?
@linchuming Hello, you use the cuhk_val_simp_share.json file when AFSD predicts the class of proposals on the ActivityNet 1.3 dataset. Does the model that produced this json file use the temporal boundary annotations of the ActivityNet 1.3 training set when training the video classifier? Is the video classification score file predicted by the UntrimmedNet network?
In ActivityNet, each video contains only one action. If I want to use a long video containing multiple groups of actions, how should I train and test?
Thank you very much for your great work. I am getting this error while training on the Thumos dataset. Can you help me?
100% 200/200 [01:02<00:00, 3.20it/s]
0% 0/7842 [01:36<?, ?it/s]
Traceback (most recent call last):
  File "AFSD/thumos14/train.py", line 279, in <module>
    run_one_epoch(i, net, optimizer, train_data_loader, len(train_dataset) // batch_size)
  File "AFSD/thumos14/train.py", line 170, in run_one_epoch
    for n_iter, (clips, targets, scores, ssl_clips, ssl_targets, flags) in enumerate(pbar):
  File "D:\anaconda\envs\yyf\lib\site-packages\tqdm\std.py", line 1195, in __iter__
    for obj in iterable:
  File "D:\anaconda\envs\yyf\lib\site-packages\torch\utils\data\dataloader.py", line 355, in __iter__
    return self._get_iterator()
  File "D:\anaconda\envs\yyf\lib\site-packages\torch\utils\data\dataloader.py", line 301, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "D:\anaconda\envs\yyf\lib\site-packages\torch\utils\data\dataloader.py", line 914, in __init__
    w.start()
  File "D:\anaconda\envs\yyf\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "D:\anaconda\envs\yyf\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "D:\anaconda\envs\yyf\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "D:\anaconda\envs\yyf\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "D:\anaconda\envs\yyf\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
MemoryError
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\anaconda\envs\yyf\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "D:\anaconda\envs\yyf\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
I modified your code to get baseline results. I deleted all the network structure and loss functions after the "Basic Prediction Module" and got very poor results. But you report baseline results of 43.1, 31.0 and 19.0 in table (a) of your paper. I only modified the following three .py files: BDNet.py, multisegment_loss.py and train.py.
I kept only the two loss functions called "loss_loc_val" and "loss_conf_val", deleted the others, and got 0.05981531185934582, 0.029753291292292032 and 0.008829597885094938.
I don't know how you achieved the baseline results in your paper.
Hello,
Firstly, I'd like to thank you very much for publishing the code, and congratulations on the CVPR'21 paper.
I would like an overview of the architecture of the pyramid feature module in your pipeline. The paper notes that it is given in the supplementary material, but unfortunately I cannot get access to it.
Could you please share the PDF file of the supplementary material?
I am trying to extract RGB frames by following the ActivityNet README.
However, when I run video2npy.py, it cannot read frames for some videos.
In detail, VideoCapture.read() returns False while get(cv2.CAP_PROP_FRAME_COUNT) returns 770 frames.
The videos are not scaled to 112x112. (The videos were also generated by transform_videos.py.)
One of width and height is 112, but the other is different.
It seems that they keep the original aspect ratio during resizing.
Is that a problem? If so, how can I fix it?
How can I set "max_frame" for a very long video?
Thanks!
Hi, you don't mention "flow" anywhere in your paper. I want to know whether you use the optical flow model or not.
Hi Chuming, nice work, but when I unzip the TH14_Test_set_mp4.zip file, a password is required.
Based on the paper, should I use the command "python3 AFSD/anet/train_init.py configs/anet.yaml --lw=1 --cw=1 --piou=0.5" to train the net? Is lw=1 right? Why does my loss increase when I train?
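While reproducing the code I found that this repo does not support multi-GPU training. I am writing down my own workaround here, and I hope the author can release an updated multi-GPU version.
Training on 4 V100 GPUs; the changes are: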
train.py -> def forward_one_epoch(net, clips, targets, scores=None, training=True, ssl=True):
    if training:
        if ssl:
            tar = targets[0]
            pro = torch.stack([tar, tar, tar, tar], dim=0)
            output_dict = net(clips, proposals=pro, ssl=ssl)
        else:
            output_dict = net(clips, ssl=False)
            output_dict['priors'] = output_dict['priors'][0:126:]
I checked the code carefully against formulas (1) and (3) in the paper. I could not understand why ScaleExp() is used. In the code, "l_segment = new_priors - segments[:, :, :1]". Do we divide both parts of formula (3) by 2^l? Thank you!
Hello, I am new here! Thank you for your great work. In my case, setting the l_trip loss does not improve my model's performance!
I wonder why you chose the minimal action length "wmin" in a video as the length of the inserted clips. Is it in fact a hyperparameter? Could you please give me some advice?
I cannot understand your operation in the function augment_(). What is the meaning of new_input and new_annos? I am completely confused. Thank you!
I have just started looking into this area. Are there currently any methods that can perform action detection on real-time video streams?
I downloaded all the sampled video data (32.4G); the total number of these videos is 14950. But the total number of npy files I get after running step 3 is only 11171. When I run the RGB model inference I also get FileNotFoundError messages such as "No such file or directory: 'datasets/activitynet/train_val_npy_112/v_JDg--pjY5gg.npy'". I would appreciate some help.
What dataset did you use to train the I3D backbone model? Is it ImageNet?
Prepare the pre-processed RGB data.
How do I pre-process the RGB data?
Hi,
Could you please provide download links for the THUMOS14 RGB data numpy files other than the Weiyun link provided here? I am not able to access the link https://share.weiyun.com/bP62lmHj.
Something on GDrive or a link accessible with wget could work.
Thank you for your help!
Did you train it yourselves? What is the image input dimension of the model during pre-training?
Thanks for your nice work. Can you provide details about the training GPUs and training time?
I noticed that the number of videos in the thumos14_gt.json file is 410; it seems that 3 videos are missing from the test part. I checked that 'video_test_0000270' is not in thumos14_gt.json. Does this affect the evaluation result?
Hi, @linchuming,
I wonder how to evaluate inference speed, i.e. the results reported in Table 3 of the paper?
Hi, have you tried using pre-extracted I3D features? Since this method involves fine-tuning the I3D model, it may result in an unfair comparison with other methods.