yrcong / STTran
Spatial-Temporal Transformer for Dynamic Scene Graph Generation, ICCV2021
License: MIT License
Hi!
I tried your code and found that the evaluation results for sgcls are extremely low (close to 0), while the predcls and sgdet results match your paper.
Could you give me a hint about this problem, or check your sgcls code again?
Thank you so much!
Hi, thanks for your code and paper.
However, I have a small question.
When I run the training code in predcls or sgcls mode, everything is fine, but when I run it in sgdet mode, the following error appears:
File "/home/quhaoxuan/STTran/fasterRCNN/lib/model/rpn/rpn.py", line 50, in reshape
x = x.view(
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
As far as I understand, this function is only triggered in the sgdet setting.
Are there any suggestions for resolving this error?
Many thanks in advance.
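Hello, I don't quite understand the usage section: what should I do after cloning? Do I run the five lines of code below after cloning, put the .pth model in the corresponding directory, and then start running the experiment code below?
Thanks for your code, it's helpful!
However, an error occurred when I ran the train command. The output looks like this: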
The CKPT saved here: data/
spatial encoder layer num: 1 / temporal decoder layer num: 3
mode : predcls
save_path : data/
model_path : None
data_path : /home/abc/NewDisk/origin_datasets/ActionGenome/dataset/ag/
datasize : large
ckpt : None
optimizer : adamw
lr : 1e-05
nepoch : 10
enc_layer : 1
dec_layer : 3
bce_loss : False
-------loading annotations---------slowly-----------
--------------------finish!-------------------------
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
There are 7584 videos and 177330 valid frames
144 videos are invalid (no person), remove them
49 videos are invalid (only one frame), remove them
21643 frames have no human bbox in GT, remove them!
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
-------loading annotations---------slowly-----------
--------------------finish!-------------------------
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
There are 1750 videos and 56923 valid frames
41 videos are invalid (no person), remove them
19 videos are invalid (only one frame), remove them
8636 frames have no human bbox in GT, remove them!
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
loading word vectors from data/glove.6B.200d.pt
loading word vectors from /home/abc/NewDisk/model_zoo/STTran/glove.6B.200d.pt
__background__ -> __background__
fail on __background__
The program gets stuck at this point, and I wonder why.
Thanks!
I extracted the formatted triplet predictions from the results, but how can I assign the triplets to their frames?
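A minimal sketch of the grouping step, assuming each extracted triplet comes with the index of the frame it was predicted for. The names `group_triplets_by_frame`, `triplets`, and `frame_indices` are hypothetical and not taken from the repository; how the frame index is obtained from the model output is not stated in this thread.

```python
from collections import defaultdict


def group_triplets_by_frame(triplets, frame_indices):
    """Group (subject, predicate, object) triplets by frame.

    Assumption: frame_indices[i] is the frame index of triplets[i].
    """
    if len(triplets) != len(frame_indices):
        raise ValueError("each triplet needs exactly one frame index")
    per_frame = defaultdict(list)
    for triplet, frame_idx in zip(triplets, frame_indices):
        per_frame[int(frame_idx)].append(triplet)
    return dict(per_frame)


# Made-up example data: two triplets in frame 0, one in frame 1.
triplets = [
    ("person", "holding", "cup"),
    ("person", "sitting_on", "chair"),
    ("person", "looking_at", "laptop"),
]
frame_indices = [0, 0, 1]
graphs = group_triplets_by_frame(triplets, frame_indices)
```

Each key of `graphs` is then one frame, and its value is the list of triplets for that frame.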
Hello, sorry to bother you! I noticed that you used the file ag_database.pkl when pre-training faster_rcnn. Could you provide this file, or describe its exact format?
I just want to use the pretrained model files to detect scene graphs in custom videos/images, without downloading the AG dataset or training for now. However, I have not been able to make the necessary modifications to the code. Could someone please help me out?
In the introduction video on YouTube, the author visualizes the recognition of human actions in video frames. Is the code for this visualization included in the repository? I hope the author can provide some help. Thank you very much.
Hello, sorry to bother you! To compile jwyang/faster-rcnn.pytorch now, do I need to replace the files under the fasterRCNN directory of your repository with the files from https://github.com/jwyang/faster-rcnn.pytorch, modify the code in https://github.com/yrcong/STTran/blob/main/fasterRCNN/lib/model/faster_rcnn/faster_rcnn.py, and then compile according to the jwyang/faster-rcnn.pytorch README? Also, the environment requirements of the Faster R-CNN in jwyang's repository seem to differ from the environment in your README. Will that cause problems?
Hi, I am trying to use the pre-trained object detector model with the detectron2 code, since https://github.com/jwyang/faster-rcnn.pytorch is deprecated (and I could not fix the incompatibility issues with current libraries anyway). For this purpose, I used the configuration YAML file (faster_rcnn_R_101_C4_3x) from the detectron2 model zoo and changed the necessary values (mainly the anchor sizes and the number of classes). To match the pre-trained model with the detectron2 ResNet architecture, I also had to change some parts of their code (such as the shapes of some layers). However, testing the STTran model with the adjusted detectron2 code yielded poor results. I tested the object detector separately (using the detectron2 evaluation code), and the AP results were below 0 except for the "person" class. I understand that this may not be a relevant issue, since the pre-trained model you provided might work well with the original code, but I'd appreciate it if you could provide the Faster R-CNN configuration used to train your model. Thank you.
Is there any suggestion on how to run the model on customized input videos?
Thank you!
Thanks for your wonderful work.
It seems that, in sgdet mode, clean_class replaces the category of the detections originally classified as class_idx with their second-highest-confidence category. Could you explain why this operation is performed here?
Thanks again.
Waiting for your reply : )
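To make the behaviour described in the question concrete, here is a minimal sketch of "fall back to the second-highest-confidence class" in plain Python. It only illustrates the operation as the question describes it; it is not the repository's clean_class implementation, and `replace_with_runner_up`, `scores`, and `banned_class` are hypothetical names.

```python
def replace_with_runner_up(scores, banned_class):
    """Pick one label per box from per-class scores.

    Boxes whose best-scoring class is `banned_class` are relabelled
    with their second-best class; all other boxes keep their best class.
    """
    labels = []
    for row in scores:
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        top = ranked[0]
        labels.append(ranked[1] if top == banned_class else top)
    return labels


# Made-up scores for two boxes over three classes.
scores = [
    [0.1, 0.7, 0.2],  # best class is 1, runner-up is 2
    [0.6, 0.3, 0.1],  # best class is 0
]
labels = replace_with_runner_up(scores, banned_class=1)
```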
I downloaded the trained model files (predcls.tar, ...) from your GitHub
and tried to untar them using "tar -xvf predcls.tar",
but it gives me these error messages:
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors
Do you know of a solution, or what the problem might be?
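One thing worth checking, sketched below under an assumption: the downloaded files may be ordinary PyTorch checkpoints that merely carry a .tar extension (a test log elsewhere on this page passes predcls.tar directly as model_path and reports it as loaded). In that case they would be handed to the test script or to torch.load rather than extracted. The helper `describe_checkpoint` is a hypothetical name; it uses only the standard library to guess the container format.

```python
import tarfile
import zipfile


def describe_checkpoint(path):
    """Guess the container format of a downloaded checkpoint file."""
    if tarfile.is_tarfile(path):
        return "tar archive: extract it with tar"
    if zipfile.is_zipfile(path):
        # Newer torch.save versions write a zip-based container.
        return "zip container: probably written by torch.save, try torch.load"
    # Older torch.save versions write a pickle stream, which is neither
    # tar nor zip; an incomplete download would also end up here.
    return "neither tar nor zip: possibly a legacy torch.save file or a broken download"


# Example call (the path is a placeholder):
# print(describe_checkpoint("predcls.tar"))
```

If the result is not "tar archive", the tar error above is expected, and the file size should be compared with the size shown on the download page to rule out a truncated download.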
Why does action_genome.py run the filtering and print the statistics twice?
-------loading annotations---------slowly-----------
--------------------finish!-------------------------
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
There are 7584 videos and 177330 valid frames
144 videos are invalid (no person), remove them
49 videos are invalid (only one frame), remove them
21643 frames have no human bbox in GT, remove them!
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
-------loading annotations---------slowly-----------
--------------------finish!-------------------------
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
There are 1750 videos and 56923 valid frames
41 videos are invalid (no person), remove them
19 videos are invalid (only one frame), remove them
8636 frames have no human bbox in GT, remove them!
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
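Hello, was the pretrained model faster_rcnn_ag.pth in your project trained with the PyTorch-1.0 branch of Faster R-CNN, or with an earlier version?
Hello, why are 5, 8, and 17 passed as class_idx to clean_class in ObjectClassifier?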
Hello,
Thanks a lot for your nice work.
In the paper, you mention evaluating on 25 classes. However, your dataloader includes "Other Relationships" as a contact class, which makes 26 classes in total. Is there any particular reason for using it?
I found that the original AG dataset paper states there are 26 relationships in total, and the dataset is annotated accordingly.
However, in your AG dataset (after removing the cases that do not qualify, such as no person and only one frame), I found only 16 relationships. Does that mean there is no data for the remaining 10?
Hello,
I am trying to run your code but I'm coming across some issues regarding torch.utils.ffi.
I compiled draw_rectangles and fpn, then compiled Faster R-CNN as instructed and copied the files into the fasterRCNN folder inside STTran. However, when I run the code and it resolves the dependencies starting from line 10 of test.py (from lib.object_detector import detector),
it crashes with the following error:
File "/media/hdd5/raphael/HOI/STTran/test.py", line 10, in <module>
from lib.object_detector import detector
File "/media/hdd5/raphael/HOI/STTran/lib/object_detector.py", line 10, in <module>
from fasterRCNN.lib.model.faster_rcnn.resnet import resnet
File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/faster_rcnn/resnet.py", line 6, in <module>
from fasterRCNN.lib.model.faster_rcnn.faster_rcnn import _fasterRCNN
File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/faster_rcnn/faster_rcnn.py", line 10, in <module>
from model.rpn.rpn import _RPN
File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/rpn/rpn.py", line 8, in <module>
from .proposal_layer import _ProposalLayer
File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/rpn/proposal_layer.py", line 20, in <module>
from model.nms.nms_wrapper import nms
File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/nms/nms_wrapper.py", line 10, in <module>
from model.nms.nms_gpu import nms_gpu
File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/nms/nms_gpu.py", line 4, in <module>
from ._ext import nms
File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/nms/_ext/nms/__init__.py", line 2, in <module>
from torch.utils.ffi import _wrap_function
File "/media/hdd5/raphael/HOI/HOI_env/lib/python3.8/site-packages/torch/utils/ffi/__init__.py", line 1, in <module>
raise ImportError("torch.utils.ffi is deprecated. Please use cpp extensions instead.")
ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.
This seems to be deprecated for PyTorch versions > 0.4, but the repo states that version 1.1 should be used. Strangely, I tried downgrading to torch==0.4.1, but the same error persists.
Any ideas on how to work around that would be appreciated!
Hello author, how long did you train the whole network to reach the current results?
Your README says to follow JingweiJ/ActionGenome to get the dataset.
Could you please tell me which version of ffmpeg you used? I followed JingweiJ/ActionGenome and got empty folders for all the videos.
By the way, did you use 'python tools/dump_frames.py --all_frames' or just 'python tools/dump_frames.py' to get the dataset?
Thanks for your work!
Thanks for your great work.
When I followed the README and tried to run the train command:
python train.py -mode predcls -datasize large -data_path $DATAPATH
I ran into some problems.
First, dataloader/action_genome.py contains from scipy.misc import imread. The latest version of scipy no longer seems to support the imread function, so I changed it to from imageio import imread.
Second, when I continued trying to run the project, a new problem occurred:
ModuleNotFoundError: No module named 'lib.fpn.box_intersections_cpu.bbox'
It means I did not generate the file in this directory, so I went back to
cd fpn/box_intersections_cpu
python setup.py install (I found that the command does not work if I don't add 'install')
and then
~/project/STTran/lib/fpn/box_intersections_cpu$ python setup.py install
running install
running build
running build_ext
running install_lib
running install_egg_info
Removing /home/liujingwei/anaconda3/envs/scene_graph_benchmark/lib/python3.7/site-packages/bbox_cython-0.0.0-py3.7.egg-info
Writing /home/liujingwei/anaconda3/envs/scene_graph_benchmark/lib/python3.7/site-packages/bbox_cython-0.0.0-py3.7.egg-info
So it did not generate the file (lib.fpn.box_intersections_cpu.bbox), and I don't know how to fix it. I may need your help. ^_^
Hi,
In the Temporal Decoder section:
"we customize the Frame Encodings to inject the temporal position in the relationship representations. The frame encodings E_f are constructed with learned embedding parameters"
I can't understand this sentence. Could you explain the frame encoding?
Thanks.
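One possible reading of the quoted sentence, sketched in plain Python under stated assumptions: there is one learned vector per frame position in the window, and that vector is added to every relationship representation of the corresponding frame before the frames are stacked into the decoder input. Whether this matches the paper's exact formulation is an assumption; `add_frame_encodings`, `window`, and `frame_encodings` are hypothetical names, and the tiny 2-dimensional vectors stand in for the real feature size.

```python
def add_frame_encodings(window, frame_encodings):
    """Add a per-frame encoding to every relation vector and stack the frames.

    window: list of frames, each a list of relation vectors.
    frame_encodings: one vector per frame position in the window
        (learned parameters in a real model, fixed numbers here).
    """
    if len(window) != len(frame_encodings):
        raise ValueError("need exactly one encoding per frame in the window")
    stacked = []
    for relations, encoding in zip(window, frame_encodings):
        for vec in relations:
            # Every relation in the same frame gets the same encoding.
            stacked.append([v + e for v, e in zip(vec, encoding)])
    return stacked


# Made-up window of two frames: 2 relations in the first, 1 in the second.
window = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0]],
]
frame_encodings = [[0.5, 0.5], [1.0, -1.0]]
decoder_input = add_frame_encodings(window, frame_encodings)
```

Under this reading, the encoding tells the decoder which frame a relationship came from, since attention itself has no notion of order.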
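I have a question that I hope you can help with. In the study of how this threshold affects Recall@K under the semi-constraint setting, STTran consistently outperforms the other three models at all threshold levels from 0.7 to 0.95. How was the confidence threshold of 0.9 determined?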
Hi
My GPU only supports CUDA >= 11, which in turn only supports PyTorch >= 1.7,
so I cannot use PyTorch 1.1.
Thanks for sharing the nice work!
But I find that the performance reported in the paper is very different from that of the methods in "Action Genome: Actions as Compositions of Spatio-temporal Scene Graphs" and "Detecting Human-Object Relationships in Videos".
What causes this?
Hi, can you release the pre-training code for the detector? Thank you.
I found that I need this file if I want to use sgcls or sgdet mode.
How can I generate it?
Traceback (most recent call last):
File "train.py", line 13, in <module>
from lib.object_detector import detector
File "/home/xxx/project/STTran3/lib/object_detector.py", line 11, in <module>
from fasterRCNN.lib.model.faster_rcnn.resnet import resnet
File "/home/xxx/project/STTran3/fasterRCNN/lib/model/faster_rcnn/resnet.py", line 6, in <module>
from fasterRCNN.lib.model.faster_rcnn.faster_rcnn import _fasterRCNN
File "/home/xxx/project/STTran3/fasterRCNN/lib/model/faster_rcnn/faster_rcnn.py", line 10, in <module>
from fasterRCNN.lib.model.rpn.rpn import _RPN
File "/home/xxx/project/STTran3/fasterRCNN/lib/model/rpn/rpn.py", line 8, in <module>
from .proposal_layer import _ProposalLayer
File "/home/xxx/project/STTran3/fasterRCNN/lib/model/rpn/proposal_layer.py", line 21, in <module>
from fasterRCNN.lib.model.roi_layers import nms
File "/home/xxx/project/STTran3/fasterRCNN/lib/model/roi_layers/__init__.py", line 3, in <module>
from .nms import nms
File "/home/xxx/project/STTran3/fasterRCNN/lib/model/roi_layers/nms.py", line 3, in <module>
from fasterRCNN.lib.model import _C
ImportError: libcudart.so.10.0: cannot open shared object file: No such file or directory
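Hello, I ran into this error while trying to reproduce the project. Is it related to the CUDA version?
I checked with nvidia-smi and my CUDA version is 11.4, and I hit this problem with several PyTorch versions. Is it caused by a mismatched CUDA version?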
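The ImportError names libcudart.so.10.0, which suggests the compiled extension (_C) was linked against the CUDA 10.0 runtime. Note that nvidia-smi reports the highest CUDA version the driver supports, not which runtime libraries are installed. A minimal sketch for checking whether that specific library can be loaded on the machine; `can_load` is a hypothetical helper name.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


# The library name is taken from the ImportError above.
for name in ("libcudart.so.10.0",):
    status = "loadable" if can_load(name) else "not found by the dynamic loader"
    print(f"{name}: {status}")
```

If the library is not found, the usual options are to install the matching CUDA runtime or to rebuild the extension against the CUDA version that the installed PyTorch was built with.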
Please help
Hello author, thank you very much for your excellent open-source work. After reproducing your results, I have the following questions. If you could provide me with answers, I would greatly appreciate it.
1. I found that the batch size cannot be changed in this code. I traced the issue to the following code. May I ask why this processing is necessary? If this code is not used, can the model work properly with different batch sizes after adjustment?
2. I noticed that you had a discussion with the author of ActionGenome and mentioned that the mAP results of Faster R-CNN on the AG dataset were not good. Have you ever used the AG dataset for multi-object detection? I ran a state-of-the-art multi-object detection model on the AG dataset and obtained very poor results. Do you have any relevant experiments that explain this phenomenon?
I downloaded your trained model. It is a tar file, and I have tried every extraction command on Linux, but it cannot be extracted. I would be very grateful for your help!
When I run the frame-generation code, all the generated folders are empty, and it shows sh: 1: ffmpeg: not found. What is the reason?
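The message sh: 1: ffmpeg: not found means the shell could not find an ffmpeg executable on PATH, so no frames were written. A minimal sketch for checking this from Python before running the frame-dumping script; `find_ffmpeg` is a hypothetical helper name.

```python
import shutil


def find_ffmpeg():
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


path = find_ffmpeg()
if path is None:
    print("ffmpeg is not on PATH; install it before dumping frames")
else:
    print(f"ffmpeg found at {path}")
```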
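Hello author, what does sptial_mask represent?
Hello author, I have some questions about the training process. When the whole system is trained, is the Faster R-CNN used for detection trained along with it, or is it only used to extract image features? What is the purpose of the is_train code in the object_detector file?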
What is the format of the object_bbox position annotation in the file: top-left and bottom-right corners (x1, y1, x2, y2), or center plus width and height (x, y, w, h)?
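Which convention the annotation file uses is not answered here. As a reference for whichever applies, this is a minimal sketch of converting size-based boxes into corner form; the function names are hypothetical. Note that (x, y, w, h) is also often used with (x, y) as the top-left corner rather than the center, so both variants are shown.

```python
def corners_from_center(cx, cy, w, h):
    """Convert a center/size box (cx, cy, w, h) to corners (x1, y1, x2, y2)."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def corners_from_top_left(x, y, w, h):
    """Convert a top-left/size box (x, y, w, h) to corners (x1, y1, x2, y2)."""
    return (x, y, x + w, y + h)


# Made-up boxes for illustration.
from_center = corners_from_center(50, 50, 20, 10)
from_top_left = corners_from_top_left(10, 20, 30, 40)
```

Drawing a few converted boxes on their frames is a quick way to see which interpretation lines up with the objects.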
Thanks for sharing the nice work!
I have successfully trained and tested the model in predcls mode, but when I tried to train the model in sgdet mode, I ran into a few problems.
In object_detector.py, fasterRCNN needs to return five values (Figure 1), but in faster_rcnn.py, eight values are returned (Figure 2), and the roi_features needed in object_detector.py is not among them.
Could you tell me how to solve this problem? Thank you very much.
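Hello, sorry to bother you. I noticed that in your paper the SGDET Recall@10 under No Constraint is lower than under With Constraint. What causes this? Shouldn't No Constraint normally be higher than With Constraint?
Hi, sorry to bother you again. I would like to ask about the model in the paper. My advisor wants me to report to him tomorrow, and I will not have time to read the code, so I would appreciate your help!
I understand that the model first uses self-attention to obtain the contextual representation of the relationships in each frame. This yields T matrices X, each of which, according to the paper, should be K(t) × 1936. The paper then says there are u frames, so Z = u × K(t) × 1936. In the decoder stage, how is attention applied to Z?
To summarize, I have three questions. First, how is Z added to Ef: is Ei added to all K(t)? Second, how is attention applied to Z: are all x^k_t in a frame fused into one representation, or is attention done some other way? Third, in the transformer decoders I see most often, K and V are identical; what is the reasoning behind making Q and K identical here?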
I'm trying to set up STTran from scratch, and I'm getting errors while setting up the fasterRCNN package.
The command I run is:
STTran/fasterRCNN/lib: python setup.py install
fatal error: THC/THC.h: No such file or directory
5 | #include <THC/THC.h>
| ^~~~~~~~~~~
compilation terminated.
@yrcong Hi, I would like to save the scene graph. Is it possible to save the graphs in a nodes-and-edges format? I don't know whether the author has implemented scene graph saving.
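A minimal sketch of one way to store a frame's scene graph as nodes and edges in JSON, assuming the predictions are already available as (subject, predicate, object) label triplets. The function name and the JSON layout are hypothetical, not something the repository provides.

```python
import json


def triplets_to_graph(triplets):
    """Turn (subject, predicate, object) triplets into a nodes/edges dict.

    Simplification: objects with the same label are merged into one node.
    """
    nodes = []
    node_ids = {}
    edges = []
    for subj, pred, obj in triplets:
        for name in (subj, obj):
            if name not in node_ids:
                node_ids[name] = len(nodes)
                nodes.append({"id": node_ids[name], "label": name})
        edges.append({"source": node_ids[subj], "target": node_ids[obj], "label": pred})
    return {"nodes": nodes, "edges": edges}


# Made-up triplets for a single frame.
graph = triplets_to_graph([
    ("person", "holding", "cup"),
    ("person", "sitting_on", "chair"),
])
serialized = json.dumps(graph, indent=2)
```

Writing `serialized` to one file per frame keeps the output easy to load into graph tools later. To keep two instances of the same class apart, the node key would need to include a box or track index instead of the label alone.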
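Hello author, I am trying to reproduce your code with my own dataset. I have generated the data in txt format (containing each object's 2D bbox and the relationships between objects), but I don't know how to convert the txt file into the pkl file that this code uses as GT. I hope the author can provide support. Thank you!
@yrcong What train-val-test split is used with the Action Genome dataset? From the codebase, it seems that the same test set is used as the validation set during training (in train.py) and as the test set (in test.py). What dataset split is used for the metrics reported in the paper?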
from fasterRCNN.lib.model import _C
ImportError: /home/valca509/STTran-main/fasterRCNN/lib/model/_C.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail36_typeMetaDataInstance_preallocated_7E
Hi, I want to check: for the "sgdet" task, why is decoder_lin not used during testing?
Does it mean that the object labels predicted by fasterRCNN are used directly in "sgdet"? If so, why is decoder_lin trained?
Wonderful code!
However, I encountered the following error. Did I miss something that caused it?
mode : predcls
save_path : /media/wow/disk2/AG/save
model_path : /media/wow/disk2/AG/predcls.tar
data_path : /media/wow/disk2/AG/dataset
datasize : large
ckpt : None
optimizer : adamw
lr : 1e-05
nepoch : 10
enc_layer : 1
dec_layer : 3
bce_loss : False
-------loading annotations---------slowly-----------
--------------------finish!-------------------------
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
There are 1750 videos and 56923 valid frames
41 videos are invalid (no person), remove them
19 videos are invalid (only one frame), remove them
8636 frames have no human bbox in GT, remove them!
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
loading word vectors from data/glove.6B.200d.pt
loading word vectors from /media/wow/disk2/AG/glove.6B.200d.pt
__background__ -> __background__
fail on __background__
CKPT /media/wow/disk2/AG/predcls.tar is loaded
THCudaCheck FAIL file=/home/cong/Dokumente/faster-rcnn.pytorch/lib/model/csrc/cuda/ROIAlign_cuda.cu line=297 error=78 : a PTX JIT compilation failed
Traceback (most recent call last):
File "test.py", line 80, in <module>
entry = object_detector(im_data, im_info, gt_boxes, num_boxes, gt_annotation, im_all=None)
File "/home/wow/anaconda2/envs/STTran/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/media/wow/disk2/STT/STTran-main/lib/object_detector.py", line 306, in forward
FINAL_FEATURES = self.fasterRCNN.RCNN_roi_align(FINAL_BASE_FEATURES, FINAL_BBOXES)
File "/home/wow/anaconda2/envs/STTran/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/media/wow/disk2/STT/STTran-main/fasterRCNN/lib/model/roi_layers/roi_align.py", line 58, in forward
input, rois, self.output_size, self.spatial_scale, self.sampling_ratio
File "/media/wow/disk2/STT/STTran-main/fasterRCNN/lib/model/roi_layers/roi_align.py", line 20, in forward
output = _C.roi_align_forward(input, roi, spatial_scale, output_size[0], output_size[1], sampling_ratio)
RuntimeError: cuda runtime error (78) : a PTX JIT compilation failed at /home/cong/Dokumente/faster-rcnn.pytorch/lib/model/csrc/cuda/ROIAlign_cuda.cu:297