yrcong / sttran

Spatial-Temporal Transformer for Dynamic Scene Graph Generation, ICCV2021

License: MIT License

Python 28.92% Makefile 0.01% Jupyter Notebook 64.45% Cython 1.14% MATLAB 0.07% C 2.34% C++ 0.53% Cuda 2.52% Shell 0.02%
video scene-graph iccv2021 relationship-detection

sttran's Introduction

Here is Yuren Cong!

sttran's Issues

About SGCLS

Hi!
I tried your code, but the evaluation results for sgcls are extremely low (close to 0), while the predcls and sgdet results match those in your paper.
Could you give me a hint about this problem, or check your sgcls code again?
Thank you so much!

view size is not compatible with input tensor for sgdet

Hi, thanks for your code and paper.
However, I have a small question.
When I run the training code in predcls or sgcls mode, everything is fine, but when I run it in sgdet mode, the following error appears:

File "/home/quhaoxuan/STTran/fasterRCNN/lib/model/rpn/rpn.py", line 50, in reshape
x = x.view(
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

I understand that this function seems to be triggered only in the sgdet setting.
Do you have any suggestions for resolving this error?
Many thanks in advance.
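
The error message itself points at the usual remedies in PyTorch: call `.reshape(...)`, or call `.contiguous()` before `.view(...)`. As background, the following is a minimal pure-Python sketch of what "contiguous" means for a tensor layout. The helper names are hypothetical, and the check is simplified (PyTorch's own rule is more permissive, for example for dimensions of size 1).

```python
from typing import Sequence, Tuple


def contiguous_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Row-major (C-order) strides, in elements, for a dense tensor of `shape`."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def is_contiguous(shape: Sequence[int], strides: Sequence[int]) -> bool:
    """True when `strides` describe a dense row-major layout for `shape`."""
    return tuple(strides) == contiguous_strides(shape)


# A (2, 3) tensor stored densely has strides (3, 1).
print(is_contiguous((2, 3), (3, 1)))  # True
# Transposing swaps shape and strides without moving data:
# shape (3, 2) with strides (1, 3) is no longer dense in row-major order.
print(is_contiguous((3, 2), (1, 3)))  # False
```

A view only reinterprets existing memory, so it can fail on a layout like the second one, whereas a reshape is allowed to copy the data when it has to.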

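About usage

Hello, I don't quite understand the usage section. What should I do after cloning? After cloning, do I run the five lines of code below, put the .pth model in the corresponding directory, and then start running the experiment commands below?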

use glove failed

Thanks for your code, it's helpful!
But an error occurred when I ran the train command:

The CKPT saved here: data/
spatial encoder layer num: 1 / temporal decoder layer num: 3
mode : predcls
save_path : data/
model_path : None
data_path : /home/abc/NewDisk/origin_datasets/ActionGenome/dataset/ag/
datasize : large
ckpt : None
optimizer : adamw
lr : 1e-05
nepoch : 10
enc_layer : 1
dec_layer : 3
bce_loss : False
-------loading annotations---------slowly-----------
--------------------finish!-------------------------
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
There are 7584 videos and 177330 valid frames
144 videos are invalid (no person), remove them
49 videos are invalid (only one frame), remove them
21643 frames have no human bbox in GT, remove them!
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
-------loading annotations---------slowly-----------
--------------------finish!-------------------------
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
There are 1750 videos and 56923 valid frames
41 videos are invalid (no person), remove them
19 videos are invalid (only one frame), remove them
8636 frames have no human bbox in GT, remove them!
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

loading word vectors from data/glove.6B.200d.pt
loading word vectors from /home/abc/NewDisk/model_zoo/STTran/glove.6B.200d.pt
__background__ -> __background__ 
fail on __background__

The program gets stuck at this point. Why?
Thanks!

About ag_database.pkl

Hello, sorry to bother you! I noticed that you used the file ag_database.pkl when pre-training faster_rcnn. Could you provide this file, or describe its exact format?

video action recognition visualization

In the introduction video on YouTube, the author visualizes the recognized human actions in video frames. Does the repository include the code for this video action recognition visualization? I would appreciate any help. Thank you very much.

How to use faster rcnn?

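Hello, sorry to bother you! To compile jwyang/faster-rcnn.pytorch at present, do I need to replace the files under the fasterRCNN directory of your repository with the files from https://github.com/jwyang/faster-rcnn.pytorch, modify the code in https://github.com/yrcong/STTran/blob/main/fasterRCNN/lib/model/faster_rcnn/faster_rcnn.py, and then compile following the jwyang/faster-rcnn.pytorch README? Also, the environment requirements of jwyang's faster rcnn seem to differ from those in your README. Will that cause problems?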

Faster RCNN configuration

Hi, I am trying to use the pre-trained object detector's model with detectron2 code, since https://github.com/jwyang/faster-rcnn.pytorch is deprecated (and I could not fix the incompatibility issues with current libraries anyway). For this purpose, I used the configuration yaml file (faster_rcnn_R_101_C4_3x) from the detectron2 model zoo, changing the necessary values (mainly the anchor sizes and the number of classes). To match the pre-trained model with detectron2's ResNet architecture, I also had to change some parts of their code (such as the shapes of some layers). However, testing the STTran model with the adjusted detectron2 code yielded poor results. I tried to test the object detector separately (using the detectron2 evaluation code), and the AP results were below 0 except for the "person" class. I understand that this may not be a relevant issue, since the pre-trained model you provided might work well with the original code, but I'd appreciate it if you could provide the Faster RCNN configuration used in training your model. Thank you.

customized inputs

Is there any suggestion on how to run the model on customized input videos?

Thank you!

clean_class in ObjectClassifier

Thanks for your wonderful work.
It seems that, in sgdet mode, clean_class replaces the category of detections originally classified as class_idx with their second most confident category. Could you explain why this operation is performed here?
Thanks again.
Waiting for your reply :)
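
To make the behaviour described in this question concrete, here is a minimal sketch of "replace a given class with the next most confident class". It is written from the description above, not from the repository; the function name, argument layout and toy scores are assumptions.

```python
from typing import List, Sequence


def second_choice_labels(scores: Sequence[Sequence[float]],
                         labels: Sequence[int],
                         class_idx: int) -> List[int]:
    """Replace every label equal to `class_idx` with the best remaining class."""
    cleaned = []
    for row, label in zip(scores, labels):
        if label == class_idx:
            # Rank all classes except the one being removed and keep the best.
            candidates = [c for c in range(len(row)) if c != class_idx]
            label = max(candidates, key=lambda c: row[c])
        cleaned.append(label)
    return cleaned


# Two detections with per-class scores; the first was labelled as class 1.
scores = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
labels = [1, 0]
print(second_choice_labels(scores, labels, class_idx=1))  # [2, 0]
```

Only detections whose current label equals `class_idx` change; every other label passes through untouched.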

Trained model file doesn't open

I downloaded the trained model files (predcls.tar, ...) from your GitHub.
I tried to extract them using "tar -xvf predcls.tar",
but it gives me these error messages:

tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors

Do you know of a solution, or what the problem might be?
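
A file ending in .tar is not necessarily a tar archive. A log posted elsewhere on this page shows `model_path : .../predcls.tar` followed by "CKPT ... predcls.tar is loaded", which suggests the file is handed to the test script as-is rather than extracted. One plausible reading (an assumption, not confirmed by the author here) is that it is a checkpoint written with `torch.save`, which would be read with `torch.load`. The sketch below only inspects the container type using the standard library; the function name is hypothetical.

```python
import tarfile
import zipfile


def describe_container(path: str) -> str:
    """Report whether `path` is a tar archive, a zip archive, or neither."""
    if tarfile.is_tarfile(path):
        return "tar"
    if zipfile.is_zipfile(path):
        return "zip"
    return "other"
```

If this returns "zip" or "other" for predcls.tar, the "This does not look like a tar archive" message from `tar` is expected and does not by itself mean the download is corrupted.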

Two rounds of filtering

Why does action_genome.py run and print the filtering twice?
-------loading annotations---------slowly-----------
--------------------finish!-------------------------
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
There are 7584 videos and 177330 valid frames
144 videos are invalid (no person), remove them
49 videos are invalid (only one frame), remove them
21643 frames have no human bbox in GT, remove them!
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
-------loading annotations---------slowly-----------
--------------------finish!-------------------------
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
There are 1750 videos and 56923 valid frames
41 videos are invalid (no person), remove them
19 videos are invalid (only one frame), remove them
8636 frames have no human bbox in GT, remove them!
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Pre-trained model

Hello, was the pre-trained model faster_rcnn_ag.pth in your project trained with the PyTorch-1.0 branch of faster rcnn, or with an earlier version?

clean_class

Hello, why are 5, 8 and 17 passed as class_idx to clean_class in ObjectClassifier?

Query about "Other Relationships"

Hello,
Thanks a lot for your nice work.

In the paper, you mention evaluating on 25 classes; however, your dataloader includes "Other Relationships" as a contact class, which makes 26 classes in total. Is there a particular reason for using it?

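About the AG dataset

I found that the original AG dataset paper says there are 26 relationships in total, and the dataset is annotated accordingly.
But in your AG dataset (after removing the cases that do not qualify, such as no person and only one frame), I found only 16 relationships. Does that mean there is no data for the remaining 10?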

Problem with torch.utils.ffi

Hello,

I am trying to run your code but I'm coming across some issues regarding torch.utils.ffi.
I compiled draw_rectangles and fpn, then compiled faster rcnn as instructed and copied the files over the fasterRCNN folder inside STTran. But when I run the code, it crashes while resolving the dependencies from line 10 of test.py (from lib.object_detector import detector) with the following error:

File "/media/hdd5/raphael/HOI/STTran/test.py", line 10, in <module>
from lib.object_detector import detector
File "/media/hdd5/raphael/HOI/STTran/lib/object_detector.py", line 10, in <module>
from fasterRCNN.lib.model.faster_rcnn.resnet import resnet
File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/faster_rcnn/resnet.py", line 6, in <module>
from fasterRCNN.lib.model.faster_rcnn.faster_rcnn import _fasterRCNN
File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/faster_rcnn/faster_rcnn.py", line 10, in <module>
from model.rpn.rpn import _RPN
File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/rpn/rpn.py", line 8, in <module>
from .proposal_layer import _ProposalLayer
File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/rpn/proposal_layer.py", line 20, in <module>
from model.nms.nms_wrapper import nms
File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/nms/nms_wrapper.py", line 10, in <module>
from model.nms.nms_gpu import nms_gpu
File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/nms/nms_gpu.py", line 4, in <module>
from ._ext import nms
File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/nms/_ext/nms/__init__.py", line 2, in <module>
from torch.utils.ffi import _wrap_function
File "/media/hdd5/raphael/HOI/HOI_env/lib/python3.8/site-packages/torch/utils/ffi/__init__.py", line 1, in <module>
raise ImportError("torch.utils.ffi is deprecated. Please use cpp extensions instead.")
ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.

This module seems to be deprecated for PyTorch versions > 0.4, but the repo says to use 1.1. Strangely, I tried downgrading to torch==0.4.1, but the same error persists.
Any ideas on how to work around this would be appreciated!

About training time

Hello, how long did you train the whole network to reach the current results?

the Action Genome Dump frames of this paper

In your README, you say to follow JingweiJ/ActionGenome to get the dataset.
Could you tell me which version of ffmpeg you used? I followed JingweiJ/ActionGenome and got empty folders for all the videos.
By the way, did you use 'python tools/dump_frames.py --all_frames' or just 'python tools/dump_frames.py' to get the dataset?
Thanks for your work!

ModuleNotFoundError: No module named 'lib.fpn.box_intersections_cpu.bbox'

Thanks for your great work.
When I followed the README and tried to run the train command:

python train.py -mode predcls -datasize large -data_path $DATAPATH

I ran into some problems.
First, dataloader/action_genome.py uses from scipy.misc import imread. The latest version of scipy no longer seems to support the imread function, so I changed it to from imageio import imread.

Second, when I continued trying to run the project, a new problem occurred:
ModuleNotFoundError: No module named 'lib.fpn.box_intersections_cpu.bbox'
This means the file was not generated in this directory, so I went back to

cd fpn/box_intersections_cpu
python setup.py install (I found that the command does not work if I don't add 'install')

and then

~/project/STTran/lib/fpn/box_intersections_cpu$ python setup.py install
running install
running build
running build_ext
running install_lib
running install_egg_info
Removing /home/liujingwei/anaconda3/envs/scene_graph_benchmark/lib/python3.7/site-packages/bbox_cython-0.0.0-py3.7.egg-info
Writing /home/liujingwei/anaconda3/envs/scene_graph_benchmark/lib/python3.7/site-packages/bbox_cython-0.0.0-py3.7.egg-info

So it did not generate the file (lib.fpn.box_intersections_cpu.bbox), and I don't know how to fix it. I would appreciate your help. ^_^
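
One plausible explanation, offered as an assumption: `python setup.py install` places the compiled module in site-packages (the log above mentions a `bbox_cython` egg-info there) rather than next to the sources, so the dotted path `lib.fpn.box_intersections_cpu.bbox` still does not resolve. The setuptools command that builds an extension next to its source files is `python setup.py build_ext --inplace`. The sketch below is a quick standard-library check of whether a dotted module path can be found; the helper name is hypothetical.

```python
import importlib.util


def extension_is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be located on the current sys.path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (for example `lib.fpn`) is itself missing.
        return False


# Run from the repository root so that `lib` resolves as a package.
print(extension_is_importable("lib.fpn.box_intersections_cpu.bbox"))
```

Running this before and after the build step shows whether the compiled file landed where the import expects it.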

About Frame Encoding

Hi,
In the Temporal Decoder,

"we customize the Frame Encodings to inject the temporal position in the relationship representations. The frame encodings E_f are constructed with learned embedding parameters"

I can't understand this sentence. Could you explain the frame encoding?
Thanks.
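
One way to read the quoted sentence, sketched under assumptions: a small table holds one learned vector per frame position in the temporal window, and that vector is added to every relationship representation belonging to that frame before attention is computed. The code below uses plain Python lists; the table is filled with random numbers only as a stand-in for learned parameters, and all names are hypothetical rather than taken from the repository.

```python
import random
from typing import List

Vector = List[float]


def add_frame_encodings(frames: List[List[Vector]],
                        frame_table: List[Vector]) -> List[List[Vector]]:
    """Add the encoding of window position t to every relationship vector in frame t."""
    encoded = []
    for t, relations in enumerate(frames):
        e_t = frame_table[t]
        encoded.append([[x + e for x, e in zip(vec, e_t)] for vec in relations])
    return encoded


dim, window = 4, 2
random.seed(0)
# Stand-in for the learnable table: one vector per position in the window.
frame_table = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(window)]
# Frame 0 holds two relationship vectors, frame 1 holds one.
frames = [[[0.0] * dim, [1.0] * dim], [[2.0] * dim]]
queries_and_keys = add_frame_encodings(frames, frame_table)
```

Relationships within the same frame share one encoding, so the encoding tells the attention layer which frame a vector came from rather than which relationship it is.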

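Threshold 0.9

I have a question I hope you can help me with. In the study of how the threshold affects Recall@K under the semi-constraint setting, STTran consistently outperforms the other three models at every threshold level from 0.7 to 0.95. How was the confidence threshold of 0.9 chosen?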

About Pytorch Version

Hi,
My GPU only supports CUDA >= 11, which only supports PyTorch >= 1.7,
so I cannot use PyTorch 1.1.

Performance very different to Action Genome baselines

Thanks for sharing the nice work!

But I find that the performance reported in the paper is very different from that of the methods in "Action Genome: Actions as Compositions of Spatio-temporal Scene Graphs" and "Detecting Human-Object Relationships in Videos".
What causes this?

ImportError: libcudart.so.10.0: cannot open shared object file: No such file or directory

Traceback (most recent call last):
File "train.py", line 13, in <module>
from lib.object_detector import detector
File "/home/xxx/project/STTran3/lib/object_detector.py", line 11, in <module>
from fasterRCNN.lib.model.faster_rcnn.resnet import resnet
File "/home/xxx/project/STTran3/fasterRCNN/lib/model/faster_rcnn/resnet.py", line 6, in <module>
from fasterRCNN.lib.model.faster_rcnn.faster_rcnn import _fasterRCNN
File "/home/xxx/project/STTran3/fasterRCNN/lib/model/faster_rcnn/faster_rcnn.py", line 10, in <module>
from fasterRCNN.lib.model.rpn.rpn import _RPN
File "/home/xxx/project/STTran3/fasterRCNN/lib/model/rpn/rpn.py", line 8, in <module>
from .proposal_layer import _ProposalLayer
File "/home/xxx/project/STTran3/fasterRCNN/lib/model/rpn/proposal_layer.py", line 21, in <module>
from fasterRCNN.lib.model.roi_layers import nms
File "/home/xxx/project/STTran3/fasterRCNN/lib/model/roi_layers/__init__.py", line 3, in <module>
from .nms import nms
File "/home/xxx/project/STTran3/fasterRCNN/lib/model/roi_layers/nms.py", line 3, in <module>
from fasterRCNN.lib.model import _C
ImportError: libcudart.so.10.0: cannot open shared object file: No such file or directory

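Hello, I ran into this error while trying to reproduce the project. Is it related to the CUDA version?
Using the nvidia-smi command, I saw that my CUDA version is 11.4, and I hit this problem with several PyTorch versions. Is it caused by a mismatched CUDA version?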
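
The file name in the error is itself a clue: the compiled `_C` extension is asking for `libcudart.so.10.0`, the CUDA 10.0 runtime library. Note that `nvidia-smi` reports the highest CUDA version the installed driver supports, not the toolkit a binary was built against. The sketch below simply pulls the required runtime version out of such a message; the function name is hypothetical.

```python
import re
from typing import Optional


def required_cudart_version(error_message: str) -> Optional[str]:
    """Extract the CUDA runtime version named in a `libcudart.so.X.Y` import error."""
    match = re.search(r"libcudart\.so\.(\d+(?:\.\d+)*)", error_message)
    return match.group(1) if match else None


msg = ("ImportError: libcudart.so.10.0: cannot open shared object file: "
       "No such file or directory")
print(required_cudart_version(msg))  # 10.0
```

If the extracted version differs from the CUDA runtime available in the environment, a likely remedy (an assumption, since the build setup is not shown here) is to rebuild the extension in the current environment instead of switching PyTorch versions.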

About released pretrained models

Hello, I tried to use the pretrained models, but the three tar files appear to be corrupted. Could you upload new models? Thanks.

Data related problem

Hello author, thank you very much for your excellent open-source work. After reproducing your results, I have the following questions. If you could provide me with answers, I would greatly appreciate it.
1. I found that the batch size cannot be changed in this code. I traced the issue to the following code. Why is this processing necessary? If it is removed, can the model work properly with different batch sizes after adjustment?
[image]
2. I noticed that you had a discussion with the author of ActionGenome and mentioned that the mAP results on the AG dataset processed using fastRCNN were not good. Have you ever used the AG dataset for multi-object detection? I ran a state-of-the-art multi-object detection model on the AG dataset and obtained very poor results. Do you have any relevant experiments that explain this phenomenon?

Trained model can't be unzipped

I downloaded your trained model. It is a tar file, and I have tried every extraction command on Linux, but it cannot be extracted. I would be very grateful for your help!

ffmpeg problem

When I run the frame-dumping code, only empty folders are generated, and it shows sh: 1: ffmpeg: not found. What is the cause?
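
The message `sh: 1: ffmpeg: not found` comes from the shell and means no `ffmpeg` executable is on the PATH, which would explain the empty output folders. A minimal standard-library check, with a hypothetical helper name, can be run before dumping frames:

```python
import shutil
from typing import Optional


def find_executable(name: str = "ffmpeg") -> Optional[str]:
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


path = find_executable("ffmpeg")
print(path if path else "ffmpeg is not on PATH; install it before dumping frames")
```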

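Questions about the training process

Hello, I have some questions about the training process. Is the fasterrcnn used for detection trained together with the rest of the system, or is it only used to extract image features? What is the purpose of the is_train code in the object_detector file?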

about object_bbox

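What is the format of the object_bbox position annotation in the file: top-left and bottom-right corner coordinates (x1, y1, x2, y2), or center, width and height (x, y, w, h)?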
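
The question does not settle which convention the annotation file uses, and this sketch does not claim to either. It only shows how the two conventions named in the question relate, so that a box can be converted once the format is known. The function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def center_to_corners(box: Box) -> Box:
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def corners_to_center(box: Box) -> Box:
    """Convert (x1, y1, x2, y2) to (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


print(center_to_corners((50, 40, 20, 10)))  # (40.0, 35.0, 60.0, 45.0)
```

A quick sanity check on real annotations is to draw one converted box on its frame and see whether it encloses the object.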

Question of FastRCNN when using the sgdet mode

Thanks for sharing the nice work!

I have successfully trained and tested the model in predcls mode, but when I tried to train the model in sgdet mode, I ran into a few problems.

In object_detector.py, fasterRCNN needs to return five values (Figure 1), but faster_rcnn.py returns eight values (Figure 2), and the roi_features needed in object_detector.py is not among them.

Could you tell me how to solve this problem? Thank you very much.
[Figure 1]
[Figure 2]

About evaluation metrics

Hello, sorry to bother you. I noticed that in your paper the SGDET Recall@10 under No Constraint is lower than under With Constraint. What causes this? Shouldn't No Constraint normally be higher than With Constraint?

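About the model

Hi, sorry to bother you again. I would like to ask about the model part of the paper. My advisor wants me to present it to him tomorrow, and there is probably not enough time to read the code. Thank you for your help!
As I understand it, the model first uses self-attention to obtain a contextual representation of the relationships in each frame. This gives T matrices X; according to the paper, each X should have size K(t)*1936. The paper then takes u frames, so Z = u*K(t)*1936. In the decoder stage, how is attention applied to Z?
In summary, I have three questions. First, how is Z added to E_f: is E_i added to all K(t) relationship representations? Second, how is attention applied to Z: are all x^k_t in a frame fused into a single representation, or is it done some other way? Third, in the Transformer decoders I usually see, K and V are the same; what is the reasoning behind making Q and K the same here?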

fasterRCNN compilation error with python 3.8 and pytorch 1.12 cuda 11.3

I'm trying to set up sttran from scratch and I'm getting errors while setting up the fasterRCNN package.
The command I run is:
STTran/fasterRCNN/lib: python setup.py install

fatal error: THC/THC.h: No such file or directory
    5 | #include <THC/THC.h>
      |          ^~~~~~~~~~~
compilation terminated.

save graph and graph visualisation

@yrcong Hi, I would like to save the scene graph. Is it possible to save the graphs in a nodes-and-edges format? I don't know whether the author has implemented scene graph saving.
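
As a starting point, and assuming the predictions can first be turned into (subject, predicate, object) triplets (that step depends on the model output and is not shown here), a nodes-and-edges structure can be built and serialized with the standard library. The function name, JSON layout and example triplets are hypothetical.

```python
import json
from typing import Dict, Iterable, List, Tuple

Triplet = Tuple[str, str, str]


def triplets_to_graph(triplets: Iterable[Triplet]) -> Dict[str, List]:
    """Collect unique nodes and labelled, directed edges from triplets."""
    nodes: List[str] = []
    edges: List[Dict[str, str]] = []
    for subject, predicate, obj in triplets:
        for name in (subject, obj):
            if name not in nodes:
                nodes.append(name)
        edges.append({"source": subject, "target": obj, "label": predicate})
    return {"nodes": nodes, "edges": edges}


graph = triplets_to_graph([("person", "holding", "cup"),
                           ("person", "sitting_on", "sofa")])
print(json.dumps(graph, indent=2))
```

For video, one such graph per frame, keyed by frame index, keeps the temporal order intact.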

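How to generate the pkl file

Hello, I am trying to run your code on my own dataset. I have generated data in txt format (containing the 2D bbox of each object and the relationships between objects), but I don't know how to convert the txt file into the pkl file this code uses as GT. I hope the author can provide support. Thank you!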

Test set for performance metrics

@yrcong What train-val-test split is used with the Action Genome dataset? From the codebase, it seems that the same test set was used as the validation set during training (in train.py) and as the test set (in test.py). Which dataset split is used for the metrics reported in the paper?

fasterrcnn

from fasterRCNN.lib.model import _C
ImportError: /home/valca509/STTran-main/fasterRCNN/lib/model/_C.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail36_typeMetaDataInstance_preallocated_7E

Decoder_lin in Object Classifier

Hi, I want to confirm: for the "sgdet" task, why is decoder_lin not used during testing?

Does it mean that the object labels predicted by fasterRCNN are used directly in "sgdet"? Then why is decoder_lin trained?

RuntimeError: cuda runtime error (78) : a PTX JIT compilation failed at

Wonderful code!
However, I encountered the following error. Did I miss something that caused it?

mode : predcls
save_path : /media/wow/disk2/AG/save
model_path : /media/wow/disk2/AG/predcls.tar
data_path : /media/wow/disk2/AG/dataset
datasize : large
ckpt : None
optimizer : adamw
lr : 1e-05
nepoch : 10
enc_layer : 1
dec_layer : 3
bce_loss : False
-------loading annotations---------slowly-----------
--------------------finish!-------------------------
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
There are 1750 videos and 56923 valid frames
41 videos are invalid (no person), remove them
19 videos are invalid (only one frame), remove them
8636 frames have no human bbox in GT, remove them!
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
loading word vectors from data/glove.6B.200d.pt
loading word vectors from /media/wow/disk2/AG/glove.6B.200d.pt
__background__ -> __background__
fail on __background__


CKPT /media/wow/disk2/AG/predcls.tar is loaded
THCudaCheck FAIL file=/home/cong/Dokumente/faster-rcnn.pytorch/lib/model/csrc/cuda/ROIAlign_cuda.cu line=297 error=78 : a PTX JIT compilation failed
Traceback (most recent call last):
File "test.py", line 80, in <module>
entry = object_detector(im_data, im_info, gt_boxes, num_boxes, gt_annotation, im_all=None)
File "/home/wow/anaconda2/envs/STTran/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/media/wow/disk2/STT/STTran-main/lib/object_detector.py", line 306, in forward
FINAL_FEATURES = self.fasterRCNN.RCNN_roi_align(FINAL_BASE_FEATURES, FINAL_BBOXES)
File "/home/wow/anaconda2/envs/STTran/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/media/wow/disk2/STT/STTran-main/fasterRCNN/lib/model/roi_layers/roi_align.py", line 58, in forward
input, rois, self.output_size, self.spatial_scale, self.sampling_ratio
File "/media/wow/disk2/STT/STTran-main/fasterRCNN/lib/model/roi_layers/roi_align.py", line 20, in forward
output = _C.roi_align_forward(input, roi, spatial_scale, output_size[0], output_size[1], sampling_ratio)
RuntimeError: cuda runtime error (78) : a PTX JIT compilation failed at /home/cong/Dokumente/faster-rcnn.pytorch/lib/model/csrc/cuda/ROIAlign_cuda.cu:297
