kenshohara / 3d-resnets-pytorch

3D ResNets for Action Recognition (CVPR 2018)

License: MIT License

Python 100.00%
deep-learning computer-vision pytorch python action-recognition video-recognition

3d-resnets-pytorch's Introduction

3D ResNets for Action Recognition

Update (2020/4/13)

We published a paper on arXiv.

Hirokatsu Kataoka, Tenga Wakamiya, Kensho Hara, and Yutaka Satoh,
"Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs?",
arXiv preprint, arXiv:2004.04968, 2020.

We have uploaded the pretrained models described in this paper, including a ResNet-50 pretrained on the combined Kinetics-700 and Moments in Time dataset.

Update (2020/4/10)

We significantly updated our scripts. If you want to use older versions to reproduce our CVPR 2018 paper, use the scripts in the CVPR2018 branch.

This update includes the following:

  • Refactoring of the whole project
  • Support for newer PyTorch versions
  • Support for distributed training
  • Support for training and testing on the Moments in Time dataset
  • Addition of R(2+1)D models
  • Upload of 3D ResNet models trained on the Kinetics-700, Moments in Time, and STAIR-Actions datasets

Summary

This is the PyTorch code for the following papers:

Hirokatsu Kataoka, Tenga Wakamiya, Kensho Hara, and Yutaka Satoh,
"Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs?",
arXiv preprint, arXiv:2004.04968, 2020.

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Towards Good Practice for Action Recognition with Spatiotemporal 3D Convolutions",
Proceedings of the International Conference on Pattern Recognition, pp. 2516-2521, 2018.

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?",
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6546-6555, 2018.

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition",
Proceedings of the ICCV Workshop on Action, Gesture, and Emotion Recognition, 2017.

This code includes training, fine-tuning, and testing on Kinetics, Moments in Time, ActivityNet, UCF-101, and HMDB-51.

Citation

If you use this code or pre-trained models, please cite the following:

@inproceedings{hara3dcnns,
  author={Kensho Hara and Hirokatsu Kataoka and Yutaka Satoh},
  title={Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={6546--6555},
  year={2018},
}

Pre-trained models

Pre-trained models are available here.
All models are trained on Kinetics-700 (K), Moments in Time (M), STAIR-Actions (S), or merged datasets of them (KM, KS, MS, KMS).
If you want to fine-tune the models on your dataset, specify the following options.

r3d18_K_200ep.pth: --model resnet --model_depth 18 --n_pretrain_classes 700
r3d18_KM_200ep.pth: --model resnet --model_depth 18 --n_pretrain_classes 1039
r3d34_K_200ep.pth: --model resnet --model_depth 34 --n_pretrain_classes 700
r3d34_KM_200ep.pth: --model resnet --model_depth 34 --n_pretrain_classes 1039
r3d50_K_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 700
r3d50_KM_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 1039
r3d50_KMS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 1139
r3d50_KS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 800
r3d50_M_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 339
r3d50_MS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 439
r3d50_S_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 100
r3d101_K_200ep.pth: --model resnet --model_depth 101 --n_pretrain_classes 700
r3d101_KM_200ep.pth: --model resnet --model_depth 101 --n_pretrain_classes 1039
r3d152_K_200ep.pth: --model resnet --model_depth 152 --n_pretrain_classes 700
r3d152_KM_200ep.pth: --model resnet --model_depth 152 --n_pretrain_classes 1039
r3d200_K_200ep.pth: --model resnet --model_depth 200 --n_pretrain_classes 700
r3d200_KM_200ep.pth: --model resnet --model_depth 200 --n_pretrain_classes 1039
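
For example, fine-tuning r3d50_KM_200ep.pth on UCF-101 combines these options with the fine-tuning command shown under "Running the code" below (a sketch; adjust the paths to your setup):

python main.py --root_path ~/data --video_path ucf101_videos/jpg --annotation_path ucf101_01.json \
--result_path results --dataset ucf101 --n_classes 101 --n_pretrain_classes 1039 \
--pretrain_path models/r3d50_KM_200ep.pth --ft_begin_module fc \
--model resnet --model_depth 50 --batch_size 128 --n_threads 4 --checkpoint 5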

Old pretrained models are still available here.
However, some modifications are required to use the old pretrained models in the current scripts.
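
The most common modification is sketched below: the old checkpoints were saved from a torch.nn.DataParallel wrapper, so every key in their state_dict carries a "module." prefix that a plain model rejects (see the "unexpected key module.conv1.weight" reports in the issues further down). This is a minimal sketch, not the repository's own code; the model construction via models/resnet.py is an assumption about your setup, and other differences between the old and new architectures (e.g. the shortcut type) may still need attention.

import torch

from models import resnet  # repo module; assumes this runs from the repo root

# Build a model matching the old checkpoint (here: resnet-50-kinetics.pth, 400 classes).
model = resnet.generate_model(model_depth=50, n_classes=400)

checkpoint = torch.load('resnet-50-kinetics.pth', map_location='cpu')
# Drop the "module." prefix that DataParallel added when the checkpoint was saved.
state_dict = {k.replace('module.', '', 1): v
              for k, v in checkpoint['state_dict'].items()}
model.load_state_dict(state_dict)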

Requirements

  • PyTorch (with torchvision):
    conda install pytorch torchvision cudatoolkit=10.1 -c soumith
  • FFmpeg, FFprobe

  • Python 3

Preparation

ActivityNet

  • Download videos using the official crawler.
  • Convert from mp4 to jpg files using util_scripts/generate_video_jpgs.py
python -m util_scripts.generate_video_jpgs mp4_video_dir_path jpg_video_dir_path activitynet
  • Add fps information to the json file using util_scripts/add_fps_into_activitynet_json.py
python -m util_scripts.add_fps_into_activitynet_json mp4_video_dir_path json_file_path

Kinetics

  • Download videos using the official crawler.
    • Locate test set in video_directory/test.
  • Convert from mp4 to jpg files using util_scripts/generate_video_jpgs.py
python -m util_scripts.generate_video_jpgs mp4_video_dir_path jpg_video_dir_path kinetics
  • Generate annotation file in json format similar to ActivityNet using util_scripts/kinetics_json.py
    • The CSV files (kinetics_{train, val, test}.csv) are included in the crawler.
python -m util_scripts.kinetics_json csv_dir_path 700 jpg_video_dir_path jpg dst_json_path

UCF-101

  • Download videos and train/test splits here.
  • Convert from avi to jpg files using util_scripts/generate_video_jpgs.py
python -m util_scripts.generate_video_jpgs avi_video_dir_path jpg_video_dir_path ucf101
  • Generate annotation file in json format similar to ActivityNet using util_scripts/ucf101_json.py
    • annotation_dir_path includes classInd.txt, trainlist0{1, 2, 3}.txt, testlist0{1, 2, 3}.txt
python -m util_scripts.ucf101_json annotation_dir_path jpg_video_dir_path dst_json_path

HMDB-51

  • Download videos and train/test splits here.
  • Convert from avi to jpg files using util_scripts/generate_video_jpgs.py
python -m util_scripts.generate_video_jpgs avi_video_dir_path jpg_video_dir_path hmdb51
  • Generate annotation file in json format similar to ActivityNet using util_scripts/hmdb51_json.py
    • annotation_dir_path includes brush_hair_test_split1.txt, ...
python -m util_scripts.hmdb51_json annotation_dir_path jpg_video_dir_path dst_json_path
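
After conversion, a quick sanity check helps catch videos whose frame extraction failed. A minimal sketch (not part of the repository; the jpg root path is an assumption matching the directory layout shown in the next section):

from pathlib import Path

jpg_root = Path('~/data/kinetics_videos/jpg').expanduser()  # adjust to your dataset
for class_dir in sorted(p for p in jpg_root.iterdir() if p.is_dir()):
    for video_dir in (p for p in class_dir.iterdir() if p.is_dir()):
        # An empty directory means jpg extraction failed for that video.
        if not any(video_dir.glob('*.jpg')):
            print(f'no frames extracted: {video_dir}')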

Running the code

Assume the structure of data directories is the following:

~/
  data/
    kinetics_videos/
      jpg/
        .../ (directories of class names)
          .../ (directories of video names)
            ... (jpg files)
    results/
      save_100.pth
    kinetics.json

Confirm all options.

python main.py -h

Train ResNet-50 on the Kinetics-700 dataset (700 classes) with 4 CPU threads (for data loading).
Batch size is 128.
Save models every 5 epochs. All GPUs are used for training. If you want to use only a subset of GPUs, set CUDA_VISIBLE_DEVICES=....

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --model resnet \
--model_depth 50 --n_classes 700 --batch_size 128 --n_threads 4 --checkpoint 5

Continue training from epoch 101 (~/data/results/save_100.pth is loaded).

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --resume_path results/save_100.pth \
--model_depth 50 --n_classes 700 --batch_size 128 --n_threads 4 --checkpoint 5

Calculate the top-5 class probabilities of each video using a trained model (~/data/results/save_200.pth).
Note that inference_batch_size should be small because the actual batch size is inference_batch_size * (n_video_frames / inference_stride); for example, with inference_batch_size 1, a 160-frame video, and an inference stride of 16, each batch contains 10 clips.

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --resume_path results/save_200.pth \
--model_depth 50 --n_classes 700 --n_threads 4 --no_train --no_val --inference --output_topk 5 --inference_batch_size 1

Evaluate top-1 video accuracy of a recognition result (~/data/results/val.json).

python -m util_scripts.eval_accuracy ~/data/kinetics.json ~/data/results/val.json --subset val -k 1 --ignore

Fine-tune fc layers of a pretrained model (~/data/models/resnet-50-kinetics.pth) on UCF-101.

python main.py --root_path ~/data --video_path ucf101_videos/jpg --annotation_path ucf101_01.json \
--result_path results --dataset ucf101 --n_classes 101 --n_pretrain_classes 700 \
--pretrain_path models/resnet-50-kinetics.pth --ft_begin_module fc \
--model resnet --model_depth 50 --batch_size 128 --n_threads 4 --checkpoint 5
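
Conceptually, --ft_begin_module fc freezes everything before the classifier and optimizes only the fc layer. A minimal hand-rolled equivalent (a sketch, not the repository's code; it assumes the classifier attribute is named fc, as in the 3D ResNets here):

import torch

def make_finetune_optimizer(model, lr=0.001, momentum=0.9):
    # Freeze every parameter except the classifier head.
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith('fc')
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.SGD(trainable, lr=lr, momentum=momentum)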

3d-resnets-pytorch's People

Contributors

cagbal, husencd, kenshohara, skamdar, skrish13


3d-resnets-pytorch's Issues

normalization

Are the inputs of all pretrained models normalized by mean? I see you added the mean of Kinetics, while only the mean of ActivityNet was available several weeks ago; could you tell us which mean was used to train the pretrained models? When I fine-tune the pretrained models on my custom dataset, do I need to compute the mean of the custom dataset?

Pretrained resnext-101-64f-kinetics model

Hi,

I have seen you uploaded pretrained models using 64 frames for ucf101 and hmdb51.
Is there a chance you can upload a 64f model pretrained using kinetics only?

I would like to compare results between 16f and 64f, but in order to make a proper comparison I would rather use the pretrained model only on kinetics as well.

Thanks for the great work!

Data loading speed is too slow

Thank you for providing a nice code!

I tested the pretrained model "resnext-101-64f-kinetics-ucf101_split1.pth" using UCF 101.
I got 93.99% video level accuracy.

However, the computational speed is really slow (roughly a few hours) because of data loading.
Although I'm using an HDD rather than an SSD, data loading is much slower than I expected.
In particular, every (4n+1)-th iteration is significantly slow (4 is the number of threads).
I added my log file as follows:

[1/197] Time 335.936 (335.936) Data 333.094 (333.094)
[2/197] Time 1.754 (168.845) Data 0.000 (166.547)
[3/197] Time 1.758 (113.149) Data 0.000 (111.032)
[4/197] Time 1.769 (85.304) Data 0.000 (83.274)
[5/197] Time 298.968 (128.037) Data 297.199 (126.059)
[6/197] Time 1.750 (106.989) Data 0.000 (105.049)
[7/197] Time 1.760 (91.956) Data 0.000 (90.042)
[8/197] Time 1.757 (80.681) Data 0.000 (78.787)
[9/197] Time 280.848 (102.922) Data 279.067 (101.040)
[10/197] Time 1.766 (92.807) Data 0.000 (90.936)
[11/197] Time 1.754 (84.529) Data 0.000 (82.669)
[12/197] Time 1.760 (77.632) Data 0.000 (75.780)
[13/197] Time 290.565 (94.011) Data 288.792 (92.166)
[14/197] Time 1.750 (87.421) Data 0.000 (85.582)
[15/197] Time 1.763 (81.711) Data 0.000 (79.877)
[16/197] Time 1.756 (76.713) Data 0.000 (74.885)
[17/197] Time 303.138 (90.032) Data 301.384 (88.208)
[18/197] Time 4.009 (85.253) Data 2.253 (83.433)
[19/197] Time 3.780 (80.965) Data 2.027 (79.148)
[20/197] Time 1.759 (77.005) Data 0.000 (75.191) ...

Is this speed normal, or is something wrong?
If something is wrong, could you kindly let me know how to fix it?

No checkpoint saved

I am running the command from the readme

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --model resnet \
--model_depth 34 --n_classes 400 --batch_size 128 --n_threads 4 --checkpoint 5

(with my options), but when I check my results folder, where opts.json and other files are saved, there is no .pth file, even after 100 epochs of training. Do I have to specify the checkpoint path too when calling main.py?

3d_vgg_model

@kenshohara Thanks for your wonderful work, and sorry for bothering you!
I see that you have released c3d-sports1m-kinetics.t7 (608 MB); does it have the same architecture and performance as C3D (https://arxiv.org/abs/1412.0767)?
Besides, I cannot find any results for this model on any dataset in your papers. Could you please show its accuracy on some datasets (UCF-101, HMDB-51)?
Finally, could you please share the code for fine-tuning c3d-sports1m-kinetics.t7 on UCF-101?
Thank you in advance!

What should I do if I want to evaluate this model on my own dataset?

Hi kenshohara!
I want to evaluate this model on my own dataset without modifying the code too much. I have skimmed through the code and think I should modify dataset.py, but I don't know where to begin. Would you please give me some suggestions? Thanks.

HMDB51 annotation

Hi, where can I get the files mentioned in "annotation_dir_path includes brush_hair_test_split1.txt"? I can't find them on the dataset website.

Pretrained ActivityNet

Dear Kensho

Thanks for your GitHub repository, which is very useful for the community! I wonder whether any pre-trained ActivityNet model could be downloaded?

Thanks & Bests

OM

Accuracy of fine-tuning on UCF-101

Hello!
I got 85.2% when fine-tuning ResNet-50 on UCF-101 split 1 instead of 89%; my settings are:

python main.py --root_path ~/big/3D-ResNets-PyTorch --video_path ~/big/UCF-101_jpg --annotation_path utils/ucf101_01.json --result_path ucf101_results --dataset ucf101 --n_classes 400 --n_finetune_classes 101 --pretrain_path model_weights/resnet-50-kinetics.pth --ft_begin_index 4 --model resnet --model_depth 50 --resnet_shortcut B --batch_size 128 --n_threads 4 --checkpoint 5 --learning_rate 0.001

The downsample branch parameters of resnet18 pretrain model is missing

I cannot find the downsample branch parameters in the resnet-18 pretrained model resnet-18-kinetics.pth.
These are all the keys in the pretrained model's state_dict:

module.conv1.weight
module.bn1.weight
module.bn1.bias
module.bn1.running_mean
module.bn1.running_var
module.layer1.0.conv1.weight
module.layer1.0.bn1.weight
module.layer1.0.bn1.bias
module.layer1.0.bn1.running_mean
module.layer1.0.bn1.running_var
module.layer1.0.conv2.weight
module.layer1.0.bn2.weight
module.layer1.0.bn2.bias
module.layer1.0.bn2.running_mean
module.layer1.0.bn2.running_var
module.layer1.1.conv1.weight
module.layer1.1.bn1.weight
module.layer1.1.bn1.bias
module.layer1.1.bn1.running_mean
module.layer1.1.bn1.running_var
module.layer1.1.conv2.weight
module.layer1.1.bn2.weight
module.layer1.1.bn2.bias
module.layer1.1.bn2.running_mean
module.layer1.1.bn2.running_var
module.layer2.0.conv1.weight
module.layer2.0.bn1.weight
module.layer2.0.bn1.bias
module.layer2.0.bn1.running_mean
module.layer2.0.bn1.running_var
module.layer2.0.conv2.weight
module.layer2.0.bn2.weight
module.layer2.0.bn2.bias
module.layer2.0.bn2.running_mean
module.layer2.0.bn2.running_var
module.layer2.1.conv1.weight
module.layer2.1.bn1.weight
module.layer2.1.bn1.bias
module.layer2.1.bn1.running_mean
module.layer2.1.bn1.running_var
module.layer2.1.conv2.weight
module.layer2.1.bn2.weight
module.layer2.1.bn2.bias
module.layer2.1.bn2.running_mean
module.layer2.1.bn2.running_var
module.layer3.0.conv1.weight
module.layer3.0.bn1.weight
module.layer3.0.bn1.bias
module.layer3.0.bn1.running_mean
module.layer3.0.bn1.running_var
module.layer3.0.conv2.weight
module.layer3.0.bn2.weight
module.layer3.0.bn2.bias
module.layer3.0.bn2.running_mean
module.layer3.0.bn2.running_var
module.layer3.1.conv1.weight
module.layer3.1.bn1.weight
module.layer3.1.bn1.bias
module.layer3.1.bn1.running_mean
module.layer3.1.bn1.running_var
module.layer3.1.conv2.weight
module.layer3.1.bn2.weight
module.layer3.1.bn2.bias
module.layer3.1.bn2.running_mean
module.layer3.1.bn2.running_var
module.layer4.0.conv1.weight
module.layer4.0.bn1.weight
module.layer4.0.bn1.bias
module.layer4.0.bn1.running_mean
module.layer4.0.bn1.running_var
module.layer4.0.conv2.weight
module.layer4.0.bn2.weight
module.layer4.0.bn2.bias
module.layer4.0.bn2.running_mean
module.layer4.0.bn2.running_var
module.layer4.1.conv1.weight
module.layer4.1.bn1.weight
module.layer4.1.bn1.bias
module.layer4.1.bn1.running_mean
module.layer4.1.bn1.running_var
module.layer4.1.conv2.weight
module.layer4.1.bn2.weight
module.layer4.1.bn2.bias
module.layer4.1.bn2.running_mean
module.layer4.1.bn2.running_var
module.fc.weight
module.fc.bias

No downsample branch parameters in the above keys.

question about the 'Temporal duration of inputs'

Hi @kenshohara,
in opts.py, can I change the temporal duration of inputs in parser.add_argument('--sample_duration', default=16, type=int, help='Temporal duration of inputs'), e.g., to 32 or 64 frames? Have you run similar experiments? I would really appreciate your reply. Thanks.

fine-tuning resnet-18 on UCF

Hi, great work! Thank you.

Could you please tell me how long it took you to fine-tune resnet-18 (pretrained on Kinetics) on UCF101 to get the reported accuracy (~84%)? Also, was there any specific hyperparameter setting I needed to change while fine-tuning, other than freezing the conv layers?

-Ananth

RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/torch/lib/THC/generated/../THCReduceAll.cuh:339 terminate called after throwing an instance of 'std::runtime_error' what(): cuda runtime error (59) : device-side assert triggered at /pytorch/torch/lib/THC/generic/THCStorage.c:184

dataset loading [0/3570]
dataset loading [1000/3570]
dataset loading [2000/3570]
dataset loading [3000/3570]
dataset loading [0/1530]
dataset loading [1000/1530]
run
train at epoch 1
Epoch: [1][1/112] Time 4.807 (4.807) Data 2.836 (2.836) Loss 3.9053 (3.9053) Acc 0.000 (0.000)
/pytorch/torch/lib/THCUNN/ClassNLLCriterion.cu:101: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [26,0,0] Assertion t >= 0 && t < n_classes failed.
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generated/../THCReduceAll.cuh line=339 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "main.py", line 137, in <module>
    train_logger, train_batch_logger)
  File "/media/ole/Document/Ubuntu/Research/3D-ResNets-PyTorch/train.py", line 31, in train_epoch
    acc = calculate_accuracy(outputs, targets)
  File "/media/ole/Document/Ubuntu/Research/3D-ResNets-PyTorch/utils.py", line 58, in calculate_accuracy
    n_correct_elems = correct.sum().data[0]
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/torch/lib/THC/generated/../THCReduceAll.cuh:339
terminate called after throwing an instance of 'std::runtime_error'
  what(): cuda runtime error (59) : device-side assert triggered at /pytorch/torch/lib/THC/generic/THCStorage.c:184

RuntimeError: invalid argument 1: must be strictly positive at /opt/conda/conda-bld/pytorch_1518243271935/work/torch/lib/TH/generic/THTensorMath.c:2247

Hi,
I need help: on running main.py, everything goes well until dataset loading, as shown below:
model generated
dataset loading [0/9537]
dataset loading [1000/9537]
dataset loading [2000/9537]
dataset loading [3000/9537]
dataset loading [4000/9537]
dataset loading [5000/9537]
dataset loading [6000/9537]
dataset loading [7000/9537]
dataset loading [8000/9537]
dataset loading [9000/9537]
dataset loading [0/3783]
dataset loading [1000/3783]
dataset loading [2000/3783]
dataset loading [3000/3783]
run

The error occurred here:

train at epoch 1
Traceback (most recent call last):
  File "/media/psrana/New Volume/chandni/HAR_3D_TU/main.py", line 139, in <module>
    train_logger, train_batch_logger)
  File "/media/psrana/New Volume/chandni/HAR_3D_TU/train.py", line 22, in train_epoch
    for i, (inputs, targets) in enumerate(data_loader):
  File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 417, in __iter__
    return DataLoaderIter(self)
  File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 242, in __init__
    self._put_indices()
  File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 290, in _put_indices
    indices = next(self.sample_iter, None)
  File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 119, in __iter__
    for idx in self.sampler:
  File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 50, in __iter__
    return iter(torch.randperm(len(self.data_source)).long())
RuntimeError: invalid argument 1: must be strictly positive at /opt/conda/conda-bld/pytorch_1518243271935/work/torch/lib/TH/generic/THTensorMath.c:2247

What could be the reason?

Continue training on fine-tune model

I have used the code to fine-tune on hmdb51, using the following command:
python main.py --root_path ~/Research/datasets --video_path hmdb51/jpg --annotation_path hmdb51/testTrainMulti/hmdb51_1.json --result_path ~/Research/3D-ResNets-PyTorch/results/hmdb51 --dataset hmdb51 --n_classes 400 --n_finetune_classes 51 --pretrain_path ~/Research/3D-ResNets-PyTorch/pretrain/resnext-101-kinetics.pth --ft_begin_index 5 --model resnext --model_depth 101 --resnet_shortcut B --resnext_cardinality 32 --batch_size 32 --n_threads 4 --checkpoint 5 --n_epochs 20

After training, I got the 'save_20.pth' weights; then I ran the following command to continue training from the 21st epoch.
python main.py --root_path ~/Research/datasets --video_path hmdb51/jpg --annotation_path hmdb51/testTrainMulti/hmdb51_2.json --result_path ~/Research/3D-ResNets-PyTorch/results/hmdb51 --dataset hmdb51 --n_classes 51 --resume_path ~/Research/3D-ResNets-PyTorch/results/hmdb51/save_20.pth --model resnext --model_depth 101 --resnet_shortcut B --resnext_cardinality 32 --batch_size 32 --n_threads 4 --checkpoint 5 --n_epochs 20

I got an error:

Traceback (most recent call last):
  File "main.py", line 131, in <module>
    optimizer.load_state_dict(checkpoint['optimizer'])
  File "/home/ole/anaconda3/lib/python3.6/site-packages/torch/optim/optimizer.py", line 87, in load_state_dict
    raise ValueError("loaded state dict has a different number of "
ValueError: loaded state dict has a different number of parameter groups

How can I fix this?

Incorrect Conv3d weights initialization?

The ResNet and ResNeXt models (I haven't checked the others) seem to initialize weights with the Kaiming normal method from the ResNet paper. However, comparing the code with the details in the paper as well as with PyTorch's own implementation (yes, PyTorch implements Kaiming initialization), the Conv3d weight initialization seems to miss the third kernel dimension when computing the fan, i.e. instead of using

if isinstance(m, nn.Conv3d):
    # Kernel is 3D but here only considers the time and row
    n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels 
    m.weight.data.normal_(0, math.sqrt(2. / n))

we can use pytorch's implementation directly:

nn.init.kaiming_normal(m.weight, mode='fan_out')

whose fan in/out factor is calculated by

num_input_fmaps = tensor.size(1)
num_output_fmaps = tensor.size(0)
receptive_field_size = 1
if tensor.dim() > 2:
    receptive_field_size = tensor[0][0].numel()  # kernel_size[0] * kernel_size[1] * kernel_size[2]
fan_in = num_input_fmaps * receptive_field_size
fan_out = num_output_fmaps * receptive_field_size

I've done a quick test (training 200 epochs once) on the mini-Kinetics dataset, and with the fixed weight initialization the accuracy seems to improve.

Let me know if it makes sense.
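
For reference, a minimal sketch of the fix proposed above (an illustration, not the repository's code; note that nn.init.kaiming_normal was later renamed nn.init.kaiming_normal_ in PyTorch):

import torch.nn as nn

def init_weights(model):
    for m in model.modules():
        if isinstance(m, nn.Conv3d):
            # fan_out here counts all three kernel dimensions automatically.
            nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        elif isinstance(m, nn.BatchNorm3d):
            nn.init.constant_(m.weight, 1)
            nn.init.constant_(m.bias, 0)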

ActivityNet Download

You said to download the datasets using the official crawler code; can you show me the official code or some guidance for downloading the dataset?

Low accuracy on HMDB51

Hi, nice work first.
I have fine-tuned my model on hmdb51 and got a checkpoint, save_200.pth. Then I tried to run the following script to evaluate on the validation set.

python main.py --root_path ~/Research/datasets --video_path hmdb51/jpg --annotation_path hmdb51/testTrainMulti/hmdb51_1.json --result_path ~/Research/3D-ResNets-PyTorch/results/hmdb51 --dataset hmdb51 --n_finetune_classes 51 --n_classes 51 --model resnext --model_depth 101 --resnet_shortcut B --resnext_cardinality 32 --batch_size 32 --n_threads 4 --test --test_subset val --pretrain_path ~/Research/3D-ResNets-PyTorch/results/hmdb51/save_200.pth --no_train --no_val

After that I got a file called val.json. Then I ran the evaluation script you provide in utils/eval_hmdb51.py:

hmdb = HMDBclassification('hmdb51_1.json', 'val.json', verbose=True, top_k=1)
hmdb.evaluate()

Then I got:

[INIT] Loaded annotations from validation subset.
Number of ground truth instances: 1530
Number of predictions: 15290
[RESULTS] Performance on ActivityNet untrimmed video classification task.
Error@1: 0.9784313725490196

It seems that something is wrong with the prediction numbers. Can you tell me how you ran the evaluation script? Thx :)

Very Slow Training

I am training ResNet with depth 34 on the Kinetics dataset, but the training procedure is not improving anything. How long does it take until the model starts improving? I have attached a screenshot; currently I am at epoch 34, but the loss is still 5.99 and not decreasing, and the accuracy is very volatile.

(screenshot: selection_121)

Evaluation HMDB51

Hi Kenshohara, I'm not sure how to evaluate the HMDB51 test set on the fine-tuned model. I also wonder how to load the HMDB51 pretrained weights into your model; I've tried, but it reports some dictionary errors.

Test set Kinetics

Hi,

Thanks for releasing this awesome repository. However, I cannot find the test-set json file (matching each youtube_id to a label) on the Kinetics dataset webpage.
Could you show me where to find it?

Thanks

Input of Densenet

Thank you for your wonderful work.
Reading the paper, I noted that each clip contains 16 frames. I have read two other papers in which the authors claim that a 32-frame input would be better; have you tried 32-frame inputs? If you trained such models, could you please release the pretrained weights?

finetune on custom dataset

I have a small two-class dataset with 1000 videos per class. I want to fine-tune your pretrained models, but they seem to overfit my dataset. How can I fix this? By enlarging my dataset?

finetune on hmdb51, low accuracy on val set

Hi, I fine-tuned the pretrained resnet-34-kinetics model on hmdb51 with the following command:
python main.py --root_path ~/data --video_path hmdb51/jpg --annotation_path hmdb51_1.json --result_path resnet34_finetune_hmdb51_results --dataset hmdb51 --n_classes 400 --n_finetune_classes 51 --pretrain_path models/resnet-34-kinetics.pth --ft_begin_index 4 --model resnet --model_depth 34 --resnet_shortcut A --batch_size 128 --n_threads 4 --checkpoint 5
When I check the performance on the train and val sets, train accuracy is fairly high (around 0.8), while val accuracy stays around 0.5. Is this correct? It seems the training overfits the train set?

Other dataset

Did you try the Charades dataset? It seems to need more temporal information to classify.

requirements

Please mention the hardware and software requirements and brief steps to follow to use this code.

kinetics database

Hi, first, you did very nice work, and I am very interested in it.
But now I can't download the Kinetics dataset from the official crawler because I can't access YouTube (no VPN).
So could you please offer the dataset somewhere else, such as Google Drive?

Best wishes,
ckjiao (LakyTT)

CPU issue

I am trying to use resnet-34 (cpu version) for both classification and feature extraction. Here is the error:
(tensorflow) mariankyoussef@elecsim:~/video-classification-3d-cnn-pytorch$ python main.py --input ./input --video_root ./home/mariankyoussef/UCF101_videos --output ./output.json --model ./models/resnet-34-kinetics-cpu.pth --mode score --no_cuda
loading model ./models/resnet-34-kinetics-cpu.pth

Traceback (most recent call last):
  File "main.py", line 24, in <module>
    model_data = torch.load(opt.model)
  File "/home/mariankyoussef/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/serialization.py", line 267, in load
    return _load(f, map_location, pickle_module)
  File "/home/mariankyoussef/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/serialization.py", line 410, in _load
    magic_number = pickle_module.load(f)
_pickle.UnpicklingError: invalid load key, '<'.

I didn't try to fix the error yet, but I am wondering if the first couple of lines have the right settings?

Error while loading weights

KeyError: 'unexpected key "module.features.conv0.weight" in state_dict'

I'm using densenet-201-kinetics.pth file with the densenet.py file from models folder.

net = densenet.densenet201(sample_size=64, sample_duration=30, num_classes=400)
pretrained_weights = torch.load(pretrained_path)
net.load_state_dict(pretrained_weights['state_dict'])

regarding batch_size

Hi,
can you please advise how to choose the batch size?
My system has one GPU (8 GB of memory) and 64 GB of system memory.
When I tried running with the default batch size of 128, I got a runtime error: out of memory.
After reducing it to 20, training started working.
But I would like to know the appropriate batch size for my system configuration, a little about how to set it, and whether it affects accuracy.
I would be very thankful for your suggestions.

Performance of fine-tuning on UCF101

I downloaded the ResNet-101 network pretrained on Kinetics and fine-tuned it on UCF101 following the example script. However, I can only get 82.5% by averaging the three splits; in the paper, the authors report 88.9%. Any suggestions?

options for testing

On running the code for resnext-101 (fine-tuning the Kinetics-pretrained resnext-101 model), I got a clip accuracy of around 86%.
A val.json file is created during testing by your code. Now I want the video accuracy, so I first need to test by setting no_val and no_train to true and test to true in opts.py.
I ran the code with the above options and it created val.json;
on evaluating val.json, it gives an accuracy of 0.28.
Do I need to change any other option in opts.py?

Can you please list the options for testing only, using a fine-tuned model?

low validation accuracy with pre-train model on kinetic dataset

Hi, when I apply the pre-trained model (resnext-101-64f-kinetics.pth) to the validation set of the Kinetics dataset, the accuracy turns out to be very low (much like random prediction). I have checked the loader and did not modify the code. I am wondering how the pre-trained model was produced: did you train the PyTorch code from scratch or transfer the weights from the Torch-version pre-trained model? Thx!

Do the 16 frames in one clip need a uniform spatial transform?

According to line 181 of kinetics.py and line 69 of main.py, because of the call to randomize_parameters(), the 16 frames in one clip receive different spatial transforms. Will this have any impact? In my opinion, applying a uniform spatial transform to the 16 frames of a clip may be more reasonable.

RuntimeError: size mismatch at /pytorch/torch/lib/THC/generic/THCTensorMathBlas.cu:243

In the experiment of fine-tuning the conv5_x and fc layers of a pretrained model on UCF-101, I got a size mismatch error. I have checked that the shape of the UCF-101 input data is (128L, 3L, 16L, 112L, 112L).

Complete error message:

Traceback (most recent call last):
  File "main.py", line 140, in <module>
    train_logger, train_batch_logger)
  File "/home/magic/yc/ActionRecognition/3D-ResNets-PyTorch-master/train.py", line 34, in train_epoch
    outputs = model(inputs)
  File "/home/magic/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/magic/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 60, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/magic/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 70, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/magic/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
    raise output
RuntimeError: size mismatch at /pytorch/torch/lib/THC/generic/THCTensorMathBlas.cu:243

My command:
python main.py --root_path ./data --video_path UCF101/jpg --annotation_path ucf101_01.json --result_path results --dataset ucf101 --n_classes 400 --n_finetune_classes 101 --pretrain_path models/resnet-34-kinetics.pth --ft_begin_index 4 --model resnet --model_depth 34 --resnet_shortcut A --batch_size 128 --n_threads 4 --checkpoint 5

Other function(bounding box regression)

Datasets such as UCF101 or HMDB51 label the entire image, whereas the Google AVA dataset has both action annotations and bounding-box annotations.

Is it possible to modify the network with bounding-box regression (object detection, such as YOLO)?
Learning where the human is could make the network's action recognition more accurate.

keyerror when loading pretrained model

Hi,
when I tried to run the command:
python main.py --root_path ~/data --video_path ucf101_videos/jpg --annotation_path ucf101_01.json \
--result_path results --dataset ucf101 --n_classes 400 --n_finetune_classes 101 \
--pretrain_path models/resnet-34-kinetics-cpu.pth --ft_begin_index 4 \
--model resnet --model_depth 34 --resnet_shortcut A --batch_size 128 --n_threads 4 --checkpoint 5

I have also set no_cuda=True in opts, as I am running on CPU only.

I am getting this error:

Traceback (most recent call last):
  File "HAR_3D/main.py", line 47, in <module>
    model, parameters = generate_model(opt)
  File "/home/chandni/HAR_3D/model.py", line 178, in generate_model
    model.load_state_dict(pretrain['state_dict'])
  File "/home/chandni/anaconda3/envs/har/lib/python3.6/site-packages/torch/nn/modules/module.py", line 522, in load_state_dict
    .format(name))
KeyError: 'unexpected key "module.conv1.weight" in state_dict'

Any ideas to resolve this issue?

Folder structure for Kinetics train, val and test data

It is not quite clear to me how to structure video_directory for the Kinetics datasets. Should it be video_directory/{train,val,test}/jpg, so that it can train and validate at the same time, or is there another folder structure I should adhere to?

RuntimeError: size mismatch at /opt/conda/conda-bld/pytorch_1512946747676/work/torch/lib/THC/generic/THCTensorMathBlas.cu:243

@kenshohara Thanks for your wonderful work, and sorry for bothering you.
I tried to reproduce your work, but I ran into some problems. I did not change your code except where necessary. Could you please help me fix this problem? Thank you so much.

The problem is shown below:

Traceback (most recent call last):
  File "main.py", line 123, in <module>
    opt, train_logger, train_batch_logger)
  File "/home/deep_ww/3D-ResNets-PyTorch-master/train.py", line 29, in train_epoch
    outputs = model(inputs)
  File "/home/deep_ww/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/deep_ww/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 66, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/deep_ww/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/deep_ww/3D-ResNets-PyTorch-master/resnet.py", line 164, in forward
    x = self.fc(x)
  File "/home/deep_ww/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/deep_ww/anaconda2/lib/python2.7/site-packages/torch/nn/modules/linear.py", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/deep_ww/anaconda2/lib/python2.7/site-packages/torch/nn/functional.py", line 835, in linear
    return torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch at /opt/conda/conda-bld/pytorch_1512946747676/work/torch/lib/THC/generic/THCTensorMathBlas.cu:243

Training accuracy on Kinetics

Hi
Since I cannot find what training accuracy you achieved on the Kinetics training set, I am not sure whether the accuracy I obtained is high enough.
I obtain around 30% training accuracy, but the loss is similar to that reported in the paper (around 3), so is the accuracy also similar?

Is the train.log right?

@kenshohara

First, it is a great job!

I used the Kinetics data to train a resnet-34 model; every action has 50 MP4s. I trained the model like this:
python3 main.py --root_path ~/3D-ResNets-PyTorch --video_path kineticsJPG --annotation_path kineticsjson/kinetics.json --result_path results --dataset kinetics --model resnet --model_depth 34 --resnet_shortcut A --pretrain_path models/resnet-34-kinetics.pth --n_classes 400 --batch_size 30 --n_threads 4 --checkpoint 5
It successfully created a model, "save_30.pth".
My train.log file looks like this:
epoch loss acc lr
21 5.05777625165984 0.050443081117927745 0.1
22 5.051979898375048 0.05023333857689686 0.1
23 5.037916659033671 0.05075769492947407 0.1
24 5.020105431997609 0.05448062503277227 0.1
25 5.006561265645096 0.05689266425462745 0.1
26 4.986441563412287 0.05615856536101935 0.1
27 4.982790104450624 0.05909496093545173 0.1
28 4.962981188055365 0.05883278275916313 0.1
29 4.957985667213246 0.06077290126369881 0.1
30 4.930375250401841 0.06276545540349221 0.1
Is it OK?
When i use "video-classification-3d" to check the model ,
Command is:
python3 main.py --input ./input --video_root ./videos --output ./output.json --model pathto/save_30.pth --mode score

I find the result is poor.

Why?
Is the train.log right?
Did I train the model sufficiently?

error_of_--resume

Thanks for your sharing! @kenshohara
When I use --resume, I get an error like this:
KeyError: 'missing keys in state_dict: "set(['module.layer2.0.downsample.1.running_var', 'module.layer3.0.downsample.1.running_var', 'module.layer2.0.downsample.1.running_mean', 'module.layer4.0.downsample.1.running_mean', 'module.layer4.0.downsample.1.running_var', 'module.layer3.0.downsample.1.weight', 'module.layer2.0.downsample.0.weight', 'module.layer3.0.downsample.0.weight', 'module.layer4.0.downsample.0.weight', 'module.layer4.0.downsample.1.bias', 'module.layer4.0.downsample.1.weight', 'module.layer3.0.downsample.1.bias', 'module.layer2.0.downsample.1.weight', 'module.layer2.0.downsample.1.bias', 'module.layer3.0.downsample.1.running_mean'])"'

Could you please tell me how to debug it?

val.json file

Hi,
Can you please upload your val.json for UCF-101 split 1?

Performance of pretrained weights on UCF101

Hi,
Nice work! I have a question about your results on UCF101 split 1. I've evaluated your pretrained weights "resnext-101-kinetics-ucf101_split1.pth" on UCF101 split 1 and got an accuracy of ~85.99%. I'm wondering whether that is the expected accuracy. Would you please provide the accuracies of the pretrained models?

Size mismatch in resnet forward pass

I am running the following command.
CUDA_VISIBLE_DEVICES=2 python main.py --root_path --video_path ucf101_jpg --annotation_path ucfTrainTestlist/ucf101_01.json --result_path results --dataset ucf101 --n_classes 400 --n_finetune_classes 101 --pretrain_path models/resnet-34-kinetics.pth --model resnet --model_depth 34 --resnet_shortcut A --batch_size 128 --n_threads 4 --no_train --no_val --test

I followed the steps in the readme to set up the UCF101 jpg frames and annotations. I printed the shape of x after each layer in the forward pass, and I get the following before the size mismatch error occurs:
input: (128L, 3L, 16L, 112L, 112L)
self.conv1 output: (128L, 64L, 16L, 56L, 56L)
self.bn1 output: (128L, 64L, 16L, 56L, 56L)
self.relu output: (128L, 64L, 16L, 56L, 56L)
self.maxpool output: (128L, 64L, 8L, 28L, 28L)
self.layer1 output: (128L, 64L, 8L, 28L, 28L)
self.layer2 output: (128L, 128L, 4L, 14L, 14L)
self.layer3 output: (128L, 256L, 2L, 7L, 7L)
self.layer4 output: (128L, 512L, 1L, 4L, 4L)
self.avgpool output: (128L, 512L, 1L, 2L, 2L)
x.view(x.size(0), -1) output: (128L, 2048L)

The issue is that self.fc is (512, 101). As a temporary hack, I changed the stride of self.avgpool from 1 to 2, but otherwise I am not sure where the error is.
