kenshohara / 3d-resnets-pytorch

3D ResNets for Action Recognition (CVPR 2018)

License: MIT License

Python 100.00%
deep-learning computer-vision pytorch python action-recognition video-recognition

3d-resnets-pytorch's Introduction

3D ResNets for Action Recognition

Update (2020/4/13)

We published a paper on arXiv.

Hirokatsu Kataoka, Tenga Wakamiya, Kensho Hara, and Yutaka Satoh,
"Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs?",
arXiv preprint, arXiv:2004.04968, 2020.

We have uploaded the pretrained models described in this paper, including a ResNet-50 pretrained on the combined Kinetics-700 and Moments in Time dataset.

Update (2020/4/10)

We significantly updated our scripts. If you want to use older versions to reproduce our CVPR 2018 paper, use the scripts in the CVPR2018 branch.

This update includes the following:

  • Refactoring of the whole project
  • Support for newer PyTorch versions
  • Support for distributed training
  • Support for training and testing on the Moments in Time dataset
  • Addition of R(2+1)D models
  • Upload of 3D ResNet models trained on the Kinetics-700, Moments in Time, and STAIR-Actions datasets

Summary

This is the PyTorch code for the following papers:

Hirokatsu Kataoka, Tenga Wakamiya, Kensho Hara, and Yutaka Satoh,
"Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs?",
arXiv preprint, arXiv:2004.04968, 2020.

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Towards Good Practice for Action Recognition with Spatiotemporal 3D Convolutions",
Proceedings of the International Conference on Pattern Recognition, pp. 2516-2521, 2018.

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?",
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6546-6555, 2018.

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition",
Proceedings of the ICCV Workshop on Action, Gesture, and Emotion Recognition, 2017.

This code includes training, fine-tuning, and testing on Kinetics, Moments in Time, ActivityNet, UCF-101, and HMDB-51.

Citation

If you use this code or pre-trained models, please cite the following:

@inproceedings{hara3dcnns,
  author={Kensho Hara and Hirokatsu Kataoka and Yutaka Satoh},
  title={Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={6546--6555},
  year={2018},
}

Pre-trained models

Pre-trained models are available here.
All models are trained on Kinetics-700 (K), Moments in Time (M), STAIR-Actions (S), or merged datasets of them (KM, KS, MS, KMS).
If you want to fine-tune the models on your dataset, specify the following options.

r3d18_K_200ep.pth: --model resnet --model_depth 18 --n_pretrain_classes 700
r3d18_KM_200ep.pth: --model resnet --model_depth 18 --n_pretrain_classes 1039
r3d34_K_200ep.pth: --model resnet --model_depth 34 --n_pretrain_classes 700
r3d34_KM_200ep.pth: --model resnet --model_depth 34 --n_pretrain_classes 1039
r3d50_K_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 700
r3d50_KM_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 1039
r3d50_KMS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 1139
r3d50_KS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 800
r3d50_M_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 339
r3d50_MS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 439
r3d50_S_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 100
r3d101_K_200ep.pth: --model resnet --model_depth 101 --n_pretrain_classes 700
r3d101_KM_200ep.pth: --model resnet --model_depth 101 --n_pretrain_classes 1039
r3d152_K_200ep.pth: --model resnet --model_depth 152 --n_pretrain_classes 700
r3d152_KM_200ep.pth: --model resnet --model_depth 152 --n_pretrain_classes 1039
r3d200_K_200ep.pth: --model resnet --model_depth 200 --n_pretrain_classes 700
r3d200_KM_200ep.pth: --model resnet --model_depth 200 --n_pretrain_classes 1039
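
For example, fine-tuning r3d50_KM_200ep.pth on UCF-101 combines these options with the fine-tuning command shown under "Running the code" below (a sketch; adjust the paths to your setup):

python main.py --root_path ~/data --video_path ucf101_videos/jpg --annotation_path ucf101_01.json \
--result_path results --dataset ucf101 --n_classes 101 --n_pretrain_classes 1039 \
--pretrain_path models/r3d50_KM_200ep.pth --ft_begin_module fc \
--model resnet --model_depth 50 --batch_size 128 --n_threads 4 --checkpoint 5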

Old pretrained models are still available here.
However, some modifications are required to use the old pretrained models in the current scripts.
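
The most common modification is sketched below: the old checkpoints were saved from a torch.nn.DataParallel wrapper, so every key in their state_dict carries a "module." prefix that a plain model rejects (see the "unexpected key module.conv1.weight" reports in the issues further down). This is a minimal sketch, not the repository's own code; the model construction via models/resnet.py is an assumption about your setup, and other differences between the old and new architectures (e.g. the shortcut type) may still need attention.

import torch

from models import resnet  # repo module; assumes this runs from the repo root

# Build a model matching the old checkpoint (here: resnet-50-kinetics.pth, 400 classes).
model = resnet.generate_model(model_depth=50, n_classes=400)

checkpoint = torch.load('resnet-50-kinetics.pth', map_location='cpu')
# Drop the "module." prefix that DataParallel added when the checkpoint was saved.
state_dict = {k.replace('module.', '', 1): v
              for k, v in checkpoint['state_dict'].items()}
model.load_state_dict(state_dict)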

Requirements

  • PyTorch (with torchvision):
    conda install pytorch torchvision cudatoolkit=10.1 -c soumith
  • FFmpeg, FFprobe

  • Python 3

Preparation

ActivityNet

  • Download videos using the official crawler.
  • Convert from mp4 to jpg files using util_scripts/generate_video_jpgs.py
python -m util_scripts.generate_video_jpgs mp4_video_dir_path jpg_video_dir_path activitynet
  • Add fps information to the json file using util_scripts/add_fps_into_activitynet_json.py
python -m util_scripts.add_fps_into_activitynet_json mp4_video_dir_path json_file_path

Kinetics

  • Download videos using the official crawler.
    • Locate test set in video_directory/test.
  • Convert from mp4 to jpg files using util_scripts/generate_video_jpgs.py
python -m util_scripts.generate_video_jpgs mp4_video_dir_path jpg_video_dir_path kinetics
  • Generate annotation file in json format similar to ActivityNet using util_scripts/kinetics_json.py
    • The CSV files (kinetics_{train, val, test}.csv) are included in the crawler.
python -m util_scripts.kinetics_json csv_dir_path 700 jpg_video_dir_path jpg dst_json_path

UCF-101

  • Download videos and train/test splits here.
  • Convert from avi to jpg files using util_scripts/generate_video_jpgs.py
python -m util_scripts.generate_video_jpgs avi_video_dir_path jpg_video_dir_path ucf101
  • Generate annotation file in json format similar to ActivityNet using util_scripts/ucf101_json.py
    • annotation_dir_path includes classInd.txt, trainlist0{1, 2, 3}.txt, testlist0{1, 2, 3}.txt
python -m util_scripts.ucf101_json annotation_dir_path jpg_video_dir_path dst_json_path

HMDB-51

  • Download videos and train/test splits here.
  • Convert from avi to jpg files using util_scripts/generate_video_jpgs.py
python -m util_scripts.generate_video_jpgs avi_video_dir_path jpg_video_dir_path hmdb51
  • Generate annotation file in json format similar to ActivityNet using util_scripts/hmdb51_json.py
    • annotation_dir_path includes brush_hair_test_split1.txt, ...
python -m util_scripts.hmdb51_json annotation_dir_path jpg_video_dir_path dst_json_path
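
After conversion, a quick sanity check helps catch videos whose frame extraction failed. A minimal sketch (not part of the repository; the jpg root path is an assumption matching the directory layout shown in the next section):

from pathlib import Path

jpg_root = Path('~/data/kinetics_videos/jpg').expanduser()  # adjust to your dataset
for class_dir in sorted(p for p in jpg_root.iterdir() if p.is_dir()):
    for video_dir in (p for p in class_dir.iterdir() if p.is_dir()):
        # An empty directory means jpg extraction failed for that video.
        if not any(video_dir.glob('*.jpg')):
            print(f'no frames extracted: {video_dir}')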

Running the code

Assume the structure of data directories is the following:

~/
  data/
    kinetics_videos/
      jpg/
        .../ (directories of class names)
          .../ (directories of video names)
            ... (jpg files)
    results/
      save_100.pth
    kinetics.json

Confirm all options.

python main.py -h

Train ResNet-50 on the Kinetics-700 dataset (700 classes) with 4 CPU threads (for data loading).
Batch size is 128.
Save models every 5 epochs. All GPUs are used for training. If you want to use only a subset of GPUs, set CUDA_VISIBLE_DEVICES=....

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --model resnet \
--model_depth 50 --n_classes 700 --batch_size 128 --n_threads 4 --checkpoint 5

Continue training from epoch 101 (~/data/results/save_100.pth is loaded).

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --resume_path results/save_100.pth \
--model_depth 50 --n_classes 700 --batch_size 128 --n_threads 4 --checkpoint 5

Calculate the top-5 class probabilities of each video using a trained model (~/data/results/save_200.pth).
Note that inference_batch_size should be small because the actual batch size is inference_batch_size * (n_video_frames / inference_stride); for example, with inference_batch_size 1, a 160-frame video, and an inference stride of 16, each batch contains 10 clips.

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --resume_path results/save_200.pth \
--model_depth 50 --n_classes 700 --n_threads 4 --no_train --no_val --inference --output_topk 5 --inference_batch_size 1

Evaluate top-1 video accuracy of a recognition result (~/data/results/val.json).

python -m util_scripts.eval_accuracy ~/data/kinetics.json ~/data/results/val.json --subset val -k 1 --ignore

Fine-tune fc layers of a pretrained model (~/data/models/resnet-50-kinetics.pth) on UCF-101.

python main.py --root_path ~/data --video_path ucf101_videos/jpg --annotation_path ucf101_01.json \
--result_path results --dataset ucf101 --n_classes 101 --n_pretrain_classes 700 \
--pretrain_path models/resnet-50-kinetics.pth --ft_begin_module fc \
--model resnet --model_depth 50 --batch_size 128 --n_threads 4 --checkpoint 5
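
Conceptually, --ft_begin_module fc freezes everything before the classifier and optimizes only the fc layer. A minimal hand-rolled equivalent (a sketch, not the repository's code; it assumes the classifier attribute is named fc, as in the 3D ResNets here):

import torch

def make_finetune_optimizer(model, lr=0.001, momentum=0.9):
    # Freeze every parameter except the classifier head.
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith('fc')
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.SGD(trainable, lr=lr, momentum=momentum)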

3d-resnets-pytorch's People

Contributors

cagbal, husencd, kenshohara, skamdar, skrish13


3d-resnets-pytorch's Issues

normalization

Are the inputs of all pretrained models normalized by mean? I see you added the mean of Kinetics, while only the mean of ActivityNet was available several weeks ago; could you tell us which mean was used to train the pretrained models? When I fine-tune the pretrained models on my custom dataset, do I need to compute the mean of the custom dataset?

Pretrained resnext-101-64f-kinetics model

Hi,

I have seen you uploaded pretrained models using 64 frames for ucf101 and hmdb51.
Is there a chance you can upload a 64f model pretrained using kinetics only?

I would like to compare results between 16f and 64f, but in order to make a proper comparison I would rather use the pretrained model only on kinetics as well.

Thanks for the great work!

Data loading speed is too slow

Thank you for providing a nice code!

I tested the pretrained model "resnext-101-64f-kinetics-ucf101_split1.pth" using UCF 101.
I got 93.99% video level accuracy.

However, the computational speed is really slow (roughly a few hours) because of data loading.
Although I'm using an HDD rather than an SSD, data loading is much slower than I expected.
In particular, every (4n+1)-th iteration is significantly slow (4 is the number of threads).
I added my log file as follows:

[1/197] Time 335.936 (335.936) Data 333.094 (333.094)
[2/197] Time 1.754 (168.845) Data 0.000 (166.547)
[3/197] Time 1.758 (113.149) Data 0.000 (111.032)
[4/197] Time 1.769 (85.304) Data 0.000 (83.274)
[5/197] Time 298.968 (128.037) Data 297.199 (126.059)
[6/197] Time 1.750 (106.989) Data 0.000 (105.049)
[7/197] Time 1.760 (91.956) Data 0.000 (90.042)
[8/197] Time 1.757 (80.681) Data 0.000 (78.787)
[9/197] Time 280.848 (102.922) Data 279.067 (101.040)
[10/197] Time 1.766 (92.807) Data 0.000 (90.936)
[11/197] Time 1.754 (84.529) Data 0.000 (82.669)
[12/197] Time 1.760 (77.632) Data 0.000 (75.780)
[13/197] Time 290.565 (94.011) Data 288.792 (92.166)
[14/197] Time 1.750 (87.421) Data 0.000 (85.582)
[15/197] Time 1.763 (81.711) Data 0.000 (79.877)
[16/197] Time 1.756 (76.713) Data 0.000 (74.885)
[17/197] Time 303.138 (90.032) Data 301.384 (88.208)
[18/197] Time 4.009 (85.253) Data 2.253 (83.433)
[19/197] Time 3.780 (80.965) Data 2.027 (79.148)
[20/197] Time 1.759 (77.005) Data 0.000 (75.191) ...

Is this speed normal, or is something wrong?
If something is wrong, could you kindly let me know how to fix it?

No checkpoint saved

I am running the command from the readme

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --model resnet \
--model_depth 34 --n_classes 400 --batch_size 128 --n_threads 4 --checkpoint 5

(with my options), but when I check my results folder, where opts.json and other files are saved, there is no .pth file, even after 100 epochs of training. Do I have to specify the checkpoint path too when calling main.py?

3d_vgg_model

@kenshohara Thanks for your wonderful work, and sorry for bothering you!
I see that you have released c3d-sports1m-kinetics.t7 (608 MB); does it have the same architecture and performance as C3D (https://arxiv.org/abs/1412.0767)?
Besides, I cannot find any results for this model on any dataset in your papers. Could you please show its accuracy on some datasets (UCF-101, HMDB-51)?
Finally, could you please share the code for fine-tuning c3d-sports1m-kinetics.t7 on UCF-101?
Thank you in advance!

What should I do if I want to evaluate this model on my own dataset?

Hi kenshohara!
I want to evaluate this model on my own dataset without modifying the code too much. I have skimmed through the code and think I should modify dataset.py, but I don't know where to begin. Would you please give me some suggestions? Thanks.

HMDB51 annotation

Hi, where can I get the files mentioned in "annotation_dir_path includes brush_hair_test_split1.txt"? I can't find them on the dataset website.

Pretrained ActivityNet

Dear Kensho

Thanks for your GitHub repository, which is very useful for the community! I wonder whether any pre-trained ActivityNet model could be downloaded?

Thanks & Bests

OM

Accuracy of fine-tuning on UCF-101

Hello!
I got 85.2% when fine-tuning ResNet-50 on UCF-101 split 1 instead of 89%; my settings are:

python main.py --root_path ~/big/3D-ResNets-PyTorch --video_path ~/big/UCF-101_jpg --annotation_path utils/ucf101_01.json --result_path ucf101_results --dataset ucf101 --n_classes 400 --n_finetune_classes 101 --pretrain_path model_weights/resnet-50-kinetics.pth --ft_begin_index 4 --model resnet --model_depth 50 --resnet_shortcut B --batch_size 128 --n_threads 4 --checkpoint 5 --learning_rate 0.001

The downsample branch parameters of resnet18 pretrain model is missing

I cannot find the downsample branch parameters in the resnet-18 pretrained model resnet-18-kinetics.pth.
These are all the keys in the pretrained model's state_dict:

module.conv1.weight
module.bn1.weight
module.bn1.bias
module.bn1.running_mean
module.bn1.running_var
module.layer1.0.conv1.weight
module.layer1.0.bn1.weight
module.layer1.0.bn1.bias
module.layer1.0.bn1.running_mean
module.layer1.0.bn1.running_var
module.layer1.0.conv2.weight
module.layer1.0.bn2.weight
module.layer1.0.bn2.bias
module.layer1.0.bn2.running_mean
module.layer1.0.bn2.running_var
module.layer1.1.conv1.weight
module.layer1.1.bn1.weight
module.layer1.1.bn1.bias
module.layer1.1.bn1.running_mean
module.layer1.1.bn1.running_var
module.layer1.1.conv2.weight
module.layer1.1.bn2.weight
module.layer1.1.bn2.bias
module.layer1.1.bn2.running_mean
module.layer1.1.bn2.running_var
module.layer2.0.conv1.weight
module.layer2.0.bn1.weight
module.layer2.0.bn1.bias
module.layer2.0.bn1.running_mean
module.layer2.0.bn1.running_var
module.layer2.0.conv2.weight
module.layer2.0.bn2.weight
module.layer2.0.bn2.bias
module.layer2.0.bn2.running_mean
module.layer2.0.bn2.running_var
module.layer2.1.conv1.weight
module.layer2.1.bn1.weight
module.layer2.1.bn1.bias
module.layer2.1.bn1.running_mean
module.layer2.1.bn1.running_var
module.layer2.1.conv2.weight
module.layer2.1.bn2.weight
module.layer2.1.bn2.bias
module.layer2.1.bn2.running_mean
module.layer2.1.bn2.running_var
module.layer3.0.conv1.weight
module.layer3.0.bn1.weight
module.layer3.0.bn1.bias
module.layer3.0.bn1.running_mean
module.layer3.0.bn1.running_var
module.layer3.0.conv2.weight
module.layer3.0.bn2.weight
module.layer3.0.bn2.bias
module.layer3.0.bn2.running_mean
module.layer3.0.bn2.running_var
module.layer3.1.conv1.weight
module.layer3.1.bn1.weight
module.layer3.1.bn1.bias
module.layer3.1.bn1.running_mean
module.layer3.1.bn1.running_var
module.layer3.1.conv2.weight
module.layer3.1.bn2.weight
module.layer3.1.bn2.bias
module.layer3.1.bn2.running_mean
module.layer3.1.bn2.running_var
module.layer4.0.conv1.weight
module.layer4.0.bn1.weight
module.layer4.0.bn1.bias
module.layer4.0.bn1.running_mean
module.layer4.0.bn1.running_var
module.layer4.0.conv2.weight
module.layer4.0.bn2.weight
module.layer4.0.bn2.bias
module.layer4.0.bn2.running_mean
module.layer4.0.bn2.running_var
module.layer4.1.conv1.weight
module.layer4.1.bn1.weight
module.layer4.1.bn1.bias
module.layer4.1.bn1.running_mean
module.layer4.1.bn1.running_var
module.layer4.1.conv2.weight
module.layer4.1.bn2.weight
module.layer4.1.bn2.bias
module.layer4.1.bn2.running_mean
module.layer4.1.bn2.running_var
module.fc.weight
module.fc.bias

No downsample branch parameters in the above keys.

question about the 'Temporal duration of inputs'

Hi @kenshohara,
in opts.py, can I change the temporal duration of inputs in parser.add_argument('--sample_duration', default=16, type=int, help='Temporal duration of inputs'), e.g., to 32 or 64 frames? Have you run similar experiments? I would really appreciate your reply. Thanks.

fine-tuning resnet-18 on UCF

Hi, great work! Thank you.

Could you please tell me how long it took you to fine-tune resnet-18 (pretrained on Kinetics) on UCF101 to get the reported accuracy (~84%)? Also, was there any specific hyperparameter setting I needed to change while fine-tuning, other than freezing the conv layers?

-Ananth

RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/torch/lib/THC/generated/../THCReduceAll.cuh:339 terminate called after throwing an instance of 'std::runtime_error' what(): cuda runtime error (59) : device-side assert triggered at /pytorch/torch/lib/THC/generic/THCStorage.c:184

dataset loading [0/3570]
dataset loading [1000/3570]
dataset loading [2000/3570]
dataset loading [3000/3570]
dataset loading [0/1530]
dataset loading [1000/1530]
run
train at epoch 1
Epoch: [1][1/112] Time 4.807 (4.807) Data 2.836 (2.836) Loss 3.9053 (3.9053) Acc 0.000 (0.000)
/pytorch/torch/lib/THCUNN/ClassNLLCriterion.cu:101: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [26,0,0] Assertion t >= 0 && t < n_classes failed.
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generated/../THCReduceAll.cuh line=339 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "main.py", line 137, in <module>
    train_logger, train_batch_logger)
  File "/media/ole/Document/Ubuntu/Research/3D-ResNets-PyTorch/train.py", line 31, in train_epoch
    acc = calculate_accuracy(outputs, targets)
  File "/media/ole/Document/Ubuntu/Research/3D-ResNets-PyTorch/utils.py", line 58, in calculate_accuracy
    n_correct_elems = correct.sum().data[0]
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/torch/lib/THC/generated/../THCReduceAll.cuh:339
terminate called after throwing an instance of 'std::runtime_error'
  what(): cuda runtime error (59) : device-side assert triggered at /pytorch/torch/lib/THC/generic/THCStorage.c:184

RuntimeError: invalid argument 1: must be strictly positive at /opt/conda/conda-bld/pytorch_1518243271935/work/torch/lib/TH/generic/THTensorMath.c:2247

Hi,
I need help: on running main.py, everything goes well until dataset loading, as shown below:
model generated
dataset loading [0/9537]
dataset loading [1000/9537]
dataset loading [2000/9537]
dataset loading [3000/9537]
dataset loading [4000/9537]
dataset loading [5000/9537]
dataset loading [6000/9537]
dataset loading [7000/9537]
dataset loading [8000/9537]
dataset loading [9000/9537]
dataset loading [0/3783]
dataset loading [1000/3783]
dataset loading [2000/3783]
dataset loading [3000/3783]
run

The error occurred here:

train at epoch 1
Traceback (most recent call last):
  File "/media/psrana/New Volume/chandni/HAR_3D_TU/main.py", line 139, in <module>
    train_logger, train_batch_logger)
  File "/media/psrana/New Volume/chandni/HAR_3D_TU/train.py", line 22, in train_epoch
    for i, (inputs, targets) in enumerate(data_loader):
  File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 417, in __iter__
    return DataLoaderIter(self)
  File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 242, in __init__
    self._put_indices()
  File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 290, in _put_indices
    indices = next(self.sample_iter, None)
  File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 119, in __iter__
    for idx in self.sampler:
  File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 50, in __iter__
    return iter(torch.randperm(len(self.data_source)).long())
RuntimeError: invalid argument 1: must be strictly positive at /opt/conda/conda-bld/pytorch_1518243271935/work/torch/lib/TH/generic/THTensorMath.c:2247

What could be the reason?

Continue training on fine-tune model

I have used the code to fine-tune on hmdb51, using the following command:
python main.py --root_path ~/Research/datasets --video_path hmdb51/jpg --annotation_path hmdb51/testTrainMulti/hmdb51_1.json --result_path ~/Research/3D-ResNets-PyTorch/results/hmdb51 --dataset hmdb51 --n_classes 400 --n_finetune_classes 51 --pretrain_path ~/Research/3D-ResNets-PyTorch/pretrain/resnext-101-kinetics.pth --ft_begin_index 5 --model resnext --model_depth 101 --resnet_shortcut B --resnext_cardinality 32 --batch_size 32 --n_threads 4 --checkpoint 5 --n_epochs 20

After training, I got the 'save_20.pth' weights; then I ran the following command to continue training from the 21st epoch.
python main.py --root_path ~/Research/datasets --video_path hmdb51/jpg --annotation_path hmdb51/testTrainMulti/hmdb51_2.json --result_path ~/Research/3D-ResNets-PyTorch/results/hmdb51 --dataset hmdb51 --n_classes 51 --resume_path ~/Research/3D-ResNets-PyTorch/results/hmdb51/save_20.pth --model resnext --model_depth 101 --resnet_shortcut B --resnext_cardinality 32 --batch_size 32 --n_threads 4 --checkpoint 5 --n_epochs 20

I got an error:

Traceback (most recent call last):
  File "main.py", line 131, in <module>
    optimizer.load_state_dict(checkpoint['optimizer'])
  File "/home/ole/anaconda3/lib/python3.6/site-packages/torch/optim/optimizer.py", line 87, in load_state_dict
    raise ValueError("loaded state dict has a different number of "
ValueError: loaded state dict has a different number of parameter groups

How can I fix this?

Incorrect Conv3d weights initialization?

The ResNet and ResNeXt models (I haven't checked the others) seem to initialize weights with the Kaiming normal method from the ResNet paper. However, comparing the code with the details in the paper as well as with PyTorch's own implementation (yes, PyTorch implements Kaiming initialization), the Conv3d weight initialization seems to miss the third kernel dimension when computing the fan, i.e. instead of using

if isinstance(m, nn.Conv3d):
    # Kernel is 3D but here only considers the time and row
    n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels 
    m.weight.data.normal_(0, math.sqrt(2. / n))

we can use pytorch's implementation directly:

nn.init.kaiming_normal(m.weight, mode='fan_out')

whose fan in/out factor is calculated by

num_input_fmaps = tensor.size(1)
num_output_fmaps = tensor.size(0)
receptive_field_size = 1
if tensor.dim() > 2:
    receptive_field_size = tensor[0][0].numel()  # kernel_size[0] * kernel_size[1] * kernel_size[2]
fan_in = num_input_fmaps * receptive_field_size
fan_out = num_output_fmaps * receptive_field_size

I've done a quick test (training 200 epochs once) on the mini-Kinetics dataset, and with the fixed weight initialization the accuracy seems to improve.

Let me know if it makes sense.
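
For reference, a minimal sketch of the fix proposed above (an illustration, not the repository's code; note that nn.init.kaiming_normal was later renamed nn.init.kaiming_normal_ in PyTorch):

import torch.nn as nn

def init_weights(model):
    for m in model.modules():
        if isinstance(m, nn.Conv3d):
            # fan_out here counts all three kernel dimensions automatically.
            nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        elif isinstance(m, nn.BatchNorm3d):
            nn.init.constant_(m.weight, 1)
            nn.init.constant_(m.bias, 0)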

ActivityNet Download

You said to download the datasets using the official crawler code; can you show me the official code or some guidance for downloading the dataset?

Low accuracy on HMDB51

Hi, nice work first.
I have fine-tuned my model on hmdb51 and got a checkpoint, save_200.pth. Then I tried to run the following script to evaluate on the validation set.

python main.py --root_path ~/Research/datasets --video_path hmdb51/jpg --annotation_path hmdb51/testTrainMulti/hmdb51_1.json --result_path ~/Research/3D-ResNets-PyTorch/results/hmdb51 --dataset hmdb51 --n_finetune_classes 51 --n_classes 51 --model resnext --model_depth 101 --resnet_shortcut B --resnext_cardinality 32 --batch_size 32 --n_threads 4 --test --test_subset val --pretrain_path ~/Research/3D-ResNets-PyTorch/results/hmdb51/save_200.pth --no_train --no_val

After that I got a file called val.json. Then I ran the evaluation script you provide in utils/eval_hmdb51.py:

hmdb = HMDBclassification('hmdb51_1.json', 'val.json', verbose=True, top_k=1)
hmdb.evaluate()

Then I got:

[INIT] Loaded annotations from validation subset.
Number of ground truth instances: 1530
Number of predictions: 15290
[RESULTS] Performance on ActivityNet untrimmed video classification task.
Error@1: 0.9784313725490196

It seems that something is wrong with the prediction numbers. Can you tell me how you ran the evaluation script? Thx :)

Very Slow Training

I am training ResNet with depth 34 on the Kinetics dataset, but the training procedure is not improving anything. How long does it take until the model starts improving? I have attached a screenshot; currently I am at epoch 34, but the loss is still 5.99 and not decreasing, and the accuracy is very volatile.

(screenshot: selection_121)

Evaluation HMDB51

Hi Kenshohara, I'm not sure how to evaluate the HMDB51 test set on the fine-tuned model. I also wonder how to load the HMDB51 pretrained weights into your model; I've tried, but it reports some dictionary errors.

Test set Kinetics

Hi,

Thanks for releasing this awesome repository. However, I cannot find the test-set json file (matching each youtube_id to a label) on the Kinetics dataset webpage.
Could you show me where to find it?

Thanks

Input of Densenet

Thank you for your wonderful work.
Reading the paper, I noted that each clip contains 16 frames. I have read two other papers in which the authors claim that a 32-frame input would be better; have you tried 32-frame inputs? If you trained such models, could you please release the pretrained weights?

finetune on custom dataset

I have a small two-class dataset with 1000 videos per class. I want to fine-tune your pretrained models, but they seem to overfit my dataset. How can I fix this? By enlarging my dataset?

finetune on hmdb51, low accuracy on val set

Hi, I fine-tuned the pretrained resnet-34-kinetics model on hmdb51 with the following command:
python main.py --root_path ~/data --video_path hmdb51/jpg --annotation_path hmdb51_1.json --result_path resnet34_finetune_hmdb51_results --dataset hmdb51 --n_classes 400 --n_finetune_classes 51 --pretrain_path models/resnet-34-kinetics.pth --ft_begin_index 4 --model resnet --model_depth 34 --resnet_shortcut A --batch_size 128 --n_threads 4 --checkpoint 5
When I check the performance on the train and val sets, train accuracy is fairly high (around 0.8), while val accuracy stays around 0.5. Is this correct? It seems the training overfits the train set?

Other dataset

Did you try the Charades dataset? It seems to need more temporal information to classify.

requirements

Please mention the hardware and software requirements and brief steps to follow to use this code.

kinetics database

Hi, first, you did very nice work, and I am very interested in it.
But now I can't download the Kinetics dataset from the official crawler because I can't access YouTube (no VPN).
So could you please offer the dataset somewhere else, such as Google Drive?

Best wishes,
ckjiao (LakyTT)

CPU issue

I am trying to use resnet-34 (cpu version) for both classification and feature extraction. Here is the error:
(tensorflow) mariankyoussef@elecsim:~/video-classification-3d-cnn-pytorch$ python main.py --input ./input --video_root ./home/mariankyoussef/UCF101_videos --output ./output.json --model ./models/resnet-34-kinetics-cpu.pth --mode score --no_cuda
loading model ./models/resnet-34-kinetics-cpu.pth

Traceback (most recent call last):
  File "main.py", line 24, in <module>
    model_data = torch.load(opt.model)
  File "/home/mariankyoussef/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/serialization.py", line 267, in load
    return _load(f, map_location, pickle_module)
  File "/home/mariankyoussef/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/serialization.py", line 410, in _load
    magic_number = pickle_module.load(f)
_pickle.UnpicklingError: invalid load key, '<'.

I didn't try to fix the error yet, but I am wondering if the first couple of lines have the right settings?

Error while loading weights

KeyError: 'unexpected key "module.features.conv0.weight" in state_dict'

I'm using densenet-201-kinetics.pth file with the densenet.py file from models folder.

net = densenet.densenet201(sample_size=64, sample_duration=30, num_classes=400)
pretrained_weights = torch.load(pretrained_path)
net.load_state_dict(pretrained_weights['state_dict'])

regarding batch_size

Hi,
can you please advise how to choose the batch size?
My system has one GPU (8 GB of memory) and 64 GB of system memory.
When I tried running with the default batch size of 128, I got a runtime error: out of memory.
After reducing it to 20, training started working.
But I would like to know the appropriate batch size for my system configuration, a little about how to set it, and whether it affects accuracy.
I would be very thankful for your suggestions.

Performance of fine-tuning on UCF101

I downloaded the ResNet-101 network pretrained on Kinetics and fine-tuned it on UCF101 following the example script. However, I can only get 82.5% by averaging the three splits; in the paper, the authors report 88.9%. Any suggestions?

options for testing

On running the code for resnext-101 (fine-tuning the Kinetics-pretrained resnext-101 model), I got a clip accuracy of around 86%.
A val.json file is created during testing by your code. Now I want the video accuracy, so I first need to test by setting no_val and no_train to true and test to true in opts.py.
I ran the code with the above options and it created val.json;
on evaluating val.json, it gives an accuracy of 0.28.
Do I need to change any other option in opts.py?

Can you please list the options for testing only, using a fine-tuned model?

low validation accuracy with pre-train model on kinetic dataset

Hi, when I apply the pre-trained model (resnext-101-64f-kinetics.pth) to the validation set of the Kinetics dataset, the accuracy turns out to be very low (much like random prediction). I have checked the loader and did not modify the code. I am wondering how the pre-trained model was produced: did you train the PyTorch code from scratch or transfer the weights from the Torch-version pre-trained model? Thx!

Do the 16 frames in one clip need a uniform spatial transform?

According to line 181 of kinetics.py and line 69 of main.py, because of the call to randomize_parameters(), the 16 frames in one clip receive different spatial transforms. Will this have any impact? In my opinion, applying a uniform spatial transform to the 16 frames of a clip may be more reasonable.

RuntimeError: size mismatch at /pytorch/torch/lib/THC/generic/THCTensorMathBlas.cu:243

In the experiment of fine-tuning the conv5_x and fc layers of a pretrained model on UCF-101, I got a size mismatch error. I have checked that the shape of the UCF-101 input data is (128L, 3L, 16L, 112L, 112L).

Complete error message:

Traceback (most recent call last):
  File "main.py", line 140, in <module>
    train_logger, train_batch_logger)
  File "/home/magic/yc/ActionRecognition/3D-ResNets-PyTorch-master/train.py", line 34, in train_epoch
    outputs = model(inputs)
  File "/home/magic/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/magic/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 60, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/magic/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 70, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/magic/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
    raise output
RuntimeError: size mismatch at /pytorch/torch/lib/THC/generic/THCTensorMathBlas.cu:243

My command:
python main.py --root_path ./data --video_path UCF101/jpg --annotation_path ucf101_01.json --result_path results --dataset ucf101 --n_classes 400 --n_finetune_classes 101 --pretrain_path models/resnet-34-kinetics.pth --ft_begin_index 4 --model resnet --model_depth 34 --resnet_shortcut A --batch_size 128 --n_threads 4 --checkpoint 5

Other function(bounding box regression)

Datasets such as UCF101 or HMDB51 label the entire image, whereas the Google AVA dataset has both action annotations and bounding-box annotations.

Is it possible to modify the network with bounding-box regression (object detection, such as YOLO)?
Learning where the human is could make the network's action recognition more accurate.

keyerror when loading pretrained model

Hi,
when I tried to run the command:
python main.py --root_path ~/data --video_path ucf101_videos/jpg --annotation_path ucf101_01.json \
--result_path results --dataset ucf101 --n_classes 400 --n_finetune_classes 101 \
--pretrain_path models/resnet-34-kinetics-cpu.pth --ft_begin_index 4 \
--model resnet --model_depth 34 --resnet_shortcut A --batch_size 128 --n_threads 4 --checkpoint 5

I have also set no_cuda=True in opts, as I am running on CPU only.

I am getting this error:

Traceback (most recent call last):
  File "HAR_3D/main.py", line 47, in <module>
    model, parameters = generate_model(opt)
  File "/home/chandni/HAR_3D/model.py", line 178, in generate_model
    model.load_state_dict(pretrain['state_dict'])
  File "/home/chandni/anaconda3/envs/har/lib/python3.6/site-packages/torch/nn/modules/module.py", line 522, in load_state_dict
    .format(name))
KeyError: 'unexpected key "module.conv1.weight" in state_dict'

Any ideas to resolve this issue?

Folder structure for Kinetics train, val and test data

It is not quite clear to me how to structure video_directory for the Kinetics datasets. Should it be video_directory/{train,val,test}/jpg, so that it can train and validate at the same time, or is there another folder structure I should adhere to?

RuntimeError: size mismatch at /opt/conda/conda-bld/pytorch_1512946747676/work/torch/lib/THC/generic/THCTensorMathBlas.cu:243

@kenshohara Thanks for your wonderful work, and sorry for bothering you.
I tried to reproduce your work, but I ran into some problems. I did not change your code except where necessary. Could you please help me fix this problem? Thank you so much.

The problem is shown below:

Traceback (most recent call last):
  File "main.py", line 123, in <module>
    opt, train_logger, train_batch_logger)
  File "/home/deep_ww/3D-ResNets-PyTorch-master/train.py", line 29, in train_epoch
    outputs = model(inputs)
  File "/home/deep_ww/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/deep_ww/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 66, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/deep_ww/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/deep_ww/3D-ResNets-PyTorch-master/resnet.py", line 164, in forward
    x = self.fc(x)
  File "/home/deep_ww/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/deep_ww/anaconda2/lib/python2.7/site-packages/torch/nn/modules/linear.py", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/deep_ww/anaconda2/lib/python2.7/site-packages/torch/nn/functional.py", line 835, in linear
    return torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch at /opt/conda/conda-bld/pytorch_1512946747676/work/torch/lib/THC/generic/THCTensorMathBlas.cu:243

Training accuracy on Kinetics

Hi
Since I cannot find what training accuracy you achieved on the Kinetics training set, I am not sure whether the accuracy I obtained is high enough.
I obtain around 30% training accuracy, but the loss is similar to that reported in the paper (around 3), so is the accuracy also similar?

Is the train.log right?

@kenshohara

First, it is a great job!

I used the Kinetics data to train a resnet-34 model; every action has 50 MP4s. I trained the model like this:
python3 main.py --root_path ~/3D-ResNets-PyTorch --video_path kineticsJPG --annotation_path kineticsjson/kinetics.json --result_path results --dataset kinetics --model resnet --model_depth 34 --resnet_shortcut A --pretrain_path models/resnet-34-kinetics.pth --n_classes 400 --batch_size 30 --n_threads 4 --checkpoint 5
It successfully created a model, "save_30.pth".
My train.log file looks like this:
epoch loss acc lr
21 5.05777625165984 0.050443081117927745 0.1
22 5.051979898375048 0.05023333857689686 0.1
23 5.037916659033671 0.05075769492947407 0.1
24 5.020105431997609 0.05448062503277227 0.1
25 5.006561265645096 0.05689266425462745 0.1
26 4.986441563412287 0.05615856536101935 0.1
27 4.982790104450624 0.05909496093545173 0.1
28 4.962981188055365 0.05883278275916313 0.1
29 4.957985667213246 0.06077290126369881 0.1
30 4.930375250401841 0.06276545540349221 0.1
Is it OK?
When i use "video-classification-3d" to check the model ,
Command is:
python3 main.py --input ./input --video_root ./videos --output ./output.json --model pathto/save_30.pth --mode score

I find the result is poor.

Why?
Is the train.log right?
Did I train the model sufficiently?

error_of_--resume

Thanks for your sharing! @kenshohara
When I use --resume, I get an error like this:
KeyError: 'missing keys in state_dict: "set(['module.layer2.0.downsample.1.running_var', 'module.layer3.0.downsample.1.running_var', 'module.layer2.0.downsample.1.running_mean', 'module.layer4.0.downsample.1.running_mean', 'module.layer4.0.downsample.1.running_var', 'module.layer3.0.downsample.1.weight', 'module.layer2.0.downsample.0.weight', 'module.layer3.0.downsample.0.weight', 'module.layer4.0.downsample.0.weight', 'module.layer4.0.downsample.1.bias', 'module.layer4.0.downsample.1.weight', 'module.layer3.0.downsample.1.bias', 'module.layer2.0.downsample.1.weight', 'module.layer2.0.downsample.1.bias', 'module.layer3.0.downsample.1.running_mean'])"'

Could you please tell me how to debug it?

val.json file

Hi,
Can you please upload your val.json for UCF-101 split 1?

Performance of pretrained weights on UCF101

Hi,
Nice work! I have a question about your results on UCF101 split 1. I've evaluated your pretrained weights "resnext-101-kinetics-ucf101_split1.pth" on UCF101 split 1 and got an accuracy of ~85.99%. I'm wondering whether that is the expected accuracy. Would you please provide the accuracies of the pretrained models?

Size mismatch in resnet forward pass

I am running the following command.
CUDA_VISIBLE_DEVICES=2 python main.py --root_path --video_path ucf101_jpg --annotation_path ucfTrainTestlist/ucf101_01.json --result_path results --dataset ucf101 --n_classes 400 --n_finetune_classes 101 --pretrain_path models/resnet-34-kinetics.pth --model resnet --model_depth 34 --resnet_shortcut A --batch_size 128 --n_threads 4 --no_train --no_val --test

I followed the steps in the readme to set up the UCF101 jpg frames and annotations. I printed the shape of x after each layer in the forward pass, and I get the following before the size mismatch error occurs:
input: (128L, 3L, 16L, 112L, 112L)
self.conv1 output: (128L, 64L, 16L, 56L, 56L)
self.bn1 output: (128L, 64L, 16L, 56L, 56L)
self.relu output: (128L, 64L, 16L, 56L, 56L)
self.maxpool output: (128L, 64L, 8L, 28L, 28L)
self.layer1 output: (128L, 64L, 8L, 28L, 28L)
self.layer2 output: (128L, 128L, 4L, 14L, 14L)
self.layer3 output: (128L, 256L, 2L, 7L, 7L)
self.layer4 output: (128L, 512L, 1L, 4L, 4L)
self.avgpool output: (128L, 512L, 1L, 2L, 2L)
x.view(x.size(0), -1) output: (128L, 2048L)

The issue is that self.fc is (512, 101). As a temporary hack, I changed the stride of self.avgpool from 1 to 2, but otherwise I am not sure where the error is.
