yjxiong / tsn-pytorch

Temporal Segment Networks (TSN) in PyTorch

License: BSD 2-Clause "Simplified" License

Topics: action-recognition, deep-learning, video-understanding, pytorch, temporal-segment-networks

tsn-pytorch's Introduction

TSN-Pytorch

We have released MMAction, a full-fledged action understanding toolbox based on PyTorch. It includes an implementation of TSN as well as other state-of-the-art frameworks for various tasks. The lessons we learned in this repo are incorporated into MMAction to make it better. We highly recommend you switch to it. This repo will remain here for historical reference.

Note: always use git clone --recursive https://github.com/yjxiong/tsn-pytorch to clone this project. Otherwise you will not be able to use the inception series CNN archs.

This is a reimplementation of temporal segment networks (TSN) in PyTorch. All settings are kept identical to the original Caffe implementation.

For optical flow extraction and video list generation, you still need to use the original TSN codebase.

Training

To train a new model, use the main.py script.

The command to reproduce the original TSN experiments of the RGB modality on UCF101 is:

python main.py ucf101 RGB <ucf101_rgb_train_list> <ucf101_rgb_val_list> \
   --arch BNInception --num_segments 3 \
   --gd 20 --lr 0.001 --lr_steps 30 60 --epochs 80 \
   -b 128 -j 8 --dropout 0.8 \
   --snapshot_pref ucf101_bninception_ 

For flow models:

python main.py ucf101 Flow <ucf101_flow_train_list> <ucf101_flow_val_list> \
   --arch BNInception --num_segments 3 \
   --gd 20 --lr 0.001 --lr_steps 190 300 --epochs 340 \
   -b 128 -j 8 --dropout 0.7 \
   --snapshot_pref ucf101_bninception_ --flow_pref flow_  

For RGB-diff models:

python main.py ucf101 RGBDiff <ucf101_rgb_train_list> <ucf101_rgb_val_list> \
   --arch BNInception --num_segments 7 \
   --gd 40 --lr 0.001 --lr_steps 80 160 --epochs 180 \
   -b 128 -j 8 --dropout 0.8 \
   --snapshot_pref ucf101_bninception_ 

Testing

After training, checkpoints will be saved by PyTorch, for example ucf101_bninception_rgb_checkpoint.pth.

Use the following command to test its performance under the standard TSN testing protocol:

python test_models.py ucf101 RGB <ucf101_rgb_val_list> ucf101_bninception_rgb_checkpoint.pth \
   --arch BNInception --save_scores <score_file_name>

Or for flow models:

python test_models.py ucf101 Flow <ucf101_rgb_val_list> ucf101_bninception_flow_checkpoint.pth \
   --arch BNInception --save_scores <score_file_name> --flow_pref flow_


tsn-pytorch's Issues

Running out of memory

I was trying to run training for UCF-101 RGB split 1, but the model seems to be running out of memory. I am using a GPU with 16 GB VRAM. What is the memory requirement?

Loss doesn't decrease when training optical flow model based on BNInception

Thanks for your great work! When I train the TSN flow model on my own dataset (about 25,000 training examples), the training and test losses stop decreasing once they reach about 1.8. They stabilize there even though I have tried lowering the learning rate and training for more epochs.

My training strategy is the same as what you describe in README.md.

python main.py ucf101 Flow <ucf101_flow_train_list> <ucf101_flow_val_list> \
   --arch BNInception --num_segments 3 \
   --gd 20 --lr 0.001 --lr_steps 190 300 --epochs 340 \
   -b 128 -j 8 --dropout 0.7 \
   --snapshot_pref ucf101_bninception_ --flow_pref flow_  

I don't know why the training loss gets stuck at 1.8; the top-1 accuracy on the training set is only about 60%.

Are there any other methods I can try to fix the problem? Would Adam be more efficient than SGD?

randint size

It seems you forgot to set the size of randint to num_segments on line 70 of dataset.py.
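For context, a minimal sketch of the segment-sampling pattern this report refers to (illustrative only; see dataset.py in the repo for the actual code). Without size=num_segments, numpy's randint returns a single scalar, so every segment would share the same random offset:

import numpy as np
from numpy.random import randint

def sample_indices(num_frames, num_segments, new_length=1):
    # Hypothetical sketch of TSN-style random segment sampling.
    average_duration = (num_frames - new_length + 1) // num_segments
    if average_duration > 0:
        # size=num_segments draws one independent offset per segment;
        # omitting it yields a scalar shared by every segment.
        offsets = np.arange(num_segments) * average_duration \
                  + randint(average_duration, size=num_segments)
    else:
        offsets = np.zeros(num_segments, dtype=int)
    return offsets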

TSN followed by RNN

Hi,
Do you plan to release an RNN version of TSN, like "Long-term Recurrent Convolutional Networks" by J. Donahue?
Moreover, why does torch.nn.DataParallel on non-zero GPUs always raise errors? Is there a smart way to fix this?

Thanks!

About generating RGB diff input images

@yjxiong hi,
For generating the RGB diff images as input to train the flow branch, you said we just directly subtract two consecutive frames. Does that mean, for example, if I want to generate all the input images (frames, optical flow, RGB diff) from the dense_flow code, I should add the following to dense_flow_gpu.cpp:

image_diff = capture_image - prev_image;
imencode(".jpg", image_diff, str_img);

Is this the correct way to generate the diff image? If yes, then image_diff can range from -255 to 255, and the visualized image is mostly black with some highlighted edge regions.

If not, do we need to set a bound (like the optical flow one) to normalize image_diff from (-255, 255) to (0, 255) in the dense_flow_gpu.cpp code? Like the following:

#define CAST(v, L, H) ((v) > (H) ? 255 : (v) < (L) ? 0 : cvRound(255*((v) - (L))/((H)-(L))))
for (int i = 0; i < image_diff.rows; ++i) {
    for (int j = 0; j < image_diff.cols; ++j) {
        for (int k = 0; k < 3; k++) {
            float t = image_diff.at<Vec3b>(i, j)[k];
            // bound = 255
            image_diff.at<Vec3b>(i, j)[k] = CAST(t, -bound, bound);
        }
    }
}
#undef CAST

In this case, after normalization, the visualized image will be mostly gray with highlighted edge regions, just like the optical flow one.

After storing all the diff images, we run the RGB diff script to train the RGB diff branch, right?

Many thanks.
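A numpy sketch of the bounded-difference normalization discussed above (illustrative; note that this PyTorch repo computes RGB differences on the fly from the stacked RGB frames in models.py, so pre-storing diff images is optional):

import numpy as np

def bounded_rgb_diff(frame, prev_frame, bound=255):
    # Map a raw difference in [-bound, bound] to a storable [0, 255] image,
    # mirroring the CAST macro above.
    diff = frame.astype(np.int16) - prev_frame.astype(np.int16)
    diff = np.clip(diff, -bound, bound)
    return np.round(255.0 * (diff + bound) / (2.0 * bound)).astype(np.uint8)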

Cannot test

Why can I not use the file ucf101_bninception__rgb_checkpoint.pth.tar?

Traceback (most recent call last):
  File "test_models.py", line 53, in <module>
    checkpoint = torch.load(args.weights)
  File "/home/xxx/anaconda3/envs/py35/lib/python3.5/site-packages/torch/serialization.py", line 265, in load
    f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'ucf101_bninception2_rgb_checkpoint.pth'

But I cannot decompress ucf101_bninception__rgb_checkpoint.pth.tar.

Network is unreachable

When I run the training script, I encounter the following error:

Downloading: "https://yjxiong.blob.core.windows.net/models/bn_inception-9f5701afb96c8044.pth" to /mnt/lustre/ganweihao/.torch/models/bn_inception-9f5701afb96c8044.pth

Initializing TSN with base model: BNInception.
TSN Configurations:
    input_modality:     RGB
    num_segments:       3
    new_length:         1
    consensus_module:   avg
    dropout_ratio:      0.8

Traceback (most recent call last):
  File "main.py", line 301, in <module>
    main()
  File "main.py", line 35, in main
    consensus_type=args.consensus_type, dropout=args.dropout, partial_bn=not args.no_partialbn)
  File "/mnt/lustre/ganweihao/codes/tsn-pytorch/models.py", line 39, in __init__
    self._prepare_base_model(base_model)
  File "/mnt/lustre/ganweihao/codes/tsn-pytorch/models.py", line 96, in _prepare_base_model
    self.base_model = getattr(tf_model_zoo, base_model)()
  File "/mnt/lustre/ganweihao/codes/tsn-pytorch/tf_model_zoo/bninception/pytorch_load.py", line 35, in __init__
    self.load_state_dict(torch.utils.model_zoo.load_url(weight_url))
  File "/mnt/lustre/ganweihao/anaconda3/envs/python27/lib/python2.7/site-packages/torch/utils/model_zoo.py", line 56, in load_url
    _download_url_to_file(url, cached_file, hash_prefix)
  File "/mnt/lustre/ganweihao/anaconda3/envs/python27/lib/python2.7/site-packages/torch/utils/model_zoo.py", line 61, in _download_url_to_file
    u = urlopen(url)
  File "/mnt/lustre/ganweihao/anaconda3/envs/python27/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/mnt/lustre/ganweihao/anaconda3/envs/python27/lib/python2.7/urllib2.py", line 429, in open
    response = self._open(req, data)
  File "/mnt/lustre/ganweihao/anaconda3/envs/python27/lib/python2.7/urllib2.py", line 447, in _open
    '_open', req)
  File "/mnt/lustre/ganweihao/anaconda3/envs/python27/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/mnt/lustre/ganweihao/anaconda3/envs/python27/lib/python2.7/urllib2.py", line 1241, in https_open
    context=self._context)
  File "/mnt/lustre/ganweihao/anaconda3/envs/python27/lib/python2.7/urllib2.py", line 1198, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 101] Network is unreachable>

Any idea how to solve this?
Many thanks.

resnet

Will the results improve if resnet101 is used as the pretrained model?

About the evaluation

Hi, @yjxiong
I found that you use 'accuracy' for evaluation during training but 'precision' during testing. Am I right? If so, could you tell me why?

Pretrained Models and Performance

Hi,
I am wondering if there are any pretrained PyTorch models that can be downloaded.
In addition, could you please let us know the performance of this tsn-pytorch on UCF101 / HMDB51 / Kinetics?
Thanks a lot!

issue about feature extraction

Hi, @yjxiong I want to use your model to extract video features. My questions are as follows:
1) We should use the output of the global average pooling layer as the feature. Suppose one video has 3 segments; then I get a 3x1024-dimensional feature, regardless of whether the modality is RGB or flow. Am I right?
2) Is there any official code to extract features?
3) How should I choose the number of segments when extracting features, 3 or 25?
Thank you in advance!
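A minimal sketch of pulling global-average-pool activations with a forward hook; the model below is a stand-in, not TSN itself (inspect net.base_model in this repo to find the real pooling module):

import torch
import torch.nn as nn

features = []

def save_features(module, inputs, output):
    features.append(output.flatten(1).detach().cpu())

# Stand-in for a TSN base model: conv trunk + global average pool + classifier.
net = nn.Sequential(
    nn.Conv2d(3, 1024, 3, padding=1),
    nn.AdaptiveAvgPool2d(1),  # the layer whose output we treat as the feature
    nn.Flatten(),
    nn.Linear(1024, 101),
)
handle = net[1].register_forward_hook(save_features)

# Feed num_segments frames as a batch: each row becomes one 1024-d feature.
with torch.no_grad():
    net(torch.randn(3, 3, 32, 32))  # 3 segments of (toy-sized) frames

handle.remove()
print(features[0].shape)  # torch.Size([3, 1024]) -> a 3 x 1024 video feature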

confused about fusion of two streams

Hi, @yjxiong. Happy New Year!
I have some questions. Suppose I have two models, A and B. The test accuracies of both of A's streams are better than B's, but A's fused accuracy is worse than B's. Is this possible?
If so, how should I choose the stream models? I don't know when to stop training, because lower RGB and flow accuracies may yield a higher fused accuracy.

Optical flow training tricks

Hi, Xiong:

Sorry to impose again. My optical flow experiment only reaches an average accuracy of 84.4% on split 1 (result given by the test script). My training setting is (batch_size=128, init_lr=0.001, lr_steps=(190, 300 epochs)), which is different from the Caffe implementation of TSN (batch_size=24, init_lr=0.005, lr_steps=(10000, 16000 iterations)).

Could you please help figure out which setting reproduces the result in your paper? If neither does, what is the best setting?

How to reproduce the experiment that combines RGB and optical flow?

Hello, thanks for your code sharing - a great work!

Still, I have a question: we can train models of the RGB/RGB-diff/flow modalities separately as you describe in the README.md, but if we want to reproduce the experiment that combines RGB and optical flow, how can we achieve this? Should I write my own code to jointly infer from the two models of the RGB and optical flow modalities?

Thank you in advance!
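A minimal late-fusion sketch, assuming the files written by --save_scores are .npz archives containing 'scores' and 'labels' arrays (check test_models.py for the exact keys and shapes it saves). The TSN paper weights flow higher than RGB (1 : 1.5):

import numpy as np

# Hypothetical file names, produced by test_models.py --save_scores.
rgb = np.load('ucf101_rgb_scores.npz')
flow = np.load('ucf101_flow_scores.npz')

rgb_scores = rgb['scores'].squeeze()   # squeeze in case of an (N, 1, C) shape
flow_scores = flow['scores'].squeeze()

# Weighted average of per-class scores across the two streams.
fused = 1.0 * rgb_scores + 1.5 * flow_scores

pred = fused.argmax(axis=1)
acc = (pred == rgb['labels'].squeeze()).mean()
print('fused accuracy: {:.2%}'.format(acc))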

testing process met error.

I run with the command:
CUDA_VISIBLE_DEVICES=2,3 python test_models.py ucf101 Flow data/ucf101_flow_val_split_1.txt ucf101_flow_bninception_flow_checkpoint.pth.tar --arch BNInception --save_score=flow_bninception --flow_pref flow_ --workers=1

The output content is as follows:

video 160 done, total 161/3783, average 3.05683402393 sec/video
video 161 done, total 162/3783, average 3.04620183986 sec/video
video 162 done, total 163/3783, average 3.05641095039 sec/video
video 163 done, total 164/3783, average 3.04159315766 sec/video
video 164 done, total 165/3783, average 3.06657971469 sec/video
video 165 done, total 166/3783, average 3.05109782535 sec/video
video 166 done, total 167/3783, average 3.06426371786 sec/video
video 167 done, total 168/3783, average 3.04766791633 sec/video
video 168 done, total 169/3783, average 3.06247991077 sec/video
video 169 done, total 170/3783, average 3.04995046503 sec/video
Traceback (most recent call last):
  File "test_models.py", line 125, in <module>
    for i, (data, label) in data_gen:
  File "/opt/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 187, in __next__
    return self._process_next_batch(batch)
  File "/opt/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 221, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
IOError: Traceback (most recent call last):
  File "/opt/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 40, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/S2/MI/zqj/video_classification/tsn-pytorch/dataset.py", line 99, in __getitem__
    return self.get(record, segment_indices)
  File "/S2/MI/zqj/video_classification/tsn-pytorch/dataset.py", line 107, in get
    seg_imgs = self._load_image(record.path, p)
  File "/S2/MI/zqj/video_classification/tsn-pytorch/dataset.py", line 51, in _load_image
    x_img = Image.open(os.path.join(directory, self.image_tmpl.format('x', idx))).convert('L')
  File "/opt/anaconda2/lib/python2.7/site-packages/PIL/Image.py", line 2477, in open
    fp = builtins.open(filename, "rb")
IOError: [Errno 2] No such file or directory: '../data/ucf101/flows_tvl1/v_Skijet_g04_c03/flow_x_00000.jpg'

THCudaCheckWarn FAIL file=/pytorch/torch/lib/THC/THCStream.cpp line=50 error=29 : driver shutting down

When training RGB-diff models, there are some errors.

Freezing BatchNorm2D except the first one.
Test: [0/60] Time 97.470 (97.470) Loss 0.1188 (0.1188) Prec@1 95.312 (95.312) Prec@5 100.000 (100.000)
Test: [20/60] Time 6.395 (6.560) Loss 3.9672 (0.9311) Prec@1 43.750 (80.208) Prec@5 70.312 (95.461)
Test: [40/60] Time 15.587 (5.824) Loss 0.8059 (0.9010) Prec@1 84.375 (81.174) Prec@5 92.188 (95.351)
Testing Results: Prec@1 82.236 Prec@5 95.691 Loss 0.83703
Freezing BatchNorm2D except the first one.
Epoch: [55][0/150], lr: 0.00100 Time 52.754 (52.754) Data 52.346 (52.346) Loss 0.1943 (0.1943) Prec@1 93.750 (93.750) Prec@5 100.000 (100.000)
Traceback (most recent call last):
  File "main.py", line 301, in <module>
    main()
  File "main.py", line 124, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "main.py", line 157, in train
    for i, (input, target) in enumerate(train_loader):
  File "/home/kong.ye/.local/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 275, in __next__
    idx, batch = self._get_batch()
  File "/home/kong.ye/.local/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 254, in _get_batch
    return self.data_queue.get()
  File "/usr/lib/python2.7/Queue.py", line 168, in get
    self.not_empty.wait()
  File "/usr/lib/python2.7/threading.py", line 340, in wait
    waiter.acquire()
  File "/home/kong.ye/.local/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 175, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 32569) is killed by signal: Killed.

What should I do? Thanks.

typeerror: mean received an invalid combination of arguments - got ()

Hi, I ran into trouble when running the training script:
python main.py ucf101 RGB ../TSN/data/ucf101_rgb_train_split_1.txt ../TSN/data/ucf101_rgb_val_split_1.txt --arch BNInception --num_segments 3 --gd 20 --lr 0.001 --lr_steps 30 60 --epochs 80 -b 128 -j 8 --snapshot_pref ucf101_bninception --b 128

However, an error appears as follows (including the script's output):

Initializing TSN with base model: BNInception.
TSN Configurations:
    input_modality:     RGB
    num_segments:       3
    new_length:         1
    consensus_module:   avg
    dropout_ratio:      0.5

group: first_conv_weight has 1 params, lr_mult: 1, decay_mult: 1
group: first_conv_bias has 1 params, lr_mult: 2, decay_mult: 0
group: normal_weight has 69 params, lr_mult: 1, decay_mult: 1
group: normal_bias has 69 params, lr_mult: 2, decay_mult: 0
group: BN scale/shift has 2 params, lr_mult: 1, decay_mult: 0
Freezing BatchNorm2D except the first one.
Traceback (most recent call last):
  File "main.py", line 301, in <module>
    main()
  File "main.py", line 124, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "main.py", line 166, in train
    output = model(input_var)
  File "/opt/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 61, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/opt/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 71, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs)
  File "/opt/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/parallel_apply.py", line 46, in parallel_apply
    raise output
TypeError: mean received an invalid combination of arguments - got (dim=int, keepdim=bool, ), but expected one of:

  • no arguments
  • (int dim)

Where is the bug?

Runtime Error in RGBDiff Experiment

Hello,

I am running the RGBDiff experiment on my own dataset. I am sure the file lists are made properly, since I have tested them in the original TSN repo (Caffe implementation).
I am using the following command:
python main.py sdha-actionness RGBDiff dataset/sdha/train_actionness_rgb.txt dataset/sdha/val_actionness_rgb.txt --arch BNInception --num_segments 7 --gd 40 --lr 0.001 --lr_steps 80 160 --epochs 180 -b 32 -j 8 --dropout 0.8 --snapshot_pref sdha_actionness_bninception_

I am getting the following error:

Initializing TSN with base model: BNInception.
TSN Configurations:
    input_modality:     RGBDiff
    num_segments:       7
    new_length:         5
    consensus_module:   avg
    dropout_ratio:      0.8

/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py:360: UserWarning: src is not broadcastable to dst, but they have the same number of elements.  Falling back to deprecated pointwise behavior.
  own_state[name].copy_(param)
Converting the ImageNet model to RGB+Diff init model
Traceback (most recent call last):
  File "main.py", line 316, in <module>
    main()
  File "main.py", line 37, in main
    consensus_type=args.consensus_type, dropout=args.dropout, partial_bn=not args.no_partialbn)
  File "/var/www/tsn-pytorch/models.py", line 49, in __init__
    self.base_model = self._construct_diff_model(self.base_model)
  File "/var/www/tsn-pytorch/models.py", line 268, in _construct_diff_model
    new_kernels = params[0].data.mean(dim=1).expand(new_kernel_size).contiguous()
RuntimeError: The expanded size of the tensor (15) must match the existing size (64) at non-singleton dimension 1. at /pytorch/torch/lib/TH/generic/THTensor.c:308

Kindly help.

different segment numbers for validation and testing?

In the main code, the segment number is the same in both training and validation (default 3), but in test_models.py the segment number is 25. Why is that?

I know the reason for 25 segments, but I don't know why we can't also use 25 segments for validation during training.

Thanks.

Problems with the test_models.py

Hi,

I have trained the RGB models for all 3 splits but I am facing some issues with the test_models.py program.

  • On line 48, while calling the model, two arguments are passed (rnn=args.rnn, rnn_mem_size=args.rnn_mem_size) which are not valid.
  • If I remove these arguments and run, I get a list index out of range error on line 123.

Here is the error stack trace

model epoch 80 best prec@1: 83.4522855911
Traceback (most recent call last):
  File "test_models.py", line 123, in <module>
    rst = eval_video((i, data, label))
  File "test_models.py", line 111, in eval_video
    rst = net(input_var).data.cpu().numpy().copy()
  File "/export/home/utsav/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/export/home/utsav/.local/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 56, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
  File "/export/home/utsav/.local/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 67, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim)
  File "/export/home/utsav/.local/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 30, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim)
  File "/export/home/utsav/.local/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 25, in scatter
    return scatter_map(inputs)
  File "/export/home/utsav/.local/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 18, in scatter_map
    return tuple(zip(*map(scatter_map, obj)))
  File "/export/home/utsav/.local/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 15, in scatter_map
    return Scatter(target_gpus, dim=dim)(obj)
  File "/export/home/utsav/.local/lib/python2.7/site-packages/torch/nn/parallel/_functions.py", line 59, in forward
    streams = [_get_stream(device) for device in self.target_gpus]
  File "/export/home/utsav/.local/lib/python2.7/site-packages/torch/nn/parallel/_functions.py", line 85, in _get_stream
    if _streams[device] is None:
IndexError: list index out of range

Parameter_load

If I don't want to load the BN-Inception pretrained weights and instead want to initialize BN-Inception with random weights, what should I do? Just comment out the load_state_dict call? Thanks!

Runtime Error 59

I am trying to run the TSN training with the same specifications as the RGB script in the README.md file, and I end up with this error:

Epoch: [0][0/75], lr: 0.00100 Time 39.106 (39.106) Data 2.112 (2.112) Loss 4.6157 (4.6157) Prec@1 0.781 (0.781) Prec@5 3.125 (3.125)
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1503965122592/work/torch/lib/THC/generated/../generic/THCTensorSort.cu line=153 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "main.py", line 301, in <module>
    main()
  File "main.py", line 124, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "main.py", line 170, in train
    prec1, prec5 = accuracy(output.data, target, topk=(1,5))
  File "main.py", line 289, in accuracy
    _, pred = output.topk(maxk, 1, True, True)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1503965122592/work/torch/lib/THC/generated/../generic/THCTensorSort.cu:153
terminate called without an active exception
Aborted (core dumped)

I checked the PyTorch guide, but there is nothing there about it. I am using PyTorch 0.2; the code doesn't specify which version to use.

About the train and test accuracy

Hi yjxiong:
I trained and tested your TSN model with the parameters in README.md.
I got the following results:
Model | split1 | split2 | split3
RGB | 84.75% | 84.13% | 85.24%
Flow | 79.31% | 81.00% | 81.09%
The accuracy of the RGB model is similar to your project website http://yjxiong.me/others/tsn/.
But the accuracy of the flow model is very different from yours.
So my questions are as follows:

  1. Can the PyTorch TSN code achieve the official accuracy reported at http://yjxiong.me/others/tsn/?
  2. Is there anything else to pay attention to when I train the flow model?

Thank you in advance!

Dropout wasn't working in the code?

Hi, I noticed that the author defines a dropout layer here, but it seems this layer isn't used during the network's forward pass. Could you please explain the dropout implementation in more detail?

num_segments when testing

Hi, @yjxiong I saw the following code in test_models.py:

net = TSN(num_class, 1, args.modality,
          base_model=args.arch,
          consensus_type=args.crop_fusion_type,
          dropout=args.dropout)

Does it mean we set num_segments=1 when testing, and why?
please help me! Thank you!

RuntimeError: cuda runtime error (2) : out of memory

While testing the RGBDiff model using the command
python test_models.py ucf101 RGBDiff /media/sda/nandan/data/ucf101_rgb_val_split_1.txt ucf101_bninception__rgbdiff_checkpoint.pth.tar --arch BNInception --save_scores SCORE_UCF101_1_RGBDIFF --workers=2
I'm getting this error
Traceback (most recent call last):
  File "test_models.py", line 130, in <module>
    rst = eval_video((i, data, label))
  File "test_models.py", line 117, in eval_video
    rst = net(input_var).data.cpu().numpy().copy()
  File "/home/nandan/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nandan/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 73, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/nandan/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 83, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/nandan/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
    raise output
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1518238409320/work/torch/lib/THC/generic/THCStorage.cu:58

I'm using two K40 GPUs, each with 4742 MiB of global memory.

Meet troubles when using multi-GPUs

Thanks for your nice sharing! @yjxiong

According to your README.md, I use "--gpus 4 5 6 7" to train TSN on GPUs 4, 5, 6, and 7, but the log shows "RuntimeError: all tensors must be on devices[0]".

I also tried CUDA_VISIBLE_DEVICES, but the log shows "TypeError: mean received an invalid combination of arguments - got (dim=int, keepdim=bool, ), but expected one of: * no arguments, * (int dim)".

Could you please show how to train TSN with multiple GPUs? Thank you very much!

worse performance in pytorch

Hi, thanks for sharing this nice implementation. I notice that the performance of the PyTorch version is somewhat worse than the original Caffe implementation used in the ECCV paper. Does anyone know the reason?

GroupRandomHorizontalFlip return img_group or ret?

Hi, I find the following in transforms.py:

def __call__(self, img_group, is_flow=False):
    v = random.random()
    print(v)
    if v < 0.5:
        ret = [img.transpose(Image.FLIP_LEFT_RIGHT) for img in img_group]
        if self.is_flow:
            for i in range(0, len(ret), 2):
                ret[i] = ImageOps.invert(ret[i])  # invert flow pixel values when flipping
        return img_group

The function returns img_group; should it return ret?
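Presumably the intended behavior is to return the flipped list when the coin flip succeeds; a sketch of the fix (not the repo's code):

import random
from PIL import Image, ImageOps

class GroupRandomHorizontalFlip(object):
    # Flip a whole group of frames together with probability 0.5.

    def __init__(self, is_flow=False):
        self.is_flow = is_flow

    def __call__(self, img_group):
        if random.random() < 0.5:
            ret = [img.transpose(Image.FLIP_LEFT_RIGHT) for img in img_group]
            if self.is_flow:
                # x-flow frames (even indices) change sign when mirrored.
                for i in range(0, len(ret), 2):
                    ret[i] = ImageOps.invert(ret[i])
            return ret  # return the flipped frames, not the originals
        return img_group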

training the RGBDiff model

I assume we can use the same inputs to train an RGBDiff model. Am I correct?
However, I got this error message:

Initializing TSN with base model: resnet101.
TSN Configurations:
    input_modality:     RGBDiff
    num_segments:       5
    new_length:         5
    consensus_module:   avg
    dropout_ratio:      0.8

Converting the ImageNet model to RGB+Diff init model
Traceback (most recent call last):
  File "main.py", line 354, in <module>
    main()
  File "main.py", line 48, in main
    consensus_type=args.consensus_type, dropout=args.dropout, partial_bn=not args.no_partialbn)
  File "/home/tsn-pytorch/models.py", line 49, in __init__
    self.base_model = self._construct_diff_model(self.base_model)
  File "/home/tsn-pytorch/models.py", line 264, in _construct_diff_model
    first_conv_idx = filter(lambda x: isinstance(modules[x], nn.Conv2d), list(range(len(modules))))[0]
TypeError: 'filter' object is not subscriptable

Could you help me figure out what it is?
Thank you so much
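(For reference, this looks like a Python 2 vs. 3 difference: in Python 3, filter() returns an iterator, which cannot be indexed. A minimal sketch of the fix, wrapping the call in list():)

import torch.nn as nn

modules = [nn.BatchNorm2d(3), nn.Conv2d(3, 64, kernel_size=7)]

# Python 2: filter() returns a list, so [0] works.
# Python 3: filter() returns an iterator -> "'filter' object is not subscriptable".
# Wrapping it in list() (or using next()) restores the old behavior:
first_conv_idx = list(filter(lambda x: isinstance(modules[x], nn.Conv2d),
                             range(len(modules))))[0]
print(first_conv_idx)  # -> 1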

About the consensus_type

Hi, @yjxiong

You used the consensus_type of avg in the paper, and I also want to know the effect of the topk type; however, I have not seen code for it in the program. Could you provide the script for topk in the forward and backward parts? Or do you have any ideas about it?

Thanks for your kind help!

class SegmentConsensus(torch.autograd.Function):

    def __init__(self, consensus_type, dim=1):
        self.consensus_type = consensus_type
        self.dim = dim
        self.shape = None

    def forward(self, input_tensor):
        self.shape = input_tensor.size()
        if self.consensus_type == 'avg':
            output = input_tensor.mean(dim=self.dim, keepdim=True)
        elif self.consensus_type == 'identity':
            output = input_tensor
        else:
            output = None

        return output

    def backward(self, grad_output):
        if self.consensus_type == 'avg':
            grad_in = grad_output.expand(self.shape) / float(self.shape[self.dim])
        elif self.consensus_type == 'identity':
            grad_in = grad_output
        else:
            grad_in = None

        return grad_in
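One plausible top-k consensus (a sketch, not code from this repo): with modern PyTorch, topk() and mean() are both differentiable, so no hand-written backward is needed and gradients flow only to the selected entries.

import torch

def topk_consensus(segment_scores, k=3, dim=1):
    # Average the k highest scores along the segment dimension (per class).
    vals, _ = segment_scores.topk(min(k, segment_scores.size(dim)), dim=dim)
    return vals.mean(dim=dim, keepdim=True)

# Usage: scores of shape (batch, num_segments, num_classes).
scores = torch.randn(4, 7, 101, requires_grad=True)
out = topk_consensus(scores, k=3)  # -> (4, 1, 101)
out.sum().backward()               # gradients reach only the selected scores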

optical flow model using inceptionresnetv2 to train and experience overfit

Hi, have you tried training TSN based on the inceptionresnetv2 model?
I set batch size 8, lr 0.001, 3 segments, and dropout 0.8, trained the RGB model, and achieved 86.32% on the test set.

However, when I use the same setting with 0.7 dropout to train the optical flow model, I get about 99% training accuracy but at most 75% on validation, so I think overfitting occurred. Do you have any idea about the performance difference between the RGB and optical flow models using inceptionresnetv2?

Normalization in RGBDiff model

in main.py, it doesn't do normalization for RGBDiff:

if args.modality != 'RGBDiff':
    normalize = GroupNormalize(input_mean, input_std)
else:
    normalize = IdentityTransform()

Is there any reason to do that?
And I found that in test_models.py you still have normalization for RGBDiff, so I got incorrect testing results when I first tried it. That problem was solved by changing to IdentityTransform. I wonder which one you used for your final results: IdentityTransform or GroupNormalize?

Thank you.

fusion

Are the spatial and temporal networks trained separately? Is the fusion of the networks implemented only in the testing stage?

dataloader runtime errors

python main.py ucf101 RGB ucf101_trainlist01new.txt ucf101_testlist01new.txt --gpus 1 --arch BNInception --num_segments 3 --gd 20 --lr 0.001 --lr_steps 30 60 --epochs 80 -b 128 -j 8 --dropout 0.8

Initializing TSN with base model: BNInception.
TSN Configurations:
    input_modality:     RGB
    num_segments:       3
    new_length:         1
    consensus_module:   avg
    dropout_ratio:      0.8

/home/ytan/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py:360: UserWarning: src is not broadcastable to dst, but they have the same number of elements. Falling back to deprecated pointwise behavior.
  own_state[name].copy_(param)
group: first_conv_weight has 1 params, lr_mult: 1, decay_mult: 1
group: first_conv_bias has 1 params, lr_mult: 2, decay_mult: 0
group: normal_weight has 69 params, lr_mult: 1, decay_mult: 1
group: normal_bias has 69 params, lr_mult: 2, decay_mult: 0
group: BN scale/shift has 2 params, lr_mult: 1, decay_mult: 0
Freezing BatchNorm2D except the first one.
Traceback (most recent call last):
  File "main.py", line 301, in <module>
    main()
  File "main.py", line 124, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "main.py", line 157, in train
    for i, (input, target) in enumerate(train_loader):
  File "/home/ytan/miniconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 201, in __next__
    return self._process_next_batch(batch)
  File "/home/ytan/miniconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 221, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/home/ytan/miniconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 40, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/ytan/miniconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 109, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/home/ytan/miniconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 109, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/home/ytan/miniconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 89, in default_collate
    storage = batch[0].storage()._new_shared(numel)
  File "/home/ytan/miniconda3/lib/python3.6/site-packages/torch/storage.py", line 113, in _new_shared
    return cls._new_using_fd(size)
RuntimeError: unable to write to file </torch_476_615100490> at /opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/TH/THAllocator.c:271

Any suggestions?
