Comments (8)

yjxiong commented on May 26, 2024

Setting CUDA_VISIBLE_DEVICES=6 might be the problem.

We do not use this variable to select the GPU. Instead, after updating to the latest version, you can run the following command:

python test_models.py ucf101 RGB ../temporal-segment-networks/data/ucf101_rgb_val_split_1.txt models/ucf101_bninception_RGB_1_rgb_checkpoint.pth --arch BNInception --save_scores tsn_pytorch_rgb_split_1 --gpu 6 -j 1
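
When CUDA_VISIBLE_DEVICES=6 is set, the CUDA runtime exposes only physical GPU 6, and that GPU is renumbered as device 0 inside the process. A script that then asks for index 6, or assumes other indices exist, can end up using the wrong (or a non-existent) device. The stdlib-only sketch below illustrates that remapping. The helper names are hypothetical and not part of tsn-pytorch, and the parsing is simplified: only plain integer lists are handled.

```python
import os


def visible_devices(env=None):
    """Return the physical GPU ids exposed to this process, in logical order.

    Returns None when CUDA_VISIBLE_DEVICES is unset (all GPUs visible) and an
    empty list when it is set to "" (no GPUs visible). Simplified sketch:
    UUID-style entries are not supported.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return [int(tok) for tok in raw.split(",") if tok.strip()]


def physical_gpu(logical_index, env=None):
    """Map a logical CUDA index (what the framework sees) to a physical GPU id."""
    ids = visible_devices(env)
    if ids is None:
        return logical_index  # no masking: logical index == physical id
    if not 0 <= logical_index < len(ids):
        raise ValueError(
            "logical GPU %d is not visible; only %d device(s) exposed: %s"
            % (logical_index, len(ids), ids)
        )
    return ids[logical_index]


if __name__ == "__main__":
    masked = {"CUDA_VISIBLE_DEVICES": "6"}
    print(physical_gpu(0, masked))  # physical GPU 6 shows up as logical device 0
    try:
        physical_gpu(6, masked)  # "GPU 6" does not exist inside the masked process
    except ValueError as err:
        print(err)
```

In other words, combining CUDA_VISIBLE_DEVICES=6 with a GPU index of 6 inside the script asks for a device the process cannot see.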

from tsn-pytorch.

yjxiong commented on May 26, 2024

Please try again with the latest version. I have to say that the way torch.nn.DataParallel behaves on non-zero GPUs is a real pain in the neck.
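
The PyTorch documentation for torch.nn.DataParallel states that the wrapped module must have its parameters and buffers on device_ids[0] before the forward pass. On non-zero GPUs it is easy to violate this, for example by calling .cuda() with no argument (which targets the current device, usually GPU 0) and then passing device_ids=[6]. The sketch below is a hypothetical, framework-free pre-flight check that models devices as plain integers; in real code the devices would come from each tensor.

```python
def misplaced_parameters(device_ids, param_devices):
    """List parameters that are not on the primary DataParallel device.

    device_ids    -- GPU indices that would be passed to DataParallel, e.g. [6, 7]
    param_devices -- mapping of parameter name -> GPU index it currently lives on
    Devices are plain ints here, as hypothetical stand-ins for tensor devices.
    """
    if not device_ids:
        raise ValueError("device_ids must not be empty")
    primary = device_ids[0]  # DataParallel expects the module's weights here
    return sorted(name for name, dev in param_devices.items() if dev != primary)


if __name__ == "__main__":
    # Model moved with a bare .cuda() lands on GPU 0, but DataParallel uses GPU 6.
    params = {"conv1.weight": 0, "conv1.bias": 0, "fc.weight": 0}
    print(misplaced_parameters([6], params))
    # After moving every parameter to device_ids[0], nothing is reported.
    print(misplaced_parameters([6], {name: 6 for name in params}))
```

In PyTorch code the usual remedy is to move the network with net.cuda(device_ids[0]) before wrapping it as DataParallel(net, device_ids=device_ids).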

yjxiong commented on May 26, 2024

Hi @utsavgarg, thanks for filing the issue.

I have fixed the first problem in the latest commit.

For the second one, I cannot reproduce the error. Would you please post your testing command and environment settings?
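
The details requested here (PyTorch version, GPU settings) can be gathered with a short script. The stdlib sketch below is a hypothetical helper, not part of tsn-pytorch; it reports torch details only if torch can be imported. Newer PyTorch releases also ship python -m torch.utils.collect_env, which produces a more complete report.

```python
import os
import platform
import sys


def collect_env():
    """Gather the environment details typically requested in a bug report."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>"),
    }
    try:
        import torch  # optional: only reported when installed
    except ImportError:
        info["torch"] = "<not installed>"
    else:
        info["torch"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
        if info["cuda_available"]:
            info["cuda_device_count"] = torch.cuda.device_count()
    return info


if __name__ == "__main__":
    for key, value in sorted(collect_env().items()):
        print("%s: %s" % (key, value))
```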

utsavgarg commented on May 26, 2024

My PyTorch version is 0.2.0_1, and the testing command is:

CUDA_VISIBLE_DEVICES=6 python test_models.py ucf101 RGB ../temporal-segment-networks/data/ucf101_rgb_val_split_1.txt models/ucf101_bninception_RGB_1_rgb_checkpoint.pth --arch BNInception --save_scores tsn_pytorch_rgb_split_1

You can download the checkpoint from https://www.dropbox.com/s/upa0nnrrmi4q36z/ucf101_bninception_RGB_1_rgb_checkpoint.pth?dl=0 to test it.

utsavgarg commented on May 26, 2024

@yjxiong thanks for the quick fix, but there still seems to be an issue:

Traceback (most recent call last):
  File "test_models.py", line 129, in <module>
    rst = eval_video((i, data, label))
  File "test_models.py", line 117, in eval_video
    rst = net(input_var).data.cpu().numpy().copy()
  File "/export/home/utsav/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/export/home/utsav/.local/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 58, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/export/home/utsav/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/export/home/utsav/tsn/tsn-pytorch/models.py", line 197, in forward
    base_out = self.base_model(input.view((-1, sample_len) + input.size()[-2:]))
  File "/export/home/utsav/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/export/home/utsav/tsn/tsn-pytorch/tf_model_zoo/bninception/pytorch_load.py", line 48, in forward
    data_dict[op[2]] = getattr(self, op[0])(data_dict[op[-1]])
  File "/export/home/utsav/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/export/home/utsav/.local/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 254, in forward
    self.padding, self.dilation, self.groups)
  File "/export/home/utsav/.local/lib/python2.7/site-packages/torch/nn/functional.py", line 52, in conv2d
    return f(input, weight, bias)
RuntimeError: tensors are on different GPUs

utsavgarg commented on May 26, 2024

@yjxiong one more thing: the Flow model takes much longer to complete one epoch than the RGB model. The timings for one epoch are:

  • RGB - 93.2s
  • Flow - 793.2s

What do you think is the reason for such a large difference? Any solutions?
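
For scale, the Flow epoch takes about 8.5 times as long as the RGB epoch. As a rough comparison, assume (this is not stated in the thread) that each Flow segment stacks 5 frames with the x and y flow components stored as separate images, while each RGB segment reads a single frame. Under that assumption the Flow loader reads 10 times as many images:

```python
rgb_epoch_s = 93.2
flow_epoch_s = 793.2

# Observed slowdown of the Flow epoch relative to the RGB epoch.
slowdown = flow_epoch_s / rgb_epoch_s

# Assumed images read per segment (hypothetical, not confirmed in this thread):
# RGB reads 1 frame; Flow stacks 5 frames, each stored as separate x and y images.
rgb_images_per_segment = 1
flow_images_per_segment = 5 * 2

image_ratio = flow_images_per_segment / rgb_images_per_segment
print("slowdown %.1fx vs image ratio %.0fx" % (slowdown, image_ratio))
```

A slowdown close to the image-count ratio would be consistent with data loading, rather than the network itself, being the bottleneck.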

yjxiong commented on May 26, 2024

The flow model reads many more images per video than the RGB model, which makes data feeding slower. I have added pin memory in the latest commit; that may help. Also, try increasing the -j parameter for the flow model so that more data is prefetched.
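
The benefit of more loading workers can be illustrated without PyTorch. The -j option is assumed here to set the number of DataLoader worker processes; the stdlib sketch below swaps in a thread pool to show the same idea: when each sample needs many I/O-bound reads, several workers loading concurrently hide most of the latency. The fixed 10 ms sleep is a hypothetical stand-in for a disk read. Pinned memory is a separate optimization: it places batches in page-locked host memory so host-to-GPU copies are faster, but it does not reduce the number of images read.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_image(index, latency_s=0.01):
    """Stand-in for reading one image from disk (hypothetical fixed latency)."""
    time.sleep(latency_s)
    return index


def load_sequential(indices):
    """One worker: every read waits for the previous one to finish."""
    return [load_image(i) for i in indices]


def load_prefetched(indices, workers):
    """Several workers read concurrently; map() keeps results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_image, indices))


if __name__ == "__main__":
    frames = list(range(40))  # e.g. 4 segments x 10 flow images (assumed counts)
    for name, fn in [("1 worker", load_sequential),
                     ("8 workers", lambda idx: load_prefetched(idx, 8))]:
        start = time.perf_counter()
        fn(frames)
        print("%s: %.2fs" % (name, time.perf_counter() - start))
```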

nishanthrachakonda commented on May 26, 2024

https://www.dropbox.com/s/upa0nnrrmi4q36z/ucf101_bninception_RGB_1_rgb_checkpoint.pth?dl=0

It appears this checkpoint has been deleted. Could you provide it again?
