
CUDA out of memory · ms-g3d · 15 comments · closed

kenziyuliu commented on May 18, 2024
CUDA out of memory


Comments (15)

yojayc commented on May 18, 2024

Try --half and --amp-opt-level 2 while executing the command.


guerrifrancesco commented on May 18, 2024

I had already read the notes and tried with those flags. Same error.


yojayc commented on May 18, 2024

Have you tried a smaller batch size? By setting the batch size to 8, I can successfully run the project on a single GPU with about 12GB of memory.


guerrifrancesco commented on May 18, 2024

I just tried. Same error again.


kenziyuliu commented on May 18, 2024

Hi @rirri93,

Thanks for your interest. The model should fit on 2080Ti's (~11GB memory) with a forward batch size of 16 per GPU during training, as I believe some of the pretrained models were trained on them. Model testing should use much less memory, so it does sound a little weird that batch size 8 doesn't fit. I don't have access to a 2080Ti right now so I can't test it for you, but here are a few pointers I can think of:

  • Have you tried cloning a fresh MS-G3D repo and clearing your GPU memory? You may have unknowingly changed something in the repo - model size, number of layers, graph scales, etc. - that made the model larger.
  • For testing only, something is definitely off if batch size 8 doesn't fit :). Could you share your launch command, in particular the values of batch_size, forward_batch_size, amp_opt_level, and phase? These are specified in the config .yaml files for whatever you want to test. In general, you should be able to test with a larger batch size than the one used for training.
  • For training, each GPU ideally gets a forward batch size of 16, so with 1 GPU you would use something like --batch-size 32 --forward-batch-size 16 (see the sketch after this list for how the two flags relate). If I recall correctly, the provided training configs use almost all of the GPU memory (> 10700 MB), and due to non-deterministic memory allocation I sometimes had to retry the training command until it fit.
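
For illustration, here is a rough sketch of how the two flags relate (not the repo's actual training loop; model, optimizer and loss are placeholders) - the forward batch size bounds peak memory because gradients are accumulated over smaller chunks:

    # Sketch of gradient accumulation (placeholders, not the repo's actual loop):
    # a total batch of --batch-size samples is processed in --forward-batch-size
    # chunks, so only one chunk's activations are resident on the GPU at a time.
    batch_size = 32          # --batch-size: samples per optimizer step
    forward_batch_size = 16  # --forward-batch-size: samples per forward/backward pass
    num_chunks = batch_size // forward_batch_size

    def train_step(model, optimizer, loss_fn, data, labels):
        # data: torch tensor of shape (batch_size, C, T, V, M); labels: (batch_size,)
        optimizer.zero_grad()
        for chunk, target in zip(data.split(forward_batch_size),
                                 labels.split(forward_batch_size)):
            loss = loss_fn(model(chunk), target) / num_chunks  # average over chunks
            loss.backward()                                    # gradients accumulate
        optimizer.step()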

Hope this helps.


guerrifrancesco commented on May 18, 2024

I cloned a fresh repo and ran the following command:
./eval_pretrained.sh --batch-size 8 --forward-batch-size 8 --amp-opt-level 2 --half

I get the same error:

main.py:687: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
default_arg = yaml.load(f)
[ Fri Nov 27 09:07:32 2020 ] Model total number of params: 3194595
Cannot parse global_step from model weights filename
[ Fri Nov 27 09:07:32 2020 ] Loading weights from pretrained-models/ntu60-xsub-joint-fusion.pt
[ Fri Nov 27 09:07:32 2020 ] Model: model.msg3d.Model
[ Fri Nov 27 09:07:32 2020 ] Weights: pretrained-models/ntu60-xsub-joint-fusion.pt
[ Fri Nov 27 09:07:32 2020 ] Eval epoch: 1
0%|▎ | 1/516 [00:01<13:54, 1.62s/it]
Traceback (most recent call last):
File "main.py", line 702, in
main()
File "main.py", line 698, in main
processor.start()
File "main.py", line 665, in start
result_file=rf
File "main.py", line 580, in eval
output = self.model(data)
File "/home/ms-g3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/ms-g3d/MS-G3D_old/model/msg3d.py", line 163, in forward
x = F.relu(self.sgcn2(x) + self.gcn3d2(x), inplace=True)
File "/home/ms-g3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/ms-g3d/MS-G3D_old/model/msg3d.py", line 100, in forward
out_sum += gcn3d(x)
File "/home/ms-g3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/ms-g3d/MS-G3D_old/model/msg3d.py", line 61, in forward
x = self.gcn3d(x)
File "/home/ms-g3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/ms-g3d/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
input = module(input)
File "/home/ms-g3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(input, **kwargs)
File "/home/ms-g3d/MS-G3D_old/model/ms_gtcn.py", line 105, in forward
agg = agg.permute(0,3,1,2,4).contiguous().view(N, self.num_scales
C, T, V)
RuntimeError: CUDA out of memory. Tried to allocate 2.58 GiB (GPU 0; 10.76 GiB total capacity; 3.54 GiB already allocated; 2.40 GiB free; 7.17 GiB reserved in total by PyTorch)

Before using the 2080Ti I was using a 1080Ti with 8GB, and I had the same problem.


kenziyuliu commented on May 18, 2024

Hi @rirri93,

I think I can clarify the misunderstanding: eval_pretrained.sh is just a shell script containing a list of testing commands for main.py, and the flags --batch-size 8, --forward-batch-size 8, etc. are flags to the main.py Python script; eval_pretrained.sh itself doesn't take any CLI flags.

So to reduce the testing batch size, you can go to the respective test_*.yaml files in the config/ folder and change the test batch size.
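
For example, a small script along these lines would do it (a sketch only: the config path and the test_batch_size key are my assumptions here; check the actual files and main.py's argparse for the exact names):

    # Sketch: write a copy of a test config with a smaller eval batch size.
    # Path and key names are assumptions; check config/ and main.py in your clone.
    import yaml

    src = "config/nturgbd-cross-subject/test_joint.yaml"   # assumed test config path
    with open(src) as f:
        cfg = yaml.safe_load(f)

    cfg["test_batch_size"] = 8   # assumed key controlling the eval batch size

    with open("config/test_joint_bs8.yaml", "w") as f:
        yaml.safe_dump(cfg, f)

    # then run something like: python3 main.py --config config/test_joint_bs8.yaml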

Hope this helps :).

UPDATE: Just to add that I assume by "1080Ti with 8GB" (the 1080Ti actually has 11GB and was used to train some of the pretrained models) you mean a 1080, which I believe won't fit the models.


guerrifrancesco commented on May 18, 2024

Ok, thanks, but if I want to use pre-trained models to do action recognition on my own 3D-skeleton annotations, what should I use?


kenziyuliu commented on May 18, 2024

> Ok, thanks, but if I want to use pre-trained models to do action recognition on my own videos, what should I use?

I assume you are asking about using pre-trained models to do inference on your custom datasets. The general steps to follow would be:

  1. Make sure your data format follows that of the training data; see the preprocessing steps in data_gen/
  2. Write your own config files, following the format of the existing ones in config/
  3. If needed, include your graph definition in graph/
  4. (Optional) Fine-tune the model on your custom dataset if you have labeled data; have a look at the training template commands and how main.py runs

Note that I haven't tried transferring to another dataset yet, and there might be other changes you'll need to make that I may have forgotten to include here.
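
As a quick sanity check for step 1, something like the sketch below (file names are placeholders for your own data_gen/ outputs) can confirm your data has the NTU-style layout the pretrained models expect:

    # Sketch: verify a generated data file follows the (N, C, T, V, M) layout used by
    # the NTU/Kinetics pipelines. File names are placeholders for your own outputs.
    import pickle
    import numpy as np

    data = np.load("my_data_joint.npy")          # placeholder path
    with open("my_label.pkl", "rb") as f:
        sample_names, labels = pickle.load(f)    # NTU-style label files hold (names, labels)

    N, C, T, V, M = data.shape
    print(f"N={N} samples, C={C} channels, T={T} frames, V={V} joints, M={M} persons")
    assert N == len(labels), "every sample needs a label"
    assert C == 3, "expected 3D (x, y, z) coordinates"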

Hope this helps. I'm going ahead and closing the issue now; feel free to comment below with further questions.


guerrifrancesco commented on May 18, 2024

Hi. I use GASTNet by fabro66 to generate 3D skeletons. The output keypoints are in H3.6M format. Is there a way to evaluate an action from a generic video using pre-trained models and H3.6M skeletons?


kenziyuliu commented on May 18, 2024

Hi @rirri93, I'm not very familiar with the H3.6M dataset, but I believe the number of keypoints (V) might be different from that of the Kinetics and NTU datasets. If you want to apply the pre-trained models, I would suggest thinking about which subset of weights is affected by this; e.g. the adaptive residual masks are tied to the graph size (V x V), which means they probably won't work with a different V.
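
One way to see which weights those are is to list the checkpoint tensors whose shapes involve V (25 joints for NTU). A rough sketch, assuming the .pt file stores a plain name-to-tensor state dict:

    # Sketch: list pretrained tensors whose shape depends on the joint count V, i.e.
    # the weights that likely can't be reused as-is with a different skeleton.
    import torch

    V = 25  # NTU RGB+D joint count
    weights = torch.load("pretrained-models/ntu60-xsub-joint-fusion.pt", map_location="cpu")
    for name, tensor in weights.items():
        if V in tuple(tensor.shape):
            print(name, tuple(tensor.shape))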


kenziyuliu commented on May 18, 2024

One reasonable strategy might be to initialize your H3.6M model with the compatible pre-trained weights and use random init for other modules; this could possibly give better results than simply training an H3.6M model from scratch.
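
A minimal sketch of that idea (an illustration, not code from the repo): copy over only the pretrained tensors whose names and shapes match your H3.6M model, and leave everything else randomly initialized:

    # Sketch: partial initialization from a pretrained checkpoint; tensors whose name
    # or shape doesn't match (e.g. the V x V masks) keep their random init.
    import torch

    def init_from_pretrained(model, ckpt_path):
        pretrained = torch.load(ckpt_path, map_location="cpu")
        own_state = model.state_dict()
        compatible = {k: v for k, v in pretrained.items()
                      if k in own_state and v.shape == own_state[k].shape}
        own_state.update(compatible)
        model.load_state_dict(own_state)
        print(f"reused {len(compatible)}/{len(own_state)} tensors from {ckpt_path}")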

Another strategy might be to insert synthetic keypoints into / remove keypoints from the H3.6M definition (and possibly devise your own skeleton graph) so that the skeleton graph matches that of Kinetics/NTU. I would also make sure the resulting skeleton graph is sensible and that the same normalization techniques are applied.


guerrifrancesco commented on May 18, 2024

OK, let's say I have 3D annotations following the NTU format, which should be this:

    • pelvis
    • spine
    • neck
    • head
    • left shoulder
    • left elbow
    • left wrist
    • left hand finger 1
    • right shoulder
    • right elbow
    • right wrist
    • right hand finger 1
    • left hip
    • left knee
    • left ankle
    • left toe
    • right hip
    • right knee
    • right ankle
    • right toe
    • thorax
    • left hand finger 2
    • left hand finger 3
    • right hand finger 2
    • right hand finger 3

How should I feed this data to MS-G3D? Should I create some sort of file with a specific format (maybe JSON)? And how can I then pass this file to MS-G3D? Thanks


kenziyuliu commented on May 18, 2024

Hi @rirri93, I can share a couple of pointers:

  1. Define your graph in the graph/ folder: This is just how your joints are connected. For example, you can see in graph/ntu_rgb_d.py that we specify the edges of the skeleton graph that are used to construct the VxV adjacency matrix used by MS-G3D.
  2. Write your data generator that matches the joint indices (your list above) to your graph definition
    • Your generated data tensor should have a shape (N, C, T, V, M), where M is the number of persons/skeletons in that frame
    • You want to match the V dimension to your graph, so for example x[:, :, :, 0, :] should give the "pelvis" joint in the list you have above
  3. Make sure you preprocess the data the same way the NTU or Kinetics data were preprocessed (depending on which pretrained models you are trying to use). A rough sketch of steps 1 and 2 follows below.
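
A rough sketch of steps 1 and 2 (simplified and illustrative; the edge list is deliberately incomplete, and the helper below mirrors the spirit, not the exact API, of graph/ntu_rgb_d.py):

    # Sketch: a simplified skeleton-graph definition plus the joint-index convention.
    import numpy as np

    num_node = 25  # V: joints in the list above; 0 = pelvis, 1 = spine, 2 = neck, ...

    # (child, parent) pairs, 0-indexed to match your data generator's joint order;
    # complete this for all 25 joints of your skeleton.
    edges = [(1, 0), (2, 1), (3, 2)]  # spine-pelvis, neck-spine, head-neck, ...

    def build_adjacency(num_node, edges):
        # symmetric V x V adjacency with self-loops, the raw ingredient from which
        # MS-G3D builds its multi-scale graph convolutions
        A = np.eye(num_node)
        for i, j in edges:
            A[i, j] = 1
            A[j, i] = 1
        return A

    A = build_adjacency(num_node, edges)

    # Step 2: your data generator should emit tensors of shape (N, C, T, V, M) whose
    # V axis uses the same indexing, so data[:, :, :, 0, :] is the pelvis joint.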

Hope this helps


erinchen824 commented on May 18, 2024

--amp-opt-level 2

What is this used for? Will it be used together with --half? Thanks!

