
CUDA out of memory · ms-g3d · 15 comments · closed

kenziyuliu commented on May 18, 2024
CUDA out of memory


Comments (15)

yojayc commented on May 18, 2024

Try --half and --amp-opt-level 2 while executing the command.


guerrifrancesco commented on May 18, 2024

I had already read the notes and tried with those flags. Same error.


yojayc commented on May 18, 2024

Have you tried a smaller batch size? By setting the batch size to 8, I can successfully run the project on a single GPU with about 12GB of memory.


guerrifrancesco commented on May 18, 2024

I just tried. Same error again.


kenziyuliu commented on May 18, 2024

Hi @rirri93,

Thanks for your interest. The model should fit on 2080Ti's (~11GB memory) with a forward batch size of 16 per GPU during training, as I believe some of the pretrained models were trained on them. Model testing should use much less memory, so it does sound a little weird that batch size 8 doesn't fit. I don't have access to a 2080Ti right now so I can't test it for you, but here are a few pointers I can think of:

  • Have you tried cloning a fresh MS-G3D repo and clearing your GPU memory? You may have unknowingly changed something in the repo - model size, number of layers, graph scales, etc. - that made the model larger.
  • For testing only, something is definitely off if batch size 8 doesn't fit :). Could you share your launch command, in particular the values of batch_size, forward_batch_size, amp_opt_level, and phase? These are specified in the config .yaml files for whatever you want to test. In general, you should be able to test with a larger batch size than the one used for training.
  • For training, each GPU ideally gets a forward batch size of 16, so with 1 GPU you would use something like --batch-size 32 --forward-batch-size 16 (see the sketch after this list for how the two flags relate). If I recall correctly, the provided training configs use almost all of the GPU memory (> 10700 MB), and due to non-deterministic memory allocation I sometimes had to retry the training command until it fit.
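
For illustration, here is a rough sketch of how the two flags relate (not the repo's actual training loop; model, optimizer and loss are placeholders) - the forward batch size bounds peak memory because gradients are accumulated over smaller chunks:

    # Sketch of gradient accumulation (placeholders, not the repo's actual loop):
    # a total batch of --batch-size samples is processed in --forward-batch-size
    # chunks, so only one chunk's activations are resident on the GPU at a time.
    batch_size = 32          # --batch-size: samples per optimizer step
    forward_batch_size = 16  # --forward-batch-size: samples per forward/backward pass
    num_chunks = batch_size // forward_batch_size

    def train_step(model, optimizer, loss_fn, data, labels):
        # data: torch tensor of shape (batch_size, C, T, V, M); labels: (batch_size,)
        optimizer.zero_grad()
        for chunk, target in zip(data.split(forward_batch_size),
                                 labels.split(forward_batch_size)):
            loss = loss_fn(model(chunk), target) / num_chunks  # average over chunks
            loss.backward()                                    # gradients accumulate
        optimizer.step()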

Hope this helps.


guerrifrancesco commented on May 18, 2024

I cloned a fresh repo and ran the following command:
./eval_pretrained.sh --batch-size 8 --forward-batch-size 8 --amp-opt-level 2 --half

I get the same error:

main.py:687: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
default_arg = yaml.load(f)
[ Fri Nov 27 09:07:32 2020 ] Model total number of params: 3194595
Cannot parse global_step from model weights filename
[ Fri Nov 27 09:07:32 2020 ] Loading weights from pretrained-models/ntu60-xsub-joint-fusion.pt
[ Fri Nov 27 09:07:32 2020 ] Model: model.msg3d.Model
[ Fri Nov 27 09:07:32 2020 ] Weights: pretrained-models/ntu60-xsub-joint-fusion.pt
[ Fri Nov 27 09:07:32 2020 ] Eval epoch: 1
0%|▎ | 1/516 [00:01<13:54, 1.62s/it]
Traceback (most recent call last):
File "main.py", line 702, in
main()
File "main.py", line 698, in main
processor.start()
File "main.py", line 665, in start
result_file=rf
File "main.py", line 580, in eval
output = self.model(data)
File "/home/ms-g3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/ms-g3d/MS-G3D_old/model/msg3d.py", line 163, in forward
x = F.relu(self.sgcn2(x) + self.gcn3d2(x), inplace=True)
File "/home/ms-g3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/ms-g3d/MS-G3D_old/model/msg3d.py", line 100, in forward
out_sum += gcn3d(x)
File "/home/ms-g3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/ms-g3d/MS-G3D_old/model/msg3d.py", line 61, in forward
x = self.gcn3d(x)
File "/home/ms-g3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/ms-g3d/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
input = module(input)
File "/home/ms-g3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(input, **kwargs)
File "/home/ms-g3d/MS-G3D_old/model/ms_gtcn.py", line 105, in forward
agg = agg.permute(0,3,1,2,4).contiguous().view(N, self.num_scales
C, T, V)
RuntimeError: CUDA out of memory. Tried to allocate 2.58 GiB (GPU 0; 10.76 GiB total capacity; 3.54 GiB already allocated; 2.40 GiB free; 7.17 GiB reserved in total by PyTorch)

Before using the 2080Ti I was using a 1080Ti with 8GB, and I had the same problem.


kenziyuliu commented on May 18, 2024

Hi @rirri93,

I think I can clarify the misunderstanding: eval_pretrained.sh is just a shell script containing a list of testing commands for main.py, and the flags --batch-size 8, --forward-batch-size 8, etc. are flags to the main.py Python script; eval_pretrained.sh itself doesn't take any CLI flags.

So to reduce the testing batch size, you can go to the respective test_*.yaml files in the config/ folder and change the test batch size.
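
For example, a small script along these lines would do it (a sketch only: the config path and the test_batch_size key are my assumptions here; check the actual files and main.py's argparse for the exact names):

    # Sketch: write a copy of a test config with a smaller eval batch size.
    # Path and key names are assumptions; check config/ and main.py in your clone.
    import yaml

    src = "config/nturgbd-cross-subject/test_joint.yaml"   # assumed test config path
    with open(src) as f:
        cfg = yaml.safe_load(f)

    cfg["test_batch_size"] = 8   # assumed key controlling the eval batch size

    with open("config/test_joint_bs8.yaml", "w") as f:
        yaml.safe_dump(cfg, f)

    # then run something like: python3 main.py --config config/test_joint_bs8.yaml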

Hope this helps :).

UPDATE: Just to add that I assume by "1080Ti with 8GB" (the 1080Ti actually has 11GB and was used to train some of the pretrained models) you mean a 1080, which I believe won't fit the models.


guerrifrancesco commented on May 18, 2024

Ok, thanks, but if I want to use pre-trained models to do action recognition on my own 3D-skeleton annotations, what should I use?


kenziyuliu commented on May 18, 2024

> Ok, thanks, but if I want to use pre-trained models to do action recognition on my own videos, what should I use?

I assume you are asking about using pre-trained models to do inference on your custom datasets. The general steps to follow would be:

  1. Make sure your data format follows that of the training data; see the preprocessing steps in data_gen/
  2. Write your own config files, following the format of the existing ones in config/
  3. If needed, include your graph definition in graph/
  4. (Optional) Fine-tune the model on your custom dataset if you have labeled data; have a look at the training template commands and how main.py runs

Note that I haven't tried transferring to another dataset yet, and there might be other changes you'll need to make that I may have forgotten to include here.
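
As a quick sanity check for step 1, something like the sketch below (file names are placeholders for your own data_gen/ outputs) can confirm your data has the NTU-style layout the pretrained models expect:

    # Sketch: verify a generated data file follows the (N, C, T, V, M) layout used by
    # the NTU/Kinetics pipelines. File names are placeholders for your own outputs.
    import pickle
    import numpy as np

    data = np.load("my_data_joint.npy")          # placeholder path
    with open("my_label.pkl", "rb") as f:
        sample_names, labels = pickle.load(f)    # NTU-style label files hold (names, labels)

    N, C, T, V, M = data.shape
    print(f"N={N} samples, C={C} channels, T={T} frames, V={V} joints, M={M} persons")
    assert N == len(labels), "every sample needs a label"
    assert C == 3, "expected 3D (x, y, z) coordinates"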

Hope this helps. I'm going ahead and closing the issue now; feel free to comment below with further questions.


guerrifrancesco commented on May 18, 2024

Hi. I use GASTNet by fabro66 to generate 3D skeletons. The output keypoints are in H3.6M format. Is there a way to evaluate an action from a generic video using pre-trained models and H3.6M skeletons?


kenziyuliu commented on May 18, 2024

Hi @rirri93, I'm not very familiar with the H3.6M dataset, but I believe the number of keypoints (V) might be different from that of the Kinetics and NTU datasets. If you want to apply the pre-trained models, I would suggest thinking about which subset of weights is affected by this; e.g. the adaptive residual masks are tied to the graph size (V x V), which means they probably won't work with a different V.
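
One way to see which weights those are is to list the checkpoint tensors whose shapes involve V (25 joints for NTU). A rough sketch, assuming the .pt file stores a plain name-to-tensor state dict:

    # Sketch: list pretrained tensors whose shape depends on the joint count V, i.e.
    # the weights that likely can't be reused as-is with a different skeleton.
    import torch

    V = 25  # NTU RGB+D joint count
    weights = torch.load("pretrained-models/ntu60-xsub-joint-fusion.pt", map_location="cpu")
    for name, tensor in weights.items():
        if V in tuple(tensor.shape):
            print(name, tuple(tensor.shape))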


kenziyuliu commented on May 18, 2024

One reasonable strategy might be to initialize your H3.6M model with the compatible pre-trained weights and use random init for other modules; this could possibly give better results than simply training an H3.6M model from scratch.
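
A minimal sketch of that idea (an illustration, not code from the repo): copy over only the pretrained tensors whose names and shapes match your H3.6M model, and leave everything else randomly initialized:

    # Sketch: partial initialization from a pretrained checkpoint; tensors whose name
    # or shape doesn't match (e.g. the V x V masks) keep their random init.
    import torch

    def init_from_pretrained(model, ckpt_path):
        pretrained = torch.load(ckpt_path, map_location="cpu")
        own_state = model.state_dict()
        compatible = {k: v for k, v in pretrained.items()
                      if k in own_state and v.shape == own_state[k].shape}
        own_state.update(compatible)
        model.load_state_dict(own_state)
        print(f"reused {len(compatible)}/{len(own_state)} tensors from {ckpt_path}")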

Another strategy might be to insert synthetic keypoints into / remove keypoints from the H3.6M definition (and possibly devise your own skeleton graph) so that the skeleton graph matches that of Kinetics/NTU. I would also make sure the resulting skeleton graph is sensible and that the same normalization techniques are applied.


guerrifrancesco commented on May 18, 2024

OK, let's say I have 3D annotations following the NTU format, which should be this:

    • pelvis
    • spine
    • neck
    • head
    • left shoulder
    • left elbow
    • left wrist
    • left hand finger 1
    • right shoulder
    • right elbow
    • right wrist
    • right hand finger 1
    • left hip
    • left knee
    • left ankle
    • left toe
    • right hip
    • right knee
    • right ankle
    • right toe
    • thorax
    • left hand finger 2
    • left hand finger 3
    • right hand finger 2
    • right hand finger 3

How should I feed this data to MS-G3D? Should I create some sort of file with a specific format (maybe JSON)? And how can I then pass this file to MS-G3D? Thanks


kenziyuliu commented on May 18, 2024

Hi @rirri93, I can share a couple of pointers:

  1. Define your graph in the graph/ folder: This is just how your joints are connected. For example, you can see in graph/ntu_rgb_d.py that we specify the edges of the skeleton graph that are used to construct the VxV adjacency matrix used by MS-G3D.
  2. Write your data generator that matches the joint indices (your list above) to your graph definition
    • Your generated data tensor should have a shape (N, C, T, V, M), where M is the number of persons/skeletons in that frame
    • You want to match the V dimension to your graph, so for example x[:, :, :, 0, :] should give the "pelvis" joint in the list you have above
  3. Make sure you preprocess the data the same way the NTU or Kinetics data were preprocessed (depending on which pretrained models you are trying to use). A rough sketch of steps 1 and 2 follows below.
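
A rough sketch of steps 1 and 2 (simplified and illustrative; the edge list is deliberately incomplete, and the helper below mirrors the spirit, not the exact API, of graph/ntu_rgb_d.py):

    # Sketch: a simplified skeleton-graph definition plus the joint-index convention.
    import numpy as np

    num_node = 25  # V: joints in the list above; 0 = pelvis, 1 = spine, 2 = neck, ...

    # (child, parent) pairs, 0-indexed to match your data generator's joint order;
    # complete this for all 25 joints of your skeleton.
    edges = [(1, 0), (2, 1), (3, 2)]  # spine-pelvis, neck-spine, head-neck, ...

    def build_adjacency(num_node, edges):
        # symmetric V x V adjacency with self-loops, the raw ingredient from which
        # MS-G3D builds its multi-scale graph convolutions
        A = np.eye(num_node)
        for i, j in edges:
            A[i, j] = 1
            A[j, i] = 1
        return A

    A = build_adjacency(num_node, edges)

    # Step 2: your data generator should emit tensors of shape (N, C, T, V, M) whose
    # V axis uses the same indexing, so data[:, :, :, 0, :] is the pelvis joint.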

Hope this helps


erinchen824 commented on May 18, 2024

--amp-opt-level 2

What is this used for? Will it be used together with --half? Thanks!

