Comments (15)
Try `--half` and `--amp-opt-level 2` while executing the command.
from ms-g3d.
I had already read the notes and tried with those flags. Same error.
Have you ever tried a smaller batch size? By setting the batch size to 8, I can successfully run the project on a single GPU with a maximum memory usage of about 12GB.
I just tried. Same error again.
Hi @rirri93,
Thanks for your interest. The model should fit on 2080Ti's (~11GB memory) with a forward batch size of 16 per GPU during training, since I believe some of the pretrained models were trained on them. Model testing should use much less memory, so it does sound a little weird that batch size 8 doesn't fit. I don't have access to a 2080Ti right now so I can't test it for you, but here are a few pointers I can think of:
- Have you tried cloning a fresh MS-G3D repo and clearing your GPU memory? You may have unknowingly changed something in the repo - model size, number of layers, graph scales, etc. - that made the model larger.
- For testing only: something is definitely off if batch size 8 doesn't fit :). Would you be able to share your launch command? In particular, the configs of `batch_size`, `forward_batch_size`, `amp_opt_level`, and `phase` for training. These flags are specified in the `config.yaml` files for whatever you wanted to test on. In general, you should be able to test with a larger batch size than that used for training.
- For training: each GPU ideally gets a forward batch size of 16, so with 1 GPU you would use something like `--batch-size 32 --forward-batch-size 16`. If I recall correctly, the provided training configs use almost all of the GPU memory (> 10700 MB), and due to non-deterministic memory allocation I sometimes just had to retry the training command until it fit.
Hope this helps.
I have cloned a fresh repo and run the following command:

```
./eval_pretrained.sh --batch-size 8 --forward-batch-size 8 --amp-opt-level 2 --half
```

I get the same error:
```
main.py:687: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  default_arg = yaml.load(f)
[ Fri Nov 27 09:07:32 2020 ] Model total number of params: 3194595
Cannot parse global_step from model weights filename
[ Fri Nov 27 09:07:32 2020 ] Loading weights from pretrained-models/ntu60-xsub-joint-fusion.pt
[ Fri Nov 27 09:07:32 2020 ] Model: model.msg3d.Model
[ Fri Nov 27 09:07:32 2020 ] Weights: pretrained-models/ntu60-xsub-joint-fusion.pt
[ Fri Nov 27 09:07:32 2020 ] Eval epoch: 1
  0%|▎         | 1/516 [00:01<13:54, 1.62s/it]
Traceback (most recent call last):
  File "main.py", line 702, in <module>
    main()
  File "main.py", line 698, in main
    processor.start()
  File "main.py", line 665, in start
    result_file=rf
  File "main.py", line 580, in eval
    output = self.model(data)
  File "/home/ms-g3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ms-g3d/MS-G3D_old/model/msg3d.py", line 163, in forward
    x = F.relu(self.sgcn2(x) + self.gcn3d2(x), inplace=True)
  File "/home/ms-g3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ms-g3d/MS-G3D_old/model/msg3d.py", line 100, in forward
    out_sum += gcn3d(x)
  File "/home/ms-g3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ms-g3d/MS-G3D_old/model/msg3d.py", line 61, in forward
    x = self.gcn3d(x)
  File "/home/ms-g3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ms-g3d/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/ms-g3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ms-g3d/MS-G3D_old/model/ms_gtcn.py", line 105, in forward
    agg = agg.permute(0, 3, 1, 2, 4).contiguous().view(N, self.num_scales * C, T, V)
RuntimeError: CUDA out of memory. Tried to allocate 2.58 GiB (GPU 0; 10.76 GiB total capacity; 3.54 GiB already allocated; 2.40 GiB free; 7.17 GiB reserved in total by PyTorch)
```
Before using 2080Ti I was using 1080Ti with 8GB and I had the same problem.
Hi @rirri93,
I think I can clarify your misunderstanding: `eval_pretrained.sh` is just a shell script containing a list of testing commands for `main.py`, and the flags `--batch-size 8`, `--forward-batch-size 8`, etc. are flags to the `main.py` Python script; `eval_pretrained.sh` itself doesn't take any CLI flags.
So to reduce the testing batch size, you can go to the respective `test_*.yaml` files in the `config/` folder and change the test batch size.
Hope this helps :).
UPDATE: Just to add that I assume by 1080Ti (which has 11GB and was used to train some of the pretrained models) you mean a 1080, which I believe won't fit the models.
Ok, thanks, but if I want to use pre-trained models to do action recognition on my own 3D-skeleton annotations, what should I use?
> Ok, thanks, but if I want to use pre-trained models to do action recognition on my own videos, what should I use?
I assume you are asking about using pre-trained models to do inference on your custom datasets. The general steps to follow would be:
- Make sure your data format follows that of the training data; see the preprocessing steps in `data_gen/`
- Write your own config files, following the formats of the existing ones in `config/`
- If needed, include your graph definition in `graph/`
- (Optional) Fine-tune the model with your custom dataset if you have labeled data; have a look at the training template commands and how `main.py` runs
Note that I haven't tried transferring to another dataset yet, and there might be other changes you'll need to make that I may have forgotten to include here.
Hope this helps. I'm going ahead and closing the issue now, feel free to comment below for further questions.
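To make the expected data layout concrete, here is a minimal sketch. The dimension values and the `.npy` filename are assumptions based on this thread and typical skeleton-action pipelines, not guaranteed to match the project's actual generator in `data_gen/`.

```python
import numpy as np

# Assumed layout: N samples, C=3 coordinates (x, y, z), T frames,
# V joints, M persons per frame.
N, C, T, V, M = 2, 3, 300, 25, 2
data = np.zeros((N, C, T, V, M), dtype=np.float32)

# Fill one joint: sample 0, frame 0, joint 0, person 0.
data[0, :, 0, 0, 0] = [0.1, 0.2, 0.3]

np.save("custom_data.npy", data)  # filename is illustrative
```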
Hi. I use GASTNet by fabro66 to generate 3D skeletons. The output keypoints are in H3.6M format. Is there a way to evaluate an action from a generic video using pre-trained models and H3.6M skeletons?
Hi @rirri93, I'm not very familiar with the H3.6M dataset, but I believe the number of keypoints (V) might be different from that of the Kinetics and NTU datasets. If you want to apply the pre-trained models, I would suggest thinking about which subset of weights might be affected by this; e.g. the adaptive residual masks are tied to the graph size (V x V), which means they probably won't work with a different V.
One reasonable strategy might be to initialize your H3.6M model with the compatible pre-trained weights and use random init for the other modules; this could give better results than simply training an H3.6M model from scratch.
Another strategy might be to insert synthetic keypoints / remove keypoints (and possibly devise your own skeleton graph) in the H3.6M definition so that the skeleton graph matches that of Kinetics/NTU. I would also ensure the resulting skeleton graph is sensible and that the same normalization techniques are applied.
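The first strategy could look roughly like the following plain-dict sketch of partial weight loading; the filtering logic is generic, and applying it to this repo's actual state dicts is untested here.

```python
# Sketch of the "compatible weights" idea: keep pretrained entries whose
# name exists in the new model and whose shape matches; everything else
# (e.g. the V x V adaptive residual masks) keeps its fresh random init.
def filter_compatible(pretrained, model_state):
    """Both args map parameter name -> array-like with a .shape."""
    return {
        k: v for k, v in pretrained.items()
        if k in model_state and v.shape == model_state[k].shape
    }
```

With PyTorch this would be followed by something like `model_state.update(compatible)` and then `model.load_state_dict(model_state)` (again, a sketch under those assumptions).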
Ok, let's say I have 3D annotations following the NTU format, which should be this:
1. pelvis
2. spine
3. neck
4. head
5. shoulder sx
6. elbow sx
7. wrist sx
8. hand finger sx 1
9. shoulder dx
10. elbow dx
11. wrist dx
12. hand finger dx 1
13. hip sx
14. knee sx
15. ankle sx
16. toe sx
17. hip dx
18. knee dx
19. ankle dx
20. toe dx
21. thorax
22. hand finger sx 2
23. hand finger sx 3
24. hand finger dx 2
25. hand finger dx 3
How should I submit this data to MS-G3D? Should I create a file with a specific format (maybe JSON), and how would I then pass that file to MS-G3D? Thanks
Hi @rirri93, I can share a couple of pointers:
- Define your graph in the `graph/` folder: this is just how your joints are connected. For example, you can see in `graph/ntu_rgb_d.py` that we specify the edges of the skeleton graph that are used to construct the VxV adjacency matrix used by MS-G3D.
- Write your data generator that matches the joint indices (your list above) to your graph definition.
- Your generated data tensor should have a shape (N, C, T, V, M), where `M` is the number of persons/skeletons in that frame. You want to match the V dimension to your graph, so for example `x[:, :, :, 0, :]` should give the "pelvis" joint in the list you have above.
- Make sure you preprocess the data the same way it was preprocessed for NTU or Kinetics (depending on which pretrained models you are trying to use).
Hope this helps
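The graph step above can be sketched like this. The edges shown are an illustrative fragment only (the joint pairings are my assumptions based on the list above, not the full 25-joint skeleton), and a real definition in the style of `graph/ntu_rgb_d.py` would enumerate every bone.

```python
import numpy as np

V = 25  # joints, indexed per the list above (0 = pelvis, 1 = spine, ...)

# Illustrative edges only; pairings assumed from the joint list above.
edges = [(0, 1), (1, 2), (2, 3), (0, 12), (0, 16)]

A = np.zeros((V, V), dtype=np.float32)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0  # undirected V x V adjacency matrix

# Data tensor (N, C, T, V, M): index the V axis to pick one joint.
x = np.zeros((1, 3, 300, V, 2), dtype=np.float32)
pelvis = x[:, :, :, 0, :]  # the "pelvis" joint, shape (1, 3, 300, 2)
```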
What is `--amp-opt-level 2` used for? Will it be used together with `--half`? Thx!