patrick-swk / p-stmo

[ECCV2022] The PyTorch implementation for "P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation"

License: MIT License

MATLAB 6.43% Python 93.57%

p-stmo's People

Contributors: patrick-swk

p-stmo's Issues

Unexpected result when training from scratch

Hi author,
Thanks for your excellent work!

I can evaluate your model using the provided pre-trained weights.
But when I tried to follow your training-from-scratch guideline (for the pre-training stage on Human3.6M):

python run.py -f 243 -b 160 --MAE --train 1 --layers 3 -tds 2 -tmr 0.8 -smn 2 --lr 0.0001 -lrd 0.97

the train.log looks like this:

2022/08/12 17:44:17 epoch: 1, lr: 0.0001000, loss: 0.0814, loss_test: 0.0305, p1: 30.54, p2: 0.00
2022/08/12 18:55:30 epoch: 2, lr: 0.0000970, loss: 0.0367, loss_test: 0.0245, p1: 24.47, p2: 0.00
2022/08/12 20:06:40 epoch: 3, lr: 0.0000941, loss: 0.0324, loss_test: 0.0212, p1: 21.22, p2: 0.00
2022/08/12 21:17:54 epoch: 4, lr: 0.0000913, loss: 0.0303, loss_test: 0.0193, p1: 19.25, p2: 0.00
2022/08/12 22:29:07 epoch: 5, lr: 0.0000885, loss: 0.0292, loss_test: 0.0184, p1: 18.41, p2: 0.00
2022/08/12 23:40:19 epoch: 6, lr: 0.0000859, loss: 0.0282, loss_test: 0.0183, p1: 18.39, p2: 0.00
2022/08/13 00:51:31 epoch: 7, lr: 0.0000833, loss: 0.0274, loss_test: 0.0176, p1: 17.70, p2: 0.00
2022/08/13 02:02:43 epoch: 8, lr: 0.0000808, loss: 0.0268, loss_test: 0.0170, p1: 17.09, p2: 0.00
2022/08/13 03:13:56 epoch: 9, lr: 0.0000784, loss: 0.0262, loss_test: 0.0169, p1: 17.00, p2: 0.00
2022/08/13 04:25:09 epoch: 10, lr: 0.0000760, loss: 0.0258, loss_test: 0.0169, p1: 17.04, p2: 0.00
2022/08/13 05:36:21 epoch: 11, lr: 0.0000737, loss: 0.0255, loss_test: 0.0162, p1: 16.26, p2: 0.00
2022/08/13 06:47:33 epoch: 12, lr: 0.0000715, loss: 0.0252, loss_test: 0.0168, p1: 16.98, p2: 0.00
2022/08/13 07:58:53 epoch: 13, lr: 0.0000694, loss: 0.0249, loss_test: 0.0159, p1: 16.04, p2: 0.00
2022/08/13 09:11:11 epoch: 14, lr: 0.0000673, loss: 0.0247, loss_test: 0.0160, p1: 16.20, p2: 0.00
2022/08/13 10:22:49 epoch: 15, lr: 0.0000653, loss: 0.0244, loss_test: 0.0149, p1: 15.07, p2: 0.00
2022/08/13 11:34:02 epoch: 16, lr: 0.0000633, loss: 0.0243, loss_test: 0.0145, p1: 14.70, p2: 0.00
...

I am confused: why is p2 always 0, and why is p1 lower than expected?
Could you please release your train.log for training from scratch (pre-training & fine-tuning) on Human3.6M?

Thanks a lot.
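
For context on these metrics: p1 is conventionally MPJPE and p2 the Procrustes-aligned P-MPJPE. A minimal sketch of the standard definitions (not this repo's exact code) is:

    import numpy as np

    def mpjpe(pred, gt):
        # p1: mean per-joint position error, in the units of the poses (mm)
        return np.mean(np.linalg.norm(pred - gt, axis=-1))

    def p_mpjpe(pred, gt):
        # p2: MPJPE after similarity (Procrustes) alignment; pred, gt: (J, 3)
        mu_p, mu_g = pred.mean(0), gt.mean(0)
        Y, X = pred - mu_p, gt - mu_g
        nY, nX = np.linalg.norm(Y), np.linalg.norm(X)
        U, s, Vt = np.linalg.svd((X / nX).T @ (Y / nY))
        if np.linalg.det(Vt.T @ U.T) < 0:   # guard against reflections
            Vt[-1] *= -1
            s[-1] *= -1
        R = Vt.T @ U.T                      # rotation mapping pred onto gt
        a = s.sum() * nX / nY               # optimal scale
        return mpjpe(a * Y @ R + mu_g, gt)

A p2 of exactly 0.00 suggests the Procrustes-aligned metric is simply not computed in this stage; whether that is intended during pre-training is a question for the author.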

Do I need to re-train the model in order to do causal prediction?

I saw your discussion in #3, and I have a further question.

I implemented VideoPose3D as well; it takes (1, 100, 17, 2) as input and yields (1, 100, 17, 3), so I can just take the last (17, 3) as the 3D pose prediction for the latest frame (many-to-many).

However, your approach pads frames on both sides and predicts the middle frame (many-to-one), so do I need to retrain the network to make it predict the last frame (the causal case)? See the windowing sketch below.

Thank you in advance!
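
For context, the many-to-one windowing described above looks roughly like this (a sketch assuming a receptive field of f frames, not the repo's exact code):

    import numpy as np

    def center_window(seq_2d, t, f=243):
        # seq_2d: (T, 17, 2). Build the f-frame window centered on frame t,
        # replicating edge frames at the boundaries (the padding mentioned
        # above); the model maps it to the 3D pose of the middle frame.
        half = f // 2
        idx = np.clip(np.arange(t - half, t + half + 1), 0, len(seq_2d) - 1)
        return seq_2d[idx][None]            # (1, f, 17, 2)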

About inference in real-time

Hi author,

Thanks for such excellent work. I did some training and testing based on your paper and code, and the results are good. I am now curious about real-time inference. My intention is to estimate the 3D coordinates while playing back a video. According to your strategy and demo code, estimation for a center frame needs the 2D poses before and after it, which means the 3D pose of a certain frame cannot be obtained until the 2D poses after it are computed. But for real-time inference, the 2D pose sequence after a certain frame cannot be acquired while the video is being played back.

I am now in a dilemma. I already have a 2D pose estimator that achieves a good balance between performance and speed, even after being quantized and deployed on a mobile device. My plan is to combine it with P-STMO to act as a real-time 3D pose estimator, i.e., first get the 2D poses and then recover the 3D pose. Actually, I am a little confused about the training strategy. My understanding is that the frames "before" the current frame should be enough for prediction, so why are the frames "after" the current one also collected for training? That is what I see in your training code. My naive idea is to use only the preceding frames as the input sequence for inference, excluding the following frames. I would appreciate your comment, big thanks.
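
One workaround that avoids retraining (my own suggestion, not something the authors endorse) is to keep the symmetric window but accept a fixed latency of half the receptive field, i.e., output the 3D pose of the frame f//2 steps in the past:

    from collections import deque
    import numpy as np

    class DelayedWindow:
        """Sketch: buffer incoming 2D poses; once f frames are available,
        emit the full window. The model's many-to-one prediction then
        corresponds to the frame f//2 steps in the past."""
        def __init__(self, f=243):
            self.f = f
            self.buf = deque(maxlen=f)

        def push(self, pose_2d):            # pose_2d: (17, 2), newest frame
            self.buf.append(pose_2d)
            if len(self.buf) < self.f:
                return None                 # still warming up
            return np.stack(self.buf)[None] # (1, f, 17, 2)

Feeding only the "before" frames, as suggested above, would change the input distribution the network was trained on, which is why retraining is usually expected in the causal case.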

How to estimate 3D pose using only part of the 2D keypoints?

Hi, is it possible to run the model and render keypoints using a 2D pose with only some of the keypoints (for cases when part of the body is not visible and the 2D HPE model can't estimate some keypoints)? I tried using [-1, -1] or [nan, nan] for invisible keypoints, but it doesn't work. Do you have any solution or advice on how I can solve this? Thanks!
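
Since the model is trained on complete 17-joint inputs, sentinel values like [-1, -1] or NaN fall outside its training distribution. One common heuristic (a suggestion, not an official fix) is to impute missing joints before inference, e.g., by linear interpolation over time:

    import numpy as np

    def interpolate_missing(seq_2d, visible):
        # seq_2d: (T, 17, 2); visible: (T, 17) boolean mask of detected joints.
        # Linearly interpolate each missing joint over time; endpoints are
        # clamped to the nearest visible frame.
        out = seq_2d.copy()
        t_all = np.arange(len(seq_2d))
        for j in range(seq_2d.shape[1]):
            ok = visible[:, j]
            if ok.all() or not ok.any():
                continue                    # nothing to fill / nothing to use
            for c in range(2):
                out[~ok, j, c] = np.interp(t_all[~ok], t_all[ok], seq_2d[ok, j, c])
        return out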

Using my own 2D detector's dataset, training precision is too low

Hi, thank you for your work, this is nice!
I use my own 2D detector, fine-tuned on Human3.6M, to generate new 2D keypoints in the same format as cpn_ft_h36m_dbb.npz. But when I train the model using my 2D dataset, the precision is too low. Here is a screenshot of the log:

[training log screenshot]

Data loading

Hello, does data loading in stage I take a very long time? The tqdm progress bar takes more than five hours to fill once. Is this normal? The batch size is 16.

Some clarifications on data preprocessing steps

Hi,
Thank you so much for open-sourcing such an important work. I had a few queries regarding the preprocessing of the dataset. I understand the data comes from the VideoPose3D repo, hence the data preprocessing steps are the same as theirs; they in turn took them from 3d-pose-baseline. In that repo's preprocessing, the data is normalized by mean and standard deviation (here). Are there any other preprocessing steps applied to the 2D pose in the fine-tuning phase? I am trying to fine-tune on a custom dataset, but the results are not up to par.
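
For reference, VideoPose3D (whose data pipeline this repo builds on) maps 2D pixel coordinates to roughly [-1, 1] before they reach the network; a sketch modeled on its normalize_screen_coordinates helper:

    import numpy as np

    def normalize_screen_coordinates(X, w, h):
        # X: (..., 2) pixel coordinates; w, h: image width and height.
        # Maps x to [-1, 1] and scales y by the same factor so the aspect
        # ratio is preserved (sketched from VideoPose3D's common/camera.py).
        assert X.shape[-1] == 2
        return X / w * 2 - np.array([1, h / w])

Custom 2D inputs usually need this same normalization, with w and h taken from the source video, before fine-tuning or inference.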

Corrupt checkpoint

Hi, I tried to run P-STMO to test a video in the wild, but when I load the P-STMO checkpoint, the error message suggests the checkpoint is corrupt.

Traceback (most recent call last):
  File "/home/hongji/Documents/Tianma/pose/video-to-pose3D-master/videopose_PSTMO.py", line 199, in <module>
    inference_video('/home/hongji/Documents/Tianma/pose/video-to-pose3D-master/outputs/kunkun_cut.mp4', 'alpha_pose')
  File "/home/hongji/Documents/Tianma/pose/video-to-pose3D-master/videopose_PSTMO.py", line 195, in inference_video
    main(args)
  File "/home/hongji/Documents/Tianma/pose/video-to-pose3D-master/videopose_PSTMO.py", line 110, in main
    pre_dict = torch.load(no_refine_path)
  File "/home/hongji/miniconda3/envs/STCFormer/lib/python3.8/site-packages/torch/serialization.py", line 795, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/hongji/miniconda3/envs/STCFormer/lib/python3.8/site-packages/torch/serialization.py", line 987, in _legacy_load
    return legacy_load(f)
  File "/home/hongji/miniconda3/envs/STCFormer/lib/python3.8/site-packages/torch/serialization.py", line 884, in legacy_load
    tar.extract('storages', path=tmpdir)
  File "/home/hongji/miniconda3/envs/STCFormer/lib/python3.8/tarfile.py", line 2265, in extract
    tarinfo = self._get_extract_tarinfo(member, filter_function, path)
  File "/home/hongji/miniconda3/envs/STCFormer/lib/python3.8/tarfile.py", line 2272, in _get_extract_tarinfo
    tarinfo = self.getmember(member)
  File "/home/hongji/miniconda3/envs/STCFormer/lib/python3.8/tarfile.py", line 1955, in getmember
    raise KeyError("filename %r not found" % name)
KeyError: "filename 'storages' not found"

Could you help me solve this? Is this a problem with the checkpoint? I have tried downloading it twice, but it still does not work.
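
That KeyError is raised by PyTorch's legacy (pre-zipfile) loader, which usually means the file on disk is not what torch.save produced, e.g., a truncated download or an HTML error page saved under a .pth name. A quick diagnostic sketch (the file name is hypothetical):

    import os, zipfile, torch

    path = "no_refine_checkpoint.pth"   # hypothetical; use your actual file
    print("size (bytes):", os.path.getsize(path))   # a few KB suggests an
    with open(path, "rb") as f:                     # error page, not weights
        print("first bytes:", f.read(16))
    print("zip-format checkpoint:", zipfile.is_zipfile(path))
    ckpt = torch.load(path, map_location="cpu")     # should succeed if intact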

About code detail

In stage 2, what does out_target[:, :, 0] = 0 mean? Is it to make the pose root-relative? Looking forward to your reply!
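
For what it's worth, zeroing joint 0 is the usual way to supervise root-relative poses. A sketch of that reading, assuming out_target has shape (batch, frames, 17, 3) with joint 0 as the pelvis (the VideoPose3D convention, not something confirmed for this repo):

    import numpy as np

    out_target = np.random.randn(2, 9, 17, 3)       # (batch, frames, joints, xyz)
    # Express every joint relative to the root, so the loss ignores global
    # position; joint 0 then sits exactly at the origin.
    out_target = out_target - out_target[:, :, :1]
    out_target[:, :, 0] = 0                         # root contributes no loss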

About MPI-INF-3DHP's training log

Hi author,
Could you please release the Stage I and Stage II training logs for training on the MPI-INF-3DHP dataset?

Thank you so much!!

Using refined model

Hi, thanks for the paper, to start with.

I am wondering about the refined model: it seems to need the 3D ground truth and the camera parameters to perform the refinement. Did you obtain those automatically, or is it in some way overfitting to the 3D ground truth?

thanks

GT 81 Result

It seems the author has not shared the training log or the results on the GT (81-frame) dataset.

parser.add_argument

Hello, in the opt.py file, self.parser.add_argument('--MAE_test_reload', type=int, default=0) is missing. Is that right? Should the default be 0 or 1? Looking forward to your reply, thank you.

Training and evaluation epochs

Hello, pre-training and formal training both use epoch=80. Do the two stages add up to 80 epochs, or is each stage 80 epochs? And at which epoch should the model be evaluated? Looking forward to your reply @paTRICK-swk

training process

Hello author, thank you for your work. During training I found that not one but multiple checkpoint files were generated, and their numbers were not consecutive. Is this normal? Looking forward to your reply, thank you.

About selected 2D detector & extended training data problem

Hi author,

I would appreciate it if you could help me understand the following two questions.

(1)【Why not use the same 2D detector on all datasets (e.g., H36M, 3DHP, in-the-wild)?】
I wonder why you (and most methods) use AlphaPose as the in-the-wild 2D keypoint detector while using CPN as the 2D detector for Human3.6M.
Likewise, on the MPI-INF-3DHP dataset, why not use the same 2D detector as on Human3.6M (e.g., CPN) to generate 2D poses, instead of using GT 2D poses as inputs?

(2)【Extend training data by merging H36M and 3DHP】
The experimental setting in your paper is to train on the H36M training set and evaluate on the H36M test set, and likewise to train on the 3DHP training set and evaluate on the 3DHP test set.

I wonder if it is reasonable to merge the H36M and 3DHP training sets for more diverse training poses and scenes. If so, could you point me to how to preprocess the data in this case?

Thank you so much.
Looking forward to your reply.

pre-trained file for stage 1

Thanks for the great work!

Could you please provide your_best_model_in_stage_I.pth for stage 1? The training time for stage 1 is too long.

About Cross-dataset Experiments

Hi author,

Have you ever tried cross-dataset experiments where you pre-train and fine-tune on Human3.6M and then perform inference on MPI-INF-3DHP?

When I attempted this experiment, the results were unsatisfactory, with only a single point appearing in the visualization instead of a full 17-keypoint skeleton.

Could you please advise me on how to address this issue or provide suggestions on what approach to take?

Looking forward to your reply. Thank you very much.

About 3DHP

Hi, great job! I want to ask why my 3DHP dataset doesn't have a valid_frame field. Is it because I didn't process the dataset, or because there is an error in my downloaded dataset? Looking forward to your reply, thank you very much!
Finally, congratulations on your new work being accepted by ICCV 2023, which I noticed a while ago. Great job!

Visualization details

During training, tds=2 is used, but inference and visualization use tds=1. Doesn't this affect performance?

training and testing

Hello, thank you very much for your work. When training with refine, multiple .pth files appear, and testing requires specifying which generated file to use. If a name is specified, the following message appears: IndexError: list index out of range. If no file is specified, the result is incorrect and does not match the result shown in the log. How can I solve this? Looking forward to your reply. @paTRICK-swk

About the output of the model

Hi Author,

I was wondering what the coordinate system of the model's output is. There are different coordinate systems, such as camera/world/image/pixel. My guess is that the output coordinates are in the camera system, but I am not sure. Any comment? Thanks.
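
In VideoPose3D-style pipelines the supervision is typically in camera coordinates (root-relative), so the camera-system guess is plausible, though I cannot confirm it for this repo. If the output is camera space, converting to world space needs the camera extrinsics; a generic sketch:

    import numpy as np

    def camera_to_world(X_cam, R, t):
        # X_cam: (..., 3) camera-space points; R: (3, 3) world-to-camera
        # rotation; t: (3,) camera position in world coordinates, so that
        # X_cam = R @ (X_world - t). Inverting gives X_world = R^T X_cam + t,
        # written below in row-vector form.
        return X_cam @ R + t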

run_in_wild.py fails at run time

Hi author,

It looks like the options 'plot_MAE' and 'MAE_test_reload' are not defined in opt.py, which causes errors when running:

python run_in_the_wild.py -k detectron_pt_coco -f 243 -b 160 --MAE --train 1 --layers 3 -tds 2 -tmr 0.8 -smn 2 --lr 0.0001 -lrd 0.97

Should I just comment them out, or is there a better way to fix it? Thanks.
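
One quick workaround (untested; only the option names come from the error, the types and defaults are guesses) is to declare the two missing options next to the existing self.parser.add_argument calls in opt.py. A standalone illustration:

    import argparse

    parser = argparse.ArgumentParser()
    # in the repo these would be self.parser.add_argument(...) calls in opt.py
    parser.add_argument('--MAE_test_reload', type=int, default=0)
    parser.add_argument('--plot_MAE', type=int, default=0)
    args, _ = parser.parse_known_args()
    print(args.MAE_test_reload, args.plot_MAE)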

Multi-person Visualization

Hi,
First of all, thank you for the good research that inspires me.

I have a question about Multi-person video-to-pose.

I think P-STMO can estimate multi-person poses with YOLO.

Is there a multi-person version of the video-to-pose code, in addition to the existing single-person version?

I look forward to your response. Thank you!

How the AUC and PCK Values Are Calculated

Hello author, thank you for your work. The paper reports a single value each for AUC and PCK, but when I run the code I get values for several columns. Is the reported number the average of the area circled in green in the screenshot below?

[evaluation output screenshot]

Looking forward to your reply, thank you. @paTRICK-swk
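
For reference, the single numbers reported for MPI-INF-3DHP are normally averages over all test joints and frames: PCK is the fraction of joints with error under 150 mm, and AUC averages PCK over thresholds from 0 to 150 mm. A sketch of the standard protocol (not this repo's exact code):

    import numpy as np

    def pck_auc(pred, gt, thresh=150.0):
        # pred, gt: (N, 17, 3) poses in millimetres
        err = np.linalg.norm(pred - gt, axis=-1)    # per-joint errors, (N, 17)
        pck = (err < thresh).mean() * 100           # single number, not per column
        auc = np.mean([(err < t).mean()
                       for t in np.linspace(0, 150, 31)]) * 100
        return pck, auc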
