
patrick-swk / d3dp


[ICCV2023] The PyTorch implementation for "Diffusion-Based 3D Human Pose Estimation with Multi-Hypothesis Aggregation"

License: MIT License

MATLAB 3.69% Python 96.31%


d3dp's Issues

A simple operational issue, a newbie seeks help

Hello, sorry to bother you.
I have a potentially basic question. After downloading the code and running it locally, the imports `from common.humaneva_dataset import HumanEvaDataset` and `from common.custom_dataset import CustomizaDataset` fail because those two files do not exist, and I could not find them anywhere in your project. What is going on? Do I need to download them separately, and if so, what do I need to do?

Looking forward to your reply, thank you! (GPU: 2080 Ti)

Clarification for using in the wild

Hello,
We are preparing to use your model in production:

  1. What are the input and output formats of the in_the_wild_best_epoch.bin model? We need enough detail to integrate it with our code. We have to output results as a CSV with column headers (e.g. 'nose x', 'nose y', ...), so we need to map tensor indices to particular keypoints.
  2. Why is video-to-pose3D needed? Can you briefly explain what it provides and why it is done this way, as opposed to just using your own serving code?
  3. Are there any known issues with compressing the model via TorchScript for serving?

Note: we are using ViTPose as the 2D detector, as the others do not perform well enough for our needs.

Thank you for your help and for creating this project
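Regarding point 1, a minimal sketch of flattening a (frames, 17, 3) pose tensor into a CSV with per-keypoint column headers. The joint names below follow the common Human3.6M 17-joint convention and are an assumption; the actual index order used by this checkpoint would need to be confirmed against the repo's skeleton definition.

```python
import csv
import io

# Assumed Human3.6M 17-joint order -- verify against the repo's
# skeleton definition before relying on these names.
H36M_JOINTS = [
    "hip", "right hip", "right knee", "right ankle",
    "left hip", "left knee", "left ankle", "spine",
    "thorax", "neck", "head", "left shoulder",
    "left elbow", "left wrist", "right shoulder",
    "right elbow", "right wrist",
]

def poses_to_csv(poses, joints=H36M_JOINTS):
    """poses: nested list/array of shape (frames, 17, 3) -> CSV string."""
    header = [f"{name} {axis}" for name in joints for axis in ("x", "y", "z")]
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    for frame in poses:
        # each frame of shape (17, 3) becomes one row of 51 values
        writer.writerow([coord for joint in frame for coord in joint])
    return buf.getvalue()
```

Each frame then maps to one CSV row of 51 values, in the same joint order as the header.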

training evaluation CUDA out of memory

Hi, when I train from scratch on a single 3090 GPU, I get:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 23.11 GiB (GPU 0; 23.70 GiB total capacity; 1.54 GiB already allocated; 20.83 GiB free; 1.87 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The training process itself is completely fine, but after training, during the evaluation step, this CUDA memory error occurs.
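A common workaround when only evaluation runs out of memory is to split the evaluation batch into smaller chunks and concatenate the results (and, in PyTorch, wrap the forward passes in torch.no_grad() so no activations are kept). A framework-agnostic sketch with NumPy, where `model` stands in for any per-chunk callable:

```python
import numpy as np

def evaluate_in_chunks(model, inputs, chunk_size=64):
    """Run `model` over `inputs` in chunks along the batch axis and
    concatenate the results, keeping peak memory proportional to
    `chunk_size` rather than to the full batch."""
    outputs = []
    for start in range(0, len(inputs), chunk_size):
        outputs.append(model(inputs[start:start + chunk_size]))
    return np.concatenate(outputs, axis=0)
```

In a PyTorch script one would additionally wrap the loop body in `with torch.no_grad():` and move each chunk to the GPU one at a time.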

Some codes can't be found

Thanks for your great work! I want to ask when you can release an updated version; the current version does not work in the wild.

About the best results

I tried to train your model using the 2D keypoints obtained by CPN as input, but I cannot reproduce the results of your best epoch. I trained on a single 3090 and used 1 as the seed.
Did you use any different training procedure, or different hyperparameters?
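For what it's worth, exact reproduction usually depends on seeding every RNG in play, not just passing a seed flag. A sketch of the usual recipe (the standard-library and NumPy parts are runnable here; the torch calls mentioned in the comment are the conventional additions in a PyTorch training script):

```python
import random
import numpy as np

def seed_everything(seed=1):
    """Seed the Python and NumPy RNGs. In a PyTorch script one would
    additionally call torch.manual_seed(seed) and
    torch.cuda.manual_seed_all(seed), and may need
    torch.backends.cudnn.deterministic = True for bitwise repeatability."""
    random.seed(seed)
    np.random.seed(seed)

seed_everything(1)
a = np.random.rand(3)
seed_everything(1)
b = np.random.rand(3)
# identical draws after re-seeding
assert np.array_equal(a, b)
```

Even with identical seeds, nondeterministic CUDA kernels can still cause small run-to-run differences.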

training loss

Hi, could you provide the script that uses noise prediction (with the corresponding reverse process) as the training loss? Thanks.
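This is not the authors' script, but the standard DDPM noise-prediction objective is straightforward to sketch: sample a timestep t, diffuse the clean pose x0 to x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, and regress the network output against eps with MSE. A NumPy sketch with a placeholder predictor (the schedule values are illustrative, not the repo's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear beta schedule and cumulative alpha-bar, as in DDPM.
T = 1000
betas = np.linspace(1e-4, 2e-2, T)
alpha_bar = np.cumprod(1.0 - betas)

def noise_prediction_loss(model, x0):
    """MSE between the injected noise and the model's noise estimate."""
    t = int(rng.integers(0, T))
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    eps_hat = model(x_t, t)   # the network predicts the noise
    return float(np.mean((eps_hat - eps) ** 2))
```

A perfect oracle returning eps would give zero loss; a zero predictor gives roughly E[eps^2], i.e. about 1 for standard normal noise.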

Question about the paper

Hi, thanks for your great work!
I have a question about the paper. In Section 2.2, you write:

Note that concurrent methods [20, 16, 12] also use diffusion models for this task, but they only report the upper bound of performance, which is not available in real-world applications.

For example, I cannot find any mention of an 'upper bound' in the evaluation procedure of [16].
Hoping for your reply, thanks!

About the reverse process

Thanks for your interesting work~

I would like to know some details about the reverse process. Why did you consider applying the one-step solution for the reverse process? In my opinion, the multi-step method can produce higher quality results for generation tasks.
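For context on the trade-off being asked about: in a DDIM-style sampler, each reverse step first forms the estimate x0_hat = (x_t - sqrt(1 - abar_t) * eps_hat) / sqrt(abar_t), and a "one-step" solution simply takes x0_hat as the output instead of re-noising toward the next timestep. A NumPy sketch of both (deterministic DDIM, eta = 0; the schedule and predictor are placeholders, not the repo's code):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 2e-2, T)
alpha_bar = np.cumprod(1.0 - betas)

def ddim_step(x_t, t, t_prev, eps_hat):
    """Deterministic DDIM update (eta = 0) from timestep t to t_prev."""
    ab_t, ab_prev = alpha_bar[t], alpha_bar[t_prev]
    x0_hat = (x_t - np.sqrt(1.0 - ab_t) * eps_hat) / np.sqrt(ab_t)
    return np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps_hat

def one_step_solution(x_t, t, eps_hat):
    """'One-step' reverse process: jump straight to the x0 estimate."""
    ab_t = alpha_bar[t]
    return (x_t - np.sqrt(1.0 - ab_t) * eps_hat) / np.sqrt(ab_t)
```

One plausible intuition: for pose estimation the target distribution conditioned on 2D keypoints is much narrower than in open-ended image generation, so a direct x0 estimate can be competitive with multi-step refinement.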

where is the model.py file??

Hi!
First of all, thank you for your work. I am following your instructions, but I got stuck at one step.
When I run videopose_diffusion.py, I get an error because a model.py file cannot be found.

  1. Put other files in ./in_the_wild folder to the ./common folder of their repo.

I also wonder what this instruction means: do I have to copy that folder too?

Inference on in the wild videos

Hello,
I was looking into the inference part and noted that there were multiple skeletons inferenced, the exact shape was (5,5,no of frames,17,3). Now i looked into the other issue in which you told about the part of code in main.py which solves this problem, but i was unable to find the suitable code which is basically aggregating all the skeleton. Can you please guide me.

Video handling for video frames less than receptive field

Thanks @paTRICK-swk for the amazing work. When I run in-the-wild videos with fewer frames than the receptive field (243), the 2D and 3D predictions go out of sync. I found that you handle this case by duplicating the last frame, and this is what causes the desync. Could you please guide me on how I should handle this?
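One common workaround, sketched here as an assumption about how the pipeline could be patched rather than as the repo's intended fix: pad the 2D sequence up to the receptive field by repeating the last frame, run the model, then trim the 3D output back to the original frame count so it stays aligned with the 2D input:

```python
import numpy as np

RECEPTIVE_FIELD = 243  # number of frames the model expects

def pad_run_trim(model, keypoints_2d):
    """keypoints_2d: (frames, 17, 2). Pads to the receptive field by
    repeating the last frame, runs `model`, and trims the 3D output
    back to the original length so 2D and 3D stay in sync."""
    n = keypoints_2d.shape[0]
    if n < RECEPTIVE_FIELD:
        pad = np.repeat(keypoints_2d[-1:], RECEPTIVE_FIELD - n, axis=0)
        keypoints_2d = np.concatenate([keypoints_2d, pad], axis=0)
    poses_3d = model(keypoints_2d)
    return poses_3d[:n]  # drop predictions for the duplicated frames
```

If the desync comes from the 3D output keeping the padded frames, trimming like this restores one 3D pose per original 2D frame.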

bug for rearrange

When I run "python main_draw.py -k cpn_ft_h36m_dbb -b 2 -c checkpoint -gpu 0 --nolog --evaluate h36m_best_epoch.bin -num_proposals 5 -sampling_timesteps 5 --render --viz-subject S11 --viz-action SittingDown --viz-camera 1", there is a bug at inputs_2d_p = rearrange(inputs_2d_p, 'b f c -> f c b'): the pattern expects three axes, but I find that inputs_2d_p is torch.Size([2, 2356, 17, 2]).
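For reference, the pattern 'b f c -> f c b' assumes a 3-axis tensor, while the tensor here has four axes (batch, frames, joints, coords), so the pattern needs a joint axis, e.g. 'b f j c -> f j c b'. The equivalent permutation, sketched with NumPy (the intended output layout is an assumption; check what the drawing code downstream expects):

```python
import numpy as np

x = np.zeros((2, 2356, 17, 2))     # (batch, frames, joints, coords)

# The 3-axis pattern 'b f c -> f c b' cannot match a 4-axis tensor; the
# joint axis must appear in the pattern. With einops this would be
# rearrange(x, 'b f j c -> f j c b'); the pure-NumPy equivalent is:
y = np.transpose(x, (1, 2, 3, 0))  # -> (frames, joints, coords, batch)
```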

MPI-INF-3DHP Generalization Better than Human3.6M?

Hi @paTRICK-swk ,

Thanks for your great work and public contribution. May I ask: it seems the 3DHP (GT 2D) result of 28.1 is even better than the 35.4 on Human3.6M (Det 2D), even though the model is trained on Human3.6M (K differs, but I don't think that changes the picture). I would then guess the performance on Human3.6M (GT) could be below 20.

Any elaboration would be appreciated.:)

Thanks & regards,

Question about Reimplementation of MixSTE

I noticed in your paper that you report results for MixSTE replicated on your machine. I wonder whether you changed anything in the source code released by MixSTE. I am also confused about the initial loss-weight hyperparameters used for MixSTE in your replication.
