
wham's People

Contributors

arthur151, rohaana, yohanshin

wham's Issues

'Auxiliary SMPL-related data' URL may be broken

Thank you for sharing this nice work!

I tried the Google Colab the day before yesterday, and it worked nicely.

But today I tried to install it locally, and I ran into some problems.

In the data preparation step, fetch_demo_data.sh contains:

Auxiliary SMPL-related data

wget "https://drive.google.com/uc?id=1pbmzRbWGgae6noDIyQOnohzaVnX_csUZ&export=download&confirm=t" -O 'dataset/body_models.tar.gz'
tar -xvf dataset/body_models.tar.gz -C dataset/

When I ran the data download command, I got this message:

(screenshot of the error attached)

Why are the joints regressed from posed vertices?

Thanks for the great work! I have a question about the evaluation. Currently, the joint positions are computed for both GT and prediction by applying the joint regressor to the posed mesh vertices. This won't be the same as the actual joints of SMPL given as the joint output from the SMPL model (since the standard way is to apply the regressor after applying the shape blendshapes but before posing). Is there any particular reason for not using the joint output from SMPL? Thanks!
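
For concreteness, here is a minimal sketch of the two variants I am comparing (assuming the smplx package; the model path is just a placeholder, and this is not the repository's evaluation code, only an illustration of the question):

```python
import torch
import smplx

model = smplx.SMPL("body_models/smpl", gender="neutral")  # placeholder model path
output = model(betas=torch.zeros(1, 10),
               body_pose=torch.zeros(1, 69),
               global_orient=torch.zeros(1, 3))

# (a) Standard SMPL joints: the regressor is applied to the shaped, un-posed
#     template and the joints are then posed through the kinematic chain;
#     these are the first 24 entries of output.joints.
joints_standard = output.joints[:, :24]

# (b) Joints regressed directly from the posed mesh vertices, as in the
#     evaluation described above.
joints_from_verts = torch.einsum("jv,bvc->bjc", model.J_regressor, output.vertices)
```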

Training

Dear authors,

Thank you for this amazing work! May I know if you have a plan to release the code for training?

Some Questions Regarding WHAM

Hi,

Finally the Colab notebook has been released, and I got to try this model on tennis footage. I was really impressed by the model's performance on tennis videos; this has been the most accurate model I have seen (I have been looking for 3D models that perform well in tennis scenarios for the past few months)!

Kudos to the team for making such an excellent model! :)
Here's a video of the model's performance, though it took me about 2 hours on Colab to produce it!

output.mp4

I have a few questions regarding this output for the team, numbered below.

  1. As you can see, the right side of the video was blank for the greater part of the video. How exactly does the model decide which person to track and show on the right side? Can I choose the player myself or turn off the right display?
  2. The model is highly accurate, but as I mentioned, this output took more than 2 hours on an NVIDIA T4 GPU on Google Colab. I understand that 3D models are heavy, and I wanted to ask: if I were to arrange an instance with multiple GPUs (for instance on Azure), will the model automatically use the extra GPUs to speed up the render time? Moreover, which GPU would generally be preferred for this kind of processing?
  3. I want to move this 3D animation from video to Blender. Originally VIBE had a plugin that could do this, and the same script also worked for TCMR. I haven't tried it yet, but do you think the same script will also work for converting this model's output to Blender? (I am talking about this script.)
  4. I only want the model to detect the two tennis players, not the other people around the court. I have a YOLOv8 model trained to detect only the 2 tennis players. Is it possible to use that model in the detection phase? If not, is there any alternative the team would suggest? (One other way could be to zoom the video.) A short sketch of what I have on my side follows below.
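
To make question 4 concrete, this is roughly what I have on my side (a sketch assuming the ultralytics package; the weights file and frame path are placeholder names, and how to feed these boxes into WHAM's detection phase is exactly what I am asking):

```python
from ultralytics import YOLO

# Custom detector trained to find only the two tennis players
# ('tennis_players.pt' is a placeholder name for my weights file).
detector = YOLO("tennis_players.pt")

result = detector("court_frame.jpg")[0]          # run on a single frame
player_boxes = result.boxes.xyxy.cpu().numpy()   # (N, 4) boxes in xyxy pixel coords
```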

Once again, I was really impressed by the model's performance; I am sure this model will set a new benchmark for other HMR models in the future :)

eval EMDB

(screenshot of the error attached)
Exciting work! I run into the error shown in the attached screenshot when I evaluate on EMDB (1). Do you know how to solve it? (Both 3DPW and RICH run fine.)

Can it run in real time?

Hello authors, I'd like to ask whether this method could be used for real-time motion capture on a streaming video input, for example by caching the RNN state from the current frame and using it to initialize the next frame 🤔

About the provided pretrained models

Great work!
I'd like to know if you could release the pretrained models with the other backbones from the paper (WHAM (Res) / WHAM (HR)). I want to run the demo on my own PC in real time (>30 fps), but WHAM (ViT) is too slow.
Thanks a lot!

Support SMPL-X estimation?

Nice work! I am wondering whether the method can support reconstructing 3D SMPL-X motion in world coordinates?

Thank you!

Question about contact conf?

Thanks for your great work. Do you have any idea why the foot_contact confidence can be lower than 0 or higher than 1? Thank you!
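
For context, this is how I currently handle the out-of-range values downstream (just a sketch; I don't know which interpretation matches how the model was trained, which is why I am asking):

```python
import torch

# Raw foot-contact outputs can fall outside [0, 1] (example values).
contact = torch.tensor([-0.2, 0.5, 1.3])

contact_clamped = contact.clamp(0.0, 1.0)   # option 1: clamp into [0, 1]
contact_as_prob = torch.sigmoid(contact)    # option 2: treat the raw values as logits
```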

How to improve estimation of arm and hand angles in extreme scenarios?

I have already checked many repos and papers that work with SMPL fitting, but I haven't gotten good results for hand/arm/forearm tracking at extreme negative angles. Here are a few screenshots (left: 4D-Humans, middle: WHAM without the new flag, right: WHAM with the 'run_smplify' flag). Is there any way to improve this or fine-tune the model?
(screenshots attached)

Export/Import to 3D software

I noticed a comment in a closed issue about the output stating that

The future demo code with custom videos will support visualization and storing SMPL parameters.

Would it be possible for either yourself (owner) or another member of the opensource community to write a guide on importing the resulting data to visualize as a 3D model in Blender?

I'm just starting to research this field so I apologize if that is a simple or complicated task.

Best regards

mmpose installation

Thank you for the great work.
Hi, I ran into an error where mmpose is missing. Which version of mmpose are you using?

Sorry, found it: mmpose 0.24.0. It was in ViTPose.

Units

Thanks for this amazing work.

Do you know the unit of the checkerboard in the demo videos? How can we get body displacements in meters?
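
To make the question concrete, here is what I am trying to compute (a sketch; the output path is a placeholder, and I am assuming "trans_world" is the per-frame root translation in meters, which is the part I would like confirmed):

```python
import joblib
import numpy as np

# Load the demo output (placeholder path) and read the per-frame root translation.
results = joblib.load("output/demo/my_video/wham_output.pkl")
trans = np.asarray(results[0]["trans_world"])          # (F, 3) root translation

# If the units are meters, per-frame displacement and total distance follow directly.
step = np.linalg.norm(np.diff(trans, axis=0), axis=1)  # meters per frame (assumed)
total_distance = step.sum()
```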

output

I was able to install and run the evaluation code, but where is the output? I couldn't find it; I only found a YAML file in the experiments folder.

Import PKL to Blender

I saw there was another issue with the tip to get the code from VIBE. I did some rewriting of that code and put it in a .blend file to make it easier to use.

Just install joblib in Blender (there is a tab with the installation script), add the SMPL FBX for male to the same folder as the .blend file, and set the path to the PKL file.

Hope it's useful!
loading_WHAM_import_SMPL_FINAL.zip

Inquiry Regarding Model Output

Dear authors,

Thank you for the amazing paper. I have a question regarding the output: if I want to extract the 3D pose from your model, should I use pred["pose"] or pred["poses_body"], and could you please explain the difference between the two?

How to generate coco_aug_dict.pth file?

Thanks for your work. I have some questions about "self.aug_dict = torch.load(_C.KEYPOINTS.COCO_AUG_DICT)": pmask, jittering, peak, and bias all have the size [1, 1, 17].
What criteria did you follow to generate these parameters? Could you provide the related code?

recovering SMPL coefficients

I'd like to extract SMPL coefficients like humannerf did (https://github.com/chungyiweng/humannerf?tab=readme-ov-file#metadatajson)

After running WHAM, I get results with the keys (['poses_body', 'poses_root_cam', 'betas', 'verts_cam', 'poses_root_world', 'trans_world', 'frame_id']),
and the coefficients humannerf needs are "poses", "betas", "cam_intrinsics", and "cam_extrinsics".

I could easily see that "betas" is exactly the same as results[0]["betas"],
and it seems "cam_extrinsics" corresponds to results[0]["poses_root_world"] + results[0]["trans_world"].

But in humannerf and VIBE, "poses" is a (72,) array, while results[0]['poses_body'] is (23, 3, 3).
Also, I'm not sure how to recover cam_intrinsics. Does results[0]["poses_root_cam"] correspond to cam_intrinsics?

Could you please let me know how to recover these coefficients?
It would also be great if you could explain poses_root_cam and verts_cam.
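
For reference, here is the conversion I am assuming is needed to build the (72,) "poses" vector (a sketch using scipy; it presumes poses_root_cam or poses_root_world holds one 3x3 rotation per frame, which is part of what I would like confirmed):

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def to_pose72(root_rotmat, body_rotmat):
    """root_rotmat: (3, 3), body_rotmat: (23, 3, 3) -> (72,) axis-angle vector."""
    # Stack the root rotation in front of the 23 body rotations -> (24, 3, 3),
    # then convert rotation matrices to axis-angle and flatten.
    rots = np.concatenate([root_rotmat[None], body_rotmat], axis=0)
    return R.from_matrix(rots).as_rotvec().reshape(72)
```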

RecursionError when running DPVO

Thank you for sharing this nice work!
I got this error in the preprocessing stage:

  File "/root/WHAM/demo.py", line 223, in <module>
    run(cfg, 
  File "/root/WHAM/demo.py", line 76, in run
    slam_results = slam.process()
  File "/root/WHAM/lib/models/preproc/slam.py", line 70, in process
    return self.slam.terminate()[0]
  File "/root/miniconda3/envs/wham/lib/python3.9/site-packages/dpvo/dpvo.py", line 166, in terminate
    poses = [self.get_pose(t) for t in range(self.counter)]
  File "/root/miniconda3/envs/wham/lib/python3.9/site-packages/dpvo/dpvo.py", line 166, in <listcomp>
    poses = [self.get_pose(t) for t in range(self.counter)]
  File "/root/miniconda3/envs/wham/lib/python3.9/site-packages/dpvo/dpvo.py", line 158, in get_pose
    return dP * self.get_pose(t0)
  File "/root/miniconda3/envs/wham/lib/python3.9/site-packages/dpvo/dpvo.py", line 158, in get_pose
    return dP * self.get_pose(t0)
  File "/root/miniconda3/envs/wham/lib/python3.9/site-packages/dpvo/dpvo.py", line 158, in get_pose
    return dP * self.get_pose(t0)
  [Previous line repeated 986 more times]
  File "/root/miniconda3/envs/wham/lib/python3.9/site-packages/dpvo/lietorch/groups.py", line 203, in __mul__
    return self.mul(other)
  File "/root/miniconda3/envs/wham/lib/python3.9/site-packages/dpvo/lietorch/groups.py", line 151, in mul
    return self.__class__(self.apply_op(Mul, self.data, other.data))
  File "/root/miniconda3/envs/wham/lib/python3.9/site-packages/dpvo/lietorch/groups.py", line 127, in apply_op
    inputs, out_shape = broadcast_inputs(x, y)
  File "/root/miniconda3/envs/wham/lib/python3.9/site-packages/dpvo/lietorch/broadcasting.py", line 28, in broadcast_inputs
    x1 = x.repeat(x_expand + [1]).reshape(-1, xd).contiguous()
RecursionError: maximum recursion depth exceeded while calling a Python object
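
For what it's worth, the only workaround I have found so far is raising Python's recursion limit before calling slam.process() (a sketch, not a proper fix; DPVO's terminate() walks the pose chain recursively, so a long video seems to exceed the default limit of 1000):

```python
import sys

# Raise the recursion limit so DPVO's recursive get_pose() chain does not hit
# the default limit of 1000 on long videos. This only hides the symptom.
sys.setrecursionlimit(10000)
```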

Installing WHAM on windows

I have been trying since last week to install WHAM on Windows, but I was having problems with DPVO and lietorch.

A great guy called @psiydown helped a lot, along with another person from the lietorch GitHub.

I created this issue so it will be easier for people to find out how to get through that part.

I'll add here the notes I used as reference and the files I had to change to compile DPVO.

pytorch3d access denied

While following the installation instructions, I got this error:

(screenshot attached)

I thought, OK, I'm using Windows, which this was not made to work with, so I went to the link from the pip installation, and I got this error:

(screenshot attached)

In any case, I'll try to install the default pytorch3d, but I wanted to let you know what happened; maybe it can help with something.

jitter problem when using other docker image

When I use the official docker image yusun9/wham-vitpose-dpvo-cuda11.3-python3.9:latest,
the WHAM output is fine and the animation is smooth.

But when I build my own docker image based on pytorch/pytorch:2.2.1-cuda12.1-cudnn8-runtime,
the WHAM output suddenly has a jitter problem.

I suspect the reason is the different PyTorch version. Does the code support PyTorch 2.2.1?

Regarding the issue of predicting 3D points

How can I map the 31 network-predicted points "pred_kp3d" (ranging from -1 to 1) back to the original image? I attempted the following projection: 'ratio = 1.0 / 224; points_3d = (points_3d + 1.0) / (2 * ratio)', but the positions were not accurate.
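
To show what I mean, here is the mapping I think my attempt is missing (a sketch; the bounding-box variables are illustrative rather than WHAM's actual names, it only covers the x/y components, and I am not sure whether an additional camera projection is needed for true 3D points, which is part of my question):

```python
import numpy as np

def crop_to_image(xy_norm, bbox_cx, bbox_cy, bbox_size, crop_res=224):
    """Map (..., 2) coords in [-1, 1] over a square crop back to full-image pixels."""
    xy_crop = (xy_norm + 1.0) * 0.5 * crop_res          # [-1, 1] -> [0, crop_res]
    scale = bbox_size / crop_res                        # crop pixels -> image pixels
    offset = np.array([bbox_cx - bbox_size / 2.0,
                       bbox_cy - bbox_size / 2.0])      # top-left corner of the box
    return xy_crop * scale + offset
```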

Can real-time camera inference be performed for features and 3D joint positions?

1. Can faster methods be used for feature extraction? Currently, a single batch of feature extraction only runs at 15 fps.

2. Can we obtain 3D joint positions at the feature-integration stage, without using SMPL, and extract them from the context? What is the format?

3. Also, can the smoothing filter parameters be adjusted? With the default smoothing, the details of small motions cannot be seen.

4. Can it support both palm and toe inference simultaneously?

After the latest commit, running with --run_smplify gives "No such file or directory: 'configs/yamls/demo_w_fit.yaml'"

(wham) ubuntu@ip-172-31-38-235:~/WHAM$ python demo.py --video examples/tennis1.mp4 --visualize --run_smplify
apex is not installed
apex is not installed
apex is not installed
/home/ubuntu/miniconda3/envs/wham/lib/python3.9/site-packages/mmcv/cnn/bricks/transformer.py:27: UserWarning: Fail to import ``MultiScaleDeformableAttention`` from ``mmcv.ops.multi_scale_deform_attn``, You should install ``mmcv-full`` if you need this module.
  warnings.warn('Fail to import ``MultiScaleDeformableAttention`` from '
Traceback (most recent call last):
  File "/home/ubuntu/WHAM/demo.py", line 207, in <module>
    cfg.merge_from_file('configs/yamls/demo_w_fit.yaml')
  File "/home/ubuntu/miniconda3/envs/wham/lib/python3.9/site-packages/yacs/config.py", line 211, in merge_from_file
    with open(cfg_filename, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'configs/yamls/demo_w_fit.yaml'
Before pulling the new version it was working.

Image ViT feature dimensions: 1x1024 vs. 192x1280

Hi @yohanshin, I'm here again with some new questions 😄
Which HMR2.0 variant (a or b) are you using to extract the ViT features? I've also noticed that both variants yield a feature dimension of 1280 at the end of the image backbone: the input is 256 x 192 x 3, while the output is (16*12) x 1280. I am confused as to why your pre-saved ViT feature for an image is 1 x 1024. Could you give me some advice?

When running the demo, the program gets stuck

Dear authors,

Thank you for this amazing work!

After I installed it successfully, when I ran the demo, the program got stuck in the transform function in projective_ops.py in the dpvo module.

(screenshots attached)

Is anyone else having this problem, or does anyone have an idea where to start in solving it?

Questions on SMPL Rendering in Video Demo and Initialization of Trajectory Decoder

Hello, thanks for the great work! I am currently trying to understand your paper and have a couple of questions:

  1. SMPL rendering in the video demo: In the video demonstration where SMPL is rendered onto the observed image, are the root global orientation and camera parameters predicted by the motion decoder being used? Also, is it feasible to substitute these with the global orientation provided by the trajectory refiner? If so, how might this change affect the results?
  2. Trajectory decoder initialization: Regarding the initialization of the trajectory decoder, are there specific requirements for the input root pose (6D)? For instance, does it need to align with the direction of gravity?

Where do I download J_regressor_wham.npy?

python demo.py --video dance.mp4 --visualize
2024-03-04 19:44:25.282 | INFO | main::27 - DPVO is not properly installed. Only estimate in local coordinates !
2024-03-04 19:44:25.316 | INFO | main::209 - GPU name -> NVIDIA GeForce RTX 2060
2024-03-04 19:44:25.316 | INFO | main::210 - GPU feat -> CudaDeviceProperties(name='NVIDIA GeForce RTX 2060', major=7, minor=5, total_memory=12287MB, multi_processor_count=34)
Traceback (most recent call last):
  File "C:\ia\wham\WHAM\demo.py", line 214, in <module>
    smpl = build_body_model(cfg.DEVICE, smpl_batch_size)
  File "C:\ia\wham\WHAM\lib\models\__init__.py", line 12, in build_body_model
    body_model = SMPL(
  File "C:\ia\wham\WHAM\lib\models\smpl.py", line 25, in __init__
    J_regressor_wham = np.load(_C.BMODEL.JOINTS_REGRESSOR_WHAM)
  File "C:\Users\ultim\miniconda3\envs\wham\lib\site-packages\numpy\lib\npyio.py", line 427, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'dataset/body_models/J_regressor_wham.npy'

Any idea?

Shrinking Issue with WHAM

Hi!

We just figured out that the heavy foot sliding and jittering is caused by shrinking!

output.mp4

Please see the video above.

Thank you!

Questions about trajectory inference time

  1. Does the SLAM part need to go through the whole video first, or can it be used in a real-time application?
  2. Does the global trajectory prediction consume a lot of time?
  3. Besides the 2D video, is any additional input information needed?

Visualize joints on the 3D mesh

Hi,

This is really great work, one of the best repositories that uses SMPL.
I wanted to know if there is a way to visualize the joints on top of the 3D mesh using WHAM?
Also, is there a way to check the angles between the joints?

I want to demo a video using WHAM where I can show angles overlaid on the 3D model; a sketch of what I mean follows below.
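
For the angle part, this is the kind of computation I have in mind (a sketch from three 3D joint positions; how to index the corresponding joints in WHAM's output is exactly what I am asking):

```python
import numpy as np

def joint_angle(parent, joint, child):
    """Interior angle (degrees) at `joint`, e.g. hip-knee-ankle for the knee."""
    v1 = np.asarray(parent) - np.asarray(joint)
    v2 = np.asarray(child) - np.asarray(joint)
    cos_a = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
```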

Thank you!

About how to generate parsed data

Nice work! I can frankly say that WHAM is the best-performing method I have ever seen in this literature, and I'm now trying to train WHAM and run evaluations on my own dataset. I noticed that when running stage 1 training following the guidance in the train branch, the "3dpw_val_vit.pth" file is missing. Although the problem can be solved by simply renaming "3dpw_test_vit.pth" to "3dpw_val_vit.pth", as mentioned in another issue, I'm still wondering whether I can generate the parsed ViT version of the datasets on my own :)

3DPW with demo code

Hi @yohanshin ,
Thanks for your outstanding work!

I tried to run some demos on the 3DPW dataset. However, the output videos from demo.py (using ground-truth detection and tracking) and from evaluate_3dpw.py are quite different. Do you have any suggestions for resolving this issue?

downtown_runForBus_00.mp4
downtown_runForBus_00_0.mp4

Estimated result has both feet above ground level

Hello @yohanshin
The video shows walking on flat ground, but the estimated motion becomes walking on a slope, and the feet end up higher than the ground level.

cameras = renderer.create_camera(global_R[frame_i3], global_T[frame_i3])

Using the example video examples\IMG_9730.mov, change "global_R[frame_i3]" in this line of code to "default_R" to output the video from the camera at zero angle, and you will be able to see the issue I mentioned.
(screenshot attached)

Clarification about wham_results.

My understanding from the paper is that the joint angles (results["poses_body"] = wham_inference(.)) are deltas from the nominal pose of SMPL; is this correct? If so, is it OK to get the nominal position of the joints with smpl.get_output(.).original_pose? And what about the nominal orientation of the joint frames?
Additionally, results["poses_body"] is a 23 x R_{3x3} array while the SMPL joint count is 24; are poses_body the joint angles of j1 to j23, leaving j0 as the root of the body?

Thank you in advance for the response and great work!!

ValueError: need at least one array to concatenate

Thank you for the code and repo! When running on some custom videos I get the following error, but the example file works as expected:

Traceback (most recent call last):
  File "/home/WHAM/demo.py", line 145, in <module>
    run(cfg, args.video, output_pth, network, args.calib, visualize=args.visualize)
  File "/home/WHAM/demo.py", line 107, in run
    run_vis_on_demo(cfg, video, results, output_pth, network.smpl, vis_global=True)
  File "/home/WHAM/lib/vis/run_vis.py", line 43, in run_vis_on_demo
    renderer.set_ground(scale, cx.item(), cz.item())
  File "/home/WHAM/lib/vis/renderer.py", line 171, in set_ground
    v, f, vc, fc = map(torch.from_numpy, checkerboard_geometry(length=length, c1=center_x, c2=center_z, up="y"))
  File "/home/WHAM/lib/vis/tools.py", line 207, in checkerboard_geometry
    vertices = np.concatenate(vertices, axis=0).astype(np.float32)
  File "<__array_function__ internals>", line 180, in concatenate
ValueError: need at least one array to concatenate

multiple person video input

Dear authors,

Does the current demo support videos with multiple people?

(screenshot attached)

It seems only one person is visualized on the right side of the demo.

An immobile person has a pelvis bone that moves up/down

Hi,

I noticed that in any video where the person doesn't move, WHAM's pelvis still moves, which makes the immobile person drift down/up and sometimes left/right.

The problem is with the pelvis bone: the model can't recognize that the person is immobile and not moving.

A sample video is not required; any recorded video of someone moving only their hands and not their feet will make this bug appear.

Thank you.

How to use "transl" in the dataset/parsed_data/rich_test_vit.pth

Hi @yohanshin,
Can you give me some suggestions on how to use the transl in rich_test_vit.pth to recover the ground truth global motion?

I found that the transls are the same for index 6 and 37 (test/Gym_012_lunge1/cam_05 and test/Gym_012_lunge1/cam_04): (labels["transl"][6][1:] != labels["transl"][37][1:]).sum() gives 0. Therefore, I think the transl is in world coordinates. However, the joints3d I compute are different from the provided joints3d (joints3d = labels["joints3D"][index][1:]). Could you help me understand the difference?

smplx_out = smplx_models[gender](
    body_pose=pose[:, 3:-6].reshape(-1, 63),
    global_orient=pose[:, :3],
    betas=betas,
)
smpl_verts = torch.matmul(smplx2smpl, smplx_out.vertices)  # in camera view, but zero-centered

transl_ = (cam_poses[:, :, :3] @ transl[:, :, None] + cam_poses[:, :, 3:]).squeeze(2)
smpl_verts = smpl_verts + transl_[:, None, :]
smpl_joints_c = smpl_J_regressor @ smpl_verts  # (F, 24, 3)
(smpl_joints_c - joints3d).abs().max()  # 0.0971
