stanford-tml / edge
Official PyTorch Implementation of EDGE (CVPR 2023)
Home Page: https://edge-dance.github.io
License: MIT License
Thanks for sharing this interesting code.
I tried test.py right away.
With roughly 150 seconds of input audio, I got about 30 seconds of output (*.mp4).
Is it possible to synchronize the input and output so the dance can be checked against the sound?
Thanks for providing this interesting code.
I tried SMPL-to-FBX and got the following error.
My execution environment is:
Ubuntu 20.04
Python 3.10
PyTorch 1.13.1
I installed the FBX SDK with the following procedure:
Download the FBX SDK (https://www.autodesk.com/developer-network/platform-technologies/fbx-sdk-2020-3)
mkdir -p /fbx-sdk/install
tar -zxvf /tmp/fbx202032_fbxpythonsdk_linux.tar.gz -C /fbx-sdk
/fbx-sdk/fbx202032_fbxpythonsdk_linux /fbx-sdk/install
If there is a way to fix this, please let me know.
First, I am using python==3.7 with the Python FBX SDK on Windows.
You should change this line:
fbxReadWrite.writeFbx(output_dir, pkl_name)
to the following (make sure os is imported):
baseName = os.path.basename(pkl_name)
fbxReadWrite.writeFbx(output_dir, baseName)
I find it difficult to retarget the result to a Mixamo character.
When I run eval_pfc.py, the result changes on every run, but the paper reports 1.5363 or 1.6545. I wonder if my input is wrong. I am using the 20 .pkl files generated by the model from the AIST++ test set.
When I open the .fbx in SMPL-to-FBX/fbx_out with FBX Review, there is no music in it.
Dear EDGE Authors,
I am fascinated by your paper and wish to gain a better understanding of your methodology, particularly concerning the process to replicate the evaluation metrics detailed in your publication.
In your work, it's mentioned that for automatic evaluations such as PFC, beat alignment, Dist_k, and Dist_g, 5-second clips were obtained from each model using slices from the test music set with a 2.5-second stride. However, the process of deriving these 5-second clips from the AIST++ dataset remains unclear to me. As far as I understand, the test set comprises 20 musical pieces of 8 to 15 seconds each.
I have attempted to replicate the PFC metrics using the following approaches:
I am trying to replicate the PFC score (1.5363) reported in your paper, and I would greatly appreciate your guidance in this matter. Please let me know if there are any misconceptions in my understanding.
Looking forward to your kind assistance.
Can you share the code for measuring the metrics?
Hi, thanks for the extensive research on this! I was trying to see how the demo works in real time, but realized the site is down: https://edge-sandbox.com/. Will you be fixing this? If anyone else here has been able to replicate this in real time, please share. Thanks!
Hello, thanks for your amazing work!
I tried to test the model on my own music (.wav) following the README, with the command python test.py --music_dir custom_music/. However, it raises a ValueError. Could you tell me how to resolve this issue?
Thanks!
Computing features for input music
Slicing custom_music/gasoline.wav
Traceback (most recent call last):
File "test.py", line 128, in <module>
test(opt)
File "test.py", line 81, in test
rand_idx = random.randint(0, len(file_list) - sample_size)
File "/opt/conda/envs/edge/lib/python3.8/random.py", line 248, in randint
return self.randrange(a, b+1)
File "/opt/conda/envs/edge/lib/python3.8/random.py", line 226, in randrange
raise ValueError("empty range for randrange() (%d, %d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (0, -1, -1)
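For what it's worth, the traceback means `len(file_list) - sample_size` went negative: the sliced custom music produced fewer files than the requested sample count, so `random.randint(0, -1)` has an empty range. A minimal sketch of a workaround (the names `file_list` and `sample_size` follow test.py, but the clamping fallback is my assumption, not the authors' fix):

```python
import random

def pick_start(file_list, sample_size):
    # random.randint(a, b) requires a <= b. If slicing produced fewer
    # files than sample_size, len(file_list) - sample_size is negative
    # and randint raises "empty range for randrange()".
    if len(file_list) < sample_size:
        sample_size = len(file_list)  # fall back to all available slices
    return random.randint(0, len(file_list) - sample_size), sample_size
```

Alternatively, using a longer input file so that slicing yields at least `sample_size` pieces avoids the crash without touching the code.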
Is it possible to train with other motion formats such as .bvh/.fbx/SMPL-X/SMPL-H, and to output in .bvh/.fbx/SMPL-X/SMPL-H?
Hi! I'm trying to download and preprocess the dataset.
The GitHub README says it will take ~24 hrs and ~50 GB to precompute all the Jukebox features for the dataset.
If I'm only allowed 720 minutes (12 hours) at a time on the GPU server, how can I resume preprocessing in the next 12-hour session without starting over from the beginning?
I'm loving this so far and it has produced some hilarious dances, but (apologies for being a rookie) what method could I use to batch-process songs of varying lengths and render them at 30 or 60 fps? I tried editing FbxReadWriter with GPT-4, but my attempts didn't produce the results I wanted. Thanks!
Hi, @rodrigo-castellon @jtseng20
Thank you for your great work! I wonder how to compute the PFC metric on the ground-truth .pkls in ./data/test/motions. When I run ./eval/eval_pfc.py with motion_dir set to ./data/test/motions (GT), I get "./data/test/motions has a mean of nan". I found that the code does
info = pickle.load(open(pkl, "rb"))
joint3d = info["full_pose"]
but info doesn't have a "full_pose" key; instead it looks like:
So what should I do to figure this out?
Best wishes!
Andy
Amazing work!
I found that in the results the dancer's feet keep sliding no matter what the music is; it looks like skating.
I wonder if the final skeleton's global translation is correct?
Hello authors, thank you for the very impressive work.
I noticed with interest that the code uses a normalizer to preprocess the pose data vectors before training. Then, in the FK forward pass that computes the FK loss and the foot loss, I saw that the lines to unnormalize the data (L482-483) are commented out. My intuition is that after normalization the rotations may no longer be valid rotations, so the FK forward pass may not work properly, and I thought the pose should be unnormalized to fix this. I'm just not sure whether this holds for the 6D rotation representation; maybe I'm wrong. Could you explain the reasoning? Thank you!
The reference code:
b, s, c = model_out.shape
# unnormalize
# model_out = self.normalizer.unnormalize(model_out)
# target = self.normalizer.unnormalize(target)
# X, Q
model_x = model_out[:, :, :3]
model_q = ax_from_6v(model_out[:, :, 3:].reshape(b, s, -1, 6))
target_x = target[:, :, :3]
target_q = ax_from_6v(target[:, :, 3:].reshape(b, s, -1, 6))
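For reference, here is the standard 6D-to-rotation construction (Zhou et al., CVPR 2019) that `ax_from_6v` presumably builds on, written out in plain Python for a single vector. Because of the Gram-Schmidt step, a positive per-vector rescaling cancels out, but a feature-wise shift such as a z-score normalizer applies does not, which is the crux of the question above:

```python
import math

def rotmat_from_6d(d6):
    """Gram-Schmidt orthonormalization of the two 3-vectors in a 6D
    rotation representation; returns a 3x3 rotation matrix (as rows)."""
    a1, a2 = d6[:3], d6[3:]

    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    b1 = normalize(a1)
    dot = sum(x * y for x, y in zip(b1, a2))
    b2 = normalize([y - dot * x for x, y in zip(b1, a2)])
    b3 = [b1[1] * b2[2] - b1[2] * b2[1],
          b1[2] * b2[0] - b1[0] * b2[2],
          b1[0] * b2[1] - b1[1] * b2[0]]
    return [b1, b2, b3]
```

For example, rotmat_from_6d([1, 0, 0, 0, 1, 0]) and rotmat_from_6d([2, 0, 0, 0, 3, 0]) yield the same identity matrix, but shifting each channel by a different mean (as a normalizer would) changes the recovered rotation.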
Hi. I'm trying to install this library and it's super hard; it involves installing pytorch3d and much more.
Is there a Docker image we could create to maintain this project more easily?
Or at least a working installation guide for each OS?
It really makes things hard for developers trying to learn from and build on what you have done.
Thanks for sharing very nice work!
I can't understand the meaning of the partial normalization applied only to the COM term in Equation 10 of the paper. If the implausibility is attributed to foot velocities, the COM acceleration in Equation 8 should just be a sign (0 or 1). If the implausibility is not attributed only to foot velocities, there may be a reasonable interpretation. I'm confused by the PFC formulation, especially the partial normalization. Can you explain it in detail?
Your paper mentioned: "We compute these metrics on 5-second dance clips produced by each approach". However, in the codebase, you are generating 30s clips by default. Can you please clarify how you computed the metrics comparing other methods to yours?
The model seems to be trained only inside the DanceDecoder, without denoising training. Please confirm whether there is any training of the denoising diffusion process as described in the paper.
Thanks for sharing such great work!
I saw in the paper that EDGE is capable of motion editing. I want to know how to use this in the demo, e.g., provided with the first and last poses of a motion.
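For anyone experimenting before the authors answer: diffusion-based editing is commonly implemented as in-painting, where at each denoising step the known frames are overwritten with the (appropriately noised) reference, so only the masked region is generated. A toy sketch of the constraint step, with made-up names (this is standard practice, not EDGE's confirmed code path):

```python
def apply_constraint(x_t, known, mask):
    """Keep frames where mask is 1 fixed to the reference (e.g., the
    first and last poses) and let the model fill in the rest."""
    return [k if m else x for x, k, m in zip(x_t, known, mask)]
```

For example, with mask [1, 0, 1] only the middle frame is left to the model; applying this at every denoising step yields an edit constrained by the endpoints.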
I am trying to run the code for this article, but I can find only one evaluation metric (PFC) in the code. If it is convenient for you, could you send me the code for evaluating beat alignment and diversity?
Hello,
I want to create a new motion dataset, but the number of keypoints in my dataset differs from yours, so I need to adjust the visualization accordingly. I have already looked through the existing issues about vis.py.
Could you please explain in detail the steps to determine the smpl_offset parameter here under various conditions? Additionally, some example code or more detailed explanations in the documentation would be immensely helpful for understanding and using this parameter.
Thanks for the amazing work, authors.
I am trying to reproduce the result reported in the paper and have two questions.
What does the constant 10000 mean here in PFC's implementation? I can't map it to Equation (10) in the paper.
Line 50 in 17c3428
I also find it hard to reproduce the PFC result of 1.5363 from the paper. Do you only use the AIST++ test set to compute PFC? And what does "Generate ~1k samples" in the README mean? There are only 20 items in the test set, which become 186 pieces after slicing.
Has anyone successfully reproduced the results? 🤔
self.to_time_tokens = nn.Sequential(
    nn.Linear(latent_dim * 4, latent_dim * 2),  # 2 time tokens
    Rearrange("b (r d) -> b r d", r=2),
)
In L278-L281 of model/model.py, what is the purpose of making 2 time tokens instead of just 1 time token?
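As I read it (an interpretation, not an authors' statement), the Linear expands the time embedding to two tokens' worth of features, and the Rearrange simply splits the flat vector in two, giving the transformer two separate conditioning tokens to attend to rather than one. The reshape itself is trivial:

```python
def split_into_tokens(flat, r):
    """Mimics Rearrange("b (r d) -> b r d") for a single example:
    a flat vector of length r*d becomes r tokens of size d."""
    d = len(flat) // r
    assert len(flat) == r * d
    return [flat[i * d:(i + 1) * d] for i in range(r)]
```

So a latent_dim*2 output becomes two tokens of size latent_dim.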
How do you calculate your beat-alignment score? I did not find the metric calculation code. Thanks.
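For reference, beat-alignment scores in AIST++-style evaluations average a Gaussian of the distance between each beat in one stream and the nearest beat in the other. A sketch (the sigma value, the beat-extraction step, and which stream is averaged over vary between papers and are assumptions here; this is not the EDGE authors' code):

```python
import math

def beat_align_score(music_beats, motion_beats, sigma=3.0):
    """Average over music beats of exp(-d^2 / (2*sigma^2)), where d is
    the distance to the nearest kinematic (motion) beat."""
    if not music_beats or not motion_beats:
        return 0.0
    total = 0.0
    for mb in music_beats:
        d = min(abs(mb - kb) for kb in motion_beats)
        total += math.exp(-(d * d) / (2.0 * sigma * sigma))
    return total / len(music_beats)
```

Motion beats are typically taken as local minima of joint velocity, and music beats from a beat tracker such as librosa's.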
Thanks for sharing the work!
I tried to convert the predicted motion to FBX but got this error:
Traceback (most recent call last):
File "SMPL-to-FBX/Convert.py", line 45, in <module>
fbxReadWrite.addAnimation(pkl_name, smpl_params)
File "/EDGE/SMPL-to-FBX/FbxReadWriter.py", line 93, in addAnimation
lCurve = node.LclRotation.GetCurve(lAnimLayer, "X", True)
AttributeError: 'NoneType' object has no attribute 'LclRotation'
Any help would be appreciated !
I have seen noise (eps), x_noisy, or v_prediction used as the training target, but here every timestep uses x_start as the training target, which seems a bit strange. Can you explain this or point to relevant articles?
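For context on x_start prediction (general DDPM algebra, not specific to this repo): since x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps, predicting x0 and predicting eps are interconvertible at every timestep, differing only in how the loss is weighted across timesteps; motion models often prefer x0 prediction because geometric losses (FK, foot contact) are only meaningful on clean poses. The conversion in one line:

```python
import math

def eps_from_x0(x_t, x0_pred, abar_t):
    """Recover the implied noise prediction from an x_start prediction,
    using x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return (x_t - math.sqrt(abar_t) * x0_pred) / math.sqrt(1.0 - abar_t)
```

So an x0-predicting network can always be read as an eps-predicting one, which is why the choice is a parameterization rather than a different objective.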
Truly a masterpiece! And thank you for your willingness to share your work. I have a few questions about batch_size.
The paper mentions using 4 A100 GPUs and a batch size of 512. Does the model train better at this batch_size? If I want to use a different batch_size, is there a recommended dataset split ratio? What batch_size was the checkpoint.pt provided on Google Drive trained with?
Could you provide the learning curve (loss over time) from your training runs, so that we can use it as a reference to confirm our training is heading in the right direction when reproducing the results?
Could you please provide the contents of the accelerate config file, i.e., default_config.yaml?
I've tried to load the pretrained model for further training.
I've checked that torch.load() works, and I've set the learning_rate small, so the result should not look very different from the original model, but the result does not seem pretrained.
The modified code, in train.py:
model = EDGE(opt.feature_type, learning_rate=0.000002, checkpoint_path="checkpoint.pt")
Hello,
First of all, thanks for your nice work.
I am trying to understand the training losses, and there is something I don't understand.
In the paper (equation 2, page 3), it is written that you are using the loss L_simple (L2 between x_gt and x_generated).
But when I looked at the code, it seems that this loss is multiplied by the "p2_loss_weight" term:
Line 464 in 17c3428
which is defined here:
https://github.com/Stanford-TML/EDGE/blob/main/model/diffusion.py#L127-L131
Can you explain why and how the loss is scaled depending on the time step? I did not find this information in the paper.
Thanks for your help!
--
Mathis
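For other readers: the weighting in question looks like the "perception prioritized" (P2) scheme of Choi et al. (CVPR 2022), which lucidrains-style diffusion codebases implement as (k + SNR(t))^(-gamma) with SNR(t) = abar_t / (1 - abar_t); it down-weights low-noise timesteps where the task is nearly trivial. A sketch (the k and gamma defaults are assumptions matching common implementations, not confirmed values from this repo):

```python
def p2_weight(abar_t, k=1.0, gamma=1.0):
    """P2 loss weight: (k + SNR)^(-gamma), SNR = abar / (1 - abar).
    High abar (little noise added) -> high SNR -> small weight."""
    snr = abar_t / (1.0 - abar_t)
    return (k + snr) ** (-gamma)
```

With gamma = 0 the weight is constant and the loss reduces to the plain L_simple from the paper, which may explain why the paper omits the term.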
Having lots of problems with the installation of PyTorch3D. Solving this would help all of us who are trying to use your repo.
Line 286 in 17c3428
I encountered a problem during the final generation of the .fbx file: NameError: name 'FbxAnimCurve' is not defined. May I know how to resolve this issue? Thank you.
Hello, I'm trying to visualize the .pkl output from running test.py with the --save_motions option, using the following command:
python SMPL-to-FBX/Convert.py --input_dir SMPL-to-FBX/smpl_samples/ --output_dir SMPL-to-FBX/fbx_out
But it says Segmentation fault (core dumped) and never works. I have already installed Python FBX.
Here is my directory hierarchy:
(edge) root@37faa67a6076:/app/SMPL-to-FBX# tree
.
|-- Convert.py
|-- FbxFormatConverter.exe
|-- FbxReadWriter.py
|-- SmplObject.py
|-- __pycache__
|   |-- FbxReadWriter.cpython-38.pyc
|   `-- SmplObject.cpython-38.pyc
|-- fbx_out
|-- smpl_samples
|   `-- test_gasoline_demo.pkl
`-- ybot.fbx
Thank you!