stanford-tml / edge
Official PyTorch Implementation of EDGE (CVPR 2023)
Home Page: https://edge-dance.github.io
License: MIT License
Thanks for sharing this interesting code.
I tried test.py right away.
With roughly 150 seconds of input audio, I got about 30 seconds of output (*.mp4).
Is it possible to synchronize the input and output so the dance can be checked against the sound?
Thanks for providing this interesting code.
I tried SMPL-to-FBX and got the following error.
My execution environment is:
Ubuntu 20.04
Python 3.10
PyTorch 1.13.1
I installed the FBX SDK with the following procedure:
Download the FBX SDK (https://www.autodesk.com/developer-network/platform-technologies/fbx-sdk-2020-3)
mkdir -p /fbx-sdk/install
tar -zxvf /tmp/fbx202032_fbxpythonsdk_linux.tar.gz -C /fbx-sdk
/fbx-sdk/fbx202032_fbxpythonsdk_linux /fbx-sdk/install
If there is a way to fix this, please let me know.
First, I am using python==3.7 with the Python FBX SDK on Windows.
You should change this line:
fbxReadWrite.writeFbx(output_dir, pkl_name)
to the following (make sure os is imported):
baseName = os.path.basename(pkl_name)
fbxReadWrite.writeFbx(output_dir, baseName)
I find it difficult to retarget the result to a Mixamo character.
When I run eval_pfc.py, the result changes on every run, but the paper reports 1.5363 or 1.6545. I wonder if my input is wrong. I am using the 20 .pkl files generated by the model from the AIST++ test set.
When I open the .fbx in SMPL-to-FBX/fbx_out with FBX Review, there is no music in it.
Dear EDGE Authors,
I am fascinated by your paper and wish to gain a better understanding of your methodology, particularly concerning the process to replicate the evaluation metrics detailed in your publication.
In your work, it's mentioned that for automatic evaluations such as PFC, beat alignment, Dist_k, and Dist_g, 5-second clips were obtained from each model using slices from the test music set with a 2.5-second stride. However, the process of deriving these 5-second clips from the AIST++ dataset remains unclear to me. As far as I understand, the test set comprises 20 musical pieces of 8 to 15 seconds each.
I have attempted to replicate the PFC metrics using the following approaches:
I am trying to replicate the PFC score (1.5363) reported in your paper, and I would greatly appreciate your guidance in this matter. Please let me know if there are any misconceptions in my understanding.
Looking forward to your kind assistance.
Can you share the code for measuring the metrics?
Hi, thanks for the extensive research on this! I was trying to see how the demo works in real time, but realized the site is down: https://edge-sandbox.com/. Will you be fixing this? If anyone else here has been able to replicate this in real time, please share. Thanks!
Hello, thanks for your amazing work!
I tried to test the model on my own music (.wav) following the README, with the command python test.py --music_dir custom_music/. However, it raises a ValueError. Could you tell me how to resolve this issue?
Thanks!
Computing features for input music
Slicing custom_music/gasoline.wav
Traceback (most recent call last):
File "test.py", line 128, in <module>
test(opt)
File "test.py", line 81, in test
rand_idx = random.randint(0, len(file_list) - sample_size)
File "/opt/conda/envs/edge/lib/python3.8/random.py", line 248, in randint
return self.randrange(a, b+1)
File "/opt/conda/envs/edge/lib/python3.8/random.py", line 226, in randrange
raise ValueError("empty range for randrange() (%d, %d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (0, -1, -1)
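For what it's worth, the traceback means `len(file_list) - sample_size` went negative: the sliced custom music produced fewer files than the requested sample count, so `random.randint(0, -1)` has an empty range. A minimal sketch of a workaround (the names `file_list` and `sample_size` follow test.py, but the clamping fallback is my assumption, not the authors' fix):

```python
import random

def pick_start(file_list, sample_size):
    # random.randint(a, b) requires a <= b. If slicing produced fewer
    # files than sample_size, len(file_list) - sample_size is negative
    # and randint raises "empty range for randrange()".
    if len(file_list) < sample_size:
        sample_size = len(file_list)  # fall back to all available slices
    return random.randint(0, len(file_list) - sample_size), sample_size
```

Alternatively, using a longer input file so that slicing yields at least `sample_size` pieces avoids the crash without touching the code.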
Is it possible to train with other motion formats such as .bvh/.fbx/SMPL-X/SMPL-H, and to output in .bvh/.fbx/SMPL-X/SMPL-H?
Hi! I'm trying to download and preprocess the dataset.
The GitHub README says it will take ~24 hrs and ~50 GB to precompute all the Jukebox features for the dataset.
If I'm only allowed 720 minutes (12 hours) at a time on the GPU server, how can I resume preprocessing in the next 12-hour session without starting over from the beginning?
I'm loving this so far and it has produced some hilarious dances, but (apologies for being a rookie) what method could I use to batch-process songs of varying lengths and render them at 30 or 60 fps? I tried editing FbxReadWriter with GPT-4, but my attempts didn't produce the results I wanted. Thanks!
Hi, @rodrigo-castellon @jtseng20
Thank you for your great work! I wonder how to compute the PFC metric on the ground-truth .pkls in ./data/test/motions. When I run ./eval/eval_pfc.py with motion_dir set to ./data/test/motions (GT), I get "./data/test/motions has a mean of nan". I found that the code does
info = pickle.load(open(pkl, "rb"))
joint3d = info["full_pose"]
but info doesn't have a "full_pose" key; instead it looks like:
So what should I do to figure this out?
Best wishes!
Andy
Amazing work!
I found that in the results the dancer's feet keep sliding no matter what the music is; it looks like skating.
I wonder if the final skeleton's global translation is correct?
Hello authors, thank you for the very impressive work.
I noticed with interest that the code uses a normalizer to preprocess the pose data vectors before training. Then, in the FK forward pass that computes the FK loss and the foot loss, I saw that the lines to unnormalize the data (L482-483) are commented out. My intuition is that after normalization the rotations may no longer be valid rotations, so the FK forward pass may not work properly, and I thought the pose should be unnormalized to fix this. I'm just not sure whether this holds for the 6D rotation representation; maybe I'm wrong. Could you explain the reasoning? Thank you!
The reference code:
b, s, c = model_out.shape
# unnormalize
# model_out = self.normalizer.unnormalize(model_out)
# target = self.normalizer.unnormalize(target)
# X, Q
model_x = model_out[:, :, :3]
model_q = ax_from_6v(model_out[:, :, 3:].reshape(b, s, -1, 6))
target_x = target[:, :, :3]
target_q = ax_from_6v(target[:, :, 3:].reshape(b, s, -1, 6))
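For reference, here is the standard 6D-to-rotation construction (Zhou et al., CVPR 2019) that `ax_from_6v` presumably builds on, written out in plain Python for a single vector. Because of the Gram-Schmidt step, a positive per-vector rescaling cancels out, but a feature-wise shift such as a z-score normalizer applies does not, which is the crux of the question above:

```python
import math

def rotmat_from_6d(d6):
    """Gram-Schmidt orthonormalization of the two 3-vectors in a 6D
    rotation representation; returns a 3x3 rotation matrix (as rows)."""
    a1, a2 = d6[:3], d6[3:]

    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    b1 = normalize(a1)
    dot = sum(x * y for x, y in zip(b1, a2))
    b2 = normalize([y - dot * x for x, y in zip(b1, a2)])
    b3 = [b1[1] * b2[2] - b1[2] * b2[1],
          b1[2] * b2[0] - b1[0] * b2[2],
          b1[0] * b2[1] - b1[1] * b2[0]]
    return [b1, b2, b3]
```

For example, rotmat_from_6d([1, 0, 0, 0, 1, 0]) and rotmat_from_6d([2, 0, 0, 0, 3, 0]) yield the same identity matrix, but shifting each channel by a different mean (as a normalizer would) changes the recovered rotation.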
Hi. I'm trying to install this library and it's super hard; it involves installing pytorch3d and much more.
Is there a Docker image we could create to maintain this project more easily?
Or at least a working installation guide for each OS?
It really makes things hard for developers trying to learn from and build on what you have done.
Thanks for sharing very nice work!
I can't understand the meaning of the partial normalization applied only to the COM term in Equation 10 of the paper. If the implausibility is attributed to foot velocities, the COM acceleration in Equation 8 should just be a sign (0 or 1). If the implausibility is not attributed only to foot velocities, there may be a reasonable interpretation. I'm confused by the PFC formulation, especially the partial normalization. Can you explain it in detail?
Your paper mentioned: "We compute these metrics on 5-second dance clips produced by each approach". However, in the codebase, you are generating 30s clips by default. Can you please clarify how you computed the metrics comparing other methods to yours?
The model seems to be trained only inside the DanceDecoder, without denoising training. Please confirm whether there is any training of the denoising diffusion process as described in the paper.
Thanks for sharing such great work!
I saw in the paper that EDGE is capable of motion editing. I want to know how to use this in the demo, e.g., provided with the first and last poses of a motion.
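For anyone experimenting before the authors answer: diffusion-based editing is commonly implemented as in-painting, where at each denoising step the known frames are overwritten with the (appropriately noised) reference, so only the masked region is generated. A toy sketch of the constraint step, with made-up names (this is standard practice, not EDGE's confirmed code path):

```python
def apply_constraint(x_t, known, mask):
    """Keep frames where mask is 1 fixed to the reference (e.g., the
    first and last poses) and let the model fill in the rest."""
    return [k if m else x for x, k, m in zip(x_t, known, mask)]
```

For example, with mask [1, 0, 1] only the middle frame is left to the model; applying this at every denoising step yields an edit constrained by the endpoints.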
I am trying to run the code for this article, but I can find only one evaluation metric (PFC) in the code. If it is convenient for you, could you send me the code for evaluating beat alignment and diversity?
Hello,
I want to create a new motion dataset, but the number of keypoints in my dataset differs from yours, so I need to adjust the visualization accordingly. I have already looked through the existing issues about vis.py.
Could you please explain in detail the steps to determine the smpl_offset parameter here under various conditions? Additionally, some example code or more detailed explanations in the documentation would be immensely helpful for understanding and using this parameter.
Thanks for the amazing work, authors.
I am trying to reproduce the result reported in the paper and have two questions.
What does the constant 10000 mean here in PFC's implementation? I can't map it to Equation (10) in the paper.
Line 50 in 17c3428
I also find it hard to reproduce the PFC result of 1.5363 from the paper. Do you only use the AIST++ test set to compute PFC? And what does "Generate ~1k samples" in the README mean? There are only 20 items in the test set, which become 186 pieces after slicing.
Has anyone successfully reproduced the results? 🤔
self.to_time_tokens = nn.Sequential(
    nn.Linear(latent_dim * 4, latent_dim * 2),  # 2 time tokens
    Rearrange("b (r d) -> b r d", r=2),
)
In L278-L281 of model/model.py, what is the purpose of making 2 time tokens instead of just 1 time token?
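As I read it (an interpretation, not an authors' statement), the Linear expands the time embedding to two tokens' worth of features, and the Rearrange simply splits the flat vector in two, giving the transformer two separate conditioning tokens to attend to rather than one. The reshape itself is trivial:

```python
def split_into_tokens(flat, r):
    """Mimics Rearrange("b (r d) -> b r d") for a single example:
    a flat vector of length r*d becomes r tokens of size d."""
    d = len(flat) // r
    assert len(flat) == r * d
    return [flat[i * d:(i + 1) * d] for i in range(r)]
```

So a latent_dim*2 output becomes two tokens of size latent_dim.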
How do you calculate your beat-alignment score? I did not find the metric calculation code. Thanks.
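For reference, beat-alignment scores in AIST++-style evaluations average a Gaussian of the distance between each beat in one stream and the nearest beat in the other. A sketch (the sigma value, the beat-extraction step, and which stream is averaged over vary between papers and are assumptions here; this is not the EDGE authors' code):

```python
import math

def beat_align_score(music_beats, motion_beats, sigma=3.0):
    """Average over music beats of exp(-d^2 / (2*sigma^2)), where d is
    the distance to the nearest kinematic (motion) beat."""
    if not music_beats or not motion_beats:
        return 0.0
    total = 0.0
    for mb in music_beats:
        d = min(abs(mb - kb) for kb in motion_beats)
        total += math.exp(-(d * d) / (2.0 * sigma * sigma))
    return total / len(music_beats)
```

Motion beats are typically taken as local minima of joint velocity, and music beats from a beat tracker such as librosa's.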
Thanks for sharing the work!
I tried to convert the predicted motion to FBX but got this error:
Traceback (most recent call last):
File "SMPL-to-FBX/Convert.py", line 45, in <module>
fbxReadWrite.addAnimation(pkl_name, smpl_params)
File "/EDGE/SMPL-to-FBX/FbxReadWriter.py", line 93, in addAnimation
lCurve = node.LclRotation.GetCurve(lAnimLayer, "X", True)
AttributeError: 'NoneType' object has no attribute 'LclRotation'
Any help would be appreciated !
I have seen noise (eps), x_noisy, or v_prediction used as the training target, but here every timestep uses x_start as the training target, which seems a bit strange. Can you explain this or point to relevant articles?
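For context on x_start prediction (general DDPM algebra, not specific to this repo): since x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps, predicting x0 and predicting eps are interconvertible at every timestep, differing only in how the loss is weighted across timesteps; motion models often prefer x0 prediction because geometric losses (FK, foot contact) are only meaningful on clean poses. The conversion in one line:

```python
import math

def eps_from_x0(x_t, x0_pred, abar_t):
    """Recover the implied noise prediction from an x_start prediction,
    using x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return (x_t - math.sqrt(abar_t) * x0_pred) / math.sqrt(1.0 - abar_t)
```

So an x0-predicting network can always be read as an eps-predicting one, which is why the choice is a parameterization rather than a different objective.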
Truly a masterpiece! And thank you for your willingness to share your work. I have a few questions about batch_size.
The paper mentions using 4 A100 GPUs and a batch size of 512. Does the model train better at this batch_size? If I want to use a different batch_size, is there a recommended dataset split ratio? What batch_size was the checkpoint.pt provided on Google Drive trained with?
Could you provide the learning curve (loss over time) from your training runs, so that we can use it as a reference to confirm our training is heading in the right direction when reproducing the results?
Could you please provide the contents of the accelerate config file, i.e., default_config.yaml?
I've tried to load the pretrained model for further training.
I've checked that torch.load() works, and I've set the learning_rate small, so the result should not look very different from the original model, but the result does not seem pretrained.
The modified code, in train.py:
model = EDGE(opt.feature_type, learning_rate=0.000002, checkpoint_path="checkpoint.pt")
Hello,
First of all, thanks for your nice work.
I am trying to understand the training losses, and there is something I don't understand.
In the paper (equation 2, page 3), it is written that you are using the loss L_simple (L2 between x_gt and x_generated).
But when I looked at the code, it seems that this loss is multiplied by the "p2_loss_weight" term:
Line 464 in 17c3428
which is defined here:
https://github.com/Stanford-TML/EDGE/blob/main/model/diffusion.py#L127-L131
Can you explain why and how the loss is scaled depending on the time step? I did not find this information in the paper.
Thanks for your help!
--
Mathis
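For other readers: the weighting in question looks like the "perception prioritized" (P2) scheme of Choi et al. (CVPR 2022), which lucidrains-style diffusion codebases implement as (k + SNR(t))^(-gamma) with SNR(t) = abar_t / (1 - abar_t); it down-weights low-noise timesteps where the task is nearly trivial. A sketch (the k and gamma defaults are assumptions matching common implementations, not confirmed values from this repo):

```python
def p2_weight(abar_t, k=1.0, gamma=1.0):
    """P2 loss weight: (k + SNR)^(-gamma), SNR = abar / (1 - abar).
    High abar (little noise added) -> high SNR -> small weight."""
    snr = abar_t / (1.0 - abar_t)
    return (k + snr) ** (-gamma)
```

With gamma = 0 the weight is constant and the loss reduces to the plain L_simple from the paper, which may explain why the paper omits the term.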
Having lots of problems with the installation of PyTorch3D. Solving this would help all of us who are trying to use your repo.
Line 286 in 17c3428
I encountered a problem during the final generation of the .fbx file: NameError: name 'FbxAnimCurve' is not defined. May I know how to resolve this issue? Thank you.
Hello, I'm trying to visualize the .pkl output from running test.py with the --save_motions option, using the following command:
python SMPL-to-FBX/Convert.py --input_dir SMPL-to-FBX/smpl_samples/ --output_dir SMPL-to-FBX/fbx_out
But it says Segmentation fault (core dumped) and never works. I have already installed Python FBX.
Here is my directory hierarchy:
(edge) root@37faa67a6076:/app/SMPL-to-FBX# tree
.
|-- Convert.py
|-- FbxFormatConverter.exe
|-- FbxReadWriter.py
|-- SmplObject.py
|-- __pycache__
|   |-- FbxReadWriter.cpython-38.pyc
|   `-- SmplObject.cpython-38.pyc
|-- fbx_out
|-- smpl_samples
|   `-- test_gasoline_demo.pkl
`-- ybot.fbx
Thank you!