uttaranb127 / speech2affective_gestures

This is the official implementation of the paper "Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning".

Home Page: https://gamma.umd.edu/s2ag/

License: MIT License

Python 100.00%
affective-computing text-processing speech-processing intelligent-agent virtual-agent gesture-generation

speech2affective_gestures's People

Contributors

uttaranb127

speech2affective_gestures's Issues

Could not run the code.

Hi, thanks for your work.
I am working on a gesture generation project and want to make some comparisons with your work.
I followed your README, but I could not run the code successfully, even after fixing many bugs.
Could you please review the code and make sure it can be run by following the README?

Research Paper Link Broken

Hey Uttaran,

Awesome work on gesture generation using multiple modalities!
Just FYI, the research paper link is broken; you can update it with this link: https://arxiv.org/abs/2108.00262.

Thanks.

Where is the "outputs/embedding_net.pth.tar"?

I found quite a few bugs in this repo, such as hard-coded absolute paths.
I also could not find the file "outputs/embedding_net.pth.tar" referenced in net/embedding_space_evaluator.py, line 24.
Could you check the code and make sure it can be run by following the README?

Running on a custom dataset

Hi,
How can I run the script on a custom dataset? I have been going through the code, but many things look tailored to the TED dataset. If you could point me to the relevant code, I would be grateful.

data_loader self.num_total_samples = 0

Hi,
In processor_v2.py, I found that self.num_total_samples = 0. When I debugged, the variable n_samples in self.data_loader['train_data_s2ag'] / ['eval_data_s2ag'] / ['test_data_s2ag'] was zero. What is the problem?
Thanks!
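
A quick way to narrow this down is to check whether the LMDB caches actually contain entries; if they are empty, the preprocessing step did not complete. A minimal sketch, assuming the lmdb Python package and the cache directory names mentioned in other issues (substitute the directories your config actually points to):

    import lmdb

    # Hypothetical cache locations; adjust to your data path.
    for cache_dir in ['lmdb_train_s2ag_v2_cache_mfcc_14',
                      'lmdb_val_s2ag_v2_cache_mfcc_14',
                      'lmdb_test_s2ag_v2_cache_mfcc_14']:
        env = lmdb.open(cache_dir, readonly=True, lock=False)
        with env.begin() as txn:
            # An entry count of 0 means no samples were written during preprocessing.
            print(cache_dir, '->', txn.stat()['entries'], 'entries')
        env.close()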

custom text

Hi, from reading your paper it seems that I cannot give the model custom audio or text as input, right?

Error when running the code with the pretrained model

Hi, thanks for the great work!

I ran into an error when I tried to run the code with the pretrained model.

Following the previous issues, I downloaded the files:
lmdb_test_s2ag_v2_cache_mfcc_14
lmdb_train_s2ag_v2_cache_mfcc_14
lmdb_val_s2ag_v2_cache_mfcc_14
vocab_models_s2ag
vocab_models
speaker_models
trimodal_gen.pth.tar
epoch_290_loss_-0.0048_model.pth.tar

Then I modified the base path and the config YAML, and tried the command "python main_v2.py --train-s2ag False --config ./config/multimodal_context_v2.yml".

However, I met this error:
Traceback (most recent call last):
  File "main_v2.py", line 147, in <module>
    s2ag_epoch=290, make_video=True, save_pkl=True)
  File "/data/sunyn/speech2affective_gestures/processor_v2.py", line 1436, in generate_gestures_by_dataset
    s2ag_model_found = self.load_model_at_epoch(epoch=s2ag_epoch)
  File "/data/sunyn/speech2affective_gestures/processor_v2.py", line 362, in load_model_at_epoch
    self.s2ag_generator.load_state_dict(loaded_vars['gen_model_dict'])
  File "/data/sunyn/miniconda3/envs/s2ag/lib/python3.7/site-packages/torch/nn/modules/module.py", line 847, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DataParallel:

Missing key(s) in state_dict: "module.audio_encoder.conv1.weight", "module.audio_encoder.conv1.bias", "module.audio_encoder.batch_norm1.weight", "module.audio_encoder.batch_norm1.bias", "module.audio_encoder.batch_norm1.running_mean", "module.audio_encoder.batch_norm1.running_var", "module.audio_encoder.conv2.weight", "module.audio_encoder.conv2.bias",  ETC.

Unexpected key(s) in state_dict: "audio_encoder.conv1.weight", "audio_encoder.conv1.bias", "audio_encoder.batch_norm1.weight", "audio_encoder.batch_norm1.bias", "audio_encoder.batch_norm1.running_mean", "audio_encoder.batch_norm1.running_var", "audio_encoder.batch_norm1.num_batches_tracked", "audio_encoder.conv2.weight", "audio_encoder.conv2.bias", "audio_encoder.batch_norm2.weight", "audio_encoder.batch_norm2.bias",  ETC.

I noticed that the only difference between the missing keys and the unexpected keys is the "module." prefix. Could you help me fix this bug? Thanks so much!
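
This kind of mismatch typically appears when a checkpoint saved from a plain (non-DataParallel) model is loaded into a model wrapped in nn.DataParallel, or vice versa. A minimal sketch of a workaround, assuming the only difference really is the "module." prefix (the helper name here is made up for illustration):

    from collections import OrderedDict

    def add_module_prefix(state_dict):
        # Prepend the 'module.' prefix that nn.DataParallel expects, if it is missing.
        return OrderedDict(
            (key if key.startswith('module.') else 'module.' + key, value)
            for key, value in state_dict.items()
        )

    # Applied to the call from the traceback above, this would become:
    # self.s2ag_generator.load_state_dict(add_module_prefix(loaded_vars['gen_model_dict']))

The symmetric fix, stripping the prefix and loading into self.s2ag_generator.module, works as well; either way, the checkpoint and the wrapper just need to agree on the key names.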

About the character 3D model

Hi,
I would like to ask how to obtain the 3D character model that you showed in your paper and your showcase video.
Thanks!

Errors and missing files

Hi, thanks for your great work!
I tried to test the pretrained model on the TED DB using the config/multimodal_context_v2.yml file and set the train-s2ag argument to False, but I got this error:

File "main_v2.py", line 120, in <module>
    train_data_ted, val_data_ted, test_data_ted = loader.load_ted_db_data(data_path, s2ag_config_args, args.train_s2ag)
  File "/content/speech2affective_gestures/loader_v2.py", line 589, in load_ted_db_data
    train_dataset = TedDBParamsMinimal(config_args.train_data_path[0])
  File "/content/speech2affective_gestures/loader_v2.py", line 442, in __init__
    self._make_speaker_model(self.lmdb_dir, precomputed_model)
AttributeError: 'TedDBParamsMinimal' object has no attribute '_make_speaker_model'

There are also two files referenced in config/multimodal_context_v2.yml that I could not find in the repo:

wordembed_path: /mnt/q/Gamma/Gestures/src/data/fasttext/crawl-300d-2M-subword.bin
val_net_path: /mnt/q/Gamma/Gestures/src/Speech2Gestures/speech2affective_gestures/outputs/train_h36m_gesture_autoencoder/gesture_autoencoder_checkpoint_best.bin

Could you help me solve this error and obtain the missing files?
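
Until the missing files are shared, a small sanity check that flags every non-existent path referenced in the YAML config can save a lot of failed runs. A rough sketch, assuming the standard PyYAML package and that the relevant keys end in _path (both are assumptions about the config layout):

    import os
    import yaml

    with open('config/multimodal_context_v2.yml') as f:
        cfg = yaml.safe_load(f)

    for key, value in cfg.items():
        if not key.endswith('_path'):
            continue
        # Path-like entries may be single strings or lists of strings.
        paths = value if isinstance(value, list) else [value]
        for p in paths:
            if isinstance(p, str) and not os.path.exists(p):
                print('Missing:', key, '->', p)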

Error when running main_v2

Hello,
I am trying to run your great work with the command:
python main_v2.py -c config/multimodal_context_v2.yml
but got an error:
File "/home/zhewei.qiu/anaconda3/envs/s2ag-env/lib/python3.7/site-packages/torch/nn/functional.py", line 1237, in leaky_relu
result = torch._C.nn.leaky_relu(input, negative_slope)
RuntimeError: CUDA error: no kernel image is available for execution on the device

I googled it and downgraded my PyTorch version to 1.5.0, but the error still persists.
Do you know how to fix this error? Any help would be appreciated!

My torch version:
pytorch 1.5.0 py3.7_cuda10.2.89_cudnn7.6.5_0 pytorch
torchvision 0.6.0 py37_cu102 pytorch
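
"No kernel image is available for execution on the device" usually means the installed PyTorch binary was not compiled for the GPU's compute capability (for example, a CUDA 10.2 build running on a newer Ampere card). A small diagnostic sketch, using only standard torch calls:

    import torch

    # Which CUDA version this torch binary was built against, and what the GPU reports.
    print('torch', torch.__version__, 'built for CUDA', torch.version.cuda)
    if torch.cuda.is_available():
        print('device:', torch.cuda.get_device_name(0))
        print('compute capability:', torch.cuda.get_device_capability(0))
        # Available on newer torch versions: the architectures this binary supports.
        if hasattr(torch.cuda, 'get_arch_list'):
            print('supported archs:', torch.cuda.get_arch_list())

If the device's compute capability is newer than anything the binary supports, installing a PyTorch build that matches the local GPU and driver (rather than downgrading) is the usual fix.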

Baseline trimodal_gen model

Hi,
Thanks for the fantastic work.

Would you mind sharing your "trimodal_gen.pth.tar" pretrained model with us? I tried to use the one provided in the trimodal repo, but it does not work here; the state_dict does not match.

I would like to do some comparisons in my research.

Affective Encoder

Hi
I could not find the source code for the affective encoder mentioned in your paper.
Could you point it out for me?
Thanks!

Can we use this to generate pose from text?

Hello, and thank you for your work; this looks really good.
I was able to download and install the repo, but I am unsure how to use it.
Can we create gestures from text using this model? If so, how?

Audio_feat_seq and text_feat_seq not uniform in the second dimension

Hello. When I retrained this model, the input of WavEncoder is [512, 36327] and the output is [512, 10, 32], while the input of TextEncoderTCN is [512, 34] and the output is [512, 34, 32]. As you can see, the outputs of the two encoders differ in the second dimension, so they cannot be concatenated. How can I solve this problem?
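
A common way to handle two encoder outputs that disagree in sequence length is to resample one of them along the time axis before concatenation. The sketch below is only illustrative, using dummy tensors with the shapes reported above rather than the repository's actual alignment code:

    import torch
    import torch.nn.functional as F

    audio_feat_seq = torch.randn(512, 10, 32)  # (batch, time, channels) from WavEncoder
    text_feat_seq = torch.randn(512, 34, 32)   # (batch, time, channels) from TextEncoderTCN

    # Interpolate the audio features along the time axis to the text length (34),
    # then concatenate along the channel dimension.
    audio_resampled = F.interpolate(
        audio_feat_seq.transpose(1, 2), size=text_feat_seq.shape[1],
        mode='linear', align_corners=False).transpose(1, 2)
    fused = torch.cat([audio_resampled, text_feat_seq], dim=2)  # (512, 34, 64)

In the original code the WavEncoder's convolution strides are presumably meant to produce a time dimension matching the text sequence, so it is worth checking the audio input length against what the encoder expects before resorting to interpolation.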

Question about the shape of poses.

I find that the lengths of out_poses_trimodal and out_poses are not equal to that of clip_poses_resampled, so I cannot do a comparison.
Could you please tell me how to fix this?
