uttaranb127 / speech2affective_gestures

This is the official implementation of the paper "Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning".

Home Page: https://gamma.umd.edu/s2ag/

License: MIT License

Python 100.00%
affective-computing text-processing speech-processing intelligent-agent virtual-agent gesture-generation

speech2affective_gestures's People

Contributors

uttaranb127

speech2affective_gestures's Issues

Could not run the code.

Hi, thanks for your work.
I am working on a gesture generation project and want to make some comparisons with your work.
I followed your README, but I could not run the code successfully, even after fixing many bugs.
Could you please review the code and make sure it can be run by following the README?

Research Paper Link Broken

Hey Uttaran,

Awesome work on gesture generation using multiple modalities!
Just FYI, the research paper link is broken; you can update it with this link: https://arxiv.org/abs/2108.00262.

Thanks.

Where is the "outputs/embedding_net.pth.tar"?

I found quite a few bugs in this repo, such as hard-coded absolute paths.
I also could not find the file "outputs/embedding_net.pth.tar" referenced in net/embedding_space_evaluator.py, line 24.
Could you check the code and make sure it can be run by following the README?

Running on a custom dataset

Hi,
How can I run the script on a custom dataset? I have been going through the code, but many things look tailored to the TED dataset. If you could point me to the relevant code, I would be grateful.

data_loader self.num_total_samples = 0

Hi,
In processor_v2.py, I found that self.num_total_samples = 0. When I debugged, the variable n_samples in self.data_loader['train_data_s2ag'] / ['eval_data_s2ag'] / ['test_data_s2ag'] was zero. What is the problem?
Thanks!
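
A quick way to narrow this down is to check whether the LMDB caches actually contain entries; if they are empty, the preprocessing step did not complete. A minimal sketch, assuming the lmdb Python package and the cache directory names mentioned in other issues (substitute the directories your config actually points to):

    import lmdb

    # Hypothetical cache locations; adjust to your data path.
    for cache_dir in ['lmdb_train_s2ag_v2_cache_mfcc_14',
                      'lmdb_val_s2ag_v2_cache_mfcc_14',
                      'lmdb_test_s2ag_v2_cache_mfcc_14']:
        env = lmdb.open(cache_dir, readonly=True, lock=False)
        with env.begin() as txn:
            # An entry count of 0 means no samples were written during preprocessing.
            print(cache_dir, '->', txn.stat()['entries'], 'entries')
        env.close()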

custom text

Hi, from reading your paper it seems that I cannot give the model custom audio or text as input, right?

Error when running the code with the pretrained model

Hi, thanks for the great work!

I ran into an error when I tried to run the code with the pretrained model.

Following the previous issues, I downloaded the files:
lmdb_test_s2ag_v2_cache_mfcc_14
lmdb_train_s2ag_v2_cache_mfcc_14
lmdb_val_s2ag_v2_cache_mfcc_14
vocab_models_s2ag
vocab_models
speaker_models
trimodal_gen.pth.tar
epoch_290_loss_-0.0048_model.pth.tar

Then I modified the base path and the config YAML, and tried the command "python main_v2.py --train-s2ag False --config ./config/multimodal_context_v2.yml".

However, I met this error:
Traceback (most recent call last):
  File "main_v2.py", line 147, in <module>
    s2ag_epoch=290, make_video=True, save_pkl=True)
  File "/data/sunyn/speech2affective_gestures/processor_v2.py", line 1436, in generate_gestures_by_dataset
    s2ag_model_found = self.load_model_at_epoch(epoch=s2ag_epoch)
  File "/data/sunyn/speech2affective_gestures/processor_v2.py", line 362, in load_model_at_epoch
    self.s2ag_generator.load_state_dict(loaded_vars['gen_model_dict'])
  File "/data/sunyn/miniconda3/envs/s2ag/lib/python3.7/site-packages/torch/nn/modules/module.py", line 847, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DataParallel:

Missing key(s) in state_dict: "module.audio_encoder.conv1.weight", "module.audio_encoder.conv1.bias", "module.audio_encoder.batch_norm1.weight", "module.audio_encoder.batch_norm1.bias", "module.audio_encoder.batch_norm1.running_mean", "module.audio_encoder.batch_norm1.running_var", "module.audio_encoder.conv2.weight", "module.audio_encoder.conv2.bias",  ETC.

Unexpected key(s) in state_dict: "audio_encoder.conv1.weight", "audio_encoder.conv1.bias", "audio_encoder.batch_norm1.weight", "audio_encoder.batch_norm1.bias", "audio_encoder.batch_norm1.running_mean", "audio_encoder.batch_norm1.running_var", "audio_encoder.batch_norm1.num_batches_tracked", "audio_encoder.conv2.weight", "audio_encoder.conv2.bias", "audio_encoder.batch_norm2.weight", "audio_encoder.batch_norm2.bias",  ETC.

I noticed that the only difference between the missing keys and the unexpected keys is the "module." prefix. Could you help me fix this bug? Thanks so much!
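
This kind of mismatch typically appears when a checkpoint saved from a plain (non-DataParallel) model is loaded into a model wrapped in nn.DataParallel, or vice versa. A minimal sketch of a workaround, assuming the only difference really is the "module." prefix (the helper name here is made up for illustration):

    from collections import OrderedDict

    def add_module_prefix(state_dict):
        # Prepend the 'module.' prefix that nn.DataParallel expects, if it is missing.
        return OrderedDict(
            (key if key.startswith('module.') else 'module.' + key, value)
            for key, value in state_dict.items()
        )

    # Applied to the call from the traceback above, this would become:
    # self.s2ag_generator.load_state_dict(add_module_prefix(loaded_vars['gen_model_dict']))

The symmetric fix, stripping the prefix and loading into self.s2ag_generator.module, works as well; either way, the checkpoint and the wrapper just need to agree on the key names.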

About the character 3D model

Hi,
I would like to ask how to obtain the 3D character model that you showed in your paper and your showcase video.
Thanks!

Errors and missing files

Hi, thanks for your great work!
I tried to test the pretrained model on the TED DB using the config/multimodal_context_v2.yml file and set the train-s2ag argument to False, but I got this error:

File "main_v2.py", line 120, in <module>
    train_data_ted, val_data_ted, test_data_ted = loader.load_ted_db_data(data_path, s2ag_config_args, args.train_s2ag)
  File "/content/speech2affective_gestures/loader_v2.py", line 589, in load_ted_db_data
    train_dataset = TedDBParamsMinimal(config_args.train_data_path[0])
  File "/content/speech2affective_gestures/loader_v2.py", line 442, in __init__
    self._make_speaker_model(self.lmdb_dir, precomputed_model)
AttributeError: 'TedDBParamsMinimal' object has no attribute '_make_speaker_model'

There are also two files referenced in config/multimodal_context_v2.yml that I could not find in the repo:

wordembed_path: /mnt/q/Gamma/Gestures/src/data/fasttext/crawl-300d-2M-subword.bin
val_net_path: /mnt/q/Gamma/Gestures/src/Speech2Gestures/speech2affective_gestures/outputs/train_h36m_gesture_autoencoder/gesture_autoencoder_checkpoint_best.bin

Could you help me solve this error and obtain the missing files?
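
Until the missing files are shared, a small sanity check that flags every non-existent path referenced in the YAML config can save a lot of failed runs. A rough sketch, assuming the standard PyYAML package and that the relevant keys end in _path (both are assumptions about the config layout):

    import os
    import yaml

    with open('config/multimodal_context_v2.yml') as f:
        cfg = yaml.safe_load(f)

    for key, value in cfg.items():
        if not key.endswith('_path'):
            continue
        # Path-like entries may be single strings or lists of strings.
        paths = value if isinstance(value, list) else [value]
        for p in paths:
            if isinstance(p, str) and not os.path.exists(p):
                print('Missing:', key, '->', p)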

Error when running main_v2

Hello,
I am trying to run your great work with the command:
python main_v2.py -c config/multimodal_context_v2.yml
but got an error:
File "/home/zhewei.qiu/anaconda3/envs/s2ag-env/lib/python3.7/site-packages/torch/nn/functional.py", line 1237, in leaky_relu
result = torch._C.nn.leaky_relu(input, negative_slope)
RuntimeError: CUDA error: no kernel image is available for execution on the device

I googled it and downgraded my PyTorch version to 1.5.0, but the error still persists.
Do you know how to fix this error? Any help would be appreciated!

My torch version:
pytorch 1.5.0 py3.7_cuda10.2.89_cudnn7.6.5_0 pytorch
torchvision 0.6.0 py37_cu102 pytorch
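
"No kernel image is available for execution on the device" usually means the installed PyTorch binary was not compiled for the GPU's compute capability (for example, a CUDA 10.2 build running on a newer Ampere card). A small diagnostic sketch, using only standard torch calls:

    import torch

    # Which CUDA version this torch binary was built against, and what the GPU reports.
    print('torch', torch.__version__, 'built for CUDA', torch.version.cuda)
    if torch.cuda.is_available():
        print('device:', torch.cuda.get_device_name(0))
        print('compute capability:', torch.cuda.get_device_capability(0))
        # Available on newer torch versions: the architectures this binary supports.
        if hasattr(torch.cuda, 'get_arch_list'):
            print('supported archs:', torch.cuda.get_arch_list())

If the device's compute capability is newer than anything the binary supports, installing a PyTorch build that matches the local GPU and driver (rather than downgrading) is the usual fix.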

Baseline trimodal_gen model

Hi,
Thanks for the fantastic work.

Would you mind sharing your "trimodal_gen.pth.tar" pretrained model with us? I tried to use the one provided in the trimodal repo, but it does not work here; the state_dict does not match.

I would like to do some comparisons in my research.

Affective Encoder

Hi
I could not find the source code for the affective encoder mentioned in your paper.
Could you point it out for me?
Thanks!

Can we use this to generate pose from text?

Hello, and thank you for your work; this looks really good.
I was able to download and install the repo, but I am unsure how to use it.
Can we create gestures from text using this model? If so, how?

Audio_feat_seq and text_feat_seq not uniform in the second dimension

Hello. When I retrained this model, the input of WavEncoder is [512, 36327] and the output is [512, 10, 32], while the input of TextEncoderTCN is [512, 34] and the output is [512, 34, 32]. As you can see, the outputs of the two encoders differ in the second dimension, so they cannot be concatenated. How can I solve this problem?
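
A common way to handle two encoder outputs that disagree in sequence length is to resample one of them along the time axis before concatenation. The sketch below is only illustrative, using dummy tensors with the shapes reported above rather than the repository's actual alignment code:

    import torch
    import torch.nn.functional as F

    audio_feat_seq = torch.randn(512, 10, 32)  # (batch, time, channels) from WavEncoder
    text_feat_seq = torch.randn(512, 34, 32)   # (batch, time, channels) from TextEncoderTCN

    # Interpolate the audio features along the time axis to the text length (34),
    # then concatenate along the channel dimension.
    audio_resampled = F.interpolate(
        audio_feat_seq.transpose(1, 2), size=text_feat_seq.shape[1],
        mode='linear', align_corners=False).transpose(1, 2)
    fused = torch.cat([audio_resampled, text_feat_seq], dim=2)  # (512, 34, 64)

In the original code the WavEncoder's convolution strides are presumably meant to produce a time dimension matching the text sequence, so it is worth checking the audio input length against what the encoder expects before resorting to interpolation.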

Question about the shape of poses.

I find that the lengths of out_poses_trimodal and out_poses are not equal to that of clip_poses_resampled, so I cannot do a comparison.
Could you please tell me how to fix this?
