eeskimez / emotalkingface
The code for the paper "Speech Driven Talking Face Generation from a Single Image and an Emotion Condition"
License: MIT License
Thank you very much for open-sourcing your code and model. I tried to train with your code and it works fine. I just want to know how many epochs you trained for the emotion discriminator pre-training and the generator pre-training. Thanks.
When I use an image of my own, it is distorted a lot and does not look like the original face. Is there any parameter to improve identity preservation?
The image size is 128x128. I want to train at 256x256; what should I do? Thank you.
Thanks for your great work! When I test the pretrained model with the original wav extracted from the flv files, it returns good results. However, when I use my own wav, the image is sometimes blurry and deformed. Can you give me some suggestions on how to solve this? Thanks in advance.
```
Traceback (most recent call last):
  File "train.py", line 204, in <module>
    train()
  File "train.py", line 130, in train
    train_loader = torch.utils.data.DataLoader(trainDset,
  File "C:\Users\INHA\anaconda3\envs\face\lib\site-packages\torch\utils\data\dataloader.py", line 268, in __init__
    sampler = RandomSampler(dataset, generator=generator)
  File "C:\Users\INHA\anaconda3\envs\face\lib\site-packages\torch\utils\data\sampler.py", line 102, in __init__
    raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0
```
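For anyone else hitting this: num_samples=0 means the DataLoader received an empty Dataset, which usually points to a wrong dataset path or an empty preprocessing output folder. A minimal guard to drop into train.py before the DataLoader is built (a sketch; `trainDset` is the dataset object already constructed there):

```python
# Sketch of a guard for train.py: num_samples=0 simply means the Dataset
# found no items, which usually indicates a wrong dataset root path or a
# preprocessing step that produced no files.
if len(trainDset) == 0:
    raise RuntimeError("Empty dataset -- check the dataset path passed to train.py")
```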
Hi everyone, is there any way to train this model for lip movement only, without caring about emotions? I'm trying to use my own dataset, but it has no emotion labels. I was wondering whether there is any way to replicate the lip movement from a video onto an image without labeling the emotion of every video, since the videos are taken from YouTube and are in different languages.
Hi, my PC crashed while training the model. Is there any way to continue training from the last save?
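I can't speak to this repo's checkpoint format, but the generic PyTorch save/resume pattern looks like this (the file name, dictionary keys, and toy model below are my own placeholders):

```python
import torch
import torch.nn as nn

# Toy model and optimizer for illustration only.
model = nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
epoch = 5

# During training, save the full state periodically:
torch.save({"epoch": epoch,
            "model": model.state_dict(),
            "optim": optimizer.state_dict()}, "checkpoint.pt")

# On restart, load it back before entering the training loop:
ckpt = torch.load("checkpoint.pt", map_location="cpu")
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optim"])
start_epoch = ckpt["epoch"] + 1  # resume from the next epoch
```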
Hi, I am trying to retrain everything from scratch.
Could I request further details on the training hyperparameters used for all three stages (emotion discriminator pre-training, generator pre-training, and full training)? Or perhaps the final expected values for the losses plotted on TensorBoard?
As mentioned in another issue, using the pretrained model works fine. However, training the model from scratch (as instructed on GitHub) produces different results; at best I have only been able to simulate the mouth opening. I noticed some discrepancies between the paper and the source code (e.g., the discriminator learning rates are all stated to be 1e-4 in the paper but default to lower values in train.py, and frames per sample is stated to be 32 in the paper but defaults to 25 in train.py). I have also cut training shorter than the default 1000 epochs, since the 100k iterations stated in the paper seem to correspond to only about 100 epochs, if I understand correctly.
Apologies for the nitpicking. I'm trying to recreate the training and am at a loss. Your response would greatly help me reduce the time spent experimenting.
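For reference, the iterations-to-epochs conversion mentioned above works out as follows (the dataset size and batch size are my own placeholder numbers, not values from the paper or the repo):

```python
# Back-of-envelope iterations -> epochs conversion with placeholder values.
num_clips = 100_000   # hypothetical number of training samples
batch_size = 100      # hypothetical
iters_per_epoch = num_clips // batch_size   # = 1000
epochs = 100_000 // iters_per_epoch         # 100k iterations ~= 100 epochs
```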
Hi,
We have been trying to run the code. The pretrained model which you have provided works perfectly fine; however, when we train the model from scratch, the generated output is always the same frame for the whole video, with the audio playing in the background and no change in facial or lip features.
What could be the potential reasons for this? Did you notice anything similar during training?
The following are the changes we made to the code:
Is this due to incorrect dataset preparation, the absence of other emotions (like a neutral face), incomplete training (too few epochs), or some other reason?
Thanks in advance
"we aligned the ground-truth, baseline, and proposed videos to a template image and cropped them to the same size using a similarity transformation."
How do you do this?
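Not the authors' code, but a minimal sketch of landmark-based similarity alignment with scikit-image; the landmark coordinates, template points, and file name below are hypothetical placeholders:

```python
import numpy as np
from skimage import io, transform as tf

# Sketch of similarity-transform alignment. In practice `src_pts` would come
# from a face-landmark detector on the video frame, and `template_pts` from
# the same landmarks in the template image; the values here are made up.
image = io.imread("frame.png")
src_pts = np.array([[42.0, 58.0], [86.0, 58.0], [64.0, 92.0]])       # (x, y) in the frame
template_pts = np.array([[40.0, 55.0], [88.0, 55.0], [64.0, 95.0]])  # (x, y) in the template

tform = tf.SimilarityTransform()
tform.estimate(template_pts, src_pts)  # maps template coords -> frame coords

# warp() expects the inverse map (output -> input), which is exactly tform here;
# output_shape crops every aligned frame to the same size.
aligned = tf.warp(image, tform, output_shape=(128, 128))
```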
To what value should the classification loss converge while pre-training the emotion discriminator?
Hi, I get the following error when I use the sample image img01.png and speech file speech01.wav:
python generate.py -im ./data/image_samples/img01.png -is ./data/speech_samples/speech01.wav -m ./model/ -o ./results/
Result message: Image file is not valid...
Could you guess what I'm missing?
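I don't know the script's actual validation logic, but a quick way to rule out the image itself is to try decoding it yourself (a sketch, assuming OpenCV is installed):

```python
import cv2

# Sanity check, not the repo's actual validation: if OpenCV cannot decode the
# file, generate.py will most likely reject it too.
img = cv2.imread("./data/image_samples/img01.png")
if img is None:
    print("OpenCV could not decode the file -- check the path and format")
else:
    print("Loaded image with shape", img.shape)
```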
Can you tell me how I can obtain the SSIM and PSNR results?
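Not from this repo, but a common way to compute both metrics per frame with scikit-image (the frame paths are placeholders):

```python
from skimage import io
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Compare one generated frame against the matching ground-truth frame;
# hypothetical file names.
gt = io.imread("gt_frame.png")
gen = io.imread("gen_frame.png")

psnr = peak_signal_noise_ratio(gt, gen, data_range=255)
# channel_axis requires scikit-image >= 0.19 (older versions: multichannel=True)
ssim = structural_similarity(gt, gen, channel_axis=-1, data_range=255)
print(f"PSNR: {psnr:.2f} dB  SSIM: {ssim:.4f}")
```

Averaging the per-frame scores over all frames of the test videos gives the kind of video-level numbers typically reported.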