eeskimez / emotalkingface
The code for the paper "Speech Driven Talking Face Generation from a Single Image and an Emotion Condition"
License: MIT License
Thank you very much for open-sourcing your code and model. I tried to train with your code and it works fine. I just want to know how many epochs you trained for the emotion discriminator pre-training and the generator pre-training. Thanks.
When I use an image of my own, it is distorted a lot and does not look like the original face. Is there any parameter to improve identity preservation?
The image size is 128x128. I want to train at 256x256; what should I do? Thank you.
Thanks for your great work! When I test the pretrained model with the original wav extracted from the flv files, it returns good results. However, when I use my own wav, the image is sometimes blurry and deformed. Can you give me some suggestions on how to solve this? Thanks in advance.
```
Traceback (most recent call last):
  File "train.py", line 204, in <module>
    train()
  File "train.py", line 130, in train
    train_loader = torch.utils.data.DataLoader(trainDset,
  File "C:\Users\INHA\anaconda3\envs\face\lib\site-packages\torch\utils\data\dataloader.py", line 268, in __init__
    sampler = RandomSampler(dataset, generator=generator)
  File "C:\Users\INHA\anaconda3\envs\face\lib\site-packages\torch\utils\data\sampler.py", line 102, in __init__
    raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0
```
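For anyone else hitting this: num_samples=0 means the DataLoader received an empty Dataset, which usually points to a wrong dataset path or an empty preprocessing output folder. A minimal guard to drop into train.py before the DataLoader is built (a sketch; `trainDset` is the dataset object already constructed there):

```python
# Sketch of a guard for train.py: num_samples=0 simply means the Dataset
# found no items, which usually indicates a wrong dataset root path or a
# preprocessing step that produced no files.
if len(trainDset) == 0:
    raise RuntimeError("Empty dataset -- check the dataset path passed to train.py")
```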
Hi everyone, is there any way to train this model for lip movement only, without caring about emotions? I'm trying to use my own dataset, but it has no emotion labels. I was wondering whether there is any way to replicate the lip movement from a video onto an image without labeling the emotion of every video, since the videos are taken from YouTube and are in different languages.
Hi, my PC crashed while training the model. Is there any way to continue training from the last save?
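I can't speak to this repo's checkpoint format, but the generic PyTorch save/resume pattern looks like this (the file name, dictionary keys, and toy model below are my own placeholders):

```python
import torch
import torch.nn as nn

# Toy model and optimizer for illustration only.
model = nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
epoch = 5

# During training, save the full state periodically:
torch.save({"epoch": epoch,
            "model": model.state_dict(),
            "optim": optimizer.state_dict()}, "checkpoint.pt")

# On restart, load it back before entering the training loop:
ckpt = torch.load("checkpoint.pt", map_location="cpu")
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optim"])
start_epoch = ckpt["epoch"] + 1  # resume from the next epoch
```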
Hi, I am trying to retrain everything from scratch.
Could I request further details on the training hyperparameters used for all three stages (emotion discriminator pre-training, generator pre-training, and full training)? Or perhaps the final expected values for the losses plotted on TensorBoard?
As mentioned in another issue, using the pretrained model works fine. However, training the model from scratch (as instructed on GitHub) produces different results; at best I have only been able to simulate the mouth opening. I noticed some discrepancies between the paper and the source code (e.g., the discriminator learning rates are all stated to be 1e-4 in the paper but default to lower values in train.py, and frames per sample is stated to be 32 in the paper but defaults to 25 in train.py). I have also cut training shorter than the default 1000 epochs, since the 100k iterations stated in the paper seem to correspond to only about 100 epochs, if I understand correctly.
Apologies for the nitpicking. I'm trying to recreate the training and am at a loss. Your response would greatly help me reduce the time spent experimenting.
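For reference, the iterations-to-epochs conversion mentioned above works out as follows (the dataset size and batch size are my own placeholder numbers, not values from the paper or the repo):

```python
# Back-of-envelope iterations -> epochs conversion with placeholder values.
num_clips = 100_000   # hypothetical number of training samples
batch_size = 100      # hypothetical
iters_per_epoch = num_clips // batch_size   # = 1000
epochs = 100_000 // iters_per_epoch         # 100k iterations ~= 100 epochs
```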
Hi,
We have been trying to run the code. The pretrained model which you have provided works perfectly fine; however, when we train the model from scratch, the generated output is always the same frame for the whole video, with the audio playing in the background and no change in facial or lip features.
What could be the potential reasons for this? Did you notice anything similar during training?
The following are the changes we made to the code:
Is this due to incorrect dataset preparation, the absence of other emotions (like a neutral face), incomplete training (too few epochs), or some other reason?
Thanks in advance
"we aligned the ground-truth, baseline, and proposed videos to a template image and cropped them to the same size using a similarity transformation."
How do you do this?
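Not the authors' code, but a minimal sketch of landmark-based similarity alignment with scikit-image; the landmark coordinates, template points, and file name below are hypothetical placeholders:

```python
import numpy as np
from skimage import io, transform as tf

# Sketch of similarity-transform alignment. In practice `src_pts` would come
# from a face-landmark detector on the video frame, and `template_pts` from
# the same landmarks in the template image; the values here are made up.
image = io.imread("frame.png")
src_pts = np.array([[42.0, 58.0], [86.0, 58.0], [64.0, 92.0]])       # (x, y) in the frame
template_pts = np.array([[40.0, 55.0], [88.0, 55.0], [64.0, 95.0]])  # (x, y) in the template

tform = tf.SimilarityTransform()
tform.estimate(template_pts, src_pts)  # maps template coords -> frame coords

# warp() expects the inverse map (output -> input), which is exactly tform here;
# output_shape crops every aligned frame to the same size.
aligned = tf.warp(image, tform, output_shape=(128, 128))
```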
To what value should the classification loss converge while pre-training the emotion discriminator?
Hi, I get the following error when I use the sample image img01.png and speech file speech01.wav:
python generate.py -im ./data/image_samples/img01.png -is ./data/speech_samples/speech01.wav -m ./model/ -o ./results/
Result message: Image file is not valid...
Could you guess what I'm missing?
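I don't know the script's actual validation logic, but a quick way to rule out the image itself is to try decoding it yourself (a sketch, assuming OpenCV is installed):

```python
import cv2

# Sanity check, not the repo's actual validation: if OpenCV cannot decode the
# file, generate.py will most likely reject it too.
img = cv2.imread("./data/image_samples/img01.png")
if img is None:
    print("OpenCV could not decode the file -- check the path and format")
else:
    print("Loaded image with shape", img.shape)
```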
Can you tell me how I can obtain the SSIM and PSNR results?
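Not from this repo, but a common way to compute both metrics per frame with scikit-image (the frame paths are placeholders):

```python
from skimage import io
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Compare one generated frame against the matching ground-truth frame;
# hypothetical file names.
gt = io.imread("gt_frame.png")
gen = io.imread("gen_frame.png")

psnr = peak_signal_noise_ratio(gt, gen, data_range=255)
# channel_axis requires scikit-image >= 0.19 (older versions: multichannel=True)
ssim = structural_similarity(gt, gen, channel_axis=-1, data_range=255)
print(f"PSNR: {psnr:.2f} dB  SSIM: {ssim:.4f}")
```

Averaging the per-frame scores over all frames of the test videos gives the kind of video-level numbers typically reported.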