jixinya / evp
Code for the paper 'Audio-Driven Emotional Video Portraits'.
Hi,
Awesome work!
I would like to know more about the 3DMM fitting. I tried to generate the parameters with a 3DMM model, but the results were not accurate. Which 3DMM model did you use?
Running the script test_target.sh for M003 with its pretrained texture model produces a sequence of frames in which the texture for the ears, neck, and portions of the hair disappears. The output for the other test subject, M030, does not contain this anomaly. Could you explain what might cause this issue and how to rectify it?
After extracting this file, isn't its content the same as what is under the train folder inside train?
There are some problems when I run python train/disentanglement/dtw/MFCC_dtw.py.
EVP/train/disentanglement/dtw/MFCC_dtw.py
Lines 132 to 140 in 1f725b8
Waiting for your reply.
Thanks.
It shows:
Sorry, you can't view or download this file at this time.
Too many users have viewed or downloaded this file recently. Please try accessing the file again later. If the file you are trying to access is particularly large or is shared with many people, it may take up to 24 hours before you can view or download it. If you still can't access the file after 24 hours, please contact your domain administrator.
Or sometimes a 503 error.
Thank you for the great work.
Could you please share the details of calculating one specific person's landmark mean and PCA?
Take a specific person's video data in MEAD as an example. We have multiple viewpoints, emotions, and intensities. Could you tell me which videos are used to compute the landmarks and produce the mean and PCA? For example, front view with intensity 3, front view with intensity 2, some other combination, or even all the videos?
I am trying to pre-process the MEAD dataset and do some face reenactment work; I would appreciate it a lot if you could share the pre-processing details.
Looking forward to the reply!
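For what it's worth, once the set of videos is chosen, the mean and PCA computation itself is straightforward. Below is a minimal sketch using scikit-learn; the landmark shape [68, 2], the number of components (16), and the choice of source videos are all assumptions, not the authors' actual settings:

```python
import numpy as np
from sklearn.decomposition import PCA

def landmark_mean_and_pca(landmark_frames, n_components=16):
    """landmark_frames: array of shape [num_frames, 68, 2], stacked from
    whichever videos you select (e.g. all front-view clips of one subject)."""
    flat = landmark_frames.reshape(len(landmark_frames), -1)  # [F, 136]
    mean = flat.mean(axis=0)                                  # [136]
    pca = PCA(n_components=n_components)
    pca.fit(flat - mean)
    return mean, pca.components_                              # [16, 136]

# Toy usage with random data standing in for real landmarks.
frames = np.random.rand(100, 68, 2)
mean, components = landmark_mean_and_pca(frames)
print(mean.shape, components.shape)  # (136,) (16, 136)
```

Subtracting the mean before `fit` matches the convention of reconstructing a landmark as mean plus a weighted sum of components.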
Given two audio features of shape [N1, 28, 12] and [N2, 28, 12], could you please show me demo code for aligning them using DTW?
When training the disentanglement module, lots of audio features are used. Did you align all of them to the minimum length?
I am really curious about the details of audio feature preprocessing.
Thank you for your great work and looking forward to your reply!
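In the absence of a standalone demo, here is a minimal self-contained DTW sketch in plain NumPy, using a Euclidean cost over flattened 28×12 frames. The repository's MFCC_dtw.py may use a different cost function or step constraints, so treat this only as an illustration of the alignment mechanism:

```python
import numpy as np

def dtw_align(a, b):
    """Align two MFCC sequences a: [N1, 28, 12], b: [N2, 28, 12] with plain DTW.
    Returns index arrays (ia, ib) of equal length along the warping path."""
    fa = a.reshape(len(a), -1)          # [N1, 336]
    fb = b.reshape(len(b), -1)          # [N2, 336]
    # Pairwise Euclidean cost matrix between all frame pairs.
    cost = np.linalg.norm(fa[:, None, :] - fb[None, :, :], axis=-1)
    n1, n2 = cost.shape
    acc = np.full((n1 + 1, n2 + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j],
                                                 acc[i, j - 1],
                                                 acc[i - 1, j - 1])
    # Backtrack from (n1, n2) to (0, 0) along the cheapest predecessors.
    path, i, j = [], n1, n2
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    ia, ib = zip(*path)
    return np.array(ia), np.array(ib)

# Toy usage: two random feature sequences of different lengths.
a = np.random.rand(10, 28, 12)
b = np.random.rand(13, 28, 12)
ia, ib = dtw_align(a, b)
aligned_a, aligned_b = a[ia], b[ib]   # equal-length aligned sequences
```

Indexing both sequences with the warping path yields two sequences of identical length, which answers how features of different lengths can be brought into correspondence.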
Could you provide the ground-truth landmarks used for validating the models? Specifically, I'm looking for the test landmarks of the M030 identity.
Hi and thanks for sharing the great work!
I followed your instructions and successfully ran the test scripts to generate output. But I am confused about the meaning of the lm2video output directories 'M003_01_3_output_01/', 'M003_02_3_output_01/', ...
Could you please briefly explain what the driving audio for synthesizing the results is, and where we can find the corresponding lip-sync video?
Looking forward to your response. Thanks.
How do I apply the output (results/target.npy) of step 1 to step 2?
Hi @jixinya,
Thanks for your excellent work!
I am curious about the implementation of Cross-Reconstructed Emotion Disentanglement. In the paper you write, "Given four audio samples Xi,m, Xj,n, Xj,m, Xi,n" for disentangling. However, the implementation in this project is a little different: you sample 2 emotions and 3 contents, set X11, X21, X12, X23 as inputs, and use X11, X11, X12, X12 as the four targets for the cross-reconstruction and self-reconstruction losses. (as below)
EVP/train/disentanglement/code/dataload.py
Lines 103 to 107 in 990ea8b
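For reference, here is a toy sketch of the paper's cross-reconstruction formulation, with stand-in linear encoders and decoder (not the real EVP modules); the exact input/target pairing used in training is the one in dataload.py referenced above:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in encoders/decoder; dimensions are illustrative only.
content_enc = nn.Linear(28 * 12, 64)
emotion_enc = nn.Linear(28 * 12, 16)
decoder = nn.Linear(64 + 16, 28 * 12)
mse = nn.MSELoss()

def recon(content_src, emotion_src):
    """Decode the content code of one clip with the emotion code of another."""
    c = content_enc(content_src.flatten(1))
    e = emotion_enc(emotion_src.flatten(1))
    return decoder(torch.cat([c, e], dim=1)).view(-1, 28, 12)

# Four samples as in the paper: X_im, X_jn, plus the swapped targets X_jm, X_in,
# each a batch of MFCC windows shaped [B, 28, 12].
x_im, x_jn, x_jm, x_in = [torch.randn(4, 28, 12) for _ in range(4)]

# Cross reconstruction: swapping emotion codes must recover the other two clips.
loss_cross = mse(recon(x_im, x_jn), x_in) + mse(recon(x_jn, x_im), x_jm)
# Self reconstruction: each clip must reconstruct itself from its own codes.
loss_self = mse(recon(x_im, x_im), x_im) + mse(recon(x_jn, x_jn), x_jn)
loss = loss_cross + loss_self
```

The repository's sampling (X11, X21, X12, X23 with targets X11, X11, X12, X12) is one concrete instantiation of this pairing scheme.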
Thanks for releasing the code and pre-trained models. Could you please release the pre-trained models and preprocessed data for all the other MEAD test subjects used in EVP?
Thank you for the great work; the disentanglement of content and emotion features is really novel.
When I reproduce this module, I get stuck on the decoder structure. Could you show me demo code?
Say we have the concatenation of content and emotion features of shape [Batchsize, N, content_dim+emotion_dim]; how do we convert it to MFCC features of shape [Batchsize, N, 28, 12]?
Looking forward to your reply and thank you in advance!
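One plausible way to make that shape conversion work is a recurrent decoder over the time axis with a per-step linear head. This is only a sketch of one possible design, not the authors' actual architecture; the hidden size and layer count are arbitrary:

```python
import torch
import torch.nn as nn

class AudioDecoder(nn.Module):
    """Hypothetical decoder: [B, N, content_dim+emotion_dim] -> [B, N, 28, 12].
    An LSTM models temporal context; a linear head emits 28*12 values per step."""
    def __init__(self, content_dim=64, emotion_dim=16, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(content_dim + emotion_dim, hidden,
                            num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 28 * 12)

    def forward(self, z):                     # z: [B, N, content+emotion]
        h, _ = self.lstm(z)                   # [B, N, hidden]
        out = self.head(h)                    # [B, N, 336]
        return out.view(z.size(0), z.size(1), 28, 12)

dec = AudioDecoder()
z = torch.randn(2, 50, 64 + 16)
mfcc = dec(z)
print(mfcc.shape)  # torch.Size([2, 50, 28, 12])
```

Any sequence-to-sequence mapping that preserves the length N (temporal convolutions, a transformer, etc.) would produce the same output shape; the final `view` is what turns the 336-dim vector back into a 28×12 MFCC frame.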
To ensure audio emotion and speech content are disentangled, you design a Cross-Reconstructed Emotion Disentanglement module in the paper. In my opinion, the emotion encoder and content encoder should be frozen once the disentanglement training is finished. But I found that the two pretrained models you provide for two different subjects have totally different weights in the emotion encoder and content encoder. So I guess you fine-tune these two encoders together with the other parts when you train your audio2lm module, but how can you guarantee the disentanglement once you fine-tune these two encoders?
Thank you for your excellent work, but how did you get the data in train.zip? Looking forward to your reply.
Hi,
I am trying to test with my own dataset but have some trouble generating the 3DMM parameters.
I am wondering if you can explain the procedure of how you get the parameters.
Thanks,
How are the data in the mfcc and facial landmark files related, and are they time-aligned?
|--train
|  |--landmark
|  |--dataSet-M030
|     |--landmark
|     |--mfcc
In testing, step 1 gives us the landmarks stored in the target.npy file via audio2landmark, but how do we use them in step 2 (landmark2video)?
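A typical bridge between an audio-to-landmark stage and a landmark-to-video stage is to rasterize each frame's landmarks into an image. The sketch below (NumPy only, one white pixel per landmark, assumed array shape [F, 68, 2]) is hypothetical; the exact input format step 2 expects may differ, so compare against the repo's test data:

```python
import numpy as np
import tempfile, os

def render_landmark_frames(npy_path, size=256):
    """Rasterize each frame's landmarks from a .npy file into a uint8 image
    stack of shape [F, size, size]. Assumes coordinates are in pixel units."""
    frames = np.load(npy_path)                     # assumed shape [F, 68, 2]
    images = np.zeros((len(frames), size, size), dtype=np.uint8)
    for f, pts in enumerate(frames.reshape(len(frames), -1, 2)):
        xs = np.clip(pts[:, 0].astype(int), 0, size - 1)
        ys = np.clip(pts[:, 1].astype(int), 0, size - 1)
        images[f, ys, xs] = 255                    # one white pixel per point
    return images

# Toy usage with random coordinates standing in for a real target.npy.
tmp = os.path.join(tempfile.mkdtemp(), "target.npy")
np.save(tmp, np.random.rand(3, 68, 2) * 256)
imgs = render_landmark_frames(tmp)
print(imgs.shape)  # (3, 256, 256)
```

In practice, landmark-to-video models usually consume edge maps or connected contours rather than isolated points, so the drawing step would be adapted accordingly.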
Hello, and thank you so much for sharing the code. I was trying to run your training code but didn't understand where I should get the training data you mentioned, "audio2lm/data/M030/audio/".
Enjoyed the paper - thanks! Looking forward to seeing the training code :)
A few things:
Hi and thank you for the great work and sharing test and train code! I'm wondering, is it possible to generate a video having only 1 image and audio as an input?
Generalization cannot be demonstrated if outputs can only be generated for the provided test cases.
audio2lm/test.py#L113
example_landmark = example_landmark - mean.expand_as(example_landmark)
Both example_landmark and mean are loaded from config['mean'], so won't this subtraction output all zeros?
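To illustrate the concern, a minimal check (with a hypothetical 136-dim vector standing in for whatever config['mean'] loads):

```python
import torch

# If config['mean'] supplies both tensors, the subtraction zeroes everything:
mean = torch.rand(136)
example_landmark = mean.clone()              # loaded from the same config entry
centered = example_landmark - mean.expand_as(example_landmark)
print(bool((centered == 0).all()))  # True
```

So unless example_landmark is actually loaded from a different source at runtime, the centered result would indeed be all zeros.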
Can the eyes blink in the result video?
Thank you for the great work. When I reproduce the results in testing step 2, I run into some confusion. The images provided in
"test/data/M003/3DMM/3DMM/M003_01_3_output_01/image/" show the target person looking very angry.
However, the images generated in "/results/M003/test_latest/M003_01_3_output_01" do not look so angry, and the mouth shape is not consistent with the provided images.
Judging from the matching folder names, these two sets should correspond, but the result puzzles me. I think I may be misunderstanding something. Can you help clear up my confusion?
I am running python landmark/code/preProcess.py.
Then this path does not exist. How should I proceed?
Traceback (most recent call last):
File "/home/user/PycharmProjects/EVP/EVP/train/landmark/code/preprocess.py", line 145, in <module>
a = np.load(path)
File "/home/user/anaconda3/envs/evp/lib/python3.6/site-packages/numpy/lib/npyio.py", line 416, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/home/user/PycharmProjects/EVP/EVP/train/landmark/dataset_M030/landmark/M030_fear_3_026/5.npy'