jixinya / evp
Code for the paper 'Audio-Driven Emotional Video Portraits'.
Hi,
Awesome work!
I would like to know more about the 3DMM fitting. I tried to generate the parameters with a 3DMM model, but the results were not accurate. Which 3DMM model did you use?
Running the script test_target.sh for M003 with its pretrained texture model produces a sequence of frames in which the texture for the ears, neck, and portions of the hair disappears. The output for the other test subject, M030, does not contain this anomaly. Could you explain what might cause this issue and how to rectify it?
After extracting this file, isn't its content the same as what is under the train folder inside train?
There are some problems when I run python train/disentanglement/dtw/MFCC_dtw.py.
EVP/train/disentanglement/dtw/MFCC_dtw.py
Lines 132 to 140 in 1f725b8
Waiting for your reply.
Thanks.
It shows:
Sorry, you can't view or download this file at this time.
Too many users have viewed or downloaded this file recently. Please try accessing the file again later. If the file you are trying to access is particularly large or is shared with many people, it may take up to 24 hours before you can view or download it. If you still can't access the file after 24 hours, please contact your domain administrator.
Or sometimes a 503 error.
Thank you for the great work.
Could you please share the details of calculating one specific person's landmark mean and PCA?
Take a specific person's video data in MEAD as an example. We have multiple viewpoints, emotions, and intensities. Could you tell me which videos are used to compute the landmarks and produce the mean and PCA? For example, front view with intensity 3, front view with intensity 2, some other combination, or even all the videos?
I am trying to pre-process the MEAD dataset and do some face reenactment work; I would appreciate it a lot if you could share the pre-processing details.
Looking forward to the reply!
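For what it's worth, once the set of videos is chosen, the mean and PCA computation itself is straightforward. Below is a minimal sketch using scikit-learn; the landmark shape [68, 2], the number of components (16), and the choice of source videos are all assumptions, not the authors' actual settings:

```python
import numpy as np
from sklearn.decomposition import PCA

def landmark_mean_and_pca(landmark_frames, n_components=16):
    """landmark_frames: array of shape [num_frames, 68, 2], stacked from
    whichever videos you select (e.g. all front-view clips of one subject)."""
    flat = landmark_frames.reshape(len(landmark_frames), -1)  # [F, 136]
    mean = flat.mean(axis=0)                                  # [136]
    pca = PCA(n_components=n_components)
    pca.fit(flat - mean)
    return mean, pca.components_                              # [16, 136]

# Toy usage with random data standing in for real landmarks.
frames = np.random.rand(100, 68, 2)
mean, components = landmark_mean_and_pca(frames)
print(mean.shape, components.shape)  # (136,) (16, 136)
```

Subtracting the mean before `fit` matches the convention of reconstructing a landmark as mean plus a weighted sum of components.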
Given two audio features of shape [N1, 28, 12] and [N2, 28, 12], could you please show me demo code for aligning them using DTW?
When training the disentanglement module, lots of audio features are used. Did you align all of them to the minimum length?
I am really curious about the details of audio feature preprocessing.
Thank you for your great work and looking forward to your reply!
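In the absence of a standalone demo, here is a minimal self-contained DTW sketch in plain NumPy, using a Euclidean cost over flattened 28×12 frames. The repository's MFCC_dtw.py may use a different cost function or step constraints, so treat this only as an illustration of the alignment mechanism:

```python
import numpy as np

def dtw_align(a, b):
    """Align two MFCC sequences a: [N1, 28, 12], b: [N2, 28, 12] with plain DTW.
    Returns index arrays (ia, ib) of equal length along the warping path."""
    fa = a.reshape(len(a), -1)          # [N1, 336]
    fb = b.reshape(len(b), -1)          # [N2, 336]
    # Pairwise Euclidean cost matrix between all frame pairs.
    cost = np.linalg.norm(fa[:, None, :] - fb[None, :, :], axis=-1)
    n1, n2 = cost.shape
    acc = np.full((n1 + 1, n2 + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j],
                                                 acc[i, j - 1],
                                                 acc[i - 1, j - 1])
    # Backtrack from (n1, n2) to (0, 0) along the cheapest predecessors.
    path, i, j = [], n1, n2
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    ia, ib = zip(*path)
    return np.array(ia), np.array(ib)

# Toy usage: two random feature sequences of different lengths.
a = np.random.rand(10, 28, 12)
b = np.random.rand(13, 28, 12)
ia, ib = dtw_align(a, b)
aligned_a, aligned_b = a[ia], b[ib]   # equal-length aligned sequences
```

Indexing both sequences with the warping path yields two sequences of identical length, which answers how features of different lengths can be brought into correspondence.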
Could you provide the ground-truth landmarks used for validating the models? Specifically, I'm looking for the test landmarks of the M030 identity.
Hi and thanks for sharing the great work!
I followed your instructions and successfully ran the test scripts to generate output. But I am confused about the meaning of the lm2video output directories 'M003_01_3_output_01/', 'M003_02_3_output_01/', ...
Could you please briefly explain what the driving audio for synthesizing the results is, and where we can find the corresponding lip-sync video?
Looking forward to your response. Thanks.
How do I apply the output (results/target.npy) of step 1 to step 2?
Hi @jixinya,
Thanks for your excellent work!
I am curious about the implementation of Cross-Reconstructed Emotion Disentanglement. In the paper you write, "Given four audio samples Xi,m, Xj,n, Xj,m, Xi,n" for disentangling. However, the implementation in this project is a little different: you sample 2 emotions and 3 contents, set X11, X21, X12, X23 as inputs, and use X11, X11, X12, X12 as the four targets for the cross-reconstruction and self-reconstruction losses. (as below)
EVP/train/disentanglement/code/dataload.py
Lines 103 to 107 in 990ea8b
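For reference, here is a toy sketch of the paper's cross-reconstruction formulation, with stand-in linear encoders and decoder (not the real EVP modules); the exact input/target pairing used in training is the one in dataload.py referenced above:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in encoders/decoder; dimensions are illustrative only.
content_enc = nn.Linear(28 * 12, 64)
emotion_enc = nn.Linear(28 * 12, 16)
decoder = nn.Linear(64 + 16, 28 * 12)
mse = nn.MSELoss()

def recon(content_src, emotion_src):
    """Decode the content code of one clip with the emotion code of another."""
    c = content_enc(content_src.flatten(1))
    e = emotion_enc(emotion_src.flatten(1))
    return decoder(torch.cat([c, e], dim=1)).view(-1, 28, 12)

# Four samples as in the paper: X_im, X_jn, plus the swapped targets X_jm, X_in,
# each a batch of MFCC windows shaped [B, 28, 12].
x_im, x_jn, x_jm, x_in = [torch.randn(4, 28, 12) for _ in range(4)]

# Cross reconstruction: swapping emotion codes must recover the other two clips.
loss_cross = mse(recon(x_im, x_jn), x_in) + mse(recon(x_jn, x_im), x_jm)
# Self reconstruction: each clip must reconstruct itself from its own codes.
loss_self = mse(recon(x_im, x_im), x_im) + mse(recon(x_jn, x_jn), x_jn)
loss = loss_cross + loss_self
```

The repository's sampling (X11, X21, X12, X23 with targets X11, X11, X12, X12) is one concrete instantiation of this pairing scheme.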
Thanks for releasing the code and pre-trained models. Could you please release the pre-trained models and preprocessed data for all the other MEAD test subjects used in EVP?
Thank you for the great work; the disentanglement of content and emotion features is really novel.
When I reproduce this module, I get stuck on the decoder structure. Could you show me demo code?
Say we have the concatenation of content and emotion features of shape [Batchsize, N, content_dim+emotion_dim]; how do we convert it to MFCC features of shape [Batchsize, N, 28, 12]?
Looking forward to your reply and thank you in advance!
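One plausible way to make that shape conversion work is a recurrent decoder over the time axis with a per-step linear head. This is only a sketch of one possible design, not the authors' actual architecture; the hidden size and layer count are arbitrary:

```python
import torch
import torch.nn as nn

class AudioDecoder(nn.Module):
    """Hypothetical decoder: [B, N, content_dim+emotion_dim] -> [B, N, 28, 12].
    An LSTM models temporal context; a linear head emits 28*12 values per step."""
    def __init__(self, content_dim=64, emotion_dim=16, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(content_dim + emotion_dim, hidden,
                            num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 28 * 12)

    def forward(self, z):                     # z: [B, N, content+emotion]
        h, _ = self.lstm(z)                   # [B, N, hidden]
        out = self.head(h)                    # [B, N, 336]
        return out.view(z.size(0), z.size(1), 28, 12)

dec = AudioDecoder()
z = torch.randn(2, 50, 64 + 16)
mfcc = dec(z)
print(mfcc.shape)  # torch.Size([2, 50, 28, 12])
```

Any sequence-to-sequence mapping that preserves the length N (temporal convolutions, a transformer, etc.) would produce the same output shape; the final `view` is what turns the 336-dim vector back into a 28×12 MFCC frame.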
To ensure audio emotion and speech content are disentangled, you design a Cross-Reconstructed Emotion Disentanglement module in the paper. In my opinion, the emotion encoder and content encoder should be frozen once the disentanglement training is finished. But I found that the two pretrained models you provide for two different subjects have totally different weights in the emotion encoder and content encoder. So I guess you fine-tune these two encoders together with the other parts when you train your audio2lm module, but how can you guarantee the disentanglement once you fine-tune these two encoders?
Thank you for your excellent work, but how did you get the data in train.zip? Looking forward to your reply.
Hi,
I am trying to test with my own dataset but have some trouble generating the 3DMM parameters.
I am wondering if you can explain the procedure of how you get the parameters.
Thanks,
How are the data in the mfcc and facial landmark files related, and are they time-aligned?
|--train
|  |--landmark
|  |--dataSet-M030
|     |--landmark
|     |--mfcc
In testing, step 1 gives us the landmarks stored in the target.npy file via audio2landmark, but how do we use them in step 2 (landmark2video)?
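A typical bridge between an audio-to-landmark stage and a landmark-to-video stage is to rasterize each frame's landmarks into an image. The sketch below (NumPy only, one white pixel per landmark, assumed array shape [F, 68, 2]) is hypothetical; the exact input format step 2 expects may differ, so compare against the repo's test data:

```python
import numpy as np
import tempfile, os

def render_landmark_frames(npy_path, size=256):
    """Rasterize each frame's landmarks from a .npy file into a uint8 image
    stack of shape [F, size, size]. Assumes coordinates are in pixel units."""
    frames = np.load(npy_path)                     # assumed shape [F, 68, 2]
    images = np.zeros((len(frames), size, size), dtype=np.uint8)
    for f, pts in enumerate(frames.reshape(len(frames), -1, 2)):
        xs = np.clip(pts[:, 0].astype(int), 0, size - 1)
        ys = np.clip(pts[:, 1].astype(int), 0, size - 1)
        images[f, ys, xs] = 255                    # one white pixel per point
    return images

# Toy usage with random coordinates standing in for a real target.npy.
tmp = os.path.join(tempfile.mkdtemp(), "target.npy")
np.save(tmp, np.random.rand(3, 68, 2) * 256)
imgs = render_landmark_frames(tmp)
print(imgs.shape)  # (3, 256, 256)
```

In practice, landmark-to-video models usually consume edge maps or connected contours rather than isolated points, so the drawing step would be adapted accordingly.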
Hello, and thank you so much for sharing the code. I was trying to run your training code but didn't understand where I should get the training data you mentioned, "audio2lm/data/M030/audio/".
Enjoyed the paper - thanks! Looking forward to seeing the training code :)
A few things:
Hi and thank you for the great work and sharing test and train code! I'm wondering, is it possible to generate a video having only 1 image and audio as an input?
Generalization cannot be demonstrated if outputs can only be generated for the provided test cases.
audio2lm/test.py#L113
example_landmark = example_landmark - mean.expand_as(example_landmark)
Both example_landmark and mean are loaded from config['mean'], so won't this subtraction output all zeros?
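To illustrate the concern, a minimal check (with a hypothetical 136-dim vector standing in for whatever config['mean'] loads):

```python
import torch

# If config['mean'] supplies both tensors, the subtraction zeroes everything:
mean = torch.rand(136)
example_landmark = mean.clone()              # loaded from the same config entry
centered = example_landmark - mean.expand_as(example_landmark)
print(bool((centered == 0).all()))  # True
```

So unless example_landmark is actually loaded from a different source at runtime, the centered result would indeed be all zeros.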
Can the eyes blink in the result video?
Thank you for the great work. When I reproduce the results in testing step 2, I run into some confusion. The images provided in
"test/data/M003/3DMM/3DMM/M003_01_3_output_01/image/" show the target person looking very angry.
However, the images generated in "/results/M003/test_latest/M003_01_3_output_01" do not look so angry, and the mouth shape is not consistent with the provided images.
Judging from the matching folder names, these two sets should correspond, but the result puzzles me. I think I may be misunderstanding something. Can you help clear up my confusion?
I am running python landmark/code/preProcess.py.
Then this path does not exist. How should I proceed?
Traceback (most recent call last):
File "/home/user/PycharmProjects/EVP/EVP/train/landmark/code/preprocess.py", line 145, in <module>
a = np.load(path)
File "/home/user/anaconda3/envs/evp/lib/python3.6/site-packages/numpy/lib/npyio.py", line 416, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/home/user/PycharmProjects/EVP/EVP/train/landmark/dataset_M030/landmark/M030_fear_3_026/5.npy'