Contact Me:
✉ Email: [email protected]
✧ Google Scholar: https://scholar.google.com/citations?user=e6tKXQEAAAAJ&hl=en
Official code for ICCV 2023 paper: "Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation".
License: Other
When will you release the training code? Thanks very much!
CustomDatasetDataLoader
dataset [FaceDataset] was created
1it [00:02, 2.94s/it]==================done=====================
2it [00:06, 3.37s/it]==================done=====================
2it [00:06, 3.31s/it]
Traceback (most recent call last):
  File "/content/drive/MyDrive/EAT_code/preprocess/vid2vid/data_preprocess.py", line 26, in <module>
    for i, data in tqdm(enumerate(dataset)):
  File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 457, in reraise
    raise exception
AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/drive/MyDrive/EAT_code/preprocess/vid2vid/data/face_preprocess_eat.py", line 36, in __getitem__
    assert(0)
AssertionError
Thanks for the great work - could you please clarify how to process custom videos?
Thank you for your excellent work. We want to train the network on our own datasets. Any plans to release the training code?
Hello, your project is great and I'm very interested in it. However, when I ran the preprocessing code on the Vox dataset, the estimated processing time was over 700 hours. Is this a problem with how I'm running the code, or does it really take that long? Do you have a preprocessed version of the dataset available? If so, I would greatly appreciate it.
This work is impressive, especially the teeth generation. What steps do I need to follow to replace the wav with my own audio?
============== extract lmk for crop =================
[INFO] loading facial landmark predictor...
100% 1/1 [00:00<00:00, 2.87it/s]
======= extract speech in deepspeech_features =======
Traceback (most recent call last):
  File "/content/EAT_code/preprocess/deepspeech_features/extract_ds_features.py", line 10, in <module>
    from deepspeech_features import conv_audios_to_deepspeech
  File "/content/EAT_code/preprocess/deepspeech_features/deepspeech_features.py", line 10, in <module>
    import resampy
ModuleNotFoundError: No module named 'resampy'
Traceback (most recent call last):
  File "/content/EAT_code/preprocess/vid2vid/data_preprocess.py", line 26, in <module>
    for i, data in tqdm(enumerate(dataset)):
  File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 457, in reraise
    raise exception
AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/EAT_code/preprocess/vid2vid/data/face_preprocess_eat.py", line 36, in __getitem__
    assert(0)
AssertionError
============== organize file for demo ===============
cp: cannot stat './deepfeature32/output.npy': No such file or directory
Hello, I see that inference here uses the speaker's pose extracted from a video together with the corresponding deepspeech audio feature. If I want to drive a single image with standalone audio, or with speech generated by TTS, how should I do that? Thank you!
MEAD Part A has 48 identities. Could you please provide the test list used in your experiments? Thanks!
Hello! Thank you very much for releasing the code for "Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation"!
Could you provide the evaluation code for the four metrics: PSNR, M/F-LMD, Sync, and Acc_emo?
deepprompt_eam3d_all_final_313
cuda is available
/usr/local/lib/python3.10/dist-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2228.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
0% 0/1 [00:00<?, ?it/s]
0% 0/20 [00:00<?, ?it/s]
0% 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/content/drive/MyDrive/EAT_code/demo.py", line 467, in <module>
    test(f'./ckpt/{name}.pth.tar', args.emo, save_dir=f'./demo/output/{name}/')
  File "/content/drive/MyDrive/EAT_code/demo.py", line 396, in test
    he_driving_emo_xi, input_st_xi = audio2kptransformer(xi, kp_canonical, emoprompt=emoprompt, deepprompt=deepprompt, side=True) # {'yaw': yaw, 'pitch': pitch, 'roll': roll, 't': t, 'exp': exp}
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/drive/MyDrive/EAT_code/modules/transformer.py", line 775, in forward
    hp = self.rotation_and_translation(x['he_driving'], bbs, bs)
  File "/content/drive/MyDrive/EAT_code/modules/transformer.py", line 763, in rotation_and_translation
    yaw = headpose_pred_to_degree(headpose['yaw'].reshape(bbs*bs, -1))
  File "/content/drive/MyDrive/EAT_code/modules/transformer.py", line 478, in headpose_pred_to_degree
    degree = torch.sum(pred*idx_tensor, axis=1)
RuntimeError: The size of tensor a (165) must match the size of tensor b (66) at non-singleton dimension 1
Hello, your project is great and I'm very interested in it, but I ran into some problems with A2KP training. When I run python pretrain_a2kp.py --config config/pretrain_a2kp_s1.yaml --device_ids 0,1,2,3 --checkpoint ./ckpt/pretrain_new_274.pth.tar, the terminal outputs the following in a loop:
/Vox2-mp4/dev//voxs_images/id00062_osRcP9DYjAQ_00416 59878 /Vox2-mp4/dev//voxs_wavs/id00062_osRcP9DYjAQ_00416.wav
/Vox2-mp4/dev//voxs_images/id00776_f4QpbV2nV14_00184 3282 /Vox2-mp4/dev//voxs_wavs/id00776_f4QpbV2nV14_00184.wav
/Vox2-mp4/dev//voxs_images/id00287_DJpelTdmYdk_00039 83446 /Vox2-mp4/dev//voxs_wavs/id00287_DJpelTdmYdk_00039.wav
My guess is that the voxs_images and voxs_wavs folders were not found. I downloaded the Vox dataset, but it does not contain voxs_images or voxs_wavs. Does the dataset need to be preprocessed first? I could not find the preprocessing code for it. Thanks!
!python demo.py --root_wav /content/EAT_code/demo/video_processed/output --emo hap
deepprompt_eam3d_all_final_313
cuda is available
/usr/local/lib/python3.10/dist-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2228.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
0% 0/1 [00:00<?, ?it/s]
0it [00:00, ?it/s]
100% 1/1 [00:00<00:00, 3715.06it/s]
That's it; nothing is saved anywhere. Also, I am unsure what this note refers to:
Note 2: Replace the video_name/video_name.wav and deepspeech feature video_name/deepfeature32/video_name.npy, you can test with a new wav. The output length will depend on the shortest length of the audio and driven poses. Refer to here for more details.
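As a concrete reading of that note, the sketch below shows where the replacement files go (the folder name "output" and the placeholder file names are assumptions for illustration; the deepspeech feature must be re-extracted for the new wav, not reused from the old one):

```shell
# Sketch of the expected file layout when swapping in new audio.
# Placeholders stand in for your real wav and its re-extracted feature.
name=output
root=demo/video_processed/$name
mkdir -p "$root/deepfeature32"
touch my_new_audio.wav my_new_feature.npy              # placeholders for illustration
cp my_new_audio.wav "$root/$name.wav"                  # replace video_name/video_name.wav
cp my_new_feature.npy "$root/deepfeature32/$name.npy"  # replace the deepspeech feature
```

The generated clip then runs for the shorter of the new audio and the driving poses, as the note states.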
Hi @yuangan ,
I have trouble training the Emotional Adaptation stage with 1 GPU; the runtime errors are caused by a dimension mismatch.
Thank you for your great work and your time to help me out.
With device_ids 0, the shape of face_feature_map is not as expected (transformer.py#L807).
Traceback (most recent call last):
  File "prompt_st_dp_eam3d.py", line 129, in <module>
    train(config, generator, discriminator, kp_detector, audio2kptransformer, emotionprompt, sidetuning, opt.checkpoint, log_dir, dataset, opt.device_ids)
  File "/home/phphuc/Desktop/EAT_code/train_transformer.py", line 272, in train_batch_deepprompt_eam3d_sidetuning
    losses_generator, generated = generator_full(x, train_params['train_with_img'])
  File "/home/phphuc/anaconda3/envs/eat/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/phphuc/anaconda3/envs/eat/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/phphuc/anaconda3/envs/eat/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/phphuc/Desktop/EAT_code/modules/model_transformer.py", line 781, in forward
    he_driving_emo, input_st = self.audio2kptransformer(x, kp_canonical, emoprompt=emoprompt, deepprompt=deepprompt, side=True) # {'yaw': yaw, 'pitch': pitch, 'roll': roll, 't': t, 'exp': exp}
  File "/home/phphuc/Desktop/EAT_code/modules/transformer.py", line 807, in forward
    face_feature_map.repeat(bs, seqlen, 1, 1, 1).reshape(bs * seqlen, 32, 64, 64)),
RuntimeError: shape '[55, 32, 64, 64]' is invalid for input of size 28835840
With batch_size=1 in deepprompt_eam3d_st_tanh_304_3090_all.yaml#L70, it can train normally.

Hello, thank you for your excellent work. I saw that you updated the project yesterday; is all of the training and inference code now open-sourced? Thanks!
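For reference, the single-GPU workaround mentioned above amounts to a one-line change in the training config (the surrounding key name is an assumption; only the batch_size value at the referenced line matters):

```yaml
# config/deepprompt_eam3d_st_tanh_304_3090_all.yaml (around L70)
train_params:
  batch_size: 1   # workaround: avoids the reshape mismatch when training on 1 GPU
```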
I want to animate an image with audio. How should I do this? I appreciate your great work.