
yuangan / eat_code

Official code for ICCV 2023 paper: "Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation".

License: Other

Python 99.89% Shell 0.11%

eat_code's People

Contributors: eltociear, yuangan

eat_code's Issues

AssertionError: Caught AssertionError in DataLoader worker process 0.

CustomDatasetDataLoader
dataset [FaceDataset] was created
1it [00:02, 2.94s/it]==================done=====================
2it [00:06, 3.37s/it]==================done=====================
2it [00:06, 3.31s/it]
Traceback (most recent call last):
  File "/content/drive/MyDrive/EAT_code/preprocess/vid2vid/data_preprocess.py", line 26, in <module>
    for i, data in tqdm(enumerate(dataset)):
  File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 457, in reraise
    raise exception
AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/drive/MyDrive/EAT_code/preprocess/vid2vid/data/face_preprocess_eat.py", line 36, in __getitem__
    assert(0)
AssertionError
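The bare assert(0) in face_preprocess_eat.py hides which sample failed. A hedged debugging sketch, assuming a standard indexable dataset (the function name and the face-detection interpretation are mine, not from the repo):

```python
def find_bad_samples(dataset):
    """Index the dataset directly (as with num_workers=0) to locate
    the samples that trip the bare assert(0) during preprocessing."""
    bad = []
    for idx in range(len(dataset)):
        try:
            dataset[idx]
        except AssertionError:
            bad.append(idx)  # e.g. a frame where preprocessing failed
    return bad
```

Running the DataLoader with num_workers=0 also makes the traceback point at the real failure site instead of the worker re-raise.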

About training code

Thank you for your excellent work. We want to train the network on our own datasets. Are there any plans to release the training code?

Preprocessing the Vox dataset

Hello, your project is great and I am very interested in it. However, when I ran the preprocessing code on the Vox dataset, I found that processing would take more than 700 hours. Is this a problem with how I am running the code, or does it really take that long? If you have a preprocessed version of the dataset available, I would greatly appreciate it.

How can I replace the wav?

This work is impressive, especially the tooth generation. How do I replace the wav, and what steps do I need to follow?

Error when running preprocess.py: No module named 'resampy'

============== extract lmk for crop =================
[INFO] loading facial landmark predictor...
100% 1/1 [00:00<00:00, 2.87it/s]
======= extract speech in deepspeech_features =======
Traceback (most recent call last):
  File "/content/EAT_code/preprocess/deepspeech_features/extract_ds_features.py", line 10, in <module>
    from deepspeech_features import conv_audios_to_deepspeech
  File "/content/EAT_code/preprocess/deepspeech_features/deepspeech_features.py", line 10, in <module>
    import resampy
ModuleNotFoundError: No module named 'resampy'
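The ModuleNotFoundError above means resampy is simply missing from the environment, so `pip install resampy` should resolve it. A small sketch for checking such dependencies up front before launching preprocess.py (the helper name is mine):

```python
import importlib.util

def missing_deps(names):
    """Return the subset of module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# e.g. before running preprocess.py:
# missing_deps(["resampy", "tqdm", "torch"]) -> install whatever is listed
```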

Traceback (most recent call last):
  File "/content/EAT_code/preprocess/vid2vid/data_preprocess.py", line 26, in <module>
    for i, data in tqdm(enumerate(dataset)):
  File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 457, in reraise
    raise exception
AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/EAT_code/preprocess/vid2vid/data/face_preprocess_eat.py", line 36, in __getitem__
    assert(0)
AssertionError

============== organize file for demo ===============
cp: cannot stat './deepfeature32/output.npy': No such file or directory

How to run inference with only an image and audio, without a pose video?

Hello, I see that inference here uses the speaker's pose extracted from a video together with the corresponding deepspeech audio feature. If I want to drive a single image with standalone audio, for example speech generated by TTS, how should I do that? Thank you!

MEAD test list

MEAD PartA has 48 identities. In your experiments, could you please provide your test list? Thanks!

Could you provide the evaluation code for the quantitative metrics?

Hello! Thank you very much for releasing the code for "Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation"!
Could you provide the evaluation code for the four metrics: PSNR, M/F-LMD, Sync, and Acc_emo?

RuntimeError: The size of tensor a (165) must match the size of tensor b (66) at non-singleton dimension 1 on my training dataset

deepprompt_eam3d_all_final_313
cuda is available
/usr/local/lib/python3.10/dist-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2228.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
0% 0/1 [00:00<?, ?it/s]
0% 0/20 [00:00<?, ?it/s]
0% 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/content/drive/MyDrive/EAT_code/demo.py", line 467, in <module>
    test(f'./ckpt/{name}.pth.tar', args.emo, save_dir=f'./demo/output/{name}/')
  File "/content/drive/MyDrive/EAT_code/demo.py", line 396, in test
    he_driving_emo_xi, input_st_xi = audio2kptransformer(xi, kp_canonical, emoprompt=emoprompt, deepprompt=deepprompt, side=True)  # {'yaw': yaw, 'pitch': pitch, 'roll': roll, 't': t, 'exp': exp}
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/drive/MyDrive/EAT_code/modules/transformer.py", line 775, in forward
    hp = self.rotation_and_translation(x['he_driving'], bbs, bs)
  File "/content/drive/MyDrive/EAT_code/modules/transformer.py", line 763, in rotation_and_translation
    yaw = headpose_pred_to_degree(headpose['yaw'].reshape(bbs*bs, -1))
  File "/content/drive/MyDrive/EAT_code/modules/transformer.py", line 478, in headpose_pred_to_degree
    degree = torch.sum(pred*idx_tensor, axis=1)
RuntimeError: The size of tensor a (165) must match the size of tensor b (66) at non-singleton dimension 1
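The broadcast in pred*idx_tensor only works when pred has exactly 66 columns (the head-pose bin count), so a size of 165 at dimension 1 suggests the yaw/pitch/roll predictions were reshaped with a bbs*bs that does not match the custom dataset. A NumPy sketch of the binned decoding, reconstructed from the traceback and from common binned head-pose estimators (the 66-bin, 3-degree-step layout is an assumption, not verified against this repo):

```python
import numpy as np

NUM_BINS = 66  # assumed bin count, matching "tensor b (66)" in the error

def headpose_pred_to_degree(pred):
    """Decode binned head-pose logits to degrees; pred must be (batch, 66)."""
    assert pred.shape[1] == NUM_BINS, f"got {pred.shape[1]} bins, need {NUM_BINS}"
    idx = np.arange(NUM_BINS, dtype=np.float64)
    p = np.exp(pred - pred.max(axis=1, keepdims=True))  # softmax over bins
    p /= p.sum(axis=1, keepdims=True)
    return (p * idx).sum(axis=1) * 3 - 99               # expected bin -> degrees
```

Under this sketch a (batch, 165) input fails the explicit shape assertion up front rather than dying inside the broadcast, which makes the wrong reshape easier to spot.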

About voxs_images and voxs_wavs

Hello, your project is great and I am very interested in it, but I ran into some problems with A2KP training. When I run python pretrain_a2kp.py --config config/pretrain_a2kp_s1.yaml --device_ids 0,1,2,3 --checkpoint ./ckpt/pretrain_new_274.pth.tar, the terminal loops over output like this:
/Vox2-mp4/dev//voxs_images/id00062_osRcP9DYjAQ_00416 59878 /Vox2-mp4/dev//voxs_wavs/id00062_osRcP9DYjAQ_00416.wav
/Vox2-mp4/dev//voxs_images/id00776_f4QpbV2nV14_00184 3282 /Vox2-mp4/dev//voxs_wavs/id00776_f4QpbV2nV14_00184.wav
/Vox2-mp4/dev//voxs_images/id00287_DJpelTdmYdk_00039 83446 /Vox2-mp4/dev//voxs_wavs/id00287_DJpelTdmYdk_00039.wav
My guess is that the voxs_images and voxs_wavs folders cannot be found. I downloaded the Vox dataset and it does not contain voxs_images or voxs_wavs. Do I need to preprocess the Vox dataset first? I could not find the preprocessing code for the dataset. Thank you!

nothing happens when I run demo.py

!python demo.py --root_wav /content/EAT_code/demo/video_processed/output --emo hap

deepprompt_eam3d_all_final_313
cuda is available
/usr/local/lib/python3.10/dist-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2228.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
0% 0/1 [00:00<?, ?it/s]
0it [00:00, ?it/s]
100% 1/1 [00:00<00:00, 3715.06it/s]

That's it; nothing is saved anywhere. I am also unsure what this note refers to:
Note 2: If you replace video_name/video_name.wav and the deepspeech feature video_name/deepfeature32/video_name.npy, you can test with a new wav. The output length depends on the shorter of the audio and the driving poses. Refer to here for more details.
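Note 2 can be read as a required file layout. A sketch of the expected structure, using a hypothetical video name "myvid" (paths are inferred from the note, not verified against the repo):

```shell
set -e
# Layout assumed from Note 2; "myvid" is a placeholder video name.
mkdir -p demo/video_processed/myvid/deepfeature32
touch demo/video_processed/myvid/myvid.wav                 # replacement audio
touch demo/video_processed/myvid/deepfeature32/myvid.npy   # its deepspeech feature
ls demo/video_processed/myvid
```

demo.py would then be pointed at it with --root_wav demo/video_processed/myvid; a missing deepfeature32 npy is one plausible cause of the silent exit reported above (compare the earlier "cannot stat './deepfeature32/output.npy'" error).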

Dimension mismatch when training Emotional Adaptation

Hi @yuangan,
I ran into runtime errors caused by mismatched dimensions when training Emotional Adaptation with 1 GPU.
Thank you for your great work and for taking the time to help me out.

Environment diff from README:

  • Adjusted devices to train on one GPU only (device_ids 0).

Errors

1. Mismatch shapes in face_feature_map

Traceback (most recent call last):
  File "prompt_st_dp_eam3d.py", line 129, in <module>
    train(config, generator, discriminator, kp_detector, audio2kptransformer, emotionprompt, sidetuning, opt.checkpoint, log_dir, dataset, opt.device_ids)
  File "/home/phphuc/Desktop/EAT_code/train_transformer.py", line 272, in train_batch_deepprompt_eam3d_sidetuning
    losses_generator, generated = generator_full(x, train_params['train_with_img'])
  File "/home/phphuc/anaconda3/envs/eat/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/phphuc/anaconda3/envs/eat/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/phphuc/anaconda3/envs/eat/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/phphuc/Desktop/EAT_code/modules/model_transformer.py", line 781, in forward
    he_driving_emo, input_st = self.audio2kptransformer(x, kp_canonical, emoprompt=emoprompt, deepprompt=deepprompt, side=True)           # {'yaw': yaw, 'pitch': pitch, 'roll': roll, 't': t, 'exp': exp}
  File "/home/phphuc/anaconda3/envs/eat/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/phphuc/Desktop/EAT_code/modules/transformer.py", line 807, in forward
    face_feature_map.repeat(bs, seqlen, 1, 1, 1).reshape(bs * seqlen, 32, 64, 64)),
RuntimeError: shape '[55, 32, 64, 64]' is invalid for input of size 28835840
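The numbers in the error are informative on their own. A quick arithmetic check (the 4-GPU interpretation is an assumption, based on the device_ids change noted above):

```python
numel = 28835840              # input size from the RuntimeError
per_frame = 32 * 64 * 64      # one 32x64x64 feature map in the reshape
frames = numel // per_frame   # actual bs * seqlen in the batch
print(frames)                 # 220, but the reshape expects 55
print(frames / 55)            # 4.0
```

The factor of exactly 4 is consistent with DataParallel no longer splitting the batch across 4 GPUs: on a single GPU the full batch reaches the module, while the reshape still assumes the per-replica size.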

2. Unexpected case

Training time

Hello, and thank you for your work! When training the second-stage code, I found that training takes a very long time, more than 800 hours, as shown in the screenshot below:
(training screenshot)
I am using a single 3090 GPU. Is this normal?

How to use a new image as input?

(screenshot of a README note)

This note says to put images in ./demo/imgs/, but the default file tree looks like this:
(screenshot of the default file tree)

Q1: What image name should I use?

Q2: How do I specify the source image when running python demo.py?

Q3: How is the generated talking-head video driven?
