
yuangan / eat_code

Official code for ICCV 2023 paper: "Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation".

License: Other

Python 99.89% Shell 0.11%

eat_code's People

Contributors: eltociear, yuangan

eat_code's Issues

AssertionError: Caught AssertionError in DataLoader worker process 0.

CustomDatasetDataLoader
dataset [FaceDataset] was created
1it [00:02, 2.94s/it]==================done=====================
2it [00:06, 3.37s/it]==================done=====================
2it [00:06, 3.31s/it]
Traceback (most recent call last):
  File "/content/drive/MyDrive/EAT_code/preprocess/vid2vid/data_preprocess.py", line 26, in <module>
    for i, data in tqdm(enumerate(dataset)):
  File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 457, in reraise
    raise exception
AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/drive/MyDrive/EAT_code/preprocess/vid2vid/data/face_preprocess_eat.py", line 36, in __getitem__
    assert(0)
AssertionError
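The bare assert(0) in face_preprocess_eat.py hides which sample failed. A hedged debugging sketch, assuming a standard indexable dataset (the function name and the face-detection interpretation are mine, not from the repo):

```python
def find_bad_samples(dataset):
    """Index the dataset directly (as with num_workers=0) to locate
    the samples that trip the bare assert(0) during preprocessing."""
    bad = []
    for idx in range(len(dataset)):
        try:
            dataset[idx]
        except AssertionError:
            bad.append(idx)  # e.g. a frame where preprocessing failed
    return bad
```

Running the DataLoader with num_workers=0 also makes the traceback point at the real failure site instead of the worker re-raise.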

About training code

Thank you for your excellent work. We want to train the network on our own datasets. Are there any plans to release the training code?

Preprocessing the Vox dataset

Hello, your project is great and I am very interested in it. However, when I ran the preprocessing code on the Vox dataset, I found that processing would take more than 700 hours. Is this a problem with how I am running the code, or does it really take that long? If you have a preprocessed version of the dataset available, I would greatly appreciate it.

How can I replace the wav?

This work is impressive, especially the tooth generation. How do I replace the wav, and what steps do I need to follow?

Error when running preprocess.py: No module named 'resampy'

============== extract lmk for crop =================
[INFO] loading facial landmark predictor...
100% 1/1 [00:00<00:00, 2.87it/s]
======= extract speech in deepspeech_features =======
Traceback (most recent call last):
  File "/content/EAT_code/preprocess/deepspeech_features/extract_ds_features.py", line 10, in <module>
    from deepspeech_features import conv_audios_to_deepspeech
  File "/content/EAT_code/preprocess/deepspeech_features/deepspeech_features.py", line 10, in <module>
    import resampy
ModuleNotFoundError: No module named 'resampy'
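The ModuleNotFoundError above means resampy is simply missing from the environment, so `pip install resampy` should resolve it. A small sketch for checking such dependencies up front before launching preprocess.py (the helper name is mine):

```python
import importlib.util

def missing_deps(names):
    """Return the subset of module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# e.g. before running preprocess.py:
# missing_deps(["resampy", "tqdm", "torch"]) -> install whatever is listed
```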

Traceback (most recent call last):
  File "/content/EAT_code/preprocess/vid2vid/data_preprocess.py", line 26, in <module>
    for i, data in tqdm(enumerate(dataset)):
  File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 457, in reraise
    raise exception
AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/EAT_code/preprocess/vid2vid/data/face_preprocess_eat.py", line 36, in __getitem__
    assert(0)
AssertionError

============== organize file for demo ===============
cp: cannot stat './deepfeature32/output.npy': No such file or directory

How to run inference with only an image and audio, without a pose video?

Hello, I see that inference here uses the speaker's pose extracted from a video together with the corresponding deepspeech audio feature. If I want to drive a single image with standalone audio, for example speech generated by TTS, how should I do that? Thank you!

MEAD test list

MEAD PartA has 48 identities. In your experiments, could you please provide your test list? Thanks!

Could you provide the evaluation code for the quantitative metrics?

Hello! Thank you very much for releasing the code for "Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation"!
Could you provide the evaluation code for the four metrics: PSNR, M/F-LMD, Sync, and Acc_emo?

RuntimeError: The size of tensor a (165) must match the size of tensor b (66) at non-singleton dimension 1 on my training dataset

deepprompt_eam3d_all_final_313
cuda is available
/usr/local/lib/python3.10/dist-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2228.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
0% 0/1 [00:00<?, ?it/s]
0% 0/20 [00:00<?, ?it/s]
0% 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/content/drive/MyDrive/EAT_code/demo.py", line 467, in <module>
    test(f'./ckpt/{name}.pth.tar', args.emo, save_dir=f'./demo/output/{name}/')
  File "/content/drive/MyDrive/EAT_code/demo.py", line 396, in test
    he_driving_emo_xi, input_st_xi = audio2kptransformer(xi, kp_canonical, emoprompt=emoprompt, deepprompt=deepprompt, side=True)  # {'yaw': yaw, 'pitch': pitch, 'roll': roll, 't': t, 'exp': exp}
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/drive/MyDrive/EAT_code/modules/transformer.py", line 775, in forward
    hp = self.rotation_and_translation(x['he_driving'], bbs, bs)
  File "/content/drive/MyDrive/EAT_code/modules/transformer.py", line 763, in rotation_and_translation
    yaw = headpose_pred_to_degree(headpose['yaw'].reshape(bbs*bs, -1))
  File "/content/drive/MyDrive/EAT_code/modules/transformer.py", line 478, in headpose_pred_to_degree
    degree = torch.sum(pred*idx_tensor, axis=1)
RuntimeError: The size of tensor a (165) must match the size of tensor b (66) at non-singleton dimension 1
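The broadcast in pred*idx_tensor only works when pred has exactly 66 columns (the head-pose bin count), so a size of 165 at dimension 1 suggests the yaw/pitch/roll predictions were reshaped with a bbs*bs that does not match the custom dataset. A NumPy sketch of the binned decoding, reconstructed from the traceback and from common binned head-pose estimators (the 66-bin, 3-degree-step layout is an assumption, not verified against this repo):

```python
import numpy as np

NUM_BINS = 66  # assumed bin count, matching "tensor b (66)" in the error

def headpose_pred_to_degree(pred):
    """Decode binned head-pose logits to degrees; pred must be (batch, 66)."""
    assert pred.shape[1] == NUM_BINS, f"got {pred.shape[1]} bins, need {NUM_BINS}"
    idx = np.arange(NUM_BINS, dtype=np.float64)
    p = np.exp(pred - pred.max(axis=1, keepdims=True))  # softmax over bins
    p /= p.sum(axis=1, keepdims=True)
    return (p * idx).sum(axis=1) * 3 - 99               # expected bin -> degrees
```

Under this sketch a (batch, 165) input fails the explicit shape assertion up front rather than dying inside the broadcast, which makes the wrong reshape easier to spot.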

About voxs_images and voxs_wavs

Hello, your project is great and I am very interested in it, but I ran into some problems with A2KP training. When I run python pretrain_a2kp.py --config config/pretrain_a2kp_s1.yaml --device_ids 0,1,2,3 --checkpoint ./ckpt/pretrain_new_274.pth.tar, the terminal loops over output like this:
/Vox2-mp4/dev//voxs_images/id00062_osRcP9DYjAQ_00416 59878 /Vox2-mp4/dev//voxs_wavs/id00062_osRcP9DYjAQ_00416.wav
/Vox2-mp4/dev//voxs_images/id00776_f4QpbV2nV14_00184 3282 /Vox2-mp4/dev//voxs_wavs/id00776_f4QpbV2nV14_00184.wav
/Vox2-mp4/dev//voxs_images/id00287_DJpelTdmYdk_00039 83446 /Vox2-mp4/dev//voxs_wavs/id00287_DJpelTdmYdk_00039.wav
My guess is that the voxs_images and voxs_wavs folders cannot be found. I downloaded the Vox dataset and it does not contain voxs_images or voxs_wavs. Do I need to preprocess the Vox dataset first? I could not find the preprocessing code for the dataset. Thank you!

nothing happens when I run demo.py

!python demo.py --root_wav /content/EAT_code/demo/video_processed/output --emo hap

deepprompt_eam3d_all_final_313
cuda is available
/usr/local/lib/python3.10/dist-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2228.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
0% 0/1 [00:00<?, ?it/s]
0it [00:00, ?it/s]
100% 1/1 [00:00<00:00, 3715.06it/s]

That's it; nothing is saved anywhere. I am also unsure what this note refers to:
Note 2: If you replace video_name/video_name.wav and the deepspeech feature video_name/deepfeature32/video_name.npy, you can test with a new wav. The output length depends on the shorter of the audio and the driving poses. Refer to here for more details.
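Note 2 can be read as a required file layout. A sketch of the expected structure, using a hypothetical video name "myvid" (paths are inferred from the note, not verified against the repo):

```shell
set -e
# Layout assumed from Note 2; "myvid" is a placeholder video name.
mkdir -p demo/video_processed/myvid/deepfeature32
touch demo/video_processed/myvid/myvid.wav                 # replacement audio
touch demo/video_processed/myvid/deepfeature32/myvid.npy   # its deepspeech feature
ls demo/video_processed/myvid
```

demo.py would then be pointed at it with --root_wav demo/video_processed/myvid; a missing deepfeature32 npy is one plausible cause of the silent exit reported above (compare the earlier "cannot stat './deepfeature32/output.npy'" error).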

Dimension mismatch when training Emotional Adaptation

Hi @yuangan,
I ran into runtime errors caused by mismatched dimensions when training Emotional Adaptation with 1 GPU.
Thank you for your great work and for taking the time to help me out.

Environment diff from README:

  • Adjusted devices to train on one GPU only (device_ids 0).

Errors

1. Mismatch shapes in face_feature_map

Traceback (most recent call last):
  File "prompt_st_dp_eam3d.py", line 129, in <module>
    train(config, generator, discriminator, kp_detector, audio2kptransformer, emotionprompt, sidetuning, opt.checkpoint, log_dir, dataset, opt.device_ids)
  File "/home/phphuc/Desktop/EAT_code/train_transformer.py", line 272, in train_batch_deepprompt_eam3d_sidetuning
    losses_generator, generated = generator_full(x, train_params['train_with_img'])
  File "/home/phphuc/anaconda3/envs/eat/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/phphuc/anaconda3/envs/eat/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/phphuc/anaconda3/envs/eat/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/phphuc/Desktop/EAT_code/modules/model_transformer.py", line 781, in forward
    he_driving_emo, input_st = self.audio2kptransformer(x, kp_canonical, emoprompt=emoprompt, deepprompt=deepprompt, side=True)           # {'yaw': yaw, 'pitch': pitch, 'roll': roll, 't': t, 'exp': exp}
  File "/home/phphuc/anaconda3/envs/eat/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/phphuc/Desktop/EAT_code/modules/transformer.py", line 807, in forward
    face_feature_map.repeat(bs, seqlen, 1, 1, 1).reshape(bs * seqlen, 32, 64, 64)),
RuntimeError: shape '[55, 32, 64, 64]' is invalid for input of size 28835840
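The numbers in the error are informative on their own. A quick arithmetic check (the 4-GPU interpretation is an assumption, based on the device_ids change noted above):

```python
numel = 28835840              # input size from the RuntimeError
per_frame = 32 * 64 * 64      # one 32x64x64 feature map in the reshape
frames = numel // per_frame   # actual bs * seqlen in the batch
print(frames)                 # 220, but the reshape expects 55
print(frames / 55)            # 4.0
```

The factor of exactly 4 is consistent with DataParallel no longer splitting the batch across 4 GPUs: on a single GPU the full batch reaches the module, while the reshape still assumes the per-replica size.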

2. Unexpected case

Training time

Hello, and thank you for your work! When training the second-stage code, I found that training takes a very long time, more than 800 hours, as shown in the screenshot below:
(training screenshot)
I am using a single 3090 GPU. Is this normal?

How to use a new image as input?

(screenshot of a README note)

This note says to put images in ./demo/imgs/, but the default file tree looks like this:
(screenshot of the default file tree)

Q1: What image name should I use?

Q2: How do I specify the source image when running python demo.py?

Q3: How is the generated talking-head video driven?
