
Comments (8)

yuangan commented on September 2, 2024

Thank you for your attention.

You can download the preprocessed MEAD data from Yandex or Baidu.

As for Vox2, you can find some details in this issue. In short, we filtered the Vox2 data down to 213,400 videos, and the list can be recovered from our processed deepfeature32. The training data can also be generated with our preprocessing code, but you should reorganize the outputs according to their function, for example:

vox
|----voxs_images
      |----id00530_9EtkaLUCdWM_00026
      |----...
|----voxs_latent
      |----id00530_9EtkaLUCdWM_00026.npy
      |----...
|----voxs_wavs
      |----id00530_9EtkaLUCdWM_00026.wav
      |----...
|----deepfeature32
      |----id00530_9EtkaLUCdWM_00026.npy
      |----...
|----bboxs
      |----id00530_9EtkaLUCdWM_00026.npy
      |----...
|----poseimg
      |----id00530_9EtkaLUCdWM_00026.npy.gz
      |----...

They can all be extracted with our preprocessing code here. Because of the upgraded Python environment, there may be some differences in the extracted files. If you find anything missing or wrong, please let us know.
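
If it helps, here is a rough sketch (my own, with placeholder source locations, not part of the repo) of moving each clip's preprocessed outputs into the layout above:

import os
import shutil

# Expected suffix of each clip's artifact in the layout above.
SUBDIRS = {
    "voxs_images":   "",         # a directory of extracted frames
    "voxs_latent":   ".npy",
    "voxs_wavs":     ".wav",
    "deepfeature32": ".npy",
    "bboxs":         ".npy",
    "poseimg":       ".npy.gz",
}

def organize_clip(clip_id, outputs, vox_root="vox"):
    # "outputs" maps each subdir name to the file/dir the preprocessing step
    # produced for this clip, e.g.
    # outputs["voxs_wavs"] = "/tmp/preprocess/id00530_9EtkaLUCdWM_00026.wav"
    # (these source paths are hypothetical; adapt them to your own setup).
    for subdir, suffix in SUBDIRS.items():
        dst_dir = os.path.join(vox_root, subdir)
        os.makedirs(dst_dir, exist_ok=True)
        shutil.move(outputs[subdir], os.path.join(dst_dir, clip_id + suffix))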


yuangan commented on September 2, 2024

Hi,

The videos are processed into images in the end; we train EAT with the images in the provided data.

However, the provided MEAD data was preprocessed with ffmpeg without -crf 10, so its quality may be lower than that of data preprocessed with the current preprocessing code. If you want higher-quality training data, you can preprocess MEAD yourself from the original MEAD videos.
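
For reference, a re-encode at CRF 10 looks roughly like the following. This is only a sketch with placeholder paths, not the repo's preprocessing script itself:

import subprocess

def reencode_high_quality(src_video, dst_video):
    # Re-encode with libx264 at CRF 10 (visually near-lossless) before
    # extracting frames. Paths are placeholders for your own MEAD layout.
    subprocess.run(
        ["ffmpeg", "-y", "-i", src_video,
         "-c:v", "libx264", "-crf", "10", dst_video],
        check=True,
    )

# Example (hypothetical paths):
# reencode_high_quality("MEAD/M003/video/front/neutral/001.mp4", "001_crf10.mp4")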


yuangan commented on September 2, 2024

Thank you for your attention.

This is a good question. In my experience, the driving results are better if the source image and the driving video have similar face shapes and poses. You can use relative driving poses, i.e., modify the driving poses with the pose of the source image. Here is a function for reference.

I hope this makes your results better. If not, trying more driving poses may also be a solution.
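
As a rough illustration of the relative-pose idea (my own sketch, not the repo's linked function), the driving poses can be re-anchored onto the source pose like this:

import numpy as np

def relative_poses(source_pose, driving_poses):
    # source_pose:   (D,)   numpy array, pose of the source image
    #                       (e.g. yaw/pitch/roll plus translation)
    # driving_poses: (T, D) numpy array, poses of the driving video frames
    # Keep the source pose and add only the per-frame change of the driving
    # video, so the first generated frame starts from the source's own pose.
    deltas = driving_poses - driving_poses[0]
    return source_pose[None, :] + deltas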


Calmepro777 commented on September 2, 2024

Thanks for the clarification.

I am following your guidance to process the Vox2 dataset.

However, the preprocessed MEAD dataset I downloaded via the link you provided appears to contain only images sampled from the videos. I wonder if this is good enough for training.


Calmepro777 commented on September 2, 2024

In addition, I noticed that even when the person in the video serving as the head-pose source has minimal head movement, the person in the generated video appears to zoom in, zoom out, and shake.

I would appreciate any guidance that could help improve this.

Thanks in advance

fl.mp4
KatiG_MTrump.mp4


Calmepro777 commented on September 2, 2024


Thank you so much for your detailed and clear explanation.

I have decided to do Emotional Adaptation Training with the processed MEAD dataset you provided, and I have some questions.

  1. Is it true that Emotional Adaptation Training does not require the Vox2 dataset?
  2. I noticed that the deepfeature32 released with the processed MEAD dataset is from the Vox dataset, and hence I experienced the following error:
Original Traceback (most recent call last):
  File "/home/qw/anaconda3/envs/eat/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/qw/anaconda3/envs/eat/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/qw/anaconda3/envs/eat/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/qw/proj/BH/EAT/frames_dataset_transformer25.py", line 2519, in __getitem__
    return self.dataset[idx % self.dataset.__len__()]
  File "/home/qw/proj/BH/EAT/frames_dataset_transformer25.py", line 1005, in __getitem__
    return self.getitem_neu(idx)
  File "/home/qw/proj/BH/EAT/frames_dataset_transformer25.py", line 1137, in getitem_neu
    deeps = np.load(deep_path)
  File "/home/qw/anaconda3/envs/eat/lib/python3.7/site-packages/numpy/lib/npyio.py", line 417, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/data/mead//deepfeature32/W011_con_3_014.npy'

Any comments/guidelines would be appreciated.


yuangan commented on September 2, 2024
  1. Yes, we do not use Vox2 data in the emotional adaptation fine-tuning stage.
  2. The deepfeature32 folder contains audio features extracted by the DeepSpeech code; every dataset should have its own deepfeature32 folder. Have you checked the folders in mead.tar.gz? (A quick check is sketched below.)
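
If it helps, a minimal sanity check (my own sketch; the root path and folder names below are assumptions based on the error above) can confirm that every clip has its deepfeature32 file before fine-tuning:

import os
import glob

mead_root = "/data/mead"                                  # adjust to your layout
clip_dirs = sorted(glob.glob(os.path.join(mead_root, "images", "*")))  # assumed frame folders

missing = []
for d in clip_dirs:
    clip = os.path.basename(d)                            # e.g. "W011_con_3_014"
    feat = os.path.join(mead_root, "deepfeature32", clip + ".npy")
    if not os.path.isfile(feat):
        missing.append(clip)

print(len(missing), "clips are missing deepfeature32 files")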


Calmepro777 commented on September 2, 2024

Thanks for your reply.

I think I figured out the problem.

The processed MEAD dataset I previously downloaded from Yandex was, for some reason, corrupted and contained only the images sampled from the videos.

I downloaded the processed MEAD dataset from Baidu Cloud again, which contains all the files required for emotional adaptation fine-tuning.

Again, thanks for the wonderful work.

