
Comments (8)

yuangan commented on September 2, 2024

Thank you for your attention.

You can download the preprocessed MEAD data from Yandex or Baidu.

As for Vox2, you can find some details in this issue. In short, we filtered the Vox2 data down to 213,400 videos, and the list can be recovered from our processed deepfeature32. The training data can also be generated with our preprocessing code, but you should reorganize the outputs according to their function, for example:

vox
|----voxs_images
      |----id00530_9EtkaLUCdWM_00026
      |----...
|----voxs_latent
      |----id00530_9EtkaLUCdWM_00026.npy
      |----...
|----voxs_wavs
      |----id00530_9EtkaLUCdWM_00026.wav
      |----...
|----deepfeature32
      |----id00530_9EtkaLUCdWM_00026.npy
      |----...
|----bboxs
      |----id00530_9EtkaLUCdWM_00026.npy
      |----...
|----poseimg
      |----id00530_9EtkaLUCdWM_00026.npy.gz
      |----...

They can all be extracted with our preprocessing code here. Because of the upgraded Python environment, there may be some differences in the extracted files. If you find anything missing or wrong, please let us know.
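
If it helps, here is a rough sketch (my own, with placeholder source locations, not part of the repo) of moving each clip's preprocessed outputs into the layout above:

import os
import shutil

# Expected suffix of each clip's artifact in the layout above.
SUBDIRS = {
    "voxs_images":   "",         # a directory of extracted frames
    "voxs_latent":   ".npy",
    "voxs_wavs":     ".wav",
    "deepfeature32": ".npy",
    "bboxs":         ".npy",
    "poseimg":       ".npy.gz",
}

def organize_clip(clip_id, outputs, vox_root="vox"):
    # "outputs" maps each subdir name to the file/dir the preprocessing step
    # produced for this clip, e.g.
    # outputs["voxs_wavs"] = "/tmp/preprocess/id00530_9EtkaLUCdWM_00026.wav"
    # (these source paths are hypothetical; adapt them to your own setup).
    for subdir, suffix in SUBDIRS.items():
        dst_dir = os.path.join(vox_root, subdir)
        os.makedirs(dst_dir, exist_ok=True)
        shutil.move(outputs[subdir], os.path.join(dst_dir, clip_id + suffix))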


yuangan commented on September 2, 2024

Hi,

The videos are processed into images in the end; we train EAT with the images in the provided data.

However, the provided MEAD data was preprocessed with ffmpeg without -crf 10, so its quality may be lower than that of data preprocessed with the current preprocessing code. If you want higher-quality training data, you can preprocess MEAD yourself from the original MEAD videos.
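
For reference, a re-encode at CRF 10 looks roughly like the following. This is only a sketch with placeholder paths, not the repo's preprocessing script itself:

import subprocess

def reencode_high_quality(src_video, dst_video):
    # Re-encode with libx264 at CRF 10 (visually near-lossless) before
    # extracting frames. Paths are placeholders for your own MEAD layout.
    subprocess.run(
        ["ffmpeg", "-y", "-i", src_video,
         "-c:v", "libx264", "-crf", "10", dst_video],
        check=True,
    )

# Example (hypothetical paths):
# reencode_high_quality("MEAD/M003/video/front/neutral/001.mp4", "001_crf10.mp4")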


yuangan commented on September 2, 2024

Thank you for your attention.

This is a good question. In my experience, the driving results are better if the source image and the driving video have similar face shapes and poses. You can use relative driving poses, i.e., modify the driving poses with the pose of the source image. Here is a function for reference.

I hope this makes your results better. If not, trying more driving poses may also be a solution.
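
As a rough illustration of the relative-pose idea (my own sketch, not the repo's linked function), the driving poses can be re-anchored onto the source pose like this:

import numpy as np

def relative_poses(source_pose, driving_poses):
    # source_pose:   (D,)   numpy array, pose of the source image
    #                       (e.g. yaw/pitch/roll plus translation)
    # driving_poses: (T, D) numpy array, poses of the driving video frames
    # Keep the source pose and add only the per-frame change of the driving
    # video, so the first generated frame starts from the source's own pose.
    deltas = driving_poses - driving_poses[0]
    return source_pose[None, :] + deltas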


Calmepro777 commented on September 2, 2024

Thanks for the clarification.

I am following your guidance to process the Vox2 dataset.

However, the preprocessed MEAD dataset I downloaded via the link you provided appears to contain only images sampled from the videos. I wonder if this is good enough for training.


Calmepro777 commented on September 2, 2024

In addition, I noticed that even when the person in the video serving as the head-pose source has minimal head movement, the person in the generated video appears to zoom in, zoom out, and shake.

I would appreciate any guidance that could help improve this.

Thanks in advance

fl.mp4
KatiG_MTrump.mp4


Calmepro777 commented on September 2, 2024


Thank you so much for your detailed and clear explanation.

I have decided to do Emotional Adaptation Training with the processed MEAD dataset you provided, and I have some questions.

  1. Is it true that Emotional Adaptation Training does not require the Vox2 dataset?
  2. I noticed that the deepfeature32 released with the processed MEAD dataset is from the Vox dataset, and hence I experienced the following error:
Original Traceback (most recent call last):
  File "/home/qw/anaconda3/envs/eat/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/qw/anaconda3/envs/eat/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/qw/anaconda3/envs/eat/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/qw/proj/BH/EAT/frames_dataset_transformer25.py", line 2519, in __getitem__
    return self.dataset[idx % self.dataset.__len__()]
  File "/home/qw/proj/BH/EAT/frames_dataset_transformer25.py", line 1005, in __getitem__
    return self.getitem_neu(idx)
  File "/home/qw/proj/BH/EAT/frames_dataset_transformer25.py", line 1137, in getitem_neu
    deeps = np.load(deep_path)
  File "/home/qw/anaconda3/envs/eat/lib/python3.7/site-packages/numpy/lib/npyio.py", line 417, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/data/mead//deepfeature32/W011_con_3_014.npy'

Any comments/guidelines would be appreciated.


yuangan commented on September 2, 2024
  1. Yes, we do not use Vox2 data in the emotional adaptation fine-tuning stage.
  2. The deepfeature32 folder contains audio features extracted by the DeepSpeech code; every dataset should have its own deepfeature32 folder. Have you checked the folders in mead.tar.gz? (A quick check is sketched below.)
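
If it helps, a minimal sanity check (my own sketch; the root path and folder names below are assumptions based on the error above) can confirm that every clip has its deepfeature32 file before fine-tuning:

import os
import glob

mead_root = "/data/mead"                                  # adjust to your layout
clip_dirs = sorted(glob.glob(os.path.join(mead_root, "images", "*")))  # assumed frame folders

missing = []
for d in clip_dirs:
    clip = os.path.basename(d)                            # e.g. "W011_con_3_014"
    feat = os.path.join(mead_root, "deepfeature32", clip + ".npy")
    if not os.path.isfile(feat):
        missing.append(clip)

print(len(missing), "clips are missing deepfeature32 files")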


Calmepro777 commented on September 2, 2024

Thanks for your reply.

I think I figured out the problem.

The processed MEAD dataset I previously downloaded from Yandex was, for some reason, corrupted and contained only the images sampled from the videos.

I downloaded the processed MEAD dataset from Baidu Cloud again, which contains all the files required for emotional adaptation fine-tuning.

Again, thanks for the wonderful work.

