moon0316 / t2a

Project page for "Improving Few-shot Learning for Talking Face System with TTS Data Augmentation" for ICASSP2023


t2a's Introduction

Improving Few-shot Learning for Talking Face System with TTS Data Augmentation

Statements

  • This repository is for academic research only; any commercial use is prohibited.
  • The copyright of the digital human presented in our demo is reserved by SMT.

Acknowledgements

  • Thanks to Shanghai Media Tech (SMT) for providing the dataset and rendering service.
  • We use the pre-trained HuBERT model from this repository.
  • We use the implementation of the soft-DTW loss from this repository.
  • We use the implementation of the Transformer from this repository.

Thanks to the authors of the above repositories.

Demos

TTS Data Augmentation

contrast_TTSaug_record.mp4
contrast_TTSaug_Obama.mp4
contrast_TTSaug_News_f.mp4
contrast_TTSaug_News_m.mp4

TTS-driven Talking Face

contrast_T2A_record.mp4
contrast_T2A_Obama.mp4
contrast_T2A_News_f.mp4
contrast_T2A_News_m.mp4

Different Audio Features

contrast_features.mp4

Different Loss Functions

contrast_loss.mp4

Different Data Resources

contrast_resource.mp4

Pre-trained model and tools preparation

Download pre-trained HuBERT model

The pre-trained HuBERT model is obtained from this repository.

Please download the Chinese HuBERT model and put it in the directory ./data/pretrained_models/ by executing the following command:

wget -P ./data/pretrained_models/ https://huggingface.co/TencentGameMate/chinese-hubert-large/resolve/main/chinese-hubert-large-fairseq-ckpt.pt

Download fairseq tool

git clone git@github.com:facebookresearch/fairseq.git
cd fairseq
git checkout acd9a53
pip install --editable ./
cd ..
cp hubert.py ./fairseq/fairseq/models/hubert/
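
After installation, a quick check that fairseq imports and that the HuBERT model definition is visible can save debugging time later. This is a generic sketch based on fairseq's standard module layout, not a command from this repository:

python -c "import fairseq; from fairseq.models.hubert import HubertModel; print(fairseq.__version__)"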

Feature extraction

Extract HuBERT feature

python utils/generate_hubert.py --input_dir ./data/wavs/[speaker name] --output_dir ./data/wav_features/[speaker name]

Extract MFCC feature

python utils/generate_mfcc.py --input_dir ./data/wavs/[speaker name] --output_dir ./data/wav_features/[speaker name]
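
Both extraction scripts process one speaker directory at a time, so multiple speakers can be handled with a simple loop. A minimal sketch, assuming one sub-directory of ./data/wavs/ per speaker (speakerA and speakerB are placeholder names):

for speaker in speakerA speakerB; do
    python utils/generate_hubert.py --input_dir ./data/wavs/$speaker --output_dir ./data/wav_features/$speaker
done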

Train

Run bash train.sh to train.

Important arguments for main.py (an example invocation is sketched after this list):

  • arch: chinese_hubert_large | mfcc | ppg
  • feature_combine: set to True to use a weighted sum of the HuBERT features
  • output_path: "result" to generate output for the test set | [other name] to generate output for other data
  • test_input_path: must be assigned explicitly when output_path != "result"; it is the directory containing the csv files
  • test_epoch: no need to assign explicitly; the model with the best validation performance is selected automatically
  • root_dir: root directory of the dataset
  • feature_dir: hubert_large | mfcc | ppg
  • train_speaker_list: one or more speaker names used for training
  • train_json: used to change the data resource; path of a json file listing the audio names in the training set
  • freq: 50 if the feature is chinese_hubert_large or ppg, 100 if the feature is mfcc
  • input_dim: 39 for mfcc, 128 for ppg
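
For illustration, here is a hypothetical direct invocation of main.py combining the arguments above. The speaker names and paths are placeholders, and train.sh remains the authoritative entry point, so treat this as a sketch rather than a verified command:

python main.py \
    --arch chinese_hubert_large \
    --feature_dir hubert_large \
    --freq 50 \
    --root_dir ./data \
    --train_speaker_list speakerA speakerB \
    --output_path result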

Validate

Run bash validate.sh to pick the best model by validating on the validation set of a given speaker; change --val_speaker to select the speaker used for validation.
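
For example, to validate on a placeholder speaker named speakerA (this assumes validate.sh forwards extra arguments to main.py, which the repository does not document; otherwise edit the flag inside the script):

bash validate.sh --val_speaker speakerA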

Test

Run bash test.sh to test.
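
To generate outputs for data other than the test set, the argument list above suggests overriding output_path and test_input_path. A hedged sketch, where my_eval and ./data/my_eval_csv are placeholder names:

python main.py --output_path my_eval --test_input_path ./data/my_eval_csv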

Citation

@inproceedings{chen2023improving,
  title={Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation},
  author={Chen, Qi and Ma, Ziyang and Liu, Tao and Tan, Xu and Lu, Qu and Yu, Kai and Chen, Xie},
  booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2023}
}

t2a's People

Contributors

ddlbojack, moon0316


t2a's Issues

time sequence len diff

Hello, while using your code I noticed that the layer-24 features produced by HuBERT are at 150 fps, while the labels you provide are at 25 fps. I can see a rate division in your code, but the predicted and label sequences still end up with different lengths. My question: when using DTW as the loss, do the sequence lengths have to match? And is feeding in the 150 fps features correct?

sdtw loss error

Hello, your work is excellent.
While trying to train the network you provide, the soft-DTW loss keeps raising the following error:
numba.cuda.cudadrv.driver.CudaAPIError: [1] Call to cuLaunchKernel results in CUDA_ERROR_INVALID_VALUE
The error is raised in sdtw_cuda_loss.py at:
compute_softdtw_cuda[B, threads_per_block](cuda.as_cuda_array(D.detach()),
gamma.item(), bandwidth.item(), N, M, n_passes,
cuda.as_cuda_array(R))
With a training batch size of 1, the error appears after 35 batches; with a batch size of 8, it appears at the 6th batch.
After searching on Google, I suspect the soft-DTW loss needs more CUDA threads than my GPU can support, but I do not understand the soft-DTW code well enough to know how to modify it. My GPU is an NVIDIA GeForce RTX 3080 with 10 GB of memory.

How to train the model with 30/60 fps blendshape labels

Our blendshape label data is at 30/60 fps rather than 25 fps, so the ground-truth labels do not align with the model output at all. We would like to adapt your model for training. I tried changing the pre-trained HuBERT model's label_rate to 60 Hz, but the pre-trained model does not seem to be modifiable. Do you have a better adaptation strategy?

Beginner asking for help

input_values = torch.stack(input_values_new).squeeze(2) #12, 251, 768

This line fails for me: the False entry is missing one dimension. Is the goal to produce rows like [[-0.2113, 0.1069, 0.1805, ..., -0.0475, -0.1105, -0.1938], [False]]?
What is the purpose of the False appended during HuBERT audio-feature extraction? Can it simply be dropped?

Audio stream inference

The model is currently run on a long audio clip at once. Have you considered streaming inference, e.g., emitting output frame by frame? I tried feeding the model 200 ms audio chunks shifted by 40 ms at a time, but the result jitters badly. Do you have any suggestions for streaming inference?

About generate_hubert

Hello, thank you very much for your work.
However, when generating HuBERT features from the A1.wav file you provide, I ran into the following problem:

Traceback (most recent call last):
File "utils/generate_hubert.py", line 76, in
handle_wav(args.input_dir, os.path.join(args.output_dir, model_short_alias))
File "utils/generate_hubert.py", line 54, in handle_wav
torch.save(logits[2], os.path.join(saved_to_path, wav_file_dir, wav_file.split(".")[0] +".pt"))
IndexError: tuple index out of range

This causes the generated A1.pt to fail with a dimension mismatch when running test.sh.
Should logits[2] be modified here?
Thanks

pretrained model

Can you share the model for testing?

I also want to train a new model myself; how should the training dataset be prepared?

What is included in the blendshape pkl?
I know it contains audio_data, vertice......
