This repo contains the official implementation for the INTERSPEECH 2020 paper Voice Conversion Using Speech-to-Speech Neuro-Style Transfer.
Dataset file structure:

```
/path/to/database
├── spkr_1
│   ├── sample.wav
│   ...
├── spkr_2
│   ├── sample.wav
│   ...
...
└── spkr_N
    ├── sample.wav
    ...
```

Note: each speaker directory must contain the audio files directly; nested subdirectories are not supported.
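As a quick sanity check before preprocessing, a minimal script along these lines could verify the flat layout. This is only an illustrative sketch, not part of the repo; `validate_dataset` is a hypothetical helper name.

```python
import os

def validate_dataset(root):
    """Return a list of layout problems found under the dataset root.

    Expected layout: root/<speaker>/<clip>.wav, with no nested
    subdirectories inside a speaker directory (illustrative check only).
    """
    problems = []
    for spkr in sorted(os.listdir(root)):
        spkr_dir = os.path.join(root, spkr)
        if not os.path.isdir(spkr_dir):
            problems.append(f"not a speaker directory: {spkr_dir}")
            continue
        for entry in os.listdir(spkr_dir):
            path = os.path.join(spkr_dir, entry)
            if os.path.isdir(path):
                problems.append(f"nested directory not allowed: {path}")
            elif not entry.lower().endswith(".wav"):
                problems.append(f"non-wav file: {path}")
    return problems
```

An empty result means the layout matches what `preprocess.py` expects.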
Example:

```shell
python preprocess.py --model_name [name of the model] --dataset [path/to/dataset]
python train.py --model_name [name of the model] --dataset [path/to/dataset]
```
Examples of generated audio using the Flickr8k audio dataset are available at https://ebadawy.github.io/post/speech_style_transfer.
TODO:

- Rewrite `preprocess.py` to:
  - handle multi-process feature extraction
  - create train/test/val splits
  - display error messages for failed cases
- Create `inference.py` and `requirements.txt`
- Add a notebook for data visualisation
- Upload pre-trained models
- Want to add something else? Please feel free to submit a PR with your changes or open an issue.
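The multi-process feature extraction planned above could look roughly like the sketch below. It is only an illustration of the idea, not the repo's implementation: `extract_features` is a hypothetical stand-in for whatever per-clip features `preprocess.py` actually computes, and failures are reported per file instead of aborting the whole run.

```python
import glob
import os
from multiprocessing import Pool

def extract_features(wav_path):
    """Per-file worker (placeholder logic); returns (path, result, error)."""
    try:
        # Stand-in "feature": file size. The real script would load the
        # audio and compute e.g. spectrogram features here instead.
        features = os.path.getsize(wav_path)
        return wav_path, features, None
    except Exception as exc:  # collect the error instead of crashing the pool
        return wav_path, None, str(exc)

def preprocess_all(dataset_root, workers=4):
    """Extract features for every clip under root/<speaker>/*.wav in parallel."""
    wavs = glob.glob(os.path.join(dataset_root, "*", "*.wav"))
    with Pool(workers) as pool:
        results = pool.map(extract_features, wavs)
    for path, _, err in results:
        if err is not None:
            print(f"FAILED {path}: {err}")  # surface failed cases per file
    return [r for r in results if r[2] is None]
```

Returning `(path, result, error)` tuples keeps the worker exception-safe, which is what makes the "display error messages for failed cases" item straightforward.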
If you find this code useful, please cite us in your work:
```
@inproceedings{AlBadawy2020,
  author={Ehab A. AlBadawy and Siwei Lyu},
  title={{Voice Conversion Using Speech-to-Speech Neuro-Style Transfer}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4726--4730},
  doi={10.21437/Interspeech.2020-3056},
  url={http://dx.doi.org/10.21437/Interspeech.2020-3056}
}
```