axxxjt / fluenteditor

This project forked from ai-s2-lab/fluenteditor

[InterSpeech'2024] FluentEditor:Text-based Speech Editing by Considering Acoustic and Prosody Consistency

fluenteditor's Introduction

FluentEditor: text-based speech editing by considering acoustic and prosody consistency

This repo contains official PyTorch implementations of:



This repo contains unofficial PyTorch implementations of:

Supported Datasets

Our framework supports the following datasets:

  • VCTK

Downloading VCTK

You can download the VCTK dataset from the official website:

  1. Visit the VCTK dataset download page.
  2. Download the dataset (VCTK-Corpus.tar.gz).
  3. Extract the downloaded file to your desired directory. For example:

# tar -C requires the target directory to exist.
mkdir -p ../data
tar -xzf VCTK-Corpus.tar.gz -C ../data

Install Dependencies

Please install the latest numpy, torch and tensorboard first. Then run the following commands:

export PYTHONPATH=.
# install requirements.
pip install -U pip
pip install -r requirements.txt
sudo apt install -y sox libsox-fmt-mp3

Finally, install Montreal Forced Aligner following the link below:

https://montreal-forced-aligner.readthedocs.io/en/latest/
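Before moving on, it can help to confirm that the prerequisites are visible from the current environment. The following is a minimal sketch, not part of this repo: the helper name missing_requirements is hypothetical, and it assumes that the sox and mfa executables are expected on PATH.

```python
import importlib.util
import shutil


def missing_requirements(modules=("numpy", "torch", "tensorboard"),
                         tools=("sox", "mfa")):
    """Return (missing_modules, missing_tools).

    Modules are looked up with find_spec, so nothing heavy is imported.
    Tools are looked up on PATH with shutil.which.
    """
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    missing_tools = [t for t in tools if shutil.which(t) is None]
    return missing_modules, missing_tools


if __name__ == "__main__":
    mods, tools = missing_requirements()
    print("Missing Python modules:", mods or "none")
    print("Missing command-line tools:", tools or "none")
```

An empty result only shows that the packages and tools can be found; it does not check versions.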

Download the pre-trained vocoder

mkdir -p pretrained/hifigan_hifitts

Download model_ckpt_steps_2168000.ckpt and config.yaml from https://drive.google.com/drive/folders/1n_0tROauyiAYGUDbmoQ__eqyT_G4RvjN?usp=sharing and place them in pretrained/hifigan_hifitts.
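A quick way to confirm the vocoder files landed in the right place is sketched below. The helper missing_vocoder_files is hypothetical and not part of this repo; the directory and file names are the ones given above.

```python
from pathlib import Path

# Location and file names taken from the instructions above.
VOCODER_DIR = Path("pretrained/hifigan_hifitts")
VOCODER_FILES = ("model_ckpt_steps_2168000.ckpt", "config.yaml")


def missing_vocoder_files(directory=VOCODER_DIR, names=VOCODER_FILES):
    """Return the expected file names that are not present in `directory`."""
    directory = Path(directory)
    return [name for name in names if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_vocoder_files()
    print("Missing vocoder files:", missing or "none")
```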

Data Preprocess

# The default dataset is ``vctk``.
python data_gen/tts/base_preprocess.py
bash data_gen/tts/run_mfa_train_align.sh
python data_gen/tts/base_binarizer.py

Train

# Example run for Fluenteditor.
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --dir /path/to/your/fluenteditor --config egs/fluenteditor.yaml --exp_name fluenteditor --reset

Inference

The input format for inference is shown in inference/example.csv. text and edited_text are the original text and the target text. region is the word index range (starting from 1) in text that you want to edit, and edited_region is the corresponding word index range in edited_text.

| id | item_name | text | edited_text | wav_fn_orig | edited_region | region |
| 0 | 1 | "I'd love to be at the world cup." | "I'd absolutely love to be at the world cup." | inference/audio_example/1.wav | [1,3] | [1,2] |

# run with one example
python inference/tts/fluenteditor.py --exp_name fluenteditor
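To make the region convention concrete, here is a minimal sketch that parses one row and prints the words each range selects. It rests on assumptions not stated in this README: that the CSV is comma-delimited with quoted fields, that ranges are inclusive at both ends, and that words are split on whitespace. The helpers load_rows and words_in_region are hypothetical and are not the repo's own loader.

```python
import csv
import io
import json


def words_in_region(text, region):
    """Return the words covered by a 1-based, inclusive [start, end] range."""
    start, end = region
    words = text.split()
    if not (1 <= start <= end <= len(words)):
        raise ValueError(f"region {region} is out of range for {len(words)} words")
    return words[start - 1:end]


def load_rows(csv_text):
    """Parse CSV text and decode the two region columns into integer lists."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["region"] = json.loads(row["region"])
        row["edited_region"] = json.loads(row["edited_region"])
        rows.append(row)
    return rows


# Same values as the example row above; the delimiter and quoting are assumed.
SAMPLE = '''id,item_name,text,edited_text,wav_fn_orig,edited_region,region
0,1,"I'd love to be at the world cup.","I'd absolutely love to be at the world cup.",inference/audio_example/1.wav,"[1,3]","[1,2]"
'''

row = load_rows(SAMPLE)[0]
original_span = words_in_region(row["text"], row["region"])
edited_span = words_in_region(row["edited_text"], row["edited_region"])
print(original_span, "->", edited_span)
```

In this example, region [1,2] covers "I'd love" in the original text and edited_region [1,3] covers "I'd absolutely love" in the edited text.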

Evaluation

# Example Objective Evaluation for Fluenteditor.
# You can use the following objective evaluation metrics: MCD, STOI, PESQ
python eval/get_metrics.py
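For reference, MCD (mel-cepstral distortion) is commonly defined per frame as (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2) and then averaged over frames. The sketch below implements that textbook definition. It is not necessarily how eval/get_metrics.py computes it, and it assumes the two sequences are already time-aligned and that the caller has dropped the 0th (energy) coefficient if desired. The function name is hypothetical.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Average MCD in dB between two aligned sequences of cepstral frames.

    Each argument is a sequence of frames; each frame is a sequence of
    mel-cepstral coefficients of the same dimensionality.
    """
    if len(ref_frames) != len(syn_frames) or len(ref_frames) == 0:
        raise ValueError("sequences must be non-empty and have the same length")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        if len(ref) != len(syn):
            raise ValueError("frames must have the same number of coefficients")
        squared_error = sum((r - s) ** 2 for r, s in zip(ref, syn))
        total += math.sqrt(2.0 * squared_error)
    return scale * total / len(ref_frames)
```

Lower values mean the synthesized cepstra are closer to the reference; identical inputs give 0.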

License and Agreement

Any organization or individual is prohibited from using any technology mentioned in this paper to generate someone's speech without his/her consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this item, you could be in violation of copyright laws.

Tips

  1. If mfa_dict.txt, mfa_model.zip, phone_set.json, or word_set.json is missing at inference time, run the preprocessing scripts in this repo to generate them. Alternatively, download all the files needed for inference with the pre-trained model from https://drive.google.com/drive/folders/1BOFQ0j2j6nsPqfUlG8ot9I-xvNGmwgPK?usp=sharing and put them in data/processed/vctk.
  2. Use MFA version 2.0.0rc3.

If you find any other problems, please contact me.
