joessattes / thai-tts-evaluation

This repository hosts a Thai Text-to-Speech (TTS) evaluation script, focusing on assessing speaker tone and pronunciation performance.

License: Other

Python 100.00%

Thai-TTS-Evaluation

boundary_problem.png

Speaker Encoder Model

For the speaker tone objective, we use the Speaker Encoder Cosine Similarity (SECS) metric to measure how closely the synthesized speech resembles the original speaker's speech. SECS is the cosine similarity between the speaker embeddings that a speaker encoder extracts from the two speech samples. We use the Coqui speaker encoder, which was trained on VoxCeleb1, VoxCeleb2, and CommonVoice in all languages, so it should generalize across a broad range of speakers.
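The cosine-similarity step of SECS can be sketched in a few lines. This is a minimal illustration, not the code in evaluate_tts.py: the function name `cosine_similarity` and the toy 4-dimensional vectors are assumptions made for the example, and in practice the embeddings would come from the Coqui speaker encoder.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real speaker embeddings (hypothetical values).
reference_embedding = [0.2, 0.1, 0.9, 0.3]
synthesized_embedding = [0.25, 0.05, 0.85, 0.35]

secs = cosine_similarity(reference_embedding, synthesized_embedding)
print(f"SECS: {secs:.3f}")
```

The score lies in [-1, 1]; values closer to 1 indicate that the two embeddings point in nearly the same direction, which is read as a closer match to the original speaker.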

Speech-to-Text Model

For pronunciation, we use a Thai speech-to-text model. The underlying assumption is that high-quality synthesized speech should produce a transcript similar to that of the original speech. Comparing the two transcripts therefore lets us gauge how accurately the synthesized speech is pronounced.
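One way to quantify how similar two transcripts are is the character error rate (CER), the edit distance between them divided by the length of the reference. The README does not say which comparison the script uses, so CER is an assumption here, as are the function names and the example transcripts; a character-level measure is a natural fit for Thai, which is written without spaces between words.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by the length of the reference transcript."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical transcripts; real ones would come from the Thai STT model.
original_transcript = "สวัสดีครับ"
synthesized_transcript = "สวัสดีคับ"

cer = character_error_rate(original_transcript, synthesized_transcript)
print(f"CER: {cer:.2f}")
```

A CER of 0 means the two transcripts are identical. Note that this sketch counts Unicode code points, so Thai vowel and tone marks are each counted as separate characters.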

Requirements

  • python==3.7.13
  • TTS==0.8.0
  • webrtcvad==2.0.10
  • torch==1.13.0
  • torch-audiomentations==0.11.0
  • torch-complex==0.4.3
  • torch-pitch-shift==1.2.2
  • torchaudio==0.13.0
  • torchmetrics==0.8.0
  • numpy==1.21.6
  • huggingface-hub==0.14.1
  • transformers==4.25.1

Setup

  1. Environment Setup: Ensure that you have a compatible Python environment. Using Conda or virtualenv is recommended to manage dependencies.

    conda create --name voice_env python=3.7.13
    conda activate voice_env
  2. Install Dependencies:

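    pip install TTS==0.8.0 webrtcvad==2.0.10 torch==1.13.0 \
        torch-audiomentations==0.11.0 torch-complex==0.4.3 \
        torch-pitch-shift==1.2.2 torchaudio==0.13.0 torchmetrics==0.8.0 \
        numpy==1.21.6 huggingface-hub==0.14.1 transformers==4.25.1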
  3. GPU Support: If you're planning to use a GPU, ensure that your PyTorch installation is compatible with your CUDA version.
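A quick way to check whether the installed PyTorch build can see a GPU is sketched below. The helper name `select_device` is hypothetical, and whether evaluate_tts.py accepts or needs a device setting is not stated in this README.

```python
def select_device() -> str:
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment; fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = select_device()
print(f"Using device: {device}")
```

If this reports "cpu" on a machine with a GPU, the installed PyTorch build most likely does not match the local CUDA version.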

Running

python evaluate_tts.py

Acknowledgements

  • We thank Coqui for their Speaker Embedding Model, available at: https://github.com/coqui-ai/TTS.git.
  • We are grateful to the Biomedical and Data Lab at Mahidol University for their contribution to the proposed Thai speech-to-text model.
