Voice-conversion-evaluation

An evaluation toolkit for voice conversion models.

Make metadata

The metadata plays an important role in this repo. It contains several pieces of information, including the dataset name, the number of samples, the speaker names, the relative paths of the audio files, and the content of the source audio.

You can find more information about the metadata here.
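To make the fields listed above more concrete, here is a minimal Python sketch of building and sanity-checking a metadata dictionary. The key names (`source_corpus`, `pairs`, `source_audio`, `target_audios`, and so on), the placeholder values, and the helpers `make_metadata` and `validate_metadata` are all hypothetical and do not reproduce this repo's actual schema; the example metadata in the examples directory is the authoritative reference.

```python
import json

# HYPOTHETICAL schema: these key names are illustrative assumptions,
# not the repository's real metadata format.
REQUIRED_KEYS = ("source_speaker", "source_audio", "content",
                 "target_speaker", "target_audios")


def make_metadata(source_name, target_name, pairs):
    """Bundle conversion pairs with the dataset names and the sample count."""
    return {
        "source_corpus": source_name,
        "target_corpus": target_name,
        "n_samples": len(pairs),
        "pairs": pairs,
    }


def validate_metadata(metadata):
    """Raise ValueError if the count is off or a pair lacks a required field."""
    if metadata["n_samples"] != len(metadata["pairs"]):
        raise ValueError("n_samples does not match the number of pairs")
    for i, pair in enumerate(metadata["pairs"]):
        missing = [key for key in REQUIRED_KEYS if key not in pair]
        if missing:
            raise ValueError(f"pair {i} is missing {missing}")


# Placeholder values for illustration only.
pairs = [{
    "source_speaker": "speaker_a",
    "source_audio": "speaker_a/utt_001.wav",    # relative to the source dataset root
    "content": "Example sentence.",             # text of the source utterance
    "target_speaker": "speaker_b",
    "target_audios": ["speaker_b/utt_010.wav"],  # relative to the target dataset root
}]
metadata = make_metadata("VCTK", "CMU", pairs)
validate_metadata(metadata)
print(json.dumps(metadata, indent=2))
```

Keeping audio paths relative to each dataset root means the same metadata can be reused wherever the datasets are stored, since the roots are passed separately on the command line.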

Here is an example of generating metadata:

  python make_metadata.py \
    VCTK /path_of_datasets/VCTK-Corpus \
    CMU /path_of_datasets/CMU_ARCTIC \
    -n 10 -nt 5 \
    -o [path of output dir]

An example metadata file can be found in the examples directory.

We provide several dataset parsers in the parsers directory. By default, the parser with the same name as the dataset is used. You can also give a dataset your own name and specify a particular parser for it.
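The default-parser rule can be illustrated with a small lookup table. This is a hypothetical sketch: `PARSERS`, `resolve_parser`, and the stub parser functions are not the real contents of the parsers directory, and the command-line option for choosing a parser is not shown here.

```python
# HYPOTHETICAL sketch: the registry and stub parsers below are illustrative
# assumptions, not the actual modules in the parsers directory.


def parse_vctk(root):
    """Placeholder for a VCTK parser; a real one would scan files under root."""
    return ("VCTK", root)


def parse_cmu(root):
    """Placeholder for a CMU ARCTIC parser."""
    return ("CMU", root)


PARSERS = {"VCTK": parse_vctk, "CMU": parse_cmu}


def resolve_parser(dataset_name, parser_name=None):
    """Use the parser named after the dataset unless one is given explicitly."""
    key = dataset_name if parser_name is None else parser_name
    if key not in PARSERS:
        raise ValueError(f"no parser named {key!r}; available: {sorted(PARSERS)}")
    return PARSERS[key]


# Default: the dataset name doubles as the parser name.
print(resolve_parser("VCTK")("/path_of_datasets/VCTK-Corpus"))
# Custom dataset name with an explicitly chosen parser.
print(resolve_parser("my_vctk_subset", "VCTK")("/data/vctk_subset"))
```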

Inference

A unified I/O interface is used to run inference with the voice conversion models. Prepare the metadata before running inference.

After inference, the relative paths of the converted audio files are added to the metadata, and the metadata is copied into the output directory.

All pretrained models are available in the releases. Put the checkpoints into the corresponding model directory, e.g. models/any2any/AdaIN-VC/checkpoints.

Here is an example of running inference:

  python inference.py \
    -m examples/metadata_example.json \
    -s /path_of_datasets/VCTK-Corpus \
    -t /path_of_datasets/CMU_ARCTIC \
    -o [path of output dir] \
    -r models/any2any/AdaIN-VC

For BLOW, there are some issues when reloading the checkpoint, so please run BLOW inference from within its own directory.

Metrics

The metrics include neural mean opinion score (MOS) assessment, character error rate (CER), and speaker verification acceptance rate.

If you only want to use the metrics, you do not need the inference code in this repo; you can use your own inference code together with the metadata. Note that in this case you must add the relative paths of the converted audio files to the metadata yourself.
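For a custom inference pipeline, recording the converted audio paths could look roughly like the sketch below. It assumes a hypothetical `pairs` list with one entry per conversion and stores each path under an assumed `converted` key, relative to the output directory; adapt both names to the real metadata format before relying on it.

```python
import os

# HYPOTHETICAL sketch: the "pairs" list and "converted" key are assumed names;
# match them to the real metadata schema before use.


def add_converted_paths(metadata, converted_files, output_dir):
    """Record each converted file's path, relative to output_dir, in its pair."""
    pairs = metadata["pairs"]
    if len(converted_files) != len(pairs):
        raise ValueError("expected exactly one converted file per pair")
    for pair, path in zip(pairs, converted_files):
        pair["converted"] = os.path.relpath(path, start=output_dir)
    return metadata


output_dir = os.path.abspath("outputs")
converted = [os.path.join(output_dir, "wav", f"{i:03d}.wav") for i in range(2)]
metadata = {"pairs": [{}, {}]}
add_converted_paths(metadata, converted, output_dir)
print(metadata["pairs"][0]["converted"])  # wav/000.wav on POSIX systems
```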

  • Neural mean opinion score assessment:
    • Uses an ensemble of several MBNet models; the MBNet implementation is by sky1456723.
    • You can calculate the neural mean opinion score without metadata.
      python calculate_objective_metric.py \
        -d [data_dir] \
        -r metrics/mean_opinion_score
    
  • Character error rate:
    • Uses the automatic speech recognition model provided by Hugging Face.
    • You should prepare metadata before you calculate the character error rate.
      python calculate_objective_metric.py \
        -d [data_dir] \
        -r metrics/character_error_rate
    
  • Speaker verification acceptance rate:
    • Uses the speaker verification model provided by Resemblyzer.
    • You can calculate the equal error rate and the corresponding threshold using the code in metrics/speaker_verification/equal_error_rate/.
    • Some pre-calculated thresholds are also provided in metrics/speaker_verification/equal_error_rate/.
    • You should prepare metadata before you calculate the speaker verification acceptance rate.
      python calculate_objective_metric.py \
        -d [data_dir] \
        -r metrics/speaker_verification \
        -t [target_dir] \
        -th [threshold_path]
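For reference, character error rate is conventionally the character-level edit distance between the reference transcript and the ASR output, divided by the reference length. The sketch below computes it for a single utterance pair; the lowercasing and whitespace normalization are assumptions, and the repo's script may normalize text differently or aggregate errors over a whole corpus.

```python
def levenshtein(ref, hyp):
    """Minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance over characters divided by the reference length."""
    # Assumed normalization: lowercase and collapse runs of whitespace.
    ref = " ".join(reference.lower().split())
    hyp = " ".join(hypothesis.lower().split())
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return levenshtein(ref, hyp) / len(ref)


print(character_error_rate("abcd", "abed"))  # 0.25
```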
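The speaker verification threshold and acceptance rate can be sketched from similarity scores. Assuming scores for same-speaker (genuine) and different-speaker (impostor) pairs, the equal error rate is found where the false acceptance and false rejection rates meet, and the acceptance rate is the fraction of converted-versus-target scores at or above that threshold. These functions and the score values are illustrative, not the code in metrics/speaker_verification; they take plain score lists instead of calling Resemblyzer.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def equal_error_rate(genuine, impostor):
    """Return (eer, threshold), trying every observed score as a threshold."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)     # same speaker rejected
        far = sum(s >= t for s in impostor) / len(impostor)  # other speaker accepted
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


def acceptance_rate(scores, threshold):
    """Fraction of converted-vs-target scores accepted as the target speaker."""
    return sum(s >= threshold for s in scores) / len(scores)


genuine = [0.9, 0.8, 0.85, 0.7]   # illustrative same-speaker scores
impostor = [0.3, 0.5, 0.75, 0.2]  # illustrative different-speaker scores
eer, threshold = equal_error_rate(genuine, impostor)
print(eer, threshold)                                     # 0.25 0.75
print(acceptance_rate([0.8, 0.6, 0.9, 0.74], threshold))  # 0.5
```

Because only observed scores are tried as thresholds, the estimate is coarse when few pairs are sampled, so the number of sampled pairs affects how stable the threshold is.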
    

Reference Repositories

Voice conversion models

Metrics

Others

Contributors

kimythanly, tzuhsien, yistlin

voice-conversion-evaluation's Issues

What's the recommended value of n_samples in prepare_eer_samples.py?

Hi, thanks for your great work on this repository.

I'm using the speaker verification part of this repository to calculate the threshold for the Resemblyzer acceptance rate. In my case, I set the n_sample argument in prepare_eer_samples.py to 256 to calculate the threshold. I'm not sure whether this setting is good enough. What value of n_sample do you recommend?

Thanks

