This project forked from boltomli/data-for-customvoice.ai

How to prepare audio and text to create a custom voice powered by Microsoft Azure Cognitive Services

Home Page: https://customvoice.ai

License: MIT License

Data preparation for CustomVoice.AI

Goal

Provide a working end-to-end sample of how to prepare data for Custom Voice, powered by Microsoft Azure Cognitive Services. Target audience: developers who are not very familiar with audio processing.

A Chinese version of this document is also available.

Data formats

  • Text

    Text files currently need to be saved with UTF-16 little-endian encoding. Most modern text editors can handle this. In Visual Studio Code, for example, open the Command Palette from the View menu (shortcut: ⇧⌘P or Ctrl+Shift+P), type Change File Encoding, select Save with Encoding, then choose UTF-16 LE. UTF-8, with or without a BOM, is expected to be supported in the future but does not work yet.

  • Audio

    Audio files should be saved as mono PCM WAVE files with a 16 kHz sampling rate and 16-bit depth, using the .wav extension. Typical format information, as shown by MediaInfo:

Format                                   : PCM
Format settings                          : Little / Signed
Codec ID                                 : 1
Duration                                 : 8 s 255 ms
Bit rate mode                            : Constant
Bit rate                                 : 256 kb/s
Channel(s)                               : 1 channel
Sampling rate                            : 16.0 kHz
Bit depth                                : 16 bits
Stream size                              : 258 KiB (100%)
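The properties listed above can also be checked with a script. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav_format` is hypothetical and not part of the original project.

```python
import wave


def check_wav_format(path):
    """Return a list of mismatches against 16 kHz, 16-bit, mono PCM.

    An empty list means the file matches the expected format.
    """
    problems = []
    # wave.open raises wave.Error if the file is not a WAV file it can parse.
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()  # bytes per sample
        channels = wav.getnchannels()
    if rate != 16000:
        problems.append(f"sampling rate is {rate} Hz, expected 16000 Hz")
    if width != 2:
        problems.append(f"bit depth is {width * 8} bits, expected 16 bits")
    if channels != 1:
        problems.append(f"{channels} channels found, expected 1 (mono)")
    return problems
```

Calling `check_wav_format("00001.wav")` on each file before uploading gives a quick list of anything that still needs conversion.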

If the collected audio is in another format, FFmpeg and SoX can help with the conversion.

ffmpeg -i sourcemedia.mp4 targetaudio.wav
sox targetaudio.wav -c 1 -r 16000 -b 16 00001.wav --norm -R
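For the text side of the data formats, scripts that already exist as UTF-8 can be re-encoded with a short script instead of an editor. This is a minimal sketch; the function name `convert_utf8_to_utf16le` is hypothetical, and writing a byte order mark is an assumption (many editors add one when saving as UTF-16 LE).

```python
import codecs


def convert_utf8_to_utf16le(src_path, dst_path):
    """Re-encode a UTF-8 text file as UTF-16 little endian."""
    # "utf-8-sig" accepts input with or without a UTF-8 BOM.
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding="utf-8-sig", newline="") as src:
        text = src.read()
    with open(dst_path, "wb") as dst:
        # Assumption: a BOM is written first, as many editors do.
        dst.write(codecs.BOM_UTF16_LE)
        dst.write(text.encode("utf-16-le"))
```

For example, `convert_utf8_to_utf16le("text_batch1_utf8.txt", "text_batch1.txt")` would write the converted script next to the source file; both file names are placeholders.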

Data structure

It's recommended that the audio files and corresponding text scripts be collected and organized per batch. Smaller batches upload faster and make issues easier to identify and fix. Batches also make it possible to create custom voice models from different sets of data. Selecting multiple data sets is supported, so as long as there is no overlap between batches, you can still use all the data for one model.

.
├── batch1
│   ├── 01001.wav
│   └── 01002.wav
├── batch2
│   ├── 02001.wav
│   ├── 02002.wav
│   ├── 02003.wav
│   └── 02004.wav
├── batch3
│   ├── 03001.wav
│   ├── 03002.wav
│   └── 03003.wav
├── text_batch1.txt
├── text_batch2.txt
└── text_batch3.txt

The content of text_batch1.txt looks like this:

01001	Text of the first sentence is here.
01002	Is this the last sentence of the batch?

Note that each ID in the text file must match a wave file name without its extension. The ID and the sentence are separated by a Tab character, not spaces, and each line should contain exactly one Tab.
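The rules above lend themselves to an automated check before uploading. The sketch below is one possible approach, not part of the original project: the function names are hypothetical, and it assumes the script is UTF-16 LE with an optional byte order mark.

```python
import os


def read_transcript(path):
    """Read a UTF-16 LE script file and return its lines."""
    with open(path, "rb") as f:
        data = f.read()
    # Drop the byte order mark if the editor wrote one.
    if data.startswith(b"\xff\xfe"):
        data = data[2:]
    return data.decode("utf-16-le").splitlines()


def validate_transcript(text_path, audio_dir):
    """Return a list of problems found in one batch; empty means OK."""
    problems = []
    ids = set()
    for number, line in enumerate(read_transcript(text_path), start=1):
        # Each line must be "<ID><Tab><sentence>" with exactly one Tab.
        if line.count("\t") != 1:
            problems.append(f"line {number}: expected exactly one Tab")
            continue
        utterance_id, sentence = line.split("\t")
        if not sentence.strip():
            problems.append(f"line {number}: sentence is empty")
        ids.add(utterance_id)
    # Wave file names without extension must match the IDs in the text.
    wav_ids = {
        os.path.splitext(name)[0]
        for name in os.listdir(audio_dir)
        if name.lower().endswith(".wav")
    }
    for missing in sorted(ids - wav_ids):
        problems.append(f"{missing}: no matching .wav file")
    for extra in sorted(wav_ids - ids):
        problems.append(f"{extra}.wav: no line in the text file")
    return problems
```

With the layout shown earlier, `validate_transcript("text_batch1.txt", "batch1")` would report any line without exactly one Tab and any ID that has no counterpart.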

Archive files to upload should contain wave files only. Using 7-Zip as an example, run a command like the following for each batch folder:

cd batch1 && 7z a batch1.zip *.wav
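If 7-Zip is not available, a similar archive can be built with Python's standard `zipfile` module. This is a minimal sketch with a hypothetical function name, `zip_wave_files`; it stores the wave files at the top level of the archive, which mirrors running the 7-Zip command inside the batch folder.

```python
import os
import zipfile


def zip_wave_files(batch_dir, zip_path):
    """Create a zip archive containing only the .wav files of one batch."""
    names = sorted(
        name for name in os.listdir(batch_dir)
        if name.lower().endswith(".wav")
    )
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in names:
            # arcname keeps each file at the top level of the archive.
            archive.write(os.path.join(batch_dir, name), arcname=name)
    return names
```

For example, `zip_wave_files("batch1", "batch1.zip")` returns the list of file names that were added, which can be compared against the IDs in the text file.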

Advanced data processing

This basic guide assumes that each wave file contains one sentence (so the text is also one sentence per line). If you only have one large media file containing many sentences, it needs to be pre-processed first. See the advanced guide for hints.
