p1an-lin-jung / tacotron2-gst

This project forked from jinhan/tacotron2-gst

Tacotron2 with Global Style Tokens

License: BSD 3-Clause "New" or "Revised" License

Python 4.06% Jupyter Notebook 95.94%

tacotron2-gst

Overview

Data

  1. Dataset

    • Korean Speech Emotion Dataset (more info)
    • A single female voice actor recorded six different emotions (neutral, happy, sad, angry, disgust, fearful), each with 3,000 sentences, for a total of 30 hours.
  2. Text

    • KoG2P: Given an input of a series of Korean graphemes/letters (i.e. Hangul), KoG2P outputs the corresponding pronunciations.
    • test: python -m text.cleaners
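    • examples (the input means "emotional Korean voice generation"; each ==> marks the next stage: jamo decomposition, phoneme symbols, then symbol IDs)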
    감정있는 한국어 목소리 생성
     ==>
     ㄱㅏㄻㅈㅓㆁㅇㅣᄔㄴㅡᄔ ㅎㅏᄔㄱㅜㄱㅓ ㅁㅗㄺㅆㅗㄹㅣ ㅅㅐㆁㅅㅓㆁ
     ==>
    ['k0', 'aa', 'mf', 'c0', 'vv', 'ng', 'ii', 'nf', 'nn', 'xx', 'nf', 'zz', 'h0', 'aa', 'nf', 'k0', 'uu', 'k0', 'vv', 'zz', 'mm', 'oo', 'kf', 'ss', 'oo', 'rr', 'ii', 'zz', 's0', 'qq', 'ng', 's0', 'vv', 'ng', '~']
    ==>
    [6, 29, 21, 12, 31, 24, 26, 22, 16, 30, 22, 47, 11, 29, 22, 6, 32, 6, 31, 47, 15, 33, 20, 10, 33, 17, 26, 47, 9, 28, 24, 9, 31, 24, 62] 
    
  3. Audio

    • sampling rate: 16000 Hz
    • filter length: 1024 samples
    • hop length: 256 samples
    • win length: 1024 samples
    • n_mel: 80
    • mel_fmin: 0 Hz
    • mel_fmax: 8000 Hz
  4. Training files

    • ./filelists/*.txt
    • format: path|text, one utterance per line
    /KoreanEmotionSpeech/wav/sad/sad_00002266.wav|과외 선생님이 열심히 지도해준 덕택에 수학실력이 점점 늘고 있다.
    /KoreanEmotionSpeech/wav/ang/ang_00000019.wav|명백한 것은 각 당이 투사하고 있는 실상과 허상이 있다면 이제 허상은 걷어들여야 한다는 것이다.
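
Each filelist line pairs a wav path with its transcript, separated by `|`. A minimal sketch of reading one such line, assuming the first `|` is the separator; `parse_filelist_line` is a hypothetical helper for illustration, not a function from this repo:

```python
def parse_filelist_line(line):
    """Split one 'path|text' filelist line into (wav_path, text).

    Hypothetical helper; assumes the first '|' separates the wav path
    from the transcript and that neither side is empty.
    """
    wav_path, sep, text = line.rstrip("\n").partition("|")
    if not sep or not wav_path or not text:
        raise ValueError(f"expected 'path|text', got: {line!r}")
    return wav_path, text


# One of the sample lines from the filelist format shown above.
sample = "/KoreanEmotionSpeech/wav/sad/sad_00002266.wav|과외 선생님이 열심히 지도해준 덕택에 수학실력이 점점 늘고 있다."
wav_path, text = parse_filelist_line(sample)
```

Here `wav_path` holds the audio location and `text` the raw Korean sentence that the text cleaners would then process.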
    
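
The audio settings under "3. Audio" fix the time and frequency resolution of the mel spectrogram. A small worked calculation using only the listed values; the constant names are illustrative and are not necessarily the repo's hparams names:

```python
SAMPLING_RATE = 16000  # Hz
FILTER_LENGTH = 1024   # FFT size in samples
HOP_LENGTH = 256       # samples between successive frames
WIN_LENGTH = 1024      # analysis window in samples

hop_ms = 1000 * HOP_LENGTH / SAMPLING_RATE      # 16.0 ms between frames
win_ms = 1000 * WIN_LENGTH / SAMPLING_RATE      # 64.0 ms per analysis window
frames_per_second = SAMPLING_RATE / HOP_LENGTH  # 62.5 mel frames per second
n_freq_bins = FILTER_LENGTH // 2 + 1            # 513 linear bins before the mel projection
nyquist_hz = SAMPLING_RATE / 2                  # 8000.0 Hz, the same value as mel_fmax
```

So one second of speech corresponds to about 62.5 mel frames, and mel_fmax sits exactly at the Nyquist frequency for 16 kHz audio.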

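
The last stage of the text example under "2. Text" maps each phoneme symbol to an integer ID. The sketch below rebuilds a partial lookup table from that single example; the repo's full symbol table is not shown here, so `EXAMPLE_SYMBOLS`, `EXAMPLE_IDS` and `symbols_to_ids` are illustrative names and the table covers only the symbols that occur in the example:

```python
EXAMPLE_SYMBOLS = ['k0', 'aa', 'mf', 'c0', 'vv', 'ng', 'ii', 'nf', 'nn', 'xx',
                   'nf', 'zz', 'h0', 'aa', 'nf', 'k0', 'uu', 'k0', 'vv', 'zz',
                   'mm', 'oo', 'kf', 'ss', 'oo', 'rr', 'ii', 'zz', 's0', 'qq',
                   'ng', 's0', 'vv', 'ng', '~']
EXAMPLE_IDS = [6, 29, 21, 12, 31, 24, 26, 22, 16, 30,
               22, 47, 11, 29, 22, 6, 32, 6, 31, 47,
               15, 33, 20, 10, 33, 17, 26, 47, 9, 28,
               24, 9, 31, 24, 62]

# Build the partial table and check that every repeated symbol
# always maps to the same ID in the example.
symbol_to_id = {}
for symbol, idx in zip(EXAMPLE_SYMBOLS, EXAMPLE_IDS):
    assert symbol_to_id.setdefault(symbol, idx) == idx, symbol


def symbols_to_ids(symbols):
    """Encode a list of phoneme symbols with the partial table above."""
    return [symbol_to_id[s] for s in symbols]
```

Encoding `EXAMPLE_SYMBOLS` with this table reproduces `EXAMPLE_IDS`. A symbol that does not occur in the example raises `KeyError`, because its ID cannot be recovered from the example alone.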
Training

  1. Prepare Datasets

  2. Clone this repo: git clone https://github.com/jinhan/tacotron2-gst.git

  3. CD into this repo: cd tacotron2-gst

  4. Initialize submodule: git submodule init; git submodule update

  5. Update .wav paths: sed -i -- 's,DUMMY,ljs_dataset_folder/wavs,g' filelists/*.txt

  6. Install python requirements: pip install -r requirements.txt

  7. Training: python train.py --output_directory=outdir --log_directory=logdir --hparams=training_files='filelists/koemo_spk_emo_all_train.txt',validation_files='filelists/koemo_spk_emo_all_valid.txt',batch_size=6

  8. Monitoring: tensorboard --logdir=outdir/logdir --host=127.0.0.1

  9. Training results (~ 288,000 steps)

    (figure: alignment)
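
The hparams value in step 7 is a comma-separated list of name=value overrides. A minimal sketch of how such a string breaks down, assuming plain name=value pairs with no commas inside the values; `parse_overrides` is a hypothetical helper and not the parser that train.py actually uses:

```python
def parse_overrides(spec):
    """Turn 'a=1,b=path' into {'a': 1, 'b': 'path'} (illustrative only)."""
    overrides = {}
    for item in spec.split(","):
        name, sep, value = item.partition("=")
        name, value = name.strip(), value.strip()
        if not sep or not name:
            raise ValueError(f"expected name=value, got: {item!r}")
        try:
            overrides[name] = int(value)  # e.g. batch_size=6
        except ValueError:
            overrides[name] = value       # e.g. file paths stay strings
    return overrides


# In a POSIX shell the single quotes around the paths are removed
# before the program sees the argument, so they are omitted here.
spec = ("training_files=filelists/koemo_spk_emo_all_train.txt,"
        "validation_files=filelists/koemo_spk_emo_all_valid.txt,"
        "batch_size=6")
overrides = parse_overrides(spec)
```

The result has three entries: two filelist paths and an integer batch size.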

Inference

source: inference.ipynb

Condition on Reference Audio

  • Generate voice that follows the style of the reference audio

    Extract style vector from reference audio

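    # Mel spectrogram of the reference clip
    ref_audio_mel = load_mel(ref_audio)
    # Style embedding predicted by the GST module
    latent_vector = model.gst(ref_audio_mel)
    # Broadcast the embedding to the shape of the text encoder output
    latent_vector = latent_vector.expand_as(transcript_outputs)
    # Add the style embedding to the text encoder output
    encoder_outputs = transcript_outputs + latent_vector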

    Generate voice

    generate_mels_by_ref_audio(text, ref_audio)
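
The style-vector snippet in this section ends by broadcasting one style vector to the shape of the encoder output and adding it. A dependency-free sketch of that operation on nested lists, assuming a single utterance with an encoder output of shape (T, D) and a style vector of shape (D,); `add_style` is a hypothetical name, and the real code operates on PyTorch tensors:

```python
def add_style(transcript_outputs, style_vector):
    """Add the same style vector to every time step of the encoder output."""
    for frame in transcript_outputs:
        if len(frame) != len(style_vector):
            raise ValueError("feature dimensions differ")
    return [[x + s for x, s in zip(frame, style_vector)]
            for frame in transcript_outputs]


transcript_outputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # T=3 steps, D=2 features
style_vector = [0.5, -1.0]                                 # one vector for the whole utterance
encoder_outputs = add_style(transcript_outputs, style_vector)
```

Every time step receives the same offset, so the style shifts the whole utterance rather than individual phonemes.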

Condition on Style Tokens

  • Generate by style tokens

    Style token

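    # Learned style token embeddings, squashed with tanh
    GST = torch.tanh(model.gst.stl.embed)
    
    for idx in range(10):
        # Zero query, with the idx-th token as the only key
        query = torch.zeros(1, 1, hparams.E//2).cuda()
        keys = GST[idx].unsqueeze(0).expand(1,-1,-1)
        # Style embedding produced from that single token
        style_emb = model.gst.stl.attention(query, keys)
        encoder_outputs = transcript_outputs + style_emb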

    Generate voice

    generate_mels_by_style_tokens(text)
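
In the style-token loop, each token is passed to the attention module as the only key. With a single key the softmax weight is 1 whatever the query is, so in this simplified setting a zero query is enough to select that token. A single-head sketch in plain Python, assuming scaled dot-product attention and omitting the learned projections and multiple heads that the actual module may use; `attend` is a hypothetical name:

```python
import math


def attend(query, keys):
    """Scaled dot-product attention where the keys also serve as values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Numerically stable softmax over the keys.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the keys, one output value per feature.
    output = [sum(w * key[i] for w, key in zip(weights, keys))
              for i in range(d)]
    return output, weights


token = [0.3, -0.7, 0.1]  # stand-in for one style token
output, weights = attend([0.0, 0.0, 0.0], [token])
```

With one key, `weights` is `[1.0]` and `output` equals the token itself. With several keys, a zero query would instead give uniform weights.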

Samples

  • ./samples/tokens: condition on style tokens
  • ./samples/refs: condition on reference audio
