madipraise / tfg-voice-conversion

This project forked from albertaparicio/tfg-voice-conversion

Deep Learning-based Voice Conversion system

License: GNU General Public License v3.0

Shell 14.59% Python 85.41%

tfg-voice-conversion's Introduction

Voice Conversion using Deep Learning

This project will be carried out at the Signal Theory and Communications Department (TSC) of the Polytechnic University of Catalonia (UPC). Specifically, it will be developed within the Speech Processing research group (VEU) as a contribution to its research project DeepVoice: Deep Learning Technologies for Speech and Audio Processing.

The purpose of this project is to develop a deep learning-based system that converts a speech signal from one speaker into a signal that sounds as if it had been uttered by a different speaker. The resulting signal must preserve the linguistic and prosodic content of the original.

Deep Learning techniques have shown remarkable results in other areas of speech processing, such as speech recognition and speech synthesis. They are often combined with more classical speech processing and modeling techniques, such as vocoder-based feature extraction, which are used for pre- and post-processing.
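The pre- and post-processing role described above can be illustrated with a minimal sketch. It is not taken from this repository: the function names (`fit_normalizer`, `convert`), the random toy data standing in for vocoder features, and the identity function standing in for a trained neural network are all assumptions made for illustration. The sketch only shows the general pattern of normalizing source features before the model and mapping the model output back to the target speaker's feature scale afterwards.

```python
import numpy as np


def fit_normalizer(features):
    """Return per-dimension mean and std of a (frames, dims) feature matrix."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant dimensions
    return mean, std


def normalize(features, mean, std):
    """Pre-processing: scale features to zero mean and unit variance."""
    return (features - mean) / std


def denormalize(features, mean, std):
    """Post-processing: map normalized features back to a speaker's scale."""
    return features * std + mean


def convert(source_features, model, src_stats, tgt_stats):
    """Normalize with source statistics, apply the model, then
    denormalize with target statistics."""
    x = normalize(source_features, *src_stats)
    y = model(x)
    return denormalize(y, *tgt_stats)


# Toy data standing in for vocoder features (hypothetical, not real speech).
rng = np.random.default_rng(0)
source = rng.normal(loc=5.0, scale=2.0, size=(100, 4))
target = rng.normal(loc=-1.0, scale=0.5, size=(100, 4))

src_stats = fit_normalizer(source)
tgt_stats = fit_normalizer(target)


def identity_model(x):
    """Placeholder for a trained neural network (assumption)."""
    return x


converted = convert(source, identity_model, src_stats, tgt_stats)
print(converted.shape)
```

With the identity placeholder, the pipeline reduces to plain mean and variance matching between the two speakers. A learned model would replace `identity_model`, while the surrounding normalization steps would stay the same.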

Before this system can be developed, some preliminary tasks must be completed. These mainly comprise acquiring a thorough knowledge of Neural Networks and how to apply them in Deep Learning, as well as becoming familiar with the tools that will be used in the project. These tools include several Python libraries, such as NumPy, TensorFlow, Theano and Keras.

As for the programming tools and libraries, some preparatory work was already done during summer 2016 with Python, NumPy and TensorFlow.

The project’s main goals are:

  1. Develop a Deep Learning-based system able to convert recorded speech from one speaker into that of another speaker
    1. Gain a profound understanding of Deep Learning architectures
    2. Acquire solid knowledge of the Keras Deep Learning Python library
    3. Propose an innovative architecture following the state of the art in Deep Learning for Voice Conversion
    4. Evaluate the developed system’s conversion quality, with the aim of outperforming the systems submitted to the Interspeech 2016 Voice Conversion Challenge

tfg-voice-conversion's People

Contributors

albertaparicio
