GithubHelp home page GithubHelp logo

perceptual-coding-in-python's Introduction

Perceptual Coding In Python

Created by Stephen Welch and Matthew Cohen

###Research Question: How do we measure how similar two signals sound?

###Introduction

Quantifying human perceptions of physical phenomena is a complex task that covers various disciplines. Speech and image recognition are relatively mature fields that have seen particular progress in the last few years through training deep neural networks on large labeled datasets. Notably for both speech and image recognition, hand-engineered features such as SIFT and MFCC are quickly being replaced by learned features. It is difficult to say if the deep learning paradigm will replace, augment, or provide more insight into other relevant fields concerned with quantifying the perception of sounds. We believe relevant work can be divided broadly into the following areas: Audio Compression and Perceptual Coding, Music Information Retrieval, Machine Learning, and Measures of Audio Quality.

###Audio Compression

Audio compression technologies have been incredibly successful at achieving excellent quality at high compression rates. Mp3 and AAC algorithms are at a high-level described by an international standard, but in implementation are largely heuristics driven. There are a number of open source implementation available, but the best and most used implementations are patented and proprietary. It seems like “good enough” compression solutions we’re largely reached in the 1990s, and not much has changed since then.

###Machine Learning

Machine learning, specifically through deep neural networks, offers excellent performance on speech recognition in noisy environments and other audio classification tasks. In this framework, expert knowledge is spent on setting up clever training and data structures, rather than hand engineering features. Similar to audio compression, the progress in this area made by machine learning can be largely heuristic at times, and tends to produce best results when developed by an expert in the field or sub-field.

###Psychoacoustics/Perceptual Coding

Psychoacoustics and perceptual processing has received focus by researchers, telecommunication companies, and other groups hoping to develop algorithms for objectively measuring perceived audio quality. These algorithms simulate perceptual properties of the human ear and apply computer models of the auditory signal processing in order to estimate the perceived similarity between two audio signals. Some examples of these algorithms include PEAQ (Perceptual Evaluation of Audio Quality), a standardized algorithm; PEMO-Q (PErception MOdel-based Quality estimation); PESQ (Perceptual Evaluation of Speech Quality); and EAQUAL (Evaluation of Audio Quality).

These algorithms incorporate strong understanding of the perceptual processes involved in hearing and auditory processing, and make use of in-depth DSP stages. The downside with many of these algorithms is that they are patented and licensed, mostly to companies due to the high costs for licenses. Over the years, students and academia professionals have attempted to implement some of these algorithms for open-source usage; however, the precision and accuracy of these seem variable, and available implementations are scarce. EAQUAL’s source code is made available to the public, but has not been extensively tested.

perceptual-coding-in-python's People

Contributors

mcohen28 avatar stephencwelch avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.