
Unsupervised Acoustic Word Embeddings on Buckeye English and NCHLT Xitsonga

Overview

Note: This is an updated version of the recipe at https://github.com/kamperh/recipe_bucktsong_awe. The code here uses Python 3 (instead of Python 2.7) and LibROSA for feature extraction (instead of HTK). Because of slight differences in the resulting features, the results here do not exactly match those in the paper below, which used the older recipe.

Unsupervised acoustic word embedding (AWE) approaches are implemented and evaluated on the Buckeye English and NCHLT Xitsonga speech datasets. The experiments are described in:

  • H. Kamper, "Truly unsupervised acoustic word embeddings using weak top-down constraints in encoder-decoder models," in Proc. ICASSP, 2019. [arXiv]

Please cite this paper if you use the code.

Disclaimer

The code provided here is not pretty. But I believe that research should be reproducible. I provide no guarantees with the code, but please let me know if you have any problems, find bugs or have general comments.

Download datasets

The whole Buckeye English corpus is used, together with a portion of the NCHLT Xitsonga data. These can be downloaded from:

From the complete Buckeye corpus we split off several subsets: the sets labelled devpart1 and zs correspond respectively to the English1 and English2 sets in Kamper et al., 2016. We use the Xitsonga dataset provided as part of the Zero Speech Challenge 2015 (a subset of the NCHLT data).

Create and run Docker image

This recipe provides a Docker image containing all the required dependencies. The recipe can also be run without Docker, but then the dependencies need to be installed separately (see below). To use the Docker image, you need Docker itself and, for GPU support, the NVIDIA container runtime (the run command below uses --runtime=nvidia).

To build the Docker image, run:

cd docker
docker build -f Dockerfile.gpu -t py3_tf1.13 .
cd ..

The remaining steps in this recipe can be run in a container in interactive mode. The dataset directories will also need to be mounted. To run a container in interactive mode with the mounted directories, run:

docker run --runtime=nvidia -it --rm -u $(id -u):$(id -g) -p 8887:8887 \
    -v /r2d2/backup/endgame/datasets/buckeye:/data/buckeye \
    -v /r2d2/backup/endgame/datasets/zrsc2015/xitsonga_wavs:/data/xitsonga_wavs \
    -v "$(pwd)":/home \
    py3_tf1.13

Alternatively, run ./docker.sh, which executes the above command and starts an interactive container.

To directly start a Jupyter notebook in a container, run ./docker_notebook.sh and open http://localhost:8889/.

If not using Docker: Install dependencies

If you are not using Docker, install the following dependencies:

To install speech_dtw, clone the repository into ../src/ and compile the code as follows:

mkdir ../src/  # not necessary when using Docker
git clone https://github.com/kamperh/speech_dtw.git ../src/speech_dtw/
cd ../src/speech_dtw
make
make test
cd -

Extract speech features

Update the paths in paths.py to point to the datasets. If you are using Docker, paths.py will already point to the mounted directories. Extract MFCC and filterbank features in the features/ directory as follows:

cd features
./extract_features_buckeye.py
./extract_features_xitsonga.py

More details on the feature file formats are given in features/readme.md.
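The recipe uses LibROSA for the actual extraction; purely as an illustration of what a filterbank feature matrix looks like, here is a self-contained NumPy sketch that computes log-mel filterbank features from scratch (the parameter values are illustrative, not the repo's exact settings):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_fbank(wav, sr=16000, n_fft=400, hop=160, n_mels=45):
    # Frame the signal and take the magnitude spectrum
    n_frames = 1 + (len(wav) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack(
        [wav[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    mag = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft // 2 + 1)

    # Triangular mel filterbank
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    return np.log(mag @ fbank.T + 1e-8)  # (n_frames, n_mels)

# Example: one second of a 440 Hz tone at 16 kHz
wav = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = log_mel_fbank(wav)
print(feats.shape)  # (98, 45)
```

With a 25 ms window (400 samples) and 10 ms hop (160 samples), one second of audio yields 98 frames of 45-dimensional features; MFCCs would additionally apply a DCT to each frame.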

Evaluate frame-level features using the same-different task

This is optional. To perform frame-level same-different evaluation based on dynamic time warping (DTW), follow samediff/readme.md.
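In the same-different task, two variable-length feature sequences are compared by the cost of their best DTW frame alignment. The recipe uses the compiled speech_dtw package for this; the following is only a minimal pure-NumPy sketch of the idea, using cosine distance between frames:

```python
import numpy as np

def dtw_cost(x, y):
    """Normalised DTW alignment cost between sequences x (n, d) and y (m, d),
    using cosine distance between frames."""
    # Pairwise cosine distances between all frames
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    dist = 1.0 - xn @ yn.T  # (n, m)

    n, m = dist.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j],      # insertion
                acc[i, j - 1],      # deletion
                acc[i - 1, j - 1],  # match
            )
    return acc[n, m] / (n + m)  # simple length normalisation

# Identical sequences align along the diagonal with zero cost
a = np.random.RandomState(0).randn(12, 13)
print(dtw_cost(a, a))
```

Two instances of the same word should have a low DTW cost; the same-different evaluation measures how well this cost separates same-word pairs from different-word pairs.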

Obtain downsampled acoustic word embeddings

Extract and evaluate downsampled acoustic word embeddings by running the steps in downsample/readme.md.
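The downsampling baseline maps each variable-length feature sequence to a fixed-size vector by resampling it to a fixed number of frames and flattening. A sketch of the idea using linear interpolation (the repo's exact resampling scheme may differ):

```python
import numpy as np

def downsample_embedding(feats, n_keep=10):
    """Map a (n_frames, dim) segment to a fixed-size vector by sampling
    n_keep frames at evenly spaced positions (linear interpolation per
    dimension), then flattening."""
    n, d = feats.shape
    positions = np.linspace(0, n - 1, n_keep)
    cols = [np.interp(positions, np.arange(n), feats[:, i]) for i in range(d)]
    return np.stack(cols, axis=1).flatten()  # (n_keep * dim,)

seg = np.random.RandomState(1).randn(37, 13)  # e.g. 37 frames of 13-dim MFCCs
emb = downsample_embedding(seg)
print(emb.shape)  # (130,)
```

Segments of any length map to the same 130-dimensional space, so word segments can be compared directly with a vector distance instead of DTW.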

Train neural acoustic word embeddings

Train and evaluate neural network acoustic word embedding models by running the steps in embeddings/readme.md.
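Whatever model produces the embeddings, they are typically scored with same-different average precision: rank all embedding pairs by cosine distance (closest first) and measure how well same-word pairs are ranked ahead of different-word pairs. A minimal sketch of this metric (a hypothetical helper, not the repo's evaluation code):

```python
import numpy as np

def average_precision(embeddings, labels):
    """Same-different AP: rank all pairs by cosine distance and compute
    average precision for pairs whose labels match."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    n = len(labels)
    dists, matches = [], []
    for i in range(n):
        for j in range(i + 1, n):
            dists.append(1.0 - emb[i] @ emb[j])
            matches.append(labels[i] == labels[j])
    order = np.argsort(dists)            # closest pairs first
    matches = np.array(matches)[order]
    # Precision at each rank where a same-word pair occurs
    precision_at_hits = np.cumsum(matches)[matches] / (np.flatnonzero(matches) + 1)
    return precision_at_hits.mean()

# Perfectly separated word clusters give AP = 1.0
e = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]])
ap = average_precision(e, ["yes", "yes", "no", "no"])
print(ap)
```

An AP of 1.0 means every same-word pair is closer than every different-word pair; random embeddings score near the proportion of same-word pairs.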

Notebooks

Some notebooks used during development are given in the notebooks/ directory. Note that these were used mainly for debugging and exploration, so they are not polished. A Docker container can be used to launch a notebook session by running ./docker_notebook.sh and then opening http://localhost:8889/.

Unit tests

In the root project directory, run make test to run unit tests.

License

The code is distributed under the Creative Commons Attribution-ShareAlike license (CC BY-SA 4.0).
