GithubHelp home page GithubHelp logo

wavbert's Introduction

Welcome to WavBERT: Exploiting Semantic and Non-semantic Speech using Wav2vec and BERT for Dementia Detection πŸ‘‹

Version Documentation Maintenance

In this project, we exploit semantic and non-semantic information from patient’s speech data usingWav2vec and Bidirectional Encoder Representations from Transformers (BERT) for dementia detection. We first propose a basic WavBERT model by extracting semantic information from speech data using Wav2vec, and analyzing the semantic information using BERT for dementia detection. While the basic model discards the non-semantic information, we propose extended WavBERT models that convert the output ofWav2vec to the input to BERT for preserving the non-semantic information in dementia detection. Specifically, we determine the locations and lengths of inter-word pauses using the number of blank tokens from Wav2vec where the threshold for setting the pauses is automatically generated via BERT. We further design a pre-trained embedding conversion network that converts the output embedding of Wav2vec to the input embedding of BERT, enabling the fine-tuning of WavBERT with non-semantic information. Our evaluation results using the ADReSSo dataset showed that the WavBERT models achieved the highest accuracy of 83.1% in the classification task, the lowest Root-Mean-Square Error (RMSE) score of 4.44 in the regression task, and a mean F1 of 70.91% in the progression task. We confirmed the effectiveness of WavBERT models exploiting both semantic and non-semantic speech.

🏠 Homepage

Dataset Results

WavBert Data

Author

πŸ‘€ Xiaohui Liang

Author

πŸ‘€ Youxiang Zhu

Author

πŸ‘€ Abdelrahman Obyat

The challenge homepage can be found here:

http://www.homepages.ed.ac.uk/sluzfil/ADReSSo-2021/

Installation

Python3.5 or newer is required for this project

Installing python3.8:

1. sudo apt update
2. sudo apt install software-properties-common
3. sudo add-apt-repository ppa:deadsnakes/ppa
4. sudo apt install python3.8
5. sudo apt install python3-pip

Installing env:

sudo apt-get install python3.8-venv

Please note, if root system is full and no system storage available error is encountered, restarting system in order also clear root cache and pip cache is recommended.

Please note, the following must be installed in env variable using the following commands:

1. python3.8 -m venv env
2. source env/bin/activate

Installing Torch:

 python3.8 -m pip --no-cache-dir install torch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0

Customization of Pytorch for different builds can be found here:

 https://pytorch.org/get-started/locally/

Please note, if pip wheel errors are encountered, reinstalling pip wheel is recommended:

 pip install wheel

Installing Additional Libraries:

1. python3.8 -m pip install transformers==4.0.0

2. python3.8 -m pip install tqdm
3. python3.8 -m pip install numpy

4. python3.8 -m pip install matplotlib
5. python3.8 -m pip install jiwer

6. python3.8 -m pip install librosa
7. python3.8 -m pip install fairseq

8. python3.8 -m pip install datasets

9. python3.8 -m pip --no-cache-dir install editdistance
10. python3.8 -m pip --no-cache-dir install sentencepiece

Wav2Vec

Follow here:

 https://medium.com/@shaheenkader/how-to-install-wav2letter-dc94c3b74e97

Installing Arrayfire:

1. sudo apt-get install cmake g++
2. wget https://arrayfire.s3.amazonaws.com/3.6.1/ArrayFire-no-gl-v3.6.1_Linux_x86_64.sh
3. chmod u+x ArrayFire-no-gl-v3.6.1_Linux_x86_64.sh
4. sudo bash ArrayFire-no-gl-v3.6.1_Linux_x86_64.sh --include-subdir --prefix=/opt
5. sudo bash -c 'echo /opt/arrayfire-no-gl/lib > /etc/ld.so.conf.d/arrayfire.conf'
6. sudo ldconfig

Installing GoogleTest:

1. sudo apt-get install libgtest-dev
2. sudo apt-get install cmake # install cmake
3. cd /usr/src/gtest
4. sudo cmake CMakeLists.txt
5. sudo make
6. cd lib
7. sudo cp \*.a /usr/lib

Installing OpenMPI:

sudo apt-get install openmpi-bin openmpi-common libopenmpi-dev

Installing CUDA:

1. wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
2. sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
3. sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
4. sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/
5. sudo apt-get update
6. sudo apt-get -y install cuda
7. sudo apt install nvidia-cuda-toolkit
8. pip3.8 install mkl-devel

References:

https://www.eriksmistad.no/getting-started-with-google-test-on-ubuntu/

https://arrayfire.org/docs/installing.htm

https://askubuntu.com/questions/103643/cannot-echo-hello-x-txt-even-with-sudo

https://gitlab.kitware.com/cmake/cmake/-/issues/19396

Acess to the DataSet can be requested below:

For Diagnosis task (train and test):   
https://media.talkbank.org/dementia/English/0extra/ADReSSo21-diagnosis-train.tgz
https://media.talkbank.org/dementia/English/0extra/ADReSSo21-diagnosis-test.tgz 
For Progression Tasks (train and test):     
https://media.talkbank.org/dementia/English/0extra/ADReSSo21-progression-train.tgz
https://media.talkbank.org/dementia/English/0extra/ADReSSo21-progression-test.tgz

Procedures for training:

# Get Wav2vec embedding
python3 audio_asr_to_text.py

# Get pause thresholds
python3 check_pause_length.py

# Train WavBERT model (parameters may be needed, check text_train.py)
python3 text_train.py
# Or
sh run2.sh

Procedures for pre-training:

# Download LibriSpeech dataset, merge train-clean-100 train-clean-360 train-other-500 to train_960

# Get Wav2vec embedding of LibriSpeech dataset
python3 pre_train_audio_asr_to_text.py

# Pre-train WavBERT model (ASR embedding conversion part)
sh run.sh

Show your support

Give a ⭐️ if this project helped you!


This README was generated with ❀️ by readme-md-generator

wavbert's People

Contributors

obyat avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.