
daniellin94144 / dual-textless-sqa


Textless (ASR-transcript-free) Spoken Question Answering. The official release of the NMSQA dataset and the implementation of the paper "DUAL: Textless Spoken Question Answering with Speech Discrete Unit Adaptive Learning".

License: Creative Commons Attribution Share Alike 4.0 International


dual-textless-sqa's Introduction

DUAL-textless-SQA

This repository is the official implementation of the paper "DUAL: Discrete Spoken Unit Adaptive Learning for Textless Spoken Question Answering" and the release of the Natural Multi-speakers Spoken Question Answering (NMSQA) dataset.

Installation

Model

Dataset

Download the NMSQA dataset

Data Preparation for Original Dataset

Preprocessed data link (including passage merging and unit-level labels, updated with question code): [link]

  • Directory format

    • train
    • dev
    • test
  • Files

    • For train and dev split
      • {split}-answer-span.csv: answer time span in seconds
      • meta-{split}.csv: the duration, speaker, and transcription of each utterance
      • {split}-textgrid.tar.gz: forced alignment of each utterance
      • {split}_audio.tar.gz: utterance waveform files
      • {split}_hash2question.json: maps the hash value to the question id
    • For test split
      • lxt_sqa.tar.gz: contains all audio files (in audio) and the transcriptions
      • meta-lxt.csv: the duration, speaker, and transcription of each utterance
      • test/test-SQuAD/test-SQuAD-answer-span.csv: the answer span in the test-SQuAD split
      • test/test-OOD/test-OOD-answer-span.csv: the answer span in the test-OOD split

    NOTE: Currently, each spoken passage is split into utterance segments. For the standard QA task, you should merge the segments back into whole passages. The suffix -1, -2, ..., -n is the segment number within a specific passage.

    • Speech Content Encoder: please see the details in speech-content-encoder.
    • Pre-process the QA labels
    python code_answer.py
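
The note above about merging segments back into whole passages can be sketched as follows. This is only an illustration, not the repository's merge_passage.py: the function name `group_segments` and the input layout (a dict from segment name to duration in seconds, as could be read from meta-{split}.csv) are assumptions. It groups segments by passage using the -1, -2, ..., -n suffix, orders them numerically, and records the time offset at which each segment would start in the merged passage.

```python
from collections import defaultdict


def group_segments(durations):
    """Group segment names by passage and compute their start offsets.

    `durations` is a hypothetical dict: segment name (e.g. "abc-2") -> seconds.
    Returns: passage id -> list of (segment name, offset in seconds), in order.
    """
    passages = defaultdict(list)
    for name, seconds in durations.items():
        # Split only on the last hyphen so passage ids may contain hyphens.
        passage_id, segment_no = name.rsplit("-", 1)
        passages[passage_id].append((int(segment_no), name, seconds))

    merged = {}
    for passage_id, segments in passages.items():
        segments.sort()  # numeric order, so segment 10 comes after segment 2
        offset = 0.0
        ordered = []
        for _, name, seconds in segments:
            ordered.append((name, offset))
            offset += seconds
        merged[passage_id] = ordered
    return merged
```

With these offsets, a segment-level time stamp can be shifted to a passage-level one by adding the offset of the segment it belongs to.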
    

Parquet Format & Huggingface Format dataset

It basically follows the same file format as the original SQuAD, with the following extra fields:

{
   "id": Same as SQuAD,
   "title": Same as SQuAD,
   "context": Same as SQuAD,
   "question": Same as SQuAD,
   "answers":{
      "answer_start": Same as SQuAD,
      "audio_full_answer_end":[], Audio answer end position in seconds
      "audio_full_answer_start":[], Audio answer start position in seconds
      "audio_full_neg_answer_end":[], Audio end position in seconds of a span that uses the same words but is not the correct answer
      "audio_full_neg_answer_start":[], Audio start position in seconds of a span that uses the same words but is not the correct answer
      "audio_segment_answer_end":[],
      "audio_segment_answer_start":[],
      "text": Same as SQuAD
   },
   "content_segment_audio_path": Segment Audio Path,
   "content_full_audio_path": Complete Audio Path,
   "content_audio_sampling_rate": Audio Sampling Rate,
   "content_audio_speaker": Audio Speaker,
   "content_segment_text":"",
   "content_segment_normalized_text": Normalized Text for generating audio,
   "question_audio_path": Question Audio Path,
   "question_audio_sampling_rate": Audio Sampling Rate,
   "question_audio_speaker": Audio Speaker,
   "question_normalized_text": Normalized Text for generating audio,
}
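
As a minimal sketch of how these fields could be used, the answer positions in seconds can be converted to sample indices in the full passage waveform. The helper name `answer_sample_ranges` and the example record are hypothetical. Only the field names come from the schema above, and the sketch assumes that the start and end lists are aligned element by element.

```python
def answer_sample_ranges(record):
    """Convert audio answer spans from seconds to (start, end) sample indices."""
    rate = record["content_audio_sampling_rate"]
    starts = record["answers"]["audio_full_answer_start"]
    ends = record["answers"]["audio_full_answer_end"]
    # Assumption: starts[i] and ends[i] describe the same answer span.
    return [
        (int(round(start * rate)), int(round(end * rate)))
        for start, end in zip(starts, ends)
    ]


# Hypothetical record containing only the fields used here.
example = {
    "content_audio_sampling_rate": 16000,
    "answers": {
        "audio_full_answer_start": [1.25],
        "audio_full_answer_end": [2.5],
    },
}
print(answer_sample_ranges(example))
```

The resulting index pairs can be used to slice the waveform loaded from content_full_audio_path.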

Training

python train.py --exp_name [exp name] --config baseline.yaml

Evaluation

python evaluate.py --data_dir [data dir path] --model_path [model checkpoint dir] --output_dir [output dir path] --out_fname [output name]

Results

Discrete unit   PLM          dev FF1   dev AOS   test FF1   test AOS
HuBERT-64       Longformer   47.8      42.4      39.0       33.0
HuBERT-128      Longformer   54.2      48.5      56.0       49.1
HuBERT-512      Longformer   55.0      49.6      17.3       12.5
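
The README does not define the FF1 and AOS columns. Assuming they stand for frame-level F1 and audio overlapping score, both computed from the overlap between the predicted and ground-truth answer time spans, a rough sketch for a single example could look like this. It is not the repository's evaluate.py, all function names are hypothetical, and it works on spans in seconds rather than on discrete frames.

```python
def span_overlap(pred, gold):
    """Length in seconds of the overlap between two (start, end) spans."""
    return max(0.0, min(pred[1], gold[1]) - max(pred[0], gold[0]))


def frame_f1(pred, gold):
    """F1 of overlap-based precision and recall (assumed meaning of FF1)."""
    overlap = span_overlap(pred, gold)
    pred_len = pred[1] - pred[0]
    gold_len = gold[1] - gold[0]
    if overlap == 0.0 or pred_len <= 0 or gold_len <= 0:
        return 0.0
    precision = overlap / pred_len
    recall = overlap / gold_len
    return 2 * precision * recall / (precision + recall)


def audio_overlapping_score(pred, gold):
    """Intersection over union of the two spans (assumed meaning of AOS)."""
    overlap = span_overlap(pred, gold)
    union = (pred[1] - pred[0]) + (gold[1] - gold[0]) - overlap
    return overlap / union if union > 0 else 0.0
```

Under these assumptions, a prediction of (1.0, 3.0) against a ground truth of (2.0, 4.0) shares one second of audio, which gives a precision and recall of 0.5 each and an intersection over union of one third.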

Contact

Guan-Ting Lin (Email: [email protected]) Eric Lam (Email: [email protected])

Citation

@inproceedings{lin22c_interspeech,
  author={Guan-Ting Lin and Yung-Sung Chuang and Ho-Lam Chung and Shu-wen Yang and Hsuan-Jui Chen and Shuyan Annie Dong and Shang-Wen Li and Abdelrahman Mohamed and Hung-yi Lee and Lin-shan Lee},
  title={{DUAL: Discrete Spoken Unit Adaptive Learning for Textless Spoken Question Answering}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={5165--5169},
  doi={10.21437/Interspeech.2022-612}
}

dual-textless-sqa's People

Contributors

daniellin94144, splend1d, voidful


dual-textless-sqa's Issues

Preprocessed units for the segments

Hello!

I've recently been running some experiments on NMSQA, and the DUAL code provided here is really helpful! However, I ran into some difficulty building the units with the provided scripts to reproduce the results. In particular, I'm trying to extract the units for each context segment, but the preprocessed units currently provided in the repo are already concatenated per article following the standard QA scheme (using merge_passage.py, I guess). Could the preprocessed units for each segment be provided?

Thank you!

P.S. I just saw you and had a chat at your poster at Interspeech. The work was really impressive and useful for us :)

no question code available in processed data

Brilliant work! We are trying to train this model and found that only the context codes are provided in the processed data link. Would you mind providing the processed question codes for the train and dev splits, since the preprocessing pipeline is a little time-consuming?
