Official Repo for Implementations of Models/Experiments in "Hierarchical Transformer for Task Oriented Dialog Systems" - NAACL 2021 (Long Paper)

License: MIT License

HIER - PyTorch

Implementation of HIER, in PyTorch

Title: Hierarchical Transformer for Task Oriented Dialog Systems, by Bishal Santra, Potnuru Anusha and Pawan Goyal

Abstract

Generative models for dialog systems have gained much interest because of the recent success of RNN- and Transformer-based models in tasks like question answering and summarization. Although dialog response generation is generally seen as a sequence-to-sequence (Seq2Seq) problem, researchers in the past have found it challenging to train dialog systems with standard Seq2Seq models. Therefore, to help the model learn meaningful utterance- and conversation-level features, Sordoni et al. (2015b) and Serban et al. (2016) proposed the Hierarchical RNN architecture, which was later adopted by several other RNN-based dialog systems. With transformer-based models now dominating Seq2Seq problems, the natural question is whether the notion of hierarchy is also applicable to transformer-based dialog systems. In this paper, we propose a generalized framework for Hierarchical Transformer Encoders and show how a standard transformer can be morphed into any hierarchical encoder, including HRED- and HIBERT-like models, by using specially designed attention masks and positional encodings. Through a wide range of experiments, we demonstrate that hierarchical encoding helps transformer-based models achieve better natural language understanding of the context in task-oriented dialog systems.
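The "specially designed attention masks" mentioned in the abstract can be illustrated with a small sketch: restricting each token to its own utterance reproduces the utterance-encoder stage, while opening the final utterance's rows recovers a HIER-style context mask. A minimal example using boolean masks where True means "may attend" (the function names are illustrative, not the repo's API):

```python
import numpy as np

def block_diagonal_mask(utt_lens):
    """True = may attend. Each token attends only within its own utterance,
    as in the shared utterance-encoder stage."""
    total = sum(utt_lens)
    mask = np.zeros((total, total), dtype=bool)
    start = 0
    for n in utt_lens:
        mask[start:start + n, start:start + n] = True
        start += n
    return mask

def hier_context_mask(utt_lens):
    """HIER-style context mask: tokens attend within their own utterance,
    and the final utterance's tokens additionally attend over the full context."""
    mask = block_diagonal_mask(utt_lens)
    last_start = sum(utt_lens) - utt_lens[-1]
    mask[last_start:, :] = True
    return mask
```

Note that PyTorch's `nn.TransformerEncoder` uses the opposite boolean convention (True marks positions that may NOT be attended), so such a mask would need to be negated before being passed to it.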

Figure 1: Detailed architecture of the Hierarchical Transformer Encoder (HT-Encoder). The main inductive bias incorporated in this model is to encode the full dialog context hierarchically, in two stages, by the two encoders shown in the figure: (1) a Shared Utterance Encoder (M layers) and (2) a Context Encoder (N layers). The shared encoder first encodes each utterance u_i individually to extract utterance-level features; the same parameterized shared encoder is used for all utterances in the context. In the second stage, the Context Encoder encodes the full context with a single transformer encoder to extract dialog-level features. The attention mask in the Context Encoder determines how the context encoding is done and is a choice of the user; the one depicted in the figure is for the HIER model described in Section 2.3 of the paper. Only the final utterance in the Context Encoder gets to attend over all previous utterances, as shown. This gives the model access to both utterance-level and dialog-level features up to the last layer of the encoding process. Notation: utterance u_i = (w_{i,1}, ..., w_{i,T_i}), where w_{i,j} is the word embedding of the j-th word in the i-th utterance.
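The two-stage flow in Figure 1 can be sketched with PyTorch's built-in transformer modules: a shared encoder applied per utterance, followed by a context encoder over the concatenated sequence. This is a simplified illustration under assumed dimensions and class names, not the repository's actual implementation:

```python
import torch
import torch.nn as nn

class HTEncoderSketch(nn.Module):
    """Illustrative two-stage hierarchical encoder (names/sizes are assumptions)."""

    def __init__(self, d_model=64, nhead=4, m_layers=2, n_layers=2):
        super().__init__()
        utt_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # Stage 1: one shared encoder, reused for every utterance in the context.
        self.utt_encoder = nn.TransformerEncoder(utt_layer, m_layers)
        ctx_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # Stage 2: a single encoder over the whole dialog context.
        self.ctx_encoder = nn.TransformerEncoder(ctx_layer, n_layers)

    def forward(self, utterances, ctx_mask=None):
        # utterances: list of (1, T_i, d_model) tensors, one per utterance.
        encoded = [self.utt_encoder(u) for u in utterances]  # utterance-level features
        context = torch.cat(encoded, dim=1)                  # concatenate along time
        return self.ctx_encoder(context, mask=ctx_mask)      # dialog-level features
```

A context attention mask, such as one letting only the final utterance attend over the whole context, would be passed as `ctx_mask`, following PyTorch's convention that True marks disallowed positions.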

Install

```bash
git clone 
bash extract_data.sh
```

Usage

Please refer to the instructions in this repo for using the HT-Encoder architecture in your own model to achieve hierarchical encoding with Transformer encoders.

Dataset Setup:

(From https://github.com/wenhuchen/HDSA-Dialog) Generated using create_delex_data.py from the original MultiWOZ repository, for MultiWOZ version 2.1.

  1. Add the preprocessed data files train.json, val.json and test.json to the hdsa_data/hdsa_data/ folder.
  2. Add the delex.json file (a large file, ~87 MB) to the data folder:

```bash
wget --directory-prefix=data/ https://hdsa-dialog.s3-us-west-2.amazonaws.com/delex.json
```
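After the two steps above, a quick check that every file landed in the expected location can save a confusing "No such file or directory" failure at training or evaluation time. A small sketch, with the paths taken from the steps above (the function name is illustrative, not part of the repo):

```python
from pathlib import Path

# Paths expected by the dataset setup steps above.
REQUIRED_FILES = [
    "hdsa_data/hdsa_data/train.json",
    "hdsa_data/hdsa_data/val.json",
    "hdsa_data/hdsa_data/test.json",
    "data/delex.json",
]

def missing_data_files(root="."):
    """Return the required dataset files not yet present under root."""
    root = Path(root)
    return [f for f in REQUIRED_FILES if not (root / f).is_file()]

if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Dataset setup complete.")
```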

Citations

```bibtex
@misc{santra2021hierarchical,
      title={Hierarchical Transformer for Task Oriented Dialog Systems}, 
      author={Bishal Santra and Potnuru Anusha and Pawan Goyal},
      year={2021},
      eprint={2011.08067},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

Acknowledgements

We thank the authors and developers of MarCo and HDSA for releasing their models and code; the code for this work is derived from these two projects. We also thank the creators of MultiWOZ for the dataset.
