multiLSTM

tentative name: `iNERt network`

for intent, Named Entity Recognition, and topic identification network

requirements

gensim
h5py
keras
keras-contrib
matplotlib
mlxtend
tensorflow
nltk
numpy
pandas
pycrfsuite
sklearn

this also uses a keras attention layer that i adapted from the gist by nigeljyng implemented here: https://gist.github.com/nigeljyng/37552fb4869a5e81338f82b338a304d3

rename this file to attention.py and add to relevant directories - only works with Theano backend; see comments for possible TensorFlow fix. ( my implementation not included as it is derivative of this gist with fixes for tensorflow compatibility and per-timestep output )

EDIT: this fork by rmdort fixes TF & Keras compatibility: https://gist.github.com/rmdort/596e75e864295365798836d9e8636033

NB: as of late nov, keras 2.08 has compatibility issues with keras-contrib. so i am using keras 2.04 for now.

purpose

unify "intent detection" of syntactic-pragmatic speech act 'intents', semantic topic and named entity detection into a single neural network. this would allow chat systems to use a single architecture to detect and identify .

previous work

Multi-Domain Joint Semantic Frame Parsing using bi-RNN-LSTM (Hakkani-Tur et al. 2016)

single bi-LSTM network for both NER (using intermediate output) and intent classification (using final output)

idea

adapt the CoNLL 2003 NER demo with a secondary output to train intent classification, and freeze embeddings to allow future networks to incorporate larger vocabulary (ofc embeddings will need to be retrained, but simulate using subset of large embedding matrix by freezing).

network will be modified from the above by using a CRF layer for NER and an attention layer for sentence vectorization before classification along multiple axes (speech act, topic).

current sample output (dec 05)

ner.predict("do you have any flights from Seoul leaving tomorrow?", debug=True)
['do', 'you', 'have', 'any', 'flights', 'from', 'seoul', 'leaving', 'tomorrow']
['O', 'O', 'O', 'O', 'O', 'O', 'GEO', 'O', 'DAT']
({'DAT': 'tomorrow', 'GEO': 'seoul'}, 'reqInfo', 'day')

ner.predict("are there any flights to London at 5:30?", debug=True)
['are', 'there', 'any', 'flights', 'to', 'london', 'at', '5:30']
['O', 'O', 'O', 'O', 'O', 'GEO', 'O', 'TIM']
({'GEO': 'london', 'TIM': '5:30'}, 'reqInfo', 'time')

ner.predict("let's do the 11:20 am flight to Tokyo", debug=True)
['lets', 'do', 'the', '11:20', 'am', 'flight', 'to', 'tokyo']
['O', 'O', 'O', 'TIM', 'TIM', 'O', 'O', 'GEO']
({'GEO': 'tokyo', 'TIM': '11:20 am'}, 'state', 'enum')

a trained model is included so you can test out new results using class_based_decoder.ipynb in the decoding directory.

todo

edit preprocessing scripts to enforce train-test split across tests
clean up shared script libraries (attention.py, datasets.py, embeddings.py)
tune model hyperparameters

network training

download the corpus files at http://martinweisser.org
extract to the data directory
run the notebooks in preprocessing to generate the network data
run the notebook in training to train and save a model
run the notebooks in decoding to evaluate against baselines
the class decoder notebook in decoding demonstrates a basic idea of implementation in a larger system and allows testing on novel sentences

note on evaluations

these evaluation numbers are not rigorous analyses because they are conducted over the bootstrapped data (raw data plus synthesized data) using automatic labels that lack human evaluation, and critically, train and test sets were not fixed during testing (after shuffling, train and test indices should be saved and used to reconstitute exact sets for each baseline and experimental trial). they are only presented as rough estimates of performance and to demonstrate that the actual code is functional.

`sklearn` baseline evaluation for speech acts and topics

these results are from the bootstrapped data, using a linear SVM classifier over tf.idf vectors:

# speech act classification

train precision 0.843945255398
train recall    0.726487430623
train accuracy  0.871683309558

test precision 0.764387869858
test recall    0.722603761044
test accuracy  0.809370988447

# topic classification

train precision 0.882497828431
train recall    0.838802577875
train accuracy  0.934236804565

test precision 0.86166845483
test recall    0.814780976901
test accuracy  0.885750962773

`python-CRFsuite` baseline evaluation for NER

test accuracy 0.998266078184111

`iNERt` results

entity accuracy		0.9968102073365231
speech_act accuracy	0.9059278350515464
topic accuracy		0.9072164948453608

keeping in mind the implementational flaws above, the model accuracy exceeds all baselines above.

other implementations, related links

i stumbled upon this keras implementation of a similar task using the ATIS data:

https://github.com/yvchen/JointSLU

codeaudit / multilstm Goto Github PK

multilstm's Introduction

multiLSTM

tentative name: iNERt network

requirements

purpose

previous work

idea

current sample output (dec 05)

todo

network training

note on evaluations

sklearn baseline evaluation for speech acts and topics

python-CRFsuite baseline evaluation for NER

iNERt results

other implementations, related links

multilstm's People

Contributors

Watchers

Recommend Projects

Recommend Topics

Recommend Org

Jobs

tentative name: `iNERt network`

`sklearn` baseline evaluation for speech acts and topics

`python-CRFsuite` baseline evaluation for NER

`iNERt` results