xmunlp / tagger

Deep Semantic Role Labeling with Self-Attention

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
deep-learning tagging semantic-role-labeling tensorflow srl srltagger

tagger's Introduction

Tagger

This is the source code for the paper "Deep Semantic Role Labeling with Self-Attention".

Contents

Basics

Notice

The original code used in the paper was implemented in TensorFlow 1.0, which is now obsolete. We have re-implemented our methods in PyTorch, based on THUMT. The differences are as follows:

  • Only the DeepAtt-FFN model is implemented
  • Model ensembling is currently not available

Please check the git history to use the TensorFlow implementation.
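
If you need the TensorFlow version, one way is to check out a revision from before the PyTorch rewrite; the commit reference below is a placeholder, not an actual hash:

# list earlier revisions and switch to one that still contains the TensorFlow code
git log --oneline
git checkout <COMMIT_BEFORE_PYTORCH_REWRITE>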

Prerequisites

  • Python 3
  • PyTorch
  • TensorFlow-2.0 (CPU version)
  • GloVe embeddings and srlconll scripts
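
A rough environment sketch, assuming pip (the README does not pin a PyTorch version, so the exact versions are up to you):

# hypothetical setup: PyTorch for the model, plus the CPU-only TensorFlow 2.0 build
pip install torch tensorflow==2.0.0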

Walkthrough

Data

Training Data

We follow the same procedures described in the deep_srl repository to convert the CoNLL datasets. The GloVe embeddings and srlconll scripts can also be found at that link.

If you follow these procedures, the processed data will have the following format:

2 My cats love hats . ||| B-A0 I-A0 B-V B-A1 O

Each line begins with the 0-based index of the predicate in the sentence (here, token 2 is "love"), followed by the tokens, a ||| separator, and the BIO labels for that predicate.

The CoNLL datasets are not publicly available, so we cannot provide them.

Vocabulary

You can use the build_vocab.py script to generate vocabularies. The command is as follows:

python tagger/scripts/build_vocab.py --limit LIMIT --lower TRAIN_FILE OUTPUT_DIR

where LIMIT specifies the vocabulary size. This command creates two vocabularies, vocab.txt and label.txt, in OUTPUT_DIR.
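
For example, a hypothetical invocation on the processed CoNLL-2005 training file (the paths and the 100000 limit are illustrative placeholders, not values from the paper):

# build word and label vocabularies from the processed training data
python tagger/scripts/build_vocab.py --limit 100000 --lower \
    data/conll05.train.txt data/vocab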

Training

Once you have finished the procedures described above, you can start the training stage.

Preparing the validation script

An external validation script is required to enable the validation functionality. Below is the validation script we used when training an FFN model on the CoNLL-2005 dataset. Save it as run.sh (the training command below refers to it via script=run.sh) and make sure it runs properly.

#!/usr/bin/env bash
SRLPATH=/PATH/TO/SRLCONLL
TAGGERPATH=/PATH/TO/TAGGER
DATAPATH=/PATH/TO/DATA
EMBPATH=/PATH/TO/GLOVE_EMBEDDING
DEVICE=0

export PYTHONPATH=$TAGGERPATH:$PYTHONPATH
export PERL5LIB="$SRLPATH/lib:$PERL5LIB"
export PATH="$SRLPATH/bin:$PATH"

python $TAGGERPATH/tagger/bin/predictor.py \
  --input $DATAPATH/conll05.devel.txt \
  --checkpoint train \
  --model deepatt \
  --vocab $DATAPATH/deep_srl/word_dict $DATAPATH/deep_srl/label_dict \
  --parameters=device=$DEVICE,embedding=$EMBPATH/glove.6B.100d.txt \
  --output tmp.txt

python $TAGGERPATH/tagger/scripts/convert_to_conll.py tmp.txt $DATAPATH/conll05.devel.props.gold.txt output
perl $SRLPATH/bin/srl-eval.pl $DATAPATH/conll05.devel.props.* output

Training command

The command below is what we used to train a model on the CoNLL-2005 dataset. The content of run.sh is described in the above section.

#!/usr/bin/env bash
SRLPATH=/PATH/TO/SRLCONLL
TAGGERPATH=/PATH/TO/TAGGER
DATAPATH=/PATH/TO/DATA
EMBPATH=/PATH/TO/GLOVE_EMBEDDING
DEVICE=[0]

export PYTHONPATH=$TAGGERPATH:$PYTHONPATH
export PERL5LIB="$SRLPATH/lib:$PERL5LIB"
export PATH="$SRLPATH/bin:$PATH"

python $TAGGERPATH/tagger/bin/trainer.py \
  --model deepatt \
  --input $DATAPATH/conll05.train.txt \
  --output train \
  --vocabulary $DATAPATH/deep_srl/word_dict $DATAPATH/deep_srl/label_dict \
  --parameters="save_summary=false,feature_size=100,hidden_size=200,filter_size=800,"`
               `"residual_dropout=0.2,num_hidden_layers=10,attention_dropout=0.1,"`
               `"relu_dropout=0.1,batch_size=4096,optimizer=adadelta,initializer=orthogonal,"`
               `"initializer_gain=1.0,train_steps=600000,"`
               `"learning_rate_schedule=piecewise_constant_decay,"`
               `"learning_rate_values=[1.0,0.5,0.25],"`
               `"learning_rate_boundaries=[400000,500000],device_list=$DEVICE,"`
               `"clip_grad_norm=1.0,embedding=$EMBPATH/glove.6B.100d.txt,script=run.sh"

Decoding

The following is the command used to generate outputs:

#!/usr/bin/env bash
SRLPATH=/PATH/TO/SRLCONLL
TAGGERPATH=/PATH/TO/TAGGER
DATAPATH=/PATH/TO/DATA
EMBPATH=/PATH/TO/GLOVE_EMBEDDING
DEVICE=0

python $TAGGERPATH/tagger/bin/predictor.py \
  --input $DATAPATH/conll05.test.wsj.txt \
  --checkpoint train/best \
  --model deepatt \
  --vocab $DATAPATH/deep_srl/word_dict $DATAPATH/deep_srl/label_dict \
  --parameters=device=$DEVICE,embedding=$EMBPATH/glove.6B.100d.txt \
  --output tmp.txt
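
The predictions are written to tmp.txt, just as during validation. To score them you can reuse the conversion and evaluation steps from the validation script; the gold props file names below mirror that script's naming and are an assumption for the WSJ test set. Remember to export PERL5LIB and PATH as in the validation script so that srl-eval.pl can find its libraries.

# convert the raw predictions to CoNLL props format and score them with srl-eval.pl
python $TAGGERPATH/tagger/scripts/convert_to_conll.py tmp.txt \
    $DATAPATH/conll05.test.wsj.props.gold.txt output
perl $SRLPATH/bin/srl-eval.pl $DATAPATH/conll05.test.wsj.props.* output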

Benchmarks

We performed 4 runs on the CoNLL-2005 dataset. The results are shown below.

Runs   Dev-P  Dev-R  Dev-F1  WSJ-P  WSJ-R  WSJ-F1  BROWN-P  BROWN-R  BROWN-F1
Paper  82.6   83.6   83.1    84.5   85.2   84.8    73.5     74.6     74.1
Run0   82.9   83.7   83.3    84.6   85.0   84.8    73.5     74.0     73.8
Run1   82.3   83.4   82.9    84.4   85.3   84.8    72.5     73.9     73.2
Run2   82.7   83.6   83.2    84.8   85.4   85.1    73.2     73.9     73.6
Run3   82.3   83.6   82.9    84.3   84.9   84.6    72.3     73.6     72.9

Pretrained Models

The pretrained models of the TensorFlow implementation can be downloaded from Google Drive.

LICENSE

BSD

Citation

If you use our code, please cite our paper:

@inproceedings{tan2018deep,
  title = {Deep Semantic Role Labeling with Self-Attention},
  author = {Tan, Zhixing and Wang, Mingxuan and Xie, Jun and Chen, Yidong and Shi, Xiaodong},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year = {2018}
}

Contact

This code was written by Zhixing Tan. If you have any problems, feel free to send an email.

tagger's Issues

Preparing the validation script

SRLPATH=/PATH/TO/SRLCONLL
TAGGERPATH=/PATH/TO/TAGGER
DATAPATH=/PATH/TO/DATA

export PERL5LIB="$SRLPATH/lib:$PERL5LIB"
export PATH="$SRLPATH/bin:$PATH"

python $TAGGERPATH/main.py predict --data_path $DATAPATH/conll05.devel.txt \
  --model_dir train --model_name deepatt \
  --vocab_path $DATAPATH/word_dict $DATAPATH/label_dict \
  --device_list 0 \
  --decoding_params="decode_batch_size=512" \
  --model_params="num_hidden_layers=10,feature_size=100,hidden_size=200,filter_size=800"
python $TAGGERPATH/scripts/convert_to_conll.py conll05.devel.txt.deepatt.decodes $DATAPATH/conll05.devel.props.gold.txt output
perl $SRLPATH/bin/srl-eval.pl $DATAPATH/conll05.devel.props. output

What formats do the highlighted file names above (conll05.devel.txt, conll05.devel.txt.deepatt.decodes, conll05.devel.props.gold.txt, conll05.devel.props.) correspond to? They don't seem to be provided in the project. Could you share them? Thanks!

Input file format when doing decoding only

I skimmed the code, and each line seems to start with a number. What does it mean?
I tried

4 john wants to go

and got an array index out of bounds error.
Changing it to

1 john wants to go

works fine.

Could you describe the data format for decoding in more detail?

How to set train path ?

Hi, thanks for your great work! I am wondering how to set the TRAIN_PATH mentioned in the training command. I tried setting it to the path that contains the processed tf.Record files but got this error:

ValueError: No data files found in ./data/processed*train* .

I have checked another closed thread, but did not figure out the correct way to set the path. What is the intended parameter here? And just to make sure: the conll05.devel.props.gold.txt and conll05.devel.props.* files mentioned in the validation script are in the BIO tagging format, right?

A bug

First of all, thank you for sharing the code.

I found a problem during testing with the following two cases:
1 Wait ! ! Wait '' ! ! Cried the guard who ran from the hut to shout to other men standing about outside . ||| O B-V O O O O O O O O O O O O O O O O O O O O O O O 4 Wait ! ! Wait '' ! ! Cried the guard who ran from the hut to shout to other men standing about outside . ||| O O O O B-V O O O O O O O O O O O O O O O O O O O O

For example, when predicting on these two cases, prediction stops as soon as "!" is encountered, so the output is truncated and the perl validation script then reports an error. Replacing "!" with "." makes the problem go away.

What does features contain?

I want to see what the features passed in during training contain.
When I tried print(features), it raised an error.
Maybe my train.txt is not right,
so I am trying to figure out exactly which two kinds of information a feature contains.
A concrete answer would be a great help!

So weird training result

When I trained the model with these parameters, the result was fine. Normally, the F1 score reaches 0.81 or better.

glove.6B.100d.txt
feature_size=100
hidden_size=200
filter_size=800

But when I tried the following hyper-parameters, the F1 score was always below 0.1. Specifically, it was 0.042365085, and the model only predicted the B-V label, no other labels at all. It is so weird. What's the problem?

glove.6B.200d.txt
feature_size=200
hidden_size=400
filter_size=1600

Decoding seems to be reloading the model...?

I'm using the provided command to generate outputs, the one in the section under "decoding." I have a few questions:

  1. It continuously prints things like:
2018-05-14 16:51:35.363220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1306] Adding visible gpu devices: 0
INFO:tensorflow:Restoring parameters from models/conll05/single/checkpoint/model.ckpt-540432
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Graph was finalized.

Why doesn't it restore the parameters and figure out which GPUs are visible just once? Is the model being reloaded on every single iteration?

  2. Is there any way I can see how much of the input file has been decoded so far? It doesn't seem to write the output until the very end.

Thanks!

Question about the training script

python tagger/main.py train \
  --data_path TRAIN_PATH --model_dir train --model_name deepatt \
  --vocab_path word_dict label_dict --emb_path glove.6B.100d.txt \
  --model_params=feature_size=100,hidden_size=200,filter_size=800,residual_dropout=0.2,num_hidden_layers=10,attention_dropout=0.1,relu_dropout=0.1 \
  --training_params=batch_size=4096,eval_batch_size=1024,optimizer=Adadelta,initializer=orthogonal,use_global_initializer=false,initializer_gain=1.0,train_steps=600000,learning_rate_decay=piecewise_constant,learning_rate_values=[1.0,0.5,0.25],learning_rate_boundaries=[400000,500000],device_list=[0],clip_grad_norm=1.0 \
  --validation_params=script=run.sh

Which script is the run.sh passed as the last parameter? It doesn't seem to be in the repository. Thanks in advance.

A question about the FFN

In the paper, the FFN is FFN(x) = ReLU(xW1)W2. Why does the linear function used in _ffn_layer contain tf.nn.convolution? Only _linear_2d has no convolution function; all the others do.

How to run

How do I run it after making the changes?

losses_avg

In the loss you add:

with tf.variable_scope("losses_avg"):
    loss_moving_avg = tf.get_variable("training_loss",
                                      initializer=100.0,
                                      trainable=False)
    lm = loss_moving_avg.assign(loss_moving_avg * 0.9 + loss * 0.1)
    tf.summary.scalar("loss_avg/total_loss", lm)

    with tf.control_dependencies([lm]):
        loss = tf.identity(loss)

Why do you add a losses_avg?

Dimension mismatch

During initialization, running raises this error:
InvalidArgumentError (see above for traceback): logits and labels must be same size: logits_size=[4080,105] labels_size=[3978,105]
[[Node: tagger/softmax_cross_entropy_with_logits_sg = SoftmaxCrossEntropyWithLogits[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tagger/softmax_cross_entropy_with_logits_sg/Reshape, tagger/softmax_cross_entropy_with_logits_sg/Reshape_1)]]
[[Node: training/global_norm_2/global_norm/_2135 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_9328_training/global_norm_2/global_norm", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Dataset: CoNLL-2005
Embeddings: GloVe 200d

About the data

Would it be possible to share the CoNLL-2005 and CoNLL-2012 data used in the paper?
Thanks in advance.

Missing embed.npy file

When running this project, a pretrained word-embedding file, embed.npy, is missing. Could you provide it, or explain how it was produced and in what format it should be saved? Thanks.

Error during running the pretrained model

Hello, I encountered an error when trying to run prediction with the provided pre-trained models.

I used the following command:

$ python main.py predict --data_path ./data/preprocessed.txt --model_dir pre-trained/conll05/ensemble/run0 --model_name deepatt --vocab_path ./pre-trained/conll05/ensemble/dict/word_dict0 ./pre-trained/conll05/ensemble/dict/label_dict --device_list 0 --emb_path ./data/glove/glove.6B.100d.txt

and got this error information:

Traceback (most recent call last):
File "main.py", line 895, in
predict(parsed_args)
File "main.py", line 616, in predict
as_iterable=True)
File "/Users/xiaotang/Documents/SRL/Tagger/venv/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/Users/xiaotang/Documents/SRL/Tagger/venv/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 670, in predict
iterate_batches=iterate_batches)
File "/Users/xiaotang/Documents/SRL/Tagger/venv/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 967, in _infer_model
features = self._get_features_from_input_fn(input_fn)
File "/Users/xiaotang/Documents/SRL/Tagger/venv/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 947, in _get_features_from_input_fn
result = input_fn()
File "/Users/xiaotang/Documents/SRL/Tagger/data/plain_text.py", line 43, in _decode_batch_input_fn
outputs = preprocess_fn(inputs)
File "/Users/xiaotang/Documents/SRL/Tagger/data/plain_text.py", line 175, in
lambda x: convert_text(x, vocab, params))
File "/Users/xiaotang/Documents/SRL/Tagger/data/plain_text.py", line 158, in convert_text
emb[i] = params.embedding[word]
ValueError: could not broadcast input array from shape (100) into shape (128)

Is there something wrong with my command? How can I run prediction with the pretrained models?

Checkpoint averaging

For the evaluation metrics you report in the paper, do you use any checkpoint averaging (Tagger/scripts/avg_checkpoints.py)? Thanks.

How to configure early stopping?

The readme says to point to the train/best directory for decoding a trained model, but the code doesn't seem to be saving any models to that directory. It is saving models to the train directory, which I can successfully evaluate. How can I configure training to save the best model during training?

Here is the command I am running to train:

python main.py train \
    --data_path data2/ \
    --model_dir train \
    --model_name deepatt \
    --vocab_path data/word_dict data/label_dict \
    --emb_path data/glove.6B.100d.txt \
    --model_params=feature_size=100,hidden_size=200,filter_size=800,residual_dropout=0.2,num_hidden_layers=10,attention_dropout=0.1,relu_dropout=0.1 \
    --training_params=batch_size=4096,eval_batch_size=1024,optimizer=Adadelta,initializer=orthogonal,use_global_initializer=false,initializer_gain=1.0,train_steps=600000,learning_rate_decay=piecewise_constant,learning_rate_values=[1.0,0.5,0.25],learning_rate_boundaries=[400000,500000],device_list=[0],clip_grad_norm=1.0 \
    --validation_params=script=run.sh
