liyuanlucasliu / lm-lstm-crf

Empower Sequence Labeling with Task-Aware Language Model

Home Page: http://arxiv.org/abs/1709.04109

License: Apache License 2.0

Python 100.00%
ner language-model crf sequence-labeling pytorch

lm-lstm-crf's Introduction

LM-LSTM-CRF


Check Our New NER Toolkit🚀🚀🚀

  • Inference:
    • LightNER: inference w. models pre-trained / trained w. any following tools, efficiently.
  • Training:
    • LD-Net: train NER models w. efficient contextualized representations.
    • VanillaNER: train vanilla NER models w. pre-trained embedding.
  • Distant Training:
    • AutoNER: train NER models w.o. line-by-line annotations and get competitive performance.

This project provides high-performance character-aware sequence labeling tools, including Training, Evaluation and Prediction.

Details about LM-LSTM-CRF can be accessed here, and the implementation is based on the PyTorch library.

Important: A serious bug was found in the bioes_to_span function of the original implementation. Please refer to the numbers reported in the Benchmarks section below as the accurate performance.

The documentation is available here.


Model Notes

As visualized above, we use a conditional random field (CRF) to capture label dependencies and adopt a hierarchical LSTM to leverage both character-level and word-level inputs. The character-level structure is further guided by a language model, while pre-trained word embeddings are leveraged at the word level. The language model and the sequence labeling model are trained at the same time, and both make predictions at the word level. Highway networks are used to transform the output of the character-level LSTM into different semantic spaces, thus mediating between these two tasks and allowing the language model to empower sequence labeling.
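
To make the data flow concrete, below is a minimal, self-contained PyTorch sketch of how these pieces could fit together. It only illustrates the description above; it is not the repository's actual model code (see model/lm_lstm_crf.py for that), and the class names, default dimensions, and the boundary-gathering helper are all assumptions.

import torch
import torch.nn as nn

class Highway(nn.Module):
    # single highway layer: y = g * relu(W1 x) + (1 - g) * x
    def __init__(self, dim):
        super().__init__()
        self.transform = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)

    def forward(self, x):
        g = torch.sigmoid(self.gate(x))
        return g * torch.relu(self.transform(x)) + (1 - g) * x

class SketchLMLSTMCRF(nn.Module):
    def __init__(self, char_vocab, word_vocab, tagset_size,
                 char_dim=30, char_hidden=300, word_dim=100, word_hidden=300):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        # forward / backward character-level LSTMs; each also feeds a
        # language-model head, which is what "guides" the char-level structure
        self.char_fwd = nn.LSTM(char_dim, char_hidden, batch_first=True)
        self.char_bwd = nn.LSTM(char_dim, char_hidden, batch_first=True)
        self.lm_fwd = nn.Linear(char_hidden, word_vocab)   # predicts the next word
        self.lm_bwd = nn.Linear(char_hidden, word_vocab)   # predicts the previous word
        # highway layer maps the char-LSTM outputs into the labeling space
        self.highway = Highway(2 * char_hidden)
        self.word_emb = nn.Embedding(word_vocab, word_dim)
        self.word_lstm = nn.LSTM(word_dim + 2 * char_hidden, word_hidden,
                                 bidirectional=True, batch_first=True)
        # emission scores for every (previous tag, current tag) pair, matching
        # the tagset_size * tagset_size projection used by the CRF layer
        self.hidden2tag = nn.Linear(2 * word_hidden, tagset_size * tagset_size)

    def forward(self, fwd_chars, fwd_pos, bwd_chars, bwd_pos, words):
        # fwd_chars / bwd_chars: (batch, n_chars) character ids read in the two
        # directions; fwd_pos / bwd_pos: (batch, seq_len) indices of the char
        # hidden states at each word boundary; words: (batch, seq_len) word ids
        f_out, _ = self.char_fwd(self.char_emb(fwd_chars))   # (B, n_chars, H)
        b_out, _ = self.char_bwd(self.char_emb(bwd_chars))
        pick = lambda out, pos: out.gather(
            1, pos.unsqueeze(-1).expand(-1, -1, out.size(-1)))
        f_word = pick(f_out, fwd_pos)                        # (B, seq_len, H)
        b_word = pick(b_out, bwd_pos)
        lm_fwd_logits = self.lm_fwd(f_word)                  # word-level LM predictions
        lm_bwd_logits = self.lm_bwd(b_word)
        char_repr = self.highway(torch.cat([f_word, b_word], dim=-1))
        word_in = torch.cat([self.word_emb(words), char_repr], dim=-1)
        lstm_out, _ = self.word_lstm(word_in)
        crf_scores = self.hidden2tag(lstm_out)               # (B, seq_len, T*T)
        return crf_scores, lm_fwd_logits, lm_bwd_logits

During training, a CRF loss on crf_scores and cross-entropy losses on the two language-model outputs would be combined (the repository weights the LM term with lambda0), which is what the co_train option below refers to.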

Installation

For training, a GPU is strongly recommended for speed. CPU is supported but training could be extremely slow.

PyTorch

The code is based on PyTorch and now supports PyTorch 0.4. You can find installation instructions here.

Dependencies

The code is written in Python 3.6. Its dependencies are summarized in the file requirements.txt. You can install these dependencies like this:

pip3 install -r requirements.txt

Data

We mainly focus on the CoNLL 2003 NER dataset, and the code takes its original format as input. However, due to license restrictions, we cannot distribute this dataset. You should be able to obtain it here. You may also want to search online (e.g., on GitHub), as someone might have released it accidentally.

Format

We assume the corpus is formatted the same as the CoNLL 2003 NER dataset. More specifically, empty lines are used as separators between sentences, and the separator between documents is a special line as below.

-DOCSTART- -X- -X- -X- O

Other lines contain words, labels, and other fields. The word must be the first field, the label must be the last, and these fields are separated by spaces. For example, the first several lines of the WSJ portion of the PTB POS tagging corpus should look like the following snippet.

-DOCSTART- -X- -X- -X- O

Pierre NNP
Vinken NNP
, ,
61 CD
years NNS
old JJ
, ,
will MD
join VB
the DT
board NN
as IN
a DT
nonexecutive JJ
director NN
Nov. NNP
29 CD
. .
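
For reference, a minimal reader for this format might look like the sketch below. It is a hypothetical helper for illustration only, not a function from this repository: blank lines close a sentence, -DOCSTART- lines separate documents, the first whitespace-separated field is the word, and the last is the label.

def read_conll(path):
    # returns a list of (words, labels) pairs, one per sentence
    sentences, words, labels = [], [], []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('-DOCSTART-'):
                if words:
                    sentences.append((words, labels))
                    words, labels = [], []
                continue
            fields = line.split()
            words.append(fields[0])
            labels.append(fields[-1])
    if words:
        sentences.append((words, labels))
    return sentences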


Usage

Here we provide implementations for two models: LM-LSTM-CRF and its variant LSTM-CRF, which contains only the word-level structure and the CRF. train_wc.py and eval_wc.py are the scripts for LM-LSTM-CRF, while train_w.py and eval_w.py are the scripts for LSTM-CRF. The usage of each script can be accessed with the -h parameter, i.e.,

python train_wc.py -h
python train_w.py -h
python eval_wc.py -h
python eval_w.py -h

The default running commands for NER, POS tagging, and NP chunking are:

  • Named Entity Recognition (NER):
python train_wc.py --train_file ./data/ner/train.txt --dev_file ./data/ner/testa.txt --test_file ./data/ner/testb.txt --checkpoint ./checkpoint/ner_ --caseless --fine_tune --high_way --co_train --least_iters 100
  • Part-of-Speech (POS) Tagging:
python train_wc.py --train_file ./data/pos/train.txt --dev_file ./data/pos/testa.txt --test_file ./data/pos/testb.txt --eva_matrix a --checkpoint ./checkpoint/pos_ --caseless --fine_tune --high_way --co_train
  • Noun Phrase (NP) Chunking:
python train_wc.py --train_file ./data/np/train.txt.iobes --dev_file ./data/np/testa.txt.iobes --test_file ./data/np/testb.txt.iobes --checkpoint ./checkpoint/np_ --caseless --fine_tune --high_way --co_train --least_iters 100

For other datasets or tasks, you may want to try different stopping parameters; in particular, for smaller datasets you may want to set least_iters to a larger value, and for some tasks, if the loss decreases too slowly, you may want to increase lr.
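
For instance, on a small corpus one might raise least_iters and the learning rate; the file paths and values below are purely illustrative:

python train_wc.py --train_file ./data/small/train.txt --dev_file ./data/small/dev.txt --test_file ./data/small/test.txt --checkpoint ./checkpoint/small_ --caseless --fine_tune --high_way --co_train --least_iters 200 --lr 0.02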

Benchmarks

Here we compare LM-LSTM-CRF with recent state-of-the-art models on the CoNLL 2000 Chunking dataset, the CoNLL 2003 NER dataset, and the WSJ portion of the PTB POS Tagging dataset. All experiments are conducted on a GTX 1080 GPU.

A serious bug was found in the bioes_to_span function of the original implementation. Please refer to the following numbers as the accurate performance.

NER

When models are trained only on the CoNLL 2003 English NER dataset, the results are summarized below.

Model Max(F1) Mean(F1) Std(F1) Time(h)
LM-LSTM-CRF 91.35 91.24 0.12 4
-- HighWay 90.87 90.79 0.07 4
-- Co-Train 91.23 90.95 0.34 2

POS

When models are trained only on the WSJ portion of the PTB POS tagging dataset, the results are summarized below.

Model Max(Acc) Mean(Acc) Std(Acc) Reported(Acc) Time(h)
Lample et al. 2016 97.51 97.35 0.09 N/A 37
Ma et al. 2016 97.46 97.42 0.04 97.55 21
LM-LSTM-CRF 97.59 97.53 0.03 N/A 16

Pretrained Model

Evaluation

We have released pre-trained models for these three tasks. The checkpoint files can be downloaded at the following links. Note that the NER model and the chunking model (coming soon) are trained on both the training set and the development set:

WSJ-PTB POS Tagging: Args, Model
CoNLL03 NER: Args, Model

Also, eval_wc.py is provided to load and run these checkpoints. Its usage can be accessed with the command python eval_wc.py -h, and an example command is provided below:

python eval_wc.py --load_arg checkpoint/ner/ner_4_cwlm_lstm_crf.json --load_check_point checkpoint/ner_ner_4_cwlm_lstm_crf.model --gpu 0 --dev_file ./data/ner/testa.txt --test_file ./data/ner/testb.txt

Prediction

To annotate raw text, seq_wc.py is provided. Its usage can be accessed with the command python seq_wc.py -h, and an example command is provided below:

python seq_wc.py --load_arg checkpoint/ner/ner_4_cwlm_lstm_crf.json --load_check_point checkpoint/ner_ner_4_cwlm_lstm_crf.model --gpu 0 --input_file ./data/ner2003/test.txt --output_file output.txt

The input format is similar to CoNLL, but each line is required to contain only one field, the token. For example, an input file could be:

-DOCSTART-

But
China
saw
their
luck
desert
them
in
the
second
match
of
the
group
,
crashing
to
a
surprise
2-0
defeat
to
newcomers
Uzbekistan
.

and the corresponding output is:

-DOCSTART- -DOCSTART- -DOCSTART-

But <LOC> China </LOC> saw their luck desert them in the second match of the group , crashing to a surprise 2-0 defeat to newcomers <LOC> Uzbekistan </LOC> . 

Reference

@inproceedings{2017arXiv170904109L,
  title = {Empower Sequence Labeling with Task-Aware Neural Language Model},
  author = {{Liu}, L. and {Shang}, J. and {Xu}, F. and {Ren}, X. and {Gui}, H. and {Peng}, J. and {Han}, J.},
  booktitle = {AAAI},
  year = {2018},
}

lm-lstm-crf's People

Contributors

aaronzira, cabishop, cash, chongzhe, dependabot[bot], frankxu2004, liyuanlucasliu, napsternxg


lm-lstm-crf's Issues

Asking about CUDA OOM errors

I ran the code on Chinese NER training data (around 70 thousand sentences, with LM-LSTM-CRF set to co-train mode), and I got an OOM error:

When I set the batch_size to 10, it results in:

Tot it 6916 (epoch 0): 6308it [26:09, 4.02it/s]THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "train_wc.py", line 243, in <module>
    loss.backward()
  File "/usr/local/lib/python3.5/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/usr/local/lib/python3.5/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58

When I set the batch_size to 128, it results in:

Tot it 543 (epoch 0): 455it [03:57, 1.91it/s]THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory

Traceback (most recent call last):
  File "train_wc.py", line 241, in <module>
    loss = loss + args.lambda0 * crit_lm(cbs, cf_y.view(-1))
  File "/usr/local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/torch/nn/modules/loss.py", line 601, in forward
    self.ignore_index, self.reduce)
  File "/usr/local/lib/python3.5/site-packages/torch/nn/functional.py", line 1140, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, size_average, ignore_index, reduce)
  File "/usr/local/lib/python3.5/site-packages/torch/nn/functional.py", line 786, in log_softmax
    return torch._C._nn.log_softmax(input, dim)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58

Could anyone give me some advice on how to solve this?

Not able to understand this specific part of the code

buckets[idx][1].append([label[ind] * label_size + label[ind + 1] for ind in range(0, cur_len)] + [
            label[cur_len] * label_size + pad_label] + [pad_label * label_size + pad_label] * (
                                   thresholds[idx] - cur_len_1))

There is a part of the code where targets are formed using the above expression in construct_bucket_vb.

What is happening in this part of the code?
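
For what it's worth, here is a small worked example of what that expression seems to compute (my own reading of construct_bucket_vb, not an authoritative answer): each target entry packs a (label[i], label[i+1]) transition into a single flat index, the last real label is paired with the padding label, and the remainder of the bucket is filled with (pad, pad) pairs.

# illustrative values only; cur_len is the number of real transitions and
# bucket_len plays the role of thresholds[idx] in the original code
label, label_size, pad_label, bucket_len = [2, 0, 3], 5, 4, 6
cur_len = len(label) - 1
target = ([label[i] * label_size + label[i + 1] for i in range(cur_len)]
          + [label[cur_len] * label_size + pad_label]
          + [pad_label * label_size + pad_label] * (bucket_len - len(label)))
print(target)  # [10, 3, 19, 24, 24, 24]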

RuntimeError: invalid argument 1: input is not contiguous

When training LM_LSTM_CRF model, I encountered the following error:

Traceback (most recent call last):         
  File "train_wc.py", line 190, in <module>
    loss = crit_ner(scores, tg_v, mask_v)
  File "/local/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/local/LM-LSTM-CRF/model/crf.py", line 306, in forward
    mask[idx].view(bat_size, 1).expand(bat_size, self.tagset_size)).view(bat_size, -1)
  File "/local/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 510, in view
    return View.apply(self, sizes)
  File "/local/anaconda3/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 96, in forward
    result = i.view(*sizes)
RuntimeError: invalid argument 1: input is not contiguous at /opt/conda/conda-bld/pytorch_1503965122592/work/torch/lib/TH/generic/THTensor.c:231

I solved this problem by calling contiguous() to make the mentioned input contiguous. Maybe this needs a minor fix.
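
For reference, a sketch of that fix applied to the expression quoted in the traceback (mask_idx is just an illustrative name for the result; the exact surrounding call in crf.py is omitted):

# expand() returns a non-contiguous view, so make it contiguous before view()
mask_idx = (mask[idx].view(bat_size, 1)
                     .expand(bat_size, self.tagset_size)
                     .contiguous()
                     .view(bat_size, -1))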

Assertion `t >= 0 && t < n_classes` failed

Hi, I was trying to run train_wc.py on our own dataset in the CoNLL format specified.

But I encountered this error:

/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THCUNN/ClassNLLCriterion.cu:101: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [2,0,0] Assertion t >= 0 && t < n_classes failed.

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/generated/../THCReduceAll.cuh line=339 error=59 : device-side assert triggered

Traceback (most recent call last):
File "train_wc.py", line 204, in <module>
loss.backward()
File "/home/rudra/miniconda3/envs/tensorflow/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/rudra/miniconda3/envs/tensorflow/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/generated/../THCReduceAll.cuh:339

The ClassNLLCriterion is getting values out of the label range. In the file utils.py, the function construct_bucket_vb_wc creates a data label tensor, and I'm not sure what is happening in that function.
Up to that function, the label range seems to be correct.

evaluator.calc_score

The evaluator.calc_score function can return two types of objects: dictionary and tuple.
This can raise an error at train_wc.py line 222.

LM-LSTM-CRF training performance

Hello, thanks for your good work.

But when I use your train_wc.py code to train your model on the CoNLL 2003 NER dataset, I only get a test F1 score of 91.14, which differs from the result in your paper. The hyperparameters are the same as your suggestions.
So, why?
Maybe I have not taken something into consideration.

Thanks for your help.

How do you tune the model to get a larger number of keywords output by the CRF layer?

I read through the paper and looked through the code in train_wc as well as the arguments that can be passed during initialization. One of the issues I am facing is that after fine-tuning the model on my own dataset, the number of keywords that are output varies significantly.

Some texts have no keywords but still contain entities that should be found. Other texts get between 5 and 10 keywords. I am not trying to tune the maximum number of keywords, because I believe that filtering can be done in post-processing using the confidence scores.

I am interested in knowing whether there is a way to tune the minimum number of keywords found, or to lower the score threshold so that more keywords are found in general.

About batch training

Hello,
I have a question: why is there no pack_padded_sequence call before word_lstm?
I would really appreciate it if you could tell me the reason.
Thank you!

About Xuezhe's result

Hello,
I replicated the LSTM-CNN-CRF model, and my best result is 91.23, which is close to Xuezhe's reported result.
I wonder why, in your paper, the mean result is better than Xuezhe's reported result for the LSTM-CNN-CRF model.
Is it because you modified Xuezhe's code, or something else?

Thank you very much if you can tell me about this.

train_w.py Error, TypeError: can't convert np.ndarray of type numpy.object_

Hi Liyuan, I encountered the following problem while running train_w.py:
Traceback (most recent call last):
  File "/workspace/train.py", line 117, in <module>
    shrink_to_corpus=args.shrink_embedding)
  File "/workspace/models/utils.py", line 457, in load_embedding_wlm
    embedding_tensor_1 = torch.FloatTensor(pretrained_weight)
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: double, float, float16, int64, int32, and uint8.

And, at line 457, my code is as follows:
(1)
if not shrink_to_corpus:
    pretrained_weight = np.asarray(outdoc_embedding_array)
    embedding_tensor_1 = torch.from_numpy(pretrained_weight)
    word_emb_len = embedding_tensor_0.size(1)
    assert(word_emb_len == emb_len)

(2)
if not shrink_to_corpus:
    embedding_tensor_1 = torch.Tensor(np.asarray(outdoc_embedding_array))
    word_emb_len = embedding_tensor_0.size(1)
    assert(word_emb_len == emb_len)

Lample and Xuezhe

Hi,
Using the code provided in your repository, I cannot replicate the results of Lample's and Xuezhe's studies. Is there anything I need to pay attention to specifically?

RuntimeError: dimension specified as 0 but tensor has no dimensions (backward)

Hi,
I was trying to run train_w.py on the CoNLL 2003 NER data using this command: python train_w.py --train_file ner-conll2003/train --dev_file ner-conll2003/dev --test_file ner-conll2003/test --checkpoint ./checkpoint/ner_ --caseless --fine_tune --emb_file Glove5g_200.txt --embedding_dim 200 --gpu 1

But I got this error:

Traceback (most recent call last):
  File "train_w.py", line 189, in <module>
    loss.backward()
  File "../py3/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File ".../py3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: dimension specified as 0 but tensor has no dimensions

Do you have any idea how I can fix this problem, please?
Thank you

Incorrect Precision Output for test_rec

In train_wc.py line 224, test_rec and test_pre are switched in the assignment statement. This leads to the final precision and recall being swapped when the model finishes training and outputs the final results. The order should be the same as on line 215 for the dev set results.
line 224: (test_f1, test_rec, test_pre, test_acc, msg) = test_result['total']
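
A sketch of the suggested fix, assuming the dev-set unpacking on line 215 uses the order (f1, precision, recall, accuracy, message):

# swap the two middle names so precision and recall land in the right variables
(test_f1, test_pre, test_rec, test_acc, msg) = test_result['total']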

Question about parameters in the code

I want to ask a question about the shrink_embedding parameter in train_w.py. If I don't use external resources, can this parameter be set to False without affecting the final result? I also can't fully understand the meaning of the other parameter, fine_tune; can you explain it to me? Thank you very much!

if args.fine_tune:              # which means does not do fine-tune  
        f_map = {'<eof>': 0}

I think this code may be wrong: if I choose fine_tune=True, this code is executed, and then the pre-trained embedding dictionary can't be fine-tuned.

train_w.py Error

train_wc.py has been upgraded; however, train_w.py hasn't been fully upgraded yet. Changing lines 201 to 233 of train_w.py to the corresponding part of train_wc.py seems to work.

RuntimeError: expand(torch.LongTensor{[50, 1]}, size=[50])

After running the first epoch (successfully), I got the following error message and the program was interrupted:

/Users/shaun/anaconda3/lib/python3.6/site-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.55 and num_layers=1
"num_layers={}".format(dropout, num_layers))
/Users/shaun/Desktop/UniMelb/Semester_Unimelb/Semester3/Project/LM-LSTM-CRF/model/utils.py:805: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
nn.init.uniform(input_linear.weight, -bias, bias)
/Users/shaun/Desktop/UniMelb/Semester_Unimelb/Semester3/Project/LM-LSTM-CRF/model/utils.py:816: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
nn.init.uniform(weight, -bias, bias)
/Users/shaun/Desktop/UniMelb/Semester_Unimelb/Semester3/Project/LM-LSTM-CRF/model/utils.py:819: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
nn.init.uniform(weight, -bias, bias)

Tot it 2195 (epoch 0): 0it [00:00, ?it/s]train_wc.py:201: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
  nn.utils.clip_grad_norm(ner_model.parameters(), args.clip_grad)
Traceback (most recent call last):
  File "train_wc.py", line 212, in <module>
    dev_f1, dev_pre, dev_rec, dev_acc = evaluator.calc_score(ner_model, dev_dataset_loader)
  File "/Users/shaun/Desktop/UniMelb/Semester_Unimelb/Semester3/Project/LM-LSTM-CRF/model/evaluator.py", line 209, in calc_score
    decoded = self.decoder.decode(scores.data, mask_v.data)
  File "/Users/shaun/Desktop/UniMelb/Semester_Unimelb/Semester3/Project/LM-LSTM-CRF/model/crf.py", line 379, in decode
    decode_idx[idx] = pointer
RuntimeError: expand(torch.LongTensor{[50, 1]}, size=[50]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)

Does anyone know how to resolve this?

Best regards

Is the word-level bi-LSTM reflected in the code?

Hi there,

Excuse the naivety; I am having a bit of trouble understanding how the word-level bi-directional LSTM (before the CRF layer) in the paper is reflected in the code. To be specific, see the arrows in the image below.

(image omitted)

Why is there only one word_lstm in lm_lstm_crf.py?

I see that in train_wc.py, if co_train is enabled, a bi-directional word-level LSTM is applied, but this is after the CRF layer? Shouldn't it be before the concatenated output is passed to the CRF, as shown in the model architecture in the paper?
Maybe I am missing something really big & obvious :/

Thanks,

Missing "eval_batch" in train_w.py line 163

The model is very useful, and I want to reload a model to continue training, but I get an error at train_w.py line 163. I searched the whole project for the function "eval_batch" and only found it in model/evaluator, but that is the wrong file.
Did you forget to write this function, or is something else going on?
How can I resolve this problem?

Line 530 in utils.py is too slow with huge datasets

Line 530 in the construct_bucket_vb_wc function in utils.py is too slow with huge datasets. It even freezes if the dataset is larger than 300k objects.

I propose to change line

forw_corpus = [pad_char_feature] + list(reduce(lambda x, y: x + [pad_char_feature] + y, forw_features)) + [pad_char_feature]

to

forw_corpus = [pad_char_feature]
for forw_feature in forw_features:
    forw_corpus.extend(forw_feature + [pad_char_feature])

which works considerably faster, with no freezes.

train_w.py Error

Hi Liyuan :)
I get an error when I run train_w.py.
On PyTorch 0.3.0:
Traceback (most recent call last):
  File "train_w.py", line 193, in <module>
    loss.backward()
  File "/home/dungdx4/anaconda2/envs/python_anaconda_3/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/dungdx4/anaconda2/envs/python_anaconda_3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: dimension specified as 0 but tensor has no dimensions

On PyTorch 0.4:
Traceback (most recent call last):
  File "train_w.py", line 189, in <module>
    fea_v, tg_v, mask_v = packer.repack_vb(feature, tg, mask)
  File "/home/dungdx4/LM-LSTM-CRF-master-0.4/model/crf.py", line 107, in repack_vb
    fea_v = torch.Tensor(feature.transpose(0, 1)).cuda()
TypeError: expected torch.FloatTensor (got torch.LongTensor)

RuntimeError (matrix and matrix expected) while training

Hello, thanks for making this tool available.

I am using an IBM PowerPC machine with Ubuntu 16.03, and I am getting an error while trying to train a POS tagging model. Here is the command I am running:
python train_wc.py --train_file cmc_test_twitter.txt --dev_file cmc_test_twitter.txt --test_file cmc_test_twitter.txt --eva_matrix a --checkpoint ./checkpoint/pos_ --lr 0.015 --caseless --fine_tune --high_way --co_train

And here is the output:

train
setting:
Namespace(batch_size=10, caseless=True, char_dim=30, char_hidden=300, char_layers=1, checkpoint='./checkpoint/pos_', clip_grad=5.0, co_train=True, dev_file='empirist_gold_cmc/tagged/cmc_test_twitter.txt', drop_out=0.5, emb_file='./embedding/glove.6B.100d.txt', epoch=200, eva_matrix='a', fine_tune=False, gpu=0, high_way=True, highway_layers=1, lambda0=1, least_iters=50, load_check_point='', load_opt=False, lr=0.015, lr_decay=0.05, mini_count=5, momentum=0.9, patience=15, rand_embedding=False, shrink_embedding=False, small_crf=True, start_epoch=0, test_file='empirist_gold_cmc/tagged/cmc_test_twitter.txt', train_file='empirist_gold_cmc/tagged/cmc_test_twitter.txt', unk='unk', update='sgd', word_dim=100, word_hidden=300, word_layers=1)
loading corpus
constructing coding table
feature size: '44'
loading embedding
embedding size: '400005'
constructing dataset
building model
device: 0
Traceback (most recent call last):       
  File "train_wc.py", line 188, in <module>
    scores = ner_model(f_f, f_p, b_f, b_p, w_f)
  File "/home/eray/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/eray/projects/LM-LSTM-CRF/model/lm_lstm_crf.py", line 235, in forward
    char_out = self.fb2char(fb_lstm_out)
  File "/home/eray/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/eray/projects/LM-LSTM-CRF/model/highway.py", line 53, in forward
    g = nn.functional.sigmoid(self.gate[0](x))
  File "/home/eray/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/eray/anaconda3/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 54, in forward
    return self._backend.Linear.apply(input, self.weight, self.bias)
  File "/home/eray/anaconda3/lib/python3.5/site-packages/torch/nn/_functions/linear.py", line 12, in forward
    output.addmm_(0, 1, input, weight.t())
RuntimeError: matrix and matrix expected at /home/eray/pytorch/torch/lib/THC/generic/THCTensorMathBlas.cu:237

Scores, Batching and learning rate

Hey guys,
Thanks for releasing your code; I am drawing a lot of inspiration from it while building a custom NER model.
I have an open question about the scores, batching and the learning rate.

I did some experimentation with different batch sizes/learning rates on my own data (even with another NER model based on a CRF) and I found that:

Increasing the batch size (and averaging the scores) without increasing the learning rate makes the model learn really poorly / slowly. On the other hand, when I sum the scores, the model converges as expected. Say, going from batch size 1 to 16 with the learning rate at 0.005.

Increasing the batch size (and averaging the scores) and increasing the learning rate makes the model converge as expected. Say, going from batch size 1 to 16 and the learning rate from 0.005 to 0.01.

It seems to me that averaging the scores of the sentences within a batch makes the model learn a lot more slowly. I want to know whether you did the same experimentation with another dataset and got the same behaviour that I see here with the model. I'm also curious to know what the intuition behind this behaviour is.

Many thanks,

Nicolas

Fixing pre-trained embeddings fails with requires_grad=False

Hi,
I am testing whether I can fix the embedding weights by setting requires_grad=False.
Here is my code:
self.word_embeds.weight = nn.Parameter(pre_word_embeddings, requires_grad=False)

But I got:
Traceback (most recent call last):
  File "train_wc.py", line 202, in <module>
    optimizer = optim.SGD(ner_model.parameters(), lr=args.lr, momentum=args.momentum)
  File "/anaconda3/lib/python3.6/site-packages/torch/optim/sgd.py", line 56, in __init__
    super(SGD, self).__init__(params, defaults)
  File "/anaconda3/lib/python3.6/site-packages/torch/optim/optimizer.py", line 61, in __init__
    raise ValueError("optimizing a parameter that doesn't "
ValueError: optimizing a parameter that doesn't require gradients
I think it's because the embedding was initialized as a parameter, and all parameters are optimized as a whole. How can I fix the embedding without conflicting with the optimizer? Thanks!
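
One common workaround, sketched here under the assumption that the frozen embedding is the only parameter with requires_grad=False, is to hand the optimizer only the trainable parameters:

import torch.optim as optim

# keep only parameters that still require gradients, so the frozen embedding
# no longer triggers the ValueError raised inside the optimizer
trainable = [p for p in ner_model.parameters() if p.requires_grad]
optimizer = optim.SGD(trainable, lr=args.lr, momentum=args.momentum)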

Question about POS performance

Hi,

I am trying to do some sequence tagging tasks.

For POS, did you ever run your model on the Universal Dependencies English POS dataset?
I want to use it as my baseline.

I used your code to test on that dataset, yet the result is 95.60%, which is lower than the result (95.80) of the vanilla LSTM-CNN-CRF model.

I think contextual embeddings are supposed to improve the accuracy.

RuntimeError:

RuntimeError: The expanded size of the tensor (300) must match the existing size (100) at non-singleton dimension 0. Target sizes: [300]. Tensor sizes: [100]

rand_init_hidden() for lstm

Hi Liyuan,

I am writing my own LSTM-CRF model for NER. My model works well except for a slight memory leak (in CPU memory).
After debugging, I found the memory leak is caused by the LSTM layer. I suspect it's caused by the hidden state initialization (although I followed the tutorial).

I checked your code in train_w.py and lstm_crf.py and found that you defined the rand_init_hidden() function to initialize the hidden state of the LSTM, but I can't find the place where you use this function. Maybe I missed some important part.
Could you please tell me how you initialize the hidden state of the LSTM?

Thank you very much!

RuntimeError: expand(torch.LongTensor{[50, 1]}, size=[50]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)

I was trying to run the program by
python3 train_wc.py --gpu -1 --train_file ./data/ner/train.txt --dev_file ./data/ner/testa.txt --test_file ./data/ner/testb.txt --checkpoint ./checkpoint/ner_ --caseless --fine_tune --high_way --co_train --least_iters 100
I got the following error:

embedding size: '400060'
constructing dataset
building model
/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.55 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
/home/yankai/weixiao/LM-LSTM-CRF/model/utils.py:805: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
  nn.init.uniform(input_linear.weight, -bias, bias)
/home/yankai/weixiao/LM-LSTM-CRF/model/utils.py:816: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
  nn.init.uniform(weight, -bias, bias)
/home/yankai/weixiao/LM-LSTM-CRF/model/utils.py:819: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
  nn.init.uniform(weight, -bias, bias)

Tot it 1406 (epoch 0): 0it [00:00, ?it/s]train_wc.py:201: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
  nn.utils.clip_grad_norm(ner_model.parameters(), args.clip_grad)
Traceback (most recent call last):
  File "train_wc.py", line 212, in <module>
    dev_f1, dev_pre, dev_rec, dev_acc = evaluator.calc_score(ner_model, dev_dataset_loader)
  File "/home/yankai/weixiao/LM-LSTM-CRF/model/evaluator.py", line 209, in calc_score
    decoded = self.decoder.decode(scores.data, mask_v.data)
  File "/home/yankai/weixiao/LM-LSTM-CRF/model/crf.py", line 379, in decode
    decode_idx[idx] = pointer
RuntimeError: expand(torch.LongTensor{[50, 1]}, size=[50]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)

WSJ corpus preprocessing

Hi, I have the treebank_3\tagged\pos\wsj corpus. But after I process this corpus into CoNLL format, I get 37544, 5642 and 6540 sentences for train, dev and test, which is not consistent with your paper. I wonder what you did to preprocess the WSJ corpus.
Thank you!

OMG! RuntimeError: $ Torch: not enough memory: you tried to allocate 3421GB. Buy new RAM!?

I was trying to run the program by
python train_wc.py --train_file ./data/eng.train --dev_file ./data/eng.testa --test_file ./data/eng.testb --checkpoint ./checkpoint/ner_ --caseless --fine_tune --high_way --co_train --least_iters 100 > record.txt

I got the following error:

Traceback (most recent call last):
  File "train_wc.py", line 136, in <module>
    ner_model = LM_LSTM_CRF(len(l_map), len(c_map), args.char_dim, args.char_hidden, args.char_layers, args.word_dim, args.word_hidden, args.word_layers, len(f_map), args.drop_out, large_CRF=args.small_crf, if_highway=args.high_way, in_doc_words=in_doc_words, highway_layers = args.highway_layers)
  File "/nas/home/xhuang/project_ner/LM-LSTM-CRF/model/lm_lstm_crf.py", line 63, in __init__
    self.crf = crf.CRF_L(word_hidden_dim, tagset_size)
  File "/nas/home/xhuang/project_ner/LM-LSTM-CRF/model/crf.py", line 29, in __init__
    self.hidden2tag = nn.Linear(hidden_dim, self.tagset_size * self.tagset_size, bias=if_bias)
  File "/nas/home/xhuang/anaconda3/envs/ner/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 41, in __init__
    self.weight = Parameter(torch.Tensor(out_features, in_features))
RuntimeError: $ Torch: not enough memory: you tried to allocate 3421GB. Buy new RAM! at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/TH/THGeneral.c:218


evaluator eval_batch.f1_score has inconsistent return types

When I try to run the NER script with eva_matrix set to fa, I get the following error on line 207:
TypeError: FloatType cannot be iterated

This happens because the f1_score function returns a single value in some cases when it should return 4 values. This needs only a minor fix.

KeyError in predictor.py class predict

line: 59

for f, y in zip(feature, label):
    label = self.r_l_map[y]

The label parameter passed from seq_wc is a tensor, therefore self.r_l_map[y] raises a KeyError.

I guess it should be label = self.r_l_map[y.item()] instead?

dropout

python train_wc.py --train_file ./data/train_data.txt --dev_file ./data/dev_data.txt --test_file ./data/test_data.txt --caseless --fine_tune --high_way --co_train --least_iters 100 --dropout 0.5
setting:
Namespace(batch_size=10, caseless=True, char_dim=30, char_hidden=100, char_layers=1, checkpoint='./checkpoint/', clip_grad=5.0, co_train=True, dev_file='./data/dev_data.txt', dropout=0.5, emb_file='./embedding/glove.6B.100d.txt', epoch=200, eva_matrix='fa', fine_tune=False, gpu=0, high_way=True, highway_layers=2, lambda0=1, least_iters=100, load_check_point='', load_opt=False, lr=0.01, lr_decay=0.05, mini_count=5, momentum=0.9, patience=15, rand_embedding=False, small_crf=True, start_epoch=0, test_file='./data/test_data.txt', train_file='./data/train_data.txt', unk='unk', update='sgd', word_dim=100, word_hidden=100, word_layers=1)
loading corpus
constructing coding table
feature size: '3240'
loading embedding
embedding size: '3712'
constructing dataset
building model
Traceback (most recent call last):
  File "train_wc.py", line 138, in <module>
    if_highway=args.high_way, in_doc_words=in_doc_words, highway_layers=args.highway_layers)
  File "/home/usr/Downloads/arabic-ner-master/lmbilstmcrf/lm_lstm_crf.py", line 27, in __init__
    dropout=dropout_ratio)
  File "/home/usr/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 425, in __init__
    super(LSTM, self).__init__('LSTM', *args, **kwargs)
  File "/home/usr/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 31, in __init__
    raise ValueError("dropout should be a number in range [0, 1] "
ValueError: dropout should be a number in range [0, 1] representing the probability of an element being zeroed

Question about Chinese word segmentation

Hi, may I ask whether this model is suitable for Chinese word segmentation? Is there anything I need to pay attention to regarding data preprocessing or the pre-trained character embeddings?

Could you specify the version of the Torch and Python

Currently I have successfully finished running the experiments, but I found there are several key requirements for the running platform:

  1. The current code works well with Torch 0.3 and above;
  2. The binary Torch installation file should match both the CUDA version and the Python version.

About the score given a sequence and a target

Dear Author,

Thank you for sharing the code. I have a question about forward() in CRFLoss_vb() in crf.py: it calculates the score of the gold sequence by:

tg_energy = torch.gather(scores.view(seq_len, bat_size, -1), 2, target).view(seq_len, bat_size)
tg_energy = tg_energy.masked_select(mask).sum()

However, it seems that each tag of the target sequence is handled separately and I don't really see the transitions like tag1->tag2->tag3->... Can you explain a little bit?
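
One possible reading (not an authoritative answer): the scores tensor holds one entry per (previous tag, current tag) pair, and target already encodes those pairs as flat indices, the same prev * tagset_size + cur packing used when the buckets are built, so the gather above does select transition scores rather than independent per-tag scores. A tiny shape check with made-up sizes:

import torch
seq_len, bat_size, tagset_size = 3, 2, 5
scores = torch.randn(seq_len, bat_size, tagset_size, tagset_size)
target = torch.randint(0, tagset_size * tagset_size, (seq_len, bat_size, 1))
tg_energy = torch.gather(scores.view(seq_len, bat_size, -1), 2, target).view(seq_len, bat_size)
print(tg_energy.shape)  # torch.Size([3, 2]): one gold transition score per step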
