
ner_incomplete_annotation's Introduction

Better Modeling of Incomplete Annotation for Named Entity Recognition

This repository implements an LSTM-CRF model for named entity recognition. The model is the same as the one in Lample et al. (2016), except that we do not have the last tanh layer after the BiLSTM. The code provided here was used for the paper "Better Modeling of Incomplete Annotation for Named Entity Recognition", published at the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).
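
For clarity, here is a minimal, hypothetical PyTorch sketch of the emission layer described above (not the repo's actual class or variable names): the BiLSTM output is projected directly to label scores, with no tanh layer in between.

import torch
import torch.nn as nn

class BiLSTMEmitter(nn.Module):
    def __init__(self, embed_dim: int, hidden_dim: int, num_labels: int):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2,
                            batch_first=True, bidirectional=True)
        # Lample et al. (2016) add a tanh-activated hidden layer here;
        # this model projects the BiLSTM output straight to label scores.
        self.hidden2label = nn.Linear(hidden_dim, num_labels)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq_len, embed_dim)
        lstm_out, _ = self.lstm(embeddings)   # (batch, seq_len, hidden_dim)
        return self.hidden2label(lstm_out)    # emission scores fed to the CRF layer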

NOTE: To support a more general use case, a PyTorch version is implemented in this repo. The previous implementation using DyNet can be found in the first release here. Right now, I have implemented the "hard" approach as in the paper; the "soft" approach is coming soon.

Our codebase is built on top of the pytorch LSTM-CRF repo.

Requirements

  • PyTorch >= 1.1
  • Python 3

Put your dataset under the data folder. You can obtain the conll2003 and conll2002 datasets from other sources. We have put our collected industry datasets ecommerce and youku under the data directory.

Also, put your embedding file under the data directory; you need to specify the path to the embedding file when running the code.

Running our approaches

python3 main.py --embedding_file ${PATH_TO_EMBEDDING} --dataset conll2003 --variant hard

Change hard to soft for our soft variant. (This version also supports contextual representations, but I am still testing that feature.)

Future Work

  • add the soft approach
  • add other baselines

Citation

If you use this software for research, please cite our paper as follows:

The experiments in our paper were run with the DyNet implementation; check out our previous release.

@inproceedings{jie2019better,
  title={Better Modeling of Incomplete Annotations for Named Entity Recognition},
  author={Jie, Zhanming and Xie, Pengjun and Lu, Wei and Ding, Ruixue and Li, Linlin},
  booktitle={Proceedings of NAACL},
  year={2019}
}

ner_incomplete_annotation's People

Contributors

allanj


ner_incomplete_annotation's Issues

Cannot train on GPU

Hi,

I tried to run main.py with the cuda:0 device, but I always get this error:

RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor

Do you have an idea what could cause that? I couldn't figure it out. CPU training works fine.

Thanks!
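
A hedged note, not a fix confirmed by the author: since PyTorch 1.7, torch.nn.utils.rnn.pack_padded_sequence requires the lengths tensor to be a CPU tensor even when the inputs are on the GPU, which produces exactly this error. A minimal sketch of the usual workaround, with hypothetical names:

import torch
from torch.nn.utils.rnn import pack_padded_sequence

def pack_batch(embeddings: torch.Tensor, word_seq_lens: torch.Tensor):
    # embeddings: (batch, seq_len, dim), possibly on cuda:0;
    # word_seq_lens may also live on cuda:0, so move it to the CPU before packing.
    return pack_padded_sequence(embeddings,
                                word_seq_lens.cpu(),  # must be a 1D CPU int64 tensor
                                batch_first=True,
                                enforce_sorted=False)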

batch_size

Hi, does the DyNet version support a batch size > 1?

Using the same dataset to train and evaluate the model can't reach a 100% F1 score

Hi, I want to make sure that the model architecture works well, so I used the same dataset (the gold dataset, without removing any entities) to both train and evaluate the model. The ideal result is that the model should overfit and the F1 score should reach 100%. When I use the CoNLL dataset, it works well. When I switch to my own dataset, the F1 score only gets to around 80%, not 100%, and I'm kind of confused by this result. The experiment log is at this link: wandb link. I also used the train set (18,000 samples) to train the model and evaluated on the dev set (3,000 samples), and the F1 score only reaches about 60%. I really hope you can help me out. Thanks!

I am interested in implementing the PyTorch version of this project.

I implemented your "hard" method using PyTorch, but I couldn't reach the performance reported in your paper.

Therefore, I would like to ask you about the PyTorch implementation of this project. And, if you have time, I'd really appreciate it if you could give me your opinion on my implementation.

[Question] NNCRF not using marginals in forward

Hi, I'm reading your code and paper. There is something I can't understand in the following code:

unlabed_score, labeled_score = self.inferencer(lstm_scores, word_seq_lens, tags, mask)

As I understand it, the soft probabilities are stored in the marginals and updated after several loops, but I can't find where they are used in the training process. The CRF model is supposed to use them, yet in the NNCRF forward function they are not passed into the CRF.

Is there something I missed?

Glad to get a reply, thank you.

Results reproduction

I want to use your model as a baseline in my future paper, but unfortunately I cannot utilize the results reported in your paper, as I explore a smaller range of labelled-entity fractions: 5-15%.

So I will need to run experiments with your code on this range of labelled entities. Thanks a lot for the PyTorch implementation, by the way!

You mentioned in another issue that fine hyperparameter tuning is not required to achieve comparable results, but I would still appreciate you sharing the optimal hyperparameters (at least ones somewhat close to those you used in your paper).

What if the dev data is not completely labeled?

Hi Allanj! Nice to see an article on incomplete annotations for NER. But I still have some questions.
In your code, the dev data is used to evaluate each fold's model, without any entity removal. Does this mean that we need high-quality dev data? But what if the dev data is also not completely labeled? Can your code still work well in this case? Hope to hear your thoughts. Thanks!

About the training of contextual embedding

Hi, thanks for sharing the code! I saw that it's possible to use contextual embeddings such as ELMo. In your main.py, do you train ELMo yourself? I also saw that there are three ELMo vector files; do you train ELMo for each of the train/dev/test sets? If you could provide some reference code for training the ELMo models, that would give us more details!
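
For reference, a minimal sketch of one common way to precompute one ELMo vector file per split with AllenNLP's pretrained ElmoEmbedder; this is an assumed workflow with hypothetical file paths, not necessarily how the author produced the three vector files.

import pickle
from allennlp.commands.elmo import ElmoEmbedder  # AllenNLP 0.9.x

elmo = ElmoEmbedder()  # downloads the default pretrained options and weights

def embed_split(sentences, out_path):
    # sentences: a list of token lists, e.g. [["EU", "rejects", "German", "call"], ...]
    # embed_sentence returns a (3, num_tokens, 1024) array, one slice per ELMo layer
    vectors = [elmo.embed_sentence(tokens) for tokens in sentences]
    with open(out_path, "wb") as f:
        pickle.dump(vectors, f)

# one file per split (hypothetical paths):
# embed_split(train_sentences, "data/conll2003/train.elmo.vec")
# embed_split(dev_sentences, "data/conll2003/dev.elmo.vec")
# embed_split(test_sentences, "data/conll2003/test.elmo.vec")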

question on the dynet soft implementation

Hi, I have two questions on the soft implementation:

  1. for_expr = dy.concatenate(alphas_t) + marginal[pos]

    next_tag_expr = next_tag_expr + mask

    the code adds the marginal after the log_sum_exp, while it adds the mask before the log_sum_exp. Since the mask is just a special case of the marginal, why are they handled differently?

  2. previous_trans = dy.transpose(dy.transpose(self.transition))
    does transposing twice mean we are still using the original transition matrix?
