
Comments (18)

jayavardhanr commented on June 28, 2024
  1. You need to download the CoNLL-2003 data and place it at the location the config expects. You can find the data here - https://github.com/synalp/NER/tree/master/corpus/CoNLL-2003

  2. Make this change in model/config.py:

'''
Initial code (lines 73 to 78):

# filename_dev = "data/coNLL/eng/eng.testa.iob"
# filename_test = "data/coNLL/eng/eng.testb.iob"
# filename_train = "data/coNLL/eng/eng.train.iob"

filename_dev = filename_test = filename_train = "data/test.txt" # test

Changed Code:

filename_dev = "data/coNLL/eng/eng.testa.iob"
filename_test = "data/coNLL/eng/eng.testb.iob"
filename_train = "data/coNLL/eng/eng.train.iob"

#filename_dev = filename_test = filename_train = "data/test.txt" # test

'''

The author provides test.txt, which will be used if you don't change this part of the code.
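
After making this change, a quick sanity check can confirm the three files are in place. This is just a sketch, run from the repository root, with paths mirroring the config above:

# Sanity check: the CoNLL files must exist where model/config.py expects them.
import os

paths = [
    "data/coNLL/eng/eng.train.iob",
    "data/coNLL/eng/eng.testa.iob",
    "data/coNLL/eng/eng.testb.iob",
]
for path in paths:
    assert os.path.isfile(path), "missing " + path
print("All CoNLL files found.")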


cooelf commented on June 28, 2024

I tried the following settings and the test F1 score is 90.02:

# embeddings
dim_word = 300
dim_char = 100    
# training
train_embeddings = False
nepochs          = 50
dropout          = 0.3
batch_size       = 50
lr_method        = "adam"
lr               = 0.005
lr_decay         = 0.9
clip             = 5 # if negative, no clipping
nepoch_no_imprv  = 7

# model hyperparameters
hidden_size_char = 100 # lstm on chars
hidden_size_lstm = 300 # lstm on word embeddings

My Environment Details:
Python 3.6
Tensorflow-gpu 1.3.0
CUDA 8.0.61 with cuDNN 5.1
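
Note that dim_word = 300 only works if the GloVe file the config points to really contains 300-dimensional vectors. A quick check, sketched here with an example path (your config's GloVe path may differ):

# Verify that the GloVe text file's vector size matches dim_word.
def glove_dim(path):
    with open(path, encoding="utf-8") as f:
        first = f.readline().rstrip().split(" ")
    return len(first) - 1  # the first field is the word itself

assert glove_dim("data/glove.6B/glove.6B.300d.txt") == 300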


sbmaruf commented on June 28, 2024

@ShengleiH Sorry for the late reply.

I don't see any problem with doing this at evaluation time, since during training you only train the model on tokens from the train set. If you are using pretrained embeddings, this is also what the original author (@glample) of the paper does.

There is no need to consider < UNK >: during evaluation you only look up the embeddings of the dev and test tokens and pass them to your model. Remember, you haven't trained the model on them (dev or test), which is why there is no problem. Conversely, using their pretrained embeddings does not mean you are training your model on them.

Remember that at training time your model never sees data tagged as < UNK >. If, at dev or test time, you can distinguish two tokens that would otherwise both be treated as < UNK > by giving them different embeddings, there is no problem. Apart from that, if you are not using pretrained embeddings (i.e., you initialize the embeddings from a random distribution), there should still not be any problem, although the original author (@glample) of the paper does use the < UNK > tag in that case.

I would also like to have some input from @guillaumegenthial in this regard.
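
To make the lookup logic concrete, here is a minimal sketch of mapping tokens to ids with an UNK fallback at evaluation time. It is not the repo's actual code; the token string, dictionary and matrix below are illustrative placeholders:

import numpy as np

UNK = "$UNK$"  # placeholder for the special unknown-word token

def lookup_ids(tokens, word_to_id):
    # Map each token to its vocabulary id, falling back to the UNK id.
    unk_id = word_to_id[UNK]
    return [word_to_id.get(tok.lower(), unk_id) for tok in tokens]

# Toy vocabulary and embedding matrix (one row per word, dim_word columns).
word_to_id = {"$UNK$": 0, "eu": 1, "rejects": 2}
embeddings = np.random.randn(len(word_to_id), 300).astype(np.float32)

ids = lookup_ids(["EU", "rejects", "Basel"], word_to_id)  # "Basel" -> UNK id
vectors = embeddings[ids]  # shape: (3, 300)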


ShengleiH commented on June 28, 2024

@sbmaruf Hi, thank you~ Can I always use the embedding of 'UNK' for unknown words in the dev/test set during evaluation? I mean, I don't want to assign these unknown words their corresponding GloVe embeddings.


emrekgn commented on June 28, 2024

I am wondering this too. What are your hyperparameters?

I have been trying for some time to reproduce the results (F1: 90.94%) reported for Lample et al.'s LSTM-CRF model. This is roughly what my (hyper)parameters look like:

dim_word = 100
dim_char = 25
nepochs = 100
dropout = 0.5
batch_size = 10
lr_method = "sgd"
lr= 0.01
lr_decay = 1.0 # original work does not use decay either!
clip = 5.0 # gradient clipping
hidden_size_char = 25
hidden_size_lstm = 100
# I also replace digits with zeros, as in Lample's original implementation (a small sketch follows below).

I'm getting approx. 88.5% F1 score for this setting.

The only difference I see compared to Lample's original implementation is the singleton replacement (with probability 0.5) used there to train the UNK token, but IMO this should not make a huge difference, right?

Any help would be appreciated.
Thanks.
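
For reference, the digit-to-zero replacement mentioned in the settings above is a one-line preprocessing step; a minimal sketch (the function name is mine, not the repo's):

import re

def zero_digits(token):
    # Replace every digit with '0', as in Lample et al.'s preprocessing.
    return re.sub(r"\d", "0", token)

assert zero_digits("1996-08-22") == "0000-00-00"
assert zero_digits("EU") == "EU"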


jayavardhanr commented on June 28, 2024

First of all, thanks for sharing the code and the detailed instructions.

I have been facing similar issues. I tried the same parameters as mentioned in the paper, but it only gives a test F1 score of around 87.

I also tried tuning the hyperparameters with different learning methods, learning rates, decays, and momentum values. The best result I achieved with the code is 88.5 F1.

It would be great if you could share the hyperparameters with which you were able to reproduce the results in the paper.

My Environment Details:
Python 2.7
Tensorflow-gpu 1.2.0
CUDA 8.0.44

Thanks


jayavardhanr commented on June 28, 2024

@cooelf Thanks for the reply.
Did you use glove.840B.300d or word2vec 300d for word embeddings?


cooelf commented on June 28, 2024

@jayavardhanr I simply used the glove.6B.300d word embeddings. It's quite small, actually. My partner tried the code with glove.840B.300d on a similar task, which showed a big improvement (+3.8%) over glove.6B.300d.

In my previous experiments, Adam also seemed to work better than SGD. Maybe you can try that embedding together with the parameters above.

Looking forward to your feedback!


jayavardhanr commented on June 28, 2024

@cooelf Thanks for the details. I tried the hyperparameters you mentioned and achieved an F1 score of 90.10 on the test set.

Thanks again.


Jonida88 commented on June 28, 2024

@cooelf @jayavardhanr Hey guys, could someone please help me? I am trying to run the model myself. I am following these steps: 1. model/data_utils.py, 2. config.py, and then build_data.py, but the reference says to run build_data.py first and then config.py. Which order should I use? Also, when I run data_utils it does not iterate over the CoNLL dataset, but it doesn't show any error either, so I don't know what I am doing wrong. I would really appreciate your help.


luto65 commented on June 28, 2024

I had to remove the ".iob" extension from the downloaded files... did you do that too?


jayavardhanr commented on June 28, 2024

@luto65 Yes, I forgot to mention that.


luto65 commented on June 28, 2024

Using the defaults (without touching the installation) on macOS, I got the following on the CoNLL dataset:
acc 97.91 - f1 89.54

Impressive! Congrats!


Jonida88 commented on June 28, 2024

Hi @luto65 and @jayavardhanr, thank you very much for your help. Does either of you have an idea why I am getting the error described in issue 3 (I opened issue 3)? I have tried many other approaches, but I always get the same error... Thanks again in advance.


ShengleiH commented on June 28, 2024

Hi @jayavardhanr, I have a question about the 'build data' part. I found that in 'build_data.py' the author builds the vocabulary using all of the 'train', 'dev' and 'test' data. But in my view, the vocabulary should be built only on the train set. Maybe I missed something; can you give me some advice? Thanks a lot!


sbmaruf commented on June 28, 2024

Hi!
Building the vocab from train, test and dev is fine. Here's the reason:

  1. You are not actually using the labels of dev and test.
  2. Assume you do not use dev and test when building the vocab. Now you get an unknown word from dev. You search for the word's embedding in GloVe, word2vec or fastText (or initialize it randomly), you find the embedding, you add it to your vocabulary, and you look it up accordingly. It is just like encountering an unknown word at runtime and processing it then: the pretrained embeddings are always there for you to take. There is no harm in it.

Now, doing this procedure at runtime would be hard to track, so instead you take all the words from train, test and dev at the beginning of training as the vocabulary. The two procedures are equivalent (a small sketch follows).
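
As a minimal sketch of that idea (paths, file layout and helper names are assumptions for illustration, not the repo's actual build_data.py): collect the word set from all three splits without ever touching the labels, then keep only the GloVe vectors for those words.

import numpy as np

def words_in_conll(path):
    # Collect lowercased tokens from a CoNLL-style file (token = first column).
    words = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("-DOCSTART-"):
                words.add(line.split()[0].lower())
    return words

vocab = set()
for split in ["eng.train.iob", "eng.testa.iob", "eng.testb.iob"]:
    vocab |= words_in_conll("data/coNLL/eng/" + split)

# Keep only the GloVe vectors of words that occur in some split; labels are never used.
embeddings = {}
with open("data/glove.6B/glove.6B.300d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        if parts[0] in vocab:
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)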


guillaumegenthial commented on June 28, 2024

Because we're using pre-trained embeddings, we can keep the vectors of all the words present in the train, test and dev sets. (Ideally we would keep all the GloVe vectors, but that's unnecessary for our experiment.) Also, at training time, your model does see the UNK word (not all words in the training set are in the GloVe vocab!).


guillaumegenthial commented on June 28, 2024

Also, if you use the IOBES tagging scheme and GloVe 6B, you should get results similar to the paper. I wrote a new version of the code that achieves higher scores: https://github.com/guillaumegenthial/tf_ner/
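
For anyone switching schemes: converting IOB2 tags to IOBES is a small, self-contained transformation. A sketch (assuming the tags are already in IOB2, i.e. every entity starts with B-):

def iob_to_iobes(tags):
    # Convert one sentence's IOB2 tags to IOBES (S- for singletons, E- for span ends).
    iobes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            iobes.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == "I-" + label
        if prefix == "B":
            iobes.append(("B-" if continues else "S-") + label)
        else:  # prefix == "I"
            iobes.append(("I-" if continues else "E-") + label)
    return iobes

print(iob_to_iobes(["B-ORG", "O", "B-PER", "I-PER", "O"]))
# ['S-ORG', 'O', 'B-PER', 'E-PER', 'O']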

