
zhixiuye / hscrf-pytorch

304 stars · 68 forks · 1 MB

ACL 2018: Hybrid semi-Markov CRF for Neural Sequence Labeling (http://aclweb.org/anthology/P18-2038)

Python 100.00%
crf ner nlp pytorch sequence-labeling

hscrf-pytorch's People

Contributors

zhixiuye


hscrf-pytorch's Issues

About model 9

Hello, when implementing model 9 from Table 1, how should the parameters be set?

Question about scrf_to_crf in utils.py

Hi. I read your code and I have a question about the function scrf_to_crf in utils.py, namely this part:

    # convert SCRF span predictions to CRF label sequences,
    # starting each sequence with <start> and stopping at <pad>
    for i_l in decoded_scrf:
        sent_labels = [l_map['<start>']]
        for label in i_l:
            if label != l_map['<pad>']:
                sent_labels.append(label)
            else:
                break
        crf_labels.append(sent_labels)
    crfdata = []
    masks = []
    maxl_1 = max([len(i) for i in crf_labels])
    for i_l in crf_labels:
        cur_len_1 = len(i_l)
        cur_len = cur_len_1 - 1
        # pack each label bigram (prev, curr) into the single index
        # prev * label_size + curr, then pad to the batch maximum
        i_l_pad = [i_l[ind] * label_size + i_l[ind + 1] for ind in range(0, cur_len)] \
                  + [i_l[cur_len] * label_size + pad_label] \
                  + [pad_label * label_size + pad_label] * (maxl_1 - cur_len_1)
        mask = [1] * cur_len_1 + [0] * (maxl_1 - cur_len_1)
        crfdata.append(i_l_pad)
        masks.append(mask)

Why does it break when label == l_map['<pad>']? The break makes the length of sent_labels differ from that of decoded_scrf[0], resulting in a mismatch between sent_labels and the sentence in terms of length.
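For reference, the packing in the list comprehension above maps a label bigram (prev, curr) to the single index prev * label_size + curr, which can then index a score table flattened over (prev, curr). A minimal sketch with hypothetical label ids (not from the repo):

    # Minimal sketch with hypothetical label ids (not from the repo):
    label_size = 5                     # tagset size, including <start>/<pad>
    prev_label, curr_label = 1, 3

    pair_index = prev_label * label_size + curr_label   # -> 8

    # divmod inverts the packing, recovering both labels:
    assert divmod(pair_index, label_size) == (prev_label, curr_label)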

Loss function for NER task

Can you please point me to a reference you used to implement the loss function for the NER task?

I think your paper only discusses the loss function for word-level segmentation.

Thanks

On Torch 0.4

Traceback (most recent call last):
  File "train.py", line 205, in <module>
    evaluator.calc_score(model, dev_dataset_loader)
  File "/u/suhubdyd/projects/HSCRF-pytorch/model/evaluator.py", line 172, in calc_score
    decoded_crf, crf_result_scored_by_crf = utils.decode_with_crf(ner_model.crf, word_representations, mask_v,self.l_map)
  File "/u/suhubdyd/projects/HSCRF-pytorch/model/utils.py", line 763, in decode_with_crf
    decoded_crf = crf.decode(word_reps, mask_v)
  File "/u/suhubdyd/projects/HSCRF-pytorch/model/crf_layer.py", line 131, in decode
    decode_idx[idx] = pointer
RuntimeError: expand(torch.cuda.LongTensor{[50, 1]}, size=[50]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)
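This is the decode_idx[idx] = pointer assignment that PyTorch 0.2 tolerated but 0.4 rejects, because pointer keeps a trailing singleton dimension. A toy reproduction and a hedged fix (shapes assumed, not taken from crf_layer.py):

    import torch

    # Toy reproduction and hedged fix (names mirror crf_layer.py, but
    # shapes here are assumed): assigning a [batch, 1] tensor into a
    # [batch] slot triggers the expand() error above under >= 0.4;
    # dropping the trailing singleton dim avoids it.
    seq_len, bat_size = 4, 50
    decode_idx = torch.zeros(seq_len - 1, bat_size, dtype=torch.long)
    pointer = torch.zeros(bat_size, 1, dtype=torch.long)   # shape [50, 1]

    decode_idx[0] = pointer.squeeze(-1)   # works: [50, 1] -> [50]
    # decode_idx[0] = pointer            # raises the RuntimeError above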

Wrong requirements

Your requirements.txt contains the wrong PyTorch requirement: either your code base is Python 3.x, or your PyTorch wheel should correspond to Python 2.7. Please fix it, and also update the references.

logalpha initialize

Hi, what really confuses me is why logalpha is initialized to -10000 rather than zero or other random values. Is there any special meaning?
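One way to read the choice: the forward recursion runs in log space, so "impossible" states should start at log(0) = -inf, and -10000 is a finite stand-in that behaves like -inf under log-sum-exp without risking NaN-producing -inf arithmetic. An illustrative sketch, not the repo's actual initialization code:

    import torch

    # Illustrative only: in log space, "probability zero" is log(0) = -inf;
    # -10000 acts like -inf under log-sum-exp while staying finite.
    logalpha = torch.full((1, 4), -10000.0)  # all states start "impossible"
    logalpha[0, 0] = 0.0                     # except <start>: log(1) = 0

    # exp(-10000) underflows to 0, so only the <start> state contributes:
    print(torch.logsumexp(logalpha, dim=1))  # tensor([0.])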

out of memory

Have you ever had this problem:
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1511304568725/work/torch/lib/THC/generic/THCStorage.cu:66

I used a Chinese corpus with 300-dimensional vectors. I tried using a small dataset and setting batch_size = 5, but it failed again.
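For a rough sense of scale, the semi-Markov scores grow with batch × sequence length × allowspan × tags², so long Chinese sentences inflate memory quickly even at small batch sizes; lowering the --allowspan argument is one lever. A back-of-envelope estimate, with a hypothetical tensor layout that may differ from hscrf_layer.py:

    # Back-of-envelope memory estimate; the [batch, seq_len, allowspan,
    # tags, tags] float32 layout here is an assumption, not the repo's
    # verified internal shape.
    batch, seq_len, allowspan, tags = 5, 120, 6, 20
    floats = batch * seq_len * allowspan * tags * tags
    print(floats * 4 / 1e6, "MB")   # ~5.8 MB per such tensor, before autograd buffers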

Can you offer me the checkpoint files?

When I try to run 'eval.py', I realize that './checkpoint/6365035.json' and './checkpoint/6365035.model' are missing from this repository. Does this project need these two files? I would be grateful if you could offer me more information.

Not able to achieve accuracy shown in paper/repo

I tried running the program in an environment with pytorch=0.2.0 and python=2.7 multiple times using the command : CUDA_VISIBLE_DEVICES=0 python train.py --char_lstm --high_way

The highest accuracy that I have gotten is as follows:
(screenshot omitted)

Could you tell me how you were able to achieve 91.xx?

asking about a CUDA OOM error

I ran the code on Chinese NER training data (around 70 thousand sentences), and I got an OOM error:

Tot it 541 (epoch 0): 0it [00:00, ?it/s]THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
  File "train.py", line 179, in <module>
    loss.backward()
  File "/usr/lib64/python2.7/site-packages/torch/autograd/variable.py", line 156, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/usr/lib64/python2.7/site-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
  File "/usr/lib64/python2.7/site-packages/torch/autograd/function.py", line 91, in apply
    return self._forward_cls.backward(self, *args)
  File "/usr/lib64/python2.7/site-packages/torch/autograd/function.py", line 194, in wrapper
    outputs = fn(ctx, *tensor_args)
  File "/usr/lib64/python2.7/site-packages/torch/nn/_functions/thnn/sparse.py", line 80, in backward
    grad_weight = grad_output.new(ctx.weight_size).zero_()
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:66

I have set batch_size to 10 and to 128, and both result in the error shown above. Could anyone give me some advice on how to solve it?

Goldfactors format

Hi, I have been trying to use your code as a reference to implement a similar SCRF variant, and I am a little confused by the get_logloss_numerator function. What exactly are the goldfactors you use for the correct path scores?
I am speaking about this line: https://github.com/ZhixiuYe/HSCRF-pytorch/blob/master/model/hscrf_layer.py#L146

Could you give me an example how I can generate my own factors?

EDIT: I think I understand the format: each item is a tensorized list of (from_id, to_id, prev_tag, curr_tag). If that is correct, the comment that the size is (batch_size, tag_len, 4) is confusing, since tag_len actually means the maximum number of tag sequences within the items in the batch. I had assumed that tag_len meant tagset_size.
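Assuming that reading is right, a hedged sketch of how such factors could be assembled from one gold segmentation (the helper name and span encoding here are hypothetical, not from the repo):

    import torch

    # Hypothetical helper, not from the repo: build gold factors as
    # (from_id, to_id, prev_tag, curr_tag) rows for one gold segmentation.
    def build_goldfactors(segments, start_tag):
        # segments: list of (from_id, to_id, tag) spans covering the sentence
        factors, prev_tag = [], start_tag
        for frm, to, tag in segments:
            factors.append([frm, to, prev_tag, tag])
            prev_tag = tag
        return torch.tensor(factors, dtype=torch.long)  # (num_segments, 4)

    # e.g. spans [0..1] tagged 0 and [2..2] tagged 1, with <start> id 3:
    print(build_goldfactors([(0, 1, 0), (2, 2, 1)], start_tag=3))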

tried to construct a tensor

Traceback (most recent call last):
  File "/home/04-HSCRF-pytorch-master/train.py", line 210, in <module>
    evaluator.calc_score(model, dev_dataset_loader)
  File "/home/04-HSCRF-pytorch-master/evaluator.py", line 172, in calc_score
    decoded_crf, crf_result_scored_by_crf = utils.decode_with_crf(ner_model.crf, word_representations, mask_v, self.l_map)
  File "/home/04-HSCRF-pytorch-master/utils.py", line 777, in decode_with_crf
    bi_crf = torch.cuda.LongTensor(bi_crf).transpose(0,1).unsqueeze(2)
RuntimeError: tried to construct a tensor from a nested int sequence, but found an item of type numpy.int64 at index (0, 0)
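Older PyTorch constructors reject numpy.int64 values nested inside plain Python lists. A toy reproduction and a workaround that routes the data through a single numpy array first (a sketch, not a patch to utils.py):

    import numpy as np
    import torch

    # Toy reproduction: a nested Python list holding numpy.int64 values
    # trips old constructors; converting to one ndarray first avoids it.
    bi_crf = [[np.int64(1), np.int64(2)], [np.int64(3), np.int64(4)]]
    t = torch.from_numpy(np.asarray(bi_crf, dtype=np.int64))
    t = t.transpose(0, 1).unsqueeze(2)   # same reshaping as decode_with_crf
    print(t.shape)                       # torch.Size([2, 2, 1])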

version of PyTorch

Is it possible to use PyTorch 0.4.0?

It seems I'm running into some issues, but I'm not sure if they're due to the version.

allan@statnlp0:~/tmp/HSCRF-pytorch$ CUDA_VISIBLE_DEVICES=0 python train.py --char_lstm --high_way
seed: 5703958
setting:
Namespace(allowspan=6, batch_size=10, char_embedding_dim=30, char_lstm=True, char_lstm_hidden_dim=300, char_lstm_layers=1, checkpoint='./checkpoint/', clip_grad=5.0, cnn_filter_num=30, dev_file='./data/eng.testa', dropout_ratio=0.55, early_stop=10, emb_file='./data/glove.6B.100d.txt', epoch=150, grconv=False, high_way=True, highway_layers=1, index_embeds_dim=10, least_epoch=75, load_check_point='', load_opt=False, lr=0.015, lr_decay=0.05, mini_count=5, model_name='HSCRF', momentum=0.9, scrf_dense_dim=100, shrink_embedding=False, start_epoch=0, test_file='./data/eng.testb', train_file='./data/eng.train', unk='unk', word_embedding_dim=100, word_hidden_dim=300, word_lstm_layers=1)
loading corpus
constructing coding table
/home/allan/tmp/HSCRF-pytorch/model/utils.py:642: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
nn.init.uniform(input_embedding, -bias, bias)
constructing dataset
building model
/usr/local/lib/python2.7/dist-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.55 and num_layers=1
"num_layers={}".format(dropout, num_layers))
/home/allan/tmp/HSCRF-pytorch/model/utils.py:649: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
nn.init.uniform(input_linear.weight, -bias, bias)
/home/allan/tmp/HSCRF-pytorch/model/hscrf_layer.py:60: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
nn.init.uniform(input_embedding, -bias, bias)
/home/allan/tmp/HSCRF-pytorch/model/hscrf_layer.py:68: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
nn.init.uniform(input_linear.weight, -bias, bias)
/home/allan/tmp/HSCRF-pytorch/model/utils.py:660: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
nn.init.uniform(weight, -bias, bias)
/home/allan/tmp/HSCRF-pytorch/model/utils.py:663: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
nn.init.uniform(weight, -bias, bias)

Tot it 1404 (epoch 0): 0it [00:00, ?it/s]train.py:180: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
  nn.utils.clip_grad_norm(model.parameters(), args.clip_grad)
Tot it 1404 (epoch 0): 0it [00:00, ?it/s]train.py:195: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
  nn.utils.clip_grad_norm(model.parameters(), args.clip_grad)
epoch_loss: 22.3492750524
/home/allan/tmp/HSCRF-pytorch/model/data_packer.py:49: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  f_f = Variable(f_f[:, 0:mlen[0]].transpose(0, 1), volatile=True).cuda()
/home/allan/tmp/HSCRF-pytorch/model/data_packer.py:50: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  f_p = Variable(f_p[:, 0:mlen[1]].transpose(0, 1), volatile=True).cuda()
/home/allan/tmp/HSCRF-pytorch/model/data_packer.py:51: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  b_f = Variable(b_f[:, -mlen[0]:].transpose(0, 1), volatile=True).cuda()
/home/allan/tmp/HSCRF-pytorch/model/data_packer.py:52: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  b_p = Variable((b_p[:, 0:mlen[1]] - ocl + mlen[0]).transpose(0, 1), volatile=True).cuda()
/home/allan/tmp/HSCRF-pytorch/model/data_packer.py:53: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  w_f = Variable(w_f[:, 0:mlen[1]].transpose(0, 1), volatile=True).cuda()
/home/allan/tmp/HSCRF-pytorch/model/data_packer.py:54: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  tg_v = Variable(target[:, 0:mlen[1]].transpose(0, 1), volatile=True).unsqueeze(2).cuda()
/home/allan/tmp/HSCRF-pytorch/model/data_packer.py:55: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  mask_v = Variable(mask[:, 0:mlen[1]].transpose(0, 1), volatile=True).cuda()
/home/allan/tmp/HSCRF-pytorch/model/data_packer.py:56: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  SCRF_labels = Variable(SCRF_labels[:, 0:mlen[2]], volatile=True).cuda()
/home/allan/tmp/HSCRF-pytorch/model/data_packer.py:57: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  mask_SCRF_laebls = Variable(mask_SCRF_laebls[:, 0:mlen[2]], volatile=True).cuda()
/home/allan/tmp/HSCRF-pytorch/model/data_packer.py:58: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  cnn_features = Variable(cnn_features[:, 0:mlen[1], 0:mlen[3]].transpose(0, 1), volatile=True).cuda().contiguous()
/home/allan/tmp/HSCRF-pytorch/model/crf_layer.py:113: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  mask = Variable(1 - mask.data, volatile=True)
/home/allan/tmp/HSCRF-pytorch/model/crf_layer.py:114: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  decode_idx = Variable(torch.cuda.LongTensor(seq_len-1, bat_size), volatile=True)
Traceback (most recent call last):
  File "train.py", line 205, in <module>
    evaluator.calc_score(model, dev_dataset_loader)
  File "/home/allan/tmp/HSCRF-pytorch/model/evaluator.py", line 172, in calc_score
    decoded_crf, crf_result_scored_by_crf = utils.decode_with_crf(ner_model.crf, word_representations, mask_v, self.l_map)
  File "/home/allan/tmp/HSCRF-pytorch/model/utils.py", line 763, in decode_with_crf
    decoded_crf = crf.decode(word_reps, mask_v)
  File "/home/allan/tmp/HSCRF-pytorch/model/crf_layer.py", line 131, in decode
    decode_idx[idx] = pointer
RuntimeError: expand(torch.cuda.LongTensor{[50, 1]}, size=[50]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)

hard-coded dimensions of view cnn_lstm

In the cnn_lstm function in word_rep_layer.py, you've hard-coded the dimensions in the view of d_char_out; that is, you are using (-1, self.batch_size, 30). It should be (word_seq.size(0), self.batch_size, -1) to make it general.

I think this won't work if you change the embedding dimension or the number of characters in the alphabet.
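A toy illustration of the suggested change (variable names mirror the issue; the real shapes in word_rep_layer.py are assumed):

    import torch

    # Variable names mirror the issue; shapes are assumed, not taken from
    # word_rep_layer.py. seq_len stands in for word_seq.size(0).
    batch_size, seq_len, char_hidden = 2, 7, 30
    d_char_out = torch.randn(seq_len * batch_size, char_hidden)

    hard_coded = d_char_out.view(-1, batch_size, 30)    # breaks if char_hidden != 30
    general = d_char_out.view(seq_len, batch_size, -1)  # works for any char_hidden
    assert torch.equal(hard_coded, general)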

What is the format of the dataset?

The dataset you provide in the data folder has sentences such as:

AUGUST RB I-NP O
1996 CD I-NP O
CDU NNP I-NP I-ORG
/ SYM O I-ORG
CSU NNP I-NP I-ORG
SPD NNP I-NP B-ORG
FDP NNP I-NP B-ORG
Greens NNP I-NP B-ORG
PDS NNP I-NP B-ORG

This is not BIO, as B follows I; it should have been B-ORG followed by multiple I-ORG. The same happens with B-MISC and B-PER, and B-LOC doesn't even exist. Please clarify.
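For context, the sample is consistent with the original CoNLL-2003 IOB1 scheme, in which a chunk opens with I-X unless it immediately follows another chunk of the same type (only then B-X). A hedged sketch converting such tags to the more common BIO2:

    def iob1_to_bio2(tags):
        """Convert IOB1 tags (chunks open with I- unless adjacent to a
        same-type chunk) into BIO2 (every chunk opens with B-)."""
        out = []
        for i, tag in enumerate(tags):
            if tag.startswith('I-'):
                prev = tags[i - 1] if i > 0 else 'O'
                # a chunk starts here unless the previous tag continues the same type
                if prev == 'O' or prev[2:] != tag[2:]:
                    tag = 'B-' + tag[2:]
            out.append(tag)
        return out

    print(iob1_to_bio2(['I-ORG', 'I-ORG', 'B-ORG', 'O', 'I-PER']))
    # ['B-ORG', 'I-ORG', 'B-ORG', 'O', 'B-PER']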

sequence length mismatch

Traceback (most recent call last):
  File "F:/Python_Projects/HSCRF_NER_pytorch/train.py", line 203, in <module>
    evaluator.calc_score(model, dev_dataset_loader)
  File "F:\Python_Projects\HSCRF_NER_pytorch\model\evaluator.py", line 181, in calc_score
    scrf_result_scored_by_crf = utils.rescored_with_crf(decoded_scrf, self.l_map, ner_model.crf.crf_scores)
  File "F:\Python_Projects\HSCRF_NER_pytorch\model\utils.py", line 871, in rescored_with_crf
    tg_energy = torch.gather(scores.view(seq_len, bat_size, -1), 2, scrfdata).view(seq_len, bat_size)
RuntimeError: invalid argument 2: Input tensor must have same size as output tensor apart from the specified dimension at c:\users\administrator\downloads\new-builder\win-wheel\pytorch\aten\src\thc\generic/THCTensorScatterGather.cu:29

Process finished with exit code 1

On using the CRF model to score the prediction of the SCRF: in utils.py, in the function scrf_to_crf(), after the prediction of the SCRF is converted to CRF data, its sequence length is not consistent with the sequence length of the CRF model scores; the former = length of the sentence + 1, but the latter = thresholds[idx].

How to use loss from HSCRF?

Hey,

When I run a forward pass with your HSCRF module, I get a loss formatted like the dump below.
In your training code, you use it like this: epoch_loss += utils.to_scalar(loss) (here: https://github.com/ZhixiuYe/HSCRF-pytorch/blob/master/train.py#L177).

What exactly does this do? Why is the loss not directly a scalar?
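For what it's worth, helpers named to_scalar typically just pull a Python number out of a one-element Variable for logging. A sketch of such a helper (assumed from context, not verbatim from utils.py), plus the reduction a vector-valued loss like the dump below would need before backward():

    import torch

    # Hypothetical to_scalar, assumed rather than copied from utils.py:
    # extract a plain Python float so epoch_loss accumulation does not
    # keep the autograd graph alive.
    def to_scalar(var):
        return var.view(-1).data.tolist()[0]

    loss = torch.tensor([-40.1046])   # a one-element loss
    epoch_loss = 0.0
    epoch_loss += to_scalar(loss)     # -40.1046 as a Python float

    # A long per-element vector like the dump below would first need an
    # explicit reduction before backward(), e.g.:
    # loss = loss.sum()   # or loss.mean()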

Loss

tensor([  -40.1046, -1039.7493, -2039.6393, -1039.3293, -1038.7677,
        -2038.6620, -2038.5282, -2038.3925, -2038.2518, -2038.1088,
        -2037.9637, -2037.8209, -2037.6803, -2037.5381, -2037.3993,
        -2037.2581, -2037.1168, -2036.9700, -2036.8280, -2036.6908,
        -2036.5507, -2036.4150, -2036.2721, -2036.1278, -2035.9827,
        -2035.8431, -2035.7041, -2035.5673, -2035.4296, -2035.2881,
        -2035.1510, -2035.0101, -2034.8658, -2034.7272, -2034.5862,
        -2034.4449, -2034.3057, -2034.1682, -2034.0237, -1033.9022,
        -1040.0959, -2040.0959, -2039.9283, -2039.7893, -2039.6503,
        -2039.5123, -2039.3701, -2039.2305, -2039.0912, -2038.9553,
        -2038.8187, -2038.6780, -2038.5389, -2038.3978,   -36.7899,
          -40.0815, -1040.0814, -2039.9296, -2039.7889, -2039.6522,
        -2039.5165, -2039.3783, -2039.2343, -2039.0922, -2038.9550,
        -2038.8119, -1038.5255, -1038.3424,   -37.4315, -1040.0959,
        -2040.0959, -2039.9336, -2039.7943, -1039.5092, -1039.1903,
        -2039.0892, -2038.9465, -2038.8047, -2038.6637, -2038.5248,
        -2038.3876, -2038.2494, -1037.9631, -1037.5199, -2037.4154,
        -2037.2767, -2037.1420, -2037.0059, -2036.8678, -2036.7299,
        -2036.5885, -2036.4471, -2036.3021, -2036.1615, -2036.0237,
        -2035.8873, -2035.7494, -2035.6093, -2035.4706, -1035.3413,
          -40.0959, -1038.7875, -2038.6840, -2038.5472, -2038.4045,
        -2038.2649, -2038.1256, -2037.9882, -2037.8506, -2037.7131,
        -2037.5725, -2037.4362, -2037.2994, -2037.1644, -2037.0319,
        -2036.8983, -2036.7588, -2036.6194, -2036.4772, -2036.3358,
        -1036.2131,   -40.0959, -1039.0562, -2038.9530, -2038.8147,
        -2038.6696, -2038.5270, -2038.3877, -2038.2422, -2038.1002,
        -2037.9564, -2037.8147, -1037.6818,   -40.1046, -1039.7607,
        -2039.6508, -2039.5139, -2039.3763, -2039.2395, -2039.1012,
        -2038.9645, -2038.8303, -2038.6946, -2038.5599, -2038.4235,
        -1038.1422, -1037.9800,   -37.7956,   -40.1046, -1039.7583,
        -1039.5063, -1039.3461, -2039.2405, -2039.1073, -2038.9686,
        -2038.8303, -2038.6901, -2038.5526, -2038.4150, -2038.2737,
        -2038.1295, -2037.9882, -1037.8634, -1040.0959, -2040.0959,
        -2039.9303,   -40.0959, -1038.6449, -2038.5428, -2038.4020,
        -2038.2581, -2038.1158, -2037.9739, -2037.8346, -2037.6959,
        -2037.5570, -2037.4202, -2037.2764, -2037.1334, -2036.9911,
        -2036.8450, -1036.7113,   -40.1044, -1039.9014, -2039.7880,
          -40.0959, -1038.3672, -2038.2637, -2038.1252, -2037.9850,
        -2037.8483, -2037.7079, -2037.5686, -2037.4323, -2037.2991,
        -2037.1674, -2037.0237, -2036.8802, -2036.7416, -2036.6041,
        -2036.4674, -2036.3307, -2036.1898, -2036.0443, -2035.9014,
        -2035.7561, -2035.6185, -2035.4778, -1035.3492], device='cuda:0')
