ljynlp / w2ner

Source code for AAAI 2022 paper: Unified Named Entity Recognition as Word-Word Relation Classification

License: MIT License

Python 100.00%
named-entity-recognition ner

w2ner's Introduction

Unified Named Entity Recognition as Word-Word Relation Classification

Source code for AAAI 2022 paper: Unified Named Entity Recognition as Word-Word Relation Classification

So far, named entity recognition (NER) has involved three major types, namely flat, overlapped (aka nested), and discontinuous NER, which have mostly been studied separately. Recently there has been growing interest in unified NER, which tackles all three tasks concurrently with a single model. The current best-performing methods are mainly span-based and sequence-to-sequence models; unfortunately, the former focus only on boundary identification, while the latter may suffer from exposure bias. In this work, we present a novel alternative that models unified NER as word-word relation classification, namely W2NER. The architecture resolves the key bottleneck of unified NER by effectively modeling the neighboring relations between entity words with Next-Neighboring-Word (NNW) and Tail-Head-Word-* (THW-*) relations. Based on the W2NER scheme, we develop a neural framework in which unified NER is modeled as a 2D grid of word pairs. We then propose multi-granularity 2D convolutions to better refine the grid representations. Finally, a co-predictor is used to reason over the word-word relations. We perform extensive experiments on 14 widely used benchmark datasets for flat, overlapped, and discontinuous NER (8 English and 6 Chinese), where our model beats all current top-performing baselines, pushing forward the state of the art of unified NER.

Label Scheme
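
Since the figure is not reproduced here, a small worked illustration of the scheme may help. The sentence and relations below are adapted from my recollection of the paper's running example (two discontinuous Symptom entities, "aching in legs" and "aching in shoulders"); treat the exact tokens as an assumption and check the paper's figures.

```python
sentence = ["I", "am", "having", "aching", "in", "legs", "and", "shoulders"]

# NNW (Next-Neighboring-Word): each pair links a word to the next word
# belonging to the same entity.
nnw_pairs = [("aching", "in"), ("in", "legs"), ("in", "shoulders")]

# THW-* (Tail-Head-Word): each pair links an entity's tail word back to its
# head word and carries the entity type.
thw_pairs = [("legs", "aching", "THW-Symptom"), ("shoulders", "aching", "THW-Symptom")]

# Decoding follows NNW chains from each head word and closes them with a THW
# edge, recovering both entities, including the shared words "aching" and "in".
```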

Architecture

1. Environments

- python (3.8.12)
- cuda (11.4)

2. Dependencies

- numpy (1.21.4)
- torch (1.10.0)
- gensim (4.1.2)
- transformers (4.13.0)
- pandas (1.3.4)
- scikit-learn (1.0.1)
- prettytable (2.4.0)

3. Dataset

We provide some processed datasets at this link.

4. Preparation

  • Download dataset
  • Process them to fit the same format as the example in data/ (see the format sketch after this list)
  • Put the processed data into the directory data/
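
For reference, here is a sketch of the per-sample structure each JSON file is expected to follow, based on the example shipped in data/; the field names below are assumptions and should be checked against data/example.json.

```python
# Hypothetical sample illustrating the expected data layout (check data/example.json):
sample = {
    "sentence": ["I", "am", "having", "aching", "in", "legs"],  # the tokenized words
    "ner": [
        {"index": [3, 4, 5], "type": "Symptom"},                # word indices + entity type
    ],
}
# train.json / dev.json / test.json would each be a list of such samples.
```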

5. Training

>> python main.py --config ./config/example.json

6. License

This project is licensed under the MIT License - see the LICENSE file for details.

7. Citation

If you use this work or code, please kindly cite this paper:

@inproceedings{li2022unified,
  title={Unified named entity recognition as word-word relation classification},
  author={Li, Jingye and Fei, Hao and Liu, Jiang and Wu, Shengqiong and Zhang, Meishan and Teng, Chong and Ji, Donghong and Li, Fei},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={36},
  number={10},
  pages={10965--10973},
  year={2022}
}

w2ner's People

Contributors

kev123456, ljynlp, scofield7419


w2ner's Issues

About the configuration files

Hi, could you provide the configuration files corresponding to Share_2013 and Share_2014?

Dataset error

I annotated two datasets myself using the same method. The first one trains normally, but the second one crashes after one epoch of training, and I don't know why.

+---------+--------+--------+-----------+--------+
| Train 0 | Loss   | F1     | Precision | Recall |
+---------+--------+--------+-----------+--------+
| Label   | 0.1585 | 0.1423 | 0.1426    | 0.1420 |
+---------+--------+--------+-----------+--------+
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:490: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use zero_division parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
2022-05-24 16:50:44 - INFO: EVAL Label F1 [0.99931388 0. 0. 0. 0. 0.
0. ]
2022-05-24 16:50:44 - INFO:
+--------+--------+-----------+--------+
| EVAL 0 | F1 | Precision | Recall |
+--------+--------+-----------+--------+
| Label | 0.1428 | 0.1427 | 0.1429 |
| Entity | 0.0000 | 0.0000 | 0.0000 |
+--------+--------+-----------+--------+
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:490: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
Traceback (most recent call last):
File "./drive/MyDrive/W2NER/main.py", line 253, in
test_f1 = trainer.eval(i, test_loader, is_test=False) #is_test=True
File "./drive/MyDrive/W2NER/main.py", line 107, in eval
outputs = model(bert_inputs, grid_mask2d, dist_inputs, pieces2word, sent_length)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/content/drive/MyDrive/W2NER/model.py", line 222, in forward
bert_embs = self.bert(input_ids=bert_inputs, attention_mask=bert_inputs.ne(0).float())
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py", line 957, in forward
buffered_token_type_ids_expanded = buffered_token_type_ids.expand(batch_size, seq_length)
RuntimeError: The expanded size of the tensor (602) must match the existing size (512) at non-singleton dimension 1. Target sizes: [2, 602]. Tensor sizes: [1, 512]

The Genia result in the paper is wrong

Hi, the F1 in the last Genia row of Table 3 of your paper appears to be wrong: it is listed as 83.10, which is inconsistent with the reported Precision and Recall, and my reproduction with this repository's script gives roughly 81.30.

Memory issue during model training

Hello, when I train with a 768-dimensional BERT and use only the last layer's weights, the memory of the main GPU fills up and the process is forcibly killed; switching to a model with a smaller hidden size trains fine. Is this model very memory-hungry? Which pretrained model did you use for training, and on what hardware?

Loss frequently becomes NaN during training

The loss frequently becomes NaN during training, so the model cannot be trained. While debugging I found that res_embs becomes NaN. Adjusting batch_size and lr still often gives NaN; could you offer any suggestions?

GPU memory usage keeps growing

Hello, in practical use I found that GPU memory usage grows steadily. Small datasets are fine, but for a very large training set (say several hundred thousand sentences), CUDA_OUT_OF_MEMORY occurs even with batch_size=1. I traced it to the accumulation of prediction results on the GPU.

Original code:

```python
label_result.append(grid_labels)
pred_result.append(outputs)
```

Changing it as follows fixes the problem:

```python
label_result.append(grid_labels.cpu())
pred_result.append(outputs.cpu())
```
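
A slightly more defensive variant of the same fix (a sketch, not the repository's exact code) also detaches the tensors, which drops any lingering autograd references before they are moved off the GPU:

```python
label_result.append(grid_labels.detach().cpu())
pred_result.append(outputs.detach().cpu())
```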

torch.cat error

```
Traceback (most recent call last):
  File "main.py", line 301, in <module>
    trainer.train(i, train_loader)
  File "main.py", line 77, in train
    label_result = torch.cat(label_result)
RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors. Available functions are [CPU, CUDA, QuantizedCPU, Autograd, Profiler, Tracer, Autocast]
```

This happened while running the example demo, without any code changes. Has anyone run into the same problem, and how did you solve it?

NER performance on unstructured data

Hi Team,

I would like your advice on running NER over unstructured data with the W2NER architecture.

The documents may not contain key-value pairs and do not follow any spatial layout pattern.

Will your proposed model work on this type of data?

Please share your suggestions, or point me to any other implementations.

Thanks,
AB

Large gap between Label and Entity scores on my test set

Hello, and thank you for sharing your work. I have a question: when experimenting on a dataset I built myself (Chinese, flat), the precision/recall gap between Label and Entity on the test set is fairly large, while on the dev set it is small. Could something be going wrong during decoding?
2022-04-01 17:47:44 - INFO: Epoch: 9
2022-04-01 17:48:01 - INFO:
+---------+--------+--------+-----------+--------+
| Train 9 | Loss | F1 | Precision | Recall |
+---------+--------+--------+-----------+--------+
| Label | 0.0061 | 0.9698 | 0.9694 | 0.9703 |
+---------+--------+--------+-----------+--------+
2022-04-01 17:48:02 - INFO: EVAL Label F1 [0.99797655 0.98461538 0.94179894 0.90293454 0.85561497 0.99166667
0.8 0.98181818]
2022-04-01 17:48:02 - INFO:
+--------+--------+-----------+--------+
| EVAL 9 | F1 | Precision | Recall |
+--------+--------+-----------+--------+
| Label | 0.9321 | 0.9258 | 0.9389 |
| Entity | 0.9207 | 0.9187 | 0.9226 |
+--------+--------+-----------+--------+
2022-04-01 17:48:03 - INFO: TEST Label F1 [0.99777767 0.985705 0.91578947 0.90581162 0.81385281 0.98876404
0.85964912 1. ]
2022-04-01 17:48:03 - INFO:
+--------+--------+-----------+--------+
| TEST 9 | F1 | Precision | Recall |
+--------+--------+-----------+--------+
| Label | 0.9334 | 0.9176 | 0.9513 |
| Entity | 0.8928 | 0.8799 | 0.9061 |
+--------+--------+-----------+--------+
2022-04-01 17:48:03 - INFO: Best DEV F1: 0.9230
2022-04-01 17:48:03 - INFO: Best TEST F1: 0.8848
2022-04-01 17:48:08 - INFO: TEST Label F1 [0.99751797 0.98505523 0.9197861 0.904 0.79831933 0.98876404
0.84581498 1. ]
2022-04-01 17:48:08 - INFO:
+------------+--------+-----------+--------+
| TEST Final | F1 | Precision | Recall |
+------------+--------+-----------+--------+
| Label | 0.9299 | 0.9131 | 0.9486 |
| Entity | 0.8848 | 0.8688 | 0.9014 |
+------------+--------+-----------+--------+

Hyperparameter settings for ACE2004/ACE2005

Could you share the hyperparameter settings for ACE2004/ACE2005? When we reproduce the ACE datasets using the CoNLL03 settings, the final results differ considerably, and we wonder whether some hyperparameters need to be changed.

Why the random seed has no effect

Hello. When I previously ran the released code I could not reproduce the results in your paper, and after contacting you I was told the CNN module was what made the random seed ineffective.
After debugging the code more carefully, I found that this is not the cause: what breaks seeding is the CLN layer in the model, i.e., the conditional LayerNorm.
Original code:

```python
# Conditional layer normalization: takes [B, L, 1, 512] and [B, L, 512] and produces [B, L, L, 512]
cln = self.cln(word_reps.unsqueeze(2), word_reps)
```

Replacing it with

```python
cln = word_reps.unsqueeze(2).repeat(1, 1, word_reps.shape[1], 1)
```

makes the results reproducible. This is only to confirm that the CLN layer is indeed the problem.
As for how to fix it, I tried enabling the bias inside the CLN, but that still did not work.
Could the author look into how to fix this and analyze what the cause is?
Thanks!

How are multiple overlapping entities handled?

Hi, suppose I have a sequence (A, B, C, D, E, F), where the first entity is A, C, D, F and the second is B, C, E, F. With your strategy there are two plausible decodings for each entity: the first could be decoded as ACDF or ACEF, while the second could be decoded as BCDF or BCEF.

My question is: how does your algorithm handle this situation?

Annotation of Chinese datasets

Did you use label-studio for annotation, or was all the text annotated manually, item by item?

Question about dist_inputs, the implementation of the distance embedding in the paper

I don't quite understand why dist_inputs is designed this way in the code:

```python
for k in range(length):
    _dist_inputs[k, :] += k
    _dist_inputs[:, k] -= k

for i in range(length):
    for j in range(length):
        if _dist_inputs[i, j] < 0:
            _dist_inputs[i, j] = dis2idx[-_dist_inputs[i, j]] + 9
        else:
            _dist_inputs[i, j] = dis2idx[_dist_inputs[i, j]]
_dist_inputs[_dist_inputs == 0] = 19
```

Why does _dist_inputs still need to be mapped through dis2idx, and why are the dis2idx values defined like this?

```python
dis2idx = np.zeros((1000), dtype='int64')
dis2idx[1] = 1
dis2idx[2:] = 2
dis2idx[4:] = 3
dis2idx[8:] = 4
dis2idx[16:] = 5
dis2idx[32:] = 6
dis2idx[64:] = 7
dis2idx[128:] = 8
dis2idx[256:] = 9
```

I don't quite understand what this code does; could the author explain? Is there a deeper reason behind this design?
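
For what it's worth, my reading of the snippet above (an interpretation, not an answer from the authors) is that the signed word distance is bucketed logarithmically, with the +9 offset separating the two directions, so that roughly 20 distance ids index a small embedding table:

```python
import numpy as np

dis2idx = np.zeros(1000, dtype='int64')
dis2idx[1] = 1
dis2idx[2:] = 2   # distances 2-3 -> 2, 4-7 -> 3, 8-15 -> 4, ... (log-scale buckets)
dis2idx[4:] = 3
dis2idx[8:] = 4
dis2idx[16:] = 5
dis2idx[32:] = 6
dis2idx[64:] = 7
dis2idx[128:] = 8
dis2idx[256:] = 9

def distance_bucket(i: int, j: int) -> int:
    """Bucket id for grid cell (i, j), mirroring the _dist_inputs construction."""
    d = i - j                           # signed distance between the two words
    if d == 0:
        return 19                       # the diagonal gets its own id
    if d < 0:
        return int(dis2idx[-d]) + 9     # one direction: ids 10..18
    return int(dis2idx[d])              # other direction: ids 1..9

print(distance_bucket(0, 0), distance_bucket(2, 7), distance_bucket(7, 2))  # 19 12 3
```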

Confusion about the printed TEST Label F1 scores

```python
logger.info('{} Label F1 {}'.format(title,
                                    f1_score(label_result.cpu().numpy(),   # gold labels
                                             pred_result.cpu().numpy(),    # predicted labels
                                             average=None)))
```

Example output: INFO: TEST Label F1 [0.99965146 0.6340702 0.70332481 0.25824176]. What does this line mean? My data has two label types, so is this the F1 of the two labels on train and test?
I then changed the dataset to four label types, but INFO: TEST Label F1 prints only 6 numbers, whereas I expected 8.
I don't understand what exactly this Label F1 output is.
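
One detail that may be relevant here (a general note about the sklearn call above, not an answer about this repository's label set): f1_score with average=None returns one score per distinct class id that actually occurs in the passed arrays, so the length of the printed vector depends on how many grid/relation labels appear during evaluation rather than directly on the number of entity types. A minimal illustration:

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 1, 2, 2, 3]
y_pred = [0, 1, 1, 2, 3, 3]
# One F1 value per class id present in y_true/y_pred (here: classes 0, 1, 2, 3).
print(f1_score(y_true, y_pred, average=None))
```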

NotImplementedError

Hi Team,

I followed your instructions exactly and got the error listed below.
```

+---------+-----------+----------+
| example | sentences | entities |
+---------+-----------+----------+
| train | 1 | 6 |
| dev | 1 | 1 |
| test | 1 | 2 |
+---------+-----------+----------+
2022-05-27 11:20:50 - INFO: Building Model
2022-05-27 11:20:56 - INFO: Epoch: 0
Traceback (most recent call last):
File "main.py", line 299, in
trainer.train(i, train_loader)
File "main.py", line 75, in train
label_result = torch.cat(label_result)
NotImplementedError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors, or that you (the operator writer) forgot to register a fallback function. Available functions are [CPU, CUDA, QuantizedCPU, BackendSelect, Python, Named, Conjugate, Negative, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradLazy, AutogradXPU, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode].

CPU: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/build/aten/src/ATen/RegisterCPU.cpp:18433 [kernel]
CUDA: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/build/aten/src/ATen/RegisterCUDA.cpp:26496 [kernel]
QuantizedCPU: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/build/aten/src/ATen/RegisterQuantizedCPU.cpp:1068 [kernel]
BackendSelect: fallthrough registered at /opt/conda/conda-bld/pytorch_1634272068694/work/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/aten/src/ATen/core/PythonFallbackKernel.cpp:47 [backend fallback]
Named: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/aten/src/ATen/ConjugateFallback.cpp:18 [backend fallback]
Negative: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
ADInplaceOrView: fallthrough registered at /opt/conda/conda-bld/pytorch_1634272068694/work/aten/src/ATen/core/VariableFallbackKernel.cpp:64 [backend fallback]
AutogradOther: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
AutogradCPU: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
AutogradCUDA: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
AutogradXLA: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
AutogradLazy: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
AutogradXPU: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
AutogradMLC: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
AutogradHPU: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
AutogradNestedTensor: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
AutogradPrivateUse1: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
AutogradPrivateUse2: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
AutogradPrivateUse3: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
Tracer: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/TraceType_3.cpp:11560 [kernel]
UNKNOWN_TENSOR_TYPE_ID: fallthrough registered at /opt/conda/conda-bld/pytorch_1634272068694/work/aten/src/ATen/autocast_mode.cpp:466 [backend fallback]
Autocast: fallthrough registered at /opt/conda/conda-bld/pytorch_1634272068694/work/aten/src/ATen/autocast_mode.cpp:305 [backend fallback]
Batched: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at /opt/conda/conda-bld/pytorch_1634272068694/work/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

```

I am attaching my conda env here
w2ner.txt

Maximum input sentence length

After running the model I noticed the maximum sentence length is 512, with shorter sentences padded. I would like to change the maximum length to 300 but cannot find which parameter in which file to modify; could you point me to it?

GPU memory usage

During training I can only set batch_size = 4; larger batches are not possible.
While running trainer.train():

```
+-------------------------------+----------------------+----------------------+
| 1 Tesla T4 Off | 00000000:3C:00.0 Off | 0 |
| N/A 55C P0 77W / 70W | 6330MiB / 15109MiB | 95% Default |
+-------------------------------+----------------------+----------------------+
```

When it reaches trainer.eval():

```
+-------------------------------+----------------------+----------------------+
| 1 Tesla T4 Off | 00000000:3C:00.0 Off | 0 |
| N/A 71C P0 71W / 70W | 12748MiB / 15109MiB | 92% Default |
+-------------------------------+----------------------+----------------------+
```

GPU memory usage roughly doubles and is not released after eval and test finish, which makes it impossible to train with a larger batch_size. Did you also see this behaviour in your runs? The batch sizes you report are all at least 8.
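
A general pattern that usually avoids this (a sketch under my own assumptions, not the repository's exact eval code): run evaluation under torch.no_grad() so activations are not retained for backprop, and accumulate outputs on the CPU.

```python
import torch

def evaluate(model, data_loader, device):
    model.eval()
    preds = []
    with torch.no_grad():                           # no autograd graph is kept during eval
        for batch in data_loader:
            batch = [x.to(device) for x in batch]
            outputs = model(*batch)
            preds.append(outputs.argmax(-1).cpu())  # accumulate on CPU, not GPU
    return torch.cat(preds)
```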

Error on the CMeEE data

[error screenshot]
After debugging, the error comes from this line:
_dist_inputs[i, j] = dis2idx[-_dist_inputs[i, j]] + 9
and I don't know how to fix it.
My processed data is shown below; for text that mixes Chinese and English I split the English into individual characters as well, and I am not sure whether that is the right way to process it.
[data screenshot]

Predicted entities are not returned in order

After running inference with W2NER on my own dataset, the predicted entities do not come out in the order in which they appear in the original text; for example, my text is ABC but the predicted entity order is ACB. Do I need to re-sort them in post-processing, can this be configured at inference time, or is something wrong with my prediction?

How are entity labels learned?

Hello, one thing confused me while reading the paper: the 2D grid ultimately predicts three kinds of relations, and decoding them yields the different entity word sequences, but how is the entity type obtained?
The predicted results do include entity types, but I did not see in the paper how this is achieved. Looking forward to your reply.

Format of the Chinese datasets

Does the sentence field in the Chinese datasets need to end with the full stop "。"? Roughly how many entity types work well, how long should a sentence be, and how many entities per sentence are appropriate? (An analysis using your resume-zh dataset as an example would be helpful.) Thanks.

About the model output

What does the model output in the end? BIO tags (i.e., equivalent to a sequence labeling problem)?

Question about a field in the Chinese datasets

Hello, the resume dataset files contain a word key that looks like word segmentation, but I could not find where the code uses it. Is this key required, and if so, how is it obtained?

Looking forward to your reply.

Inference error

size mismatch for predictor.biaffine.weight: copying a param with shape torch.Size([12, 513, 513]) from checkpoint, the shape in current model is torch.Size([6, 513, 513]).
size mismatch for predictor.linear.weight: copying a param with shape torch.Size([12, 288]) from checkpoint, the shape in current model is torch.Size([6, 288]).
size mismatch for predictor.linear.bias: copying a param with shape torch.Size([12]) from checkpoint, the shape in current model is torch.Size([6]).

When running inference only, loading the model raises the errors above; any help would be appreciated.

Hello, there is a phenomenon I don't quite understand and would appreciate an explanation

2022-03-07 11:53:57 - INFO: Epoch: 0
2022-03-07 11:58:07 - INFO:
+---------+--------+--------+-----------+--------+
| Train 0 | Loss | F1 | Precision | Recall |
+---------+--------+--------+-----------+--------+
| Label | 0.0994 | 0.2583 | 0.2489 | 0.4252 |
+---------+--------+--------+-----------+--------+
2022-03-07 11:58:12 - INFO: EVAL Label F1 [0.99934271 0.98191852 0.98198198 1. 0.96551724 0.92318841
0.92511013 0.67803547 0.73913043 1. ]
2022-03-07 11:58:12 - INFO:
+--------+--------+-----------+--------+
| EVAL 0 | F1 | Precision | Recall |
+--------+--------+-----------+--------+
| Label | 0.9194 | 0.8889 | 0.9705 |
| Entity | 0.9171 | 0.9294 | 0.9051 |
+--------+--------+-----------+--------+
2022-03-07 11:58:18 - INFO: TEST Label F1 [0.99937835 0.97871818 0.96551724 0.98245614 1. 0.9242923
0.92436975 0.68362688 0.78481013 0.90909091]
2022-03-07 11:58:18 - INFO:
+--------+--------+-----------+--------+
| TEST 0 | F1 | Precision | Recall |
+--------+--------+-----------+--------+
| Label | 0.9152 | 0.8914 | 0.9575 |
| Entity | 0.9234 | 0.9536 | 0.8951 |
+--------+--------+-----------+--------+
2022-03-07 11:58:19 - INFO: Epoch: 1
2022-03-07 12:02:25 - INFO:
+---------+--------+--------+-----------+--------+
| Train 1 | Loss | F1 | Precision | Recall |
+---------+--------+--------+-----------+--------+
| Label | 0.0044 | 0.8625 | 0.8954 | 0.8368 |
+---------+--------+--------+-----------+--------+
2022-03-07 12:02:30 - INFO: EVAL Label F1 [0.99968644 0.98618307 0.99547511 1. 0.96551724 0.94186047
0.94977169 0.89143866 0.91891892 1. ]
2022-03-07 12:02:30 - INFO:
+--------+--------+-----------+--------+
| EVAL 1 | F1 | Precision | Recall |
+--------+--------+-----------+--------+
| Label | 0.9649 | 0.9564 | 0.9750 |
| Entity | 0.9459 | 0.9459 | 0.9459 |
+--------+--------+-----------+--------+
2022-03-07 12:02:36 - INFO: TEST Label F1 [0.99974417 0.98823244 0.99555556 1. 1. 0.94695481
0.96069869 0.89397795 0.94117647 0.90909091]
2022-03-07 12:02:36 - INFO:
+--------+--------+-----------+--------+
| TEST 1 | F1 | Precision | Recall |
+--------+--------+-----------+--------+
| Label | 0.9635 | 0.9638 | 0.9658 |
| Entity | 0.9515 | 0.9589 | 0.9442 |
+--------+--------+-----------+--------+

In the first training epoch, the training F1 and precision are very low, yet the EVAL and TEST metrics already reach fairly high values. Does this suggest that the Label and Entity scores are not strongly tied to how well the model has actually trained?

About the CLN layer

Could you explain in detail how the CLN layer works, ideally with an example?
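
For readers hitting the same question, here is a minimal sketch of conditional layer normalization as I understand it from the paper's description (my own paraphrase, not code copied from this repository): instead of learning a fixed gain and bias, a condition vector generates them, so the representation of word j is normalized under a scale and shift produced from word i, which yields the pair-wise grid representation.

```python
import torch
import torch.nn as nn

class ConditionalLayerNorm(nn.Module):
    """Sketch of CLN: gamma/beta are produced from a condition vector."""

    def __init__(self, hidden_size: int, eps: float = 1e-12):
        super().__init__()
        self.eps = eps
        self.to_gamma = nn.Linear(hidden_size, hidden_size)   # condition -> gain
        self.to_beta = nn.Linear(hidden_size, hidden_size)    # condition -> bias

    def forward(self, x: torch.Tensor, condition: torch.Tensor) -> torch.Tensor:
        # x:         [B, L, 1, H]  word representations to be normalized
        # condition: [B, L, H]     word representations supplying gamma/beta
        gamma = self.to_gamma(condition).unsqueeze(1)          # [B, 1, L, H]
        beta = self.to_beta(condition).unsqueeze(1)            # [B, 1, L, H]
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, unbiased=False, keepdim=True)
        x_norm = (x - mean) / torch.sqrt(var + self.eps)       # plain LayerNorm core
        return gamma * x_norm + beta                           # broadcasts to [B, L, L, H]

word_reps = torch.randn(2, 5, 8)                               # [B=2, L=5, H=8]
cln = ConditionalLayerNorm(8)
grid = cln(word_reps.unsqueeze(2), word_reps)
print(grid.shape)                                              # torch.Size([2, 5, 5, 8])
```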

About the loss computation and the final word-pair relation tensor

Hi, there are a few things I still don't understand; I come from a different field, so please bear with me. First, the loss: the paper describes a negative log-likelihood loss, while the code seems to use cross-entropy. How is this loss computed? The co-predictor produces the word-pair relation classification matrix, so how exactly is the loss computed over that matrix? Second, the paper says the Co-Predictor layer outputs a relation score tensor which, after a softmax, becomes the Word-Word Relation Classification tensor. In the code, the co-predictor output outputs [B, L, L, num_class] is passed through torch.argmax(outputs, -1) to obtain the word-pair relation tensor [B, L, L]; since torch.argmax returns the index of the maximum value, how does this yield the Word-Word Relation Classification? I would really appreciate an explanation.
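
For anyone with the same question, here is a minimal sketch of how a grid of relation logits is typically trained and decoded (an illustration under my own assumptions, not the repository's exact code). Cross-entropy on the logits is exactly the negative log-likelihood of the gold relation label, so the two descriptions coincide; taking argmax over the class dimension then picks the most probable relation for each word pair, and the softmax can be skipped for the hard decision because it does not change which class is largest.

```python
import torch
import torch.nn.functional as F

B, L, num_class = 2, 4, 3
logits = torch.randn(B, L, L, num_class)             # co-predictor output (hypothetical values)
gold = torch.randint(0, num_class, (B, L, L))         # gold word-pair relation labels
mask = torch.ones(B, L, L, dtype=torch.bool)          # marks valid (non-padded) grid cells

loss = F.cross_entropy(logits[mask], gold[mask])      # NLL of the gold label per grid cell
pred_relations = logits.argmax(dim=-1)                # [B, L, L] predicted relation ids
print(loss.item(), pred_relations.shape)
```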

Pretrained model version

Hello, the paper mentions d_h values of 768 and 1024, corresponding to bert-base and bert-large. Which model size was used for the reported results, base or large? And was the same model size used for all datasets?

On dataset & processing codes

Hi! Thank you for sharing your work!

Could you share your data processing codes?

Also, this link seems to require access rights. Could you make it shareable with everyone?

Thank you very much.
