ljynlp / w2ner

Source code for AAAI 2022 paper: Unified Named Entity Recognition as Word-Word Relation Classification

License: MIT License

Python 100.00%
named-entity-recognition ner

w2ner's Introduction

Unified Named Entity Recognition as Word-Word Relation Classification

Source code for AAAI 2022 paper: Unified Named Entity Recognition as Word-Word Relation Classification

So far, named entity recognition (NER) has involved three major types, namely flat, overlapped (aka nested), and discontinuous NER, which have mostly been studied separately. Recently there has been growing interest in unified NER, which tackles all three tasks concurrently with a single model. The current best-performing methods are mainly span-based and sequence-to-sequence models; unfortunately, the former focus only on boundary identification, while the latter may suffer from exposure bias. In this work, we present a novel alternative that models unified NER as word-word relation classification, namely W2NER. The architecture resolves the key bottleneck of unified NER by effectively modeling the neighboring relations between entity words with Next-Neighboring-Word (NNW) and Tail-Head-Word-* (THW-*) relations. Based on the W2NER scheme, we develop a neural framework in which unified NER is modeled as a 2D grid of word pairs. We then propose multi-granularity 2D convolutions to better refine the grid representations. Finally, a co-predictor is used to reason over the word-word relations. We perform extensive experiments on 14 widely used benchmark datasets for flat, overlapped, and discontinuous NER (8 English and 6 Chinese), where our model beats all current top-performing baselines, pushing forward the state of the art of unified NER.

Label Scheme
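
Since the figure is not reproduced here, a small worked illustration of the scheme may help. The sentence and relations below are adapted from my recollection of the paper's running example (two discontinuous Symptom entities, "aching in legs" and "aching in shoulders"); treat the exact tokens as an assumption and check the paper's figures.

```python
sentence = ["I", "am", "having", "aching", "in", "legs", "and", "shoulders"]

# NNW (Next-Neighboring-Word): each pair links a word to the next word
# belonging to the same entity.
nnw_pairs = [("aching", "in"), ("in", "legs"), ("in", "shoulders")]

# THW-* (Tail-Head-Word): each pair links an entity's tail word back to its
# head word and carries the entity type.
thw_pairs = [("legs", "aching", "THW-Symptom"), ("shoulders", "aching", "THW-Symptom")]

# Decoding follows NNW chains from each head word and closes them with a THW
# edge, recovering both entities, including the shared words "aching" and "in".
```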

Architecture

1. Environments

- python (3.8.12)
- cuda (11.4)

2. Dependencies

- numpy (1.21.4)
- torch (1.10.0)
- gensim (4.1.2)
- transformers (4.13.0)
- pandas (1.3.4)
- scikit-learn (1.0.1)
- prettytable (2.4.0)

3. Dataset

We provide some processed datasets at this link.

4. Preparation

  • Download dataset
  • Process them to fit the same format as the example in data/ (see the format sketch after this list)
  • Put the processed data into the directory data/
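
For reference, here is a sketch of the per-sample structure each JSON file is expected to follow, based on the example shipped in data/; the field names below are assumptions and should be checked against data/example.json.

```python
# Hypothetical sample illustrating the expected data layout (check data/example.json):
sample = {
    "sentence": ["I", "am", "having", "aching", "in", "legs"],  # the tokenized words
    "ner": [
        {"index": [3, 4, 5], "type": "Symptom"},                # word indices + entity type
    ],
}
# train.json / dev.json / test.json would each be a list of such samples.
```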

5. Training

>> python main.py --config ./config/example.json

6. License

This project is licensed under the MIT License - see the LICENSE file for details.

7. Citation

If you use this work or code, please kindly cite this paper:

@inproceedings{li2022unified,
  title={Unified named entity recognition as word-word relation classification},
  author={Li, Jingye and Fei, Hao and Liu, Jiang and Wu, Shengqiong and Zhang, Meishan and Teng, Chong and Ji, Donghong and Li, Fei},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={36},
  number={10},
  pages={10965--10973},
  year={2022}
}

w2ner's People

Contributors

kev123456, ljynlp, scofield7419


w2ner's Issues

About the configuration files

Hi, could you provide the configuration files corresponding to Share_2013 and Share_2014?

Dataset error

I annotated two datasets myself using the same method. The first one trains normally, but the second one crashes after one epoch of training, and I don't know why.

+---------+--------+--------+-----------+--------+
| Train 0 | Loss   | F1     | Precision | Recall |
+---------+--------+--------+-----------+--------+
| Label   | 0.1585 | 0.1423 | 0.1426    | 0.1420 |
+---------+--------+--------+-----------+--------+
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:490: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use zero_division parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
2022-05-24 16:50:44 - INFO: EVAL Label F1 [0.99931388 0. 0. 0. 0. 0.
0. ]
2022-05-24 16:50:44 - INFO:
+--------+--------+-----------+--------+
| EVAL 0 | F1 | Precision | Recall |
+--------+--------+-----------+--------+
| Label | 0.1428 | 0.1427 | 0.1429 |
| Entity | 0.0000 | 0.0000 | 0.0000 |
+--------+--------+-----------+--------+
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:490: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
Traceback (most recent call last):
File "./drive/MyDrive/W2NER/main.py", line 253, in
test_f1 = trainer.eval(i, test_loader, is_test=False) #is_test=True
File "./drive/MyDrive/W2NER/main.py", line 107, in eval
outputs = model(bert_inputs, grid_mask2d, dist_inputs, pieces2word, sent_length)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/content/drive/MyDrive/W2NER/model.py", line 222, in forward
bert_embs = self.bert(input_ids=bert_inputs, attention_mask=bert_inputs.ne(0).float())
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py", line 957, in forward
buffered_token_type_ids_expanded = buffered_token_type_ids.expand(batch_size, seq_length)
RuntimeError: The expanded size of the tensor (602) must match the existing size (512) at non-singleton dimension 1. Target sizes: [2, 602]. Tensor sizes: [1, 512]

The Genia result in the paper is wrong

Hi, the F1 in the last Genia row of Table 3 of your paper appears to be wrong: it is listed as 83.10, which is inconsistent with the reported Precision and Recall, and my reproduction with this repository's script gives roughly 81.30.

Memory issue during model training

Hello, when I train with a 768-dimensional BERT and use only the last layer's weights, the memory of the main GPU fills up and the process is forcibly killed; switching to a model with a smaller hidden size trains fine. Is this model very memory-hungry? Which pretrained model did you use for training, and on what hardware?

Loss frequently becomes NaN during training

The loss frequently becomes NaN during training, so the model cannot be trained. While debugging I found that res_embs becomes NaN. Adjusting batch_size and lr still often gives NaN; could you offer any suggestions?

GPU memory usage keeps growing

Hello, in practical use I found that GPU memory usage grows steadily. Small datasets are fine, but for a very large training set (say several hundred thousand sentences), CUDA_OUT_OF_MEMORY occurs even with batch_size=1. I traced it to the accumulation of prediction results on the GPU.

Original code:

```python
label_result.append(grid_labels)
pred_result.append(outputs)
```

Changing it as follows fixes the problem:

```python
label_result.append(grid_labels.cpu())
pred_result.append(outputs.cpu())
```
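
A slightly more defensive variant of the same fix (a sketch, not the repository's exact code) also detaches the tensors, which drops any lingering autograd references before they are moved off the GPU:

```python
label_result.append(grid_labels.detach().cpu())
pred_result.append(outputs.detach().cpu())
```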

torch.cat error

```
Traceback (most recent call last):
  File "main.py", line 301, in <module>
    trainer.train(i, train_loader)
  File "main.py", line 77, in train
    label_result = torch.cat(label_result)
RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors. Available functions are [CPU, CUDA, QuantizedCPU, Autograd, Profiler, Tracer, Autocast]
```

This happened while running the example demo, without any code changes. Has anyone run into the same problem, and how did you solve it?

NER performance on unstructured data

Hi Team,

I would like your advice on running NER over unstructured data with the W2NER architecture.

The documents may not contain key-value pairs and do not follow any spatial layout pattern.

Will your proposed model work on this type of data?

Please share your suggestions, or point me to any other implementations.

Thanks,
AB

Large gap between Label and Entity scores on my test set

Hello, and thank you for sharing your work. I have a question: when experimenting on a dataset I built myself (Chinese, flat), the precision/recall gap between Label and Entity on the test set is fairly large, while on the dev set it is small. Could something be going wrong during decoding?
2022-04-01 17:47:44 - INFO: Epoch: 9
2022-04-01 17:48:01 - INFO:
+---------+--------+--------+-----------+--------+
| Train 9 | Loss | F1 | Precision | Recall |
+---------+--------+--------+-----------+--------+
| Label | 0.0061 | 0.9698 | 0.9694 | 0.9703 |
+---------+--------+--------+-----------+--------+
2022-04-01 17:48:02 - INFO: EVAL Label F1 [0.99797655 0.98461538 0.94179894 0.90293454 0.85561497 0.99166667
0.8 0.98181818]
2022-04-01 17:48:02 - INFO:
+--------+--------+-----------+--------+
| EVAL 9 | F1 | Precision | Recall |
+--------+--------+-----------+--------+
| Label | 0.9321 | 0.9258 | 0.9389 |
| Entity | 0.9207 | 0.9187 | 0.9226 |
+--------+--------+-----------+--------+
2022-04-01 17:48:03 - INFO: TEST Label F1 [0.99777767 0.985705 0.91578947 0.90581162 0.81385281 0.98876404
0.85964912 1. ]
2022-04-01 17:48:03 - INFO:
+--------+--------+-----------+--------+
| TEST 9 | F1 | Precision | Recall |
+--------+--------+-----------+--------+
| Label | 0.9334 | 0.9176 | 0.9513 |
| Entity | 0.8928 | 0.8799 | 0.9061 |
+--------+--------+-----------+--------+
2022-04-01 17:48:03 - INFO: Best DEV F1: 0.9230
2022-04-01 17:48:03 - INFO: Best TEST F1: 0.8848
2022-04-01 17:48:08 - INFO: TEST Label F1 [0.99751797 0.98505523 0.9197861 0.904 0.79831933 0.98876404
0.84581498 1. ]
2022-04-01 17:48:08 - INFO:
+------------+--------+-----------+--------+
| TEST Final | F1 | Precision | Recall |
+------------+--------+-----------+--------+
| Label | 0.9299 | 0.9131 | 0.9486 |
| Entity | 0.8848 | 0.8688 | 0.9014 |
+------------+--------+-----------+--------+

Hyperparameter settings for ACE2004/ACE2005

Could you share the hyperparameter settings for ACE2004/ACE2005? When we reproduce the ACE datasets using the CoNLL03 settings, the final results differ considerably, and we wonder whether some hyperparameters need to be changed.

Why the random seed has no effect

Hello. When I previously ran the released code I could not reproduce the results in your paper, and after contacting you I was told the CNN module was what made the random seed ineffective.
After debugging the code more carefully, I found that this is not the cause: what breaks seeding is the CLN layer in the model, i.e., the conditional LayerNorm.
Original code:

```python
# Conditional layer normalization: takes [B, L, 1, 512] and [B, L, 512] and produces [B, L, L, 512]
cln = self.cln(word_reps.unsqueeze(2), word_reps)
```

Replacing it with

```python
cln = word_reps.unsqueeze(2).repeat(1, 1, word_reps.shape[1], 1)
```

makes the results reproducible. This is only to confirm that the CLN layer is indeed the problem.
As for how to fix it, I tried enabling the bias inside the CLN, but that still did not work.
Could the author look into how to fix this and analyze what the cause is?
Thanks!

How are multiple overlapping entities handled?

Hi, suppose I have a sequence (A, B, C, D, E, F), where the first entity is A, C, D, F and the second is B, C, E, F. With your strategy there are two plausible decodings for each entity: the first could be decoded as ACDF or ACEF, while the second could be decoded as BCDF or BCEF.

My question is: how does your algorithm handle this situation?

Annotation of Chinese datasets

Did you use label-studio for annotation, or was all the text annotated manually, item by item?

Question about dist_inputs, the implementation of the distance embedding in the paper

I don't quite understand why dist_inputs is designed this way in the code:

```python
for k in range(length):
    _dist_inputs[k, :] += k
    _dist_inputs[:, k] -= k

for i in range(length):
    for j in range(length):
        if _dist_inputs[i, j] < 0:
            _dist_inputs[i, j] = dis2idx[-_dist_inputs[i, j]] + 9
        else:
            _dist_inputs[i, j] = dis2idx[_dist_inputs[i, j]]
_dist_inputs[_dist_inputs == 0] = 19
```

Why does _dist_inputs still need to be mapped through dis2idx, and why are the dis2idx values defined like this?

```python
dis2idx = np.zeros((1000), dtype='int64')
dis2idx[1] = 1
dis2idx[2:] = 2
dis2idx[4:] = 3
dis2idx[8:] = 4
dis2idx[16:] = 5
dis2idx[32:] = 6
dis2idx[64:] = 7
dis2idx[128:] = 8
dis2idx[256:] = 9
```

I don't quite understand what this code does; could the author explain? Is there a deeper reason behind this design?
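
For what it's worth, my reading of the snippet above (an interpretation, not an answer from the authors) is that the signed word distance is bucketed logarithmically, with the +9 offset separating the two directions, so that roughly 20 distance ids index a small embedding table:

```python
import numpy as np

dis2idx = np.zeros(1000, dtype='int64')
dis2idx[1] = 1
dis2idx[2:] = 2   # distances 2-3 -> 2, 4-7 -> 3, 8-15 -> 4, ... (log-scale buckets)
dis2idx[4:] = 3
dis2idx[8:] = 4
dis2idx[16:] = 5
dis2idx[32:] = 6
dis2idx[64:] = 7
dis2idx[128:] = 8
dis2idx[256:] = 9

def distance_bucket(i: int, j: int) -> int:
    """Bucket id for grid cell (i, j), mirroring the _dist_inputs construction."""
    d = i - j                           # signed distance between the two words
    if d == 0:
        return 19                       # the diagonal gets its own id
    if d < 0:
        return int(dis2idx[-d]) + 9     # one direction: ids 10..18
    return int(dis2idx[d])              # other direction: ids 1..9

print(distance_bucket(0, 0), distance_bucket(2, 7), distance_bucket(7, 2))  # 19 12 3
```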

Confusion about the printed TEST Label F1 scores

```python
logger.info('{} Label F1 {}'.format(title,
                                    f1_score(label_result.cpu().numpy(),   # gold labels
                                             pred_result.cpu().numpy(),    # predicted labels
                                             average=None)))
```

Example output: INFO: TEST Label F1 [0.99965146 0.6340702 0.70332481 0.25824176]. What does this line mean? My data has two label types, so is this the F1 of the two labels on train and test?
I then changed the dataset to four label types, but INFO: TEST Label F1 prints only 6 numbers, whereas I expected 8.
I don't understand what exactly this Label F1 output is.
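
One detail that may be relevant here (a general note about the sklearn call above, not an answer about this repository's label set): f1_score with average=None returns one score per distinct class id that actually occurs in the passed arrays, so the length of the printed vector depends on how many grid/relation labels appear during evaluation rather than directly on the number of entity types. A minimal illustration:

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 1, 2, 2, 3]
y_pred = [0, 1, 1, 2, 3, 3]
# One F1 value per class id present in y_true/y_pred (here: classes 0, 1, 2, 3).
print(f1_score(y_true, y_pred, average=None))
```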

NotImplementedError

Hi Team,

I followed your instructions exactly and got the error listed below.
```

+---------+-----------+----------+
| example | sentences | entities |
+---------+-----------+----------+
| train | 1 | 6 |
| dev | 1 | 1 |
| test | 1 | 2 |
+---------+-----------+----------+
2022-05-27 11:20:50 - INFO: Building Model
2022-05-27 11:20:56 - INFO: Epoch: 0
Traceback (most recent call last):
File "main.py", line 299, in
trainer.train(i, train_loader)
File "main.py", line 75, in train
label_result = torch.cat(label_result)
NotImplementedError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors, or that you (the operator writer) forgot to register a fallback function. Available functions are [CPU, CUDA, QuantizedCPU, BackendSelect, Python, Named, Conjugate, Negative, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradLazy, AutogradXPU, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode].

CPU: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/build/aten/src/ATen/RegisterCPU.cpp:18433 [kernel]
CUDA: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/build/aten/src/ATen/RegisterCUDA.cpp:26496 [kernel]
QuantizedCPU: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/build/aten/src/ATen/RegisterQuantizedCPU.cpp:1068 [kernel]
BackendSelect: fallthrough registered at /opt/conda/conda-bld/pytorch_1634272068694/work/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/aten/src/ATen/core/PythonFallbackKernel.cpp:47 [backend fallback]
Named: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/aten/src/ATen/ConjugateFallback.cpp:18 [backend fallback]
Negative: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
ADInplaceOrView: fallthrough registered at /opt/conda/conda-bld/pytorch_1634272068694/work/aten/src/ATen/core/VariableFallbackKernel.cpp:64 [backend fallback]
AutogradOther: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
AutogradCPU: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
AutogradCUDA: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
AutogradXLA: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
AutogradLazy: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
AutogradXPU: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
AutogradMLC: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
AutogradHPU: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
AutogradNestedTensor: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
AutogradPrivateUse1: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
AutogradPrivateUse2: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
AutogradPrivateUse3: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/VariableType_3.cpp:10141 [autograd kernel]
Tracer: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/torch/csrc/autograd/generated/TraceType_3.cpp:11560 [kernel]
UNKNOWN_TENSOR_TYPE_ID: fallthrough registered at /opt/conda/conda-bld/pytorch_1634272068694/work/aten/src/ATen/autocast_mode.cpp:466 [backend fallback]
Autocast: fallthrough registered at /opt/conda/conda-bld/pytorch_1634272068694/work/aten/src/ATen/autocast_mode.cpp:305 [backend fallback]
Batched: registered at /opt/conda/conda-bld/pytorch_1634272068694/work/aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at /opt/conda/conda-bld/pytorch_1634272068694/work/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

```

I am attaching my conda env here
w2ner.txt

Maximum input sentence length

After running the model I noticed the maximum sentence length is 512, with shorter sentences padded. I would like to change the maximum length to 300 but cannot find which parameter in which file to modify; could you point me to it?

GPU memory usage

During training I can only set batch_size = 4; larger batches are not possible.
While running trainer.train():

```
+-------------------------------+----------------------+----------------------+
| 1 Tesla T4 Off | 00000000:3C:00.0 Off | 0 |
| N/A 55C P0 77W / 70W | 6330MiB / 15109MiB | 95% Default |
+-------------------------------+----------------------+----------------------+
```

When it reaches trainer.eval():

```
+-------------------------------+----------------------+----------------------+
| 1 Tesla T4 Off | 00000000:3C:00.0 Off | 0 |
| N/A 71C P0 71W / 70W | 12748MiB / 15109MiB | 92% Default |
+-------------------------------+----------------------+----------------------+
```

GPU memory usage roughly doubles and is not released after eval and test finish, which makes it impossible to train with a larger batch_size. Did you also see this behaviour in your runs? The batch sizes you report are all at least 8.
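
A general pattern that usually avoids this (a sketch under my own assumptions, not the repository's exact eval code): run evaluation under torch.no_grad() so activations are not retained for backprop, and accumulate outputs on the CPU.

```python
import torch

def evaluate(model, data_loader, device):
    model.eval()
    preds = []
    with torch.no_grad():                           # no autograd graph is kept during eval
        for batch in data_loader:
            batch = [x.to(device) for x in batch]
            outputs = model(*batch)
            preds.append(outputs.argmax(-1).cpu())  # accumulate on CPU, not GPU
    return torch.cat(preds)
```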

Error on the CMeEE data

[error screenshot]
After debugging, the error comes from this line:
_dist_inputs[i, j] = dis2idx[-_dist_inputs[i, j]] + 9
and I don't know how to fix it.
My processed data is shown below; for text that mixes Chinese and English I split the English into individual characters as well, and I am not sure whether that is the right way to process it.
[data screenshot]

Predicted entities are not returned in order

After running inference with W2NER on my own dataset, the predicted entities do not come out in the order in which they appear in the original text; for example, my text is ABC but the predicted entity order is ACB. Do I need to re-sort them in post-processing, can this be configured at inference time, or is something wrong with my prediction?

How are entity labels learned?

Hello, one thing confused me while reading the paper: the 2D grid ultimately predicts three kinds of relations, and decoding them yields the different entity word sequences, but how is the entity type obtained?
The predicted results do include entity types, but I did not see in the paper how this is achieved. Looking forward to your reply.

Format of the Chinese datasets

Does the sentence field in the Chinese datasets need to end with the full stop "。"? Roughly how many entity types work well, how long should a sentence be, and how many entities per sentence are appropriate? (An analysis using your resume-zh dataset as an example would be helpful.) Thanks.

About the model output

What does the model output in the end? BIO tags (i.e., equivalent to a sequence labeling problem)?

Question about a field in the Chinese datasets

Hello, the resume dataset files contain a word key that looks like word segmentation, but I could not find where the code uses it. Is this key required, and if so, how is it obtained?

Looking forward to your reply.

Inference error

size mismatch for predictor.biaffine.weight: copying a param with shape torch.Size([12, 513, 513]) from checkpoint, the shape in current model is torch.Size([6, 513, 513]).
size mismatch for predictor.linear.weight: copying a param with shape torch.Size([12, 288]) from checkpoint, the shape in current model is torch.Size([6, 288]).
size mismatch for predictor.linear.bias: copying a param with shape torch.Size([12]) from checkpoint, the shape in current model is torch.Size([6]).

When running inference only, loading the model raises the errors above; any help would be appreciated.

Hello, there is a phenomenon I don't quite understand and would appreciate an explanation

2022-03-07 11:53:57 - INFO: Epoch: 0
2022-03-07 11:58:07 - INFO:
+---------+--------+--------+-----------+--------+
| Train 0 | Loss | F1 | Precision | Recall |
+---------+--------+--------+-----------+--------+
| Label | 0.0994 | 0.2583 | 0.2489 | 0.4252 |
+---------+--------+--------+-----------+--------+
2022-03-07 11:58:12 - INFO: EVAL Label F1 [0.99934271 0.98191852 0.98198198 1. 0.96551724 0.92318841
0.92511013 0.67803547 0.73913043 1. ]
2022-03-07 11:58:12 - INFO:
+--------+--------+-----------+--------+
| EVAL 0 | F1 | Precision | Recall |
+--------+--------+-----------+--------+
| Label | 0.9194 | 0.8889 | 0.9705 |
| Entity | 0.9171 | 0.9294 | 0.9051 |
+--------+--------+-----------+--------+
2022-03-07 11:58:18 - INFO: TEST Label F1 [0.99937835 0.97871818 0.96551724 0.98245614 1. 0.9242923
0.92436975 0.68362688 0.78481013 0.90909091]
2022-03-07 11:58:18 - INFO:
+--------+--------+-----------+--------+
| TEST 0 | F1 | Precision | Recall |
+--------+--------+-----------+--------+
| Label | 0.9152 | 0.8914 | 0.9575 |
| Entity | 0.9234 | 0.9536 | 0.8951 |
+--------+--------+-----------+--------+
2022-03-07 11:58:19 - INFO: Epoch: 1
2022-03-07 12:02:25 - INFO:
+---------+--------+--------+-----------+--------+
| Train 1 | Loss | F1 | Precision | Recall |
+---------+--------+--------+-----------+--------+
| Label | 0.0044 | 0.8625 | 0.8954 | 0.8368 |
+---------+--------+--------+-----------+--------+
2022-03-07 12:02:30 - INFO: EVAL Label F1 [0.99968644 0.98618307 0.99547511 1. 0.96551724 0.94186047
0.94977169 0.89143866 0.91891892 1. ]
2022-03-07 12:02:30 - INFO:
+--------+--------+-----------+--------+
| EVAL 1 | F1 | Precision | Recall |
+--------+--------+-----------+--------+
| Label | 0.9649 | 0.9564 | 0.9750 |
| Entity | 0.9459 | 0.9459 | 0.9459 |
+--------+--------+-----------+--------+
2022-03-07 12:02:36 - INFO: TEST Label F1 [0.99974417 0.98823244 0.99555556 1. 1. 0.94695481
0.96069869 0.89397795 0.94117647 0.90909091]
2022-03-07 12:02:36 - INFO:
+--------+--------+-----------+--------+
| TEST 1 | F1 | Precision | Recall |
+--------+--------+-----------+--------+
| Label | 0.9635 | 0.9638 | 0.9658 |
| Entity | 0.9515 | 0.9589 | 0.9442 |
+--------+--------+-----------+--------+

In the first training epoch, the training F1 and precision are very low, yet the EVAL and TEST metrics already reach fairly high values. Does this suggest that the Label and Entity scores are not strongly tied to how well the model has actually trained?

About the CLN layer

Could you explain in detail how the CLN layer works, ideally with an example?
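
For readers hitting the same question, here is a minimal sketch of conditional layer normalization as I understand it from the paper's description (my own paraphrase, not code copied from this repository): instead of learning a fixed gain and bias, a condition vector generates them, so the representation of word j is normalized under a scale and shift produced from word i, which yields the pair-wise grid representation.

```python
import torch
import torch.nn as nn

class ConditionalLayerNorm(nn.Module):
    """Sketch of CLN: gamma/beta are produced from a condition vector."""

    def __init__(self, hidden_size: int, eps: float = 1e-12):
        super().__init__()
        self.eps = eps
        self.to_gamma = nn.Linear(hidden_size, hidden_size)   # condition -> gain
        self.to_beta = nn.Linear(hidden_size, hidden_size)    # condition -> bias

    def forward(self, x: torch.Tensor, condition: torch.Tensor) -> torch.Tensor:
        # x:         [B, L, 1, H]  word representations to be normalized
        # condition: [B, L, H]     word representations supplying gamma/beta
        gamma = self.to_gamma(condition).unsqueeze(1)          # [B, 1, L, H]
        beta = self.to_beta(condition).unsqueeze(1)            # [B, 1, L, H]
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, unbiased=False, keepdim=True)
        x_norm = (x - mean) / torch.sqrt(var + self.eps)       # plain LayerNorm core
        return gamma * x_norm + beta                           # broadcasts to [B, L, L, H]

word_reps = torch.randn(2, 5, 8)                               # [B=2, L=5, H=8]
cln = ConditionalLayerNorm(8)
grid = cln(word_reps.unsqueeze(2), word_reps)
print(grid.shape)                                              # torch.Size([2, 5, 5, 8])
```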

About the loss computation and the final word-pair relation tensor

Hi, there are a few things I still don't understand; I come from a different field, so please bear with me. First, the loss: the paper describes a negative log-likelihood loss, while the code seems to use cross-entropy. How is this loss computed? The co-predictor produces the word-pair relation classification matrix, so how exactly is the loss computed over that matrix? Second, the paper says the Co-Predictor layer outputs a relation score tensor which, after a softmax, becomes the Word-Word Relation Classification tensor. In the code, the co-predictor output outputs [B, L, L, num_class] is passed through torch.argmax(outputs, -1) to obtain the word-pair relation tensor [B, L, L]; since torch.argmax returns the index of the maximum value, how does this yield the Word-Word Relation Classification? I would really appreciate an explanation.
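
For anyone with the same question, here is a minimal sketch of how a grid of relation logits is typically trained and decoded (an illustration under my own assumptions, not the repository's exact code). Cross-entropy on the logits is exactly the negative log-likelihood of the gold relation label, so the two descriptions coincide; taking argmax over the class dimension then picks the most probable relation for each word pair, and the softmax can be skipped for the hard decision because it does not change which class is largest.

```python
import torch
import torch.nn.functional as F

B, L, num_class = 2, 4, 3
logits = torch.randn(B, L, L, num_class)             # co-predictor output (hypothetical values)
gold = torch.randint(0, num_class, (B, L, L))         # gold word-pair relation labels
mask = torch.ones(B, L, L, dtype=torch.bool)          # marks valid (non-padded) grid cells

loss = F.cross_entropy(logits[mask], gold[mask])      # NLL of the gold label per grid cell
pred_relations = logits.argmax(dim=-1)                # [B, L, L] predicted relation ids
print(loss.item(), pred_relations.shape)
```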

Pretrained model version

Hello, the paper mentions d_h values of 768 and 1024, corresponding to bert-base and bert-large. Which model size was used for the reported results, base or large? And was the same model size used for all datasets?

On dataset & processing codes

Hi! Thank you for sharing your work!

Could you share your data processing codes?

Also, this link seems to require access rights. Could you make it shareable with everyone?

Thank you very much.
