gaoq1 / ner-slot_filling Goto Github PK



Entity extraction and intent recognition for Chinese natural language (Natural Language Understanding), with a choice of Bi-LSTM + CRF or IDCNN + CRF.

Python 100.00%

nlu slot slot-filling ner nlp bi-lstm crf idcnn medicle emr

ner-slot_filling's Introduction

NLU project

This project performs entity extraction and intent classification (slot filling and intent classification).

Corpus processing

python gen_cooked_corpus_and_w2v.py

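The command above generates the corpus the model needs, split in a 1:2:13 ratio into test, dev, and train data. It also uses gensim to generate word vectors; these can instead be trained on a larger corpus.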
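A minimal sketch of a 1:2:13 test/dev/train split, assuming the corpus is a list of samples that may be shuffled first. split_corpus and its seed parameter are hypothetical and not taken from gen_cooked_corpus_and_w2v.py, which may order or round differently.

```python
import random


def split_corpus(samples, ratios=(1, 2, 13), seed=42):
    """Shuffle samples and split them into (test, dev, train) by ratio.

    Hypothetical helper: the project's own script may split differently
    (for example without shuffling).
    """
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    total = sum(ratios)
    n_test = len(items) * ratios[0] // total
    n_dev = len(items) * ratios[1] // total
    test = items[:n_test]
    dev = items[n_test:n_test + n_dev]
    train = items[n_test + n_dev:]  # any rounding remainder goes to train
    return test, dev, train


if __name__ == "__main__":
    test, dev, train = split_corpus(range(160))
    print(len(test), len(dev), len(train))  # 10 20 130
```

With 160 samples this yields 10 test, 20 dev, and 130 train samples; integer rounding sends any leftover samples to the training set.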

Training

python train_evaluate.py --clean True --train True --model_type bilstm

The command above trains with Bi-LSTM; IDCNN can be selected instead.

Testing

python train_evaluate.py --train False

ner-slot_filling's Issues

ub16c9@ub16c9-gpu:~/ub16_prj/ner-slot_filling$ python3.6 train_evaluate.py --clean True --train True --model_type bilstm
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.669 seconds.
Prefix dict has been built succesfully.
开始训练模型！！！
14253it [00:00, 291884.34it/s]
Python 3.6.8 (default, Dec 24 2018, 19:24:27)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)

How can this be integrated into the Rasa framework?

As the title says; any advice would be appreciated.

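Running the training command python train_evaluate.py --clean True --train True --model_type bilstm drops straight into the Python interactive console

Hello, I'm using Python 3.6, and training produces the following: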

Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.699 seconds.
Prefix dict has been built succesfully.
开始训练模型！！！
13724it [00:00, 133884.01it/s]
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)

Do you know why this happens?
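The last four lines of this output are the default banner that the standard library's code.interact() prints before opening an interactive console. One plausible cause (an assumption, not confirmed by the author) is a debugging call to interact left in the code path that train_evaluate.py runs. A minimal sketch for finding such calls; find_interact_calls is a hypothetical helper that only performs a textual search.

```python
import pathlib
import re

# Matches a call such as code.interact with any arguments.
INTERACT_CALL = re.compile(r"\binteract\s*\(")


def find_interact_calls(root="."):
    """Yield (path, line_number, stripped_line) for lines that call interact."""
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        # errors="replace" keeps the scan going on files with odd encodings
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if INTERACT_CALL.search(line):
                yield str(path), lineno, line.strip()
```

Run it from the repository root and print each hit as path:line: text. If this is indeed the cause, removing or commenting out the reported call should let training continue.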

sort

from tensorflow.contrib.framework import sort fails because sort cannot be found.

Runtime error

Traceback (most recent call last):
File "E:/8、Chatbot机器人/nlu-master/sample_code/random_output.py", line 33, in <module>
dev_dct = json.load(open(sys.argv[1]), encoding='utf8')
IndexError: list index out of range
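The IndexError means the script was started without a command-line argument, since sys.argv[1] is read unconditionally. A minimal sketch of a guarded entry point, assuming the script expects one JSON path; load_json and main are hypothetical names. It also moves the encoding onto open(), because json.load() ignores an encoding= keyword on Python 3.1 to 3.8 and raises TypeError on 3.9 and later.

```python
import json
import sys


def load_json(path):
    """Read a UTF-8 JSON file.

    The encoding is passed to open(), not to json.load(), which does not
    use it for decoding.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def main(argv):
    """Hypothetical entry point: argv[1] must be the path to a JSON file."""
    # Reading argv[1] without this check raises IndexError when no argument
    # is given; exit with a usage message instead.
    if len(argv) < 2:
        sys.exit(f"usage: python {argv[0]} DEV_JSON_PATH")
    dev_dct = load_json(argv[1])
    print(f"loaded {len(dev_dct)} top-level entries from {argv[1]}")
    return dev_dct
```

Call main(sys.argv) from the script's if __name__ == "__main__": block and pass the file on the command line, e.g. python random_output.py path/to/dev.json (a placeholder path).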

Accuracy question

What intent-recognition accuracy can this model reach? When we run your code as-is, the accuracy is very low.

Which version of TensorFlow?

Data and performance problems

Thanks for sharing the code.
Questions:

What is the training data?
Did you compare the deep-learning NER with the NER in jieba or HanLP?
What is the key factor that determines the performance of NER models?
Thanks!

Error during training

Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\dan\AppData\Local\Temp\jieba.cache
Loading model cost 0.647 seconds.
Prefix dict has been built successfully.
0it [00:00, ?it/s]开始训练模型！！！
13724it [00:00, 257175.79it/s]
3494it [00:00, 264304.62it/s]
1216it [00:00, 23299.77it/s]
576it [00:00, 115511.31it/s]
49it [00:00, 49108.94it/s]
155it [00:00, 155456.03it/s]
100%|██████████| 576/576 [00:00<00:00, 8812.50it/s]
100%|██████████| 155/155 [00:00<00:00, 7778.66it/s]
0%| | 0/49 [00:00<?, ?it/s]576 / 155 / 49 sentences in train / dev / test.
100%|██████████| 49/49 [00:00<00:00, 8189.39it/s]
Traceback (most recent call last):
File "E:/8、Chatbot机器人/ner-slot_filling-master/train_evaluate.py", line 248, in <module>
main(args)
File "E:/8、Chatbot机器人/ner-slot_filling-master/train_evaluate.py", line 242, in main
train()
File "E:/8、Chatbot机器人/ner-slot_filling-master/train_evaluate.py", line 143, in train
config = load_config(args.config_file)
File "E:\8、Chatbot机器人\ner-slot_filling-master\utils\utils.py", line 112, in load_config
return json.load(f)
File "K:\Anaconda\lib\json\__init__.py", line 299, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "K:\Anaconda\lib\json\__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "K:\Anaconda\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "K:\Anaconda\lib\json\decoder.py", line 355, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 10 column 5 (char 180)

How can this be fixed?
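The JSONDecodeError means the file read by load_config(args.config_file) is not strict JSON: keys must be double-quoted strings, and JSON allows neither single quotes nor comments. Line 10 column 5 is where the parser stopped, which in an indented config is typically the first character of a key. A minimal sketch for locating the offending spot, assuming the config text can be read as a string; check_config is a hypothetical helper and the "char_dim" value is made up for illustration.

```python
import json


def check_config(text):
    """Return None if text parses as JSON, otherwise a location message.

    Hypothetical helper; it reports the same position json.load() would.
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as e:
        return f"{e.msg}: line {e.lineno} column {e.colno} (char {e.pos})"
    return None


# A single-quoted key reproduces the message from the traceback above.
bad = "{\n    'char_dim': 100\n}"
good = '{\n    "char_dim": 100\n}'
print(check_config(bad))
# Expecting property name enclosed in double quotes: line 2 column 5 (char 6)
print(check_config(good))  # None
```

Reading the real config file into a string and passing it to check_config points to the same line and column, which makes the stray quote or comment easy to find.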

Why is there a 2D convolution before the dilated (atrous) convolution? Is it necessary?

Test error

The following error occurs during testing:

Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.692 seconds.
Prefix dict has been built succesfully.
2019-06-10 23:07:51.152866: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-10 23:07:52.790032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:02:00.0
totalMemory: 10.73GiB freeMemory: 10.53GiB
2019-06-10 23:07:52.790099: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-06-10 23:07:53.229309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-10 23:07:53.229363: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-06-10 23:07:53.229376: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-06-10 23:07:53.229666: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10168 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5)
WARNING:tensorflow:From /data/proj/Captcha/ner-slot_filling/models/model.py:385: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.
See tf.nn.softmax_cross_entropy_with_logits_v2.
/home/jiang.li/.local/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2019-06-10 23:07:55,057 - /data/proj/Captcha/ner-slot_filling/log/train.log - INFO - Created model with fresh parameters.
Loading pretrained embeddings from /Users/gaoquan/Documents/ml-learning/ner-learning/NER_medical_records/assets/cooked_corpus/vec.txt...
Traceback (most recent call last):
File "train_evaluate.py", line 251, in <module>
main(args)
File "train_evaluate.py", line 247, in main
evaluate_line()
File "train_evaluate.py", line 230, in evaluate_line
load_word2vec, config, id_to_char, logger)
File "/data/proj/Captcha/ner-slot_filling/utils/utils.py", line 158, in create_model
emb_weights = load_vec(config["emb_file"], id_to_char, config["char_dim"], emb_weights)
File "/data/proj/Captcha/ner-slot_filling/utils/data_utils.py", line 172, in load_word2vec
for i, line in enumerate(codecs.open(emb_path, 'r', 'utf-8')):
File "/home/jiang.li/ENTER/envs/pytorch/lib/python3.6/codecs.py", line 897, in open
file = builtins.open(filename, mode, buffering)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/gaoquan/Documents/ml-learning/ner-learning/NER_medical_records/assets/cooked_corpus/vec.txt'

How should I fix this? Thanks.
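The traceback shows that config["emb_file"] holds an absolute path from the author's machine, which load_word2vec then tries to open. A minimal sketch that repoints that key at a local vector file, assuming the configuration is the JSON file read by load_config(); fix_emb_path and its arguments are hypothetical, and the local vec.txt would be the one produced by gen_cooked_corpus_and_w2v.py.

```python
import json
import os


def fix_emb_path(config_path, local_emb_path):
    """Repoint config["emb_file"] at an embedding file that exists locally.

    Hypothetical helper: assumes config_path is a JSON object with an
    "emb_file" key, as read by utils.load_config().
    """
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    if os.path.isfile(config.get("emb_file", "")):
        return config["emb_file"]  # the stored path already works
    if not os.path.isfile(local_emb_path):
        raise FileNotFoundError(f"embedding file not found: {local_emb_path}")
    config["emb_file"] = local_emb_path
    # Rewrite the config in place, keeping Chinese text readable.
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, ensure_ascii=False, indent=4)
    return local_emb_path
```

Run it once before testing, e.g. fix_emb_path(path_to_config, "assets/cooked_corpus/vec.txt"); both arguments here are placeholders for the paths on your machine.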
