
gaoq1 / ner-slot_filling

Stars: 177 · Watchers: 8 · Forks: 46 · Size: 1.39 MB

Entity extraction and intent recognition for Chinese natural language (Natural Language Understanding), using either Bi-LSTM + CRF or IDCNN + CRF.

Python 100.00%
nlu slot slot-filling ner nlp bi-lstm crf idcnn medicle emr

ner-slot_filling's Introduction

NLU Project

This project performs entity extraction and intent classification (slot filling and intent classification).
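Slot filling here means labeling each character of a sentence and then reading the slots off those labels. The following is a minimal sketch of that grouping step. It assumes a BIO tagging scheme, where B-X starts slot X, I-X continues it, and O is outside any slot. The README does not say which scheme the project uses. The function name bio_to_slots and the SYMPTOM/DURATION tags in the example are hypothetical and are not the project's labels.

```python
from typing import List, Tuple


def bio_to_slots(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group per-character BIO tags into (slot_type, text) pairs.

    Assumes B-X starts slot X, I-X continues it, and O is outside any slot.
    An I-X that does not continue an open X slot is treated as a new slot.
    """
    slots: List[Tuple[str, str]] = []
    cur_type, cur_chars = None, []
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != cur_type):
            # Close the open slot (if any) and start a new one
            if cur_type is not None:
                slots.append((cur_type, "".join(cur_chars)))
            cur_type, cur_chars = tag[2:], [ch]
        elif tag.startswith("I-"):
            # Continue the currently open slot
            cur_chars.append(ch)
        else:
            # "O" closes any open slot
            if cur_type is not None:
                slots.append((cur_type, "".join(cur_chars)))
            cur_type, cur_chars = None, []
    if cur_type is not None:
        slots.append((cur_type, "".join(cur_chars)))
    return slots
```

Consider the sentence 患者头痛三天 ("the patient has had a headache for three days"). Calling bio_to_slots(list("患者头痛三天"), ["O", "O", "B-SYMPTOM", "I-SYMPTOM", "B-DURATION", "I-DURATION"]) returns [('SYMPTOM', '头痛'), ('DURATION', '三天')].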

Corpus preprocessing

python gen_cooked_corpus_and_w2v.py

This generates the corpus the model needs, split 1:2:13 into test, dev, and train data. It also trains word vectors with gensim; these can be trained on a larger corpus.
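A shuffle-and-slice sketch can illustrate the 1:2:13 test/dev/train split. It is not the code in gen_cooked_corpus_and_w2v.py. The function name split_corpus, the fixed seed, and rounding the test and dev sizes down are assumptions.

```python
import random
from typing import Iterable, List, Sequence, Tuple


def split_corpus(sentences: Iterable, ratios: Sequence[int] = (1, 2, 13),
                 seed: int = 42) -> Tuple[List, List, List]:
    """Shuffle sentences and split them into (test, dev, train) by integer ratios."""
    items = list(sentences)
    random.Random(seed).shuffle(items)  # fixed seed so the split is reproducible
    total = sum(ratios)
    n_test = len(items) * ratios[0] // total
    n_dev = len(items) * ratios[1] // total
    test = items[:n_test]
    dev = items[n_test:n_test + n_dev]
    train = items[n_test + n_dev:]  # the remainder, including rounding leftovers
    return test, dev, train
```

With 160 sentences this yields 10 test, 20 dev, and 130 train sentences. For the word vectors, check which gensim version is installed. gensim 4.x renamed the Word2Vec parameters size and iter to vector_size and epochs.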

Training

python train_evaluate.py --clean True --train True --model_type bilstm

The command above trains with bilstm; you can also choose idcnn instead.
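Both model types end in a CRF layer. At inference time, a CRF picks the highest-scoring tag sequence with Viterbi decoding. Decoding combines per-token emission scores from the Bi-LSTM or IDCNN with a tag-to-tag transition matrix. The sketch below is plain Python and independent of the project's TensorFlow code. The function name viterbi_decode and the toy scores are illustrative, and start/end transitions are omitted for brevity.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Return the best tag path and its score.

    emissions[t][j]: score of tag j at position t.
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n_tags = len(transitions)
    # score[j]: best score of any path that ends in tag j at the current position
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, bp = [], []
        for j in range(n_tags):
            # Best previous tag to come from when landing on tag j
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            bp.append(best_i)
        score = new_score
        backpointers.append(bp)
    # Trace back from the best final tag
    last = max(range(n_tags), key=lambda j: score[j])
    path = [last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return path, score[last]
```

The toy example uses tags 0 = O, 1 = B-X, and 2 = I-X, with emissions [[1.0, 0.5, 0.0], [0.0, 0.0, 2.0]]. A transition of -10.0 from O to I-X penalizes that move, and all other transitions are 0.0. A per-token argmax would pick O then I-X. Viterbi instead returns ([1, 2], 2.5), that is, B-X then I-X. Ruling out such invalid sequences is why a CRF is placed on top of the encoder.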

Testing

python train_evaluate.py --train False
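The commands above pass booleans as strings (--clean True, --train False). This page does not show how train_evaluate.py parses these flags. As a general caution, argparse with type=bool would treat the string "False" as True, because any non-empty string is truthy. Below is a sketch of a safer parser for flags with these names. The str2bool helper, the defaults, and the bilstm/idcnn choices list are assumptions, not the script's code.

```python
import argparse


def str2bool(value: str) -> bool:
    """Convert strings such as 'True', 'false', '1', or 'no' into a bool."""
    lowered = value.strip().lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Train or evaluate the NLU model (sketch).")
    # Defaults below are assumptions; the real script may differ
    parser.add_argument("--clean", type=str2bool, default=False,
                        help="whether to clean previous outputs first (meaning assumed)")
    parser.add_argument("--train", type=str2bool, default=True,
                        help="True to train, False to evaluate")
    parser.add_argument("--model_type", choices=["bilstm", "idcnn"], default="bilstm",
                        help="encoder placed before the CRF layer")
    return parser
```

For example, build_parser().parse_args(["--train", "False"]).train is False, whereas bool("False") is True.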


ner-slot_filling's Issues

data

Could you share the data format?

what do the slots mean?

slots = ['DIS', 'SYM', 'SGN', 'TES', 'DRU', 'SUR', 'PRE', 'PT', 'Dur', 'TP', 'REG', 'ORG', 'AT', 'PSB', 'DEG', 'FW', 'CL']

I'm new to slot filling. What does each of these slots mean?
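Suppose the project tags characters with a BIO scheme. This is an assumption, since the issue does not say. Under BIO, the 17 slot types above would expand to 35 labels: a B- and an I- variant of each slot, plus O. The small sketch below builds that label inventory and its id mappings. The name build_label_vocab and the label ordering are hypothetical, and the sketch says nothing about what each abbreviation stands for.

```python
from typing import Dict, List, Tuple

# The slot list quoted in the issue above
SLOTS = ['DIS', 'SYM', 'SGN', 'TES', 'DRU', 'SUR', 'PRE', 'PT', 'Dur',
         'TP', 'REG', 'ORG', 'AT', 'PSB', 'DEG', 'FW', 'CL']


def build_label_vocab(slots: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map BIO labels to ids (O first, then B-/I- pairs per slot) and back."""
    labels = ["O"] + [f"{prefix}-{s}" for s in slots for prefix in ("B", "I")]
    label_to_id = {label: i for i, label in enumerate(labels)}
    id_to_label = {i: label for label, i in label_to_id.items()}
    return label_to_id, id_to_label


label_to_id, id_to_label = build_label_vocab(SLOTS)
print(len(label_to_id))  # 35
```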

What are we supposed to do?

ub16c9@ub16c9-gpu:~/ub16_prj/ner-slot_filling$ python3.6 train_evaluate.py --clean True --train True --model_type bilstm
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.669 seconds.
Prefix dict has been built succesfully.
开始训练模型!!!
14253it [00:00, 291884.34it/s]
Python 3.6.8 (default, Dec 24 2018, 19:24:27)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)

Running the training command python train_evaluate.py --clean True --train True --model_type bilstm drops straight into the Python interactive console.

Hello, I'm using Python 3.6, and training produces the following:

Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.699 seconds.
Prefix dict has been built succesfully.
开始训练模型!!!
13724it [00:00, 133884.01it/s]
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)

Do you know why this happens?

sort

from tensorflow.contrib.framework import sort fails because sort cannot be found.

Runtime error

Traceback (most recent call last):
  File "E:/8、Chatbot机器人/nlu-master/sample_code/random_output.py", line 33, in <module>
    dev_dct = json.load(open(sys.argv[1]), encoding='utf8')
IndexError: list index out of range
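The traceback shows random_output.py taking its input path from sys.argv[1]. This IndexError therefore means the script was run without a file argument. The same line also passes encoding to json.load rather than to open(). json.load forwards extra keywords to json.loads, which ignores encoding on older Python 3 versions and raises TypeError on Python 3.9 and later. Meanwhile, open() without an encoding falls back to the platform default, which may not be UTF-8 on Windows. Below is a defensive sketch. The function load_dev_json and its usage message are hypothetical, not the script's code.

```python
import json
from typing import Any, List


def load_dev_json(argv: List[str]) -> Any:
    """Load the JSON file named by argv[1], exiting with a usage message if it is missing."""
    if len(argv) < 2:
        prog = argv[0] if argv else "random_output.py"
        raise SystemExit(f"usage: python {prog} <json_path>")
    # Give the encoding to open(), not to json.load()
    with open(argv[1], encoding="utf-8") as f:
        return json.load(f)
```

In the script this would be called as dev_dct = load_dev_json(sys.argv). The traceback does not say which file the script expects.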

Accuracy issue

What intent-recognition accuracy can this model reach? When we run your code as-is, our accuracy is very low.

data and performance problem

Thanks for sharing the code.
Questions:

  1. What is the training data?
  2. Have you compared this deep-learning NER with the NER in jieba or HanLP?
  3. What are the key factors that determine the performance of NER models?

Thanks!

Error during training

Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\dan\AppData\Local\Temp\jieba.cache
Loading model cost 0.647 seconds.
Prefix dict has been built successfully.
0it [00:00, ?it/s]开始训练模型!!!
13724it [00:00, 257175.79it/s]
3494it [00:00, 264304.62it/s]
1216it [00:00, 23299.77it/s]
576it [00:00, 115511.31it/s]
49it [00:00, 49108.94it/s]
155it [00:00, 155456.03it/s]
100%|██████████| 576/576 [00:00<00:00, 8812.50it/s]
100%|██████████| 155/155 [00:00<00:00, 7778.66it/s]
0%| | 0/49 [00:00<?, ?it/s]576 / 155 / 49 sentences in train / dev / test.
100%|██████████| 49/49 [00:00<00:00, 8189.39it/s]
Traceback (most recent call last):
  File "E:/8、Chatbot机器人/ner-slot_filling-master/train_evaluate.py", line 248, in <module>
    main(args)
  File "E:/8、Chatbot机器人/ner-slot_filling-master/train_evaluate.py", line 242, in main
    train()
  File "E:/8、Chatbot机器人/ner-slot_filling-master/train_evaluate.py", line 143, in train
    config = load_config(args.config_file)
  File "E:\8、Chatbot机器人\ner-slot_filling-master\utils\utils.py", line 112, in load_config
    return json.load(f)
  File "K:\Anaconda\lib\json\__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "K:\Anaconda\lib\json\__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "K:\Anaconda\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "K:\Anaconda\lib\json\decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 10 column 5 (char 180)

How can I fix this?
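The error comes from load_config calling json.load on the config file. Python's json module reports "Expecting property name enclosed in double quotes" for non-JSON syntax in an object, such as single-quoted keys or a trailing comma before }. The page does not show which of these sits at line 10, column 5 of this user's config. Below is a sketch of a load_config variant that reports the offending line. The name load_config_verbose and the message format are hypothetical.

```python
import json
from typing import Any


def load_config_verbose(path: str) -> Any:
    """Load a JSON config file, pointing at the offending line if parsing fails."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        lines = text.splitlines()
        # err.lineno and err.colno are 1-based positions of the parse error
        bad = lines[err.lineno - 1] if 0 < err.lineno <= len(lines) else ""
        raise ValueError(
            f"{path}: {err.msg} at line {err.lineno}, column {err.colno}: {bad!r}\n"
            "JSON requires double-quoted keys and strings and allows no trailing commas."
        ) from err
```

For a file containing {"a": 1,} split over three lines, this raises a ValueError that points at the } on line 3.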

Labels

['DIS', 'SYM', 'SGN', 'TES', 'DRU', 'SUR', 'PRE', 'PT', 'Dur', 'TP', 'REG', 'ORG', 'AT', 'PSB', 'DEG', 'FW', 'CL']
Hi, could you explain what these labels mean?

Test error

Hello, I'm using Python 3.6, and training produces the following:

Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.699 seconds.
Prefix dict has been built succesfully.
开始训练模型!!!
13724it [00:00, 133884.01it/s]
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)

Testing fails with the following error:

Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.692 seconds.
Prefix dict has been built succesfully.
2019-06-10 23:07:51.152866: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-10 23:07:52.790032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:02:00.0
totalMemory: 10.73GiB freeMemory: 10.53GiB
2019-06-10 23:07:52.790099: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-06-10 23:07:53.229309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-10 23:07:53.229363: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-06-10 23:07:53.229376: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-06-10 23:07:53.229666: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10168 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5)
WARNING:tensorflow:From /data/proj/Captcha/ner-slot_filling/models/model.py:385: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.
See tf.nn.softmax_cross_entropy_with_logits_v2.
/home/jiang.li/.local/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2019-06-10 23:07:55,057 - /data/proj/Captcha/ner-slot_filling/log/train.log - INFO - Created model with fresh parameters.
Loading pretrained embeddings from /Users/gaoquan/Documents/ml-learning/ner-learning/NER_medical_records/assets/cooked_corpus/vec.txt...
Traceback (most recent call last):
  File "train_evaluate.py", line 251, in <module>
    main(args)
  File "train_evaluate.py", line 247, in main
    evaluate_line()
  File "train_evaluate.py", line 230, in evaluate_line
    load_word2vec, config, id_to_char, logger)
  File "/data/proj/Captcha/ner-slot_filling/utils/utils.py", line 158, in create_model
    emb_weights = load_vec(config["emb_file"], id_to_char, config["char_dim"], emb_weights)
  File "/data/proj/Captcha/ner-slot_filling/utils/data_utils.py", line 172, in load_word2vec
    for i, line in enumerate(codecs.open(emb_path, 'r', 'utf-8')):
  File "/home/jiang.li/ENTER/envs/pytorch/lib/python3.6/codecs.py", line 897, in open
    file = builtins.open(filename, mode, buffering)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/gaoquan/Documents/ml-learning/ner-learning/NER_medical_records/assets/cooked_corpus/vec.txt'

How should I fix this? Thanks.
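The traceback shows create_model passing config["emb_file"] to load_word2vec. That setting holds an absolute path from the author's machine (/Users/gaoquan/...), which will not exist elsewhere. Pointing emb_file at a vec.txt generated on your own machine is presumably the fix, although this page does not confirm where gen_cooked_corpus_and_w2v.py writes it. Below is a sketch of a loader that fails with a clearer message. It mirrors only the call shape visible in the traceback: path, id_to_char, char_dim, and existing weights. The name load_word2vec_sketch and the text format it accepts are assumptions. Plain lists stand in for whatever array type the project uses.

```python
import os
from typing import Dict, List


def load_word2vec_sketch(emb_path: str, id_to_char: Dict[int, str], dim: int,
                         old_weights: List[List[float]]) -> List[List[float]]:
    """Return a copy of old_weights with rows replaced by pretrained vectors.

    Expects lines of the form "<token> v1 v2 ... v_dim". Lines with a different
    field count, such as a "<count> <dim>" header, are skipped (when dim > 1).
    """
    if not os.path.isfile(emb_path):
        raise FileNotFoundError(
            f"embedding file not found: {emb_path!r}; check emb_file in the config")
    pretrained: Dict[str, List[float]] = {}
    with open(emb_path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) == dim + 1:
                pretrained[parts[0]] = [float(x) for x in parts[1:]]
    new_weights = [list(row) for row in old_weights]  # leave the input untouched
    for idx, char in id_to_char.items():
        if char in pretrained:
            new_weights[idx] = pretrained[char]
    return new_weights
```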
