baiyyang / medical-entity-recognition

Named entity recognition on medical data, using both a traditional statistical model (CRF) and a deep learning model (Embedding-Bi-LSTM-CRF).

License: Apache License 2.0

Python 100.00%
tensorflow named-entity-recognition conditional-random-fields bi-lstm-crf

medical-entity-recognition's Introduction

medical-entity-recognition

Describe

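This project performs named entity recognition on medical data. It uses two main methods:

  1. Named entity recognition based on Conditional Random Fields (CRF).

  2. Named entity recognition based on a bidirectional long short-term memory network combined with a conditional random field (Bi-LSTM-CRF).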
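
Both methods end in a linear-chain CRF, which picks the best tag sequence for the whole sentence instead of tagging each position on its own. The following is a minimal pure-Python sketch of Viterbi decoding over per-position scores and tag-to-tag transition scores. It is an illustration only, not the code used in this repository; the function name, the score dictionaries, and the toy numbers are all assumptions.

```python
from typing import Dict, List, Sequence


def viterbi_decode(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: Sequence[str],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain model.

    emissions[t][tag] is the score of `tag` at position t;
    transitions[prev][cur] is the score of moving from `prev` to `cur`.
    """
    if not emissions:
        return []
    # best[t][tag]: best score of any path that ends in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy example (hypothetical scores): "O" followed by "I" is heavily penalized.
tags = ["B", "I", "O"]
transitions = {p: {c: 0.0 for c in tags} for p in tags}
transitions["O"]["I"] = -10.0
emissions = [
    {"B": 0.6, "I": 0.0, "O": 1.0},
    {"B": 0.0, "I": 1.0, "O": 0.0},
]
print(viterbi_decode(emissions, transitions, tags))  # → ['B', 'I']
```

Tagging each position independently would give "O" then "I", which is not a valid BIO sequence. The transition penalty makes the decoder prefer "B" then "I" instead.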

Introduce

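  1. raw_data holds the raw data, which comes from Task 2 of CCKS2017: named entity recognition on electronic medical records. reader.py processes the raw data and generates data in the standard NER format (data, pos, label).

  2. train_test_data holds the training and test corpora for the models. word2id.pkl and char2id.pkl are the dictionaries that the neural network reads in.

  3. The crf folder contains the CRF-based named entity recognition model. medical_entity_recognition_bio_char_ori.crfsuite and medical_entity_recognition_bio_word_ori.crfsuite are the trained models that use characters and words, respectively, as the feature unit.

  4. The bilstm_crf folder contains the neural-network-based named entity recognition model. bio_model holds two trained models: a character-vector model and a word-vector model, both with randomly initialized embeddings. They are used as follows:

  • To train a new model: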

python main.py --mode train --data_dir *** --train_data *** --test_data *** --dictionary ***

  • To test an existing model:

python main.py --mode test --data_dir ../train_test_data --train_data train_bio_char.txt --test_data test_bio_char.txt --dictionary char2id.pkl --demo_model random_char_300
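
Items 1 and 2 above describe the data preparation: annotated text is turned into labeled sequences, and tokens are mapped to ids through a pickled dictionary. The sketch below shows one way such a pipeline might look at the character level. It is not taken from reader.py. The (start, end, type) span format, the "SYMPTOM" label, the "<UNK>" entry, and the assumption that the .pkl files hold a plain token-to-id dict are all hypothetical.

```python
import pickle
from typing import Dict, List, Tuple

# Hypothetical annotation format: (start, end, entity_type), end exclusive.
Span = Tuple[int, int, str]


def spans_to_bio(text: str, spans: List[Span]) -> List[str]:
    """Label each character with B-<type>, I-<type>, or O."""
    labels = ["O"] * len(text)
    for start, end, entity_type in spans:
        labels[start] = "B-" + entity_type
        for i in range(start + 1, end):
            labels[i] = "I-" + entity_type
    return labels


def load_dictionary(path: str) -> Dict[str, int]:
    """Load a pickled token-to-id mapping (assumed here to be a plain dict)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def encode(chars: str, char2id: Dict[str, int], unk: str = "<UNK>") -> List[int]:
    """Map characters to integer ids, falling back to the unknown-token id."""
    return [char2id.get(ch, char2id[unk]) for ch in chars]


text = "患者腹痛三天"
for ch, label in zip(text, spans_to_bio(text, [(2, 4, "SYMPTOM")])):
    print(ch, label)

# A tiny in-memory dictionary standing in for a file such as char2id.pkl.
print(encode("腹痛", {"<UNK>": 0, "腹": 1}))  # → [1, 0]
```

The same idea applies at the word level, with word tokens and a word-to-id dictionary in place of characters.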

Requirements

python 3

pycrfsuite: pip install python-crfsuite

zhon: pip install zhon

tensorflow >= 1.4

Result

The models were trained with characters and with words as the unit. The results are as follows:

model        char_unit  word_unit
CRF          0.73       0.74
Bi-LSTM_CRF  0.80       0.78
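
The README does not say which metric these scores are. One common choice for NER is entity-level F1, where a predicted entity counts only if its boundaries and type both match the gold annotation. The sketch below computes it for a single BIO-labeled sequence. It is an assumption about how such a score could be computed, not the evaluation code behind the numbers above, and the label names are hypothetical.

```python
from typing import List, Set, Tuple


def extract_entities(labels: List[str]) -> Set[Tuple[int, int, str]]:
    """Collect (start, end, type) spans from BIO labels; end is exclusive."""
    entities = set()
    start, current = None, None
    for i, label in enumerate(labels + ["O"]):  # the sentinel flushes the last span
        continues = label.startswith("I-") and current == label[2:]
        if start is not None and not continues:
            entities.add((start, i, current))
            start, current = None, None
        if label.startswith("B-"):
            start, current = i, label[2:]
    return entities


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Entity-level F1: an entity is correct only if span and type both match."""
    gold_set, pred_set = extract_entities(gold), extract_entities(pred)
    correct = len(gold_set & pred_set)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_set)
    recall = correct / len(gold_set)
    return 2 * precision * recall / (precision + recall)


gold = ["B-X", "I-X", "O", "B-Y"]
pred = ["B-X", "I-X", "O", "O"]
print(round(entity_f1(gold, pred), 3))  # → 0.667
```

Here the prediction finds one of the two gold entities, so precision is 1.0, recall is 0.5, and F1 is about 0.667.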

Reference

guillaumegenthial/sequence_tagging

Other

Feedback and corrections are welcome.

medical-entity-recognition's Issues

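The code is hard to read

There are too many method definitions. Perfectly good TensorFlow code has been split up so that one method is called here and another there. Wrapping two statements in a method makes debugging inconvenient. The readability is very poor.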

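> As the title says, I can test the Bi-LSTM-CRF with the command line you provided, but I don't know how to test the CRF

As the title says, I can test the Bi-LSTM-CRF with the command line you provided, but I don't know how to test the CRF.
Thank you very much!

I just made some changes and it works now. I mainly made the following changes to crf_unit.py:

1. Replaced every occurrence of crt.predata with predata
2. Filled in the two training and test files provided by the author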
testpath = '../train_test_data/test_bio_word.txt'
trainpath = '../train_test_data/train_bio_word.txt'


Originally posted by @Licko0909 in #6 (comment)

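Question about using the data

Hello, I have read several papers on medical NER, and they all use CCKS2017 as the experimental dataset. Is this dataset free of copyright restrictions? Can I use it directly in my paper?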

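How do I use crf_unit.py for the char model?

Hello, I want to use the CRF to train character-based named entity recognition. In crf_unit.py, training the word-based model runs fine, but the character-based model gives wrong results. Did I change something incorrectly?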
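
For a character-unit CRF, each character in a sentence needs its own feature dictionary. The sketch below builds such dictionaries with a one-character context window, in the dict-of-features shape that python-crfsuite accepts for each item in a sequence. It is not the feature set from crf_unit.py. The function names and the choice of features are assumptions for illustration, and it does not diagnose the problem reported above.

```python
from typing import Dict, List


def char_features(sentence: str, i: int) -> Dict[str, object]:
    """Features for the character at position i, using a +/-1 context window."""
    features = {
        "bias": 1.0,
        "char": sentence[i],
        "is_digit": sentence[i].isdigit(),
    }
    if i > 0:
        features["prev_char"] = sentence[i - 1]
        features["prev_bigram"] = sentence[i - 1 : i + 1]
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(sentence) - 1:
        features["next_char"] = sentence[i + 1]
        features["next_bigram"] = sentence[i : i + 2]
    else:
        features["EOS"] = True  # end of sentence
    return features


def sentence_to_features(sentence: str) -> List[Dict[str, object]]:
    """One feature dict per character, in sentence order."""
    return [char_features(sentence, i) for i in range(len(sentence))]


for item in sentence_to_features("腹痛"):
    print(item)
```

When switching from words to characters, the label sequence has to be character-aligned as well: one label per character, with the same length as the feature list.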
