
baiyyang / medical-entity-recognition


Named entity recognition on medical data, covering both a traditional statistical model (CRF) and a deep learning model (Embedding-Bi-LSTM-CRF).

License: Apache License 2.0

Python 100.00%
tensorflow named-entity-recognition conditional-random-fields bi-lstm-crf


medical-entity-recognition

Description

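This project performs named entity recognition on medical data. Two approaches are used:

  1. Named entity recognition based on Conditional Random Fields (CRF).

  2. Named entity recognition based on a bidirectional long short-term memory network combined with a conditional random field (Bi-LSTM-CRF).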
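A CRF tagger of the kind listed first works on hand-written feature templates computed for each position in a sentence. The sketch below shows one possible character-level feature function. The specific templates (current character, neighbours, bigrams, a digit flag) and the function names are illustrative assumptions, not the feature set used in this project.

```python
def char_features(sentence, i):
    """Build a feature dict for the character at position i.

    The templates are illustrative assumptions, not this project's features.
    """
    ch = sentence[i]
    features = {
        "bias": 1.0,
        "char": ch,
        "is_digit": ch.isdigit(),
    }
    if i > 0:
        prev = sentence[i - 1]
        features["prev_char"] = prev
        features["prev_bigram"] = prev + ch
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(sentence) - 1:
        nxt = sentence[i + 1]
        features["next_char"] = nxt
        features["next_bigram"] = ch + nxt
    else:
        features["EOS"] = True  # end of sentence
    return features


def sentence_features(sentence):
    """Return one feature dict per character of the sentence."""
    return [char_features(sentence, i) for i in range(len(sentence))]
```

A list of such dicts, paired with a label sequence of the same length, has the general shape of a training example for a CRF library such as python-crfsuite.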

Introduction

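  1. raw_data holds the original data, taken from Task 2 of CCKS2017: named entity recognition on electronic medical records. reader.py processes the raw data and produces data in the standard NER format (data, pos, label).

  2. train_test_data holds the corpora used to train and test the models. word2id.pkl and char2id.pkl are the dictionaries that the neural network reads in.

  3. The crf folder contains the CRF-based named entity recognition model. medical_entity_recognition_bio_char_ori.crfsuite and medical_entity_recognition_bio_word_ori.crfsuite are the trained models that use characters and words, respectively, as the feature unit.

  4. The bilstm_crf folder contains the neural-network-based named entity recognition model. bio_model stores two trained models, one using character embeddings and one using word embeddings, both with randomly initialized embeddings. Usage:

  • To train a new model: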

python main.py --mode train --data_dir *** --train_data *** --test_data *** --dictionary ***

  • To test an existing model:

python main.py --mode test --data_dir ../train_test_data --train_data train_bio_char.txt --test_data test_bio_char.txt --dictionary char2id.pkl --demo_model random_char_300
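The flags in the two commands above can be read as a small command-line interface. The following is a minimal sketch of how such flags could be parsed with argparse. The flag names come from the commands; the defaults, help strings, and the build_parser name are assumptions made for illustration and are not taken from the project's main.py.

```python
import argparse


def build_parser():
    """Parser mirroring the flags shown in the commands (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Bi-LSTM-CRF medical NER (sketch)")
    parser.add_argument("--mode", choices=["train", "test"], required=True,
                        help="train a new model or test an existing one")
    parser.add_argument("--data_dir", default="../train_test_data",
                        help="directory holding the corpora and dictionaries")
    parser.add_argument("--train_data", default="train_bio_char.txt",
                        help="training file inside data_dir")
    parser.add_argument("--test_data", default="test_bio_char.txt",
                        help="test file inside data_dir")
    parser.add_argument("--dictionary", default="char2id.pkl",
                        help="pickled token-to-id dictionary inside data_dir")
    parser.add_argument("--demo_model", default=None,
                        help="name of a trained model, used in test mode")
    return parser


# Parse an explicit argument list instead of sys.argv to keep the example self-contained.
args = build_parser().parse_args(
    ["--mode", "test", "--dictionary", "char2id.pkl", "--demo_model", "random_char_300"]
)
print(args.mode, args.dictionary, args.demo_model)
```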

Requirements

python 3

pycrfsuite: pip install python-crfsuite

zhon: pip install zhon

tensorflow >= 1.4

Result

Models were trained with characters and with words as the unit. The experimental results are as follows:

model        char_unit  word_unit
CRF          0.73       0.74
Bi-LSTM_CRF  0.80       0.78
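The table does not state which metric is reported. Assuming the scores are entity-level F1 over BIO tags, which is a common choice for NER but is not confirmed here, such a score could be computed roughly as in the sketch below. The function names and the DIS and SYM label types in the usage example are hypothetical.

```python
def bio_spans(labels):
    """Return the set of (entity_type, start, end) spans in a BIO sequence.

    end is exclusive. An I- tag that does not continue a span of the same
    type is ignored.
    """
    spans = set()
    start, ent_type = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, label in enumerate(list(labels) + ["O"]):
        inside = label.startswith("I-") and ent_type == label[2:]
        if start is not None and not inside:
            spans.add((ent_type, start, i))
            start, ent_type = None, None
        if label.startswith("B-"):
            start, ent_type = i, label[2:]
    return spans


def entity_f1(gold_sentences, pred_sentences):
    """Compute entity-level precision, recall and F1 with exact span matching."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        g, p = bio_spans(gold), bio_spans(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels: one of two gold entities is recovered.
gold = [["B-DIS", "I-DIS", "O", "B-SYM"]]
pred = [["B-DIS", "I-DIS", "O", "O"]]
precision, recall, f1 = entity_f1(gold, pred)
print(round(precision, 2), round(recall, 2), round(f1, 2))
```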

Reference

guillaumegenthial/sequence_tagging

Other

Feedback and corrections are welcome.
