MedKnowGraphDataProcessing

Data-cleaning component of a medical knowledge graph

Project overview

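This is the data-processing part of the project "Medical Knowledge Graph Research Based on Chinese Electronic Medical Records" (“基于中文电子病历的医疗知识图谱研究”). It cleans semi-structured electronic medical record files in HTML format and converts them into structured XML files.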

The individual programs are as follows:

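  1. ChangeFileFolders: sorts the HTML files into folders by clinical department.

  2. HtmlToTxt: converts the HTML files into TXT files; for progress notes (病程记录), only the first progress note is kept.

  3. TxtPreprocess: cleans the TXT data, normalizes its format, and splits it into sentences at ";" and "。". The sections retained are:

       • Progress notes: case characteristics + preliminary diagnosis + diagnosis and treatment plan (the physical examination is removed from the case characteristics)
       • Discharge summaries (出院小结): admission status + admission diagnosis + course of treatment + discharge diagnosis + discharge status + discharge orders

  4. TxtToXml: converts the cleaned TXT files into XML files.

  5. PickEntRel: picks out the entity files (.xml.ent) and entity-relation files (.xml.rel) from the progress-note and discharge-summary folders.

  6. RelProcess: in the annotated entity-relation files (.xml.rel), relations are marked between entity groups; this program converts them into relations between individual entities.

  7. CountEntRel: counts the entities, entity modifiers, and entity relations in the discharge summaries and progress notes.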
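
The group-to-entity conversion performed by RelProcess can be pictured as expanding each group-level relation into pairings of the member entities. The sketch below assumes the simplest such rule, a Cartesian product (every head entity is related to every tail entity), and uses hypothetical names (`GroupRelation`, `expand_relations`) with a simplified in-memory representation; it does not parse the real `.xml.rel` format, which is not described here.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class GroupRelation:
    """Hypothetical in-memory form of one annotated group-level relation."""
    head_group: tuple[str, ...]  # entity IDs in the head group
    tail_group: tuple[str, ...]  # entity IDs in the tail group
    rel_type: str                # relation label


def expand_relations(group_rels):
    """Convert group-to-group relations into (head, tail, type) entity triples.

    Assumption: every entity in the head group relates to every entity in the
    tail group. Duplicate triples are dropped; first-seen order is preserved.
    """
    seen = set()
    result = []
    for rel in group_rels:
        for head, tail in product(rel.head_group, rel.tail_group):
            triple = (head, tail, rel.rel_type)
            if triple not in seen:
                seen.add(triple)
                result.append(triple)
    return result
```

For example, a relation from the group (T1, T2) to the group (T3,) expands to (T1, T3) and (T2, T3). If the actual annotation scheme pairs group members differently, only the inner loop needs to change.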

Resolved issues

See above.

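Open issues

  1. Because the "diagnosis and basis" (诊断及依据) section was removed, some diagnoses are lost and the "preliminary diagnosis" (初步诊断) section ends up empty.
  2. Because the medical records are not written in a consistent format, sentence splitting sometimes breaks one sentence into several. For example, a "。" inside quotation marks splits the sentence in two: 患者因“纳差、全身浮肿2天。”入院。 ("The patient was admitted for 'poor appetite and generalized edema for 2 days.'") becomes 患者因“纳差、全身浮肿2天。 and ”入院。
  • Both issues are currently fixed manually.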
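
One possible automated mitigation for the second issue, offered only as a sketch and not as the project's method, is to track whether the splitter is inside Chinese quotation marks “…” and suppress splits there. The delimiter set follows the README (";" and "。"); also treating the full-width "；" as a delimiter is an assumption, as is the function name `split_sentences`.

```python
# Delimiters from the README, plus the full-width "；" (an assumption).
DELIMITERS = {";", "；", "。"}
OPEN_QUOTE, CLOSE_QUOTE = "\u201c", "\u201d"  # Chinese curly double quotes “ and ”


def split_sentences(text):
    """Split text after each delimiter, except while inside “...” quotes."""
    sentences = []
    buf = []
    depth = 0  # current quotation nesting depth
    for ch in text:
        buf.append(ch)
        if ch == OPEN_QUOTE:
            depth += 1
        elif ch == CLOSE_QUOTE and depth > 0:
            depth -= 1
        elif ch in DELIMITERS and depth == 0:
            sentence = "".join(buf).strip()
            if sentence:
                sentences.append(sentence)
            buf = []
    # Keep any trailing text that has no closing delimiter.
    tail = "".join(buf).strip()
    if tail:
        sentences.append(tail)
    return sentences
```

With this rule, 患者因“纳差、全身浮肿2天。”入院。 stays a single sentence. One limitation: an opening quote that is never closed suppresses all later splits, so badly formed records may still need manual review.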

Contributors

zhangbeibei