Apache SpaCTeS-cTAKES

Introduction

We have implemented a clinical text-processing tool for Spanish and Catalan EHRs (Electronic Health Records). This tool, the first in Spanish for neuroscientific purposes, was used to process a collection of anonymized discharge reports collected through a network that integrates 46 Catalan hospitals. The interoperability between these different health systems relied on the HC3 (Shared Clinical History of Catalonia) data model. The tool was primarily developed to assist human experts in the process of systematically evaluating hospital care for patients with a diagnosis of stroke.

The tool (named SpaCTeS) is a pipeline that integrates software components performing the following operations:

  1. basic PDF pre-processing and conversion into plain text;

  2. clinical document standardization and section identification;

  3. automatic language detection to distinguish between Spanish and Catalan texts;

  4. sentence splitting and tokenization (Freeling);

  5. PoS tagging and lemmatization (Freeling);

  6. temporal tagging (HeidelTime);

  7. semantic tagging of clinical entity mentions, with a focus on SNOMED semantic tags for disorders, procedures, findings, body structures, substances and organisms (Fuzzy Dictionary Lookup);

  8. clinical entity grounding, which links clinical mentions to the relevant temporal tags.
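To make the staged design concrete, here is a minimal Python sketch of a pipeline in which each stage enriches a shared document record. It is illustrative only and is not the SpaCTeS or cTAKES API: the stage functions, the `run_pipeline` helper and the tiny hint-word lists are all assumptions, and the regular expressions merely stand in for the Freeling-based steps. Only language detection and sentence splitting/tokenization are shown.

```python
import re

# Hypothetical hint words; a real detector would use a trained language model.
SPANISH_HINTS = {"y", "con", "los", "paciente", "días"}
CATALAN_HINTS = {"i", "amb", "els", "pacient", "dies"}


def detect_language_stage(doc):
    """Guess 'es' or 'ca' by counting hint words; ties default to 'es'."""
    words = re.findall(r"\w+", doc["text"].lower())
    es = sum(w in SPANISH_HINTS for w in words)
    ca = sum(w in CATALAN_HINTS for w in words)
    doc["lang"] = "ca" if ca > es else "es"
    return doc


def tokenize_stage(doc):
    """Split into sentences, then word tokens (a stand-in for Freeling)."""
    sentences = re.split(r"(?<=[.!?])\s+", doc["text"].strip())
    doc["sentences"] = [re.findall(r"\w+", s) for s in sentences if s]
    return doc


# Stages run in order; later stages may read what earlier ones wrote.
STAGES = [detect_language_stage, tokenize_stage]


def run_pipeline(text, stages=STAGES):
    """Pass one document record through every stage in sequence."""
    doc = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc


if __name__ == "__main__":
    result = run_pipeline("El paciente ingresa con ictus. Alta a los tres días.")
    print(result["lang"], result["sentences"])
```

Keeping every stage as a function from document to document mirrors how the components are chained: a new step, such as temporal tagging, can be appended to the list without changing the others.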

The first version of our tool is available and includes the following three components for the Spanish language:

  1. FREELING (Padró and Stanilovsky, 2012) is a C++ library providing language analysis functionalities (Morphological Analysis, Named Entity Detection, PoS-Tagging, Parsing, Word Sense Disambiguation, Semantic Role Labelling, and so forth) for a variety of languages. FREELING can be integrated into UIMA using a wrapper and a dockerized version of Freeling that was developed during the OpenMinTeD project.

  2. HEIDELTIME (Strötgen and Gertz, 2010) is a multilingual, domain-sensitive temporal tagger that extracts temporal expressions from documents and normalizes them according to the TIMEX3 annotation standard.

  3. FUZZY DICTIONARY LOOKUP identifies terms in the text and normalizes them to codes in a given ontology. This component is based on the Fast Dictionary Lookup of cTAKES, although the lookup algorithm has been changed completely. The Fast Dictionary Lookup component of cTAKES only finds exact matches for words in the dictionary/lexicon, so if the input EHR contains typos or missing tokens, it cannot detect the affected terms.
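The difference between strict and fuzzy lookup can be sketched in a few lines of Python. This is not the SpaCTeS algorithm, which the text above does not describe in detail. It is a minimal illustration that uses the standard library's `difflib.SequenceMatcher` as the similarity measure. The lexicon entries, the `CODE-…` placeholders (not real SNOMED codes), the function names and the 0.85 threshold are all assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical lexicon: surface term -> placeholder code (not real SNOMED codes).
LEXICON = {
    "ictus isquémico": "CODE-001",
    "hipertensión arterial": "CODE-002",
    "fibrilación auricular": "CODE-003",
}


def similarity(a, b):
    """Character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_lookup(text, lexicon=LEXICON, threshold=0.85):
    """Return (token_start, matched_text, code, score) for each lexicon term
    whose best-scoring token window reaches the threshold."""
    tokens = text.lower().split()
    hits = []
    for term, code in lexicon.items():
        width = len(term.split())
        best = None
        # Slide a window with as many tokens as the lexicon term.
        for start in range(len(tokens) - width + 1):
            window = " ".join(tokens[start:start + width])
            score = similarity(window, term)
            if score >= threshold and (best is None or score > best[3]):
                best = (start, window, code, score)
        if best is not None:
            hits.append(best)
    return sorted(hits, key=lambda hit: hit[0])


if __name__ == "__main__":
    note = "paciente con hipertension arterail y ictus isquemico"
    for start, mention, code, score in fuzzy_lookup(note):
        print(start, mention, code, round(score, 2))
```

An exact lookup of `"hipertension arterail"` in the lexicon fails, while the fuzzy version still maps it to the intended entry. Because the window has a fixed width, this sketch tolerates typos but not missing tokens; handling those would require variable-width windows or a token-level alignment.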

The input is a text file, and the output is XML (readable by the UIMA CVD), BRAT or HTML files.

An Analysis Engine (AE) for writing BRAT files has been added to cTAKES, in the cTAKES-core project.
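For readers unfamiliar with BRAT output, the sketch below shows the standoff `.ann` format that such a writer produces for text-bound annotations: one line per entity, with an ID, a type with character offsets, and the covered text, separated by tabs. It is a Python illustration of the file format only, not the AE added to cTAKES; the function name and the `Disorder` label are assumptions.

```python
def to_brat_ann(text, entities):
    """Serialize entities as BRAT text-bound annotations.

    entities: list of (label, start, end) with character offsets into text.
    Each output line has the form: T<id> TAB <label> <start> <end> TAB <mention>
    """
    lines = []
    for index, (label, start, end) in enumerate(entities, start=1):
        mention = text[start:end]
        lines.append(f"T{index}\t{label} {start} {end}\t{mention}\n")
    return "".join(lines)


if __name__ == "__main__":
    report = "Paciente con ictus isquémico."
    start = report.index("ictus")
    end = start + len("ictus isquémico")
    # The result would be saved next to the text file as <name>.ann
    print(to_brat_ann(report, [("Disorder", start, end)]), end="")
```

BRAT expects the `.ann` file to sit beside a `.txt` file with the same base name, and the offsets refer to that text file.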

We integrated all of these components into cTAKES as native components.

Note: The cTAKES type system has been updated.

Requirements

After cloning cTAKES into your local repository, replace the pom in the cTAKES directory with the one from this repository and add all the new modules (Freeling, HeidelTime, SpellCheker, Fuzzy Dictionary Lookup, SpaCTeS, SpaCTes-res and type-system) to the cTAKES directory.

Then follow the user install guides for:

  1. cTAKES.

  2. Freeling Wrapper. We installed Freeling version 4.0.

Contact

Siamak Barzegar ([email protected])

Licence

Apache License
