
negspacy: negation for spaCy


spaCy pipeline object for negating concepts in text. Based on the NegEx algorithm.

NegEx - A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries. Chapman, Bridewell, Hanbury, Cooper, Buchanan. https://doi.org/10.1006/jbin.2001.1029

What's new

Version 1.0 is a major version update providing support for spaCy 3.0's new interface for adding pipeline components. As a result, it is not backwards compatible with previous versions of negspacy.

If your project uses spaCy 2.3.5 or earlier, you will need to use version 0.1.9. See archived readme.

Installation and usage

Install the library.

pip install negspacy

Import library and spaCy.

import spacy
from negspacy.negation import Negex

Load the spaCy language model and add the negspacy pipeline component. Filtering on entity types is optional.

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("negex", config={"ent_types":["PERSON","ORG"]})

View negations.

doc = nlp("She does not like Steve Jobs but likes Apple products.")

for e in doc.ents:
    print(e.text, e._.negex)

# Steve Jobs True
# Apple False

Consider pairing with scispacy to find UMLS concepts in text and process negations.

NegEx Patterns

  • pseudo_negations - phrases that are false triggers, ambiguous negations, or double negatives
  • preceding_negations - negation phrases that precede an entity
  • following_negations - negation phrases that follow an entity
  • termination - phrases that split a sentence into parts for the purposes of negation detection (e.g., "but")
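To make the interplay of these categories concrete, here is an illustrative sketch of the NegEx idea in plain Python. This is NOT negspacy's implementation: termination phrases split a sentence into scopes, then preceding/following negation phrases mark entities within the same scope. Pseudo-negations (false triggers) are omitted for brevity, and all names here are made up for illustration.

```python
def negex_sketch(tokens, entities, preceding, following, termination):
    """Return the set of entity token indices that are negated."""
    # Split token indices into scopes at termination phrases.
    scopes, current = [], []
    for i, tok in enumerate(tokens):
        if tok in termination:
            scopes.append(current)
            current = []
        else:
            current.append(i)
    scopes.append(current)

    negated = set()
    for scope in scopes:
        for i in scope:
            if i not in entities:
                continue
            # A preceding negation must appear before the entity, and a
            # following negation after it, within the same scope.
            before = any(tokens[j] in preceding for j in scope if j < i)
            after = any(tokens[j] in following for j in scope if j > i)
            if before or after:
                negated.add(i)
    return negated

tokens = "she does not like jobs but likes apple".split()
print(negex_sketch(tokens, entities={4, 7}, preceding={"not"},
                   following={"declined"}, termination={"but"}))
# {4} -> "jobs" (index 4) is negated; "apple" (index 7) is not,
# because "but" terminates the scope of "not"
```

The real library operates on spaCy `Doc` objects and multi-word phrase patterns, but the scoping logic follows this general shape.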

Termsets

Designate the termset to use; en_clinical is used by default.

  • en = phrases for general English-language text
  • en_clinical DEFAULT = adds phrases specific to the clinical domain to the general English set
  • en_clinical_sensitive = adds further phrases to help rule out historical and possibly irrelevant entities

To set:

from negspacy.negation import Negex
from negspacy.termsets import termset

ts = termset("en")

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(
    "negex",
    config={
        "neg_termset":ts.get_patterns()
    }
)

Additional Functionality

Change patterns or view patterns in use

Replace all patterns with your own set

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(
    "negex", 
    config={
        "neg_termset":{
            "pseudo_negations": ["might not"],
            "preceding_negations": ["not"],
            "following_negations":["declined"],
            "termination": ["but","however"]
        }
    }
)

Add and remove individual patterns on the fly from built-in termsets

from negspacy.termsets import termset

ts = termset("en")
ts.add_patterns(
    {
        "pseudo_negations": ["my favorite pattern"],
        "termination": ["these are", "great patterns", "but"],
        "preceding_negations": ["wow a negation"],
        "following_negations": ["extra negation"],
    }
)
# OR
ts.remove_patterns(
    {
        "termination": ["these are", "great patterns"],
        "pseudo_negations": ["my favorite pattern"],
        "preceding_negations": ["denied", "wow a negation"],
        "following_negations": ["unlikely", "extra negation"],
    }
)

View patterns in use

from negspacy.termsets import termset
ts = termset("en_clinical")
print(ts.get_patterns())

Negations in noun chunks

Depending on the Named Entity Recognition model you are using, you may have negations "chunked together" with nouns. For example:

nlp = spacy.load("en_core_sci_sm")
doc = nlp("There is no headache.")
for e in doc.ents:
    print(e.text)

# no headache

This would cause the Negex algorithm to miss the preceding negation. To account for this, you can add a chunk_prefix:

nlp = spacy.load("en_core_sci_sm")
ts = termset("en_clinical")
nlp.add_pipe(
    "negex",
    config={
        "neg_termset": ts.get_patterns(),
        "chunk_prefix": ["no"],
    },
    last=True,
)
doc = nlp("There is no headache.")
for e in doc.ents:
    print(e.text, e._.negex)

# no headache True
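The chunk_prefix check can be pictured in plain Python: if the entity span itself starts with one of the configured prefix tokens, the span is treated as carrying its own preceding negation. A minimal sketch of the idea (not negspacy's actual code; the function name is made up for illustration):

```python
def chunk_prefix_negated(entity_text, chunk_prefix):
    """True if an entity span (e.g. the chunk "no headache" emitted by
    an NER model) begins with one of the configured prefix tokens."""
    first_token = entity_text.split()[0].lower()
    return first_token in {p.lower() for p in chunk_prefix}

print(chunk_prefix_negated("no headache", ["no"]))  # True
print(chunk_prefix_negated("headache", ["no"]))     # False
```

This is why the prefix list must match tokens your NER model actually folds into entity spans; negspacy itself performs the comparison on spaCy tokens rather than raw strings.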

Contributing

See the contributing guidelines in the repository.

Authors

  • Jeno Pizarro

License

MIT License.

Other libraries

This library is featured in the spaCy Universe. Check it out for other useful libraries and inspiration.

If you're looking for a spaCy pipeline object to extract values that correspond to a named entity (e.g., birth dates, account numbers, or laboratory results) take a look at extractacy.
