
BRAT Scoring

Introduction

This repository is a Python package for comparing and scoring two sets of BRAT annotations. The current version focuses on event-based scoring.

Evaluation criteria

The evaluation criteria are defined in sdoh_scoring.pdf.

The primary criteria for the SDOH challenge will be:

  • trigger: "overlap"
  • span-only arguments: "exact"
  • labeled arguments: "label"

The other criteria are included in the scoring routine to assist with troubleshooting.

Requirements

The scoring routine is implemented in Python 3; no testing was performed with Python 2.

The following packages are needed:

  • wheel
  • pandas
  • tqdm
  • spacy>=3.0.0 with the language model "en_core_web_sm"
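The dependencies can also be installed directly with pip; for example (a minimal sketch, assuming a Python 3 environment with pip available):

pip install wheel pandas tqdm "spacy>=3.0.0"
python -m spacy download en_core_web_sm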

Evaluation script

The scoring routine can be installed with pip or called from the command line. It implements the aforementioned evaluation by comparing two directories of BRAT-style annotations (*.txt and *.ann files): it identifies all the *.ann files in both directories, matches files across the directories by filename, and then compares the annotations defined in the matched *.ann files.
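For example, the two directories might look like the following, where the gold and predicted annotations for the same document share a filename (illustrative filenames):

/home/gold/doc_001.txt
/home/gold/doc_001.ann
/home/gold/doc_002.txt
/home/gold/doc_002.ann
/home/predict/doc_001.txt
/home/predict/doc_001.ann
/home/predict/doc_002.txt
/home/predict/doc_002.ann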

Python package installation

The brat_scoring package and its dependencies can be installed using the following steps:

  1. Make sure the pip package manager is up-to-date:
pip install pip --upgrade
  2. Install the brat_scoring package:
pip3 install git+https://github.com/Lybarger/brat_scoring.git --upgrade
  3. Download the spacy model en_core_web_sm using:
python -m spacy download en_core_web_sm
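After these steps, the installation can be sanity-checked by importing the scoring function (a minimal check, assuming the steps above completed without errors):

python3 -c "from brat_scoring.scoring import score_brat_sdoh; print(score_brat_sdoh)"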

Python scoring function usage

Scoring is performed using the score_brat_sdoh function. The required arguments define the input and output paths:

  • gold_dir: str, path to the input directory with gold annotations in BRAT format, e.g. "/home/gold/"
  • predict_dir: str, path to the input directory with predicted annotations in BRAT format, e.g. "/home/predict/"
  • output_path: str, path for the output CSV file that will contain the evaluation results, e.g. "/home/scoring.csv"

The optional arguments define the evaluation criteria:

  • labeled_args: list, list of labeled argument names as str, default is ['StatusTime', 'StatusEmploy', 'TypeLiving']
  • score_trig: str, trigger scoring criterion, options include {"exact", "overlap", "min_dist"}, default is "overlap".
  • score_span: str, span-only argument scoring criterion, options include {"exact", "overlap", "partial"}, default is "exact"
  • score_labeled: str, labeled argument (span-with-value argument) scoring criterion, options include {"exact", "overlap", "label"}, default is "label"
  • include_detailed: bool, if True, the scoring routine will generate document-level scores, in addition to the corpus-level scores
  • loglevel: str, logging level, default is "info"

Below is an example usage:

from brat_scoring.scoring import score_brat_sdoh
from brat_scoring.constants import EXACT, LABEL, OVERLAP, PARTIAL, MIN_DIST

df = score_brat_sdoh(
                gold_dir = "/home/gold/",
                predict_dir = "/home/predict/",
                output_path = "/home/scoring.csv",
                score_trig = OVERLAP,
                score_span = EXACT,
                score_labeled = LABEL,
                )
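The example assigns the return value to df, and the same results are written to the CSV file given by output_path. A minimal sketch for inspecting that CSV with pandas, assuming output columns event, argument, subtype, NT, NP, TP, P, R, F1:

import pandas as pd

# Reload the scores written by score_brat_sdoh.
scores = pd.read_csv("/home/scoring.csv")

# The OVERALL row summarizes corpus-level precision (P), recall (R), and F1.
print(scores[scores["argument"] == "OVERALL"])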

Command line installation

For command-line use, the brat_scoring repository can be cloned and its dependencies installed using the following steps:

  1. Make sure the pip package manager is up-to-date:
pip install pip --upgrade
  2. Clone brat_scoring repo:
git clone https://github.com/Lybarger/brat_scoring.git
  3. Install dependencies:
pip install -r brat_scoring/requirements.txt
  4. Download the spacy model en_core_web_sm using:
python -m spacy download en_core_web_sm
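After these steps, the spacy model download can be verified quickly (a minimal check that simply loads the model):

python3 -c "import spacy; spacy.load('en_core_web_sm')"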

Command line usage

The command line script, run_sdoh_scoring.py, is a simple wrapper for the score_brat_sdoh function and is located at brat_scoring/brat_scoring/run_sdoh_scoring.py.

The arguments for the command line script, run_sdoh_scoring.py, are similar to those of the score_brat_sdoh function above. The arguments can be viewed using:

python3 run_sdoh_scoring.py -h

Below is an example usage:

python3 run_sdoh_scoring.py /home/gold/ /home/predict/ /home/scoring.csv
--score_trig min_dist --score_span exact --score_labeled label

brat_scoring's Issues

"illegal hardware instruction" error from python -m spacy download en_core_web_sm for M1 Apple Mac users

I just tried to load and run the scoring script on an M1 Mac. I ran into some issues with spacy and the new hardware.

Specifically, I get this error when I try to download the spacy model:

illegal hardware instruction python -m spacy download en_core_web_sm

Following the advice here (explosion/spaCy#9397), I installed spacy directly via conda and then created a new conda environment like so:

conda install -c conda-forge spacy

conda create -n n2c2SDoH-py3.8 python=3.8
conda activate n2c2SDoH-py3.8
pip install --upgrade pip
pip install -r ~/git/brat_scoring/requirements.txt
python -m spacy download en_core_web_sm

I'm posting here to provide the solution that worked for me in case other M1 users run into the same issue.

(I don't think there is an obvious software solution to this problem, just the need to update the README.)

Order of events changes scoring

I happened to notice that the order of events in the .ann files affects the scoring result. For example:

mkdir foo
cp /path/to/train/mimic/1712.txt foo
sort /path/to/train/mimic/1712.ann > foo/1712.ann

mkdir bar
cp /path/to/train/mimic/1712.txt bar
sort -r /path/to/train/mimic/1712.ann > bar/1712.ann  # same file, different order of annotations

python brat_scoring/brat_scoring/run_sdoh_scoring.py foo bar foobar.csv --score_trig overlap --score_span exact --score_labeled label --include_detailed

Gives, in foobar.csv:

event,argument,subtype,NT,NP,TP,P,R,F1
OVERALL,OVERALL,OVERALL,24,24,19,0.7916666666666666,0.7916666666666666,0.7916666666666666
Alcohol,Amount,N/A,1,1,0,0.0,0.0,0.0
Alcohol,Frequency,N/A,1,1,0,0.0,0.0,0.0
Alcohol,StatusTime,current,1,1,0,0.0,0.0,0.0
Alcohol,StatusTime,past,1,1,0,0.0,0.0,0.0
Alcohol,Trigger,N/A,2,2,2,1.0,1.0,1.0
Alcohol,Type,N/A,1,1,0,0.0,0.0,0.0
Drug,History,N/A,1,1,1,1.0,1.0,1.0
Drug,StatusTime,current,1,1,1,1.0,1.0,1.0
Drug,StatusTime,past,1,1,1,1.0,1.0,1.0
Drug,Trigger,N/A,2,2,2,1.0,1.0,1.0
Drug,Type,N/A,3,3,3,1.0,1.0,1.0
LivingStatus,StatusTime,current,1,1,1,1.0,1.0,1.0
LivingStatus,Trigger,N/A,1,1,1,1.0,1.0,1.0
LivingStatus,TypeLiving,with_others,1,1,1,1.0,1.0,1.0
Tobacco,Amount,N/A,1,1,1,1.0,1.0,1.0
Tobacco,Duration,N/A,1,1,1,1.0,1.0,1.0
Tobacco,Frequency,N/A,1,1,1,1.0,1.0,1.0
Tobacco,StatusTime,current,1,1,1,1.0,1.0,1.0
Tobacco,Trigger,N/A,1,1,1,1.0,1.0,1.0
Tobacco,Type,N/A,1,1,1,1.0,1.0,1.0

Here there are two "Alcohol" events with the same Alcohol trigger and different arguments. It looks like the scoring script must effectively compare one event to the other based on the order in which they appear in the .ann files?

Textbound.brat_str() / textbound_str() does not always round-trip when split spans are separated by more than one newline

I noticed that the string produced by textbound.brat_str() doesn't match the original line in the BRAT .ann file for a couple of documents in the n2c2 track 2 dataset: 4590 and 4693. The result is an off-by-one error in the split spans defining the covered tokens.

It looks like brat_str() and related functions depend on particular conventions when there are newlines between annotated tokens: when there is more than one newline, brat_str() appears to need an empty span to work correctly. Here's one case from document 1029:

foo\n\nbarrr bat                                                             # annotated portion of the document
T6      History 228 231;232 232;233 242 foo  barrr bat       # .ann file

Note that there is a zero-length annotation span in there: (232, 232). And there are two spaces between "foo" and "barrr" in the covered text column corresponding to the two skipped newline characters.
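A minimal check of those offsets, using a synthetic document padded so the annotated portion starts at offset 228 as in the example, confirms that the zero-length span falls on the second newline:

# Synthetic document: 228 padding characters, then the annotated portion.
doc = " " * 228 + "foo\n\nbarrr bat"

assert doc[228:231] == "foo"        # first fragment
assert doc[232:232] == ""           # zero-length span at the second newline
assert doc[233:242] == "barrr bat"  # second fragment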

However, in the problematic document 4590:

foooooo\n\nbarr batt                                                  # annotated portion of the document
T6      Amount 172 179;181 190     foooooo barr batt    # .ann file
T6      Amount 172 179;180 189     foooooo barr batt    # textbound.brat_str()

Notice the lack of a zero-length annotation span in the .ann file and the lack of a double space in the covered text column. I think that, due to this, there is an off-by-one difference in the output of brat_str().
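The difference can be checked the same way, with a synthetic document padded so the annotated portion starts at offset 172 as in the example:

# Synthetic document for the 4590 case.
doc = " " * 172 + "foooooo\n\nbarr batt"

assert doc[172:179] == "foooooo"     # first fragment
assert doc[181:190] == "barr batt"   # span from the .ann file (correct)
assert doc[180:189] == "\nbarr bat"  # span produced by brat_str(), off by one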

Same thing happens for document 4693.

Easiest fix is probably to modify the documents associated with the failure to round-trip. As far as I can tell, modifying my copy of the documents "fixed" the problem. Ideally, I suppose that it'd be best to store the original annotated spans rather than depending on conventions and attempting to reconstruct them in brat_str(). However, I totally understand not doing this since it'd involve bigger changes, the possibility of breaking something that the scoring code depends upon, and it's not clear whether anyone else would get bit by the buggy behavior.
