This project is a fork of jeanphilippegoldman/bratutils.

License: MIT License

bratutils

A collection of utilities for manipulating data and calculating inter-annotator agreement in brat annotation files.

Installation

Install it like any other Python package, using pip:

$ pip install bratutils

Agreement Definition

Agreement in multi-token annotations is commonly evaluated using the F-score, because of various problems with computing the traditional Krippendorff's alpha and Cohen's kappa. Hripcsak and Rothschild demonstrate the validity of the metric for very large populations, i.e. for unrestricted text annotations.

This library roughly follows the definitions of precision and recall calculation from the MUC-7 test scoring. The basic definitions along with some additional restrictions are laid out below:

  • CORRECT - when annotation tags and indices match completely
  • INCORRECT - when annotation tags do not match, but the indices coincide
  • PARTIAL - when the annotation tags are the same but one of the annotations has the same end index and a different start index
  • MISSING - annotations existing only in the gold standard annotation set
  • SPURIOUS - annotations existing only in the candidate annotation set

Note: the gold standard is the collection or document from which the comparison is invoked, while the supplied parallel annotation is treated as the candidate set.

Disclaimer: the current definition of the PARTIAL category accommodates working with syntactic chunks. A different arrangement (e.g. picking the largest contained tag as the partial match instead of the rightmost) might be more suitable for other tasks, for example some types of semantic annotation.
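The category definitions above can be sketched for a single gold/candidate pair as follows. This is an illustration only, not the library's implementation: `Span` and `classify_pair` are hypothetical names, and the sketch assumes an annotation is fully described by a tag plus start and end offsets.

```python
from typing import NamedTuple, Optional


class Span(NamedTuple):
    """Hypothetical stand-in for an annotation: a tag plus character offsets."""
    tag: str
    start: int
    end: int


def classify_pair(gold: Span, cand: Span) -> Optional[str]:
    """Classify one gold/candidate pair using the definitions listed above.

    Returns None when the pair does not match at all; in that case the gold
    annotation would count as MISSING and the candidate as SPURIOUS, provided
    neither of them matches some other annotation.
    """
    same_tag = gold.tag == cand.tag
    same_indices = (gold.start, gold.end) == (cand.start, cand.end)

    if same_indices:
        # Indices coincide: the tag decides between CORRECT and INCORRECT.
        return "CORRECT" if same_tag else "INCORRECT"
    if same_tag and gold.end == cand.end:
        # Same tag, same end index, different start index.
        return "PARTIAL"
    return None


print(classify_pair(Span("NP", 0, 5), Span("NP", 0, 5)))
print(classify_pair(Span("NP", 0, 5), Span("VP", 0, 5)))
print(classify_pair(Span("NP", 0, 5), Span("NP", 2, 5)))
print(classify_pair(Span("NP", 0, 5), Span("NP", 0, 4)))
```

The last call returns `None` because the two annotations share a start index rather than an end index, which is the "rightmost" behaviour the disclaimer refers to.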

Examples

Simple example:

from bratutils import agreement as a

doc = a.Document('res/samples/A/data-sample-1.ann')
doc2 = a.Document('res/samples/B/data-sample-1.ann')

doc.make_gold()
statistics = doc2.compare_to_gold(doc)

print(statistics)

Output:

-------------------MUC-Table--------------------
------------------------------------------------
pos:135
act:134
cor:115
par:5
inc:4
mis:11
spu:10
------------------------------------------------
pre:0.858208955224
rec:0.851851851852
fsc:0.855018587361
------------------------------------------------
und:0.0814814814815
ovg:0.0746268656716
sub:0.0725806451613
------------------------------------------------
bor:119
ibo:15
------------------------------------------------
------------------------------------------------
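The figures in the table above are consistent with the relations sketched below. These formulas are inferred from the sample numbers and the MUC-style definitions, not taken from the library's source; `muc_scores` is a hypothetical helper, and the reading of `und`, `ovg` and `sub` as undergeneration, overgeneration and substitution rates is an assumption.

```python
def _ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def muc_scores(cor: int, par: int, inc: int, mis: int, spu: int) -> dict:
    """Recompute the summary rows of the table from the five category counts."""
    pos = cor + par + inc + mis   # everything present in the gold standard
    act = cor + par + inc + spu   # everything present in the candidate set
    pre = _ratio(cor, act)        # only CORRECT matches earn credit here
    rec = _ratio(cor, pos)
    return {
        "pos": pos,
        "act": act,
        "pre": pre,
        "rec": rec,
        "fsc": _ratio(2 * pre * rec, pre + rec),
        "und": _ratio(mis, pos),                    # assumed: undergeneration
        "ovg": _ratio(spu, act),                    # assumed: overgeneration
        "sub": _ratio(inc + par, cor + inc + par),  # assumed: substitution
    }


# Counts copied from the sample output above.
scores = muc_scores(cor=115, par=5, inc=4, mis=11, spu=10)
for name, value in scores.items():
    print(f"{name}:{value}")
```

Note that in the sample output PARTIAL matches earn no credit toward precision or recall: 115 / 134 and 115 / 135 reproduce the `pre` and `rec` rows exactly.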

Contributors

hugosousa, jeanphilippegoldman, savkov
