
brat rapid annotation tool (brat) - for all your textual annotation needs

Home Page: http://brat.nlplab.org


brat rapid annotation tool (brat)

Documentation

In an attempt to keep all user-facing documentation in one place, please visit the brat homepage, which contains extensive documentation and examples of how to use and configure brat. We apologise for only providing minimal documentation along with the installation package, but the risk of delivering outdated documentation to our end-users is unacceptable.

If you find bugs in your brat installation or errors in the documentation, please file an issue at our issue tracker and we will strive to address it promptly.

About brat

brat (brat rapid annotation tool) is based on the stav visualiser, which was originally made to visualise BioNLP'11 Shared Task data. brat aims to provide an intuitive and fast way to create text-bound and relational annotations. Recently, brat has been widely adopted in the community. It has been used to create well over 50,000 annotations by the Genia group and several other international research groups for a number of annotation projects.

brat aims to overcome shortcomings of previous annotation tools such as:

De-centralisation of configurations and data, causing synchronisation issues
Annotations and related text not being visually adjacent
Complexity of set-up for annotators
Etc.

brat addresses these by:

Keeping data and configurations on a central web server (as Mark Twain said: "Put all your eggs in one basket, and then guard that basket!")
Presenting text as it would appear to a reader and keeping annotations close to the text
Requiring zero set-up for annotators, leaving configurations and server/data maintenance to other staff

License

brat itself is available under the permissive MIT License but incorporates software using a variety of open-source licenses. For details, please see LICENSE.md.

Citing

If you do make use of brat or components from brat for annotation purposes, please cite the following publication:

@inproceedings{,
    author      = {Stenetorp, Pontus and Pyysalo, Sampo and Topi\'{c}, Goran
            and Ohta, Tomoko and Ananiadou, Sophia and Tsujii, Jun'ichi},
    title       = {{brat}: a Web-based Tool
            for {NLP}-Assisted Text Annotation},
    booktitle   = {Proceedings of the Demonstrations Session
            at {EACL} 2012},
    month       = {April},
    year        = {2012},
    address     = {Avignon, France},
    publisher   = {Association for Computational Linguistics},
}

If you make use of brat or its components solely for visualisation purposes, please cite the following publication:

@InProceedings{stenetorp2011supporting,
  author    = {Stenetorp, Pontus and Topi\'{c}, Goran and Pyysalo, Sampo
      and Ohta, Tomoko and Kim, Jin-Dong and Tsujii, Jun'ichi},
  title     = {BioNLP Shared Task 2011: Supporting Resources},
  booktitle = {Proceedings of BioNLP Shared Task 2011 Workshop},
  month     = {June},
  year      = {2011},
  address   = {Portland, Oregon, USA},
  publisher = {Association for Computational Linguistics},
  pages     = {112--120},
  url       = {http://www.aclweb.org/anthology/W11-1816}
}

Lastly, if you have enough space we would be very happy if you also link to the brat homepage:

...the brat rapid annotation tool\footnote{
    \url{http://brat.nlplab.org}
}

Contributing

As with any software, brat is under continuous development. If you have requests for features, please file an issue describing your request. Also, if you want to see work towards a specific feature, feel free to contribute by working towards it. The standard procedure is to fork the repository, add a feature or fix a bug, then file a pull request asking for your changes to be merged into the main repository and included in the next release. If you seek guidance or pointers, please notify the brat developers and we will be more than happy to help.

If you send a pull request you agree that the code will be distributed under the same license as brat (MIT). Additionally, all non-anonymous contributors are recognised in the CONTRIBUTORS.md file.

Contact

For help and feedback, please contact the authors below, preferably with all of them on CC since their responsibilities and availability may vary:

Goran Topić <amadanmath gmail com>
Sampo Pyysalo <sampo.pyysalo gmail com>
Pontus Stenetorp <pontus stenetorp se>


brat's Issues

Arc types should be ordered in menu by their frequency in the annotation

possible_arc_types_from_to() currently ranks the types that are reported as possible alternatives using special-purpose priorities to have common ones show up first in the dialog. This should be generalized by learning role frequencies from the data and using that to rank.
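
A minimal sketch of frequency-based ranking, assuming the arc types observed in existing annotations are available as a flat list. The function name rank_arc_types and its inputs are hypothetical and do not reflect the actual signature of possible_arc_types_from_to().

```python
from collections import Counter


def rank_arc_types(possible_types, observed_arcs):
    """Order candidate arc types by how often each occurs in existing annotations.

    possible_types: arc types allowed between the two spans (hypothetical input).
    observed_arcs: iterable of arc type names seen in already-annotated data.
    """
    counts = Counter(observed_arcs)
    # sorted() is stable, so types with equal counts keep their original order.
    return sorted(possible_types, key=lambda arc_type: -counts[arc_type])


observed = ["Theme", "Cause", "Theme", "Site", "Theme", "Cause"]
ranked = rank_arc_types(["Site", "Cause", "Theme", "CSite"], observed)
print(ranked)
```

Types never seen in the data sort last, which keeps the hand-written order as a fallback for new projects.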

Multiple messages

Unify the messages interface; make messages into an array of triples of text, class and (optional) duration:

[
  [ 'Message 1', 'info', 3 ],
  [ 'Message 2', 'error', -1 ],
  [ 'Message 3', 'debug' ],
]

When a message fades away, the others shift.

If a message is in 'debug' class, only show it if the debug user is logged in.
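
The filtering rule could be sketched as below. This is only an illustration in Python of the data handling, not client code: the function name visible_messages, the default duration of 3, and the reading of -1 as "stay until dismissed" are all assumptions.

```python
def visible_messages(messages, is_debug_user):
    """Normalise raw message entries and hide 'debug' ones from ordinary users.

    Each entry is [text, class] or [text, class, duration]. The default
    duration and the meaning of -1 are assumptions, not part of the proposal.
    """
    default_duration = 3
    shown = []
    for entry in messages:
        text, css_class = entry[0], entry[1]
        duration = entry[2] if len(entry) > 2 else default_duration
        if css_class == "debug" and not is_debug_user:
            continue
        shown.append((text, css_class, duration))
    return shown


messages = [
    ["Message 1", "info", 3],
    ["Message 2", "error", -1],
    ["Message 3", "debug"],
]
print(visible_messages(messages, is_debug_user=False))
```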

Display Last-edit For Each Document

We should show the user a timestamp of the most recent edit made to each document. Without one, it is difficult to tell whether you have already been working on a document.

First choice in arc type dialog should be selected as default

The new server-side arc selection dialog comes up without a default selection, leaving the "enter" shortcut unusable. The first choice in the dialog should be selected by default.

Rename visualizer.xhtml to brat.xhtml

There is no need to use our old name now that we are moving towards annotation and visualisation.

Theme should be selected as default when an arc is dropped from an event

The most common argument of an event is Theme, not Context-gene.

Individual Annotations Accessible by Using URIs

It would be terrific if we could access a specific annotation for a given file using some sort of URI. That way it would be easier to communicate which annotation is being discussed.

This is a related issue to #18 since we would also like the accessed annotation to be highlighted.

"loading" marker when changing documents

When browsing through documents while the server is slow for one reason or another, it is at times difficult to know whether the current document has finished loading. As there is no "loading" marker, it's easy to become confused about how many times you've clicked "right" (or similar) and whether the selected document is about to change (after a bit of lag) or not.

Document changes on loss of focus from file selector

Some strange behavior with the new quick document change with the "left" and "right" keys, sometimes causing a document change on e.g. a mouse click or changing to another window.

To replicate:

open a document with the top selector dropdown, so that focus remains there
press "right" for next document (focus should still be on the dropdown).
(At this point, in some cases the document name shown doesn't seem to match the document; while possibly related, this is not the main point of this issue.)
click anywhere (it seems) on the document, removing focus from the dropdown. The document should now change.

Better sentence splitting

The current sentence splitting heuristic makes a lot of easy-to-correct mistakes such as breaks inside parens. Replace with a better heuristic or external segmentation data from a reasonably good splitter.
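
As one possible direction, the sketch below avoids breaks inside parentheses by tracking nesting depth. It is not the current brat heuristic, the function name split_sentences is hypothetical, and abbreviations outside parentheses would still cause false breaks.

```python
def split_sentences(text):
    """Split on '.', '!' or '?' followed by whitespace, except inside parentheses."""
    sentences = []
    depth = 0   # current parenthesis nesting depth
    start = 0   # start offset of the sentence being built
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(0, depth - 1)
        elif ch in ".!?" and depth == 0:
            at_end = i + 1 == len(text)
            if at_end or text[i + 1].isspace():
                sentences.append(text[start:i + 1].strip())
                start = i + 1
    # Keep any trailing text that lacks sentence-final punctuation.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


example = "IL-2 binds (cf. Fig. 1) to its receptor. This was shown before."
print(split_sentences(example))
```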

When given broken data, client should clear the visualization

When an edit breaks the data (currently, e.g., by deleting an event with a Negation), the client leaves the previous view open and goes into an error state. The visualization should instead be cleared, indicating it is no longer valid.

Incorporate Support for Undo

Look for alternate proposal
our own version control?
  on save: write down the diff, the parent, the committer
  to change place in the tree:
    go up the tree:
      patch -R
    go down the tree:
      patch
  undo:
    just up a level
  to display the change tree:
    get all edits sorted by date
    if parent is root, tree.push(edit)
    else treenodes[edit.parent_id].push(edit)
    treenodes[edit.id] = edit
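
The change-tree display step in the outline above could look roughly like this in Python. The field names id, parent_id and date are assumptions, as is the premise that a parent edit always carries an earlier date than its children.

```python
def build_change_tree(edits, root_id=None):
    """Build a tree of edits from flat records, following the outline above.

    Each edit is a dict with 'id', 'parent_id' and 'date' (assumed names).
    Returns the list of top-level nodes.
    """
    tree = []        # edits whose parent is the root
    treenodes = {}   # edit id -> node, so children can find their parent
    for edit in sorted(edits, key=lambda e: e["date"]):
        node = {"edit": edit, "children": []}
        if edit["parent_id"] == root_id:
            tree.append(node)
        else:
            treenodes[edit["parent_id"]]["children"].append(node)
        treenodes[edit["id"]] = node
    return tree


edits = [
    {"id": 2, "parent_id": 1, "date": "2011-01-02"},
    {"id": 1, "parent_id": None, "date": "2011-01-01"},
    {"id": 3, "parent_id": 1, "date": "2011-01-03"},
]
tree = build_change_tree(edits)
```

Undo then amounts to moving from the current node to its parent and applying the stored diff in reverse.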

Tab title should be filename

The current tab title "Test" is uninformative; it should be the filename or project name.

Logo is ugly

Span and arc abbrevs should be stored serverside

The ways in which span and arc types can be abbreviated are currently hacked into annotator.js. These should be fetched from some serverside storage.

Document change should not be allowed when dialog is open

Using e.g. the left and right arrows it is currently possible to open a dialog (e.g. delete arc), change document, and then OK the dialog. This could cause any number of weird desync errors.

Intermediate level of project organization (was: document list is too long ...)

If the document list holds more than 10 or so documents, it becomes difficult to keep track of the file being annotated.
An intermediate directory structure would help annotators remember where documents are.

A file management mechanism like the Windows file manager would be great!

Document-level verification error/warning messages

Annotation verification can currently only report issues that relate to visualized aspects of the annotation. There should also be a mechanism to communicate back verification errors/warnings that relate to the document as a whole or to non-visualized markup such as dangling (unreferenced) or duplicate triggers.

High-light Last Edit

The UI should indicate where the last edit was made, preferably using something similar to our warning system.

Selecting Cross-line Spans Produces Unexpected Behaviour

Selecting across lines sometimes selects up to the end of the document and sometimes selects correctly. This could lead the annotator to select an unintended span.

"save all" as SVG

Currently, when generating SVGs for multiple directories, it is necessary to open each in turn, "save", wait (tens of minutes for a larger directory) and repeat.

Having a "save all" would help a lot.

Sessions

Implement proper session handling; current login method is insecure.

Modifications should initially be unselected when creating new event

Currently the negation and speculation checkboxes remember state so that they may be selected when creating a new event. In this situation, they should always be deselected.

Confirmation mode should be implemented

In some situations, such as in a demo or with untrained annotators, the user should have to click the "OK" button after selecting an entity/event type. It would be great if this mode and the normal (quick) mode could be chosen with a checkbox at the top of the screen, like the "Quick annotation" mode in Sampo's version.

Ctrl-C doesn't work in a form

Ctrl-C gets caught by keyDown handler.

Move the keymap to keyPress instead.

Document without annotation lacks background "stripes" on fresh reload

When a document without any annotation is shown as the first document after a reload, the background "stripes" are not shown. Changing document to one with annotations and back restores.

Server to Client Message Passing Interface

We currently lack support for passing messages from the server to the client for all ajax calls. We are also having issues on the server side, since the json entry can be overwritten by a later caller. Instead we should have a log-like interface that relieves the caller of the burden of knowing the json message interface.
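
One way such a log-like interface might look is sketched below. The class name MessageLog, its methods, and the "messages" response key are all hypothetical; the point is only that callers append messages and a single place merges them into the response, so no caller can overwrite another's entry.

```python
import json


class MessageLog:
    """Collects messages during a request so callers never touch the JSON directly."""

    def __init__(self):
        self._pending = []

    def info(self, text, duration=3):
        self._pending.append([text, "info", duration])

    def error(self, text, duration=-1):
        self._pending.append([text, "error", duration])

    def attach_to(self, response):
        """Add all pending messages to the response dict and clear the log."""
        response.setdefault("messages", []).extend(self._pending)
        self._pending = []
        return response


log = MessageLog()
log.info("Document saved")
log.error("Could not verify annotation")
payload = json.dumps(log.attach_to({"document": "example.txt"}))
print(payload)
```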

Allow span modification on existing annotation

Show last modification date

The last time the annotation was modified should be shown when a document is open.

Arc labels always abbreviated for ufo catchers

The arc length check fails for ufo catchers, leading the code to think that it must always take the shortest abbreviation.

.htaccess Does Not Point Us To index.xhtml As Index

To get nicer URLs, it would be good to reach index.xhtml automatically when accessing the root. Goran has reported issues with this in relation to .htaccess and we are still looking for a solution.

Confirmation of automatically created / suggested annotations

In the current workflow in a couple of projects, before manual annotation, candidate entities for some types (e.g. GGP) are tagged automatically. These are then manually revised.

It would be helpful for this setup if there was some way to differentiate between automatic annotations that have been reviewed by a human and ones that are yet to be confirmed. One possible implementation would be to have automatic annotations initially shown with medium opacity, and to style them normally on a single click. The decoration could be done using the modification or comment mechanism.

Recursive Deletion

Implement recursive deletion.

If a deleted object has dependents, ask for confirmation.

delete
  OK

delete
  recursive!
UI dialog for "sure?"
delete confirmed!
  OK

vars rowSpacing and lineSpacing appear to have no effect in annotator.js

The variables rowSpacing and lineSpacing appear to have no effect when changing their values.

No Favicon for the Project

We lack a favicon. It would be nice to have one, since users can associate the project with it.

Upper Menu For File Selection Should Follow Along When Scrolling

The current way of keeping it at the top makes it more difficult to change the file when you want to.

Crash on no write permission to ".ann"

If ".ann" files exist but have no write permission, annotation.py throws an exception. The program should minimally fail gracefully and ideally fall back to read-only mode.
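
A read-only fallback could start from a permission check like the one below. This is a sketch, not the logic in annotation.py; the function name open_mode_for and its return values are hypothetical. A check of this kind is advisory only, so the write itself should still handle errors.

```python
import os


def open_mode_for(ann_path):
    """Return 'read-write' or 'read-only' for an annotation file, without raising."""
    if os.path.exists(ann_path):
        writable = os.access(ann_path, os.W_OK)
    else:
        # A missing file can only be created if its directory is writable.
        directory = os.path.dirname(ann_path) or "."
        writable = os.access(directory, os.W_OK)
    return "read-write" if writable else "read-only"
```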

Server can write out a file it can't read in

There are a number of ways in which the server can be requested to create nonsensical annotations. Until there's exhaustive checking, it would be good to minimally verify that the server will never replace a valid annotation file with one that is so badly broken it cannot read it back in (thus necessitating manual editing, which is not possible without server access).
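
A minimal sketch of such a safeguard: write the new content to a temporary file, read it back with the server's own parser, and only then swap it into place. The function name safe_write is hypothetical, and parse stands in for whatever reader the server already uses (assumed to raise on malformed input).

```python
import os
import tempfile


def safe_write(path, content, parse):
    """Replace path with content only if parse() can read the new content back."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(content)
        with open(tmp_path, encoding="utf-8") as tmp:
            parse(tmp.read())        # raises if the new file is unreadable
        os.replace(tmp_path, path)   # swap in the verified file
    except Exception:
        # Leave the existing annotation file untouched and clean up.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Because the temporary file lives in the same directory as the target, the final rename does not cross filesystems.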

Negation and Speculation Are Not Shown In the Information Pop-up

Negation and speculation should show up in the information pop-up when describing an annotation.

Configuration per directory / project / user

At the moment the specification of the annotation (e.g. what entity and event types are defined) is global and under version control (mostly annspec.py), making the maintenance of multiple specifications difficult. It should be possible to have the annotation configuration specified on a per-directory or per-project basis, with reasonable defaults.

Ideally, the system would know of a selection of configurations and would allow the user to select the one that should be used for the current directory/project as well as to edit the specifics of the configuration (such as hotkeys), perhaps even on a per-user basis.

Avoid interface hangup on crash

In the current master, if the server dies, the client interface is left in a state where none of the controls work. This should be fixed on both sides:

server: avoid hard crashes, at least when not in debug mode. Even when something really unexpected happens, minimally send back valid JSON with an error message like "This document could not be opened, please contact the devs."
client: on a 500 response or similar, the visualization should be blanked but the document / directory selectors should stay live. Errors seen so far mostly tend to involve individual files (or dirs), so a user should still be able to partially recover and continue working by selecting a different one.
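
On the server side, the "always send back valid JSON" rule could be sketched as a wrapper around the request handler. The names safe_dispatch and handler are hypothetical, and the response keys are assumptions rather than brat's actual protocol.

```python
import json


def safe_dispatch(handler, request):
    """Run handler(request) and always return a JSON string, even on failure."""
    try:
        response = handler(request)
        return json.dumps(response)
    except Exception as error:
        # Serialisation failures from json.dumps (TypeError) land here as well.
        return json.dumps({
            "error": "This document could not be opened, please contact the devs.",
            "exception": type(error).__name__,
        })


def broken_handler(request):
    raise KeyError("document")


print(safe_dispatch(broken_handler, {}))
```

In debug mode the wrapper could re-raise instead, so that developers still see the full traceback.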

Don't Crash Upon "File-not-Found"

The back-end currently crashes loudly if the requested file is not found. It should give the user a clear warning instead.

Server-side Form Generation

Server should generate the fragment for the span and arc form. Both should return this structure:

{
  html: '
    <fieldset>
      <legend>Entities</legend>
      <input id="span_Protein" type="radio" name="span_type" value="Protein"/>
      <label for="span_Protein"><span class="accesskey">P</span>rotein</label>
    ...',
  keymap: {
    P: 'span_Protein',
    E: 'span_Entity',
  }
}

Failure on return when creating event with modification

The current master crashes on failure to JSON serialize an event when creating one with a modification (or adding a modification to an existing event). The event is created and stored cleanly serverside and shows up on reload, but instead of a normal response with annotations, a malformed "python crashed" error JSON is returned.

Pontus, this appears to be the same type of issue that you debugged yesterday, likely tracing to inconsistent use of AnnotationId. Marking critical; we need to squash this before off-site work can start.

Enable Differentiation Between Users for Edits

Currently we are unable to determine which user caused a given change to the document. Thus it is impossible to indicate edits by another user when two users are co-operating.

Goran has suggested the usage of a timestamp for the last edit made by a user. Thus the user could compare the timestamp with a timestamp from the server to see if he/she really was the source of the last change. We could also potentially use hashes.

Intelligent Polling Strategy For Acquiring Data From the Server

Currently we are polling with a pre-set delay. Instead we should poll using a back-off (exponential, maybe) since at most times it is unlikely that several users are editing the same document.
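
The back-off schedule could be computed as in the sketch below. The real polling happens in the browser client, so this Python version only illustrates the arithmetic; the function name next_delay and the base, factor and maximum values are assumptions.

```python
def next_delay(current, changed, base=1.0, factor=2.0, maximum=60.0):
    """Return the delay in seconds before the next poll.

    Reset to base when the last poll saw a change; otherwise grow
    geometrically up to maximum. All default values are assumptions.
    """
    if changed:
        return base
    return min(current * factor, maximum)


delay = 1.0
schedule = []
for changed in [False, False, False, True, False]:
    delay = next_delay(delay, changed)
    schedule.append(delay)
print(schedule)
```

Resetting on a detected change keeps the interface responsive while another user is actively editing, and the cap bounds how stale a view can become.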

Popup size and location

When I click on a span, the popup opens at the top left of the window. It should open right next to the cursor.

Modification Deletion

Delete dependent modifications (without a confirmation dialog).

UI Shows Non-existing Directories

Reproduce:

Open any document
Edit the URL to indicate a non-existent directory and/or file
You will receive an error
But for now the directory will show up in the list of directories until you reload the session

After showing an error the directory should not show up in the list.

Suggested type in span dialog

It would be helpful if the span dialog showed one or a few of the most likely type choices for a span being annotated on top of the full ontology views. These should be learned from previously annotated data.
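
A very simple way to learn such suggestions is to count which types were previously assigned to the same surface text. This is a sketch under that assumption; the function names and the (text, type) input format are hypothetical, and a real implementation would likely need context features as well.

```python
from collections import Counter, defaultdict


def learn_type_counts(annotations):
    """Count, per lower-cased span text, how often each type was assigned."""
    counts = defaultdict(Counter)
    for text, span_type in annotations:
        counts[text.lower()][span_type] += 1
    return counts


def suggest_types(counts, span_text, limit=3):
    """Return up to limit types most often seen for this span text."""
    seen = counts.get(span_text.lower(), Counter())
    return [span_type for span_type, _ in seen.most_common(limit)]


annotated = [
    ("IL-2", "Protein"),
    ("IL-2", "Protein"),
    ("IL-2", "Gene"),
    ("binding", "Binding"),
]
type_counts = learn_type_counts(annotated)
print(suggest_types(type_counts, "il-2"))
```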