
cfedermann / appraise


Appraise evaluation system for manual evaluation of machine translation output

Home Page: http://www.appraise.cf/

License: BSD 3-Clause "New" or "Revised" License

Python 61.63% Shell 1.40% HTML 19.21% Perl 2.73% JavaScript 9.61% CSS 5.22% PowerShell 0.19%

appraise's People

Contributors

cfedermann, mjpost, snukky


appraise's Issues

Add registration template/view

Activate registration view that allows WMT13 participants to request/create user accounts.

Open issues to decide on:

  • allow full registration?
  • only allow requesting an account?

Check whether Django's auth framework (django.contrib.auth) provides generic views for this.
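
If Django's built-in pieces are sufficient, a minimal sketch could build on django.contrib.auth's UserCreationForm; the view name, template path, and redirect target below are assumptions, not existing Appraise code:

    # Sketch only: full self-registration via Django's UserCreationForm.
    from django.contrib.auth.forms import UserCreationForm
    from django.shortcuts import redirect, render

    def register(request):
        # Creates an account directly; for a request-only workflow the form
        # data could instead be stored for admin approval.
        if request.method == 'POST':
            form = UserCreationForm(request.POST)
            if form.is_valid():
                form.save()
                return redirect('login')  # assumed URL name
        else:
            form = UserCreationForm()
        return render(request, 'accounts/register.html', {'form': form})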

Show the group name of the user

I don't see which group I am part of. (Silly workaround: do 1 HIT and check which group's progress has increased.)
The name of my group could be shown:

  • next to the user name in the top menu bar
  • or on the "Update profile" page (where it could also be changed, if changing groups is allowed)
  • and/or in the "Group status", my group could be highlighted (or marked explicitly as "my group").

Also in the "Language pair status", the language pairs I have selected could be highlighted/marked.
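
One lightweight way to expose the group name everywhere (a sketch, not existing Appraise code) would be a template context processor; the function and variable names are arbitrary:

    # Sketch of a context processor exposing the user's first group name to all
    # templates (e.g. for the top menu bar); names are illustrative only.
    def group_name(request):
        if request.user.is_authenticated():
            groups = request.user.groups.all()
            return {'group_name': groups[0].name if groups else None}
        return {'group_name': None}

It would then have to be added to TEMPLATE_CONTEXT_PROCESSORS in the settings.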

Collectstatic can't find jquery.js

When installing Appraise and running the collectstatic command, it could not find jquery.js. Do I have to install any other packages besides django==1.3?

~/Appraise-Software/appraise$ python manage.py collectstatic
WARNING:appraise.utils:NLTK is NOT available, using fallback AnnotationTask class instead. This does NOT implement any NLTK features!

You have requested to collect static files at the destination
location as specified in your settings file.

This will overwrite existing files.
Are you sure you want to do this?

Type 'yes' to continue, or 'no' to cancel: yes
Copying '/home/ltan/Appraise-Software/appraise/static/admin/js/jquery.js'
Traceback (most recent call last):
  File "manage.py", line 23, in <module>
    execute_manager(settings)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 438, in execute_manager
    utility.execute()
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 379, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 191, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 220, in execute
    output = self.handle(*args, **options)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 351, in handle
    return self.handle_noargs(**options)
  File "/usr/local/lib/python2.7/dist-packages/django/contrib/staticfiles/management/commands/collectstatic.py", line 89, in handle_noargs
    self.copy_file(path, prefixed_path, storage, **options)
  File "/usr/local/lib/python2.7/dist-packages/django/contrib/staticfiles/management/commands/collectstatic.py", line 199, in copy_file
    shutil.copy2(source_path, full_path)
  File "/usr/lib/python2.7/shutil.py", line 130, in copy2
    copyfile(src, dst)
  File "/usr/lib/python2.7/shutil.py", line 83, in copyfile
    with open(dst, 'wb') as fdst:
IOError: [Errno 2] No such file or directory: u'/static-files/admin/js/jquery.js'
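
The IOError points at /static-files at the filesystem root, which suggests that STATIC_ROOT is misconfigured rather than that a package is missing (the admin's jquery.js ships with Django itself). A sketch of a possible settings fix, assuming the collected files should live inside the project directory:

    # settings.py sketch -- the target directory is an assumption; the point is
    # that STATIC_ROOT must be an absolute path the process can write to.
    import os

    PROJECT_ROOT = os.path.dirname(os.path.abspath(__file__))
    STATIC_ROOT = os.path.join(PROJECT_ROOT, 'static-files')
    STATIC_URL = '/static/'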

Crashes with Django 1.5

The web application crashes with Django 1.5. This is the error shown in the browser:

NoReverseMatch at /appraise/

'url' requires a non-empty first argument. The syntax changed in Django 1.5, see the docs.

Request Method: GET
Request URL: http://127.0.0.1:8000/appraise/
Django Version: 1.5
Exception Type: NoReverseMatch
Exception Value:

'url' requires a non-empty first argument. The syntax changed in Django 1.5, see the docs.

Exception Location: /usr/local/lib/python2.7/dist-packages/django/template/defaulttags.py in render, line 402
Python Executable: /usr/bin/python
Python Version: 2.7.6
Python Path:

['/home/a/software/Appraise-Software/appraise',
'/usr/local/lib/python2.7/dist-packages/langid-1.1.4dev-py2.7.egg',
'/usr/lib/python2.7',
'/usr/lib/python2.7/plat-x86_64-linux-gnu',
'/usr/lib/python2.7/lib-tk',
'/usr/lib/python2.7/lib-old',
'/usr/lib/python2.7/lib-dynload',
'/usr/local/lib/python2.7/dist-packages',
'/usr/lib/python2.7/dist-packages',
'/usr/lib/python2.7/dist-packages/PILcompat',
'/usr/lib/python2.7/dist-packages/gst-0.10',
'/usr/lib/python2.7/dist-packages/gtk-2.0',
'/usr/lib/pymodules/python2.7',
'/usr/lib/python2.7/dist-packages/ubuntu-sso-client']

Server time: Wed, 27 Aug 2014 16:09:03 +0200

Language status page should show system counts

In ascertaining whether enough HITs have been collected, it would be really helpful if the Language pair status page listed the number of systems, e.g.,

English → Czech (X systems) 519 remaining 1481 completed

Compare unique items not system outputs

Many times, the system outputs for a sentence are identical. Rather than constructing each task from a random subset of systems, each task should be constructed from the set of distinct outputs for that sentence. The pairwise rankings could then be re-associated with the systems to generate a larger set of pairwise rankings.

This would be a bit more respectful of people's time (it's annoying to see identical outputs) and would also let us potentially gather data more quickly. On the WMT14 data, for example, there are identical system outputs for over half the sentences.

CC: @cfedermann
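
A rough sketch of the idea (not the project's actual code): group systems by identical output text, have annotators rank the distinct outputs, then expand that ranking back into system-level pairwise comparisons.

    # Sketch: deduplicate system outputs and re-expand rankings to systems.
    from collections import defaultdict
    from itertools import combinations

    def group_by_output(outputs):
        """outputs: dict mapping system name -> output text for one sentence."""
        groups = defaultdict(list)
        for system, text in outputs.items():
            groups[text].append(system)
        return groups

    def expand_rankings(ranked_texts, groups):
        """ranked_texts: the distinct outputs ordered best-first."""
        pairs = []
        for better, worse in combinations(ranked_texts, 2):
            for sys_a in groups[better]:
                for sys_b in groups[worse]:
                    pairs.append((sys_a, '>', sys_b))
        return pairs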

Add admin action to retire campaign

It should be possible from the Django admin backend to retire a campaign.

This should also retire any associated objects such as tasks, items, and maybe results. The corresponding campaign team should also be retired.
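
A minimal sketch of such an admin action; the Campaign/Task model names and the 'retired' flag are assumptions about the schema, not the project's actual models:

    # Sketch of a Django admin action that retires campaigns and their tasks.
    from django.contrib import admin

    def retire_campaigns(modeladmin, request, queryset):
        for campaign in queryset:
            campaign.tasks.update(retired=True)  # hypothetical related manager
            campaign.retired = True
            campaign.save()
    retire_campaigns.short_description = "Retire selected campaigns"

    class CampaignAdmin(admin.ModelAdmin):
        actions = [retire_campaigns]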

License information

Hello,

Can you please clarify which license Appraise is under? I didn't see any license information or file in the repo.

Thanks

Import error in web application with Django 1.3.1

Appraise requires Django >=1.3 but the web app doesn't work with 1.3.1.

When pointing the browser to "http://127.0.0.1:8000/appraise/" with the server running, I get the following error:

ImportError at /appraise/

cannot import name patterns

Request Method: GET
Request URL: http://127.0.0.1:8000/appraise/
Django Version: 1.3.1
Exception Type: ImportError
Exception Value:

cannot import name patterns

Exception Location: /home/a/software/Appraise-Software/appraise/../appraise/urls.py in <module>, line 7
Python Executable: /usr/bin/python
Python Version: 2.7.6
Python Path:

['/home/a/software/Appraise-Software/appraise',
'/usr/local/lib/python2.7/dist-packages/langid-1.1.4dev-py2.7.egg',
'/usr/lib/python2.7',
'/usr/lib/python2.7/plat-x86_64-linux-gnu',
'/usr/lib/python2.7/lib-tk',
'/usr/lib/python2.7/lib-old',
'/usr/lib/python2.7/lib-dynload',
'/usr/local/lib/python2.7/dist-packages',
'/usr/lib/python2.7/dist-packages',
'/usr/lib/python2.7/dist-packages/PILcompat',
'/usr/lib/python2.7/dist-packages/gst-0.10',
'/usr/lib/python2.7/dist-packages/gtk-2.0',
'/usr/lib/pymodules/python2.7',
'/usr/lib/python2.7/dist-packages/ubuntu-sso-client']

Server time: Wed, 27 Aug 2014 16:01:59 +0200

Task 2: "I cannot tick the ‘translate from scratch’ box"

This issue was reported by one of the translator agencies participating in TaraXU's evaluation round 2 and was confirmed by Cindy Tscherwinka.

"I am now working on Task 2 and I cannot tick the ‘translate from scratch’ box if none of the sentences can be post-edited easily. I get a ‘stop’ symbol which appears over the box when I try to tick it."

Cindy posted: "I had the same question as I tested the system. At least I think she is referring to that issue: you still have to select one of the sentences that is not easy to post-edit before you can select “translate from scratch”. "

Incorrect Django requirements on readme

The readme states that Appraise should work with Django >= 1.3, but it does not work with any 1.3 version, because certain symbols imported in urls.py lived in django.conf.urls.defaults in 1.3 but in django.conf.urls from 1.4 on.

Also, it does not work with versions above 1.5, because the patterns API changed. There is also an open issue with 1.6.

To the best of my knowledge, it only works with 1.4.20, which will be supported until October 2015.

Can you update the readme to reflect this?
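
One way to cover both import locations would be a compatibility shim in urls.py (a sketch; note that patterns itself was deprecated and eventually removed in later Django releases, so this only papers over the 1.3 vs. 1.4 difference):

    # Sketch of a urls.py compatibility import for the 1.3 vs. 1.4+ module move.
    try:
        from django.conf.urls import patterns, include, url  # Django >= 1.4
    except ImportError:
        from django.conf.urls.defaults import patterns, include, url  # Django 1.3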

Create demo app for Appraise

Use a random sample of 10 HITs per WMT14 language pair and allow unlimited collection of annotation results on these...

Create DEMO group for demo users.

Update ranking template

Matt pointed out the following:

  • move the radio buttons to the right of the text;
  • label which end is best and which is worst; and
  • enforce that with a visual cue (gradient from green to red).

(A nice way to do this would be a gradient from green to red aligned with the buttons.)

"Reset" doesn't work in ranking

In Firefox, the Reset button does nothing. Even if a user has already selected some ranks, clicking this button doesn't reset them to NILL.

This is a minor issue and doesn't affect functionality; instead of fixing it, the button could simply be removed.

Exported CSV has no trailing newline

Exported CSV files (generated from the admin page) do not have a '\n' character on the last line. This creates problems when the file is concatenated with other files. Please add a trailing newline.
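
A trivial sketch of the fix at the point where the export content is assembled (the variable name is illustrative):

    # Sketch: make sure the exported CSV content ends with a newline so that
    # concatenating several exports does not glue two records together.
    if not csv_data.endswith('\n'):
        csv_data += '\n'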

Overview info missing when no HITs available

The Overview page should always show the number of completed HITs (by myself), average time per HIT etc.
However, when no HITs are available, I see only this message "At this moment, there are no HITs available to work on. Check back soon...".

Our annotators may get nervous that their work was lost.

Add affiliation to user accounts

Extend user accounts (or associated profiles) with information about a user's affiliation.

Or, in case that does not work, create corresponding groups inside Django's admin backend and thus allow assigning users to "affiliation groups".

The latter option would work out-of-the-box.
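
For the first option, a sketch of a profile model carrying the affiliation (the model and field names are illustrative, not existing Appraise models):

    # Sketch: extend user accounts with an affiliation via a profile model.
    from django.contrib.auth.models import User
    from django.db import models

    class UserProfile(models.Model):
        user = models.OneToOneField(User)
        affiliation = models.CharField(max_length=100, blank=True)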

Computing clusters with systems with equal output

How should the system ranking clusters be computed if systems often produce the same output and are merged in the results CSV file? Is using the scripts/compute_ranking_clusters.perl script the correct way?

This script seems to ignore merged systems in the results CSV file (sysA+sysB will be treated as a separate, new system). I have fixed it in this commit in my fork. Was that the correct thing to do, or is there a better way of getting the ranking clusters?

(Without this fix, the clustering script would get stuck in an infinite loop on my data, i.e., several variants of the same NLG system that often produce identical outputs.)

Improve interface for user assignment

On /appraise/admin/evaluation/evaluationtask/add/, the multiple selection box that allows for the selection of users is too small, resulting in uncertainty about how many and which users have been assigned to the task. We should think of a more user-friendly interface (i.e., another widget), given how critical this functionality is.
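
One low-effort option is Django's built-in horizontal filter widget, sketched here under the assumption that the field is a plain many-to-many named 'users':

    # Sketch: filter_horizontal renders large many-to-many selections as a
    # searchable two-pane widget; the model/field names are assumptions.
    from django.contrib import admin

    class EvaluationTaskAdmin(admin.ModelAdmin):
        filter_horizontal = ('users',)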

Crashes with Django 1.6.1

Appraise crashes with Django 1.6.1 (the version of Django included in Ubuntu 14.04).

When running "python manage.py syncdb", I get the following error:

Traceback (most recent call last):
  File "manage.py", line 7, in <module>
    from django.core.management import execute_manager
ImportError: cannot import name execute_manager
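
execute_manager was removed in Django 1.6; a sketch of the newer-style manage.py (the settings module path is an assumption):

    # Sketch of a Django >= 1.4 style manage.py; 'appraise.settings' is assumed
    # to be the project's settings module.
    import os
    import sys

    if __name__ == "__main__":
        os.environ.setdefault("DJANGO_SETTINGS_MODULE", "appraise.settings")
        from django.core.management import execute_from_command_line
        execute_from_command_line(sys.argv)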

Fix CSV export for RankingResult instances

Check that we don't generate empty CSV export lines (i.e., a sequence of five PLACEHOLDER systems) for RankingResult instances. Check that this only happens for skipped ranking tasks.

Minor appearance issue with ranking task

These are two minor (mostly template-related) issues observed in Firefox 8. They do not affect functionality.
Have a look here: http://www.dfki.de/~elav01/tmp/appraise-screenshot.png

  1. Source and reference are too close to each other. It would be nice to increase the space between the two divs or add a border
  2. Sometimes, when the source is longer than the reference, the rank radio buttons of the first sentence appear right underneath the reference (right column) and not on the left

Easy scriptable export

It would be nice to have a URL I could query to automate a RankingResult CSV export, e.g.,

wget -O backup.csv http://appraise.cf/admin/wmt14/rankingresult/

I think security-through-obscurity would be acceptable here, but requiring a secret token or something similar via a GET parameter would also be fine.
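
A sketch of what a token-protected export endpoint could look like; the view, the EXPORT_TOKEN setting, and the CSV helper are all hypothetical, not existing Appraise code:

    # Sketch of a token-protected CSV export view.
    from django.conf import settings
    from django.http import HttpResponse, HttpResponseForbidden

    def export_results(request):
        if request.GET.get('token') != getattr(settings, 'EXPORT_TOKEN', None):
            return HttpResponseForbidden('invalid token')
        response = HttpResponse(content_type='text/csv')
        response.write(build_rankingresult_csv())  # hypothetical helper
        return response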

Improve usability of inter-annotator agreement computation in status view

Compute inter-annotator agreement (IAA) for the number of annotators or coders (C) that maximises the number of items (I) that have been evaluated by the respective subset of coders. This avoids showing no IAA scores at all until all coders have completed the task.

Additionally, there could be checkboxes to toggle whether a system should be included in the IAA computation or not. That way, different scenarios could be tested in a more exploratory way, from within the status view.

If implemented, the checkbox selection should also be reflected when downloading results data.

Randomized appearance of entries with regard to context

We have just come to preparing the set that is to be used for measuring intra-annotator agreement in the ranking task, where we have to present the same 96 evaluation items a second time, but possibly not in the original sequence. Since randomized order is not supported by Appraise, we thought of randomizing them in the input files. The original format specified in the Specification Document would allow the system to derive the context from the original file and display the items in the order specified in the task file. Unfortunately, the simplification of the import data scheme breaks that requirement, and randomizing the items before import would mean that the context is not consistent. We would need a way to overcome this problem, i.e., allowing randomization upon choosing the next item, for particular sets.

example appraise.conf?

The file appraise/start-server.sh.sample refers to a file named appraise.conf in the lighttpd invocation. Can you add an example of such a file to the repository for those of us trying to set Appraise up under that server?

Add group completion status to status view

All users should be able to see:

  • their individual status for all tasks they started working on;
  • their incomplete status for all tasks they retracted from;
  • the aggregated status for all members of the affiliation group (percentage of REQUIRED_HOURS_PER_GROUP).

Use color coding to make people feel better ;)

Error classification template: right column uses too much space

It has just been noticed that when a long word appears in the left column of the error classification pane, some of the radio buttons get "wrapped" to the next line, although there is a lot of space around the "summary box" into which the left column could expand. To reproduce it, just narrow your browser window. You will see that the right column reserves empty space around the summary box whereas the left column gets squeezed/wrapped.

General use of the tool

After downloading and running Appraise, it seems that the current code is intended to work only for the WMT15 evaluation. I would like to create my own evaluation tasks, but the corresponding link does not exist. How can I use the current version of Appraise to prepare my own evaluations?

Implement task/batch bidding process

Users should be able to select tasks to work on.

The list of available tasks should be filtered as follows (see the sketch after the list):

  • a maximum of three users is allowed to work on a single task;
  • after selecting a task, the user either has to finish or retract from it;
  • the task list should also display language pairs;
  • only language pairs that are "valid" for the current user should be shown.
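
A sketch of these filters as a Django queryset; the task queryset, the 'users' many-to-many field, 'language_pair', and the valid-pairs lookup are all assumptions about the schema, not existing Appraise code:

    # Sketch only: keep tasks with fewer than three assigned users and a
    # language pair that is valid for the current user.
    from django.db.models import Count

    def available_tasks(task_queryset, valid_language_pairs):
        return (task_queryset
                    .annotate(num_users=Count('users'))
                    .filter(num_users__lt=3,
                            language_pair__in=valid_language_pairs))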

Filtering equal hypotheses

Would it be possible to filter out, or to group, equal hypotheses? There is no interest in ranking equal outputs, and in fact I lose some time when I have 5 hypotheses and need to find the differences between two of them (many times they are equal; sometimes they only differ by one character). It would be easier if we knew all outputs to be different.
