davidjurgens / potato

potato: portable text annotation tool

License: Other

Python 12.60% HTML 9.94% CSS 0.61% JavaScript 1.02% Jupyter Notebook 75.83%

potato's Introduction

🥔Potato: the POrtable Text Annotation TOol

📖Documentation | 🍎Feature highlights | 🛫️Quick Start | 🌰Example projects (project hub) | 🔥Design Team and Support | 💰License | 🍞Cite us

Potato is an easy-to-use web-based annotation tool accepted by the EMNLP 2022 DEMO track. Potato allows you to quickly mock up and deploy a variety of text annotation tasks. Potato runs as a web server back-end that you can launch locally, and annotators then use the web-based front-end to work through the data. Our goal is to let people quickly and easily annotate text data by themselves or in small teams, going from zero to annotating with only a few lines of configuration.

Potato is driven by a single configuration file that specifies the type of task and data you want to use. Potato does not require any coding to get up and running. For most tasks, no additional web design is needed, though Potato is easily customizable so you can tweak the interface and elements your annotators see.
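
As a rough sketch of what such a configuration file can look like (assembled from the example configs quoted in the issues further down this page; exact keys and defaults may differ across versions, and the data file name here is hypothetical):

{
    "port": 8000,
    "annotation_task_name": "sentiment_analysis",
    "output_annotation_dir": "annotation_output/",
    "output_annotation_format": "tsv",
    "data_files": [
        "data_files/my_data.csv"    # hypothetical input file
    ],
    "item_properties": {
        "id_key": "id",
        "text_key": "text"
    },
    "user_config": {
      "allow_all_users": True,
      "users": []
    },
    "annotation_schemes": [
        {
            "annotation_type": "radio",
            "name": "sentiment",
            "description": "What kind of sentiment does the given text hold?",
            "labels": ["positive", "neutral", "negative"],
            "sequential_key_binding": True
        }
    ],
    "html_layout": "default",
    "base_html_template": "default",
    "header_file": "default",
    "site_dir": "default"
}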

Please check out our official documentation for detailed instructions.

Jiaxin Pei, Aparna Ananthasubramaniam, Xingyao Wang, Naitian Zhou, Jackson Sargent, Apostolos Dedeloudis and David Jurgens. 🥔Potato: the POrtable Text Annotation TOol. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP'22 demo).

Recent Updates

  • 1.2.2.4 Fixes several bugs in the multirate schema ("required" not working and keyboard conflicts).
  • 1.2.2.3 Supports randomizing the order of multirate schema options (Documentation | Example project)
  • 1.2.2.2 Small bug fixes for label suggestions
  • 1.2.2.1 Supports displaying or prefilling label suggestions (Documentation | Example project)
  • 1.2.2.0 Supports automatic task management with Prolific APIs (Documentation | Example project)
  • 1.2.1.7 Supports randomizing the order of the displayed instances when they are defined with a dictionary (link)
  • 1.2.1.6 Supports different HTML templates for surveyflow and annotation pages (link)
  • 1.2.1.5 Supports disallowing copy-pasting into textboxes (link)

Feature highlights

Potato supports a wide range of features that can make your data annotation easier:

Easy setup and flexible for diverse needs

Potato can be set up by simply editing a configuration file. You don't need to write any code to set up your annotation webpage. Potato also comes with a series of features for diverse needs.

  • Built-in schemas and templates: Potato supports a wide range of annotation schemas including radio, likert, checkbox, textbox, span, pairwise comparison, best-worst scaling, image/video-as-label, etc. All of these schemas can be configured directly in the configuration file.
  • Flexible data types: Potato supports displaying short documents, long documents, dialogue, comparisons, etc.
  • Multi-task setup: NLP researchers may need to set up a series of similar but different tasks (e.g. multilingual annotation). Potato allows you to easily generate configuration files for all the tasks with minimal configuration and has supported SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis.

Improving Annotator Productivity

Potato is carefully designed with a series of features that can improve your annotators' experience and help you get your annotations faster. You can easily set up:

  • Keyboard Shortcuts: Annotators can directly enter their answers with the keyboard.
  • Dynamic Highlighting: For tasks with many labels or very long documents, you can set up dynamic highlighting, which highlights the potential association between labels and keywords in the document (as defined by you).
  • Label Tooltips: When you have many labels (e.g. 30 labels in 4 categories), it can be hard for annotators to remember the detailed description of each one. Potato allows you to set up label tooltips so annotators can hover the mouse over a label to view its description (see the sketch after this list).
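
A hedged sketch of how keyboard shortcuts and label tooltips can be declared inside an annotation scheme, following the label format used in the "automatic assignment failing" config quoted in the issues below (the tooltip wording and key choices here are made up for illustration):

"annotation_schemes": [
    {
        "annotation_type": "radio",
        "name": "sentiment",
        "description": "What kind of sentiment does the given text hold?",
        "labels": [
            {"name": "positive", "tooltip": "The text expresses a positive attitude", "key_value": "1"},
            {"name": "neutral",  "tooltip": "The text is neither positive nor negative", "key_value": "2"},
            {"name": "negative", "tooltip": "The text expresses a negative attitude", "key_value": "3"}
        ],
        "label_requirement": { "required": True }
    }
]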

Knowing better about your annotators

Potato offers a series of features that can help you better understand the backgrounds of your annotators and identify potential biases in your data.

  • Pre- and post-screening questions: Potato allows you to easily set up pre-screening and post-screening questions that can help you better understand the backgrounds of your annotators. Potato comes with a series of question templates that let you easily set up common pre-screening questions such as demographics (see the sketch below).
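
A minimal sketch of how pre- and post-screening pages are wired into the surveyflow section of the config, based on the surveyflow blocks quoted in the issues below (the demographics file name is hypothetical):

"surveyflow": {
    "on": True,
    "order": [
        "pre_annotation",
        "post_annotation"
    ],
    "pre_annotation": [
        "surveyflow/consent.jsonl",
        "surveyflow/demographics.jsonl"    # hypothetical pre-screening questionnaire
    ],
    "post_annotation": [
        "surveyflow/end.jsonl"
    ]
}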

Better quality control

Potato comes with features that allow you to collect more reliable annotations and identify potential spammers.

  • Attention Test: Potato allows you to easily set up attention test questions and will randomly insert them into the annotation queue, allowing you to better identify potential spammers (see the sketch after this list).
  • Qualification Test: Potato allows you to easily set up a qualification test before the full data labeling and easily identify disqualified annotators.
  • Built-in time check: Potato automatically keeps track of the time annotators spend on each instance, allowing you to better analyze annotator behavior.
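
A hedged sketch of how attention tests are typically wired in, combining the "testing" surveyflow entry and the automatic_assignment block that appear in the issues below (the exact interaction between these keys may differ by version, and the numeric values are illustrative):

"surveyflow": {
    "on": True,
    "order": ["pre_annotation", "post_annotation"],
    "pre_annotation": ["surveyflow/consent.jsonl"],
    "post_annotation": ["surveyflow/end.jsonl"],
    "testing": ["surveyflow/testing.jsonl"]    # attention-test questions
},

"automatic_assignment": {
    "on": True,
    "output_filename": "task_assignment.json",
    "sampling_strategy": "random",
    "labels_per_instance": 1,
    "instance_per_annotator": 50,
    "test_question_per_annotator": 1,    # how many attention tests each annotator sees
    "users": []
}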

Quick start

Install the potato PyPI package

pip install potato-annotation

Check all the available project templates

potato list all

Get one from the project hub

potato get sentiment_analysis

Start the project

potato start sentiment_analysis

Start directly from the GitHub repo

Clone the GitHub repo to your computer

git clone https://github.com/davidjurgens/potato.git

Install all the required dependencies

pip install -r requirements.txt

To run a simple check-box style annotation on text data, run

python potato/flask_server.py start project-hub/simple_examples/configs/simple-check-box.yaml -p 8000

This will launch the webserver on port 8000 which can be accessed at http://localhost:8000.

Clicking "Submit" will autoadvance to the next instance and you can navigate between items using the arrow keys.

The project-hub/simple_examples/configs folder contains example .yaml configuration files that match many common simple use-cases. See the full documentation for all configuration options.

Baked potatoes

Potato aims to improve the replicability of data annotation and reduce the cost for researchers to set up new annotation tasks. Therefore, Potato comes with a list of predefined example projects and welcomes public contributions to the project hub. If you have used Potato for your own annotation, you are encouraged to create a pull request and release your annotation setup.

Potato currently includes a range of example projects. Please check the full list of baked potatoes for more details!

Design Team and Support

Potato is run by a small and energetic team of academics doing the best they can. For support, please leave an issue on this Git repo. Feature requests and issues are both welcome! If you have any questions or want to collaborate on this project, please email [email protected] or [email protected]

License

Potato is dual-licensed. All use cases are covered by Polyform Shield but a commercial license is available for those use cases not allowed by Polyform Shield. Please contact us for details on commercial licensing.

FAQ:

  1. If I am an open-source developer, can I fork potato and work on it separately?

    Yes, this is allowed with the license

  2. If I am an open-source developer, can I fork potato and publicly release a new version with my own features?

    No, this is not allowed with the license; such a product would be considered a “competitor” (see the license for details)

  3. If I am working for a company, can I use potato to annotate my data?

    Yes, this is allowed with the license

  4. If I am working for a company, can I use potato within my company’s pipelines for data annotation (e.g., integrate potato within my company’s internal infrastructure)?

    Yes, this is allowed with the license—we’d love to hear about these to advertise, so please contact us at [email protected].

  5. Can I integrate potato within a larger annotation pipeline and release that pipeline as an open-source library or service for others to use?

    Yes, this is allowed with the license—we’d love to hear about these to advertise, so please contact us

  6. Can I integrate potato within a larger annotation pipeline and release that publicly as commercial software/service/resource for others to use?

    No, this is not allowed by Polyform Shield but commercial licensing of potato for this purpose is available. Please reach out to us at [email protected] for details.

  7. I am working for a crowdsourcing platform; can I integrate potato into our platform to provide better service for my customers?

    No, this is not allowed by Polyform Shield but commercial licensing of potato for this purpose is available. Please reach out to us at [email protected] for details.

Have a question or case not covered by the above? Please reach out to us and we’ll add it to the list!

Cite us

Please use the following bibtex when referencing this work:

@inproceedings{pei2022potato,
  title={POTATO: The Portable Text Annotation Tool},
  author={Pei, Jiaxin and Ananthasubramaniam, Aparna and Wang, Xingyao and Zhou, Naitian and Dedeloudis, Apostolos and Sargent, Jackson and Jurgens, David},
  booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  year={2022}
}

potato's People

Contributors

ajyl, andreasottana, benmatselby, davidjurgens, dependabot[bot], freymend, furkanakkurt1335, henningpohl, jacsarge, jiaxin-pei, naitian, xingyaoww, yuchengs


potato's Issues

Custom html layouts

It may be worth documenting the fact that if you want to specify a custom HTML layout, the path that you provide in the .yml config file has to be a full path. A path relative to the current directory does not work. It might be worth a code change too, to allow relative paths.
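
For example (a sketch based on the html_layout key used in the configs quoted later on this page; the paths are hypothetical):

# works: absolute path to the custom layout
"html_layout": "/home/me/my_project/templates/my_layout.html",

# reportedly does not work: path relative to the current directory
# "html_layout": "templates/my_layout.html",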

activating schema based on answer from another schema

Thanks for implementing this awesome tool!
I would like to annotate a sample using multiple schemas, as in the example in your documentation. One of the questions in the task depends on the answer to a previous question. Is there a way to enable or disable answering a question based on such conditions?

Add password recovery feature

There's currently no way for users to recover or reset their password, so some mechanism is needed in case a user gets locked out.

user_config setting did not work

Hello. I've set my configuration to allow only the user with the username "email address", as shown below:

    "user_config": {
      "allow_all_users": False,
      "users": 
        {
          "username": "email address",
          "password": "password"
        }
    },

Yet, I can still access the system with any created account. Can you guide me on how to fix this?
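
For comparison, the other configs quoted on this page define "users" as a list rather than a single object; a hedged guess at the intended form (whether this alone fixes the login check is not confirmed here):

    "user_config": {
      "allow_all_users": False,
      "users": [
        {
          "username": "email address",
          "password": "password"
        }
      ]
    },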

Arranging multiple schemas for a single task

Hi, thanks so much for your work on Potato! I'm wondering if there's an easy way to arrange multiple schemas used for a single task. For example, if I include more than a couple text boxes, they start to spill over the edge of the screen and squeeze other questions. Is there a way to force one or more of the schemas onto a new line?

Fix likert rating outputs to have a single column

The current output is the default multi-choice format with column names like "category:::radio" but since we know it's a scale, we can have a single column for the "category" with the Likert scale point as the value. This output format seems much simpler.

automatic assignment failing

I'm trying to set up automatic assignment but keep running into the following error

(error message shown in an attached screenshot)

This is my yaml:

{
    "port" : 8754,  
    
    "annotation_task_name": "record_linking",

    "output_annotation_dir": "annotation_output/record_linking/",

    "output_annotation_format": "csv", 

    "annotation_codebook_url": "",

    "data_files": [
        "data_files/record_pairs.csv"
    ],

    "item_properties": {
        "id_key": "id",
        "text_key": "text",
    },

    "list_as_text": {
      "text_list_prefix_type": 'none',
    },

    "user_config": {
      "allow_all_users": true,     
      "users": [  ],
    },

    "jumping_to_id_disabled": false,

    "hide_navbar": true,

    "alert_time_each_instance": 300,

    #defining the ways annotators entering the annotation system
    "login": {
       "type": 'password',    #can be 'password' or 'url_direct'
       "url_argument": '' # when the login type is set to 'url_direct', 'url_argument' must be setup for a direct url argument login
    },

    "automatic_assignment": {
        "on": true,
        "output_filename": "task_assignment.json",
        "sampling_strategy": "random",
        "labels_per_instance": 1,
        "instance_per_annotator": 1,
        "test_question_per_annotator": 0,
        "users": []
    },
    
    
    "annotation_schemes": [
        {
            "annotation_type": "likert",
            "name": "Decision",
            "description": "Do these two records refer to the same person?",
            "labels": [
                {
                    "name": "Not the same person",
                    "tooltip": "Not the same person",
                    "key_value": "No"
                },
                {
                    "name": "Unsure",
                    "tooltip": "Unsure",
                    "key_value": "Unsure"
                },
                {
                    "name": "Same person",
                    "tooltip": "Same person",
                    "key_value": "Yes"
                },
               ],
               
            "label_requirement": {
                "required": true
            }
        }
    ], 

    "html_layout": "default", 

    "base_html_template": "default",
    "header_file": "default",
    "horizontal_key_bindings": true,
    
    "site_dir": "default",

    "surveyflow": {
        "on": true,
        "order": [
            # "pre_annotation",
            "post_annotation"
        ],
        # "pre_annotation": [
        #     "surveyflow/consent.jsonl",
        # ],
        "post_annotation": [
            "surveyflow/end.jsonl",
        ],
        "testing": [
        ]
    },
}

Thanks so much for any help

showing no label in text spans

Hello,

Is it possible to remove the labels shown in the text spans? In my case, as seen in the image, the labels are too big and can block small words so that the annotators cannot see them.

I couldn't find out how to disable or adjust it. Disclaimer: I have no front-end experience.
(screenshot attached)

Much appreciated for your help. Thanks in advance.

POTATO License

Hello, thanks for this great tool! I was wondering if it would be possible to include a LICENSE file in the repository, in order to know whether it is possible to work with it or integrate it into other projects.

Thanks!

1.2.2.1 broke potato annotation tasks

Hello and greetings,
I am a big fan of Potato, thank you for creating it! :)
Sadly, I just noticed that something inside of potato must have broken recently.

Reproduction
(This error happens either way, whether I install potato-annotation via pip or clone the repository.)

  • I run the command to start the annotation task
  • potato start summarization_evaluation/
  • The server starts and I can navigate to it via the web browser (Firefox)
  • I can create a new account as usual
  • The moment I try to log into the annotation task, the browser displays a message along the lines of: "an internal server error has occurred"

Error Description / Terminal Output

potato start summarization_evaluation/
starting server from summarization_evaluation/configs/summ-eval.yaml
the current working directory is: /home/username/directory_to_project/summarization_evaluation/
WARNING:potato.flask_server:surveyflow_html_layout not configured, use default template at /home/username/directory_to_project/venvPotato/lib/python3.10/site-packages/potato/base_html/examples/plain_layout.html
running at:
localhost:9001
 * Serving Flask app 'potato.flask_server'
 * Debug mode: off
unknown action at home page
test@test Success
user info file saved at: user_config.json
test@test login successful
ERROR:potato.flask_server:Exception on /login [POST]
Traceback (most recent call last):
  File "/home/username/directory_to_project/venvPotato/lib/python3.10/site-packages/flask/app.py", line 1463, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/username/directory_to_project/venvPotato/lib/python3.10/site-packages/flask/app.py", line 872, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/username/directory_to_project/venvPotato/lib/python3.10/site-packages/flask/app.py", line 870, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/username/directory_to_project/venvPotato/lib/python3.10/site-packages/flask/app.py", line 855, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/home/username/directory_to_project/venvPotato/lib/python3.10/site-packages/potato/flask_server.py", line 1167, in login
    return annotate_page(username)
  File "/home/username/directory_to_project/venvPotato/lib/python3.10/site-packages/potato/flask_server.py", line 2199, in annotate_page
    suggestions = instance['label_suggestions']
KeyError: 'label_suggestions'

Workaround
For the time being, installing a previous version solves this problem:

pip uninstall potato-annotation

pip install potato-annotation==1.2.2.0

Probable Cause
Something in the newest commit [1.2.2.1 supports displaying label suggestions]
1986f56

Questions on task assignment

I have two questions.

Q1: Do the fields in the automatic_assignment config specification (e.g. "labels_per_instance") only apply when the "on" value is True? If it is not turned on, will it basically show all the available tasks in the order they appear in the data?

Q2: In a testing setup I created a data file with 3 instances. I annotated them successfully, and it brought up the post-annotation blurb (basically, "you are done" and a redirect). Then I added another data file to the mix. It still shows the 3 instances I annotated, then brings up the post-annotation blurb even though there were 12 other data points to annotate (in the new data file). Weirdly enough, if I press the move-forward button, it goes on to the next data file.

So to see what would happen, I removed the first data file with 3 instances from the data_files specification. When I log in, it shows me the post-annotation blurb first. It is not until I press the "move forward" button that it assigns me the 12 instances from the new data file.

So then, I'm not sure how to go about the annotation. Once an annotator finishes the assigned files and I want to assign more, is there a way to stop that post-annotation blurb from showing up first? I hope my description here makes sense.

Why is user score included in dictionary keys in summarization_evaluation?

Hello, I have noticed that as a result of this line
https://github.com/davidjurgens/potato/blob/master/potato/server_utils/schemas/likert.py#L39
on summarization_evaluation the generated annotation_output looks something like this

{"label_annotations": {"relevance": {"scale_5": "5"}, "fluency": {"scale_2": "2"}, "coherence": {"scale_4": "4"}, "consistency": {"not consistent": "2"}

or

{"label_annotations": {"relevance": {"scale_4": "4"}, "fluency": {"scale_1": "1"}, "coherence": {"scale_3": "3"}, "consistency": {"consistent": "1"}}

whereas ideally we should have something like

{"label_annotations": {"relevance": {"scale": "5"}, "fluency": {"scale": "2"}, "coherence": {"scale": "4"}, "consistency": "not consistent"}

Not only is there redundancy and duplication of information because the rating is included in both the key and the value, but this also has negative implications for the annotation_output/annotated_instances.tsv file because each rating has its own column meaning that relevance scale 4 would have a different column than relevance scale 3 and the output would look something like this which is really not ideal
(screenshot of the resulting output columns)

Would it not be better to change this line of code from

label = "scale_" + str(i)

to

label = "scale"

to prevent this issue? I have not explored the project yet in enough depth to be confidently able to say whether this would break other sections of the code, but to me it seems like this should be changed. Let me know what you think

Randomizing the order of responses in a pair that are shown to participants

Hi again!

I'm working on a fork of Potato where I am showing participants a pair of responses on each screen, and they pick one out of the two. Each example will be annotated by several participants. To avoid order effects, I'd like the responses to be randomly shuffled for each participant. Is there an easy way to implement this?

This is an example of what my data looks like, and this is my custom annotation template HTML. This is what the annotation screen looks like:

(screenshot of the annotation screen)

I'm not sure which part of the code I should edit to make this change so that each user receives their own random order. Do you think it would be possible?

Thanks,
Raj

Dynamic Highlighting

Hi,
I am using Potato for my research; however, I haven't been able to set up the Dynamic Highlighting task.

Reproduction

I tried to include dynamic highlighting for the dialogue_analysis task and added the dynamic_key.tsv file path to dialogue-analysis.yaml. Both the keyword.tsv and dialogue-analysis.yaml files are present here.

I have attached the annotation_schemes part of dialogue-analysis.yaml where I added the path to the dynamic_key.tsv file.

"annotation_schemes": [
        {
          "annotation_type": "highlight",
          "name": "certainty",
          "description": "Highlight which phrases make the sentence more or less certain",
          "labels": [
            "certain", "uncertain"
          ],

          # If true, numbers [1-len(labels)] will be bound to each
          # label. Highlight selection annotations with more than 10 are not supported
          # with this simple keybinding and will need to use the full item
          # specification to bind all labels to keys.
          "sequential_key_binding": True,
        },
        {
            "annotation_type": "radio",
            "name": "sentiment",
            "description": "What kind of sentiment does the given text hold?",
            "labels": [
               "positive", "neutral", "negative",
            ],

            # If true, numbers [1-len(labels)] will be bound to each
            # label. Aannotations with more than 10 are not supported with this
            # simple keybinding and will need to use the full item specification
            # to bind all labels to keys.
            "sequential_key_binding": True,     
           "keyword_highlights_file":"dynamic_key.tsv"                      
        },
        
           
    ]

Error Description

From what I understood from the documentation, the keywords provided in the tsv file will be highlighted automatically. However, even after adding the tsv file, the annotation task doesn't change. Can you tell me how to set up dynamic highlighting?
Thanks in advance!
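
One hedged observation: in the config quoted in the "Flask Server Exception" issue further down this page, keyword_highlights_file is set at the top level of the yaml rather than inside an annotation scheme, e.g.:

    "keyword_highlights_file": "dynamic_key.tsv",    # top-level key, outside annotation_schemes

Whether that placement is what the dynamic highlighting feature expects is not confirmed here.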

Allow users to dynamically adjust the assigned number of annotation instances

In many cases, users can't finish all of their assigned annotation tasks (e.g., 100 instances), but their remaining instances stay assigned to them and can't be reassigned, which is a waste of data. Meanwhile, some users may want to do more tasks than assigned, but then they have to create new accounts. Therefore, would it be possible to add an adjustable bar to the user interface, so that users could change the number of tasks at any time during their annotation?

Additionally, we could provide those functions to admins so that they could also adjust each user's task size and thereby easily supervise the progress of annotation. This function is especially useful when we want all the data in the dataset to get annotated (because we can't expect every user to finish all their jobs).

Lechen

Customizing colors

I'm wondering whether it's possible to customize the colors and other aesthetic features of the interface. Poking around in the repo, I wasn't able to find a place where the color palette is defined (the grays and navy blues of the default base_template). Is there an easy way to do this, or is it more complicated than I expected? (This is extremely non-urgent; just curious!)

Annotation of attention test did not match

Hi,
Thanks for implementing this handy tool. It saved me so much time to conduct research.

However, I found that the attention test (testing pages) feature did not work properly. I ran the offensiveness sample project, then added "testing": ["surveyflow/testing.jsonl"] and set "test_question_per_annotator": 1 in the yaml file.

The attention test question appeared correctly, but its annotation was treated like any other task's annotation. From my understanding, the annotation in this case should be the one that we assigned in the testing.jsonl file.

e.g.,
{"id":"match_finding","text":"This is a test question, please choose: ", "choices": ["1", "2"]}

(screenshot attached)

Please let me know how I can fix it. Again, thanks so much for your hard work.

Disabling paste into text boxes

It would be awesome if there was a built-in way to disable pasting text into specific text boxes, as this gives us one small defense against participants using chatbots to answer study questions.

"Start directly from the github repo" Failed on windows

I followed these steps on Windows:

  1. git clone repo
  2. pip install -r requirements.txt
  3. python potato/flask_server.py start project-hub/simple_examples/configs/simple-check-box.yaml -p 8000
and got this error:
Traceback
  File potato\flask_server.py, line 2673 in <module>
    main()
  File potato\flask_server.py line 266 in main
     run_server(args)
  File potato\flask_server.py line 2586 in run_server
    init_config(args)
  File potato\server_utils\config_module.py line 22 in init_config
    if split_path[-2] == "configs":
IndexError: list index out of range

I added a print in potato\server_utils\config_module.py between lines 21 and 22 and got a DOS-style path, like "C:\\code\\potato\\...".
But split_path splits it by '/' by default, which is the Linux/Unix-style separator.

I simply solved the problem by changing '/' to '\\' on line 21.

Maybe the project only supports Linux for now; would you like to fix that?

What is the correct json format for input data?

Referring to the example given here, I gave the JSON document below as input data.

{
    "IDENTITY_ATTACK": 0.6113662,
    "INSULT": 0.34384298,
    "PROFANITY": 0.4119253,
    "SEVERE_TOXICITY": 0.32176828,
    "THREAT": 0.11071971,
    "TOXICITY": 0.056931525,
    "category": "1",
    "id": 1288,
    "text": "#proudout wishes all a happy New year! #NYE2016 #LGBTQ #LGBT #lesbian #gay #bisexual #transgender #LoveWins #HappyNewYear #gaynewyear"
}
{
    "IDENTITY_ATTACK": 0.013542902,
    "INSULT": 0.0137602305,
    "PROFANITY": 0.005872097,
    "SEVERE_TOXICITY": 0.006128401,
    "THREAT": 0.0034086322,
    "TOXICITY": 0.00057268236,
    "category": "1",
    "id": 8363,
    "text": "Special skills include: telling if someone is gay solely based on that tiny thumbnail that comes up in Gmail."
}
{
    "IDENTITY_ATTACK": 0.84763545,
    "INSULT": 0.68785954,
    "PROFANITY": 0.6092933,
    "SEVERE_TOXICITY": 0.50534093,
    "THREAT": 0.17621756,
    "TOXICITY": 0.0026031018,
    "category": "1",
    "id": 9026,
    "text": "gay for fifth harmony \u30c4"
}

I encountered the error below when starting the service.

D:\potato>potato start sentiment_analysis
starting server from sentiment_analysis\configs\sentiment-analysis.yaml
the current working directory is: D:\potato\sentiment_analysis
Loading users fromuser_config.json
Traceback (most recent call last):
  File "c:\python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Python38\Scripts\potato.exe\__main__.py", line 7, in <module>
  File "c:\python38\lib\site-packages\potato\cli.py", line 6, in potato
    main()
  File "c:\python38\lib\site-packages\potato\flask_server.py", line 2587, in main
    run_server(args)
  File "c:\python38\lib\site-packages\potato\flask_server.py", line 2558, in run_server
    load_all_data(config)
  File "c:\python38\lib\site-packages\potato\flask_server.py", line 576, in load_all_data
    item = json.loads(line)
  File "c:\python38\lib\json\__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "c:\python38\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "c:\python38\lib\json\decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 2)
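
A hedged reading of the traceback: load_all_data calls json.loads(line) on each line of the data file, which suggests the input is expected to be in JSON Lines format, with one complete object per line rather than pretty-printed across multiple lines. A sketch of the same records reformatted that way (most fields are abbreviated here for readability; each real record would carry all of its fields on a single line):

{"id": 1288, "category": "1", "text": "#proudout wishes all a happy New year! ...", "TOXICITY": 0.056931525}
{"id": 8363, "category": "1", "text": "Special skills include: ...", "TOXICITY": 0.00057268236}
{"id": 9026, "category": "1", "text": "gay for fifth harmony \u30c4", "TOXICITY": 0.0026031018}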

Reset instance count after pre-study

Dear potato team,

Do you have any ideas for resetting the instance count after the pre-study? The counter is a bit misleading, showing 2/0 instances finished (see the screenshot below). Or perhaps could you point me to some instructions for disabling the counter?

Thanks in advance for any help :)

(screenshot of the instance counter)

Login stops working if user logs out immediately after the pre-annotation questionnaire

Hi,

I'm using potato for an annotation project which has a pre-study questionnaire. Multiple users have reported that their login stops working if they log out immediately after completing the questionnaire (and before completing the annotation task on the first annotation page). The next time they try to log in they get an Internal Server Error, and in the logs I find this:

INFO:potato.flask_server:Loaded 0 annotations for known user "XYZ"
ERROR:potato.flask_server:Exception on /login [POST]
Traceback (most recent call last):
File "/home/ubuntu/flaskenv/lib/python3.10/site-packages/flask/app.py", line 2190, in wsgi_app
response = self.full_dispatch_request()
File "/home/ubuntu/flaskenv/lib/python3.10/site-packages/flask/app.py", line 1486, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/home/ubuntu/flaskenv/lib/python3.10/site-packages/flask/app.py", line 1484, in full_dispatch_request
rv = self.dispatch_request()
File "/home/ubuntu/flaskenv/lib/python3.10/site-packages/flask/app.py", line 1469, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "/home/ubuntu/flaskenv/lib/python3.10/site-packages/potato/flask_server.py", line 1112, in login
return annotate_page(username)
File "/home/ubuntu/flaskenv/lib/python3.10/site-packages/potato/flask_server.py", line 2120, in annotate_page
if (input_field.name != "textarea") and (input_field["value"] != value):
File "/home/ubuntu/flaskenv/lib/python3.10/site-packages/bs4/element.py", line 1573, in getitem
return self.attrs[key]
KeyError: 'value'

Do you have an idea why this is happening and what I can do to solve it?
Any help is appreciated!

Thanks,
Aleksandra

Add support for multiple span-based schemas concurrently

Currently Potato supports only one span-based schema at a time, which works for most use cases, but we could feasibly support more than one with a bit more internal hacking to figure out which span belongs to which schema when dealing with overlaps.

Flask Server Exception

Hi,

I was using Dialogue Analysis with dynamic highlighting and received the following Flask server error. Please note that this error only appears after adding the dynamic highlighting tsv file.

Reproduction

This is my modified dialogue_analysis.yaml file; the tsv file is added after the user_config. The modified dialogue_analysis task and tsv file are attached here.

{
    "port": 9001,

    "server_name": "potato annotator",

    "annotation_task_name": "Dialogue Analysis",

    # Potato will write the annotation file for all annotations to this
    # directory, as well as per-annotator output files and state information
    # necessary to restart annotation.
    "output_annotation_dir": "annotation_output/",

    # The output format for the all-annotator data. Allowed formats are:
    # * jsonl
    # * json (same output as jsonl)
    # * csv
    # * tsv
    #
    "output_annotation_format": "tsv",

    # If annotators are using a codebook, this will be linked at the top to the
    # instance for easy access
    "annotation_codebook_url": "",

    "data_files": [
       "data_files/dialogue-example.json"
    ],

    "item_properties": {
        "id_key": "id",
        "text_key": "text",
        "context_key": "context"
    },


    "user_config": {

      "allow_all_users": True,
      
      "users": [  ],
    },

    "keyword_highlights_file": "dynamic_key.tsv",  

    #list_as_text is used when the input text is actually a list of texts, usually used for best-worst-scaling or dialogue analysis
    "list_as_text": {
      "text_list_prefix_type": 'none', #whether automatically insert a prefix for each line, currently supporting 'number', 'alphabet', 'number'
      "horizontal": True,
    },

    # How many seconds do you want the annotators spend on each instance, after
    # that, an alert will be sent per alert_time_each_instance seconds.
    "alert_time_each_instance": 10000000,

    "annotation_schemes": [
        {
          "annotation_type": "highlight",
          "name": "certainty",
          "description": "Highlight which phrases make the sentence more or less certain",
          "labels": [
            "certain", "uncertain"
          ],

          # If true, numbers [1-len(labels)] will be bound to each
          # label. Highlight selection annotations with more than 10 are not supported
          # with this simple keybinding and will need to use the full item
          # specification to bind all labels to keys.
          "sequential_key_binding": True,
        },
        {
            "annotation_type": "radio",
            "name": "sentiment",
            "description": "What kind of sentiment does the given text hold?",
            "labels": [
               "positive", "neutral", "negative",
            ],

            # If true, numbers [1-len(labels)] will be bound to each
            # label. Aannotations with more than 10 are not supported with this
            # simple keybinding and will need to use the full item specification
            # to bind all labels to keys.
            "sequential_key_binding": True,                        
        },       
    ],

    # The html that changes the visualiztation for your task. Change this file
    # to influence the layout and description of your task. This is not a full
    # HTML page, just the piece that does lays out your task's pieces
    # you may use templates in our lib, if you want to use your own template,
    # please replace the string as a path to the template
    "html_layout": "default",

    # The core UI files for Potato. You should not need to change these normally.
    #
    # Exceptions to this might include:
    # 1) You want to add custom CSS/fonts to style your task
    # 2) Your layout requires additional JS/assets to render
    # 3) You want to support additional keybinding magic
    #
    # if you want to use your own template,
    # please replace the string as a path to the template
    "base_html_template": "default",
    "header_file": "default",

    # This is where the actual HTML files will be generated
    "site_dir": "default",

}

Error:

ERROR:potato.flask_server:Exception on /annotate [POST]
Traceback (most recent call last):
  File "/home/naman/miniconda3/lib/python3.11/site-packages/flask/app.py", line 1473, in wsgi_app
    response = self.full_dispatch_request()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/naman/miniconda3/lib/python3.11/site-packages/flask/app.py", line 882, in full_dispatch_request
    rv = self.handle_user_exception(e)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/naman/miniconda3/lib/python3.11/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/naman/miniconda3/lib/python3.11/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/naman/miniconda3/lib/python3.11/site-packages/potato/flask_server.py", line 2100, in annotate_page
    did_change = update_annotation_state(username, request.form)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/naman/miniconda3/lib/python3.11/site-packages/potato/flask_server.py", line 1000, in update_annotation_state
    span_text, span_annotations = parse_html_span_annotation(span_annotation_html)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/naman/miniconda3/lib/python3.11/site-packages/potato/flask_server.py", line 2460, in parse_html_span_annotation
    middle_text = middle[: m3.start()]
                           ^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'start'

Potato list all throws ModuleNotFoundError error

I installed the package using pip

pip install potato-annotation

Then I ran the command below

potato list all

I received the error below:

D:\potato>potato list all
Traceback (most recent call last):
  File "c:\python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Python38\Scripts\potato.exe\__main__.py", line 4, in <module>
  File "c:\python38\lib\site-packages\potato\cli.py", line 2, in <module>
    from potato.flask_server import main
  File "c:\python38\lib\site-packages\potato\flask_server.py", line 19, in <module>
    import simpledorff
ModuleNotFoundError: No module named 'simpledorff'

OS: Windows
Python version: 3.8.3
Pip version: 21.1.3

Previewing with Prolific before the study is published?

Hi! I am trying to use your interface to set up a study on Prolific. I do not want to publish my study on Prolific yet, but I would like to be able to preview the interface locally before I publish it. However, when I try to put a dummy string in place of the Prolific token, populate the Prolific config dictionary with dummy data, and then run the app, I still get the error "Please login to annotate or you are using the wrong link". Is there any way to get around this?

Thanks!
Myra

HTML elements in instance_text blocks span annotations

Hi,
First of all thanks for creating this nice tool!

I'm setting up a study and I'd like to use HTML markup (specifically span objects) in the instance texts that we display for annotation. The task is partially a span annotation task, so I'm using the "annotation_type": "highlight".

The highlighting works as expected when the text in "instance_text" is plain text. However, I'm looking to use some HTML elements to display additional text when people hover over the text. In my data I therefore wrap some text in span tags. For this, I'm following the setup in the "match-finding" example project. My data and "text" column are analogous to what they include here: https://github.com/davidjurgens/potato/blob/master/example-projects/match_finding/data_files/pilot_data.csv

When I test the setup and try to highlight spans, the highlighting no longer works. I checked whether it is HTML tags in general by including some bold-faced text, but that doesn't cause any issues.

I think it's specifically the span tags that cause the highlighting to break. My best guess is that because I add a new element/node, the name of the parentElement changes and span annotations are not allowed. I don't know enough about JavaScript to figure out how to avoid or fix this.

Any pointers would be super helpful!

Add support feature for shuffling options

I would like to suggest that the front-end support shuffling the displayed options to hide patterns, while returning the original ordering information to the back-end in the submitted data.

It will be very useful for comparing multiple methods.

New data file for existing examples do not work

Hello. Thanks a lot for this nice tool!

I'm having trouble running existing potatoes with my own data. I tried Offensiveness/Politeness, which are similar, and just changed the yaml file to point to my csv file. Although the log prints my filename, it still shows samples from the previous data file, and since my data file contains fewer samples, I keep getting a KeyError.

My csv header is:
predicted_label,button_color,confidence_score,nda_file_name,text,id

I'm using Python 3.10.12 and I have potato-annotation 1.2.1.5.

Change in the yaml file to use my csv file:

"data_files": [
      # "data_files/politeness_rating_3718.csv"
      "data_files/contdata.csv"
    ],

Terminal output:

potato --verbose --debug start politeness_rating/
multiple config files found, please select the one you want to use (number 0-2)
[0] politeness_2.yaml
[1] politeness.yaml
number: 1
starting server from politeness_rating/configs/politeness.yaml
the current working directory is: politeness_rating/
INFO:potato.flask_server:html_layout will be loaded from user-defined file templates/layout.html
DEBUG:potato.flask_server:Loading data from 1 files
DEBUG:potato.flask_server:Reading data from data_files/contdata.csv
DEBUG:potato.flask_server:Loaded 210 instances from data_files/contdata.csv
DEBUG:potato.flask_server:Found known user "debug_user"; loading annotation state
INFO:potato.flask_server:Loaded 1 annotations for known user "debug_user"
DEBUG:potato.flask_server:Found known user "3"; loading annotation state
INFO:potato.flask_server:Loaded 3 annotations for known user "3"
running at:
localhost:9001
 * Serving Flask app 'potato.flask_server'
 * Debug mode: off
debug user logging in
DEBUG:potato.flask_server:Found known user "debug_user"; loading annotation state
INFO:potato.flask_server:Loaded 1 annotations for known user "debug_user"
ERROR:potato.flask_server:Exception on /annotate [POST]
Traceback (most recent call last):
  File "python3.10/site-packages/flask/app.py", line 1463, in wsgi_app
    response = self.full_dispatch_request()
  File "python3.10/site-packages/flask/app.py", line 872, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "python3.10/site-packages/flask/app.py", line 870, in full_dispatch_request
    rv = self.dispatch_request()
  File "python3.10/site-packages/flask/app.py", line 855, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "python3.10/site-packages/potato/flask_server.py", line 1992, in annotate_page
    save_user_state(username)
  File "python3.10/site-packages/potato/flask_server.py", line 1660, in save_user_state
    "displayed_text": instance_id_to_data[inst_id]["displayed_text"],
KeyError: '1953'

(Screenshot: https://github.com/davidjurgens/potato/assets/12449653/05701c12-d1c7-4d08-a4f5-6dc4fafafbe3)
