humansignal / label-studio-converter

Tools for converting Label Studio annotations into common dataset formats

Home Page: https://labelstud.io/

label-studio-converter's Introduction

Label Studio Converter

Introduction

Label Studio Format Converter helps you to encode labels into the format of your favorite machine learning library.

Examples

JSON

Running from the command line:

pip install -U label-studio-converter
label-studio-converter export -i exported_tasks.json -c examples/sentiment_analysis/config.xml -o output_dir -f JSON

Running from python:

from label_studio_converter import Converter

c = Converter('examples/sentiment_analysis/config.xml')
c.convert_to_json('examples/sentiment_analysis/completions/', 'tmp/output.json')

Getting output file: tmp/output.json

[
  {
    "reviewText": "Good case, Excellent value.",
    "sentiment": "Positive"
  },
  {
    "reviewText": "What a waste of money and time!",
    "sentiment": "Negative"
  },
  {
    "reviewText": "The goose neck needs a little coaxing",
    "sentiment": "Neutral"
  }
]

Use cases: any tasks

CSV

Running from the command line:

python label_studio_converter/cli.py --input examples/sentiment_analysis/completions/ --config examples/sentiment_analysis/config.xml --output output_dir --format CSV --csv-separator $'\t'

Running from python:

from label_studio_converter import Converter

c = Converter('examples/sentiment_analysis/config.xml')
c.convert_to_csv('examples/sentiment_analysis/completions/', 'output_dir', sep='\t', header=True)

Getting output file tmp/output.tsv:

reviewText	sentiment
Good case, Excellent value.	Positive
What a waste of money and time!	Negative
The goose neck needs a little coaxing	Neutral

Use cases: any tasks
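
The tab-separated output shown above can be read back with Python's standard csv module. This is a minimal sketch that assumes the file carries the header row shown; the helper name read_tsv_rows is illustrative and not part of label-studio-converter:

```python
import csv
import io

def read_tsv_rows(stream):
    """Return one dict per data row, keyed by the header line."""
    return list(csv.DictReader(stream, delimiter="\t"))

# In practice, pass open("tmp/output.tsv", newline="", encoding="utf-8").
# An in-memory sample keeps this sketch self-contained.
sample = io.StringIO(
    "reviewText\tsentiment\n"
    "Good case, Excellent value.\tPositive\n"
    "What a waste of money and time!\tNegative\n"
)
rows = read_tsv_rows(sample)
print(rows[0]["sentiment"])
```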

CoNLL 2003

Running from the command line:

python label_studio_converter/cli.py --input examples/named_entity/completions/ --config examples/named_entity/config.xml --output tmp/output.conll --format CONLL2003

Running from python:

from label_studio_converter import Converter

c = Converter('examples/named_entity/config.xml')
c.convert_to_conll2003('examples/named_entity/completions/', 'tmp/output.conll')

Getting output file tmp/output.conll

-DOCSTART- -X- O
Showers -X- _ O
continued -X- _ O
throughout -X- _ O
the -X- _ O
week -X- _ O
in -X- _ O
the -X- _ O
Bahia -X- _ B-Location
cocoa -X- _ O
zone, -X- _ O
...

Use cases: text tagging
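
To consume the exported file, the column layout shown above (token first, tag last) can be parsed with a few lines of plain Python. This is a sketch under the assumption that sentences are separated by blank lines and documents by -DOCSTART- lines; parse_conll is a hypothetical helper, not a function of the converter:

```python
def parse_conll(lines):
    """Group (token, tag) pairs into sentences; blank lines end a sentence."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        # Document markers and blank lines close the sentence in progress.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        # The first column is the token, the last column is the entity tag.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences

sample = [
    "-DOCSTART- -X- O",
    "the -X- _ O",
    "Bahia -X- _ B-Location",
    "cocoa -X- _ O",
]
sentences = parse_conll(sample)
print(sentences[0][1])
```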

COCO

Running from the command line:

python label_studio_converter/cli.py --input examples/image_bbox/completions/ --config examples/image_bbox/config.xml --output tmp/output.json --format COCO --image-dir tmp/images

Running from python:

from label_studio_converter import Converter

c = Converter('examples/image_bbox/config.xml')
c.convert_to_coco('examples/image_bbox/completions/', 'tmp/output.json', output_image_dir='tmp/images')

Output images can be found in tmp/images

Getting output file tmp/output.json

{
  "images": [
    {
      "width": 800,
      "height": 501,
      "id": 0,
      "file_name": "tmp/images/62a623a0d3cef27a51d3689865e7b08a"
    }
  ],
  "categories": [
    {
      "id": 0,
      "name": "Planet"
    },
    {
      "id": 1,
      "name": "Moonwalker"
    }
  ],
  "annotations": [
    {
      "id": 0,
      "image_id": 0,
      "category_id": 0,
      "segmentation": [],
      "bbox": [
        299,
        6,
        377,
        260
      ],
      "ignore": 0,
      "iscrowd": 0,
      "area": 98020
    },
    {
      "id": 1,
      "image_id": 0,
      "category_id": 1,
      "segmentation": [],
      "bbox": [
        288,
        300,
        132,
        90
      ],
      "ignore": 0,
      "iscrowd": 0,
      "area": 11880
    }
  ],
  "info": {
    "year": 2019,
    "version": "1.0",
    "contributor": "Label Studio"
  }
}

Use cases: image object detection
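
In COCO, a bbox is stored as [x, y, width, height], so the area field in the output above is simply width times height. The sketch below checks that on a trimmed copy of the example and resolves category names; the helper names are illustrative only:

```python
def bbox_area(bbox):
    """Area of a COCO box given as [x, y, width, height]."""
    _x, _y, width, height = bbox
    return width * height

def category_names(coco):
    """Return the category name of each annotation, in order."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    return [id_to_name[a["category_id"]] for a in coco["annotations"]]

# Trimmed copy of the example output above.
coco = {
    "categories": [{"id": 0, "name": "Planet"}, {"id": 1, "name": "Moonwalker"}],
    "annotations": [
        {"id": 0, "category_id": 0, "bbox": [299, 6, 377, 260], "area": 98020},
        {"id": 1, "category_id": 1, "bbox": [288, 300, 132, 90], "area": 11880},
    ],
}
for annotation in coco["annotations"]:
    assert bbox_area(annotation["bbox"]) == annotation["area"]
print(category_names(coco))
```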

Pascal VOC XML

Running from the command line:

python label_studio_converter/cli.py --input examples/image_bbox/completions/ --config examples/image_bbox/config.xml --output tmp/voc-annotations --format VOC --image-dir tmp/images

Running from python:

from label_studio_converter import Converter

c = Converter('examples/image_bbox/config.xml')
c.convert_to_voc('examples/image_bbox/completions/', 'tmp/voc-annotations', output_image_dir='tmp/images')

Output images can be found in tmp/images

Corresponding annotations can be found in tmp/voc-annotations/*.xml:

<?xml version="1.0" encoding="utf-8"?>
<annotation>
<folder>tmp/images</folder>
<filename>62a623a0d3cef27a51d3689865e7b08a</filename>
<source>
<database>MyDatabase</database>
<annotation>COCO2017</annotation>
<image>flickr</image>
<flickrid>NULL</flickrid>
</source>
<owner>
<flickrid>NULL</flickrid>
<name>Label Studio</name>
</owner>
<size>
<width>800</width>
<height>501</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>Planet</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>299</xmin>
<ymin>6</ymin>
<xmax>676</xmax>
<ymax>266</ymax>
</bndbox>
</object>
<object>
<name>Moonwalker</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>288</xmin>
<ymin>300</ymin>
<xmax>420</xmax>
<ymax>390</ymax>
</bndbox>
</object>
</annotation>

Use cases: image object detection
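
The VOC boxes above describe the same two objects as the COCO example: VOC stores corner coordinates (xmin, ymin, xmax, ymax), while COCO stores [x, y, width, height]. A minimal sketch of the conversion in both directions, with illustrative function names:

```python
def coco_to_voc(bbox):
    """[x, y, width, height] -> (xmin, ymin, xmax, ymax)."""
    x, y, width, height = bbox
    return (x, y, x + width, y + height)

def voc_to_coco(xmin, ymin, xmax, ymax):
    """(xmin, ymin, xmax, ymax) -> [x, y, width, height]."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]

# The "Planet" box from the COCO example maps to the VOC corners shown above.
print(coco_to_voc([299, 6, 377, 260]))
# The "Moonwalker" VOC corners map back to the COCO box.
print(voc_to_coco(288, 300, 420, 390))
```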


YOLO to Label Studio Converter

YOLO directory structure

Check the structure of the YOLO folder first. Keep in mind that the root is /yolo/datasets/one.

/yolo/datasets/one
  images
   - 1.jpg
   - 2.jpg
   - ...
  labels
   - 1.txt
   - 2.txt

  classes.txt

classes.txt example

Airplane
Car
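
For orientation, each line in a labels/*.txt file is assumed here to follow the common YOLO layout "class_id x_center y_center width height", with coordinates normalized to the image size, and class_id indexing into classes.txt. The sketch below turns one such line into a pixel box; parse_yolo_line is a hypothetical helper and does not reflect how the converter is implemented:

```python
def parse_yolo_line(line, class_names, image_width, image_height):
    """Convert one normalized YOLO label line to a top-left pixel box."""
    class_id, xc, yc, w, h = line.split()[:5]
    xc, yc, w, h = (float(value) for value in (xc, yc, w, h))
    return {
        "label": class_names[int(class_id)],
        # YOLO stores the box centre; shift by half the size for the corner.
        "x": (xc - w / 2) * image_width,
        "y": (yc - h / 2) * image_height,
        "width": w * image_width,
        "height": h * image_height,
    }

class_names = ["Airplane", "Car"]  # contents of classes.txt, one name per line
box = parse_yolo_line("1 0.5 0.5 0.25 0.5", class_names, 800, 600)
print(box)
```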

Usage

label-studio-converter import yolo -i /yolo/datasets/one -o ls-tasks.json --image-root-url "/data/local-files/?d=one/images"

Where the URL path from ?d= is relative to the path you set in LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT.

Note for Local Storages

  • It's very important to set LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/yolo/datasets (not to /yolo/datasets/one, but /yolo/datasets) for Label Studio to run.
  • Add a new Local Storage in the project settings and set Absolute local path to /yolo/datasets/one/images (or c:\yolo\datasets\one\images for Windows).

Note for Cloud Storages

  • Use --image-root-url to make correct prefixes for task URLs, e.g. --image-root-url s3://my-bucket/yolo/datasets/one.
  • Add a new Cloud Storage in the project settings with the corresponding bucket and prefix.

Help command

label-studio-converter import yolo -h

usage: label-studio-converter import yolo [-h] -i INPUT [-o OUTPUT]
                                          [--to-name TO_NAME]
                                          [--from-name FROM_NAME]
                                          [--out-type OUT_TYPE]
                                          [--image-root-url IMAGE_ROOT_URL]
                                          [--image-ext IMAGE_EXT]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        directory with YOLO where images, labels, notes.json
                        are located
  -o OUTPUT, --output OUTPUT
                        output file with Label Studio JSON tasks
  --to-name TO_NAME     object name from Label Studio labeling config
  --from-name FROM_NAME
                        control tag name from Label Studio labeling config
  --out-type OUT_TYPE   annotation type - "annotations" or "predictions"
  --image-root-url IMAGE_ROOT_URL
                        root URL path where images will be hosted, e.g.:
                        http://example.com/images or s3://my-bucket
  --image-ext IMAGE_EXT
                        image extension to search: .jpg, .png

Tutorial: Importing YOLO Pre-Annotated Images to Label Studio using Local Storage

This tutorial will guide you through the process of importing a folder with YOLO annotations into Label Studio for further annotation. We'll cover setting up your environment, converting YOLO annotations to Label Studio's format, and importing them into your project.

Prerequisites

  • Label Studio installed locally
  • YOLO annotated images and corresponding .txt label files in the directory /yolo/datasets/one.
  • label-studio-converter installed (available via pip install label-studio-converter)

Step 1: Set Up Your Environment and Run Label Studio

Before starting Label Studio, set the following environment variables to enable Local Storage file serving:

Unix systems:

export LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
export LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/yolo/datasets
label-studio

Windows:

set LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
set LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=C:\yolo\datasets
label-studio

Replace /yolo/datasets with the actual path to your YOLO datasets directory.

Step 2: Set Up Local Storage

  1. Create a new project.
  2. Go to the project settings and select Cloud Storage.
  3. Click Add Source Storage and select Local files from the Storage Type options.
  4. Set the Absolute local path to /yolo/datasets/one/images or c:\yolo\datasets\one\images on Windows.
  5. Click Add storage.

Check more details about Local Storages in the documentation.

Step 3: Verify Image Access

Before importing the converted annotations from YOLO, verify that you can access an image from your Local storage via Label Studio. Open a new browser tab and enter the following URL:

http://localhost:8080/data/local-files/?d=one/images/<your_image>.jpg

Replace one/images/<your_image>.jpg with the path to one of your images. The image should display in the new tab of the browser. If you can't open an image, the Local Storage configuration is incorrect. The most likely reason is that you made a mistake when specifying your Path in Local Storage settings or in LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT.

Note: The URL path after ?d= should be relative to LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/yolo/datasets. This means the real path will be /yolo/datasets/one/images/<your_image>.jpg, and this image must exist on your hard drive.

Step 4: Convert YOLO Annotations

Use the label-studio-converter to convert your YOLO annotations to a format that Label Studio can understand:

label-studio-converter import yolo -i /yolo/datasets/one -o output.json --image-root-url "/data/local-files/?d=one/images"

Step 5: Import Converted Annotations

Now import the output.json file into Label Studio:

  1. Go to your Label Studio project.
  2. From the Data Manager, click Import.
  3. Select the output.json file and import it.

Step 6: Verify Annotations

After importing, you should see your images with the pre-annotated bounding boxes in Label Studio. Verify that the annotations are correct and make any necessary adjustments.

Troubleshooting

If you encounter issues with paths or image access, ensure that:

  • The LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT is set correctly.
  • The --image-root-url in the conversion command matches the relative path:
`Absolute local path from Local Storage Settings` - `LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT` = `path for --image-root-url`

e.g.:

/yolo/datasets/one/images - /yolo/datasets/ = one/images
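
The subtraction above can be checked with pathlib. This is a small sketch using the POSIX paths from this tutorial; the function name image_root_url is illustrative:

```python
from pathlib import PurePosixPath

def image_root_url(absolute_local_path, document_root):
    """Build the --image-root-url value from the two configured paths.

    Raises ValueError if the storage path is not inside the document root,
    which is the misconfiguration described above.
    """
    relative = PurePosixPath(absolute_local_path).relative_to(document_root)
    return f"/data/local-files/?d={relative}"

print(image_root_url("/yolo/datasets/one/images", "/yolo/datasets/"))
```

If the call raises ValueError, the Absolute local path is not located under LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT and one of the two settings needs to change.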

Contributing

We would love your help creating converters for other models. Please feel free to open pull requests.

License

This software is licensed under the Apache 2.0 LICENSE © Heartex. 2020

label-studio-converter's People

Contributors

alebmutt, bramschermer, cdpath, changrq, dependabot[bot], farioas, fcakyon, ferenc-hechler, hakan458, hlomzik, jangsiye, jombooth, konstantinkorotaev, loveychen, makseq, maliubiao, mekcyed, nikitabelonogov, niklub, p0p4k, posionus, r-dh, rasmusedvardsen, rgaiacs, robot-ci-heartex, smoreface, triklozoid, twsl, vkhizanov, xumix

label-studio-converter's Issues

web/cli interface not found

Hi all,
I was trying to use Label Studio to create a COCO-like dataset, but in the web interface I can't see a COCO option in the select box on the export page (only json, json_mini, csv, and tsv are available).
I also can't find the cli.py to run.
It looks like only the label_studio_converter files are installed.
Thanks 😄

'Annotation' directory created twice

Thank you for this great project!

I'm reporting a small issue I ran into when exporting labeled files in VOC format.
The Annotations folder is created twice.

When I run the command:

python cli.py --input /path/to/completions/ --config /path/to/config.xml --output /path/to/export/export1/Annotations --format VOC --image-dir /path/to/export/export1/images

This is the structure I get inside export1/:

/Annotations
    /Annotations
       |
        --- file1.xml
       |
        --- file2.xml
/images
       |
        --- file1.png
       |
        --- file2.png

For now, I flatten the duplicate folder manually.

How to convert Relation annotation output to Spacy Binary format for relational model training?

Hi,
Please help me convert the Label Studio output, produced after annotating documents for a relation model, into spaCy's binary format. spaCy expects "token_start" and "token_end" keys alongside "start" and "end", but Label Studio outputs only the "start" and "end" keys. Please help me either get "token_start" and "token_end" from the annotated dataset or convert this Label Studio output to a spaCy binary file for training a relation model.
The following is a snippet of the output from Label Studio:

[
  {
    "id": 6,
    "annotations": [
      {
        "id": 3,
        "completed_by": {
          "id": 2,
          "email": "[email protected]",
          "first_name": "",
          "last_name": ""
        },
        "result": [
          {
            "value": {
              "start": 9,
              "end": 63,
              "text": "Synergy One Lending, Inc. dba Mutual of Omaha Mortgage",
              "labels": [
                "PARTY NAME"
              ]
            },
            "id": "fkVqYYR_P7",
            "from_name": "label",
            "to_name": "text",
            "type": "labels"
          },
          {
            "value": {
              "start": 0,
              "end": 8,
              "text": "BORROWER",
              "labels": [
                "PARTY ROLE"
              ]
            },
            "id": "62m6jvJopr",
            "from_name": "label",
            "to_name": "text",
            "type": "labels"
          },
          {
            "value": {
              "start": 64,
              "end": 102,
              "text": "5716 Corsa Avenuew, Sulle 102 Westlake",
              "labels": [
                "PARTY ADDRESS"
              ]
            },
            "id": "ZMBV98QR7N",
            "from_name": "label",
            "to_name": "text",
            "type": "labels"
          },
          {
            "from_id": "62m6jvJopr",
            "to_id": "fkVqYYR_P7",
            "type": "relation",
            "direction": "right",
            "labels": [
              "ROLE"
            ]
          },
          {
            "from_id": "ZMBV98QR7N",
            "to_id": "fkVqYYR_P7",
            "type": "relation",
            "direction": "right",
            "labels": [
              "ADDRESS"
            ]
          }
        ],
        "was_cancelled": false,
        "ground_truth": false,
        "created_at": "2021-09-04T08:14:02.157201Z",
        "updated_at": "2021-09-04T08:14:02.157201Z",
        "lead_time": 80162.164,
        "prediction": {},
        "result_count": 0,
        "task": 6
      }
    ],
    "predictions": [],
    "file_upload": "New_Text_Document_2_E6Don9E.txt",
    "data": {
      "text": "BORROWER Synergy One Lending, Inc. dba Mutual of Omaha Mortgage 5716 Corsa Avenuew, Sulle 102 Westlake village."
    },
    "meta": {},
    "created_at": "2021-09-03T09:54:27.365638Z",
    "updated_at": "2021-09-03T09:54:27.365638Z",
    "project": 6
  }
]
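
One way to derive token indices from the character offsets in such an export is to tokenize the text and keep the tokens that fall inside each span. The sketch below assumes plain whitespace tokenization and inclusive token indices; char_span_to_token_span is a hypothetical helper. spaCy's own tokenizer splits punctuation differently, so for real training data the indices should come from the spaCy Doc (for example via Doc.char_span) rather than from this approximation:

```python
import re

def char_span_to_token_span(text, start, end):
    """Return (token_start, token_end), inclusive, for tokens inside [start, end).

    Tokens are whitespace-delimited. Returns None if no token fits.
    """
    inside = [
        index
        for index, match in enumerate(re.finditer(r"\S+", text))
        if match.start() >= start and match.end() <= end
    ]
    return (inside[0], inside[-1]) if inside else None

# First part of the "text" field from the export above.
text = "BORROWER Synergy One Lending, Inc. dba Mutual of Omaha Mortgage"
print(char_span_to_token_span(text, 0, 8))    # PARTY ROLE span
print(char_span_to_token_span(text, 9, 63))   # PARTY NAME span
```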

Error converting to COCO using any of the `config.xml` configuration

FYI, I'm using this for a specific project, and I need to convert to COCO JSON format.

I'm using this config.xml configuration obtained from here:
https://github.com/heartexlabs/label-studio/blob/fe31f6d300564db4fe8afb3cfeb01d0f4994b698/label_studio/annotation_templates/computer-vision/object-detection-with-bounding-boxes/config.yml#L8-L14

To reproduce:

json_path = "label_studio_segmentation.json"  # with the JSON format exported from Label Studio
converter = Converter(config="config.xml", project_dir=None)
converter.convert_to_coco(
    json_path,
    output_dir="tmp",
    is_dir=False
)

Error:

Traceback (most recent call last):
  File "c:/Users/user/Desktop/ANSON/Python Scripts/test_labelstudio_converter.py", line 37, in <module>
    converter.convert_to_coco(
  File "c:\Users\user\Desktop\ANSON\Python Scripts\label_studio_converter_ORI\label_studio_converter\converter.py", line 514, in convert_to_coco
    categories, category_name_to_id = self._get_labels()
  File "c:\Users\user\Desktop\ANSON\Python Scripts\label_studio_converter_ORI\label_studio_converter\converter.py", line 917, in _get_labels
    labels |= set(info["labels"])
KeyError: 'labels'

Feedback:
The issues seem to arise from the config passed into the Converter class, and also the parse_config function shown below?
https://github.com/heartexlabs/label-studio-converter/blob/69141b057776546b39625129dda2fa42cd156575/label_studio_converter/converter.py#L136-L144
I'm not sure... I'm wondering if I should just change them entirely to make it work... Please let me know if there is any solution.

CSV output should use simple field values for labels (not convoluted JSON)

Unfortunately, the CSV output exported by Label Studio uses JSON for the label.
See dog-example-project-35-at-2021-10-07-18-48-22cb3c67.csv. This makes it hard to review the data in spreadsheets.

Instead, the label should be extracted as a simple string value, as with the other converters (e.g., CONLL). In addition, each annotation should be on a separate line. For example, 15 distinct annotations are packed into a single line in the above example!

For the expected output see the attached desired-dog-example-project-35-at-2021-10-07-18-49-22cb3c67.csv.

Note that this is not a feature request: I was baffled when I found out about this behavior. For example, why bother having a CSV format if the important part must be processed with a JSON utility?!

yolo preannotated dataset

Hey, we are moving our operation from CVAT to Label Studio. We have a bunch of pre-annotated images in YOLO format, with one .txt file per image. Do you know a way I can batch-convert these to Label Studio's JSON format?

Cannot export data in YOLO format

Hi,
When I try to export my annotated data in the YOLO format, the server crashes after a few seconds.

Log:
-24 13:26:36,150] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #500

-24 13:26:36,152] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #501

-24 13:26:36,152] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #502

-24 13:26:36,152] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #503

-24 13:26:36,152] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #504

-24 13:26:36,152] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #505

-24 13:26:36,152] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #506

-24 13:26:36,152] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #507

-24 13:26:36,152] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #508

-24 13:26:36,153] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #509

-24 13:26:36,153] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #510

-24 13:26:36,153] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #511

-24 13:26:36,153] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #512

-24 13:26:36,153] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #513

-24 13:26:36,153] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #514

-24 13:26:41,123] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #1309

-24 13:26:41,123] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #1309

-24 13:26:42,086] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #1836

-24 13:26:42,086] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #1837

-24 13:26:42,089] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #1841

-24 13:26:42,089] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #1842

-24 13:26:42,089] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #1843

-24 13:26:42,093] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #1850

-24 13:26:42,139] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #1888

-24 13:26:42,140] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #1889

-24 13:26:42,140] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #1890

-24 13:26:42,140] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #1891

-24 13:26:42,140] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #1892

-24 13:26:42,161] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #1915

-24 13:26:42,161] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #1916

-24 13:26:42,181] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #1936

-24 13:26:42,181] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #1937

-24 13:26:42,181] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #1938

-24 13:26:42,181] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #1939

-24 13:26:42,181] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #1940

-24 13:26:42,181] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #1941

-24 13:26:42,470] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #2155

-24 13:26:42,470] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #2156

-24 13:26:43,053] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #2509

-24 13:26:43,054] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #2510

-24 13:26:43,054] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #2511

-24 13:26:43,054] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #2512

-24 13:26:43,055] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #2513

-24 13:26:43,055] [label_studio_converter.converter::convert_to_yolo::557] [WARNING] No completions found for item #2514

Thank you!

YOLO format import does not import image

Hello,

I recently tried to import YOLO format into label studio

I always get this error and the image does not display in the tasks tab. Just the annotations.

There was an issue loading URL from $image value

Things to look out for:

    URL is valid
    URL scheme matches the service scheme, i.e. https and https
    The static server has wide-open CORS, [more on that here](https://labelstud.io/guide/storage.html#Troubleshoot-CORS-and-access-problems)

Technical description:
URL: [/data/local-files/?d=/0fdb2623-25997_5.jpg](http://localhost:8081/data/local-files/?d=/0fdb2623-25997_5.jpg)

Span shift in the exported data

The downloaded annotated files have a span shift: the start and end offsets of a label don't match the correct word in the document.

Allow multiple completions export for CONLL2003

Hey, I made a PR for allowing multiple completions in an export when exporting CONLL2003 format.
It takes the latest added unskipped completion, or None if none exist.
Of course I don't know if this is the desired behavior, so let me know what you think.

#7

Export to COCO error

Version 0.0.31rc1 worked well, but the following error occurs when I use "convert_to_coco" in later versions.
<no 'labels'>

In commit 1ea3768,
"labels |= set(info['labels'])" was added to "_get_labels()". I guess that's the cause.

Should I use tag in labelstudio config to use convert_to_coco???

Label Studio Polygons Unsupported

I'd like to convert a list of JSONs I generated with the polygon tool in label studio to COCO format but it doesn't look like the code handles this use case.

Export impossible COCO/VOC/YOLO format

I'm unable to export my annotations in COCO/VOC/YOLO format in a newly created project (for object detection, ~300 images, ~300 annotations, with only 2 classes in rectangle labels).

Here is my stacktrace for Yolo export ( IndexError also in VOC/COCO ):

Traceback (most recent call last):
  File "/home/Theo.Henaff/miniconda3/envs/yolo_v5_env/lib/python3.8/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/home/Theo.Henaff/miniconda3/envs/yolo_v5_env/lib/python3.8/site-packages/django/utils/decorators.py", line 43, in _wrapper
    return bound_method(*args, **kwargs)
  File "/home/Theo.Henaff/miniconda3/envs/yolo_v5_env/lib/python3.8/site-packages/label_studio/data_export/api.py", line 103, in get
    export_stream, content_type, filename = DataExport.generate_export_file(project, tasks, export_type, request.GET)
  File "/home/Theo.Henaff/miniconda3/envs/yolo_v5_env/lib/python3.8/site-packages/label_studio/data_export/models.py", line 83, in generate_export_file
    converter.convert(input_json, tmp_dir, output_format, is_dir=False)
  File "/home/Theo.Henaff/miniconda3/envs/yolo_v5_env/lib/python3.8/site-packages/label_studio_converter/converter.py", line 172, in convert
    self.convert_to_yolo(input_data, output_data, output_image_dir=image_dir,
  File "/home/Theo.Henaff/miniconda3/envs/yolo_v5_env/lib/python3.8/site-packages/label_studio_converter/converter.py", line 565, in convert_to_yolo
    category_name = label['rectanglelabels'][0]
IndexError: list index out of range

My labelling template is based on previous working projects (where exports were fine) :

<View style="display:flex;align-items:start;gap:8px;flex-direction:column-reverse">
  <Image name="image" value="$image" zoom="true" rotateControl="true" zoomControl="true"/>
  <Filter toName="label" minlength="0" name="filter"/>
  <RectangleLabels name="label" toName="image" showInline="true">
    <Label value="Iban" background="#995AE6"/>
    <Label value="TitulaireCompte" background="#43E6CD"/>
  </RectangleLabels>
</View>

Working with label-studio 1.1, python 3.8 in conda env, ubuntu 20.04

I have annotated the same images in another project (with different labels) and it was fine. I have also done much bigger projects, and they exported without errors.
The problem seems to come from my side, but I have no clue how it arose or how to resolve it.

YOLO format disabled

When exporting in Label Studio, the YOLO option in the export format popup is disabled.

still WIP?

Interpretation of RLE values

Hello!

May I ask what the RLE values stored for brush segmentations stand for? I am familiar with this encoding method from COCO, but I was used to them being understood as pixel counts. For instance, an RLE vector such as [234, 54, 103, 3,...] would mean that the first 234 pixels (counting from the top of the image, from left to right) are not masked, the following 54 are masked, the next 103 not masked, then 3 masked again and so on. But apparently, this is not the meaning that Label Studio is using, right? Is it something more like [start_pixel_1, num_masked_pixels_1, start_pixel_2, num_masked_pixels_2...]?

Thank you!
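
For reference, the pixel-count reading described in the question (alternating runs of unmasked and masked pixels) can be sketched as below. This only illustrates that interpretation on a flat pixel sequence; it makes no claim about how Label Studio encodes brush masks, and decode_runs is a hypothetical helper:

```python
def decode_runs(counts, first_value=0):
    """Expand alternating run lengths into a flat list of 0/1 pixel values.

    The first run has value `first_value` (0 = not masked), and each
    following run flips the value.
    """
    mask = []
    value = first_value
    for count in counts:
        mask.extend([value] * count)
        value = 1 - value
    return mask

mask = decode_runs([2, 3, 1])
print(mask)
```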

JSONL export format

Hey guys,
first of all, thanks a lot for the amazing job. It is simply the best annotation tool off the shelf.

I was wondering whether JSONL is a format that can be taken into consideration at some point?
I find it very handy when working with huge datasets: it allows lazily loading the JSON entries one by one, so the entire JSON object does not need to be held in memory.
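
The lazy-loading benefit can be illustrated with the standard json module alone. A minimal sketch, assuming one JSON object per line; write_jsonl and iter_jsonl are illustrative names, not converter functions:

```python
import io
import json

def write_jsonl(records, stream):
    """Write each record as one JSON document per line."""
    for record in records:
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")

def iter_jsonl(stream):
    """Yield records one at a time, so only one line is parsed at once."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

# An in-memory buffer stands in for a file opened with open(path, encoding="utf-8").
buffer = io.StringIO()
write_jsonl([{"id": 1}, {"id": 2}], buffer)
buffer.seek(0)
ids = [task["id"] for task in iter_jsonl(buffer)]
print(ids)
```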

show unicode more user-friendly in exported csv files

https://github.com/heartexlabs/label-studio-converter/blob/a4498f8b352196dd91035bdbc75de5848fa1c528/label_studio_converter/converter.py?_pjax=%23js-repo-pjax-container%2C%20div%5Bitemtype%3D%22http%3A%2F%2Fschema.org%2FSoftwareSourceCode%22%5D%20main%2C%20%5Bdata-pjax-container%5D#L384

When exporting a project to a CSV file, multiple-choice values are written as Unicode escapes (like \uxxxx), which is not user-friendly.

I'm wondering whether we could add the ensure_ascii=False option to json.dumps(), so that the final code looks like:

for name, value in item['output'].items():
        pretty_value = self._prettify(value)
        record[name] = pretty_value if isinstance(pretty_value, str) else json.dumps(pretty_value, ensure_ascii=False)
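
The effect of the proposed option can be seen in isolation with the standard json module. The sample values below are made up for illustration:

```python
import json

choices = ["好评", "Très bien"]  # hypothetical multiple-choice values

escaped = json.dumps(choices)                       # default: non-ASCII becomes \uXXXX
readable = json.dumps(choices, ensure_ascii=False)  # keeps the original characters

print(escaped)
print(readable)
```

Both strings decode to the same list with json.loads, so the option changes only how the value looks in the exported file. The file itself then has to be written with a Unicode-capable encoding such as UTF-8.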

Incorrect tokenization in CoNLL 2003 export

I'm evaluating a number of different annotation interfaces and am currently looking at label-studio.

I noticed when I exported as json and colnll2003 I got quite different results for the same string. The reason seems to be the tokenization. I'm trying to identify named entities from IoT device identification strings - the string usually have delimiters like - or . in place of space (or none at all). i.e.

AHU-G1-V-1-1-Ctrl-Md

I tagged AHU-G1-V-1-1 and Ctrl-Md. When exported as JSON, I get the correct tokens.

When I export as CoNLL 2003, it seems to apply some form of word tokenization, so it creates a single token "AHU-G1-V-1-1-Ctrl-Md -X- _ B-Equip"

I'm not sure if this is a label-studio issue, but Prodigy, for example, applies tokenization before annotation to ensure that the start and end of the selection align with a token.
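
One way to get tokens that respect such delimiters is to tokenize on them explicitly and derive the BIO tags from character offsets. The sketch below is not the converter's implementation. The span dictionary layout, the helper names, and the second label "Point" are assumptions made for illustration.

```python
import re


def tokens_with_offsets(text, pattern=r"[^\-\.\s]+"):
    """Split on '-', '.' and whitespace, keeping character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(pattern, text)]


def bio_tags(text, spans):
    """Tag each token by the annotated span that fully contains it."""
    rows = []
    for token, start, end in tokens_with_offsets(text):
        tag = "O"
        for span in spans:
            if start >= span["start"] and end <= span["end"]:
                prefix = "B" if start == span["start"] else "I"
                tag = f"{prefix}-{span['label']}"
                break
        rows.append((token, tag))
    return rows


text = "AHU-G1-V-1-1-Ctrl-Md"
spans = [
    {"start": 0, "end": 12, "label": "Equip"},   # AHU-G1-V-1-1
    {"start": 13, "end": 20, "label": "Point"},  # Ctrl-Md (hypothetical label)
]
rows = bio_tags(text, spans)
```

The delimiters themselves are dropped here. A variant that keeps them as separate tokens would only need a different regular expression.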

make create_tokens_and_tags robust against missing annotation fields

Hi, I ran into a problem with the CONLL converter failing due to a missing label in the annotations.

Although this is a problem with the label studio proper, it would be good for the converter to be more robust. For example, any access to the hash should use h.get('k', default) rather than h['k']. The default should allow the code to produce a reasonable approximation, as follows:

span['labels'][0]
=>
span.get('labels', ["_missing_"])[0]

Alternatively, having better exception handling would be good, provided it can pinpoint the source of the error.

Here's a simple illustration:

create_tokens_and_tags("My dog has ugly fleas",
                       [{'start': 3,
                         'end': 6,
                         'text': 'dog',
                         'type': 'Labels'}])
=>
KeyError: 'labels'

The attached file _bad-annotation.json.txt contains an actual export with the issue.

See extract_tokens_and_tags in the attached file (misc_converter.py.txt), which is in the context of an XML-extension I am working on.

As mentioned in issue #15, this needs work before it is ready for a pull request.
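
The defensive access suggested above can be wrapped in a small helper. This is a sketch only; `first_label` is a hypothetical name, and the `_missing_` placeholder is taken from the suggestion in this issue.

```python
def first_label(span, default="_missing_"):
    """Return the first label of a span, or a placeholder if none is present."""
    # `or` also covers the case where 'labels' exists but is an empty list.
    labels = span.get("labels") or [default]
    return labels[0]


span_without_labels = {"start": 3, "end": 6, "text": "dog", "type": "Labels"}
span_with_labels = {"start": 3, "end": 6, "text": "dog", "labels": ["Animal"]}

missing = first_label(span_without_labels)
present = first_label(span_with_labels)
```

A placeholder keeps the export running, at the cost of writing a synthetic tag into the output, so logging the affected task alongside it helps to pinpoint the source of the problem.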

Can't Export VOC Image Segmentation

Hi,
I'm getting the following error "FileNotFoundError: [Errno 2] No such file or directory" when I try to export the labelled images using the VOC format.

function brush.decode_from_annotation returns empty dict

The function

brush.decode_from_annotation

returns empty dict when called using the annotations results.
test_annot_result_json.txt

how to reproduce:

import json
from label_studio_converter import brush
with open("./test_annot_result_json.txt", 'r') as f:
    annot = f.read()
result = eval(annot)
brush.decode_from_annotation('annot', result)

The result is:
{}

But it should be:

RLE params: 605208 values 8 word_size [3, 4, 8, 16] rle_sizes
RLE params: 605208 values 8 word_size [3, 4, 8, 16] rle_sizes
RLE params: 605208 values 8 word_size [3, 4, 8, 16] rle_sizes

{'annot-severity_1-0': array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=uint8),
'annot-severity_0.66-0': array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=uint8),
'annot-severity_0.33-0': array([[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 0, 0, ..., 0, 0, 0],
...,
[255, 255, 255, ..., 0, 0, 0],
[255, 255, 255, ..., 0, 0, 0],
[255, 255, 255, ..., 0, 0, 0]], dtype=uint8)}

Documents delimiter for CoNLL 2003 NER

Annotated data exported in CoNLL 2003 NER format cannot be imported into spaCy.
spaCy expects documents to be separated by a -DOCSTART- -X- O O line and sentences to be separated by whitespace, as described in its documentation for converting CoNLL-2003 NER format to JSON.
https://spacy.io/api/cli#convert

Should this be handled in the converter? If yes, I can push a PR to fix it.
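
A rough sketch of what such output could look like. It assumes documents are already available as lists of sentences, each a list of (token, tag) pairs, and that sentences are separated by a blank line. The token line layout follows the "-X- _" form seen in other exports quoted on this page, and `to_conll_with_docstart` is a hypothetical helper, not a converter function.

```python
DOCSTART = "-DOCSTART- -X- O O"


def to_conll_with_docstart(documents):
    """Serialize documents, each starting with a -DOCSTART- line."""
    lines = []
    for sentences in documents:
        lines.append(DOCSTART)
        lines.append("")  # blank line after the document marker
        for sentence in sentences:
            for token, tag in sentence:
                lines.append(f"{token} -X- _ {tag}")
            lines.append("")  # blank line ends the sentence
    return "\n".join(lines)


documents = [
    [[("Dallas", "B-Team"), ("wins", "O")]],
    [[("Carolina", "B-Team")]],
]
output = to_conll_with_docstart(documents)
```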

add export roLabelImg format

Since Label Studio already supports labeling ROTATED rectangle regions, could it also support exporting to the roLabelImg format?

<annotation verified="yes">
  <folder>hsrc</folder>
  <filename>100000001</filename>
  <path>/Users/haoyou/Library/Mobile Documents/com~apple~CloudDocs/OneDrive/hsrc/100000001.bmp</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>1166</width>
    <height>753</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <type>bndbox</type>
    <name>ship</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>178</xmin>
      <ymin>246</ymin>
      <xmax>974</xmax>
      <ymax>504</ymax>
    </bndbox>
  </object>
  <object>
    <type>robndbox</type>
    <name>ship</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <robndbox>
      <cx>580.7887</cx>
      <cy>343.2913</cy>
      <w>775.0449</w>
      <h>170.2159</h>
      <angle>2.889813</angle>
    </robndbox>
  </object>
</annotation>

CONLL2003 conversion where the span is wider than the entity

Label Studio allows a span to be either within a token, or to include whitespace around a token.

For example if the text is "The film (1988) stars Frankie James as the protagonist" then one span could be just "1988" and another could be " Frankie James".

The current exporting functionality does not support these situations well in my use case.

I'm not clear what the official behaviour should be in these cases, but in order to support my use case I've implemented the code here: alanbuxton@153044b

If you think this could be helpful to the label-studio-converter community then I'm happy to submit a PR.
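
For the whitespace case, one simple approach is to shrink a span to its non-whitespace core before aligning it with tokens. The sketch below is not the code from the linked commit; `trim_span` is a hypothetical helper shown only to make the situation concrete.

```python
def trim_span(text, start, end):
    """Shrink [start, end) so it neither starts nor ends on whitespace."""
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    return start, end


text = "The film (1988) stars Frankie James as the protagonist"
raw_start = text.index(" Frankie James")
raw_end = raw_start + len(" Frankie James")
start, end = trim_span(text, raw_start, raw_end)
trimmed = text[start:end]
```

Spans that fall inside a token, such as "1988" inside "(1988)", need a separate decision: either split the token or widen the span to the token boundaries.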

Pull request: In the readme please fix the convert_to_csv

Hi

The convert_to_csv example in main readme should be
c.convert_to_csv('examples/sentiment_analysis/completions/', 'tmp', sep='\t', header=True)

Basically, the output destination is a directory (e.g. 'tmp'), not a file,

instead of the current
c.convert_to_csv('examples/sentiment_analysis/completions/', 'tmp/output.tsv', sep='\t', header=True)

How to fix the order of id and name in YOLO format

When exporting the same object detection tags in YOLO format, the tag order differs between two datasets:

{
  "categories": [
    {
      "id": 0,
      "name": "list_describe"
    },
    {
      "id": 1,
      "name": "discount"
    },
    {
      "id": 2,
      "name": "member"
    },
    {
      "id": 3,
      "name": "card_price"
    },
    {
      "id": 4,
      "name": "card_name"
    },
    {
      "id": 5,
      "name": "card_img"
    },
    {
      "id": 6,
      "name": "card"
    }
  ],
  "info": {
    "year": 2022,
    "version": "1.0",
    "contributor": "Label Studio"
  }
}
{
  "categories": [
    {
      "id": 0,
      "name": "card_img"
    },
    {
      "id": 1,
      "name": "card_name"
    },
    {
      "id": 2,
      "name": "card_price"
    },
    {
      "id": 3,
      "name": "card"
    },
    {
      "id": 4,
      "name": "member"
    },
    {
      "id": 5,
      "name": "discount"
    },
    {
      "id": 6,
      "name": "list_describe"
    }
  ],
  "info": {
    "year": 2022,
    "version": "1.0",
    "contributor": "Label Studio"
  }
}

What is the logic behind the name order, and how can I keep the order the same across two projects?
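
As a workaround on the consumer side, ids can be reassigned from a sorted list of names so that two exports with the same label set agree. This sketch does not describe the converter's own ordering logic. `build_categories` is a hypothetical helper, and any annotation files that refer to the old ids would have to be remapped accordingly.

```python
def build_categories(label_names):
    """Assign ids in alphabetical order so the mapping is reproducible."""
    return [{"id": i, "name": name} for i, name in enumerate(sorted(set(label_names)))]


first_project = ["list_describe", "discount", "member", "card_price",
                 "card_name", "card_img", "card"]
second_project = ["card_img", "card_name", "card_price", "card",
                  "member", "discount", "list_describe"]

categories_a = build_categories(first_project)
categories_b = build_categories(second_project)
```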

mask -> rle problem

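I wrote these lines for mask -> RLE, but the resulting RLE is very long, so the JSON file is big. What is the correct way to convert a mask to RLE?

import cv2
import numpy as np
# assumption: encode_rle is the function from label_studio_converter.brush
from label_studio_converter.brush import encode_rle

seg = cv2.imread('1.png')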
h, w = seg.shape[0], seg.shape[1]
label_img = cv2.cvtColor(seg, cv2.COLOR_BGR2GRAY)
contours, hierarchy = cv2.findContours(label_img, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
mask_im = np.zeros((h, w, 4))
mask_contours = cv2.drawContours(mask_im, contours, -1, color=(0, 255, 0, 100), thickness=-1)
rle = encode_rle(mask_contours.ravel().astype(int))
print(len(rle))

JSON to CSV converter ignores multiple annotations of each task (LS 1.0)

This relates to a point discussed with @niklub in the LS Slack workspace.

If I have a JSON file with multiple annotations of a single task, running the CSV conversion command (e.g. python backend/converter/cli.py --input examples/sentiment_analysis/completions/ --config examples/sentiment_analysis/config.xml --output output_dir --format CSV --csv-separator $'\t') outputs a CSV file with just one line: the most recent annotations. Earlier annotations are not reflected in the CSV.

I think what would make most sense is for each row in the CSV to be its own task-annotation datapoint(s). So if there were 2 tasks with 2 annotations each, the CSV would have the following rows:

  1. header
  2. task 1 user/annotation 1
  3. task 1 user/annotation 2
  4. task 2 user/annotation 1
  5. task 2 user/annotation 2
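
The proposed layout can be sketched with the standard `csv` module. The task structure below is a deliberately simplified stand-in for a real export (each annotation carries a single `sentiment` value), and `annotation_rows` is a hypothetical helper.

```python
import csv
import io


def annotation_rows(tasks):
    """Yield one row per (task, annotation) pair."""
    for task in tasks:
        for annotation in task.get("annotations", []):
            yield {
                "task_id": task["id"],
                "annotation_id": annotation["id"],
                "sentiment": annotation["sentiment"],
            }


tasks = [
    {"id": 1, "annotations": [{"id": 11, "sentiment": "Positive"},
                              {"id": 12, "sentiment": "Neutral"}]},
    {"id": 2, "annotations": [{"id": 21, "sentiment": "Negative"},
                              {"id": 22, "sentiment": "Negative"}]},
]

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["task_id", "annotation_id", "sentiment"], delimiter="\t"
)
writer.writeheader()
writer.writerows(annotation_rows(tasks))
csv_text = buffer.getvalue()
```

With two tasks and two annotations each, this yields the header plus four rows, matching the list above.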

YOLO Export: download of imported and uploaded images does not work together with LABEL_STUDIO_HOST

We have installed Label Studio in a Kubernetes cluster, and the LABEL_STUDIO_HOST environment variable is used to configure path-based routing "https://[host]/[path-prefix]", e.g. "https://myhost.com/ls-path".

This means, in the Annotation-JSON-Export the image url is "https://myhost.com/ls-path/data/local-files/?d=testimgs/image1.png" for synchronized images from the local filesystem:

	{
		"id": 1,
		"annotations": [
			...
		],
		...
		"data": {
			"image": "https:\/\/myhost.com\/ls-path\/data\/local-files\/?d=testimgs\/image1.png"
		},
		...
		"project": 1,
		...
	},
	{
		"id": 2,
		...
		"data": {
			"image": "\/ls-path\/\/data\/upload\/1\/6a9a9ea2-image2.png"
		},
	}

For uploaded images the URL is set up like this: "/ls-path//data/upload/1/6a9a9ea2-image2.png"
I think the double "//" between "ls-path" and "data" is an error located somewhere in the upload.
But even without the double "//" the correct location of the uploaded image cannot be detected.

The problem we have is that the prefix added by LABEL_STUDIO_HOST is not recognized.
So, instead of copying the files locally, the YOLO export downloads the synchronized images from the given URLs.
For the local files this would be fine, but unfortunately the download from the public hostname fails due to authentication issues.
Maybe this is specific to our settings.

But the uploaded images, which should be copied from the project dir, fail completely, because the export tries to download the image path as a URL.

When exporting in YOLO format the images are "downloaded" with the method "utils.download()":
https://github.com/heartexlabs/label-studio-converter/blob/598420012b5cb6e9cd5283e62887ff33af36d0bb/label_studio_converter/utils.py#L103

The code has a special handling for local and uploaded files:

    is_local_file = url.startswith('/data/') and '?d=' in url
    is_uploaded_file = url.startswith('/data/upload')

So, for our image1.png the variable "is_local_file" should be true, and for our image2.png the variable "is_uploaded_file" should be true.
In both cases the variables are false, because the assumption that the URL starts with "/data/" does not hold.

For the is_local_file check, the prefix should be f"{LABEL_STUDIO_HOST}/data/", and for the is_uploaded_file check it should be f"{PATH_PREFIX}/data/", where PATH_PREFIX is only the path part of LABEL_STUDIO_HOST, here "/ls-path".

I have no deeper understanding of how the connection between the label-studio configuration and label-studio-converter works, but the following code shows a working example of how this problem could be worked around.

I don't think that this is the correct solution, but it should point out what is missing.

So, as a utility function we copied the sources from the label-studio settings:
https://github.com/heartexlabs/label-studio/blob/e1111d4708e06e0f5885f397d1b904f146c6aa4c/label_studio/core/settings/base.py#L28

import logging
import os
import re

logger = logging.getLogger(__name__)

def getEnvParams():
    FORCE_SCRIPT_NAME = None
    # Hostname is used for proper path generation to the resources, pages, etc
    # HOSTNAME = get_env('HOST', '')   # get_env() adds the prefix "LABEL_STUDIO_" or "HEARTEX_"
    HOSTNAME = os.environ.get('LABEL_STUDIO_HOST', '')
    
    if HOSTNAME:
        if not HOSTNAME.startswith('http://') and not HOSTNAME.startswith('https://'):
            logger.info(
                "! HOST variable found in environment, but it must start with http:// or https://, ignore it: %s", HOSTNAME
            )
            HOSTNAME = ''
        else:
            logger.info("=> Hostname correctly is set to: %s", HOSTNAME)
            if HOSTNAME.endswith('/'):
                HOSTNAME = HOSTNAME[0:-1]
    
            # for django url resolver
            if HOSTNAME:
                # http[s]://domain.com:8080/script_name => /script_name
                pattern = re.compile(r'^http[s]?:\/\/([^:\/\s]+(:\d*)?)(.*)?')
                match = pattern.match(HOSTNAME)
                FORCE_SCRIPT_NAME = match.group(3)
                if FORCE_SCRIPT_NAME:
                    logger.info("=> Django URL prefix is set to: %s", FORCE_SCRIPT_NAME)
                    
    LOCAL_FILES_DOCUMENT_ROOT = os.environ.get('LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT', '')
                        
    return HOSTNAME, FORCE_SCRIPT_NAME, LOCAL_FILES_DOCUMENT_ROOT 

Now we get the required information for accessing local files:

HOSTNAME="https://myhost.com/ls-path"
FORCE_SCRIPT_NAME="/ls-path"
LOCAL_FILES_DOCUMENT_ROOT ="/path/to/local/files"

The utils.download() method can now be rewritten in the following way:

def download(url, output_dir, filename=None, project_dir=None, return_relative_path=False, upload_dir=None,
             download_resources=True):
    HOSTNAME, FORCE_SCRIPT_NAME, LOCAL_FILES_DOCUMENT_ROOT = getEnvParams()
    print("HOSTNAME=", HOSTNAME, ", FORCE_SCRIPT_NAME=", FORCE_SCRIPT_NAME, ", LOCAL_FILES_DOCUMENT_ROOT=", LOCAL_FILES_DOCUMENT_ROOT)
    
    is_local_file = url.startswith(f'{HOSTNAME}/data/') and '?d=' in url
    # special handling to fix duplicate "//" before "data"
    if url.startswith(f'{FORCE_SCRIPT_NAME}//data/upload'):
        FORCE_SCRIPT_NAME = f'{FORCE_SCRIPT_NAME}/'
    is_uploaded_file = url.startswith(f'{FORCE_SCRIPT_NAME}/data/upload')

    if is_uploaded_file:
        upload_dir = _get_upload_dir(project_dir, upload_dir)
        filename = url.replace(f'{FORCE_SCRIPT_NAME}/data/upload/', '')
        filepath = os.path.join(upload_dir, filename)
        logger.debug(f'Copy {filepath} to {output_dir}')
        if download_resources:
            shutil.copy(filepath, output_dir)
        if return_relative_path:
            return os.path.join(os.path.basename(output_dir), filename)
        return filepath

    if is_local_file:
        filename, dir_path = url.split(f'{HOSTNAME}/data/', 1)[-1].split('?d=')
        dir_path = str(urllib.parse.unquote(dir_path))
        if not os.path.exists(dir_path):
            if filename == 'local-files/':
                print(dir_path)
                filename = os.path.basename(dir_path)
                dir_path = os.path.dirname(dir_path)
                dir_path = os.path.join(LOCAL_FILES_DOCUMENT_ROOT, dir_path)
            else:
                raise FileNotFoundError(dir_path)
        filepath = os.path.join(dir_path, filename)
        if download_resources:
            shutil.copy(filepath, output_dir)
            if return_relative_path:
                return os.path.join(os.path.basename(output_dir), filename)
        else:
            if return_relative_path:
                raise NotImplementedError()    
        return filepath

    if filename is None:
        ...
    return filepath

For is_uploaded_file==true no further changes are necessary besides the changed startswith() comparison.

For is_local_file==true we had to change much more and struggled somewhat with the meaning of filename and dir_path.
I tested the changed code and it works for our examples, but I cannot understand how the original code was meant to work.

We tested this with a fresh Docker build from the label-studio develop branch.
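
The two prefix checks proposed in this issue can be condensed into a small stand-alone function. This is a sketch of the idea only, not a patch for utils.download(); `classify_url` is a hypothetical name, and the host value is passed in explicitly instead of being read from the environment.

```python
from urllib.parse import urlparse


def classify_url(url, label_studio_host=""):
    """Return (is_local_file, is_uploaded_file) for a task image URL."""
    host = label_studio_host.rstrip("/")
    # Only the path part of the host, e.g. "/ls-path"; empty if no host is set.
    path_prefix = urlparse(host).path if host else ""
    is_local_file = url.startswith(f"{host}/data/") and "?d=" in url
    is_uploaded_file = url.startswith(f"{path_prefix}/data/upload")
    return is_local_file, is_uploaded_file


host = "https://myhost.com/ls-path"
local = classify_url(
    "https://myhost.com/ls-path/data/local-files/?d=testimgs/image1.png", host
)
uploaded = classify_url("/ls-path/data/upload/1/6a9a9ea2-image2.png", host)
no_prefix = classify_url("/data/upload/1/6a9a9ea2-image2.png")
```

With an empty host the checks fall back to the plain "/data/" prefixes. The doubled "//" described above is not handled here and would still need to be normalized first.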

YOLO Labeling Format

Hello,

It would be nice to have the option to export annotations directly to the YOLO txt format for Object Detection with Bounding Boxes projects, without having to export to another format and then use an external script to do the conversion.

I would like to hear from you whether this feature can or will be implemented. Thanks!

Please update new README file?

Thanks for the conversion tool!
When I use the current source to convert my data exported from label-studio, the following error occurs.

converter/label_studio_converter/sample.py", line 3, in <module>
    c = Converter('/Users/gaohe/Downloads/config.xml')
TypeError: __init__() missing 1 required positional argument: 'project_dir'

Can you update the README file on how to use it?
What file is meant by converter parameters such as config.xml?
What path should project_dir point to?
How do I set these parameters if I am starting from Docker?
Thanks

Export taking a lot of time.

I uploaded about 50 images. After labelling, I tried to export in YOLO format, but it takes a lot of time, which makes for a very bad user experience. Is there any way to reduce the export time?

Error occurred when loading data ValueError: Expected object or value

Dear LabelStudio team,
we are labeling in the project ACPCFMST (I can send a link in a private message if needed). Today all completions were lost and the project shows several errors:

  1. on the Task page there is an error message: "Error occurred when loading data ValueError: Expected object or value"
  2. on the Settings page: "label config: expected object or value"
  3. on the Export page: "JSONDecodeError: Expecting value: line 1 column 1 (char 0)"

Is there any chance to get our labels back, or do we need to start a new project?

How can I relate the filenames in VOC export to originals

Hi,

I am trying to export my data in VOC format, but the filenames appear to be hashed.

Original file name: 922d9ea2-a2a7-4a13-8d57-138404fcca2a.png
Filename in VOC folder images: 22754d115fcc574462a81b9c0d253f0e

The XML also contains no information about the original task or original file name.

My first question: is there a way to map these back, such as a hashing key or something like that?
Is this done on purpose? If yes, why?

Wouldn't it also be more logical to just use the original file name and extension?

COCO export error

Hi,

I'm getting the following error when exporting to COCO format in Label Studio, and I believe that label-studio-converter is the reason for this error.

FileNotFoundError("Can't find upload dir: either LS_UPLOAD_DIR or project should be passed to converter")
FileNotFoundError: Can't find upload dir: either LS_UPLOAD_DIR or project should be passed to converter

Actually, I fixed it by adding upload_dir=self.upload_dir as a parameter to the download function call in convert_to_coco.

Please correct me if this issue is already opened because I couldn't find one in Issues.

Thanks!

Information lost at export for "visibleWhen" "toName" field

After I updated to v1.2.0, I noticed that the export format has changed.
This [1] is my labeling interface config.

The entity_id field only appears when a text region is selected. This gives the labeler the opportunity to input an entity id or not. The key is in the fact that this field is optional. If it were required, then this would not be an issue (follow below to see why).

Here is a sample of what the export format looked like previously:

{
    ...
    "ner": "Dallas is 7-1-2 in its past 10, and is just two points out of a playoff spot heading into Tuesday night's clash with Carolina.",
    "label": [
      { "start": 0, "end": 6, "text": "Dallas",  "labels": ["Team"] },
      { "start": 117, "end": 125, "text": "Carolina", "labels": ["Team"] }
    ],
    "entity_id": [
      { "start": 117, "end": 125, "text": ["282"] }
    ]
    ...
}

Here is a sample of what it looks like now:

{
    ...
    "ner": "Dallas is 7-1-2 in its past 10, and is just two points out of a playoff spot heading into Tuesday night's clash with Carolina.",
    "label": [
      { "start": 0, "end": 6, "text": "Dallas",  "labels": ["Team"] },
      { "start": 117, "end": 125, "text": "Carolina", "labels": ["Team"] }
    ],
    "entity_id": [ "282" ]
    ...
}

Problem

As you may notice, the only difference is in the entity_id field. At first glance, it might seem like there's no problem, it's simpler now.
However, when you start thinking about how to link the entity_id back to the label, there's no way of doing it other than using the previously available start and end fields. Now that they are no longer there, there's no way to know whether "282" refers to the first or the second label.
This makes it impossible to make use of the additional entity_id labels.
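
With the previous export format, the link could be recovered by matching offsets, as in this sketch. `link_entity_ids` is a hypothetical helper, and the data is the sample from this issue.

```python
def link_entity_ids(labels, entity_ids):
    """Attach the entity id that shares start/end offsets with each label."""
    by_offsets = {(e["start"], e["end"]): e["text"] for e in entity_ids}
    return [
        dict(label, entity_id=by_offsets.get((label["start"], label["end"])))
        for label in labels
    ]


labels = [
    {"start": 0, "end": 6, "text": "Dallas", "labels": ["Team"]},
    {"start": 117, "end": 125, "text": "Carolina", "labels": ["Team"]},
]
old_entity_ids = [{"start": 117, "end": 125, "text": ["282"]}]

linked = link_entity_ids(labels, old_entity_ids)
```

With the new format, `entity_id` is just `["282"]`, so there is no key to match on and the same join cannot be written.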

Potential solutions

  1. Revert to old export format
  2. Generate ids for each label and put the same id on the entity id

Note: The problem persists for all export formats - the data for the entity_id field is insufficient, therefore unusable.

[1]

<View style="display: flex;">
  <View style="width: 240px; padding-left: 2em; margin-right: 2em; background: #f1f1f1; border-radius: 3px">
    <Labels name="label" toName="text" choice="multiple">
      <Label value="Team" background="red"/>
      <Label value="Player" background="darkorange"/>
    </Labels>
  </View>
  <View>
    <View style="overflow-y: auto">
      <Text name="text" value="$ner" saveTextResult="yes" granularity="symbol"/>
    </View>
    <View>
      <View visibleWhen="region-selected">
        <Header value="Entity id"/>
        <TextArea name="entity_id" toName="text" perRegion="true" maxSubmissions="1"/>
      </View>
    </View>
  </View>
</View>

YOLO Export: rotated rectangle annotations are deformed for non-square images

A detailed description with screenshots can be found in the label-studio issue HumanSignal/label-studio#2293

For example the following label-studio annotation

"annotations": [
	{
		...
		"result": [
			{
				"original_width": 1000,
				"original_height": 500,
				...
				"value": {
					"x": 20,
					"y": 20,
					"width": 30,
					"height": 20,
					"rotation": 90,
					"rectanglelabels": [
						"EVS"
					]
				},
				...
			},
		...

will be exported as
<EVS> 0.1 0.35 0.2 0.3
but the correct result would be
<EVS> 0.15 0.5 0.1 0.6

The problem is the non-square aspect ratio of the original image. The rotation is done without first correcting for the aspect ratio.

The corresponding code can be found here:
https://github.com/heartexlabs/label-studio-converter/blob/598420012b5cb6e9cd5283e62887ff33af36d0bb/label_studio_converter/converter.py#L649

A possible fix would be to first correct for the aspect ratio, then do the rotation, and afterwards return to the percentage scale:

    if abs(label_r) > 0:
        r = math.pi * label_r / 180
        label_x = label_x * original_width
        label_y = label_y * original_height
        label_w = label_w * original_width
        label_h = label_h * original_height
        sin_r = math.sin(r)
        cos_r = math.cos(r)
        h_sin_r, h_cos_r = label_h * sin_r, label_h * cos_r
        x_top_right = label_x + label_w * cos_r
        y_top_right = label_y + label_w * sin_r

        x_ls = [
            label_x,
            x_top_right,
            x_top_right - h_sin_r,
            label_x - h_sin_r,
        ]
        y_ls = [
            label_y,
            y_top_right,
            y_top_right + h_cos_r,
            label_y + h_cos_r,
        ]
        label_x = max(0, min(x_ls)) / original_width
        label_y = max(0, min(y_ls)) / original_height
        label_w = min(100*original_width, max(x_ls)) / original_width - label_x
        label_h = min(100*original_height, max(y_ls)) / original_height - label_y

    x = (label_x + label_w / 2) / 100 
    y = (label_y + label_h / 2) / 100
    w = label_w / 100 
    h = label_h / 100
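
The same idea can be checked as a self-contained function that rotates in pixel space and then normalizes. This is a sketch for verifying the numbers in this issue, not the converter's code. `rotated_rect_to_yolo` is a hypothetical name, and the inputs are assumed to be percentages as in the annotation above.

```python
import math


def rotated_rect_to_yolo(x, y, w, h, rotation, original_width, original_height):
    """Convert a rotated rectangle (percent units) to normalized (cx, cy, w, h)."""
    # Work in pixels so that the rotation is not distorted by the aspect ratio.
    px, py = x / 100 * original_width, y / 100 * original_height
    pw, ph = w / 100 * original_width, h / 100 * original_height
    r = math.radians(rotation)
    sin_r, cos_r = math.sin(r), math.cos(r)
    # Corners of the rectangle rotated around its top-left corner.
    xs = [px, px + pw * cos_r, px + pw * cos_r - ph * sin_r, px - ph * sin_r]
    ys = [py, py + pw * sin_r, py + pw * sin_r + ph * cos_r, py + ph * cos_r]
    # Axis-aligned bounding box, clipped to the image.
    x_min, x_max = max(0, min(xs)), min(original_width, max(xs))
    y_min, y_max = max(0, min(ys)), min(original_height, max(ys))
    return (
        (x_min + x_max) / 2 / original_width,
        (y_min + y_max) / 2 / original_height,
        (x_max - x_min) / original_width,
        (y_max - y_min) / original_height,
    )


box = rotated_rect_to_yolo(20, 20, 30, 20, 90, 1000, 500)
```

For the 1000x500 example, `box` comes out at approximately (0.15, 0.5, 0.1, 0.6), the values given above as correct.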
