hipe-eval / hipe-2022-data
Data for the HIPE 2022 shared task.
License: Other
Encountered an issue during NEL data processing in file HIPE-2022-v2.1-newseye-dev-de.tsv, lines 32928-32931:
Haa¬ B-PER _ O _ _ B-ORG Q1405350 _ NoSpaceAfter|EndOfLine
senstein I-PER _ O _ _ I-ORG Q56322697 _ _
& O _ O _ _ I-ORG Q56322697 _ _
Vogler O _ O _ _ I-ORG Q56322697 _ _
The QID covers the correct entity of type ORG. "Haasenstein & Vogler" is considered to be a nested entity that includes an entity of type PER. By the definition of nested entities, the smaller entity should be the nested one, thus:
Haa¬ B-ORG _ O _ _ B-PER Q1405350 _ NoSpaceAfter|EndOfLine
senstein I-ORG _ O _ _ I-PER Q56322697 _ _
& I-ORG _ O _ _ O Q56322697 _ _
Vogler I-ORG _ O _ _ O Q56322697 _ _
Later on, in the same file (lines 33541-33543), the correct annotation is used:
Haasenstein B-ORG _ O _ _ B-PER Q56322697 _ _
& I-ORG _ O _ _ O Q56322697 _ _
Vogler I-ORG _ O _ _ O Q56322697 _ _
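A check like this could be automated. The following is a minimal sketch, not part of the HIPE tooling: it assumes the ten whitespace-separated fields seen in the rows above, with the coarse literal tag in the second field and the nested tag in the seventh. The helper names `iob_spans` and `inverted_nestings` are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)

# Assumed field positions, read off the ten-field rows quoted in this issue.
COARSE_LIT = 1
NESTED = 6


def iob_spans(tags: List[str]) -> List[Span]:
    """Collect (start, end_exclusive, type) for every B-/I- span in tags."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def inverted_nestings(rows: List[List[str]]) -> List[Tuple[Span, Span]]:
    """Return (coarse, nested) pairs where the nested span is the larger one."""
    coarse = iob_spans([r[COARSE_LIT] for r in rows])
    nested = iob_spans([r[NESTED] for r in rows])
    return [
        (c, n)
        for c in coarse
        for n in nested
        if n[0] <= c[0] and c[1] <= n[1] and (n[1] - n[0]) > (c[1] - c[0])
    ]


# Rows copied from lines 32928-32931 as reported above.
reported_rows = [line.split() for line in [
    "Haa¬ B-PER _ O _ _ B-ORG Q1405350 _ NoSpaceAfter|EndOfLine",
    "senstein I-PER _ O _ _ I-ORG Q56322697 _ _",
    "& O _ O _ _ I-ORG Q56322697 _ _",
    "Vogler O _ O _ _ I-ORG Q56322697 _ _",
]]

# Rows copied from lines 33541-33543, which use the expected annotation.
consistent_rows = [line.split() for line in [
    "Haasenstein B-ORG _ O _ _ B-PER Q56322697 _ _",
    "& I-ORG _ O _ _ O Q56322697 _ _",
    "Vogler I-ORG _ O _ _ O Q56322697 _ _",
]]

print(inverted_nestings(reported_rows))
print(inverted_nestings(consistent_rows))
```

The sketch only compares span lengths, so it would flag the first block and leave the second one alone. It does not judge which entity type belongs in which column.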
Hi,
Many thanks for the repository. I am looking for the Wikidata dump file you used for the annotation. Is it the same as in HIPE 2020 (https://files.ifi.uzh.ch/cl/siclemat/impresso/clef-hipe-2020/)?
Is there a link to the Wikidata file (or entity catalogue) with the version specified?
Was the same Wikidata dump version used for HIPE2020 and topRes19th?
Best regards,
A
Hi,
During the review of adding the HIPE-2022 dataset to Flair, we found that some of the listed entities do not exist in the actual dataset.
These entities are: ALIEN, OTHER, FICTION.
Could you please clarify what happened to these entities? Will they be added later, or will they appear in the final test dataset?
Many thanks,
Stefan
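One way to reproduce this kind of finding is to count the entity types that actually occur in a file and compare them with the documented list. This is a minimal sketch under stated assumptions: the files are tab-separated, the coarse literal tag sits in the second column, and lines starting with `#` are metadata comments. The function names and the sample rows are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def entity_type_counts(lines: Iterable[str], column: int = 1) -> Counter:
    """Count entity mentions per type, using B- tags in the given column."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: blank lines separate units, '#' lines carry metadata.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) <= column:
            continue
        tag = fields[column]
        if tag.startswith("B-"):
            counts[tag[2:]] += 1
    return counts


def missing_types(lines: Iterable[str], expected: Iterable[str],
                  column: int = 1) -> Set[str]:
    """Return the expected types that never occur in the given column."""
    return set(expected) - set(entity_type_counts(lines, column))


# Hypothetical sample rows, only to show the call shape.
sample_lines = [
    "# hypothetical metadata line",
    "Paris\tB-LOC\tO",
    "M\tB-PER\tO",
    "Dupont\tI-PER\tO",
]
print(missing_types(sample_lines, ["PER", "LOC", "ALIEN"]))
```

For a real file, the lines could come from `open(path, encoding="utf-8")`. The expected list would be taken from the dataset documentation.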
Hello!
I've noticed a possibly missing entity type in COARSE-METO in HIPE-2022-v2.1-hipe2020-train-fr.tsv, where M. Théodore Reinach should (possibly) be a pers.ind (lines 2,141-2,150):
M O O O O B-comp.title O _ _ NoSpaceAfter
. O O O O I-comp.title O _ _ _
Théodore O O O O B-comp.name O _ _ _
Reinach O O O O I-comp.name O _ _ NoSpaceAfter
, O O O O O O _ _ _
député O O O O B-comp.function O _ _ _
radical O O O O I-comp.function O _ _ _
de O O O O I-comp.function O _ _ _
la O O O O I-comp.function O _ _ EndOfLine
Savoie B-loc O B-loc.adm.reg O I-comp.function O Q12745 _ NoSpaceAfter
Due to several evaluation processes on my side, I'll also be checking other annotated files in more depth and will open an issue for each problem I find (if any).
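Cases like the one above could be located automatically by looking for component tags that have no coarse entity on the same row. This is a minimal sketch, assuming the ten-field layout of the quoted rows, with the coarse literal tag in the second field and the component tag in the sixth. The helper name `orphan_components` is hypothetical.

```python
from typing import List

EMPTY = ("O", "_")


def orphan_components(rows: List[List[str]],
                      coarse_col: int = 1,
                      comp_col: int = 5) -> List[int]:
    """Return indices of rows with a component tag but no coarse entity tag."""
    return [
        i for i, row in enumerate(rows)
        if row[comp_col] not in EMPTY and row[coarse_col] in EMPTY
    ]


# Rows copied from the excerpt quoted in this issue.
excerpt = [line.split() for line in [
    "M O O O O B-comp.title O _ _ NoSpaceAfter",
    ". O O O O I-comp.title O _ _ _",
    "Théodore O O O O B-comp.name O _ _ _",
    "Reinach O O O O I-comp.name O _ _ NoSpaceAfter",
    ", O O O O O O _ _ _",
    "député O O O O B-comp.function O _ _ _",
    "radical O O O O I-comp.function O _ _ _",
    "de O O O O I-comp.function O _ _ _",
    "la O O O O I-comp.function O _ _ EndOfLine",
    "Savoie B-loc O B-loc.adm.reg O I-comp.function O Q12745 _ NoSpaceAfter",
]]

print(orphan_components(excerpt))
```

The returned indices are relative to the excerpt, so they would need an offset to map back to file line numbers. The sketch only looks at the coarse literal column; the same call with `coarse_col=2` would inspect the metonymic column instead.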
Masked test files are sometimes called ...-test-allmasked-... (e.g. in ajmc) and sometimes ...-test_allmasked-... (notice the underscore). This should be harmonized.
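Until the names are harmonized, a loader could normalize both spellings before matching files. This is a minimal sketch: the hyphenated form is chosen here only for illustration, since the issue does not say which spelling should win, and the file names in the example are hypothetical.

```python
import re

# Matches the underscore variant of the masked-test marker.
_UNDERSCORE_VARIANT = re.compile(r"test_allmasked")


def harmonize_name(filename: str) -> str:
    """Rewrite 'test_allmasked' to 'test-allmasked'; leave other names as-is."""
    return _UNDERSCORE_VARIANT.sub("test-allmasked", filename)


# Hypothetical file names, only to show the behaviour.
examples = [
    "HIPE-2022-example-test_allmasked-de.tsv",
    "HIPE-2022-example-test-allmasked-de.tsv",
    "HIPE-2022-example-train-de.tsv",
]
for name in examples:
    print(name, "->", harmonize_name(name))
```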
Hi,
I've just written some test cases for reading the v2.0 version of the corpus in Flair, and it seems that there are some issues with AJMC:
The token ἄνδοα starts with a leading whitespace (very minor issue). Leading spaces also appear in other AJMC splits.
περάνας sa
Would be awesome if this could be fixed in the next release(s), I'm going to catch these issues in Flair for now :)
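A reader-side guard for this could look like the following. It is a minimal sketch, not the Flair implementation: it assumes tab-separated lines with the token in the first column and `#`-prefixed metadata lines. The function name and the sample rows are hypothetical.

```python
from typing import Iterable, List, Tuple


def tokens_with_whitespace(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, token) for tokens containing any whitespace."""
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Assumption: blank lines separate units, '#' lines carry metadata.
        if not line or line.startswith("#"):
            continue
        token = line.split("\t")[0]
        if any(ch.isspace() for ch in token):
            hits.append((lineno, token))
    return hits


# Hypothetical rows: the first token carries a leading space.
sample_lines = [
    " ἄνδοα\tO\tO",
    "περάνας\tO\tO",
]
print(tokens_with_whitespace(sample_lines))
```

A loader could then either report these tokens or call `token.strip()` on them, depending on whether the goal is to flag the data or to work around it.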