getalp / ufsac

37 stars · 4 forks · 253.04 MB

UFSAC is a resource containing all WordNet Sense Annotated Corpora, and a Java library for manipulating them

License: MIT License

Java 86.71% Makefile 8.51% Roff 1.62% Shell 0.34% Prolog 2.81%

ufsac's People

Contributors

glicerico, loic-vial, schwabdidier


ufsac's Issues

sense key in WNGT is not correct for adj

For example, the correct sense key for 'emergent.a.02' is 'emergent%5:00:00:nascent:00', but it is labeled as 'emergent%3:00:00:nascent:00'. Hundreds of adjective keys are mislabeled this way, carrying ss_type '3' (adjective) where '5' (adjective satellite) is expected.

wn30_key="emergent%3:00:00:nascent:00;emerging%3:00:00:nascent:00"

Agreement with NLTK WordNet

Hi,

Thank you very much for your work!

I have a question about the WNGT dataset keys and how they correspond to the keys in the NLTK WordNet corpus. The WN version from NLTK is 3.0:

import nltk
nltk.download('wordnet')
from nltk.corpus import wordnet as wn
print(wn.get_version())
# 3.0

According to the ufsac-public-2.1/wngt.xml file, for example, the entry
<sentence wn30_key="drive%1:04:03::;driving%1:04:03::" >
has the definition "hitting a golf_ball off of a tee with a driver" associated with it.
However, using one of these keys to obtain the synset and associated definition from the NLTK WordNet corpus gives:

print(wn.synset_from_sense_key("drive%1:04:03::").definition())
# a series of actions advancing a principle or tending toward a particular end

It seems that some of the keys from NLTK's WordNet and the wngt.xml don't point to the same gloss definition even though both WordNet versions are 3.0. Any thoughts?
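
For a systematic check, each key in the file can be resolved the same way (a sketch; wn.synset_from_sense_key raises WordNetError for keys NLTK's WordNet 3.0 doesn't index):

from nltk.corpus import wordnet as wn
from nltk.corpus.reader.wordnet import WordNetError

for key in 'drive%1:04:03::;driving%1:04:03::'.split(';'):
    try:
        synset = wn.synset_from_sense_key(key)
        print(key, '->', synset.name(), '|', synset.definition())
    except WordNetError:
        print(key, '-> not found in NLTK WordNet 3.0')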

Convert to Raganato not generating keys

Hi,

I'm interested in using your scripts to convert MASC to the format used in Raganato's framework, but it seems there is an issue to be resolved.

I'm running the command:
sh UFSAC/scripts/convert_to_raganato.sh --input masc.xml --output masc_converted.xml

This generates two files, as expected:

  • masc_converted.xml.data.xml
  • masc_converted.xml.gold.key.txt

But the key file is empty, and it doesn't look like the data file contains any key references.

Do you think this can be solved?
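
For reference, a minimal sanity check (a sketch, assuming the standard Raganato layout, where each <instance> id in the data file should have a matching line in the gold key file):

import xml.etree.ElementTree as ET

# Count annotated instances in the data file and entries in the gold key file.
tree = ET.parse('masc_converted.xml.data.xml')
instance_ids = [e.get('id') for e in tree.iter('instance')]
with open('masc_converted.xml.gold.key.txt', encoding='utf-8') as f:
    gold_lines = [l for l in f.read().splitlines() if l.strip()]
print(len(instance_ids), 'instances in data.xml;', len(gold_lines), 'lines in gold key file')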

By the way, your work converting all these corpora into the same format, all mapped to WN 3.0, is a much appreciated effort!

Thanks,
Daniel

No license or copyright statement

The Java code and scripts in this repository have no license or even a copyright statement that I could find. Can you please clarify the terms of reuse with a license?

Issues with OMSTI sensekeys and POS tags

First of all, thank you for your work!
I noticed a few issues occurring systematically, at least within the OMSTI corpus. All the WN sense keys with POS 'adjective satellite' (hence of the form word%5...) appear as word%3, as if they were plain adjectives. You always find typical%3:00:00:normal:01 and never the correct typical%5:00:00:normal:01 (I double-checked the dict/index.sense file in WN 3.0).
It confuses me a little that WN 3.1 online (see the third sense of 'typical': http://wordnetweb.princeton.edu/perl/webwn?c=6&sub=Change&o2=&o0=1&o8=1&o1=1&o7=1&o5=&o9=&o6=&o3=&o4=&i=-1&h=000&s=typical) also uses %3 in place of %5. Also, some of the POS tags, such as 'n', 'v', 'a', ''' and '''', are not valid PTB tags.
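
One way to confirm which spelling WN 3.0 actually indexes (a sketch using NLTK's bundled WordNet 3.0):

from nltk.corpus import wordnet as wn
from nltk.corpus.reader.wordnet import WordNetError

for key in ['typical%3:00:00:normal:01', 'typical%5:00:00:normal:01']:
    try:
        lemma = wn.lemma_from_key(key)
        print(key, '->', lemma.synset().pos())  # 's' marks an adjective satellite
    except WordNetError:
        print(key, '-> not in index.sense')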

Adverbs with sensekeys in MASC

Hi,

I've noticed that the masc.xml file in UFSAC 2.1 contains adverbs annotated with WN sense keys. I wasn't expecting this, as the UFSAC paper (Tab. 1) and MASC's documentation report that no adverbs are annotated in that corpus.

For example, line 316 of masc.xml:
<word surface_form="here" lemma="here" pos="RB" wn30_key="here%4:02:00::;here%4:02:01::;here%4:02:02::" />

I find a total of 11,675 RBs with sense annotations. Can you tell us where they come from? Are these automatically assigned?
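
For reference, the count comes from a quick scan (a sketch, assuming one <word .../> element per line, as in the excerpt above):

# Count <word .../> lines tagged RB that carry a sense annotation.
count = 0
with open('masc.xml', encoding='utf-8') as f:
    for line in f:
        if 'pos="RB"' in line and 'wn30_key=' in line:
            count += 1
print(count, 'sense-annotated RB tokens')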

Thanks,
Daniel
