getalp / ufsac
UFSAC is a resource containing all WordNet Sense Annotated Corpora, and a Java library for manipulating them
License: MIT License
Maven recently updated their communication security protocols (see https://stackoverflow.com/questions/59763531/maven-dependencies-are-failing-with-501-error/59763928#59763928).
Compilation of the Java API is broken as a result.
For example, the correct sense key for 'emergent.a.02' is 'emergent%5:00:00:nascent:00', but it is labeled as 'emergent%3:00:00:nascent:00'. Hundreds of adjective sense keys are mislabeled in the same way, with '3' and '5' confused.
wn30_key="emergent%3:00:00:nascent:00;emerging%3:00:00:nascent:00"
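A possible way to spot such keys without a WordNet installation, offered as a sketch and not as the project's own method: in the WordNet sense key format `lemma%ss_type:lex_filenum:lex_id:head_word:head_id`, the `head_word` and `head_id` fields are filled only for adjective satellites (`ss_type` 5). A key with `ss_type` 3 and a non-empty head word is therefore suspicious. The names `parse_sense_key`, `looks_like_mislabeled_satellite` and `split_keys` are hypothetical helpers introduced for illustration.

```python
from typing import List, NamedTuple

# ss_type codes defined by the WordNet sense key format.
SS_TYPES = {1: "noun", 2: "verb", 3: "adjective", 4: "adverb", 5: "adjective satellite"}


class SenseKey(NamedTuple):
    lemma: str
    ss_type: int
    lex_filenum: str
    lex_id: str
    head_word: str  # empty unless the sense is an adjective satellite
    head_id: str    # empty unless the sense is an adjective satellite


def parse_sense_key(key: str) -> SenseKey:
    """Split a key such as 'emergent%5:00:00:nascent:00' into its fields."""
    lemma, _, lex_sense = key.rpartition("%")
    ss_type, lex_filenum, lex_id, head_word, head_id = lex_sense.split(":")
    return SenseKey(lemma, int(ss_type), lex_filenum, lex_id, head_word, head_id)


def looks_like_mislabeled_satellite(key: str) -> bool:
    """Heuristic: a plain adjective key (ss_type 3) should not carry a head word."""
    parsed = parse_sense_key(key)
    return parsed.ss_type == 3 and parsed.head_word != ""


def split_keys(attribute_value: str) -> List[str]:
    """Split a wn30_key attribute value, where keys are separated by ';'."""
    return [k for k in attribute_value.split(";") if k]


if __name__ == "__main__":
    value = "emergent%3:00:00:nascent:00;emerging%3:00:00:nascent:00"
    for key in split_keys(value):
        print(key, looks_like_mislabeled_satellite(key))
```

This only flags candidates; confirming the correct key still requires checking it against dict/index.sense in WordNet 3.0.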
Hi,
Thank you very much for your work!
I have a question with regard to the WNGT dataset keys and how they correspond to the keys from the NLTK WordNet corpus. The WN version from NLTK is 3.0:
import nltk
nltk.download('wordnet')
from nltk.corpus import wordnet as wn
print(wn.get_version())
3.0
According to the ufsac-public-2.1/wngt.xml file, the entry
<sentence wn30_key="drive%1:04:03::;driving%1:04:03::" >
for example, has associated with it the definition "hitting a golf_ball off of a tee with a driver".
However, using one of these keys to obtain the synset and associated definition from the NLTK WordNet corpus gives:
print(wn.synset_from_sense_key("drive%1:04:03::").definition())
a series of actions advancing a principle or tending toward a particular end
It seems that some of the keys from NLTK's WordNet and the wngt.xml don't point to the same gloss definition even though both WordNet versions are 3.0. Any thoughts?
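One way to check a key independently of NLTK is to read WordNet's own dict/index.sense file, where each line has four whitespace-separated fields: `sense_key synset_offset sense_number tag_cnt`. The sketch below assumes that layout and nothing else; `load_sense_index` and `SenseIndexEntry` are hypothetical names, and the file path in the usage note depends on where WordNet 3.0 is unpacked.

```python
from typing import Dict, Iterable, NamedTuple


class SenseIndexEntry(NamedTuple):
    synset_offset: str  # kept as a string to preserve leading zeros
    sense_number: int
    tag_count: int


def load_sense_index(lines: Iterable[str]) -> Dict[str, SenseIndexEntry]:
    """Map each sense key to its entry, skipping lines without four fields."""
    index: Dict[str, SenseIndexEntry] = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue
        key, offset, sense_number, tag_count = fields
        index[key] = SenseIndexEntry(offset, int(sense_number), int(tag_count))
    return index
```

A possible usage: open dict/index.sense, pass the file object to `load_sense_index`, and look up `index["drive%1:04:03::"]`. The synset offset it returns can then be compared with the gloss stored at that offset in dict/data.noun, which shows whether the mismatch comes from the corpus or from the lookup in NLTK.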
Hi,
I'm interested in using your scripts to convert MASC to the format used in Raganato's framework, but it seems there is an issue that needs to be resolved.
I'm running the command:
sh UFSAC/scripts/convert_to_raganato.sh --input masc.xml --output masc_converted.xml
This generates two files, as expected:
But the key file is empty, and it doesn't look like the data file contains any key references.
Do you think this can be solved?
Your work in converting all these corpora into the same format, all mapped to WN 3.0, is much appreciated, by the way!
Thanks,
Daniel
The Java code and scripts in this repository have no license or even a copyright statement that I could find. Can you please clarify the terms of reuse with a license?
First of all, thank you for your work!
I noticed a few issues occurring systematically, at least within the OMSTI corpus. All the WN sense keys with POS 'adjective satellite' (hence of the form word%5...) always appear as word%3, as if they were simple adjectives. You always find typical%3:00:00:normal:01 and never the correct typical%5:00:00:normal:01 (I double-checked the dict/index.sense file in WN 3.0).
It confuses me a little that WN 3.1 online (see the third sense of typical http://wordnetweb.princeton.edu/perl/webwn?c=6&sub=Change&o2=&o0=1&o8=1&o1=1&o7=1&o5=&o9=&o6=&o3=&o4=&i=-1&h=000&s=typical) also uses %3 in place of %5. Also some of the POS Tags, such as 'n', 'v', 'a', ''' and '''' are not valid PTB.
Hi,
I've noticed that the masc.xml file in UFSAC 2.1 contains adverbs annotated with WN sense keys. I wasn't expecting this, as the UFSAC paper (Tab. 1), and MASC's documentation, report that no adverbs are annotated in that corpus.
For example, line 316 of masc.xml:
<word surface_form="here" lemma="here" pos="RB" wn30_key="here%4:02:00::;here%4:02:01::;here%4:02:02::" />
I find a total of 11,675 RBs with sense annotations. Can you tell us where they come from? Are these automatically assigned?
Thanks,
Daniel
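A count like the one above can be reproduced with a short script. This is a minimal sketch, assuming only what the quoted line shows: `word` elements carrying `pos` and `wn30_key` attributes. The function name `count_annotated_by_pos` is hypothetical and is not part of the UFSAC Java library.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_annotated_by_pos(source, key_attr: str = "wn30_key") -> Counter:
    """Count <word> elements that carry a non-empty sense annotation, per POS tag.

    `source` is a file name or a binary file object. The file is streamed with
    iterparse so that large corpora do not have to be loaded at once.
    """
    counts: Counter = Counter()
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "word" and elem.get(key_attr):
            counts[elem.get("pos", "")] += 1
        elem.clear()  # release the element's content once it has been inspected
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_pos.py masc.xml
    for pos, n in count_annotated_by_pos(sys.argv[1]).most_common():
        print(pos, n)
```

Running it on masc.xml and reading the `RB` row gives the number of sense-annotated adverbs, which can be compared with the figure reported above.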