getalp / ufsac

37 stars · 4 forks · 253.04 MB

UFSAC is a resource containing all WordNet Sense Annotated Corpora, and a Java library for manipulating them

License: MIT License

Java 86.71% Makefile 8.51% Roff 1.62% Shell 0.34% Prolog 2.81%

ufsac's People

Contributors

glicerico, loic-vial, schwabdidier


ufsac's Issues

sense key in WNGT is not correct for adj

For example, the correct sense key for 'emergent.a.02' is 'emergent%5:00:00:nascent:00', but it is labeled as 'emergent%3:00:00:nascent:00'. Hundreds of adjective keys are mislabeled this way, carrying ss_type '3' (adjective) where '5' (adjective satellite) is expected.

wn30_key="emergent%3:00:00:nascent:00;emerging%3:00:00:nascent:00"

Agreement with NLTK WordNet

Hi,

Thank you very much for your work!

I have a question about the WNGT dataset keys and how they correspond to the keys in the NLTK WordNet corpus. The WN version from NLTK is 3.0:

import nltk
nltk.download('wordnet')
from nltk.corpus import wordnet as wn
print(wn.get_version())
# 3.0

According to the ufsac-public-2.1/wngt.xml file, for example, the entry
<sentence wn30_key="drive%1:04:03::;driving%1:04:03::" >
has the definition "hitting a golf_ball off of a tee with a driver" associated with it.
However, using one of these keys to obtain the synset and associated definition from the NLTK WordNet corpus gives:

print(wn.synset_from_sense_key("drive%1:04:03::").definition())
# a series of actions advancing a principle or tending toward a particular end

It seems that some of the keys from NLTK's WordNet and the wngt.xml don't point to the same gloss definition even though both WordNet versions are 3.0. Any thoughts?
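
For a systematic check, each key in the file can be resolved the same way (a sketch; wn.synset_from_sense_key raises WordNetError for keys NLTK's WordNet 3.0 doesn't index):

from nltk.corpus import wordnet as wn
from nltk.corpus.reader.wordnet import WordNetError

for key in 'drive%1:04:03::;driving%1:04:03::'.split(';'):
    try:
        synset = wn.synset_from_sense_key(key)
        print(key, '->', synset.name(), '|', synset.definition())
    except WordNetError:
        print(key, '-> not found in NLTK WordNet 3.0')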

Convert to Raganato not generating keys

Hi,

I'm interested in using your scripts to convert MASC to the format used in Raganato's framework, but it seems there is an issue to be resolved.

I'm running the command:
sh UFSAC/scripts/convert_to_raganato.sh --input masc.xml --output masc_converted.xml

This generates two files, as expected:

  • masc_converted.xml.data.xml
  • masc_converted.xml.gold.key.txt

But the key file is empty, and it doesn't look like the data file contains any key references.

Do you think this can be solved?
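
For reference, a minimal sanity check (a sketch, assuming the standard Raganato layout, where each <instance> id in the data file should have a matching line in the gold key file):

import xml.etree.ElementTree as ET

# Count annotated instances in the data file and entries in the gold key file.
tree = ET.parse('masc_converted.xml.data.xml')
instance_ids = [e.get('id') for e in tree.iter('instance')]
with open('masc_converted.xml.gold.key.txt', encoding='utf-8') as f:
    gold_lines = [l for l in f.read().splitlines() if l.strip()]
print(len(instance_ids), 'instances in data.xml;', len(gold_lines), 'lines in gold key file')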

By the way, your work converting all these corpora into the same format, all mapped to WN 3.0, is a much appreciated effort!

Thanks,
Daniel

No license or copyright statement

The Java code and scripts in this repository have no license or even a copyright statement that I could find. Can you please clarify the terms of reuse with a license?

Issues with OMSTI sensekeys and POS tags

First of all, thank you for your work!
I noticed a few issues occurring systematically, at least within the OMSTI corpus. All the WN sense keys with POS 'adjective satellite' (hence of the form word%5...) appear as word%3, as if they were plain adjectives. You always find typical%3:00:00:normal:01 and never the correct typical%5:00:00:normal:01 (I double-checked the dict/index.sense file in WN 3.0).
It confuses me a little that WN 3.1 online (see the third sense of 'typical': http://wordnetweb.princeton.edu/perl/webwn?c=6&sub=Change&o2=&o0=1&o8=1&o1=1&o7=1&o5=&o9=&o6=&o3=&o4=&i=-1&h=000&s=typical) also uses %3 in place of %5. Also, some of the POS tags, such as 'n', 'v', 'a', ''' and '''', are not valid PTB tags.
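
One way to confirm which spelling WN 3.0 actually indexes (a sketch using NLTK's bundled WordNet 3.0):

from nltk.corpus import wordnet as wn
from nltk.corpus.reader.wordnet import WordNetError

for key in ['typical%3:00:00:normal:01', 'typical%5:00:00:normal:01']:
    try:
        lemma = wn.lemma_from_key(key)
        print(key, '->', lemma.synset().pos())  # 's' marks an adjective satellite
    except WordNetError:
        print(key, '-> not in index.sense')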

Adverbs with sensekeys in MASC

Hi,

I've noticed that the masc.xml file in UFSAC 2.1 contains adverbs annotated with WN sense keys. I wasn't expecting this, as the UFSAC paper (Tab. 1) and MASC's documentation report that no adverbs are annotated in that corpus.

For example, line 316 of masc.xml:
<word surface_form="here" lemma="here" pos="RB" wn30_key="here%4:02:00::;here%4:02:01::;here%4:02:02::" />

I find a total of 11,675 RBs with sense annotations. Can you tell us where they come from? Are these automatically assigned?
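
For reference, the count comes from a quick scan (a sketch, assuming one <word .../> element per line, as in the excerpt above):

# Count <word .../> lines tagged RB that carry a sense annotation.
count = 0
with open('masc.xml', encoding='utf-8') as f:
    for line in f:
        if 'pos="RB"' in line and 'wn30_key=' in line:
            count += 1
print(count, 'sense-annotated RB tokens')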

Thanks,
Daniel
