gary3321 / coarsewordnet

This project forked from sumitbhagwani/coarsewordnet

Coarsening WordNet

Abstract:

General-purpose dictionaries in current use are often too fine-grained, with narrow sense divisions that are not relevant to many natural language applications. WordNet, a widely used sense inventory for Word Sense Disambiguation, has the same problem. Because different applications require different levels of sense granularity, producing sense-clustered inventories of arbitrary granularity has become a crucial task. We exploit available resources, such as human-labelled sense clusterings and semi-automatically generated domain labels of synsets, to estimate the similarity between synsets. Using supervision, we learn a model that predicts the probability that any two senses of a word should be merged. To learn a more generic model, we propose a graph-based approach that can also use the information learnt from supervision. Using this complete similarity measure, we propose a simple method for clustering synsets.

=============

Notes:

  1. A demo of the system is available in Demo.java in the applet package.

  2. For more details, refer to: Merging Word Senses, M.Tech. Thesis - Sumit Bhagwani, Computer Science and Engineering, IIT Kanpur.

=============

Installation Instructions:

  1. The system requires all of the following additional datasets in the /home/USERNAME/Data folder:
  2. USERNAME needs to be set in the code (StaticValues.java) and in the properties files of EXTJWNL and BabelNet.
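As a rough illustration of the path convention above, the following sketch builds the data folder location from a user name and rejects the unreplaced placeholder. The class DataFolderSketch and the method dataFolder are hypothetical names chosen for this example; the actual fields in StaticValues.java may look quite different.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper; not part of the repository. */
class DataFolderSketch {

    /**
     * Builds /home/<username>/Data, the folder layout named in the
     * installation notes. Fails early if the literal placeholder
     * "USERNAME" was left unchanged.
     */
    static Path dataFolder(String username) {
        if (username == null || username.isEmpty() || username.equals("USERNAME")) {
            throw new IllegalArgumentException(
                "Replace USERNAME with your account name before running");
        }
        return Paths.get("/home", username, "Data");
    }

    public static void main(String[] args) {
        // "alice" is an example account name, not a value from the project.
        System.out.println(dataFolder("alice"));
    }
}
```

The same user name would also have to appear in the EXTJWNL and BabelNet properties files, which this sketch does not touch.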

=============

Code Details:

  1. Data Processing: The data preprocessing code is available in the krsystem.ontology.senseClustering package.
  2. Supervised Learning Framework:
    • The SVM framework used for learning the similarity metric is available in the krsystem.ontology.senseClustering.svm package.
    • The FeatureGenerator.java class collects the features and passes them to the learning module.
    • Evaluation.java serves as the main class for the package; it is where we learn and evaluate our models.
  3. Semi-supervised Learning Framework:
    • The SimRank framework is available in the simRank package.
    • The sigmoid transformation of SVM scores is available in the svmPredictionNormalization package.
    • The connectedComponentAnalysis package uses the learnt similarity metric and performs clustering using connected components in the graph.
    • The connectedComponentAnalysis package also includes the code to evaluate the clustering performance on standard datasets.
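The last two steps above (sigmoid normalization of SVM scores, then clustering by connected components) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the class SenseClusteringSketch, its method names, the Platt-style sigmoid parameters a and b, and the use of a fixed similarity threshold to decide which senses are linked are all assumptions made for the example. The SimRank step is not shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; names and parameters are not taken from the project. */
class SenseClusteringSketch {

    /**
     * Platt-style sigmoid mapping a raw SVM score into (0, 1).
     * a and b are assumed to be fitted beforehand; a is normally negative
     * so that larger scores give larger probabilities.
     */
    static double sigmoid(double score, double a, double b) {
        return 1.0 / (1.0 + Math.exp(a * score + b));
    }

    /** Union-find lookup with path halving. */
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    /**
     * Links senses i and j whenever similarity[i][j] >= threshold (an
     * assumed rule), then returns the connected components as clusters.
     */
    static List<List<Integer>> cluster(double[][] similarity, double threshold) {
        int n = similarity.length;
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (similarity[i][j] >= threshold) {
                    parent[find(parent, i)] = find(parent, j);
                }
            }
        }
        // Group sense indices by the root of their component.
        Map<Integer, List<Integer>> byRoot = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            byRoot.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(i);
        }
        return new ArrayList<>(byRoot.values());
    }

    public static void main(String[] args) {
        // Made-up raw SVM scores for three senses of one word.
        double[][] raw = { {0.0, 2.1, -1.5}, {2.1, 0.0, -0.8}, {-1.5, -0.8, 0.0} };
        double[][] prob = new double[3][3];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                prob[i][j] = sigmoid(raw[i][j], -1.0, 0.0);
            }
        }
        // Only senses 0 and 1 exceed the 0.5 threshold, so they merge.
        System.out.println(cluster(prob, 0.5)); // prints [[0, 1], [2]]
    }
}
```

Raising the threshold yields finer clusters and lowering it yields coarser ones, which is one simple way to obtain inventories of different granularity from a single similarity measure.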

coarsewordnet's People

Contributors

sumitbhagwani
