sharispe / slib

`Slib` is a Java library dedicated to semantic data mining based on text and/or ontology processing. The library is composed of modules dedicated to specific tasks; they can be used for information retrieval, data analysis, recommendation system design, and similar applications. The Semantic Measures Library (SML) is a sub-project of Slib.

Home Page: http://www.semantic-measures-library.org

Java 98.82% R 0.87% Python 0.31%

slib's Introduction

Slib

Slib is a Java library dedicated to semantic data mining based on text and/or ontology processing. The library is composed of modules dedicated to specific tasks; they can be used for information retrieval, data analysis, recommendation system design, and similar applications.

  • slib-sml, the module dedicated to the Semantic Measures Library (SML), a library for the computation, evaluation and analysis of semantic measures (similarity/relatedness). See the dedicated website, https://factory.euromov.eu/sml, for more information on both semantic measures and this module.

  • slib-graph, a simple in-memory graph engine used to manipulate graphs of URIs, based on the Sesame library (RDF/OWL data loading...). This module provides an easy way to process semantic graphs (e.g. RDF graphs) as graphs in which traversals can easily be performed. Numerous algorithms commonly used to process semantic graphs are implemented.

  • slib-tools, various command-line tools for processing semantic graphs and data:

    • SML-Toolkit, a command-line tool dedicated to semantic similarity/relatedness computation.
    • Ontofocus, a command-line tool which can be used to perform efficient transitive reductions on potentially large taxonomies.
  • slib-utils, Slib utility classes.

  • slib-indexer, a module which provides easy-to-use utility classes for indexing specific datasets.

  • slib-example, source code examples.

Licence

CeCILL license: a free software license adapted to both international and French legal matters, in the spirit of and retaining compatibility with the GNU General Public License (src: Wikipedia).

slib's People

Contributors

alexhenrie, matevarga, pierrejean, sharispe


slib's Issues

Maven

Greetings,
I am trying to download the library using Maven, as described on the website:

How to use the library?
The Semantic Measures Library can be used from the provided packaged library or from the sources. In both cases, you need to download/retrieve the library (+sources) into your IDE. To do so we encourage you to use Maven. The dependency to include in your pom.xml is (Replace CURRENT_VERSION by the version number of the latest release):

 <dependency>
 	<groupId>com.github.sharispe</groupId>
 	<artifactId>slib-sml</artifactId>
 	<version>CURRENT_VERSION</version>
 </dependency>

So, using Maven with the latest version of the library listed on the website (0.9.4):

 <dependency>
	<groupId>com.github.sharispe</groupId>
 	<artifactId>slib-sml</artifactId>
 	<version>0.9.4</version>
 </dependency>

This does not work; Maven reports: "Missing artifact com.github.sharispe:slib-sml:jar:0.9.4". What am I doing wrong?

  • Note: although I prefer to use Maven (since it automatically downloads the source code and the Javadoc), I downloaded the library from:

Semantic Measures Library
Stable version
Current stable release, version 0.9.4 - 31/01/17
You can download the latest version of the Semantic Measures Library in a single jar.
show change log - show release history

But I could not find where to download the Javadoc and the source code.
Could you please provide the dependency I need to add to use Maven?

New measure of Pirró & Seco

Hello Sébastien:

I would like to integrate Pirró & Seco semantic similarity described in these two papers:

Pirró, G., & Seco, N. (2008). Design, Implementation and Evaluation of a New Semantic Similarity Metric Combining Features and Intrinsic Information Content. In R. Meersman & Z. Tari (Eds.), On the Move to Meaningful Internet Systems: OTM 2008 (Vol. 5332, pp. 1271–1288). Springer Berlin Heidelberg.

Regards,
Juan

Building maven

Hi,
I was testing our framework with the .jar file available on the website; however, it seems to be an old version. Therefore, I tried to build a new file using Maven, but I got a missing-artifact error:

[INFO] ------------------------------------------------------------------------
[INFO] Building slib-sml
[INFO] task-segment: [package]
[INFO] ------------------------------------------------------------------------
[INFO] [resources:resources {execution: default-resources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 13 resources
[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] Failed to resolve artifact.

Missing:

  1. com.github.sharispe:slib-graph:pom:0.10-SNAPSHOT
    Path to dependency:
    1. com.github.sharispe:slib-sml:jar:0.10-SNAPSHOT
    2. com.github.sharispe:slib-graph:pom:0.10-SNAPSHOT

1 required artifact is missing.

for artifact:
com.github.sharispe:slib-sml:jar:0.10-SNAPSHOT

from the specified remote repositories:
central (https://repo1.maven.org/maven2)

[INFO] ------------------------------------------------------------------------
[INFO] For more information, run Maven with the -e switch
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 13 seconds
[INFO] Finished at: Fri Oct 07 11:37:37 CEST 2016

Could you please fix it? Thanks so much!
PS: Congrats on the good work :D

[SML-Toolkit] Error when using virtual root flag __FICTIVE__

[Error] Not a valid (absolute) URI: FICTIVE

java.lang.IllegalArgumentException: Not a valid (absolute) URI: FICTIVE
at org.openrdf.model.impl.URIImpl.setURIString(URIImpl.java:68)
at org.openrdf.model.impl.URIImpl.<init>(URIImpl.java:57)
at org.openrdf.sail.memory.model.MemValueFactory.createURI(MemValueFactory.java:345)
at slib.sglib.model.impl.repo.URIFactoryMemory.createURI(URIFactoryMemory.java:117)
at slib.sglib.algo.graph.utils.GraphActionExecutor.rerooting(GraphActionExecutor.java:322)
at slib.sglib.algo.graph.utils.GraphActionExecutor.applyAction(GraphActionExecutor.java:56)
at slib.sglib.algo.graph.utils.GraphActionExecutor.applyActions(GraphActionExecutor.java:440)
at slib.sglib.io.loader.GraphLoaderGeneric.load(GraphLoaderGeneric.java:159)
at slib.sglib.io.loader.GraphLoaderGeneric.load(GraphLoaderGeneric.java:134)
at slib.sglib.io.loader.GraphLoaderGeneric.load(GraphLoaderGeneric.java:173)
at slib.tools.smltoolkit.sm.cli.SmCli.execute(SmCli.java:138)
at slib.tools.smltoolkit.sm.cli.SmCli.execute(SmCli.java:117)
at slib.tools.smltoolkit.SmlToolKitCli.launch(SmlToolKitCli.java:132)
at slib.tools.smltoolkit.SmlToolKitCli.processArgs(SmlToolKitCli.java:105)
at slib.tools.module.CmdHandler.<init>(CmdHandler.java:105)
at slib.tools.smltoolkit.SmlToolKitCli.<init>(SmlToolKitCli.java:73)
at slib.tools.smltoolkit.SmlToolKitCli.main(SmlToolKitCli.java:205)

[SML] MICA search error

The Most Informative Common Ancestor (MICA) of two concepts is not the one expected when the evaluated concepts are similar and the IC (theta function) doesn't ensure ic(x) > ic(y) for every y in A(x), i.e. the ancestors of x.
The IC is expected to decrease strictly from the leaves to the root of the ontology.
However, some IC functions, e.g. the one of Resnik, which is based on concept frequency, only ensure ic(x) >= ic(y) for y in A(x). In this case the current approach returns, as MICA, the concept maximizing the selected function in the intersection A(x) ∩ A(y). Therefore, in some specific cases the approach doesn't return x as MICA(x,x).
Note that the IC value computed for the MICA is correct.
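
One possible way to make the selection deterministic is to break IC ties in favour of the more specific concept. The sketch below is only an illustration and is not taken from the SML code base: the class name `MicaSketch`, the method `mica`, and the map-based representation are all hypothetical. It assumes that `ancestors` holds inclusive ancestor sets (each concept belongs to its own set) and that the taxonomy is acyclic, so a descendant always has a strictly larger inclusive ancestor set than any of its ancestors.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class MicaSketch {

    /**
     * Returns the common ancestor of x and y with the highest IC.
     * Assumption: ancestors.get(c) is the INCLUSIVE ancestor set of c (c itself included).
     */
    static String mica(String x, String y,
                       Map<String, Set<String>> ancestors,
                       Map<String, Double> ic) {
        Set<String> common = new HashSet<>(ancestors.get(x));
        common.retainAll(ancestors.get(y));

        String best = null;
        for (String c : common) {
            if (best == null) {
                best = c;
                continue;
            }
            int cmp = Double.compare(ic.get(c), ic.get(best));
            // On an IC tie, prefer the concept with the larger inclusive ancestor set:
            // a descendant always has more ancestors than the concepts subsuming it.
            boolean moreSpecific = ancestors.get(c).size() > ancestors.get(best).size();
            if (cmp > 0 || (cmp == 0 && moreSpecific)) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy taxonomy root <- a <- x, where x and a share the same IC (non-strict IC).
        Map<String, Set<String>> ancestors = Map.of(
                "root", Set.of("root"),
                "a", Set.of("a", "root"),
                "x", Set.of("x", "a", "root"));
        Map<String, Double> ic = Map.of("root", 0.0, "a", 1.5, "x", 1.5);

        System.out.println(mica("x", "x", ancestors, ic)); // prints "x"
    }
}
```

With this tie-break, MICA(x,x) is x whenever ic(x) >= ic(y) holds for every ancestor y of x. Ties between two candidates that are not ancestors of one another are still resolved arbitrarily.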

Add classical set-based measures

Add classical set-based measures to both the library and the toolkit, e.g. the Tversky ratio model and some specific instantiations such as Jaccard and Dice. These measures have to be available from the toolkit XML interface.
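
As a reminder of how these measures relate, here is a minimal, standalone sketch of the Tversky ratio model, |A∩B| / (|A∩B| + α|A\B| + β|B\A|), with Jaccard (α = β = 1) and Dice (α = β = 0.5) as instantiations. The class and method names are hypothetical and do not correspond to existing SML classes or flags. Returning 0 when the denominator is 0 is an arbitrary convention chosen for this sketch.

```java
import java.util.HashSet;
import java.util.Set;

final class SetMeasuresSketch {

    /** Tversky ratio model: |A∩B| / (|A∩B| + alpha*|A\B| + beta*|B\A|). */
    static <T> double tversky(Set<T> a, Set<T> b, double alpha, double beta) {
        Set<T> inter = new HashSet<>(a);
        inter.retainAll(b);
        int common = inter.size();
        int onlyA = a.size() - common; // |A \ B|
        int onlyB = b.size() - common; // |B \ A|
        double denom = common + alpha * onlyA + beta * onlyB;
        // Convention of this sketch: similarity is 0 when the denominator vanishes.
        return denom == 0.0 ? 0.0 : common / denom;
    }

    /** Jaccard index: Tversky with alpha = beta = 1. */
    static <T> double jaccard(Set<T> a, Set<T> b) {
        return tversky(a, b, 1.0, 1.0);
    }

    /** Dice coefficient: Tversky with alpha = beta = 0.5. */
    static <T> double dice(Set<T> a, Set<T> b) {
        return tversky(a, b, 0.5, 0.5);
    }

    public static void main(String[] args) {
        Set<Integer> a = Set.of(1, 2, 3);
        Set<Integer> b = Set.of(2, 3, 4);
        System.out.println(jaccard(a, b)); // 2 / (2 + 1 + 1) = 0.5
        System.out.println(dice(a, b));    // 2 / (2 + 0.5 + 0.5) = 2/3
    }
}
```

In a semantic-measure setting, A and B would typically be feature sets of the compared concepts, for instance their ancestor sets.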

[SML] Vertex reduction action

The reduction of a taxonomic graph consists of two basic steps:

  • 1 Reduction of the taxonomic graph induced by the given class
  • 2 Addition of the vertices linked to the resulting graph.

However, it could be problematic to add a vertex of type CLASS during step two.
Indeed, the added vertex will not be linked to the taxonomic graph by a taxonomic relationship and will therefore be isolated, i.e. the taxonomic graph will no longer be connected.

To avoid such problems, it could be useful to prevent the addition of vertices typed as CLASS during the extension phase (step 2).

Rerooting Error

The rerooting action doesn't perform the treatment described in its documentation.
The vertices that are subsumed by the specified root are not removed anymore...

see enhancement #16

Change date in Knappe measure

Thanks to Alex Henrie.

The following citation is given under both direct groupwise measures and set-based measures:

Knappe R, Bulskov H, Andreasen T: Perspectives on ontology-based querying. International Journal of Intelligent Systems 2004, 22:739-761.

I searched for this paper, but it looks like it was published in 2007, not 2004: http://onlinelibrary.wiley.com/doi/10.1002/int.20226/abstract

Unless I am mistaken, the citation should be corrected, and the flag should be renamed to SIM_FRAMEWORK_DAG_SET_KNAPPE_2007.

TODO

  • Change associated labels in the library and the toolkit
  • Update the documentation

Enable tuning of relationships to consider in SMengine

The aim of this enhancement is to be able to tune the relationships to consider when creating an instance of SM_Engine. Currently SM_Engine only considers rdfs:subClassOf relationships for computing semantic measures. However, it could be nice to load the engine by defining another transitive relationship, e.g. a part-of relationship, and therefore be able to use a measure by considering this specific type of relationship. It could also be interesting to enable the consideration of multiple transitive relationships, but this makes the development and tuning more complex, since the expected way to handle paths composed of relationships of different types must also be specified.
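
To illustrate what such tuning could look like, here is a standalone sketch in which the set of predicates to traverse is passed as a parameter. It does not use the SM_Engine API: `RelationTuningSketch`, `Edge`, and `ancestors` are hypothetical names, and the predicate labels are plain strings chosen for the example. Paths mixing several predicate types are simply followed, which is only one of the possible semantics mentioned above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RelationTuningSketch {

    /** A directed, labelled edge: source --predicate--> target. */
    static final class Edge {
        final String source;
        final String predicate;
        final String target;

        Edge(String source, String predicate, String target) {
            this.source = source;
            this.predicate = predicate;
            this.target = target;
        }
    }

    /** Collects every vertex reachable from start by following only the accepted predicates. */
    static Set<String> ancestors(String start, List<Edge> edges, Set<String> predicates) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(start);
        while (!todo.isEmpty()) {
            String v = todo.pop();
            for (Edge e : edges) {
                boolean accepted = e.source.equals(v) && predicates.contains(e.predicate);
                if (accepted && seen.add(e.target)) {
                    todo.push(e.target);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(
                new Edge("Finger", "subClassOf", "BodyPart"),
                new Edge("Finger", "partOf", "Hand"),
                new Edge("Hand", "partOf", "Arm"));

        // Only the taxonomic relationship: BodyPart
        System.out.println(ancestors("Finger", edges, Set.of("subClassOf")));
        // Only the part-of relationship: Hand and Arm
        System.out.println(ancestors("Finger", edges, Set.of("partOf")));
    }
}
```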

Use namespaces when processing input queries

Currently it is not possible to use namespaces in query definitions, even if the namespace has been specified in the configuration. It would be nice to define queries using defined namespaces, e.g.
hpo:XXXX go:YYYYY...
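
A minimal sketch of the requested behaviour, assuming the configured namespaces are available as a prefix-to-URI map. The class `PrefixExpansionSketch`, the method `expand`, and the example.org namespace URIs are hypothetical and are not part of the toolkit.

```java
import java.util.Map;

final class PrefixExpansionSketch {

    /**
     * Expands "prefix:local" into a full URI when the prefix is known.
     * Terms without a known prefix are returned unchanged, on the assumption
     * that they are already full URIs.
     */
    static String expand(String term, Map<String, String> prefixes) {
        int colon = term.indexOf(':');
        if (colon > 0) {
            String namespace = prefixes.get(term.substring(0, colon));
            if (namespace != null) {
                return namespace + term.substring(colon + 1);
            }
        }
        return term;
    }

    public static void main(String[] args) {
        // Hypothetical namespace declarations, standing in for the configuration.
        Map<String, String> prefixes = Map.of(
                "go", "http://example.org/go/",
                "hpo", "http://example.org/hpo/");

        System.out.println(expand("go:0001", prefixes));              // http://example.org/go/0001
        System.out.println(expand("http://example.org/x", prefixes)); // unchanged
    }
}
```

Applying `expand` to each identifier of a query line before URI creation would let a line such as the one above be resolved against the declared namespaces.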

Move SmCLI core functionalities

Move the concept-to-concept and entity-to-entity computations out of the CLI; these functionalities belong in the sml package, not in sml-toolkit.

Split Rooting Action to better fit its semantics

The documentation currently says:
The aim of the action is to root the semantic graph according to the given root and rdfs:subClassOf relationships. The action can also be used to reduce the graph according to a considered vertex. Indeed, if a root is specified, a traversal is made starting from it only considering the inverse of rdfs:subClassOf edge direction. All vertices reached during the traversal are included in the reduction; others which are typed as CLASS are removed from the graph. Edges associated to vertices which have been removed are also deleted.

TODO - Add information to the documentation: are instances considered during the reduction process? The answer must be yes.

The rooting process must:
(1) If no root is specified, root the current roots under a single root if multiple roots exist, and do nothing if a single root already exists.
(2) Reduce the graph according to a specific vertex to consider as root.

I suggest splitting the treatment into 2 different actions:
process (1) is clearly a rerooting treatment;
process (2) must be added to the reduction action, i.e. the process performed is more than a rerooting.
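
The traversal behind process (2), as described in the quoted documentation, can be sketched as follows. This is an illustration on a plain map, not the GraphActionExecutor implementation: `ReductionSketch` and `reachableFrom` are hypothetical names, and `children` is assumed to map each class to its direct subclasses (the inverse direction of rdfs:subClassOf). Handling of instances and of edge deletion is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ReductionSketch {

    /**
     * Returns the classes to keep when reducing the graph to the given root:
     * the root itself plus everything reached by walking down to subclasses.
     */
    static Set<String> reachableFrom(String root, Map<String, List<String>> children) {
        Set<String> kept = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            String v = todo.pop();
            if (kept.add(v)) { // first visit of v
                for (String child : children.getOrDefault(v, List.of())) {
                    todo.push(child);
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, List<String>> children = Map.of(
                "Thing", List.of("Animal", "Plant"),
                "Animal", List.of("Dog", "Cat"));

        // Classes outside the returned set (here Thing and Plant) would be removed.
        System.out.println(new TreeSet<>(reachableFrom("Animal", children))); // [Animal, Cat, Dog]
    }
}
```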

SMComputationOWL.java exception

Hi, when running the example on https://sites.google.com/site/portdial2/downloads-area/Travel-Domain.owl and other OWL/RDF ontologies, I get the following exception:

slib.utils.ex.SLIB_Ex_Critic: Content is not allowed in trailing section. [line 1476, column 1]
at slib.graph.io.loader.rdf.RDFLoader.load(RDFLoader.java:151)
at slib.graph.io.loader.rdf.RDFLoader.populate(RDFLoader.java:98)
at slib.graph.io.loader.GraphLoaderGeneric.populate(GraphLoaderGeneric.java:99)
at slib.graph.io.loader.GraphLoaderGeneric.load(GraphLoaderGeneric.java:154)

I also tried with different ontologies. The result is always the same.

toolkit cannot be packaged on Windows

JARs with filenames that contain timestamps in the current format cannot be used on Windows, therefore the packaging step fails. I'll submit a pull request for this soon.
