MSherif commented on September 27, 2024

MIN(m1, m2): Computes the intersection of the two mappings m1 and m2. If an entry (i.e., link) exists in both mappings, the minimal similarity is taken.

MAX(m1, m2): Computes the union of the two mappings m1 and m2. If an entry (i.e., link) exists in both mappings, the maximal similarity is taken.

MINUS(m1, m2): Computes the difference of the two mappings, i.e., the set difference m1 - m2.
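
A minimal Python sketch of the three operators may make the definitions concrete. It models a mapping as a dict from (source, target) pairs to similarity scores; this representation and the function names are illustrative assumptions, not the LIMES API.

```python
# Illustrative model only: a mapping is a dict {(source, target): similarity}.
# The function names are hypothetical and are not part of LIMES.

def mapping_min(m1, m2):
    """Intersection: keep links present in both, with the smaller similarity."""
    return {link: min(m1[link], m2[link]) for link in m1.keys() & m2.keys()}

def mapping_max(m1, m2):
    """Union: keep links present in either, with the larger similarity."""
    return {link: max(m1.get(link, 0.0), m2.get(link, 0.0))
            for link in m1.keys() | m2.keys()}

def mapping_minus(m1, m2):
    """Difference: keep links of m1 that do not occur in m2, with m1's score."""
    return {link: sim for link, sim in m1.items() if link not in m2}

m1 = {("a", "x"): 0.9, ("b", "y"): 0.6}
m2 = {("a", "x"): 0.7, ("c", "z"): 1.0}

print(mapping_min(m1, m2))    # only ("a", "x") is in both mappings
print(mapping_max(m1, m2))    # all three links
print(mapping_minus(m1, m2))  # only ("b", "y") is exclusive to m1
```

Note that MINUS looks only at whether a link occurs in m2, not at its score there.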

from limes.

KonradHoeffner commented on September 27, 2024

Thanks to @MSherif, I had partial success with MINUS(TRIGRAMS(c1.label,c2.label)|0.5,EXACTMATCH(c1.x,c2.y)|1). However, that still contains duplicates, and it seems those cannot be removed with LIMES, as there is no "less than" operator.


MSherif commented on September 27, 2024

We added the new lessThan String measure. Please test it and close the issue if it is OK.


KonradHoeffner commented on September 27, 2024

Unfortunately it doesn't seem to work for me. Did I make a mistake with the combined metric? I don't really understand the documentation on what exactly MINUS, MAX and LESS_THAN output.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE LIMES SYSTEM "limes.dtd">
<LIMES>
	<PREFIX>
		<NAMESPACE>http://hitontology.eu/ontology/</NAMESPACE>
		<LABEL>hito</LABEL>
	</PREFIX>
	<PREFIX>
		<NAMESPACE>http://www.w3.org/1999/02/22-rdf-syntax-ns#</NAMESPACE>
		<LABEL>rdf</LABEL>
	</PREFIX>
	<PREFIX>
		<NAMESPACE>http://www.w3.org/2000/01/rdf-schema#</NAMESPACE>
		<LABEL>rdfs</LABEL>
	</PREFIX>
	<PREFIX>
		<NAMESPACE>http://www.w3.org/2002/07/owl#</NAMESPACE>
		<LABEL>owl</LABEL>
	</PREFIX>
	<PREFIX>
		<NAMESPACE>http://www.w3.org/2004/02/skos/core#</NAMESPACE>
		<LABEL>skos</LABEL>
	</PREFIX>
	
	<SOURCE>
		<ID>c1</ID>
		<ENDPOINT>https://hitontology.eu/sparql</ENDPOINT>
		<VAR>?c1</VAR>
		<PAGESIZE>-1</PAGESIZE>
		<RESTRICTION>?c1 a hito:FeatureClassified</RESTRICTION>
		<PROPERTY>rdfs:label AS nolang->lowercase->regularalphabet RENAME label</PROPERTY>
		<PROPERTY>hito:featureCatalogue RENAME cat</PROPERTY>
		<OPTIONAL_PROPERTY>rdfs:comment AS nolang->lowercase->regularalphabet RENAME comment</OPTIONAL_PROPERTY>
		<TYPE>SPARQL</TYPE>
	</SOURCE>

	<TARGET>
		<ID>c2</ID>
		<ENDPOINT>https://hitontology.eu/sparql</ENDPOINT>
		<VAR>?c2</VAR>
		<PAGESIZE>-1</PAGESIZE>
		<RESTRICTION>?c2 a hito:FeatureClassified</RESTRICTION>
		<PROPERTY>rdfs:label AS nolang->lowercase->regularalphabet RENAME label</PROPERTY>
		<PROPERTY>hito:featureCatalogue RENAME cat</PROPERTY>
		<OPTIONAL_PROPERTY>rdfs:comment AS nolang->lowercase->regularalphabet RENAME comment</OPTIONAL_PROPERTY>
		<TYPE>SPARQL</TYPE>
	</TARGET>

	<METRIC>MINUS(MAX(MAX(TRIGRAMS(c1.label,c2.label),TRIGRAMS(c1.label,c2.comment)),TRIGRAMS(c1.comment,c2.comment))|0.5,LESS_THAN(c1.cat,c2.cat)|1)</METRIC>

	<ACCEPTANCE>
		<THRESHOLD>1</THRESHOLD>
		<FILE>catalogue-exact.ttl</FILE>
		<RELATION>skos:closeMatch</RELATION>
	</ACCEPTANCE>
	
	<REVIEW>
		<THRESHOLD>0.5</THRESHOLD>
		<FILE>catalogue-close.ttl</FILE>
		<RELATION>skos:closeMatch</RELATION>
	</REVIEW>

	<EXECUTION>
		<REWRITER>default</REWRITER>
		<PLANNER>default</PLANNER>
		<ENGINE>default</ENGINE>
	</EXECUTION>

	<OUTPUT>CSV</OUTPUT>
</LIMES>

Despite saying that c1.cat should be less than c2.cat, the resulting catalogue-close.ttl still contains symmetric pairs:

<http://hitontology.eu/ontology/WhoDhiSelfMonitoringOfHealthOrDiagnosticDataByClient>   <http://hitontology.eu/ontology/WhoDhiRemoteMonitoringOfClientHealthOrDiagnosticDataByProvider> 0.618421052631579
<http://hitontology.eu/ontology/WhoDhiNonRoutineDataCollectionAndManagement>    <http://hitontology.eu/ontology/WhoDhiRoutineHealthIndicatorDataCollectionAndManagement>    0.6129032258064516
<http://hitontology.eu/ontology/WhoDhiManageCertificationregistrationOfHealthcareProviders> <http://hitontology.eu/ontology/WhoDhiMapLocationOfHealthcareProviders> 0.5245901639344263
<http://hitontology.eu/ontology/WhoDhiMapLocationOfHealthcareProviders> <http://hitontology.eu/ontology/WhoDhiManageCertificationregistrationOfHealthcareProviders> 0.5245901639344263
<http://hitontology.eu/ontology/WhoDhiRemoteMonitoringOfClientHealthOrDiagnosticDataByProvider> <http://hitontology.eu/ontology/WhoDhiSelfMonitoringOfHealthOrDiagnosticDataByClient>   0.618421052631579
<http://hitontology.eu/ontology/WhoDhiTransmitNonroutineHealthEventAlertsToHealthcareProviders> <http://hitontology.eu/ontology/WhoDhiTransmitRoutinePayrollPaymentToHealthcareProviders>   0.5540540540540541
<http://hitontology.eu/ontology/WhoDhiTransmitRoutinePayrollPaymentToHealthcareProviders>   <http://hitontology.eu/ontology/WhoDhiTransmitNonroutineHealthEventAlertsToHealthcareProviders> 0.5540540540540541
<http://hitontology.eu/ontology/WhoDhiTransmitOrManageIncentivesToHealthcareProviders>  <http://hitontology.eu/ontology/WhoDhiTransmitOrManageIncentivesToClientsForHealthServices> 0.52
<http://hitontology.eu/ontology/WhoDhiRoutineHealthIndicatorDataCollectionAndManagement>    <http://hitontology.eu/ontology/WhoDhiNonRoutineDataCollectionAndManagement>    0.6129032258064516
<http://hitontology.eu/ontology/WhoDhiTransmitOrManageIncentivesToClientsForHealthServices> <http://hitontology.eu/ontology/WhoDhiTransmitOrManageIncentivesToHealthcareProviders>  0.52
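
One possible reading of this output, sketched below in Python: if both resources of a pair have the same catalogue value (as the shared WhoDhi prefix suggests, though the thread does not confirm it), a strict comparison fails in both directions, so MINUS removes neither direction. The sketch assumes that LESS_THAN yields 1 only when c1.cat is strictly smaller than c2.cat; the resource names and catalogue values are made up.

```python
# Hypothetical model of MINUS(similar|0.5, LESS_THAN(c1.cat, c2.cat)|1).
# Assumption: LESS_THAN yields 1 for a pair only if c1.cat < c2.cat (strict).

catalogue = {"A": "cat1", "B": "cat1", "C": "cat2"}  # made-up resources

# Links whose text similarity reached the 0.5 threshold, in both directions.
similar = {("A", "B"): 0.52, ("B", "A"): 0.52,
           ("A", "C"): 0.61, ("C", "A"): 0.61}

# Links for which the strict comparison on the catalogue holds.
less_than = {(s, t): 1.0 for s in catalogue for t in catalogue
             if catalogue[s] < catalogue[t]}

# MINUS keeps the links of `similar` that are absent from `less_than`.
result = {link: sim for link, sim in similar.items() if link not in less_than}
print(result)
```

Under these assumptions the cross-catalogue pair survives in one direction only, (C, A), while the same-catalogue pair survives as both (A, B) and (B, A).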


MSherif commented on September 27, 2024

Please try <METRIC>MIN(MAX(MAX(TRIGRAMS(c1.label,c2.label),TRIGRAMS(c1.label,c2.comment)),TRIGRAMS(c1.comment,c2.comment))|0.5,LESS_THAN(c1.cat,c2.cat)|1)</METRIC>


KonradHoeffner commented on September 27, 2024

Thank you for the detailed explanation, this is extremely helpful! Could you add this to the official documentation at http://dice-group.github.io/LIMES/#/user_manual/configuration_file/defining_link_specifications?id=boolean-operations? I know what minimum, maximum and set difference are, but the interaction with the thresholds was not clear to me. However, what I still don't know is: What is the similarity score output of the MINUS operator? The one from the first parameter? And what if something is below the threshold?


KonradHoeffner commented on September 27, 2024

Unfortunately, <METRIC>MIN(MAX(MAX(TRIGRAMS(c1.label,c2.label),TRIGRAMS(c1.label,c2.comment)),TRIGRAMS(c1.comment,c2.comment))|0.5,LESS_THAN(c1.cat,c2.cat)|1)</METRIC> does not do the trick. If I replace this in the full specification given above (you can run it yourself to verify if you want), it gives a bunch of identical results:

<http://hitontology.eu/ontology/EhrSfmSupportForHealthMaintenancePreventativeCareAndWellness>   <http://hitontology.eu/ontology/EhrSfmSupportForHealthMaintenancePreventativeCareAndWellness>   1.0
<http://hitontology.eu/ontology/EhrSfmSupportForResearchProtocolsRelativeToIndividualPatientCare>   <http://hitontology.eu/ontology/EhrSfmSupportForResearchProtocolsRelativeToIndividualPatientCare>   1.0
<http://hitontology.eu/ontology/BbDisplayVitalParametersFromMonitoringDevices>  <http://hitontology.eu/ontology/BbDisplayVitalParametersFromMonitoringDevices>  1.0
<http://hitontology.eu/ontology/WhoDhiTargetedClientCommunication>  <http://hitontology.eu/ontology/WhoDhiTargetedClientCommunication>  1.0
[...]

However, this should not be possible: for example, http://hitontology.eu/ontology/EhrSfmSupportForHealthMaintenancePreventativeCareAndWellness has only one catalogue, and that catalogue cannot be smaller than itself, as required by LESS_THAN(c1.cat,c2.cat).

Output of LIMES

$ limes test-sparql.xml                   
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
09:13:15.813 [main] [] INFO  org.aksw.limes.core.io.cache.HybridCache:125 - Checking for file /home/konrad/projekte/hito/ontology/scripts/limes/cache/-836321652.ser
09:13:15.821 [main] [] INFO  org.aksw.limes.core.io.cache.HybridCache:128 - Found cached data. Loading data from file /home/konrad/projekte/hito/ontology/scripts/limes/cache/-836321652.ser
09:13:15.859 [main] [] INFO  org.aksw.limes.core.io.cache.HybridCache:134 - Cached data loaded successfully from file /home/konrad/projekte/hito/ontology/scripts/limes/cache/-836321652.ser
09:13:15.860 [main] [] INFO  org.aksw.limes.core.io.cache.HybridCache:135 - Size = 618
09:13:15.860 [main] [] INFO  org.aksw.limes.core.io.cache.HybridCache:125 - Checking for file /home/konrad/projekte/hito/ontology/scripts/limes/cache/-1092215045.ser
09:13:15.860 [main] [] INFO  org.aksw.limes.core.io.cache.HybridCache:128 - Found cached data. Loading data from file /home/konrad/projekte/hito/ontology/scripts/limes/cache/-1092215045.ser
09:13:15.873 [main] [] INFO  org.aksw.limes.core.io.cache.HybridCache:134 - Cached data loaded successfully from file /home/konrad/projekte/hito/ontology/scripts/limes/cache/-1092215045.ser
09:13:15.874 [main] [] INFO  org.aksw.limes.core.io.cache.HybridCache:135 - Size = 618
09:13:16.205 [main] [] WARN  org.apache.sis.system:228 - The “SIS_DATA” environment variable is not set.
09:13:17.171 [main] [] INFO  org.aksw.limes.core.controller.Controller:237 - Mapping task finished in 1218 ms
09:13:17.175 [main] [] INFO  org.aksw.limes.core.controller.Controller:241 - Mapping size: 620 (accepted) + 1520 (need verification) = 2140 (total)
09:13:17.176 [main] [] INFO  org.aksw.limes.core.controller.Controller:108 - Writing result files...
09:13:17.176 [main] [] INFO  org.aksw.limes.core.io.serializer.SerializerFactory:32 - Getting serializer with name CSV
09:13:17.199 [main] [] INFO  org.aksw.limes.core.controller.Controller:111 - Writing statistics file...
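
Until the metric behaves as intended, self-links and symmetric duplicates could also be filtered after the run. The following is a minimal sketch of such a post-processing step, independent of LIMES. It assumes each output line holds a source URI, a target URI and a score separated by whitespace, as in the listings above; the function name and the example.org URIs are hypothetical.

```python
# Hypothetical post-processing step, independent of LIMES.
# Assumes each line is "<source> <target> score" separated by whitespace.

def dedupe_links(lines):
    """Drop self-links and keep one direction of each symmetric pair."""
    seen = set()
    kept = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        source, target, score = parts
        if source == target:
            continue  # self-link
        key = frozenset((source, target))
        if key in seen:
            continue  # the reverse direction was already kept
        seen.add(key)
        kept.append((source, target, float(score)))
    return kept

sample = [
    "<http://example.org/a> <http://example.org/b> 0.52",
    "<http://example.org/b> <http://example.org/a> 0.52",
    "<http://example.org/c> <http://example.org/c> 1.0",
]
print(dedupe_links(sample))
```

The first direction encountered is the one that is kept, so the result depends on the order of the input lines.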


MSherif commented on September 27, 2024

Quoting KonradHoeffner: "Thank you for the detailed explanation, this is extremely helpful! Could you add this to the official documentation at http://dice-group.github.io/LIMES/#/user_manual/configuration_file/defining_link_specifications?id=boolean-operations? I know what minimum, maximum, and set differences are but the interaction with the thresholds was not clear to me. However what I still don't know is: What is the similarity score output of the MINUS operator? The ones from the first parameter? And what if something is below the threshold?"

Actually, MIN(m1, m2) returns the entries (i.e., links) with the minimum similarity across m1 and m2, where an entry missing from either mapping is assumed to have a similarity of 0. Therefore, if a link l exists only in m1, for instance, we consider m2 to contain the same link l with a similarity of 0. We then do not return l, as its minimum similarity would be 0. MAX(m1, m2) treats missing entries the same way.
MINUS(m1, m2) returns only the links of m1 that do not exist in m2, each with its similarity from m1.
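
The interaction with thresholds can be sketched in Python as follows. The sketch assumes that an operand written as measure|t keeps only links whose similarity is at least t, so a link below the threshold is simply absent from that operand and is treated like any other missing link. The function names and sample scores are hypothetical, not taken from LIMES.

```python
# Illustrative model only; names are hypothetical, not the LIMES API.
# Assumption: "measure|t" keeps only links whose similarity is at least t.

def apply_threshold(mapping, t):
    return {link: sim for link, sim in mapping.items() if sim >= t}

def mapping_min_zero_default(m1, m2):
    """MIN with missing links counted as 0; links that end up at 0 are dropped."""
    links = m1.keys() | m2.keys()
    combined = {link: min(m1.get(link, 0.0), m2.get(link, 0.0)) for link in links}
    return {link: sim for link, sim in combined.items() if sim > 0.0}

trigrams = {("a", "b"): 0.62, ("b", "a"): 0.62, ("a", "c"): 0.3}
less_than = {("a", "b"): 1.0, ("a", "c"): 1.0}

result = mapping_min_zero_default(apply_threshold(trigrams, 0.5),
                                  apply_threshold(less_than, 1.0))
print(result)
```

Here ("a", "c") is dropped because its trigram score is below 0.5, and ("b", "a") is dropped because it is missing from the second operand, leaving only ("a", "b") with the smaller of its two scores.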


MSherif commented on September 27, 2024

Done updating the LIMES docs
