Comments (9)
Min(m1, m2)
Computes the intersection of the two mappings m1 and m2. In case an entry (i.e., link) exists in both mappings, the minimal similarity is taken.
Max(m1, m2)
Computes the union of the two mappings m1 and m2. In case an entry (i.e., link) exists in both mappings, the maximal similarity is taken.
MINUS(m1, m2)
Computes the difference of two mappings, i.e., the set difference m1 - m2.
from limes.
Thanks to @MSherif I had partial success with MINUS(TRIGRAMS(c1.label,c2.label)|0.5,EXACTMATCH(c1.x,c2.y)|1).
However, that still contains duplicates, and it seems they cannot be removed with LIMES because there is no "less than" operator.
from limes.
We added the new lessThan string measure. Please test it and close the issue if it works for you.
from limes.
Unfortunately it doesn't seem to work for me. Did I make a mistake with the combined metric? I don't really understand the documentation on what exactly MINUS, MAX and LESS_THAN output.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE LIMES SYSTEM "limes.dtd">
<LIMES>
    <PREFIX>
        <NAMESPACE>http://hitontology.eu/ontology/</NAMESPACE>
        <LABEL>hito</LABEL>
    </PREFIX>
    <PREFIX>
        <NAMESPACE>http://www.w3.org/1999/02/22-rdf-syntax-ns#</NAMESPACE>
        <LABEL>rdf</LABEL>
    </PREFIX>
    <PREFIX>
        <NAMESPACE>http://www.w3.org/2000/01/rdf-schema#</NAMESPACE>
        <LABEL>rdfs</LABEL>
    </PREFIX>
    <PREFIX>
        <NAMESPACE>http://www.w3.org/2002/07/owl#</NAMESPACE>
        <LABEL>owl</LABEL>
    </PREFIX>
    <PREFIX>
        <NAMESPACE>http://www.w3.org/2004/02/skos/core#</NAMESPACE>
        <LABEL>skos</LABEL>
    </PREFIX>
    <SOURCE>
        <ID>c1</ID>
        <ENDPOINT>https://hitontology.eu/sparql</ENDPOINT>
        <VAR>?c1</VAR>
        <PAGESIZE>-1</PAGESIZE>
        <RESTRICTION>?c1 a hito:FeatureClassified</RESTRICTION>
        <PROPERTY>rdfs:label AS nolang->lowercase->regularalphabet RENAME label</PROPERTY>
        <PROPERTY>hito:featureCatalogue RENAME cat</PROPERTY>
        <OPTIONAL_PROPERTY>rdfs:comment AS nolang->lowercase->regularalphabet RENAME comment</OPTIONAL_PROPERTY>
        <TYPE>SPARQL</TYPE>
    </SOURCE>
    <TARGET>
        <ID>c2</ID>
        <ENDPOINT>https://hitontology.eu/sparql</ENDPOINT>
        <VAR>?c2</VAR>
        <PAGESIZE>-1</PAGESIZE>
        <RESTRICTION>?c2 a hito:FeatureClassified</RESTRICTION>
        <PROPERTY>rdfs:label AS nolang->lowercase->regularalphabet RENAME label</PROPERTY>
        <PROPERTY>hito:featureCatalogue RENAME cat</PROPERTY>
        <OPTIONAL_PROPERTY>rdfs:comment AS nolang->lowercase->regularalphabet RENAME comment</OPTIONAL_PROPERTY>
        <TYPE>SPARQL</TYPE>
    </TARGET>
    <METRIC>MINUS(MAX(MAX(TRIGRAMS(c1.label,c2.label),TRIGRAMS(c1.label,c2.comment)),TRIGRAMS(c1.comment,c2.comment))|0.5,LESS_THAN(c1.cat,c2.cat)|1)</METRIC>
    <ACCEPTANCE>
        <THRESHOLD>1</THRESHOLD>
        <FILE>catalogue-exact.ttl</FILE>
        <RELATION>skos:closeMatch</RELATION>
    </ACCEPTANCE>
    <REVIEW>
        <THRESHOLD>0.5</THRESHOLD>
        <FILE>catalogue-close.ttl</FILE>
        <RELATION>skos:closeMatch</RELATION>
    </REVIEW>
    <EXECUTION>
        <REWRITER>default</REWRITER>
        <PLANNER>default</PLANNER>
        <ENGINE>default</ENGINE>
    </EXECUTION>
    <OUTPUT>CSV</OUTPUT>
</LIMES>
Despite saying that c1.cat should be less than c2.cat, the resulting catalogue-close.ttl still contains symmetric pairs:
<http://hitontology.eu/ontology/WhoDhiSelfMonitoringOfHealthOrDiagnosticDataByClient> <http://hitontology.eu/ontology/WhoDhiRemoteMonitoringOfClientHealthOrDiagnosticDataByProvider> 0.618421052631579
<http://hitontology.eu/ontology/WhoDhiNonRoutineDataCollectionAndManagement> <http://hitontology.eu/ontology/WhoDhiRoutineHealthIndicatorDataCollectionAndManagement> 0.6129032258064516
<http://hitontology.eu/ontology/WhoDhiManageCertificationregistrationOfHealthcareProviders> <http://hitontology.eu/ontology/WhoDhiMapLocationOfHealthcareProviders> 0.5245901639344263
<http://hitontology.eu/ontology/WhoDhiMapLocationOfHealthcareProviders> <http://hitontology.eu/ontology/WhoDhiManageCertificationregistrationOfHealthcareProviders> 0.5245901639344263
<http://hitontology.eu/ontology/WhoDhiRemoteMonitoringOfClientHealthOrDiagnosticDataByProvider> <http://hitontology.eu/ontology/WhoDhiSelfMonitoringOfHealthOrDiagnosticDataByClient> 0.618421052631579
<http://hitontology.eu/ontology/WhoDhiTransmitNonroutineHealthEventAlertsToHealthcareProviders> <http://hitontology.eu/ontology/WhoDhiTransmitRoutinePayrollPaymentToHealthcareProviders> 0.5540540540540541
<http://hitontology.eu/ontology/WhoDhiTransmitRoutinePayrollPaymentToHealthcareProviders> <http://hitontology.eu/ontology/WhoDhiTransmitNonroutineHealthEventAlertsToHealthcareProviders> 0.5540540540540541
<http://hitontology.eu/ontology/WhoDhiTransmitOrManageIncentivesToHealthcareProviders> <http://hitontology.eu/ontology/WhoDhiTransmitOrManageIncentivesToClientsForHealthServices> 0.52
<http://hitontology.eu/ontology/WhoDhiRoutineHealthIndicatorDataCollectionAndManagement> <http://hitontology.eu/ontology/WhoDhiNonRoutineDataCollectionAndManagement> 0.6129032258064516
<http://hitontology.eu/ontology/WhoDhiTransmitOrManageIncentivesToClientsForHealthServices> <http://hitontology.eu/ontology/WhoDhiTransmitOrManageIncentivesToHealthcareProviders> 0.52
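One possible reading of this output, going by the documented set-difference meaning of MINUS: subtracting the LESS_THAN mapping removes the pairs where c1.cat is smaller and keeps everything else, including both directions of any pair whose catalogues are equal. The toy model below is plain Python, not LIMES code. The entity names, catalogue values, and the `minus` helper are made up for illustration, and treating LESS_THAN as a strict string comparison is an assumption.

```python
from itertools import product

# Hypothetical toy data (entity -> catalogue); not taken from the real endpoint.
catalogue = {"a": "cat1", "b": "cat1", "c": "cat2"}

# Stand-in for the thresholded trigram mapping: every ordered pair of
# distinct entities is assumed to be similar enough to pass.
similar = {(s, t): 0.6 for s, t in product(catalogue, repeat=2) if s != t}

# Assumed behaviour of LESS_THAN(c1.cat,c2.cat)|1: a link exists when the
# source catalogue sorts strictly before the target catalogue.
less_than = {
    (s, t): 1.0
    for s, t in product(catalogue, repeat=2)
    if catalogue[s] < catalogue[t]
}

def minus(m1, m2):
    """Set difference: links of m1 that do not occur in m2, with m1's scores."""
    return {link: sim for link, sim in m1.items() if link not in m2}

result = minus(similar, less_than)
for link in sorted(result):
    print(link, result[link])
```

In this toy run the pairs (a, b) and (b, a) both survive, because a and b share a catalogue and so neither direction is in the LESS_THAN mapping. If LIMES behaves the same way, that would explain the mirrored WhoDhi pairs above.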
from limes.
Please try <METRIC>MIN(MAX(MAX(TRIGRAMS(c1.label,c2.label),TRIGRAMS(c1.label,c2.comment)),TRIGRAMS(c1.comment,c2.comment))|0.5,LESS_THAN(c1.cat,c2.cat)|1)</METRIC>
from limes.
Thank you for the detailed explanation, this is extremely helpful! Could you add this to the official documentation at http://dice-group.github.io/LIMES/#/user_manual/configuration_file/defining_link_specifications?id=boolean-operations? I know what minimum, maximum and set difference are but the interaction with the thresholds was not clear to me. However what I still don't know is: What is the similarity score output of the MINUS operator? The ones from the first parameter? And what if something is below the threshold?
from limes.
Unfortunately, <METRIC>MIN(MAX(MAX(TRIGRAMS(c1.label,c2.label),TRIGRAMS(c1.label,c2.comment)),TRIGRAMS(c1.comment,c2.comment))|0.5,LESS_THAN(c1.cat,c2.cat)|1)</METRIC> does not do the trick. If I use this metric in the full specification given above (you can run it yourself to verify if you want), it gives a bunch of results that link each resource to itself:
<http://hitontology.eu/ontology/EhrSfmSupportForHealthMaintenancePreventativeCareAndWellness> <http://hitontology.eu/ontology/EhrSfmSupportForHealthMaintenancePreventativeCareAndWellness> 1.0
<http://hitontology.eu/ontology/EhrSfmSupportForResearchProtocolsRelativeToIndividualPatientCare> <http://hitontology.eu/ontology/EhrSfmSupportForResearchProtocolsRelativeToIndividualPatientCare> 1.0
<http://hitontology.eu/ontology/BbDisplayVitalParametersFromMonitoringDevices> <http://hitontology.eu/ontology/BbDisplayVitalParametersFromMonitoringDevices> 1.0
<http://hitontology.eu/ontology/WhoDhiTargetedClientCommunication> <http://hitontology.eu/ontology/WhoDhiTargetedClientCommunication> 1.0
[...]
However, this should not be possible, because for example http://hitontology.eu/ontology/EhrSfmSupportForHealthMaintenancePreventativeCareAndWellness only has one catalogue, and this cannot be smaller than itself, as specified in LESS_THAN(c1.cat,c2.cat).
Output of LIMES
$ limes test-sparql.xml
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
09:13:15.813 [main] [] INFO org.aksw.limes.core.io.cache.HybridCache:125 - Checking for file /home/konrad/projekte/hito/ontology/scripts/limes/cache/-836321652.ser
09:13:15.821 [main] [] INFO org.aksw.limes.core.io.cache.HybridCache:128 - Found cached data. Loading data from file /home/konrad/projekte/hito/ontology/scripts/limes/cache/-836321652.ser
09:13:15.859 [main] [] INFO org.aksw.limes.core.io.cache.HybridCache:134 - Cached data loaded successfully from file /home/konrad/projekte/hito/ontology/scripts/limes/cache/-836321652.ser
09:13:15.860 [main] [] INFO org.aksw.limes.core.io.cache.HybridCache:135 - Size = 618
09:13:15.860 [main] [] INFO org.aksw.limes.core.io.cache.HybridCache:125 - Checking for file /home/konrad/projekte/hito/ontology/scripts/limes/cache/-1092215045.ser
09:13:15.860 [main] [] INFO org.aksw.limes.core.io.cache.HybridCache:128 - Found cached data. Loading data from file /home/konrad/projekte/hito/ontology/scripts/limes/cache/-1092215045.ser
09:13:15.873 [main] [] INFO org.aksw.limes.core.io.cache.HybridCache:134 - Cached data loaded successfully from file /home/konrad/projekte/hito/ontology/scripts/limes/cache/-1092215045.ser
09:13:15.874 [main] [] INFO org.aksw.limes.core.io.cache.HybridCache:135 - Size = 618
09:13:16.205 [main] [] WARN org.apache.sis.system:228 - The “SIS_DATA” environment variable is not set.
09:13:17.171 [main] [] INFO org.aksw.limes.core.controller.Controller:237 - Mapping task finished in 1218 ms
09:13:17.175 [main] [] INFO org.aksw.limes.core.controller.Controller:241 - Mapping size: 620 (accepted) + 1520 (need verification) = 2140 (total)
09:13:17.176 [main] [] INFO org.aksw.limes.core.controller.Controller:108 - Writing result files...
09:13:17.176 [main] [] INFO org.aksw.limes.core.io.serializer.SerializerFactory:32 - Getting serializer with name CSV
09:13:17.199 [main] [] INFO org.aksw.limes.core.controller.Controller:111 - Writing statistics file...
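Until the metric itself behaves as intended, the self-links and mirrored pairs can be removed in a post-processing step outside LIMES. The sketch below is a workaround, not a LIMES feature. It assumes the whitespace-separated `<source> <target> score` layout of the result lines quoted in this thread, and the function names are hypothetical.

```python
def parse_line(line):
    """Split one '<source> <target> score' result line into a tuple."""
    source, target, score = line.split()
    return source.strip("<>"), target.strip("<>"), float(score)

def drop_reflexive_and_symmetric(links):
    """Remove self-links and keep one link per unordered pair of resources."""
    kept = {}
    for source, target, score in links:
        if source == target:
            continue  # a resource linked to itself carries no information
        # (a, b) and (b, a) map to the same key, ordered by URI
        key = tuple(sorted((source, target)))
        if key not in kept or score > kept[key]:
            kept[key] = score
    return [(s, t, score) for (s, t), score in sorted(kept.items())]

# Made-up example lines in the same layout as the output above.
lines = [
    "<http://example.org/A> <http://example.org/A> 1.0",
    "<http://example.org/A> <http://example.org/B> 0.52",
    "<http://example.org/B> <http://example.org/A> 0.52",
]
cleaned = drop_reflexive_and_symmetric(parse_line(line) for line in lines)
for source, target, score in cleaned:
    print(f"<{source}> <{target}> {score}")
```

Note that each surviving link is oriented by URI order, not by catalogue, so the direction may differ from what LESS_THAN(c1.cat,c2.cat) was meant to enforce.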
from limes.
Thank you for the detailed explanation, this is extremely helpful! Could you add this to the official documentation at http://dice-group.github.io/LIMES/#/user_manual/configuration_file/defining_link_specifications?id=boolean-operations? I know what minimum, maximum, and set differences are but the interaction with the thresholds was not clear to me. However what I still don't know is: What is the similarity score output of the MINUS operator? The ones from the first parameter? And what if something is below the threshold?
Actually, MIN(m1, m2) returns the entries (i.e., links) with the minimum of their similarities in m1 and m2, where an entry that is missing from m1 or m2 is assumed to have a similarity of 0 there. Therefore, if a link l exists only in m1, for instance, we consider m2 to contain the same link l with a similarity of 0, and we do not return l, as it would have the minimum similarity of 0. MAX(m1, m2) treats missing entries the same way, taking the maximum instead.
MINUS(m1,m2) returns only those links from m1, with their respective similarities, that do not exist in m2.
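The description above can be modelled with plain dictionaries that map a link (a source/target pair) to its similarity. This is a minimal sketch of the stated semantics, not the LIMES implementation. The function names and sample values are made up, and thresholds are left out because the thread does not spell out how they are applied.

```python
def op_min(m1, m2):
    """MIN: a missing link counts as similarity 0, and links whose
    minimum is 0 are not returned."""
    result = {}
    for link in m1.keys() | m2.keys():
        sim = min(m1.get(link, 0.0), m2.get(link, 0.0))
        if sim > 0:
            result[link] = sim
    return result

def op_max(m1, m2):
    """MAX: a missing link counts as similarity 0, so a link found in
    only one mapping keeps the similarity it has there."""
    result = {}
    for link in m1.keys() | m2.keys():
        sim = max(m1.get(link, 0.0), m2.get(link, 0.0))
        if sim > 0:
            result[link] = sim
    return result

def op_minus(m1, m2):
    """MINUS: links of m1 that do not exist in m2, with m1's similarities."""
    return {link: sim for link, sim in m1.items() if link not in m2}

# Made-up sample mappings.
m1 = {("a", "b"): 0.6, ("a", "c"): 0.9}
m2 = {("a", "b"): 1.0, ("b", "c"): 1.0}

print(op_min(m1, m2))    # only the link present in both mappings
print(op_max(m1, m2))    # every link from either mapping
print(op_minus(m1, m2))  # links of m1 that m2 does not contain
```

Under this model the scores returned by MINUS always come from the first argument, which matches the answer to the question about MINUS output above.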
from limes.
Done updating the LIMES docs
from limes.