semiceu / iso-19139-to-dcat-ap
Reference XSLT-based implementation of GeoDCAT-AP
License: European Union Public License 1.2
Dear Andrea,
I hope you are all fine. I have some questions about how you create (or perhaps do not yet create) the Catalog class from INSPIRE records.
As I see the way we will implement the conversion tools from INSPIRE to DCAT, there will be two ways to create DCAT-AP feeds from INSPIRE records:
In the first case, to be compliant with the DCAT(-AP) specifications, you still need a Catalog class. Using your API, I found that the generated feed contains no Catalog when you convert a single INSPIRE record describing a dataset into a DCAT feed.
So my first question is: which semantic information would you recommend using to populate that class? The information provided by a CSW giving access to that record, even though the CSW is much broader than the generated Catalog class? Some hard-coded information? Or do you not recommend creating a DCAT feed from a single INSPIRE record?
In the second case, I read in Mappings.md (https://github.com/SEMICeu/iso-19139-to-dcat-ap/blob/master/documentation/Mappings.md) that you create the Catalog Record from the INSPIRE record describing the CSW (or maybe I did not understand that file well enough). I would have done it the same way. I still have a second, specific question about creating the Catalog in that case: how would you populate the optional dct:identifier property of that class?
There are no resource identifiers in INSPIRE service records, and even though resource identifiers are possible for services under ISO 19115, I would not use that element to populate the dct:identifier of the Catalog class. I could use that ISO 19115 element to populate the dct:identifier of Data Service classes created from INSPIRE records describing WMS/WFS/ATOM/... services, because the resources are the same: a WMS described in a DCAT feed or in an ISO record is still the same WMS, so the ISO identifier can populate the dct:identifier of the Data Service class. But I would not use that element for the dct:identifier of a Catalog class, because a CSW is really different from the Catalog class of a DCAT feed. So, in that case, how would you identify the Catalog class in the DCAT feed? With the access URL of the published DCAT feed?
I hope my questions are specific enough. We have been thinking about these points for a while and I wanted to get your opinion on them.
Regards,
Benoît
Hi,
I used the GeoDCAT-AP API to convert a CSW GetRecordById request. I tested the output in the DCAT-AP validator, and most of the errors are explained in #22.
I cannot find an explanation of the following error, however:
Value must be an instance of vcard:Kind | [Result path] - [http://www.w3.org/ns/dcat#contactPoint]
I have a valid representation of the contact point in my original metadata. Does anyone know what this error refers to?
Test file:
metadata_geodcat.zip
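The error means that the object of dcat:contactPoint is not recognised as a vcard:Kind. The Python sketch below (standard library only; `untyped_contact_points` is a hypothetical helper) lists the dcat:contactPoint values in an RDF/XML output that carry no vCard class. It accepts vcard:Kind and its subclasses Individual, Organization, Group and Location. Whether a given SHACL validator also accepts the subclasses depends on whether it has the vCard ontology loaded, so that part is an assumption. It handles the three RDF/XML shapes a contact point can take: a nested node element, an rdf:resource reference, and rdf:parseType="Resource".

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCAT = "http://www.w3.org/ns/dcat#"
VCARD = "http://www.w3.org/2006/vcard/ns#"

# vcard:Kind plus its subclasses in the vCard ontology (assumed acceptable to the validator)
KIND_CLASSES = {VCARD + c for c in ("Kind", "Individual", "Organization", "Group", "Location")}


def node_types(node):
    """Collect the class URIs of an RDF/XML node element."""
    types = set()
    # A typed node element such as <vcard:Organization> states its own class
    if node.tag != f"{{{RDF}}}Description":
        types.add(node.tag.replace("{", "").replace("}", ""))
    for t in node.findall(f"{{{RDF}}}type"):
        types.add(t.get(f"{{{RDF}}}resource"))
    return types


def untyped_contact_points(rdf_xml):
    """Return the dcat:contactPoint elements whose value has no vCard Kind class."""
    root = ET.fromstring(rdf_xml)
    # Index nodes by rdf:about so that rdf:resource references can be resolved
    by_about = {}
    for el in root.iter():
        about = el.get(f"{{{RDF}}}about")
        if about:
            by_about.setdefault(about, []).append(el)
    problems = []
    for cp in root.iter(f"{{{DCAT}}}contactPoint"):
        ref = cp.get(f"{{{RDF}}}resource")
        if cp.get(f"{{{RDF}}}parseType") == "Resource":
            # Anonymous node: its rdf:type children sit directly under the property
            types = {t.get(f"{{{RDF}}}resource") for t in cp.findall(f"{{{RDF}}}type")}
        elif ref:
            types = set()
            for node in by_about.get(ref, []):
                types |= node_types(node)
        else:
            types = set()
            for node in cp:  # nested node element
                types |= node_types(node)
        if not types & KIND_CLASSES:
            problems.append(cp)
    return problems
```

A contact point typed only as foaf:Organization, for instance, would be reported by this check.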
When applying the XSLT transformation in oXygen, I got a validation error on the XSL file.
There seems to be a "Duplicate global variable declaration": the variable is geojsonMediaTypeUri, at line #348.
What’s the right one to keep?
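XSLT treats two top-level variable bindings with the same name and the same import precedence as an error, so a compliant processor such as the one in oXygen rejects the stylesheet. Below is a small sketch for spotting such duplicates. It assumes the whole stylesheet is in one file, so xsl:include and xsl:import modules are not followed. `duplicate_global_variables` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XSL = "http://www.w3.org/1999/XSL/Transform"


def duplicate_global_variables(xslt_source):
    """Return names declared more than once as top-level xsl:variable or xsl:param."""
    root = ET.fromstring(xslt_source)  # accepts str or bytes
    names = [
        child.get("name")
        # Only direct children of xsl:stylesheet / xsl:transform are global
        for child in root
        if child.tag in (f"{{{XSL}}}variable", f"{{{XSL}}}param")
    ]
    return sorted(name for name, count in Counter(names).items() if count > 1)
```

Usage: `with open("iso-19139-to-dcat-ap.xsl", "rb") as f: print(duplicate_global_variables(f.read()))`. The check only finds the clash. You still have to compare the two `select` values by hand to decide which declaration to keep.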
No license is added, due to a typo (a missing 's') in
iso-19139-to-dcat-ap/iso-19139-to-dcat-ap.xsl
Line 2444 in e7aa2d8
Dear @andrea-perego
I hope you are all right and I wish you a very pleasant year.
I have a question about the dcat:themeTaxonomy property of dcat:Catalog. I cannot find it in the RDF/XML I created with your API, and I cannot find any occurrence of that property in the XSLT documented here. Does that mean you have not implemented that mapping (yet), or is it something else?
Regards,
Benoît
The Swedish validator fails at
iso-19139-to-dcat-ap/iso-19139-to-dcat-ap.xsl
Line 1507 in e7aa2d8
It expects vcard:hasTelephone to be encoded as in https://www.w3.org/TR/vcard-rdf/#Examples:
<vcard:hasTelephone rdf:parseType="Resource">
<vcard:hasValue rdf:resource="tel:+61755555555"/>
<rdf:type rdf:resource="http://www.w3.org/2006/vcard/ns#Home"/>
<rdf:type rdf:resource="http://www.w3.org/2006/vcard/ns#Voice"/>
</vcard:hasTelephone>
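The structure above puts the number in a tel: URI under vcard:hasValue and gives the telephone types as rdf:type. The sketch below builds that structure from a free-text ISO phone number with ElementTree. `tel_uri` and `has_telephone` are hypothetical helpers. The sketch assumes the source number is already in international form (leading '+'), because a national number cannot become a global tel: URI without knowing the country code.

```python
import re
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
VCARD = "http://www.w3.org/2006/vcard/ns#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("vcard", VCARD)


def tel_uri(raw):
    """Turn an international phone number into a tel: URI without separators."""
    # Drop visual separators: whitespace, parentheses, dots and dashes
    digits = re.sub(r"[\s().\-]", "", raw)
    if not re.fullmatch(r"\+\d+", digits):
        raise ValueError(f"not an international number: {raw!r}")
    return "tel:" + digits


def has_telephone(raw, kinds=("Voice",)):
    """Build a vcard:hasTelephone element with a nested value and rdf:type entries."""
    tel = ET.Element(f"{{{VCARD}}}hasTelephone", {f"{{{RDF}}}parseType": "Resource"})
    ET.SubElement(tel, f"{{{VCARD}}}hasValue", {f"{{{RDF}}}resource": tel_uri(raw)})
    for kind in kinds:
        ET.SubElement(tel, f"{{{RDF}}}type", {f"{{{RDF}}}resource": VCARD + kind})
    return tel
```

Usage: `print(ET.tostring(has_telephone("+61 7 5555 5555", kinds=("Home", "Voice"))).decode())` serialises the same triples as the example above.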
Dear Andrea,
I hope you are all fine. I would like to raise a question rather than report a real issue.
This one concerns how this API creates the dcat:Distribution classes used to describe direct download links of datasets. The API creates these classes from the resource locator INSPIRE metadata element (https://github.com/SEMICeu/iso-19139-to-dcat-ap/blob/master/documentation/Mappings.md#resource-metadata-specific-to-data-sets-and-data-set-series).
I have two concerns about using that metadata element to create dcat:Distribution classes:
On the one hand, that element is not mandatory for data sets or series. TG Recommendation 1.9 of the Technical Guidelines 2.0 recommends filling it with the direct download link, but this is not mandatory and is almost never used in the INSPIRE infrastructure. On the other hand, these direct download links must be provided in the INSPIRE ATOM feed files used by the INSPIRE portal.
Moreover, as you may know, it is almost impossible to describe a download link in a machine-readable way with the ISO 19139 CI_OnlineResource element (which encodes the resource locator). It is indeed impossible to specify, in a machine-readable way, the coordinate reference system of a zip file, its format or version, or whether the download link gives access to the full dataset or to a smaller subset (for huge datasets such as cadastral layers). These elements are specified much better in the mandatory INSPIRE ATOM feed files.
So my question is: did you consider using the INSPIRE ATOM feed files of a dataset to create the dcat:Distribution classes of that dataset?
As far as I can see, ISO 19139 alone will not help create well-structured and understandable dcat:Distribution classes.
Regards,
Benoît
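To make the idea concrete, here is a rough sketch that reads download links from an INSPIRE dataset ATOM feed into distribution-like records. It makes several assumptions that do not come from the XSLT. Each atom:entry is taken to be one pre-defined download, and its CRS is read from atom:category/@term. Download links are atom:link elements with rel "alternate" (the Atom default when rel is absent) or "section" (one part of a split dataset). The dictionary keys only mirror DCAT property names for readability, and `distributions_from_atom` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def distributions_from_atom(feed_xml):
    """Turn the entries of a dataset ATOM feed into distribution-like dicts."""
    root = ET.fromstring(feed_xml)
    result = []
    for entry in root.findall(f"{ATOM}entry"):
        # Assumed: CRS URIs are given as atom:category/@term
        crs = [c.get("term") for c in entry.findall(f"{ATOM}category")]
        for link in entry.findall(f"{ATOM}link"):
            rel = link.get("rel", "alternate")  # Atom default when rel is missing
            if rel not in ("alternate", "section"):
                continue  # e.g. describedby, self: not a download
            result.append({
                "downloadURL": link.get("href"),
                "mediaType": link.get("type"),
                "byteSize": link.get("length"),
                "language": link.get("hreflang"),
                "title": link.get("title"),
                "crs": crs,
                "isPart": rel == "section",
            })
    return result
```

Unlike CI_OnlineResource, each record here carries media type, size, language and CRS in separate, machine-readable fields. That is the point of the question above.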
I performed some tests of the XSLT aligned with the draft of GeoDCAT-AP 2.0.0.
Concerning conformity, the title of the specification is empty in the RDF output of the transformation when, in the INSPIRE XML file, that title is encoded using the gmd:title/gmx:Anchor element, as recommended in the INSPIRE TG (see TG Recs C.11 and 1.10).
For example, this metadata in ISO 19139
<gmd:report>
<gmd:DQ_DomainConsistency>
<gmd:result>
<gmd:DQ_ConformanceResult>
<gmd:specification>
<gmd:CI_Citation>
<gmd:title>
<gmx:Anchor xlink:href="http://data.europa.eu/eli/reg/2010/1089">REGOLAMENTO (UE) N. 1089/2010 DELLA COMMISSIONE del 23 novembre 2010 recante attuazione della direttiva 2007/2/CE del Parlamento europeo e del Consiglio per quanto riguarda l'interoperabilità dei set di dati territoriali e dei servizi di dati territoriali</gmx:Anchor>
</gmd:title>
<gmd:date>
<gmd:CI_Date>
<gmd:date>
<gco:Date>2010-12-08</gco:Date>
</gmd:date>
<gmd:dateType>
<gmd:CI_DateTypeCode codeList="http://standards.iso.org/iso/19139/resources/gmxCodelists.xml#CI_DateTypeCode" codeListValue="publication">publication</gmd:CI_DateTypeCode>
</gmd:dateType>
</gmd:CI_Date>
</gmd:date>
</gmd:CI_Citation>
</gmd:specification>
<gmd:explanation>
<gco:CharacterString>Fare riferimento alle specifiche indicate</gco:CharacterString>
</gmd:explanation>
<gmd:pass>
<gco:Boolean>false</gco:Boolean>
</gmd:pass>
</gmd:DQ_ConformanceResult>
</gmd:result>
</gmd:DQ_DomainConsistency>
</gmd:report>
is transformed into
<prov:Activity>
<prov:used rdf:resource="https://geodati.gov.it/resource/id/r_basili:52F1BC6F-7597-B988-8702-FAEB036ACBA7"/>
<prov:qualifiedAssociation rdf:parseType="Resource">
<prov:hadPlan rdf:parseType="Resource">
<prov:wasDerivedFrom rdf:parseType="Resource">
<dct:title xml:lang="it"/>
<dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2010-12-08</dct:issued>
</prov:wasDerivedFrom>
</prov:hadPlan>
</prov:qualifiedAssociation>
<prov:generated rdf:parseType="Resource">
<dct:type rdf:resource="http://inspire.ec.europa.eu/metadata-codelist/DegreeOfConformity/notConformant"/>
<dct:description xml:lang="it">Fare riferimento alle specifiche indicate</dct:description>
</prov:generated>
</prov:Activity>
where the title value is empty, and the related URI is not used either.
The same occurs with the title of the thesaurus when that title is encoded with the gmx:Anchor element but the keyword values are encoded with the gco:CharacterString element (I know this sounds bizarre, but it also happens). For example,
this metadata in ISO 19139
<gmd:descriptiveKeywords>
<gmd:MD_Keywords>
<gmd:keyword>
<gco:CharacterString>Gestione dell'acqua</gco:CharacterString>
</gmd:keyword>
<gmd:thesaurusName>
<gmd:CI_Citation>
<gmd:title>
<gmx:Anchor xlink:href="http://www.eionet.europa.eu/gemet/">GEMET - Concepts, version 2.4</gmx:Anchor>
</gmd:title>
<gmd:date>
<gmd:CI_Date>
<gmd:date>
<gco:Date>2010-01-13</gco:Date>
</gmd:date>
<gmd:dateType>
<gmd:CI_DateTypeCode codeList="http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/codelist/gmxCodelists.xml#CI_DateTypeCode" codeListValue="publication">Pubblicazione</gmd:CI_DateTypeCode>
</gmd:dateType>
</gmd:CI_Date>
</gmd:date>
</gmd:CI_Citation>
</gmd:thesaurusName>
</gmd:MD_Keywords>
</gmd:descriptiveKeywords>
is transformed into
<dcat:theme rdf:parseType="Resource">
<skos:prefLabel xml:lang="it">Gestione dell'acqua</skos:prefLabel>
<skos:inScheme>
<skos:ConceptScheme>
<dct:title xml:lang="it"/>
<dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2010-01-13</dct:issued>
</skos:ConceptScheme>
</skos:inScheme>
</dcat:theme>
where again the title value of the thesaurus is empty.
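In both examples the title is present in the source, but only as gmx:Anchor. Judging from the output alone (not from reading the XSLT), this suggests that the title mapping only looks for gco:CharacterString. The lookup the mapping needs can be sketched in Python as follows. It reads the text from either encoding and, for gmx:Anchor, also keeps the xlink:href, which could serve as the URI of the specification or concept scheme. `citation_title` is a hypothetical helper; the actual fix belongs in the XSLT templates.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
    "gmx": "http://www.isotc211.org/2005/gmx",
    "xlink": "http://www.w3.org/1999/xlink",
}


def citation_title(citation):
    """Return (text, uri) for the gmd:title of a gmd:CI_Citation element.

    Handles both gco:CharacterString (no URI) and gmx:Anchor (URI from xlink:href).
    """
    title = citation.find("gmd:title", NS)
    if title is None:
        return None, None
    cs = title.find("gco:CharacterString", NS)
    if cs is not None:
        return (cs.text or "").strip(), None
    anchor = title.find("gmx:Anchor", NS)
    if anchor is not None:
        return (anchor.text or "").strip(), anchor.get(f"{{{NS['xlink']}}}href")
    return None, None
```

Applied to the thesaurus citation above, this would give the GEMET title together with http://www.eionet.europa.eu/gemet/ instead of an empty dct:title.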
This is meant to align the XSLT with the mappings added to the current GeoDCAT-AP 2 draft.
We received a comment about the preferred use of
<vcard:Organization rdf:about="http://example.org/ga-courts#GA">
<rdfs:label xml:lang="en">Commonwealth of Australia (Geoscience Australia)</rdfs:label>
<vcard:country-name>Australia</vcard:country-name>
<vcard:email rdf:resource="mailto:[email protected]"/>
</vcard:Organization>
over
<rdf:Description rdf:about="https://resources.stockholm.se/code-list/organisation/Jordbruksverket/gis.support">
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Organization"/>
<vcard:fn xml:lang="sv">Jordbruksverket</vcard:fn>
<vcard:hasEmail rdf:resource="mailto:[email protected]"/>
</rdf:Description>
The second approach could create blank nodes; can you confirm this?
At
iso-19139-to-dcat-ap/iso-19139-to-dcat-ap.xsl
Line 1975 in e7aa2d8
iso-19139-to-dcat-ap/iso-19139-to-dcat-ap.xsl
Line 1992 in e7aa2d8
The EC will produce an extension of DCAT-AP for High-Value Datasets (HVD), DCAT-AP for HVD, by the end of 2023.
Do you plan to modify the XSLT so that ISO 19139 metadata can be transformed into DCAT-AP metadata that conforms to this new extension?
Currently, the XSLT output includes URIs for formats only if they are present in the source record.
The reason is twofold:
On the other hand, there are also reasons for supporting a text-to-URI mapping:
Looking at the geospatial records available from the European Data Portal, using URIs for file formats is far from being a common practice.
So, the proposal is to revise the XSLT to include a provisional mapping from textual labels to URIs, which can be phased out in the future. The textual labels to be mapped can be chosen among those most frequently used in geospatial metadata in the European Data Portal. The full list can be obtained via the following SPARQL queries:
Of course, this solution will not ensure that all distributions will have a format specified via a URI. But this is not the purpose of this revision / patch.
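Here is a sketch of such a provisional mapping, using hypothetical names. Labels are normalised (lower case, whitespace removed) and looked up in a table of EU Vocabularies file-type codes. The entries shown are an illustrative subset, not the list derived from the European Data Portal, and the codes should be checked against the File Types NAL. Unmapped labels return None, so the caller can fall back to the textual label.

```python
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type/"

# Illustrative subset only: normalised label -> File Types NAL code (to be verified)
LABEL_TO_CODE = {
    "shapefile": "SHP",
    "esrishapefile": "SHP",
    "shp": "SHP",
    "gml": "GML",
    "geojson": "GEOJSON",
    "kml": "KML",
    "csv": "CSV",
    "zip": "ZIP",
}


def format_uri(label):
    """Return a file-type URI for a textual format label, or None if unmapped."""
    key = "".join(label.lower().split())  # "ESRI Shapefile" -> "esrishapefile"
    code = LABEL_TO_CODE.get(key)
    return FILE_TYPE + code if code else None
```

Labels with version suffixes, such as "GML 3.2.1", are deliberately not matched. A looser match would cover more records but risks assigning wrong URIs.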
this API to provide an OAS 3+ specification
These code lists are now available from the INSPIRE Registry, as reported in SEMICeu/GeoDCAT-AP#55 & SEMICeu/GeoDCAT-AP#56
The base URIs have been changed from
https://inspire.ec.europa.eu/metadata-codelist/SpatialRepresentationTypeCode
https://inspire.ec.europa.eu/metadata-codelist/MaintenanceFrequencyCode
to
https://inspire.ec.europa.eu/metadata-codelist/SpatialRepresentationType
https://inspire.ec.europa.eu/metadata-codelist/MaintenanceFrequency
The XSLT mappings must be updated accordingly.
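The rename only affects the code-list segment of the URI, so it can be expressed as a prefix rewrite. The sketch below could, for example, migrate records produced with the old mappings; `update_codelist_uri` is a hypothetical helper. The sample output earlier on this page uses http:// for INSPIRE code-list URIs, while this issue quotes https://, so the sketch handles both schemes.

```python
BASE = "inspire.ec.europa.eu/metadata-codelist/"
RENAMED = {
    "SpatialRepresentationTypeCode": "SpatialRepresentationType",
    "MaintenanceFrequencyCode": "MaintenanceFrequency",
}


def update_codelist_uri(uri):
    """Rewrite an old INSPIRE code-list URI to the new base; leave others unchanged."""
    for scheme in ("http://", "https://"):
        for old, new in RENAMED.items():
            prefix = scheme + BASE + old
            # Match the whole segment, so ".../MaintenanceFrequencyCodeX" is not touched
            if uri == prefix or uri.startswith(prefix + "/"):
                return scheme + BASE + new + uri[len(prefix):]
    return uri
```

Because the new names are not followed by "Code/", the function is idempotent: already-updated URIs pass through unchanged.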
Hello,
I am trying to convert an ISO 19115 XML file to a GeoDCAT-AP file with the Python code.
The result I get is:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:adms="http://www.w3.org/ns/adms#" xmlns:cnt="http://www.w3.org/2011/content#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcat="http://www.w3.org/ns/dcat#" xmlns:dct="http://purl.org/dc/terms/" xmlns:dctype="http://purl.org/dc/dcmitype/" xmlns:dqv="http://www.w3.org/ns/dqv#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:geodcatap="http://data.europa.eu/930/" xmlns:gsp="http://www.opengis.net/ont/geosparql#" xmlns:locn="http://www.w3.org/ns/locn#" xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:org="http://www.w3.org/ns/org#" xmlns:prov="http://www.w3.org/ns/prov#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:schema="http://schema.org/" xmlns:sdmx-attribute="http://purl.org/linked-data/sdmx/2009/attribute#" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:vcard="http://www.w3.org/2006/vcard/ns#"/>
Does that mean there are problems in my XML? Do you have a working XML file that I could download?
I've also tried to run the xsl with Apache Nifi (https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.6.0/org.apache.nifi.processors.standard.TransformXml/) but there was an error:
javax.xml.transform.TransformerConfigurationException: net.sf.saxon.s9api.SaxonApiException: Stylesheet compilation failed: 1 error reported
at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:3075)
at org.apache.nifi.processors.standard.TransformXml.onTrigger(TransformXml.java:325)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1173)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:214)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.io.IOException: java.util.concurrent.ExecutionException: javax.xml.transform.TransformerConfigurationException: net.sf.saxon.s9api.SaxonApiException: Stylesheet compilation failed: 1 error reported
at org.apache.nifi.processors.standard.TransformXml$2.process(TransformXml.java:352)
at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:3054)
... 12 common frames omitted
Caused by: java.util.concurrent.ExecutionException: javax.xml.transform.TransformerConfigurationException: net.sf.saxon.s9api.SaxonApiException: Stylesheet compilation failed: 1 error reported
at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:513)
at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:90)
at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:237)
at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2313)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2279)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2155)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2045)
at com.google.common.cache.LocalCache.get(LocalCache.java:3953)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3976)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4960)
at org.apache.nifi.processors.standard.TransformXml$2.process(TransformXml.java:331)
... 13 common frames omitted
Caused by: javax.xml.transform.TransformerConfigurationException: net.sf.saxon.s9api.SaxonApiException: Stylesheet compilation failed: 1 error reported
at net.sf.saxon.jaxp.SaxonTransformerFactory.newTemplates(SaxonTransformerFactory.java:155)
at org.apache.nifi.processors.standard.TransformXml.newTemplates(TransformXml.java:272)
at org.apache.nifi.processors.standard.TransformXml.access$000(TransformXml.java:94)
at org.apache.nifi.processors.standard.TransformXml$1.load(TransformXml.java:300)
at org.apache.nifi.processors.standard.TransformXml$1.load(TransformXml.java:297)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3529)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2278)
... 19 common frames omitted
Caused by: net.sf.saxon.s9api.SaxonApiException: Stylesheet compilation failed: 1 error reported
at net.sf.saxon.s9api.XsltCompiler.compile(XsltCompiler.java:546)
at net.sf.saxon.jaxp.SaxonTransformerFactory.newTemplates(SaxonTransformerFactory.java:152)
... 25 common frames omitted
Caused by: net.sf.saxon.trans.XPathException: Stylesheet compilation failed: 1 error reported
at net.sf.saxon.style.Compilation.compileSingletonPackage(Compilation.java:97)
at net.sf.saxon.s9api.XsltCompiler.compile(XsltCompiler.java:543)
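An rdf:RDF output containing only namespace declarations means that none of the XSLT templates matched the input. One plausible cause, not confirmed in this thread, is the input encoding. The stylesheet targets ISO 19139 (namespace http://www.isotc211.org/2005/gmd), whereas records encoded according to ISO 19115-3 use different namespaces. A quick, hypothetical check of the input before running the transformation:

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"


def looks_like_iso19139(xml_bytes):
    """Heuristic: True if the document is, or contains, a gmd:MD_Metadata record.

    Covers a bare record and a wrapper such as a CSW GetRecordById response.
    """
    root = ET.fromstring(xml_bytes)
    if root.tag == f"{{{GMD}}}MD_Metadata":
        return True
    return root.find(f".//{{{GMD}}}MD_Metadata") is not None
```

If this returns False for your file, the record probably needs converting to ISO 19139 first. The exact root elements the XSLT matches are not checked here.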
I had trouble validating dct:spatial in a DCAT validator; the XML below worked for me:
<dct:spatial>
<dct:Location rdf:about="https://example.com/7273cd98-5af0-45e5-b31a-8b8ac3a5875d#spatial">
<dcat:bbox rdf:datatype="http://www.opengis.net/ont/geosparql#wktLiteral"><![CDATA[POLYGON((14.52 69.06,24.00 69.06,24.00 65.29,14.52 65.29,14.52 69.06))]]></dcat:bbox>
</dct:Location>
</dct:spatial>
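The bbox literal above is a closed WKT polygon that lists the corners as longitude/latitude pairs: west north, east north, east south, west south, and back to the start. Below is a small sketch that builds it from the four bounds of an ISO 19139 gmd:EX_GeographicBoundingBox. `bbox_to_wkt` and its `digits` parameter are hypothetical.

```python
def bbox_to_wkt(west, south, east, north, digits=2):
    """Closed WKT polygon (longitude latitude order) for a bounding box."""
    fmt = f"{{:.{digits}f}}"  # e.g. "{:.2f}"
    corners = [(west, north), (east, north), (east, south), (west, south), (west, north)]
    ring = ",".join(f"{fmt.format(x)} {fmt.format(y)}" for x, y in corners)
    return f"POLYGON(({ring}))"
```

`bbox_to_wkt(14.52, 65.29, 24.00, 69.06)` reproduces the literal in the example above.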
GeoDCAT-AP 2.0.0 has deprecated a number of mappings, following the alignment with DCAT/DCAT-AP 2, and the terms defined in the GeoDCAT-AP namespace (the full list is in §A.2 of the GeoDCAT-AP 2.0.0 specification).
To support backward compatibility, the GeoDCAT-AP XSLT still uses these mappings, along with the new ones. This, however, adds noise in the output records, and it may not be fit for all use cases.
An option is to include in the XSLT a global parameter that can be used to specify whether deprecated mappings must or must not be included in the output.
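Until such a parameter exists, the same switch can be approximated by post-processing the output. The sketch below is that alternative, not the proposed XSLT parameter itself. It drops property elements whose expanded names appear in a caller-supplied set. The set is left to the caller because it must be filled in from §A.2 of GeoDCAT-AP 2.0.0. `drop_deprecated` and `include_deprecated` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def drop_deprecated(rdf_xml, deprecated, include_deprecated=False):
    """Remove elements whose expanded name ("{ns}local") is in `deprecated`.

    `deprecated` is a placeholder for the terms listed in GeoDCAT-AP 2.0.0, §A.2.
    """
    root = ET.fromstring(rdf_xml)
    if include_deprecated:
        return root  # keep the backward-compatible output untouched
    # Snapshot the tree first, since children are removed while walking it
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in deprecated:
                parent.remove(child)
    return root
```

Doing this inside the XSLT, as proposed, would avoid generating the triples in the first place. The post-filter only helps when the stylesheet cannot be changed.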
Dear colleagues,
I tried to run the iso-19139-to-dcat-ap XSLT via Python (as proposed on this GitHub page), but I got an error message when parsing the input .xml metadata file from a GetRecordById request to my catalogue (https://www.ide.cat/servei/catalunya/cataleg-idec/csw?request=GetRecordById&service=CSW&version=2.0.2&outputSchema=http://www.isotc211.org/2005/gmd&ElementSetName=full&ID=inspire-adreces).
After this, I ran the XSLT transformation directly on my ISO .xml metadata files in Notepad++, using the XML Tools plugin (Plugins > XML Tools > XSL Transformation). However, I am not sure whether this is an appropriate way to run it, or whether the .rdf DCAT metadata files obtained this way are correct.
In any case, I think running the XSLT in Notepad++ could be a good way to spread its use among non-developer users.
Happy to get your feedback.
All the best,
Jordi
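The error message is not quoted, so the cause is unknown. One frequent pitfall when feeding a CSW response to Python XML tools is decoding it to a string first: lxml, for example, rejects str input that carries an encoding declaration. The minimal sketch below uses the standard library to build the GetRecordById URL shown above and parse the response as bytes. It covers only fetching and parsing, since the XSLT step needs an XSLT processor. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen


def get_record_by_id_url(endpoint, record_id):
    """CSW 2.0.2 GetRecordById request asking for the ISO 19139 (gmd) output schema."""
    params = {
        "request": "GetRecordById",
        "service": "CSW",
        "version": "2.0.2",
        "outputSchema": "http://www.isotc211.org/2005/gmd",
        "ElementSetName": "full",
        "ID": record_id,
    }
    return endpoint + "?" + urlencode(params)


def fetch_record(endpoint, record_id):
    """Download and parse the response, keeping it as bytes."""
    # Bytes let the parser honour the XML declaration's encoding
    with urlopen(get_record_by_id_url(endpoint, record_id)) as resp:
        return ET.fromstring(resp.read())
```

Usage: `fetch_record("https://www.ide.cat/servei/catalunya/cataleg-idec/csw", "inspire-adreces")` returns the root element of the CSW response, ready to be serialised and passed to the transformation.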
It took me a while to find out which version of DCAT-AP the current version of the XSLT generates. This could be improved by tagging each new version in this repository (or creating a release for it), which makes it easy for users to link a release to a specific commit. Right now this is rather awkward: I suppose the commit that edits CHANGELOG.md represents the version/release, but it is not very explicit. So, for instance, right now this commit represents version 2.7.
With the proposed change, the version is also reflected in the URL (this already works, since there is one release, v1.13): https://raw.githubusercontent.com/SEMICeu/iso-19139-to-dcat-ap/v1.13/iso-19139-to-dcat-ap.xsl
Related issue: #24
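With tags in place, pinning the stylesheet to a release reduces to building the raw URL. Here is a trivial sketch with a hypothetical helper name, following the URL pattern quoted above:

```python
RAW = "https://raw.githubusercontent.com/SEMICeu/iso-19139-to-dcat-ap/{ref}/iso-19139-to-dcat-ap.xsl"


def pinned_xslt_url(ref):
    """Raw URL of the XSLT at a given tag or branch name, e.g. "v1.13"."""
    return RAW.format(ref=ref)
```

Pinning to a tag rather than the default branch keeps a deployment reproducible even when later commits change the mappings.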
The OP File Types NAL has been updated with new file types. It remains to be verified whether the GeoDCAT-AP mappings need to be updated.
This is meant to align the XSLT with the mappings added to the current GeoDCAT-AP 2 draft.
As explained in section B.6.8.1 of the draft of GeoDCAT-AP v2.0.0, for conformance with DCAT-AP, GeoDCAT-AP records MUST also include keywords from the EU Vocabularies Data Theme Named Authority List.
An improvement of the XSLT could be adding the transformation from INSPIRE themes, included in the INSPIRE XML record, to DCAT-AP themes, based on the alignments between the related controlled vocabularies.
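Here is a sketch of that transformation. It assumes the INSPIRE themes have already been resolved to INSPIRE registry URIs, whereas records often carry only the GEMET INSPIRE themes labels. The ALIGNMENT table is a hypothetical, illustrative subset and must be replaced by the published alignment between the two vocabularies.

```python
INSPIRE_THEME = "http://inspire.ec.europa.eu/theme/"
DATA_THEME = "http://publications.europa.eu/resource/authority/data-theme/"

# Illustrative entries only, not the official alignment
ALIGNMENT = {
    "tn": ["TRAN"],  # Transport networks -> Transport
    "hy": ["ENVI"],  # Hydrography -> Environment
    "ef": ["ENVI"],  # Environmental monitoring facilities -> Environment
}


def data_themes(inspire_theme_uris):
    """Map INSPIRE theme URIs to EU data theme URIs, without duplicates, in first-seen order."""
    themes = []
    for uri in inspire_theme_uris:
        if not uri.startswith(INSPIRE_THEME):
            continue
        for code in ALIGNMENT.get(uri[len(INSPIRE_THEME):], []):
            if DATA_THEME + code not in themes:
                themes.append(DATA_THEME + code)
    return themes
```

The output would be emitted as additional dcat:theme values alongside the INSPIRE themes, not as replacements.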
This is meant to align the XSLT with the mappings added to the current GeoDCAT-AP 2 draft.
A first test of the XSLT output against the SHACL definitions returns validation errors concerning instances whose class is not explicitly declared (e.g., as dct:LinguisticSystem, dct:Frequency, and other code-list classes).
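One way to address these errors is to state the class of each code-list value explicitly. For example, the object of dct:language would be typed as dct:LinguisticSystem, and the object of dct:accrualPeriodicity as dct:Frequency, which are the ranges DCMI Terms declares for those properties. Below is a post-processing sketch on the RDF/XML output with ElementTree. The property-to-class table is a partial, hypothetical starting point, and the proper fix would be in the XSLT.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT = "http://purl.org/dc/terms/"

# Property -> class its rdf:resource values should be typed with (partial list)
RANGE_CLASSES = {
    f"{{{DCT}}}language": DCT + "LinguisticSystem",
    f"{{{DCT}}}accrualPeriodicity": DCT + "Frequency",
}


def add_explicit_types(root):
    """Append one typed rdf:Description per referenced code-list value."""
    seen = set()
    # Snapshot the tree, because new nodes are appended to root while walking it
    for el in list(root.iter()):
        cls = RANGE_CLASSES.get(el.tag)
        value = el.get(f"{{{RDF}}}resource")
        if cls and value and (value, cls) not in seen:
            seen.add((value, cls))
            desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": value})
            ET.SubElement(desc, f"{{{RDF}}}type", {f"{{{RDF}}}resource": cls})
    return root
```

If the output already types a value, the added statement is simply the same triple again, which RDF treats as harmless.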
Hi,
Does anyone have a working implementation that demonstrates how to use the iso-19139-to-dcat-ap XSLT with pycsw?
pycsw appears to support configuring XSLT files.
At
iso-19139-to-dcat-ap/iso-19139-to-dcat-ap.xsl
Line 1662 in e7aa2d8
iso-19139-to-dcat-ap/iso-19139-to-dcat-ap.xsl
Line 1577 in e7aa2d8