Comments (3)
Just curious, is that something documented in the spec?
Another question: aren't there XML tools that would allow specific XML tag extraction? That way, we could keep functionality separate.
from metha.
As far as I understand the spec, oai:metadata is nested in oai:OAI-PMH/oai:ListRecords/oai:record and it may contain any XML elements.
<complexType name="metadataType">
  <annotation>
    <documentation>Metadata must be expressed in XML that complies
    with another XML Schema (namespace=#other). Metadata must be
    explicitly qualified in the response.</documentation>
  </annotation>
  <sequence>
    <any namespace="##other" processContents="strict"/>
  </sequence>
</complexType>
I'm only interested in the child element(s) of oai:metadata because they are the actual payload. I'm surprised people keep the OAI envelope.
A workaround is to apply an XSLT to each record (when using find and xargs) or to the whole set (when using metha-cat), but this is far from a one-liner:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Extract OAI metadata records from OAI-PMH responses -->
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:oai="http://www.openarchives.org/OAI/2.0/" exclude-result-prefixes="oai">
  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/Response">
    <records>
      <xsl:apply-templates select="ListRecords/oai:record"/>
    </records>
  </xsl:template>
  <xsl:template match="/Records">
    <records>
      <xsl:apply-templates select="oai:record"/>
    </records>
  </xsl:template>
  <xsl:template match="oai:record">
    <xsl:copy-of select="oai:metadata/*"/>
  </xsl:template>
</xsl:transform>
I suppose the extraction would only take a few lines and an optional command-line flag in the metha source code.
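To illustrate how small the extraction step is, here is a minimal sketch in Python (not metha's own Go code, and not an existing metha feature). It assumes a trimmed, made-up sample response; the function name extract_payloads and the identifier oai:example.org:1 are hypothetical. It collects the child elements of every oai:metadata element, which is what the XSLT above does with oai:metadata/*.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Hypothetical sample response, trimmed to a single record.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Editoriale / Editorial</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def extract_payloads(xml_text):
    """Return the child elements of every oai:metadata element."""
    root = ET.fromstring(xml_text)
    payloads = []
    # iter() matches oai:metadata at any depth, so the name of the
    # wrapping root element does not matter here.
    for metadata in root.iter("{%s}metadata" % OAI_NS):
        payloads.extend(list(metadata))
    return payloads


payloads = extract_payloads(SAMPLE)
for element in payloads:
    # Serialize each payload element without the OAI envelope.
    print(ET.tostring(element, encoding="unicode"))
```

Because the search is by qualified element name at any depth, the same function should work on a single record or on a whole set of records, provided the records keep the OAI namespace.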
I'm only interested in the child element(s) of oai:metadata because they are the actual payload. I'm surprised people keep the OAI envelope.
Yes, the metadata is the most interesting part. But if the output were reduced to just the payload, oai:record/oai:metadata/*, one would miss the "set specifier" (setSpec) in the header, which is sometimes useful.
<record>
  <header status="">
    <identifier>oai:ojs.pkp.sfu.ca:article/2287</identifier>
    <datestamp>2020-06-22T11:34:53Z</datestamp>
    <setSpec>EJOEH:E</setSpec>
    <setSpec>driver</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" ....oai_dc.xsd">
      <dc:title xml:lang="it-IT">Editoriale / Editorial</dc:title>
      <dc:creator>Szmigielski, S.</dc:creator>
      ...
    </oai_dc:dc>
  </metadata>
</record>
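One way to get the payload without losing the set specifier is to read both from the record before the envelope is dropped. The following is only a sketch in Python under stated assumptions: the sample is a shortened copy of the record above with the standard oai_dc and dc namespace declarations written out (an assumption, since the original is elided), and split_record is a hypothetical helper, not part of metha.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Shortened copy of the record shown above; the namespace declarations
# are filled in with the standard oai_dc/dc URIs as an assumption.
RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:ojs.pkp.sfu.ca:article/2287</identifier>
    <datestamp>2020-06-22T11:34:53Z</datestamp>
    <setSpec>EJOEH:E</setSpec>
    <setSpec>driver</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title xml:lang="it-IT">Editoriale / Editorial</dc:title>
    </oai_dc:dc>
  </metadata>
</record>"""


def split_record(record):
    """Return identifier, setSpec values and payload elements of one record."""
    header = record.find("oai:header", NS)
    metadata = record.find("oai:metadata", NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "set_specs": [e.text for e in header.findall("oai:setSpec", NS)],
        # Records without a metadata element (e.g. deleted ones) yield no payload.
        "payload": list(metadata) if metadata is not None else [],
    }


info = split_record(ET.fromstring(RECORD))
print(info["identifier"], info["set_specs"], len(info["payload"]))
```

The set specifiers could then be written next to the payload in whatever output format is preferred, so dropping the envelope does not have to mean dropping the header information.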
Thanks for the XSLT.
but this is far from a one-liner:
I appreciate the desire for one-liners.