dilcisboard / e-ark-sip
E-ARK SIP specification
Home Page: https://earksip.dilcis.eu/
License: Creative Commons Attribution 4.0 International
SIP12 metsHdr/agent/name is defined as 0..* MAY, but METS defines it as 1..1 (mets.xsd, lines 247-263):
<xsd:element name="metsHdr" minOccurs="0">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="agent" minOccurs="0" maxOccurs="unbounded">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="name" type="xsd:string">
Duplicates #102
Comment left in a mail to [email protected]
SIP32: content repetition “Follows the requirements in the CSIP profile. Follows the requirements in the CSIP profile.”
Several questions have been asked:
For SIP validation we use the commons-ip library available on git. There are principally two versions of this library, 1.0.3 and 2.0.0-alpha1, and we noticed that the XSD schema used for the structural validation of the XML is hard-coded. We did not understand which elements (at the level of the METS.xml file) make it possible to distinguish which version to use. Likewise, the XSD file used for validation is hard-coded ("/schemas/mets1_11.xsd" or "/schemas2/mets1_12.xsd"). After reading some documentation we see that there are several versions of METS.xml (although some are quite old). Is there a way to configure which XSD to use?
Without changing the code and recompiling, it is not possible to change the schema. For more information I suggest opening an issue at https://github.com/keeps/commons-ip
We have retrieved the various SIPs stored in different E-ARK projects on git, and generally (but not always) we can validate them with version 1.0.3 or 2.0.0-alpha1 (probably because they use the same library to generate them), but when we try to validate them with other tools it does not work. Example: we tried to use the online application "https://eark.openpreservation.org/" but we never got a validation that works.
Commons-IP is undergoing a major update and a new validation module is being developed as we speak. The alpha version was still a long way from being finished. Could you please check again using the latest version available at https://github.com/keeps/commons-ip/releases
Please make sure you are experimenting with the correct versions of packages. For example, EARK v1 packages will not be valid according to the EARK v2 specification.
With these first tests we are not very confident about the validation process. Do you have any advice for us regarding the validation process? Here is an example of a check done in the commons-ip 1.0.3 library:
In the EARKUtils class there is a check on the LABEL attribute of the structMap tag (method getEARKStructMap), which has to be either "Common Specification structural map" or "E-ARK structural map". That is why a validation failed in our case (perhaps there are other issues). Just for information, I have attached a SIP to this mail (generated with the webapp "https://earkweb.sydarkivera.se/earkweb/submission/overview"); its validation fails with the commons-ip library.
The structMap/@Label value "Common Specification structural map" was the value used in version 1 of the common specification (see https://dilcis.eu/images/Specifications/CS/Common_Specifications_for_IPs_v10.pdf).
Version 2 uses a different vocabulary. The @Label is expected to be "CSIP" instead of "Common Specification structural map". The SIP specification does not make changes to the inherited vocabulary as described in the specification:
I did not have access to the SIP created with EARK WEB so I can't comment on the reasons why it is not valid.
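The label difference described above can serve as a rough hint about which specification version a package was built for. The following is a minimal sketch, not the commons-ip implementation: the function name `guess_spec_version` and the returned strings are hypothetical, and the check relies only on the label values quoted in this thread.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# Labels accepted by the commons-ip 1.0.3 check, as quoted in the question above.
LEGACY_LABELS = {"Common Specification structural map", "E-ARK structural map"}
# Label expected by version 2 of the specification, as stated in the answer above.
V2_LABEL = "CSIP"


def guess_spec_version(mets_xml: str) -> str:
    """Return "v2", "v1" or "unknown" based only on structMap/@LABEL values."""
    root = ET.fromstring(mets_xml)
    labels = {sm.get("LABEL") for sm in root.iter(f"{{{METS_NS}}}structMap")}
    if V2_LABEL in labels:
        return "v2"
    if labels & LEGACY_LABELS:
        return "v1"
    return "unknown"
```

A package that reports "v1" here would be expected to fail a version 2 validator on this requirement alone, which matches the advice to validate each package against the specification version it was created for.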
Are there any plans to verify EIDAS or other digital signature formats ?
Not at the moment, as far as I know.
Are there plans to check the perennial formats (PDF, TIFF etc ...) ?
No. EARK only cares about the packaging, not validating content. For content validation there are other tools that you may use, e.g. the ones developed under the PREFORMA project - http://www.preforma-project.eu/open-source-portal.html
After checking the source code we see that it is not possible to deactivate the verification of fingerprints or, more generally, to deactivate certain stages of the validation. Are any such code enhancements planned in the roadmap?
A new validation module is being developed under https://github.com/keeps/commons-ip. The final version should be released towards the end of October 2021.
In its current state, the validator validates all the requirements of the CSIP. There are no plans to disable certain requirements.
In scenarios where SIP updates/replacements are a reality, the order by which the SIPs are applied over the existing AIP is important.
The SIP should be able to identify the AIP and its version/revision so that a verification can be made during ingest.
Ingesting SIPs in the incorrect order will produce considerably different results.
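The verification asked for above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the specification does not define the classes `Aip` and `SipUpdate`, the field `expected_version`, or the function `ingest`. The idea is only that an update SIP names the AIP and the AIP version it was built against, and ingest refuses it when they do not match.

```python
from dataclasses import dataclass


@dataclass
class Aip:
    aip_id: str
    version: int  # hypothetical revision counter kept by the repository


@dataclass
class SipUpdate:
    target_aip_id: str     # hypothetical: AIP this SIP updates
    expected_version: int  # hypothetical: AIP version the SIP was built against


def ingest(aip: Aip, sip: SipUpdate) -> None:
    """Apply an update SIP only if it targets this AIP at its current version."""
    if sip.target_aip_id != aip.aip_id:
        raise ValueError("SIP targets a different AIP")
    if sip.expected_version != aip.version:
        raise ValueError(
            f"SIP expects AIP version {sip.expected_version}, "
            f"but current version is {aip.version}"
        )
    aip.version += 1  # the update produces the next revision
```

With such a check, applying two updates in the wrong order fails at ingest instead of silently producing a different AIP.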
The documents for the different IPs should be standardized so that they are structured in a similar way and the references should keep the same ID for the same element (CSIP7 should always be the same element in all documents).
Hi @carlwilson,
I've added a script to the root folder that generates the MD version of the METS profiles using your generator application. see https://github.com/DILCISBoard/E-ARK-SIP/blob/rel/v2.0.0-draft/mets-profile-proc.sh
I would love to have a similar process to generate the PDF version of the specs. Is there a way we can add this to the project?
Issues related to version 2.0.0 of the EARK SIP specification
Good day,
The SIP specification extends the attribute "NOTETYPE" with the value "IDENTIFICATIONCODE". However, this value is mentioned only in the External Vocabularies and not in the METS Extensions themselves (neither CSIP nor SIP).
I would rather not specify in which one it should be placed, since you told me in the past that "notetype" belongs to the CSIP with a prefix.
Comment left in a mail to [email protected]
SIP10: it is a bit confusing that the role “ARCHIVIST” has a description including the term “creator”. The term creator could be confused with the role “CREATOR”, which is reserved for the submitting agent.
Original post from @koit. Issue #98 was split into 3 individual issues.
There is no explicit category identifier for these four SIP agents and no unique signature can be combined from @ROLE and @TYPE values.
| Requirement | Cardinality | @ROLE | @TYPE |
|---|---|---|---|
| SIP9 Archival creator agent | 0..1 MAY | /full vocabulary allowed/ | ORGANIZATION, INDIVIDUAL |
| SIP15 Submitting agent | 1..1 MUST | /full vocabulary allowed/ | ORGANIZATION, INDIVIDUAL |
| SIP21 Contact person agent | 0..* MAY | CREATOR | INDIVIDUAL |
| SIP26 Preservation agent | 0..1 MAY | PRESERVATION | ORGANIZATION |
| CSIP10 Agent (creator software) | 1..n MUST | CREATOR | OTHER |
For instance, an agent with @ROLE = "PRESERVATION" and @TYPE = "ORGANIZATION" could be considered SIP26 Preservation agent, but the same combination is also valid for SIP15 Submitting agent. For comparison, CSIP10 agent has a much clearer signature: @ROLE = "CREATOR", @TYPE = "OTHER", @OTHERTYPE = "SOFTWARE" and note/@csip:NOTETYPE="SOFTWARE VERSION".
A more serious problem is that any of these SIP agent attribute values are also valid for custom agents the user has added. In order to do meaningful compliance tests we need an explicit way to identify the E-ARK SIP agents.
One (not too elegant) way out of it might be to add a custom attribute: metsHdr/agent/note/@sip:AGENTROLE = CREATOR | SUBMITTER | CONTACT | PRESERVER.
Note: mets.xsd vocabularies for @ROLE and @TYPE are:
mets/metsHdr/agent/@ROLE = CREATOR | EDITOR | ARCHIVIST | PRESERVATION | DISSEMINATOR | CUSTODIAN | IPOWNER | OTHER
mets/metsHdr/agent/@TYPE = INDIVIDUAL | ORGANIZATION | OTHER
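The ambiguity described in this issue can be made concrete with a small lookup. This is an illustrative sketch only: the `SIGNATURES` table simply transcribes the @ROLE and @TYPE columns from the table above, and the function name `candidate_requirements` is hypothetical.

```python
# Full @ROLE vocabulary from mets.xsd, as listed in the note above.
ALL_ROLES = {
    "CREATOR", "EDITOR", "ARCHIVIST", "PRESERVATION",
    "DISSEMINATOR", "CUSTODIAN", "IPOWNER", "OTHER",
}

# (allowed @ROLE values, allowed @TYPE values) per requirement, from the table above.
SIGNATURES = {
    "SIP9 Archival creator agent": (ALL_ROLES, {"ORGANIZATION", "INDIVIDUAL"}),
    "SIP15 Submitting agent": (ALL_ROLES, {"ORGANIZATION", "INDIVIDUAL"}),
    "SIP21 Contact person agent": ({"CREATOR"}, {"INDIVIDUAL"}),
    "SIP26 Preservation agent": ({"PRESERVATION"}, {"ORGANIZATION"}),
    "CSIP10 Agent (creator software)": ({"CREATOR"}, {"OTHER"}),
}


def candidate_requirements(role: str, agent_type: str) -> list:
    """List every requirement whose @ROLE/@TYPE signature accepts this agent."""
    return [
        name
        for name, (roles, types) in SIGNATURES.items()
        if role in roles and agent_type in types
    ]
```

Calling `candidate_requirements("PRESERVATION", "ORGANIZATION")` returns three requirements (SIP9, SIP15 and SIP26), while `candidate_requirements("CREATOR", "OTHER")` returns only the CSIP10 agent, which is the contrast the issue points out.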
IDs need to be unique.
Having a summary at the end of the document doesn't make a lot of sense.
As promised, you may find the SIP specification ready for review at:
Any typos you find, please commit them directly. Any “hairier” problems, just create issues and assign them to me, please.
I fixed them manually, meanwhile.
It should be:
| ID | Name & Location | Description & usage | Cardinality & Level |
|----|-----------------|---------------------|-------------------- |
Instead of:
| ID | Name & Location | Description & usage | Cardinality & Level |
| -- | --------------- | ------------------- | ------------------- |
In all cases we can see "metsHDR/agent/note@csip:NOTETYPE".
Shouldn't it be "@sip:NOTETYPE" instead?
Comment received in mail to [email protected]
SIP16: The description of the submitting agent role reads “archival creator” instead of “submitting agent”.
SIP16 Submitting agent role
metsHdr/agent/
@ROLE
The role of the archival creator is “CREATOR”.
CITS Geodata is renamed to Geospatial data. Update is needed in the image.
https://github.com/DILCISBoard/E-ARK-SIP/blob/master/specification/images/Fig_1_SIP.svg
Given in profile:
Namespace: xmlns:csip=https://dilcis.eu/XML/METS/CSIPExtensionMETS
xmlns:sip=https://dilcis.eu/XML/METS/SIPExtensionMETS
Path: http://earksip.dilcis.eu/schema/DILCISExtensionSIPMETS.xsd
metsRootElementExample1:
xmlns:csip="https://dilcis.eu/XML/METS/CSIPExtensionMETS"
https://dilcis.eu/XML/METS/CSIPExtensionMETS https://dilcis.eu/XML/METS/CSIPExtensionMETS/DILCISExtensionMETS.xsd">
Appendix1:
xmlns:csip="https://dilcis.eu/XML/METS/CSIPExtensionMETS"
https://dilcis.eu/XML/METS/CSIPExtensionMETS https://dilcis.eu/XML/METS/CSIPExtensionMETS/DILCISExtensionMETS.xsd https://dilcis.eu/XML/METS/SIPExtensionMETS https://dilcis.eu/XML/METS/SIPExtensionMETS/DILCISExtensionSIPMETS.xsd
Result: The path to the schema works in the examples and the appendix.
Needed: In metsRootElementExample1 the namespace for SIP is needed, as well as the pointer to the schema. In Appendix 1 the namespace needs to be given (xmlns:sip). In the element <related_profile> the path to the CSIP profile is wrong.
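A quick way to spot the missing pieces listed above is to inspect the root element of an example. The sketch below uses only the Python standard library and the SIP namespace URI given in the profile; the function names are hypothetical, and it checks only for the presence of the `sip` prefix and a schemaLocation entry, not whether the schema URL resolves.

```python
import io
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
SIP_NS = "https://dilcis.eu/XML/METS/SIPExtensionMETS"  # as given in the profile


def declared_namespaces(xml_text: str) -> dict:
    """Map each declared prefix to its namespace URI."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    return {
        prefix: uri
        for _, (prefix, uri) in ET.iterparse(source, events=("start-ns",))
    }


def schema_locations(xml_text: str) -> dict:
    """Map namespace URI to schema URL from xsi:schemaLocation on the root."""
    root = ET.fromstring(xml_text)
    tokens = root.get(f"{{{XSI_NS}}}schemaLocation", "").split()
    return dict(zip(tokens[0::2], tokens[1::2]))


def missing_sip_declarations(xml_text: str) -> list:
    """Report which SIP declarations are absent from the root element."""
    problems = []
    if declared_namespaces(xml_text).get("sip") != SIP_NS:
        problems.append("xmlns:sip is not declared")
    if SIP_NS not in schema_locations(xml_text):
        problems.append("no schemaLocation entry for the SIP namespace")
    return problems
```

Run against a root element that declares only the csip namespace and its schema, this reports both problems; once xmlns:sip and the matching schemaLocation pair are present, the list is empty.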
In the profile, content like (I think the problem is the last paragraph, because it has an anchor)
<requirement ID="SIP5" REQLEVEL="MAY" RELATEDMAT="VocabularyaltrecordIDTYPE" EXAMPLES="metsHdrElementExample1">
<description>
<head>Submission agreement</head>
<p xmlns="http://www.w3.org/1999/xhtml">A reference to the Submission Agreement associated with the package.</p>
<p xmlns="http://www.w3.org/1999/xhtml">@TYPE is used with the value "SUBMISSIONAGREEMENT".</p>
<p xmlns="http://www.w3.org/1999/xhtml">Example: RA 13-2011/5329; 2012-04-12</p>
<p xmlns="http://www.w3.org/1999/xhtml">Example: http://submissionagreement.kb.se/dnr331-1144-2011/20120711/</p>
<p xmlns="http://www.w3.org/1999/xhtml">Note: It is recommended to use a machine-readable format for a better description of a submission agreement.</p>
<p xmlns="http://www.w3.org/1999/xhtml">For example, the submission agreement developed by Docuteam GmbH <a href="http://www.loc.gov/standards/mets/profiles/00000041.xml">http://www.loc.gov/standards/mets/profiles/00000041.xml</a>
</p>
<dl xmlns="http://www.w3.org/1999/xhtml">
<dt>METS XPath</dt>
<dd>metsHDR/altrecordID</dd>
<dt>Cardinality</dt>
<dd>0..1</dd>
</dl>
</description>
</requirement>
generates a null.
Also, if the controlled vocabulary is not present, a null is also generated. Example:
<requirement ID="SIP4" REQLEVEL="MUST" RELATEDMAT="VocabularyOAISPackageType" EXAMPLES="metsHdrElementExample1">
<description>
<head>OAIS Package type information</head>
<p xmlns="http://www.w3.org/1999/xhtml">@csip:OAISPACKAGETYPE is used with the value "SIP".</p>
<dl xmlns="http://www.w3.org/1999/xhtml">
<dt>METS XPath</dt>
<dd>metsHdr/@csip:OAISPACKAGETYPE</dd>
<dt>Cardinality</dt>
<dd>1..1</dd>
</dl>
</description>
</requirement>
In [our country] we have not created separate FGSs for SIP, AIP and DIP.
We recommend that all extra elements for SIP, AIP and DIP should be included in CSIP, but as optional information. Then, documents are needed that describe how to create a SIP, AIP, DIP and how to use the elements at EU level and that it is possible to adapt for local use in different countries.
Markdown has a special meaning to < > elements so they don't show up on the rendered document.
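One possible workaround, sketched here as an assumption rather than as what the generator application does, is to escape angle brackets in description text before it is written to Markdown. The function name `escape_angle_brackets` is hypothetical. It should be applied only to inline description text, because a leading ">" is also Markdown's blockquote marker.

```python
def escape_angle_brackets(text: str) -> str:
    """Replace < and > with HTML entities so Markdown renders them literally."""
    # Escape "&" first so existing entities are not double-interpreted later.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
```

For example, a description mentioning the element `<metsHdr>` would be emitted as `&lt;metsHdr&gt;` and show up in the rendered document instead of being swallowed as an unknown HTML tag.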
I'm not sure if this is a @carlwilson or a @karinbredenberg issue.
This quote:
At the moment, there are 3 such specifications: - SIARD 2.0 for relational databases (The SIARD 2.0 specification for relational databases can be found at http://eark-project.com/resources/specificationdocs/32-specification-for-siard-format-v20)
mentioned here: https://github.com/DILCISBoard/E-ARK-SIP/blob/master/specification/01-introduction/index.md might not be entirely correct.
It is my belief that the SIARD 2.0 specification is not a content information type specification, even though we made amendments to SIARD in order for the SIARD format to be adopted in a content information type specification.
When the ID starts with REF the anchors in the descriptions don't get transformed and added as links in the description.
Firstly in the all table.
Also addition of tables in the textual part so just with REF elements.
Example: https://dilcisboard.github.io/E-ARK-CSIP/specification/implementation/metadata/mets/metshdr/
If the SIP spec makes something mandatory for example, we need to add a METS profile to the spec.
Comment left in a mail to [email protected]
3.4 states that “Although seldom used, preservation metadata can be included in an SIP”. I understand that this captures technical metadata (file format etc.), but I’m wondering whether it also includes actions/events on the SIP. Is it suggested to create a PREMIS event to mark the creation of the METS SIP, or is the metsHdr information considered sufficient for this purpose? If the latter is true, then I’m wondering what kind of PREMIS event the example under 5.2 (Example 4: Example of a whole METS document describing an submission information package with no representations) refers to.
Comment left in a mail to [email protected]
Link https://dilcis.eu/XML/METS/SIPExtensionMETS/SIPExtensionMETS.xsd does not work
Proposed element: Preservation status
Status for retention, disposal or preservation.
Element type: String. Allowed values: S, DI, PA, UN
Proposed element: Preservation date.
Date after which disposal of the AIP shall occur if PreservationStatus="DI". That is, preservation shall only occur up to and including this date, i.e. the package shall be retained until this date. E.g. "2020-01-01" means that the AIP shall be destroyed (directly) after this date (or retained until this date, depending on one's viewpoint). Element type: YYYY-MM-DD. Shall be given if PreservationStatus="DI". Otherwise it is optional.
Proposed element: Preservation reference.
Law or other regulation that determines preservation, disposal or retention. E.g. "RA-FS 2030:12". Element type: Free text. Min 1 character. Max 255 characters
Proposed element: Classification.
Security classification of information in package.
Element type: String. Allowed values: P, BS, HS, EJ KLASSAT. "EJ KLASSAT" is default.
Values could be extended to support different organisations' requirements.
Proposed element: Keywords.
A list of keywords used for searching. E.g. "TMJ, BB, varumärkesintrång, värde2variabel" or "TMJ BB varumärkesintrång värde2variabel". Element type: Free text. Max 20 words separated by (space) or (comma). Max 2047 characters.
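The constraints stated for the proposed elements can be summarised as a small validation routine. This is a sketch of the rules as written in the proposal, not part of any specification; the function name `validate_proposed_elements` and its parameter names are hypothetical.

```python
import re
from datetime import date

STATUS_VALUES = {"S", "DI", "PA", "UN"}
CLASSIFICATION_VALUES = {"P", "BS", "HS", "EJ KLASSAT"}


def validate_proposed_elements(status, preservation_date=None, reference=None,
                               classification="EJ KLASSAT", keywords=None):
    """Return a list of error messages; an empty list means the values are valid."""
    errors = []
    if status not in STATUS_VALUES:
        errors.append("Preservation status must be one of S, DI, PA, UN")
    if preservation_date is None:
        # The date is mandatory only when the status is "DI".
        if status == "DI":
            errors.append("Preservation date is required when status is DI")
    else:
        try:
            if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", preservation_date):
                raise ValueError
            date.fromisoformat(preservation_date)
        except ValueError:
            errors.append("Preservation date must be a valid YYYY-MM-DD date")
    if reference is not None and not 1 <= len(reference) <= 255:
        errors.append("Preservation reference must be 1 to 255 characters")
    if classification not in CLASSIFICATION_VALUES:
        errors.append("Classification must be one of P, BS, HS, EJ KLASSAT")
    if keywords is not None:
        # Keywords may be separated by spaces or commas.
        words = [w for w in re.split(r"[,\s]+", keywords) if w]
        if len(words) > 20 or len(keywords) > 2047:
            errors.append("Keywords: max 20 words and max 2047 characters")
    return errors
```

A status of "DI" without a date is rejected, while the keyword examples given in the proposal pass with either separator.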
SIP9 - SIP31 specify four types of metsHdr/agent but their distinction is insufficient to create useful tests.
There is no explicit category identifier for these four SIP agents and no unique signature can be combined from @ROLE and @TYPE values.
| Requirement | Cardinality | @ROLE | @TYPE |
|---|---|---|---|
| SIP9 Archival creator agent | 0..1 MAY | /full vocabulary allowed/ | ORGANIZATION, INDIVIDUAL |
| SIP15 Submitting agent | 1..1 MUST | /full vocabulary allowed/ | ORGANIZATION, INDIVIDUAL |
| SIP21 Contact person agent | 0..* MAY | CREATOR | INDIVIDUAL |
| SIP26 Preservation agent | 0..1 MAY | PRESERVATION | ORGANIZATION |
| CSIP10 Agent (creator software) | 1..n MUST | CREATOR | OTHER |
For instance, an agent with @ROLE = "PRESERVATION" and @TYPE = "ORGANIZATION" could be considered SIP26 Preservation agent, but the same combination is also valid for SIP15 Submitting agent. For comparison, CSIP10 agent has a much clearer signature: @ROLE = "CREATOR", @TYPE = "OTHER", @OTHERTYPE = "SOFTWARE" and note/@csip:NOTETYPE="SOFTWARE VERSION".
A more serious problem is that any of these SIP agent attribute values are also valid for custom agents the user has added. In order to do meaningful compliance tests we need an explicit way to identify the E-ARK SIP agents.
One (not too elegant) way out of it might be to add a custom attribute: metsHdr/agent/note/@sip:AGENTROLE = CREATOR | SUBMITTER | CONTACT | PRESERVER.
Note: mets.xsd vocabularies for @ROLE and @TYPE are:
mets/metsHdr/agent/@ROLE = CREATOR | EDITOR | ARCHIVIST | PRESERVATION | DISSEMINATOR | CUSTODIAN | IPOWNER | OTHER
mets/metsHdr/agent/@TYPE = INDIVIDUAL | ORGANIZATION | OTHER
metsHdr/agent/name is not 0..*, but 1..1
SIP12 metsHdr/agent/name is defined as 0..* MAY, but METS defines it as 1..1 (mets.xsd, lines 247-263):
<xsd:element name="metsHdr" minOccurs="0">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="agent" minOccurs="0" maxOccurs="unbounded">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="name" type="xsd:string">
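The 1..1 reading follows from XML Schema defaults: when minOccurs and maxOccurs are absent, both default to 1. The sketch below illustrates this on a completed copy of the fragment above. The closing tags in `FRAGMENT` were added here only so the text parses as XML; they are not quoted from mets.xsd, and the function name `cardinality` is hypothetical.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# The fragment above, with closing tags added so it parses (an assumption,
# not a quotation of mets.xsd). It is parsed as plain XML, not as a schema.
FRAGMENT = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="metsHdr" minOccurs="0">
    <xsd:complexType><xsd:sequence>
      <xsd:element name="agent" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType><xsd:sequence>
          <xsd:element name="name" type="xsd:string"/>
        </xsd:sequence></xsd:complexType>
      </xsd:element>
    </xsd:sequence></xsd:complexType>
  </xsd:element>
</xsd:schema>
"""


def cardinality(element: ET.Element) -> str:
    """Format an element declaration's occurrence range, applying XSD defaults."""
    low = element.get("minOccurs", "1")   # XSD default is 1
    high = element.get("maxOccurs", "1")  # XSD default is 1
    return f"{low}..{'*' if high == 'unbounded' else high}"


def cardinalities(xsd_text: str) -> dict:
    """Map each declared element name to its occurrence range."""
    root = ET.fromstring(xsd_text)
    return {
        el.get("name"): cardinality(el)
        for el in root.iter(f"{{{XSD_NS}}}element")
    }
```

Applied to the fragment, `agent` comes out as 0..* and `name` as 1..1, which is the mismatch with SIP12 that this issue reports.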
metsHdr/agent/name
The cardinality requirements for metsHdr/agent/name do not follow a consistent logic in relation to the cardinality of their metsHdr/agent. According to mets.xsd, metsHdr/agent/name cardinality is 1..1, a "conditional MUST," i.e. if a metsHdr/agent exists, it MUST have exactly one metsHdr/agent/name.
| metsHdr/agent | Cardinality | metsHdr/agent/name | Cardinality |
|---|---|---|---|
| SIP9 Archival creator agent | 0..1 MAY | SIP12 | 0..* MAY |
| SIP15 Submitting agent | 1..1 MUST | SIP18 | 1..1 MAY |
| SIP21 Contact person agent | 0..* MAY | SIP24 | 1..1 MUST |
| SIP26 Preservation agent | 0..1 MAY | SIP29 | 1..1 MAY |
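The "conditional MUST" described above can be tested directly on a METS document, independently of which SIP requirement an agent is meant to satisfy. This is a minimal sketch using the standard library; the function name `agents_without_single_name` is hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"


def agents_without_single_name(mets_xml: str) -> list:
    """Return (position, ROLE, name count) for each agent lacking exactly one name."""
    root = ET.fromstring(mets_xml)
    offenders = []
    for position, agent in enumerate(root.iter(f"{{{METS_NS}}}agent"), start=1):
        count = len(agent.findall(f"{{{METS_NS}}}name"))
        if count != 1:
            offenders.append((position, agent.get("ROLE"), count))
    return offenders
```

An empty result means every agent present carries exactly one name, which is what mets.xsd requires regardless of the MAY or MUST level attached to SIP12, SIP18, SIP24 and SIP29.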