dilcisboard / e-ark-sip

E-ARK SIP specification

Home Page: https://earksip.dilcis.eu/

License: Creative Commons Attribution 4.0 International

Ruby 2.48% Shell 97.52%
archiving specification standard submission information package sip oais

e-ark-sip's People

Contributors

carlwilson, dependabot[bot], drjaime, hsilva-keep, jmaferreira, karinbredenberg, kuldaraas, peterdalle, shsdev, tarvo3

e-ark-sip's Issues

`metsHdr/agent/name` is not 0..*, but 1..1

SIP12 metsHdr/agent/name is defined as 0..* MAY, but METS defines it as 1..1 (mets.xsd, lines 247-263):

<xsd:element name="metsHdr" minOccurs="0">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="agent" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="name" type="xsd:string">
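The mets.xsd excerpt above implies a simple rule: any metsHdr/agent that is present must carry exactly one name. The following is a minimal sketch of that check using only Python's standard library. The function name is hypothetical and the check does not come from any E-ARK tool. Full XSD validation would report the same problem.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

def agents_with_bad_name_count(mets_xml: str) -> list[int]:
    """Return the positions of metsHdr/agent elements that do not have
    exactly one <name> child (mets.xsd makes name 1..1 inside agent)."""
    root = ET.fromstring(mets_xml)
    ns = {"mets": METS_NS}
    bad = []
    for i, agent in enumerate(root.findall("mets:metsHdr/mets:agent", ns)):
        if len(agent.findall("mets:name", ns)) != 1:
            bad.append(i)
    return bad
```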

Duplicates #102

SIP32

Comment left in a mail to [email protected]

SIP32: content repetition “Follows the requirements in the CSIP profile. Follows the requirements in the CSIP profile.”

Check that the example appendix in the METS profile is correct and follows the latest version.

Several questions have been asked:

For SIP validation we use the commons-ip library available on GitHub. There are principally two versions of this library, 1.0.3 and 2.0.0-alpha1, and we noticed that the XSD schema for the structural validation of the XML is hard-coded. However, we did not understand which elements (at the level of the METS.xml file) make it possible to distinguish which version to use. Likewise, the XSD file used for validation is hard-coded ("/schemas/mets1_11.xsd" or "/schemas2/mets1_12.xsd"). After reading some documentation, we found that there are several versions of the METS schema (although some are quite old). Is there a way to configure which XSD to use?

Without changing the code and recompiling it, it is not possible to change the schema. For more information, I suggest opening an issue on https://github.com/keeps/commons-ip

We have retrieved the various SIPs stored in the different E-ARK projects on GitHub. Generally (but not always) we can validate them with version 1.0.3 or 2.0.0-alpha1, probably because they were generated with the same library. When we try to validate them with other tools, however, it does not work. For example, we tried the online application "https://eark.openpreservation.org/" but never got a successful validation.

Commons-IP is undergoing a major update, and a new validation module is being developed as we speak. The alpha version was still a long way from being finished. Please check again using the latest version available at https://github.com/keeps/commons-ip/releases

Please make sure you are experimenting with the correct versions of packages. For example, EARK v1 packages will not be valid according to the EARK v2 specification.

With these first tests we are not very confident about the validation process. Do you have any advice for us regarding the validation process? Here is an example of a check done in the commons-ip 1.0.3 library:
In the EARKUtils class there is a check on the LABEL attribute of the structMap element (method getEARKStructMap): it has to be either "Common Specification structural map" or "E-ARK structural map". That is why validation failed in our case (perhaps there are other issues as well). For information, I have attached to this mail a SIP (generated with the web application "https://earkweb.sydarkivera.se/earkweb/submission/overview") whose validation fails with the commons-ip library.

The structMap/@LABEL value "Common Specification structural map" was the value used in version 1 of the Common Specification (see https://dilcis.eu/images/Specifications/CS/Common_Specifications_for_IPs_v10.pdf).

Version 2 uses a different vocabulary: @LABEL is expected to be "CSIP" instead of "Common Specification structural map". The SIP specification does not change the inherited vocabulary, as it states:

  • The mandatory METS structural map element is intended to provide an overview of the components included in the package. It can also link elements of that structure to associated content files and metadata. In the CSIP the structMap describes the higher-level structure of all the content in the root and may link to existing representations.
  • The SIP specification does not change or extend any of the requirements already defined by the Common Specification for Information Packages (for more information see section 5.3.6 of the CSIP).
  • REF_CSIP_3 - Structural description of the package - The SIP structMap element should comply with structMap requirements in the CSIP profile.
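Because the two CSIP versions expect different structMap/@LABEL values, a validator could use the label as a rough hint of which version a package targets. The following is a minimal heuristic sketch that assumes only the two labels quoted in this thread. The mapping and function name are hypothetical, and this is not how commons-ip selects its schema.

```python
import xml.etree.ElementTree as ET
from typing import Optional

METS_NS = "http://www.loc.gov/METS/"

# structMap/@LABEL values quoted in this thread (hypothetical mapping).
LABEL_TO_VERSION = {
    "Common Specification structural map": "CSIP v1",
    "CSIP": "CSIP v2",
}

def guess_csip_version(mets_xml: str) -> Optional[str]:
    """Guess the CSIP version from the structMap LABEL; None if no label is recognised."""
    root = ET.fromstring(mets_xml)
    for struct_map in root.findall(f"{{{METS_NS}}}structMap"):
        version = LABEL_TO_VERSION.get(struct_map.get("LABEL"))
        if version is not None:
            return version
    return None
```

An unrecognised label returns None rather than a guess. A package that fails this heuristic still needs to be checked against the profile it actually claims to follow.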

I did not have access to the SIP created with EARK WEB so I can't comment on the reasons why it is not valid.

Are there any plans to verify eIDAS or other digital signature formats?

Not at the moment, as far as I know.

Are there plans to check long-term preservation formats (PDF, TIFF, etc.)?

No. EARK only cares about the packaging, not validating content. For content validation there are other tools that you may use, e.g. the ones developed under the PREFORMA project - http://www.preforma-project.eu/open-source-portal.html

After checking the source code, we see that it is not possible to deactivate the verification of fingerprints (checksums), or more generally to deactivate certain stages of the validation. Are any such code enhancements planned in the roadmap?

A new validation module is being developed under https://github.com/keeps/commons-ip. The final version should be released towards the end of October 2021.

In its current state, the validator validates all the requirements of the CSIP. There are no plans to disable certain requirements.
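The "fingerprint" verification mentioned in the question compares each file's content with the checksum declared for it in the METS document. The following is a minimal sketch that assumes the METS file/@CHECKSUM and file/@CHECKSUMTYPE attributes. It covers only a few CHECKSUMTYPE values. The function name is hypothetical, and the sketch is not how commons-ip implements the check.

```python
import hashlib

# A subset of the METS @CHECKSUMTYPE vocabulary mapped to hashlib algorithm names.
HASHLIB_NAMES = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-256": "sha256",
    "SHA-384": "sha384",
    "SHA-512": "sha512",
}

def checksum_matches(data: bytes, checksum: str, checksum_type: str) -> bool:
    """Compare file content with the CHECKSUM/CHECKSUMTYPE pair declared on a METS <file>."""
    algorithm = HASHLIB_NAMES.get(checksum_type)
    if algorithm is None:
        raise ValueError(f"unsupported CHECKSUMTYPE: {checksum_type}")
    digest = hashlib.new(algorithm, data).hexdigest()
    # Hex digests are compared case-insensitively.
    return digest == checksum.strip().lower()
```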

E-ARK-SIP 2.0.0

Issues related to version 2.0.0 of the EARK SIP specification

NOTETYPE's value not reflected in METS Extensions

Good day,
The SIP specification extends the attribute "NOTETYPE" with the value "IDENTIFICATIONCODE". However, this value is mentioned only in the External Vocabularies, not in the METS extensions themselves (neither CSIP nor SIP).
I would rather not specify which one it should be placed in, since you told me in the past that "NOTETYPE" belongs to the CSIP with a prefix.

The cardinality of METS element SIP12 assumes that there can only be one archival creator for a SIP (Cardinality 0..1)

Page 13, METS element SIP12 (Archival creator agent name): The cardinality of METS element SIP12 assumes that there can only be one archival creator for a SIP (Cardinality 0..1). However, in real life this is not always the case. To accommodate those situations the cardinality should be 0..*. (A real-life example that I came across recently is three independent and autonomous public authorities that use the same database installation. In this case I need to be able to have all three authorities as archival creators.)

SIP10

Comment left in a mail to [email protected]

SIP10: it is a bit confusing that the description of the role “ARCHIVIST” includes the term “creator”. The term could be confused with the role “CREATOR”, which is reserved for the submitting agent.

Agents indistinguishable from each other

Original post from @koit. Issue #98 was broken apart into 3 individual issues.

There is no explicit category identifier for these four SIP agents and no unique signature can be combined from @ROLE and @TYPE values.

Requirement | Cardinality | @ROLE | @TYPE
SIP9 Archival creator agent | 0..1 MAY | /full vocabulary allowed/ | ORGANIZATION, INDIVIDUAL
SIP15 Submitting agent | 1..1 MUST | /full vocabulary allowed/ | ORGANIZATION, INDIVIDUAL
SIP21 Contact person agent | 0..* MAY | CREATOR | INDIVIDUAL
SIP26 Preservation agent | 0..1 MAY | PRESERVATION | ORGANIZATION
CSIP10 Agent (creator software) | 1..n MUST | CREATOR | OTHER

For instance, an agent with @ROLE = "PRESERVATION" and @TYPE = "ORGANIZATION" could be considered SIP26 Preservation agent, but the same combination is also valid for SIP15 Submitting agent. For comparison, CSIP10 agent has a much clearer signature: @ROLE = "CREATOR", @TYPE = "OTHER", @OTHERTYPE = "SOFTWARE" and note/@csip:NOTETYPE="SOFTWARE VERSION".

A more serious problem is that any of these SIP agent attribute values are also valid for custom agents the user has added. In order to do meaningful compliance tests we need an explicit way to identify the E-ARK SIP agents.

One (not too elegant) way out of it might be to add a custom attribute:
metsHdr/agent/note/@sip:AGENTROLE = CREATOR | SUBMITTER | CONTACT | PRESERVER.

Note: mets.xsd vocabularies for @ROLE and @TYPE are:

  • mets/metsHdr/agent/@ROLE = CREATOR | EDITOR | ARCHIVIST | PRESERVATION | DISSEMINATOR | CUSTODIAN | IPOWNER | OTHER
  • mets/metsHdr/agent/@TYPE = INDIVIDUAL | ORGANIZATION | OTHER
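The ambiguity can be made concrete by matching an agent's @ROLE/@TYPE pair against the signatures in the table above. The following is a small sketch with the table encoded as data. The structure and function name are hypothetical. More than one match means a test cannot tell which requirement the agent is meant to satisfy.

```python
ROLES = {"CREATOR", "EDITOR", "ARCHIVIST", "PRESERVATION",
         "DISSEMINATOR", "CUSTODIAN", "IPOWNER", "OTHER"}

# (requirement, allowed @ROLE values or None for "full vocabulary", allowed @TYPE values)
SIGNATURES = [
    ("SIP9", None, {"ORGANIZATION", "INDIVIDUAL"}),
    ("SIP15", None, {"ORGANIZATION", "INDIVIDUAL"}),
    ("SIP21", {"CREATOR"}, {"INDIVIDUAL"}),
    ("SIP26", {"PRESERVATION"}, {"ORGANIZATION"}),
    ("CSIP10", {"CREATOR"}, {"OTHER"}),
]

def candidate_requirements(role: str, agent_type: str) -> list[str]:
    """Return every requirement whose @ROLE/@TYPE signature the agent matches."""
    if role not in ROLES:
        raise ValueError(f"@ROLE not in the mets.xsd vocabulary: {role}")
    matches = []
    for req_id, roles, types in SIGNATURES:
        role_ok = roles is None or role in roles
        if role_ok and agent_type in types:
            matches.append(req_id)
    return matches
```

For example, `candidate_requirements("PRESERVATION", "ORGANIZATION")` matches SIP9, SIP15 and SIP26. Only the CSIP10 signature, which has @TYPE "OTHER", resolves to a single requirement.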

SIP16

Comment received in mail to [email protected]

SIP16: The description of the submitting agent role reads “archival creator” instead of “submitting agent”.

SIP16 Submitting agent role
metsHdr/agent/
@ROLE
The role of the archival creator is “CREATOR”.

Path to extension schema

Given in profile:
Namespace: xmlns:csip=https://dilcis.eu/XML/METS/CSIPExtensionMETS
xmlns:sip=https://dilcis.eu/XML/METS/SIPExtensionMETS
Path: http://earksip.dilcis.eu/schema/DILCISExtensionSIPMETS.xsd

metsRootElementExample1:
xmlns:csip="https://dilcis.eu/XML/METS/CSIPExtensionMETS"
https://dilcis.eu/XML/METS/CSIPExtensionMETS https://dilcis.eu/XML/METS/CSIPExtensionMETS/DILCISExtensionMETS.xsd">

Appendix1:
xmlns:csip="https://dilcis.eu/XML/METS/CSIPExtensionMETS"
https://dilcis.eu/XML/METS/CSIPExtensionMETS https://dilcis.eu/XML/METS/CSIPExtensionMETS/DILCISExtensionMETS.xsd https://dilcis.eu/XML/METS/SIPExtensionMETS https://dilcis.eu/XML/METS/SIPExtensionMETS/DILCISExtensionSIPMETS.xsd

Result: The path to the schema works in the examples and the appendix.
Needed: In metsRootElementExample1, the namespace for SIP is needed, as well as the pointer to its schema. In Appendix 1 the namespace needs to be declared (xmlns:sip). In the element <related_profile>, the path to the CSIP profile is wrong.
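Both fixes requested here can be checked mechanically. Each extension namespace must be declared, and it must appear in xsi:schemaLocation paired with a schema URL. The following is a minimal sketch using Python's standard library. The namespace URIs are the ones given in the profile. The function names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
CSIP_NS = "https://dilcis.eu/XML/METS/CSIPExtensionMETS"
SIP_NS = "https://dilcis.eu/XML/METS/SIPExtensionMETS"

def declared_namespaces(mets_xml: str) -> set[str]:
    """Collect every namespace URI declared with xmlns or xmlns:prefix."""
    events = ET.iterparse(io.BytesIO(mets_xml.encode("utf-8")), events=("start-ns",))
    return {uri for _event, (_prefix, uri) in events}

def schema_locations(mets_xml: str) -> dict[str, str]:
    """Map each namespace listed in the root's xsi:schemaLocation to its schema URL."""
    root = ET.fromstring(mets_xml)
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    # xsi:schemaLocation is a whitespace-separated list of namespace/location pairs.
    return dict(zip(tokens[0::2], tokens[1::2]))

def extension_problems(mets_xml: str) -> list[str]:
    """Report extension namespaces that are undeclared or have no schema location."""
    declared = declared_namespaces(mets_xml)
    locations = schema_locations(mets_xml)
    problems = []
    for ns in (CSIP_NS, SIP_NS):
        if ns not in declared:
            problems.append(f"namespace not declared: {ns}")
        if ns not in locations:
            problems.append(f"no schema location for: {ns}")
    return problems
```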

There are nulls in the auto-generated markdown

In the profile, content like the following (I think the problem is the last paragraph, because it contains an anchor)

<requirement ID="SIP5" REQLEVEL="MAY" RELATEDMAT="VocabularyaltrecordIDTYPE" EXAMPLES="metsHdrElementExample1">
  <description>
    <head>Submission agreement</head>
    <p xmlns="http://www.w3.org/1999/xhtml">A reference to the Submission Agreement associated with the package.</p>
    <p xmlns="http://www.w3.org/1999/xhtml">@TYPE is used with the value "SUBMISSIONAGREEMENT".</p>
    <p xmlns="http://www.w3.org/1999/xhtml">Example: RA 13-2011/5329; 2012-04-12</p>
    <p xmlns="http://www.w3.org/1999/xhtml">Example: http://submissionagreement.kb.se/dnr331-1144-2011/20120711/</p>
    <p xmlns="http://www.w3.org/1999/xhtml">Note: It is recommended to use a machine-readable format for a better description of a submission agreement.</p>
    <p xmlns="http://www.w3.org/1999/xhtml">For example, the submission agreement developed by Docuteam GmbH <a href="http://www.loc.gov/standards/mets/profiles/00000041.xml">http://www.loc.gov/standards/mets/profiles/00000041.xml</a>
  </p>
  <dl xmlns="http://www.w3.org/1999/xhtml">
    <dt>METS XPath</dt>
    <dd>metsHDR/altrecordID</dd>
    <dt>Cardinality</dt>
    <dd>0..1</dd>
  </dl>
</description>
</requirement>

generates a null.

Also, if the controlled vocabulary is not present, a null is also generated. Example:

<requirement ID="SIP4" REQLEVEL="MUST" RELATEDMAT="VocabularyOAISPackageType" EXAMPLES="metsHdrElementExample1">
  <description>
    <head>OAIS Package type information</head>
    <p xmlns="http://www.w3.org/1999/xhtml">@csip:OAISPACKAGETYPE is used with the value "SIP".</p>
    <dl xmlns="http://www.w3.org/1999/xhtml">
      <dt>METS XPath</dt>
      <dd>metsHdr/@csip:OAISPACKAGETYPE</dd>
      <dt>Cardinality</dt>
      <dd>1..1</dd>
    </dl>
  </description>
</requirement>
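The issue does not show the generator's code, so the following is only a hypothetical sketch of the defensive pattern. It looks up each field of a requirement and writes an empty cell when a value is missing, instead of letting the value be serialised as null. The sketch assumes requirement elements without a namespace, as in the excerpts above, and a hypothetical dictionary of known vocabularies keyed by RELATEDMAT.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XHTML = "{http://www.w3.org/1999/xhtml}"

def dl_value(requirement: ET.Element, term: str) -> Optional[str]:
    """Return the <dd> text paired with <dt>term</dt>, or None if it is missing."""
    dl = requirement.find(f"description/{XHTML}dl")
    if dl is None:
        return None
    children = list(dl)
    for dt, dd in zip(children[0::2], children[1::2]):
        if (dt.text or "").strip() == term and dd.text:
            return dd.text.strip()
    return None

def markdown_row(requirement_xml: str, vocabularies: dict) -> str:
    """Render one requirement as a table row, writing empty cells instead of nulls."""
    req = ET.fromstring(requirement_xml)
    cells = [
        req.get("ID"),
        req.findtext("description/head"),
        dl_value(req, "METS XPath"),
        dl_value(req, "Cardinality"),
        # The lookup yields None when the RELATEDMAT vocabulary is not defined.
        vocabularies.get(req.get("RELATEDMAT")),
    ]
    return "| " + " | ".join("" if c is None else c for c in cells) + " |"
```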

General: All extra elements for SIP, AIP and DIP should be included in CSIP

In [our country] we have not created separate FGSs for SIP, AIP and DIP.

We recommend that all extra elements for SIP, AIP and DIP be included in CSIP, but as optional information. Documents would then be needed that describe how to create a SIP, AIP and DIP, how to use the elements at EU level, and how they can be adapted for local use in different countries.

Page 11 + Page 19: The sentence just before the table says: “The following table describes the main differences in the metsHdr between an EARK SIP and the CSIP.” The use of “main” is very confusing.

Page 11 + Page 19: The sentence just before the table says: “The following table describes the main differences in the metsHdr between an EARK SIP and the CSIP.” The use of “main” is very confusing. Are there more differences than what is shown in the table? It seems illogical not to include ALL differences in the table. If there are more differences than those shown in the table, it should be very clear where the remaining differences are described.

Mention of content information type specifications

This quote:

At the moment, there are 3 such specifications: - SIARD 2.0 for relational databases (The SIARD 2.0 specification for relational databases can be found at http://eark-project.com/resources/specificationdocs/32-specification-for-siard-format-v20)

mentioned here: https://github.com/DILCISBoard/E-ARK-SIP/blob/master/specification/01-introduction/index.md might not be entirely correct.

It is my belief that the SIARD 2.0 specification is not a content information type specification, even though we made amendments to SIARD so that the SIARD format could be adopted in a content information type specification.

Chapter 3.4

Comment left in a mail to [email protected]

3.4 states that “Although seldom used, preservation metadata can be included in an SIP”. I understand that this captures technical metadata (file format etc.), but I’m wondering whether it also includes actions/events on the SIP. Is it suggested to create a PREMIS event to mark the creation of the METS SIP, or is the metsHdr information considered sufficient for this purpose? If the latter is true, then I’m wondering what kind of PREMIS event the example under 5.2 (Example 4: Example of a whole METS document describing an submission information package with no representations) refers to.

Proposal of adding Retention/Disposal elements

Proposed element: Preservation status

Status for retention, disposal or preservation.

  • S=Save, preserve (swe. bevara),
  • DI=Disposal, retention (swe. gallra),
  • PA=PAused (swe. parkerad),
  • UN=UNknown (swe. okänt). 

Element type: String. Allowed values: S, DI, PA, UN

Proposed element: Preservation date.

Date after which disposal of the AIP shall occur if PreservationStatus="DI". That is, preservation shall only occur up to and including this date, i.e. the package shall be retained until this date. E.g. "2020-01-01" means that the AIP shall be destroyed (directly) after this date (or retained until this date, depending on one's viewpoint). Element type: date (YYYY-MM-DD). Shall be given if PreservationStatus="DI"; otherwise it is optional.

Proposed element: Preservation reference.

Law or other regulation that determines preservation, disposal or retention. E.g. "RA-FS 2030:12". Element type: Free text. Min 1 character. Max 255 characters
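Taken together, the three proposed elements carry a few checkable rules. The status comes from a closed list, and the date is mandatory for "DI" and must be YYYY-MM-DD. The reference must be 1 to 255 characters, and disposal becomes due only after the date, since the package is retained up to and including it. The following is a minimal sketch of those rules. The function names are hypothetical. The sketch also assumes that the reference is optional but length-limited when given, which the proposal does not state explicitly.

```python
import re
from datetime import date
from typing import Optional

STATUSES = {"S", "DI", "PA", "UN"}  # Save, DIsposal, PAused, UNknown
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

def validate_preservation(status: str, preservation_date: Optional[str],
                          reference: Optional[str]) -> list[str]:
    """Check the proposed PreservationStatus/Date/Reference rules; return error messages."""
    errors = []
    if status not in STATUSES:
        errors.append(f"invalid PreservationStatus: {status}")
    if preservation_date is None:
        if status == "DI":
            errors.append("PreservationDate is required when PreservationStatus is DI")
    elif not ISO_DATE.fullmatch(preservation_date):
        errors.append(f"PreservationDate is not YYYY-MM-DD: {preservation_date}")
    else:
        try:
            date.fromisoformat(preservation_date)
        except ValueError:
            errors.append(f"PreservationDate is not a valid date: {preservation_date}")
    # Assumption: the reference is optional, but must be 1-255 characters when given.
    if reference is not None and not 1 <= len(reference) <= 255:
        errors.append("PreservationReference must be 1-255 characters")
    return errors

def disposal_due(status: str, preservation_date: str, today: date) -> bool:
    """True only after the retention date: the package is kept up to and including it."""
    return status == "DI" and today > date.fromisoformat(preservation_date)
```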

Proposed element: Classification.

Security classification of information in package.

  • P=Publik (eng. public or declassified),
  • BS=Begränsat Skyddsvärde (eng. restricted),
  • HS=Högt Skyddsvärde (eng. confidential),
  • EJ KLASSAT=Ej Klassat, okänd klassning (eng. unclassified, unknown). 

Element type: String. Allowed values: P, BS, HS, EJ KLASSAT. "EJ KLASSAT" is the default.
Values could be extended to support different organisations' requirements.

Proposed element: Keywords.

A list of keywords used for searching. E.g. "TMJ, BB, varumärkesintrång, värde2variabel" or "TMJ BB varumärkesintrång värde2variabel". Element type: Free text. Max 20 words separated by (space) or (comma). Max 2047 characters.
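The Classification and Keywords proposals can be expressed as two small checks. The first falls back to the default classification when none is given. The second splits keywords on spaces and/or commas within the word and character limits. The following is a minimal sketch with hypothetical function names. It uses only the base classification values from the proposal, which notes that they could be extended. Note that "EJ KLASSAT" contains a space, so it is matched as a whole value and never split.

```python
import re
from typing import Optional

# Base values from the proposal; organisations could extend this set.
CLASSIFICATIONS = {"P", "BS", "HS", "EJ KLASSAT"}
DEFAULT_CLASSIFICATION = "EJ KLASSAT"

def classification_or_default(value: Optional[str]) -> str:
    """Return the given classification, or the default when none is provided."""
    if value is None:
        return DEFAULT_CLASSIFICATION
    if value not in CLASSIFICATIONS:
        raise ValueError(f"unknown classification: {value}")
    return value

def parse_keywords(text: str) -> list[str]:
    """Split a keyword string on spaces and/or commas, enforcing the proposed limits."""
    if len(text) > 2047:
        raise ValueError("keywords exceed 2047 characters")
    words = [w for w in re.split(r"[,\s]+", text) if w]
    if len(words) > 20:
        raise ValueError("more than 20 keywords")
    return words
```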

SIP9-31 agent ambiguities

SIP9 - SIP31 specify four types of metsHdr/agent but their distinction is insufficient to create useful tests.

Issue 1: Agents indistinguishable from each other

There is no explicit category identifier for these four SIP agents and no unique signature can be combined from @ROLE and @TYPE values.

Requirement | Cardinality | @ROLE | @TYPE
SIP9 Archival creator agent | 0..1 MAY | /full vocabulary allowed/ | ORGANIZATION, INDIVIDUAL
SIP15 Submitting agent | 1..1 MUST | /full vocabulary allowed/ | ORGANIZATION, INDIVIDUAL
SIP21 Contact person agent | 0..* MAY | CREATOR | INDIVIDUAL
SIP26 Preservation agent | 0..1 MAY | PRESERVATION | ORGANIZATION
CSIP10 Agent (creator software) | 1..n MUST | CREATOR | OTHER

For instance, an agent with @ROLE = "PRESERVATION" and @TYPE = "ORGANIZATION" could be considered SIP26 Preservation agent, but the same combination is also valid for SIP15 Submitting agent. For comparison, CSIP10 agent has a much clearer signature: @ROLE = "CREATOR", @TYPE = "OTHER", @OTHERTYPE = "SOFTWARE" and note/@csip:NOTETYPE="SOFTWARE VERSION".

A more serious problem is that any of these SIP agent attribute values are also valid for custom agents the user has added. In order to do meaningful compliance tests we need an explicit way to identify the E-ARK SIP agents.

One (not too elegant) way out of it might be to add a custom attribute:
metsHdr/agent/note/@sip:AGENTROLE = CREATOR | SUBMITTER | CONTACT | PRESERVER.

Note: mets.xsd vocabularies for @ROLE and @TYPE are:

  • mets/metsHdr/agent/@ROLE = CREATOR | EDITOR | ARCHIVIST | PRESERVATION | DISSEMINATOR | CUSTODIAN | IPOWNER | OTHER
  • mets/metsHdr/agent/@TYPE = INDIVIDUAL | ORGANIZATION | OTHER

Issue 2: metsHdr/agent/name is not 0..*, but 1..1

SIP12 metsHdr/agent/name is defined as 0..* MAY, but METS defines it as 1..1 (mets.xsd, lines 247-263):

<xsd:element name="metsHdr" minOccurs="0">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="agent" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="name" type="xsd:string">

Issue 3: Inconsistency in cardinality of metsHdr/agent/name

The cardinality requirements for metsHdr/agent/name do not follow a consistent logic in relation to the cardinality of their metsHdr/agent. According to mets.xsd, metsHdr/agent/name cardinality is 1..1, a "conditional MUST," i.e. if a metsHdr/agent exists, it MUST have exactly one metsHdr/agent/name.

metsHdr/agent | Cardinality | metsHdr/agent/name | Cardinality
SIP9 Archival creator agent | 0..1 MAY | SIP12 | 0..* MAY
SIP15 Submitting agent | 1..1 MUST | SIP18 | 1..1 MAY
SIP21 Contact person agent | 0..* MAY | SIP24 | 1..1 MUST
SIP26 Preservation agent | 0..1 MAY | SIP29 | 1..1 MAY
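Under the conditional MUST described above, a consistent profile would state 1..1 MUST for every name requirement, as SIP24 already does. The following is a small sketch, with the rows copied from the table, that flags the requirements deviating from that reading. The data layout and function name are hypothetical.

```python
# (agent requirement, agent cardinality, name requirement, name cardinality, name level)
ROWS = [
    ("SIP9", "0..1", "SIP12", "0..*", "MAY"),
    ("SIP15", "1..1", "SIP18", "1..1", "MAY"),
    ("SIP21", "0..*", "SIP24", "1..1", "MUST"),
    ("SIP26", "0..1", "SIP29", "1..1", "MAY"),
]

def inconsistent_name_rules(rows) -> list[str]:
    """Flag name requirements that differ from the conditional MUST (1..1 MUST) of mets.xsd."""
    return [name_req for _agent, _agent_card, name_req, card, level in rows
            if (card, level) != ("1..1", "MUST")]
```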
