dm2e-mappings's Issues

subject must be repeatable

Is there any reason why dc:subject is not repeatable?

Please correct that in the specs and validation tool. I can see no advantage in this decision.

Are unknown properties errors?

Should the validation rely on a strict closed world assumption and consider properties that are not defined by the DM2E data model as an error or a mere warning?

Right now, I collect all DatatypeProperties and ObjectProperties of the respective OWL file and iterate through all the properties in the data to check them. If a property is not in this whitelist, a WARNING is given.

But in most cases, these are actually errors, e.g.

<http://data.dm2e.eu/data/item/mpiwg/harriot/MPIWG:01QU84RT_982>:.
    [WARNING]   UNKNOWN_PROPERTY    http://purl.org/dc/terms/description
    [WARNING]   UNKNOWN_PROPERTY    http://purl.org/ontology/bibo/number/
    [WARNING]   UNKNOWN_PROPERTY    http://purl.org/dc/terms/title

Properties 1 and 3 should be dc elements, not dcterms; property 2 is a typo.

So I suggest making unknown properties errors.
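
The suggestion can be sketched roughly as follows. This is a minimal Python illustration, not the validator's actual code: `KNOWN_PROPERTIES` is a hypothetical stand-in for the whitelist collected from the OWL file, `check_properties` and its `strict` flag are made-up names, and the literal values in the sample triples are placeholders.

```python
# Hypothetical whitelist; the real one is collected from the OWL file.
KNOWN_PROPERTIES = {
    "http://purl.org/dc/elements/1.1/description",
    "http://purl.org/dc/elements/1.1/title",
    "http://purl.org/ontology/bibo/number",
}


def check_properties(triples, known=KNOWN_PROPERTIES, strict=True):
    """Return (level, code, property) tuples for predicates not in `known`.

    With strict=True unknown properties are reported as ERROR,
    otherwise as WARNING (the current behaviour described above).
    """
    level = "ERROR" if strict else "WARNING"
    findings = []
    seen = set()  # report each unknown property only once
    for _subject, predicate, _obj in triples:
        if predicate not in known and predicate not in seen:
            seen.add(predicate)
            findings.append((level, "UNKNOWN_PROPERTY", predicate))
    return findings


item = "http://data.dm2e.eu/data/item/mpiwg/harriot/MPIWG:01QU84RT_982"
triples = [
    (item, "http://purl.org/dc/terms/description", "placeholder"),
    (item, "http://purl.org/ontology/bibo/number/", "placeholder"),
    (item, "http://purl.org/dc/terms/title", "placeholder"),
]

for level, code, prop in check_properties(triples):
    print(f"[{level}]\t{code}\t{prop}")
```

Keeping the severity behind a single flag would make it easy to switch between the closed-world (ERROR) and the lenient (WARNING) behaviour while the question is still open.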

Issues with ABO validation tests for 1.1_Rev1.5-DRAFT

Here I collect issues I encounter while working with ABO mappings and the validator 1.1_Rev1.5-DRAFT.

  1. foaf:Organization not allowed as dc:subject --> plz fix
    [ERROR] INVALID_OBJECT_PROPERTY_RANGE http://data.dm2e.eu/data/agent/onb/authority_gnd/4029989-2

  2. foaf:Person not allowed as dc:subject --> plz fix
    [ERROR] INVALID_OBJECT_PROPERTY_RANGE http://data.dm2e.eu/data/agent/onb/authority_gnd/118838989

  3. bibo:Series not allowed as dc:type --> FATAL, plz fix
    [FATAL] INVALID_DC_TYPE http://purl.org/ontology/bibo/Series

  4. edm:Place as dc:subject throws error, plz fix
    [ERROR] INVALID_OBJECT_PROPERTY_RANGE http://data.dm2e.eu/data/place/onb/authority_gnd/4074987-3

----> In the meantime I suspect that the ranges for dc:subject are generally flawed -> OWL file?

Proposal: Make dc:format mandatory for WebResources referenced by edm:object

According to the EDM v.5.2.4 (http://pro.europeana.eu/documents/900548/0d0f6ec3-1905-4c4f-96c8-1d817c03123c) and the Europeana Portal Image Policy (http://pro.europeana.eu/documents/900548/960640/Europeana+Portal+Image+Policy), the WebResource referenced by edm:object in an ore:Aggregation should be:

  • the URL to an image representation of the CHO in the highest resolution available on the provider's web site
  • in an image format supported by the image processing library ImageMagick
  • at least 200px wide

Since this is the property that both Europeana Portal and DM2E will use for display in the respective search interfaces, it would simplify the thumbnail generation process if we knew what MIME type such a WebResource has.

Therefore I propose to make dc:format mandatory for WebResources that are related to an ore:Aggregation via the edm:object property.

The following MIME types should be supported:

  • image/png
  • image/jpeg
  • image/gif
  • image/tiff
  • application/pdf

PDF may only be used if the first page of the PDF is the representation of the CHO.
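
A check for this proposal could look roughly like the sketch below. It is only an illustration under assumptions: the function name and the codes `MISSING_DC_FORMAT` and `UNSUPPORTED_MIME_TYPE` are hypothetical and not part of the existing validator; only the list of MIME types is taken from the proposal above.

```python
# MIME types listed in the proposal above.
SUPPORTED_MIME_TYPES = {
    "image/png",
    "image/jpeg",
    "image/gif",
    "image/tiff",
    "application/pdf",
}


def check_edm_object_format(dc_format):
    """Check the dc:format value of a WebResource referenced by edm:object.

    Returns a (hypothetical) problem code, or None if the value is acceptable.
    `dc_format` is None when the WebResource has no dc:format at all.
    """
    if dc_format is None:
        return "MISSING_DC_FORMAT"
    if dc_format.strip().lower() not in SUPPORTED_MIME_TYPES:
        return "UNSUPPORTED_MIME_TYPE"
    return None


print(check_edm_object_format("image/jpeg"))
print(check_edm_object_format(None))
print(check_edm_object_format("text/html"))
```

The PDF restriction (first page must represent the CHO) cannot be verified from the MIME type alone and is left out of the sketch.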

Difficult to handle content for Annotation

Resources like the following

http://data.dm2e.eu/data/html/resourcemap/uber/dingler/page_pj022_pb150/1393515961557

have an Annotatable Version that is an HTML page.
It contains both text (transcription) and images (facsimile).
It would be VERY IMPORTANT to SHOW THEM IN THE DEMO.

However:
Images are enclosed in a viewer and text is somewhere in the HTML DOM.
Does this respect the specifications we agreed on? I would like to have links to images and texts, if possible separately, so I can aggregate them in Feed and annotate them.

Question: what is the way of connecting a transcription to its facsimile? Does the model support this?

Single pages without book?

The CHOs are single pages? No CHO (book) exists that contains them.
Is this correct?
How should I treat them?

Error when running validator

I am trying to run

java -jar dm2e-validate.jar -version "1.1_Rev1.4" -terse test.rdf

on a file successfully validated in http://www.w3.org/RDF/Validator/rdfval

and I only get the following error message:

! Jena croaked on file /media/sf_devel/test.rdf. Are you sure it is 'RDF/XML'. http://RELATIVE_URL/ Code: 11/LOWERCASE_PREFERRED in HOST: lowercase is preferred in this component

Please find the content of the file here http://wiki.dm2e.eu/wiki/images/5/51/Validate_Test.zip

Untyped literal in dcterms:created

dm2e-validation.jar NOTICEs:

has untyped literal for property .

The values in the file are 2014-01-09T23:16:06Z, which are valid xsd:dateTime.

The DM2E spec lists edm:TimeSpan, xsd:dateTime and rdf:Literal as possible ranges for dcterms:created.

Why is it NOTICEd? Or what is a better notation?
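
One possible reading is that the literal simply carries no explicit datatype. A typed literal in Turtle/N-Triples notation attaches the datatype IRI after `^^` (in RDF/XML this corresponds to an `rdf:datatype` attribute on the property element). The sketch below is only an illustration: the helper name is hypothetical, it handles only the UTC form shown above, and whether the validator stops emitting the NOTICE for such a literal is an assumption.

```python
from datetime import datetime

XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def typed_datetime_literal(value):
    """Return `value` as a typed literal in Turtle/N-Triples notation.

    Only the form YYYY-MM-DDThh:mm:ssZ is accepted here; anything else
    raises ValueError. (xsd:dateTime allows more forms than this sketch.)
    """
    datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")  # validates the lexical form
    return f'"{value}"^^<{XSD_DATETIME}>'


print(typed_datetime_literal("2014-01-09T23:16:06Z"))
```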

How to handle bibo:number for CHOs that don't have them?

bibo:number is mandatory and has datatype xsd:int.

When a mapping encounters a page without a page number, what should data providers set as a value here? They must set some value and it must be a number.

Probably -1 is a good idea though this will break the naive find-the-first-page algorithm.

Otherwise bibo:number should not be mandatory.
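
To illustrate the problem with -1: a naive "smallest bibo:number wins" lookup picks the unnumbered page, so consumers would have to filter the sentinel out explicitly. The following is a minimal sketch with hypothetical page identifiers and function names; the sentinel value is the one suggested above, not something defined by the spec.

```python
NO_PAGE_NUMBER = -1  # sentinel suggested in this issue, not part of the spec


def naive_first_page(pages):
    """Pick the page with the smallest bibo:number, sentinel included."""
    return min(pages, key=lambda page: page[1])[0]


def first_page(pages):
    """Pick the page with the smallest real bibo:number.

    `pages` is a list of (uri, bibo_number) pairs. Pages carrying the
    sentinel are ignored; returns None if no page has a real number.
    """
    numbered = [(number, uri) for uri, number in pages if number != NO_PAGE_NUMBER]
    return min(numbered)[1] if numbered else None


# Hypothetical pages of one book; page_b has no printed page number.
pages = [("page_a", 2), ("page_b", NO_PAGE_NUMBER), ("page_c", 1)]

print(naive_first_page(pages))  # picks the unnumbered page
print(first_page(pages))        # picks the lowest real page number
```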

How to find the annotatable version and facsimile image?

If I encounter a CHO with dm2e:displayLevel "true"^^xsd:boolean whose corresponding ore:Aggregation has neither edm:isShownBy, edm:object nor dm2e:hasAnnotatableVersion, what should be displayed as

  • the annotatable version
  • the thumbnail?

Should I skip those links altogether?

Should I select the first page for annotation/as thumbnail? How do I find the first page?

Cannot understand the correct version to check!

I cannot tell which version of the dataset is the correct one; some are empty, some have messy data.

Please delete the old datasets if possible! Or clearly name the latest version.

dm2e:incipit & dm2e:explicit should be repeatable

Just came across such a case:

  <dm2e:incipit>Prospero prudente constate felyce ... [Prolog]</dm2e:incipit>
  <dm2e:incipit>Entonçes se apareja la cosa ...</dm2e:incipit>
  <dm2e:explicit>... Fable la obra que el dezir se çierra. [Prolog]</dm2e:explicit>
  <dm2e:explicit>... ny demandarles cosas non acostumbradas.</dm2e:explicit>
