dm2e-mappings's Issues
subject must be repeatable
Is there any reason why dc:subject is not repeatable?
Please correct that in the specs and validation tool. I can see no advantage in this decision.
ShownAt links broken?
The shownAt link seems broken, as it lands on an [Empty Page], while the AnnotatableVersion does contain text.
See for example this one:
http://data.dm2e.eu/data/html/resourcemap/mpiwg/harriot/MPIWG_01QU84RT_00793/1393267718023
wrong dc:format?
http://data.dm2e.eu/data/rdf/resourcemap/uib/wab/Ms-114/Ms-114%2C145r%5B2%5Det145r%5B3%5D/1391118905391
Here the format is a resource:
dc:format http://onto.dm2e.eu/schemas/dm2e/1.1/mime-types/text/html-named-content .
In other datasets a label was used instead (html-pundit-content). What is the correct value?
Validator should detect unknown URIs used in dc:type
Currently the validator allows arbitrary URI resources as objects in triples of the form
?cho dc:type ?dc_type
?dc_type must be one of the enumerated URI values listed in 3.7.
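A minimal sketch of the check requested here, assuming the data is available as (subject, predicate, object) string tuples. The values in ALLOWED_DC_TYPES are placeholders, not the real enumeration from section 3.7, and the function name is hypothetical.

```python
DC_TYPE = "http://purl.org/dc/elements/1.1/type"

# Placeholder URIs for illustration only; the real list is the one in section 3.7 of the spec.
ALLOWED_DC_TYPES = {
    "http://example.org/type/Book",
    "http://example.org/type/Page",
}


def find_invalid_dc_types(triples, allowed=ALLOWED_DC_TYPES):
    """Return (subject, object) pairs whose dc:type object is not in the allowed set."""
    return [(s, o) for s, p, o in triples if p == DC_TYPE and o not in allowed]
```

Each returned pair could then be reported with whatever severity the validator assigns to an unknown dc:type.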
Unofficial namespace?
The data at:
http://data.dm2e.eu/data/html/dataset/ub-ffm/sammlungen
is now using an "UNOFFICIAL" namespace.
Also, the datasets are empty.
WAB is Empty!
Are there images in these datasets?
Are there images somewhere?
I understand CHOs are transcriptions of manuscript pages. Did you also provide facsimiles of such pages?
Are unknown properties errors?
Should the validation rely on a strict closed world assumption and consider properties that are not defined by the DM2E data model as an error or a mere warning?
Right now, I collect all DatatypeProperties and ObjectProperties of the respective OWL file and check every property used in the data against them. If a property is not in this whitelist, a WARNING is given.
But in most cases, these are actually errors, e.g.
<http://data.dm2e.eu/data/item/mpiwg/harriot/MPIWG:01QU84RT_982>:.
[WARNING] UNKNOWN_PROPERTY http://purl.org/dc/terms/description
[WARNING] UNKNOWN_PROPERTY http://purl.org/ontology/bibo/number/
[WARNING] UNKNOWN_PROPERTY http://purl.org/dc/terms/title
Properties 1 and 3 should be dc elements, not dcterms; 2 is a typo.
So I suggest making unknown properties errors.
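A minimal sketch of the whitelist check described above, with the severity as a parameter so that switching from WARNING to ERROR is a one-word change. It assumes triples are (subject, predicate, object) string tuples and that the whitelist has already been collected from the OWL file; the function name is hypothetical and this is not the validator's actual code.

```python
def check_properties(triples, known_properties, severity="ERROR"):
    """Return one message per distinct predicate missing from the whitelist, in first-seen order."""
    seen = set()
    messages = []
    for _subject, predicate, _obj in triples:
        if predicate not in known_properties and predicate not in seen:
            seen.add(predicate)
            messages.append(f"[{severity}] UNKNOWN_PROPERTY {predicate}")
    return messages
```

With severity="WARNING" this reproduces the current behaviour; the default shown here is the stricter one proposed in this issue.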
Data provider Resource too Big
I cannot consume resources like
http://data.dm2e.eu/data/agent/ub-ffm/sammlungen/Universit%C3%A4tsbibliothek_JCS_Frankfurt_am_Main
They take ages to load as they contain a lot of triples.
However, I need a way of getting at least the label of the content provider.
Would it be possible to put the relevant triple in the representation of the CHO, so I do not have to follow this link?
What other alternatives are there?
Different kind of Pages...
http://data.dm2e.eu/data/html/dataset/uib-ffm/sammlungen/1390576653139
Here the contained CHOs are Pages; there is no Book level: http://data.dm2e.eu/data/html/resourcemap/uib-ffm/sammlungen/urn:nbn
Note that the rdf:type used is http://purl.org/spar/fabio/#Page and not dm2e:1.0/Page as in other datasets.
Rev1.4: has unknown property owl#sameAs
When I check my data with the latest validation-tool with -version 1.1_Rev1.4, the following error appears:
http://data.dm2e.eu/data/agent/bbaw/authority_gnd/1018099549:
[ERROR] has unknown property http://www.w3.org/2002/07/owl#sameAs
But according to the spec, it is optional (p. 53).
Wrong namespace?
Empty datasets
Issues with ABO validation tests for 1.1_Rev1.5-DRAFT
Here I collect issues I encounter while working with ABO mappings and the validator 1.1_Rev1.5-DRAFT
- foaf:Organization not allowed as dc:subject --> plz fix
  [ERROR] INVALID_OBJECT_PROPERTY_RANGE http://data.dm2e.eu/data/agent/onb/authority_gnd/4029989-2
- foaf:Person not allowed as dc:subject --> plz fix
  [ERROR] INVALID_OBJECT_PROPERTY_RANGE http://data.dm2e.eu/data/agent/onb/authority_gnd/118838989
- bibo:Series not allowed as dc:type --> FATAL, plz fix
  [FATAL] INVALID_DC_TYPE http://purl.org/ontology/bibo/Series
- edm:Place as dc:subject throws error, plz fix
  [ERROR] INVALID_OBJECT_PROPERTY_RANGE http://data.dm2e.eu/data/place/onb/authority_gnd/4074987-3
----> In the meantime I suspect that the ranges for dc:subject are generally flawed -> OWL file?
BLOCKING: No dc:format for the AnnotatableVersion
Example:
http://data.dm2e.eu/data/rdf/resourcemap/mpiwg/harriot/MPIWG_01QU84RT_00817/1393267718023
There is no triple specifying the format.
What should the format be? text/html?
Gei Digital is Empty!
http://data.dm2e.eu/data/html/dataset/gei/gei-digital/1391717717253
... or am I looking at the wrong version? Please delete old versions!
rdaGr2:otherDesignationAssociatedWithTheCorporateBody not in the validator
[ERROR] UNKNOWN_PROPERTY http://rdvocab.info/ElementsGr2/otherDesignationAssociatedWithTheCorporateBody
This property is already part of Rev1.3
Empty dataset
dm2e:displayLevel="true"^^xsd:boolean should imply the existence of dc:title
Everything that should show up in the search results should have a title.
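A rough sketch of how such a rule could be checked, assuming triples are (subject, predicate, object) tuples written with prefixed names, that dm2e:displayLevel sits on the ore:Aggregation, and that the aggregation is linked to its CHO via edm:aggregatedCHO. The function name is hypothetical.

```python
def displayed_chos_without_title(triples):
    """Return the CHOs whose aggregation has displayLevel "true" but which carry no dc:title."""
    shown_aggregations = {
        s for s, p, o in triples if p == "dm2e:displayLevel" and o == "true"
    }
    aggregation_to_cho = {s: o for s, p, o in triples if p == "edm:aggregatedCHO"}
    titled = {s for s, p, _o in triples if p == "dc:title"}
    return sorted(
        aggregation_to_cho[a]
        for a in shown_aggregations
        if a in aggregation_to_cho and aggregation_to_cho[a] not in titled
    )
```

Aggregations without an edm:aggregatedCHO link are ignored here; that case is a separate problem.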
Why is edm:relatedTo a Literal
Proposal: Make dc:format mandatory for WebResources referenced by edm:object
According to the EDM v.5.2.4 (http://pro.europeana.eu/documents/900548/0d0f6ec3-1905-4c4f-96c8-1d817c03123c) and the Europeana Portal Image Policy (http://pro.europeana.eu/documents/900548/960640/Europeana+Portal+Image+Policy), the WebResource referenced by edm:object in an ore:Aggregation should be:
- the URL to an image representation of the CHO in the highest resolution available on the provider's web site
- in an image format supported by the image processing library ImageMagick
- at least 200px wide
Since this is the property that both Europeana Portal and DM2E will use for display in the respective search interfaces, it would simplify the thumbnail generation process if we knew what MIME type such a WebResource has.
Therefore I propose to make dc:format mandatory for WebResources that are related to an ore:Aggregation using the edm:object property.
The following MIME types should be supported:
- image/png
- image/jpeg
- image/gif
- image/tiff
- application/pdf
PDF may only be used if the first page of the PDF is the representation of the CHO.
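The proposed rule could be checked roughly as follows. This is a sketch under assumptions: triples are (subject, predicate, object) string tuples, and dc:format values are plain MIME-type strings (some datasets use URI resources for dc:format instead, which would need a mapping step first). The function name is hypothetical.

```python
EDM_OBJECT = "http://www.europeana.eu/schemas/edm/object"
DC_FORMAT = "http://purl.org/dc/elements/1.1/format"

# The MIME types listed in the proposal above.
SUPPORTED_MIME_TYPES = {
    "image/png",
    "image/jpeg",
    "image/gif",
    "image/tiff",
    "application/pdf",
}


def check_edm_object_formats(triples):
    """Return (web_resource, problem) pairs for edm:object targets with no or unsupported dc:format."""
    formats = {}
    for s, p, o in triples:
        if p == DC_FORMAT:
            formats.setdefault(s, set()).add(o)

    problems = []
    for _aggregation, p, web_resource in triples:
        if p != EDM_OBJECT:
            continue
        found = formats.get(web_resource, set())
        if not found:
            problems.append((web_resource, "missing dc:format"))
        elif not found & SUPPORTED_MIME_TYPES:
            problems.append((web_resource, "unsupported dc:format"))
    return problems
```

The PDF first-page condition cannot be verified from the metadata alone and is not covered by this sketch.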
Difficult to handle content for Annotation
Resources like the following
http://data.dm2e.eu/data/html/resourcemap/uber/dingler/page_pj022_pb150/1393515961557
have an Annotatable Version that is an HTML page.
It contains both text (transcription) and images (facsimile).
It would be VERY IMPORTANT to SHOW THEM IN THE DEMO.
However:
Images are enclosed in a viewer and text is somewhere in the HTML DOM.
Does this respect the specifications we agreed on? I would like to have links to images and texts, if possible separately, so I can aggregate them in Feed and annotate them.
Question: what is the way of connecting a transcription to its facsimile? Does the model support this?
Dates are difficult to use
Representing dates as resources is OK.
Example:
http://data.dm2e.eu/data/timespan/onb/abo/1763-01-01T000000UG_1763-12-31T235959UG
But consuming this is very heavy, as I need to fetch a full RDF resource with a lot of links that I do not need.
Would it be possible to also include the date as a Literal in the RDF description of, e.g., a person?
problems with namespaces
Here the ? indicates a broken namespace, I guess:
http://data.dm2e.eu/data/html/agent/onb/authority_gnd/100952798
[WARNING] is missing strongly recommended foobar
Please change [WARNING]s, which are based on recommendations, to [INFO].
That's much more friendly.
(Thanks @kba! Finished. And announce updates.)
has specific Problem: Aggregation contains both edm:isShownAt and edm:isShownBy.
First: there are no problems, there are only challenges!
What does it mean?
In the case of dta, edm:isShownAt, edm:isShownBy and edm:object point to different resources, according to the spec.
If there is a problem, the spec is too complex.
[FATAL] is missing required property <http://purl.org/dc/elements/1.1/format>
[FATAL] is missing required property (Condition: Annotatable WebResource)
Not every WebResource is annotatable, e.g. the WebResource for a book points to a landing page. I am not sure about this; the validation seems too strict.
I fixed it in my code anyway.
500 server error
no dc:format data ingested (fatal error)
Example:
http://data.dm2e.eu/data/html/resourcemap/uber/dingler/page_pj022_pb150/1393515961557
There is no dc:format for the annotatable resource.
missing format for the annotatable object
http://data.dm2e.eu/data/rdf/resourcemap/uib-ffm/sammlungen/urn:nbn:de:hebis:30:2-11683-phys1971057/1390576653139
There is no dc:format specified for the Annotatable object!
Single pages without book?
CHOs are single pages? No CHO (book) exists that contains them.
Is this correct?
How should I treat them?
Error when running validator
I am trying to run
java -jar dm2e-validate.jar -version "1.1_Rev1.4" -terse test.rdf
on a file successfully validated in http://www.w3.org/RDF/Validator/rdfval
and I only get the following error message:
! Jena croaked on file /media/sf_devel/test.rdf. Are you sure it is 'RDF/XML'. http://RELATIVE_URL/ Code: 11/LOWERCASE_PREFERRED in HOST: lowercase is preferred in this component
Please find the content of the file here http://wiki.dm2e.eu/wiki/images/5/51/Validate_Test.zip
Untyped literal in dcterms:created
dm2e-validation.jar NOTICEs:
has untyped literal for property .
The values in the file are 2014-01-09T23:16:06Z, which are valid xsd:dateTime.
The DM2E spec lists edm:TimeSpan, xsd:dateTime, rdf:Literal as possible ranges for dcterms:created.
Why is it NOTICEd? Or what is a better notation?
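One possible notation is an explicitly typed literal. The sketch below checks that a value has the form used in the file (UTC with a trailing Z) and renders it in Turtle/N-Triples syntax with the xsd:dateTime datatype. Whether this silences the NOTICE is an assumption, and the function name is hypothetical.

```python
from datetime import datetime

XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def typed_datetime_literal(value):
    """Validate a 'YYYY-MM-DDThh:mm:ssZ' string and return it as a typed literal."""
    # Deliberately narrow: xsd:dateTime allows more forms (fractional seconds,
    # numeric offsets) than this pattern accepts. Raises ValueError on mismatch.
    datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return f'"{value}"^^<{XSD_DATETIME}>'
```

In RDF/XML the equivalent would be an rdf:datatype attribute on the property element carrying the same datatype URI.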
How to handle bibo:number for CHOs that don't have them?
bibo:number is mandatory and has datatype xsd:int.
When the mapping encounters a page without a page number, what should data providers set as the value here? They must set some value and it must be a number.
Probably -1 is a good idea, though this will break the naive find-the-first-page algorithm.
Otherwise bibo:number should not be mandatory.
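To illustrate the concern, here is a small sketch of a naive first-page lookup and a variant that skips the sentinel. It assumes pages are dicts with a "number" key holding the bibo:number value; the names and data layout are illustrative only.

```python
def first_page_naive(pages):
    """Pick the page with the smallest bibo:number; a -1 sentinel wins by accident."""
    return min(pages, key=lambda page: page["number"])


def first_page_skipping_unnumbered(pages, sentinel=-1):
    """Ignore pages carrying the sentinel; return None if no page is numbered."""
    numbered = [page for page in pages if page["number"] != sentinel]
    if not numbered:
        return None
    return min(numbered, key=lambda page: page["number"])
```

Every consumer would have to know about the sentinel convention, which is the argument for making bibo:number optional instead.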
Error in namespace parsing
Link: http://data.dm2e.eu/data/html/list
Question marks in place of URL prefixes usually denote some malfunction (e.g. ?:hasCollection)
Link: http://data-worker.dm2e.hu-berlin.de/data/html/resourcemap/bbaw/dta/16157/1391002599515
A namespace is broken: ?:author (http://purl.org/spar/pro/author)
dm2e:genre should not be a subtype of dc:description
BLOCKING: Missing Aggregation-to-Item connection
Example:
http://data.dm2e.eu/data/rdf/resourcemap/mpiwg/harriot/MPIWG_KECMZ5BH_00003/1393267718023
In this RDF I see no aggregatedCHO property connecting the Item (CHO) with the Aggregation. This is mandatory!
How to find the annotatable version and facsimile image?
If I encounter a CHO with dm2e:displayLevel "true"^^xsd:boolean whose corresponding ore:Aggregation has neither edm:isShownBy, edm:object nor dm2e:hasAnnotatableVersion, what should be displayed as
- the annotatable version
- the thumbnail?
Should I skip those links altogether?
Should I select the first page for annotation/as thumbnail? How do I find the first page?
Missing edm:object
Example:
http://data.dm2e.eu/data/html/resourcemap/onb/abo/%2BZ103504606/1391106413338
There is no edm:object triple. This is needed to provide a preview of the book.
Isn't it mandatory in the mapping?
Broken resources
Remove fabio:Page
http://purl.org/spar/fabio/#Page is wrong, you should use dm2e:Page!
Use xsd:int for bibo:number, dm2e:levelOfHierarchy and bibo:numPages
- bibo:number
- dm2e:levelOfHierarchy
- bibo:numPages
dm2e:levelOfHierarchy should be in ore:Aggregation
dm2e:displayLevel has ore:Aggregation as its domain; dm2e:levelOfHierarchy should be there as well.
Cannot understand the correct version to check!
I cannot tell which is the correct version of the dataset; some are empty, some have messy data.
Please delete the old datasets if possible! Or clearly name the latest version.
Why do crm:P79F.beginning_is_qualified_by and crm:P80F.end_is_qualified_by have xsd:dateTime as range?
missing bibo namespace ?
Example:
http://data.dm2e.eu/data/html/resourcemap/bbaw/dta/16168_f0004/1391002599515
In this RDF the URL http://purl.org/ontology/bibonumber seems wrong.
I guess the correct one is http://purl.org/ontology/bibo/number
Is the dingler dataset empty?
bibo:Issue is missing in the DM2E Model Specs as allowed PhysicalThing subtype
In "3.7 edm:PhysicalThing: Subclasses" there is no class defined for an issue of a serial work. This should be bibo:Issue, I think.
Books without pages and author
http://141.20.126.232/data/html/resourcemap/onb/abo/%2BZ101315405/1391106413338
Is this correct: http://141.20.126.232/data/... ? Shouldn't it be a domain name?
There is no author triple. Is this correct?
The rdf:type is Book but it does not contain pages. Is it correct? What should I annotate then?
Missing Book type...
http://data.dm2e.eu/data/html/resourcemap/uib/wab/Ms-114/1391118905391
There is no rdf:type that allows me to assume this is a Book and contains pages (perhaps Book was used as a type in other datasets).
dm2e:incipit & dm2e:explicit should be repeatable
Just came across such a case:
<dm2e:incipit>Prospero prudente constate felyce ... [Prolog]</dm2e:incipit>
<dm2e:incipit>Entonçes se apareja la cosa ...</dm2e:incipit>
<dm2e:explicit>... Fable la obra que el dezir se çierra. [Prolog]</dm2e:explicit>
<dm2e:explicit>... ny demandarles cosas non acostumbradas.</dm2e:explicit>