The discovery from iiif

Look at first sentence of Intro, users don't use this spec

Link in readme.md doesn't go anywhere

The link in the readme for 'Working Draft of an Import specification' links to:

https://github.com/IIIF/iiif.io/discovery/blob/master/source/api/image/0.1/index.md

Which doesn't exist. Where should it link to?

Should the Crawling spec mention the Discovery TSG more prominently?

The current draft at http://preview.iiif.io/api/discovery/api/discovery/0.1/ doesn't mention our technical spec group. I'm not sure whether there is an official policy about IIIF specs (not) referring to their community context, but I think it could be helpful:

for giving context. In fact, referring to the charter and the use cases can be a good way to motivate some choices without writing too much new text as motivation
for calling to action: rather than just giving a pointer to the mailing list, commenters could be invited to join our group - hopefully resulting in better community engagement. In a way this would extend what @azaroth42 started to do in the 'status warning' when hinting at current issues.
for attribution: if we want to trigger more groups like ours in the future, it's good that one can see them mentioned in what they helped build.

Basically we could look at how W3C specs do it. See the 'status of this document' at https://www.w3.org/TR/annotation-model/ (selected completely randomly of course ;-) )

Per-Collection discovery?

If a discovery service is to provide collection level access (e.g. search for manifests within a particular collection), then the inclusion of the manifest within the collection needs to be explicit. There are three ways to accomplish this, each of which has some ramifications on specs:

The Manifest is responsible for maintaining all of the Collections it is within. This is impossible, as the system is not closed and the Manifest publisher may not know all of the Collections.
The Collection is responsible for maintaining all of its Manifests ... which it already is, but we have only scoped discovery in terms of Manifests, not Collections. Should we have a separate stream for the changes made to Collections?
The activity of adding and removing Manifests from Collections is exposed directly. Again the question is where these activities are published.

1 seems out of the question. 2 and 3 seem equivalent, though 3 is more fine grained and doesn't require re-harvesting the Collection just to add a new member to it.
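Option 3 could be sketched with the Activity Streams `Add` and `Remove` activity types, where the Manifest is the `object` and the Collection is given as `target` (for `Add`) or `origin` (for `Remove`). This is only an illustration under assumptions: the helper name `membership_activity`, the example.org URIs and the choice of `endTime` as the timestamp are hypothetical, and none of this is defined by the current draft.

```python
from typing import Any, Dict


def membership_activity(kind: str, manifest_id: str,
                        collection_id: str, when: str) -> Dict[str, Any]:
    """Build an Add or Remove activity for a Manifest's Collection membership."""
    if kind not in ("Add", "Remove"):
        raise ValueError("kind must be 'Add' or 'Remove'")
    # Assumption: the Collection goes in `target` for Add and `origin` for Remove.
    container_key = "target" if kind == "Add" else "origin"
    return {
        "type": kind,
        "object": {"id": manifest_id, "type": "Manifest"},
        container_key: {"id": collection_id, "type": "Collection"},
        "endTime": when,
    }


added = membership_activity(
    "Add",
    "https://example.org/iiif/id1/manifest",      # hypothetical URI
    "https://example.org/iiif/collection/maps",   # hypothetical URI
    "2018-03-10T10:00:00Z",
)
```

A harvester that sees such an activity would only need to update one membership record, rather than re-harvesting the whole Collection as in option 2.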

Design in-html representation of link to Presentation resource

e.g. RDFa, JSON-LD or microdata syntax within HTML to link from any web page to a Manifest or Collection that is related to the page. Clients can then follow the link and display the resource in a IIIF viewer.

See: https://groups.google.com/d/msg/iiif-discuss/JrmXLcfvWfk/6KfYrDiHJLAJ
And: https://developers.google.com/custom-search/docs/structured_data?hl=en

From IIIF/api#557

Improving introduction for Level 0

This issue addresses the action from the call of 2018-02-07: Rob (or Antoine) will improve the introduction to level 0.

I'm hereby suggesting a small re-write of this introduction:

removing a "just" which I felt minimized a bit too much the effort asked of people only interested in basic crawling.
re-ordering the last paragraph to mention the other items in the charter, so as to bring more motivation for the extra complexity.

Here it is:

The basic information required, in order to provide a minimally effective set of links to Manifests to harvest is just the URIs of the Manifests. However, with the addition of a little boilerplate in the JSON, we can be on the path towards a robust set of metadata that allows clients to optimize their harvesting.

Starting with the Manifest links, we add an "Update" activity wrapper around the URIs. The order of the Manifests in the pages is unimportant, but each should only appear once.

In terms of optimization, Level 0 provides no additional benefit over any other simpler list format, but is compatible with a system where optimization is possible, such as the following levels, which bring significant improvements in terms of efficiency. This is also the minimum level for interoperability on the way to realizing a complete and homogeneous framework that addresses more items from the IIIF Discovery Technical Specification Group Charter [http://iiif.io/community/groups/discovery/charter/#introduction].
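As a rough illustration of the "little boilerplate" described in the suggested text, the following Python sketch wraps a plain list of Manifest URIs into a Level 0 style page. The function name `level0_page` and the example.org URIs are hypothetical; the `orderedItems`, `Update`, `object`, `id` and `type` names follow the usage discussed in the draft, and `@context` is left out because it is still under discussion.

```python
import json
from typing import Any, Dict, List


def level0_page(page_id: str, manifest_uris: List[str]) -> Dict[str, Any]:
    """Wrap Manifest URIs in Update activities; order is unimportant at Level 0."""
    seen = set()
    items = []
    for uri in manifest_uris:
        if uri in seen:
            # Each Manifest should appear only once.
            continue
        seen.add(uri)
        items.append({
            "type": "Update",
            "object": {"id": uri, "type": "Manifest"},
        })
    return {
        "id": page_id,
        "type": "OrderedCollectionPage",
        "orderedItems": items,
    }


page = level0_page(
    "https://example.org/activity/page-0",        # hypothetical URI
    [
        "https://example.org/iiif/id1/manifest",
        "https://example.org/iiif/id2/manifest",
        "https://example.org/iiif/id1/manifest",  # duplicate, dropped
    ],
)
print(json.dumps(page, indent=2))
```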

Document the cut/paste JSON structure

Based on #18.

Notifications about negotiable resources

The same resource URI might be an access point to both V2 and V3 resources, but only the V3 representation changes for a particular update. Is it important that this be possible to express?
If so, how would we do it? And how would the crawler know to request V3 via content negotiation at all?

[Per Andy Irving at Edinburgh]

Canonical URI for negotiable resources

Add a property (as per the usage in the W3C Web Annotation model) to expose the canonical uri for an object that might have a different URL (viz the Content Location as per the explicit recommendation in #46).

https://www.w3.org/TR/annotation-model/#other-identities

Recommendation on how to reference/describe an IIIF activity stream from a DCAT Distribution?

#38 discusses the possibility of referring to DCAT descriptions of the dataset covered by an Activity Stream (AS).

If a IIIF provider creates a DCAT description of a IIIF dataset/AS, how should the IIIF Change Discovery API endpoint be referenced and described?
My first thought is that an AS is a dcat:Distribution of a dcat:Dataset.
Should the Discovery Group prepare a recommendation?

But, in fact, I'm currently convinced that whether or not DCAT descriptions are referenced from IIIF Activity Streams, it would always be relevant for general discovery to have a IIIF recommendation for DCAT descriptions of IIIF datasets.

Which timestamp to use as default for activities?

There are three timestamps in activitystreams:

published - the date and time the activity was published to the web
startTime - the date and time the activity began
endTime - the date and time the activity ended

The activity itself could occur at a very different time from when the Activity resource is published describing it, depending on the internal latency of the system. The algorithm for when to stop traversing back up the activity collection requires a consistent order to be used by both client and publisher. Published time seems to be the easiest to generate, sub-sorted by the endTime by the publisher, if known.
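The ordering suggested above (by `published`, then by `endTime` where known) could look like the following on the publisher side. This is a sketch under assumptions: timestamps are taken to be ISO 8601 strings, an activity without `endTime` falls back to its `published` value, and the name `sort_activities` is hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, List


def _parse(timestamp: str) -> datetime:
    # Accept a trailing "Z" for UTC, which fromisoformat() rejects before Python 3.11.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def sort_activities(activities: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Order activities by published time, sub-sorted by endTime when present."""
    def key(activity: Dict[str, Any]):
        published = _parse(activity["published"])
        # Assumption: with no endTime, the published time is reused as tie-breaker.
        end = _parse(activity["endTime"]) if "endTime" in activity else published
        return (published, end)
    return sorted(activities, key=key)


ordered = sort_activities([
    {"id": "a", "published": "2018-03-10T10:00:00Z", "endTime": "2018-03-10T09:59:00Z"},
    {"id": "b", "published": "2018-03-10T10:00:00Z", "endTime": "2018-03-10T09:58:00Z"},
    {"id": "c", "published": "2018-03-09T10:00:00Z"},
])
```

Whatever key is chosen, the client has to assume the same one, otherwise it cannot know when it is safe to stop walking back through the pages.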

Activity Stream Update for item that has become restricted

What type of AS Activity type should be created when a IIIF Manifest goes from open access to some sort of restricted access?

use case - CONTENTdm has a variety of different access restriction mechanisms. Items can be open access, restricted by IP range, restricted by login, or restricted by both IP address and login. Collection managers can modify these settings on a per-collection basis through a configuration UI.

So what should the AS update be if a collection manager changes a collection from 'open' to 'IP range restricted' (for campus use only, for example)?

Solutions:

Update: but this is a bit misleading because a crawler might see the Update and try to resolve the Manifest URI only to get a 403 response. But if a crawler is within the range (like a researcher or department on campus) it would be fine
Delete: Also misleading because if you still have access to the data (researcher/teacher or department on campus) then you would inadvertently throw out this item.

I think the real solution is not prescribing a pattern for AS but rather setting an expectation for what a harvester should expect.
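To make "setting an expectation for the harvester" more concrete, here is one possible, purely hypothetical policy for what a harvester does after it sees an Update and tries to fetch the Manifest. The function name, the returned labels and the mapping from status codes to actions are all assumptions for discussion, not something proposed in the draft.

```python
def on_update_fetch(http_status: int) -> str:
    """Decide what to do with an indexed Manifest after fetching it for an Update."""
    if http_status == 200:
        return "reindex"
    if http_status in (401, 403):
        # Access is restricted for this harvester; the resource still exists,
        # so keep the record and flag it rather than throwing it away.
        return "keep-and-flag-restricted"
    if http_status in (404, 410):
        return "remove"
    # Anything else (e.g. a server error) is treated as temporary.
    return "retry-later"


print(on_update_fetch(403))
```

With a rule like this, a publisher could keep issuing plain Updates when access settings change, and harvesters inside and outside the permitted IP range would each end up with a sensible result.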

Toronto Activity stream

Could you add Toronto's activity stream implementation:

https://iiif.library.utoronto.ca/presentation/v2/discovery

Permission from Rachel. Can probably go at the end of:

https://github.com/IIIF/discovery/blob/master/source/api/harvest/0.1/index.md

Import to Viewers

This is a ticket to gather together the discussions in Toronto. Related reading:

Notes from the Discovery session in Toronto: https://docs.google.com/document/d/1PVYzT1jRTE2jtwzOwyhOcg6Rf8CCdrJ-kP6sYOpV_j0/edit - Heading 'IIIF Import to viewers'
Related discussion on software dev channel: https://iiif.slackarchive.io/softwaredevs/page-10/ts-1508144299000347
Related discussion on discovery channel: https://iiif.slackarchive.io/discovery/page-9/ts-1507910636000184
Related UV issue: UniversalViewer/universalviewer#562

It looks like the discussion in Toronto ended with:

"Work out good cut and paste pattern, generalise to drag and drop"

Tom came up with the following annotation that could encode the state of a viewer in an annotation:

{
  "type": "Annotation",
  "motivation": "highlighting",
  "target": {
    "start": {
      "type": "SpecificResource",
      "source": {
        "id": "https://example.org/iiif/id1/canvas1",
        "type": "Canvas",
        "within": {
          "id": "https://example.org/iiif/id1/manifest",
          "type": "Manifest"
        }
      },
      "selector": {
        "type": "PointSelector",
        "x": 10,
        "y": 10,
        "t": 14.5
      }
    }
  },
  "generator": {
    // UV state encoded here?
  }
}

So the question I have is: how do we transmit the above data from a page to a viewer? How do we support:

Copy and paste
Drag and drop
Other interactions?

Conflict between Prezi3 and AS2 `summary` property name

ActivityStreams has a summary field with very similar semantics to the proposed change from description to summary in the Presentation API, but takes a string not a languageMap. AS2 has a separate summaryMap property that is a language map.

According to the current ordering of the contexts, the AS2 summary would take precedence, and there are potentially other conflicts. [Aside -- a script that found such conflicts between context documents would be very valuable]

Options:

We don't use summary in the AS2 profile to allow it to be used in the resources
We don't use summary in the resources to allow it to be used in the AS2 activities
We have a wrapper context for discovery that scopes summary according to the type, and allow string summary on the activity and language map summary on the IIIF resource.
Don't use summary in Prezi3
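The conflict-finding script mentioned in the aside above could start out as small as the following. It is a sketch under assumptions: it compares two already-extracted, flat term maps (real context documents can be arrays, nested or scoped, which this does not handle), and the two fragments shown are simplified, hypothetical stand-ins rather than copies of the real AS2 or Presentation contexts.

```python
from typing import Any, Dict, List


def term_conflicts(first: Dict[str, Any], second: Dict[str, Any]) -> List[str]:
    """Return terms defined in both term maps with different definitions."""
    shared = first.keys() & second.keys()
    return sorted(
        term for term in shared
        # Skip JSON-LD keywords such as @version or @vocab.
        if not term.startswith("@") and first[term] != second[term]
    )


# Hypothetical, simplified fragments for illustration only.
as2_terms = {
    "summary": "as:summary",
    "summaryMap": {"@id": "as:summary", "@container": "@language"},
    "name": "as:name",
}
prezi3_terms = {
    "summary": {"@id": "as:summary", "@container": "@language"},
    "label": {"@id": "rdfs:label", "@container": "@language"},
}

print(term_conflicts(as2_terms, prezi3_terms))
```

A term that is defined identically in both maps is not reported, since the order of the contexts would not matter for it.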

Trigger full refresh from level 0?

From the discussion at the Washington conference:

Is there a way to tell harvester to trigger a complete refresh?

This would be very useful for "level 0" implementations that don't track anything more than current state.

Variations of usage of ActivityStreams' (ordered)items elements

As discussed on the IIIF TSG call 25 July 2018 (https://docs.google.com/document/d/1e2F3sJPG4rfMsvee-IvuWDIJqgLe0tP6aBattUmwBa4/), the IIIF Discovery spec at http://iiif.io/api/discovery/0.1/ uses the element 'orderedItems' from Activity Streams in line with the AS spec for ordered list of elements. But for what seems to be similar cases of ordered AS items in nearby specifications, the choice was to use the 'items' element. For example http://iiif.io/api/presentation/3.0/#55-annotation-page and https://www.w3.org/TR/annotation-model/#annotation-page .

The issue has been discussed by the Editors in a call mentioned at IIIF/api#1350 but the issue itself does not track the argument that was made then.

Flowchart for algorithm

Before merging

Various editorial issues and suggestions for current Discovery draft

On http://preview.iiif.io/api/discovery/api/discovery/0.1/, April 23 2018

Sec 2.2: the first parts of the list, once finished, can become static resources.
-> I feel this will mostly happen for the case when there are no deleted resources, or if deleted resources are indicated as per level 2 (i.e. creation of a new deletion activity for the resources, which avoids having to update the earlier ‘update’ activity for these). For level 1 and level 0 it is likely that the pages won’t be static if there are deletions. As the sentence is in the section for level 1, maybe this is worth flagging?

Sec 3: “The W3C Activity Streams specification defines a “model for representing potential and completed activities”, and is compatible with the design patterns established for IIIF APIs. It is defined in terms of JSON-LD, and can be seamlessly integrated with the existing IIIF APIs.” I feel this could be put earlier in the spec - AS has been used already quite a lot before. Maybe in the intro for section 2.

Sec 3: “Properties that the consuming application does not understand MUST be ignored.”
I’m really not sure what an implementer could do with this. Does it mean that we say stuff in the spec, but implementers (of publication systems) should adapt their work to what any consuming system may ingest or not?
This seems quite dangerous even, as a lot of the properties in section 3 are necessary for the organization of collections and pages, which seem to be mandatory (see #25)

Section 3.1 (but it also concerns 3.4 and other places about the ordering of activities):
“This might seem odd to implementers, without the context of the processing patterns expected.”
Should we add a note asking specific feedback on this, if it’s possible to be a controversial point in the spec? If we get no objection or if we successfully address the one we get, then we could highlight this in the next version, instead of having this mildly apologetic sentence in.

Section 3.3: typo “should be one of Create, Update, or Delete. z”

AS publishing granularity level

In harvesting AS data it has become apparent that knowing the level of granularity for the AS is very important but also very difficult to determine unless you are familiar with the data. We have seen instances where:

AS are created at the Organization level (with multiple collections encompassed in the stream)

2 AS streams created at the Collection level (where the Organization is implied based on knowing the data well).

This makes it very difficult for a harvester/indexer to create a coherent shape for the data. One approach would be to assume Organization level AS and rely on a partOf relationship from the Manifest to the Collection descriptions, but this is not specified in any of the IIIF API docs and we have not seen any such clues/references in any of the data we have harvested. In addition to a partOf link, this approach would also require the creation and maintenance of Collection level descriptions so the harvester can generate a meaningful description (at least a label).

Descriptive properties for ActivityStream data

A list of AS properties that could be used to help aggregators/harvesters determine whether they are interested in crawling the AS.

Please note that all of the stringy properties are meant for harvesters/aggregators to ingest and build indexes around and are NOT intended for the IIIF Registry to index and make searchable.

Important (based on discussed issues)

attributedTo – organization that the AS is associated with
name – human readable label for AS (maybe Collection name or Organization name – not sure or very opinionated). Basically I want a string to index for searching (after an aggregator has parsed the AS).
summary – human readable text description for the AS. Again, just a bag of words to index for potential searching (after an aggregator has parsed the AS).
tag – list of ‘keywords’ or ‘subjects’ for the AS – connected to Controlled Vocabularies ideally since these are Objects that ‘require’ URIs.

Potential (not discussed but might be useful)

startTime – date the AS was first published (maybe use published?)
updated – date when the AS was last updated
generator – Thing that generated the AS – such as CONTENTdm – maybe more of interest if connected to a specific Activity in the AS??
audience - people interested in the AS

what to do when a IIIF Collection is deleted from an Activity Stream perspective?

This is a use case most likely applicable only to vendors or aggregators.

When a Collection is permanently deleted or removed from a system, or a customer leaves a vendor that supports IIIF Change Discovery, what should happen to the Activity Stream data?

It is counter-intuitive to delete the Activity Stream itself. Logically a bunch of 'Deletes' would be created, BUT what should happen to the related 'dataset' descriptions linked to by the 'seeAlso' property? It seems odd that you would keep them since the Dataset technically would not exist any longer. So should the seeAlso property just be deleted from the Stream data?

I could not find a way to indicate that a DCAT or VoID Dataset no longer exists.

Collect "attribution" use cases from various perspectives

In order to better understand which metadata properties should be part of which documents (discovery, presentation, or external metadata), we need to understand the use cases for how those properties will be used, by which actors in the ecosystem.

This issue is to collect use cases around the relationship between organizations (or other actors) and the resources, in particular the AS Collection, AS Activity, IIIF Collection and IIIF Manifest.

Actors:

Aggregator: The organization that collects IIIF resources, and exposes a search engine for them
Content Publisher: The organization that publishes the IIIF resources
End User: The user of the Aggregator, to discover the Resources, who then interacts with them.

Discuss @context in Change Discovery API docs

There's no mention of @context in the API documentation. This is due, in part, to the realization of the complexity of context management across multiple specifications that are linked, but should not be tied together: IIIF/api#1571

Given the ActivityStreams focus, the only entries a context needs beyond the default AS context are the IIIF classes for the objects. However, we need to decide (per #12 etc.) what the scope of those classes is.

The documentation is, however, incomplete without some sort of decision on this.

Replace orderedItems by something type-specific?

Warning: this is a comment that's maybe better to address for future versions of the spec, post Washington, not now!

Section 3.3 of http://preview.iiif.io/api/discovery/api/discovery/0.1/ specifies “Activities must have the object property. The value must be a JSON object, with the id and type properties. “

One of the reasons for which the Discovery API pages are currently quite verbose is because we have to specify the type of the IIIF resource being the object of an Activity (e.g. Manifest), even though there might not be any other statement about that object. Considering that most of the time the resources on a page will be of one type, could we materialize the type into the property that contains the list of activities (e.g. to orderedManifestActivities), so that we save some complexity for the simplest cases?
This is especially interesting if as per the issues on additional properties and content indexing (#8, #12, #24) we decide to limit the number of manifest properties referenced in the Discovery API.

Hyperlink the date format, as per Prezi3

CODH Activity stream

In a similar fashion to #22, maybe another candidate for the list of "Ongoing Experimental Implementations".

Endpoint: https://mp.ex.nii.ac.jp/api/face/curation/as/collection.json
Implementation: https://github.com/IllDepence/JSONkeeper#activity-stream

Caveat: all the Create activities in the AS are for cr:Curation documents. Reference activities however have sc:Canvas as their object. Furthermore there are Offer activities that involve sc:Range (object) and sc:Manifest (target).

Named subsets & non-Presentation objects

The most pressing case we have for change discovery is to inform aggregators of changes within a particular subset of our total collection, usually because of some agreement that we share a particular collection (collection in this sense meaning objects of any one type, or objects belonging to a particular collector, or objects on a particular theme, or any other grouping of objects imaginable).

In this case we would want to use IIIF Discovery for notification solely about this subset, either in images or object metadata. However it is likely that not all of these objects would have public IIIF manifests with canvases, it may be only some would be made available in this way (although perhaps I could create stub manifests that have no canvases but contain a seeAlso to the object record?)

Broadly, this API would be extremely useful to keep aggregators in sync (which is a very convincing case to make internally), but I am trying to think how to do this step-by-step, without needing both sides to implement all the APIs.

Would the use of Discovery API for this purpose make sense and be appropriate, or would this weaken the connection to Presentation too much? There is mention of "Activities describing changes to other resources"; would I be better looking at that? Or is this really more something for ResourceSync?

connecting different activity streams

If you plan to implement the Discovery API at both the organization level (i.e. here is a list of all the IIIF stuff my organization knows about) and at the collection level (i.e. here are the specific IIIF activities related to a specific collection), is there a way to connect the collection AS back to the organization AS?

as:partOf could be used to connect from the collection AS to the organization AS but there is no inverse in the AS ontology.

Discovery of changes for referenced resources

In order to build a robust search engine, it is necessary to know that referenced metadata resources, via seeAlso, have changed even if the Manifest has not changed itself. Equally, if a search engine can process images, it would need to know about changes to those images (or any other content!)

The current scope of the work is for IIIF resources (charter section 1a), rather than all referenced resources. While this is important, it's not within our remit right now.

The agreement on discovery call (2018-02-07) was to clarify the scope of the work to be only Manifests and Collections, and defer referenced until we have things better nailed down.

Wording on collection/pages organization

Warning: this is a comment that's maybe better to address for future versions of the spec, post Washington, not now!

Section 3.1 reads “The top-most resource for managing the lists of Activities is an Ordered Collection, broken up into Ordered Collection Pages. This is the same pattern that the Web Annotation model uses for Annotation Collections and Annotation Pages. The Collection does not directly contain any of the Activities, instead it refers to the first and last pages of the list.”
Sections 2.4 and 2.5 also prescribe the way activities should be gathered in pages and ordered collections.

Should we strengthen the wording of all this? It is rather clear to me, but one could say it remains implicit. It could be questioned whether this organization is mandatory, especially in corner cases where a collection of resources/activities would be small.

What properties from IIIF should be in the activity?

The activity MUST reference the IIIF resource that changed, but other than the URI which properties are useful to machines to process for discovery purposes?

id as it's needed to download the resource!
type as a very easy way to filter different types to different workflows in a merged stream of (eg #5) Collections and Manifests
seeAlso to avoid downloading the resource at all?
label, summary (#6) and metadata as the core content to index??
thumbnail to present to the user?

Are there any others that would be useful?
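The filtering role of `type` mentioned above can be illustrated with a few lines of Python that split a merged stream into one work queue per resource type. The function name `route_by_type` and the example.org URIs are hypothetical, and the sketch assumes each activity carries an `object` with `id` and `type`.

```python
from typing import Any, Dict, List


def route_by_type(activities: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group the URIs of changed resources by the type of the activity's object."""
    queues: Dict[str, List[str]] = {}
    for activity in activities:
        target = activity["object"]
        queues.setdefault(target["type"], []).append(target["id"])
    return queues


queues = route_by_type([
    {"type": "Update", "object": {"id": "https://example.org/iiif/id1/manifest", "type": "Manifest"}},
    {"type": "Update", "object": {"id": "https://example.org/iiif/top", "type": "Collection"}},
    {"type": "Create", "object": {"id": "https://example.org/iiif/id2/manifest", "type": "Manifest"}},
])
```

Without `type` in the activity, the harvester would have to download every resource just to find out which workflow it belongs to.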

what to do about mass 'deletes' and 'creates'?

In CONTENTdm it is not unusual for Collection Managers to create an entire collection of items (potentially tens of thousands of items), review the items, find a universal error, delete the entire collection and then re-create it with the fix applied.

This would result in a massive number of Activity Streams 'Deletes'/'Creates' or 'Updates'. Is there an expectation that the Activity Stream provider would account for these local practices and not publish activities in real time in events like this?

thanks @shuddles for pointing this out!

Relationship between various working drafts

Following #15 I've added a link to https://github.com/IIIF/discovery/blob/master/source/api/harvest/0.1/index.md from our README page. But with the creation of http://preview.iiif.io/api/discovery/api/discovery/0.1/, I'm wondering how we should work now: how are comments on one supposed to make it through to the other? Should we "kill" our current working draft and comment only on the GitHub repo that preview.iiif.io is based on?

I'm assigning this to @azaroth42 as he's clearly the editor leading the work on Crawling/Harvesting, but I'm curious to hear opinions from other editors (@jpstroop @zimeon @tomcrane @mikeapp) as well as @mattmcgrattan and @glenrobson

Single create/delete requirement justified?

In the API we say that each resource must have at most one Create or Delete activity:
https://preview.iiif.io/api/discovery-0.3/api/discovery/0.3/#activities

But if you delete something, and then put another resource in its place with the same URI, then what? You would need to just file Updates after the Delete, which makes no sense.

Propose to change this to clarify that you can't Create, then Create, then Create, but you can Create, then Delete, then Create.
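The proposed rule can be written down as a small check over the sequence of activity types recorded for one resource URI. This is a sketch of one reading of the proposal, not spec text: the function name `valid_lifecycle` is hypothetical, and a stream is allowed to start with an Update or Delete because the resource may predate the stream (as at Level 0).

```python
from typing import List, Optional


def valid_lifecycle(activity_types: List[str]) -> bool:
    """Check Create/Update/Delete types for a single URI, oldest first."""
    exists: Optional[bool] = None  # None means unknown at the start of the stream
    for activity_type in activity_types:
        if activity_type == "Create":
            if exists is True:
                return False       # Create, then Create
            exists = True
        elif activity_type == "Update":
            if exists is False:
                return False       # Update right after a Delete
            exists = True
        elif activity_type == "Delete":
            if exists is False:
                return False       # Delete, then Delete
            exists = False
    return True


print(valid_lifecycle(["Create", "Delete", "Create"]))
```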

(From @zimeon review of 0.3)

Comments on Content State API 0.2 draft

Here are some comments on
https://preview.iiif.io/api/content-state-0.3/api/content-state/0.2/
Some editorial, some not...

In the intro, "open the found resource in a compatible environment" sounds too simple for a first explanation (in the first paragraph). I feel readers would be better introduced to the goal if the bit on "initialize the view of that resource in any client that implements this specification" (in the second paragraph) would be present in the very first paragraph.
In 1.1 "typically Manifests or Collections, or parts thereof." could be confusing in the sense that parts of collections are manifests (aren't they?). Maybe "Manifests or parts or Collections thereof" (or whatever would be a grammatically correct expression of this!) would be less confusing for readers who are not sure of their mastery of the IIIF specs.
In 2 the transition between "A content state is a JSON-LD data structure [...]. Viewers built to those specifications will already understand at least some of these models." and "This specification shows how a content state [...]" is not great as the former is quite specific and the latter comes back to the general goals of the spec.
In fact I think that the bit that starts from "This specification shows how a content state [...]" and including the bullet list of use cases could be moved to the introduction. In 1.1 or maybe a new 1.x titled "Use Cases". Having this in the introduction would be the opportunity to add a sentence that reminds reader of the methodology we follow (i.e. based on use cases) and points to the set of IIIF Discovery use cases on github. By the way, "while load Manifest M" sounds a bit rough without any explanation of this case earlier. Maybe the suggestions above would create a context that makes this a bit smoother for the reader.
In 2 "Such a description" would read better if something had been called a description earlier in the section (currently only 'data' is mentioned). Note that as per the re-structuring suggested in point 3, the new section 2 text could be
[
Descriptions of content states to be exchanged according to this specification are expressed as annotations that target the intended part of a IIIF resource.
A content state is a JSON-LD data structure that uses the models described by the IIIF Presentation API and W3C Web Annotation Data Model specifications.
]
Still in 2, "Viewers built to those specifications will already understand at least some of these models" sounds like it's missing something like "according" before "to". And it also seems a bit tautological.
In 2.2 I am a bit puzzled by the first introduction of contentState as a motivation. An explanation of the example below may help readers figure out why this motivation is chosen, without having to go back to earlier parts (and it seems good anyway to have a brief description of such an example).
In 2.2.4 I don't understand "this form is not capable of expressing content states that are part of a IIIF resource, such as a Canvas within a Manifest, or a region of a Canvas."
Canvas can have URIs and it should be possible to use them as target URL, shouldn't it?
Or is the pattern of 2.2.4 expected to be used only with manifest URLs? In this case the constraint should be made explicit in the text.
In 2.3. it would be great to have some examples with the different protocols - at least have a forward reference to what is shown in section 3. And the query string parameter is mentioned in the sentence "It is recommended that when passing the content state as JSON-LD in a query string parameter" without having been introduced properly before (with this term).
In fact maybe my call for examples comes from a misunderstanding of what the section 2 is about based on a too ambitious title for section 2. "Content State" hints that the section is going to include all that is needed to understand what a content state is, while section 2 is rather an overview of what the spec uses, and how it fits together.
The various questions in section 3 (as well as the entire 3.2 which seems about questions) could be flagged more visibly, from an editorial standpoint.

improve level 1 summary description

In: https://iiif.io/api/discovery/0.3/#listing-resources-and-their-changes

Level 1 adds timestamps and ordering from earliest change to most recent, allowing the consumer to stop processing once it encounters a change that it has already processed

is a bit confusing because while the pages go from earliest to most recent, the processing is in reverse. If we're to talk about processing, then we should also mention that order.
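The two directions can be seen side by side in a short sketch: pages are stored from earliest to most recent, while the consumer walks them backwards and stops at the first change it has already processed. The assumptions here are that pages are held in memory as lists, that every activity has an `endTime` in the same UTC ISO 8601 format (so string comparison matches time order), and that the name `new_activities` is hypothetical.

```python
from typing import Any, Dict, List


def new_activities(pages: List[List[Dict[str, Any]]],
                   last_processed: str) -> List[Dict[str, Any]]:
    """Walk pages from most recent to earliest, stopping at already-seen changes."""
    fresh = []
    for page in reversed(pages):              # last page first
        for activity in reversed(page):       # most recent activity first
            if activity["endTime"] <= last_processed:
                return fresh                  # everything older was handled before
            fresh.append(activity)
    return fresh


pages = [
    [{"id": 1, "endTime": "2018-01-01T00:00:00Z"}, {"id": 2, "endTime": "2018-02-01T00:00:00Z"}],
    [{"id": 3, "endTime": "2018-03-01T00:00:00Z"}, {"id": 4, "endTime": "2018-04-01T00:00:00Z"}],
]
fresh = new_activities(pages, "2018-02-01T00:00:00Z")
```

The result comes back newest first, which is the order in which the consumer encountered the activities.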

/ht @mixterj

Limited item notifications

Use case discussed on the 31st Oct call and is related to #42.

As a researcher I am interested in a couple of items from institution x (e.g. 2 Manuscripts by a certain author or 5 items on a particular subject).

If institution x publishes their activities for their entire collection, would I be forced to go through all items to find out if the 2 items I am interested in have changed? This assumes the researcher can't ask institution x to create a special activity stream which only contains the items they are interested in.

Section 3.1 should be OrderedCollection as title

for consistency and disambiguating with iiif:Collection

Can't express AnnotationPage/Collection in non scoped context

AnnotationCollection and AnnotationPage are just as:OrderedCollections under the JSON-LD hood. If we add these aliases at the global level, then it is random which term will be used for both the top AS Collection and a referenced AnnotationCollection, and equivalent for Page.

We can't include them as scoped contexts without overriding the ActivityStream context. At the moment, AS is the last context in the list and hence cannot be overridden.

Similar to api#1710, we could have a single discovery context and import the other contexts into it rather than including them in the instance data.

Algorithm issues

Number the steps to make it easier to refer to
Add an already-encountered state check to the page algorithm, which needs to work across pages
Make seeAlso the primary and metadata from the manifest a fallback if there isn't any, or it's not understood

Look at other examples of algorithms in specs for how best to present these
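Two of the points above, the already-encountered check that spans pages and the preference for seeAlso over the Manifest's own metadata, could be sketched as follows. This is illustrative only: both function names are hypothetical, pages are assumed to be in-memory lists ordered earliest to most recent, and the seeAlso entries are assumed to have `id` and `format` properties.

```python
from typing import Any, Dict, List, Set, Tuple


def latest_per_resource(pages: List[List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Keep only the most recent activity per resource, across all pages."""
    seen: Set[str] = set()                    # shared across pages, not per page
    kept = []
    for page in reversed(pages):
        for activity in reversed(page):
            uri = activity["object"]["id"]
            if uri in seen:
                continue                      # a newer activity was already handled
            seen.add(uri)
            kept.append(activity)
    return kept


def choose_description(manifest: Dict[str, Any],
                       understood_formats: Set[str]) -> Tuple[str, Any]:
    """Prefer an understood seeAlso link; fall back to the Manifest metadata."""
    for link in manifest.get("seeAlso", []):
        if link.get("format") in understood_formats:
            return ("seeAlso", link["id"])
    return ("metadata", manifest.get("metadata", []))


manifest = {
    "id": "https://example.org/iiif/id1/manifest",   # hypothetical URI
    "seeAlso": [{"id": "https://example.org/id1.xml", "format": "text/xml"}],
    "metadata": [{"label": "Title", "value": "Example"}],
}
print(choose_description(manifest, {"application/ld+json"}))
```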

Indexing Manifest seeAlso metadata

Using AS data to harvest and index Manifests works well due to the reliable structure of the Manifest but the seeAlso metadata is very problematic due to the almost unlimited types/profiles/models that the metadata can take. This makes it very difficult to create a meaningful index record for Manifests.

One solution would be to handle only the seeAlso metadata formats you care about, but this can still be problematic because of the data model used by the seeAlso metadata.

Situation A) Yes! I love JSON-LD - but I have no idea how to parse and index BIBFRAME JSON-LD...
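The "only the formats you care about" option could be sketched as follows. This assumes a manifest already parsed into a dict whose `seeAlso` is a list of objects with `id` and `profile` keys, and a `metadata` list to fall back on; the profile URI, the function name `pick_index_source` and the example URIs are hypothetical. It picks a source to index, but does nothing about Situation A: knowing the profile still doesn't tell you how to map its data model to an index record.

```python
# Hypothetical set of profiles this indexer knows how to parse.
SUPPORTED_PROFILES = {"https://example.org/profiles/simple-dc"}


def pick_index_source(manifest):
    """Prefer an understood seeAlso record; otherwise fall back to the manifest."""
    for ref in manifest.get("seeAlso", []):
        if ref.get("profile") in SUPPORTED_PROFILES:
            return ("seeAlso", ref["id"])
    # No seeAlso at all, or none in a profile we understand.
    return ("manifest", manifest.get("metadata", []))


understood = {
    "seeAlso": [
        {"id": "https://example.org/records/1.xml",
         "profile": "https://example.org/profiles/simple-dc"},
    ],
    "metadata": [],
}
not_understood = {
    "seeAlso": [
        {"id": "https://example.org/records/2.jsonld",
         "profile": "https://example.org/profiles/unknown"},
    ],
    "metadata": [{"label": "Title", "value": "Example"}],
}
```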

Another option is to build/spec profiles for all potential seeAlso metadata types. This seems like a pipe dream to me.

Situation B) Here is a laundry list of profiles to parse seeAlso metadata - but you don't have the one I need...

A rather draconian option would be for the Discovery group to make a recommendation on what type/format/shape the seeAlso metadata should take.

Situation C) You say I SHOULD use this type of seeAlso metadata - but that does not work for my data...

I know this goes against the Charter, but as we continue to pursue AS harvesting this will continue to be a headache - especially as we start to investigate the more complex harvesting use cases outlined in the June 13 Discovery Working Group notes.

Comments on Discovery 0.3

Here are some comments on https://iiif.io/api/discovery/0.3/ Mostly editorial!

Place of content indexing in the 0.1 Discovery spec

Warning: this is a comment that's maybe better to address for future versions of the spec, post Washington, not now!

In version 0.1 of the Discovery spec
http://preview.iiif.io/api/discovery/api/discovery/0.1/
the indexing aspect is covered through recommendations to exploit the seeAlso of the Presentation manifests, without further recommendation for specific patterns of using this property. This is in the intro and the processing algorithm.

In earlier versions of the spec, it was suggested to add the seeAlso in the Discovery API JSON, as a property for the Manifest. As per #8 we decided not to include seeAlso now, but I don't feel it was a permanent decision, in the same way that #12 is marked as deferred, not closed.

So I feel that the current draft could still be re-worked for addressing item 2 (content indexing) of our charter http://iiif.io/community/groups/discovery/charter/ in the future.

Should it be illegal to have a Create -> Delete -> Create for a resource?

The current draft (0.3) says that there may be only one Create or Delete for a resource. I wonder whether this is a useful/necessary restriction on AS.
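To help weigh the restriction, here is a minimal sketch of a consumer that simply replays the stream and keeps the last known state per resource. It assumes activities arrive oldest-first and that only Create, Update and Delete occur; the function name `final_state` and the URIs are hypothetical. For a consumer of this shape, a Create -> Delete -> Create sequence causes no difficulty, though that doesn't settle whether other kinds of consumer rely on the restriction.

```python
def final_state(activities):
    """Replay activities oldest-first; return {uri: "present" | "deleted"}."""
    state = {}
    for activity in activities:
        uri = activity["object"]["id"]
        # Assumption: anything that is not a Delete leaves the resource present.
        state[uri] = "deleted" if activity["type"] == "Delete" else "present"
    return state


# Hypothetical stream: m1 is created, deleted, then created again.
activities = [
    {"type": "Create", "object": {"id": "https://example.org/m1"}},
    {"type": "Create", "object": {"id": "https://example.org/m2"}},
    {"type": "Delete", "object": {"id": "https://example.org/m1"}},
    {"type": "Delete", "object": {"id": "https://example.org/m2"}},
    {"type": "Create", "object": {"id": "https://example.org/m1"}},
]

state = final_state(activities)
```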

Various editorial issues and suggestions for current crawling draft

Scoping remarks already discussed in the call of 2018-02-07, but I'm repeating them in case reference to specific parts of the text would help

The title is very generic: ("IIIF Resource Discovery")
The intro mentions "indexing" at the same level as "crawling"
The Approach mentions "the IIID Discovery charter and use cases" in their entirety.

new remarks and suggestions

Approach section mentions "notification of changes" in the bullet list about ResourceSync, but there's no real notification in the current spec, so the bullet could go away.
"follow the best practices defined in the W3C" reads quite vague. And while AS do follow many best practices, they don't follow all of them (for example, re-using existing W3C specs like PROV instead of creating an entire vocabulary from scratch)
We could add "This specification covers the listing of core IIIF resources, namely IIIF manifests and IIIF collections" as the first sentence of the "List of Resources" section to make it clear which IIIF resources we're focusing on. By the way, I'm adding "IIIF" in front of them so as to distinguish them from the AS2 Collection, which would come just afterwards.
Names of instances (in the RDF sense) sound a bit generic, like https://example.org/iiif/discovery.json (which hints to cover every discovery need - it may eventually, but not right now!) or http://example.org/museum (which sounds like a type/class name). Could they be a bit more specific, e.g. https://example.org/iiif/updateCollection.json or http://example.org/myMuseum
“to see if it has changed” could be removed - or downplayed - in the “Usage” of Level 0: in the most basic scenario, one would probably harvest all the manifests without checking if they’ve changed.
In the intro for Level 1 “the first pages, once finished, become static resources” could be reworded as “the first pages, once finished, can become static resources”. The current wording hints that it would always be the case. It shouldn’t hold in the scenario where a publisher decides to have a manifest appear only once with its most recent update, should it?
Is there a specific reason why usage of Level 1b is worded differently from the usage of level 1? I don’t see much difference in the message. If it’s the same message, it should be the same sentence (though it’s not very exciting...)
Can there be a way to highlight in the examples what’s being added at every level? As they build on top of each other, it would be a powerful aid to understand the whole story. NB: obviously it is a nice-to-have, and may have to be dropped in case it would be considered bad for accessibility.
The beginning of the “Processing Algorithm” section is heavily focused on indexing. But this is only one of the scenarios (although a key one for us). There might be harvesters out there that just want to display images, perhaps do the filtering themselves, and not necessarily “build an index of the resources to allow them to be discovered” in the way an aggregator like Europeana would do.
The Target Resource Algorithm is probably tuned too much for indexing (a set of SHOULDs within a SHOULD) if we intend to make the scope of the document less centered on this goal.

Add terms/use rights to a Change Discovery feed

Is there a way to add terms, conditions, or rights to the Change Discovery API feed, outside of the rights associated with the items referenced in the feed?

For example adding a 'non-commercial use' license to Change Discovery API feed even if the items in it are CC0 or something similar.

Which Activity Types do we care about?

In the current discussion document for crawling we list the following core types:

Update -- the most important!
Create -- to distinguish the initial update from subsequent
Delete -- to allow removing already crawled resources from a discovery system

And the following additional publisher activities:

Add -- The (e.g.) Manifest was added to a Collection
Remove -- Opposite of Add
Copy -- The Manifest was duplicated from an existing one
Move -- The Manifest was moved/renamed from a previous URI
Merge -- Two or more Manifests that were separate were merged to form a new Manifest
Split -- A Manifest was split into two or more new Manifests

And third party activities:

Reference -- A link was created to the Manifest by the party
Use -- The Manifest was used by the party in some way
Replace -- The party replaced the manifest with another one (and would like the publisher to incorporate those changes)

Which of the non-core activity types do we care about, if any?
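One way to frame the question is from the consumer's side: a crawler that only understands the core types needs a safe way to skip everything else. The sketch below groups the type names exactly as listed above and filters a stream down to core activities. The grouping constants and the function names `classify` and `core_only` are hypothetical, and nothing here implies that the non-core names are all defined by Activity Streams.

```python
CORE = {"Create", "Update", "Delete"}
PUBLISHER = {"Add", "Remove", "Copy", "Move", "Merge", "Split"}
THIRD_PARTY = {"Reference", "Use", "Replace"}


def classify(activity_type):
    """Map an activity type name to the group it is listed under."""
    if activity_type in CORE:
        return "core"
    if activity_type in PUBLISHER:
        return "publisher"
    if activity_type in THIRD_PARTY:
        return "third-party"
    return "unknown"


def core_only(activities):
    """Keep the activities a core-only crawler would act on."""
    return [a for a in activities if classify(a["type"]) == "core"]


# Hypothetical mixed stream.
mixed = [
    {"type": "Create", "object": {"id": "https://example.org/m1"}},
    {"type": "Add", "object": {"id": "https://example.org/m1"}},
    {"type": "Use", "object": {"id": "https://example.org/m1"}},
    {"type": "Update", "object": {"id": "https://example.org/m1"}},
]
kept = core_only(mixed)
```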

Do we need a Dataset resource?

From #36, #37, @aisaac, Stanford discussions, etc ...

Do we need separate resources for the datasets that underlie the IIIF resources, and/or for the IIIF resources themselves?

The entities that we have currently:

The AS Collection is an ordered list of the Activities that have been carried out. This is a dataset, but of activities, not the resources themselves, IIIF or otherwise.
The Activities, as entries in the AS "dataset"
IIIF Manifests, as the objects of the activities (typically), and can be part of Collections.
IIIF Collections, that Manifests can be part of, for presentation/navigation purposes. These could be considered a "dataset" ... but the intentional vagueness and the presentation motivation for their existence makes them a bad candidate to associate metadata with... the same way that the Manifest is a bad candidate for associating item level metadata with.

Similarly to the manifest having a seeAlso to a metadata record, either the AS Collection or the IIIF Collection could have a seeAlso to a dataset description.

Having a definition of what the Dataset would represent seems important. Is it only those resources available by IIIF, or is the scope defined by the institution? For example, if the organization publishes manifests for 30,000 of its 250,000 objects, is the size of the dataset 30k or 250k?

Editorial work for harvesting

This communication takes place via the IIIF Discovery API, which is based on the W3C Activity Streams specification for the JSON-LD details, and the ResourceSync framework for the more abstract communication patterns.

is unclear. Fix
