Linked Media Formats

Intro: What Even IS Linked Open Data?

Getting started with Linked Open Data (LOD) can be overwhelming. At its core, LOD is a way of connecting elements of metadata records (Title, Creator, and so on) to accepted definitions that can be used and reused by anyone. Many of these collections of definitions (or taxonomies; think of them as dictionaries) are community-sourced and accept additions and edits from community members worldwide. That's incredible! Instead of everyone making their own dictionaries, we can share one and agree together where the definitions live.

For example, we can add "Alfred Hitchcock" as a creator to our record for our 16mm print of The Lady Vanishes. Our pals in Russia can add "Хичкок, Альфред" to their local record, and since we both point to the same identifier for Hitch (the URI hosted by WikiData) we all know who we are talking about. There is less confusion, less room for individual human error ("Afred Hichcok"?), and a shared responsibility for descriptive cataloging labor. This concept of authority control has been around in libraries for over a century, but the Linked part of Linked Open Data means that the structure of each record is defined in terms of its relationships to other defined terms, using terms that can be read and recognized by computers in addition to humans. This is key and why LOD is such a powerful way of describing things like audiovisual works.

Each element of an LOD record is structured in terms of Subject-Predicate-Object (Thing A - has some relationship to - Thing B). This relationship is usually called a "triple." For example, every piece of the WikiData record that describes Alfred Hitchcock is linked to other definitions within WikiData (and also to external data sources!). In the Subject-Predicate-Object model, you can say Alfred Hitchcock - has the occupation - Film director. This relationship links to the definition of Film director, where you can read about what a Film director is, what they do, and also see the other records for directors that are linked there.
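
To make the triple idea concrete, here is a minimal Python sketch. It is only an illustration: the Triple class and the match function are names we made up for this example, and plain-English labels stand in for the URIs a real LOD record would use.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One Subject-Predicate-Object statement (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


# Plain-English labels stand in for the URIs a real record would use.
TRIPLES = [
    Triple("Alfred Hitchcock", "has the occupation", "Film director"),
    Triple("Alfred Hitchcock", "is the director of", "Vertigo"),
    Triple("Alfred Hitchcock", "is the director of", "The Lady Vanishes"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every part that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Which works does any subject direct?
directed = [t.obj for t in match(TRIPLES, predicate="is the director of")]
print(directed)
```

Leaving one part of the pattern empty and asking "what fits here?" is the basic move behind querying linked data.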

To get more granular, you can think about a 16mm film print (WikiData id Q194383) of The Lady Vanishes that is black and white (WikiData id Q838368), and compare it to a VHS (WikiData id Q183976) copy that is colorized (Library of Congress Subject sh88002478). Both of these will share elements related to the production (Title, Director, Release Year), but will differ in fields related to format, condition, and so on.

This example also hints at one of the missing elements of LOD as it could be used by audiovisual archivists to describe formats. The record for "colorization of motion pictures" above belongs to a Subject taxonomy, so we referenced colorization as the subject of a work, not a definition of colorization as a physical characteristic of an audiovisual work. This is one large to-do for AV archivists interested in LOD as a tool for describing our collections: we are missing taxonomy entries that are explicitly relevant to AV archives. For example, there is no entry in WikiData for "vinegar syndrome," or for the tape-based errors in the AV Artifact Atlas like "video dropout."

Our hope here is to explore the implications of LOD for AV media description, and to compile some resources and examples that can help AV archivists get started. One accessible starting place is to think about linking existing catalog records with external data sources and vocabularies. Connecting our records to the rich ecosystem of linked data that already exists is a step towards making our cataloging that much more useful to users. Another step is to update taxonomies like WikiData with records for terms specific to our field (the Open part of LOD).

OK, How Does It Work?

What does LOD metadata look like? The most basic answer is that a common way to represent LOD as machine- and human-readable text is with JSON (JavaScript Object Notation), which is essentially a list of key:value pairs enclosed in curly brackets {}. This somewhat intimidating-looking format can then be read and interpreted by different display systems (like the WikiData website, or IMDb.com!) to be a little easier on the eyes.

One widely used and well-documented structured data format is schema.org, which includes fields for LOD. Here is their metadata example for a Movie in JSON-LD format ("JSON-Linked Data" 😉), with linked data fields in bold:

{
     "@type": "Movie",
     "@id": "https://www.wikidata.org/wiki/Q836821",
     "name": "The Hitchhiker's Guide to the Galaxy",
     "titleEIDR": "10.5240/B752-5B47-DBBE-E5D4-5A3F-N",
     "disambiguatingDescription": "VUDU version"
}
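
Because JSON-LD is still JSON, any JSON parser can read it. The sketch below uses Python's standard json module on a record like the one above, written as strict JSON, and separates the JSON-LD keywords (keys starting with "@") from the ordinary fields. The variable names are ours, chosen for illustration.

```python
import json

record_text = """
{
  "@type": "Movie",
  "@id": "https://www.wikidata.org/wiki/Q836821",
  "name": "The Hitchhiker's Guide to the Galaxy",
  "titleEIDR": "10.5240/B752-5B47-DBBE-E5D4-5A3F-N",
  "disambiguatingDescription": "VUDU version"
}
"""

# json.loads raises json.JSONDecodeError on a missing or trailing comma,
# which makes it a quick syntax check for hand-written records.
record = json.loads(record_text)

# Keys beginning with "@" are JSON-LD keywords; the rest are ordinary fields.
keywords = {k: v for k, v in record.items() if k.startswith("@")}
fields = {k: v for k, v in record.items() if not k.startswith("@")}

print(record["name"])
print(keywords)
```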

Sources of Identifiers

As seen in the schema.org example above, WikiData and the Entertainment Identifier Registry (EIDR) are two stable sources of LOD identifiers. In fact, these are the two recommended sources from the FIAF Linked Open Data Task Force.
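
As a rough sketch of how these identifiers turn into links, the helpers below build a URL from a WikiData Q number and from an EIDR ID. Both function names are hypothetical. We are assuming that EIDR IDs can be resolved as DOIs through https://doi.org/ and that they begin with the 10.5240/ prefix seen in the examples in this document, so please check that against the EIDR documentation before relying on it.

```python
import re

# Page URL form used throughout this document.
WIKIDATA_PAGE = "https://www.wikidata.org/wiki/"
# Assumption: EIDR IDs are DOIs and resolve through the general DOI resolver.
DOI_RESOLVER = "https://doi.org/"
# Assumption: the prefix shared by the EIDR examples in this document.
EIDR_PREFIX = "10.5240/"


def wikidata_url(qid: str) -> str:
    """Build a WikiData page URL from an item identifier such as "Q7374"."""
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not a WikiData item identifier: {qid!r}")
    return WIKIDATA_PAGE + qid


def eidr_url(eidr_id: str) -> str:
    """Build a resolver URL from an EIDR ID (hedged: see assumptions above)."""
    if not eidr_id.startswith(EIDR_PREFIX):
        raise ValueError(f"not an EIDR ID: {eidr_id!r}")
    return DOI_RESOLVER + eidr_id


print(wikidata_url("Q7374"))
print(eidr_url("10.5240/B752-5B47-DBBE-E5D4-5A3F-N"))
```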

You can also search for other LOD vocabulary and authority sources on BARTOC, for example those related to "film".

What is @id Doing?

Linking to external identifiers is at the heart of Linked Data. In JSON-LD, this is often done in the @id field. You can see the distinction below:

A LOD resource as value:

{
  "landingPage": {
    "@id": "http://www.europeana.eu/portal/record/09102/_CM_0839888.html"
  },
  ...
}

A string literal as value:

{
  "creator": "Europeana",
   ...
}

(source: Europeana)
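
In code, the distinction comes down to whether the value is an object carrying an "@id" or a plain string. Here is a small Python sketch of that check; is_linked_resource is a name we invented, and a full JSON-LD processor handles more cases than this (for example, values typed through the @context).

```python
def is_linked_resource(value) -> bool:
    """True when a JSON-LD value is an object that carries an "@id"."""
    return isinstance(value, dict) and "@id" in value


record = {
    "landingPage": {
        "@id": "http://www.europeana.eu/portal/record/09102/_CM_0839888.html"
    },
    "creator": "Europeana",
}

# Report, field by field, which values point elsewhere and which are text.
for key, value in record.items():
    kind = "linked resource" if is_linked_resource(value) else "string literal"
    print(f"{key}: {kind}")
```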

Schemas and Ontologies in JSON-LD

It's possible to combine fields from different schemas and ontologies using the JSON-LD @context part. For example, here we know that fields from Dublin Core and the Europeana Data Model (EDM) will be used to describe our item. Any field prefixed with dc, such as dc:creator, we know refers to the Dublin Core ontology.

{
  "@context": {
    "edm": "http://www.europeana.eu/schemas/edm/",
    "dc": "http://purl.org/dc/elements/1.1/"
  },
  "@graph": [{
    ...
    "dc:creator": "AMIA Open Source"
  }]
}
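
Under the hood, a prefix is shorthand: dc:creator stands for the prefix's URL with "creator" appended. The Python sketch below shows that expansion for the simple prefix case only. The expand function is our own illustration, not part of any library, and a real JSON-LD processor covers many more rules.

```python
CONTEXT = {
    "edm": "http://www.europeana.eu/schemas/edm/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def expand(term: str, context: dict) -> str:
    """Expand a prefixed term such as "dc:creator" into a full URI."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    # No known prefix (a bare term or an already-full URI): leave it alone.
    return term


print(expand("dc:creator", CONTEXT))
print(expand("edm:TimeSpan", CONTEXT))
```

This is also why each prefix URL in a @context should end in "/" or "#": the local name is simply glued onto the end.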

LOD for Media Formats

We noticed that most writing on LOD for film and media focuses on title and name authorities. We were curious about the implications of LOD for media formats and compiled the beginning of a list for reference.

Film

PBCore provides LOD for gauged film formats in its instantiationPhysical Film Vocabulary.
- [8mm, 9.5mm, Super 8mm, 16mm, Super 16mm, 22mm, 28mm, 35mm, 70mm]
Getty AAT provides LOD for members of its <size for photographic film> term:
- [8mm, 16mm, Super 16mm, 35mm, 65mm, 70mm]
Wikidata has entries for several sizes:
- [8mm, 16mm, Super 16mm]
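
One way to use lists like these is to compare coverage across sources. The Python sketch below does that with sets, using only the gauges listed above. Keep in mind that "missing" here means "not in our list in this document," which may lag behind what each source actually contains.

```python
# Film gauges as listed in this document for each source.
FILM_GAUGES = {
    "PBCore": {"8mm", "9.5mm", "Super 8mm", "16mm", "Super 16mm",
               "22mm", "28mm", "35mm", "70mm"},
    "Getty AAT": {"8mm", "16mm", "Super 16mm", "35mm", "65mm", "70mm"},
    "Wikidata": {"8mm", "16mm", "Super 16mm"},
}

# Gauges that every source in our list covers.
shared = set.intersection(*FILM_GAUGES.values())

# Every gauge mentioned by at least one source.
all_gauges = set.union(*FILM_GAUGES.values())

# For each source, the gauges listed elsewhere but not there.
missing = {
    source: sorted(all_gauges - gauges)
    for source, gauges in FILM_GAUGES.items()
}

print(sorted(shared))
for source, gaps in missing.items():
    print(f"{source} is missing: {gaps}")
```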

Video

PBCore provides LOD for physical video formats in its instantiationPhysical Video Vocabulary.
- [Videocassette, Open reel videotape, Optical video disc, 1 inch videotape, 1/2 inch videotape, 1/4 inch videotape, 2 inch videotape, Betacam, Betacam SX, Betamax, Blu-ray disc, Cartrivision, D1, D2, D3, D5, D6, D9, DCT, Digital Betacam, Digital8, DV, DVCAM, DVCPRO, DVD, EIAJ, EVD, HDCAM, HDV, Hi8, LaserDisc, MII, MiniDV, Super Video CD, U-matic, Universal Media Disc, V-Cord, VHS, Video8, VX]
Getty AAT provides LOD, but further research is needed to identify the appropriate Guide Term(s) and Object(s) related to physical video.
Wikidata has entries for several physical video formats and carriers, including:
- [Betacam, Digital Betacam, Betacam SP, videotape, VHS, DVD, Laserdisc, open reel videotape, U-matic, Betamax, Blu-ray Disc, Digital8, DV, DVCAM, HDCAM, 8 mm video format, MiniDV, Hi8]

Example Record

Here's a kitchen sink example of a JSON-LD record, combining what we've outlined above!

{
  "@context": {
    "sch": "https://schema.org/",
    "edm": "http://www.europeana.eu/schemas/edm/",
    "pbc": "http://pbcore.org/pbcore-controlled-vocabularies/",
    "custom": "http://myinstitution.com/"
  },
  "@graph": [
    {
      "@id": "https://www.wikidata.org/wiki/Q202548",
      "@type": "sch:Movie",
      "sch:name": "Vertigo",
      "sch:datePublished": "1958",
      "sch:director": {
        "@id": "https://www.wikidata.org/wiki/Q7374"
      },
      "sch:titleEIDR": {
        "@id": "10.5240/39FE-B96B-01BE-453E-64D7-E"
      },
      "sch:sameAs": {
        "@id": "https://www.imdb.com/title/tt0052357"
      },
      "sch:workExample": {
        "@type": "sch:Movie",
        "sch:editEIDR": {
          "@id": "10.5240/BDF9-F812-FC30-3EAF-AEB3-O",
          "sch:disambiguatingDescription": "35mm Theatrical Print"
        },
        "sch:duration": "PT2H8M",
        "pbc:instantiationPhysical": {
          "@id": "http://pbcore.org/pbcore-controlled-vocabularies/instantiationphysical-film-vocabulary/#35mmFilm"
        },
        "custom:gauge": {
          "@id": "https://www.wikidata.org/wiki/Q226528"
        }
      }
    },
    {
      "@id": "http://semium.org/time/1958",
      "@type": "edm:TimeSpan"
    }
  ]
}

Explanation of example record coming soon.
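
In the meantime, one way to get a feel for a record like this is to list every place it links out to. The Python sketch below walks a trimmed copy of the example and collects each "@id" along with the field it sits under. The collect_ids function is our own hypothetical helper, and the record is shortened to keep the example small.

```python
import json

# A trimmed copy of the example record above.
record = json.loads("""
{
  "@graph": [
    {
      "@id": "https://www.wikidata.org/wiki/Q202548",
      "sch:name": "Vertigo",
      "sch:director": {"@id": "https://www.wikidata.org/wiki/Q7374"},
      "sch:workExample": {
        "pbc:instantiationPhysical": {
          "@id": "http://pbcore.org/pbcore-controlled-vocabularies/instantiationphysical-film-vocabulary/#35mmFilm"
        }
      }
    }
  ]
}
""")


def collect_ids(node, path=""):
    """Yield (field path, @id) pairs for every linked resource in node."""
    if isinstance(node, dict):
        if "@id" in node:
            yield (path or "(top level)", node["@id"])
        for key, value in node.items():
            if key != "@id":
                yield from collect_ids(value, f"{path}/{key}" if path else key)
    elif isinstance(node, list):
        # List items share the path of the list that holds them.
        for item in node:
            yield from collect_ids(item, path)


links = list(collect_ids(record["@graph"]))
for field, uri in links:
    print(f"{field} -> {uri}")
```

Each line of output is one outbound link: the work itself, its director, and the physical format of one of its examples.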

Glossary

Here are some terms that are used with Linked Open Data. More definitions will come soon!

Identifier
- This is the unique identifier used within a system (like WikiData) that points to the thing being described. In the examples above, within the WikiData universe, the identifier "Q7374" means "the Alfred Hitchcock who was a film director and lived from 1899-1980."
  - This concept is related to URI below, as it is usually part of the full link to the item within a given system.
JSON
- "JavaScript Object Notation"
  - A very commonly used way of representing information in key:value pairs, for example {"creator":"Alfred Hitchcock"}. It is comparable to data formats like XML, in that it is readable both by humans and machines.
  - Wikipedia page
JSON-LD
- A standardized way of using JSON to represent linked data.
  - Wikipedia page
Linked Open Data
- Structured data that allows linkages between described people, things, concepts, and so on. It is "open" as it is generally open to community authorship and editing. Often represented in
  - Wikipedia page
Ontology
- Wikipedia page
RDF
- "Resource Description Framework"
  - This is a data model that defines how to describe the relationships between things or concepts. In the examples above, the Subject-Predicate-Object relationship is defined by RDF. FOAF ("Friend of a Friend") is a common vocabulary built on RDF that can be used to define things like the Predicate "is the creator of" in the statement "Alfred Hitchcock is the creator of Vertigo."
  - Wikipedia page
schema.org
SPARQL
- This is a "query language" similar to SQL that is used to search linked records and follow the links between them. Its queries are written as patterns in the "Subject-Predicate-Object" structure of linked data (specifically in the Resource Description Framework data model). It is a powerful way of searching across records that are connected using LOD.
  - Wikipedia page
Taxonomy
Triple
- The "Subject-Predicate-Object" is the semantic relationship described by LOD, and is referred to as a "triple." In order to make a valid statement in LOD, you need to have each member of the triple. Spelling the triple out in a plain English statement can help conceptualize this: "Alfred Hitchcock (Subject) is the director of (Predicate) Vertigo (Object)."
  - Wikipedia page
WikiData
URI
- Stands for "Uniform Resource Identifier." Ultimately this can be any string of characters that uniquely identifies a given thing or concept. In practice, this is often a URL, "Uniform Resource Locator," so that you can both identify a thing and get to its definition within a given taxonomy.
  - Wikipedia page

More Resources

Here are some resources that give more background information about linked data, and specifically LOD within AV archives. (More to come!)

From FIAF: "Cataloguing Practices in the Age of Linked Open Data: Wikidata and Wikibase for Film Archives"(https://www.fiafnet.org/pages/E-Resources/Cataloguing-Practices-Linked-Open-Data.html#_ftnref13)

Future Work

We aspire to act in alignment with the FIAF LOD Task Force's priorities by lowering the barrier to engaging with linked data. With community input, we can put together more resources or tutorials based on need and interest.

A missing resource on the web seems to be example records showing what an "end user" archivist would see in a cataloging system that uses LOD. We hope to add a number of mockups above.

amiaopensource / linked-media-formats