w3c / webmediaguidelines Goto Github PK

Web Media applications developer guidelines

Home Page: https://w3c.github.io/webmediaguidelines/

HTML 100.00%

webmediaguidelines's Introduction

Web Media Application Developer Guidelines

This repository contains the Web Media Application Developer Guidelines specification, developed by the Web Media API Community Group.

Deprecation Notice

These guidelines were deprecated in 2023. If you are using these guidelines and would like the development of them to be restarted, please open an issue.

webmediaguidelines's People

Contributors

Stargazers

Watchers

Forkers

boyofgreen edwardmhunton castlabs andybeach azeng-jwp tidoust epiclabsdash bhanditz liucy1898 isabella232 solder128

webmediaguidelines's Issues

Section 1 Feedback

I have a few suggestions for section 1 of the document.

In Scope, suggest changing "(applications that has been developed for use on a particular platform or device)" to "(applications that have been developed for use on a particular platform or device)"
In Glossary of Terms, suggest changing "In order to provide a common language to build this document and communicate concepts to the end reader, we are providing a Glossary of Terms to this guideline document." to something like "The following lexicon provides the common language used to build this document and communicate concepts to the end reader."
In Glossary of Terms: Codec, suggest rewording "A codec is the algorithm used to capture analog video or audio in digital form." In our context, a codec is an algorithm defining the compression and decompression of digital video or audio, not a way to capture analog A/V.
In Glossary of Terms: Bit Rate, suggest changing "and can be constant and variable" to "and can be constant or variable."
In Glossary of Terms: EME, because of the recent announcement, please update "Encrypted Media Extensions (EME) is a proposed W3C specification" to "Encrypted Media Extensions (EME) is a recommended W3C specification".
In Glossary of Terms: Format, suggest removing the comma right after "container format".
In Glossary of Terms: manifest, suggest capitalizing manifest to match all the other terms.
In Glossary of Terms: Rendition, please change "A specific video and audio stream for a target quality level or bitrate in a adaptive streaming set or manifest" to "A specific video and audio stream for a target quality level or bitrate in an adaptive streaming set or manifest"
In Glossary of Terms: Self-hosted player, suggest using camel case for the term to match the rest of the glossary, i.e. "Self-Hosted Player." Additionally, the "Note: we do not recommend this as the player updates every 4-6 weeks." sentence should be removed because it reads like marketing material.
In Glossary of Terms: Streaming, suggest change "Streaming of media content is typically the delivery of media assets over Internet protocols such as HTTP or RTMP." to something like "Streaming of media content is typically the delivery of media assets using Internet protocols such as HTTP or RTMP." Removing "typically" might be useful, since TCP/IP streaming is our focus.
In Glossary of Terms: Encoder and Transcoder, I don't consider encoding a synonym for transcoding. I think transcoding is the processes of converting from one encoded format to another. Encoding is closely aligned with the existing description.
In Glossary of Terms: URL Signing, I'd suggest removing the entire last sentence. It is currently awkwardly written. "This ensures that only links generated by the customer can be used to access their content and that if those links are used elsewhere, the use will expire at the expiration time." It could also be reworded.

Section 2.3 - Live Content Playback

This section should follow the basic structure already described in 2.2 and highlight the following:

Content-Generation and the differences compared to VoD content
Content-Delivery for Live content
Content-Playback: How to Manifest updates work and how is/can the currentTime be reported by the player

The following use-cases might be interesting to highlight:

Reduce live edge latency
Server side Ad insertion

New Section: Glossary

I am drafting a new section for a glossary of terms so that we can conform our nomenclature internally between the different sections and provide a quick reference for users.

Remove "2018" from title.

As discussed in the Steering Committee last week, we should remove "2018" from the title of the developer spec, for a few reasons:

This is not a spec that needs to be updated every year, like the API Snapshot spec does.
It's taken a lot more time to complete this spec than originally expected and we are unlikely to publish an updated version in a year.
The prominent "2018" date will make the spec seem out of date very soon after publishing in December, when in less than a month it will be 2019!

This reverses #48.

Describe live streaming resilience issues

A common problem I see with live streaming in players is that missing data tends to completely crash the player and stop playback.

This can happen more easily in low latency scenarios, where a segment may need to be dropped due to temporary buffer fullness or network glitches. In such a scenario, it may be that segment N of a particular quality level is missing, while segment N of other quality levels is present. In some cases, segment N might be missing from all quality levels.

Different players have different approaches here - some just "jump the gap", whereas others might try another quality level. Assuming they can even tolerate such a failure (many treat it as a fatal error and just stop!).

I believe this is a common issue important enough to outline in the 2.11.4 "Potential Issues" chapter.

Respec Issues

Three ReSpec warnings are generated when the document is parsed:

Process 2015 has been superceded by Process 2017.
Insecure URLs are not allowed in respecConfig. Please change the following properties to 'https://': edDraftURI. ("no-http-props" x 1)
All sections must start with a h2-6 element. Add a h2-6 to the offending section or use a
. See developer console. ("no-headingless-sections" x 1)

Couldn't see this reported elsewhere - please close if I missed something.

Create a dedicated Section for Ad-Insertion

As mentioned in #19, we should maybe consolidate all Ad-Insertion related topics in one section?

DRM in "Content Playback" section

The steps assume DRM'd content. I'm not sure how relevant DRM here since the section is explaining general ABR playback principals.

Also, technically it is not the player that performs license key request creation and content decryption, it is the CDM.

Change Jeff Burtoft to Former Editor

I understand from an email thread that Jeff Burtoft is no longer an editor. Please update the Editors section with a note that Jeff is a "Former Editor".

I understand the Respect syntax is shown here

Update ReSpec metadata to match WMAS2018

I updated the ReSpec metadata in PR #103 to fix errors and match WMAS2018 ReSpec metadata:

deleted all author mailto: URLs because they get out of date.
added formerEditors variable for Jeff Burtoft instead of writing "(former editor)".
deleted processVersion variable because it's unnecessary and discouraged.
deleted edDraftURI because it's not needed.
changed shortName from default "dahut" to "WMADG". Authors are welcome to change it to another short name if desired.
added copyrightStart of 2017 to get full range of copyright dates in copyright statement.
added additionalCopyrightHolders to add CTA to copyright statement
added logos to add CTA logo (Also added small style section to workaround ReSpec bug: w3c/tr-design#170 as in WMAS2018 issue w3c/webmediaapi#188).

Section 2.4 Content Playback, item #6

Item 6 says

It then adds it to the player's video buffer.

Since this entire section is about the player behavior, it may be better just to say
It then adds the decrypted segment to the video buffer.

Spec and references need to be marked as non-normative

The Guidelines spec and all references need to be marked as non-normative.

I've sent an email, cc'ing editors, asking @tidoust for the best way to do this with ReSpec.

I also noted one erroneous anchor tag in section 3.3.2, paragraph 3: “[[<a>HLS</a>]]” should be “[[HLS]]”

Formal reference to DASH-IF Guidelines

Section 2.12 identified the thumbnail_tile as being defined in v3,0 of the DASH-IF guidelines.
Annex A refers to version 3.0 of the DASH-IF Guidelines.

This functionality was first published in v4.1 of the DASH-IF Guidelines.

The most recent version of the DASH-IF Guidelines is v4.2 (https://dashif.org/wp-content/uploads/2018/04/DASH-IF-IOP-v4.2-clean.pdf)

secion 5 for Application Structure

I'll be starting a new section that addressing the application structure to include browser apps, PWAs, Hosted Web Apps, Chrome Embedded Framework etc. My target will be to deliver v1 for first week of august.
Replay if you have any ideas for content for this section that I can write up, but basic structure as I see it now:
Web App Structure
-Intro
-Web App Content management approaches

Hosted
Packaged
Web App Runtime Environments
- Browser
- App Store containers
- CEF
  Progressive Web Apps
- Requirements
- Use Cases
- Advantages for Media Applications
- Avaliability

Guidelines for 'emsg' in Type 3 players

In the November 7 WAVE HATF call, it was suggested that this action item from October 2nd should have a Github Issue to track it:

"HATF Developer Guidelines spec to add guidelines for Type 3 players on extracting, processing, expressing and acting on emsg boxes with guidance for implementation."

See this WAVE thread for some background: https://standards.cta.tech/apps/org/workgroup/wtwg/email/archives/201709/msg00027.html

Add "2017" to title

Update title of guidelines to include "2017" to match other HATF spec

Server Side Ad Insertion - Events Triggering

Where the player usually receives dedicated events in CSAI such as 'adStarted' or 'adEnded', Server-side insertions require a different setup to trigger such events. Typically, in-band notification are used to trigger time based events. For example, SCTE-35 markers are a common format to insert time based meta-data into the feed. These markers are read and interpreted by the player and can be used to trigger events or carry additional meta-data such as callback URLs.

Don't understand very well some of the context here. SSAI solutions that I have evaluated use SCTE-35 markers but just in the server side and as a way of triggering the insertion of ads. This ad insertion mechanism should be transparent to the player and those markers don't need to be interpreted by the player.

Given said this, what I have seen is SSAI solutions, based on markers read in the server side and the ads inserted, are adding information to the stream (ex: Google DAI solution signals ads inserted in the HLS m3u8) that I guess are used for tracking/analytics purposes (player reporting when it is playing an ad, ad playback progress, etc).

Client Side Ads Insertion - Live Streaming - In-Stream metadata

One more point that I would like to bring to the table regarding CSAI and live streaming based on my experience.

Regarding live scenario (3.3.1), the most common use case I have seen is not related with using websockets but with using in-stream metadata (ex: using ID3 in HLS). This allows to trigger ads in all users at the same time and synced with the content.

Typically broadcasters use SCTE-104/35 in their contribution workflows (SDI or transport stream based) to signal ads. When preparing the content to be distributed through Internet these SCTE-104/35 signals are usually "converted" to ID3/EMSG messages for HLS/DASH respectively. This keeps the media workflow for live ads insertion easy (SCTE to ID3/EMSG conversion is supported by most of encoders/media servers used by live content generators), is easy to consume from media players (most of players support parsing ID3/EMSG metadata), ensures synchronization with the content and avoids the need of using alternative communication channels like websocket that add complexity to the solution.

3.2 seems oddly Widevine specific

The section mentions Widevine rather a lot and seems to base much of its content on how Widevine works. It should be worded in a more DRM-neutral fashion and be refactored to be more universal.

Charts/graphs/visual aids for guidelines

I think we could benefit from some visual aids to accompany some of the sections. Specifically when talking about content workflows regarding content delivery and adaptive streaming, I find it useful to include a visual aid. Do you or does your company have such visual aids that already exist that we could use?

Specifically, sections 2.4, 2.5, and 2.7 I believe would greatly benefit from this.

Here's an example of the style of chart I'm referring to (ignore the content of it):

Merge sections about Thumbnails

We should merge sections about thumbnails and do not explicitly differentiate between live and VoD use cases.
The only difference from the players perspective is how to get updates in case of live and it is not clear yet how this will be done in HLS or DASH.

Section 2.4 Content Playback, item #8

The last sentence of item 8 says higher bitrate child manifest

This may be the case for some manifest solutions, but for DASH all the different bitrates are represented in the same AdaptationSet, thus there is no requirement to retrieve a different manifest due to rate adaptation.

Internal spec review notes

Content comments:

Seems to be something missing from the sentence "it will not direction to device manufactures or User Agent implementers"
In "Content Generation" section of Streaming Overview, the way its written implies that content will always be DRM'd when in fact DRM is optional.
In the "Content Playback" section of Streaming Overview, same DRM comment, we should make it clear that not all content in these apps will be DRM'd. For example, make step one something like, "If the content is encrypted with DRM, the client uses the DRM license URL in the manifest (or media stream header) to request a secure key to enable decoding of the media."
I found the "Content Generation" section of VOD very hard to parse. What does "prioritize density over quality" mean? Do we really need a discussion of video compression here? Also, is VoD source really still "typically" from tape in 2017? More likely a mezzanine file. Shouldn't the "client side restrictions" mentioned here be in the Playback section?
"Trick Play" should probably be in the glossary.
"src parameter of the of the video element" should clarify this refers to the HTML
The "HDCP security requirement for HDMI" section is overly opinionated about DRM. IMO phrases like "illicit market" and "potential pirate" aren't appropriate for a technical document. I would remove the whole second paragraph, it isn't needed. Also, if this is a best practices/guidance document, the recommended practice is unclear here. It seems to be "always enable HDCP to prevent piracy," which again, is not appropriate in a spec. Shouldn't it be something like, "evaluate whether your apps will need to support HDCP"?
"Watermarking" again, what is the best practice here?
In "Unique device Identifier", the phrase "you need to be able to identify that device uniquely" should be "you MAY need to identify that device uniquely", since in many use cases a unique ID is not required.
"Like most DRM Widevine..." what is the point of this paragraph? IMO it is too low-level and unnecessary.
"Currently Microsoft’s PlayReady dominates the market" is overly opinionated and unnecessary.
"video or audio media element" make clearer that this is the HTML
section 6 has a lot of broken sentence fragments. Maybe a ReSpec markup problem?
"Original Content is normally delivered to the service provider as a file with near lossless compression." Might be worth mentioning that these are the mezzanine files referenced later.

Typos and other editorial

Not all headings are numbered. IMO they should be since many of the sections have the same title (e.g., "Content Generation"), which makes it difficult to reference them.
"The content provider supply's", I think that should be "supplies"?
Slash after the word "origin" seems like a typo
"In HLS if there is a type is VOD"
"10pc" should be 10%
"The content deliver through a CDN" should that be "delivered"?
"buffer windows" should be window
" live window and event beyond it" should that be 'even'?
In 2.4, "Unfortuantely" spelled wrong
"Within an the application layer"
"Device Type" section is missing proper paragraph breaks.
"windows, mac, android" should be capitalized
"classic pc /browser" should be PC browser
"stb" is inconsistently capitalized
"Advertisements" is inconsistently capitalized
"key value pairs" should be key-value or key/value
"manifest" is inconsistently capitalized
"use-case" is inconsistently hyphenated
"Application" is inconsistently capitalized
"Mezzanine" is inconsistently capitalized
"Encoding" is inconsistently capitalized
"mp4" is inconsistently capitalized
"VoD" is inconsistently capitalized
"Progressive Web apps" is inconsistently capitalized
"not exceed bandwidth availability" needs a period
"In web terms. " seems to be an orphaned clause

Manifests and URLs in section 2.3 Content Delivery

The second paragraph says this

The manifest is typically passed to a player. The player makes a HTTP GET request for the manifest from CDN edge, the edge server's location is determined via DNS.

If the manifest is passed to the player, then the player should not have to retrieve the manifest again. In light of this paragraph, it may be corrected to

At this stage, the control in the chain switches to the client's web video application. The content provider supplies the client with the URL of a manifest file located on a CDN rather than the origin. The manifest URL is typically passed to a player. The player makes a HTTP GET request for the manifest from CDN edge, the edge server's location is determined via DNS. If the CDN edge does not currently have the manifest available, the CDN requests a copy of the file from the origin and the file is cached within the server for later use. There are different levels of caching; popular VOD content is kept closer to the edge in the CDN network in this way it can be delivered to customers faster than an edge server that isn't tuned for high volume delivery. The CDN edge then returns the the manifest to the requesting player.

The difference between the "origin" and "CDN" is difficult to determine in this paragraph. It is quite likely that the manifest URL is to the actual origin where the manifest is created. The CDN may use some other techniques to provide that manifest to the client, but often the CDN architecture (node names, IP addresses, request redirections) should be unknown to the client

Rendition configuration examples in 4.2 should be more clearly fictional

I assume the point of the rendition parameters table in 4.2 is to illustrate the choices that must be made.

To avoid readers misinterpreting this as some sort of "best practices" table, it should be more clearly made imaginary. Instead of referencing HEVC and H.264, it should be changed to say "codec A" and "codec B". Perhaps the headers should also say "Example bitrates" and "Example framerates" to make its nature even more obvious. The 480p HEVC comment in the table should also be made more fictional.

This table should not drive (even by accident) decisions in projects related to selecting such parameters.

Section 2.6 Live Linear Streaming

This area is being worked by Andy Beach (ms), and draft will be posted here for content.

Add 2017 to spec title

Add "2017" to spec title as discussed during the TWG call.

Section 2 Feedback Through Section 2.6

Please feel free to shoot holes in any of these suggestions:

In the first paragraph of Content Generation: I suggest adding mezzanine file to the glossary. I'd suggest eliminating the use of "Original Content" and just use mezzanine file since "original content" is also used in section 3.3.1 to indicate the content to be played after leaving an Ad. The first sentence says, "discussed in section below", but that could probably be removed (I'm not sure what it is referring to).
Suggest rewording second paragraph in Content Generation. Maybe something like: "ABR streaming content is generated by encoding the mezzanine file against an encoding profile. Firstly, the mezzanine file is encoded into a set of versions, each with its own average bitrate. Secondly, each version is packaged into segments of a specified duration." Note, I prefer transcoded to encoded in this context, but the meaning is clear either way.
Suggest minor changes to the final paragraph in Content Generation. (didn't find a way to do strike-through, so some text has been removed) "The second process is performed by a packager which segments the different bitrates. These are then packaged into a transport format such as transport streams (.ts) or fragmented MP4s (.m4s). Next, they are optionally encrypted with a DRM that is suitable for the environment where the content is going to be played out. The packager is also responsible for the generation of a manifest file, typically a DASH (.mpd), HLS (.m3u8), Smooth (.ism) or HDS (.f4v), that specifies the location of the media and its format."
In the second paragraph of Content Delivery: There is a little too much use of the word "it". In the last few sentences, "it" changes from referencing the CDN to referencing the manifest. I'd suggest something like this: "The player makes a HTTP GET request for the manifest from CDN edge, the edge server's location is determined via DNS. If the CDN edge does not currently have the manifest available, the CDN requests a copy of the file from the origin and the file is cached within the server for later use. The CDN edge then returns the the manifest to the requesting player."
In the Content Playback section: number 1: Is there a difference between a client and a player? Maybe start the sentence with "The player uses it's DRM license URL..."
In the Content Playback section: number 2: Since it is in the glossary, can you get rid of "(adaptive bitrate)"? Could we clarify this section to describe it as a network throughput calculation? Maybe something like: "The player's ABR algorithm determines the data throughput rate available to the client by calculating the throughput from the first segment of video. This provides enough information to determine the initial playback quality level the player can sustain."
In the Content Playback section: number 3: Maybe clarify what "this information" is? Change "available bitrate" to "calculated throughput"?
In the Content Playback section: number 4: Slight change: "The player then requests a segment from the edge server, which is typically relative to the location of the manifest."
In the Content Playback section: number 5: slight change: "If encrypted, the segment is decrypted in accordance with the specific DRM in use."
In the Content Playback section: number 6: Make it clear what the "it" words reference. Something like "The player then adds the segment to the video buffer."
In the Content Playback section: number 7: Is "media engine" standard terminology? I've used terms like video pipeline or decoding pipeline. Maybe something like: "The media engine decodes the data and passes it to the video surface where it is rendered."
In the Content Playback section: number 8: In previous bullet points, "video quality" was used instead of "bitrate stream". I suggest referring to it in one way. I personally think bitrate is more accurate than quality. Also, there are some missing commas: "If the throughput remains constant**,** the player will continue to request segments of the same bitrate stream as previoulsy defined. In the event of a change in network throughput**,** the player will make a decision about the need to either drop to a lower bitrate stream or a higher bitrate stream."
In the On-Demand Streaming (VOD) section, the sentence is a little awkward. Maybe something like this: " Despite the almost identical mechanics, content generation, delivery and play out, used for VOD and linear, a large organization will typically maintain two distinct workflows as there are subtle but important ways in which they differ. " If feels like the following sections 2.6 - 2.8 and maybe 2.9 and 2.10, should be subsection of 2.5.
In the 2.6 Content Generation section, there are some missing commas: "For VOD,...", "To this end,...", "As we will see shortly,..."

Future updates to Web Media Application Developer Guidelines

It has been a few years since the Web Media Application Developer Guidelines document was published. It is worth considering if some/all of the document content is still useful and is worthy of an update.

If anyone is referencing the document, please comment on the relevant section(s) you find useful to help inform what the future of this effort will be. Thank you.

Missing topic: Accessibility

the initial draft should include guidance on closed captions/subtitling as all customers will have some sort of requirement around it.

Feedback from latest review

In response to call for TWG review, here are my comments:

Across all sections, make use of Web or web capitalization consistent.

In the Abstract, should we remove this language since we don't expect annual updates? "This specification should be updated at least annually to keep pace with the evolving Web platform."

"It is not a W3C Standard nor is it on the W3C Standards Track." This seems to conflict with the Abstract, which says, "The goal of this Web Media API Community Group specification is to transition to the W3C Recommendation Track for standards development."

1.1. "The examples in this document provides" should be "provide" with no 's' ; "to maximize the provide hints " should be "to provide hints"?

Glossary

"MPEG-Dash" should be "MPEG-DASH"
In Outstream Advertisement, "Eamples" should be "Examples"
Seems there should be a definition for "linear," or explain it in the Live Streaming entry.
Should (.m4s) be (.m4v)?

2.1.3 "Media Source Extensions [[media-source] " remove extra "[" ; In #6, I think "bitrate rendition" would be more accurate than bitrate stream

2.2.1 Would be helpful to expand on what "the encoder is able to prioritize density over quality via configuration" means

2.2.3 Please expand on or remove "(more detail)"

2.4.4 "can occur both in Live and VoD playback session" should be "sessions"

2.5 "For more efficient loading, images are often merged into larger grids" would add "(sometimes called sprites)"

3.1.1 make "navigator.userAgent" and "if(tizenGetUser){ then do X ]" monospaced font; make MPEG-Dash MPEG-DASH.

3.2 Insert a space between EMEAPI; In #3, make "encrypted" monospaced; make navigator.requestMediaKeySystemAccess() , MediaKeySystemAccess.createMediaKeys() , HTMLMediaElement.setMediaKeys(mediaKeys) monospaced; add a period after final "here" link

3.3.2 remove extra [ ] characters around links

4.6 make http urls into links

5.2 "web OS" should be "webOS"; "built with the web. And run" should be " built with the web and run"; Eco system should be ecosystem; android should be Android

5.3 capitalize windows

5.3.1 capitalize popular, make "cross walk" Crosswalk Project

Headings in section 2

The current headings are

Media Playback Use Cases
2.1 General Description
2.2 Content Generation
2.3 Content Delivery
2.4 Content Playback
2.5 On-Demand Streaming (VOD)
2.6 Content Generation
2.7 Content Delivery
2.8 Content Playback
2.9 VOD Pre-caching

It seems that

2.2, 2.3 and 2.4 should be sub headings under 2.1 General Description
2.6, 2.7, 2.8 and 2.9 should be sub headings under 2.5 On-Demand Streaming (VOD)

VOD Uses cases 2.2

I will add content for the other vod uses cases as listed at the foot of the 2.2 section. Let me know if you already have material for these.

recomendations for shaping document

On the Editor's call this morning, we talked a bit about the direction of the document, and wanted to get the full consensus of the team. Thasso brought up a good point that most developers will be approaching media content from a player perspective. The content as we have it now basically teaches that how to build a player from scratch, which is probably not reasonable or desirable for most developers. We had thought about gearing the content to better help developers understand and troubleshoot what is going on in the player. The recommendation is:

remove section 5. This example code would most likely be using a player or pulled from a player, and the content it is discussing is all explained in the previous sections.
add code snippets within the explanations to understand what players are doing to achieve each feature. this could would be code you can copy and past into a window and run, but it should help troubleshoot.
Gear content towards high level understanding (like section 2.1 is now) so developers can better interact with the player code and content creators.
Thoughts?

Clean up Section 2

I propose removing the following sections for our draft:
2.5 Live Streaming with Client Side Ad Insertion
2.6 Live Linear Streaming
2.7 Live Linear Streaming with Client Side Ad Insertion
2.8 On-Demand Streaming with Trick Mode
2.9 Live Streaming with Trick Mode

Add references to terms defined in the Glossary

It is good practice to link back to the definition of a term whenever it gets used in the document. ReSpec makes that easy with the use of <dfn> and href-less <a> (see Definitions and Linking in ReSpec's documentation).

Although easy, it does require some initial editorial work and a little bit of discipline afterwards. The initial work is rather tedious, but I'm used to doing it, so I'd be happy to take a first pass at the document and insert the right markup, once the first batch of comments has been addressed (#46, #47, #49, #52).

Note the need for a bit of discipline would remain on the editors' shoulders. It's really not much work in practice, and experience suggests that it greatly helps with editing as it makes it easier to identify terms that are not properly defined and write a more consistent document.

Use of unique device identifiers

Section 3.1.2 gives some information about how unique device identifiers can be created but the specification does not seem to make any use of them.
While this paragraph is interesting, it would be useful to give a few examples of where the unique device identifier could be used and what impacts the application developer needs to consider.

Scope of media streaming/playback use cases

What are your thoughts on the scope of media streaming and playback use cases?

For example, let's take 2.1 On-Demand Streaming:
Does the code sample include a Manifest file (DASH/HLS)? I think it should, otherwise there will be difference to 2.2 Live Streaming, because the same segments can be used for on-demand and live.

Is it an accessible Manifest file (e.g. referencing segments that are hosted somewhere)? This would mean accessing existing or new DASH/HLS test vectors and making sure that they are available, which might be difficult (for live streaming)

What about MSE/EME code samples? Both EME and MSE provide code samples in the specifications. Web Media guidelines could just reference these or go further and provide actual test vectors with the code samples.

Scope clarification for guidelines

After talking with @thasso today at the F2F, we propose that we should clearly define and limit the scope of v1 of the guidelines to not include superfluous content. I propose that, for example, we NOT include full player implementation examples with code samples and configurations, but rather we discuss overall guidelines for implementations and link to specific technology implementation guidelines (such as JW Player, etc.).

Describe DRM handling with different track types with different policies

Premium content has different track types: SD, HD, UHD, audio, 3D etc. Content owners want different content protection policies applied to each.

Accordingly, it follows that a specific device may be able to only play back a subset of the tracks present in a media presentation. It is important that players be able to handle such situations.

The authoritative source for evaluating the policy to be applied is the DRM system on the client. It receives the policy in the license that carries the content key. To apply different policies to different tracks, they are encrypted with different keys, potentially delivered in different licenses that may have different policies embedded.

Therefore it may be that the client makes a request for one or more licenses for one or more keys, receives some set of licenses (perhaps not containing keys for all tracks!), tries to use them but finds that the DRM system rejects playback of the UHD track (due to policy check failure). A player needs to understand that this does not necessarily mean a fatal error but that it should simply exclude the UHD track from playback.

Application developers also need to consider how to communicate such policy-derived actions to users.

Thumbnails support in DASH

Unfortunately, neither DASH or HLS do currently specify a way to reference thumbnail images directly from manifests. This however is certainly possible and there is already a proposal to add support for thumbnails to the DASH-IF guidelines [DASHIFIOP]. This would certainly simplify live playback case where the application has to update the information about thumbnails with each manifest update.

In the latest version of DASH-IF guidelines (DASHIFIOP 4.1), the proposal regarding thumbnails support, valid for both live and vod streams, has been accepted and included (clause 3.2.9 and clause 6.2.6)

Section 2.4 Content Playback item 2

Bullet #2 says

The player's ABR algorithm determines the data throughput rate available to the client by calculating the throughput from the first segment of video. This provides enough information to determine the initial playback quality level the player can sustain.

This does not answer the question as to what playback quality is selected by the client for retrieving the first video segment. This is typically a client based algorithm, we could have a note on this point to ensure completeness

Section 2 Feedback Continued Section 2.10

Section 2.10.1 suggestions:

Remove the "This is helpful in a few scenarios." sentence.
Update the final sentence of the first paragraph to something like: "In the VOD scenario where a user wants to access a specific location in a file, a range request allows the player to access content at a specific location without downloading the entire transport chunk."
Second sentence of the second paragraph: "The player needs to be able to playback a source buffer and the server must be configured to serve ranges."
Changes to the next three sentences: "The exchange begins when a client makes an HTTP HEAD request . If range requests are supported, the server responds with a header that includes Accept-Ranges: bytes and the client can issue subsequent requests for partial content."
Changes to the final sentence of the second paragraph ("of the" was repeated twice): "The returned bytes are added to an ArrayBuffer and then appended to the SourceBuffer which in turn is used as the SRC parameter of the HTML5 video tag/element."

Section 2.10.2 suggestions:

Unsure whether to use "have" or "has" in the first sentence after "Some content [have/has]..."
The first paragraph could be condensed and merged into the second paragraph. An intro sentence indicating that DRM exists to protect content during distribution to the end user while HDCP exists to prevent interception between the device and the display.
In the second paragraph, add a comma after this clause "If implemented by a device manufacturer,". I'd also replace "it" with "HDCP": "If implemented by a device manufacturer, HDCP..."
Be consistent and use client or device.
Last two sentences could be changes to something like: "If the exchange fails , the signal is not transmitted to the display and the device is responsible for notifying the end user of the error."
If the previous suggestion is not used, change "altering" to "alerting".

Section 2.10.3 suggestions:

Change "vendors SDK" to "vendor's SDK".
Update "...provided by the vendor of this technology if to monitor..." to something like "...provided by the watermarking vendor is to monitor..."
Change "...intercepted screened..." to "...intercepted and screened...."
Change "If a suspect stream is found they then direct the service providers to the location where the streams have been misappropriated." to something like "If a suspect stream is found the vendor directs the service provider(s) to the source of the misappropriated stream."
Generally, I would suggesting using while instead of whilst. I'm assuming US english is preferred over UK english for this document.
In the second paragraph, I want to clarify the "issue". Maybe something like "While watermarking is not an issue that developers will often be faced with, the processing requirements can dictate the choice of platform for the content distribution."
The second sentence could be changed slightly: "The watermarking process requires processing power and low-level functionality through ring-fenced resources that platforms such as Android, iOS or Roku provide."

Need to fix 3 ReSpec errors

The current draft shows 3 ReSpec errors that need fixing:

Found linkless <a> element with text "format" but no matching <dfn>.
Found linkless <a> element with text "format" but no matching <dfn>.
Found linkless <a> element with text "trick play" but no matching <dfn>.

Guidelines may want to consider additional sections to cover targeting TV, mobile & PC devices.

Previous W3C specs provided guidance for developers targeting mobile devices. Mobile Web Application Best Practices includes useful guidance for web app developers targeting mobile devices related to data usage, resource usage, security, privacy, device detection, etc.

Web media apps need to run on PC, mobile and TV devices, each of which have differences in screen size, user interface, viewing distance, etc. Perhaps the Web Media API Guidelines could include some similar guidance extended from mobile to PC, mobile and TV devices.

Section 2 Feedback Continued Through 2.9.1

Section 2.7 suggestions: This sentence is unclear, please rephrase: "There are different levels of caching; popular VOD content is kept closer to the edge in the CDN network in this way it can be delivered to customers faster than an edge server that isn't tuned for high volume delivery." Minor change here: "Older and less popular content is retained in a mid-tier cache while the long tail content is relegated to a lower tier. "
Section 2.8 suggestions: First Paragraph: "As mentioned in the Content Generation section**,** the player uses a tag within the manifest to determine the playout type. In HLS, if there is a type with the value of VOD, the player will not reload the manifest. This has important consequences if there are changes in availability after the session as commenced. In DASH, the difference between a live and VOD playlist are more subtle (more detail) " Do you want to explain the "important consequences" for HLS?
Section 2.8 suggestions: Second Paragraph: End the sentence after "predefined duration." Change "current time" to "playback position". Maybe remove "amount and"?
Section 2.8 suggestions: Third Paragraph: "Additionally, the UX requirements are different between VOD and linear. Unlike linear content, which lends itself to being browsed within an EPG (electronic program guide), VOD content is typically navigated using tiles or poster galleries. There is also a trick play bar to view the current playback position and seek to other points within the content."
Section 2.9 has only one subsection. Just rename it to VOD Pre-caching and make it one section.
Section 2.9 first paragraph (if it remains after previous suggestion): Change "In the previous section,..." to "In the previous sections,...". Change second sentence to "In this section, the use-case is defines an on-demand streaming strategy employed on clients to improve performance."
Section 2.9.1 suggestions:

First sentence change "...will be the.." to "..is the...".
When I read the first sentence, the equation is (% of sessions which buffer)/(length of the session). Is that the intended equation?
In the second sentence, add commas around "as a consequence".
Add a citation for the "...10% of users abandon a video stream"?
Remove the word "will" from the "...developers will use points...".
End the sentence at "... UX to pre-cache content." The next sentence starts: "For example, when entering a mezzanine/synopsis page, ..."
Maybe rewrite the for example sentence as: "For example, when entering a mezzanine/synopsis page, the application might preemptively cache the content by filling the player’s video buffer or, alternatively, storing the chunks locally."
Maybe rewrite the "The consequence..." sentence: "Pre-caching allows the video to commence playing without buffering and provides the user with a more responsive initial playback experience."
Minor update to the last sentence: "This technique is not used for cellular sessions where the user’s mobile data would be consumed on content that they do not watch."

section 2.2 - VOD use cases

These are some topics that came out a discussion on VOD specific use cases here at Sky:

Pre-caching vod
Device storage / local storage
Late binding Audio switching tracks (up to 12) live and vod
Bitrate capping
Buffer manipulation (device / storage dependant)
Vod assets remain at the edge longer
President unique ID exposed via js - signed code issue
HTTP byte range requests
HTTPs support
Player event exposure
HDCP security requirement for HDMI
GSMA flag at DRM rights
Water marking
Hardware acceleration support

I can get something down for these but wont be able to provide MSE/JS examples.

Add a section on Media Capability Discovery

The December 18, 2017 TWG call asked that the Developer Guidelines add a section on media capability discovery, e.g.:

HTML spec API HTMLMediaElement.canPlayType()
MSE spec API MediaSource.isTypeSupported()

3.2 is misleading in terms of how DRM activation occurs

EME says that the "encrypted" event is simply a signal that initData was found in the media stream. Encrypted media can be consumed without ever containing initData in the media stream (e.g. when it is supplied externally) the description in 3.2 is misleading as it gives the impression that the "encrypted" event is a necessary part of the flow.

addition of developer "best practices" statement to document

I'd like to propose a new section to the document that clarifies our support for common Web Development best practices. I think this could be it’s own section, or it can be a section of the introduction (I’m leaning towards a section of the introduction). Here are my thoughts:

This document is squarely aimed at support developers who are building Media based web applications. For that reason the focus on the content is around specific media-centric APIs and app scenarios. This however does not preclude our support for web development best practices.
Many of the APIs we highlight in this documentation relay on modern web developer techniques. For example, not all browser support Encrypted Media Extensions, so a feature detection would need to be implemented:

if (!video.mediaKeys) {
      navigator.requestMediaKeySystemAccess('org.w3.clearkey', [
        { initDataTypes: ['webm'],
          videoCapabilities: [{ contentType: 'video/webm; codecs="vp8"' }] }
    …
}

This allows a developer to implement EME when supported, and not execute code that will generate errors in browsers that do not. Another important practice is the use of polyfills which in many cases can provide an alternative functionality when the browser doesn’t natively support a feature.
In general, best practices for modern web development are well documented across the web. A great place to start to familiarize yourself with these is practices is the Mozilla Cross browser testing guide.