mdn / stumptown-content

License: Other

JavaScript 99.15% HTML 0.85%

stumptown-content's Introduction

MDN Web Docs on Github

👋 Welcome, Bienvenida, 欢迎, Bienvenue, いらっしゃいませ, Receber, Добро пожаловать, 환영합니다, მოგესალმებით

Welcome to the mdn repository, which we use to track MDN team work. The MDN team's public projects are here, where you can view current and upcoming tasks.

This repository is also used for requests and contains issue templates for the following processes:

Invited experts

Joshua Chen

GitHub
Invited expert: JavaScript

Hidde de Vries

GitHub
Invited expert: Accessibility

Scott O'Hara

GitHub
Invited expert: Accessibility

André Jaenisch

GitHub
Invited expert: SVG

Mendy Berger

GitHub
Invited expert: WASM

NOTE: If you wish to nominate someone to be considered as an invited expert, start by filing an issue in this repository.


stumptown-content's Issues

Some examples don't have titles

▶ cat packaged/html/elements/address.json | jq .html.elements.address.examples
[
  {
    "description": {
      "width": 672,
      "height": 242,
      "content": "<p>This example demonstrates the use of <code>&lt;address&gt;</code> to demarcate the\ncontact information for an article&#39;s author.</p>\n<p>Although it renders text with the same default styling as the\n<a href=\"/en-US/docs/Web/HTML/Element/i\"><code>&lt;i&gt;</code></a>\nor <a href=\"/en-US/docs/Web/HTML/Element/em\"><code>&lt;em&gt;</code></a>\nelements, it is more appropriate to use <code>&lt;address&gt;</code> when dealing with\ncontact information, as it conveys additional semantic information.</p>\n"
    },
    "sources": {
      "html": "<address>\n  You can contact author at <a href=\"http://www.somedomain.com/contact\">\n  www.somedomain.com</a>.<br>\n  If you see any bugs, please <a href=\"mailto:[email protected]\">\n  contact webmaster</a>.<br>\n  You may also want to visit us:<br>\n  Mozilla Foundation<br>\n  331 E Evelyn Ave<br>\n  Mountain View, CA 94041<br>\n  USA\n</address>\n"
    }
  }
]

This is breaking mdn2, which could easily do something like {example.title || 'no title'}. But since the title can be transformed and used for other things, it might behoove us to make it required and add it to the list of things we validate.

Bug in build-json-js for prose.see_also

prose.see_also is required for html-element recipes but when I generate packaged/html/elements/video.json it's not there in the JSON.

related.buildRelatedContent should be memoized

If you, for example, run npm run build-json html it will call buildRelatedContent with the argument '/content/related_content/html.yaml' for every html ref page.

item.related_content = related.buildRelatedContent(recipe.related_content);
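A minimal sketch of what memoizing could look like, assuming the result depends only on the path argument. The `buildRelatedContent` below is a hypothetical stand-in with a call counter, not the real function from the `related` module, and `memoize` is a helper introduced here for illustration.

```javascript
// Hypothetical stand-in for related.buildRelatedContent. The real function
// reads and processes a YAML file, which is what makes repeat calls wasteful.
let buildCount = 0;
function buildRelatedContent(specPath) {
  buildCount += 1;
  return { source: specPath, sections: [] };
}

// Generic single-argument memoizer keyed on the argument value.
function memoize(fn) {
  const cache = new Map();
  return (key) => {
    if (!cache.has(key)) {
      cache.set(key, fn(key));
    }
    return cache.get(key);
  };
}

const buildRelatedContentMemoized = memoize(buildRelatedContent);

// Simulate packaging three HTML reference pages that share one recipe.
const pages = ["abbr", "address", "video"].map((name) => ({
  name,
  related_content: buildRelatedContentMemoized(
    "/content/related_content/html.yaml"
  ),
}));

console.log(buildCount); // the underlying builder ran once for three pages
```

One caveat with this approach: every page receives the same object, so nothing downstream should mutate `related_content` in place.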

Use headings rather than HTML comments for prose.md sectioning?

See #14 (comment). I can't remember now why we opted to use HTML comments rather than headings in the first place.

Unless @escattone or @Elchi3 can. I suggest we change this.

Consider explicitly representing the liveness of an example

Currently we don't have a way for an author to indicate explicitly that an example is a live sample (that is, that it has an output iframe). Instead they indicate it implicitly by providing a width and height for the output iframe.

It would probably be clearer to make this explicit in the metadata for the example.

Structure "see also" content

(there's also talk of structuring "See also", since it's basically just a list of links - and that would enable us to render it in other contexts etc. I guess we should talk about that in yet another issue...)

Originally posted by @wbamberg in #55 (comment)

We ought to have some structure around "see also" content, such that:

data in stumptown JSON is an array of cross references that clearly identifies other stumptown content (e.g., html.elements.sometag) or a URL (e.g., https://developer.mozilla.org/docs/Web/HTML/Element/sometag) or some combination of those
authoring is reasonable and unsurprising. This might be a YAML array in meta.yml or a prescribed prose Markdown section (e.g., a single-level unordered list) paired with suitable linting

I suspect the next step is to survey existing see-alsos to see how much flexibility we really need there.

Figure out what to do about <dl>

The stumptown-status.md document mentions that Markdown doesn't support <dl>, and that we use it a lot, but does not offer any suggestion on what we will do about this. A solution needs to be decided upon.

Switch to yarn

yarn is still much better than npm. It's faster, has support for "resolutions", easier to upgrade packages, ability to check that packages weren't manually added (without using the cli), etc. And from experience feels a lot more predictable and solid than npm.

Just today, I checked out master and ran npm install and it caused this massive change to package-lock.json

Can we flatten additional_prose?

Spawned from this discussion: https://github.com/mdn/stumptown-renderer/pull/89/files#r318050706

In that PR, the renderer deals very differently with additional_prose than it does with regular prose. In the packaged JSON the data struct is something like this:

{
  "body": [
    {
      "type": "prose",
      "value": {
        "title": "Short description",
        "id": "short_description",
        "content": "<p>The <strong>HTML Video element</strong> ....</p>"
      }
    },
    {
      "type": "additional_prose",
      "value": [
        {
          "type": "prose",
          "value": {
            "title": "Styling with CSS",
            "content": "<p>Use the\n<a href=\"/en-US/docs/Web/CSS/list-style-image\">..."
          }
        }
      ]
    }
  ]
}

Why not just flatten that and manufacture an id so it becomes like:

{
  "body": [
    {
      "type": "prose",
      "value": {
        "title": "Short description",
        "id": "short_description",
        "content": "<p>The <strong>HTML Video element</strong> ....</p>"
      }
    },
    {
      "type": "prose",
      "value": {
        "title": "Styling with CSS",
        "id": "styling_with_css",
        "content": "<p>Use the\n<a href=\"/en-US/docs/Web/CSS/list-style-image\">..."
      }
    }
  ]
}
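For illustration, the flattening could be done in a few lines, assuming the `body` shape shown above. `flattenBody` and `titleToId` are hypothetical names, and the id rule (lowercase, non-alphanumerics collapsed to underscores) is only one possible convention.

```javascript
// Turn a title such as "Styling with CSS" into an id such as "styling_with_css".
function titleToId(title) {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")
    .replace(/^_+|_+$/g, "");
}

// Replace each additional_prose wrapper with its inner prose sections,
// manufacturing an id for any section that lacks one.
function flattenBody(body) {
  return body.flatMap((section) => {
    if (section.type !== "additional_prose") {
      return [section];
    }
    return section.value.map((item) => ({
      ...item,
      value: {
        ...item.value,
        id: item.value.id || titleToId(item.value.title),
      },
    }));
  });
}

const flattened = flattenBody([
  {
    type: "prose",
    value: { title: "Short description", id: "short_description", content: "<p>...</p>" },
  },
  {
    type: "additional_prose",
    value: [
      { type: "prose", value: { title: "Styling with CSS", content: "<p>...</p>" } },
    ],
  },
]);

console.log(flattened.map((section) => section.value.id));
```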

Support description per-property under `examples`

The stumptown-status.md page says that a given example can have multiple html-source, js-source, etc. items, to allow multiple sections of code. This is great.

I think each one of these items should be able to have its own prose section associated with it, optionally, to allow including descriptive text associated with each code item.

Avoid reserved future keyword

If you try to import the buildPageJSON function from scripts/build-page-json.js you get:

SyntaxError: /Users/peterbe/dev/MOZILLA/MDN/stumptown-renderer/stumptown/scripts/build-json/slice-prose.js: Unexpected reserved word 'package' (55:15)

See
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#Future_reserved_keywords
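A small sketch of why this only shows up in some contexts: `package` is reserved in strict mode code only, and ES modules are always strict, so a file that runs fine as a sloppy-mode script can fail once it is parsed as a module. The `parses` helper and the replacement name `packageJSON` are made up for this illustration.

```javascript
// Returns true when `source` parses as a function body, false on a SyntaxError.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// Sloppy mode: `package` is an ordinary identifier.
const sloppyOk = parses("function package() {}");

// Strict mode (which is what module code always is): `package` is reserved.
const strictOk = parses('"use strict"; function package() {}');

// Renaming the function avoids the problem in both modes.
const renamedOk = parses('"use strict"; function packageJSON() {}');

console.log({ sloppyOk, strictOk, renamedOk });
```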

Write linter to validate that pages follow their recipes

As a follow-up to #55 and #74, we ought to have a script that will error out when pages have known deprecated sections (like HTML's usage_notes) and when pages are missing required sections (like overview).

Acceptance criteria:

Running npm run lint runs this script
Errors result if any deprecated sections appear in a page
Errors result if any page is missing a required section
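
The core check could be as small as the sketch below. It assumes the section ids for a page have already been extracted from its prose file; `htmlElementRules` and `lintPage` are hypothetical names, and the rule lists only contain the sections named in this issue and its follow-ups.

```javascript
// Hypothetical rule set for HTML element pages.
const htmlElementRules = {
  required: ["overview", "see_also"],
  deprecated: ["usage_notes"],
};

// Returns a list of human-readable error strings for one page.
function lintPage(pagePath, sectionIds, rules) {
  const missing = rules.required.filter((id) => !sectionIds.includes(id));
  const deprecated = rules.deprecated.filter((id) => sectionIds.includes(id));
  const errors = [];
  if (missing.length > 0) {
    errors.push(`${pagePath}: missing sections: ${missing.join(", ")}`);
  }
  if (deprecated.length > 0) {
    errors.push(`${pagePath}: contains deprecated sections: ${deprecated.join(", ")}`);
  }
  return errors;
}

const lintErrors = lintPage(
  "content/html/reference/elements/abbr",
  ["short_description", "usage_notes", "see_also"],
  htmlElementRules
);

lintErrors.forEach((message) => console.error(message));
// A script wired into `npm run lint` would also set a non-zero exit status
// here, for example with `process.exitCode = 1`, whenever errors were found.
```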

Clarify required and optional prose sections for HTML elements

The recipe for HTML elements lists the following as mandatory sections:

Short description
Usage notes
See also

and the following as optional sections:

Overview
Attributes text (this is for elements that want to add some extra prose for the attributes list)
Accessibility concerns

...with of course the option of providing any extra sections that are specific to a particular element. This doesn't quite match up with the existing stumptown content (as scraped from MDN).

It also doesn't quite match up with the specification @chrisdavidmills wrote for HTML docs: https://docs.google.com/document/d/17R-jyS2WVQ9_OIfErRZWRdPSAVYYauF3wfkJUeG07PI/edit#heading=h.nlsh2gn0fxac:

Summary/intro text

OK, call that "Short description".

Additional usage information; a full explanation of how to use the element.

Call that "Overview".

Styling with CSS

We don't have this at all.

Accessibility concerns and best practices

We have this as "Accessibility concerns".

See also

We have this of course.

So: the doc adds "Styling with CSS", which seems like a useful section at least sometimes, and also omits "Usage notes", which seems like a good omission (we would be better to fold that content into "Overview"). So suggest we could change the recipe to:

Mandatory sections:

Short description
See also

Optional sections:

Overview
Styling with CSS
Attributes text
Accessibility concerns

I don't know if we should be more prescriptive than this, and perhaps make "Overview" or even "Styling with CSS" mandatory.

For "Overview": there are existing elements which don't have an overview, so if we make it mandatory we have to update them. For some elements we might want to move the Usage notes to Overview (e.g. https://developer.mozilla.org/en-US/docs/Web/HTML/Element/i#Usage_notes).

For "Styling with CSS" there will of course be some elements where it's not applicable at all, and others where there's nothing very interesting to say. So this probably should be kept optional.

@chrisdavidmills , thoughts?

Acceptance criteria

Recipe for HTML elements is complete
Existing content that would not comply with the recipe has been identified
Issues have been opened for updating content, updating build-json, adding linting, and updating the renderer

Tasks

(see #55 (comment) for details):

update the recipe
identify existing content that would not comply with the recipe
open issue(s) for updating content, updating build-json, adding linting, and updating the renderer

Cheerio instead of JSDOM

About a decade ago it was shown that Cheerio is 8x faster than JSDOM. See https://habr.com/en/post/163979/ from 2012. It's a bit "hard to read" since it's written in Russian but it's easy to find the raw benchmark results (focusing on NodeJS).
Much has been said about that claim in cheeriojs/cheerio#700, where they acknowledge that the original (Russian) benchmark is outdated and that JSDOM has changed significantly since then. But someone in Feb 2019 backed up the benchmark with a speed difference of 3.8x.

Here's a good article outlining the difference between the two candidates.

Another thing on my mind was that I heard about the Mozilla Activity Stream folks who worked on parsing web pages in Node for the sake of suggestions in about:newtab. They didn't mention speed but severe memory bloat in JSDOM. It might not matter in a CLI because even if it bloats a bit, it's not a daemon.

On a personal note, I really like/prefer the API of Cheerio but perhaps that's just years and years of using jQuery in browser JS and PyQuery in Python code.

We should eslint the scripts

eslint would probably have caught the mistake of creating a function called package.
It would have avoided #109

We shouldn't use eslint to worry itself about code formatting. We'll use prettier for that once the ADR settles.

Code keywords and styling

I think we should try to make changes to how we "label" and style references to code terms when we render our output HTML.

For instance, currently, anything that's code gets wrapped in <code>...</code>. However, that also means that from a scripting and content wrangling perspective, there's no information about what kind of thing you're looking at.

I think that instead, we should be flagging items with what kind of thing they are, rather than applying style. This could be done in a number of ways, from custom elements to custom attributes to a wide variety of classes. A few ideas to contemplate...

Variations on `<code>`

<code class="code-method">getElementById()</code>

The class here provides the insight to anyone interested that this is a reference to a JavaScript method.

<code class="code-method">getElementById</code>

Here, we do the same thing, except that the code-method class uses :after to append the parentheses () after the method name automatically.

<code class="code-method" data-interface="Document">getElementById</code>

Now we've added the fact that this is a method on the Document interface, which can be referenced by anyone interested, from browser extensions to site script to anything else.

Custom element magic

Or we could have fun with a custom element or similar construct:

<keyword type="method">Document.getElementById</keyword>

<keyword type="element">iframe</keyword>

<keyword type="element-attr">iframe.allow</keyword>

This could be coded to look at the type to determine the styling to apply, then make intelligent decisions, such as:

If the referenced item is a member of the same interface or HTML element as the current page, display only the method or property name, and not the interface name
If type is method, append () to the end of the displayed text
Given the type, the item text, and other relevant info that might be provided, generate a link to the referenced page. If possible, we could script this to only do it the first time the referenced keyword is used within a section, or something along those lines, unless overridden specifically using an attribute
Perhaps when it's decided that the entire name (both interface and method/property names) must be displayed, display it as "getElementById() in the Document interface" or "the Document interface's getElementById()" rather than just Document.getElementById(); either automatically or using an option specified by an attribute
Automatically create an appropriate tooltip
And whatever else we want
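
As a rough sketch of the first two decisions in that list (dropping the owner name on its own page, and appending parentheses to methods), assuming the keyword text is written as `Owner.member`. `keywordLabel` is a hypothetical helper, not an existing API.

```javascript
// Work out the display text for a keyword reference.
// `currentOwner` is the interface or element the current page documents.
function keywordLabel({ type, name }, currentOwner) {
  const dot = name.indexOf(".");
  const owner = dot === -1 ? null : name.slice(0, dot);
  const member = dot === -1 ? name : name.slice(dot + 1);

  // On the owner's own page, show only the member name.
  let label = owner !== null && owner === currentOwner ? member : name;

  // Methods get their parentheses added automatically.
  if (type === "method") {
    label += "()";
  }
  return label;
}

const method = { type: "method", name: "Document.getElementById" };
console.log(keywordLabel(method, "Document")); // getElementById()
console.log(keywordLabel(method, "Window")); // Document.getElementById()
console.log(keywordLabel({ type: "element-attr", name: "iframe.allow" }, "iframe")); // allow
```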

There are a lot of interesting things we could do that could make our content so much sweeter.

Make packaging JSON asynchronous

See #14 (comment): currently file IO is synchronous, but we should make it all asynchronous.
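
A sketch of the general direction, using Node's built-in `fs.promises` API so that reads for one page run concurrently. `readSources` is a hypothetical helper and the file names are whatever the caller passes in; none of this is taken from the current build scripts.

```javascript
const fs = require("fs");
const path = require("path");

// Read several source files concurrently instead of issuing one blocking
// fs.readFileSync call after another.
async function readSources(dir, fileNames) {
  const contents = await Promise.all(
    fileNames.map((name) => fs.promises.readFile(path.join(dir, name), "utf8"))
  );
  // Map each file name to its text.
  return Object.fromEntries(fileNames.map((name, i) => [name, contents[i]]));
}
```

A caller would then `await readSources(pageDir, ["meta.yaml", "prose.md"])` from inside another async function, which means the callers up the chain need to become async too.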

Syntax highlighting

In https://github.com/mdn/stumptown-experiment/blob/master/content/html/elements/video/prose.md there's a block that uses

  ```html
   ...
  ```

In GitHub flavored Markdown this is automatically turned into

&lt;<span class="pl-ent">video</span> <span class="pl-e">controls</span>&gt;
  &lt;<span class="pl-ent">source</span> <span class="pl-e">src</span>=<span class="pl-s"><span class="pl-pds">"</span>myVideo.mp4<span class="pl-pds">"</span></span> <span class="pl-e">type</span>=<span class="pl-s"><span class="pl-pds">"</span>video/mp4<span class="pl-pds">"</span></span>&gt;
  &lt;<span class="pl-ent">source</span> <span class="pl-e">src</span>=<span class="pl-s"><span class="pl-pds">"</span>myVideo.webm<span class="pl-pds">"</span></span> <span class="pl-e">type</span>=<span class="pl-s"><span class="pl-pds">"</span>video/webm<span class="pl-pds">"</span></span>&gt;
  &lt;<span class="pl-ent">p</span>&gt;Your browser doesn't support HTML5 video. Here is
     a &lt;<span class="pl-ent">a</span> <span class="pl-e">href</span>=<span class="pl-s"><span class="pl-pds">"</span>myVideo.mp4<span class="pl-pds">"</span></span>&gt;link to the video&lt;/<span class="pl-ent">a</span>&gt; instead.&lt;/<span class="pl-ent">p</span>&gt;
&lt;/<span class="pl-ent">video</span>&gt;

We need a solution to do the same inside build-json.js so that the snippet of HTML has this stuff too.

Support input element type

We've discussed that HTML input elements might want to be a document type of their own, distinct from just HTML elements.

The HTML <input> element is weird. It's defined as a single element type, but depending on the value of the type attribute it can represent any of a large number of very different things: text inputs, buttons, date/time selectors, check boxes etc.

We've tended to treat these all more or less as quite separate entities, and to treat the fact that they're actually all the same element as an implementation detail. This seems like a good approach from the point of view of answering the questions users are likely to have.

But these things also have some special features which normal HTML elements don't have, so we have thought we might want to model them as a distinct page type in stumptown.

I've looked through the input element pages, and tried to see if we will want a document type and if so what it should feature.

The main page:
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input

Looking through these I can see three distinct features:

Value section

All these pages have an H2 "Value" which describes the format and use of the value attribute.

Validation section

Most of the pages have an H2 "Validation" describing how the input is validated and how to add any extra validation.

Attribute handling

These pages have a different way to handle attributes, which I've gone into in detail here: https://discourse.mozilla.org/t/attributes-for-input-elements/43899. The way attributes are handled here is a mess at the moment.

I think what we should do instead is give each input element page an "Attributes" section listing the attributes that apply to that input type.

However we still need special handling here. In the normal HTML element pages, each element gets its own "attributes" directory: there's no sharing. So they are specified in the recipe by supplying the directory name, like:

attributes:
    element_specific: ./attributes

But in this case it's more like: there's a big list of input element attributes, and each particular type supports some subset of that list. So, for example, the placeholder attribute is used by textbox-style inputs like url, password, search and so on. These really would like to share the documentation for placeholder.

So the obvious suggestion is to keep all the attribute documentation for input under html/reference/elements/input/attributes, and then let the attributes specification in the front matter list individual files, rather than a single directory:

attributes:
    element_specific:
        - ../attributes/maxlength.md
        - ../attributes/minlength.md
        - ../attributes/pattern.md
        - ../attributes/placeholder.md

One question is whether any of these attributes need different documentation for different types. The only place I've found where that seems likely is step, where the step unit is different for different input types. But even there we could document that in the preamble to the list of attributes, or we could have "../attributes/step-time.md".

Conclusions

We could avoid having an extra page type here. We could just support "Value" and "Validation" as "additional sections", and we don't need a new page type in order to support enumerated attributes.

But I think it would be helpful to model "Value" and "Validation" explicitly. So we could have a recipe that's very close to the normal element one:

related_content: /content/related_content/html.yaml
body:
- prose.short_description
- meta.interactive_example?
- prose.overview
- prose.value
- prose.validation
- prose.attributes_text?
- meta.attributes
- prose.styling?
- prose.accessibility_concerns?
- prose.*
- meta.examples
- meta.info_box:
    - meta.api
    - meta.permitted_aria_roles
    - meta.tag_omission
- meta.browser_compatibility
- prose.see_also

@Elchi3 , @ddbeck , comments?

Acceptance criteria:

input element recipe is written and checked into stumptown-content
all input elements are migrated into stumptown

Locales

How are we going to do the equivalent of https://github.com/mdn/stumptown-experiment/tree/master/content/html/elements/video in French?

Also, apart from the title almost all of the stuff in https://github.com/mdn/stumptown-experiment/blob/master/content/html/elements/video/meta.yaml is "locale agnostic".

We could get fancy and add https://github.com/mdn/stumptown-experiment/tree/master/content/html/elements/video/fr/ which would contain attributes/, examples/, contributors.md, prose.md. And lastly, it could contain a meta.yaml file that is everything that ../meta.yaml is but with the French "extras". E.g.

title: "<video>: L'élément vidéo intégré"
mdn-url: https://developer.mozilla.org/fr/docs/Web/HTML/Element/video
tags:
    group: Image et multimédia

Change default merge to "squash and merge"

There was a discussion recently that squash and merge is preferred for MDN projects.

Someone with admin rights on this repo (@wbamberg?) ought to turn off the other merge options in the repository settings. See Configuring commit squashing for pull requests for instructions.

related_content's mdn_url is different from the main doc

Looking at packaged/html/reference/elements/abbr.json for example.

Looks like this:

{
   "related_content": [
    {
      "title": "Learn HTML",
      "content": [
        {
          "title": "Introduction to HTML",
          "content": [
            {
              "title": "Introduction to HTML",
              "mdn_url": "https://developer.mozilla.org/en-US/docs/Learn/HTML/Introduction_to_HTML"
            },
           ...

  "title": "<abbr>: The Abbreviation element",
  "mdn_url": "https://developer.mozilla.org/docs/Web/HTML/Element/abbr",
  "interactive_example_url": "https://interactive-examples.mdn.mozilla.net/pages/tabbed/abbr.html",
  "browser_compatibility": {
    ...

Note the difference in mdn_url values. Some include en-US and some do not.

We've talked before about at least dropping the absolute part with the domain. No doubt. But should we go for the locale prefix? My inclination is to NOT have the locale in there. Some day, the abbr.json file will be called something like packaged/en-US/html/reference/elements/abbr.json (or packaged/sv-SE/html/reference/elements/abbr.json) so that piece of information will be implicit.

However, it could be that the lazy Swedes haven't translated every page yet, so they might want to link to the en-US version in their sidebars, because it's better to point to the English content than to not link at all. So for that reason it might be good to have the locale in there.
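
Whichever way the locale question goes, the normalization itself is small. The sketch below drops the domain and leaves the locale decision as a flag; `toRelativeMdnUrl` and its `keepLocale` option are hypothetical, and the locale pattern is a loose guess at prefixes like /en-US or /sv-SE.

```javascript
// Convert an absolute mdn_url into a site-relative path.
function toRelativeMdnUrl(mdnUrl, { keepLocale = true } = {}) {
  const { pathname } = new URL(mdnUrl, "https://developer.mozilla.org");
  if (keepLocale) {
    return pathname;
  }
  // Strip a leading locale segment, but only when it directly precedes /docs/.
  return pathname.replace(/^\/[a-z]{2,3}(?:-[A-Za-z]{2,4})?(?=\/docs\/)/, "");
}

const sidebarUrl =
  "https://developer.mozilla.org/en-US/docs/Learn/HTML/Introduction_to_HTML";
const pageUrl = "https://developer.mozilla.org/docs/Web/HTML/Element/abbr";

console.log(toRelativeMdnUrl(sidebarUrl)); // /en-US/docs/Learn/HTML/Introduction_to_HTML
console.log(toRelativeMdnUrl(sidebarUrl, { keepLocale: false })); // /docs/Learn/HTML/Introduction_to_HTML
console.log(toRelativeMdnUrl(pageUrl, { keepLocale: false })); // /docs/Web/HTML/Element/abbr
```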

Remove the Python code?

In #14 we're developing the first cli tool that can be used to turn the .md and .yaml files into .json files.

The next natural thing to do, especially from a kuma perspective, is to now glob for all these .json files and one by one, do a lookup of its recipe and then do something with that. E.g. extract the bits it needs to turn it into kuma content.

So what else do we need Python for? Today, the only entrypoint I see is the validate.py script.

Regarding #14 @wbamberg mentioned that that script assumes that the integrity of all the core content (e.g. .md and .yaml files) is perfect. And I think that's right.

I would be open to rewriting validate.py to validate.js and reduce this whole project to become a 100% Node project.

Rename markdown files?

Right now, most of the markdown files have one name: docs.md. This makes it confusing to have multiple markdown files open in an editor at once. It'd be nice to have more descriptive names (for example, html/reference/elements/abbr/abbr.md instead of html/reference/elements/abbr/docs.md).

To be able to do this, we need to figure out the rules for deciding which files to build from. Previously, the meta.yml file was the giveaway that a folder contained the files needed to build JSON. @wbamberg raised some possible solutions (and noted some drawbacks) in #91 (comment); there may be others.

We ought to:

Decide on a naming convention and a method for selecting files to build from
Merge a PR that changes the build and renames the files

Remove markdown-spellcheck and switch to retext

I've noticed that marked is still in our dependency tree (used by markdown-spellcheck), and I've also spent some time reading further about the unified package universe that we've started to rely on. It is actually great and has tons of format and language processing packages! Really like that.

So, I'd like to propose to remove my initial spellcheck implementation (i.e. remove the markdown-spellcheck package) and start using retext and its plugins.

https://github.com/retextjs/retext is the parent package
https://github.com/remarkjs/remark-retext can get you from remark to retext

Just like remark, retext comes with many plugins, but for language processing! So not only does it do spell checking, it also has plugins for further language checks, like avoiding "guys" or checking whether "a" and "an" are used correctly. Wow!
https://github.com/retextjs/retext-spell
https://github.com/retextjs/retext-equality
https://github.com/retextjs/retext-indefinite-article
List of language processing plugins: https://github.com/retextjs/retext/blob/master/doc/plugins.md

I'm impressed how much validation our writing could get :)

Use Markdown for representing directives

In #100 we use something like KS syntax to represent directives in guide pages, and some fragile regexps to parse these.

@ddbeck suggests we should instead extend our Markdown parser to handle these directives: #100 (comment).

"Content sourcemaps"

The user story isn't crystal clear yet but something I think would be super engaging is to be able to refer to the source when you're reading something.

Suppose you're reading the "Usage notes" on the <video> page and you spot a typo, or you found a bug in a code example, then it would be great to know exactly where the content came from. Some people might want just the file path of the .md file and some might just prefer to go straight to the HTML URL in GitHub.

I'm referring to what Readthedocs does with its "Edit in GitHub" link, for example.

It's an opportunity to engage new contributors but it also needs to be powerful enough for core contributors.

Link to contributors in some other/better way

Please be aware of this discussion: mdn/kuma#5717
I think stumptown-renderer would equally benefit from not having to render that massive list of names and URLs.

This issue belongs both here and in stumptown-renderer, since the two are intertwined.

Content in stumptown will almost certainly always (at least for URIs that existed on the Wiki) be a mix of historical contributions from the Wiki and git commits within a directory.

We will have to port the contributions from the Wiki into stumptown-content, as we've already done.
We can union this with the output of some fancy git log commands [0]. As time progresses, the list of contributions will be a little bit Wiki history and a little bit git log.

One big question: if we take heed of mdn/kuma#5717, especially from the point of view of stumptown-renderer, and you don't display the full list of names, where else do you display it?

[0] E.g.

git log -M --follow --pretty=tformat:"%aN %aE" -- content/html/reference/elements/video/docs.md | sort -u

JSON identifiers should consistently use "_" for spaces

The JSON produced from stumptown content usually uses underscores to indicate spaces: browser_compatibility, mdn_url, and so on.

But for prose sections it uses "-": short-description, see-also and so on.

We should use underscores everywhere.
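
The conversion itself is a one-liner. A minimal sketch, with `toUnderscoreId` and `normalizeKeys` as hypothetical helper names:

```javascript
// Convert a hyphenated identifier such as "short-description" to "short_description".
function toUnderscoreId(id) {
  return id.replace(/-/g, "_");
}

// Apply the same rule to every top-level key of an object.
function normalizeKeys(obj) {
  return Object.fromEntries(
    Object.entries(obj).map(([key, value]) => [toUnderscoreId(key), value])
  );
}

console.log(toUnderscoreId("short-description")); // short_description
console.log(Object.keys(normalizeKeys({ "see-also": "...", mdn_url: "..." }))); // [ 'see_also', 'mdn_url' ]
```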

'usage-notes' is not present in html/elements/abbr

Pretty sure the reason is because of

<!-- <usage-notes> -->

in
https://raw.githubusercontent.com/mdn/stumptown-experiment/master/content/html/elements/abbr/prose.md

I found this out the hard way when trying to render this with mdn2.

I really hope we can soon build a linter here in this project and run it in CI.

Moving images to Stumptown

At the moment, on a page like https://developer.mozilla.org/en-US/docs/Learn/HTML/Multimedia_and_embedding/Responsive_images there are 6 images. These are loaded from mdn.mozillademos.org and in Kuma we store (MySQL) the reference to these by URL.

I think we should ship images (and other static assets like that) with this repo, i.e. check them into git.
Yes, it'll make the repo fatter, but the advantages are many:

It's easy to reason about them, and it doesn't require a sync with a remote server.
It's easy to see images in the GitHub UI (e.g. in the list of changes for a PR) and the content will work 100% offline.
Instead of uploading a new one you can edit the images with your preferred local tools.
Not having to do the DNS lookup + SSL negotiation for another domain is a performance boost since we can reuse the same HTTP/2 pipeline.
Once loaded in the renderer, we can write code that post-processes images such as writing custom <picture> tags that pre-compress and pre-resize images from 1 source. And we can improve it (the processing) over time.
Fewer moving parts.

Shorter titles for the related content?

See screenshot in mdn/yari#58

This is a very specific reference to the existing MDN pages which displays a different title for certain elements in the sidebar compared to the actual title when you get to the page.

The stumptown-renderer is dumb and will just display the title key from the packaged JSON so if we want to have different titles in the sidebar compared to the document page, here's the place to resolve that.

Acceptance criteria:

we have a way to represent short titles in stumptown content, and the short title is included in the JSON
the renderer uses short titles when rendering the sidebar

Prose end-to-end testing with recipes

We've had at least two errors now where the prose sections are busted: #32 and #37.

There's not really anything obviously wrong with the Markdown in those files. Maybe there might be, but that's the point.

What I would like to have is something that reads the recipes, understands which sections are mandatory and then validates the built JSON files for each of these.

Let's keep it simple and make an end-to-end test exclusively for this. It'd need to first run the npm run build-json ... and then iterate over the produced .json files and do checks including loading in the recipe.
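
The check at the heart of such a test might look like the sketch below. It rests on two assumptions: that a trailing `?` in a recipe entry marks it as optional and `prose.*` is a wildcard, and that the built JSON has a `body` array of `{ type, value: { id } }` items. `mandatoryProseIds` and `missingProse` are hypothetical names.

```javascript
// Pick out the prose sections a recipe requires, e.g. "prose.overview" -> "overview".
// Entries ending in "?" are treated as optional and "prose.*" as a wildcard.
function mandatoryProseIds(recipeBody) {
  return recipeBody
    .filter((entry) => typeof entry === "string")
    .filter(
      (entry) =>
        entry.startsWith("prose.") && !entry.endsWith("?") && entry !== "prose.*"
    )
    .map((entry) => entry.slice("prose.".length));
}

// List the mandatory prose ids that are absent from a built page body.
function missingProse(recipeBody, packagedBody) {
  const present = new Set(
    packagedBody
      .filter((section) => section.type === "prose")
      .map((section) => section.value.id)
  );
  return mandatoryProseIds(recipeBody).filter((id) => !present.has(id));
}

const recipeBody = [
  "prose.short_description",
  "meta.interactive_example?",
  "prose.overview",
  "prose.attributes_text?",
  "prose.*",
  { "meta.info_box": ["meta.api"] },
  "prose.see_also",
];

const packagedBody = [
  { type: "prose", value: { id: "short_description", content: "<p>...</p>" } },
  { type: "prose", value: { id: "overview", content: "<p>...</p>" } },
];

console.log(missingProse(recipeBody, packagedBody)); // [ 'see_also' ]
```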

Why YAML instead of JSON?

I'm curious about the plan to use YAML rather than JSON. There are a couple of major reasons why I think we should consider switching this to JSON:

All our other data sources are JSON
A large percentage, if not an outright majority, of our consumers will be JavaScript code
JSON is very familiar to web developers; more so, I suspect, than YAML

I know many people feel that YAML is easier to read; I'm on the fence about that. I see that it's got less "stuff" going on in it, but on the other hand, JSON's groupings hierarchically are clearer, IMO, because of the braces.

We should consider this carefully before we go too far along, since this decision is a pretty critical one to get right at the outset.

Should recipes drive the renderer?

(meta: taken from https://github.com/mdn/stumptown-content/wiki/Notes-from-Whistler-All-Hands,-June-2019#decide-whether-recipes-should-drive-the-renderer)

In stumptown we've expressed a desire to distinguish between (1) "MDN content" which is an abstract collection of web documentation, and (2) "MDN the website" (developer.mozilla.org). The idea is that "MDN the website" is the main consumer of "MDN content", but not necessarily the only one: editors and devtools could also want to integrate the content. This is reflected in the division between "stumptown-content" and "stumptown-renderer".

Maintaining this distinction is tricky, and one of the skirmishes is the meaning of "recipes". These describe what content should be included in a particular type of document (e.g. https://github.com/mdn/stumptown-content/blob/master/recipes/html-element.yaml). These files have - or at least used to have - a sort of double life. On the one hand they describe which bits of content authors should supply when they're documenting a thing. On the other hand they might be a description of how MDN should render pages.

The concrete difference would be seen if, for example, we decided to stop displaying CSS formal syntax in pages, but wanted to keep the content (maybe because other consumers were using it).

We discussed this recently (#48) and the resolution was that recipes should not drive the renderer, but instead we should just hardcode the different content types as React components (mdn/yari#18, which basically implements #48 (comment)). This means that the renderer has nothing to do with recipes.

In Whistler, @peterbe wanted us to reconsider this decision, and make recipes drive the renderer.

Pros: things get simpler. Not just in the case of recipes: there are other places too where this distinction can get messy: an obvious one is sidebars, which seem a very MDN-y thing, but whose content is clearly an authorial choice (so it is very tempting to put them in stumptown-content). It's easier to make design choices to satisfy real consumers than to satisfy theoretical consumers.

Cons: we acknowledge that stumptown-content is being built to serve MDN-the-website. It's still possible for other consumers to use it, and it might still be as easy, but it's certain that we will introduce extra stuff they don't care about (like sidebars) and possible we will make choices about what to expose or how to expose it that are very suboptimal for other consumers.

Either way, we should be very clear about the choice we're making here and its implications.

Fix HTML content to match new recipe

Now that #74 is done, we need to update the content that does not comply with the new HTML recipe. These take a few forms:

Entries that are missing sections (e.g., see_also)
Entries that contain deprecated sections (e.g., usage_notes)

Here's a list from #55:

content/html/reference/elements/abbr
  Missing sections: [ 'overview' ]
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/address
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/article
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/aside
  Missing sections: [ 'overview' ]
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/audio
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/b
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/base
  Missing sections: [ 'see_also' ]
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/blockquote
  Missing sections: [ 'overview' ]
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/body
  Missing sections: [ 'overview' ]
content/html/reference/elements/br
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/button
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/canvas
  Missing sections: [ 'overview' ]
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/caption
  Missing sections: [ 'overview' ]
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/cite
  Missing sections: [ 'overview' ]
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/code
  Missing sections: [ 'overview' ]
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/data
  Missing sections: [ 'overview' ]
content/html/reference/elements/datalist
  Missing sections: [ 'overview' ]
content/html/reference/elements/dd
  Missing sections: [ 'overview' ]
content/html/reference/elements/del
  Missing sections: [ 'overview' ]
content/html/reference/elements/dfn
  Missing sections: [ 'overview' ]
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/dialog
  Missing sections: [ 'overview' ]
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/div
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/dl
  Missing sections: [ 'overview' ]
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/em
  Missing sections: [ 'overview' ]
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/embed
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/figcaption
  Missing sections: [ 'overview' ]
content/html/reference/elements/figure
  Missing sections: [ 'overview' ]
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/footer
  Missing sections: [ 'overview' ]
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/h1-h6
  Missing sections: [ 'overview' ]
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/head
  Missing sections: [ 'overview' ]
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/header
  Missing sections: [ 'overview' ]
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/html
  Missing sections: [ 'overview' ]
content/html/reference/elements/i
  Missing sections: [ 'overview' ]
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/iframe
  Missing sections: [ 'see_also' ]
content/html/reference/elements/kbd
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/label
  Contains deprecated sections: [ 'usage_notes' ]
content/html/reference/elements/legend
  Missing sections: [ 'overview' ]
content/html/reference/elements/li
  Missing sections: [ 'overview' ]
content/html/reference/elements/table
  Missing sections: [ 'overview' ]
content/html/reference/elements/video
  Contains deprecated sections: [ 'usage_notes' ]

Acceptance criteria:

All HTML pages in stumptown-content have required sections
All HTML pages in stumptown-content do not have deprecated sections
Changes made to HTML pages in stumptown-content are reflected in their wiki page equivalents
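
As a rough illustration of the first two criteria, here is a minimal sketch of a check that reports missing and deprecated sections for one page. The function name and the hard-coded section lists are assumptions made for this example; the real lists would come from the HTML recipe.

```javascript
// Hypothetical section lists, chosen to match the report above.
// In practice these would be derived from the HTML element recipe.
const REQUIRED_SECTIONS = ["overview", "see_also"];
const DEPRECATED_SECTIONS = ["usage_notes"];

// Compare the sections present in a page against the two lists.
function checkSections(
  presentSections,
  required = REQUIRED_SECTIONS,
  deprecated = DEPRECATED_SECTIONS
) {
  const present = new Set(presentSections);
  return {
    missing: required.filter((name) => !present.has(name)),
    deprecated: deprecated.filter((name) => present.has(name)),
  };
}

// Example: a page that still has usage_notes and lacks an overview.
const result = checkSections(["short_description", "usage_notes", "see_also"]);
console.log("Missing sections:", result.missing);
console.log("Contains deprecated sections:", result.deprecated);
```

A page passes when both arrays are empty.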

Expose recipe

Perhaps somewhat similar to #26 I would like more in the built .json files. In particular, I want the recipe too.

We could just take ALL the fields from the src/**/video/meta.yaml file and put them into a key like "meta" in the packaged/**/video.json. E.g.

json.load(open('video.json'))['meta']['mdn-url']

Even better would be if that (the "recipe") can be looked up too so I don't need to do:

recipe_name = json.load(open('packaged/blabla/video.json'))['meta']['recipe']
recipe = yaml.safe_load(open(f'node_modules/stumptown/recipes/{recipe_name}.yaml'))
for section in recipe:
    ...

Meaning, I'd rather just do:

recipe = json.load(open('packaged/blabla/video.json'))['meta']['recipe']
for section in recipe:
    ...

Log "additional sections"

In stumptown we have:

named prose sections that must be present (e.g. "Short description")
named prose sections that may be present (e.g. "Accessibility concerns")
additional prose sections that may have any names (e.g. "Supported image formats")

We would like the tools to log the names of additional sections, partly so we can see additional sections that are very often included and should perhaps be promoted to named sections.
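
A minimal sketch of what that logging could look like, assuming each page is represented as a list of section titles. The function names and the page objects are hypothetical; they are not part of the existing stumptown tooling.

```javascript
// Return the section titles that are not in the list of named sections.
function findAdditionalSections(namedSections, sectionTitles) {
  const known = new Set(namedSections.map((name) => name.toLowerCase()));
  return sectionTitles.filter((title) => !known.has(title.toLowerCase()));
}

// Count how often each additional section appears across pages,
// most frequent first, so promotion candidates are easy to spot.
function tallyAdditionalSections(namedSections, pages) {
  const counts = new Map();
  for (const page of pages) {
    for (const title of findAdditionalSections(namedSections, page.sections)) {
      counts.set(title, (counts.get(title) || 0) + 1);
    }
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Hypothetical input data for illustration only.
const named = ["Short description", "Overview", "Accessibility concerns", "See also"];
const pages = [
  { path: "elements/img", sections: ["Short description", "Supported image formats", "See also"] },
  { path: "elements/picture", sections: ["Short description", "Supported image formats"] },
];

for (const [title, count] of tallyAdditionalSections(named, pages)) {
  console.log(`additional section "${title}" used on ${count} page(s)`);
}
```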

Should we use front matter instead of meta.yaml?

(meta: taken from https://github.com/mdn/stumptown-content/wiki/Notes-from-Whistler-All-Hands,-June-2019#front-matter-versus-metayaml-files)

In the HTML element docs we have prose.md for the content and meta.yaml for metadata, which includes title and mdn_url but also things like browser_compatibility.

In the guide pages we also have some metadata: at least title and mdn_url. We seem to lean here towards keeping them in the Markdown file as front matter:

---
title: Introduction to HTML
mdn_url: https://developer.mozilla.org/en-US/docs/Learn/HTML/Introduction_to_HTML
---
At its heart, HTML is a fairly simple language...

This seems to be inconsistent. Should we use front matter for title and mdn_url even in reference docs? Should we keep all metadata in front matter, and abolish meta.yaml entirely?

We have had this conversation before, and decided to keep it in a separate file, for now. The argument for that being that lots of metadata at the start of a Markdown file makes it harder to read. But we should rethink this.

CODE_OF_CONDUCT.md file missing

As of January 1 2019, Mozilla requires that all GitHub projects include this CODE_OF_CONDUCT.md file in the project root. The file has two parts:

Required Text - All text under the headings Community Participation Guidelines and How to Report, are required, and should not be altered.
Optional Text - The Project Specific Etiquette heading provides a space to speak more specifically about ways people can work effectively and inclusively together. Some examples of those can be found on the Firefox Debugger project, and Common Voice. (The optional part is commented out in the raw template file, and will not be visible until you modify and uncomment that part.)

If you have any questions about this file, or Code of Conduct policies and procedures, please see Mozilla-GitHub-Standards or email [email protected].

(Message COC001)

Represent the height of an interactive example

With interactive examples you can have different heights of example, to support examples that need more code (like <colgroup>) or those that need less (like <i>).

We need a way for the author to represent this.

Acceptance criteria:

stumptown-content machinery can understand interactive example height indicator
content itself includes the appropriate value for height
the renderer interprets the height and creates an iframe of the appropriate size.

Support info boxes in HTML element recipes

The HTML element recipe looks like this:

related_content: /content/related_content/html.yaml
body:
- prose.short_description
- meta.interactive_example?
- prose.overview
- prose.attributes_text?
- meta.attributes
- prose.styling?
- prose.accessibility_concerns?
- prose.*
- meta.examples
- meta.info_box:
    - meta.api
    - meta.permitted_aria_roles
    - meta.tag_omission
- meta.browser_compatibility
- prose.see_also

So body is an array. All except one of its elements are strings, which map onto particular pieces of a page. If we try adopting this proposal, then it will be processed into an array of all the information the renderer needs to render that page piece (e.g. all the BCD, or all the example descriptions and sources). The renderer can then just walk through that array rendering each piece.

For these elements that are strings, there is some mandatory internal structure we expect to see in the content - for example, examples are expected to have a particular structure of sources and descriptions and titles. The recipe doesn't define that structure (but content linting should still enforce it).

But info_box is different. It's not a string, it's an object which consists of an array of strings, which in turn map onto values in the element's front matter. The output in the JSON still ought to be a thing called info_box (because it's rendered as a single unit) but in this case the recipe is defining the internal structure of the info box.

This is inconsistent, and that becomes very obvious when you try to write generic code to process recipes.
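
To make the problem concrete, here is a sketch of a generic parser for recipe body entries. It is only an illustration: the function name and the returned object shape are assumptions, not the actual stumptown code. String entries parse uniformly, while the nested info_box object needs a special case (here it is simply rejected).

```javascript
// Parse one recipe body entry such as "prose.overview" or
// "meta.interactive_example?" into its parts.
function parseIngredient(entry) {
  if (typeof entry !== "string") {
    // The nested info_box form lands here: it is an object, not a string.
    throw new TypeError("Recipe entries are expected to be strings");
  }
  const match = /^(prose|meta)\.(\*|[a-z_]+)(\?)?$/.exec(entry);
  if (!match) {
    throw new Error(`Unrecognized recipe entry: ${entry}`);
  }
  return {
    source: match[1], // "prose" or "meta"
    name: match[2], // section or metadata key, or "*" for any extra prose
    optional: match[3] === "?",
  };
}

console.log(parseIngredient("prose.short_description"));
console.log(parseIngredient("meta.interactive_example?"));
console.log(parseIngredient("prose.*"));
```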

I think we should stop doing this, and instead define the recipe like:

related_content: /content/related_content/html.yaml
body:
- prose.short_description
- meta.interactive_example?
- prose.overview
- prose.attributes_text?
- meta.attributes
- prose.styling?
- prose.accessibility_concerns?
- prose.*
- meta.examples
- meta.info_box
- meta.browser_compatibility
- prose.see_also

...and define the info_box internal content separately, as we do for other items:

title: '<article>: The Article Contents element'
mdn_url: https://developer.mozilla.org/docs/Web/HTML/Element/article
tags:
    group: Flow content
info_box:
    - api: HTMLElement
    - permitted_aria_roles:
        - application
        - document
    - tag_omission: none
interactive_example: https://interactive-examples.mdn.mozilla.net/pages/tabbed/article.html
browser_compatibility: html.elements.article
examples:
    - examples/simple-article-example
attributes:
    global: /content/html/global_attributes
recipe: html-element

I would as usual be happy to hear what @Elchi3 or @ddbeck think.

Acceptance criteria:

info box structures are defined and code written to process them into JSON
info box data is updated for all HTML element and input element pages

Make JSON builder able to build more than one item

From #14 (comment)

Ability to use the cli to, by default, render ALL elements.
Ability to render select few. E.g. node scripts/package/package.js video blockquote abbr to only build the HTML for those 3.

This will probably require us to fix a bunch of data that isn't currently formatted properly.
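
A minimal sketch of the argument handling, assuming the list of all element names is already known to the script. The function name is hypothetical, and the example passes an argv-style array explicitly instead of reading process.argv so that it can run anywhere.

```javascript
// Decide which elements to build from an argv-style array.
// The first two entries are the node binary and the script path.
function selectElements(allElements, argv) {
  const requested = argv.slice(2);
  if (requested.length === 0) {
    return allElements; // default: build everything
  }
  const unknown = requested.filter((name) => !allElements.includes(name));
  if (unknown.length > 0) {
    throw new Error(`Unknown element(s): ${unknown.join(", ")}`);
  }
  return requested;
}

const all = ["abbr", "address", "blockquote", "video"];
console.log(selectElements(all, ["node", "scripts/package/package.js"]));
console.log(
  selectElements(all, ["node", "scripts/package/package.js", "video", "blockquote", "abbr"])
);
```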

Enforce uniqueness of example titles

See mdn/yari#3 (comment). If we want to set IDs for example iframes, and derive them from the slugified title, then we should make sure example titles are unique.
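
A sketch of such a check follows. The slugify rule here is an assumption made for illustration; the check should use whatever slug function the renderer actually uses, since two titles only clash if their slugs are equal.

```javascript
// Assumed slug rule: lowercase, non-alphanumeric runs become "-",
// leading and trailing "-" are trimmed.
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Report every title whose slug was already produced by an earlier title.
function findDuplicateExampleTitles(titles) {
  const seen = new Map(); // slug -> first title that produced it
  const duplicates = [];
  for (const title of titles) {
    const slug = slugify(title);
    if (seen.has(slug)) {
      duplicates.push({ slug, titles: [seen.get(slug), title] });
    } else {
      seen.set(slug, title);
    }
  }
  return duplicates;
}

console.log(findDuplicateExampleTitles(["Simple example", "Simple example!", "Other"]));
```

Comparing slugs instead of raw titles catches titles that differ only in punctuation or case.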

Support index/landing pages

MDN has "landing pages". From the meta-docs:

A landing page serves as a menu, of sorts, for its subpages, and is therefore primarily a navigation page. A landing page layout is typically used for the root page of a tree of pages about a particular topic. It opens with a brief summary of the topic, then presents a structured list of links to its subpages, and optionally, additional material that may be useful to the reader... The list of subpages can be generated automatically...

So typically they are made of some prose content on some high-level topic like HTML, followed by a collection of links to the pages comprising the docs for this topic. Stumptown is going to need to support these things.

One basic decision is whether they are hand-maintained or (partially) generated automatically. Our actual landing pages are sometimes hand-maintained (e.g. Learn, JavaScript). If landing pages are hand-maintained, then they are just guide pages and there is nothing special to do here.

But it seems quite desirable to generate them, and in that case they would need some kind of special treatment in stumptown. It would be nice if we could reuse the related_content thing for this - there's a sense in which landing pages are just like sidebars except they get to occupy the whole page, and it would be good for consistency if the landing page and sidebars contain the same basic content, in the same organization.

Landing pages might have at least two things that sidebars don't:

an overview prose section
a short description of each linked page, so readers get more than just the link text to explain what the linked page is about.

I wonder if both of these things could be just extra properties of the related_content object. Then the renderer could omit them when it's rendering related_content as a sidebar, and include them when it is rendering related_content as a landing page.

One thing here is: if related_content contains short descriptions for pages, then it seems to make sense to use the short_description property for that. But guide pages as currently specified don't have short_description...

Acceptance criteria:

structures for all needed HTML landing pages are defined, written and checked into stumptown-content
code to process them into a form usable by stumptown-renderer is written and checked into stumptown-content

TravisCI

I see a .travis.yml file but nothing running.

Should spelling be breaking CI?

E.g. https://travis-ci.org/mdn/stumptown-content/builds/572764499

(if the logs disappear, here's roughly what it says)

> [email protected] spell-md /home/travis/build/mdn/stumptown-content
> mdspell -a -n -r -x --en-us 'content/**/!(*contributors).md'

    content/html/guides/Applying_color.md
       27 |  as the addition of under- or overlines, strike-through lines, and so
       39 | orations (such as underlines, strikethroughs, etc) use the `color` propert
       97 |     Lets you draw 2D bitmapped graphics in a [`<canvas>`](/e
      105 | he Web Graphics Library is an OpenGL ES-based API for drawing high
      125 | s a number between 0 and 255 (0x00 and 0xFF) or, optionally, as
      125 | r between 0 and 255 (0x00 and 0xFF) or, optionally, as a number
      125 | as a number between 0 and 15 (0x0 and 0xF). All components _mus
      125 | ber between 0 and 15 (0x0 and 0xF). All components _must_ be sp
      169 | ite). Image courtesy of user [SharkD](http://commons.wikimedia.org
      171 | ees (`deg`), radians (`rad`), gradians (`grad`), or turns (`turn`).
      178 | 2. Then select a grayscale paint that corresponds how br
      193 | t a color. Perhaps you have a customizable user interface, or you're imp
      215 | rk. For example, the website [ColorZilla](http://www.colorzilla.com/)
      226 | - [Paletton](http://paletton.com)
      237 | my.org/) in association with [Pixar](https://www.pixar.com/))
      239 | o express ideas. Presented by Pixar artists and designers.
      255 | - [Medline Plus: Color Blindness](https:
      256 | - [American Academy of Ophthamology: What Is Color Blindness?](ht
      257 | 010/02/color-blindness.html) (Usability.gov: United States Department of
      261 | there. We carefully avoid the mockups and the photos from movies. A
      261 | hoto taken by one of the Mars landers humanity has parked on the su
      265 | ur palette. We decide to use [Paletton](http://www.paletton.com/) to
      265 |  colors we need. Upon opening Paletton, we see:
      273 |  (currently "Monochromatic"). Paletton computes an appropriate accen
      299 | s a "don't print backgrounds" checkbox in a print dialog box), that

>> 25 spelling errors found in 274 files

I haven't been involved in the spell-md integration work but this seems very draconian. For example, what's wrong with the word "checkbox" given the ecosystem we're in?

One can configure branches of a matrix in .travis.yml to allow failures and they show up as warnings. But!! ...who clicks into a TravisCI build (to see the warnings) as soon as you get that sweet green checkbox icon on the pull request?

To me, a much more attractive option would be to only run mdspell if all the other CI stuff passes. And, if there are spell check problems, post that as an automated comment on the pull request. As a matter of fact, since you have to script that anyway, we could do something like this:

# Succeeds (exit status 0) if any .md file differs from origin/master.
didmarkdownchange() {
    git diff --name-only origin/master | grep -q '\.md$'
}

if didmarkdownchange; then
    echo "At least one .md file changed"
    # TODO make it only spell check *changed* files
    mdspell ...
else
    echo "No .md changes, no spellcheck run"
fi

Enable the renderer to omit section headings

(as will become apparent, I'm not sure if this issue should live in stumptown or mdn2)

We now have sections demarcated in the Markdown using H2 headings, and this gets reflected in the rendered HTML.

But there's another problem: in the MDN pages we don't always want to show a heading for every section. In particular "Short description" and "Overview" don't get a heading: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/video.

In the current system we don't have a way to express this.

I can think of a couple of options:

1. We keep stumptown the same, but mdn2 (the renderer) gets some config data telling it which headings to include. This is of a piece with this suggestion: #27 (comment) that we might want to separate the "recipe" into stuff that describes what documentation we need to supply for an item (which is stumptown's realm), and stuff that describes how we want to present that documentation in MDN (which is mdn2's realm).
2. We change (again!) the way we represent sections, to omit titles for sections that should not get headings.

It feels to me like (1) is the right thing to do, because this is a rendering issue really. And I'm concerned we will keep coming up with issues like this, and eventually we will need to address it properly. But I'm concerned about duplicating information in the recipe and then having to worry about them staying in sync.
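
For illustration, the renderer-side config data could be as small as a set of section names. This is a sketch only: the section object shape, the set name, and the function name are assumptions, not the mdn2 implementation.

```javascript
// Hypothetical renderer config: sections that are rendered without a heading.
const SECTIONS_WITHOUT_HEADINGS = new Set(["short_description", "overview"]);

// Render one section, emitting the H2 only when the config allows it.
// `section` is assumed to look like { name, title, content }.
function renderSection(section) {
  const heading = SECTIONS_WITHOUT_HEADINGS.has(section.name)
    ? ""
    : `<h2>${section.title}</h2>\n`;
  return heading + section.content;
}

console.log(renderSection({ name: "overview", title: "Overview", content: "<p>Intro.</p>" }));
console.log(renderSection({ name: "examples", title: "Examples", content: "<p>Demo.</p>" }));
```

With this approach the content keeps its section titles, and only the renderer decides whether to show them.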

@ddbeck , @Elchi3 , @peterbe , I'm interested in opinions about this, both about the overall approach and details of how we could do it.

Catch invalid "code syntax highlighting markers"

It's hard to link to a specific line in markdown files in GitHub: https://github.com/mdn/stumptown-content/blob/master/content/html/reference/elements/abbr/docs.md#accessibility-concerns
You have to click to view Raw to see what I'm talking about:

``` {.brush: .html}
<p>JavaScript Object Notation (<abbr>JSON</abbr>) is a lightweight data-interchange format.</p>
```

What it should be is:

```html
<p>JavaScript Object Notation (<abbr>JSON</abbr>) is a lightweight data-interchange format.</p>
```

I think this {.brush: .html} is a relic from the Kuma wiki raw content.

This'll happen from time to time and it barfs up the renderer's ability to syntax highlight. The renderer will need to decide what to do if the string there isn't html or css or wasm or something else it can expect.

I think the best course of action for the renderer is to swallow all such troublemakers, make a console warning, and leave it be. Ideally, in the linters here on the content side, it should cleanly throw an error.
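
A content-side lint could look roughly like the sketch below. The allowlist of language names and the function name are assumptions for this example; the real list would have to match what the renderer's highlighter supports.

```javascript
// Hypothetical allowlist of fence languages the renderer can highlight.
const KNOWN_LANGUAGES = new Set(["html", "css", "js", "json", "wasm"]);

// Built with repeat() so this example contains no literal fence line.
const FENCE = "`".repeat(3);
const FENCE_LINE = new RegExp("^" + FENCE + "(.*)$");

// Return the opening fences whose info string is not in the allowlist.
// Fences with no info string at all are treated as plain text and allowed.
function findInvalidFenceMarkers(markdown) {
  const problems = [];
  let insideFence = false;
  markdown.split("\n").forEach((line, index) => {
    const match = FENCE_LINE.exec(line);
    if (!match) return;
    if (insideFence) {
      insideFence = false; // this line closes the current block
      return;
    }
    insideFence = true;
    const info = match[1].trim();
    if (info !== "" && !KNOWN_LANGUAGES.has(info)) {
      problems.push({ line: index + 1, info });
    }
  });
  return problems;
}

const sample = [
  FENCE + " {.brush: .html}",
  "<p><abbr>JSON</abbr></p>",
  FENCE,
  "",
  FENCE + "html",
  "<p><abbr>JSON</abbr></p>",
  FENCE,
].join("\n");

console.log(findInvalidFenceMarkers(sample));
```

A linter would fail the build when the returned array is not empty, while the renderer could use the same allowlist to fall back to unhighlighted output with a console warning.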

Ditch mdn_url for slug + locale?

Here's what the related_content looks like:

            {
              "title": "<dialog>: The Dialog element",
              "mdn_url": "https://developer.mozilla.org/docs/Web/HTML/Element/dialog"
            },
            {
              "title": "<div>: The Content Division element",
              "mdn_url": "https://developer.mozilla.org/docs/Web/HTML/Element/div"
            },
            {
              "title": "<dl>: The Description List element",
              "mdn_url": "https://developer.mozilla.org/docs/Web/HTML/Element/dl"
            },
            {
              "title": "<dt>: The Description Term element",
              "mdn_url": "https://developer.mozilla.org/docs/Web/HTML/Element/dt"
            },
            {

...for example.

The domain name definitely feels redundant so that's probably a slam dunk.
But this /docs/ thing is just a Wiki left-over that I'm not sure anybody knows anymore why we have it (but let's not worry about removing it just yet!).
What really matters is the Web/HTML/Element/dialog string and when we get there it might be smart to also keep the locale (e.g. en-US).

On the renderer, we have a hacky function called fixRelatedContentURIs which, currently, just strips the mdn_url down to a key called uri which is just the path part.
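
As a sketch of the slug + locale idea, the split could be derived from an existing mdn_url like this. The function name, the locale pattern, and the en-US default for URLs without a locale are all assumptions made for this example.

```javascript
// Split an mdn_url into a locale and a slug.
// Assumption: URLs look like /docs/<slug> or /<locale>/docs/<slug>.
function splitMdnUrl(mdnUrl, defaultLocale = "en-US") {
  const { pathname } = new URL(mdnUrl);
  const match = /^\/(?:([A-Za-z]{2,3}(?:-[A-Za-z]{2,4})?)\/)?docs\/(.+)$/.exec(pathname);
  if (!match) {
    throw new Error(`Not a recognizable MDN document URL: ${mdnUrl}`);
  }
  return { locale: match[1] || defaultLocale, slug: match[2] };
}

console.log(splitMdnUrl("https://developer.mozilla.org/docs/Web/HTML/Element/dialog"));
console.log(splitMdnUrl("https://developer.mozilla.org/en-US/docs/Web/HTML/Element/video"));
```

Storing the two parts separately would let the renderer build its own URLs, which is what the path-stripping in fixRelatedContentURIs currently works around.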

Variations on `<code>`