jupyter / jupyter_markdown

Documentation and tests related to Jupyter's Markdown syntax

License: BSD 3-Clause "New" or "Revised" License

Jupyter Notebook 97.83% Python 2.17%

jupyter_markdown's Issues

Organise created notebooks by markdown feature, not absolute number

Right now the test notebooks are created by splitting the json at intervals of 100 tests.

This creates notebooks that are larger than is ideal and also loses the opportunity to group together important information about the types of tests contained in the notebooks.

Inside the json spec, it should be possible to distinguish which features are being tested by which tests. We should create notebooks on the basis of those distinctions.

So, for example, you would want to group all of the tests related to lists in a single notebook.

If one feature ever has more than 100 tests, then it may make sense to split it into multiple notebooks (ideally, on the basis of some systematic feature of the tests). However, I do not think that is the case today and so we shouldn't need to worry about that.
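A minimal sketch of the grouping step, assuming every entry in the spec JSON carries a "section" key like the spec.json entries in this repo do. The function name `group_by_section`, the fallback section name, and the "(part N)" naming for oversized sections are hypothetical.

```python
from collections import defaultdict


def group_by_section(tests, max_per_notebook=100):
    """Group spec entries by their "section" field, one notebook per group.

    A section holding more than `max_per_notebook` tests is split into
    numbered parts (a hypothetical naming scheme, not the repo's).
    """
    sections = defaultdict(list)
    for test in tests:
        sections[test.get("section", "Uncategorised")].append(test)

    notebooks = {}
    for name, entries in sections.items():
        if len(entries) <= max_per_notebook:
            notebooks[name] = entries
            continue
        # Oversized section: cut it into consecutive chunks.
        for start in range(0, len(entries), max_per_notebook):
            part = start // max_per_notebook + 1
            chunk = entries[start:start + max_per_notebook]
            notebooks[f"{name} (part {part})"] = chunk
    return notebooks


sample = [
    {"example": 1, "section": "Tabs"},
    {"example": 2, "section": "Lists"},
    {"example": 3, "section": "Lists"},
]
grouped = group_by_section(sample)
```

Here `grouped` maps each section name to the tests that would go into that section's notebook, so all list tests end up together.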

Action Plan

Add test for nbconvert's html rendering with mistune

Currently the script writes the tests out as a notebook and stops there, leaving the result to be explored by hand & eye.

We also have nbconvert which has an html exporter that uses mistune to convert the markdown to html.

Mistune has different behaviour from marked so it should be checked separately.
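One way such a check could look, as a sketch: the comparison takes any callable that turns markdown into HTML, so a thin wrapper around nbconvert's mistune-based rendering could be passed in. That wrapper is not shown and is an assumption; `failing_examples` and `fake_render` are hypothetical names, and `fake_render` is only a stand-in, not mistune.

```python
def failing_examples(spec, render):
    """Return the spec entries whose rendered HTML differs from the expected HTML.

    `render` is any callable mapping a markdown string to an HTML string.
    """
    failures = []
    for entry in spec:
        actual = render(entry["markdown"])
        if actual != entry["html"]:
            failures.append({
                "example": entry["example"],
                "expected": entry["html"],
                "actual": actual,
            })
    return failures


def fake_render(markdown):
    """Stand-in renderer for demonstration only: wraps the text in <p>."""
    return "<p>" + markdown.strip() + "</p>\n"


spec = [
    {"example": 1, "markdown": "hello\n", "html": "<p>hello</p>\n"},
    {"example": 2, "markdown": "*hi*\n", "html": "<p><em>hi</em></p>\n"},
]
failures = failing_examples(spec, fake_render)
```

Because the renderer is a parameter, the same function could be reused for marked output once that can be captured programmatically.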

Use the commonmark spec.json

The current spec.json file is not for commonmark as can be seen in https://github.com/jupyter/jupyter_markdown/blob/master/spec.json#L1694-L1704:

  {
    "markdown": "| foo | bar |\n| --- | --- |\n| baz | bim |\n",
    "html": "<table>\n<thead>\n<tr>\n<th>foo</th>\n<th>bar</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>baz</td>\n<td>bim</td>\n</tr></tbody></table>\n",
    "example": 189,
    "start_line": 3197,
    "end_line": 3214,
    "section": "Tables (extension)",
    "extensions": [
      "table"
    ]
  },

In the commonmark spec.json there are no extensions included which makes me think that the current example is derived from github flavoured markdown, not commonmark.

This should be easily avoided by completing #4, but if someone wanted to pursue this by instead hardcoding the commonmark 0.27 spec.json in this repo, this is technically a separate issue.
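A quick way to check whether a given spec.json contains non-commonmark entries is to look for the "extensions" key shown in the excerpt above. This is a sketch; the function name is hypothetical.

```python
def split_by_extension(spec):
    """Separate entries that declare extensions from those that do not."""
    core = [entry for entry in spec if not entry.get("extensions")]
    extended = [entry for entry in spec if entry.get("extensions")]
    return core, extended


spec = [
    {"example": 1, "section": "Tabs"},
    {"example": 189, "section": "Tables (extension)", "extensions": ["table"]},
]
core, extended = split_by_extension(spec)
```

A spec file for which `extended` is non-empty is presumably derived from an extended flavour rather than plain commonmark.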

Systematic in-browser comparison between rendered markdown & expected literal html

Just wrote some JS that can be put into a notebook that can compare the rendered markdown with the target HTML.

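%%javascript
// Collect the rendered HTML of every markdown cell as a plain array.
var items = $("div.text_cell_render.rendered_html").map(function() {
    return this.innerHTML;
}).get();
// Cells come in pairs: rendered markdown at i, expected HTML at i + 1.
// Stop before a trailing unpaired cell instead of comparing it to undefined.
for (var i = 0; i + 1 < items.length; i += 2) {
    if (items[i] !== items[i + 1]) {
        console.log("Doesn't match: ", i, items[i]);
    }
}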

This compares each even-numbered cell with the odd-numbered cell that follows it. For the notebooks that have headers, the loop variables will need adjusting.

Add paired tags to the markdown cell and the canonical rendering

Currently the test cells and the code cells with canonical output have no way of being automatically matched.

Adding tags to the metadata of the cell objects would more easily allow for programmatic exploration of the notebook results.
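A sketch of what paired tagging could look like, using the `tags` list in cell metadata. The tag names ("example-N", "test", "expected"), the function names, and the choice to store the canonical HTML as a `display_data` output are all assumptions for illustration, not the repo's current layout.

```python
def make_pair(entry):
    """Build a tagged markdown test cell and its tagged canonical-output cell."""
    tag = f"example-{entry['example']}"
    test_cell = {
        "cell_type": "markdown",
        "metadata": {"tags": [tag, "test"]},
        "source": entry["markdown"],
    }
    expected_cell = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": [tag, "expected"]},
        "source": "",
        "outputs": [{
            "output_type": "display_data",
            "data": {"text/html": entry["html"]},
            "metadata": {},
        }],
    }
    return test_cell, expected_cell


def pair_by_tag(cells):
    """Match test and expected cells that share an "example-N" tag."""
    pairs = {}
    for cell in cells:
        tags = cell["metadata"].get("tags", [])
        if "test" in tags:
            role = "test"
        elif "expected" in tags:
            role = "expected"
        else:
            continue  # untagged cells (e.g. headers) are ignored
        for tag in tags:
            if tag.startswith("example-"):
                pairs.setdefault(tag, {})[role] = cell
    return pairs


entry = {"example": 189, "markdown": "| foo |\n", "html": "<table></table>\n"}
cells = list(make_pair(entry))
pairs = pair_by_tag(cells)
```

With tags in place, header cells or other unpaired cells no longer disturb the matching, which positional pairing cannot guarantee.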

Automatically obtain the notebook html as it would be rendered in browser with marked

This may require the use of a browser emulating library (such as selenium) or actually creating a Chrome instance that will save the resulting file to disk.

This is going to be hard and I'm not sure how to implement it off the top of my head, but it would be great if we can figure it out.

Related to #7

Create all test notebooks and store them in the repo

In line with the issue at the heart of #4, I think we should include the canonical notebooks for the most recent version of the spec inside this repo. That way we can consistently have a collection of source notebooks that act as the canonical examples of these tests.

Secondary motivation:
We will eventually want to supplement these with any officially endorsed modifications/extensions. But that will be a problem for a later date. Having the mechanisms in place now for using the canonical notebook set will make it easier for us to incorporate those extensions eventually.

Test against GitHub Flavored Markdown (GFM) spec

GitHub also has a spec for its markdown rendering:
https://github.github.com/gfm/

More details: https://githubengineering.com/a-formal-spec-for-github-markdown/

They do not currently support a web endpoint we can use to grab versions of the spec.json.

So we would need to address that directly.

One solution would require creating it locally and storing it with every released version as part of this library (undesirable).

Alternatively, we could do a bit of GitHub site scraping, pull down spec_tests.py, cmark.py, & spec.txt, and run them with a subprocess call. That is slightly better, but then we won't be able to get the latest released version, or know about our behaviour on multiple versions (which is the ideal).

Improve README; Depth & Breadth

The readme should cover in greater detail what this repo is for.

The readme should also have a guide for how to run the tests and what to expect as a result of the output.

We should come up with a roadmap for all that we want to do in this repo, possibly include it in the README with GFM checkboxes.

Question about spec adherence & our general policy.

Are we going to follow the spec when it comes to raw block html?

For example, suppose that the following are markdown cells:

<div>[a link in a block](https://google.com)</div>

<span>[a link in a span](https://google.com)</span>

<div>[a link in a block](https://google.com)</div>
<div>[a link in a block](https://google.com)</div>

<div>[a link in a block](https://google.com)</div>  
<span>[a link in a span](https://google.com)</span>

<div>[a link in a block](https://google.com)</div>
<div>[a link in a block](https://google.com)</div>

<div>[a link in a block](https://google.com)</div>
<span>[a link in a span](https://google.com)</span>

<div>[a link in a block](https://google.com)</div>

<div>[a link in a block](https://google.com)</div>

<div>[a link in a block](https://google.com)</div>

<span>[a link in a span](https://google.com)</span>

<div>[a link in a block](https://google.com)</div>

<div>[a link in a block](https://google.com)</div>

If you look at the resultant html from marked you get something that looks like

<div><a href="https://google.com" target="_blank">a link in a block</a></div>

<p><span><a href="https://google.com" target="_blank">a link in a span</a></span></p>
<div><a href="https://google.com" target="_blank">a link in a block</a></div>

<div><a href="https://google.com" target="_blank">a link in a block</a></div>

<div><a href="https://google.com" target="_blank">a link in a block</a></div>
<span><a href="https://google.com" target="_blank">a link in a span</a></span>

<div><a href="https://google.com" target="_blank">a link in a block</a></div>

<div><a href="https://google.com" target="_blank">a link in a block</a></div>

<div><a href="https://google.com" target="_blank">a link in a block</a></div><br><span><a href="https://google.com" target="_blank">a link in a span</a></span>

<div><a href="https://google.com" target="_blank">a link in a block</a></div>
<div><a href="https://google.com" target="_blank">a link in a block</a></div>

<div><a href="https://google.com" target="_blank">a link in a block</a></div>

<p><span><a href="https://google.com" target="_blank">a link in a span</a></span></p>
<div><a href="https://google.com" target="_blank">a link in a block</a></div>
<div><a href="https://google.com" target="_blank">a link in a block</a></div>

Currently mistune interprets none of these links. It's not too hard to modify the MarkdownWithMath class to include the default options needed to convert the first example paragraph correctly on html export. However, the other cases still do not behave correctly. Marked is pretty unique in its tendency to parse all of that markdown inside inline html as valid.

This is not the first or the only time that this will come up.

I've figured out at least one way to "fix" this behaviour in mistune (relating to whether block html is presumed to be followed by an empty line), but such a "fix" introduces errors elsewhere.

See babelmark2 on these cases for a comparison of a bunch of examples (though not mistune).

NB: I'm going to crosspost this to nbconvert (except for this paragraph) because I want more people to have a say in this conversation. I wish this repo were not private, since I'd want to start the conversation about our general markdown policy here rather than in either nbconvert or the notebook. But in the meantime, it seems to be the biggest issue for nbconvert.

Test against all available versions of the spec (including newly released versions)

We will want to test against multiple versions of the spec.json.

This will be important if we are to detect how our tools are evolving in comparison with the spec.

Currently the only two on commonmark's site that have spec.json endpoints are 0.27 and 0.28.

This is going to be a little trickier since it requires a few separate steps, so while I'm going to tag this as sprint friendly, I would not recommend that someone tries to do this without direct guidance from a maintainer.
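As a rough illustration of what a multi-version run could report, the sketch below computes a pass count per spec version. It assumes the spec for each version has already been loaded into a list of entries and that `render` is some markdown-to-HTML callable; `summarise_versions` and the stub renderer are hypothetical.

```python
def summarise_versions(specs_by_version, render):
    """Return {version: {"passed": n, "total": m}} for each loaded spec."""
    summary = {}
    for version, spec in specs_by_version.items():
        passed = sum(
            1 for entry in spec if render(entry["markdown"]) == entry["html"]
        )
        summary[version] = {"passed": passed, "total": len(spec)}
    return summary


def stub_render(markdown):
    """Stand-in renderer for demonstration only."""
    return "<p>" + markdown.strip() + "</p>\n"


specs_by_version = {
    "0.27": [{"markdown": "a\n", "html": "<p>a</p>\n"}],
    "0.28": [
        {"markdown": "a\n", "html": "<p>a</p>\n"},
        {"markdown": "*b*\n", "html": "<p><em>b</em></p>\n"},
    ],
}
summary = summarise_versions(specs_by_version, stub_render)
```

Tracking these counts over time is one way to see whether a renderer is drifting towards or away from the spec as new versions appear.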

Parse output html from the notebook as an in-memory object

Per #6 it will be possible to get a version of the html that is available "in-memory" from nbconvert's HTML exporter.

One simple way to do this in the method currently used is to write it to the file system and then read it back from the file system.

Then we can parse this object, using tools like BeautifulSoup, html5lib, or the standard library's html.parser module.
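A small sketch of the parsing step using only the standard library's html.parser. It collects the text inside every element carrying a given class; the class name "rendered_html" used in the demo is an assumption about the exported markup, and `ClassTextCollector` is a hypothetical helper.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Collect the text inside every element carrying `target_class`."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0    # nesting depth inside the current match, 0 = outside
        self.chunks = []  # one string per matched element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.chunks.append("")

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks[-1] += data


html = (
    '<div class="cell"><div class="rendered_html"><p>a <em>b</em></p></div></div>'
    '<div class="rendered_html"><p>c<br>d</p></div>'
)
collector = ClassTextCollector("rendered_html")
collector.feed(html)
collector.close()
```

After feeding the document, `collector.chunks` holds one string per matched element. A fuller comparison against the spec would need the inner HTML rather than just the text, which BeautifulSoup or html5lib make easier.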

This will be harder to achieve with the marked version of the html rendering; I'll open a separate issue for that.

Pull down spec.json from commonmark website

Per commonmark/commonmark-spec#482 you can now get the spec.json from the website rather than needing to treat it as a static file.

The script should be modified to use a web query (e.g., using requests.get()) on the fly to the site http://spec.commonmark.org/0.27/spec.json.
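A minimal sketch of the download step. It swaps in the standard library's urllib.request for requests.get() so that it has no third-party dependency; the function names and the injectable `opener` parameter are hypothetical, and the URL pattern is generalised from the 0.27 address above.

```python
import json
from urllib.request import urlopen

SPEC_URL = "http://spec.commonmark.org/{version}/spec.json"


def spec_url(version):
    """Build the spec.json URL for a given spec version string."""
    return SPEC_URL.format(version=version)


def fetch_spec(version, opener=urlopen, timeout=30):
    """Download and decode spec.json for `version`.

    `opener` is injectable so the function can be exercised without
    network access; by default it is urllib.request.urlopen.
    """
    with opener(spec_url(version), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_spec("0.27")` would perform the web query on the fly and return the list of test entries, assuming the site is reachable.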
