Documentation and tests related to Jupyter's Markdown syntax
jupyter / jupyter_markdown
License: BSD 3-Clause "New" or "Revised" License
Right now the test notebooks are created by splitting the JSON into chunks of 100 tests.
This creates notebooks that are larger than ideal, and it also loses the opportunity to group together important information about the kinds of tests each notebook contains.
Within the spec.json, it should be possible to tell which features are exercised by which tests (for instance, via each entry's section field). We should create notebooks on the basis of those distinctions.
For example, all of the tests related to lists should be grouped in a single notebook.
If one feature ever has more than 100 tests, it may make sense to split it into multiple notebooks (ideally on the basis of some systematic feature of the tests). However, I do not think that is the case today, so we shouldn't need to worry about it.
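A minimal sketch of grouping by feature, assuming each spec.json entry carries a `section` field (as the excerpt of example 189 further down does) and that notebooks are written as plain nbformat v4 JSON. The two-cells-per-example layout (Markdown source, then expected HTML), the helper names, and the file-naming scheme are illustrative assumptions, not the current script's behaviour.

```python
import json
import os
import re
from collections import OrderedDict


def group_by_section(tests):
    """Group spec.json entries by their "section" field, preserving spec order."""
    groups = OrderedDict()
    for test in tests:
        groups.setdefault(test["section"], []).append(test)
    return groups


def build_notebook(tests):
    """Return a minimal nbformat v4 notebook (a plain dict) for a group of tests.

    Assumed layout: each example becomes two Markdown cells, its Markdown
    source followed by its expected HTML.
    """
    cells = [
        {"cell_type": "markdown", "metadata": {}, "source": source}
        for test in tests
        for source in (test["markdown"], test["html"])
    ]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 2}


def notebooks_by_section(tests, max_tests=100):
    """Yield (filename, notebook) pairs, one per section; a section is split
    into parts of at most max_tests only if it is larger than that."""
    for section, group in group_by_section(tests).items():
        slug = re.sub(r"[^a-z0-9]+", "_", section.lower()).strip("_")
        for start in range(0, len(group), max_tests):
            suffix = "" if len(group) <= max_tests else f"_part{start // max_tests + 1}"
            yield f"{slug}{suffix}.ipynb", build_notebook(group[start:start + max_tests])


def write_notebooks(tests, out_dir="."):
    """Write one .ipynb file per section (or section part) into out_dir."""
    for name, nb in notebooks_by_section(tests):
        with open(os.path.join(out_dir, name), "w", encoding="utf-8") as f:
            json.dump(nb, f, indent=1)
```

Keeping sections in spec order means the notebooks follow the structure of the spec itself; the `max_tests` split only kicks in for an unusually large section.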
Currently the script writes out the notebook and stops there, leaving the results to be explored by hand and eye.
We also have nbconvert, which has an HTML exporter that uses mistune to convert the Markdown to HTML.
Mistune behaves differently from marked (the renderer the notebook front end uses), so it should be checked separately.
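One way to check mistune separately is a small runner that takes any Markdown-to-HTML callable, so the same loop can be pointed at different renderers. This is a sketch: `run_spec` and `normalize_html` are hypothetical helpers, and the whitespace normalisation here is much cruder than a real HTML normaliser.

```python
import re


def normalize_html(html):
    """Crude normalisation: drop whitespace between tags and trim the ends."""
    return re.sub(r">\s+<", "><", html).strip()


def run_spec(tests, render):
    """Render each spec example with `render` (any Markdown -> HTML callable)
    and return the examples whose output differs from the expected HTML."""
    failures = []
    for test in tests:
        actual = render(test["markdown"])
        if normalize_html(actual) != normalize_html(test["html"]):
            failures.append({
                "example": test["example"],
                "section": test["section"],
                "expected": test["html"],
                "actual": actual,
            })
    return failures
```

With mistune installed, something like `run_spec(tests, mistune.markdown)` would give a first approximation. nbconvert configures mistune through its own renderer class, so its export output can still differ from plain mistune.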
The current spec.json file is not the CommonMark one, as can be seen at https://github.com/jupyter/jupyter_markdown/blob/master/spec.json#L1694-L1704:
```json
{
  "markdown": "| foo | bar |\n| --- | --- |\n| baz | bim |\n",
  "html": "<table>\n<thead>\n<tr>\n<th>foo</th>\n<th>bar</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>baz</td>\n<td>bim</td>\n</tr></tbody></table>\n",
  "example": 189,
  "start_line": 3197,
  "end_line": 3214,
  "section": "Tables (extension)",
  "extensions": [
    "table"
  ]
}
```
The CommonMark spec.json includes no extensions, which makes me think that the current file is derived from GitHub Flavored Markdown rather than CommonMark.
This should be easy to avoid once #4 is completed, but if someone wanted to pursue this instead by hardcoding the CommonMark 0.27 spec.json in this repo, that is technically a separate issue.
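A quick way to see whether a given spec.json is plain CommonMark is to split out the entries that carry an `extensions` list, as example 189 above does. A sketch using only the standard library; the helper names are hypothetical.

```python
import json


def load_tests(path):
    """Load a spec.json file: a list of test dicts."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def split_extensions(tests):
    """Split tests into core ones and ones tagged with a non-empty "extensions" list."""
    core = [t for t in tests if not t.get("extensions")]
    extended = [t for t in tests if t.get("extensions")]
    return core, extended


def extension_names(tests):
    """Sorted names of every extension referenced anywhere in the file."""
    return sorted({name for t in tests for name in t.get("extensions", [])})
```

For example, `core, extended = split_extensions(load_tests("spec.json"))`; a non-empty `extended` list is a strong hint that the file is not the CommonMark spec.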
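I just wrote some JS that can be put into a notebook to compare the rendered Markdown with the target HTML.
```javascript
%%javascript
// Collect the innerHTML of every rendered Markdown cell, in notebook order.
var items = $("div.text_cell_render.rendered_html").map(function() {
    return this.innerHTML;
}).get();  // .get() turns the jQuery collection into a plain array
var n = items.length;
// Cells are assumed to alternate: test Markdown (even index), target HTML (odd index).
// Stopping at i + 1 < n means an unpaired last cell is never compared with undefined.
for (var i = 0; i + 1 < n; i += 2) {
    if (items[i] !== items[i + 1]) {
        console.log("Doesn't match: ", i, items[i]);
    }
}
```
This compares each even-numbered cell with the odd-numbered cell that follows it. For notebooks that have header cells, the loop variables will need adjusting.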
Currently the test cells and the code cells with canonical output cannot be matched automatically.
Adding tags to the cells' metadata would make it easier to explore the notebook results programmatically.
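A sketch of the tagging idea, assuming cells are written as nbformat v4 dicts, whose cell metadata can hold a `tags` list. The `example-<n>` and `test`/`expected` tag names are a hypothetical convention. Pairing then depends only on metadata rather than on cell positions, so untagged cells such as headers are simply skipped.

```python
def tagged_cell(source, example, role):
    """A Markdown cell (nbformat v4 dict) tagged with its spec example number and role.

    The tag scheme ("example-<n>", "test" / "expected") is hypothetical.
    """
    return {
        "cell_type": "markdown",
        "metadata": {"tags": [f"example-{example}", role]},
        "source": source,
    }


def pair_cells(notebook):
    """Group cells by their example tag: {"example-<n>": {"test": cell, "expected": cell}}."""
    pairs = {}
    for cell in notebook["cells"]:
        tags = cell.get("metadata", {}).get("tags", [])
        example = next((t for t in tags if t.startswith("example-")), None)
        role = next((t for t in tags if t in ("test", "expected")), None)
        if example and role:  # untagged cells (e.g. headers) are ignored
            pairs.setdefault(example, {})[role] = cell
    return pairs
```

Because `pair_cells` only reads metadata, it would work equally well if the expected output lived in code cells instead of Markdown cells.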
This may require a browser automation library (such as Selenium) or actually launching a Chrome instance that saves the resulting file to disk.
This is going to be hard, and I'm not sure off the top of my head how to implement it, but it would be great if we could figure it out.
Related to #7
In line with the issue at the heart of #4, I think we should include the canonical notebooks for the most recent version of the spec inside this repo. That way we can consistently have a collection of source notebooks that act as the canonical examples of these tests.
Secondary motivation:
We will eventually want to supplement these with any officially endorsed modifications/extensions. But that will be a problem for a later date. Having the mechanisms in place now for using the canonical notebook set will make it easier for us to incorporate those extensions eventually.
GitHub also has a spec for its Markdown rendering:
https://github.github.com/gfm/
More details: https://githubengineering.com/a-formal-spec-for-github-markdown/
They do not currently provide a web endpoint we can use to grab versions of the spec.json, so we would need to handle that directly.
One solution would be to generate it locally and store it with every released version as part of this library (undesirable).
Alternatively, we could do a bit of GitHub site scraping to pull down spec_tests.py, cmark.py, and spec.txt, and run them with a subprocess call. That is slightly better, but then we won't be able to get the latest released version (or know how we behave across multiple versions, which is ideal).
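A sketch of the subprocess approach, assuming the downloaded spec_tests.py supports the `--dump-tests` and `--spec` options that the CommonMark test runner provides; check the copy you actually fetch. The script may import helper modules (such as cmark.py) from its own directory, so the downloaded files should sit together. `dump_spec_tests` is a hypothetical helper name.

```python
import json
import subprocess
import sys


def dump_spec_tests(spec_tests_py, spec_txt):
    """Run a downloaded spec_tests.py in a subprocess and return its tests as a list of dicts.

    Assumes the script accepts --dump-tests and --spec and prints JSON to stdout;
    raises CalledProcessError if the script exits with an error.
    """
    result = subprocess.run(
        [sys.executable, spec_tests_py, "--dump-tests", "--spec", spec_txt],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

Running the script with the current interpreter (`sys.executable`) avoids depending on whichever `python` happens to be first on the PATH.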
The README should explain in more detail what this repo is for.
The README should also include a guide to running the tests and what to expect from their output.
We should come up with a roadmap for everything we want to do in this repo, possibly including it in the README as GFM checkboxes.
Are we going to follow the spec when it comes to raw block HTML?
For example, suppose that the following are Markdown cells:
<div>[a link in a block](https://google.com)</div>
<span>[a link in a span](https://google.com)</span>
<div>[a link in a block](https://google.com)</div>
<div>[a link in a block](https://google.com)</div>
<div>[a link in a block](https://google.com)</div>
<span>[a link in a span](https://google.com)</span>
<div>[a link in a block](https://google.com)</div>
<div>[a link in a block](https://google.com)</div>
<div>[a link in a block](https://google.com)</div>
<span>[a link in a span](https://google.com)</span>
<div>[a link in a block](https://google.com)</div>
<div>[a link in a block](https://google.com)</div>
<div>[a link in a block](https://google.com)</div>
<span>[a link in a span](https://google.com)</span>
<div>[a link in a block](https://google.com)</div>
<div>[a link in a block](https://google.com)</div>
If you look at the resulting HTML from marked, you get something like this:
<div><a href="https://google.com" target="_blank">a link in a block</a></div>
<p><span><a href="https://google.com" target="_blank">a link in a span</a></span></p>
<div><a href="https://google.com" target="_blank">a link in a block</a></div>
<div><a href="https://google.com" target="_blank">a link in a block</a></div>
<div><a href="https://google.com" target="_blank">a link in a block</a></div>
<span><a href="https://google.com" target="_blank">a link in a span</a></span>
<div><a href="https://google.com" target="_blank">a link in a block</a></div>
<div><a href="https://google.com" target="_blank">a link in a block</a></div>
<div><a href="https://google.com" target="_blank">a link in a block</a></div><br><span><a href="https://google.com" target="_blank">a link in a span</a></span>
<div><a href="https://google.com" target="_blank">a link in a block</a></div>
<div><a href="https://google.com" target="_blank">a link in a block</a></div>
<div><a href="https://google.com" target="_blank">a link in a block</a></div>
<p><span><a href="https://google.com" target="_blank">a link in a span</a></span></p>
<div><a href="https://google.com" target="_blank">a link in a block</a></div>
<div><a href="https://google.com" target="_blank">a link in a block</a></div>
Currently mistune interprets none of these links. It's not too hard to modify the MarkdownWithMath class to include the default options needed for the first example paragraph to be converted correctly on HTML export. However, the other cases still do not behave correctly. Marked is fairly unusual in parsing all of that, inline HTML included, as valid.
This is not the first or the only time that this will come up.
I've figured out at least one way to "fix" this behaviour in mistune (relating to whether block html is presumed to be followed by an empty line), but such a "fix" introduces errors elsewhere.
See babelmark2 for a comparison of how a range of implementations handle these cases (though mistune is not among them).
NB: I'm going to crosspost this to nbconvert (except for this paragraph) because I want more people to have a say in this conversation. I wish this repo were not private, since I'd rather start the conversation about our general Markdown policy here than in either nbconvert or the notebook. In the meantime, though, this seems to be the biggest issue for nbconvert.
We will want to test against multiple versions of the spec.json.
This will be important if we are to detect how our tools are evolving relative to the spec.
Currently, the only versions on CommonMark's site with spec.json endpoints are 0.27 and 0.28.
This is going to be a little trickier since it requires a few separate steps, so while I'm going to tag this as sprint friendly, I would not recommend that anyone try it without direct guidance from a maintainer.
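Once several versions' spec.json files are loaded (each a list of dicts with `markdown` and `html` keys, as in the excerpt above), comparing them is a first step toward tracking how the spec moves. A minimal sketch with a hypothetical helper name; keying on the Markdown input rather than the example number is an assumption, since example numbers can shift when tests are added, and duplicate inputs within one version collapse to a single entry.

```python
def compare_spec_versions(old_tests, new_tests):
    """Compare two spec.json test lists keyed on their Markdown input.

    Returns the inputs added in the new version, the inputs removed from it,
    and the inputs whose expected HTML changed, each as a sorted list.
    """
    old = {t["markdown"]: t["html"] for t in old_tests}
    new = {t["markdown"]: t["html"] for t in new_tests}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(md for md in old.keys() & new.keys() if old[md] != new[md]),
    }
```

Running our renderers only on the `added` and `changed` inputs would show which spec changes our tools have not yet caught up with.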
Per #6, it will be possible to get an "in-memory" version of the HTML from nbconvert's HTML exporter.
One simple way to do this with the method currently used is to write the HTML to the file system and then read it back.
Then we can parse it using tools like BeautifulSoup, html5lib, or the standard library's html.parser module.
This will be harder to achieve with the marked rendering of the HTML; I'll open a separate issue for that.
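A sketch with the standard library's html.parser: it collects a normalised token list for each rendered Markdown cell, so the result can be compared with the spec's expected HTML tokenised the same way. The container class `text_cell_render` is an assumption borrowed from the live-notebook markup; nbconvert's templates may use different class names. If the exporter output is already a string (nbconvert's `HTMLExporter().from_notebook_node(nb)` returns a `(body, resources)` tuple), it can be passed in directly without the file-system round trip. All names below are hypothetical.

```python
from html.parser import HTMLParser


class RenderedCellParser(HTMLParser):
    """Collect a normalised token list for each <div> whose class list contains target_class."""

    def __init__(self, target_class="text_cell_render"):
        super().__init__(convert_charrefs=True)
        self.target_class = target_class
        self.cells = []   # one token list per matching <div>
        self._depth = 0   # <div> nesting depth inside the current match; 0 = outside

    def handle_starttag(self, tag, attrs):
        if self._depth:
            if tag == "div":
                self._depth += 1
            self.cells[-1].append(("start", tag, tuple(sorted(attrs))))
        elif tag == "div" and self.target_class in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self.cells.append([])

    def handle_startendtag(self, tag, attrs):
        # Treat <br/> like <br> so both spellings produce the same tokens.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if not self._depth:
            return
        if tag == "div":
            self._depth -= 1
            if self._depth == 0:
                return  # closing tag of the matched container itself
        self.cells[-1].append(("end", tag))

    def handle_data(self, data):
        if self._depth and data.strip():
            self.cells[-1].append(("data", data.strip()))


def extract_cells(page_html, target_class="text_cell_render"):
    """Token lists for every rendered cell in an exported HTML page."""
    parser = RenderedCellParser(target_class)
    parser.feed(page_html)
    parser.close()
    return parser.cells


def tokens(fragment):
    """Tokenise a bare HTML fragment (e.g. a spec "html" value) the same way."""
    cells = extract_cells(f'<div class="probe">{fragment}</div>', "probe")
    return cells[0] if cells else []
```

Comparing token lists rather than raw strings ignores whitespace between tags and the `<br>`/`<br/>` spelling difference, at the cost of also ignoring whitespace the spec may consider significant.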
Per commonmark/commonmark-spec#482, you can now get the spec.json from the website rather than treating it as a static file.
The script should be modified to query the site on the fly (e.g., using requests.get()) at http://spec.commonmark.org/0.27/spec.json.
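A sketch of fetching spec.json on the fly. It uses the standard library's urllib.request instead of requests to avoid an extra dependency; with requests installed, `requests.get(url).json()` does the same job. The `opener` parameter and the helper names are hypothetical, and `opener` exists only so the function can be exercised offline.

```python
import json
import urllib.request

SPEC_URL = "http://spec.commonmark.org/{version}/spec.json"


def spec_url(version):
    """URL of the spec.json endpoint for one CommonMark spec version."""
    return SPEC_URL.format(version=version)


def fetch_spec(version="0.27", opener=urllib.request.urlopen):
    """Download and decode spec.json for one CommonMark spec version.

    `opener` defaults to urllib's urlopen; any callable returning a
    context manager with a read() method that yields bytes will do.
    """
    with opener(spec_url(version)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Fetching several versions is then a one-liner, e.g. `{v: fetch_spec(v) for v in ("0.27", "0.28")}`.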