jupyter / jupyter_markdown


Documentation and tests related to Jupyter's Markdown syntax

License: BSD 3-Clause "New" or "Revised" License

Jupyter Notebook 97.83% Python 2.17%

jupyter_markdown's Introduction

jupyter_markdown

Documentation and tests related to Jupyter's Markdown syntax

jupyter_markdown's People

Contributors

ashutoshbondre, ellisonbg


jupyter_markdown's Issues

Organize created notebooks by Markdown feature, not absolute number

Right now, the test notebooks are created by splitting the JSON into chunks of 100 tests.

This creates notebooks that are larger than is ideal and also loses the opportunity to group together important information about the types of tests contained in the notebooks.

Inside the JSON spec, it should be possible to distinguish which features are being tested by which tests. We should create notebooks on the basis of those distinctions.

So, for example, you would want to group all of the tests related to lists in a single notebook.

If one feature ever has more than 100 tests, then it may make sense to split it into multiple notebooks (ideally, on the basis of some systematic feature of the tests). However, I do not think that is the case today and so we shouldn't need to worry about that.
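A minimal sketch of grouping by feature, assuming the feature is the "section" field that each spec.json entry carries (as in the entry quoted in the next issue). The function name and the "(part N)" titling are hypothetical; a section that exceeds the size limit is split into consecutive chunks purely as a fallback.

```python
from collections import defaultdict

MAX_TESTS_PER_NOTEBOOK = 100  # same threshold the current splitting uses

def group_by_section(examples, max_size=MAX_TESTS_PER_NOTEBOOK):
    """Group spec.json examples by their "section" field, preserving order.

    Returns a list of (notebook_title, examples) pairs. A section with more
    than max_size examples is split into numbered parts.
    """
    sections = defaultdict(list)  # dicts keep insertion order (Python 3.7+)
    for example in examples:
        sections[example["section"]].append(example)

    groups = []
    for section, items in sections.items():
        if len(items) <= max_size:
            groups.append((section, items))
            continue
        # Fallback for oversized sections: consecutive chunks of max_size.
        for part, start in enumerate(range(0, len(items), max_size), start=1):
            groups.append((f"{section} (part {part})", items[start:start + max_size]))
    return groups
```

The input is simply the list returned by `json.load` on spec.json, so each resulting group maps to one notebook (for example, all list tests together).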

Use the commonmark spec.json

The current spec.json file is not for commonmark as can be seen in https://github.com/jupyter/jupyter_markdown/blob/master/spec.json#L1694-L1704:

  {
    "markdown": "| foo | bar |\n| --- | --- |\n| baz | bim |\n",
    "html": "<table>\n<thead>\n<tr>\n<th>foo</th>\n<th>bar</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>baz</td>\n<td>bim</td>\n</tr></tbody></table>\n",
    "example": 189,
    "start_line": 3197,
    "end_line": 3214,
    "section": "Tables (extension)",
    "extensions": [
      "table"
    ]
  },

The CommonMark spec.json includes no extensions, which suggests that the current file is derived from GitHub Flavored Markdown rather than CommonMark.

This should be easy to avoid once #4 is completed. If someone wanted to pursue this instead by hardcoding the CommonMark 0.27 spec.json in this repo, that would technically be a separate issue.
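Until the file is replaced, a stopgap is to drop extension examples before generating notebooks. This sketch assumes, per the observation above, that extension examples are exactly the entries with a non-empty "extensions" list; both function names are hypothetical.

```python
def commonmark_only(examples):
    """Drop examples that belong to an extension (e.g. GFM tables).

    Assumes extension examples carry a non-empty "extensions" list and
    plain CommonMark examples have no such key.
    """
    return [ex for ex in examples if not ex.get("extensions")]

def extension_sections(examples):
    """List the sections containing extension examples, in first-seen order."""
    seen = []
    for ex in examples:
        if ex.get("extensions") and ex["section"] not in seen:
            seen.append(ex["section"])
    return seen
```

Running `extension_sections` on the current file is also a quick way to confirm which sections are not CommonMark.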

Systematic in-browser comparison between rendered markdown & expected literal html

Just wrote some JS that can be put into a notebook that can compare the rendered markdown with the target HTML.

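%%javascript
// Collect the rendered HTML of every Markdown cell, as a plain array.
var items = $("div.text_cell_render.rendered_html").map(function() {
    return this.innerHTML;
}).get();
// Cells alternate: rendered Markdown, then the expected HTML.
// Stop before a trailing unpaired cell.
for (var i = 0; i + 1 < items.length; i += 2) {
    if (items[i] !== items[i + 1]) {
        console.log("Doesn't match: ", i, items[i]);
    }
}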

This compares each even-numbered cell with the odd-numbered cell that follows it. For notebooks that have header cells, the loop variables will need adjusting.
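The same pairing logic, sketched in Python with an explicit offset for leading header cells. The whitespace normalization is an assumption added for illustration: it treats whitespace between tags as insignificant, which does not hold inside elements such as `<pre>`.

```python
import re

def normalize(html):
    # Assumption: whitespace between tags is insignificant (not true inside <pre>).
    return re.sub(r">\s+<", "><", html.strip())

def mismatched_pairs(items, offset=0):
    """Compare items[offset] with items[offset + 1], items[offset + 2] with
    items[offset + 3], and so on. offset is the number of leading header
    cells to skip. Returns the indices of rendered cells that differ from
    the expected HTML in the following cell."""
    mismatches = []
    for i in range(offset, len(items) - 1, 2):
        if normalize(items[i]) != normalize(items[i + 1]):
            mismatches.append(i)
    return mismatches
```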

Create all test notebooks and store them in the repo

In line with the issue at the heart of #4, I think we should include the canonical notebooks for the most recent version of the spec inside this repo. That way we can consistently have a collection of source notebooks that act as the canonical examples of these tests.

Secondary motivation:
We will eventually want to supplement these with any officially endorsed modifications/extensions. But that will be a problem for a later date. Having the mechanisms in place now for using the canonical notebook set will make it easier for us to incorporate those extensions eventually.
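A minimal sketch of generating one canonical notebook from a list of spec examples, written as nbformat-4 JSON with only the standard library (the nbformat package could be used instead). The two-cells-per-example layout (Markdown source, then expected HTML in its own Markdown cell) is an assumption chosen so the even/odd in-browser comparison above works; the function names are hypothetical.

```python
import json

def build_test_notebook(examples):
    """Return an nbformat-4 notebook dict with two Markdown cells per example:
    the example's Markdown source, then its expected HTML."""
    cells = []
    for ex in examples:
        for source in (ex["markdown"], ex["html"]):
            cells.append({"cell_type": "markdown", "metadata": {}, "source": source})
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 2}

def write_notebook(nb, path):
    # indent=1 matches the layout Jupyter itself uses when saving notebooks.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write("\n")
```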

Test against GitHub Flavored Markdown (GFM) spec

GitHub also has a spec for its Markdown rendering:
https://github.github.com/gfm/

More details: https://githubengineering.com/a-formal-spec-for-github-markdown/

GitHub does not currently provide a web endpoint from which we can download versions of the spec.json.

So we would need to address that directly.

One solution would require creating it locally and storing it with every released version as part of this library (undesirable).

Alternatively, we could scrape GitHub to pull down spec_tests.py, cmark.py, and spec.txt and run them with a subprocess call. This is slightly better, but we still would not be able to get the latest released version or see how we behave across multiple versions (which is the ideal).
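For the "create it locally" route, spec.json-style entries can be derived from spec.txt itself. This is a simplified sketch modeled on how the spec marks its examples as I understand the format: a line of 32 backticks plus " example" (optionally followed by extension names) opens an example, a line with a single "." separates Markdown from HTML, 32 backticks close it, and tabs are written as "→". It is not a replacement for spec_tests.py, and it omits start_line/end_line.

```python
import re

FENCE = "`" * 32
HEADER_RE = re.compile(r"#+ ")

def parse_spec_text(text):
    """Extract spec.json-style examples from spec.txt-formatted text."""
    tests = []
    section = ""
    state = 0  # 0 = prose, 1 = Markdown part, 2 = HTML part
    markdown, html, extensions = [], [], []
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith(FENCE + " example"):
            state = 1
            extensions = stripped[len(FENCE + " example"):].split()
        elif stripped == FENCE and state:
            entry = {
                "markdown": "".join(markdown),
                "html": "".join(html),
                "example": len(tests) + 1,
                "section": section,
            }
            if extensions:  # only extension examples get this key
                entry["extensions"] = extensions
            tests.append(entry)
            state = 0
            markdown, html = [], []
        elif stripped == "." and state == 1:
            state = 2
        elif state == 1:
            markdown.append(line.replace("→", "\t"))
        elif state == 2:
            html.append(line.replace("→", "\t"))
        elif HEADER_RE.match(line):
            # Headings outside examples name the current section.
            section = HEADER_RE.sub("", line, count=1).strip()
    return tests
```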

Improve README; Depth & Breadth

The README should explain in more detail what this repo is for.

The README should also include a guide to running the tests and interpreting their output.

We should come up with a roadmap for all that we want to do in this repo, possibly include it in the README with GFM checkboxes.

Question about spec adherence & our general policy.

Are we going to follow the spec when it comes to raw HTML blocks?

For example, suppose that the following are Markdown cells:

<div>[a link in a block](https://google.com)</div>

<span>[a link in a span](https://google.com)</span>

<div>[a link in a block](https://google.com)</div>
<div>[a link in a block](https://google.com)</div>
<div>[a link in a block](https://google.com)</div>  
<span>[a link in a span](https://google.com)</span>

<div>[a link in a block](https://google.com)</div>
<div>[a link in a block](https://google.com)</div>
<div>[a link in a block](https://google.com)</div>
<span>[a link in a span](https://google.com)</span>

<div>[a link in a block](https://google.com)</div>

<div>[a link in a block](https://google.com)</div>
<div>[a link in a block](https://google.com)</div>

<span>[a link in a span](https://google.com)</span>

<div>[a link in a block](https://google.com)</div>

<div>[a link in a block](https://google.com)</div>

If you look at the HTML that marked produces, you get something like:

<div><a href="https://google.com" target="_blank">a link in a block</a></div>

<p><span><a href="https://google.com" target="_blank">a link in a span</a></span></p>
<div><a href="https://google.com" target="_blank">a link in a block</a></div>

<div><a href="https://google.com" target="_blank">a link in a block</a></div>
<div><a href="https://google.com" target="_blank">a link in a block</a></div>
<span><a href="https://google.com" target="_blank">a link in a span</a></span>

<div><a href="https://google.com" target="_blank">a link in a block</a></div>

<div><a href="https://google.com" target="_blank">a link in a block</a></div>
<div><a href="https://google.com" target="_blank">a link in a block</a></div><br><span><a href="https://google.com" target="_blank">a link in a span</a></span>

<div><a href="https://google.com" target="_blank">a link in a block</a></div>
<div><a href="https://google.com" target="_blank">a link in a block</a></div>
<div><a href="https://google.com" target="_blank">a link in a block</a></div>

<p><span><a href="https://google.com" target="_blank">a link in a span</a></span></p>
<div><a href="https://google.com" target="_blank">a link in a block</a></div>
<div><a href="https://google.com" target="_blank">a link in a block</a></div>

Currently, mistune interprets none of these links. It is not too hard to modify the MarkdownWithMath class to include the default options needed for the first example paragraph to be converted correctly on HTML export. However, the other cases still do not behave correctly. Marked is fairly unusual in treating all of that inline HTML as valid and parsing the Markdown inside it.
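For reference, the CommonMark rule at issue: a line starting with a block-level tag such as `<div>` opens a raw HTML block (the spec's "type 6" start condition) that runs until the next blank line, and Markdown inside it is left unparsed. `<span>` is not a block-level tag, so a line like the span example becomes an ordinary paragraph whose link is parsed. This sketch checks only that condition; `BLOCK_TAGS` is an abbreviated subset of the spec's tag list, and the function names are hypothetical.

```python
import re

# Abbreviated subset of the spec's block-level tag names (the full list is longer).
BLOCK_TAGS = {"address", "article", "aside", "blockquote", "details", "div",
              "dl", "fieldset", "figure", "footer", "form", "h1", "h2", "h3",
              "h4", "h5", "h6", "header", "hr", "li", "main", "nav", "ol",
              "p", "section", "table", "ul"}

# Up to three spaces of indentation, "<" or "</", a tag name, then
# whitespace, end of line, ">" or "/>".
TYPE6_START = re.compile(r"^ {0,3}</?([A-Za-z][A-Za-z0-9]*)(?:\s|/?>|$)")

def starts_type6_html_block(line):
    m = TYPE6_START.match(line)
    return bool(m) and m.group(1).lower() in BLOCK_TAGS

def split_html_block(lines):
    """If lines[0] opens a type-6 HTML block, return (block_lines, rest).

    The block ends at the first blank line, so a <span> line directly
    after a <div> line is swallowed into the raw HTML block.
    """
    if not lines or not starts_type6_html_block(lines[0]):
        return [], list(lines)
    for i, line in enumerate(lines):
        if line.strip() == "":
            return lines[:i], lines[i:]
    return list(lines), []
```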

This is not the first or the only time that this will come up.

I've figured out at least one way to "fix" this behaviour in mistune (relating to whether block html is presumed to be followed by an empty line), but such a "fix" introduces errors elsewhere.

See babelmark2 for a comparison of how many implementations (though not mistune) handle these cases.

NB: I'm going to crosspost this (minus this paragraph) to nbconvert because I want more people to have a say in this conversation. I wish this repo were not private, since I'd rather start the conversation about our general Markdown policy here than in either nbconvert or the notebook. In the meantime, though, this seems to be the biggest issue for nbconvert.

Test against all available versions of the spec (including newly released versions)

We will want to test against multiple versions of the spec.json.

This will be important if we are to detect how our tools are evolving in comparison with the spec.

Currently, the only two versions on the CommonMark site that have spec.json endpoints are 0.27 and 0.28.

This is going to be a little trickier since it requires a few separate steps, so while I'm going to tag this as sprint friendly, I would not recommend that someone tries to do this without direct guidance from a maintainer.
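A minimal sketch of the per-version loop. `SPEC_URL` is an assumed URL pattern for the published spec.json files and should be verified; `render` stands in for whichever renderer is under test, and exact string comparison is a simplifying assumption. Passing `fetch` in makes it easy to substitute locally stored files.

```python
import json
from urllib.request import urlopen

# Assumed URL pattern; verify before relying on it.
SPEC_URL = "https://spec.commonmark.org/{version}/spec.json"

def fetch_spec(version):
    with urlopen(SPEC_URL.format(version=version)) as resp:
        return json.loads(resp.read().decode("utf-8"))

def results_by_version(versions, render, fetch=fetch_spec):
    """Run render (Markdown -> HTML) over each version's examples.

    Returns {version: (passed, total)} using exact string comparison.
    """
    results = {}
    for version in versions:
        examples = fetch(version)
        passed = sum(render(ex["markdown"]) == ex["html"] for ex in examples)
        results[version] = (passed, len(examples))
    return results
```

Usage would look like `results_by_version(["0.27", "0.28"], my_render)`, where `my_render` is a hypothetical wrapper around the renderer being tested.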

Parse output html from the notebook as an in-memory object

Per #6, it will be possible to get an "in-memory" version of the HTML from nbconvert's HTML exporter.

One simple way to do this with the current method is to write the HTML to the file system and then read it back.

Then we can parse this object, using tools like BeautifulSoup, html5lib, or the standard library's html.parser module.
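A minimal sketch using the standard library's `html.parser`, assuming the exported HTML marks rendered Markdown cells with the `text_cell_render` class (the same selector the in-browser JS comparison uses). For each such cell it records the start tags and text found inside, which is enough for coarse structural comparisons; the class name is hypothetical.

```python
from html.parser import HTMLParser

class RenderedCellParser(HTMLParser):
    """Collect start-tag names and text for each div.text_cell_render."""

    def __init__(self):
        super().__init__()
        self.cells = []   # one {"tags": [...], "text": "..."} per rendered cell
        self._depth = 0   # div nesting depth inside the current cell (0 = outside)

    def handle_starttag(self, tag, attrs):
        if self._depth:
            if tag == "div":
                self._depth += 1  # track nested divs so we know where the cell ends
            self.cells[-1]["tags"].append(tag)
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "text_cell_render" in classes:
            self._depth = 1
            self.cells.append({"tags": [], "text": ""})

    def handle_endtag(self, tag):
        if self._depth and tag == "div":
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.cells[-1]["text"] += data
```

Usage: create a parser, call `feed()` with the exported HTML string, call `close()`, then read `parser.cells`.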

This will be harder to achieve with the marked version of the HTML rendering; I'll open a separate issue for that.
