Comments (6)

Andre601 commented on September 25, 2024

Forgot to add another solution/workaround.
Adding an empty line after the header also prevents the code block issue.

I would assume that this is some block-related rendering behaviour?

from markdown.

facelessuser commented on September 25, 2024

I do agree it is weird that there are some cases where the paragraph under the header is getting turned into code blocks. I'm not sure if this is a list issue or a header extension issue within lists. I do know that lists especially have a few quirky issues like this. I do think behavior should be more consistent in lists. The fact that headers handle this case outside of lists fine but have issues in lists should probably be looked into.

With that said, for most consistent behavior, it is always best to keep blocks separate. Generally, Python Markdown expects blocks to have blank lines between them.

import markdown

MD = """
-   ### List 1

    Entry 1.1

    Entry 1.2

-   ### List 2

    Entry 2.1

    Entry 2.2

-   ### List 3

    Entry 3.1

    Entry 3.2
"""

html = markdown.markdown(
    MD,
    extensions=[],
)

print(html)
<ul>
<li>
<h3>List 1</h3>
<p>Entry 1.1</p>
<p>Entry 1.2</p>
</li>
<li>
<h3>List 2</h3>
<p>Entry 2.1</p>
<p>Entry 2.2</p>
</li>
<li>
<h3>List 3</h3>
<p>Entry 3.1</p>
<p>Entry 3.2</p>
</li>
</ul>

waylan commented on September 25, 2024

I haven't looked closely at each example given yet (I will when I have time), but the first thing I would check is the reference implementation. Is our behavior any different? For any example where our behavior matches the reference implementation, I would expect that to be the correct behavior (unless it is clearly a bug in the reference implementation, which does happen on occasion). If, however, the behavior differs between implementations, then we probably have a bug here.

As a general observation, there are a lot of subtleties with list parsing, especially when you get into the differences between tight (no blank lines between items) and loose (blank lines between items) lists. As loose list items always contain block level children, I can see an argument that any list item which contains a heading (which is clearly a block level element) should get loose list behavior even without the blank lines, but that is not how the reference implementation works, so we don't either. I'm assuming that this is what is leading to the unexpected output.
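
To make the tight/loose distinction concrete, here is a minimal sketch in plain Python. The helper `is_loose_list` is a hypothetical name, not part of Python-Markdown, and it is a deliberate simplification: it treats a list as loose as soon as any blank line appears inside it, whereas real parsers make that decision per item.

```python
def is_loose_list(source: str) -> bool:
    """Rough check: a blank line inside the list source makes it loose.

    Hypothetical helper for illustration only; this is not how
    Python-Markdown or Markdown.pl actually decide looseness.
    """
    lines = source.strip("\n").splitlines()
    return any(not line.strip() for line in lines)


# No blank lines anywhere: a tight list.
tight = "- ### List 1\n  Entry 1.1\n- ### List 2\n  Entry 2.1\n"
# Blank lines between blocks and items: a loose list.
loose = "- ### List 1\n\n  Entry 1.1\n\n- ### List 2\n\n  Entry 2.1\n"

print(is_loose_list(tight))  # False
print(is_loose_list(loose))  # True
```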

> With that said, for most consistent behavior, it is always best to keep blocks separate. Generally, Python Markdown expects blocks to have blank lines between them.

This is generally good advice. Yes, it is true that Markdown can work with all sorts of weird edge cases. However, for consistent results across all implementations I always format all of my Markdown according to the strictest linting rules, such as always including a blank line between all block level elements, no matter what. That has become especially important with the popularity of Commonmark, which handles many edge cases differently than old-school Markdown. My Markdown always renders the same with both Commonmark (on GitHub) and Python-Markdown (on my own sites) because I follow those strict linting rules and I avoid the various weird behaviors raised here.

To be clear, I am not suggesting that we shouldn't bother to fix an edge case if the behavior is clearly wrong just because it can be avoided by using a stricter set of rules. What I am saying is that because the correct behavior (as defined by Markdown's rather vague syntax rules) is not always clear, it is easier to avoid surprises if you stick to those stricter rules. In fact, for the documentation on this project, we run all proposed changes through the linter tool to enforce those stricter rules.
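
As a rough illustration of the "blank line after every heading" rule, the sketch below flags headings that are immediately followed by text. The function name `headings_missing_blank_line` and the regular expression are assumptions made for this example, not the project's actual linter, and the check does not skip `#` lines inside code blocks.

```python
import re

# Hypothetical pattern: an ATX heading, optionally inside a list item.
HEADING = re.compile(r"^\s*(?:[-*+]\s+)?#{1,6}\s+\S")


def headings_missing_blank_line(text: str) -> list[int]:
    """Return 1-based line numbers of headings directly followed by text."""
    lines = text.splitlines()
    flagged = []
    # The last line has no following line, so it can never be flagged.
    for index, line in enumerate(lines[:-1]):
        if HEADING.match(line) and lines[index + 1].strip():
            flagged.append(index + 1)
    return flagged


sample = "-   ### List 1\n    Entry 1.1\n\n-   ### List 2\n\n    Entry 2.1\n"
print(headings_missing_blank_line(sample))  # [1]
```

Only the first heading is reported, because the second one already has a blank line after it.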

Andre601 commented on September 25, 2024

Something I want to point out real quick.

The linting rule you linked shows 2 spaces as the proper indent, which is also the default, yet your Markdown parser requires 4 spaces, no matter what, for proper indents.
Why?

waylan commented on September 25, 2024

Had a chance to look at these.

Test 1 and Test 2 both demonstrate the same bug. There should be no code blocks (paragraphs instead). What is really strange is that the first item is correct, but the subsequent items are wrong.

Test 3 looks correct, but when you check it against the reference implementation, it is also wrong. This one is interesting: because there are no blank lines, the reference implementation sees it as a tight list. Presumably, the idea is that a tight list item does not contain any block level children, so it is parsed as inline text only. Markdown.pl returns the following result:

<ul>
<li>### List 1
Entry 1.1</li>
<li>### List 2
Entry 2.1</li>
<li>### List 3
Entry 3.1</li>
</ul>

According to Babelmark, there is a lot of variability across implementations with this one. Not sure what to think about it. Regardless, I am inclined to not treat this as the same bug. In fact, I may ignore it altogether.

I'm not sure what is going on with Test 4 as a heading should never be more than one line (a heading always ends at the first newline). However, what is even more curious, is that this specific edge case results in the bug in Tests 1 and 2 being avoided. Add the additional indentation, and we get those issues back. It looks like Test 5 is a workaround to avoid the issues. I suspect Tests 4 and 5 will help in working out what is causing the issues in Tests 1 and 2.

Thanks for posting this. This is clearly a bug. A bug I never would have found as I always follow a heading with a blank line in my own documents.

waylan commented on September 25, 2024

The reason these edge cases are not so clear is that lists support hanging indents. For example, these two list items are parsed the same way:

-   line one of one paragraph
    line 2 of the same paragraph

-   line one of one paragraph
line 2 of the same paragraph 

However, because a heading can only ever be one line, the second line is forced to start a new paragraph, which is unintuitive. For example, the following two list items get parsed very differently:

- # A Heading
A paragraph in the list item.

- # A Heading

A paragraph outside the list.

Yet, when we take those out of a list, then they get parsed the same.

# A Heading
A paragraph

# A Heading

A paragraph
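
The outside-of-a-list case can be checked with a short script. This is a minimal sketch that assumes the `markdown` package is installed and uses no extensions; the variable names are only for illustration.

```python
import markdown

# A heading with the paragraph on the very next line.
joined = "# A Heading\nA paragraph"
# The same content with a blank line between the two blocks.
separated = "# A Heading\n\nA paragraph"

html_joined = markdown.markdown(joined)
html_separated = markdown.markdown(separated)

print(html_joined)
print(html_joined == html_separated)
```

Both inputs should render as a heading followed by a paragraph, so the comparison is expected to succeed. Wrapping the same text in a list item is where the results start to diverge.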

All of these differences make for a challenge when developing a parser that works consistently and unsurprisingly. An additional complication is that the rules are not comprehensive and some edge cases of the reference implementation don't seem to be consistent with what one might expect having read the rules. I suppose that is why Commonmark completely abandoned the original rules and reimplemented a completely different scheme for parsing list items. But we are not a Commonmark parser, so we are stuck with the weirdness that is old-school Markdown.
