vsch / flexmark-java

CommonMark/Markdown Java parser with source level AST. CommonMark 0.28, emulation of: pegdown, kramdown, markdown.pl, MultiMarkdown. With HTML to MD, MD to PDF, MD to DOCX conversion modules.

License: BSD 2-Clause "Simplified" License


flexmark-java

flexmark-java is a Java implementation of a CommonMark (spec 0.28) parser that uses the blocks-first, inlines-after Markdown parsing architecture.

Its strengths are speed, flexibility, extensibility, and an AST based on Markdown source elements, with source positions tracked down to the individual characters of the lexemes that make up each element.

The API allows granular control of the parsing process and is optimized for parsing with a large number of installed extensions. The parser and extensions come with plenty of options for parser behavior and HTML rendering variations. The end goal is to have the parser and renderer be able to mimic other parsers with a great degree of accuracy. This is now partially complete with the implementation of Markdown Processor Emulation.

Motivation for this project was the need to replace the pegdown parser in my Markdown Navigator plugin for JetBrains IDEs. pegdown has a great feature set, but its speed in general is less than ideal, and for pathological input it either hangs or practically hangs during parsing.

⚠️ Version 0.60.0 has breaking changes due to re-organization, renaming, clean up and optimization of implementation classes. Changes are detailed in Version-0.60.0-Changes.


Requirements

  • Versions 0.62.2 and below require Java 8 or above and are Java 9+ compatible. Versions 0.64.0 and above require Java 11 or above.

  • The project is on Maven: com.vladsch.flexmark

  • The core has no dependencies other than org.jetbrains:annotations:24.0.1. For extensions, see extension description below.

    The API is still evolving to accommodate new extensions and functionality.

Quick Start

For Maven, add the flexmark-all dependency, which includes the core and all modules, to use the following sample:

<dependency>
    <groupId>com.vladsch.flexmark</groupId>
    <artifactId>flexmark-all</artifactId>
    <version>0.64.8</version>
</dependency>

Source: BasicSample.java

package com.vladsch.flexmark.samples;

import com.vladsch.flexmark.util.ast.Node;
import com.vladsch.flexmark.html.HtmlRenderer;
import com.vladsch.flexmark.parser.Parser;
import com.vladsch.flexmark.util.data.MutableDataSet;

public class BasicSample {
    public static void main(String[] args) {
        MutableDataSet options = new MutableDataSet();

        // uncomment to set optional extensions (also add imports for java.util.Arrays, TablesExtension and StrikethroughExtension)
        //options.set(Parser.EXTENSIONS, Arrays.asList(TablesExtension.create(), StrikethroughExtension.create()));

        // uncomment to convert soft-breaks to hard breaks
        //options.set(HtmlRenderer.SOFT_BREAK, "<br />\n");

        Parser parser = Parser.builder(options).build();
        HtmlRenderer renderer = HtmlRenderer.builder(options).build();

        // You can re-use parser and renderer instances
        Node document = parser.parse("This is *Sparta*");
        String html = renderer.render(document);  // "<p>This is <em>Sparta</em></p>\n"
        System.out.println(html);
    }
}

Building via Gradle

implementation 'com.vladsch.flexmark:flexmark-all:0.64.8'

Building with Android Studio

Additional settings due to duplicate files:

packagingOptions {
    exclude 'META-INF/LICENSE-LGPL-2.1.txt'
    exclude 'META-INF/LICENSE-LGPL-3.txt'
    exclude 'META-INF/LICENSE-W3C-TEST'
    exclude 'META-INF/DEPENDENCIES'
}

More information can be found in the documentation:
Wiki Home     Usage Examples     Extension Details     Writing Extensions

Pegdown Migration Helper

The PegdownOptionsAdapter class converts pegdown Extensions.* flags to flexmark options and an extensions list. Pegdown's Extensions.java is included for convenience, together with new options not found in pegdown 1.6.0. Both are located in the flexmark-profile-pegdown module, but you can grab the source from this repo (PegdownOptionsAdapter.java, Extensions.java) and make your own version, modified to your project's needs.

You can pass your extension flags to static PegdownOptionsAdapter.flexmarkOptions(int) or you can instantiate PegdownOptionsAdapter and use convenience methods to set, add and remove extension flags. PegdownOptionsAdapter.getFlexmarkOptions() will return a fresh copy of DataHolder every time with the options reflecting pegdown extension flags.

import com.vladsch.flexmark.html.HtmlRenderer;
import com.vladsch.flexmark.parser.Parser;
import com.vladsch.flexmark.profile.pegdown.Extensions;
import com.vladsch.flexmark.profile.pegdown.PegdownOptionsAdapter;
import com.vladsch.flexmark.util.data.DataHolder;

public class PegdownOptions {
    private static final DataHolder OPTIONS = PegdownOptionsAdapter.flexmarkOptions(
            Extensions.ALL
    );

    static final Parser PARSER = Parser.builder(OPTIONS).build();
    static final HtmlRenderer RENDERER = HtmlRenderer.builder(OPTIONS).build();

    // use the PARSER to parse and RENDERER to render with pegdown compatibility
}

Default flexmark-java pegdown emulation uses less strict HTML block parsing which interrupts an HTML block on a blank line. Pegdown only interrupts an HTML block on a blank line if all tags in the HTML block are closed.

To get closer to original pegdown HTML block parsing behavior use the method which takes a boolean strictHtml argument:

import com.vladsch.flexmark.html.HtmlRenderer;
import com.vladsch.flexmark.parser.Parser;
import com.vladsch.flexmark.profile.pegdown.Extensions;
import com.vladsch.flexmark.profile.pegdown.PegdownOptionsAdapter;
import com.vladsch.flexmark.util.data.DataHolder;

public class PegdownOptions {
    private static final DataHolder OPTIONS = PegdownOptionsAdapter.flexmarkOptions(true,
            Extensions.ALL
    );

    static final Parser PARSER = Parser.builder(OPTIONS).build();
    static final HtmlRenderer RENDERER = HtmlRenderer.builder(OPTIONS).build();

    // use the PARSER to parse and RENDERER to render with pegdown compatibility
}

A sample with a custom link resolver is also available, which includes link resolver for changing URLs or attributes of links and a custom node renderer if you need to override the generated link HTML.

ℹ️ flexmark-java has many more extensions and configuration options than pegdown in addition to extensions available in pegdown 1.6.0. Available Extensions via PegdownOptionsAdapter

Latest Additions and Changes

Releases, Bug Fixes, Enhancements and Support

I use flexmark-java as the parser for Markdown Navigator plugin for JetBrains IDEs. I tend to use the latest, unreleased version to fix bugs or get improvements. So if you find a bug that is a show stopper for your project or see a bug in github issues page marked fixed for next release that is affecting your project then please let me know and I may be able to promptly make a new release to address your issue. Otherwise, I will let bug fixes and enhancements accumulate thinking no one is affected by what is already fixed.

Extension points in the API are many and numerous

The API has many extension points, each with an intended use. A good soft start is the flexmark-java-samples module, which has simple samples for requested extensions. The next best place is the source of an existing extension whose syntax is similar to what you want to add.

If your extension lines up with the right API, the task is usually very short and sweet. If your extension uses the API in an unintended fashion or does not follow expected housekeeping protocols, you may find it an uphill battle with a rat's nest of if/else condition handling and fixing one bug only leading to creating another one.

Generally, if it takes more than a few dozen lines to add a simple extension, then either you are going about it wrong or the API is missing an extension point. If you look at all the implemented extensions you will see that most are a few lines of code other than boiler plate dictated by the API. That is the goal for this library: provide an extensible core that makes writing extensions a breeze.

The larger extensions are flexmark-ext-tables and flexmark-ext-spec-example; the meat of each is around 200 lines of code. You can use them as a guidepost for estimating the size of your extension.

My own experience adding extensions shows that sometimes a new type of extension is best addressed with an API enhancement to make its implementation seamless, or by fixing a bug that was not visible before the extension stressed the API in just the right way. Your intended extension may just be the one requiring such an approach.

Don't hesitate to open an issue if you can't find the answer

The takeaway is: if you want to implement an extension or a feature please don't hesitate to open an issue and I will give you pointers on the best way to go about it. It may save you a lot of time by letting me improve the API to address your extension's needs before you put a lot of fruitless effort into it.

I do ask that you realize that I am chief cook and bottle washer on this project, without an iota of Vulcan Mind Melding skills. I do ask that you describe what you want to implement because I can't read your mind. Please do some reconnaissance background work around the source code and documentation because I cannot transfer what I know to you, without your willing effort.

Consulting is available

If you have a commercial application and don't want to write the extension(s) yourself or want to reduce the time and effort of implementing extensions and integrating flexmark-java, feel free to contact me. I am available on a consulting/contracting basis.

Markdown Processor Emulation

Despite its name, commonmark is neither a superset nor a subset of other markdown flavors. Rather, it proposes a standard, unambiguous syntax specification for the original, "core" Markdown, thus effectively introducing yet another flavor. While flexmark is by default commonmark compliant, its parser can be tweaked in various ways. The sets of tweaks required to emulate the most commonly used markdown parsers around are available in flexmark as ParserEmulationProfiles.

As the name ParserEmulationProfile implies, it's only the parser that is adjusted to the specific markdown flavor. Applying the profile does not add features beyond those available in commonmark. If you want to use flexmark to fully emulate another markdown processor's behavior, you have to adjust the parser and configure the flexmark extensions that provide the additional features available in the parser that you want to emulate.

A rewrite of the list parser to better control emulation of other markdown processors as per Markdown Processors Emulation is complete. Addition of processor presets to emulate specific markdown processing behaviour of these parsers is on a short to-do list.

Some emulation families do a better job of emulating their target than others. Most of the effort was directed at emulating how these processors parse standard Markdown and list related parsing specifically. For processors that extend original Markdown, you will need to add those extensions that are already implemented in flexmark-java to the Parser/Renderer builder options.

Extensions will be modified to include their own presets for specific processor emulation, if that processor has an equivalent extension implemented.

If you find a discrepancy please open an issue so it can be addressed.

Major processor families are implemented and some family members also:

ℹ️ profiles to encapsulate configuration details for variants within the family were added in 0.11.0:

  • CommonMark (default for family): ParserEmulationProfile.COMMONMARK
  • FixedIndent (default for family): ParserEmulationProfile.FIXED_INDENT
  • GitHub Comments (just CommonMark): ParserEmulationProfile.COMMONMARK
  • Old GitHub Docs: ParserEmulationProfile.GITHUB_DOC
  • Kramdown (default for family): ParserEmulationProfile.KRAMDOWN
  • Markdown.pl (default for family): ParserEmulationProfile.MARKDOWN
  • MultiMarkdown: ParserEmulationProfile.MULTI_MARKDOWN
  • Pegdown, with pegdown extensions: use PegdownOptionsAdapter in flexmark-profile-pegdown
  • Pegdown, without pegdown extensions: ParserEmulationProfile.PEGDOWN
  • Pegdown HTML block parsing rules, without pegdown extensions: ParserEmulationProfile.PEGDOWN_STRICT

History and Motivation

flexmark-java is a fork of the commonmark-java project, modified to generate an AST that reflects all the elements in the original source, to track the full source position of every element in the AST, and to make JetBrains Open API PsiTree generation easier.

The API was changed to allow more granular control of the parsing process and optimized for parsing with a large number of installed extensions. The parser and extensions come with many tweaking options for parser behavior and HTML rendering variations. The end goal is to have the parser and renderer be able to mimic other parsers with a great degree of accuracy.

Motivation for this was the need to replace the pegdown parser in the Markdown Navigator plugin. pegdown has a great feature set, but its speed in general is less than ideal, and for pathological input it either hangs or practically hangs during parsing.

commonmark-java has an excellent parsing architecture that is easy to understand and extend. The goal was to ensure that adding source position tracking in the AST would not change the ease of parsing and generating the AST more than absolutely necessary.

Reasons for choosing commonmark-java as the parser are: speed, ease of understanding, ease of extending and speed. Now that I have reworked the core and added a few extensions I am extremely satisfied with my choice.

Another goal was to improve the ability of extensions to modify parser behavior so that any dialect of markdown could be implemented through the extension mechanism. An extensible options API was added to allow setting of all options in one place. Parser, renderer and extensions use these options for configuration, including disabling some core block parsers.

This is a work in progress with many API changes. No attempt is made to keep backward API compatibility to the original project and until the feature set is mostly complete, not even to earlier versions of this project.

Feature Comparison

Feature flexmark-java commonmark-java pegdown
Relative parse time (less is better) ✔️ 1x (1) ✔️ 0.6x to 0.7x (2) ❌ 25x average, 20,000x to ∞ for pathological input (3)
All source elements in the AST ✔️ ✔️
AST elements with source position ✔️ ✔️ ✔️ with some errors and idiosyncrasies
AST can be easily manipulated ✔️ AST post processing is an extension mechanism ✔️ AST post processing is an extension mechanism ❌ not an option. No node's parent information, children as List<>.
AST elements have detailed source position for all parts ✔️ ❌ only node start/end
Can disable core parsing features ✔️
Core parser implemented via the extension API ✔️ instanceOf tests for specific block parser and node classes ❌ core exposes few extension points
Easy to understand and modify parser implementation ✔️ ✔️ ❌ one PEG parser with complex interactions (3)
Parsing of block elements is independent from each other ✔️ ✔️ ❌ everything in one PEG grammar
Uniform configuration across: parser, renderer and all extensions ✔️ ❌ none beyond extension list int bit flags for core, none for extensions
Parsing performance optimized for use with extensions ✔️ ❌ parsing performance for core, extensions do what they can ❌ performance is not a feature
Feature rich with many configuration options and extensions out of the box ✔️ ❌ limited extensions, no options ✔️
Dependency definitions for processors to guarantee the right order of processing ✔️ ❌ order specified by extension list ordering, error prone ❌ not applicable, core defines where extension processing is added
(1) flexmark-java pathological input of 100,000 [ parses in 68ms, 100,000 ] in 57ms, 100,000 nested [ ] parse in 55ms

(2) commonmark-java pathological input of 100,000 [ parses in 30ms, 100,000 ] in 30ms, 100,000 nested [ ] parse in 43ms

(3) pegdown pathological input of 17 [ parses in 650ms, 18 [ in 1300ms
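
The pathological inputs in the footnotes above are easy to regenerate. The following is a minimal sketch, not the project's benchmark code: the class and method names are hypothetical, "nested [ ]" is assumed to mean N opening brackets followed by N closing brackets, and the parser is passed in as a plain callback so the sketch does not depend on any particular library.

```java
import java.util.function.Consumer;

/** Hypothetical helper: builds bracket-heavy inputs and times an arbitrary parse call. */
final class PathologicalInputs {
    /** A single line made of {@code count} opening brackets. */
    static String openBrackets(int count) {
        return "[".repeat(count);
    }

    /** A single line made of {@code count} closing brackets. */
    static String closeBrackets(int count) {
        return "]".repeat(count);
    }

    /** Assumption: "nested" means {@code count} '[' followed by {@code count} ']'. */
    static String nestedBrackets(int count) {
        return "[".repeat(count) + "]".repeat(count);
    }

    /** Runs the given parse callback once and returns the elapsed wall-clock milliseconds. */
    static long timeMillis(Consumer<String> parse, String input) {
        long start = System.nanoTime();
        parse.accept(input);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Stand-in for a real parser call; replace with the parser under test.
        Consumer<String> parse = input -> input.length();
        System.out.println(timeMillis(parse, openBrackets(100_000)) + " ms");
        System.out.println(timeMillis(parse, nestedBrackets(100_000)) + " ms");
    }
}
```

A single timed run like this includes JIT warm-up, so it is only a rough check for hangs and not a substitute for a proper benchmark harness.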

Progress

  • Parser options; items marked as task items are still to be implemented, the rest are complete:
    • Typographic
      • Quotes
      • Smarts
    • GitHub Extensions
      • Fenced code blocks
      • Anchor links for headers with auto id generation
      • Table Spans option to be implemented for tables extension
      • Wiki Links with GitHub and Creole syntax
      • Emoji Shortcuts with use GitHub emoji URL option
    • GitHub Syntax
      • Strikethrough
      • Task Lists
      • No Atx Header Space
      • No Header indents
      • Hard Wraps (achieved with SOFT_BREAK option changed to "<br />")
      • Relaxed HR Rules Option
      • Wiki links
    • Publishing
      • Abbreviations
      • Footnotes
      • Definitions
      • Table of Contents
    • Suppress
      • inline HTML: all, non-comments, comments
      • HTML blocks: all, non-comments, comments
    • Processor Extensions
      • Jekyll front matter
      • Jekyll tag elements, with support for {% include file %}, Include Markdown and HTML File Content
      • GitBook link URL encoding. Not applicable
      • HTML comment nodes: Block and Inline
      • Multi-line Image URLs
      • Spec Example Element
    • Commonmark Syntax suppression
      • Manual loose lists
      • Numbered lists always start with 1.
      • Fixed list item indent, items must be indented by at least 4 spaces
      • Relaxed list start option, allow lists to start when not preceded by a blank line.

I am very pleased with the decision to switch to commonmark-java based parser for my own projects. Even though I had to do major surgery on its innards to get full source position tracking and AST that matches source elements, it is a pleasure to work with and is now a pleasure to extend. If you don't need source level element AST or the rest of what flexmark-java added and CommonMark is your target markdown parser then I encourage you to use commonmark-java as it is an excellent choice for your needs and its performance does not suffer for the overhead of features that you will not use.

Benchmarks

Latest, Jan 28, 2017 flexmark-java 0.13.1, intellij-markdown from CE EAP 2017, commonmark-java 0.8.0:

| File | commonmark-java | flexmark-java | intellij-markdown | pegdown |
|------|-----------------|---------------|-------------------|---------|
| README-SLOW | 0.420ms | 0.812ms | 2.027ms | 15.483ms |
| VERSION | 0.743ms | 1.425ms | 4.057ms | 42.936ms |
| commonMarkSpec | 31.025ms | 44.465ms | 600.654ms | 575.131ms |
| markdown_example | 8.490ms | 10.502ms | 223.593ms | 983.640ms |
| spec | 4.719ms | 6.249ms | 35.883ms | 307.176ms |
| table | 0.229ms | 0.623ms | 0.800ms | 3.642ms |
| table-format | 1.385ms | 2.881ms | 4.150ms | 23.592ms |
| wrap | 3.804ms | 4.589ms | 16.609ms | 86.383ms |

Ratios of above:

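| File | commonmark-java | flexmark-java | intellij-markdown | pegdown |
|------|-----------------|---------------|-------------------|---------|
| README-SLOW | 1.00 | 1.93 | 4.83 | 36.88 |
| VERSION | 1.00 | 1.92 | 5.46 | 57.78 |
| commonMarkSpec | 1.00 | 1.43 | 19.36 | 18.54 |
| markdown_example | 1.00 | 1.24 | 26.34 | 115.86 |
| spec | 1.00 | 1.32 | 7.60 | 65.09 |
| table | 1.00 | 2.72 | 3.49 | 15.90 |
| table-format | 1.00 | 2.08 | 3.00 | 17.03 |
| wrap | 1.00 | 1.21 | 4.37 | 22.71 |
| overall | 1.00 | 1.41 | 17.47 | 40.11 |

| File | commonmark-java | flexmark-java | intellij-markdown | pegdown |
|------|-----------------|---------------|-------------------|---------|
| README-SLOW | 0.52 | 1.00 | 2.50 | 19.07 |
| VERSION | 0.52 | 1.00 | 2.85 | 30.12 |
| commonMarkSpec | 0.70 | 1.00 | 13.51 | 12.93 |
| markdown_example | 0.81 | 1.00 | 21.29 | 93.66 |
| spec | 0.76 | 1.00 | 5.74 | 49.15 |
| table | 0.37 | 1.00 | 1.28 | 5.85 |
| table-format | 0.48 | 1.00 | 1.44 | 8.19 |
| wrap | 0.83 | 1.00 | 3.62 | 18.83 |
| overall | 0.71 | 1.00 | 12.41 | 28.48 |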
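
The ratio rows can be reproduced from the timing table. The sketch below is illustrative only, and its class and method names are hypothetical. It divides each timing by the baseline parser's timing for the same file. The source does not say how the "overall" row is computed; treating it as the ratio of the summed timings is an inference that matches the flexmark-java column. Because the published timings are already rounded, some recomputed ratios may differ from the tables in the last digit.

```java
import java.util.Locale;

/** Hypothetical helper for recomputing the benchmark ratio tables. */
final class BenchmarkRatios {
    /** Ratio of a timing to the baseline timing, formatted with two decimals. */
    static String ratio(double timeMs, double baselineMs) {
        return String.format(Locale.ROOT, "%.2f", timeMs / baselineMs);
    }

    /** Sum of all timings for one parser across the benchmark files. */
    static double sum(double... timingsMs) {
        double total = 0;
        for (double t : timingsMs) {
            total += t;
        }
        return total;
    }

    public static void main(String[] args) {
        // Timings in ms, copied from the first benchmark table, in file order.
        double[] commonmark = {0.420, 0.743, 31.025, 8.490, 4.719, 0.229, 1.385, 3.804};
        double[] flexmark = {0.812, 1.425, 44.465, 10.502, 6.249, 0.623, 2.881, 4.589};

        // README-SLOW, flexmark-java relative to commonmark-java.
        System.out.println(ratio(flexmark[0], commonmark[0]));
        // Assumed definition of "overall": ratio of the summed timings.
        System.out.println(ratio(sum(flexmark), sum(commonmark)));
    }
}
```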

Because these two files represent the pathological input for pegdown, I no longer run them as part of the benchmark to prevent skewing of the results. The results are here for posterity.

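| File | commonmark-java | flexmark-java | intellij-markdown | pegdown |
|------|-----------------|---------------|-------------------|---------|
| hang-pegdown | 0.082ms | 0.326ms | 0.342ms | 659.138ms |
| hang-pegdown2 | 0.048ms | 0.235ms | 0.198ms | 1312.944ms |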

Ratios of above:

| File | commonmark-java | flexmark-java | intellij-markdown | pegdown |
|------|-----------------|---------------|-------------------|---------|
| hang-pegdown | 1.00 | 3.98 | 4.17 | 8048.38 |
| hang-pegdown2 | 1.00 | 4.86 | 4.10 | 27207.32 |
| overall | 1.00 | 4.30 | 4.15 | 15151.91 |

| File | commonmark-java | flexmark-java | intellij-markdown | pegdown |
|------|-----------------|---------------|-------------------|---------|
| hang-pegdown | 0.25 | 1.00 | 1.05 | 2024.27 |
| hang-pegdown2 | 0.21 | 1.00 | 0.84 | 5594.73 |
| overall | 0.23 | 1.00 | 0.96 | 3519.73 |

  • VERSION.md is the version log file I use for Markdown Navigator.
  • commonMarkSpec.md is a 33k line file used in the intellij-markdown test suite for performance evaluation.
  • spec.txt is the commonmark spec markdown file in the commonmark-java project.
  • hang-pegdown.md is a file containing a single line of 17 characters [[[[[[[[[[[[[[[[[ which causes pegdown to go into a hyper-exponential parse time.
  • hang-pegdown2.md is a file containing a single line of 18 characters [[[[[[[[[[[[[[[[[[ which causes pegdown to go into a hyper-exponential parse time.
  • wrap.md is a file I was using to test wrap-on-typing performance, only to discover that the slowness has nothing to do with the wrap-on-typing code: pegdown takes 0.1 seconds to parse the file. In the plugin the parsing may happen more than once: syntax highlighter pass, psi tree building pass, external annotator.
  • markdown_example.md is a file with 10,000+ lines containing 500kB+ of text.

Contributing

Pull requests, issues and comments welcome 😄. For pull requests:

  • Add tests for new features and bug fixes, preferably in the ast_spec.md format
  • Follow the existing style to make merging easier, as much as possible: 4 space indent, trailing spaces trimmed.

License

Copyright (c) 2015-2016 Atlassian and others.

Copyright (c) 2016-2023, Vladimir Schneider,

BSD (2-clause) licensed, see LICENSE.txt file.

flexmark-java's People

Contributors

1024c, bashtian, benelog, bvn13, chiwanpark, dependabot[bot], derari, gomiguchi, groxx, haumacher, jinneej, jjybdx4il, jochenberger, markkolich, minidigger, niklasf, niksw7, parth, pcj, prayagverma, qwazer, rems, robinst, roxspring, sentyaev, spand, sparksparrow, tobiasstadler, vnaso, vsch


flexmark-java's Issues

NodeVisitor does not visit all children

If I have a VisitHandler on a (block) node, then NodeVisitor does not visit the children of that (block) node.

For example, with a VisitHandler for BulletList and another for Emphasis, the Emphasis handler is not invoked for emphasis inside lists.

The only way I've found to do this is to override NodeVisitor.visit(Node):

NodeVisitor visitor = new NodeVisitor(
	new VisitHandler<>(BulletList.class, this::visit),
	new VisitHandler<>(Emphasis.class, this::visit))
{
	@Override
	public void visit(Node node) {
		VisitHandler<?> handler = myCustomHandlersMap.get(node.getClass());
		if (handler != null)
			handler.visit(node);
		visitChildren(node);
	}
};
visitor.visit(astRoot);

Is this intended behavior or a bug?

Unique id attribute to correspond with AST object id

Consider the following markdown:

# Header 1
Paragraph 1

Paragraph 2, Line 1
Paragraph 2, Line 2
Paragraph 2, Line 3

Paragraph 3, Line 1
#Header 2

What effort would be involved in making this generate unique, sequential, IDs that correspond to values in the AST? For example:

<h1 id="1">Header 1</h1>
<p id="2">Paragraph 1</p>

<p id="3">
<span id="4">Paragraph 2, Line 1</span>
<span id="5">Paragraph 2, Line 2</span>
<span id="6">Paragraph 2, Line 3</span>
</p>

<p id="7">Paragraph 3, Line 1</p>
<h1 id="8">Header 2</h1>

When the user is editing the markdown text:

Paragraph 2, Line 2|

The AST value for the caret's position would be 5, which corresponds to the <span id="5"> value in the HTML document.
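
The numbering in the example corresponds to a pre-order (document order) walk of the tree. The sketch below only illustrates that numbering scheme: SimpleNode is a hypothetical stand-in and not a flexmark-java class, and how the ids would be attached to rendered HTML is left open.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: sequential ids assigned in document order. */
final class SequentialIds {
    /** Stand-in for an AST node; not part of any library. */
    static final class SimpleNode {
        final String name;
        final List<SimpleNode> children = new ArrayList<>();

        SimpleNode(String name, SimpleNode... kids) {
            this.name = name;
            this.children.addAll(List.of(kids));
        }
    }

    /** Assigns 1-based ids to every node below the root, parents before children. */
    static Map<SimpleNode, Integer> assignIds(SimpleNode root) {
        Map<SimpleNode, Integer> ids = new LinkedHashMap<>();
        for (SimpleNode child : root.children) {
            visit(child, ids);
        }
        return ids;
    }

    private static void visit(SimpleNode node, Map<SimpleNode, Integer> ids) {
        ids.put(node, ids.size() + 1); // next free id, in visiting order
        for (SimpleNode child : node.children) {
            visit(child, ids);
        }
    }
}
```

With a tree shaped like the example (heading, paragraph, a paragraph holding three line spans, paragraph, heading), the second line span receives id 5 and the final heading id 8.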

Table caption support?

Hi,

I'm working on migrating table support in XWiki, moving from pegdown to flexmark-java. I have basic support working, but advanced usages still need some tuning.

For example the following input was working before in pegdown:

col1   |col2    |
-------|--------|
cell11 | cell12 |
cell21 | cell22 |
[caption]

But with flexmark-java and the tables extension, the caption is considered as text.

I see that table caption is mentioned at

but I couldn't find any test for it in the md file.

Could you let me know if this is implemented and if not, if it's planned?

Note that I do have a handler registered for TableCaption but my visit(TableCaption node) is not called with the input given above.

Thanks

Subscript, superscript, and struck text

Some markdown flavours support subscripts, superscripts, and struck text:

October 31^st^
H~2~O
~~Commonmark~~ Flexmark is amazing.

Flexmark extensions for these would be useful.

How to add attribute 'class' to AutoLink node

Hi,
I'm new to flexmark. And this is not a bug report. I just want to get some help here:

  1. How to add 'class' attribute to AutoLink node?
  2. Is it possible to configure FlexMark to render \n as <br/>, for example, FlexMark now renders below markdown:
hello
world

to:

<p>hello\nworld</p>

I want to get below html instead:

<p>hello<br/>world</p>

Thanks in advance.

Hard line breaks do not work if markdown text/files uses CR LF as line separator

flexmark does not seem to handle CR characters in markdown text/files, which breaks hard line breaks (and maybe other things?). If a file uses CR LF as its line separator, hard line breaks do not work.

With CR:

Parser parser = Parser.builder().build();

Node document = parser.parse("aaa  \r\nbbb\\\r\nccc");
System.out.println(new AstCollectingVisitor().collectAndGetAstText(document));

Outputs (no hard line breaks; CR left in text):

Document[0, 16]
  Paragraph[0, 16]
    Text[0, 6] chars:[0, 6, "aaa  \r"]
    SoftLineBreak[6, 7]
    Text[7, 12] chars:[7, 12, "bbb\\r"]
    SoftLineBreak[12, 13]
    Text[13, 16] chars:[13, 16, "ccc"]

Without CR:

Node document2 = parser.parse("aaa  \nbbb\\\nccc");
System.out.println(new AstCollectingVisitor().collectAndGetAstText(document2));

Outputs (correct):

Document[0, 14]
  Paragraph[0, 14]
    Text[0, 3] chars:[0, 3, "aaa"]
    HardLineBreak[3, 6]
    Text[6, 9] chars:[6, 9, "bbb"]
    HardLineBreak[9, 11]
    Text[11, 14] chars:[11, 14, "ccc"]
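
Until the parser handles CR itself, one possible workaround is to normalize line endings before parsing. This is a sketch of that idea with a hypothetical helper name, not something the report proposes. Note that AST offsets would then refer to the normalized text and not to the original file.

```java
/** Hypothetical helper: converts CR LF and lone CR line endings to LF. */
final class LineEndings {
    static String normalize(String markdown) {
        // Replace CR LF pairs first so they do not turn into two line feeds.
        return markdown.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String normalized = normalize("aaa  \r\nbbb\\\r\nccc");
        // The 16-character input from the report becomes the 14-character LF-only input.
        System.out.println(normalized.length());
    }
}
```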

Link reference definitions indented by spaces not recognized

If link reference definitions are indented by spaces, then only the first link reference definition is recognized (without leading spaces, it works).

String markdown = "aaa [link1] bbb [link2] ccc [link3]\n"
		+ "\n"
		+ "   [link1]: http://link1\n"
		+ "   [link2]: http://link2\n"
		+ "   [link3]: http://link3";
Parser parser = Parser.builder().build();
Node document = parser.parse(markdown);
System.out.println(new AstCollectingVisitor().collectAndGetAstText(document));

Outputs:

Document[0, 111]
  Paragraph[0, 36]
    Text[0, 4] chars:[0, 4, "aaa "]
    LinkRef[4, 11] referenceOpen:[4, 5, "["] reference:[5, 10, "link1"] referenceClose:[10, 11, "]"]
      Text[5, 10] chars:[5, 10, "link1"]
    Text[11, 16] chars:[11, 16, " bbb "]
    LinkRef[16, 23] referenceOpen:[16, 17, "["] reference:[17, 22, "link2"] referenceClose:[22, 23, "]"]
      Text[17, 22] chars:[17, 22, "link2"]
    Text[23, 28] chars:[23, 28, " ccc "]
    LinkRef[28, 35] referenceOpen:[28, 29, "["] reference:[29, 34, "link3"] referenceClose:[34, 35, "]"]
      Text[29, 34] chars:[29, 34, "link3"]
  Reference[40, 61] refOpen:[40, 41, "["] ref:[41, 46, "link1"] refClose:[46, 48, "]:"] url:[49, 61, "http://link1"]
  Paragraph[65, 111]
    LinkRef[65, 72] referenceOpen:[65, 66, "["] reference:[66, 71, "link2"] referenceClose:[71, 72, "]"]
      Text[66, 71] chars:[66, 71, "link2"]
    Text[72, 86] chars:[72, 86, ": htt … link2"]
    SoftLineBreak[86, 87]
    LinkRef[90, 97] referenceOpen:[90, 91, "["] reference:[91, 96, "link3"] referenceClose:[96, 97, "]"]
      Text[91, 96] chars:[91, 96, "link3"]
    Text[97, 111] chars:[97, 111, ": htt … link3"]

Surprisingly, if I add the TaskListExtension, then it works as expected:

Parser parser2 = Parser.builder()
		.extensions(Arrays.asList(TaskListExtension.create()))
		.build();
Node document2 = parser2.parse(markdown);
System.out.println(new AstCollectingVisitor().collectAndGetAstText(document2));

Outputs:

Document[0, 111]
  Paragraph[0, 36]
    Text[0, 4] chars:[0, 4, "aaa "]
    LinkRef[4, 11] referenceOpen:[4, 5, "["] reference:[5, 10, "link1"] referenceClose:[10, 11, "]"]
      Text[5, 10] chars:[5, 10, "link1"]
    Text[11, 16] chars:[11, 16, " bbb "]
    LinkRef[16, 23] referenceOpen:[16, 17, "["] reference:[17, 22, "link2"] referenceClose:[22, 23, "]"]
      Text[17, 22] chars:[17, 22, "link2"]
    Text[23, 28] chars:[23, 28, " ccc "]
    LinkRef[28, 35] referenceOpen:[28, 29, "["] reference:[29, 34, "link3"] referenceClose:[34, 35, "]"]
      Text[29, 34] chars:[29, 34, "link3"]
  Reference[40, 61] refOpen:[40, 41, "["] ref:[41, 46, "link1"] refClose:[46, 48, "]:"] url:[49, 61, "http://link1"]
  Reference[65, 86] refOpen:[65, 66, "["] ref:[66, 71, "link2"] refClose:[71, 73, "]:"] url:[74, 86, "http://link2"]
  Reference[90, 111] refOpen:[90, 91, "["] ref:[91, 96, "link3"] refClose:[96, 98, "]:"] url:[99, 111, "http://link3"]

Wrong startOffset in HardLineBreak

The startOffset in HardLineBreak is wrong in two cases:

a) if using backslash at the end of line, the startOffset does not include the backslash

example from ast_spec.md (line 7332):

foo\
bar
.
<p>foo<br />
bar</p>
.
Document[0, 9]
  Paragraph[0, 9]
    Text[0, 3] chars:[0, 3, "foo"]
    HardLineBreak[4, 5]
    Text[5, 8] chars:[5, 8, "bar"]

HardLineBreak starts at 4, but it should start at 3 where the backslash is.

b) if using more than two space characters at the end of a line, only the last two space characters are included in HardLineBreak, but it would IMO make more sense to include all trailing space characters in HardLineBreak

example from ast_spec.md (line 12564):

foo       
baz
.
<p>foo<br />
baz</p>
.
Document[0, 15]
  Paragraph[0, 15]
    Text[0, 3] chars:[0, 3, "foo"]
    HardLineBreak[8, 11]
    Text[11, 14] chars:[11, 14, "baz"]

HardLineBreak starts at 8, but it should start at 3 where the first trailing space character is
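
The start offsets requested in cases a) and b) can be stated as a small function over the source text. The sketch below encodes the reporter's expectation, not the library's behavior: the names are hypothetical, and it ignores details such as an escaped backslash before the line end.

```java
/** Hypothetical sketch of the start offsets expected in this report. */
final class HardBreakOffsets {
    /**
     * Returns the expected start of a hard line break for the line ending at
     * {@code eol} (the index of the '\n'), or -1 if the line does not end in one.
     */
    static int expectedStart(CharSequence text, int eol) {
        // Case a): a trailing backslash belongs to the break.
        if (eol > 0 && text.charAt(eol - 1) == '\\') {
            return eol - 1;
        }
        // Case b): all trailing spaces belong to the break, if there are at least two.
        int start = eol;
        while (start > 0 && text.charAt(start - 1) == ' ') {
            start--;
        }
        return eol - start >= 2 ? start : -1;
    }
}
```

For both spec examples quoted above this yields 3, the offset the report asks for.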

IndentedCodeBlock endOffset too large?

I think the endOffset of IndentedCodeBlock is too large. It is at the start of the next paragraph, so the IndentedCodeBlock includes trailing line separators and empty lines.

On the other hand, the end offset in FencedCodeBlock does not include trailing line separators and empty lines.

String markdown = "\tcode\n\nsome text";

Parser parser = Parser.builder().build();
Node document = parser.parse(markdown);
System.out.println(new AstCollectingVisitor().collectAndGetAstText(document));

Outputs:

Document[0, 16]
  IndentedCodeBlock[1, 7]
  Paragraph[7, 16]
    Text[7, 16] chars:[7, 16, "some text"]

Shouldn't the IndentedCodeBlock end at 5?
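
To make the expected offset concrete, here is a minimal sketch that trims trailing line separators from a block range. `BlockEndTrim` and `trimmedEnd` are hypothetical names, not flexmark-java API, and the sketch only strips `\n` and `\r`, so blank lines that contain spaces are not handled.

```java
// Hypothetical helper, not part of flexmark-java.
final class BlockEndTrim {
    /** Returns the end offset of [start, end) with trailing line separators removed. */
    static int trimmedEnd(CharSequence text, int start, int end) {
        int e = end;
        while (e > start) {
            char c = text.charAt(e - 1);
            if (c != '\n' && c != '\r') {
                break; // first character that belongs to the block content
            }
            e--;
        }
        return e;
    }

    public static void main(String[] args) {
        String markdown = "\tcode\n\nsome text";
        // The reported range is [1, 7]; trimming yields 5.
        System.out.println(trimmedEnd(markdown, 1, 7));
    }
}
```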

Maven Repository

I'd like to use this in my project to have the same result as the Markdown Navigator plugin.

I don't see this plugin in the Maven repository. Do you plan to publish it there?

Is there any simple example code to generate HTML output for a given Markdown file?

Issue with Image reference AST event order

Hi,

I'm trying to migrate from pegdown to flexmark-java for XWiki (http://xwiki.org) and I'm hitting an issue with the order of the AST events when using the following input:

![image.png][1]

[1]: image.png

In pegdown the following methods were called in that order:

  • visit(ReferenceNode referenceNode)
  • visit(RefImageNode refImageNode)

However in flexmark-java it's the opposite:

  • visit(ImageRef node)
  • visit(Reference node)

The issue is that XWiki's own AST model doesn't support image references, so I was generating an image node by resolving the reference. Now this no longer seems possible, since I only get the Reference node after the method receiving the ImageRef has been called.

Is there a way for me to resolve the reference inside my visit(ImageRef node)?
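
One order-independent approach is to collect all reference definitions in a first pass and look them up while visiting. The sketch below shows that idea on raw text with the standard library only. It is not the flexmark-java API: `ReferenceIndex` is a hypothetical class, and the pattern is a simplification that ignores titles and multi-line definitions.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of flexmark-java.
final class ReferenceIndex {
    // Simplified: "[label]: url" at the start of a line, up to three spaces of indent.
    private static final Pattern DEFINITION =
            Pattern.compile("^ {0,3}\\[([^\\]]+)\\]:\\s*(\\S+)", Pattern.MULTILINE);

    private final Map<String, String> urls = new HashMap<>();

    /** First pass: index every definition before any reference is visited. */
    static ReferenceIndex collect(String markdown) {
        ReferenceIndex index = new ReferenceIndex();
        Matcher m = DEFINITION.matcher(markdown);
        while (m.find()) {
            // CommonMark gives precedence to the first matching definition.
            index.urls.putIfAbsent(normalize(m.group(1)), m.group(2));
        }
        return index;
    }

    /** Returns the URL for a label, or null if no definition was collected. */
    String resolve(String label) {
        return urls.get(normalize(label));
    }

    // Labels are compared case-insensitively with whitespace collapsed.
    private static String normalize(String label) {
        return label.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        ReferenceIndex index = collect("![image.png][1]\n\n[1]: image.png");
        System.out.println(index.resolve("1")); // image.png
    }
}
```

With such an index built up front, a visit(ImageRef node) handler could resolve its target without waiting for the Reference node to be visited.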

Thanks

DefinitionList extension doesn't seem to work

Hi, I've just tried using the DefinitionExtension extension but my visit(Definition*) methods are not called. Looking at the sources of the extensions, it seems the code is commented out at

public class DefinitionExtension implements Parser.ParserExtension, HtmlRenderer.HtmlRendererExtension {

I need to add support for definition lists which I was handling before with pegdown (for XWiki). Any idea?

Thanks!

Unclosed FencedCodeBlock endOffset too small

If a fenced code block is missing the closing marker, then the whole text until the end-of-file is code, but the endOffset in FencedCodeBlock is not at the end-of-file. It is at the end of the opening marker.

You can notice this in your fantastic Markdown Navigator plugin: when you enter ~~~ somewhere, the text below does not get a gray background (but is rendered as code in the preview).

String markdown = "~~~\ncode\ncode2\ncode3";

Parser parser = Parser.builder().build();
Node document = parser.parse(markdown);
System.out.println(new AstCollectingVisitor().collectAndGetAstText(document));

Outputs:

Document[0, 20]
  FencedCodeBlock[0, 4] open:[0, 3, "~~~"] content:[4, 20] lines[3]

BTW the AST output of FencedCodeBlock always outputs lines[3]. The number is always 3. Should this output the number of lines in the block?
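
The expected behaviour can be sketched as follows: scan for a closing fence and, if none is found, let the block run to the end of the text. `FenceEnd` and `blockEnd` are hypothetical names, not flexmark-java API, and the closing-fence check is simplified (it trims the line instead of enforcing the indentation limit).

```java
// Hypothetical helper, not part of flexmark-java.
final class FenceEnd {
    /** Returns the end offset of the fenced code block whose opening fence starts at openStart. */
    static int blockEnd(String text, int openStart) {
        char marker = text.charAt(openStart);
        int fenceLen = 0;
        while (openStart + fenceLen < text.length()
                && text.charAt(openStart + fenceLen) == marker) {
            fenceLen++;
        }
        int newline = text.indexOf('\n', openStart);
        while (newline >= 0) {
            int lineStart = newline + 1;
            int lineEnd = text.indexOf('\n', lineStart);
            if (lineEnd < 0) {
                lineEnd = text.length();
            }
            String line = text.substring(lineStart, lineEnd).trim();
            if (line.length() >= fenceLen && isAll(line, marker)) {
                return lineEnd; // closing fence found
            }
            newline = text.indexOf('\n', lineStart);
        }
        return text.length(); // unclosed: the block extends to the end of the text
    }

    private static boolean isAll(String s, char c) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) != c) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(blockEnd("~~~\ncode\ncode2\ncode3", 0)); // 20
    }
}
```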

Documentation about maven dependencies

I tried to use flexmark-java, but simply adding flexmark-java from Maven Central did not work because it does not have the MutableDataSet class (used in the README example). Adding flexmark-util solved this problem, but I am still missing ParserEmulationProfile, Parser and HtmlRenderer. How can I use flexmark-java in my project using Maven dependencies?

Incorrect emphasis close marker source offset

When a closing emphasis delimiter is partially used by an inner delimiter run, the index needs to be adjusted by the number of delimiters used, so that when it is finally processed the closing sequence reflects the correct position.

Add Support for Wiki Images

New Extension overview for flexmark-ext-wikilink

flexmark-java extension for wiki links

Converts references that are wrapped in [[]] into wiki links with optional text separated by |.

Will also convert ![[]] to image links if IMAGE_LINKS extension option is enabled.

Options:

  • DISABLE_RENDERING default false, if true then rendering of wiki links is disabled and they
    will render as plain text of the element node

  • IMAGE_PREFIX default "", prefix to add to wiki link page reference

  • IMAGE_LINKS default false, true will enable ![[]] image link syntax

  • IMAGE_FILE_EXTENSION default "", extension to be added to wiki image file refs

  • LINK_FIRST_SYNTAX default false, if true then [[page ref|link text]] syntax is used,
    otherwise [[link text|page ref]] syntax. Affects both link and image wiki references.

  • LINK_PREFIX default "", prefix to add to wiki link page reference

  • LINK_FILE_EXTENSION default "", extension to be added to wiki link page refs
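
To illustrate how LINK_FIRST_SYNTAX changes the meaning of the two parts, here is a minimal standalone sketch. It does not use the extension itself: `WikiLinkParts` is a hypothetical class, and the `linkFirst` parameter only mirrors the option described above.

```java
// Hypothetical helper, not part of flexmark-ext-wikilink.
final class WikiLinkParts {
    final String text;
    final String pageRef;

    WikiLinkParts(String text, String pageRef) {
        this.text = text;
        this.pageRef = pageRef;
    }

    /**
     * Splits "[[...]]" into link text and page reference.
     * linkFirst == true  reads [[page ref|link text]],
     * linkFirst == false reads [[link text|page ref]].
     */
    static WikiLinkParts parse(String source, boolean linkFirst) {
        if (source.length() < 4 || !source.startsWith("[[") || !source.endsWith("]]")) {
            throw new IllegalArgumentException("not a wiki link: " + source);
        }
        String inner = source.substring(2, source.length() - 2);
        int bar = inner.indexOf('|');
        if (bar < 0) {
            return new WikiLinkParts(inner, inner); // no separator: text equals page ref
        }
        String first = inner.substring(0, bar);
        String second = inner.substring(bar + 1);
        return linkFirst ? new WikiLinkParts(second, first) : new WikiLinkParts(first, second);
    }

    public static void main(String[] args) {
        WikiLinkParts p = parse("[[Home Page|Start here]]", true);
        System.out.println(p.pageRef + " -> " + p.text); // Home Page -> Start here
    }
}
```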

[Question] @ is missed

Hi,
Not sure whether this is a bug or a feature, text:

@someone

is rendered as

someone

Problem with escapes in links

I have as input: [link](\(foo\)) (see http://spec.commonmark.org/0.27/#example-464).

In my visit(Link node) the reference text I get is \(foo\), which means that the escapes are not processed. Thus when I render this in HTML I get: <a href="\(foo\)">... instead of <a href="(foo)">....

Would be nice if flexmark-java could process the escapes and have a way in a Link object to get the processed reference.
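
For reference, the processing asked for here amounts to removing a backslash that precedes an ASCII punctuation character, which is how CommonMark defines backslash escapes. The sketch below is a standalone illustration with hypothetical names (`LinkUnescape`, `unescape`), not the flexmark-java implementation, and it leaves entity references untouched.

```java
// Hypothetical helper, not part of flexmark-java.
final class LinkUnescape {
    // ASCII punctuation characters that a backslash may escape in CommonMark.
    private static final String ESCAPABLE = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";

    /** Removes the backslash from every backslash escape; other backslashes stay literal. */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length() && ESCAPABLE.indexOf(s.charAt(i + 1)) >= 0) {
                out.append(s.charAt(++i)); // keep the escaped character, drop the backslash
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\(foo\\)")); // (foo)
    }
}
```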

WDYT?

HtmlBlock and newline

I have the following input:

hello

<table>
  <tr>
    <td>Foo</td>
  </tr>
</table>

world

And in the following method:

    public void visit(HtmlBlock node)
    {
        getListener().onRawText(String.valueOf(node.getChars()), Syntax.HTML_4_01);
    }

I get the following for String.valueOf(node.getChars()):

<table>\n  <tr>\n    <td>Foo</td>\n  </tr>\n</table>\n

I'm concerned about the last \n. In pegdown I wasn't getting any trailing newline, which seems better IMO since this means the newline char would be issued in another Node.

What's the rationale for including the trailing newline in the HTML node?

Thanks

Add FormatterExtension to Table Extension

When used with the Formatter renderer with default options, it will convert:

day|time|spent
:---|:---:|--:
nov. 2. tue|10:00|4h 40m 
nov. 3. thu|11:00|4h
nov. 7. mon|10:20|4h 20m 
total:|| **13h**

to

| day         | time  |   spent |
|:------------|:-----:|--------:|
| nov. 2. tue | 10:00 |  4h 40m |
| nov. 3. thu | 11:00 |      4h |
| nov. 7. mon | 10:20 |  4h 20m |
| total:             || **13h** |

Out of date documentation for Pegdown Migration Helper

I'm trying to follow this: https://github.com/vsch/flexmark-java#pegdown-migration-helper. However, I don't see the flexmark-profile-pegdown module in maven here: https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22com.vladsch.flexmark%22, and the pom for that module seems like it's kind of old (https://github.com/vsch/flexmark-java/blob/master/flexmark-profile-pegdown/pom.xml) because it references flexmark-java version 0.8.0. Is there a new way to accomplish Pegdown emulation that's not in the README?

Abbreviation node not called when 2 abbreviations

Hi, I'm now adding support for MD abbreviations.

I have added the AbbreviationExtension and registered a handler:

        AbbreviationNodeVisitor abbreviationNodeVisitor = new AbbreviationNodeVisitor(this.visitor, this.listeners);
        this.visitor.addHandlers(
            new VisitHandler<>(Abbreviation.class, abbreviationNodeVisitor::visit)
        );

And in my visit method:

    public void visit(Abbreviation node)
    {
        // Since XWiki doesn't support abbreviations, we generate an HTML <abbr> element.
        String html;
        if (StringUtils.isNotEmpty(node.getAbbreviation())) {
            html = String.format("<abbr title=\"%s\">%s</abbr>", node.getAbbreviation(),
                String.valueOf(node.getChars()));
        } else {
            html = String.format("<abbr>%s</abbr>", String.valueOf(node.getChars()));
        }
        getListener().onRawText(html, Syntax.HTML_4_01);
    }

The problem comes from the following input:

The HTML specification is maintained by the W3C.

*[HTML]: Hyper Text Markup Language
*[W3C]:  World Wide Web Consortium

In my case I get only 1 call for my visit(Abbreviation node) method instead of the 2 I was expecting.

In other words, I don't get a call for the W3C abbreviation.

Any idea?

Thanks again!

Add ability to pass parameters to wiki links and wiki images

Use case 1: be able to support query strings and anchors. Note that the reason for not using reference?a=b&c=d is to be able to easily support having ? in wiki page names.

Use case 2: be able to resize the images.

Syntax proposal: [[label|reference||parameters]]

Examples:

  • [[label|reference||queryString="a='b'&c=d" anchor="anchor"]]
  • [[image||width="300px"]]
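
A minimal sketch of how the parameters part of this proposal could be split off and parsed. This is only an illustration of the proposed syntax, not existing flexmark-java behaviour: `WikiLinkParams` is a hypothetical class, and the key="value" grammar (with unquoted values as a fallback) is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the proposed syntax, not part of flexmark-java.
final class WikiLinkParams {
    // key="quoted value" or key=bareValue (assumed grammar)
    private static final Pattern PARAM =
            Pattern.compile("(\\w+)=(?:\"([^\"]*)\"|(\\S+))");

    /** Returns the parameters after "||" in declaration order, or an empty map if there are none. */
    static Map<String, String> parse(String wikiLink) {
        if (wikiLink.length() < 4 || !wikiLink.startsWith("[[") || !wikiLink.endsWith("]]")) {
            throw new IllegalArgumentException("not a wiki link: " + wikiLink);
        }
        Map<String, String> params = new LinkedHashMap<>();
        String inner = wikiLink.substring(2, wikiLink.length() - 2);
        int sep = inner.indexOf("||");
        if (sep < 0) {
            return params;
        }
        Matcher m = PARAM.matcher(inner.substring(sep + 2));
        while (m.find()) {
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            params.put(m.group(1), value);
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(parse("[[image||width=\"300px\"]]")); // {width=300px}
    }
}
```

Single quotes inside a double-quoted value, as in the queryString example, pass through unchanged.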

Alternative: something like: [[label|reference]](parameters).

Yet another alternative is to make it even more generic and be able to pass parameters to any inline or block syntax element (as it's possible in the xwiki syntax for example). In the xwiki syntax you write the following:

(% a="b" c=d %)<element here>

Example1: (% a="b" c=d %)[[reference]]
Example2: Parameter for a list (e.g if you wish to pass the type of list):

(% style="list-style-type:disc" %)
* item 1
* item 2

Then it's up to the renderer to decide what it'll do with the parameters and what it'll honor.

The last alternative is to do nothing and force the user to use HTML but it's not very nice and I've already had users asking for support for wiki link anchors.

Let me know what you think. I could simply start by implementing support for reference?a=b&c=d for now and provide an escape character when users want to use ? in the wiki page name (or have them URL-encode it).

Thanks

no anchor links in HTML

The AnchorLink extension does not add anchor links to rendered HTML, but they are in the AST.

String markdown = "# head1\n\n## head2\n\nsome text";

Parser parser = Parser.builder()
		.extensions(Arrays.asList(AnchorLinkExtension.create()))
		.build();
Node document = parser.parse(markdown);
System.out.println(new AstCollectingVisitor().collectAndGetAstText(document));

HtmlRenderer renderer = HtmlRenderer.builder()
		.extensions(Arrays.asList(AnchorLinkExtension.create()))
		.build();
System.out.println(renderer.render(document));

The AST (with anchor links):

Document[0, 28]
  Heading[0, 7] textOpen:[0, 1, "#"] text:[2, 7, "head1"]
    AnchorLink[2, 7]
      Text[2, 7] chars:[2, 7, "head1"]
  Heading[9, 17] textOpen:[9, 11, "##"] text:[12, 17, "head2"]
    AnchorLink[12, 17]
      Text[12, 17] chars:[12, 17, "head2"]
  Paragraph[19, 28]
    Text[19, 28] chars:[19, 28, "some text"]

The HTML output (no anchor links):

<h1>head1</h1>
<h2>head2</h2>
<p>some text</p>
