r-lib / commonmark

High Performance CommonMark and Github Markdown Rendering in R

Home Page: https://docs.ropensci.org/commonmark/

License: Other

R 0.86% C 78.43% C++ 20.71%
markdown cmark gfm cmark-gfm

commonmark's Issues

Escape characters in input

A \033 (escape) character is passed through verbatim, which breaks the XML parser:

xml2::read_xml(commonmark::markdown_xml("\033"))
## Error in read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html,  : 
##   PCDATA invalid Char value 27 [9]

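One possible workaround, until the renderer handles this itself, is to drop the control characters that XML 1.0 forbids before rendering. The sketch below uses only base R; `strip_xml_invalid()` is a hypothetical helper name, not part of commonmark.

```r
# Hypothetical helper (not part of commonmark): remove the C0 control
# characters that XML 1.0 does not allow. Tab (\x09), LF (\x0A) and
# CR (\x0D) are legal in XML and are kept.
strip_xml_invalid <- function(x) {
  gsub("[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F]", "", x, perl = TRUE)
}

clean <- strip_xml_invalid("before\033after")
clean
```

The cleaned string can then be passed to commonmark::markdown_xml() and on to xml2::read_xml().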

footnote XML output contains `<<unknown>>` tags

(edit: not sure how the reprex lost its formatting, but I fixed it)

The new footnotes feature might be useful for us at {tinkr}, but I'm not sure how to parse them as each footnote contains identical tags.

from: ropensci/tinkr#92 (comment)

txt <- c("a statement[^1][^2]\n", "[^1]: this is true", "[^2]: this is false")
commonmark::markdown_xml(txt, footnotes = TRUE) |> writeLines()
#> <?xml version="1.0" encoding="UTF-8"?>
#> <!DOCTYPE document SYSTEM "CommonMark.dtd">
#> <document xmlns="http://commonmark.org/xml/1.0">
#>   <paragraph>
#>     <text xml:space="preserve">a statement</text>
#>     <<unknown> />
#>     <<unknown> />
#>   </paragraph>
#>   <<unknown>>
#>     <paragraph>
#>       <text xml:space="preserve">this is true</text>
#>     </paragraph>
#>   </<unknown>>
#>   <<unknown>>
#>     <paragraph>
#>       <text xml:space="preserve">this is false</text>
#>     </paragraph>
#>   </<unknown>>
#> </document>

Created on 2023-03-22 with reprex v2.0.2

target="_blank" not being parsed correctly

👋 thanks for this amazing package. It enables so much. This might be an upstream issue but I thought to bring it here first.

commonmark::markdown_html doesn't seem to parse markdown correctly for links that open in a new tab.

For example commonmark::markdown_html generates this:

commonmark::markdown_html("[RStudio](https://www.rstudio.com/){target='_blank'}")
<p><a href=\"https://www.rstudio.com/\">RStudio</a>{target='_blank'}</p>

When I think this is what it should generate:

<p><a href="https://www.rstudio.com" target="_blank">RStudio</a></p>

Text after a bullet list

I'm having trouble with text after a bullet list in Markdown-ified roxygen and I think it's possible the problem is here. The problem is definitely in the XML produced from the markdown. But there are many things that happen along the way.

I want a bullet list in the description. If I have text right after the list, it gets catenated with the last bullet point. If I add an extra blank line, the text drops out of description and into details.

Here's a pure commonmark example that might explain it. Notice how the text after the bullet list gets absorbed into the bullet item. I tried with both values of hardbreaks. Is this correct behaviour?

library(commonmark)
txt <- "
first line
  * bullet 1
  * bullet 2
second line  
"
txt <- paste0(txt, collapse = "\n")

writeLines(markdown_xml(txt, hardbreaks = FALSE), "hardbreaks-FALSE.xml")
writeLines(markdown_xml(txt, hardbreaks = TRUE), "hardbreaks-TRUE.xml")

hardbreaks-FALSE.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document xmlns="http://commonmark.org/xml/1.0">
  <paragraph>
    <text>first line</text>
  </paragraph>
  <list type="bullet" tight="true">
    <item>
      <paragraph>
        <text>bullet 1</text>
      </paragraph>
    </item>
    <item>
      <paragraph>
        <text>bullet 2</text>
        <softbreak />
        <text>second line</text>
      </paragraph>
    </item>
  </list>
</document>

hardbreaks-TRUE.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document xmlns="http://commonmark.org/xml/1.0">
  <paragraph>
    <text>first line</text>
  </paragraph>
  <list type="bullet" tight="true">
    <item>
      <paragraph>
        <text>bullet 1</text>
      </paragraph>
    </item>
    <item>
      <paragraph>
        <text>bullet 2</text>
        <softbreak />
        <text>second line</text>
      </paragraph>
    </item>
  </list>
</document>
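For what it's worth, this looks like the CommonMark rule for lazy continuation lines: a non-blank line that directly follows a list item's paragraph is folded into that paragraph, regardless of hardbreaks. A minimal sketch, assuming the installed commonmark follows the spec here, shows that a blank line after the list ends it:

```r
library(commonmark)

# Same input as above, but with a blank line between the list and the
# trailing text.
txt <- "first line\n  * bullet 1\n  * bullet 2\n\nsecond line\n"

html <- markdown_html(txt)
cat(html)

# TRUE if the trailing text is rendered as its own paragraph after the list
ends_list <- grepl("</ul>\n<p>second line</p>", html, fixed = TRUE)
ends_list
```

This only covers commonmark itself; how roxygen assigns the resulting paragraph to description or details is a separate question.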

CVE-2023-26485

Hello, the cmark version in this R package is affected by CVE-2023-26485. I am not sure about the practical impact on the package, but to clear the issue out of the way, would it be possible to upgrade? Thanks!

Footnotes support?

It seems that footnotes are supported in GFM:

but somehow this feature is not enabled in the R package?

commonmark::markdown_html('a[^1] \n\n[^1]: test footnote')
#> [1] "<p>a[^1]</p>\n<p>[^1]: test footnote</p>\n"

And worse still, a footnote can be treated as a link definition:

commonmark::markdown_html('a[^1] \n\n[^1]: https://example.com')
#> [1] "<p>a<a href=\"https://example.com\">^1</a></p>\n"

@jeroen Do you know how to enable footnotes support? Thanks!

`commonmark::markdown_text(footnotes = TRUE)` does not strip footnotes

The recently introduced footnote support via param footnotes does not work as one would expect for commonmark::markdown_text(). For all other Markdown features (like emphasizing, links etc.), they are stripped by commonmark::markdown_text() since regular text has no notion of markup.

But the footnotes remain:

md <- "Text *emphasized* and **bold**, with [inline link](https://to.some.where/), [reference link][ref] and footnote[^fn].\n\n[ref]: https://fsf.org\n\n[^fn]: A note.\n"
cat(md)
#> Text *emphasized* and **bold**, with [inline link](https://to.some.where/), [reference link][ref] and footnote[^fn].
#> 
#> [ref]: https://fsf.org
#> 
#> [^fn]: A note.

# without footnote parsing
md |> commonmark::markdown_text(footnotes = FALSE) |> cat()
#> Text emphasized and bold, with inline link, reference link and footnote[^fn].
#> 
#> [^fn]: A note.

# with footnote parsing
md |> commonmark::markdown_text(footnotes = TRUE) |> cat()
#> Text emphasized and bold, with inline link, reference link and footnote[^1].
#> 
#> [^1]: A note.

Created on 2023-03-29 with reprex v2.0.2

Is this really the intended behaviour?

CVE-2022-24724 - integer overflow prior to 0.29.0.gfm.3 and 0.28.3.gfm.21 (cmark extension)

The following vulnerability was published for commonmark

CVE-2022-24724
cmark-gfm is GitHub's extended version of the C reference implementation of CommonMark. Prior to versions 0.29.0.gfm.3 and 0.28.3.gfm.21, an integer overflow in cmark-gfm's table row parsing table.c:row_from_string may lead to heap memory corruption when parsing tables whose marker rows contain more than UINT16_MAX columns. The impact of this heap corruption ranges from Information Leak to Arbitrary Code Execution depending on how and where cmark-gfm is used. If cmark-gfm is used for rendering remote user controlled markdown, this vulnerability may lead to Remote Code Execution (RCE) in applications employing affected versions of the cmark-gfm library. This vulnerability has been patched in the following cmark-gfm versions 0.29.0.gfm.3 and 0.28.3.gfm.21. A workaround is available. The vulnerability exists in the table markdown extensions of cmark-gfm. Disabling the table extension will prevent this vulnerability from being triggered.

Further information

Kind regards, Andreas.

Some sort of abstract representation?

Motivated by a particular use I have, but presumably there's some sort of intermediate representation available in commonmark? Any chance of exposing that through R so that custom renderers can be written?

Thinking of something that could render markdown to marked up text using crayon in the terminal...

But perhaps I'm way off how this actually works under the hood.

Tagfilter extension is not applied

I may be misunderstanding the prescribed usage of the tagfilter extension, but it doesn't seem to be working.

library(commonmark)

markdown_commonmark("<title><style></style></title>", extensions = "tagfilter")
#> [1] "<title><style></style></title>\n"
markdown_html("<title><style></style></title>", extensions = "tagfilter")
#> [1] "<title><style></style></title>\n"

In both cases, the spec indicates that we should expect

"&lt;title>&lt;style></style></title>\n"

Session info

devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                                      
#>  version  R version 3.6.3 Patched (2020-04-28 r79534)
#>  os       macOS  10.16                               
#>  system   x86_64, darwin15.6.0                       
#>  ui       X11                                        
#>  language (EN)                                       
#>  collate  en_US.UTF-8                                
#>  ctype    en_US.UTF-8                                
#>  tz       America/New_York                           
#>  date     2020-12-28                                 
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date       lib source                            
#>  assertthat    0.2.1      2019-03-21 [1] standard (@0.2.1)                 
#>  callr         3.5.1      2020-10-13 [1] standard (@3.5.1)                 
#>  cli           2.2.0      2020-11-20 [1] standard (@2.2.0)                 
#>  commonmark  * 1.7        2018-12-01 [1] standard (@1.7)                   
#>  crayon        1.3.4      2017-09-16 [1] standard (@1.3.4)                 
#>  desc          1.2.0      2018-05-01 [1] standard (@1.2.0)                 
#>  devtools      2.3.2      2020-09-18 [1] standard (@2.3.2)                 
#>  digest        0.6.27     2020-10-24 [1] standard (@0.6.27)                
#>  ellipsis      0.3.1      2020-05-15 [1] standard (@0.3.1)                 
#>  evaluate      0.14       2019-05-28 [1] standard (@0.14)                  
#>  fansi         0.4.1      2020-01-08 [1] standard (@0.4.1)                 
#>  fs            1.5.0      2020-07-31 [1] standard (@1.5.0)                 
#>  glue          1.4.2      2020-08-27 [1] standard (@1.4.2)                 
#>  highr         0.8        2019-03-20 [1] standard (@0.8)                   
#>  htmltools     0.5.0.9003 2020-12-04 [1] Github (rstudio/htmltools@d18bd8e)
#>  knitr         1.30       2020-09-22 [1] standard (@1.30)                  
#>  lifecycle     0.2.0      2020-03-06 [1] standard (@0.2.0)                 
#>  magrittr      2.0.1      2020-11-17 [1] standard (@2.0.1)                 
#>  memoise       1.1.0      2017-04-21 [1] standard (@1.1.0)                 
#>  pkgbuild      1.1.0      2020-07-13 [1] standard (@1.1.0)                 
#>  pkgload       1.1.0      2020-05-29 [1] standard (@1.1.0)                 
#>  prettyunits   1.1.1      2020-01-24 [1] standard (@1.1.1)                 
#>  processx      3.4.4      2020-09-03 [1] standard (@3.4.4)                 
#>  ps            1.4.0      2020-10-07 [1] standard (@1.4.0)                 
#>  purrr         0.3.4      2020-04-17 [1] standard (@0.3.4)                 
#>  R6            2.5.0      2020-10-28 [1] standard (@2.5.0)                 
#>  remotes       2.2.0      2020-07-21 [1] standard (@2.2.0)                 
#>  rlang         0.4.9      2020-11-26 [1] standard (@0.4.9)                 
#>  rmarkdown     2.5        2020-10-21 [1] standard (@2.5)                   
#>  rprojroot     2.0.2      2020-11-15 [1] standard (@2.0.2)                 
#>  sessioninfo   1.1.1      2018-11-05 [1] standard (@1.1.1)                 
#>  stringi       1.5.3      2020-09-09 [1] standard (@1.5.3)                 
#>  stringr       1.4.0      2019-02-10 [1] standard (@1.4.0)                 
#>  testthat      3.0.0      2020-10-31 [1] standard (@3.0.0)                 
#>  usethis       2.0.0.9000 2020-12-10 [1] Github (r-lib/usethis@f96bf2e)    
#>  withr         2.3.0      2020-09-22 [1] standard (@2.3.0)                 
#>  xfun          0.19       2020-10-30 [1] standard (@0.19)                  
#>  yaml          2.2.1      2020-02-01 [1] standard (@2.2.1)                 
#> 
#> [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

Move `master` branch to `main`

The master branch of this repository will soon be renamed to main, as part of a coordinated change across several GitHub organizations (including, but not limited to: tidyverse, r-lib, tidymodels, and sol-eng). We anticipate this will happen by the end of September 2021.

That will be preceded by a release of the usethis package, which will gain some functionality around detecting and adapting to a renamed default branch. There will also be a blog post at the time of this master --> main change.

The purpose of this issue is to:

  • Help us firm up the list of targeted repositories
  • Make sure all maintainers are aware of what's coming
  • Give us an issue to close when the job is done
  • Give us a place to put advice for collaborators re: how to adapt

message id: euphoric_snowdog

Feature request: make cmark functions callable

I'd like to insert a filter into the parsing process: parse the markdown input, execute my filter, render to output format. I think this would be relatively straightforward if I was writing the filter in C.

Doing this would require that the cmark header files be made available, and entry points be registered using R_RegisterCCallable, and probably some more things I don't know about.

How to handle empty lines in md

I am trying to parse this Markdown file

It's full of empty lines due to knitr rendering it from Rmd I guess. On GitHub it renders well. But when I try to parse it I cannot get the structure that's in the .Rmd: the table is either separated in different blocks, or if I remove empty lines, it gets glued to the rest of the README.

rmd <- "https://raw.githubusercontent.com/ropensci/drake/master/README.Rmd"

md <- "https://raw.githubusercontent.com/ropensci/drake/master/README.md"


library("magrittr")
rmd %>%
  readLines() %>%
  commonmark::markdown_xml(extensions = TRUE) %>%
  xml2::read_xml()
#> {xml_document}
#> <document xmlns="http://commonmark.org/xml/1.0">
#>  [1] <thematic_break/>
#>  [2] <heading level="2">\n  <text>output:</text>\n  <softbreak/>\n  <tex ...
#>  [3] <html_block>&lt;!-- README.md is generated from README.Rmd. Please  ...
#>  [4] <code_block info="{r knitrsetup, echo = FALSE}">knitr::opts_chunk$s ...
#>  [5] <code_block info="{r mainexample, echo = FALSE}">suppressMessages(s ...
#>  [6] <html_block>&lt;center&gt;\n&lt;img src="https://ropensci.github.io ...
#>  [7] <html_block>&lt;table class="table"&gt;&lt;thead&gt;&lt;tr class="h ...
#>  [8] <heading level="1">\n  <text>The drake R package </text>\n  <html_i ...
#>  [9] <paragraph>\n  <code>drake</code>\n  <text> — or, Data Frames in R  ...
#> [10] <heading level="1">\n  <text>What gets done stays done.</text>\n</h ...
#> [11] <paragraph>\n  <text>Too many data science projects follow a </text ...
#> [12] <list type="ordered" start="1" delim="period" tight="true">\n  <ite ...
#> [13] <paragraph>\n  <text>It is hard to avoid restarting from scratch.</ ...
#> [14] <html_block>&lt;center&gt;\n&lt;a href="https://twitter.com/fossilo ...
#> [15] <paragraph>\n  <text>With </text>\n  <code>drake</code>\n  <text>,  ...
#> [16] <list type="ordered" start="1" delim="period" tight="true">\n  <ite ...
#> [17] <heading level="1">\n  <text>How it works</text>\n</heading>
#> [18] <paragraph>\n  <text>To set up a project, load your packages,</text ...
#> [19] <code_block info="{r mainpackages}">library(drake)\nlibrary(dplyr)\ ...
#> [20] <paragraph>\n  <text>load your custom functions,</text>\n</paragraph>
#> ...

md %>%
  readLines() %>%
  commonmark::markdown_xml(extensions = FALSE) %>%
  xml2::read_xml()
#> {xml_document}
#> <document xmlns="http://commonmark.org/xml/1.0">
#>  [1] <html_block>&lt;!-- README.md is generated from README.Rmd. Please  ...
#>  [2] <html_block>&lt;center&gt;\n</html_block>
#>  [3] <html_block>&lt;img src="https://ropensci.github.io/drake/images/in ...
#>  [4] <html_block>&lt;/center&gt;\n</html_block>
#>  [5] <html_block>&lt;table class="table"&gt;\n</html_block>
#>  [6] <html_block>&lt;thead&gt;\n</html_block>
#>  [7] <html_block>&lt;tr class="header"&gt;\n</html_block>
#>  [8] <html_block>&lt;th align="left"&gt;\n</html_block>
#>  [9] <paragraph>\n  <text>Release</text>\n</paragraph>
#> [10] <html_block>&lt;/th&gt;\n</html_block>
#> [11] <html_block>&lt;th align="left"&gt;\n</html_block>
#> [12] <paragraph>\n  <text>Usage</text>\n</paragraph>
#> [13] <html_block>&lt;/th&gt;\n</html_block>
#> [14] <html_block>&lt;th align="left"&gt;\n</html_block>
#> [15] <paragraph>\n  <text>Development</text>\n</paragraph>
#> [16] <html_block>&lt;/th&gt;\n</html_block>
#> [17] <html_block>&lt;/tr&gt;\n</html_block>
#> [18] <html_block>&lt;/thead&gt;\n</html_block>
#> [19] <html_block>&lt;tbody&gt;\n</html_block>
#> [20] <html_block>&lt;tr class="odd"&gt;\n</html_block>
#> ...

md %>%
  readLines() %>%
  .[. != ""] %>%
  commonmark::markdown_xml(extensions = FALSE) %>%
  xml2::read_xml()
#> {xml_document}
#> <document xmlns="http://commonmark.org/xml/1.0">
#> [1] <html_block>&lt;!-- README.md is generated from README.Rmd. Please e ...
#> [2] <html_block>&lt;center&gt;\n&lt;img src="https://ropensci.github.io/ ...

Created on 2018-09-04 by the reprex package (v0.2.0).

markdown_html() crashes or fails on RStudio Server on CentOS

I don't know why, but markdown_html() and markdown_xml() don't work well on the console of RStudio Server on CentOS 7.

More precisely, if I install commonmark package 1.6 and run commonmark::markdown_html("## foo"),

  • on RStudio Server 1.1.463 (stable)'s console, it crashes with segmentation fault. (Error message in syslog: kernel: rsession[93474]: segfault at 7f1792c96811 ip 00007f17a3331c97 sp 00007fff29b6f998 error 6 in libc-2.17.so[7f17a31df000+1b8000])
  • on RStudio Server 1.2.1139 (preview)'s console, it returns a wrong result ("<h2></h2>\n").

Note that it returns the correct result ("<h2>foo</h2>\n") if I run the code

  • on R REPL on RStudio's terminal pane
  • on R REPL on an SSH session
  • with commonmark 1.5
  • on RStudio Server's console on Debian

How to reproduce

  1. Run CentOS Docker image
docker run -p 8787:8787 -it --rm centos:7 bash
  2. Install and run RStudio Server
# Download RPM (`1.1.463` is the stable version)
curl -o rstudio.rpm https://download2.rstudio.org/rstudio-server-rhel-1.1.463-x86_64.rpm
yum install rstudio.rpm

# Add rstudio user
useradd rstudio
passwd rstudio

# Launch RStudio Server
/usr/lib/rstudio-server/bin/rserver
  3. Browse to http://localhost:8787 and log in
  4. Install commonmark package and run this code
commonmark::markdown_html("## foo")

Output to Rd

How hard would it be? Then I could use it in roxygen

Any way to flag escaped characters on input?

commonmark supports both bare square brackets and escaped square brackets. When they enter the parser, there's no indication which set of brackets were escaped and which ones were bare:

commonmark::markdown_xml("[bare brackets] \\[escaped brackets\\]") |> writeLines()
#> <?xml version="1.0" encoding="UTF-8"?>
#> <!DOCTYPE document SYSTEM "CommonMark.dtd">
#> <document xmlns="http://commonmark.org/xml/1.0">
#>   <paragraph>
#>     <text xml:space="preserve">[bare brackets] [escaped brackets]</text>
#>   </paragraph>
#> </document>

Created on 2022-09-19 with reprex v2.0.2

Is there a way to have the parser indicate which characters were escaped in the source document?

Parse NEWS file

Link to libxml2 and use xpath to parse a NEWS file based on markdown_xml output.
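A rough sketch of the idea from the R side, using the xml2 package (which wraps libxml2) rather than linking to libxml2 directly. The NEWS text is made up for illustration, and the sketch assumes each release is a level-1 heading.

```r
library(commonmark)
library(xml2)

# Made-up NEWS content, for illustration only
news <- c(
  "# pkg 1.1",
  "",
  "* Fixed a bug.",
  "",
  "# pkg 1.0",
  "",
  "* First release."
)

doc <- read_xml(markdown_xml(news))
xml_ns_strip(doc)  # drop the default CommonMark namespace so plain XPath works

# One entry per release heading
releases <- trimws(xml_text(xml_find_all(doc, "//heading[@level='1']")))
releases
```

The same XPath approach could be extended to collect the list items that follow each heading.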
