r-lib / commonmark

High Performance CommonMark and Github Markdown Rendering in R

Home Page: https://docs.ropensci.org/commonmark/

License: Other

R 0.86% C 78.43% C++ 20.71%

markdown cmark gfm cmark-gfm

commonmark's Issues

footnote XML output contains `<<unknown>>` tags

(edit: not sure how the reprex lost its formatting, but I fixed it)

The new footnotes feature might be useful for us at {tinkr}, but I'm not sure how to parse them as each footnote contains identical tags.

from: ropensci/tinkr#92 (comment)

txt <- c("a statement[^1][^2]\n", "[^1]: this is true", "[^2]: this is false")
commonmark::markdown_xml(txt, footnotes = TRUE) |> writeLines()
#> <?xml version="1.0" encoding="UTF-8"?>
#> <!DOCTYPE document SYSTEM "CommonMark.dtd">
#> <document xmlns="http://commonmark.org/xml/1.0">
#>   <paragraph>
#>     <text xml:space="preserve">a statement</text>
#>     <<unknown> />
#>     <<unknown> />
#>   </paragraph>
#>   <<unknown>>
#>     <paragraph>
#>       <text xml:space="preserve">this is true</text>
#>     </paragraph>
#>   </<unknown>>
#>   <<unknown>>
#>     <paragraph>
#>       <text xml:space="preserve">this is false</text>
#>     </paragraph>
#>   </<unknown>>
#> </document>

Created on 2023-03-22 with reprex v2.0.2

target="_blank" not being parsed correctly

👋 thanks for this amazing package. It enables so much. This might be an upstream issue but I thought to bring it here first.

commonmark::markdown_html doesn't seem to parse markdown correctly for links that open in a new tab.

For example commonmark::markdown_html generates this:

commonmark::markdown_html("[RStudio](https://www.rstudio.com/){target='_blank'}")

<p><a href=\"https://www.rstudio.com/\">RStudio</a>{target='_blank'}</p>

When I think this is what it should generate:

<p><a href="https://www.rstudio.com" target="_blank">RStudio</a></p>
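The `{target='_blank'}` attribute block is not part of the CommonMark syntax, so the parser leaves it as literal text. One possible workaround is to post-process the rendered HTML. The sketch below is only an illustration: `add_link_target()` is a hypothetical helper, not part of commonmark, and the regular expression assumes the simple one-attribute form shown in this report.

```r
# Hypothetical helper (not part of commonmark): move a trailing
# {target='...'} block into the <a> tag that precedes it.
add_link_target <- function(html) {
  pattern <- "<a ([^>]*)>(.*?)</a>\\{target=['\"]([^'\"}]*)['\"]\\}"
  gsub(pattern, "<a \\1 target=\"\\3\">\\2</a>", html, perl = TRUE)
}

# HTML as produced in the report above.
html <- "<p><a href=\"https://www.rstudio.com/\">RStudio</a>{target='_blank'}</p>"
fixed <- add_link_target(html)
cat(fixed, "\n")
```

A regex like this only covers the single pattern shown; anything more general would need a proper HTML parser.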

I'm having trouble with text after a bullet list in Markdown-ified roxygen and I think it's possible the problem is here. The problem is definitely in the XML produced from the markdown. But there are many things that happen along the way.

I want a bullet list in the description. If I have text right after the list, it gets concatenated with the last bullet point. If I add an extra blank line, the text drops out of the description and into details.

Here's a pure commonmark example that might explain it. Notice how the text after the bullet list gets absorbed into the bullet item. I tried with both values of hardbreaks. Is this correct behaviour?

library(commonmark)
txt <- "
first line
  * bullet 1
  * bullet 2
second line  
"
txt <- paste0(txt, collapse = "\n")

writeLines(markdown_xml(txt, hardbreaks = FALSE), "hardbreaks-FALSE.xml")
writeLines(markdown_xml(txt, hardbreaks = TRUE), "hardbreaks-TRUE.xml")

hardbreaks-FALSE.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document xmlns="http://commonmark.org/xml/1.0">
  <paragraph>
    <text>first line</text>
  </paragraph>
  <list type="bullet" tight="true">
    <item>
      <paragraph>
        <text>bullet 1</text>
      </paragraph>
    </item>
    <item>
      <paragraph>
        <text>bullet 2</text>
        <softbreak />
        <text>second line</text>
      </paragraph>
    </item>
  </list>
</document>

hardbreaks-TRUE.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document xmlns="http://commonmark.org/xml/1.0">
  <paragraph>
    <text>first line</text>
  </paragraph>
  <list type="bullet" tight="true">
    <item>
      <paragraph>
        <text>bullet 1</text>
      </paragraph>
    </item>
    <item>
      <paragraph>
        <text>bullet 2</text>
        <softbreak />
        <text>second line</text>
      </paragraph>
    </item>
  </list>
</document>
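For what it's worth, the CommonMark spec treats a line that directly follows a list item's paragraph, with no blank line in between, as a lazy continuation of that paragraph, which matches the output above. A minimal sketch of the blank-line variant in pure commonmark follows (it says nothing about how roxygen then splits description and details):

```r
library(commonmark)

# Same input as above, but with a blank line after the last bullet.
txt <- "first line\n  * bullet 1\n  * bullet 2\n\nsecond line\n"
xml <- markdown_xml(txt)
writeLines(xml)

# Position of the end of the list and of the trailing text in the XML.
list_end  <- regexpr("</list>", xml, fixed = TRUE)
text_pos  <- regexpr("second line", xml, fixed = TRUE)
```

With the blank line in place, the trailing text should appear as its own paragraph after the list rather than inside the last item.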

CVE-2020-5238

Hi,

cmark-gfm is affected by CVE-2020-5238 and consequently the R package is also affected.

Best,
Dylan

CVE-2023-26485

Hello, the cmark version in this R package is affected by CVE-2023-26485. I am not sure about the practical impact on the package, but to clear the issue out of the way, would it be possible to upgrade? Thanks!

Footnotes support?

It seems that footnotes are supported in GFM:

but somehow this feature is not enabled in the R package?

commonmark::markdown_html('a[^1] \n\n[^1]: test footnote')
#> [1] "<p>a[^1]</p>\n<p>[^1]: test footnote</p>\n"

And worse still, a footnote can be treated as a link definition:

commonmark::markdown_html('a[^1] \n\n[^1]: https://example.com')
#> [1] "<p>a<a href=\"https://example.com\">^1</a></p>\n"

@jeroen Do you know how to enable footnotes support? Thanks!
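Other reports on this page call the renderers with `footnotes = TRUE`. Assuming an installed commonmark release that has this argument, a minimal comparison might look like the following sketch:

```r
library(commonmark)

md <- "a[^1]\n\n[^1]: test footnote"

# Default: the footnote syntax is left as literal text.
plain <- markdown_html(md)

# With footnote parsing switched on (assumes the argument is available).
with_notes <- markdown_html(md, footnotes = TRUE)
cat(with_notes)
```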

`commonmark::markdown_text(footnotes = TRUE)` does not strip footnotes

The recently introduced footnote support (the footnotes parameter) does not work as one would expect for commonmark::markdown_text(). All other Markdown features (emphasis, links, etc.) are stripped by commonmark::markdown_text(), since plain text has no notion of markup.

But the footnotes remain:

md <- "Text *emphasized* and **bold**, with [inline link](https://to.some.where/), [reference link][ref] and footnote[^fn].\n\n[ref]: https://fsf.org\n\n[^fn]: A note.\n"
cat(md)
#> Text *emphasized* and **bold**, with [inline link](https://to.some.where/), [reference link][ref] and footnote[^fn].
#> 
#> [ref]: https://fsf.org
#> 
#> [^fn]: A note.

# without footnote parsing
md |> commonmark::markdown_text(footnotes = FALSE) |> cat()
#> Text emphasized and bold, with inline link, reference link and footnote[^fn].
#> 
#> [^fn]: A note.

# with footnote parsing
md |> commonmark::markdown_text(footnotes = TRUE) |> cat()
#> Text emphasized and bold, with inline link, reference link and footnote[^1].
#> 
#> [^1]: A note.

Created on 2023-03-29 with reprex v2.0.2

Is this really the intended behaviour?
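Until the behaviour is settled, the markers can be removed after the fact. The sketch below is an assumption-laden workaround, not commonmark functionality: `strip_footnotes()` is a hypothetical helper that only handles single-line footnote definitions like the one in this reprex.

```r
# Text output as shown above for footnotes = TRUE.
txt <- "Text emphasized and bold, with inline link, reference link and footnote[^1].\n\n[^1]: A note.\n"

# Hypothetical helper: drop footnote definitions and inline references.
strip_footnotes <- function(x) {
  lines <- strsplit(x, "\n", fixed = TRUE)[[1]]
  lines <- lines[!grepl("^\\[\\^[^]]+\\]:", lines)]  # definition lines
  lines <- gsub("\\[\\^[^]]+\\]", "", lines)          # inline references
  trimws(paste(lines, collapse = "\n"), which = "right")
}

stripped <- strip_footnotes(txt)
cat(stripped, "\n")
```

Definitions that continue over several indented lines would need extra handling.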

CVE-2022-24724 - integer overflow prior to 0.29.0.gfm.3 and 0.28.3.gfm.21 (cmark extension)

The following vulnerability was published for commonmark

CVE-2022-24724
cmark-gfm is GitHub's extended version of the C reference implementation of CommonMark. Prior to versions 0.29.0.gfm.3 and 0.28.3.gfm.21, an integer overflow in cmark-gfm's table row parsing table.c:row_from_string may lead to heap memory corruption when parsing tables whose marker rows contain more than UINT16_MAX columns. The impact of this heap corruption ranges from Information Leak to Arbitrary Code Execution depending on how and where cmark-gfm is used. If cmark-gfm is used for rendering remote user controlled markdown, this vulnerability may lead to Remote Code Execution (RCE) in applications employing affected versions of the cmark-gfm library. This vulnerability has been patched in the following cmark-gfm versions 0.29.0.gfm.3 and 0.28.3.gfm.21. A workaround is available. The vulnerability exists in the table markdown extensions of cmark-gfm. Disabling the table extension will prevent this vulnerability from being triggered.
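On the R side, the workaround described in the advisory could be sketched as follows. This assumes that passing a character vector to `extensions` enables only the named extensions, as in the tagfilter report on this page. The input string is made up for illustration.

```r
library(commonmark)

# Enable every available extension except "table".
safe_exts <- setdiff(list_extensions(), "table")

# Illustrative untrusted input containing pipe-table syntax.
untrusted <- "| a | b |\n|---|---|\n| 1 | 2 |"
html <- markdown_html(untrusted, extensions = safe_exts)
cat(html)
```

With the table extension left out, the pipe syntax should come through as ordinary paragraph text instead of reaching the table parser.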

Further information

Kind regards, Andreas.

Some sort of abstract representation?

This is motivated by a particular use case I have: presumably there's some sort of intermediate representation available in commonmark? Any chance of exposing that through R so that custom renderers can be written?

Thinking of something that could render markdown to marked up text using crayon in the terminal...

But perhaps I'm way off how this actually works under the hood.
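One route that already exists is the XML output, which can be read with xml2 and walked like a syntax tree. The renderer below is a rough sketch under that assumption: `render_node()` is a hypothetical function, it uses raw ANSI escape codes rather than crayon, and it handles only a handful of node types.

```r
library(commonmark)
library(xml2)

# Hypothetical renderer: walk the XML tree and emit ANSI-styled text.
render_node <- function(node) {
  # Render element children first; leaf nodes yield an empty string here.
  inner <- paste(vapply(xml_children(node), render_node, character(1)),
                 collapse = "")
  switch(xml_name(node),
    text      = xml_text(node),
    code      = xml_text(node),
    softbreak = " ",
    emph      = paste0("\033[3m", inner, "\033[23m"),  # italic on/off
    strong    = paste0("\033[1m", inner, "\033[22m"),  # bold on/off
    paragraph = paste0(inner, "\n"),
    inner  # document and unhandled nodes: pass children through
  )
}

doc <- read_xml(markdown_xml("Some *emphasised* and **bold** text"))
out <- render_node(doc)
cat(out)
```

Headings, lists and links would each need their own branch in the `switch()`.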

Tagfilter extension is not applied

I may be misunderstanding the prescribed usage of the tagfilter extension, but it doesn't seem to be working.

library(commonmark)

markdown_commonmark("<title><style></style></title>", extensions = "tagfilter")
#> [1] "<title><style></style></title>\n"
markdown_html("<title><style></style></title>", extensions = "tagfilter")
#> [1] "<title><style></style></title>\n"

In both cases, the spec indicates that we should expect

"&lt;title>&lt;style></style></title>\n"

Session info

devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                                      
#>  version  R version 3.6.3 Patched (2020-04-28 r79534)
#>  os       macOS  10.16                               
#>  system   x86_64, darwin15.6.0                       
#>  ui       X11                                        
#>  language (EN)                                       
#>  collate  en_US.UTF-8                                
#>  ctype    en_US.UTF-8                                
#>  tz       America/New_York                           
#>  date     2020-12-28                                 
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date       lib source                            
#>  assertthat    0.2.1      2019-03-21 [1] standard (@0.2.1)                 
#>  callr         3.5.1      2020-10-13 [1] standard (@3.5.1)                 
#>  cli           2.2.0      2020-11-20 [1] standard (@2.2.0)                 
#>  commonmark  * 1.7        2018-12-01 [1] standard (@1.7)                   
#>  crayon        1.3.4      2017-09-16 [1] standard (@1.3.4)                 
#>  desc          1.2.0      2018-05-01 [1] standard (@1.2.0)                 
#>  devtools      2.3.2      2020-09-18 [1] standard (@2.3.2)                 
#>  digest        0.6.27     2020-10-24 [1] standard (@0.6.27)                
#>  ellipsis      0.3.1      2020-05-15 [1] standard (@0.3.1)                 
#>  evaluate      0.14       2019-05-28 [1] standard (@0.14)                  
#>  fansi         0.4.1      2020-01-08 [1] standard (@0.4.1)                 
#>  fs            1.5.0      2020-07-31 [1] standard (@1.5.0)                 
#>  glue          1.4.2      2020-08-27 [1] standard (@1.4.2)                 
#>  highr         0.8        2019-03-20 [1] standard (@0.8)                   
#>  htmltools     0.5.0.9003 2020-12-04 [1] Github (rstudio/htmltools@d18bd8e)
#>  knitr         1.30       2020-09-22 [1] standard (@1.30)                  
#>  lifecycle     0.2.0      2020-03-06 [1] standard (@0.2.0)                 
#>  magrittr      2.0.1      2020-11-17 [1] standard (@2.0.1)                 
#>  memoise       1.1.0      2017-04-21 [1] standard (@1.1.0)                 
#>  pkgbuild      1.1.0      2020-07-13 [1] standard (@1.1.0)                 
#>  pkgload       1.1.0      2020-05-29 [1] standard (@1.1.0)                 
#>  prettyunits   1.1.1      2020-01-24 [1] standard (@1.1.1)                 
#>  processx      3.4.4      2020-09-03 [1] standard (@3.4.4)                 
#>  ps            1.4.0      2020-10-07 [1] standard (@1.4.0)                 
#>  purrr         0.3.4      2020-04-17 [1] standard (@0.3.4)                 
#>  R6            2.5.0      2020-10-28 [1] standard (@2.5.0)                 
#>  remotes       2.2.0      2020-07-21 [1] standard (@2.2.0)                 
#>  rlang         0.4.9      2020-11-26 [1] standard (@0.4.9)                 
#>  rmarkdown     2.5        2020-10-21 [1] standard (@2.5)                   
#>  rprojroot     2.0.2      2020-11-15 [1] standard (@2.0.2)                 
#>  sessioninfo   1.1.1      2018-11-05 [1] standard (@1.1.1)                 
#>  stringi       1.5.3      2020-09-09 [1] standard (@1.5.3)                 
#>  stringr       1.4.0      2019-02-10 [1] standard (@1.4.0)                 
#>  testthat      3.0.0      2020-10-31 [1] standard (@3.0.0)                 
#>  usethis       2.0.0.9000 2020-12-10 [1] Github (r-lib/usethis@f96bf2e)    
#>  withr         2.3.0      2020-09-22 [1] standard (@2.3.0)                 
#>  xfun          0.19       2020-10-30 [1] standard (@0.19)                  
#>  yaml          2.2.1      2020-02-01 [1] standard (@2.2.1)                 
#> 
#> [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

Move `master` branch to `main`

The master branch of this repository will soon be renamed to main, as part of a coordinated change across several GitHub organizations (including, but not limited to: tidyverse, r-lib, tidymodels, and sol-eng). We anticipate this will happen by the end of September 2021.

That will be preceded by a release of the usethis package, which will gain some functionality around detecting and adapting to a renamed default branch. There will also be a blog post at the time of this master --> main change.

The purpose of this issue is to:

Help us firm up the list of targeted repositories
Make sure all maintainers are aware of what's coming
Give us an issue to close when the job is done
Give us a place to put advice for collaborators re: how to adapt

message id: euphoric_snowdog

Feature request: make cmark functions callable

I'd like to insert a filter into the parsing process: parse the markdown input, execute my filter, render to output format. I think this would be relatively straightforward if I was writing the filter in C.

Doing this would require that the cmark header files be made available, and entry points be registered using R_RegisterCCallable, and probably some more things I don't know about.

subscript text is converted to strikethrough

I noticed this in ropensci/tinkr#99: subscript text (at least in Pandoc markdown) is converted to strikethrough (with two ~) on a round trip from markdown -> XML -> markdown

commonmark::markdown_commonmark("H~2~O", extensions = TRUE)
#> [1] "H~~2~~O\n"

Created on 2023-08-18 with reprex v2.0.2
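If strikethrough is not needed in the documents being processed, one possible workaround is to enable the other extensions by name and leave strikethrough out. This is a sketch under that assumption. Whether the remaining tildes are written back escaped or unescaped is not something this example asserts.

```r
library(commonmark)

# Enable every extension except strikethrough (assumption: strikethrough
# is not needed in these documents).
exts <- setdiff(list_extensions(), "strikethrough")
out <- markdown_commonmark("H~2~O", extensions = exts)
cat(out)
```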

How to handle empty lines in md

I am trying to parse this Markdown file

It's full of empty lines due to knitr rendering it from Rmd I guess. On GitHub it renders well. But when I try to parse it I cannot get the structure that's in the .Rmd: the table is either separated in different blocks, or if I remove empty lines, it gets glued to the rest of the README.

rmd <- "https://raw.githubusercontent.com/ropensci/drake/master/README.Rmd"

md <- "https://raw.githubusercontent.com/ropensci/drake/master/README.md"


library("magrittr")
rmd %>%
  readLines() %>%
  commonmark::markdown_xml(extensions = TRUE) %>%
  xml2::read_xml()
#> {xml_document}
#> <document xmlns="http://commonmark.org/xml/1.0">
#>  [1] <thematic_break/>
#>  [2] <heading level="2">\n  <text>output:</text>\n  <softbreak/>\n  <tex ...
#>  [3] <html_block>&lt;!-- README.md is generated from README.Rmd. Please  ...
#>  [4] <code_block info="{r knitrsetup, echo = FALSE}">knitr::opts_chunk$s ...
#>  [5] <code_block info="{r mainexample, echo = FALSE}">suppressMessages(s ...
#>  [6] <html_block>&lt;center&gt;\n&lt;img src="https://ropensci.github.io ...
#>  [7] <html_block>&lt;table class="table"&gt;&lt;thead&gt;&lt;tr class="h ...
#>  [8] <heading level="1">\n  <text>The drake R package </text>\n  <html_i ...
#>  [9] <paragraph>\n  <code>drake</code>\n  <text> — or, Data Frames in R  ...
#> [10] <heading level="1">\n  <text>What gets done stays done.</text>\n</h ...
#> [11] <paragraph>\n  <text>Too many data science projects follow a </text ...
#> [12] <list type="ordered" start="1" delim="period" tight="true">\n  <ite ...
#> [13] <paragraph>\n  <text>It is hard to avoid restarting from scratch.</ ...
#> [14] <html_block>&lt;center&gt;\n&lt;a href="https://twitter.com/fossilo ...
#> [15] <paragraph>\n  <text>With </text>\n  <code>drake</code>\n  <text>,  ...
#> [16] <list type="ordered" start="1" delim="period" tight="true">\n  <ite ...
#> [17] <heading level="1">\n  <text>How it works</text>\n</heading>
#> [18] <paragraph>\n  <text>To set up a project, load your packages,</text ...
#> [19] <code_block info="{r mainpackages}">library(drake)\nlibrary(dplyr)\ ...
#> [20] <paragraph>\n  <text>load your custom functions,</text>\n</paragraph>
#> ...

md %>%
  readLines() %>%
  commonmark::markdown_xml(extensions = FALSE) %>%
  xml2::read_xml()
#> {xml_document}
#> <document xmlns="http://commonmark.org/xml/1.0">
#>  [1] <html_block>&lt;!-- README.md is generated from README.Rmd. Please  ...
#>  [2] <html_block>&lt;center&gt;\n</html_block>
#>  [3] <html_block>&lt;img src="https://ropensci.github.io/drake/images/in ...
#>  [4] <html_block>&lt;/center&gt;\n</html_block>
#>  [5] <html_block>&lt;table class="table"&gt;\n</html_block>
#>  [6] <html_block>&lt;thead&gt;\n</html_block>
#>  [7] <html_block>&lt;tr class="header"&gt;\n</html_block>
#>  [8] <html_block>&lt;th align="left"&gt;\n</html_block>
#>  [9] <paragraph>\n  <text>Release</text>\n</paragraph>
#> [10] <html_block>&lt;/th&gt;\n</html_block>
#> [11] <html_block>&lt;th align="left"&gt;\n</html_block>
#> [12] <paragraph>\n  <text>Usage</text>\n</paragraph>
#> [13] <html_block>&lt;/th&gt;\n</html_block>
#> [14] <html_block>&lt;th align="left"&gt;\n</html_block>
#> [15] <paragraph>\n  <text>Development</text>\n</paragraph>
#> [16] <html_block>&lt;/th&gt;\n</html_block>
#> [17] <html_block>&lt;/tr&gt;\n</html_block>
#> [18] <html_block>&lt;/thead&gt;\n</html_block>
#> [19] <html_block>&lt;tbody&gt;\n</html_block>
#> [20] <html_block>&lt;tr class="odd"&gt;\n</html_block>
#> ...

md %>%
  readLines() %>%
  .[. != ""] %>%
  commonmark::markdown_xml(extensions = FALSE) %>%
  xml2::read_xml()
#> {xml_document}
#> <document xmlns="http://commonmark.org/xml/1.0">
#> [1] <html_block>&lt;!-- README.md is generated from README.Rmd. Please e ...
#> [2] <html_block>&lt;center&gt;\n&lt;img src="https://ropensci.github.io/ ...

Created on 2018-09-04 by the reprex package (v0.2.0).
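The splitting seen above is consistent with the CommonMark spec, where an HTML block opened by a block-level tag such as `<table>` ends at the next blank line. A small sketch with made-up input shows the effect. `count_html_blocks()` is a hypothetical helper for this illustration only.

```r
library(commonmark)

# Hypothetical helper: count <html_block> elements in the XML output.
count_html_blocks <- function(md) {
  xml <- markdown_xml(md)
  lengths(regmatches(xml, gregexpr("<html_block", xml, fixed = TRUE)))
}

compact <- "<table>\n<tr><td>Release</td></tr>\n</table>"
spaced  <- "<table>\n\n<tr><td>Release</td></tr>\n\n</table>"

n_compact <- count_html_blocks(compact)
n_spaced  <- count_html_blocks(spaced)
c(compact = n_compact, spaced = n_spaced)
```

Removing blank lines only inside the table, rather than across the whole file, would keep the table in one block without gluing it to the rest of the README.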

markdown_html() crashes or fails on RStudio Server on CentOS

I don't know why, but markdown_html() and markdown_xml() don't work well in the console of RStudio Server on CentOS 7.

More precisely, if I install commonmark package 1.6 and run commonmark::markdown_html("## foo"),

on RStudio Server 1.1.463 (stable)'s console, it crashes with segmentation fault. (Error message in syslog: kernel: rsession[93474]: segfault at 7f1792c96811 ip 00007f17a3331c97 sp 00007fff29b6f998 error 6 in libc-2.17.so[7f17a31df000+1b8000])
on RStudio Server 1.2.1139 (preview)'s console, it returns a wrong result ("<h2></h2>\n").

Note that it returns the correct result ("<h2>foo</h2>\n") if I run the code

on R REPL on RStudio's terminal pane
on R REPL on an SSH session
with commonmark 1.5
on RStudio Server's console on Debian

How to reproduce

Run CentOS Docker image

docker run -p 8787:8787 -it --rm centos:7 bash

Install and run RStudio Server

# Download RPM (`1.1.463` is the stable version)
curl -o rstudio.rpm https://download2.rstudio.org/rstudio-server-rhel-1.1.463-x86_64.rpm
yum install rstudio.rpm

# Add rstudio user
useradd rstudio
passwd rstudio

# Launch RStudio Server
/usr/lib/rstudio-server/bin/rserver

Browse to http://localhost:8787 and log in
Install commonmark package and run this code

commonmark::markdown_html("## foo")

commonmark::markdown_xml("[bare brackets] \\[escaped brackets\\]") |> writeLines()
#> <?xml version="1.0" encoding="UTF-8"?>
#> <!DOCTYPE document SYSTEM "CommonMark.dtd">
#> <document xmlns="http://commonmark.org/xml/1.0">
#>   <paragraph>
#>     <text xml:space="preserve">[bare brackets] [escaped brackets]</text>
#>   </paragraph>
#> </document>

Created on 2022-09-19 with reprex v2.0.2

Is there a way to have the parser indicate which characters were escaped in the source document?
