r-lib / commonmark
High Performance CommonMark and Github Markdown Rendering in R
Home Page: https://docs.ropensci.org/commonmark/
License: Other
A \033 character is passed down verbatim, which breaks the XML parser:
xml2::read_xml(commonmark::markdown_xml("\033"))
## Error in read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html, :
## PCDATA invalid Char value 27 [9]
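Until this is handled in the package, one possible workaround is to drop the control characters that XML 1.0 forbids before rendering. The helper name `strip_xml_invalid()` is made up for this sketch and is not part of commonmark; it assumes that silently removing such characters is acceptable for your input.

```r
library(commonmark)

# Hypothetical helper (not part of commonmark): remove the C0 control
# characters that XML 1.0 does not allow. Tab (\x09), newline (\x0A) and
# carriage return (\x0D) are legal in XML and are kept.
strip_xml_invalid <- function(x) {
  gsub("[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F]", "", x, perl = TRUE)
}

clean <- strip_xml_invalid("a\033b\tc\n")
xml <- markdown_xml(clean)
# The escape character is gone, so an XML parser such as xml2::read_xml()
# should no longer reject the document on that account.
```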
(edit: not sure how the reprex lost its formatting, but I fixed it)
The new footnotes feature might be useful for us at {tinkr}, but I'm not sure how to parse them as each footnote contains identical tags.
from: ropensci/tinkr#92 (comment)
txt <- c("a statement[^1][^2]\n", "[^1]: this is true", "[^2]: this is false")
commonmark::markdown_xml(txt, footnotes = TRUE) |> writeLines()
#> <?xml version="1.0" encoding="UTF-8"?>
#> <!DOCTYPE document SYSTEM "CommonMark.dtd">
#> <document xmlns="http://commonmark.org/xml/1.0">
#> <paragraph>
#> <text xml:space="preserve">a statement</text>
#> <<unknown> />
#> <<unknown> />
#> </paragraph>
#> <<unknown>>
#> <paragraph>
#> <text xml:space="preserve">this is true</text>
#> </paragraph>
#> </<unknown>>
#> <<unknown>>
#> <paragraph>
#> <text xml:space="preserve">this is false</text>
#> </paragraph>
#> </<unknown>>
#> </document>
Created on 2023-03-22 with reprex v2.0.2
Thanks for this amazing package. It enables so much. This might be an upstream issue but I thought to bring it here first.
commonmark::markdown_html doesn't seem to parse markdown correctly for links that open in a new tab.
For example commonmark::markdown_html generates this:
commonmark::markdown_html("[RStudio](https://www.rstudio.com/){target='_blank'}")
<p><a href=\"https://www.rstudio.com/\">RStudio</a>{target='_blank'}</p>
When I think this is what it should generate:
<p><a href="https://www.rstudio.com" target="_blank">RStudio</a></p>
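The `{target='_blank'}` suffix is Pandoc-style attribute syntax rather than CommonMark, which would explain why it comes through as literal text. A rough workaround, assuming every link in the rendered fragment should open in a new tab, is to post-process the HTML string. The variable names are only for illustration.

```r
library(commonmark)

# Render the plain CommonMark link, without the Pandoc attribute block.
html <- markdown_html("[RStudio](https://www.rstudio.com/)")

# Sketch: add target="_blank" to every <a> tag in the output.
# This is plain string substitution, not an HTML-aware transformation.
html_blank <- gsub("<a href=", "<a target=\"_blank\" href=", html, fixed = TRUE)
cat(html_blank)
```

For anything beyond simple fragments, an HTML-aware tool is probably a safer choice than string substitution.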
I'm having trouble with text after a bullet list in Markdown-ified roxygen and I think it's possible the problem is here. The problem is definitely in the XML produced from the markdown. But there are many things that happen along the way.
I want a bullet list in the description. If I have text right after the list, it gets catenated with the last bullet point. If I add an extra blank line, the text drops out of description and into details.
Here's a pure commonmark example that might explain it. Notice how the text after the bullet list gets absorbed into the bullet item. I tried with both values of hardbreaks. Is this correct behaviour?
library(commonmark)
txt <- "
first line
* bullet 1
* bullet 2
second line
"
txt <- paste0(txt, collapse = "\n")
writeLines(markdown_xml(txt, hardbreaks = FALSE), "hardbreaks-FALSE.xml")
writeLines(markdown_xml(txt, hardbreaks = TRUE), "hardbreaks-TRUE.xml")
hardbreaks-FALSE.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document xmlns="http://commonmark.org/xml/1.0">
<paragraph>
<text>first line</text>
</paragraph>
<list type="bullet" tight="true">
<item>
<paragraph>
<text>bullet 1</text>
</paragraph>
</item>
<item>
<paragraph>
<text>bullet 2</text>
<softbreak />
<text>second line</text>
</paragraph>
</item>
</list>
</document>
hardbreaks-TRUE.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document xmlns="http://commonmark.org/xml/1.0">
<paragraph>
<text>first line</text>
</paragraph>
<list type="bullet" tight="true">
<item>
<paragraph>
<text>bullet 1</text>
</paragraph>
</item>
<item>
<paragraph>
<text>bullet 2</text>
<softbreak />
<text>second line</text>
</paragraph>
</item>
</list>
</document>
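As far as I can tell this matches what the CommonMark spec calls a lazy continuation line: text directly after a list item, with no blank line in between, continues that item's paragraph. A small comparison, assuming a blank line before the trailing text is acceptable in the source:

```r
library(commonmark)

# No blank line after the list: "second line" continues the last bullet.
lazy <- "first line\n\n* bullet 1\n* bullet 2\nsecond line\n"

# Blank line after the list: "second line" becomes its own paragraph.
separated <- "first line\n\n* bullet 1\n* bullet 2\n\nsecond line\n"

html_lazy <- markdown_html(lazy)
html_separated <- markdown_html(separated)

cat(html_lazy)
cat(html_separated)
```

How roxygen then assigns that separate paragraph to description or details is outside commonmark's control.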
Hi,
cmark-gfm is affected by CVE-2020-5238 and consequently the R package is also affected.
Best,
Dylan
Hello, the cmark version in this R package is affected by CVE-2023-26485. I am not sure about the practical impact on the package, but to clear the issue out of the way, would it be possible to upgrade? Thanks!
It seems that footnotes are supported in GFM:
but somehow this feature is not enabled in the R package?
commonmark::markdown_html('a[^1] \n\n[^1]: test footnote')
#> [1] "<p>a[^1]</p>\n<p>[^1]: test footnote</p>\n"
And worse still, a footnote can be treated as a link definition:
commonmark::markdown_html('a[^1] \n\n[^1]: https://example.com')
#> [1] "<p>a<a href=\"https://example.com\">^1</a></p>\n"
@jeroen Do you know how to enable footnotes support? Thanks!
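For reference, other issues on this page call the rendering functions with a `footnotes` argument, so in versions of the package that have it, something like the following sketch should enable footnote parsing. It assumes an installed commonmark new enough to accept `footnotes = TRUE`.

```r
library(commonmark)

md <- "a[^1]\n\n[^1]: test footnote"

# With footnotes = TRUE the reference and the definition are parsed as a
# footnote instead of being left in the output as literal text.
html <- markdown_html(md, footnotes = TRUE)
cat(html)
```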
The recently introduced footnote support via the footnotes parameter does not work as one would expect for commonmark::markdown_text().
All other Markdown features (emphasis, links, etc.) are stripped by commonmark::markdown_text(), since regular text has no notion of markup.
But the footnotes remain:
md <- "Text *emphasized* and **bold**, with [inline link](https://to.some.where/), [reference link][ref] and footnote[^fn].\n\n[ref]: https://fsf.org\n\n[^fn]: A note.\n"
cat(md)
#> Text *emphasized* and **bold**, with [inline link](https://to.some.where/), [reference link][ref] and footnote[^fn].
#>
#> [ref]: https://fsf.org
#>
#> [^fn]: A note.
# without footnote parsing
md |> commonmark::markdown_text(footnotes = FALSE) |> cat()
#> Text emphasized and bold, with inline link, reference link and footnote[^fn].
#>
#> [^fn]: A note.
# with footnote parsing
md |> commonmark::markdown_text(footnotes = TRUE) |> cat()
#> Text emphasized and bold, with inline link, reference link and footnote[^1].
#>
#> [^1]: A note.
Created on 2023-03-29 with reprex v2.0.2
Is this really the intended behaviour?
The following vulnerability was published for commonmark:
CVE-2022-24724
cmark-gfm is GitHub's extended version of the C reference implementation of CommonMark. Prior to versions 0.29.0.gfm.3 and 0.28.3.gfm.21, an integer overflow in cmark-gfm's table row parsing (table.c:row_from_string) may lead to heap memory corruption when parsing tables whose marker rows contain more than UINT16_MAX columns. The impact of this heap corruption ranges from Information Leak to Arbitrary Code Execution depending on how and where cmark-gfm is used. If cmark-gfm is used for rendering remote user controlled markdown, this vulnerability may lead to Remote Code Execution (RCE) in applications employing affected versions of the cmark-gfm library. This vulnerability has been patched in the following cmark-gfm versions: 0.29.0.gfm.3 and 0.28.3.gfm.21. A workaround is available. The vulnerability exists in the table markdown extensions of cmark-gfm. Disabling the table extension will prevent this vulnerability from being triggered.
Kind regards, Andreas.
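On the R side, the workaround described in the advisory could look roughly like this: pass every available extension except table. This is only a sketch of the mitigation, not a substitute for upgrading the bundled cmark-gfm. The variable names are illustrative.

```r
library(commonmark)

# All extensions compiled into the package, minus the table extension
# named in the advisory.
safe_ext <- setdiff(list_extensions(), "table")

md <- "| a | b |\n|---|---|\n| 1 | 2 |\n"

# With the table extension disabled, the pipe syntax is rendered as an
# ordinary paragraph and the table parsing code is not used.
html_no_table <- markdown_html(md, extensions = safe_ext)
html_all <- markdown_html(md, extensions = TRUE)
```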
Motivated by a particular use I have, but presumably there's some sort of intermediate representation available in commonmark? Any chance of exposing that through R so that custom renderers can be written?
Thinking of something that could render markdown to marked up text using crayon in the terminal...
But perhaps I'm way off how this actually works under the hood.
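One way to approximate this today is to treat the output of markdown_xml() as the intermediate representation and walk it with xml2. The sketch below is only an illustration of that idea: `render_node()` is a made-up function, it writes raw ANSI escape codes instead of using crayon, and it handles only a few node types.

```r
library(commonmark)
library(xml2)

# Hypothetical renderer: recursively turn a CommonMark XML node into
# text with ANSI escape codes. Unhandled node types just pass their
# children through.
render_node <- function(node) {
  kids <- paste0(
    vapply(xml_children(node), render_node, character(1)),
    collapse = ""
  )
  switch(xml_name(node),
    text = xml_text(node),
    softbreak = "\n",
    emph = paste0("\033[3m", kids, "\033[23m"),    # italic on / off
    strong = paste0("\033[1m", kids, "\033[22m"),  # bold on / off
    paragraph = paste0(kids, "\n"),
    kids
  )
}

doc <- read_xml(markdown_xml("Some *emphasized* and **strong** text"))
out <- render_node(doc)
cat(out)
```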
I may be misunderstanding the prescribed usage of the tagfilter extension, but it doesn't seem to be working.
library(commonmark)
markdown_commonmark("<title><style></style></title>", extensions = "tagfilter")
#> [1] "<title><style></style></title>\n"
markdown_html("<title><style></style></title>", extensions = "tagfilter")
#> [1] "<title><style></style></title>\n"
In both cases, the spec indicates that we should expect
"&lt;title>&lt;style>&lt;/style>&lt;/title>\n"
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 3.6.3 Patched (2020-04-28 r79534)
#> os macOS 10.16
#> system x86_64, darwin15.6.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2020-12-28
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] standard (@0.2.1)
#> callr 3.5.1 2020-10-13 [1] standard (@3.5.1)
#> cli 2.2.0 2020-11-20 [1] standard (@2.2.0)
#> commonmark * 1.7 2018-12-01 [1] standard (@1.7)
#> crayon 1.3.4 2017-09-16 [1] standard (@1.3.4)
#> desc 1.2.0 2018-05-01 [1] standard (@1.2.0)
#> devtools 2.3.2 2020-09-18 [1] standard (@2.3.2)
#> digest 0.6.27 2020-10-24 [1] standard (@0.6.27)
#> ellipsis 0.3.1 2020-05-15 [1] standard (@0.3.1)
#> evaluate 0.14 2019-05-28 [1] standard (@0.14)
#> fansi 0.4.1 2020-01-08 [1] standard (@0.4.1)
#> fs 1.5.0 2020-07-31 [1] standard (@1.5.0)
#> glue 1.4.2 2020-08-27 [1] standard (@1.4.2)
#> highr 0.8 2019-03-20 [1] standard (@0.8)
#> htmltools 0.5.0.9003 2020-12-04 [1] Github (rstudio/htmltools@d18bd8e)
#> knitr 1.30 2020-09-22 [1] standard (@1.30)
#> lifecycle 0.2.0 2020-03-06 [1] standard (@0.2.0)
#> magrittr 2.0.1 2020-11-17 [1] standard (@2.0.1)
#> memoise 1.1.0 2017-04-21 [1] standard (@1.1.0)
#> pkgbuild 1.1.0 2020-07-13 [1] standard (@1.1.0)
#> pkgload 1.1.0 2020-05-29 [1] standard (@1.1.0)
#> prettyunits 1.1.1 2020-01-24 [1] standard (@1.1.1)
#> processx 3.4.4 2020-09-03 [1] standard (@3.4.4)
#> ps 1.4.0 2020-10-07 [1] standard (@1.4.0)
#> purrr 0.3.4 2020-04-17 [1] standard (@0.3.4)
#> R6 2.5.0 2020-10-28 [1] standard (@2.5.0)
#> remotes 2.2.0 2020-07-21 [1] standard (@2.2.0)
#> rlang 0.4.9 2020-11-26 [1] standard (@0.4.9)
#> rmarkdown 2.5 2020-10-21 [1] standard (@2.5)
#> rprojroot 2.0.2 2020-11-15 [1] standard (@2.0.2)
#> sessioninfo 1.1.1 2018-11-05 [1] standard (@1.1.1)
#> stringi 1.5.3 2020-09-09 [1] standard (@1.5.3)
#> stringr 1.4.0 2019-02-10 [1] standard (@1.4.0)
#> testthat 3.0.0 2020-10-31 [1] standard (@3.0.0)
#> usethis 2.0.0.9000 2020-12-10 [1] Github (r-lib/usethis@f96bf2e)
#> withr 2.3.0 2020-09-22 [1] standard (@2.3.0)
#> xfun 0.19 2020-10-30 [1] standard (@0.19)
#> yaml 2.2.1 2020-02-01 [1] standard (@2.2.1)
#>
#> [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
The master branch of this repository will soon be renamed to main, as part of a coordinated change across several GitHub organizations (including, but not limited to: tidyverse, r-lib, tidymodels, and sol-eng). We anticipate this will happen by the end of September 2021.
That will be preceded by a release of the usethis package, which will gain some functionality around detecting and adapting to a renamed default branch. There will also be a blog post at the time of this master --> main change.
The purpose of this issue is to:
message id: euphoric_snowdog
I'd like to insert a filter into the parsing process: parse the markdown input, execute my filter, render to output format. I think this would be relatively straightforward if I was writing the filter in C.
Doing this would require that the cmark header files be made available, and entry points be registered using R_RegisterCCallable, and probably some more things I don't know about.
I noticed this in ropensci/tinkr#99, but subscript text (at least with pandoc markdown) is converted to strikethrough (with two ~) on roundtrip between markdown -> XML -> markdown:
commonmark::markdown_commonmark("H~2~O", extensions = TRUE)
#> [1] "H~~2~~O\n"
Created on 2023-08-18 with reprex v2.0.2
I am trying to parse this Markdown file
It's full of empty lines due to knitr rendering it from Rmd I guess. On GitHub it renders well. But when I try to parse it I cannot get the structure that's in the .Rmd: the table is either separated in different blocks, or if I remove empty lines, it gets glued to the rest of the README.
rmd <- "https://raw.githubusercontent.com/ropensci/drake/master/README.Rmd"
md <- "https://raw.githubusercontent.com/ropensci/drake/master/README.md"
library("magrittr")
rmd %>%
readLines() %>%
commonmark::markdown_xml(extensions = TRUE) %>%
xml2::read_xml()
#> {xml_document}
#> <document xmlns="http://commonmark.org/xml/1.0">
#> [1] <thematic_break/>
#> [2] <heading level="2">\n <text>output:</text>\n <softbreak/>\n <tex ...
#> [3] <html_block><!-- README.md is generated from README.Rmd. Please ...
#> [4] <code_block info="{r knitrsetup, echo = FALSE}">knitr::opts_chunk$s ...
#> [5] <code_block info="{r mainexample, echo = FALSE}">suppressMessages(s ...
#> [6] <html_block><center>\n<img src="https://ropensci.github.io ...
#> [7] <html_block><table class="table"><thead><tr class="h ...
#> [8] <heading level="1">\n <text>The drake R package </text>\n <html_i ...
#> [9] <paragraph>\n <code>drake</code>\n <text> — or, Data Frames in R ...
#> [10] <heading level="1">\n <text>What gets done stays done.</text>\n</h ...
#> [11] <paragraph>\n <text>Too many data science projects follow a </text ...
#> [12] <list type="ordered" start="1" delim="period" tight="true">\n <ite ...
#> [13] <paragraph>\n <text>It is hard to avoid restarting from scratch.</ ...
#> [14] <html_block><center>\n<a href="https://twitter.com/fossilo ...
#> [15] <paragraph>\n <text>With </text>\n <code>drake</code>\n <text>, ...
#> [16] <list type="ordered" start="1" delim="period" tight="true">\n <ite ...
#> [17] <heading level="1">\n <text>How it works</text>\n</heading>
#> [18] <paragraph>\n <text>To set up a project, load your packages,</text ...
#> [19] <code_block info="{r mainpackages}">library(drake)\nlibrary(dplyr)\ ...
#> [20] <paragraph>\n <text>load your custom functions,</text>\n</paragraph>
#> ...
md %>%
readLines() %>%
commonmark::markdown_xml(extensions = FALSE) %>%
xml2::read_xml()
#> {xml_document}
#> <document xmlns="http://commonmark.org/xml/1.0">
#> [1] <html_block><!-- README.md is generated from README.Rmd. Please ...
#> [2] <html_block><center>\n</html_block>
#> [3] <html_block><img src="https://ropensci.github.io/drake/images/in ...
#> [4] <html_block></center>\n</html_block>
#> [5] <html_block><table class="table">\n</html_block>
#> [6] <html_block><thead>\n</html_block>
#> [7] <html_block><tr class="header">\n</html_block>
#> [8] <html_block><th align="left">\n</html_block>
#> [9] <paragraph>\n <text>Release</text>\n</paragraph>
#> [10] <html_block></th>\n</html_block>
#> [11] <html_block><th align="left">\n</html_block>
#> [12] <paragraph>\n <text>Usage</text>\n</paragraph>
#> [13] <html_block></th>\n</html_block>
#> [14] <html_block><th align="left">\n</html_block>
#> [15] <paragraph>\n <text>Development</text>\n</paragraph>
#> [16] <html_block></th>\n</html_block>
#> [17] <html_block></tr>\n</html_block>
#> [18] <html_block></thead>\n</html_block>
#> [19] <html_block><tbody>\n</html_block>
#> [20] <html_block><tr class="odd">\n</html_block>
#> ...
md %>%
readLines() %>%
.[. != ""] %>%
commonmark::markdown_xml(extensions = FALSE) %>%
xml2::read_xml()
#> {xml_document}
#> <document xmlns="http://commonmark.org/xml/1.0">
#> [1] <html_block><!-- README.md is generated from README.Rmd. Please e ...
#> [2] <html_block><center>\n<img src="https://ropensci.github.io/ ...
Created on 2018-09-04 by the reprex package (v0.2.0).
I don't know why, but commonmark::markdown_html() and commonmark::markdown_xml() don't work well in the console of RStudio Server on CentOS 7.
More precisely, if I install commonmark package 1.6 and run commonmark::markdown_html("## foo"), either the session segfaults
(kernel: rsession[93474]: segfault at 7f1792c96811 ip 00007f17a3331c97 sp 00007fff29b6f998 error 6 in libc-2.17.so[7f17a31df000+1b8000])
or the call returns a wrong result ("<h2></h2>\n").
Note that it returns the correct result ("<h2>foo</h2>\n") if I run the code in an R REPL on RStudio's terminal pane or in an R REPL on an SSH session.
To reproduce:
docker run -p 8787:8787 -it --rm centos:7 bash
# Download RPM (`1.1.463` is the stable version)
curl -o rstudio.rpm https://download2.rstudio.org/rstudio-server-rhel-1.1.463-x86_64.rpm
yum install rstudio.rpm
# Add rstudio user
useradd rstudio
passwd rstudio
# Launch RStudio Server
/usr/lib/rstudio-server/bin/rserver
Then go to http://localhost:8787, log in, and run:
commonmark::markdown_html("## foo")
How hard would it be? Then I could use it in roxygen
See commonmark/cmark#43. Wait for fix upstream or patch downstream.
commonmark supports both bare square brackets and escaped square brackets. Once they have gone through the parser, there's no indication which set of brackets was escaped and which was bare:
commonmark::markdown_xml("[bare brackets] \\[escaped brackets\\]") |> writeLines()
#> <?xml version="1.0" encoding="UTF-8"?>
#> <!DOCTYPE document SYSTEM "CommonMark.dtd">
#> <document xmlns="http://commonmark.org/xml/1.0">
#> <paragraph>
#> <text xml:space="preserve">[bare brackets] [escaped brackets]</text>
#> </paragraph>
#> </document>
Created on 2022-09-19 with reprex v2.0.2
Is there a way to have the parser indicate which characters were escaped in the source document?
Asking because of my tinkr package.
I first saw this locally, then checked that the same is happening on CRAN. Something about the most recent commonmark release seems to have changed behaviour that is tested in roxygen2.
https://cran.r-project.org/web/checks/check_results_roxygen2.html
Link to libxml2 and use xpath to parse a NEWS file based on markdown_xml output.
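From R, the same idea can already be tried with the xml2 package, which wraps libxml2. The NEWS text below is invented for illustration, and the prefix `md` is an arbitrary name chosen for the CommonMark XML namespace.

```r
library(commonmark)
library(xml2)

# Made-up NEWS content, one level-1 heading per release.
news <- "# pkg 1.1\n\n* Fixed a bug\n* Added a feature\n\n# pkg 1.0\n\n* First release\n"

doc <- read_xml(markdown_xml(news))

# The document uses a default namespace, so give it a prefix for XPath.
ns <- xml_ns_rename(xml_ns(doc), d1 = "md")

# Text of every level-1 heading, i.e. the release titles.
versions <- xml_text(xml_find_all(doc, ".//md:heading[@level='1']/md:text", ns))

# Number of bullet items in the whole file.
n_items <- length(xml_find_all(doc, ".//md:list/md:item", ns))
```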