
Comments (15)

maestrow commented on May 19, 2024

For those who are looking for a way to use HTML tags in Markdown with remark-parse, I'm leaving this recipe here. Thanks to @ChristianMurphy for his suggestion. I've made just a couple of improvements:

  1. The rehype-dom-parse module fails with the error "document is not defined", so I replaced it with rehype-parse.
  2. I extracted rehypeParser from the handler, so it's created only once.
  3. Also note the sanitize: false option, which turns sanitization off, so only use this with trusted input.

You can try this snippet in console:

var unified = require('unified')
var remark = require('remark-parse')
var remark2react = require('remark-react');
var ReactDOMServer = require('react-dom/server');
var rehype = require('rehype-parse')

const sample = `
markdown is here
<div style="color:gray;">
text <a href="#">link</a>
</div>
`

const rehypeParser = unified().use(rehype, { fragment: true });

const parser = unified()
  .use(remark)
  .use(remark2react, {
    toHast: {
      handlers: {
        html: (h, node) =>
          // parse the raw HTML string into hast so remark-react can render it
          rehypeParser.parse(node.value).children
      }
    },
    sanitize: false
  });

const result = parser.processSync(sample)
const html = ReactDOMServer.renderToStaticMarkup(result.contents)
console.log(html)

output:

<p>markdown is here</p>
<div style="color:gray">
text <a href="#">link</a>
</div>

from remark-react.

ChristianMurphy commented on May 19, 2024

@Hamms what I've been using is the toHast option
https://github.com/remarkjs/remark-react#optionstohast

With:

// allow inline html to be rendered as React VDOM
import unified from 'unified';
import rehype from 'rehype-dom-parse';

// ....
  .use(remarkReact, {
    toHast: {
      handlers: {
        html: (h, node) =>
          // parse the raw HTML string into hast so remark-react can render it
          unified()
            .use(rehype, { fragment: true })
            .parse(node.value).children
      }
    }
  })


Hamms commented on May 19, 2024

I guess I'm still confused about the intent of this library, then.

The readme implies that the purpose is to be able to render markdown safely, with the features of hast-util-sanitize. But as it's built, the only nodes that end up in the tree that gets sent to hast-util-sanitize are either those specifically generated by remark - which are already safe - or raw nodes which may contain something unsafe but which are entirely removed by hast-util-sanitize (and also by hast-to-hyperscript), making the existence of sanitization in this project both redundant and misleading.

Is the point of this to provide a MDAST renderer that doesn't allow raw html at all, or is it to provide one that only renders sanitized HTML?


wooorm commented on May 19, 2024

Hmm, I believe you could also do what @ChristianMurphy suggests, but with a dangerouslySetInnerHTML: {__html: '...'} prop?

Maybe we need a top-level allowDangerousHTML option to do this by default?


Hamms commented on May 19, 2024

It's not clear to me why the allowDangerousHTML option I'm currently passing via toHast isn't working. Is it supposed to?


Hamms commented on May 19, 2024

Looking closer, this doesn't appear to be an issue with toHast at all, but rather with the sanitization step and the hast-to-hyperscript step, both of which strip out the raw nodes generated by mdast-util-to-hast.

Taking a step back, it's not clear to me what the intent of this library is. For my purposes, I'd like to be able to render markdown with raw html inside, but also to sanitize that content against "dangerous" html like https://github.com/syntax-tree/hast-util-sanitize does. So, given an input of

_some_ <strong>raw</strong> html

I'd expect to get

<p><em>some</em> <strong>raw</strong> html</p>

But given an input of

_some_ <a onclick="alert('hello')">strong</a> html

I'd expect to get

<p><em>some</em> <a>strong</a> html</p>

I had assumed that was the point of this library, but most of the functionality of hast-util-sanitize only works on actual hast nodes like script and a; when it encounters the raw nodes generated by mdast-util-to-hast it simply removes them rather than sanitizing them.

Am I misunderstanding something, or is it not possible to achieve what I want with this tool?


ChristianMurphy commented on May 19, 2024

@Hamms the sanitization can be configured with https://github.com/remarkjs/remark-react#optionssanitize


Hamms commented on May 19, 2024

Yes, I'm aware of that. But the problem remains that hast-util-sanitize does not appear to be capable of sanitizing raw nodes at all except by eliminating them, meaning that it's not actually capable of sanitizing the input from mdast-util-to-hast.
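
To make the distinction concrete, here is a toy sketch of an allow-list sanitizer. It is not the hast-util-sanitize implementation: the schema shape and the sanitizeNode name are assumptions made up for illustration. It only shows why an element node can be cleaned attribute by attribute, while a raw node, which is just an opaque string, can only be dropped.

```javascript
// Toy illustration only: NOT the real hast-util-sanitize code.
// Hypothetical allow-list: permitted tag names and, per tag, permitted attributes.
const schema = {
  tagNames: ['p', 'em', 'strong', 'a'],
  attributes: {a: ['href']}
}

function sanitizeNode(node) {
  if (node.type === 'text') return node

  if (node.type === 'element') {
    // Simplification: disallowed elements are dropped entirely.
    if (!schema.tagNames.includes(node.tagName)) return null

    // Keep only the attributes on the allow-list for this tag.
    const allowed = schema.attributes[node.tagName] || []
    const properties = {}
    for (const name of Object.keys(node.properties || {})) {
      if (allowed.includes(name)) properties[name] = node.properties[name]
    }

    const children = (node.children || []).map(sanitizeNode).filter(Boolean)
    return {type: 'element', tagName: node.tagName, properties, children}
  }

  // `raw` (or any unknown type): there is no structure to inspect, so drop it.
  return null
}

// The same link, once as a parsed element and once as an unparsed raw node.
const parsed = {
  type: 'element',
  tagName: 'a',
  properties: {href: '#', onClick: "alert('hello')"},
  children: [{type: 'text', value: 'strong'}]
}
const unparsed = {type: 'raw', value: '<a onclick="alert(\'hello\')">strong</a>'}

console.log(sanitizeNode(parsed)) // element survives, without onClick
console.log(sanitizeNode(unparsed)) // null: the whole node is removed
```

Under these assumptions, the only way to get the first behaviour for raw HTML is to parse the string into element nodes before the sanitizer runs.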


Hamms commented on May 19, 2024

Ahhh, it seems like what I actually want to do is to incorporate https://github.com/syntax-tree/hast-util-raw into my process


Hamms commented on May 19, 2024

I've confirmed that adding hast-util-raw to the remark-react process makes this work:

diff --git a/index.js b/index.js
index c85e599..6ee8ff8 100644
--- a/index.js
+++ b/index.js
@@ -7,6 +7,8 @@ var sanitize = require('hast-util-sanitize')
 var toH = require('hast-to-hyperscript')
 var tableCellStyle = require('@mapbox/hast-util-table-cell-style')
 
+var raw = require('hast-util-raw')
+
 var globalReact
 var globalCreateElement
 var globalFragment
@@ -46,6 +48,10 @@ function react(options) {
     var tree = toHAST(node, toHastOptions)
     var root
 
+    if (toHastOptions.allowDangerousHTML) {
+      tree = raw(tree)
+    }
+
     if (clean) {
       tree = sanitize(tree, scheme)
     }
diff --git a/package.json b/package.json
index b6d5f8c..36fbe02 100644
--- a/package.json
+++ b/package.json
@@ -28,6 +28,7 @@
   "dependencies": {
     "@mapbox/hast-util-table-cell-style": "^0.1.3",
     "hast-to-hyperscript": "^6.0.0",
+    "hast-util-raw": "^5.0.0",
     "hast-util-sanitize": "^1.0.0",
     "mdast-util-to-hast": "^4.0.0"
   },
diff --git a/test/index.js b/test/index.js
index d6cb228..0e83a8c 100644
--- a/test/index.js
+++ b/test/index.js
@@ -121,6 +121,33 @@ versions.forEach(function(reactVersion) {
       'passes toHast options to inner toHAST() function'
     )
 
+    t.equal(
+      React.renderToStaticMarkup(
+        remark()
+          .use(reactRenderer, {
+            createElement: React.createElement,
+            toHast: {allowDangerousHTML: true}
+          })
+          .processSync('<strong>raw</strong> html').contents
+      ),
+      '<p><strong>raw</strong> html</p>',
+      'renders raw html when specified'
+    )
+
+    t.equal(
+      React.renderToStaticMarkup(
+        remark()
+          .use(reactRenderer, {
+            createElement: React.createElement,
+            toHast: {allowDangerousHTML: true}
+          })
+          .processSync('<a onclick="alert(&#x22;charlie&#x22;)">delta</a>')
+          .contents
+      ),
+      '<p><a>delta</a></p>',
+      'raw html is sanitized'
+    )
+
     fixtures.forEach(function(name) {
       var base = path.join(root, name)
       var input = fs.readFileSync(path.join(base, 'input.md'))

Any objections to me opening a PR with the above change?


wooorm commented on May 19, 2024

Yes, I do object!

Including hast-util-raw (or rehype-raw) pulls in a full-blown HTML parser, and that’s really heavy in the browser.

I would suggest that people who want that go remark -> remark-rehype -> rehype-raw -> rehype-sanitize -> rehype-react instead.
We could add a note about this to the readme here, similar to the note in the intro, though?
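
A rough sketch of that route follows. It assumes the CommonJS releases of these packages that were current when this thread was written (newer releases are ESM-only and rename some options), and the createRawHtmlProcessor name is hypothetical, not part of any of these libraries.

```javascript
// Sketch only: assumes thread-era CommonJS versions of unified, remark-parse,
// remark-rehype, rehype-raw, rehype-sanitize, and rehype-react.
// `createRawHtmlProcessor` is a hypothetical helper name.
function createRawHtmlProcessor(React) {
  // Required lazily so the HTML parser behind rehype-raw is only loaded
  // when a caller actually opts in to raw HTML.
  const unified = require('unified')
  const markdown = require('remark-parse')
  const remark2rehype = require('remark-rehype')
  const raw = require('rehype-raw')
  const sanitize = require('rehype-sanitize')
  const rehype2react = require('rehype-react')

  return unified()
    .use(markdown) // markdown -> mdast
    .use(remark2rehype, {allowDangerousHTML: true}) // mdast -> hast, keeping raw nodes
    .use(raw) // parse raw nodes into real element nodes
    .use(sanitize) // the sanitizer can now inspect and filter those elements
    .use(rehype2react, {createElement: React.createElement}) // hast -> React elements
}
```

With those versions, something like createRawHtmlProcessor(require('react')).processSync(markdownString).contents should yield a React element. The order matters: rehype-raw has to run before rehype-sanitize, otherwise the sanitizer only sees opaque raw nodes.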


wooorm commented on May 19, 2024

> I guess I'm still confused about the intent of this library, then.

Definitely something we should fix!

> The readme implies that the purpose is to be able to render markdown safely

True! But also that it doesn’t use .dangerouslySetInnerHTML, and including raw nodes kinda defeats that purpose.

> But as it's built, the only nodes that end up in the tree [...]

And also anything from hName, hProperties, hChildren, which could be anything, so I disagree with “making the existence of sanitization in this project both redundant and misleading”. (Although we should fix the “misleading” part)

> Is the point of this to provide a MDAST renderer that doesn't allow raw html at all, or is it to provide one that only renders sanitized HTML?

The point here is to allow a simple markdown to react renderer that is safe. That includes not rendering unsafe HTML. Not sanitising HTML at all would be super unsafe. Not being safe by default would be really bad for XSS and the like.
Raw HTML is an escape hatch for markdown, and it is inherently unsafe unless you really know what you’re doing. In that case you need to include an HTML parser, and the route is remark -> rehype -> rehype-raw -> rehype-react, which we should definitely document better!


Hamms commented on May 19, 2024

> Definitely something we should fix!

I appreciate it! :)

Can you give me an example of markdown input without raw html that would result in unsafe HTML output? I think that would help me understand the concerns this library is intended to protect against.


wooorm commented on May 19, 2024

The XSS problems stem from plugin use, for example, this tree: https://github.com/syntax-tree/hast-util-sanitize#usage
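
As a rough illustration of the plugin case: the hName, hProperties, and hChildren data fields let a plugin override what a markdown node turns into. The toElement function below is a simplified stand-in written for this example, not the real mdast-util-to-hast code, and the "plugin" output is hypothetical.

```javascript
// Simplified stand-in for the part of an mdast -> hast conversion that honours
// data.hName, data.hProperties, and data.hChildren. Illustration only.
function toElement(mdastNode, defaults) {
  const data = mdastNode.data || {}
  return {
    type: 'element',
    tagName: data.hName || defaults.tagName,
    properties: Object.assign({}, defaults.properties, data.hProperties),
    children: data.hChildren || defaults.children
  }
}

// A paragraph as a careless or malicious (hypothetical) plugin might leave it.
// Note that the markdown source itself contained no raw HTML.
const paragraph = {
  type: 'paragraph',
  children: [{type: 'text', value: 'hi'}],
  data: {
    hName: 'script',
    hChildren: [{type: 'text', value: 'alert(1)'}]
  }
}

const element = toElement(paragraph, {
  tagName: 'p',
  properties: {},
  children: [{type: 'text', value: 'hi'}]
})

console.log(element.tagName) // 'script'
console.log(element.children[0].value) // 'alert(1)'
```

Because the result is an ordinary element node rather than a raw node, this is exactly the kind of tree a sanitizer can catch.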


wooorm commented on May 19, 2024

(Or, if the user could write that HTML themselves in Markdown, which would be possible with allowDangerousHTML)

