rehypejs / rehype-sanitize

plugin to sanitize HTML

Home Page: https://unifiedjs.com

License: MIT License

Languages: JavaScript 100.00%
Topics: rehype, rehype-plugin, sanitize, clean, html

rehype-sanitize's People

Contributors

greenkeeperio-bot, pd4d10, trebeljahr, wooorm

Forkers

wiatt1706 teleki

rehype-sanitize's Issues

Schema ancestors are not respected

Subject of the issue

A schema's ancestors property is not respected.

Your environment

Steps to reproduce

Simple schema:

{
  "ancestors": {
    "li": ["ul"]
  },
  "tagNames": [
    "div",
    "ul",
    "li"
  ]
}

HTML:

<div>
  <li>List Item</li>
</div>

Expected behavior

The resulting tree should exclude the li element:

root[1]
│ data: {"quirksMode":true}
└─0 element<div>[3]
    │ properties: {}
    ├─0 text "\n  "
    ├─1 text "List Item"
    └─2 text "\n"

Actual behavior

The li tag is still included:

root[1]
│ data: {"quirksMode":true}
└─0 element<div>[3]
    │ properties: {}
    ├─0 text "\n  "
    ├─1 element<li>[1]
    │   │ properties: {}
    │   └─0 text "List Item"
    └─2 text "\n"

I investigated a bit, and the bug seems to have been introduced by this change: syntax-tree/hast-util-sanitize@19631bb#diff-92bbac9a308cd5fcf9db165841f2d90ce981baddcb2b1e26cfff170929af3bd1R252. I can open a PR to fix it if you want.
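
The expected behavior described in this issue can be sketched without any dependencies. The helper below is hypothetical and is not hast-util-sanitize's actual implementation: filterByAncestors and the simplified node shape are assumptions made for illustration. It keeps an element listed in ancestors only when one of the required names appears on the ancestor stack, and otherwise unwraps it so that its children take its place.

```javascript
// Hypothetical sketch of an ancestors check; NOT the library's real code.
// Nodes are minimal hast-like objects: {type: 'element', tagName, children}
// or {type: 'text', value}.
function filterByAncestors(node, schema, stack = []) {
  if (node.type !== 'element') return [node]

  const hasRule = Object.prototype.hasOwnProperty.call(
    schema.ancestors,
    node.tagName
  )
  const required = hasRule ? schema.ancestors[node.tagName] : undefined

  // An element with no rule is always kept; otherwise one of the
  // required names must be somewhere on the ancestor stack.
  const allowed = !required || required.some((name) => stack.includes(name))

  // Only a kept element is pushed onto the stack for its children.
  const nextStack = allowed ? [...stack, node.tagName] : stack
  const children = node.children.flatMap((child) =>
    filterByAncestors(child, schema, nextStack)
  )

  // A disallowed element is unwrapped: its children replace it.
  return allowed ? [{...node, children}] : children
}

// Schema and tree from the report (tagNames is not used by this sketch).
const schema = {ancestors: {li: ['ul']}, tagNames: ['div', 'ul', 'li']}
const tree = {
  type: 'element',
  tagName: 'div',
  children: [
    {type: 'text', value: '\n  '},
    {
      type: 'element',
      tagName: 'li',
      children: [{type: 'text', value: 'List Item'}]
    },
    {type: 'text', value: '\n'}
  ]
}

const [result] = filterByAncestors(tree, schema)
console.log(result.children.map((child) => child.type))
// prints [ 'text', 'text', 'text' ]
```

With this sketch the div ends up with three text children, which matches the tree shown under "Expected behavior".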

Some attributes explicitly allowed are being removed

Subject of the issue

I'm trying to sanitize HTML, and some explicitly allowed tags and attributes are being removed even though they are expected to remain.

Your environment

  • OS: Ubuntu 20.04
  • Packages:

The packages were installed with:

$ yarn add rehype-sanitize unified rehype-parse rehype-stringify to-vfile

version list:

$ yarn list --pattern "rehype-sanitize|unified|rehype-parse|rehype-stringify|to-vfile"
yarn list v1.22.5
├─ [email protected]
│  └─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
│  └─ [email protected]
├─ [email protected]
└─ [email protected]
  • Env: node v14.5.0

Steps to reproduce

full minimal reproduction:

original html sample (page.html):


<table class="table">
  <tbody>
    <tr>
      <td>
        <label class="text-capitalize text-center" for="firstName">
          Your first name
        </label>
      </td>
      <td>
        <input
          type="text"
          class="form-control"
          name="firstName"
          placeholder="E.g. John"
          autofocus
          required
        />
      </td>
    </tr>
    <tr>
      <td>
        <label class="text-capitalize text-center" for="lastName">
          Your last name
        </label>
      </td>
      <td>
        <input
          type="text"
          class="form-control"
          name="lastName"
          placeholder="E.g. Doe"
          autofocus
          required
        />
      </td>
    </tr>
    <tr>
      <td>
        <label class="text-capitalize text-center" for="email">
          Your work email<sup></sup>
        </label>
      </td>
      <td>
        <input
          type="email"
          class="form-control"
          name="email"
          placeholder="E.g. [email protected]"
          required
          inputmode="email"
        />
      </td>
    </tr>
    <tr>
      <td>
        <label class="text-capitalize text-center" for="country">
          Country<sup></sup>
        </label>
      </td>
      <td>
        <select class="custom-select" required name="country">
          <option value>Select your country...</option>
          <option value="us">United States</option>
          <option value="fr">France</option>
          <option value="es">Japan</option>
        </select>
      </td>
    </tr>
    <tr>
      <td colspan="2">
        <label
          class="text-capitalize text-center"
          for="text"
          style="margin-top: 10px"
        >
          Your message<sup></sup>
        </label>
        <div class="form-group">
          <textarea
            class="form-control"
            rows="10"
            placeholder="Write your message text here..."
            required
            name="text"
            spellcheck="true"
          ></textarea>
        </div>
      </td>
    </tr>
    <tr>
      <td></td>
      <td class="text-right">
        <button
          class="btn btn-outline-light btn-lg text-capitalize"
          id="submit-contact-message"
          type="submit"
        >
          Send message
        </button>
      </td>
    </tr>
  </tbody>
</table>
<picture aria-label="My label">
  <source srcset="1.webp 128w" type="image/webp" />
  <source srcset="1.png 128w" type="image/png" />
  <img class="someClass" alt="my-image" data-src="img/1.png"
/></picture>

the script that uses rehype-sanitize (index.js):

// Note: fs-extra supplies readJSONSync; it is not part of the `yarn add` line above.
const unified = require("unified");
const parser = require("rehype-parse");
const stringify = require("rehype-stringify");
const toVfile = require("to-vfile");
const fs = require("fs-extra");
const sanitize = require("rehype-sanitize");

sanitizeHTML();

// Parse page.html as a fragment, sanitize it with the custom schema,
// and write the result to sanitized.html.
function sanitizeHTML() {
  const schema = fs.readJSONSync("./sanitize-schema.json");
  unified()
    .use(parser, {
      fragment: true,
    })
    .use(sanitize, schema)
    .use(stringify)
    .process(toVfile.readSync("./page.html"), (err, file) => {
      if (err) {
        // Rethrow the original error so its stack trace is preserved.
        throw err;
      }
      fs.writeFileSync("./sanitized.html", String(file));
    });
}

the JSON schema I'm using (sanitize-schema.json):

{
  "strip": ["script"],
  "clobberPrefix": "user-content-",
  "clobber": [],
  "ancestors": {
    "tbody": ["table"],
    "tfoot": ["table"],
    "thead": ["table"],
    "td": ["table"],
    "th": ["table"],
    "tr": ["table"],
    "li": ["ol", "ul"]
  },
  "protocols": {
    "href": ["http", "https", "mailto", "xmpp", "irc", "ircs"],
    "cite": ["http", "https"],
    "src": ["http", "https"],
    "longDesc": ["http", "https"]
  },
  "tagNames": [
    "h1",
    "h2",
    "h3",
    "h4",
    "h5",
    "h6",
    "br",
    "b",
    "i",
    "strong",
    "em",
    "a",
    "pre",
    "code",
    "img",
    "tt",
    "div",
    "ins",
    "del",
    "sup",
    "sub",
    "p",
    "ol",
    "ul",
    "table",
    "thead",
    "tbody",
    "tfoot",
    "blockquote",
    "dl",
    "dt",
    "dd",
    "kbd",
    "q",
    "samp",
    "var",
    "hr",
    "ruby",
    "rt",
    "rp",
    "li",
    "tr",
    "td",
    "th",
    "s",
    "strike",
    "summary",
    "details",
    "caption",
    "figure",
    "figcaption",
    "abbr",
    "bdo",
    "cite",
    "dfn",
    "mark",
    "small",
    "span",
    "time",
    "wbr",
    "input",
    "aside",
    "body",
    "button",
    "cite",
    "details",
    "footer",
    "head",
    "header",
    "html",
    "label",
    "link",
    "main",
    "meta",
    "nav",
    "picture",
    "section",
    "select",
    "option",
    "source",
    "strike",
    "summary",
    "svg",
    "textarea",
    "title"
  ],
  "attributes": {
    "a": ["href"],
    "img": ["src", "longDesc"],
    "input": ["placeholder", "type", "autofocus", "required", "inputmode"],
    "li": ["className"],
    "div": ["itemScope", "itemType"],
    "blockquote": ["cite"],
    "del": ["cite"],
    "ins": ["cite"],
    "q": ["cite"],
    "*": [
      "abbr",
      "accept",
      "acceptCharset",
      "accessKey",
      "action",
      "align",
      "alt",
      "ariaDescribedBy",
      "ariaHidden",
      "ariaLabel",
      "ariaLabelledBy",
      "axis",
      "border",
      "cellPadding",
      "cellSpacing",
      "char",
      "charOff",
      "charSet",
      "checked",
      "clear",
      "cols",
      "colSpan",
      "color",
      "compact",
      "coords",
      "dateTime",
      "dir",
      "disabled",
      "encType",
      "htmlFor",
      "frame",
      "headers",
      "height",
      "hrefLang",
      "hSpace",
      "isMap",
      "id",
      "label",
      "lang",
      "maxLength",
      "media",
      "method",
      "multiple",
      "name",
      "noHref",
      "noShade",
      "noWrap",
      "open",
      "prompt",
      "readOnly",
      "rel",
      "rev",
      "rows",
      "rowSpan",
      "rules",
      "scope",
      "selected",
      "shape",
      "size",
      "span",
      "start",
      "summary",
      "tabIndex",
      "target",
      "title",
      "type",
      "useMap",
      "vAlign",
      "value",
      "vSpace",
      "width",
      "itemProp",
      "ariaControls",
      "ariaExpanded",
      "className",
      "contenteditable",
      "data*",
      "role",
      "spellcheck",
      "style"
    ],
    "link": ["href"],
    "meta": ["content", "name", "property"],
    "source": ["srcset", "type"],
    "label": ["for"],
    "textarea": ["placeholder", "required", "autofocus", "inputmode"]
  },
  "required": {},
  "allowComments": true,
  "allowDoctypes": true
}

Run script with

$ node index.js

output html (sanitized.html):

<table class="table">
  <tbody>
    <tr>
      <td>
        <label class="text-capitalize text-center" for="firstName">
          Your first name
        </label>
      </td>
      <td>
        <input
          type="text"
          class="form-control"
          name="firstName"
          placeholder="E.g. John"
          required
        />
      </td>
    </tr>
    <tr>
      <td>
        <label class="text-capitalize text-center" for="lastName">
          Your last name
        </label>
      </td>
      <td>
        <input
          type="text"
          class="form-control"
          name="lastName"
          placeholder="E.g. Doe"
          required
        />
      </td>
    </tr>
    <tr>
      <td>
        <label class="text-capitalize text-center" for="email">
          Your work email<sup></sup>
        </label>
      </td>
      <td>
        <input
          type="email"
          class="form-control"
          name="email"
          placeholder="E.g. [email protected]"
          required
        />
      </td>
    </tr>
    <tr>
      <td>
        <label class="text-capitalize text-center" for="country">
          Country<sup></sup>
        </label>
      </td>
      <td>
        <select class="custom-select" name="country">
          <option value="">Select your country...</option>
          <option value="us">United States</option>
          <option value="fr">France</option>
          <option value="es">Japan</option>
        </select>
      </td>
    </tr>
    <tr>
      <td colspan="2">
        <label
          class="text-capitalize text-center"
          for="text"
          style="margin-top: 10px"
        >
          Your message<sup></sup>
        </label>
        <div class="form-group">
          <textarea
            class="form-control"
            rows="10"
            placeholder="Write your message text here..."
            required
            name="text"
          ></textarea>
        </div>
      </td>
    </tr>
    <tr>
      <td></td>
      <td class="text-right">
        <button
          class="btn btn-outline-light btn-lg text-capitalize"
          id="submit-contact-message"
          type="submit"
        >
          Send message
        </button>
      </td>
    </tr>
  </tbody>
</table>
<picture aria-label="My label">
  <source type="image/webp" />
  <source type="image/png" />
  <img class="someClass" alt="my-image" data-src="img/1.png"
/></picture>

Things to note:

  • the autofocus attribute is removed everywhere, despite being explicitly allowed by "input": ["placeholder", "type", "autofocus", "required", "inputmode"]
  • the srcset attribute is removed everywhere, despite being allowed by "source": ["srcset", "type"]
  • the inputmode attribute is removed as well, despite being allowed by "input": ["placeholder", "type", "autofocus", "required", "inputmode"]
  • the spellcheck attribute is also removed, despite being allowed under "*"

Expected behavior

The explicitly allowed attributes should remain in the sanitized html.

Actual behavior

Some explicitly allowed attributes are removed from the sanitized output as noted above.
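
One possible explanation, offered as an assumption rather than a confirmed diagnosis: hast stores properties under camelCased names (the schema above already uses className, htmlFor and colSpan), so the lowercase entries autofocus, srcset, inputmode and spellcheck may need to be written as autoFocus, srcSet, inputMode and spellCheck. The sketch below flags such entries in a schema's attributes object. The lookup table and the findSuspectAttributes helper are hypothetical and cover only the four names from this report.

```javascript
// Assumed mapping from lowercase HTML attribute names to hast property names.
// Illustrative only: it lists just the four names mentioned in this report.
const hastPropertyNames = {
  autofocus: 'autoFocus',
  srcset: 'srcSet',
  inputmode: 'inputMode',
  spellcheck: 'spellCheck'
}

// Hypothetical helper: report schema entries that look like HTML attribute
// names instead of hast property names.
function findSuspectAttributes(attributes) {
  const suspects = []
  for (const [tagName, names] of Object.entries(attributes)) {
    for (const name of names) {
      // This sketch only inspects plain string entries.
      if (typeof name !== 'string') continue
      if (Object.prototype.hasOwnProperty.call(hastPropertyNames, name)) {
        suspects.push({tagName, found: name, suggested: hastPropertyNames[name]})
      }
    }
  }
  return suspects
}

// A shortened copy of the "attributes" section from sanitize-schema.json.
const attributes = {
  input: ['placeholder', 'type', 'autofocus', 'required', 'inputmode'],
  source: ['srcset', 'type'],
  '*': ['className', 'spellcheck', 'style']
}

for (const {tagName, found, suggested} of findSuspectAttributes(attributes)) {
  console.log(`${tagName}: "${found}" -> try "${suggested}"`)
}
```

If the assumption holds, renaming the flagged entries in sanitize-schema.json and rerunning index.js should show whether the attributes survive.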

h2 id is consistently removed

Initial checklist

Affected packages and versions

6.0.0

Link to runnable example

No response

Steps to reproduce

No build or bundle tools are used; only the npm packages needed to run this code are installed.

import {unified} from 'unified'
import rehypeParse from 'rehype-parse'
import rehypeSlug from 'rehype-slug'
import rehypeSanitize from 'rehype-sanitize'
import rehypeStringify from 'rehype-stringify'

// Add slugs to the headings first, then sanitize with the default schema.
const file = unified()
  .use(rehypeParse)
  .use(rehypeSlug)
  .use(rehypeSanitize)
  .use(rehypeStringify)
  .processSync('<h1>foo</h1><h2>bar</h2><h3>baz</h3>')

// Print the sanitized HTML.
console.log(String(file))

Expected behavior

h2 to have an id:

<h1 id="user-content-foo">foo</h1>
<h2 id="user-content-bar">bar</h2>
<h3 id="user-content-baz">baz</h3>

Actual behavior

h2 does not have an id:

<h1 id="user-content-foo">foo</h1>
<h2>bar</h2>
<h3 id="user-content-baz">baz</h3>
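
To check output like the above automatically, a small dependency-free helper can list the headings that lack an id. This is a minimal sketch: headingsWithoutId is a hypothetical name, and the regular expression is only adequate for flat samples like this one, not for arbitrary HTML.

```javascript
// Hypothetical check: list heading tags in an HTML string that have no id.
// A regular expression is enough for this flat sample; it is not an HTML parser.
function headingsWithoutId(html) {
  const missing = []
  const pattern = /<(h[1-6])\b([^>]*)>/gi
  let match
  while ((match = pattern.exec(html)) !== null) {
    const [, tagName, attributeText] = match
    // Require whitespace (or the start) before `id` so `data-id` is not counted.
    if (!/(?:^|\s)id\s*=/.test(attributeText)) {
      missing.push(tagName.toLowerCase())
    }
  }
  return missing
}

// The actual output reported above.
const actual =
  '<h1 id="user-content-foo">foo</h1>' +
  '<h2>bar</h2>' +
  '<h3 id="user-content-baz">baz</h3>'

console.log(headingsWithoutId(actual))
// prints [ 'h2' ]
```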

Runtime

Node v16

Package manager

npm 8

OS

Linux

Build and bundle tools

Other (please specify in steps to reproduce)
