bevacqua / insane

:pouting_cat: Lean and configurable whitelist-oriented HTML sanitizer

Home Page: https://ponyfoo.com

License: MIT License

JavaScript 100.00%
markdown html html-sanitizer

insane's Introduction

insane

Lean and configurable whitelist-oriented HTML sanitizer

Works well in browsers, as its footprint is very small (around 2kb gzipped). The API is inspired by sanitize-html (which is around 100kb gzipped).

You would be insane not to use this!

Install

npm install insane --save

Usage

insane('<div>foo<span>bar</span></div>', { allowedTags: ['div'] })
// <- '<div>foo</div>'

Unlike similar sanitizers, insane drops the entire subtree of any element whose tag isn't allowed.

API

insane(html, options?, strict?)

  • html can be an arbitrary HTML string
  • options are detailed below
  • strict, when set to true, means that options won't be based on insane.defaults
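
The effect of strict can be sketched as follows. This is a minimal illustration, assuming that non-strict calls shallow-merge the supplied options over insane.defaults; resolveOptions and sampleDefaults are hypothetical names, not part of insane.

```javascript
// Hypothetical helper, not part of insane's public API.
// Assumption: non-strict calls shallow-merge user options over the defaults.
function resolveOptions(defaults, options, strict) {
  if (strict === true) {
    return options || {};
  }
  return Object.assign({}, defaults, options);
}

// Trimmed stand-in for insane.defaults, for illustration only.
var sampleDefaults = {
  allowedSchemes: ['http', 'https', 'mailto'],
  allowedTags: ['a', 'p', 'span']
};

var looseResult = resolveOptions(sampleDefaults, { allowedTags: ['div'] }, false);
var strictResult = resolveOptions(sampleDefaults, { allowedTags: ['div'] }, true);

console.log(looseResult.allowedSchemes);  // inherited from the defaults
console.log(strictResult.allowedSchemes); // undefined: nothing is inherited
```

Under that assumption, a strict call needs every option it relies on spelled out, since nothing is filled in from the defaults.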

The parser takes into account that some elements can be self-closing. For safety reasons the sanitizer will only accept a valid URL for the background, base, cite, href, longdesc, src, and usemap attributes. "Valid URL" means that it begins with either #, /, or any of options.allowedSchemes (followed by :).
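
The "valid URL" rule can be sketched like this. It is only an illustration of the rule as worded here, not insane's actual code; looksLikeAllowedUrl is a hypothetical name, and the case-sensitive comparison is an assumption.

```javascript
// Hypothetical helper illustrating the documented rule; not insane's source.
// Assumption: schemes are compared case-sensitively.
function looksLikeAllowedUrl(value, allowedSchemes) {
  var first = value.charAt(0);
  if (first === '#' || first === '/') {
    return true; // fragment, root-relative, or protocol-relative URL
  }
  return allowedSchemes.some(function (scheme) {
    return value.indexOf(scheme + ':') === 0;
  });
}

var schemes = ['http', 'https', 'mailto'];
console.log(looksLikeAllowedUrl('https://ponyfoo.com', schemes)); // true
console.log(looksLikeAllowedUrl('#top', schemes));                // true
console.log(looksLikeAllowedUrl('javascript:alert(1)', schemes)); // false
```

Note that under this rule a plain relative URL such as images/cat.png is rejected, because it starts with neither #, /, nor an allowed scheme.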

options

Sensible defaults are provided. You can override specific options as needed.

allowedSchemes

Defaults to ['http', 'https', 'mailto'].

allowedTags

An array of tags that you'll allow in the resulting HTML.

Example

Only allow spans, discarding all other elements.

insane('<div>foo</div><span>bar</span>', {
  allowedTags: ['span']
});
// <- '<span>bar</span>'

allowedAttributes

An object describing the attributes you'll allow for each individual tag name.

Example

Only allow spans, and only allow those spans to have an id (discarding the rest of their attributes).

insane('<span id="bar" class="super">bar</span>', {
  allowedTags: ['span'],
  allowedAttributes: { span: ['id'] }
});
// <- '<span id="bar">bar</span>'

allowedClasses

If 'class' is listed as an allowed attribute, every single class will be allowed. If you don't list 'class' as an allowed attribute, you can provide a class whitelist per tag name.

Example

Only allow spans to have super or bad class names, discarding the rest of them.

insane('<span class="super mean and bad">bar</span>', {
  allowedTags: ['span'],
  allowedClasses: { span: ['super', 'bad'] }
});
// <- '<span class="super bad">bar</span>'
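
The per-tag class whitelist amounts to filtering the space-separated class list. Here is a minimal sketch of that idea; filterClasses is a hypothetical helper that mirrors the documented behaviour and is not taken from insane's source.

```javascript
// Hypothetical helper; mirrors the documented behaviour, not insane's source.
function filterClasses(tag, classValue, allowedClasses) {
  var allowed = allowedClasses[tag] || [];
  return classValue
    .split(' ')
    .filter(function (name) { return allowed.indexOf(name) !== -1; })
    .join(' ');
}

console.log(filterClasses('span', 'super mean and bad', { span: ['super', 'bad'] }));
// prints: super bad
```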

filter

Takes a function(token) that allows you to do additional validation beyond exact tag name and attribute matching. The token object passed to your filter contains the following properties.

  • tag is the lowercase tag name of the element
  • attrs is an object containing every attribute in the element, including those that may not be in the whitelist

If you return a falsy value the element and all of its descendants will not be included in the output. Note that you are allowed to change the attrs, and even add new ones, transforming the output.

Example

Require that <span> elements have an aria-label value.

function filter (token) {
  return token.tag !== 'span' || token.attrs['aria-label'];
}
insane('<span aria-label="a foo">foo</span><span>bar</span>', {
  allowedTags: ['span'],
  allowedAttributes: { span: ['aria-label'] },
  filter: filter
});
// <- '<span aria-label="a foo">foo</span>'
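
Since a filter may also modify attrs, it can be used to rewrite elements as well as reject them. The sketch below drops links without an href and sets rel on the rest; linkFilter and the chosen rel value are illustrative assumptions. It is exercised directly with hand-built token objects rather than through insane.

```javascript
// Example filter: drop links without an href, and force rel on the rest.
// The rel value is an illustrative choice, not something insane prescribes.
function linkFilter(token) {
  if (token.tag !== 'a') {
    return true; // leave other elements to the tag whitelist
  }
  if (!token.attrs.href) {
    return false; // the element and its descendants are dropped
  }
  token.attrs.rel = 'noopener noreferrer';
  return true;
}

// Exercising the filter directly with hand-built token objects.
var keptToken = { tag: 'a', attrs: { href: 'https://ponyfoo.com' } };
console.log(linkFilter(keptToken), keptToken.attrs.rel);
console.log(linkFilter({ tag: 'a', attrs: {} }));
```

It would be passed as filter: linkFilter in the options. Whether an attribute added by a filter must also be listed in allowedAttributes is not stated here, so listing rel for a is the safer choice.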

transformText

Takes a function(text) that allows you to modify text content in HTML elements. Runs for every piece of text content. The returned value is used instead of the original text contents.
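
As an illustration, a transformText callback might collapse runs of whitespace. The function name collapseWhitespace is a hypothetical choice, and the callback is called directly here rather than through insane.

```javascript
// Example transformText callback: collapse runs of whitespace into one space.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, ' ');
}

console.log(collapseWhitespace('foo \n\t bar')); // prints: foo bar
```

It would be passed as transformText: collapseWhitespace in the options.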

Defaults

The default configuration is used if you don't provide any. This object is available at insane.defaults. You are free to manipulate the defaults themselves.

{
  "allowedAttributes": {
    "a": ["href", "name", "target"],
    "iframe": ["allowfullscreen", "frameborder", "src"],
    "img": ["src"]
  },
  "allowedClasses": {},
  "allowedSchemes": ["http", "https", "mailto"],
  "allowedTags": [
    "a", "article", "b", "blockquote", "br", "caption", "code", "del", "details", "div", "em",
    "h1", "h2", "h3", "h4", "h5", "h6", "hr", "i", "img", "ins", "kbd", "li", "main", "ol",
    "p", "pre", "section", "span", "strike", "strong", "sub", "summary", "sup", "table",
    "tbody", "td", "th", "thead", "tr", "u", "ul"
  ],
  "filter": null,
  "transformText": null
}
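
If you would rather not mutate the shared defaults, one option is to build a per-call options object from them. The sketch below assumes an object shaped like the defaults shown above; withExtraTags and base are hypothetical names, and base is a trimmed stand-in for insane.defaults.

```javascript
// Hypothetical helper: copy a defaults-shaped object and extend the copy,
// leaving the original untouched.
function withExtraTags(defaults, extraTags, extraAttributes) {
  return Object.assign({}, defaults, {
    allowedTags: defaults.allowedTags.concat(extraTags),
    allowedAttributes: Object.assign({}, defaults.allowedAttributes, extraAttributes)
  });
}

// Trimmed stand-in for insane.defaults, for illustration only.
var base = {
  allowedTags: ['a', 'img'],
  allowedAttributes: { a: ['href', 'name', 'target'], img: ['src'] }
};

var options = withExtraTags(base, ['video'], { video: ['src', 'controls'] });
console.log(options.allowedTags); // [ 'a', 'img', 'video' ]
console.log(base.allowedTags);    // [ 'a', 'img' ] (unchanged)
```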

License

MIT

insane's People

Contributors

artskydj, bevacqua, cassiozen, markstos

insane's Issues

add typescript

This package is amazing. Please consider adding TypeScript support, or accepting a pull request that does it.

Allow everything except <script>?

Is there any way to say that I accept EVERYTHING except something specific? I would like to allow any kind of class, tag, and so on, with the sole exception of the <script> tag.

Thank you very much

Crashes under default config when given '<div class>'

insane('<div class>');

You'll get a TypeError: Cannot read property 'split' of undefined in sanitizer.js from:

if (lkey === 'class' && attrsOk.indexOf(lkey) === -1) {
  value = value.split(' ').filter(isValidClass).join(' ').trim();
  valid = value.length;
} else {

This is because value is undefined. I'd suggest the two lines be changed to:

if (value) {
  value = value.split(' ').filter(isValidClass).join(' ').trim()
}
valid = !value || value.length;
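
One detail worth checking in the suggested change: if every class is filtered out, value becomes an empty string, so !value is true and the attribute would count as valid, whereas the original code treated it as invalid. A stand-alone sketch of a variant that only special-cases the missing value follows. sanitizeClassValue is a hypothetical name, isValidClass is passed in as a parameter here, and treating a valueless class attribute as valid is an assumption carried over from the suggestion.

```javascript
// Stand-alone sketch, not the code in sanitizer.js.
// Assumption: a valueless attribute (as in <div class>) is treated as valid.
function sanitizeClassValue(value, isValidClass) {
  if (value === undefined) {
    return { value: value, valid: true };
  }
  var kept = value.split(' ').filter(isValidClass).join(' ').trim();
  // An attribute left with no classes is still reported as invalid.
  return { value: kept, valid: kept.length > 0 };
}

function onlySuper(name) { return name === 'super'; }

console.log(sanitizeClassValue(undefined, onlySuper).valid);    // true
console.log(sanitizeClassValue('super mean', onlySuper).value); // super
console.log(sanitizeClassValue('mean', onlySuper).valid);       // false
```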

Consider `createHTMLDocument` instead of custom parser

insane has a great API. These days, though (if IE8 and older can be ignored), I would humbly submit that better implementations are possible.

The browser is already good at parsing HTML. Why not scrap the homegrown parser and just let the browser do what it's good at?

var doc = document.implementation.createHTMLDocument();
doc.body.innerHTML = inputHTMLString;

Boom. Now doc.body is your parsed dom.

And a sanitizer is really just a mechanism for querying the DOM, which browsers can also do pretty well:

var nodes = doc.querySelectorAll(elementsToDrop.join(','));

All in all, insane could quickly be implemented thusly:

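function insane (html, options, strict) {
  var doc = document.implementation.createHTMLDocument('');
  doc.body.innerHTML = html;

  // Query inside body only, so that <html>, <head> and <body> are never removed.
  // Note: a selector list inside :not() needs a browser with Selectors Level 4 support.
  var selector = ':not(' + options.allowedTags.join(',') + ')';
  // NodeList has no .each(); borrow Array.prototype.forEach instead.
  Array.prototype.forEach.call(doc.body.querySelectorAll(selector), function (node) {
    node.parentNode.removeChild(node);
  });

  return doc.body.innerHTML;
}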

Granted, this implementation doesn't meet the full api, but with some minor changes, it could very well do so. And would come in significantly smaller :)

svg viewBox is blank

insane(dirty, {
  "allowedTags": ["svg", "path"],
  "allowedAttributes": {
    "svg": ["width", "height", "viewBox", "preserveAspectRatio"],
    "path": ["d"]
  }
})

result:

<svg width="400em" height="1.08em"><path d="M95,702
c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14
c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54
c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10
s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429
c69,-144,104.5,-217.7,106.5,-221
l0 -0
c5.3,-9.3,12,-14,20,-14
H400000v40H845.2724
s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7
c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z
M834 80h400000v40h-400000z"></path></svg>

viewBox and preserveAspectRatio are missing from the output.

XSS Attack Vulnerable

Hi,

I noticed that you guys don't have a security policy so I wasn't sure where to put this.

I have a demo of an XSS attack against this library and wanted to make sure it gets addressed, since this package is about sanitizing markdown to prevent XSS attacks.

My email is [email protected].

If I don't get an email in a few days I'll post the demo & code here.

Need to include bundled version of the library

Hi!
It would be great if you'd include a bundled version of the library, for example in a dist directory. It is useful for those who want to use your library in environments that don't fully support the CommonJS format.

Yes, I know I can always bundle the code with browserify, but it would be more convenient to have the bundled version "out of the box" after npm i insane.

HTML Parser should handle multiple attributes without spaces

const testHtml = '<a target="_blank" rel="noopener noreferrernofollow"href="https://tusi.cn/u/639278639053211787"><strong>→→→→→→more←←←←←←</strong></a>'

const result = insane(testHtml)

console.log({result}) // it will be '', not the result i expect '<a target="_blank" href="https://tusi.cn/u/639278639053211787"><strong>→→→→→→more←←←←←←</strong></a>'
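
For illustration, here is a minimal sketch of attribute scanning that does not require whitespace between a closing quote and the next attribute name. It is not insane's parser; ATTR and parseAttributes are hypothetical names, and the sketch only handles the attribute portion of a start tag.

```javascript
// Hypothetical attribute scanner, not insane's parser. A name is followed by an
// optional double-quoted, single-quoted, or unquoted value, and no whitespace
// is required before the next attribute name.
var ATTR = /([^\s"'<>\/=]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'=<>]+)))?/g;

function parseAttributes(source) {
  var attrs = {};
  var match;
  ATTR.lastIndex = 0;
  while ((match = ATTR.exec(source)) !== null) {
    var value = match[2] !== undefined ? match[2]
      : match[3] !== undefined ? match[3]
      : match[4] !== undefined ? match[4]
      : ''; // valueless attribute
    attrs[match[1].toLowerCase()] = value;
  }
  return attrs;
}

var parsed = parseAttributes(
  'target="_blank" rel="noopener noreferrernofollow"href="https://tusi.cn/u/639278639053211787"'
);
console.log(Object.keys(parsed)); // [ 'target', 'rel', 'href' ]
console.log(parsed.href);         // https://tusi.cn/u/639278639053211787
```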
