jitbit / HtmlSanitizer
Fast JavaScript HTML Sanitizer, client-side (i.e. needs a browser, won't work in Node and other backends)
Home Page: https://www.jitbit.com
License: MIT License
I want to have a very restrictive tag list:
const _tagWhitelist = { 'A': true, 'B': true, 'BODY': true, 'BR': true, };
But if I remove 'BODY': true,
I get a JavaScript error in the console saying:
Uncaught TypeError: resultElement.innerHTML is undefined
I see the license file is MIT, but the HtmlSanitizer.js file still says:
//License: GNU GPL v3 https://github.com/jitbit/HtmlSanitizer/blob/master/LICENSE
Perhaps you want to make them match?
Thanks for posting your work.
I am trying to sanitize a small bit of HTML that includes data attributes; the HTML is built from a JSON response from a 3rd-party API.
Is there a way to whitelist any data attribute, e.g. data-{attribute-name}, or do I need to add every data attribute I use to the attribute whitelist?
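One possible approach to the question above is a prefix check next to the exact-name lookup. The sketch below is only an illustration: `isAttributeAllowed`, `attributeWhitelist`, and `DATA_ATTRIBUTE` are hypothetical names, not part of HtmlSanitizer, and the sample whitelist entries are assumptions.

```javascript
// Hypothetical helper, not part of HtmlSanitizer.
// Sample whitelist entries are assumptions for illustration only.
const attributeWhitelist = { href: true, src: true, style: true };

// Matches data-* attribute names such as "data-user-id".
const DATA_ATTRIBUTE = /^data-[a-z0-9_.\-]+$/i;

// Returns true when the attribute is listed explicitly or is a data-* attribute.
function isAttributeAllowed(name, whitelist = attributeWhitelist) {
  const lower = String(name).toLowerCase();
  // Strict comparison avoids matching inherited properties like "constructor".
  return whitelist[lower] === true || DATA_ATTRIBUTE.test(lower);
}
```

Inside an attribute-copying loop, a check like `if (isAttributeAllowed(attr.name))` could then stand in for a plain whitelist lookup. Whether every data attribute should be accepted depends on how the page's own scripts consume them.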
It would be great if this were published as an ES6 module, with proper exports, so I can just do:
import HtmlSanitizer from '@jitbit/htmlsanitizer'
Executing SanitizeHtml on HTML that contains DOM clobbering leads to unexpected results:
HtmlSanitizer.SanitizeHtml("<img name=createElement>");
// Result: TypeError: iframedoc.createElement is not a function
// Expected: "<img>"
HtmlSanitizer.SanitizeHtml("<p>Hello world!</p><img name=body>")
// Result: ""
// Expected: "<p>Hello world!</p><img>"
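The failures above come from named elements shadowing built-in properties on the document instance. A minimal sketch of one possible mitigation is to read the built-ins from the prototype instead of from the instance. The helper names `createElementSafely` and `getBodySafely` are hypothetical, and the sketch assumes the browser exposes `createElement` and the `body` getter on `Document.prototype`.

```javascript
// Sketch only: these helpers are hypothetical and not part of HtmlSanitizer.
// They take built-ins from the prototype so that a named element such as
// <img name=createElement> cannot shadow them on the document instance.
// DocumentCtor defaults to the browser's Document; it is a parameter only
// so the helpers can be exercised outside a browser.
function createElementSafely(doc, tagName, DocumentCtor = Document) {
  return DocumentCtor.prototype.createElement.call(doc, tagName);
}

function getBodySafely(doc, DocumentCtor = Document) {
  // The body accessor is assumed to live on the prototype, not the instance.
  const descriptor = Object.getOwnPropertyDescriptor(DocumentCtor.prototype, "body");
  return descriptor.get.call(doc);
}
```

In the sanitizer, calls such as `iframedoc.createElement('DIV')` and `iframedoc.body` would be the places where helpers like these could be tried.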
It's not a big deal if either/both versions aren't provided, as we can make them ourselves. However, it would be appreciated if they were included as a simple gesture, and many developers would benefit from it.
I get the following error:
ERROR TypeError: Cannot read property 'replace' of undefined
at Object.HtmlSanitizer.SanitizeHtml (HtmlSanitizer.js:92)
The line number might be off by one, as I removed the log statement at the top.
var resultElement = makeSanitizedCopy(iframedoc.body);
document.body.removeChild(iframe);
return resultElement.innerHTML
.replace(/<br[^>]*>(\S)/g, "<br>\n$1")
.replace(/div><div/g, "div>\n<div"); //replace is just for cleaner code
resultElement is not null, but the .innerHTML property doesn't exist here. Debugging in Chrome shows that resultElement is of type #document-fragment, which doesn't have an innerHTML property.
It happened because the script went into the else branch of makeSanitizedCopy() for the first node:
} else {
newNode = document.createDocumentFragment();
}
I'm guessing this is because I emptied all the whitelists and only left in my custom tags. The HTML that goes into it presumably starts with a tag that is no longer whitelisted.
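A minimal sketch of one way to avoid the missing-innerHTML error, assuming the result may be either an element or a document fragment: move a fragment into a temporary container before serializing. `serializeNode` is a hypothetical helper, not part of HtmlSanitizer.

```javascript
// Hypothetical helper, not part of HtmlSanitizer.
// Elements expose innerHTML; document fragments do not, so a fragment
// is appended to a temporary container that is serialized instead.
function serializeNode(node, doc) {
  if (typeof node.innerHTML === "string") {
    return node.innerHTML;
  }
  const container = doc.createElement("div");
  container.appendChild(node); // appending a fragment moves its children
  return container.innerHTML;
}
```

With a guard like this, `serializeNode(resultElement, document)` would return an empty string for an empty fragment rather than throwing on `.replace`.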
When sanitizing HTML that contains many image tags, this code wants to load all the images (presumably due to loading them into the DOM). This makes it actually a good deal slower than a pure-JS solution like sanitize-html.
I don't know if there's actually anything you can do about this (maybe you can turn off image loading for the iframe in question?)
For people who would like to add svg support, you need to add it in lower case, as well as path, to the whitelist.
A better solution is to change makeSanitizedCopy to get an upper-case version of the tag name; in that case we can use SVG and PATH:
function makeSanitizedCopy(node) {
    let newNode, nodeTagName = (node.tagName||"").toUpperCase();
    if (node.nodeType == Node.TEXT_NODE) {
        newNode = node.cloneNode(true);
    } else if (node.nodeType == Node.ELEMENT_NODE && (tagWhitelist_[nodeTagName] || contentTagWhiteList_[nodeTagName])) {
        //remove useless empty spans (lots of those when pasting from MS Outlook)
        if ((nodeTagName == "SPAN" || nodeTagName == "B" || nodeTagName == "I" || nodeTagName == "U")
            && node.innerHTML.trim() == "") {
            return document.createDocumentFragment();
        }
        if (contentTagWhiteList_[nodeTagName])
            newNode = iframedoc.createElement('DIV'); //convert to DIV
        else
            newNode = iframedoc.createElement(nodeTagName);
        for (let i = 0; i < node.attributes.length; i++) {
            let attr = node.attributes[i];
            if (attributeWhitelist_[attr.name]) {
                if (attr.name == "style") {
                    for (let s = 0; s < node.style.length; s++) {
                        let styleName = node.style[s];
                        if (cssWhitelist_[styleName])
                            newNode.style.setProperty(styleName, node.style.getPropertyValue(styleName));
                    }
                }
                else {
                    if (uriAttributes_[attr.name]) { //if this is a "uri" attribute, that can have "javascript:" or something
                        if (attr.value.indexOf(":") > -1 && !startsWithAny(attr.value, schemaWhiteList_))
                            continue;
                    }
                    newNode.setAttribute(attr.name, attr.value);
                }
            }
        }
        for (let i = 0; i < node.childNodes.length; i++) {
            let subCopy = makeSanitizedCopy(node.childNodes[i]);
            newNode.appendChild(subCopy, false);
        }
    } else {
        newNode = document.createDocumentFragment();
    }
    return newNode;
}
Don't forget to also add the attributes (like xmlns, viewbox, d, fill, …)
We were using HtmlSanitizer.js for XSS protection in the front end. But in a recent scan, Veracode flagged the sanitizer itself as an XSS flaw, specifically L44, where it sends the input to the sandbox iframe.
HtmlSanitizer/HtmlSanitizer.js, line 44 in eff4209
So I'm wondering, is setting iframe['sandbox'] = 'allow-same-origin'; sufficient to prevent the iframe itself from becoming vulnerable, so that I can tell Veracode "I know that, it's safe"?
Given that this code is heavily inspired by this SO answer, and that code from SO is licensed as CC BY-SA 3.0 and requires attribution, should the license for this library change? Probably to CC-BY-SA 3.0 as well?