Comments (16)
Currently, the entire element is being removed. Do you have an example at hand where your request would make sense? I'd have a look then.
from dompurify.
@codylindley Do you have a specific example of where and how you would need this feature? If not, I would be tempted to close this ticket; I cannot see a use case right now.
I'm actually not aware of an XSS tool that does not permit leaving the text node in place while removing the element node. For example, I might not want to allow text wrapped in a span. I would think that since DOMPurify can remove a span, I should have the option to leave the text node inside it while removing the span. Make sense?
@codylindley Kinda does ;) Now I am wondering: what would a config API look like? How would you expect to configure it from a developer's perspective? If you give me a specific example, I can start thinking about an implementation. So far it's still a bit too vague.
Have you seen js-xss?
Here is a look at how they filter out element nodes, but leave text nodes. And it's optional.
https://www.dropbox.com/s/haepbg2w8277dij/Screenshot%202014-03-26%2009.22.47.png
I would think this would be as simple as something like...
var clean = DOMPurify.sanitize(dirty, {
    ALLOWED_TAGS: ['b', 'q'],
    ALLOWED_ATTR: ['style'],
    KEEP_TEXT: true,
    REMOVE_TEXT: ['script']
});
Ah, that's a good example, thx! I didn't know that library yet. Your API suggestion is neat and sufficiently clean, although I am not a big fan of the REMOVE_TEXT blacklist.
I'll have a look.
@codylindley I had a look and I am tempted to reject this feature request.
So far I see only one way to do it safely, and that would involve iterating over the nodes twice instead of once. We'd first have to flag elements for deletion and then go over the tree again and remove them while letting their harmless children and text nodes survive. And even then I believe we'd have to go from the deepest node upwards to the "thicker branches", which might end up being a performance killer. I am, however, open to alternative ways to solve it. If you have thoughts on that: appreciated!
However, your request helped spot a bug: DOMPurify was too intolerant of content starting with a TextNode or consisting entirely of TextNodes. That is being fixed, thx!
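The two-pass idea described above can be sketched on a simplified tree model. This is only an illustration, not DOMPurify's code: the nodes are plain objects rather than real DOM nodes, the names flagDisallowed, unwrapFlagged, keepTextSketch and serialize are hypothetical, and the root container is assumed to be always allowed.

```javascript
// Simplified tree model (plain objects, NOT real DOM nodes):
//   element:   { tag: 'span', children: [...] }
//   text node: { text: '...' }

// Pass 1: flag every element whose tag is not on the allow-list.
function flagDisallowed(node, allowed) {
    if (node.text !== undefined) { return; } // text nodes are never flagged
    node.flagged = !allowed.has(node.tag);
    node.children.forEach(function (child) { flagDisallowed(child, allowed); });
}

// Pass 2: post-order walk, so the deepest nodes are handled first.
// A flagged element is replaced by its already-cleaned children.
function unwrapFlagged(node) {
    if (node.text !== undefined) { return [node]; }
    var kept = [];
    node.children.forEach(function (child) {
        kept = kept.concat(unwrapFlagged(child));
    });
    if (node.flagged) { return kept; } // drop the element, keep its content
    return [{ tag: node.tag, children: kept }];
}

// Hypothetical entry point; the root's own tag is treated as allowed.
function keepTextSketch(root, allowedTags) {
    var allowed = new Set(allowedTags.concat([root.tag]));
    flagDisallowed(root, allowed);
    return unwrapFlagged(root)[0];
}

// Naive serializer for the model only (no escaping, no attributes).
function serialize(node) {
    if (node.text !== undefined) { return node.text; }
    return '<' + node.tag + '>' + node.children.map(serialize).join('') + '</' + node.tag + '>';
}

var tree = { tag: 'div', children: [
    { tag: 'span', children: [
        { text: 'Hello ' },
        { tag: 'b', children: [{ text: 'world' }] }
    ] },
    { tag: 'i', children: [{ text: '!' }] }
] };

var result = serialize(keepTextSketch(tree, ['b']));
console.log(result); // <div>Hello <b>world</b>!</div>
```

The sketch leaves out everything that makes the real problem hard: attributes, elements such as script whose text should not survive, and the mutation-based attacks a real DOM is exposed to.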
@cure53 I've been presented with this problem in an SO question where the user wanted to remove anchor elements but keep their text content. You can see my answer there with a couple of ways to implement that, though I'm not sure whether those are applicable to how DOMPurify works.
About performance: this option could be disabled by default, so there should be nearly no performance penalty, right?
I was actually going to recommend DOMPurify for the task in the question, then I got quite stumped that DOMPurify does not yet have an option to keep text nodes from removed elements.
Also, HTML Purifier keeps text nodes when removing disallowed tags. GitHub's and Stack Overflow's Markdown editors also keep text nodes when removing unsafe tags.
To get the described effect, the user can allow the a element but remove its href attribute. What remains are deactivated links with their text preserved. If the user really needs to remove the entire link, they could use a regex to do so.
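A minimal sketch of that workaround, assuming the ALLOWED_TAGS and ALLOWED_ATTR options behave as they are used elsewhere in this thread. The input string is made up for illustration, and '#text' is listed explicitly on the assumption that text nodes need to be allowed alongside a custom tag list.

```javascript
// Assumed input; any markup containing links would do.
var dirty = '<a href="http://example.com/">a link</a> and some text';

// 'a' stays allowed, while 'href' is simply not on the attribute allow-list.
// '#text' is listed so that text nodes survive a custom tag list (assumption).
var config = {
    ALLOWED_TAGS: ['a', '#text'],
    ALLOWED_ATTR: []
};

// Only runs where DOMPurify is actually loaded, e.g. in a browser page.
if (typeof DOMPurify !== 'undefined') {
    console.log(DOMPurify.sanitize(dirty, config));
}
```

The expected effect is an anchor without a target, with its text left in place. The exact output depends on the DOMPurify version in use.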
This feature would be easy to implement for a flat DOM, e.g. a link with no child elements. For a working implementation one would have to iterate recursively over all child elements and check their type. The given example on SO does not do that sufficiently. It also juggles with innerHTML and thereby opens a can of worms for mXSS attacks. Think, for example, of <a href="">Foo<a><svg><style>*{font-family:'<iframe/>'}</style></svg><a>bar</a>.
I believe that in a scenario where a feature like that is not under attack and only has to deal with a simple, flat DOM, it's not a problem. In our case, however, I think it would enable several attacks and is not feasible. Thus I need to reject it.
Just for the record, knowing it's not a security tool: the function you linked is vulnerable to XSS too:
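// The <img onerror> payload sits outside the anchors, so unwrapping the anchors never touches it.
var content = "<a href=\"1\">I</a> was going <img src=x onerror=alert(1)>here and then <a href=\"that\">that</a> happened.";
var container = document.createElement('div');
container.innerHTML = content; // untrusted markup is parsed here
var anchors = container.getElementsByTagName('a'),
    anchor;
// getElementsByTagName returns a live collection, so anchors[0] is always the next remaining anchor.
while (anchor = anchors[0]) {
    var anchorParent = anchor.parentNode;
    // Move every child of the anchor in front of it, then drop the empty anchor.
    while (anchor.firstChild) {
        anchorParent.insertBefore(anchor.firstChild, anchor);
    }
    anchorParent.removeChild(anchor);
}
var clean = container.innerHTML; // still contains the <img> with its onerror handler
console.log(clean);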
I created #18 for discussion. Maybe an API would help here rather than adding more features to the core library.
@cure53 I'm aware that throwing arbitrary markup into innerHTML is vulnerable to XSS. If you check the comments on that answer, I mentioned running an XSS sanitizer on arbitrary markup before my sample code.
As I said, that code is far from optimal for this library; it was just an example use case that would be much better covered by a tested library than by some hackish algorithm/regex, for those who need to do it securely. =]
Can you clarify? Should KEEP_CONTENT: true ... keep the text node?
It does not work for me if that is what it's supposed to do: http://jsfiddle.net/codylindley/9SxCb/
Haha yes, I can ;) Right now we classify text nodes as actual elements, which might not be the most intuitive. This would work just fine (note the extra #text):
var foo = DOMPurify.sanitize('123<h2>test</h2><p>test</p>', {
    ALLOWED_TAGS: ['#text', 'strong', 'a', 'table', 'td', 'tr', 'tbody', 'thead', 'tfoot', 'th', 'li', 'ul', 'ol', 'br', 'p'],
    // Each attribute name is its own array entry, not one comma-separated string.
    ALLOWED_ATTR: ['href', 'title', 'style', 'rows', 'rowspan', 'width', 'height', 'rel', 'align', 'cite', 'cols', 'colspan', 'target'],
    KEEP_CONTENT: true
});
console.log(foo);
I'll deploy a fix later today. The current behavior is fine when people don't pass in their own tags, but fairly counter-intuitive in your scenario and probably many others. Nice spot!
@codylindley Done!
Thanks. Yeah, after I sent the fiddle I noticed that in the source, and sure enough #text was the key. Nice work all around here!
Thx :)