GithubHelp home page GithubHelp logo

Leave text content? about dompurify HOT 16 CLOSED

codylindley avatar codylindley commented on June 25, 2024
Leave text content?

from dompurify.

Comments (16)

cure53 avatar cure53 commented on June 25, 2024

Currently, the entire element is being removed. Do you have an example at hands where your request would make sense? I'd have a look then.

from dompurify.

cure53 avatar cure53 commented on June 25, 2024

@codylindley Do you have a specific example on where and how you would need this feature? If not I would be tempted to close this ticket - cannot see a use-case right now.

from dompurify.

codylindley avatar codylindley commented on June 25, 2024

I'm actually not aware of a XXS tool that does not permit the configuration of leaving the text node while removing the element node. For example, I might not want to allow text wrapped with a span. I would think since DOMPurify can remove a span I should have the option to leave the text node inside of it, while removing the span. Make sense?

from dompurify.

cure53 avatar cure53 commented on June 25, 2024

@codylindley Kinda does ;) Now I am wondering: how would a config API look like? How would you expect to configure it from a developer's perceptive? if you give me a specific example, i can start thinking about an implementation - so far it's still a bit too vague.

from dompurify.

codylindley avatar codylindley commented on June 25, 2024

Have you seen js-xss?

Here is a look at how they filter out element nodes, but leave text nodes. And it's optional.
https://www.dropbox.com/s/haepbg2w8277dij/Screenshot%202014-03-26%2009.22.47.png

I would think this would be as simple as something like...

var clean = DOMPurify.sanitize(dirty, {
ALLOWED_TAGS: ['b', 'q'], 
ALLOWED_ATTR: ['style'],
KEEP_TEXT: true,
REMOVE_TEXT: ['script']
});

from dompurify.

cure53 avatar cure53 commented on June 25, 2024

Ah, that's a good example, Thx! And I didn't know that library yet. Your API suggestion is neat, sufficiently clean although I am not a big fan of the REMOVE_TEXT blacklist.

I'll have a look.

from dompurify.

cure53 avatar cure53 commented on June 25, 2024

@codylindley I had a look and I am tempted to reject this feature request.

So far I see only one way to do it safely, and that would involve iterating over the nodes twice instead of once. We'd first have to flag elements for deletion and then go over the tree again and remove them while having their harmless children and text nodes survive. And even then I believe we'd have to go from the deepest node upwards to the "thicker branches" which might end up in a performance killer. I am however open for alternative ways to solve it. If you have thoughts on that: appreciated!

However your request helped spotting a bug; DOMPurify was too intolerant about content starting with TextNode or consisting entirely of TextNode - that is being fixed, thx!

from dompurify.

UltCombo avatar UltCombo commented on June 25, 2024

@cure53 I've been presented with this problem in a SO question where the user wanted to remove anchor elements, but keeping their text content. You can see my answer there with a couple ways to implement that, though I'm not sure if those are applicable to how DOM Purify works.

About performance: this option could be disabled by default, so there should be nearly no performance penalty, right?

I was actually going to recommend DOM Purify for the task in the question, then I got quite stumped that DOM Purify does not have an option to keep text nodes from removed elements yet.

Also, HTML Purifier keeps text nodes when removing not allowed tag names. Github and Stack Overflow's markdown editors also keep text nodes when removing unsafe tags.

from dompurify.

cure53 avatar cure53 commented on June 25, 2024

To get the described effect, the user can allow a but remove href. What remains are deactivated links, text preserved. If the user really needs to remove the entire link, they could use a regex to do so.

This feature would be easy to implement for a flat DOM - e.g. a link with no child elements. For a working implementation one would have to iterate recursively over all child elements and check their type. The given example on SO does not do that sufficiently. It also juggles with innerHTML and thereby opens a can of worms for mXSS attacks. Think for example <a href="">Foo<a><svg><style>*{font-family:'<iframe/>'}</style></svg><a>bar</a>.

I believe in a scenario where a feature like that would not be under attack and confronted with a simple and flat DOM it's not a problem. I our case however I think it will enable several attacks and is not feasible. Thus I need to reject it.

Just for the record, knowing it's not a security tool; the function you linked is vulnerable against XSS too:

var content = "<a href=\"1\">I</a> was going <img src=x onerror=alert(1)>here and then <a href=\"that\">that</a> happened.";

var container = document.createElement('div');
container.innerHTML = content;

var anchors = container.getElementsByTagName('a'),
    anchor;

while (anchor = anchors[0]) {
    var anchorParent = anchor.parentNode;

    while (anchor.firstChild) {
        anchorParent.insertBefore(anchor.firstChild, anchor);
    }
    anchorParent.removeChild(anchor);
}

var clean = container.innerHTML;
console.log(clean);

from dompurify.

cure53 avatar cure53 commented on June 25, 2024

I created #18 for discussion. Maybe an API would help here rather than adding more features to the core library.

from dompurify.

UltCombo avatar UltCombo commented on June 25, 2024

@cure53 I'm aware that throwing arbitrary markup into innerHTML is vulnerable to XSS. If you check the comments on that answer, I mentioned running a XSS sanitizer in case of arbitrary markup before my sample code example.

As I said, that code is far from optimal for this library, it was just an example use case which would be much better covered by a tested library than some hack'ish algorithm/regex for those that need to do it securely. =]

from dompurify.

codylindley avatar codylindley commented on June 25, 2024

Can you clarify? Should the KEEP_CONTENT: true ... keep the text node?

Does not work for me if that is what it's suppose to do: http://jsfiddle.net/codylindley/9SxCb/

from dompurify.

cure53 avatar cure53 commented on June 25, 2024

Haha yes, I can ;) Right now we classify text nodes as actual elements. Which might not be the most intuitive. This would work just fine (note the extra #text):

var foo = DOMPurify.sanitize('123<h2>test</h2><p>test</p>',
    {
        ALLOWED_TAGS: ['#text','strong','a','table','td','tr','tbody','thead','tfoot','th','li','ul','ol','br','p'], 
        ALLOWED_ATTR: ['href,title,style,rows,rowspan,width,height,rel,align,cite,cols,colspan,target'],
        KEEP_CONTENT: true
});
console.log(foo);

I'll deploy a fix later today, it's fine when people don't pass in their own tags but fairly counter-intuitive in your and probably many other scenarios. Nice spot!

from dompurify.

cure53 avatar cure53 commented on June 25, 2024

@codylindley Done!

from dompurify.

codylindley avatar codylindley commented on June 25, 2024

Thanks. Yeah. After I sent the fiddle notice that in the source and sure enough #text was the key. Nice work all around here!

from dompurify.

cure53 avatar cure53 commented on June 25, 2024

Thx :)

from dompurify.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.