xerc / html-sanitizer

This project forked from typo3/html-sanitizer

HTML sanitizer, written in PHP, aiming to provide XSS-safe markup based on explicitly allowed tags, attributes and values.

License: MIT License

TYPO3 HTML Sanitizer

ℹī¸ Common safe HTML tags & attributes as given in \TYPO3\HtmlSanitizer\Builder\CommonBuilder still might be adjusted, extended or rearranged to more specific builders.

In a Nutshell

The typo3/html-sanitizer package aims to be a standalone component that can be used by any PHP-based project or library. Although it is released within the TYPO3 namespace, it is agnostic to the specifics of TYPO3 CMS.

  • \TYPO3\HtmlSanitizer\Behavior contains the declarative settings for a particular HTML sanitizing process.
  • Implementations of \TYPO3\HtmlSanitizer\Visitor\VisitorInterface do the actual work based on the declared Behavior; multiple different visitors can exist at the same time. Visitors can modify nodes or mark them for deletion.
  • \TYPO3\HtmlSanitizer\Sanitizer can be considered the working instance: it invokes the visitors and parses and serializes HTML. In general, this instance does not contain much logic on how to handle particular nodes, attributes or values.
  • \TYPO3\HtmlSanitizer\Builder\BuilderInterface can be used to create multiple different builder instances that act as "presets". A builder declares a particular Behavior, initializes the VisitorInterface instances, and finally returns a ready-to-use Sanitizer instance.

Installation

composer req typo3/html-sanitizer

Example & API

<?php
use TYPO3\HtmlSanitizer\Behavior;
use TYPO3\HtmlSanitizer\Sanitizer;
use TYPO3\HtmlSanitizer\Visitor\CommonVisitor;

require_once 'vendor/autoload.php';

$commonAttrs = [
    new Behavior\Attr('id'),
    new Behavior\Attr('class'),
    new Behavior\Attr('data-', Behavior\Attr::NAME_PREFIX),
];
$hrefAttr = (new Behavior\Attr('href'))
    ->addValues(new Behavior\RegExpAttrValue('#^https?://#'));

// attention: only `Behavior` implementation uses immutability
// (invoking `withFlags()` or `withTags()` returns new instance)
$behavior = (new Behavior())
    ->withFlags(Behavior::ENCODE_INVALID_TAG)
    ->withTags(
        (new Behavior\Tag('div', Behavior\Tag::ALLOW_CHILDREN))
            ->addAttrs(...$commonAttrs),
        (new Behavior\Tag('a', Behavior\Tag::ALLOW_CHILDREN))
            ->addAttrs($hrefAttr, ...$commonAttrs),
        (new Behavior\Tag('br'))
    );

$visitors = [new CommonVisitor($behavior)];
$sanitizer = new Sanitizer(...$visitors);

$html = <<< EOH
<div id="main">
    <a href="https://typo3.org/" data-type="url" wrong-attr="is-removed">TYPO3</a><br>
    (the <span>SPAN, SPAN, SPAN</span> tag shall be encoded to HTML entities)
</div>
EOH;

echo $sanitizer->sanitize($html);

This results in the following sanitized output:

<div id="main">
    <a href="https://typo3.org/" data-type="url">TYPO3</a><br>
    (the &lt;span&gt;SPAN, SPAN, SPAN&lt;/span&gt; tag shall be encoded to HTML entities)
</div>
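
The comment in the example notes that only Behavior is immutable: withFlags() and withTags() return a new instance, so the result has to be assigned to take effect. The following is a minimal, self-contained sketch of that pattern. SketchBehavior is a hypothetical stand-in written for illustration and is not the package's actual class.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for an immutable settings object (not the real Behavior class).
final class SketchBehavior
{
    private int $flags = 0;

    // Returns a modified copy and leaves the current instance untouched.
    public function withFlags(int $flags): self
    {
        $copy = clone $this;
        $copy->flags = $flags;
        return $copy;
    }

    public function getFlags(): int
    {
        return $this->flags;
    }
}

$base = new SketchBehavior();

// Pitfall: the returned copy is discarded, so $base still has no flags set.
$base->withFlags(1);

// Correct: keep the returned instance.
$configured = $base->withFlags(1);

echo $base->getFlags(), ' ', $configured->getFlags(), PHP_EOL; // prints "0 1"
```

By contrast, the example above calls addAttrs() and addValues() on Tag and Attr objects in a chain. Going by the same comment, these are presumably not immutable.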

Behavior flags

  • Behavior::ENCODE_INVALID_TAG keeps invalid tags, but "disarms" them (see <span> in example)
  • Behavior::ENCODE_INVALID_ATTR keeps invalid attributes, but "disarms" the whole(!) tag
  • Behavior::REMOVE_UNEXPECTED_CHILDREN removes children for Tag entities that were created without explicitly using Tag::ALLOW_CHILDREN, but actually contained child nodes
  • Behavior::ALLOW_CUSTOM_ELEMENTS allows custom elements (tag names containing a hyphen -). However, it is suggested to explicitly name all known and allowed tags and to avoid using this flag.
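
More than one of these flags may be needed at the same time. Assuming they are integer bit masks that withFlags() accepts when combined with the bitwise OR operator (the text above does not state this explicitly), combining and testing them could look like the sketch below. The constant values are hypothetical stand-ins so that the snippet runs without the package installed.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in values; the real constants are defined on Behavior.
const ENCODE_INVALID_TAG = 1;
const ENCODE_INVALID_ATTR = 2;
const REMOVE_UNEXPECTED_CHILDREN = 4;
const ALLOW_CUSTOM_ELEMENTS = 8;

// Checks whether a single flag is set, using bitwise AND.
function hasFlag(int $flags, int $flag): bool
{
    return ($flags & $flag) === $flag;
}

// Combine two flags into one integer with bitwise OR.
$flags = ENCODE_INVALID_TAG | REMOVE_UNEXPECTED_CHILDREN;

var_dump(hasFlag($flags, ENCODE_INVALID_TAG));    // bool(true)
var_dump(hasFlag($flags, ALLOW_CUSTOM_ELEMENTS)); // bool(false)
```

Under the same assumption, the combined value would be passed to the package as withFlags(Behavior::ENCODE_INVALID_TAG | Behavior::REMOVE_UNEXPECTED_CHILDREN).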

License

In general, the TYPO3 core is released under the GNU General Public License version 2 or any later version (GPL-2.0-or-later). To avoid licensing issues and incompatibilities, this package is licensed under the MIT License. If you duplicate or modify the source code, credits are not required but are much appreciated.

Security Contact

If you find additional security issues in the TYPO3 project or in this package in particular, please get in touch with the TYPO3 Security Team.
