
xemlock / htmlpurifier-html5

100.0 9.0 12.0 334 KB

HTML5 support for HTMLPurifier

Home Page: https://packagist.org/packages/xemlock/htmlpurifier-html5

License: MIT License

PHP 100.00%
htmlpurifier html5-elements html5-definitions html-sanitizer html-purifier php

htmlpurifier-html5's People

Contributors

bytestream, codebymikey, fossabot, mbrodala, sherbrow, xemlock

htmlpurifier-html5's Issues

Question: is it possible to have self-closing tags?

Setting HTML.XHTML to true doesn't affect tags; however, the documentation says that

[...] in HTML5 it's used for enabling support for namespaced attributes and XML self-closing tags.

I tried to pass this option in a custom HTMLPurifier class:

namespace App\HtmlPurifier;

final class CustomPurifier extends \HTMLPurifier
{
    public function __construct($var)
    {
        $config = \HTMLPurifier_HTML5Config::createDefault();
        $config->set('HTML.XHTML', true);

        parent::__construct($config);
    }
}

Service:

services:
    # HTMLPurifier
    App\HtmlPurifier\CustomPurifier:
        tags:
            - name: exercise.html_purifier
              profile: default
    exercise_html_purifier.default: '@App\HtmlPurifier\CustomPurifier'

In the controller I get the right HTMLPurifier instance, so that part is OK. But the purify() method does not convert HTML tags to self-closing tags:

        $html = "<img src='test.png'/><hr/><br/>";
        $purifier->purify($html); // "<img src="test.png" alt="test.png"><hr><br>"

I expected it to return <img src="test.png" alt="test.png"/><hr/><br/>

Maybe I've misunderstood what HTML.XHTML is supposed to do.

0.1.8 to 0.1.9 regression: closed </p>

The version bump closes </p> tags before <a> elements, which is not a valid change:

Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'<p>I successfully installed the <a href="https://github.com/thephpleague/commonmark-ext-autolink">https://github.com/thephpleague/commonmark-ext-autolink</a> extension!</p>'
+'<p>I successfully installed the </p><a href="https://github.com/thephpleague/commonmark-ext-autolink">https://github.com/thephpleague/commonmark-ext-autolink</a><p> extension!</p>'
/home/travis/build/eventum/eventum/tests/MarkdownTest.php:51

Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'<!--https://github.com/cebe/markdown/issues/157#issuecomment-385439965--><p>here is a <a href="http://github.com">linkref</a>.<br>and <a href="http://google.com">inline</a></p>'
+'<!--https://github.com/cebe/markdown/issues/157#issuecomment-385439965--><p>here is a </p><a href="http://github.com">linkref</a><p>.<br>and </p><a href="http://google.com">inline</a>'

Allow link element

Similar to <script> it would be good to allow <link> if the href is whitelisted

I can send a PR if you'll accept it

HTML5 input types

Currently the set of allowed <input> types doesn't include HTML5 values. Also, it would be useful to be able to narrow the set of allowed input types (as requested in ezyang/htmlpurifier#213).
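As an illustration of the narrowing requested here, the following standalone PHP sketch checks a type value first against the input types defined by the HTML Living Standard and then against a caller-supplied allow-list. The constant `HTML5_INPUT_TYPES` and the function `filterInputType()` are hypothetical names; this is not how htmlpurifier-html5 implements input types, and wiring it into HTML Purifier is left out.

```php
<?php
declare(strict_types=1);

// Input types defined by the HTML Living Standard (22 values).
const HTML5_INPUT_TYPES = [
    'hidden', 'text', 'search', 'tel', 'url', 'email', 'password',
    'date', 'month', 'week', 'time', 'datetime-local', 'number', 'range',
    'color', 'checkbox', 'radio', 'file', 'submit', 'image', 'reset', 'button',
];

/**
 * Hypothetical helper: returns the normalized type if it is a known HTML5
 * input type AND appears in the (lowercase) allow-list, otherwise null.
 */
function filterInputType(string $type, array $allowed): ?string
{
    $type = strtolower(trim($type));

    // Reject anything that is not a standard input type at all.
    if (!in_array($type, HTML5_INPUT_TYPES, true)) {
        return null;
    }

    // Narrow further to the types the site actually wants to permit.
    return in_array($type, $allowed, true) ? $type : null;
}
```

Returning null for unknown or disallowed values, rather than falling back to "text" as browsers do for invalid types, leaves the decision to drop or rewrite the attribute with the caller.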

Unable to configure allowed elements with HTML.AllowedElements

Hi. HTMLPurifier cannot be configured with a list of allowed elements.
The following code produces an exception:
User Warning: Element 'picture' is not supported (for information on implementing this, see the support forums)

$config = \HTMLPurifier_HTML5Config::createDefault();
$config->set('HTML.AllowedElements', ['img', 'picture']);
$config->set('HTML.AllowedAttributes', ['img.srcset', 'img.src', 'img.sizes']);

$htmlPurifier = new \HTMLPurifier($config);
$htmlPurifier->purify($value);

(Table) Captions with Heading content results in a stripped table

In HTML5, captions may contain any flow content excluding descendant table elements (https://html.spec.whatwg.org/multipage/tables.html#the-caption-element).

Meaning:

<table>
  <caption><h3>Monthly savings</h3></caption>
  <tr>
    <th>Month</th>
    <th>Savings</th>
  </tr>
  <tr>
    <td>January</td>
    <td>$100</td>
  </tr>
</table>

is perfectly valid HTML, but it results in the entire table being stripped out instead:

<h3>Monthly savings</h3>
  Month
    Savings
  January
    $100
  

Related issue: ezyang/htmlpurifier#131

Config->set sometimes doesn't work

I have a small problem: only half of the code below works. AutoFormat.Linkify has no effect for some reason, while the other directives work perfectly.

$config = HTMLPurifier_HTML5Config::create();
// Those works:
$config->set('Attr.EnableID', true);
$config->set('Attr.ID.HTML5', true);
$config->set('Attr.AllowedFrameTargets', array('_blank','_self','_target','_top'));
$config->set('HTML.TargetBlank', true);
// This does not work
$config->set('AutoFormat.Linkify', true);

But if I update it to this, then everything works.

$config = HTMLPurifier_HTML5Config::create([
    'AutoFormat.Linkify' => true
]);
$config->set('Attr.EnableID', true);
$config->set('Attr.ID.HTML5', true);
$config->set('Attr.AllowedFrameTargets', array('_blank','_self','_target','_top'));
$config->set('HTML.TargetBlank', true);

Table without tbody

In HTML5 it is permitted to omit the <tbody> from a <table> - https://html.spec.whatwg.org/multipage/tables.html#the-tbody-element

Processing the HTML below results in an empty string.

<table><thead><tr><th>foo</th></tr></thead></table>

Example valid HTML5 (https://validator.w3.org/nu/#textarea):

<!DOCTYPE html>
<html lang="en">
<head><title>foo</title><meta http-equiv="content-type" content="text/html;charset=utf-8"></head>
<body><table><thead><tr><th>foo</th></tr></thead></table></body>
</html>

The definition is ignored in case of HTMLPurifier_Config::inherit() usage

Hi.

Thanks for the great library which saves developer lifetime 😄

I'm using this config to override the Purifier configuration in a Symfony project with exercise/htmlpurifier-bundle.

The bundle creates the parent configuration, then uses the HTMLPurifier_Config::inherit() method to create the child one. The method implementation comes from the parent class, not from HTML5Config:

    /**
     * Creates a new config object that inherits from a previous one.
     * @param HTMLPurifier_Config $config Configuration object to inherit from.
     * @return HTMLPurifier_Config object with $config as its parent.
     */
    public static function inherit(HTMLPurifier_Config $config)
    {
        return new HTMLPurifier_Config($config->def, $config->plist);
    }

As a result all the child configurations are HTMLPurifier_Config instances instead of HTMLPurifier_HTML5Config which causes errors as they don't support HTML5 tags.

I'm using a workaround inheriting the base class like:

class HTMLPurifier_AltHTML5Config extends \HTMLPurifier_HTML5Config
{
    public static function inherit(\HTMLPurifier_Config $config)
    {
        return new static($config->def, $config->plist);
    }
}

I suppose HTML5Config should override the inherit() method as well.

Thanks again.
Best wishes.
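The difference between the hard-coded `new HTMLPurifier_Config(...)` and the workaround's `new static(...)` is PHP's late static binding. The standalone sketch below uses hypothetical stand-in classes (`BaseConfig`, `Html5LikeConfig`) rather than the real HTML Purifier classes. Note that `static` resolves to the class named in the call, so the workaround only helps when the caller invokes `inherit()` on the HTML5 config class rather than on `HTMLPurifier_Config`.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins for HTMLPurifier_Config and HTMLPurifier_HTML5Config.
class BaseConfig
{
    /** @var BaseConfig|null */
    public $parent;

    public function __construct(?BaseConfig $parent = null)
    {
        $this->parent = $parent;
    }

    // Mirrors the upstream inherit(): the class name is hard-coded, so the
    // child is always a BaseConfig, even when called as Html5LikeConfig::...
    public static function inheritHardcoded(BaseConfig $config): BaseConfig
    {
        return new BaseConfig($config);
    }

    // Mirrors the workaround: `new static` uses late static binding, so the
    // child gets the class that the method was called on.
    public static function inheritLateStatic(BaseConfig $config): BaseConfig
    {
        return new static($config);
    }
}

class Html5LikeConfig extends BaseConfig
{
}

$parent = new Html5LikeConfig();

echo get_class(Html5LikeConfig::inheritHardcoded($parent)), PHP_EOL;  // BaseConfig
echo get_class(Html5LikeConfig::inheritLateStatic($parent)), PHP_EOL; // Html5LikeConfig
echo get_class(BaseConfig::inheritLateStatic($parent)), PHP_EOL;      // BaseConfig
```

The third call shows the caveat: invoked through the base class name, even the late-static version returns a base-class instance.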

Question: How to add custom tags

Hi!

When I try this code:

$config = HTMLPurifier_HTML5Config::create($initial);
$definition = $config->getHTMLDefinition(true);
$definition->addElement("oembed", "Inline", "Inline", "", []);

It throws an exception:

Message: Cannot retrieve raw definition after it has already been setup (try moving this code block earlier in your initialization)
From: .../vendor/ezyang/htmlpurifier/library/HTMLPurifier/Config.php
Line: 540

thx

ezyang comments

I'm looking at switching to this from ezyang/htmlpurifier due to growing need for HTML5 support.

Several years ago, lukusw tried to add HTML5 support to htmlpurifier for Drupal, but I think the idea lost priority and was never implemented. ezyang made some comments on lukusw's attempt, which are probably what slowed the whole thing down: https://www.drupal.org/project/htmlpurifier/issues/1321490#comment-9509073

I've been comparing lukusw's code and yours against ezyang's comments.

With this in mind, I'm hoping you can answer the questions below:

All of the HTML5 content needs to be gated, so it is only available when a user specifies an HTML5 doctype. You could try to put all of the HTML5 definitions in a new HTMLModule.

✔️ looks good

section/nav/aside/article are not Block content but Sectioning content. Flow should be redefined to include Sectioning (similar to how HTMLPurifier/HTMLModule/Text.php does Flow)

❌ Doesn't look to have changed?

header and footer need to exclude header/footer/main descendants; see the 'excludes' attribute; also an example in Text.php (pre)

❌ Doesn't look to have changed?

Ditto with address, use the same technique

❌ Doesn't look to have changed?

hgroup got removed from the HTML5 spec, so doesn't belong here.

✔️ seems fine to keep it

The figure specification doesn't look right; I think you need an asterisk after the Flow. A plain spec 'Flow' is special-cased. I suspect your specifications also exclude plain text.

❔ not sure if you've done this?

figcaption is not Inline, give it false instead.

✔️ seems fine

I'm a little worried about video tag, but the definition you've given is probably OK. I'm not sure if it should be allowed by default. Definitely autoplay should not be allowed. The contents has the same problem as figure.

✔️ allows autoplay, but otherwise seems ok

We should already have the inline elements; are the existing definitions buggy?

✔️ not sure that this is relevant... Existing definitions are gated to XHTML 1.1, so would need gated definition for html5 spec (http://htmlpurifier.org/phorum/read.php?3,8291,8514#msg-8514)

For ins/del datetime, ideally we would apply the HTML5 parse a date or time string and validate it, see http://www.w3.org/TR/html5/infrastructure.html#parse-a-date-or-time-string

✔️ seems fine

iframe allowfullscreen isn't an HTML5 attribute. And it shouldn't be allowed by default anyway, should be gated by Tricky at least.

❌ Not gated by tricky?

Element 'fieldset' is not supported

Hi!

I've installed the latest versions of HTML Purifier and your extension via Composer:

"ezyang/htmlpurifier": "^4.11",
"xemlock/htmlpurifier-html5": "^0.1.11"

The following code:

$text = '<fieldset><legend>Some title</legend><div><p>Some content</p></div></fieldset>';
$config = \HTMLPurifier_HTML5Config::createDefault();
$config->set('HTML.Allowed', 'fieldset');
$purifier = new \HTMLPurifier($config);
$purifier->purify($text);

throws an error:
Element 'fieldset' is not supported (for information on implementing this, see the support forums)

I've also tried another approach:

$text = '<fieldset><legend>Some title</legend><div><p>Some content</p></div></fieldset>';
$config = \HTMLPurifier_HTML5Config::create([
    'HTML.Allowed' => 'fieldset'
]);
$purifier = new \HTMLPurifier($config);
$purifier->purify($text);

but got the same error.

I thought this extension adds support for some HTML5 tags, including fieldset, to HTMLPurifier. Did I miss something?

Regards, Alex

Border-radius getting removed

Hi,

I am using a combination of wkhtmltopdf and htmlpurifier-html5 to generate PDFs.

The problem that I am facing at the moment is that the inline style border-radius gets removed after passing the purify() function.

Any thoughts on why it could be doing this?

Thanks.

Audio block not handled correctly when surrounded by <strong> tags

With the code below, the <strong> tag around the audio block is stripped and split into empty <strong> elements.

<?php
require_once('vendor/autoload.php');

$html = '<p><strong><audio controls="controls"><source type="audio/mp3" src="myaudiofile.mp3" /></audio></strong></p>';

echo "In: " . $html . PHP_EOL;

$config = HTMLPurifier_HTML5Config::createDefault();
$purifier = new HTMLPurifier($config);


echo "out: " . $purifier->purify($html);

Expected output:

<p><strong><audio controls><source type="audio/mp3" src="myaudiofile.mp3" /></audio></strong></p>

(Unless I'm reading it wrong, the spec says <strong> can contain "Phrasing content" which includes <audio>) http://w3c.github.io/html/single-page.html#phrasing-content-2

Actual output:

<p><strong></strong></p><audio controls><strong></strong></audio><strong></strong>

I think it's because the <strong> tag in the base library is set to allow contents of type "Inline", whereas <audio> is defined as a block in this library.

Will follow up with a PR if I find a fix today

tr@bgcolor removed

HTML Purifier supports deprecated attributes and converts them to their CSS style equivalents:

<table>
<tr bgcolor="#edeeef">
<td width="3"></td>
<td bgcolor="#f9fafa" width="1"></td>
<td bgcolor="#edeeef" width="1"></td>
<td bgcolor="#dbdee0" width="1"></td>
</tr></table>

When using this library, bgcolor seems to get stripped. I've tried adding "HTML.TidyLevel" => "heavy", but it doesn't seem to do anything. http://htmlpurifier.org/docs/enduser-tidy.html refers to the doctype, so I'm wondering whether the HTML5 doctype is why it's not working.

Allow <fieldset> and <label> in untrusted mode

Currently <fieldset> and <label> elements belong to the unsafe part of the HTML5_Forms module. When stripped of form and for attributes, they are harmless. I think hiding them behind the HTML.Trusted flag, just as other form elements (and scripts) are, is too drastic a measure.

All safe elements: <fieldset>, <label> and <progress> should be extracted to a separate module (HTML5_SafeForms?). The module should be guarded by config setting (%HTML.SafeForms), allowing it to be enabled in untrusted mode.

Also, users expect <fieldset> to be enabled by default:

Class 'HTMLPurifier_AttrDef_HTML_Bool2' not found

I hope you are doing well. :)

I'm working with HTML Purifier 4.10.0 and HTML5 Plugin version 0.1.8. Maybe I set something up wrong, but I'm getting a Class 'HTMLPurifier_AttrDef_HTML_Bool2' not found error on line 15 of \library\HTMLPurifier\HTML5Definition.php. The error is thrown during the purifying process. My calling code looks like this:

    $htmlPurifierPath = 'resources/html-purifier/htmlpurifier-4.10.0/library/HTMLPurifier.auto.php';
    $html5PluginRoot = 'resources/html-purifier/htmlpurifier-html5-0.1.8/library/HTMLPurifier';
    $html5PluginConfig = "$html5PluginRoot/HTML5Config.php";
    $html5PluginDefinition = "$html5PluginRoot/HTML5Definition.php";
    
    if (!file_exists($htmlPurifierPath)) {
        throw new Exception("HTML Purifier not found.", 500);
    }
    
    $html5 = file_exists($html5PluginConfig);

    require_once $htmlPurifierPath;
    if ($html5) {
        require_once $html5PluginConfig;
        require_once $html5PluginDefinition;
    }
    
    $pdo = connMySql();
    $config = "";
    
    if ($html5) {
        $config = HTMLPurifier_HTML5Config::createDefault();
    } else {
        $config = HTMLPurifier_Config::createDefault();
    }
    
    $purifier = new HTMLPurifier($config);
    
    $html = $purifier->purify($_POST['html']);

The code specified as the source of the error looks like the following:
// use fixed implementation of Boolean attributes, instead of a buggy
// one provided with 4.6.0
$def->manager->attrTypes->set('Bool', new HTMLPurifier_AttrDef_HTML_Bool2());

I noticed from the comment preceding line 15 that you're replacing a buggy Boolean implementation from HTML Purifier 4.6, which was released in 2013. What would happen if I commented this line out?

Thanks for your time. :)
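The snippet above requires only HTML5Config.php and HTML5Definition.php by hand, so any other plugin class they reference (such as HTMLPurifier_AttrDef_HTML_Bool2) is never loaded. Assuming the plugin's `library` directory follows the same PEAR-style layout as HTML Purifier itself (underscores in class names map to directories, which the HTML5Definition.php path suggests), a minimal fallback autoloader could look like the sketch below. Both function names are hypothetical; installing through Composer and requiring `vendor/autoload.php`, as other reports here do, avoids the problem altogether.

```php
<?php
declare(strict_types=1);

// Map a PEAR-style class name to a file path, e.g.
// HTMLPurifier_AttrDef_HTML_Bool2 -> <root>/HTMLPurifier/AttrDef/HTML/Bool2.php
function pearClassToPath(string $class, string $root): string
{
    return rtrim($root, '/\\') . '/' . str_replace('_', '/', $class) . '.php';
}

// Register a fallback autoloader for HTMLPurifier_* classes shipped in the
// plugin's library directory ($root). Classes not found there are left to
// other registered autoloaders (e.g. the one set up by HTMLPurifier.auto.php).
function registerHtml5PluginAutoloader(string $root): void
{
    spl_autoload_register(static function (string $class) use ($root): void {
        if (strncmp($class, 'HTMLPurifier_', 13) !== 0) {
            return; // not an HTML Purifier class
        }
        $path = pearClassToPath($class, $root);
        if (is_file($path)) {
            require_once $path;
        }
    });
}

// Usage with the paths from the report (assumed layout):
// require_once 'resources/html-purifier/htmlpurifier-4.10.0/library/HTMLPurifier.auto.php';
// registerHtml5PluginAutoloader('resources/html-purifier/htmlpurifier-html5-0.1.8/library');
```

With the autoloader registered, the two manual `require_once` calls for the plugin become unnecessary.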

Add module toggles

Currently toggling modules is not granular enough - there is only one switch (HTML.Trusted) which enables all unsafe modules. And there is no way of enabling Forms module without also enabling Scripting. You can do something like the following, but it's not convenient and seems like a dirty override:

$config = HTMLPurifier_HTML5Config::create([
    'HTML.Trusted' => true,
    'HTML.ForbiddenElements' => ['script', 'noscript'],
]);

Related to ezyang/htmlpurifier#213.

Do not finalize config

Normal HTMLPurifier lets you edit the config like:

$config = HTMLPurifier_Config::createDefault();
$html_purifier_cache_dir = sys_get_temp_dir() . '/HTMLPurifier/DefinitionCache';
if (!is_dir($html_purifier_cache_dir)) {
     mkdir($html_purifier_cache_dir, 0770, TRUE);
}
$config->set('Cache.SerializerPath', $html_purifier_cache_dir);

Your change: https://github.com/xemlock/htmlpurifier-html5/blob/master/library/HTMLPurifier/HTML5Config.php#L32

Makes this throw an exception:

$config = HTMLPurifier_HTML5Config::createDefault();
$html_purifier_cache_dir = sys_get_temp_dir() . '/HTMLPurifier/DefinitionCache';
if (!is_dir($html_purifier_cache_dir)) {
     mkdir($html_purifier_cache_dir, 0770, TRUE);
}
$config->set('Cache.SerializerPath', $html_purifier_cache_dir);

Cannot set directive after finalization invoked

HTML.Forms not working?

Hello, I want to pass an HTML form through the purify process. This should be possible in vanilla HTML Purifier since 4.13.0 via "HTML.Forms", but it doesn't seem to work with this HTML5 extension. Example:

<?php
require 'vendor/autoload.php';

$config = HTMLPurifier_HTML5Config::createDefault();

$config->set('HTML.Trusted', FALSE);
$config->set('HTML.Forms', TRUE);

$purifier = new HTMLPurifier($config);

$dirty_html5 = '<form mnethod="post" action="#"><input></form>';
$clean_html5 = $purifier->purify($dirty_html5);
var_dump(htmlspecialchars($clean_html5));
----------------------
string(0) ""

from composer.lock:

"name": "ezyang/htmlpurifier",
"version": "v4.13.0",

"name": "xemlock/htmlpurifier-html5",
"version": "v0.1.11",

In case of vanilla "ezyang/htmlpurifier:v4.13.0:

<?php

require 'vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();

$config->set('HTML.Trusted', FALSE);
$config->set('HTML.Forms', TRUE);

$purifier = new HTMLPurifier($config);

$dirty_html5 = '<form mnethod="post" action="#"><input></form>';
$clean_html5 = $purifier->purify($dirty_html5);
var_dump(htmlspecialchars($clean_html5));
--------------------------
string(61) "<form action="#"><input /></form>" 

Not sure what's wrong there?

Implement Datetime attr

Datetime attribute type should be used in <ins>, <del> and <time> elements instead of potentially XSS-prone Text type.
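Purely to illustrate the kind of check a Datetime attribute type could perform, here is a standalone PHP sketch. The function name `isValidHtml5Datetime()` is hypothetical and not part of HTML Purifier or htmlpurifier-html5, and it deliberately accepts only a subset of the HTML5 date/time microsyntaxes (dates, times, and local/global datetimes); month, week and yearless-date strings and durations are rejected.

```php
<?php
declare(strict_types=1);

// Hypothetical helper accepting a SUBSET of HTML5 date/time strings:
//   date:            2011-11-18
//   time:            14:54, 14:54:39, 14:54:39.929
//   local/global:    2011-11-18T14:54:39 (a space may replace "T"),
//                    optionally followed by "Z" or "+HH:MM" / "-HH:MM"
function isValidHtml5Datetime(string $value): bool
{
    $date = '(?<y>\d{4,})-(?<mo>\d{2})-(?<d>\d{2})';
    $time = '(?<h>\d{2}):(?<mi>\d{2})(?::(?<s>\d{2})(?:\.\d{1,3})?)?';
    $zone = '(?:Z|[+-](?<zh>\d{2}):(?<zm>\d{2}))';

    // \z rather than $ so that a trailing newline is not accepted.
    $patterns = [
        '/^' . $date . '\z/',
        '/^' . $time . '\z/',
        '/^' . $date . '[T ]' . $time . $zone . '?\z/',
    ];

    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $value, $m) !== 1) {
            continue;
        }
        // Calendar check, e.g. rejects 2023-02-30 and year 0000.
        if (isset($m['y']) && !checkdate((int) $m['mo'], (int) $m['d'], (int) $m['y'])) {
            return false;
        }
        // Clock check: hours 00-23, minutes and seconds 00-59.
        if (isset($m['h']) && ((int) $m['h'] > 23 || (int) $m['mi'] > 59
            || (int) ($m['s'] ?? 0) > 59)) {
            return false;
        }
        // Offset check: hours 00-23, minutes 00-59.
        if (($m['zh'] ?? '') !== '' && ((int) $m['zh'] > 23 || (int) $m['zm'] > 59)) {
            return false;
        }
        return true;
    }

    return false;
}
```

Inside HTML Purifier, a check like this would presumably live in the `validate()` method of an `HTMLPurifier_AttrDef` subclass, returning the cleaned string on success and false otherwise.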

Iframes are removed

Are iframes removed by default?

$htmlpurify_config = \HTMLPurifier_HTML5Config::createDefault();
$purifier = new \HTMLPurifier($htmlpurify_config);

Content:

<b>Inline <del>context No block allowed</del></b>
<video width="400" height="222" controls><source src="video.mp4" type="video/mp4"><source src="video.webm" type="video/webm"><source src="video.ogv" type="video/ogg">
  Here is the fallback for the video: a download link, a message, etc.
</video>


<iframe width='560' height='315' src='//www.youtube.com/embed/RGLI7QBUitE?autoplay=1' frameborder='0' allowfullscreen></iframe>

Question: Figure not working?

Installed via composer:

composer require ezyang/htmlpurifier
composer require xemlock/htmlpurifier-html5

I have this code:

Debug("START", 'comment_xss');
Debug($_POST['comment'], 'comment_xss');			
$config = \HTMLPurifier_HTML5Config::create([
	  'HTML.AllowedElements' => ['p', 'figure', 'img', 'picture'],
	  'HTML.AllowedAttributes' => ['img.srcset', 'img.src', 'img.sizes'],
]);
$purifier = new \HTMLPurifier($config);
$comment = $purifier->purify($_POST['comment']);						
Debug($comment, 'comment_xss');

Here are the logs (Debug):

2018-12-17 16:04:10 START;
2018-12-17 16:04:10 <figure><img src="https://bla_bla.jpg" data-image="5236469657"></figure><p>aa</p>; 
2018-12-17 16:04:10 <p>aa</p>

As you can see the entire <figure> is removed even if I've added it in the AllowedElements array.
What am I doing wrong? Can you please help?

Deprecation notices in PHP 8.1

PHP Deprecated:  trim(): Passing null to parameter #1 ($string) of type string is deprecated in htmlpurifier-html5/library/HTMLPurifier/AttrTransform/HTML5/Input.php on line 242
....
PHP Deprecated:  str_replace(): Passing null to parameter #2 ($replace) of type array|string is deprecated in htmlpurifier-html5/vendor/ezyang/htmlpurifier/library/HTMLPurifier/ElementDef.php on line 179

https://github.com/xemlock/htmlpurifier-html5/runs/7854291682?check_suite_focus=true

Related issue: ezyang/htmlpurifier#311
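The usual fix for the first notice is to coalesce a possibly-null value to an empty string before it reaches trim(). The sketch below shows only that pattern; `trimAttrValue()` is a hypothetical helper and does not reproduce the library's actual code in Input.php.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: coalesce a possibly-null attribute value to ''
// before calling trim(). On PHP 8.1+, passing null to trim() emits the
// "Passing null to parameter #1" deprecation in coercive mode, and
// throws a TypeError when strict_types is enabled.
function trimAttrValue(?string $value): string
{
    return trim($value ?? '');
}
```

A `(string)` cast would work equally well; the null-coalescing form makes the intent explicit at the call site.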
