nahsra / antisamy

a library for performing fast, configurable cleansing of HTML coming from untrusted sources

License: BSD 3-Clause "New" or "Revised" License

Java 19.29% JavaScript 7.17% HTML 29.51% CSS 6.05% DIGITAL Command Language 33.00% Shell 4.42% Roff 0.56% Hack 0.01% ASP.NET 0.01%
html javascript xss-filter java-library security-tools

antisamy's Introduction

AntiSamy

A library for performing fast, configurable cleansing of HTML coming from untrusted sources. Supports Java 8+.

Another way of saying that could be: It's an API that helps you make sure that clients don't supply malicious code in the HTML they supply for their profile, comments, etc., that gets persisted on the server. The term "malicious code" in regard to web applications usually means "JavaScript." Mostly, Cascading Style Sheets are only considered malicious when they invoke JavaScript. However, there are many situations where "normal" HTML and CSS can be used in a malicious manner.

IMPORTANT! - API breaking changes in 1.7.0

Throughout the development of the 1.6.x series, we have identified and deprecated a number of features and APIs. All of these deprecated items have been removed in the 1.7.0 release. These changes were all tracked in ticket: #195. Each of the changes is described below:

CssHandler had 2 constructors which dropped the LinkedList<URI> embeddedStyleSheets parameter. Both constructors now create an empty internal LinkedList<URI> and the method getImportedStylesheetsURIList() can be used to get a reference to it, if needed. This feature is rarely used, and in fact direct invocation of these constructors is also rare, so this change is unlikely to affect most users of AntiSamy. When used, normally an empty list is passed in as this parameter value and that list is never used again.

  • The CssHandler(Policy, LinkedList<URI>, List<String>, ResourceBundle) signature was dropped

    • It was replaced with: CssHandler(Policy, List<String>, ResourceBundle)
  • The CssHandler(Policy, LinkedList<URI>, List<String>, String, ResourceBundle) signature was dropped

    • It was replaced with: CssHandler(Policy, List<String>, ResourceBundle, String). NOTE: The order of the last 2 parameters to this method was reversed.
  • Support for XHTML was dropped. AntiSamy now only supports HTML. As we believe this was a rarely used feature, we don't expect this to affect many AntiSamy users.

  • XML Schema validation is now required on AntiSamy policy files and cannot be disabled. You must make your policy file schema compliant in order to use it with AntiSamy.

  • The policy directive noopenerAndNoreferrerAnchors is now ON by default. If it is disabled, AntiSamy issues a nag, encouraging you to enable it.

Note: Since 1.7.4, some outputs may differ because the HTML parser dependency was upgraded. Keep this in mind if you were using previous versions and get different outputs.

Deprecating support for external stylesheets

The AntiSamy team has decided that supporting the ability to allow embedded remote CSS is dangerous and so we are deprecating this feature and it will be removed in a future release. It is expected that there are very few, if any, users of this feature.

We have added a log WARNing if this feature is invoked. If you are using it, please disable/remove this feature by switching to the primary CssScanner constructor that does not enable it.

How to Use

1. Import the dependency

First, add the dependency from Maven:

<dependency>
   <groupId>org.owasp.antisamy</groupId>
   <artifactId>antisamy</artifactId>
   <version>LATEST_VERSION</version>
</dependency>

2. Choosing a base policy file

Chances are that your site’s use case for AntiSamy is at least roughly comparable to one of the predefined policy files. They each represent a "typical" scenario for allowing users to provide HTML (and possibly CSS) formatting information. Let’s look into the different policy files:

  1. antisamy-slashdot.xml

Slashdot is a techie news site that allows users to respond anonymously to news posts with very limited HTML markup. Now, Slashdot is not only one of the coolest sites around, it’s also one that’s been subject to many different successful attacks. The rules for Slashdot are fairly strict: users can only submit the following HTML tags and no CSS: <b>, <u>, <i>, <a>, <blockquote>.

Accordingly, we’ve built a policy file that allows fairly similar functionality. All text-formatting tags that operate directly on the font, color, or emphasis have been allowed.

  2. antisamy-ebay.xml

eBay is the most popular online auction site in the universe, as far as we can tell. It is a public site so anyone is allowed to post listings with rich HTML content. It’s not surprising that, given the attractiveness of eBay as a target, it has been subject to a few complex XSS attacks. Listings are allowed to contain much richer content than, say, Slashdot -- so its attack surface is considerably larger.

  3. antisamy-myspace.xml

MySpace was, at the time this project was born, the most popular social networking site. Users were allowed to submit pretty much all the HTML and CSS they wanted -- as long as it didn’t contain JavaScript. MySpace was using a word blacklist to validate users’ HTML, which is why they were subject to the infamous Samy worm. The Samy worm, which used fragmentation attacks combined with a word that should have been blacklisted (eval), was the inspiration for this project.

  4. antisamy-anythinggoes.xml

We don’t know of a possible use case for this policy file. If you want to allow every single valid HTML and CSS element (but without JavaScript or blatant CSS-related phishing attacks), you can use this policy file. Not even MySpace was this crazy. However, it does serve as a good reference because it contains base rules for every element, so you can use it as a knowledge base when tailoring the other policy files.

Logging

AntiSamy now includes the slf4j-simple library for its logging, but AntiSamy users can import and use an alternate slf4j compatible logging library if they prefer. They can also then exclude slf4j-simple if they want to.

WARNING: AntiSamy's use of slf4j-simple, without any configuration file, logs messages in a buffered manner to standard output. As such, some or all of these log messages may get lost if an Exception, such as a PolicyException is thrown. This can likely be rectified by configuring slf4j-simple to log to standard error instead, or use an alternate slf4j logger that does so.
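A minimal sketch of the redirect suggested above, assuming slf4j-simple is the active SLF4J binding. The system property `org.slf4j.simpleLogger.logFile` is read by slf4j-simple when its first logger is initialized, so it has to be set before any logging happens. The class and method names here are illustrative only.

```java
// Assumption: slf4j-simple is the logging binding on the classpath.
final class LoggingSetup {
    static final String LOG_FILE_KEY = "org.slf4j.simpleLogger.logFile";

    // Call this before the first AntiSamy class is used, so that
    // slf4j-simple picks the value up during its own initialization.
    static void logToStandardError() {
        System.setProperty(LOG_FILE_KEY, "System.err");
    }

    public static void main(String[] args) {
        logToStandardError();
        System.out.println(System.getProperty(LOG_FILE_KEY));
    }
}
```

The same key can also be placed in a simplelogger.properties file on the classpath instead of being set in code.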

3. Tailoring the policy file

You may want to deploy AntiSamy in a default configuration, but it’s equally likely that a site may want to have strict, business-driven rules for what users are allowed to submit. The discussion that decides the tailoring should also consider attack surface, which grows in proportion to what the policy file allows.

Example policies can be adapted and tested based on the requirements for each tag. The supported tag actions that can be specified are:

  • filter: remove tags, but keep content.
  • validate: keep content as long as it passes rules.
  • remove: remove tag and contents.
  • truncate: remove tag attributes and all child tags except for its text content, if any.
  • encode: similar to filter but it encodes the tag for HTML to preserve it as raw text and its children are moved up one level in the hierarchy.

4. Calling the AntiSamy API

Using AntiSamy is easy. Here is an example of invoking AntiSamy with a policy file:

import org.owasp.validator.html.*;

Policy policy = Policy.getInstance(POLICY_FILE_LOCATION);

AntiSamy as = new AntiSamy();
CleanResults cr = as.scan(dirtyInput, policy);

MyUserDAO.storeUserProfile(cr.getCleanHTML()); // some custom function

There are a few ways to create a Policy object. The getInstance() method can take any of the following:

  • a String filename
  • a File object
  • an InputStream

Policy files can also be referenced by filename by passing a second argument to the AntiSamy#scan() method, as the following example shows:

AntiSamy as = new AntiSamy();
CleanResults cr = as.scan(dirtyInput, policyFilePath);

Finally, policy files can also be referenced by File objects directly in the second parameter:

AntiSamy as = new AntiSamy();
CleanResults cr = as.scan(dirtyInput, new File(policyFilePath));

5. Analyzing CleanResults

The CleanResults object provides several useful accessors:

  • getCleanHTML() - the clean, safe HTML output
  • getCleanXMLDocumentFragment() - the clean, safe XMLDocumentFragment which is reflected in getCleanHTML()
  • getErrorMessages() - a list of String error messages -- if this returns an empty list that does not mean there were no attacks!
  • getNumberOfErrors() - the number of error messages -- Again, 0 does not mean the input was safe!
  • getScanTime() - returns the scan time in seconds

Important Note: There has been much confusion about the getErrorMessages() method. Neither getErrorMessages() nor getNumberOfErrors() answers the question "is this safe input?" in the affirmative when it returns an empty list or 0. You must always use the sanitized output, and there is no way to be sure the input passed in had no attacks.

The serialization and deserialization process that is critical to the effectiveness of the sanitizer is purposefully lossy and will filter out attacks via a number of attack vectors. Unfortunately, one of the tradeoffs of this strategy is that AntiSamy doesn't always know in retrospect that an attack was seen. Thus, the getErrorMessages() and getNumberOfErrors() APIs are there to help users understand whether their well-intentioned input meets the requirements of the system, not help a developer detect if an attack was present.

Other Documentation

Additional documentation is available on this GitHub project's wiki page: https://github.com/nahsra/antisamy/wiki and the OWASP AntiSamy Project Page: https://owasp.org/www-project-antisamy/

Contributing to AntiSamy

Find an Issue?

If you have found a bug, then create an issue in the AntiSamy repo: https://github.com/nahsra/antisamy/issues

Find a Vulnerability?

If you have found a vulnerability in AntiSamy, first search the issues list (see above) to see if it has already been reported. If it has not, then please contact Dave Wichers (dave.wichers at owasp.org) directly. Please do not report vulnerabilities via GitHub issues as we wish to keep our users secure while a patch is implemented and deployed. If you wish to be acknowledged for finding the vulnerability, then please follow this process.

More detail is available in the file: SECURITY.md.

How to Build

You can build and test from source pretty easily:

$ git clone https://github.com/nahsra/antisamy
$ cd antisamy
$ mvn package

License

Released under the BSD-3-Clause license as specified here: LICENSE.

antisamy's People

Contributors

bantic, davewichers, davidbarbrowatwork, dependabot[bot], faf0-addepar, hazendaz, jasonparallel, jonespm, kwwall, liuxing-r, nahsra, rbri, rwhitworth, spassarop, tw-mcummings, vivekchsm, xeno6696

antisamy's Issues

ArrayIndexOutOfBoundsException

I get.

javax.xml.transform.TransformerException: java.lang.ArrayIndexOutOfBoundsException: -1
org.owasp.validator.html.ScanException: javax.xml.transform.TransformerException: java.lang.ArrayIndexOutOfBoundsException: -1
	at org.owasp.validator.html.scan.AntiSamySAXScanner.scan(AntiSamySAXScanner.java:135) ~[antisamy-1.5.7.jar:1.5.7]
	at org.owasp.validator.html.AntiSamy.scan(AntiSamy.java:101) ~[antisamy-1.5.7.jar:1.5.7]

When

antiSamy.scan ( "my &test",  antisamypolicy, AntiSamy.SAX ).getCleanHTML (); //used the standard antisamy.xml

AntiSamy removes the "margin" attribute when its value is a very small decimal number

Steps to reproduce the problem-

  1. Create HTML content having tag with styling attribute <p style="margin: 0.0001pt;" /> .
  2. Use filterHTML API to filter the above HTML content
  3. In the response, the "margin" attribute is getting removed with warning log [1]

Expected output-
With the regexp configuration [2], the margin should not be removed for any decimal number.

[1] AntiSamy warning: The p tag had a style attribute, "margin", that could not be allowed for security reasons.
[2] <regexp name="length" value="((-|\+)?0|(-|\+)?([0-9]+(\.[0-9]*)?)(em|ex|px|in|cm|mm|pt|pc))"/>
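The regexp in [2] can be checked in isolation with plain java.util.regex. The sketch below copies the pattern verbatim and shows that it accepts the literal text 0.0001pt. The second check is only a hypothesis, not something the report confirms: if the value were re-serialized from a float before validation, Java would render it in scientific notation, which the pattern rejects. The class name is illustrative.

```java
import java.util.regex.Pattern;

final class LengthRegexCheck {
    // The "length" regexp quoted in [2], copied verbatim.
    static final Pattern LENGTH = Pattern.compile(
        "((-|\\+)?0|(-|\\+)?([0-9]+(\\.[0-9]*)?)(em|ex|px|in|cm|mm|pt|pc))");

    // True when the whole value is an allowed length.
    static boolean isAllowedLength(String value) {
        return LENGTH.matcher(value).matches();
    }

    public static void main(String[] args) {
        // The value exactly as written in the style attribute.
        System.out.println(isAllowedLength("0.0001pt"));
        // Hypothetical: the same value after a float round trip.
        System.out.println(isAllowedLength(Float.toString(0.0001f) + "pt"));
    }
}
```

If the first check passes and the attribute is still removed, the value reaching the validator is probably not the literal text from the input.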

stripNonValidXMLCharacters doesn't work with HTML where html.length() > 1

It looks like somewhere around version 1.5 the method org.owasp.validator.html.scan.AntiSamyDOMScanner#stripNonValidXMLCharacters was altered to test the Pattern for invalidXmlCharacters with java.util.regex.Matcher#matches().
I presume that was in a bid for efficiency to cut down on a replaceAll method call if there was no need to affect the String input.
You use the same technique in a test to check if there are time improvements made.

I believe it should use java.util.regex.Matcher#find() instead.

matches() checks if the entire sequence matches the pattern. Since the pattern represents only a single character, in effect, that can be in one of the defined sets, then if the sequence (the HTML) is longer than 1 character it can never match. It's been this way since forever I believe, at least Java 5.

find() will find the next subsequence that matches the pattern, in effect checking quickly and succeeding fast if the HTML needs to be cleansed.

You're getting a speed increase because matches() is getting to the second char and declaring the sequence as a non-match regardless.
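The difference between the two guards can be reproduced with the standard library alone. This is a sketch under assumptions: the pattern is a simplified stand-in (a few control characters), not AntiSamy's actual invalidXmlCharacters pattern, and the method names are made up for the comparison.

```java
import java.util.regex.Pattern;

final class MatchesVersusFind {
    // Simplified stand-in for the invalid-character pattern (assumption):
    // one character class, so it can only ever match a single character.
    static final Pattern INVALID = Pattern.compile("[\\u0000-\\u0008]");

    // Guard as described in the report: matches() is true only when the
    // WHOLE input is one invalid character, so longer input is never cleaned.
    static String stripWithMatches(String in) {
        return INVALID.matcher(in).matches() ? INVALID.matcher(in).replaceAll("") : in;
    }

    // Proposed guard: find() is true when ANY position holds an invalid character.
    static String stripWithFind(String in) {
        return INVALID.matcher(in).find() ? INVALID.matcher(in).replaceAll("") : in;
    }

    public static void main(String[] args) {
        String html = "<div>Hello\u0001</div>";
        System.out.println(stripWithMatches(html).equals(html));   // unchanged
        System.out.println(stripWithFind(html));                    // control char removed
    }
}
```

With a single-character input both guards behave the same, which matches the observation in the report that only the one-character case is cleaned.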

Input HTML:
<div>Hello\uD83D\uDC95</div>

Expected on org.owasp.validator.html.CleanResults#getCleanHTML :
<div>Hello</div>

Actual:
<div>Hello\uD83D\uDC95</div>

Where input HTML is single character \uD888 only then the output is the empty string as expected.

I looked through the test class here and can see no tests where you are expecting data to be cleansed. All the tests ensure that characters make it through ok or that something is faster (checking only 1 char is faster!)

Incidentally, I only noticed this because the AntiSamy code looked like it wanted to cleanse the characters needed for an emoji, where the character is actually valid per the XML and HTML specs as far as I can tell. When its UTF-8 bytes are read by our system, we get a Java representation in the 16-bit chars underpinning the String, and those chars fall within your filter. I don't believe you should be stripping them when they come together: according to java.lang.Character#isSurrogatePair the two chars \uD83D \uDC95 together return true rather than false, and the toCodePoint method tells us that the code point is &#128149;. So I think the checks in this method ought to be more complex.
Ironically, if the code in this method worked as intended then the characters would have been cleansed away. But they weren't.

I believe you could get manipulative code points through now, because of this. But I can't be certain as I'm looking purely from a data cleansing point of view.
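One way the more careful check asked for above might look, using only java.lang.Character. This is a sketch, not AntiSamy's implementation: it keeps well-formed surrogate pairs (such as the emoji in the example) and drops only unpaired surrogates. The class and method names are hypothetical.

```java
final class SurrogateCheck {
    // Removes only unpaired surrogates; well-formed pairs are kept intact.
    static String stripLoneSurrogates(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            int len = Character.charCount(cp);
            // codePointAt returns the bare surrogate value when it is unpaired,
            // so a one-char result in the surrogate range is a lone surrogate.
            boolean lone = len == 1 && Character.isSurrogate((char) cp);
            if (!lone) {
                out.appendCodePoint(cp);
            }
            i += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The pair from the report forms one supplementary code point.
        System.out.println(Character.isSurrogatePair('\uD83D', '\uDC95'));
        System.out.println(Character.toCodePoint('\uD83D', '\uDC95'));
        System.out.println(stripLoneSurrogates("a\uD888b"));
    }
}
```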

onUnknownTag directive causes AntiSamy.scan to lose closing tag

See attached java test class that shows the problem.
Given a Policy that accepts no html, but has <directive onUnknownTag="encode"/>, calling

    AntiSamy.scan("<div>abc</div>", policy);

produces the string &lt;div&gt;abc (without the trailing &lt;/div&gt;)
Is this a bug?

package org.yourname;

import static org.hamcrest.CoreMatchers.equalTo;
import static org.junit.Assert.assertThat;

import java.io.InputStream;
import java.io.StringBufferInputStream;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.mockito.runners.MockitoJUnitRunner;
import org.owasp.validator.html.AntiSamy;
import org.owasp.validator.html.CleanResults;
import org.owasp.validator.html.Policy;
import org.owasp.validator.html.PolicyException;
import org.owasp.validator.html.ScanException;


@SuppressWarnings("deprecation")
@RunWith(MockitoJUnitRunner.class)
public class AntiSamyEncodingTest 
{
    @Test
    /**
     * Demonstrates that the onUnknownTag directive causes AntiSamy's scan to
     * lose the closing tag
     * 
     * Given an input like 
     * <pre>
     *   <div>hello, world</div>
     * </pre>
     * scan will return 
     * <pre>
     *   &lt;div&gt;hello, world
     * </pre>
     * without the closing tag.
     */
    public void standaloneTest() throws PolicyException, ScanException 
    {
        String policyDefinition = 
            "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>" +
            "<anti-samy-rules xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" " +
            "                       xsi:noNamespaceSchemaLocation=\"antisamy.xsd\">" +
            "    <directives>" +
            "        <directive name=\"onUnknownTag\" value=\"encode\" />" +
            "     </directives>" +

            "    <common-regexps></common-regexps>" +

            "    <common-attributes></common-attributes>" +
            "    <!-- no tags are valid, by default all html elements are encoded -->" +
            "    <tag-rules></tag-rules>" +
            "</anti-samy-rules>";
                
        InputStream sr = new StringBufferInputStream(policyDefinition);
        AntiSamy as = new AntiSamy();
        Policy policy = Policy.getInstance(sr);
        String taintedHtml = "<div>hello, world</div>";
        CleanResults cr = as.scan(taintedHtml, policy, AntiSamy.SAX);
        String cleaned = cr.getCleanHTML();
        
        // the value is "&lt;div&gt;hello, world", missing the closing element
        assertThat(cleaned, equalTo("&lt;div&gt;hello, world&lt;/div&gt;")); //fails
    }
}

If you agree this is not correct, I would be happy to open a PR.

License

Please add a license file to your project

AntiSamy 1.6.4 doesn't play nicely with xalan-j 2.7.2

This code was added in org.owasp.validator.html.scan.AntiSamySAXScanner:

sTransformerFactory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
sTransformerFactory.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");

Xerces2-j does not support these attribute constants. The Oracle JAXP documentation (https://docs.oracle.com/javase/tutorial/jaxp/properties/usingProps.html) says that it is recommended to catch the IllegalArgumentException for unsupported features.

I know that Xerces is ancient, but there isn't any need to break compatibility in this way.

UPDATE: This issue was originally created with Xerces-2 in the title as the offending library, so the thread below talks about Xerces a lot. But the actual problem is with a variant of Xalan in the classpath, and Xerces is a red herring. Way below, Xalan is finally mentioned as the real problem, not Xerces.

Embedded styles should not all be deleted after enabling `embedStyleSheets`

Set 'embedStyleSheets' in the configuration:

<directive name="embedStyleSheets" value="true"/>

Input:

<!DOCTYPE html>
<html>
	<head>
		<style type='text/css'>
			@import url(https://unpkg.com/element-ui/lib/theme-chalk/index.css)
			h1 {font: 15pt "Arial"; color: blue;}
			p {font: 10pt "Arial"; color: black;}
		</style>
	</head>
	<body>
		<div>
			<h1>Title</h1>
			<p>content</p>
		</div>
	</body>
</html>

Out:

<html>
  <head>
    <style type="text/css"><![CDATA[/* */]]></style></head>
  <body>
    <div>
      <h1>Title</h1>
      <p>content</p></div></body></html>

Result: All embedded styles are deleted.

protected void parseImportedStylesheets(LinkedList<?> stylesheets, CssHandler handler,
ArrayList<String> errorMessages, int sizeLimit) throws ScanException {

I found that the parseImportedStylesheets method signature cannot override the parseImportedStylesheets method of the parent class and will never be called. This seems to be a bug.

Hi, @davewichers @spassarop Is this check deprecated?

Filter Bypass

AntiSamy fails to filter (identify) HTML / HTML5 elements with event attributes (onerror, onload, etc.) when the tags are not closed with the ">" character.
Modern browsers (tested with Firefox and Chrome) will auto-complete such tags and hence execute the JavaScript, leading to Cross-site Scripting (XSS).

Example Payloads for better understanding -

  1. <img src=# onerror=alert(0)//K7-onerror_attribute
  2. <img src= onload=alert(0)//K7-onload_attribute
  3. <input type="image" src=# onerror=alert(0)//K7-works_for_other_html_n_html5_tags
  4. <object data=# onerror=alert(0)//K7-FireFox_specific
  5. <object data=# onload=alert(0)//K7-Chrome_specific
  6. <script onerror=alert(0) onload=alert(1) src=http://xss.rocks/xss.js#K7-script_tag_works_under_specific_conditions
  7. <svg onload=alert(0)//K7-SVG_special_char_variant

I have tested using both SAX and DOM and found that the payloads execute.

Karan Ramani

Test build failed in Java 7

Environment

windows jdk-7 maven 3.8.1 antisamy 1.6.4

Steps to reproduce

mvn test / mvn package

What is expected?

Success

What is actually happening?

......
[ERROR] Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.11 s <<< FAILURE! - in org.owasp.validator.html.test.PolicyTest
[ERROR] org.owasp.validator.html.test.PolicyTest.testGithubIssue79  Time elapsed: 0 s  <<< ERROR!
java.lang.UnsupportedClassVersionError: org/owasp/antisamy/test/Dummy : Unsupported major.minor version 52.0
        at org.owasp.validator.html.test.PolicyTest.testGithubIssue79(PolicyTest.java:341)

[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR]   PolicyTest.testGithubIssue79:341 ? UnsupportedClassVersion org/owasp/antisamy/...
[INFO]
[ERROR] Tests run: 93, Failures: 0, Errors: 1, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] -----------------------------------------------------------
......

and

......
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-enforcer-plugin/3.0.0-M3/maven-enforcer-plugin-3.0.0-M3.pom
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  0.939 s
[INFO] Finished at: 2021-08-04T10:35:20+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Plugin org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M3 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.apache.maven.plugins:maven-enforcer-plugin:jar:3.0.0-M3: Could not transfer artifact org.apache.maven.plugins:maven-enforcer-plugin:pom:3.0.0-M3 from/to central (https://repo.maven.apache.org/maven2): transfer failed for https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-enforcer-plugin/3.0.0-M3/maven-enforcer-plugin-3.0.0-M3.pom: Received fatal alert: protocol_version -> [Help 1]
......

Reason

1. The test case testGithubIssue79 requires JDK 8.
2. Java 7 defaults to TLSv1.0, but downloading maven-enforcer-plugin-3.0.0-M3.pom requires TLSv1.2.
All of this means that the new version of AntiSamy no longer supports Java 7 and needs at least Java 8, yet I note that the README states that AntiSamy 1.6.4 supports Java 7+.
As far as I know, the JDK now has only two LTS versions, Java 8 and Java 11, and the other versions are no longer maintained.
Why does Antisamy still need to support Java 7?
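The "major.minor version 52.0" in the error above identifies the class file format, which maps directly to a Java release. A small sketch of that mapping follows, reading the version from the first eight bytes of a class file; the header bytes in main are a hand-built sample, not taken from the Dummy class in the report, and the class name is illustrative.

```java
import java.nio.ByteBuffer;

final class ClassVersion {
    // A class file starts with: u4 magic (0xCAFEBABE), u2 minor_version, u2 major_version.
    static int majorVersion(byte[] classFileHeader) {
        ByteBuffer buf = ByteBuffer.wrap(classFileHeader); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        buf.getShort();                 // skip minor_version
        return buf.getShort() & 0xFFFF; // major_version as unsigned
    }

    // For Java 5 and later the release number is major - 44 (51 -> 7, 52 -> 8).
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        System.out.println("Java " + javaRelease(majorVersion(header)));
    }
}
```

A JVM rejects any class whose major version is higher than the one it supports, which is why a Java 7 runtime fails on a class compiled for version 52.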

Best Regards

Antisamy adds nested table tags

Input
image

Output after antisamy scan
image

After scanning, additional nested table close tags </tbody></table></td></tr> are added on the line before the img tag, so the output becomes distorted. I am using antisamy.xml and am not sure why nested table tags are added by the AntiSamy scan.

Antisamy Stripping nested lists and tables

If I input the following html:

<ul>
    <li> one </li>
    <li> two</li>
    <li> three 
          <ul>
             <li>a</li>
             <li>b</li>
          </ul>
    </li>
</ul>

The following output occurs:

<ul>
    <li> one </li>
    <li> two</li>
    <li> three 
          <ul>
          </ul>
    </li>
    <li>a</li>
    <li>b</li>
</ul>

Basically it moves the nested list content to the parent list. This seems to be a bug since I can't find any configuration to fix this.

The lang subtag is stripped

<p lang="en-GB">This paragraph is defined as British English.</p>
Output:
<p>This paragraph is defined as British English.</p>

Hi, @davewichers Is there a security problem here? Why not support IANA subtags?

Multiple definition of tags on every policy file

On every policy XML file, I've found that the tag "col" is defined twice.

The first appearance is:

<tag name="col" action="validate"/>

And the second is:

<tag name="col" action="validate">
    <attribute name="align" />
    ...
    <attribute name="width" />
</tag>

The same thing happens with the property "clip", but this one is an exact copy.

This is not a real issue in Java, as the tests would fail when parsing the policy if it were. However, when implementing this on .NET, for example, it may fail if you assume each tag appears only once.

My proposal is to first verify if it is there on purpose. If it isn't, remove it. In my opinion, it makes no sense to have multiple definitions for anything here, it is just confusing for anyone who doesn't know about it as in Java this doesn't fail.

If it is OK to remove the duplicated tags, I can remove them and make the pull request (of course, tests don't fail when removing the tags from the default policy).

NullPointerException for input string "|<?ai aaa"

The code in question gets invoked through the ESAPI library, so I'm not sure if that has a bearing. But on examination of the method removePI of class AntiSamyDOMScanner, it looks like the node doesn't have any parent node, so the result of node.getParentNode() is null and using it throws a NullPointerException. Attaching the stack trace here.

java.lang.NullPointerException
at org.owasp.validator.html.scan.AntiSamyDOMScanner.removePI(AntiSamyDOMScanner.java:689)
at org.owasp.validator.html.scan.AntiSamyDOMScanner.recursiveValidateTag(AntiSamyDOMScanner.java:260)
at org.owasp.validator.html.scan.AntiSamyDOMScanner.processChildren(AntiSamyDOMScanner.java:675)
at org.owasp.validator.html.scan.AntiSamyDOMScanner.processChildren(AntiSamyDOMScanner.java:666)
at org.owasp.validator.html.scan.AntiSamyDOMScanner.scan(AntiSamyDOMScanner.java:159)
at org.owasp.validator.html.AntiSamy.scan(AntiSamy.java:93)

<> cannot be used after 1.5

The front end passes a parameter through the URL, and the value of the parameter is <>. It is processed with:
Policy policy = Policy.getInstance("/antisamy-slashdot-1.4.4.xml");
final CleanResults cr = antiSamy.scan(value, policy);
String str = cr.getCleanHTML();
The resulting str is the escaped form of <>. There was no such problem before 1.5. Why does this happen from 1.5 onward?

AntiSamy rules not getting applied to attributes if an attribute does not have a value

In a web project we use the ESAPI validator to sanitize inputs. While most improper inputs are detected as expected, it fails to detect the input below as improper. I am using the esapi-2.1.0.1 and antisamy-1.5.3 jars.

<img src=x onerror=alert(1) alt=

Potentially, the browser closes the tag itself, hence triggering the alert function. Surprisingly, ESAPI detects the input below as improper:

<img src=x onerror=alert(1) alt="text"

Below are some test cases and analysis done :
Regex used for alt attribute : [a-zA-Z0-9:-_.]+ (It needs minimum one character)
Regex used for onerror attribute : [0-9\s*,]* (It allows numbers and whitespace characters)

  1. Input: <img src=x alt="" || Observation: Tag is filtered || Status: Passed
  2. Input: <img src=x alt= || Observation: Tag not filtered || Status: Failed
  3. Input: <img src=x onerror=alert(1) alt="" || Observation: Tag is filtered (due to regex condition for onerror) || Status: Passed
  4. Input: <img src=x onerror=alert(1) alt= || Observation: Tag not filtered || Status: Failed

Observation:
Value of alt attribute is modifying the behavior of attribute validation (for itself in cases 1 & 2 and other tags in cases 3 & 4).
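The two regexes quoted in the test cases can be evaluated on their own, independent of how the tag is parsed. The sketch below uses plain java.util.regex with whole-value matching, which is an assumption about how the policy applies them; the class and method names are illustrative.

```java
import java.util.regex.Pattern;

final class AttributeRegexCheck {
    // Regexes copied from the report.
    static final Pattern ALT = Pattern.compile("[a-zA-Z0-9:-_.]+");
    static final Pattern ONERROR = Pattern.compile("[0-9\\s*,]*");

    static boolean altAllowed(String value) {
        return ALT.matcher(value).matches();
    }

    static boolean onerrorAllowed(String value) {
        return ONERROR.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(altAllowed("text"));          // has at least one allowed char
        System.out.println(altAllowed(""));              // "+" needs one character
        System.out.println(onerrorAllowed("alert(1)"));  // letters are outside the class
    }
}
```

Taken in isolation, the onerror pattern rejects alert(1) regardless of the alt value, so the differing results in cases 3 and 4 point at how the unterminated tag is parsed rather than at the regexes themselves.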

Questions:

  1. For case 4. isn't the regex for onerror still supposed to validate the onerror attribute? Has it got something to do with alt attribute value?
  2. What are the probable reasons where the value of alt attribute could alter the behavior of other attributes?

Can anyone please guide me with any suggestions or comments to mitigate the issue if I am going wrong somewhere ?

Thanks.

IOException on Policy creation from InputStream when schema validation is disabled

Antisamy version: 1.6.1

Problem

Unable to generate a Policy instance from Policy.getInstance(InputStream) when AntiSamy schema validation is disabled and the configuration file contains an invalid structure.

antisamy_tests.tar.gz

Results :

Tests in error: 
  testSystemProp(antisamy_tests.InvalidPolicyTest): java.io.IOException: Stream closed
  testDirectConfig(antisamy_tests.InvalidPolicyTest): java.io.IOException: Stream closed

Tests run: 2, Failures: 0, Errors: 2, Skipped: 0

mvnCleanTest.txt
mvnCleanTest_verbose.txt

Initial Assessment

Pertinent Stacktrace from logs:

org.owasp.validator.html.PolicyException: java.io.IOException: Stream closed
	at org.owasp.validator.html.Policy.getTopLevelElement(Policy.java:379)
	at org.owasp.validator.html.Policy.getTopLevelElement(Policy.java:355)
	at org.owasp.validator.html.Policy.getInstance(Policy.java:235)

Snippet from Policy.java

protected static Element getTopLevelElement(InputSource source, Callable<InputSource> getResetSource) throws PolicyException {
        // Track whether an exception was ever thrown while processing policy file
        Exception thrownException = null;
        try {
            // >> First stream use
            return getDocumentElementFromSource(source, true);
        } catch (SAXException e) {
            thrownException = e;
            if (!validateSchema) {
                try {
                    source = getResetSource.call();
                    // >> Second stream use
                    Element theElement = getDocumentElementFromSource(source, false);
                    // We warn when the policy has an invalid schema, but schema validation is disabled.
                    logger.warn("Invalid policy file: " + e.getMessage());
                    return theElement;
                } catch (Exception e2) {
                    throw new PolicyException(e2);
                }
            } else throw new PolicyException(e);
        } catch (ParserConfigurationException | IOException e) {
            thrownException = e;
            // >> EXCEPTION thrown here
            throw new PolicyException(e);
        } finally {
            if (!validateSchema && (thrownException == null)) {
                // We warn when the policy has a valid schema, but schema validation is disabled.
                logger.warn("XML schema validation is disabled for a valid policy. Please reenable policy validation.");
            }
        }
    }

I think the first stream use completes the read (I followed it all the way to the return statement in getDocumentElementFromSource) and then closes the stream. Parsing then fails, and control falls into the SAXException block. Because schema validation is disabled, the code tries again, but the source object was never actually reset after the first read, so the second attempt reads from a closed stream. I can't see the actual stream in my debugger, so I can't be 100% certain, but this sequence matches the current output from the tests.
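
To make the suspected failure mode concrete, here is a minimal sketch using only the JDK. It is not AntiSamy's code: the class name, the in-memory "policy" bytes, and the helper methods are all made up for illustration. It shows that a stream which has been fully read and closed cannot be read a second time, while re-opening the source works.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the policy loading code; not AntiSamy's implementation.
final class StreamReuseSketch {
    private static final byte[] POLICY = "<anti-samy-rules/>".getBytes(StandardCharsets.UTF_8);

    // Opens a fresh stream over the same bytes, like re-opening the policy file.
    static InputStream open() {
        return new BufferedInputStream(new ByteArrayInputStream(POLICY));
    }

    // Reads everything and then closes the stream, as an XML parser typically does.
    static String drainAndClose(InputStream in) throws IOException {
        try (InputStream s = in) {
            StringBuilder sb = new StringBuilder();
            int b;
            while ((b = s.read()) != -1) {
                sb.append((char) b);
            }
            return sb.toString();
        }
    }

    // The first pass consumes the stream; the second pass either reuses it or re-opens it.
    static String readTwice(boolean reopen) {
        InputStream first = open();
        try {
            drainAndClose(first);
            InputStream second = reopen ? open() : first;
            return drainAndClose(second);
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(readTwice(false)); // reuses the closed stream
        System.out.println(readTwice(true));  // re-opens the source first
    }
}
```

If the reset callable hands back the same, already closed stream, the second parse would fail in the way the stack trace suggests; returning a freshly opened stream avoids it.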

!important CSS rule is removed

We started using AntiSamy for CSS validation in our web project and found that it strips !important from CSS declarations.

E.g. <p style="color: red !important">Some Text</p> is cleaned to <p style="color: red">Some Text</p>

The following test added to AntiSamyTest fails.

    @Test
    public void givenImportantRuleWhenScanThenPreserved() throws ScanException, PolicyException {
        String s = as.scan("<p style=\"color: red !important\">Some Text</p>", policy, AntiSamy.DOM).getCleanHTML();
        assertTrue(s.contains("!important"));

        s = as.scan("<p style=\"color: red !important\">Some Text</p>", policy, AntiSamy.SAX).getCleanHTML();
        assertTrue(s.contains("!important"));
    }

From the parameters of org.owasp.validator.css.CssHandler#property I can see that the handler is told whether a property is important, but the code appears to ignore this information: the argument is not used anywhere.

...
public void property(String name, LexicalUnit value, boolean important)
			throws CSSException {
		// only bother validating and building if we are either inline or within
		// a selector tag

		if (!selectorOpen && !isInline) {
...
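
As a rough illustration of what honoring the flag could look like, here is a self-contained sketch. It is not taken from CssHandler; the class and method names are hypothetical, and it only shows the idea of appending the priority when a declaration is written back out.

```java
// Hypothetical helper; CssHandler's real serialization code is not shown in this issue.
final class ImportantFlagSketch {
    // Serializes one CSS declaration, keeping the priority when the flag is set.
    static String declaration(String name, String value, boolean important) {
        StringBuilder sb = new StringBuilder();
        sb.append(name).append(": ").append(value);
        if (important) {
            sb.append(" !important");
        }
        return sb.append(';').toString();
    }

    public static void main(String[] args) {
        System.out.println(declaration("color", "red", true));
        System.out.println(declaration("color", "red", false));
    }
}
```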

Is there a way to get it working or am I missing something? Let me know if you need further information!

Thank you in advance!

The 'data-*' dynamic attribute is not supported by AntiSamy.DOM.

Using AntiSamy.SAX

String input = "<div data-title=\"Pocahontas\" >Just Around the Riverbend</div>";
CleanResults results = new AntiSamy().scan(input, Policy.getInstance(), AntiSamy.SAX);
System.out.println("result: " + results.getCleanHTML());

// result: <div data-title="Pocahontas">Just Around the Riverbend</div>

Using AntiSamy.DOM

result: <div>Just Around the Riverbend</div>

Expect

result: <div data-title="Pocahontas">Just Around the Riverbend</div>

AntiSamy.DOM is the default mode, so it should support dynamic attributes as well.
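
For reference, recognizing a dynamic attribute by name could be as simple as the sketch below. The naming rule is an assumption made for this example (lowercase letters, digits, '-', '_' and '.' after the "data-" prefix), and the class is hypothetical; it says nothing about how either scanner decides this today.

```java
import java.util.regex.Pattern;

// Hypothetical name check for data-* attributes; not AntiSamy's policy logic.
final class DataAttributeSketch {
    // Assumed rule: "data-" followed by at least one lowercase letter, digit, '-', '_' or '.'.
    private static final Pattern DATA_ATTR = Pattern.compile("data-[a-z0-9_.\\-]+");

    static boolean isDataAttribute(String name) {
        return name != null && DATA_ATTR.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDataAttribute("data-title"));
        System.out.println(isDataAttribute("onclick"));
    }
}
```

The attribute value would still need its own validation; this only covers the name.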

Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:3.0.1:jar (attach-javadocs)

Hi,

I just tried to build antisamy on a Windows box. I ran mvn package and got the following messages.

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:3.0.1:jar (attach-javadocs) on project antisamy: MavenReportException: Error while generating Javadoc:
[ERROR] Exit code: 1 - D:\tmp\antisamy\src\main\java\org\owasp\validator\css\CssHandler.java:123: warning: no @param for errorMessages
[ERROR]         public CssHandler(Policy policy, LinkedList embeddedStyleSheets,
[ERROR]                ^
[ERROR] D:\tmp\antisamy\src\main\java\org\owasp\validator\css\CssHandler.java:123: warning: no @param for messages
[ERROR]         public CssHandler(Policy policy, LinkedList embeddedStyleSheets,
[ERROR]                ^
[ERROR] D:\tmp\antisamy\src\main\java\org\owasp\validator\css\CssHandler.java:139: warning: no @param for errorMessages
[ERROR]         public CssHandler(Policy policy, LinkedList embeddedStyleSheets,
[ERROR]                ^
[ERROR] D:\tmp\antisamy\src\main\java\org\owasp\validator\css\CssHandler.java:139: warning: no @param for messages
[ERROR]         public CssHandler(Policy policy, LinkedList embeddedStyleSheets,

There is no schema validation for policy XML

AntiSamy seems to lack schema validation when loading the XML of a policy.

This can allow malformed policies that load without error (AntiSamy won't blow up) but do not comply with the XSD. Bugs can originate from a bad policy definition, which XML schema validation could prevent.

Even the current example policies (and some customized ones in the tests) fail to validate against the schema.

This is a screenshot of the validation on freeformatter for antisamy-tinymce.xml:

[Screenshot 2020-12-12 094537]

I would suggest applying strict schema validation with the already defined XSD. As an improvement, if requested or considered useful, multiple or "stacked" validation could be applied, seen as an intersection of schemas, to restrict policy structure even further.
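
To illustrate the kind of check being proposed, here is a small sketch using the JDK's javax.xml.validation API. The schema and documents are toy examples invented for this snippet (a "policy" element containing "directive" elements with required name and value attributes); they are not AntiSamy's XSD or a real policy.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Toy example of strict XSD validation; the schema below is hypothetical.
final class SchemaValidationSketch {
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"policy\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"directive\" maxOccurs=\"unbounded\"><xs:complexType>"
      + "<xs:attribute name=\"name\" type=\"xs:string\" use=\"required\"/>"
      + "<xs:attribute name=\"value\" type=\"xs:string\" use=\"required\"/>"
      + "</xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns true only if the XML document validates against the schema.
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            // Validation errors surface as SAXException with the default error handler.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(XSD, "<policy><directive name=\"maxInputSize\" value=\"1000\"/></policy>"));
        System.out.println(isValid(XSD, "<policy><directive name=\"maxInputSize\"/></policy>"));
    }
}
```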

New URL validation breaks loading from jar files

A change late last year prevents loading of remote URLs by checking that the URL uses only the file: scheme. This breaks a very common use case: bundling the policy file inside a Java archive of some sort (jar/war/ear). AFAICT this is no more of a security risk than loading a file from a file: URL, and disabling this ability significantly increases the complexity of deployment in some contexts.
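
One possible shape for a more permissive check is sketched below. It is only an illustration under my own assumptions: the class and method names are hypothetical, and the rule (accept file: URLs, and jar: URLs whose inner URL is a file: URL) is a suggestion, not what AntiSamy does.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical scheme check; not the validation code currently in AntiSamy.
final class PolicyUrlSketch {
    // Accepts file: URLs and jar: URLs that wrap a local file: URL.
    static boolean isLocalPolicyUrl(String url) {
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            return false;
        }
        String scheme = uri.getScheme();
        if (scheme == null) {
            return false;
        }
        if (scheme.equalsIgnoreCase("file")) {
            return true;
        }
        if (scheme.equalsIgnoreCase("jar")) {
            // jar URLs have the form jar:<inner-url>!/<entry>; require a local inner URL.
            String inner = uri.getSchemeSpecificPart();
            return inner != null && inner.regionMatches(true, 0, "file:", 0, 5);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isLocalPolicyUrl("jar:file:/app/lib/policies.jar!/antisamy.xml"));
        System.out.println(isLocalPolicyUrl("http://example.com/policy.xml"));
    }
}
```

Checking the inner URL keeps remote archives such as jar:http://... rejected while still allowing policies packaged in a local jar.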

remove httpclient-3.1

Upstream and users want this library removed because of a known CVE. It's not a realistic threat for AntiSamy's use, for a couple of reasons, but it's harmless to pull it out and replace it with the latest version.

AntiSamy is not working for special case

AntiSamy does not work for the test case below; I also tried the latest version.
It fails when there is a "/" character inside a tag.

My Test Case:
@Test
public void testXSSScript() throws PolicyException, ScanException {
    String result = scanner.scan("<style/onload=alert(document.domain)>");
    assertEquals("", result);
}

==== Logic called by the test case ====
Assume the policy loads correctly (I attached antisamy.xml). For some reason, no error is reported for <style/onload=alert(document.domain)> when "Collection errors = r.getErrorMessages();" executes.

public String scan(String untrustedUserInput) throws PolicyException, ScanException {
    CleanResults r = webSecurityScanner.scan(untrustedUserInput, AntiSamy.SAX);
    if (logger.isDebugEnabled()) {
        logger.debug("Scanned request parameter in " + r.getScanTime() + "ms");
        logger.debug("Value: " + untrustedUserInput);
        logger.debug("Result: " + r.getCleanHTML());
        logger.debug("Errors: " + r.getErrorMessages());
    }

    Collection<String> errors = r.getErrorMessages();
    if (CollectionUtils.exists(errors, securityErrorPredicate)) {
        logger.info("Returning cleansed input due to " + errors.size() + " security errors: " + errors);
        logger.debug("Original: [" + untrustedUserInput + "]");

        final String cleansedHTML = fixMangledTags(r.getCleanHTML());
        logger.debug("Cleansed: [" + cleansedHTML + "]");
        return cleansedHTML;
    }
    return untrustedUserInput;
}

antisamy.zip

Filter Bypasses

The markup below shows multiple ways to bypass the AntiSamy filter.

  • The list under the NOT Sanitized by AntiSamy heading shows a couple of JavaScript payloads that browsers execute (tested with Chrome 69) but that AntiSamy does not remove.
  • The list under the Sanitized by AntiSamy heading is sanitized correctly and is left here only as a recommendation for tests.
  • The list under the Tricky Encoding with Ampersand Encoding heading was created by taking one of the payloads under NOT Sanitized by AntiSamy and encoding every ampersand in it, using different ways to encode an ampersand. One could repeat this process for any payload under NOT Sanitized by AntiSamy.
<html>
  <head>
    <title>Test</title>
  </head>
  <body>
    <h1>Tricky Encoding</h1>
    <h2>NOT Sanitized by AntiSamy</h2>
    <ol>
      <li><a href="javascript&#00058x=alert,x%281%29">X&#00058;x</a></li>
      <li><a href="javascript&#00058y=alert,y%281%29">X&#00058;y</a></li>

      <li><a href="javascript&#58x=alert,x%281%29">X&#58;x</a></li>
      <li><a href="javascript&#58y=alert,y%281%29">X&#58;y</a></li>

      <li><a href="javascript&#x0003Ax=alert,x%281%29">X&#x0003A;x</a></li>
      <li><a href="javascript&#x0003Ay=alert,y%281%29">X&#x0003A;y</a></li>

      <li><a href="javascript&#x3Ax=alert,x%281%29">X&#x3A;x</a></li>
      <li><a href="javascript&#x3Ay=alert,y%281%29">X&#x3A;y</a></li>
    </ol>
    <h2>Sanitized by AntiSamy</h2>
    <ol>
      <li><a href="javascript&#00058;alert&lpar;1&rpar;">X&#00058;</a></li>
      <li><a href="javascript&#58;alert&lpar;1&rpar;">X&#58;</a></li>

      <li><a href="javascript&#x0003A;alert&lpar;1&rpar;">X&#x0003A;</a></li>
      <li><a href="javascript&#x3A;alert&lpar;1&rpar;">X&#x3A;</a></li>

      <li><a href="javascript&colon;alert&lpar;1&rpar;">X&colon;</a></li>
    </ol>

    <h1>Tricky Encoding with Ampersand Encoding</h1>
    <p>AntiSamy turns a harmless payload into XSS just by decoding the encoded ampersands in the href attribute</p>
    <ol>
      <li><a href="javascript&amp;#x3Ax=alert,x%281%29">X&amp;#x3A;x</a></li>
      <li><a href="javascript&AMP;#x3Ax=alert,x%281%29">X&AMP;#x3A;x</a></li>

      <li><a href="javascript&#38;#x3Ax=alert,x%281%29">X&#38;#x3A;x</a></li>
      <li><a href="javascript&#00038;#x3Ax=alert,x%281%29">X&#00038;#x3A;x</a></li>

      <li><a href="javascript&#x26;#x3Ax=alert,x%281%29">X&#x26;#x3A;x</a></li>
      <li><a href="javascript&#x00026;#x3Ax=alert,x%281%29">X&#x00026;#x3A;x</a></li>
    </ol>
    <p><a href="javascript&#x3Ax=alert,x%281%29">Original without ampersand encoding</a></p>
  </body>
</html>
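
The payloads above rely on two things: numeric character references are still decoded when the trailing semicolon is missing, and decoding an encoded ampersand can expose a second reference. The sketch below is a simplified model written for this report, not AntiSamy's or any browser's actual decoder; the class and method names are hypothetical and only numeric references are handled.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of numeric character reference decoding; hypothetical helper.
final class NumericRefSketch {
    // Hex or decimal reference; the trailing semicolon is optional.
    private static final Pattern NUMERIC_REF =
        Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));?");

    // Performs a single decoding pass over the input.
    static String decodeOnce(String s) {
        Matcher m = NUMERIC_REF.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int cp = m.group(1) != null
                ? Integer.parseInt(m.group(1), 16)
                : Integer.parseInt(m.group(2), 10);
            if (!Character.isValidCodePoint(cp)) {
                cp = 0xFFFD; // replacement character for out-of-range values
            }
            m.appendReplacement(out, Matcher.quoteReplacement(new String(Character.toChars(cp))));
        }
        m.appendTail(out);
        return out.toString();
    }

    // True if the value starts with the javascript: scheme.
    static boolean isJavascriptUrl(String href) {
        return href.trim().toLowerCase(Locale.ROOT).startsWith("javascript:");
    }

    public static void main(String[] args) {
        String plain = "javascript&#x3Ax=alert,x%281%29";
        String doubleEncoded = "javascript&#38;#x3Ax=alert,x%281%29";
        System.out.println(isJavascriptUrl(decodeOnce(plain)));
        System.out.println(isJavascriptUrl(decodeOnce(doubleEncoded)));
        System.out.println(isJavascriptUrl(decodeOnce(decodeOnce(doubleEncoded))));
    }
}
```

In this model the double-encoded value stays harmless after one pass and becomes a javascript: URL only after a second pass, which mirrors the "Tricky Encoding with Ampersand Encoding" list: a sanitizer that decodes the ampersand and emits the result unescaped hands the browser the dangerous form.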

AntiSamy should not be dependent on Xerces

Xerces does not support the attributes on a Transformer that are required in order to mitigate XXE vulnerabilities. While this may not be a huge issue for AntiSamy itself, the fact that we have to include xercesImpl.jar in our application classpath means that xerces is used ahead of the JDK, and therefore our XXE mitigations are useless.

XXE mitigation reference: https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Prevention_Cheat_Sheet#TransformerFactory

Besides all of that: Xerces hasn't been maintained since 2010!

As others have mentioned, AntiSamy should use XSLT to serialize instead of the deprecated HTMLSerializer. As far as I can tell, this is the only direct Xerces dependency.

Thanks.

Performance degraded after upgrading from 1.5.7 to 1.5.8 for the SAXScanner.

After upgrading the antisamy jar from 1.5.7 to 1.5.8, performance dropped by 40-50%.
Comparing the 1.5.8 code with 1.5.7, I found that in the SAXScanner case (class AntiSamySAXScanner) the cached item is not added back to the cachedItems queue once scanning finishes, so a new CachedItem object is created for every scan call.
Creating this object on every scan is what lowers performance.
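
The pattern I would expect is "poll from the queue, create only if empty, put the item back when done". The sketch below is a standalone illustration with hypothetical class names, not the AntiSamySAXScanner code; it just counts how many objects get created with and without the final "put back" step.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical pool illustrating the caching pattern; not AntiSamy's classes.
final class ParserPoolSketch {
    // Stand-in for an object that is expensive to build, such as a configured parser.
    static final class Expensive { }

    private final Queue<Expensive> pool = new ConcurrentLinkedQueue<>();
    private final AtomicInteger created = new AtomicInteger();
    private final boolean returnToPool;

    ParserPoolSketch(boolean returnToPool) {
        this.returnToPool = returnToPool;
    }

    void scan() {
        Expensive item = pool.poll();
        if (item == null) {
            item = new Expensive();
            created.incrementAndGet();
        }
        try {
            // ... the scan would use the item here ...
        } finally {
            if (returnToPool) {
                pool.add(item); // without this step every scan builds a new object
            }
        }
    }

    // Runs the given number of scans and reports how many objects were built.
    static int createdAfter(int scans, boolean returnToPool) {
        ParserPoolSketch sketch = new ParserPoolSketch(returnToPool);
        for (int i = 0; i < scans; i++) {
            sketch.scan();
        }
        return sketch.created.get();
    }

    public static void main(String[] args) {
        System.out.println(createdAfter(5, true));
        System.out.println(createdAfter(5, false));
    }
}
```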

Support HTML5

AntiSamy uses a deprecated HTMLSerializer which does not understand newer HTML5 tags like <figure>. While this is a minor issue, it also does not understand newer HTML5 entities like &colon; or &lpar;. This leads to a security vulnerability where the following text does not get cleaned:

<a href="javascript&colon;alert&lpar;1&rpar;">X</a>
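
To show why the named references matter, here is a tiny sketch that decodes only the three HTML5 entities used in the payload above. The class is hypothetical and the table is deliberately incomplete; a real fix would need the full HTML5 entity list.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, deliberately incomplete entity table for illustration only.
final class NamedEntitySketch {
    private static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("&colon;", ":");
        ENTITIES.put("&lpar;", "(");
        ENTITIES.put("&rpar;", ")");
    }

    // Replaces each known named reference with its character.
    static String decode(String s) {
        String out = s;
        for (Map.Entry<String, String> e : ENTITIES.entrySet()) {
            out = out.replace(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(decode("javascript&colon;alert&lpar;1&rpar;"));
    }
}
```

A URL check that runs before this decoding step never sees the javascript: scheme, while the browser does.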

incomplete tag removing rest of content

I have a problem: when the content contains a "<" symbol without a matching ">", everything to the right of it is removed.
Added directive: <directive name="onUnknownTag" value="encode"/>
Input: "hello <hi world, it is clean"
Output: "hello"
Is there any way to get output identical to the input in this case?
Expected output: "hello <hi world, it is clean"
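
One workaround I can imagine is to escape stray "<" characters before scanning. The sketch below is a hypothetical pre-processing helper, not an AntiSamy feature, and it uses a deliberately naive rule: a "<" counts as closed only if a ">" appears before the next "<".

```java
// Hypothetical pre-processing step; AntiSamy does not provide this helper.
final class UnclosedTagSketch {
    // Escapes every '<' that has no '>' before the next '<' (or before the end of input).
    static String escapeUnclosedLt(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '<') {
                int close = s.indexOf('>', i + 1);
                int nextOpen = s.indexOf('<', i + 1);
                boolean closed = close != -1 && (nextOpen == -1 || close < nextOpen);
                out.append(closed ? "<" : "&lt;");
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeUnclosedLt("hello <hi world, it is clean"));
        System.out.println(escapeUnclosedLt("<b>ok</b>"));
    }
}
```

The result contains &lt; rather than a literal "<", so it renders the same as the input in a browser but is not byte-for-byte identical.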

antisamy ignoring anything rule?

I have a rule for an attribute in a tag that looks like this:

<tag name="tag1" action="validate">
    <attribute name="attribute1">
        <regexp-list>
            <regexp name="anything"/>
        </regexp-list>
    </attribute>
    <attribute name="attribute2"/>
</tag>

When I use the SAX parser, attribute1 is dropped because its value contains an LSEP character. It may also be dropping the file name because of other characters. Either way, shouldn't attribute1 be kept, since its regexp has been assigned as anything?

the specific error message i'm getting is "The tag1 tag contained an attribute that we could not process. The attribute1 attribute had a value of "Gartner - Executive Guide to Total Experience
.pdf". This value could not be accepted for security reasons. We have chosen to remove this attribute from the tag and leave everything else in place so that we could process the input."
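
One thing worth checking, assuming the "anything" regexp in the policy is defined as .* (I have not confirmed that for this policy): in Java, "." does not match line terminators, and U+2028 (LSEP) is one of them, so .* cannot match the whole value unless DOTALL mode is on. The standalone sketch below, with a hypothetical class name, shows the difference.

```java
import java.util.regex.Pattern;

// Demonstrates how '.' treats U+2028 in java.util.regex; not AntiSamy code.
final class LineSeparatorRegexSketch {
    // Value modelled on the error message above, with LSEP before ".pdf".
    static final String VALUE = "Gartner - Executive Guide to Total Experience\u2028.pdf";

    // Full match with the default flags: '.' excludes line terminators.
    static boolean matchesDefault(String value) {
        return Pattern.compile(".*").matcher(value).matches();
    }

    // Full match with DOTALL: '.' matches every character, including U+2028.
    static boolean matchesDotAll(String value) {
        return Pattern.compile(".*", Pattern.DOTALL).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesDefault(VALUE));
        System.out.println(matchesDotAll(VALUE));
    }
}
```

If that is the cause, a pattern such as (?s).* in the policy would accept the value, though whether accepting line separators in this attribute is desirable is a separate question.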

Injecting js in "a" tag

I am still able to inject <a onmouseover=alert(1)>click</a> and <p onmouseover=alert(1)>click</p> elements, even though I have updated the AntiSamy policy to remove the "a" tag completely.

Add rel="noopener" to anchor if target="_blank" is set => security enhancement

Add rel="noopener" to an anchor if target="_blank" is set.
Based on the OWASP article
https://owasp.org/www-community/attacks/Reverse_Tabnabbing
it would be nice if the noopener attribute were set automatically whenever target="_blank" is in use.

This is very similar to the nofollow setting in AntiSamy.

Example
<a href="https://example.com" target="_blank"> => <a href="https://example.com" target="_blank" rel="noopener">
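
The rule itself is small. The sketch below is a hypothetical helper, not existing AntiSamy behaviour; it computes the rel value an anchor should end up with, keeping any tokens that are already present.

```java
// Hypothetical helper showing the proposed rule; not part of AntiSamy.
final class NoopenerSketch {
    // Returns the rel value to emit for an anchor with the given target and existing rel.
    static String relFor(String target, String rel) {
        if (!"_blank".equalsIgnoreCase(target)) {
            return rel; // only anchors opening a new browsing context are changed
        }
        if (rel == null || rel.trim().isEmpty()) {
            return "noopener";
        }
        for (String token : rel.trim().split("\\s+")) {
            if (token.equalsIgnoreCase("noopener")) {
                return rel; // already present, leave untouched
            }
        }
        return rel.trim() + " noopener";
    }

    public static void main(String[] args) {
        System.out.println(relFor("_blank", null));
        System.out.println(relFor("_blank", "nofollow"));
    }
}
```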

when preserveComments directive is enabled, the HTML comments are moved to the end

I added the preserveComments directive to the config.

I used this input:

<p>this is a test content before start testing</p>
<!-- TESTING COMMENT --><p>another line</p>
<p>end of the content</p>

Then, after running:

Policy policy = Policy.getInstance(App.class.getResourceAsStream("/antisamyConfig.xml"));
AntiSamy sanitizer = new AntiSamy(policy); 
CleanResults scanned = sanitizer.scan(input);
String sanitized = scanned.getCleanHTML(); 

The output was:

<p>this is a test content before start testing</p>
<p>another line</p>
<p>end of the content</p>
<!-- TESTING COMMENT -->

Antisamy 1.6 introduces log4j dependency

When updating one of my modules to AntiSamy 1.6, I got a test failure because org.owasp.validator.html.Policy now has a hard dependency on log4j, whereas the pom.xml declares that it uses slf4j.

https://github.com/nahsra/antisamy/blob/master/src/main/java/org/owasp/validator/html/Policy.java#L83

It appears this was introduced in the commit:

64416f1#diff-ea20191cc92e7360f2cc25757c0fd872902416e123220b6649b0dc7a0af5663b

This seems to be the ONLY logging in the project, and the logging dependency doesn't appear to be mentioned in the release notes at https://github.com/nahsra/antisamy/releases/tag/v1.6.0 either.

Since slf4j is declared, shouldn't the code use only the slf4j API, with slf4j-over-log4j used ONLY in tests or client applications?

... i think i found a way to bypass ....

If you take

&#x22;&#x3E;&#x3C;&#x69;&#x6D;&#x67;&#x20;&#x73;&#x72;&#x63;&#x3D;&#x61;&#x20;&#x6F;&#x6E;&#x65;&#x72;&#x72;&#x6F;&#x72;&#x3D;&#x61;&#x6C;&#x65;&#x72;&#x74;&#x28;&#x31;&#x29;&#x3E;

and pass it as the parameter to
antiSamy.scan ( parameter, policy, AntiSamy.SAX ).getCleanHTML ();

you get ><img src=a onerror=alert(1)>, and so an alert pops up, even if img tags are set to remove.

<a/href=javascript:[1].find(alert)>CLICKHERE</a> does not return error

The policy for the a tag is below. The clean HTML has the /href attribute removed without any error being reported. Could you take a look? How can I get an error message for this case?

rev="1.6.3"

Thanks in advance!

         <tag name="a" action="validate">

            <!--  onInvalid="filterTag" has been removed as per suggestion at OWASP SJ 2007 - just "name" is valid -->
            <attribute name="href"/>
            <attribute name="onFocus"/>
            <attribute name="onBlur"/>
            <attribute name="nohref">
                <regexp-list>
                    <regexp name="anything"/>
                </regexp-list>
            </attribute>
            <attribute name="rel">
                <literal-list>
                    <literal value="nofollow"/>
                </literal-list>
            </attribute>
            <attribute name="name"/>


            <attribute name="target">
                <regexp-list>
                    <regexp value="[a-zA-Z0-9\-_\$]+"/>
                </regexp-list>
            </attribute>

        </tag>

The default onsiteURL regex is not safe: if the URL starts with '//', it can leave the origin domain

The default onsiteURL regex:
<regexp name="onsiteURL" value="^(?![\p{L}\p{N}\\\.\#@\$%\+&amp;;\-_~,\?=/!]*(&amp;colon))[\p{L}\p{N}\\\.\#@\$%\+&amp;;\-_~,\?=/!]*"/>

Rich text usually requires the href attribute, with a validation rule like this:

<attribute name="href">
			<regexp-list>
				<regexp name="onsiteURL" />
			</regexp-list>
</attribute>

So if developers trust the onsiteURL regex, they will not do any other domain validation, but the regex can be bypassed with a leading '//', as in '//evali.com?params'. In that case phishing attacks become possible. In addition, information may leak through techniques such as dangling markup attacks.
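
The behaviour can be reproduced with plain java.util.regex. In the sketch below the first pattern is the onsiteURL value quoted above with the XML escapes (&amp;) resolved; the second, stricter pattern is only my suggestion and simply adds a lookahead that rejects values starting with two slashes or backslashes. The class and constant names are hypothetical.

```java
import java.util.regex.Pattern;

// Reproduces the reported match and shows one possible tightening; names are hypothetical.
final class OnsiteUrlSketch {
    // Character class from the policy, with &amp; unescaped to &.
    private static final String CLASS = "[\\p{L}\\p{N}\\\\\\.\\#@\\$%\\+&;\\-_~,\\?=/!]";

    // The default onsiteURL expression as quoted in this issue.
    static final Pattern ONSITE =
        Pattern.compile("^(?!" + CLASS + "*(&colon))" + CLASS + "*");

    // Suggested variant: additionally reject a leading "//" (or backslash equivalents).
    static final Pattern STRICT =
        Pattern.compile("^(?![/\\\\]{2})(?!" + CLASS + "*(&colon))" + CLASS + "*");

    static boolean matches(Pattern p, String url) {
        return p.matcher(url).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(ONSITE, "//evali.com?params"));
        System.out.println(matches(STRICT, "//evali.com?params"));
        System.out.println(matches(STRICT, "/profile?id=1"));
    }
}
```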

AntiSamy does not work on <svg/onload = alert('Hello')/>

I am working on an XSS issue where our tester found an input like <svg/onload = alert('Hello') > that AntiSamy does not clean.

I also debugged the AntiSamy library: it treats <svg> or <style> as a tag and continues with the current code, so it does not throw any particular exception.

I have written a small test case for your reference:

@Test
public void testStyleOnloadWithAlertScripts() throws PolicyException, ScanException {
    assertEquals("", scanner.scan("<style/onload = alert(document.domain)>"));
}

Can anyone look into resolving this issue, either through XML configuration or in a new patch release?

AntiSamy truncates the whole content when a frame tag is in the input and the frame tag is configured to be removed.

When using AntiSamy with the attached policy file, in which the frame tag is configured to be removed, and providing the input <frame>remove</frame><div>should not be removed</div>, the output is an empty string instead of <div>should not be removed</div>.

sample code below:

Policy policy = Policy.getInstance("C:\\antisamy-basic.xml");
AntiSamy antisamy = new AntiSamy(policy);
CleanResults cleanResults = antisamy.scan("<frame>remove</frame><div>should not be removed</div>");
System.out.println(cleanResults.getCleanHTML());

We are using antisamy version 1.5.5
antisamy-basic.zip
