nahsra / antisamy
A library for performing fast, configurable cleansing of HTML coming from untrusted sources
License: BSD 3-Clause "New" or "Revised" License
I have been using AntiSamy in one of my projects. After I input a URL like this one:
https://www.google.com/terms-conditions/vacancy.html
and click the Save button, the URL changes to this:
https://www.google.com/ter-ms-conditions/vacancy.html
"terms-" has been changed to "ter-ms-". Please suggest a solution.
We started using AntiSamy for CSS validation in our web project and realized that it removes !important from CSS rules in styles.
E.g. <p style="color: red !important">Some Text</p>
resolves to <p style="color: red">Some Text</p>
The following test, added to AntiSamyTest, fails:
@Test
public void givenImportantRuleWhenScanThenPreserved() throws ScanException, PolicyException {
String s = as.scan("<p style=\"color: red !important\">Some Text</p>", policy, AntiSamy.DOM).getCleanHTML();
assertTrue(s.contains("!important"));
s = as.scan("<p style=\"color: red !important\">Some Text</p>", policy, AntiSamy.SAX).getCleanHTML();
assertTrue(s.contains("!important"));
}
I can see from the parameters of org.owasp.validator.css.CssHandler#property that the handler knows whether a property is important, but the code seems to ignore this information, because the argument is not used anywhere:
...
public void property(String name, LexicalUnit value, boolean important)
throws CSSException {
// only bother validating and building if we are either inline or within
// a selector tag
if (!selectorOpen && !isInline) {
...
Is there a way to get it working or am I missing something? Let me know if you need further information!
Thank you in advance!
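One way the flag could be carried through is sketched below. CssDeclarationWriter and write(...) are hypothetical names, not AntiSamy's API; the sketch only assumes that, after the property value has been validated, the serialized declaration is rebuilt from the name, the value, and the important argument that CssHandler#property receives.

```java
/**
 * Hypothetical helper (not part of AntiSamy): rebuilds one CSS declaration
 * after validation and keeps the priority reported by the CSS parser.
 */
final class CssDeclarationWriter {
    private CssDeclarationWriter() {}

    static String write(String name, String value, boolean important) {
        StringBuilder declaration = new StringBuilder();
        declaration.append(name).append(": ").append(value);
        if (important) {
            // Re-append the flag that property(name, value, important) received.
            declaration.append(" !important");
        }
        return declaration.toString();
    }
}
```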
I get the following exception:
javax.xml.transform.TransformerException: java.lang.ArrayIndexOutOfBoundsException: -1
org.owasp.validator.html.ScanException: javax.xml.transform.TransformerException: java.lang.ArrayIndexOutOfBoundsException: -1
at org.owasp.validator.html.scan.AntiSamySAXScanner.scan(AntiSamySAXScanner.java:135) ~[antisamy-1.5.7.jar:1.5.7]
at org.owasp.validator.html.AntiSamy.scan(AntiSamy.java:101) ~[antisamy-1.5.7.jar:1.5.7]
when calling:
antiSamy.scan("my &test", antisamypolicy, AntiSamy.SAX).getCleanHTML(); // used the standard antisamy.xml
See the attached Java test class, which shows the problem.
Given a Policy that accepts no HTML but has <directive name="onUnknownTag" value="encode"/>, calling
AntiSamy.scan("<div>abc</div>", policy);
drops the closing tag from the output.
package org.yourname;
import static org.hamcrest.CoreMatchers.equalTo;
import static org.junit.Assert.assertThat;
import java.io.InputStream;
import java.io.StringBufferInputStream;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.mockito.runners.MockitoJUnitRunner;
import org.owasp.validator.html.AntiSamy;
import org.owasp.validator.html.CleanResults;
import org.owasp.validator.html.Policy;
import org.owasp.validator.html.PolicyException;
import org.owasp.validator.html.ScanException;
@SuppressWarnings("deprecation")
@RunWith(MockitoJUnitRunner.class)
public class AntiSamyEncodingTest
{
/**
* Demonstrates that the onUnknownTag directive causes AntiSamy's scan to
* lose the closing tag
*
* Given an input like
* <pre>
* <div>hello, world</div>
* </pre>
* scan will return
* <pre>
* <div>hello, world
* </pre>
* without the closing tag.
*/
@Test
public void standaloneTest() throws PolicyException, ScanException
{
String policyDefinition =
"<?xml version=\"1.0\" encoding=\"UTF-8\" ?>" +
"<anti-samy-rules xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" " +
" xsi:noNamespaceSchemaLocation=\"antisamy.xsd\">" +
" <directives>" +
" <directive name=\"onUnknownTag\" value=\"encode\" />" +
" </directives>" +
" <common-regexps></common-regexps>" +
" <common-attributes></common-attributes>" +
" <!-- no tags are valid, by default all html elements are encoded -->" +
" <tag-rules></tag-rules>" +
"</anti-samy-rules>";
InputStream sr = new StringBufferInputStream(policyDefinition);
AntiSamy as = new AntiSamy();
Policy policy = Policy.getInstance(sr);
String taintedHtml = "<div>hello, world</div>";
CleanResults cr = as.scan(taintedHtml, policy, AntiSamy.SAX);
String cleaned = cr.getCleanHTML();
// the value is "<div>hello, world", missing the closing element
assertThat(cleaned, equalTo("<div>hello, world</div>")); //fails
}
}
If you agree this behavior is incorrect, I would be happy to open a PR.
I have a rule for an attribute in a tag that looks like this:
<tag name="tag1" action="validate">
    <attribute name="attribute1">
        <regexp-list>
            <regexp name="anything" />
        </regexp-list>
    </attribute>
    <attribute name="attribute2" />
</tag>
When I use the SAX parser, attribute1 is dropped because it has an LSEP character in it. It may also be dropping the file name for other characters. Either way, shouldn't attribute1 be kept, since its regexp is set to "anything"?
The specific error message I'm getting is "The tag1 tag contained an attribute that we could not process. The attribute1 attribute had a value of "Gartner - Executive Guide to Total Experience .pdf". This value could not be accepted for security reasons. We have chosen to remove this attribute from the tag and leave everything else in place so that we could process the input."
Hi,
I just tried to build AntiSamy on a Windows box. I ran mvn package and got the following messages:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:3.0.1:jar (attach-javadocs) on project antisamy: MavenReportException: Error while generating Javadoc:
[ERROR] Exit code: 1 - D:\tmp\antisamy\src\main\java\org\owasp\validator\css\CssHandler.java:123: warning: no @param for errorMessages
[ERROR] public CssHandler(Policy policy, LinkedList embeddedStyleSheets,
[ERROR] ^
[ERROR] D:\tmp\antisamy\src\main\java\org\owasp\validator\css\CssHandler.java:123: warning: no @param for messages
[ERROR] public CssHandler(Policy policy, LinkedList embeddedStyleSheets,
[ERROR] ^
[ERROR] D:\tmp\antisamy\src\main\java\org\owasp\validator\css\CssHandler.java:139: warning: no @param for errorMessages
[ERROR] public CssHandler(Policy policy, LinkedList embeddedStyleSheets,
[ERROR] ^
[ERROR] D:\tmp\antisamy\src\main\java\org\owasp\validator\css\CssHandler.java:139: warning: no @param for messages
[ERROR] public CssHandler(Policy policy, LinkedList embeddedStyleSheets,
AntiSamy does not appear to perform schema validation when loading a policy XML file.
This may allow malformed policies that load without errors (AntiSamy won't blow up) but do not comply with the XSD. Bugs can originate from bad policy definitions, which XML schema validation could prevent.
In fact, if validation is applied to the current example policies (and some customized ones in tests), they fail to validate.
Here is a screenshot of the validation result on FreeFormatter for antisamy-tinymce.xml:
I would suggest applying strict schema validation with the already defined XSD. As an improvement, if requested or considered useful, multiple or "stacked" validation could be applied, seen as an intersection of schemas to restrict policies structure even more.
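A minimal sketch of strict validation using the JDK's javax.xml.validation API follows. PolicySchemaCheck is a hypothetical helper, and the XSD and policy are passed as strings only to keep the example self-contained; AntiSamy would presumably load antisamy.xsd from its own resources instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Hypothetical helper: validates a policy document against an XSD. */
final class PolicySchemaCheck {
    private PolicySchemaCheck() {}

    /** Returns null when the policy is valid, otherwise the first error message. */
    static String validate(String xsd, String policyXml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // A malformed XSD also surfaces here as a SAXException.
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // With no custom ErrorHandler, the first error is thrown as a SAXException.
            validator.validate(new StreamSource(new StringReader(policyXml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Returning the first message instead of throwing is just one design choice; a loader that wants to fail hard could let the SAXException propagate.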
Add rel="noopener" to anchors if target="_blank" is set
Based on the OWASP article
https://owasp.org/www-community/attacks/Reverse_Tabnabbing
it would be nice if rel="noopener" were set automatically whenever target="_blank" is used.
This is very similar to the nofollow setting in AntiSamy.
Example
<a href="https://example.com" target="_blank"> => <a href="https://example.com" target="_blank" rel="noopener">
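Until such an option exists, the behavior can be approximated by post-processing the cleaned markup. The sketch below assumes the output is well-formed XML (XHTML-style) so that the JDK's DOM parser can read it; NoopenerPostProcessor is a hypothetical name, and this is not an AntiSamy directive.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NoopenerPostProcessor {
    private NoopenerPostProcessor() {}

    /** Adds "noopener" to rel on every a element with target="_blank". */
    static void addNoopener(Document doc) {
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            if (!"_blank".equalsIgnoreCase(a.getAttribute("target"))) {
                continue;
            }
            String rel = a.getAttribute("rel").trim(); // "" when absent
            boolean present = false;
            for (String token : rel.split("\\s+")) {
                present |= token.equalsIgnoreCase("noopener");
            }
            // Keep existing tokens such as "nofollow" and append "noopener" once.
            if (!present) {
                a.setAttribute("rel", rel.isEmpty() ? "noopener" : rel + " noopener");
            }
        }
    }

    /** Parses well-formed markup, applies addNoopener, and serializes it back. */
    static String process(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
            addNoopener(doc);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not post-process markup", e);
        }
    }
}
```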
The policy for tag a is below. The clean HTML has the /href attribute removed without any error. Could you help take a look? How can I get an error message returned for this case?
rev="1.6.3"
Thanks in advance!
<tag name="a" action="validate">
<!-- onInvalid="filterTag" has been removed as per suggestion at OWASP SJ 2007 - just "name" is valid -->
<attribute name="href"/>
<attribute name="onFocus"/>
<attribute name="onBlur"/>
<attribute name="nohref">
<regexp-list>
<regexp name="anything"/>
</regexp-list>
</attribute>
<attribute name="rel">
<literal-list>
<literal value="nofollow"/>
</literal-list>
</attribute>
<attribute name="name"/>
<attribute name="target">
<regexp-list>
<regexp value="[a-zA-Z0-9\-_\$]+"/>
</regexp-list>
</attribute>
</tag>
When using antisamy with the attached file where the following tag rule is defined
and providing the input remove
Sample code:
Policy policy = Policy.getInstance("C:\\antisamy-basic.xml");
AntiSamy antisamy = new AntiSamy(policy);
CleanResults cleanResults = antisamy.scan("<frame>remove</frame><div>should not be removed</div>");
System.out.println(cleanResults.getCleanHTML());
We are using antisamy version 1.5.5
antisamy-basic.zip
Upstream and users want this library removed due to a known CVE. It's not a realistic threat for AntiSamy's use for a couple of reasons, but it's harmless to pull it out and replace it with the latest version.
This code was added in org.owasp.validator.html.scan.AntiSamySAXScanner:
sTransformerFactory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
sTransformerFactory.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
Xerces2-j does not support these attribute constants. The Oracle JAXP documentation (https://docs.oracle.com/javase/tutorial/jaxp/properties/usingProps.html) recommends catching the IllegalArgumentException thrown for unsupported features.
I know that Xerces is ancient, but there isn't any need to break compatibility in this way.
UPDATE: This issue was originally created with Xerces-2 in the title as the offending library, so the thread below talks about Xerces a lot. But the actual problem is a variant of Xalan on the classpath, and Xerces is a red herring. Further down, Xalan is finally identified as the real problem, not Xerces.
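The approach recommended in the Oracle JAXP tutorial can be sketched as follows. TransformerFactories.newRestrictedFactory is a hypothetical helper; it applies each restriction separately and records the ones the installed TransformerFactory (for example an older Xalan) rejects with IllegalArgumentException, instead of failing outright.

```java
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.TransformerFactory;

final class TransformerFactories {
    private TransformerFactories() {}

    /** Applies the JAXP access restrictions; unsupported ones are added to the list. */
    static TransformerFactory newRestrictedFactory(List<String> unsupported) {
        TransformerFactory factory = TransformerFactory.newInstance();
        String[] attributes = {
            XMLConstants.ACCESS_EXTERNAL_DTD,
            XMLConstants.ACCESS_EXTERNAL_STYLESHEET
        };
        for (String attribute : attributes) {
            try {
                factory.setAttribute(attribute, "");
            } catch (IllegalArgumentException e) {
                // TransformerFactory.setAttribute throws this when the implementation
                // does not recognize the attribute.
                unsupported.add(attribute);
            }
        }
        return factory;
    }
}
```

Silently continuing weakens XXE protection on such implementations, so a caller may prefer to log the unsupported list or refuse to proceed.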
Hi,
This library has a dependency on the commons-httpclient library which is both end of life and vulnerable. Is it possible to upgrade to its replacement, http://hc.apache.org/httpcomponents-client-ga/index.html?
Thanks,
Geert
I am still able to inject <a onmouseover=alert(1)>click</a> and <p onmouseover=alert(1)>click</p> elements, even though I have updated the AntiSamy policy to remove the "a" tag completely.
I am working on an XSS issue where our tester found input like <svg/onload = alert('Hello') >, and AntiSamy is not cleaning this particular tag.
While debugging the AntiSamy library, I saw that it treats this input (or <style>) as a tag and continues with the current code path, so it does not throw any particular exception.
I have already written a small test case for your reference:
@Test
public void testStyleOnloadWithAlertScripts() throws PolicyException, ScanException {
    assertEquals("", scanner.scan("<style/onload = alert(document.domain)>"));
}
Can anyone look into resolving this issue, either through XML configuration or in a new patch release?
If you take
"><img src=a onerror=alert(1)>
and pass it in as the parameter to
antiSamy.scan(parameter, policy, AntiSamy.SAX).getCleanHTML();
you get ><img src=a onerror=alert(1)>
and so an alert pops up, even if img tags are set to remove.
Xerces does not support the attributes on a Transformer that are required in order to mitigate XXE vulnerabilities. While this may not be a huge issue for AntiSamy itself, the fact that we have to include xercesImpl.jar in our application classpath means that xerces is used ahead of the JDK, and therefore our XXE mitigations are useless.
XXE mitigation reference: https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Prevention_Cheat_Sheet#TransformerFactory
Besides all of that: Xerces hasn't been maintained since 2010!
As has been mentioned by others, AntiSamy should be using XSLT to serialize instead of the deprecated HTMLSerializer. As far as I can tell, this is the only direct Xerces dependency.
Thanks.
Added:
into the config
I used this input:
<p>this is a test content before start testing</p>
<!-- TESTING COMMENT --><p>another line</p>
<p>end of the content</p>
Then I ran:
Policy policy = Policy.getInstance(App.class.getResourceAsStream("/antisamyConfig.xml"));
AntiSamy sanitizer = new AntiSamy(policy);
CleanResults scanned = sanitizer.scan(input);
String sanitized = scanned.getCleanHTML();
The output was:
<p>this is a test content before start testing</p>
<p>another line</p>
<p>end of the content</p>
<!-- TESTING COMMENT -->
Hello,
Using AntiSamy causes batik-css-1.8.jar to be included as a run-time dependency. There is a high-severity CVE against this library: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-5662.
batik-1.9 was recently released and fixes this issue. Any chance we could get a new version of AntiSamy with 1.9 instead of 1.8? I could do a pull request if you like.
Thanks!
The code in question is invoked through the ESAPI library, so I'm not sure whether that has a bearing. But examining the removePI method of the AntiSamyDOMScanner class, it looks like the node doesn't have any parent node, so node.getParentNode() returns null, which leads to a NullPointerException. The stack trace is attached here:
java.lang.NullPointerException
at org.owasp.validator.html.scan.AntiSamyDOMScanner.removePI(AntiSamyDOMScanner.java:689)
at org.owasp.validator.html.scan.AntiSamyDOMScanner.recursiveValidateTag(AntiSamyDOMScanner.java:260)
at org.owasp.validator.html.scan.AntiSamyDOMScanner.processChildren(AntiSamyDOMScanner.java:675)
at org.owasp.validator.html.scan.AntiSamyDOMScanner.processChildren(AntiSamyDOMScanner.java:666)
at org.owasp.validator.html.scan.AntiSamyDOMScanner.scan(AntiSamyDOMScanner.java:159)
at org.owasp.validator.html.AntiSamy.scan(AntiSamy.java:93)
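If the reading above is right, a guard on the parent would avoid the exception. The sketch uses only the standard org.w3c.dom API; DomNodes and its methods are hypothetical, and whether skipping an already-detached node is the correct behavior inside removePI is an assumption that the stack trace alone does not confirm.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class DomNodes {
    private DomNodes() {}

    /** Removes node from its parent; returns false if the node is already detached. */
    static boolean removeIfAttached(Node node) {
        Node parent = node.getParentNode();
        if (parent == null) {
            // Nothing to remove from, e.g. the node was detached by an earlier step.
            return false;
        }
        parent.removeChild(node);
        return true;
    }

    /** Convenience for the example: an empty DOM document. */
    static Document newEmptyDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```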
After upgrading the AntiSamy jar from 1.5.7 to 1.5.8, performance dropped by 40-50%.
After comparing the 1.5.8 code with 1.5.7, I found that in the SAX scanner (class AntiSamySAXScanner), the cached item is not added back to the cachedItems queue once scanning is finished, so a new CachedItem object is created for every scan call.
This per-scan object creation is what lowers performance.
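The reuse pattern the report describes (take an object from a queue, put it back after the scan) looks roughly like this. SimplePool is a hypothetical generic stand-in for AntiSamy's internal CachedItem handling; the key step is calling release(...) once scanning is finished, which is what the report says 1.5.8 skips.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

final class SimplePool<T> {
    private final Queue<T> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<T> factory;

    SimplePool(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Reuses an idle item if one exists, otherwise creates a new one. */
    T acquire() {
        T item = idle.poll();
        return item != null ? item : factory.get();
    }

    /** Returns an item so the next acquire() can reuse it instead of allocating. */
    void release(T item) {
        idle.offer(item);
    }
}
```

In real code the release would typically sit in a finally block, provided the item can be safely reset after a failed scan.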
It looks like, somewhere around version 1.5, the method org.owasp.validator.html.scan.AntiSamyDOMScanner#stripNonValidXMLCharacters was altered to check whether the invalidXmlCharacters Pattern matches the input using java.util.regex.Matcher#matches().
I presume that was done for efficiency, to avoid a replaceAll method call when there was no need to modify the input String.
You use the same technique in a test to check whether time improvements were made.
I believe it should use java.util.regex.Matcher#find() instead.
matches() checks whether the entire sequence matches the pattern. Since the pattern represents only a single character (one that can be in any of the defined sets), if the sequence (the HTML) is longer than one character it can never match. It has been this way since forever, I believe, or at least since Java 5.
find() finds the next subsequence that matches the pattern, so it checks quickly and succeeds fast if the HTML needs to be cleansed.
You're getting a speed increase because matches() reaches the second char and declares the sequence a non-match regardless.
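The difference can be shown with a small, self-contained example. The character class below is an illustrative assumption (C0 control characters other than tab, line feed and carriage return), not AntiSamy's actual invalidXmlCharacters pattern, and InvalidXmlChars is a hypothetical name.

```java
import java.util.regex.Pattern;

final class InvalidXmlChars {
    private InvalidXmlChars() {}

    // Illustrative single-character class; not AntiSamy's exact pattern.
    static final Pattern INVALID =
        Pattern.compile("[\\u0000-\\u0008\\u000B\\u000C\\u000E-\\u001F]");

    /** Fast path: only run replaceAll when an invalid character occurs anywhere. */
    static String strip(String in) {
        if (!INVALID.matcher(in).find()) { // find(), not matches()
            return in;
        }
        return INVALID.matcher(in).replaceAll("");
    }
}
```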
Input HTML:
<div>Hello\uD83D\uDC95</div>
Expected from org.owasp.validator.html.CleanResults#getCleanHTML:
<div>Hello</div>
Actual:
<div>Hello\uD83D\uDC95</div>
Only when the input HTML is the single character \uD888 is the output the empty string, as expected.
I looked through the test class here and can see no tests where you are expecting data to be cleansed. All the tests ensure that characters make it through ok or that something is faster (checking only 1 char is faster!)
Incidentally, I only noticed this because the AntiSamy code appeared to want to cleanse the characters that make up an emoji. As far as I can tell, that character is valid according to the XML and HTML specs. When its UTF-8 bytes are read by our system, we get a Java representation as 16-bit chars underpinning the String, and those code units fall within your filter. I don't believe you should strip them when they come together: according to java.lang.Character#isSurrogatePair, \uD83D and \uDC95 together return true rather than false, and the toCodePoint method tells us the character is 💕. So I think the checks in this method ought to be more complex.
Ironically, if the code in this method worked as intended, the characters would have been cleansed away. But they weren't.
I believe manipulative code points could get through now because of this, but I can't be certain, as I'm looking at this purely from a data cleansing point of view.
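A more complex check along these lines could walk the string by code point instead of by char, so that a valid surrogate pair is evaluated as one supplementary character while a lone surrogate is dropped. The sketch assumes the goal is to keep exactly the characters allowed by the XML 1.0 Char production; XmlCharFilter is a hypothetical name.

```java
final class XmlCharFilter {
    private XmlCharFilter() {}

    /** Drops characters outside XML 1.0 Char, keeping well-formed surrogate pairs. */
    static String strip(String in) {
        StringBuilder out = new StringBuilder(in.length());
        int i = 0;
        while (i < in.length()) {
            // For a valid pair this is the supplementary code point;
            // for a lone surrogate it is the surrogate value itself.
            int cp = in.codePointAt(i);
            if (isXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    // XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    private static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }
}
```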
The markup below shows multiple ways to bypass the AntiSamy filter.
The section "NOT Sanitized by AntiSamy" shows a couple of JavaScript payloads that browsers execute (tested with Chrome 69) but that AntiSamy won't remove.
The section "Sanitized by AntiSamy" is sanitized correctly and is left here only as a recommendation for tests.
The section "Tricky Encoding with Ampersand Encoding" was created by taking one of the payloads under "NOT Sanitized by AntiSamy" and encoding all encountered ampersands, using different ways to encode an ampersand. One could repeat this process for any payload under "NOT Sanitized by AntiSamy".
<html>
<head>
<title>Test</title>
</head>
<body>
<h1>Tricky Encoding</h1>
<h2>NOT Sanitized by AntiSamy</h2>
<ol>
<li><a href="javascript:x=alert,x%281%29">X:x</a></li>
<li><a href="javascript:y=alert,y%281%29">X:y</a></li>
<li><a href="javascript:x=alert,x%281%29">X:x</a></li>
<li><a href="javascript:y=alert,y%281%29">X:y</a></li>
<li><a href="javascript:x=alert,x%281%29">X:x</a></li>
<li><a href="javascript:y=alert,y%281%29">X:y</a></li>
<li><a href="javascript:x=alert,x%281%29">X:x</a></li>
<li><a href="javascript:y=alert,y%281%29">X:y</a></li>
</ol>
<h2>Sanitized by AntiSamy</h2>
<ol>
<li><a href="javascript:alert(1)">X:</a></li>
<li><a href="javascript:alert(1)">X:</a></li>
<li><a href="javascript:alert(1)">X:</a></li>
<li><a href="javascript:alert(1)">X:</a></li>
<li><a href="javascript:alert(1)">X:</a></li>
</ol>
<h1>Tricky Encoding with Ampersand Encoding</h1>
<p>AntiSamy turns a harmless payload into XSS just by decoding the encoded ampersands in the href attribute</p>
<ol>
<li><a href="javascript&#x3Ax=alert,x%281%29">X&#x3A;x</a></li>
<li><a href="javascript&#x3Ax=alert,x%281%29">X&#x3A;x</a></li>
<li><a href="javascript&#x3Ax=alert,x%281%29">X&#x3A;x</a></li>
<li><a href="javascript&#x3Ax=alert,x%281%29">X&#x3A;x</a></li>
<li><a href="javascript&#x3Ax=alert,x%281%29">X&#x3A;x</a></li>
<li><a href="javascript&#x3Ax=alert,x%281%29">X&#x3A;x</a></li>
</ol>
<p><a href="javascript:x=alert,x%281%29">Original without ampersand encoding</a></p>
</body>
</html>
windows jdk-7 maven 3.8.1 antisamy 1.6.4
mvn test / mvn package
Success
......
[ERROR] Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.11 s <<< FAILURE! - in org.owasp.validator.html.test.PolicyTest
[ERROR] org.owasp.validator.html.test.PolicyTest.testGithubIssue79 Time elapsed: 0 s <<< ERROR!
java.lang.UnsupportedClassVersionError: org/owasp/antisamy/test/Dummy : Unsupported major.minor version 52.0
at org.owasp.validator.html.test.PolicyTest.testGithubIssue79(PolicyTest.java:341)
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] PolicyTest.testGithubIssue79:341 ? UnsupportedClassVersion org/owasp/antisamy/...
[INFO]
[ERROR] Tests run: 93, Failures: 0, Errors: 1, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] -----------------------------------------------------------
......
and
......
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-enforcer-plugin/3.0.0-M3/maven-enforcer-plugin-3.0.0-M3.pom
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.939 s
[INFO] Finished at: 2021-08-04T10:35:20+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Plugin org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M3 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.apache.maven.plugins:maven-enforcer-plugin:jar:3.0.0-M3: Could not transfer artifact org.apache.maven.plugins:maven-enforcer-plugin:pom:3.0.0-M3 from/to central (https://repo.maven.apache.org/maven2): transfer failed for https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-enforcer-plugin/3.0.0-M3/maven-enforcer-plugin-3.0.0-M3.pom: Received fatal alert: protocol_version -> [Help 1]
......
1. The test case testGithubIssue79 requires JDK 8.
2. Java 7 defaults to TLSv1.0, but fetching maven-enforcer-plugin-3.0.0-M3.pom from Maven Central requires TLSv1.2, hence the protocol_version alert above.
All of this means that the new version of AntiSamy no longer supports Java 7 (it needs at least Java 8), but I note that the readme states that AntiSamy 1.6.4 supports Java 7+.
As far as I know, the JDK now has only two LTS versions, Java 8 and Java 11, and the other versions are no longer maintained.
Why does AntiSamy still need to support Java 7?
Best Regards
I have a problem: when there is a "<" symbol in the content without a matching ">", everything to the right of it is removed.
Added directive: <directive name="onUnknownTag" value="encode"/>
Input: "hello <hi world, it is clean"
Output: "hello"
Is there any way to get output identical to the input in this case?
Expected output: "hello <hi world, it is clean"
AntiSamy is not working for this test case; I tried the latest version as well.
When there is a "/" character inside the tag, it fails.
My test case:
@Test
public void testXSSScript() throws PolicyException, ScanException {
String result = scanner.scan("<style/onload=alert(document.domain)>");
assertEquals("", result);
}
Logic called by the test case:
Assume the policy is loaded (I attached antisamy.xml). For some reason, no error is reported for <style/onload=alert(document.domain)> when "Collection errors = r.getErrorMessages();" executes.
public String scan(String untrustedUserInput) throws PolicyException, ScanException {
CleanResults r = webSecurityScanner.scan(untrustedUserInput, AntiSamy.SAX);
if(logger.isDebugEnabled()) {
logger.debug("Scanned request parameter in " + r.getScanTime() + "ms");
logger.debug("Value: " + untrustedUserInput);
logger.debug("Result: " + r.getCleanHTML());
logger.debug("Errors: " + r.getErrorMessages());
}
Collection<String> errors = r.getErrorMessages();
if(CollectionUtils.exists(errors, securityErrorPredicate)) {
logger.info("Returning cleansed input due to " + errors.size() + " security errors: " + errors);
logger.debug("Original: [" + untrustedUserInput + "]");
final String cleansedHTML = fixMangledTags(r.getCleanHTML());
logger.debug("Cleansed: [" + cleansedHTML + "]");
return cleansedHTML;
}
return untrustedUserInput;
}
The provided files "antisamy.xml" and "antisamy.xsd" don't match.
The XML file contains <dynamic-tag-attributes>, but the schema does not include such an element.
Input:
<p lang="en-GB">This paragraph is defined as British English.</p>
Output:
<p>This paragraph is defined as British English.</p>
Hi, @davewichers Is there a security problem here? Why not support IANA subtags?
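For reference, a language-tag check that accepts region subtags such as en-GB could look like the pattern below. It is a simplified subset of BCP 47 (language, optional script, optional region, optional variants) chosen for illustration; it does not cover extensions, private-use or grandfathered tags, and LangTagCheck is a hypothetical name rather than anything in the default policy.

```java
import java.util.regex.Pattern;

final class LangTagCheck {
    private LangTagCheck() {}

    // language (2-3 letters), optional script (4 letters), optional region
    // (2 letters or 3 digits), then zero or more variants.
    static final Pattern LANG_TAG = Pattern.compile(
        "[A-Za-z]{2,3}(-[A-Za-z]{4})?(-([A-Za-z]{2}|[0-9]{3}))?"
        + "(-([A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*");

    static boolean isValid(String value) {
        return LANG_TAG.matcher(value).matches();
    }
}
```

Because the expression contains no backslashes, the same text could be used as a regexp for the lang attribute in a policy file.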
Steps to reproduce the problem:
1. Take the HTML <p style="margin: 0.0001pt;" />
2. Call the .filterHTML API to filter the above HTML content.
Expected output:
With the regexp configuration in [2], the margin should not be removed for any decimal number.
Actual output: the margin is removed, with the warning in [1].
[1] AntiSamy warning: The p tag had a style attribute, "margin", that could not be allowed for security reasons.
[2] <regexp name="length" value="((-|\+)?0|(-|\+)?([0-9]+(\.[0-9]*)?)(em|ex|px|in|cm|mm|pt|pc))"/>
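The regexp in [2] does appear to accept the value itself, as the check below shows. One unverified hypothesis is that the value is normalized before it reaches the regexp, for example if the number were rendered with Float.toString, which switches to scientific notation below 10^-3 and produces a form the regexp rejects. CssLengthCheck is a hypothetical name, and nothing here confirms where AntiSamy actually formats the number.

```java
import java.util.regex.Pattern;

final class CssLengthCheck {
    private CssLengthCheck() {}

    // The "length" regexp quoted in [2], escaped as a Java string literal.
    static final Pattern LENGTH = Pattern.compile(
        "((-|\\+)?0|(-|\\+)?([0-9]+(\\.[0-9]*)?)(em|ex|px|in|cm|mm|pt|pc))");

    static boolean accepts(String value) {
        return LENGTH.matcher(value).matches();
    }
}
```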
AntiSamy uses a deprecated HTMLSerializer that does not understand newer HTML5 tags like <figure>. While this is a minor issue, it also does not understand newer HTML5 entities like &colon; or &lpar;. This leads to a security vulnerability where the following text does not get cleaned:
<a href="javascript:alert(1)">X</a>
AntiSamy fails to filter (identify) HTML/HTML5 elements with events (onerror, onload, etc.) when the tags are not closed with a ">" character.
Modern browsers (tested with Firefox and Chrome) autocomplete such tags and hence execute the JavaScript, leading to cross-site scripting (XSS).
Example payloads, for better understanding:
I have tested using both SAX and DOM and found that the payloads execute.
Karan Ramani
If I input the following html:
<ul>
<li> one </li>
<li> two</li>
<li> three
<ul>
<li>a</li>
<li>b</li>
</ul>
</li>
</ul>
The following output occurs:
<ul>
<li> one </li>
<li> two</li>
<li> three
<ul>
</ul>
</li>
<li>a</li>
<li>b</li>
</ul>
Basically it moves the nested list content to the parent list. This seems to be a bug since I can't find any configuration to fix this.
I'm reasonably sure that I found a bypass leading to XSS. Can I please get a contact to further discuss this issue?
When updating one of my modules to AntiSamy 1.6, I got a test failure because org.owasp.validator.html.Policy now has a hard dependency on log4j, whereas the pom.xml declares that it uses slf4j.
It appears this was introduced in the commit:
64416f1#diff-ea20191cc92e7360f2cc25757c0fd872902416e123220b6649b0dc7a0af5663b
It seems this is the ONLY logging in the project, and the logging dependency doesn't appear to be mentioned in the release notes at https://github.com/nahsra/antisamy/releases/tag/v1.6.0 either.
Since slf4j is declared, shouldn't this code use only the slf4j API, with slf4j-over-log4j used ONLY in tests or client applications?
Antisamy version: 1.6.1
Unable to generate a Policy instance from Policy.newInstance(InputStream)
when antisamy schema validation is disabled and the configuration file contains an invalid structure.
Results :
Tests in error:
testSystemProp(antisamy_tests.InvalidPolicyTest): java.io.IOException: Stream closed
testDirectConfig(antisamy_tests.InvalidPolicyTest): java.io.IOException: Stream closed
Tests run: 2, Failures: 0, Errors: 2, Skipped: 0
mvnCleanTest.txt
mvnCleanTest_verbose.txt
Pertinent Stacktrace from logs:
org.owasp.validator.html.PolicyException: java.io.IOException: Stream closed
at org.owasp.validator.html.Policy.getTopLevelElement(Policy.java:379)
at org.owasp.validator.html.Policy.getTopLevelElement(Policy.java:355)
at org.owasp.validator.html.Policy.getInstance(Policy.java:235)
Snippet from Policy.java
protected static Element getTopLevelElement(InputSource source, Callable<InputSource> getResetSource) throws PolicyException {
// Track whether an exception was ever thrown while processing policy file
Exception thrownException = null;
try {
return getDocumentElementFromSource(source, true); // >> First stream use
} catch (SAXException e) {
thrownException = e;
if (!validateSchema) {
try {
source = getResetSource.call();
Element theElement = getDocumentElementFromSource(source, false); // >> Second stream use
// We warn when the policy has an invalid schema, but schema validation is disabled.
logger.warn("Invalid policy file: " + e.getMessage());
return theElement;
} catch (Exception e2) {
throw new PolicyException(e2);
}
} else throw new PolicyException(e);
} catch (ParserConfigurationException | IOException e) {
thrownException = e;
throw new PolicyException(e); // >> EXCEPTION thrown here
} finally {
if (!validateSchema && (thrownException == null)) {
// We warn when the policy has a valid schema, but schema validation is disabled.
logger.warn("XML schema validation is disabled for a valid policy. Please reenable policy validation.");
}
}
}
I think the first stream use completes the read process (I've followed it all the way to the return clause of getDocumentElementFromSource) and then closes the stream. Parsing then fails and execution falls into the SAXException block. Because schema validation is disabled, we get to try again, but the source was never reset after the first attempt, so we cannot read from the closed stream. I can't see the actual stream in my debugger, so I can't be 100% certain, but this workflow appears to match the current test output.
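If the diagnosis is right, one fix is to buffer the policy bytes once so that the reset source can hand out a fresh stream on every call. ReplayableSource is a hypothetical helper; asResetSource() returns a Callable<InputSource> only to match the getResetSource parameter shape in the snippet above, not any confirmed AntiSamy change.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.Callable;
import org.xml.sax.InputSource;

/** Hypothetical helper: reads a policy stream once and replays it on demand. */
final class ReplayableSource {
    private final byte[] bytes;

    /** Reads the whole stream; the caller remains responsible for closing it. */
    ReplayableSource(InputStream in) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.bytes = buffer.toByteArray();
    }

    /** Every call returns a new, unread InputSource over the same bytes. */
    InputSource next() {
        return new InputSource(new ByteArrayInputStream(bytes));
    }

    /** Same thing, shaped like the getResetSource parameter. */
    Callable<InputSource> asResetSource() {
        return this::next;
    }
}
```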
In a web project we use the ESAPI validator to sanitize inputs. While most improper inputs are detected as expected, it fails to detect the input below as improper. I am using the esapi-2.1.0.1 and antisamy-1.5.3 jars.
<img src=x onerror=alert(1) alt=
Presumably, the browser closes the tag itself, triggering the alert function. Surprisingly, ESAPI does detect the following input as improper:
<img src=x onerror=alert(1) alt="text"
Below are some test cases and the analysis done:
Regex used for the alt attribute: [a-zA-Z0-9:-_.]+ (it requires at least one character)
Regex used for the onerror attribute: [0-9\s*,]* (it allows numbers and whitespace characters)
Observation:
The value of the alt attribute modifies the behavior of attribute validation (for itself in cases 1 and 2, and for other tags in cases 3 and 4).
Questions:
Can anyone please give me suggestions or comments on mitigating this issue, in case I am going wrong somewhere?
Thanks.
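One detail worth checking in the alt regex: inside a character class, :-_ is parsed as a range from ':' (0x3A) to '_' (0x5F), which also admits characters such as <, = and >. Whether this explains the observations above is not established; the sketch only demonstrates the range behavior in java.util.regex. AltRegexCheck is a hypothetical name.

```java
import java.util.regex.Pattern;

final class AltRegexCheck {
    private AltRegexCheck() {}

    // As quoted in the report: ":-_" is a range covering ':' ';' '<' '=' '>' '?' '@',
    // A-Z, '[', the backslash, ']', '^' and '_'.
    static final Pattern AS_WRITTEN = Pattern.compile("[a-zA-Z0-9:-_.]+");

    // Escaping the hyphen makes it a literal, so only the listed punctuation is allowed.
    static final Pattern HYPHEN_ESCAPED = Pattern.compile("[a-zA-Z0-9:\\-_.]+");
}
```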
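The validation message
"Der h3 Tag leer war, und daher konnten wir nicht verarbeiten. Der Rest der Nachricht intakt ist, und ihre Entfernung sollte keine Nebenwirkungen."
(roughly: "The h3 tag was empty, and therefore we could not process it. The rest of the message is intact, and its removal should have no side effects.")
is not correct German.
This should be fixed in /src/main/resources/AntiSamy_de_DE.properties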
This was given CVE-2016-10006, and was reported by Vivek Krishna, Zoho Corporation.
Please add a license file to your project
This would address a vulnerability in a downstream dependency
https://app.snyk.io/vuln/SNYK-JAVA-ORGAPACHEXMLGRAPHICS-1079038
I am using antisamy 1.5.7.
I saw an issue when the input was
firstname,lastname<[email protected]> or firstname,lastname<[email protected] testing>
The result after the AntiSamy scan is the same for both of the above cases:
firstname,lastname<name>
I have the following directive in the policy file:
<directive name="onUnknownTag" value="encode"/>
Is there a place in the policy file I can update to encode @ when it is within <>?
In every policy XML file, I've found that the tag "col" is defined twice.
The first appearance is:
<tag name="col" action="validate"/>
And the second is:
<tag name="col" action="validate">
<attribute name="align" />
...
<attribute name="width" />
</tag>
The same thing happens with the property "clip", but this one is an exact copy.
This is not a real issue in Java, as the tests would fail when parsing the policy if it were. However, when implementing this in .NET, for example, it may fail if the implementation assumes each tag appears only once.
My proposal is to first verify whether it is there on purpose. If it isn't, remove it. In my opinion, it makes no sense to have multiple definitions of anything here; it is just confusing for anyone who doesn't know about it, since in Java this doesn't fail.
If it is OK to remove the duplicated tags, I can remove them and make the pull request (of course, tests don't fail when removing the tags from the default policy).
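Duplicates like these could be detected automatically, for example in a unit test over each bundled policy. The sketch uses the JDK's DOM parser and reports names that occur more than once for a given element (tag for tag rules, property for CSS rules); DuplicateRuleFinder is a hypothetical name, and the policy is passed as a string only for self-containment.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DuplicateRuleFinder {
    private DuplicateRuleFinder() {}

    /** Names whose name attribute appears more than once on elements called elementName. */
    static Set<String> duplicates(String policyXml, String elementName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(policyXml)));
            NodeList nodes = doc.getElementsByTagName(elementName);
            Set<String> seen = new HashSet<>();
            Set<String> dupes = new LinkedHashSet<>(); // keeps first-seen order
            for (int i = 0; i < nodes.getLength(); i++) {
                String name = ((Element) nodes.item(i)).getAttribute("name");
                if (!seen.add(name)) {
                    dupes.add(name);
                }
            }
            return dupes;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse policy", e);
        }
    }
}
```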
the default onsiteURL regex:
<regexp name="onsiteURL" value="^(?![\p{L}\p{N}\\\.\#@\$%\+&;\-_~,\?=/!]*(&colon))[\p{L}\p{N}\\\.\#@\$%\+&;\-_~,\?=/!]*"/>
Usually, rich text requires the href attribute, with a validation rule like this:
<attribute name="href">
<regexp-list>
<regexp name="onsiteURL" />
</regexp-list>
</attribute>
So if developers trust the onsiteURL regex, they will not do any other domain validation, but the onsiteURL regex can be bypassed with a URL starting with '//', such as '//evali.com?params'. In this case, phishing attacks may occur. In addition, information leakage may occur through techniques such as dangling markup attacks.
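The bypass can be reproduced with java.util.regex, assuming the whole attribute value must match the expression (as with Matcher#matches()). The second pattern adds one possible hardening, a negative lookahead rejecting two leading slashes or backslashes in any combination, which browsers may treat as a protocol-relative URL. It is a sketch, not a vetted replacement for the default regexp, and OnsiteUrlCheck is a hypothetical name.

```java
import java.util.regex.Pattern;

final class OnsiteUrlCheck {
    private OnsiteUrlCheck() {}

    // Character class from the default onsiteURL regexp, escaped as a Java string literal.
    static final String CHARS = "[\\p{L}\\p{N}\\\\\\.\\#@\\$%\\+&;\\-_~,\\?=/!]";

    // The default expression as quoted above.
    static final Pattern DEFAULT = Pattern.compile(
        "^(?!" + CHARS + "*(&colon))" + CHARS + "*");

    // Hypothetical hardening: also reject a leading "//", "\\", "/\" or "\/".
    static final Pattern NO_PROTOCOL_RELATIVE = Pattern.compile(
        "^(?![/\\\\]{2})(?!" + CHARS + "*(&colon))" + CHARS + "*");
}
```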
String input = "<div data-title=\"Pocahontas\" >Just Around the Riverbend</div>";
CleanResults results = new AntiSamy().scan(input, Policy.getInstance(), AntiSamy.SAX);
System.out.println("result: " + results.getCleanHTML());
// result: <div data-title="Pocahontas">Just Around the Riverbend</div>
result: <div>Just Around the Riverbend</div>
result: <div data-title="Pocahontas">Just Around the Riverbend</div>
AntiSamy.DOM is the default mode, so it should support dynamic attributes too.
A change late last year to prevent loading of remote URLs was implemented by checking that the URL uses only the file: scheme. This breaks a very common use case: bundling the policy file inside a Java archive of some sort (jar/war/ear). AFAICT this is no more of a security risk than loading a file from a file: URL, and disabling this ability significantly increases deployment complexity in some contexts.
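A check that keeps the restriction but also allows archive-bundled policies might look like this. It accepts file: URLs and jar: URLs whose inner archive URL is itself a file: URL. PolicyUrlCheck is a hypothetical name, and the assumption that war/ear deployments expose resources as jar: URLs does not hold for every container (some application servers use their own URL schemes).

```java
import java.net.MalformedURLException;
import java.net.URL;

final class PolicyUrlCheck {
    private PolicyUrlCheck() {}

    /** True for file: URLs and for jar: URLs whose archive is a file: URL. */
    static boolean isLocal(URL url) {
        String protocol = url.getProtocol();
        if ("file".equalsIgnoreCase(protocol)) {
            return true;
        }
        if ("jar".equalsIgnoreCase(protocol)) {
            // For jar: URLs, getPath() is "<archive URL>!/<entry>",
            // e.g. "file:/app/lib/policies.jar!/antisamy.xml".
            return url.getPath().regionMatches(true, 0, "file:", 0, 5);
        }
        return false;
    }

    /** Example convenience: parses a URL string without a checked exception. */
    @SuppressWarnings("deprecation") // new URL(String) is deprecated on recent JDKs
    static URL url(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }
}
```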
The front end passes a parameter through the URL, and the value of the parameter is <>. It is processed with:
Policy policy = Policy.getInstance("/antisamy-slashdot-1.4.4.xml");
final CleanResults cr = antiSamy.scan(value, policy);
String str = cr.getCleanHTML();
The resulting str contains <> in escaped form. There was no such problem before 1.5; why does it happen after 1.5?
Set 'embedStyleSheets' in the configuration:
<directive name="embedStyleSheets" value="true"/>
Input:
<!DOCTYPE html>
<html>
<head>
<style type='text/css'>
@import url(https://unpkg.com/element-ui/lib/theme-chalk/index.css)
h1 {font: 15pt "Arial"; color: blue;}
p {font: 10pt "Arial"; color: black;}
</style>
</head>
<body>
<div>
<h1>Title</h1>
<p>content</p>
</div>
</body>
</html>
Out:
<html>
<head>
<style type="text/css"><![CDATA[/* */]]></style></head>
<body>
<div>
<h1>Title</h1>
<p>content</p></div></body></html>
Result: All embedded styles are deleted.
Hi, @davewichers @spassarop Is this check deprecated?