straight-shoota / sanitize

Crystal library for transforming HTML/XML trees to sanitize HTML from untrusted sources

License: Apache License 2.0

sanitize

sanitize is a Crystal library for transforming HTML/XML trees. It's primarily used to sanitize HTML from untrusted sources in order to prevent XSS attacks and other adversities.

It builds on stdlib's XML module to parse HTML/XML. Because that module is based on libxml2, the parser is robust and turns malformed and malicious input into valid and safe markup.

Installation

  1. Add the dependency to your shard.yml:

    dependencies:
      sanitize:
        github: straight-shoota/sanitize
  2. Run shards install

Sanitization Features

The Sanitize::Policy::HTMLSanitizer policy applies the following sanitization steps. Except for the first one (which is essential to the entire process), all can be disabled or configured.

  • Turns malformed and malicious HTML into valid and safe markup.
  • Strips HTML elements and attributes not included in the safe list.
  • Sanitizes URL attributes (like href or src) with a customizable sanitization policy.
  • Adds rel="nofollow" to all links and rel="noopener" to links with target.
  • Validates values of accepted attributes align, width and height.
  • Filters class attributes based on a whitelist (by default all classes are rejected).

Usage

Transformation is based on rules defined by Sanitize::Policy implementations.

The recommended standard policy for HTML sanitization is Sanitize::Policy::HTMLSanitizer.common which represents good defaults for most use cases. It sanitizes user input against a known safe list of accepted elements and their attributes.

require "sanitize"

sanitizer = Sanitize::Policy::HTMLSanitizer.common
sanitizer.process(%(<a href="javascript:alert('foo')">foo</a>)) # => %(foo)
sanitizer.process(%(<p><a href="foo">foo</a></p>)) # => %(<p><a href="foo" rel="nofollow">foo</a></p>)
sanitizer.process(%(<img src="foo.jpg">)) # => %(<img src="foo.jpg">)
sanitizer.process(%(<table><tr><td>foo</td><td>bar</td></tr></table>)) # => %(<table><tr><td>foo</td><td>bar</td></tr></table>)

Sanitization should always run after any other processing (for example rendering Markdown) and is a must when including HTML from untrusted sources into a web page.

With Markd

A typical format for user-generated content is Markdown. Even though it has only a very limited feature set compared to HTML, it can still produce potentially harmful HTML, and it is usually possible to embed raw HTML directly. So sanitization is necessary.

The most common Markdown renderer is markd, so here is an example of how to use it with sanitize:

require "markd"
require "sanitize"

sanitizer = Sanitize::Policy::HTMLSanitizer.common
# Allow classes with `language-` prefix which are used for syntax highlighting.
sanitizer.valid_classes << /language-.+/

markdown = <<-MD
  Sanitization with [https://shardbox.org/shards/sanitize](sanitize) is not that
  **difficult**.
  ```cr
  puts "Hello World!"
  ```
  <p><a href="javascript:alert("XSS attack!")">Hello world!</a></p>
  MD

html = Markd.to_html(markdown)
sanitized = sanitizer.process(html)
puts sanitized

The result:

<p>Sanitization with <a href="sanitize" rel="nofollow">https://shardbox.org/shards/sanitize</a> is not that
<strong>difficult</strong>.</p>
<pre><code class="language-cr">puts &quot;Hello World!&quot;
</code></pre>
<p>Hello world!</p>

Limitations

Sanitizing CSS is not supported. Thus style attributes can't be accepted in a safe way. CSS sanitization features may be added when a CSS parsing library is available.

Security

If you want to privately disclose security issues, please contact straightshoota on Keybase or [email protected] (PGP: DF2D C9E9 FFB9 6AE0 2070 D5BC F0F3 4963 7AC5 087A).

Contributing

  1. Fork it (https://github.com/straight-shoota/sanitize/fork)
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Contributors

  • straight-shoota

sanitize's Issues

Deprecation warning in Crystal 1.4

In src/policy/html_sanitizer.cr:269:22

 269 | uri.path = URI.encode(URI.decode(path))
                      ^-----
Warning: Deprecated URI.encode. Use `.encode_path` instead.

In /usr/local/Cellar/crystal/1.4.1/src/uri/encoding.cr:123:25

 123 | String.build { |io| encode(string, io, space_to_plus: space_to_plus) }
                           ^-----
Warning: Deprecated URI.encode:space_to_plus. Use `.encode_path` instead.

A total of 2 warnings were found.

I took a quick look into this, but it's not as simple as swapping in encode_path: the behaviors aren't 100% the same, and several tests fail:

crystal spec spec/html_sanitizer/url_spec.cr:5 # Sanitize::Policy::HTMLSanitizer escapes URL attribute
crystal spec spec/html_sanitizer/html_sanitizer_spec.cr:18 # Sanitize::Policy::HTMLSanitizer escapes URL attribute
crystal spec spec/support/hrx.cr:57 # Sanitize::Policy::HTMLSanitizer protocol_javascript.hrx simple, spaces before (common)
crystal spec spec/support/hrx.cr:57 # Sanitize::Policy::HTMLSanitizer protocol_javascript.hrx simple, spaces before and after (common)
crystal spec spec/support/hrx.cr:57 # Sanitize::Policy::HTMLSanitizer protocol_javascript.hrx preceding colon (common)
crystal spec spec/support/hrx.cr:57 # Sanitize::Policy::HTMLSanitizer protocol_javascript.hrx null char (common)
crystal spec spec/support/hrx.cr:57 # Sanitize::Policy::HTMLSanitizer protocol_javascript.hrx invalid URL char (common)
crystal spec spec/support/hrx.cr:57 # Sanitize::Policy::HTMLSanitizer xss.hrx . (common)
# example failure

Expected: "<img src=\"java%5Cscript:alert(%22XSS%22)\"/>"
     got: "<img src=\"java%5Cscript%3Aalert%28%22XSS%22%29\"/>"

I'm not sure when Crystal will actually remove the deprecated method, but it seemed worth noting for now.
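The difference in the example failure comes down to which characters are left unescaped. The following is a minimal sketch of that difference, not a proposed fix. The helper encode_keeping_reserved and its two constants are hypothetical names made up for illustration. It assumes the deprecated URI.encode left both RFC 3986 reserved and unreserved characters untouched, which is what the expected test output suggests.

```crystal
require "uri"

# Hypothetical helper for illustration only; it is not part of sanitize or
# of Crystal's stdlib. It percent-encodes every byte except RFC 3986
# unreserved and reserved characters.
UNRESERVED_PUNCTUATION = "-._~"
RESERVED               = ":/?#[]@!$&'()*+,;="

def encode_keeping_reserved(string : String) : String
  String.build do |io|
    string.each_byte do |byte|
      char = byte.unsafe_chr
      if char.ascii_alphanumeric? || UNRESERVED_PUNCTUATION.includes?(char) || RESERVED.includes?(char)
        # Safe character: copy it through unchanged.
        io << char
      else
        # Everything else becomes %XX with uppercase hex digits.
        io << '%' << byte.to_s(16, upcase: true).rjust(2, '0')
      end
    end
  end
end

input = "java\\script:alert(\"XSS\")"

# Reserved characters such as ':' and '(' survive, only '\' and '"' are escaped.
puts encode_keeping_reserved(input) # => java%5Cscript:alert(%22XSS%22)

# encode_path also escapes ':', '(' and ')'; compare the "got" line above.
puts URI.encode_path(input)
```

A helper like this would only reproduce the old escaping. Whether the specs or the escaping should change is a separate decision for the maintainer.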
