
html-pipeline's Introduction

HTML-Pipeline

HTML processing filters and utilities. This module is a small framework for defining CSS-based content filters and applying them to user provided content.

Although this project was started at GitHub, they no longer use it. This gem must be considered standalone and independent from GitHub.

Installation

Add this line to your application's Gemfile:

gem 'html-pipeline'

And then execute:

$ bundle

Or install it yourself:

$ gem install html-pipeline

Usage

This library provides a handful of chainable HTML filters to transform user content into HTML markup. Each filter does some work, and then hands off the results to the next filter. A pipeline has several kinds of filters available to use:

  • Multiple TextFilters, which operate on a UTF-8 string
  • A ConvertFilter filter, which turns text into HTML (e.g., Commonmark/Asciidoc -> HTML)
  • A SanitizationFilter, which removes dangerous/unwanted HTML elements and attributes
  • Multiple NodeFilters, which operate on a UTF-8 HTML document

You can assemble each sequence into a single pipeline, or choose to call each filter individually.

As an example, suppose we want to transform the following Commonmark source text into HTML:

Hey there, @gjtorikian

With the content, we also want to:

  • change every instance of Hey to Hello
  • strip undesired HTML
  • linkify @mention

We can construct a pipeline to do all that like this:

require 'html_pipeline'

class HelloJohnnyFilter < HTMLPipeline::TextFilter
  def call
    text.gsub("Hey", "Hello")
  end
end

pipeline = HTMLPipeline.new(
  text_filters: [HelloJohnnyFilter.new],
  convert_filter: HTMLPipeline::ConvertFilter::MarkdownFilter.new,
    # note: next line is not needed as sanitization occurs by default;
    # see below for more info
  sanitization_config: HTMLPipeline::SanitizationFilter::DEFAULT_CONFIG,
  node_filters: [HTMLPipeline::NodeFilter::MentionFilter.new]
)
pipeline.call(user_supplied_text) # recommended: can call pipeline over and over

Filters can be custom ones you create (like HelloJohnnyFilter), and HTMLPipeline additionally provides several helpful ones (detailed below). If you only need a single filter, you can call one individually, too:

filter = HTMLPipeline::ConvertFilter::MarkdownFilter.new
filter.call(text)

Filters combine into a sequential pipeline, and each filter hands its output to the next filter's input. Text filters are processed first, then the convert filter, sanitization filter, and finally, the node filters.
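
The ordering described above can be sketched in plain Ruby. This is only an illustration of the hand-off between stages, not how HTMLPipeline is implemented: `run_pipeline` and every lambda below are hypothetical stand-ins, and the "convert" and "sanitize" steps are deliberately trivial.

```ruby
# Plain-Ruby sketch of the stage ordering. None of these objects are
# html-pipeline classes; they are hypothetical stand-ins.
text_filters   = [->(text) { text.gsub("Hey", "Hello") }]
convert_filter = ->(text) { "<p>#{text}</p>" }             # stand-in for Markdown -> HTML
sanitizer      = ->(html) { html.gsub(/<script.*?<\/script>/m, "") } # toy stand-in only
node_filters   = [
  ->(html) { html.gsub(/@(\w+)/) { %(<a href="/#{$1}">@#{$1}</a>) } }
]

def run_pipeline(input, text_filters, convert_filter, sanitizer, node_filters)
  # 1. text filters run first, each receiving the previous filter's output
  text = text_filters.reduce(input) { |acc, filter| filter.call(acc) }
  # 2. the single convert filter turns text into HTML
  html = convert_filter.call(text)
  # 3. sanitization runs on the converted HTML
  html = sanitizer.call(html) if sanitizer
  # 4. node filters run last, again chained one after another
  node_filters.reduce(html) { |acc, filter| filter.call(acc) }
end

output = run_pipeline("Hey there, @gjtorikian",
                      text_filters, convert_filter, sanitizer, node_filters)
puts output
```

Because each stage only sees the previous stage's output, a text filter can never observe HTML, and a node filter never sees unsanitized markup.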

Some filters take optional context and/or result hash(es). These are used to pass around arguments and metadata between filters in a pipeline. For example, if you want to disable footnotes in the MarkdownFilter, you can pass an option in the context hash:

context = { markdown: { extensions: { footnotes: false } } }
filter = HTMLPipeline::ConvertFilter::MarkdownFilter.new(context: context)
filter.call("Hi **world**!")

Alternatively, you can construct a pipeline, and pass in a context during the call:

pipeline = HTMLPipeline.new(
  convert_filter: HTMLPipeline::ConvertFilter::MarkdownFilter.new,
  node_filters: [HTMLPipeline::NodeFilter::MentionFilter.new]
)
pipeline.call(user_supplied_text, context: { markdown: { extensions: { footnotes: false } } })

Please refer to the documentation for each filter to understand what configuration options are available.

More Examples

Different pipelines can be defined for different parts of an app. Here are a few paraphrased snippets to get you started:

# The context hash is how you pass options between different filters.
# See individual filter source for explanation of options.
context = {
  asset_root: "http://your-domain.com/where/your/images/live/icons",
  base_url: "http://your-domain.com"
}

# Pipeline used for user provided content on the web
MarkdownPipeline = HTMLPipeline.new(
  text_filters: [HTMLPipeline::TextFilter::ImageFilter.new],
  convert_filter: HTMLPipeline::ConvertFilter::MarkdownFilter.new,
  node_filters: [
    HTMLPipeline::NodeFilter::HttpsFilter.new,
    HTMLPipeline::NodeFilter::MentionFilter.new
  ],
  context: context
)

# Pipelines aren't limited to the web. You can use them for email
# processing also.
HtmlEmailPipeline = HTMLPipeline.new(
  text_filters: [
    HTMLPipeline::TextFilter::PlainTextInputFilter.new,
    HTMLPipeline::TextFilter::ImageFilter.new
  ]
)

Filters

TextFilters

TextFilters must define a method named call which is called on the text. @text, @config, and @result are available to use, and any changes made to these ivars are passed on to the next filter.

  • ImageFilter - converts image url into <img> tag
  • PlainTextInputFilter - HTML-escapes text and wraps the result in a <div>

ConvertFilter

The ConvertFilter takes text and turns it into HTML. @text, @config, and @result are available to use. A ConvertFilter must define a method named call, taking one argument, text. call must return a string representing the new HTML document.

  • MarkdownFilter - creates HTML from text using Commonmarker

Sanitization

Because the web can be a scary place, HTML is automatically sanitized after the ConvertFilter runs and before the NodeFilters are processed. This is to prevent malicious or unexpected input from entering the pipeline.

The sanitization process takes a hash configuration of settings. See the Selma documentation for more information on how to configure these settings.

A default sanitization config is provided by this library (HTMLPipeline::SanitizationFilter::DEFAULT_CONFIG). A sample custom sanitization allowlist might look like this:

ALLOWLIST = {
  elements: ["p", "pre", "code"]
}

pipeline = HTMLPipeline.new \
  text_filters: [
    HTMLPipeline::TextFilter::ImageFilter.new,
  ],
  convert_filter: HTMLPipeline::ConvertFilter::MarkdownFilter.new,
  sanitization_config: ALLOWLIST

result = pipeline.call <<-CODE
This is *great*:

    some_code(:first)

CODE
result[:output].to_s

This would print:

<p>This is great:</p>
<pre><code>some_code(:first)
</code></pre>
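
To see why the emphasis around "great" is missing from the output above, here is a toy sketch of what an element allowlist does: tags not on the list are dropped while their text is kept. This is not how the library sanitizes (that work is done by Selma), and regex-based tag stripping like this is not safe for real user input; `strip_disallowed_tags` is a hypothetical helper for illustration only.

```ruby
# Toy illustration only: regex-based tag stripping is NOT safe sanitization.
# `strip_disallowed_tags` is a hypothetical helper, not a library method.
ALLOWED_ELEMENTS = ["p", "pre", "code"].freeze

def strip_disallowed_tags(html, allowed)
  # Match opening and closing tags, capturing the element name.
  html.gsub(%r{</?([A-Za-z][A-Za-z0-9]*)[^>]*>}) do |tag|
    allowed.include?(Regexp.last_match(1).downcase) ? tag : ""
  end
end

converted = "<p>This is <em>great</em>:</p>"
stripped  = strip_disallowed_tags(converted, ALLOWED_ELEMENTS)
puts stripped
```

The `<em>` element is not in the allowlist, so only its text content survives.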

Sanitization can be disabled if and only if nil is explicitly passed as the config:

pipeline = HTMLPipeline.new \
  text_filters: [
    HTMLPipeline::TextFilter::ImageFilter.new,
  ],
  convert_filter: HTMLPipeline::ConvertFilter::MarkdownFilter.new,
  sanitization_config: nil

For more examples of customizing the sanitization process to include the tags you want, check out the tests and the FAQ.

NodeFilters

NodeFilters can operate either on HTML elements or text nodes using CSS selectors. Each NodeFilter must define a method named selector which provides an instance of Selma::Selector. If elements are being manipulated, handle_element must be defined, taking one argument, element; if text nodes are being manipulated, handle_text_chunk must be defined, taking one argument, text_chunk. @config and @result are available to use, and any changes made to these ivars are passed on to the next filter.

NodeFilter also has an optional method, after_initialize, which is run after the filter initializes. This can be useful in setting up a custom state for result to take advantage of.

Here's an example NodeFilter that adds a base url to images that are root relative:

require 'uri'

class RootRelativeFilter < HTMLPipeline::NodeFilter

  SELECTOR = Selma::Selector.new(match_element: "img")

  def selector
    SELECTOR
  end

  def handle_element(img)
    return if img['src'].nil?
    src = img['src'].strip
    if src.start_with? '/'
      img["src"] = URI.join(context[:base_url], src).to_s
    end
  end
end

For more information on how to write effective NodeFilters, refer to the provided filters and to the underlying library, Selma.

  • AbsoluteSourceFilter: replace relative image urls with fully qualified versions
  • AssetProxyFilter: replace image links with an encoded link to an asset server
  • EmojiFilter: converts :<emoji>: to emoji
    • (Note: the included MarkdownFilter will already convert emoji)
  • HttpsFilter: replaces http urls with https versions
  • ImageMaxWidthFilter: link to full size image for large images
  • MentionFilter: replace @user mentions with links
  • SanitizationFilter: sanitizes user markup against an allowlist
  • SyntaxHighlightFilter: applies syntax highlighting to pre blocks
    • (Note: the included MarkdownFilter will already apply highlighting)
  • TableOfContentsFilter: anchor headings with name attributes and generate Table of Contents html unordered list linking headings
  • TeamMentionFilter: replace @org/team mentions with links

Dependencies

Since filters can be customized to your heart's content, gem dependencies are not bundled; this project doesn't know which of the default filters you might use, and as such, you must bundle each filter's gem dependencies yourself.

For example, SyntaxHighlightFilter uses rouge to detect and highlight languages; to use the SyntaxHighlightFilter, you must add the following to your Gemfile:

gem "rouge"

Note: See the Gemfile :test group for any version requirements.

When developing a custom filter, call HTMLPipeline.require_dependency at the start to ensure that the local machine has the necessary dependency. You can also use HTMLPipeline.require_dependencies to provide a list of dependencies to check.
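
As a rough idea of what such a dependency check involves, here is a minimal plain-Ruby sketch. `require_optional_gem` and `MissingDependencyError` are hypothetical names used for illustration; the library's own HTMLPipeline.require_dependency may take different arguments and raise a different error.

```ruby
# Hypothetical helper: html-pipeline's own require_dependency may differ.
class MissingDependencyError < StandardError; end

def require_optional_gem(name, requirer)
  require name
  true
rescue LoadError
  # Turn the generic LoadError into a message that names the filter
  # and tells the user what to add to their Gemfile.
  raise MissingDependencyError,
        "Missing dependency '#{name}' for #{requirer}. " \
        "Add `gem \"#{name}\"` to your Gemfile."
end

# A library that ships with Ruby loads without complaint.
require_optional_gem("json", "ExampleFilter")

# A missing gem produces a descriptive error instead of a bare LoadError.
begin
  require_optional_gem("definitely_not_a_real_gem_name", "ExampleFilter")
rescue MissingDependencyError => e
  message = e.message
  puts message
end
```

Failing at load time with the filter's name in the message is friendlier than a NameError surfacing later, in the middle of a pipeline call.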

On a similar note, you must manually require whichever filters you desire:

require "html_pipeline" # must be included
require "html_pipeline/convert_filter/markdown_filter" # included because you want to use this filter
require "html_pipeline/node_filter/mention_filter" # included because you want to use this filter

Documentation

Full reference documentation can be found here.

Instrumenting

Filters and Pipelines can be set up to be instrumented when called. The pipeline must be set up with an ActiveSupport::Notifications compatible service object and a name. New pipeline objects will default to the HTMLPipeline.default_instrumentation_service object.

# the AS::Notifications-compatible service object
service = ActiveSupport::Notifications

# instrument a specific pipeline
pipeline = HTMLPipeline.new [MarkdownFilter], context
pipeline.setup_instrumentation "MarkdownPipeline", service

# or set default instrumentation service for all new pipelines
HTMLPipeline.default_instrumentation_service = service
pipeline = HTMLPipeline.new [MarkdownFilter], context
pipeline.setup_instrumentation "MarkdownPipeline"

Filters are instrumented when they are run through the pipeline. A call_filter.html_pipeline event is published once any filter finishes; call_text_filters and call_node_filters are published when all of the text and node filters are finished, respectively. The payload should include the filter name. Each filter will trigger its own instrumentation call.

service.subscribe "call_filter.html_pipeline" do |event, start, ending, transaction_id, payload|
  payload[:pipeline] #=> "MarkdownPipeline", set with `setup_instrumentation`
  payload[:filter] #=> "MarkdownFilter"
  payload[:context] #=> context Hash
  payload[:result] #=> instance of result class
  payload[:result][:output] #=> output HTML String
end

The full pipeline is also instrumented:

service.subscribe "call_text_filters.html_pipeline" do |event, start, ending, transaction_id, payload|
  payload[:pipeline] #=> "MarkdownPipeline", set with `setup_instrumentation`
  payload[:filters] #=> ["MarkdownFilter"]
  payload[:doc] #=> HTML String
  payload[:context] #=> context Hash
  payload[:result] #=> instance of result class
  payload[:result][:output] #=> output HTML String
end
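
If you don't want to pull in ActiveSupport, a "compatible service object" can be quite small. The sketch below is an assumption-laden illustration: it assumes the pipeline only needs an `instrument(name, payload) { ... }` method, and it adds a `subscribe` method shaped like the examples above. `RecordingInstrumenter` is a hypothetical class, and unlike ActiveSupport::Notifications it does not publish an event when the block raises.

```ruby
require "securerandom"

# Hypothetical minimal stand-in for ActiveSupport::Notifications.
# Assumes callers only use `instrument` with a block and `subscribe`.
class RecordingInstrumenter
  def initialize
    @subscribers = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(name, &block)
    @subscribers[name] << block
  end

  def instrument(name, payload = {})
    started  = Time.now
    result   = block_given? ? yield(payload) : nil
    finished = Time.now
    id       = SecureRandom.hex(10)
    # Same argument order as the subscribe examples above.
    @subscribers[name].each do |subscriber|
      subscriber.call(name, started, finished, id, payload)
    end
    result
  end
end

service = RecordingInstrumenter.new
events  = []
service.subscribe("call_filter.html_pipeline") do |event, _start, _ending, _id, payload|
  events << [event, payload[:filter]]
end

# Simulate what an instrumented filter call might look like.
result = service.instrument("call_filter.html_pipeline", { filter: "MarkdownFilter" }) do |payload|
  payload[:result] = { output: "<p>Hi</p>" }
  payload[:result][:output]
end
```

An object like this can also be handy in tests, where you only want to assert which events were published.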

Third Party Extensions

If you have an idea for a filter, propose it as an issue first. This allows us to discuss whether the filter is a common enough use case to belong in this gem, or should be built as an external gem.

Here are some extensions people have built:

FAQ

1. Why doesn't my pipeline work when there's no root element in the document?

To make a pipeline work on a plain text document, put the PlainTextInputFilter at the end of your text_filters config. This will wrap the content in a div so the filters have a root element to work with. If you're passing in an HTML fragment, but it doesn't have a root element, you can wrap the content in a div yourself.
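
For reference, the escaping-and-wrapping step described above amounts to something like the following. This is a sketch based only on the description of PlainTextInputFilter in this README ("HTML-escapes text and wraps the result in a div"); `wrap_plain_text` is a hypothetical helper, not the filter's actual code.

```ruby
require "cgi"

# Hypothetical helper mirroring the described behaviour:
# escape the text, then give it a single root element.
def wrap_plain_text(text)
  "<div>#{CGI.escapeHTML(text)}</div>"
end

wrapped = wrap_plain_text("1 < 2 & @gjtorikian")
puts wrapped
```

Escaping first matters: without it, a literal `<` in the plain text would be parsed as the start of a tag by later filters.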

2. How do I customize an allowlist for SanitizationFilters?

HTMLPipeline::SanitizationFilter::DEFAULT_CONFIG is the default allowlist used if no sanitization_config argument is given. The default is a good starting template to which you can add elements. You can either modify the constant's value, or define your own config and pass that in, such as:

config = HTMLPipeline::SanitizationFilter::DEFAULT_CONFIG.deep_dup
config[:elements] << "iframe" # sure, whatever you want
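
Note that deep_dup comes from ActiveSupport. If ActiveSupport is not loaded in your app, one way to get an independent copy is a Marshal round trip, sketched below. `EXAMPLE_DEFAULT_CONFIG` is a hypothetical stand-in for the library's default (the real hash has more keys), and `deep_copy` is a hypothetical helper.

```ruby
# Hypothetical stand-in for the library default; the real config has more keys.
EXAMPLE_DEFAULT_CONFIG = {
  elements: ["p", "pre", "code"].freeze
}.freeze

# Deep copy without ActiveSupport. Works for plain data (hashes, arrays,
# strings, symbols); objects such as procs cannot be marshalled.
def deep_copy(config)
  Marshal.load(Marshal.dump(config))
end

config = deep_copy(EXAMPLE_DEFAULT_CONFIG)
config[:elements] << "iframe" # the copy is mutable; the default is untouched

puts config[:elements].inspect
puts EXAMPLE_DEFAULT_CONFIG[:elements].inspect
```

Copying before editing keeps the change local to one pipeline instead of altering the default for every pipeline in the process.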

Contributors

Thanks to all of these contributors.

This project is a member of the OSS Manifesto.

html-pipeline's People

Contributors

actions-user, aroben, atmos, benubois, bkeepers, bradly, defunkt, dependabot[bot], gjtorikian, haileys, jakedouglas, jch, jonrohan, josh, juanitofatas, mastahyeti, mislav, mtodd, oreoshake, rsanheim, rtomayko, simeonwillbanks, sr, st0012, technoweenie, timdiggins, tmm1, tricknotes, vmg, ymendel


html-pipeline's Issues

AutolinkFilter link_attr doesn't seem to work

Hi,
In my code I have:

context = {
  asset_root: 'https://a248.e.akamai.net/assets.github.com/images/icons/',
  link_attr: 'target="_blank"',
  gfm: true
}

pipeline = HTML::Pipeline.new [
  HTML::Pipeline::MarkdownFilter,
  HTML::Pipeline::SanitizationFilter,
  HTML::Pipeline::EmojiFilter,
  HTML::Pipeline::AutolinkFilter
], context

pipeline.call(text)[:output].to_s

and
%p= raw format(answer.body) to invoke it.

The link however doesn't add the attribute target="_blank"
Any idea?

Thanks,
Roy

History

It'd be cool to retain the original history when extracting libraries like this. Would you guys mind if I push a branch with the full history from the github/github repo? We'd need to rebase everything that's happened here on top and force push unfortunately. Sorry, I would have chimed in here earlier but had no idea this was going on.

OSX HTML::Pipeline::MarkdownFilter Fails on Right Double Quotation Mark around email address

When using the HTML::Pipeline::MarkdownFilter on a string containing a "Right Double Quotation Mark" (U+201D) around an email address the output html will include an invalid byte sequence when trying to autolink it as a mailto:

I'm only having this issue on OSX. I'm running 10.10.2.

To reproduce:

renderer = HTML::Pipeline.new([HTML::Pipeline::MarkdownFilter]).freeze
renderer.to_html("This is  an “[email protected]” example").split

This is really a bug within github-markdown, but I'm submitting it here as github-markdown doesn't seem to have a Github repository. I've also tried using Redcloth and it fails as well.

ruby 2.1.5p273 (2014-11-13 revision 48405) [x86_64-darwin14.0]
# Nokogiri (1.6.5)
    ---
    warnings: []
    nokogiri: 1.6.5
    ruby:
      version: 2.1.5
      platform: x86_64-darwin14.0
      description: ruby 2.1.5p273 (2014-11-13 revision 48405) [x86_64-darwin14.0]
      engine: ruby
    libxml:
      binding: extension
      source: packaged
      libxml2_path: "/Users/ericgoodwin/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin14.1.0/libxml2/2.9.2"
      libxslt_path: "/Users/ericgoodwin/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin14.1.0/libxslt/1.1.28"
      libxml2_patches:
      - 0001-Revert-Missing-initialization-for-the-catalog-module.patch
      - 0002-Fix-missing-entities-after-CVE-2014-3660-fix.patch
      libxslt_patches:
      - 0001-Adding-doc-update-related-to-1.1.28.patch
      - 0002-Fix-a-couple-of-places-where-f-printf-parameters-wer.patch
      - 0003-Initialize-pseudo-random-number-generator-with-curre.patch
      - 0004-EXSLT-function-str-replace-is-broken-as-is.patch
      - 0006-Fix-str-padding-to-work-with-UTF-8-strings.patch
      - 0007-Separate-function-for-predicate-matching-in-patterns.patch
      - 0008-Fix-direct-pattern-matching.patch
      - 0009-Fix-certain-patterns-with-predicates.patch
      - 0010-Fix-handling-of-UTF-8-strings-in-EXSLT-crypto-module.patch
      - 0013-Memory-leak-in-xsltCompileIdKeyPattern-error-path.patch
      - 0014-Fix-for-bug-436589.patch
      - 0015-Fix-mkdir-for-mingw.patch
      compiled: 2.9.2
      loaded: 2.9.2

Emoji syntax gravatars

I'm not sure if this is a good idea or if this is actually the place to suggest it, but it'd be cool if you could put something like :cameronmcefee: in any gfm field and have the person's avatar appear, probably linked to their profile and maybe tool-tipped with their name.

Contributing Guidelines

CONTRIBUTING.md is a cool feature; we should add it to html-pipeline! 😄

When a user submits a New Issue or sends a Pull Request, they are linked to the project's CONTRIBUTING.md.

New Issue:
screen shot 2014-02-06 at 11 37 32 am

Pull Request:
screen shot 2014-02-06 at 11 41 24 am

Since CONTRIBUTING.md is linked from both places, we could split it into two pieces of documentation. At the top of the document, we could have navigation to both pieces. Here is a rough draft for review. Thoughts?


Submitting New Issue

Please include:

  1. Example code
  2. Result output
  3. nokogiri -v

Sending Pull Request

How to run the tests:

bundle exec rake

Potential class loading conflict with add-on filters

Due to the fact that HTML::Pipeline is a class, not a module, there is risk that an add-on filter will prematurely define this class before it's extended in the core library, which causes the notorious "superclass mismatch" exception.

Here's an example of where this happens. While creating a new gem for the BarFilter, we define a version file:

lib/html/pipeline/bar_filter/version.rb

module HTML
  class Pipeline
    class BarFilter
      VERSION = '1.0.0'
    end
  end
end

If we load this at the top of a gemspec file, for instance, then if we attempt to load 'html/pipeline', it goes 💥.

Normally the way these things are defined (as far as I understand it), the top-level type in a gem is a module, not a class. One way to accomplish this without breaking the current API (much), is to define the class method new on the module that instantiates the concrete class. Something like:

module HTML
  module Pipeline
    def self.new filters, default_context = {}, result_class = nil
      Engine.new filters, default_context, result_class
    end

    class Engine
      # relocate Pipeline class definition here
    end
  end
end

The other solution, which I used in html-pipeline-asciidoc_filter, is to put the filter class in a different module for the purpose of holding the VERSION constant.

module HTML_Pipeline
  class BarFilter
    VERSION = '1.0.0'
  end
end

Either way, I think this is an important issue to address to minimize the challenges of creating an add-on filter.

Loosen Markdown Dependency.

Considering that Github Markdown tends to lack documentation on how to configure it (that or Google is failing me,) and it does a lot of things that aren't necessarily nice for user content that you want to restrict (such as autolinking) it would be nice if the dependence on github-markdown was loosened so that people who wish to use redcarpet can.

No stylesheets for SyntaxHighlightFilter

Using the example in the README:

    pipeline = HTML::Pipeline.new [
      HTML::Pipeline::MarkdownFilter,
      HTML::Pipeline::SyntaxHighlightFilter
    ]
    result = pipeline.call input
    result[:output].to_s

produces the requisite <span>s with classes, but there are no styles / stylesheets to colorize the output.

Is there something I need to add to application.css?

Place Dependency Management On Filters

#48 kickstarted discussion, and here is a plan for placing dependency management on Filters.

  1. Add dependency management tests
  2. Add dependency management to Filter with descriptive exception
    message
  3. Refactor Filters to use new dependency management logic
  4. For CI, move gem dependencies from gemspec to Gemfile :test block
  5. Add gem post install message alerting users to new dependency
    management
  6. Update README to detail each Filters dependencies e.g. FaradayMiddleware README

EmailReplyParser is undefined

I might be missing some dependency, but the EmailReplyFilter references an EmailReplyParser constant which is not defined in the gem, at all :)

Can't remember if this is something that was there in github/github or maybe github/html-pipeline? But it should proooobably be here. Or maybe it's EmailReplyFilter that shouldn't be :P

Open source, transferring repo ownership

I think this repo is ready for 🚢ing. #6 extracted this project from .com, and removed GitHub specific references in the gem. Here's a list of remaining things I'd like to do before I share the ❤️ with the world:

  • update the readme
  • write a blog post with some examples
  • add travis
  • transfer ownership to jch (per @rtomayko, having a maintainer rather than putting it under the org)

Is there anything I'm missing?

Implement an AsciiDoc filter based on Asciidoctor

Implement an AsciiDoc filter based on Asciidoctor.

Adding this filter will allow AsciiDoc output to be syntax highlighted. The filter should invoke Asciidoctor using attributes that make the HTML produced reasonably consistent with the HTML generated from Markdown (notitle! idprefix idseparator=-)

Passed content must be valid XML to be filtered

Right now HTML::Pipeline::MentionFilter.new "test @benbalter test" will return the input string, while filter = HTML::Pipeline::MentionFilter.new "<p>test @benbalter test</p>" will return the expected @mentioned string.

I believe this is due to the doc.search('text()') pattern. Would be awesome if html-pipeline could support arbitrary strings, as right now I believe the input must be HTML, or the first filter must be the markdown filter for the expected behavior to occur.

At the very least, documentation could help clear things up for new users.

Enable syntax highlighting for inline code

Copying from this issue from github/markup:

Currently, you can syntax-highlight code blocks. For example,

main :: IO ()
main = putStrLn "Hello, World!"

renders as

main :: IO ()
main = putStrLn "Hello, World!"

However, you cannot do the same with inline code such as

main :: IO ()

or

main :: IO ()

both of which get rendered as main :: IO () (without syntax highlighting) when used inline. It would be nice to have something like

haskell main :: IO ()

that gives you inline syntax-highlighting (right now, that would render as haskell main :: IO ()).

As gjtorikian suggested on the other issue, this could conceivably be fixed by changing this line to match on code tags, as well as pre.

Warn if "pipelines" are out of order.

I would love it if, rather than sending a generic error that means nothing to the user (in some cases) and could be confusing, html-pipeline detected order issues when there is a clear processing order, or if the emoji filter converted its input to a DocumentFragment itself. What I mean is:

[
  HTML::Pipeline::MarkdownFilter,
  HTML::Pipeline::EmojiFilter
]

Works, but

[
  HTML::Pipeline::EmojiFilter,
  HTML::Pipeline::MarkdownFilter
]

Fails. However your lib sends people a broad message that doesn't even hint closely to what the problem might be, it only sends: https://github.com/jch/html-pipeline/blob/master/lib/html/pipeline/text_filter.rb#L7 which can confuse some users who are simply doing the most simple things like:

class HTMLPipeline < Filter
  FILTERS =
    [
      HTML::Pipeline::EmojiFilter,
      HTML::Pipeline::MarkdownFilter
    ]

  def run(content, opts = {})
    opts = { gfm: true, asset_root: "/assets/img" }.merge(opts)
    HTML::Pipeline.new(FILTERS, opts).to_html(content)
  end
end

This might be a problem with Emoji on Ruby 2.0.0-p0 though.

Medico

It seems too complicated to make a repository. What help can you give when the code to paste within the page's body doesn't click?

Better error notification on missing linguist dependency?

Chalk this up to RTFM, but with a simple filter like this

HTML::Pipeline.new [
          HTML::Pipeline::MarkdownFilter,
          HTML::Pipeline::SyntaxHighlightFilter
        ]

I kept getting the help rails app to crash:

SystemExit in Help/articles#show

Showing /Users/garentorikian/github/help/app/views/help/articles/_article.html.erb where line #22 raised:

exit
Extracted source (around line #22):

Finally, after looking at the logs, I found: You need to install linguist before using the SyntaxHighlightFilter. See README.md for details.

Not sure if this error can be raised in the browser itself, but it'd be nice. Also not sure if this'll be fixed by #28 anyway.

Whitelist table sections (thead, tbody, tfoot)

Add the table section elements to the whitelist.

Table sections (thead, tbody, tfoot) are important table elements that control how a table gets rendered. If handled with the same restrictions as the table element (they can only contain tr, th and td elements), allowing them does not impose any security risk.

Decrease number of dependencies

Remove as many gem dependencies as possible because not everyone uses every single filter. The responsibility of checking for dependencies will be on the filter. This is similar to what faraday does for its adapters. I don't want the current filters to be split up into a bunch of mini-gems (html-pipeline-emoji, html-pipeline-markdown) because that's just dicing things too thin.

Camo Filter doesn't return doc when disabled

During some testing this morning I started using the disable_asset_proxy option. It seems when you pass that in the CamoFilter just returns nil, instead of the doc causing the rest of the filter chain to break.

cut a 1.6.0 release

We should bump a release. I want to get the Digest deprecation taken care of in some projects upstream.

/cc @jch

Question about github markdown filter (low priority!)

Hi there,

I have been trying to work out how to stop newlines being inserted into a (github flavour) markdown blockquote.

If I have a markdown file like this:

> this is a start of a quote
> this is a continuation of a quote

according to the docs, github markdown does not put a <br> tag in there.

I have been using your excellent pipeline in a small gem I created for using markdown with the excellent vimwiki plugin, and I keep getting <br> tags inside my generated html. I'm happy to create a test case if it'll help, but I'm wondering if you can tell me what (if any) other filters I should be using. Currently it just uses your sample ones:

pipeline = HTML::Pipeline.new [
  HTML::Pipeline::MarkdownFilter,
  HTML::Pipeline::SyntaxHighlightFilter
]

Any help most appreciated!

Allow SSH protocol links

It'd be handy if you could also use SSH protocol links like [test server](ssh://[email protected]). Is there any chance of adding that to the protocol whitelist in SanitizationFilter? I don't think there should be any security implications, but I may be missing something.

Fix travis-ci build

The builds are failing because ActiveSupport 4.x requires Ruby 1.9:

Installing activesupport (4.0.0) 
Gem::InstallError: activesupport requires Ruby version >= 1.9.3.
An error occurred while installing activesupport (4.0.0), and Bundler cannot
continue.
Make sure that `gem install activesupport -v '4.0.0'` succeeds before bundling.

Need to add separate gemfiles for CI to fix this.

EmojiFilter doesn't work on strings that don't contain HTML

When I pass this string...

"I can do this.\r\n:scream: Juice 3: Whoa, that's a LOT of cayenne!"

...to a pipeline containing EmojiFilter, it does not replace the emoji-cheat-sheet code with the Emoji as expected.

I tracked the problem down to here:

irb(main):204:0> doc.search('text()')
=> []

What does happen is that the DocumentFragment in doc contains one child Nokogiri::XML::Text node, and doc.text contains the same text that html contains. So....

Armed with that knowledge, I made the following changes:

def call
- doc.search('text()').each do |node|
+ nodes(doc).each do |node|
    content = node.to_html
    next if !content.include?(':')
    next if has_ancestor?(node, %w(pre code))
    html = emoji_image_filter(content)
    next if html == content
    node.replace(html)
  end
  doc
end

# Look for text nodes in the DocumentFragment
# 
# If doc's text is the same as original string,
# just nab its children to get the proper nodes.
# Otherwise do a search for text nodes.
+ def nodes(doc)
+   doc.text == html ? doc.children : doc.search('text()')
+ end

... and that fixed it for me.

Anyone see any problems with that fix? If not, I'll work up a PR as soon as I can.

Tweaks to the email reply filter

Am I correct in thinking this is used to parse the replies on GitHub? If so, what do you think about adding a way to strip the garbage from this:

remove_redundant_data_tidy_up_the_code_indentation_and_add_a_new_menu_i _by_dylanbarwick__pull_request_125__bauerpubtwinit_20130909_113910

I'm happy to do it but I wanted to make sure this filter was the correct place to do it.

I think the non-code solution is for that dude to delete the garbage from his email but that is sort of "you're holding it wrong".

Support for ActiveSupport 4

We were upgrading from 0.0.14 to 0.2.0, but got blocked by the gemspec requirement on activesupport 3 or earlier.

Bundler could not find compatible versions for gem "activesupport":
  In Gemfile:
    html-pipeline (~> 0.1.0) ruby depends on
      activesupport (< 4, >= 2) ruby

    rails (~> 4.0) ruby depends on
      activesupport (4.0.0)

MentionFilter base_url config question

Hi. I am using MentionFilter, and my user lives in www.lvh.me:3000/~jch.

HTML::Pipeline.new [
  HTML::Pipeline::MarkdownFilter,
  HTML::Pipeline::SanitizationFilter,
  HTML::Pipeline::MentionFilter
], context.merge(gfm: true, base_url: '/~')

If I specified base_url: '~' or /~, it gives me

www.lvh.me:3000/~/jch

instead of

www.lvh.me:3000/~jch.

How to achieve behaviour as mentioned with MentionFilter?

Currently I replace it by myself:

text.gsub!(/@([a-z0-9][a-z0-9-]*)/i) do |match|
  %Q(<a href="/~#{$1}">#{match}</a>)
end

Thanks!

Spaces inserted into code

Using

    pipeline = HTML::Pipeline.new [
      HTML::Pipeline::MarkdownFilter,
      HTML::Pipeline::SyntaxHighlightFilter
    ]

produces code that has 10 spaces prepended to every line after the first, including an extra line with 10 spaces at the end.

This

```css
@media (max-width: 992px) {
    #contact_email{ display: none; }
}
```

produces

@media (max-width: 992px) {
              #contact_email{ display: none; }
          }
          // 10 spaces at end

Getting Started Guide

The README has tons of information (usage, dependencies, examples, etc). However, new users would benefit from a Getting Started Guide; factory_girl's guide is a good example. The Getting Started Guide could detail common implementations such as integrating with Rails or Sinatra. Thoughts?

Separate gems for versioning external dependencies

We don't specify versions for external dependencies and raise runtime errors when a dependency is missing (#80). For example, HTML::Pipeline::AutolinkFilter depends on rinku:

begin
  require "rinku"
rescue LoadError => _
  abort "Missing dependency 'rinku' for AutolinkFilter. See README.md for details."
end

This approach is simple, but couples html-pipeline's versioning to the versions of its external dependencies. For example, to update from gemoji ~> 1 to ~> 2, we would need to increase the major version for html-pipeline #159.
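
To make the coupling concrete, here is a rough sketch of a load-time version check that a filter could run after requiring its dependency. Everything in it is an assumption for illustration: `IncompatibleDependencyError` and `assert_dependency_version!` are hypothetical names, and html-pipeline does not ship such a check.

```ruby
require "rubygems" # Gem::Version and Gem::Requirement ship with Ruby

# Hypothetical error class, not part of html-pipeline.
class IncompatibleDependencyError < StandardError; end

# Hypothetical helper: raise unless the loaded version of a dependency
# falls inside the range the filter was written against.
def assert_dependency_version!(name, loaded_version, requirement, filter)
  req = Gem::Requirement.new(requirement)
  return true if req.satisfied_by?(Gem::Version.new(loaded_version))

  raise IncompatibleDependencyError,
        "#{filter} needs #{name} #{req}, but #{loaded_version} is loaded."
end

# A filter written against gemoji ~> 2 accepts 2.1.0 ...
assert_dependency_version!("gemoji", "2.1.0", "~> 2", "an emoji filter")

# ... and rejects a 1.x release.
begin
  assert_dependency_version!("gemoji", "1.5.0", "~> 2", "an emoji filter")
rescue IncompatibleDependencyError => e
  puts e.message
end
```

Changing the accepted range in a check like this is a breaking change for anyone still on the old dependency, which is the same coupling described above.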

Here are a few ideas I came up with:

Keep things the same

This requires the fewest changes. We would raise html-pipeline's major version whenever one of its dependencies made breaking changes. There are 8 external dependencies for 8 filters. They are all pretty stable gems and unlikely to change frequently.

Separate gems, same repository

I experimented with this in the separate-gems branch. This is similar to how rails/rails is composed of separate gems (actionpack, actionmailer, activesupport), but all live in the same repository for an easy development workflow. The problem I ran into is that Bundler does not like having multiple projects within the same folder. If you poke around rails/rails, you can see they've added a good number of helper methods to the Rakefile and their own set of conventions for bumping versions to make it work well. This feels like overkill to me, but maybe I'm missing something obvious.

Separate gems, separate repositories

We recommend that 3rd party filters be written this way. We could do the same thing with the existing filters and package them as their own separate gems in separate repositories. The trade-off here is we'd have to jump between 9 projects (html-pipeline, and 8 filter gems). We could add an html-pipeline organization to help with this, but it is more overhead and would make the project harder to discover, and harder to contribute to. This is also how the bkeepers/qu gem handles swapping different backend stores.

@simeonwillbanks @JuanitoFatas @rsanheim @bkeepers What do you think? Are there other factors I haven't covered? Another possible way?

Detect asset pipeline availability

In the github app, the emoji icons are frozen to public/images, and urls to images are coded relative to the value of :asset_root. It'd be preferable to detect the availability of the asset-pipeline and use asset_path when it's available.
