studiosity / grover

A Ruby gem to transform HTML into PDFs, PNGs or JPEGs using Google Puppeteer/Chromium

License: MIT License

Ruby 93.91% JavaScript 6.09%

grover's Introduction

Grover

A Ruby gem to transform HTML into PDFs, PNGs or JPEGs using Google Puppeteer and Chromium.

Installation

Add this line to your application's Gemfile:

gem 'grover'

Google Puppeteer

npm install puppeteer

Usage

# Grover.new accepts a URL or inline HTML and optional parameters for Puppeteer
grover = Grover.new('https://google.com', format: 'A4')

# Get an inline PDF
pdf = grover.to_pdf

# Get a screenshot
png = grover.to_png
jpeg = grover.to_jpeg

# Get the HTML content (including DOCTYPE)
html = grover.to_html

# Options can be provided through meta tags
Grover.new('<html><head><meta name="grover-page_ranges" content="1-3"></head></html>')
Grover.new('<html><head><meta name="grover-margin-top" content="10px"></head></html>')

N.B.

  • options are underscore case, and sub-options separated with a dash
  • all options can be overwritten, including emulate_media and display_url
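
The naming rule can be illustrated with a small stand-alone sketch. The `meta_to_options` helper below is hypothetical and is not Grover's own parser; it only shows how a dash-separated meta name would map onto a nested options hash, assuming string keys and values that are left as plain strings.

```ruby
# Hypothetical helper, for illustration only (not part of Grover's API).
def meta_to_options(meta_tags, prefix: 'grover-')
  meta_tags.each_with_object({}) do |(name, content), options|
    next unless name.start_with?(prefix)

    # Dashes separate sub-options; underscores stay part of the option name.
    *parents, leaf = name.delete_prefix(prefix).split('-')
    target = parents.inject(options) { |hash, key| hash[key] ||= {} }
    target[leaf] = content
  end
end

options = meta_to_options({
  'grover-page_ranges' => '1-3',
  'grover-margin-top' => '10px',
  'viewport' => 'width=device-width' # ignored: no grover- prefix
})
# options => { 'page_ranges' => '1-3', 'margin' => { 'top' => '10px' } }
```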

From a view template

It's easy to render a normal Rails view template as a PDF, using Rails' render_to_string:

html = MyController.new.render_to_string({
  template: 'controller/view',
  layout: 'my_layout',
  locals: { :@instance_var => ... }
})
pdf = Grover.new(html, **grover_options).to_pdf

Relative paths

If calling Grover directly (not through middleware) you will need to either specify a display_url or modify your HTML by converting any relative paths to absolute paths before passing to Grover.

This can be achieved using the HTML pre-processor helper (pay attention to the slash at the end of the url):

absolute_html = Grover::HTMLPreprocessor.process relative_html, 'http://my.server/', 'http'

This is important because Chromium will try to resolve any relative paths via the display URL host. If not provided, the display URL defaults to http://example.com.
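
Relative references are resolved against a base URL using the standard rules of RFC 3986, and Ruby's `URI.join` follows the same rules. It can therefore be used to preview where a relative path would end up for a given display URL. This is a minimal sketch; the hosts and paths are made up for illustration.

```ruby
require 'uri'

# Default display URL (no path): relative assets resolve from the host root.
default_base = URI.join('http://example.com', 'images/logo.png').to_s

# Without a trailing slash, the last path segment of the base is replaced...
no_slash = URI.join('http://my.server/reports/2024', 'style.css').to_s
# ...with a trailing slash, it is kept.
with_slash = URI.join('http://my.server/reports/2024/', 'style.css').to_s

# Root-relative paths always resolve against the host.
root_relative = URI.join('http://my.server/reports/2024/', '/assets/app.css').to_s

puts default_base, no_slash, with_slash, root_relative
```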

Why would you pre-process the HTML rather than just use the display_url?

There are many scenarios where specifying a different host for relative paths would be preferred. For example, your server might be behind a NAT gateway and the display URL in front of it. The display URL might be shown in the header/footer, and as such shouldn't expose details of your private network.

If you run into trouble, take a look at the debugging section below which would allow you to inspect the page content and devtools.

Configuration

Grover can be configured to adjust the layout of the resulting PDF/image.

For available PDF options, see https://github.com/puppeteer/puppeteer/blob/main/docs/api/puppeteer.pdfoptions.md

Also available are the emulate_media, cache, viewport, timeout, request_timeout, convert_timeout and launch_args options.

# config/initializers/grover.rb
Grover.configure do |config|
  config.options = {
    format: 'A4',
    margin: {
      top: '5px',
      bottom: '10cm'
    },
    user_agent: 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0',
    viewport: {
      width: 640,
      height: 480
    },
    prefer_css_page_size: true,
    emulate_media: 'screen',
    bypass_csp: true,
    media_features: [{ name: 'prefers-color-scheme', value: 'dark' }],
    timezone: 'Australia/Sydney',
    vision_deficiency: 'deuteranopia',
    extra_http_headers: { 'Accept-Language': 'en-US' },
    geolocation: { latitude: 59.95, longitude: 30.31667 },
    focus: '#some-element',
    hover: '#another-element',
    cache: false,
    timeout: 0, # Timeout in ms. A value of `0` means 'no timeout'
    request_timeout: 1000, # Timeout when fetching the content (overrides the `timeout` option)
    convert_timeout: 2000, # Timeout when converting the content (overrides the `timeout` option, only applies to PDF conversion)
    launch_args: ['--font-render-hinting=medium'],
    wait_until: 'domcontentloaded'
  }
end

For available PNG/JPEG options, see https://github.com/puppeteer/puppeteer/blob/main/docs/api/puppeteer.page.screenshot.md#remarks

Note that by default the full_page option is set to false and you will get an 800x600 image. You can either specify the image size using the clip options, or capture the entire page with full_page set to true.

For viewport options, see https://github.com/puppeteer/puppeteer/blob/main/docs/api/puppeteer.page.setviewport.md#remarks

For launch_args options, see http://peter.sh/experiments/chromium-command-line-switches/. Launch parameter args can also be provided using a meta tag:

<meta name="grover-launch_args" content="['--disable-speech-api']" />

For timezone IDs, see ICU's metaZones.txt. Passing nil disables timezone emulation.

The vision_deficiency option can be passed one of achromatopsia, deuteranopia, protanopia, tritanopia, blurredVision or none.

The focus option takes a CSS selector and will focus on the first matching element after rendering is complete (including waiting for the specified wait_for_selector).

The hover option takes a CSS selector and will hover on the first matching element after rendering is complete (including waiting for the specified wait_for_selector).

For the wait_until option, the default is networkidle2 for URLs and networkidle0 for HTML content. For available options, see https://github.com/puppeteer/puppeteer/blob/main/docs/api/puppeteer.page.goto.md#remarks

The wait_for_selector option can also be used to wait until an element appears on the page. Additional waiting parameters can be set with the wait_for_selector_options options hash. For available options, see: https://github.com/puppeteer/puppeteer/blob/main/docs/api/puppeteer.page.waitforselector.md#remarks.

The wait_for_function option can be used to wait until a specific function returns a truthy value. Additional parameters can be set with the wait_for_function_options options hash. For available options, see: https://github.com/puppeteer/puppeteer/blob/main/docs/api/puppeteer.page.waitforfunction.md#remarks

The wait_for_timeout option can also be used to wait until the specified number of milliseconds has elapsed.

The raise_on_request_failure option, when enabled, will raise a Grover::JavaScript::RequestFailedError if the initial content request or any subsequent asset request returns a bad response or times out.
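
As a rough sketch, the waiting options described above can be combined in one options hash. The selector and URL are placeholders, and `visible` and `timeout` are taken from Puppeteer's waitForSelector options; the values are illustrative rather than recommended defaults.

```ruby
# Illustrative values only; '#report-ready' is a made-up selector.
wait_options = {
  wait_until: 'networkidle0',
  wait_for_selector: '#report-ready',
  wait_for_selector_options: { visible: true, timeout: 5_000 },
  raise_on_request_failure: true
}

# With Grover loaded, the hash would be passed like any other options:
# Grover.new('https://example.com/report', **wait_options).to_pdf
```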

The Chrome/Chromium executable path can be overridden with the executable_path option.

Supplementary JavaScript can be executed on the page (after render and before conversion to PDF/image) by passing it to the execute_script option.

Grover.new('<some url>', { execute_script: 'document.getElementsByTagName("footer")[0].innerText = "Hey"' }).to_pdf

Basic authentication

For requesting a page with basic authentication, username and password options can be provided. Note that this only really makes sense if you're calling Grover directly (and not via middleware).

Grover.new('<some URI with basic authentication>', username: 'the username', password: 'super secret').to_pdf

Remote Chromium

By default, Grover launches a local Chromium instance. You can connect to a remote/external Chromium with the browser_ws_endpoint option.

For example, to connect to a chrome instance started with docker using docker run -p 3000:3000 ghcr.io/browserless/chrome:latest:

options = {"browser_ws_endpoint": "ws://localhost:3000/chrome"}
grover = Grover.new("https://mysite.com/path/to/thing", options)
File.open("grover.png", "wb") { |f| f << grover.to_png }

You can also pass launch flags like this: ws://localhost:3000/chrome?--disable-speech-api

If you are only using remote Chromium, you can install the puppeteer-core node package instead of puppeteer to avoid downloading Chrome. Grover will use puppeteer, or fall back to puppeteer-core if it is available.

npm install puppeteer-core

Adding cookies

To set request cookies when requesting a URL, pass an array of hashes as shown below. N.B. Only the name and value properties are required. See the page.setCookie documentation for more details (an older version of that documentation has a more detailed description).

my_cookies = [
  { name: 'sign_username', value: 'user@example.com', domain: 'mydomain' },
  { name: '_session_id', value: '9c014df0b699d8dc08d1c472f8cc594c', domain: 'mydomain' }
]
Grover.new('<some URI with cookies>', cookies: my_cookies).to_pdf

If you need to forward the cookies from the original request, you could extract them as such:

def header_cookies
  # Split on the first '=' only, since cookie values may themselves contain '='
  request.headers['Cookie'].to_s.split('; ').map do |cookie|
    key, value = cookie.split('=', 2)
    { name: key, value: value, domain: request.headers['Host'] }
  end
end

And give that array to Grover:

Grover.new('<some URI with cookies>', cookies: header_cookies).to_pdf

Adding style tags

To add style tags, pass an array of style tag options as shown below. See the page.addStyleTag documentation for more details (an older version of that documentation has a more detailed description).

style_tag_options = [
  { url: 'http://example.com/style.css' },
  { path: 'style.css' },
  { content: 'body { background: red }' }
]
Grover.new('<html><body><h1>Heading</h1></body></html>', style_tag_options: style_tag_options).to_pdf

Adding script tags

To add script tags, pass an array of script tag options as shown below. See the page.addScriptTag documentation for more details (an older version of that documentation is also available).

script_tag_options = [
  { url: 'http://example.com/script.js' },
  { path: 'script.js' },
  { content: 'document.querySelector("h1").style.display = "none"' }
]
Grover.new('<html><body><h1>Heading</h1></body></html>', script_tag_options: script_tag_options).to_pdf

Page URL for middleware requests (or passing through raw HTML)

If you want to have the header or footer display the page URL, Grover requires that this is passed through via the display_url option. This is because the page URL is not available in the raw HTML!

For Rack middleware conversions, the original request URL (without the .pdf extension) will be passed through and assigned to display_url for you. You can of course override this by using a meta tag in the downstream HTML response.

For raw HTML conversions, if the display_url is not provided http://example.com will be used as the default.

Header and footer templates

Header and footer templates should be valid HTML markup. The following classes are used to inject printing values into them:

  • date formatted print date
  • title document title
  • url document location
  • pageNumber current page number
  • totalPages total pages in the document
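
A minimal sketch of a footer template using these classes is shown below. It assumes the display_header_footer, header_template and footer_template options that appear in the issue reports further down this page. Those reports also suggest that header/footer text renders very small by default, so the sketch sets an explicit font size and leaves margin space for the footer; the sizes are illustrative guesses.

```ruby
# Footer markup using the injected classes; the styling values are illustrative.
footer_template = <<~HTML
  <div style="font-size: 10px; width: 100%; text-align: center;">
    Page <span class="pageNumber"></span> of <span class="totalPages"></span>
  </div>
HTML

pdf_options = {
  display_header_footer: true,
  header_template: '<span></span>', # empty header
  footer_template: footer_template,
  margin: { top: '1cm', bottom: '2cm' } # room for the footer to be drawn
}

# With Grover loaded: Grover.new(html, **pdf_options).to_pdf
```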

Setting custom PDF filename with header

In the respective controller action, use:

respond_to do |format|
  format.html do
    response.headers['Content-Disposition'] = %(attachment; filename="lorem_ipsum.pdf")

    render layout: 'pdf'
  end
end

Setting custom environment variable for node

The node_env_vars configuration option enables you to set custom environment variables for the spawned node process. For example you might need to disable jemalloc in some environments (#80).

# config/initializers/grover.rb
Grover.configure do |config|
  config.node_env_vars = { "LD_PRELOAD" => "" }
end

Middleware

Grover comes with a middleware that allows users to get a PDF, PNG or JPEG view of any page on your site by appending .pdf, .png or .jpeg/.jpg to the URL.

Middleware Setup

Non-Rails Rack apps

# in config.ru
require 'grover'
use Grover::Middleware

Rails apps

# in application.rb
require 'grover'
config.middleware.use Grover::Middleware

N.B. by default PNG and JPEG are not modified in the middleware to prevent breaking standard behaviours. To enable them, there are configuration options for each image type as well as an option to disable the PDF middleware (on by default).

If either of the image middleware options is enabled, ignore_path and/or ignore_request should also be configured. Otherwise image assets are likely to be intercepted by the middleware, which would likely result in 404 responses.

# config/initializers/grover.rb
Grover.configure do |config|
  config.use_png_middleware = true
  config.use_jpeg_middleware = true
  config.use_pdf_middleware = false
end

root_url

The root_url option can be specified either when configuring the middleware or as a global option. This is needed when running the Grover middleware behind a URL rewriting proxy or within a containerised system.

As a middleware option:

# in application.rb
require 'grover'
config.middleware.use Grover::Middleware, root_url: 'https://my.external.domain'

or as a global option:

# config/initializers/grover.rb
Grover.configure do |config|
  config.root_url = 'https://my.external.domain'
end

ignore_path

The ignore_path configuration option can be used to tell Grover's middleware whether it should handle/modify the response. There are three ways to set up the ignore_path:

  • a String which matches the start of the request path.
  • a Regexp which could match any part of the request path.
  • a Proc which accepts the request path as a parameter.

# config/initializers/grover.rb
Grover.configure do |config|
  # assigning a String
  config.ignore_path = '/assets/'
  # matches `www.example.com/assets/foo.png` and not `www.example.com/bar/assets/foo.png`

  # assigning a Regexp
  config.ignore_path = /my\/path/
  # matches `www.example.com/foo/my/path/bar.png`

  # assigning a Proc
  config.ignore_path = ->(path) do
    /\A\/foo\/.+\/[0-9]+\.png\z/.match path
  end
  # matches `www.example.com/foo/bar/123.png`
end
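
The three matching modes can be sketched as a plain Ruby predicate. The `ignore_path?` helper is hypothetical and only mirrors the behaviour described above (String matches the start of the path, Regexp matches anywhere, Proc decides for itself); it is not taken from Grover's source.

```ruby
# Hypothetical helper mirroring the documented matching rules.
def ignore_path?(rule, path)
  case rule
  when String then path.start_with?(rule)
  when Regexp then rule.match?(path)
  when Proc   then !!rule.call(path)
  else false
  end
end

numbered_png = ->(path) { %r{\A/foo/.+/[0-9]+\.png\z}.match(path) }

string_hit  = ignore_path?('/assets/', '/assets/foo.png')
string_miss = ignore_path?('/assets/', '/bar/assets/foo.png')
regexp_hit  = ignore_path?(%r{my/path}, '/foo/my/path/bar.png')
proc_hit    = ignore_path?(numbered_png, '/foo/bar/123.png')
```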

ignore_request

The ignore_request configuration option can be used to tell Grover's middleware whether it should handle/modify the response. It should be set with a Proc which accepts the request (Rack::Request) as a parameter.

# config/initializers/grover.rb
Grover.configure do |config|
  # assigning a Proc
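  config.ignore_request = ->(req) do
    req.host == 'www.example.com'
  end
  # matches any request to the host `www.example.com`, e.g. `www.example.com/foo/bar/123.png`

  config.ignore_request = ->(req) do
    # Rack exposes request headers under CGI-style env keys (X-BLOCK => HTTP_X_BLOCK)
    req.has_header?('HTTP_X_BLOCK')
  end
  # matches requests that include the HTTP header `X-BLOCK`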
end
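
Rack stores incoming request headers in its env hash under CGI-style keys, so a header check inside an ignore_request Proc generally needs the env form of the name. The `rack_env_key` helper below is hypothetical and only illustrates that conversion against a hand-built env hash.

```ruby
# Hypothetical helper, for illustration only: derive the Rack env key for a header name.
def rack_env_key(header_name)
  key = header_name.upcase.tr('-', '_')
  # Content-Type and Content-Length are stored without the HTTP_ prefix.
  %w[CONTENT_TYPE CONTENT_LENGTH].include?(key) ? key : "HTTP_#{key}"
end

# A hand-built stand-in for the env a Rack::Request wraps.
env = { 'HTTP_X_BLOCK' => '1', 'HTTP_HOST' => 'www.example.com' }
blocked = env.key?(rack_env_key('X-Block'))
```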

allow_file_uris

The allow_file_uris option can be used to render an HTML document from the file system. This should be used with EXTREME CAUTION. If used improperly, it could be manipulated to reveal sensitive files on the system. Do not enable it if rendering content from outside entities (user uploads, external URLs, etc).

It defaults to false preventing local system files from being read.

# config/initializers/grover.rb
Grover.configure do |config|
  config.allow_file_uris = true
end

And used as such:

# Grover.new accepts a file URI and optional parameters for Puppeteer
grover = Grover.new('file:///some/local/file.html', format: 'A4')

# Get an inline PDF of the local file
pdf = grover.to_pdf

Cover pages

Since the header/footer for Puppeteer is configured globally, displaying of front/back cover pages (with potentially different headers/footers etc) is not possible.

To get around this, Grover's middleware allows you to specify relative paths for the cover page contents. For direct execution, you can make multiple calls and combine the resulting PDFs together.

Using middleware

You can specify relative paths to the cover page contents using the front_cover_path and back_cover_path options either via the global configuration, or via meta tags. These paths (with query parameters) are then requested from the downstream app.

The cover pages are converted to PDF in isolation, and then combined together with the original PDF response, before being returned back up through the Rack stack.

N.B. To simplify things, the same request method and body are used for the cover page requests.

# config/initializers/grover.rb
Grover.configure do |config|
  config.options = {
    front_cover_path: '/some/global/cover/page?foo=bar'
  }
end

Or via the meta tags in the original response:

<html>
  <head>
    <meta name="grover-back_cover_path" content="/back/cover/page?bar=baz" />
  </head>
  ...
</html>

Direct execution

To add a cover page using direct execution, you can make multiple calls and combine the results using the combine_pdf gem.

require 'combine_pdf'

# ...

def invoke(file_path)
  pdf = CombinePDF.parse(Grover.new(pdf_report_url).to_pdf)
  # `>>` prepends the front cover, `<<` appends the back cover
  pdf >> CombinePDF.parse(Grover.new(pdf_front_cover_url).to_pdf)
  pdf << CombinePDF.parse(Grover.new(pdf_back_cover_url).to_pdf)
  pdf.save file_path
end

Running on Heroku

To run Grover (Puppeteer) on Heroku follow these steps:

  1. Add the node buildpack. Puppeteer requires a node environment to run.

    heroku buildpacks:add heroku/nodejs --index=1 [--remote yourappname]
    
  2. Add the puppeteer buildpack. Make sure the puppeteer buildpack runs after the node buildpack and before the main ruby buildpack.

    heroku buildpacks:add jontewks/puppeteer --index=2 [--remote yourappname]
    
  3. Next, tell Grover to run Puppeteer in the "no-sandbox" mode by setting an ENV variable GROVER_NO_SANDBOX=true on your app dyno. Make sure that you trust all the HTML/JS you provide to Grover.

    heroku config:set GROVER_NO_SANDBOX=true [--remote yourappname]
    
  4. Finally, if using puppeteer 19+ (the default) add the following to a .puppeteerrc.cjs file in the root of your project:

    const {join} = require('path');
    
    /**
    * @type {import("puppeteer").Configuration}
    * */
    module.exports = {
      cacheDirectory: join(__dirname, '.cache', 'puppeteer'),
    };
    

Debugging

If you're having trouble with converting the HTML content, you can enable some debugging options to help. These can be enabled as global options via Grover.configure, by passing through to the Grover initializer, or using meta tag options.

debug: {
  headless: false,  # Default true. When set to false, the Chromium browser will be displayed
  devtools: true    # Default false. When set to true, the browser devtools will be displayed.
}

N.B.

  • Disabling the headless option is not compatible with exporting a PDF.
  • If the devtools are shown, the browser will halt, resulting in a navigation timeout.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/Studiosity/grover.

Note that spec tests are appreciated to minimise regressions. Before submitting a PR, please ensure that:

$ rspec

and

$ rubocop

both succeed

To run tests tagged with remote_browser, you need to start a browser in a container: docker run -p 3000:3000 browserless/chrome:latest and run: rspec --tag remote_browser

Special mention

Thanks are given to the great work done in the PDFKit project. The middleware and HTML preprocessing components were used heavily in the implementation of Grover.

Thanks are also given to the excellent Schmooze project. The Ruby to NodeJS interface in Grover is heavily based off that work. Grover previously used that gem, however migrated away due to differing requirements over persistence/cleanup of the NodeJS worker process.

License

The gem is available as open source under the terms of the MIT License.

grover's People

Contributors

abrom, afromankenobi, braindeaf, deanmarano, dtgay, elmassimo, hoppergee, jkowens, jukra, julianwegkamp, klappradla, koenhandekyn, lucasluitjes, matthewschultz, mkalygin, mrleebo, nathancolgate, nicolasrouanne, paresharma, petergoldstein, rafraser, richacinas, rtymishak, sbounmy, soundasleep, walski, willkoehler, xiazek, ydah, zinggi

grover's Issues

JSON::ParserError: A JSON text must at least contain two octets!

"/Users/ckhall/Projects/project/vendor/bundle/ruby/2.6.0/gems/json-1.8.6/lib/json/common.rb:155:in `initialize'",
"/Users/ckhall/Projects/project/vendor/bundle/ruby/2.6.0/gems/json-1.8.6/lib/json/common.rb:155:in `new'",
"/Users/ckhall/Projects/project/vendor/bundle/ruby/2.6.0/gems/json-1.8.6/lib/json/common.rb:155:in `parse'",
"/Users/ckhall/Projects/project/vendor/bundle/ruby/2.6.0/gems/grover-0.13.1/lib/grover/processor.rb:84:in `call_js_method'",
"/Users/ckhall/Projects/project/vendor/bundle/ruby/2.6.0/gems/grover-0.13.1/lib/grover/processor.rb:20:in `convert'",
"/Users/ckhall/Projects/project/vendor/bundle/ruby/2.6.0/gems/grover-0.13.1/lib/grover.rb:50:in `to_pdf'",
"/Users/ckhall/Projects/project/app/services/create_order_labels_with_routing.rb:45:in `create_order_labels'",

Did some digging and found the cause, at least as far as I can get anyway. On line 84 of lib/grover/processor.rb, the value of input is a single newline "\n", which cannot be parsed as JSON (obviously).

The size of the html string being passed to grover, with newlines and extraneous whitespace removed, is 3352813 bytes so I can only assume this may very well be the actual root cause. The expected result is a little over 300 pages of labels.

I do not know what would cause a single newline to be returned on line 81 of lib/grover/processor.rb. Smaller html strings are converted ok.

debug session:

    79: def call_js_method(method, url_or_html, options) # rubocop:disable Metrics/MethodLength
    80:   stdin.puts JSON.dump([method, url_or_html, options])
    81:   input = stdout.gets
    82:   raise Errno::EPIPE, "Can't read from worker" if input.nil?
    83: 
    84:   status, message, error_class = JSON.parse(input)
    85: 
    86:   if status == 'ok'
    87:     message
    88:   elsif error_class.nil?
    89:     raise Grover::JavaScript::UnknownError, message
    90:   else
    91:     raise Grover::JavaScript.const_get(error_class, false), message
    92:   end
    93: rescue JSON::ParserError => e
    94:   binding.pry
 => 95:   puts e.message
    96:   raise e
    97: rescue Errno::EPIPE, IOError
    98:   raise Grover::Error, "Worker process failed:\n#{stderr.read}"
    99: end

[1] pry(#<Grover::Processor>)> input
=> "\n"
[2] pry(#<Grover::Processor>)> url_or_html.to_s.bytesize
=> 3352813 
[3] pry(#<Grover::Processor>)> options
=> {"height"=>"11in",
 "width"=>"8.5in",
 "printBackground"=>true,
 "emulateMedia"=>"screen",
 "timeout"=>0,
 "margin"=>{"top"=>"0px", "bottom"=>"0px", "left"=>"0px", "right"=>"0px"}}

continuing to run stdout.gets shows:

[4] pry(#<Grover::Processor>)> stdout.gets
=> "<--- Last few GCs --->\n"
[5] pry(#<Grover::Processor>)> stdout.gets
=> "e [21931:0x10280c000]    94948 ms: Mark-sweep 392.1 (398.2) -> 350.1 (356.2) MB, 0.6 / 0.0 ms  (+ 3.1 ms in 2 steps since start of marking, biggest step 3.0 ms, walltime since start of marking 723 ms) (average mu = 0.992, current mu = 0.995) finalize increm[21931:0x10280c000]    97435 ms: Mark-sweep 734.1 (740.2) -> 398.1 (404.2) MB, 151.6 / 0.0 ms  (+ 0.0 ms in 0 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 1797 ms) (average mu = 0.947, current mu = 0.939) allocation f\n"
[6] pry(#<Grover::Processor>)> stdout.gets
=> "\n"
[7] pry(#<Grover::Processor>)> stdout.gets
=> "<--- JS stacktrace --->\n"
[8] pry(#<Grover::Processor>)> stdout.gets
=> "\n"
[9] pry(#<Grover::Processor>)> stdout.gets
=> "==== JS stack trace =========================================\n"
[10] pry(#<Grover::Processor>)> stdout.gets
=> "\n"
[11] pry(#<Grover::Processor>)> stdout.gets
=> "    0: ExitFrame [pc: 0xde9fb3dbe3d]\n"
[12] pry(#<Grover::Processor>)> stdout.gets
=> "Security context: 0x06429f2d6781 <JSObject>\n"
[13] pry(#<Grover::Processor>)> stdout.gets
=> "    1: toJSON [0x6426d7a0b09] [buffer.js:~979] [pc=0xde9fb6bbfa5](this=0x06425fcc6561 <Uint8Array map = 0x642e9fd59d1>)\n"
[14] pry(#<Grover::Processor>)> stdout.gets
=> "    2: arguments adaptor frame: 1->0\n"
[15] pry(#<Grover::Processor>)> stdout.gets
=> "    3: InternalFrame [pc: 0xde9fb38ee75]\n"
[16] pry(#<Grover::Processor>)> stdout.gets
=> "    4: EntryFrame [pc: 0xde9fb3892c1]\n"
[17] pry(#<Grover::Processor>)> stdout.gets
=> "    5: builtin exit frame: stringify(this=0x06429f2cae89 <Object map = 0x642e9f842a9>,0x0642141026f1 <undefined>,0x0642141026...\n"
[18] pry(#<Grover::Processor>)> stdout.gets
=> "\n"

grover gem version: 0.13.1
puppeteer version: 5.2.1 (also tried 5.3.0)
chromium version:

Would you happen to have any insights into why this could be happening? Happy to create puppeteer issue if necessary, but wanted to post here as this could be considered a bug on expecting json at lib/grover/processor.rb:84.

Look forward to hearing what you might think about this. Thanks.

Best way to debug in production?

This gem has worked great for us so far but I'm no longer getting styles in production.

I'm preprocessing the html and when I debug the output of the page all of the link/script tags are correct and can be loaded in the browser without any issues. The content is fine and the time from screen to pdf is very quick so there's no timeout issues or lack of threads.

My controller setup hasn't changed much from #101 (although I no longer have execute_script: false in there of course).

Detecting status codes?

Say I'm asking Grover to generate a PDF from a URL.
Let's say that URL throws a 404 or 500.

Is there a way to read this from Grover?
Or perhaps pass some configuration to puppeteer to give us the status?

I'm returning the PDF file to the user. If the requested URL 404s, 500s, or something else I'd like to tell the user there was an error. I don't care if the PDF gets generated. I just don't want to send the user a PDF of a URL they weren't expecting.

Add clarification in the docs on how to use with Rails

I've been trying to use grover with Rails 6 but am having a bit of a hard time.

Would it be possible to flesh out the explanation on how to use grover with Rails in the README?

Say I have:

  • app/views/layouts/pdf.html.slim
  • app/views/models/pdf.html.slim (inherits the layout from the above)

In my controller action:

def show
  respond_to do |format|
    format.html
    format.pdf do
      html = ModelsController.new.render_to_string({
        template: 'models/pdf',
        layout: 'pdf',
        locals: { :@model => model }
      })
      pdf = Grover.new(html, { format: 'A4' }).to_pdf
      render pdf: pdf
    end
  end
end

gives a missing template error.
If I use render pdf instead, I get a path name contains null byte error.

What's the correct way to generate the pdf from a controller in Rails?

Blank page depending on margins

Hi there. I'm using Grover for my Ruby on Rails application and, while it works perfectly fine for the most part, I do have an issue from time to time where the contents will go right to the end of the page, and the next page will be blank. See screenshot below:

image

Here's what my options look to generate this:

body_html = render_to_string template: 'common/report_templates/generate_pdf.html.erb', layout: false
header_template = render_to_string template: 'common/report_templates/shared/report_header.html.erb', layout: false
footer_template = render_to_string template: 'common/report_templates/shared/report_footer.html.erb', layout: false
grover_options = {
  format: 'Letter',
  full_page: true,
  prefer_css_page_size: false,
  emulate_media: 'screen',
  cache: false,
  timeout: 0, # Timeout in ms. A value of `0` means 'no timeout'
  launch_args: ['--font-render-hinting=medium', '--no-sandbox'],
  margin: { top: 75, right: 30, bottom: 44, left: 30 },
  print_background: true,
  scale: 0.75,
  display_header_footer: true,
  header_template: header_template,
  footer_template: footer_template
}
final_pdf = CombinePDF.new
final_pdf << CombinePDF.parse(Grover.new(body_html, grover_options).to_pdf)
file_path = "tmp/test"
final_pdf.save file_path

It seems that when this happens, I have to constantly adjust the bottom margin to a lower value so that the content can extend a little further down the page, but then this takes away from the footer.

The bottom of the page that has the blank page after it looks like this:

<p style="page-break-after: always;">&nbsp;</p>

What I suspect to be happening is something like this:

- a bunch of text
- page break
- a bunch of text
- page break

If a bunch of text ends at the very bottom of a page, the page-break paragraph lands on a new page. The break it declares then starts yet another page, so the page holding only that paragraph comes out blank.

This is what I think is happening:

image

Here's what my Rails code looks like:

    <% items.each do |item| %>
        <% unless item == items[0] %>
            <p style="page-break-after: always;">&nbsp;</p>
        <% end %>
        <%= render partial: "common/report_templates/shared/item" %>
    <% end %>

Is there any way I can get around this by chance?
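
One possible workaround, offered as an untested sketch rather than a confirmed fix, is to drop the spacer paragraph and put the page break on each item's own wrapper, so that no extra element can spill onto a page of its own. The example below uses Ruby's standard ERB with made-up item content and a made-up `item` class name.

```ruby
require 'erb'

# Every item after the first starts on a new page; there is no spacer element.
template = ERB.new(<<~HTML, trim_mode: '-')
  <%- items.each_with_index do |item, index| -%>
  <div class="item"<%= ' style="page-break-before: always;"' unless index.zero? %>>
    <%= item %>
  </div>
  <%- end -%>
HTML

items = ['First item', 'Second item', 'Third item']
html = template.result_with_hash(items: items)
puts html
```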

How to actually display headers and footers?

Hi guys,

Loving this gem and so far I've gotten everything narrowed down. The only thing I cannot figure out is how to display header and footer. Here's my current options:

          grover_options = {
            format: 'Letter',
            full_page: true,
            prefer_css_page_size: false,
            emulate_media: 'screen',
            cache: false,
            timeout: 0, # Timeout in ms. A value of `0` means 'no timeout'
            launch_args: ['--font-render-hinting=medium', '--no-sandbox'],
            margin: { top: 36, right: 36, bottom: 20, left: 36 },
            print_background: true,
            scale: 0.75,
            # display_header_footer: true
            # footer_template: render_to_string template: 'common/report_templates/shared/report_footer.html.erb', layout: false
          }

However, I don't even know if display_header_footer is in the right format as I just discovered in a recent issue that my print_background statement was incorrect (it was printBackground as shown in the API).

Other than the API docs (where the options are formatted differently from how this gem expects them), is there any other documentation that explains which options/arguments should be provided so that they convert over to the API correctly?

EDIT: Here's what I've got thus far now:

          header_template = render_to_string template: 'common/report_templates/shared/report_header.html.erb', layout: false
          footer_template = render_to_string template: 'common/report_templates/shared/report_footer.html.erb', layout: false

          grover_options = {
            format: 'Letter',
            full_page: true,
            prefer_css_page_size: false,
            emulate_media: 'screen',
            cache: false,
            timeout: 0, # Timeout in ms. A value of `0` means 'no timeout'
            launch_args: ['--font-render-hinting=medium', '--no-sandbox'],
            margin: { top: 36, right: 36, bottom: 20, left: 36 },
            print_background: true,
            scale: 0.75,
            display_header_footer: true,
            footer_template: footer_template,
            header_template: header_template            
          }

However, it looks like the footer and header are extremely tiny:

Screen Shot 2020-05-19 at 5 05 35 PM

Here are the contents of my footer:

 Page <span class="pageNumber"></span> of <span class="totalPages"></span>

I tried wrapping that in <font size="300pt">, but no luck.

EDIT

I've now changed the footer to this:

 <span style="font-size: 30pt">Page <span class="pageNumber"></span> of <span class="totalPages"></span></span>

And it looks like this:

Screen Shot 2020-05-19 at 5 09 25 PM

Sorry for the multiple edits. I'm just trying to get a basic header and footer going, but I keep running into issues. Now it looks like the word "Page" is cut off, and I'm not sure how to fix that.
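
A note on the templates themselves: Puppeteer renders header and footer templates separately from the page, so the page's own styles are not applied to them, and the text tends to come out very small unless a font size is set inline. The sketch below builds a footer with explicit inline styles and shows one way to embed an image as a data URI. The helper names, the margin values, and the assumption that external image URLs are not fetched inside templates are all illustrative guesses, not confirmed Grover behaviour.

```ruby
# Hypothetical helper: builds a footer template with explicit inline styles,
# since styles from the main page are not applied inside the templates.
def footer_template(font_size: '10px')
  <<~HTML
    <div style="font-size: #{font_size}; width: 100%; text-align: center;">
      Page <span class="pageNumber"></span> of <span class="totalPages"></span>
    </div>
  HTML
end

# Hypothetical helper: embeds image bytes as a base64 data URI, assuming
# external image URLs are not fetched inside header/footer templates.
def inline_image_tag(bytes, mime_type: 'image/png', height: '20px')
  data = [bytes].pack('m0') # 'm0' is base64 without line breaks
  %(<img src="data:#{mime_type};base64,#{data}" style="height: #{height};" />)
end

grover_options = {
  format: 'Letter',
  display_header_footer: true,
  footer_template: footer_template,
  # Assumed values: the margins need to leave enough room for the templates.
  margin: { top: '72px', bottom: '72px', left: '36px', right: '36px' }
}

puts grover_options[:footer_template]
```

If part of the footer text is clipped, a bottom margin that is too small for the chosen font size is one plausible cause worth ruling out first.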

Conclusion

After numerous tests (even using Puppeteer on its own), it seems that the footer isn't displayed correctly until the last page, and I can't render images in the header: they show up as the placeholder block you get when an image fails to load.

Can you confirm that there isn't anything I can do to solve this with grover?

Error when used `image_tag` : Can't resolve image into URL: undefined method `polymorphic_url`

I get the error when I try to render an image with image_tag

ActionView::Template::Error (Can't resolve image into URL: undefined method `polymorphic_url' for #<#<Class:0x00007f9a84178830>:0x00007f9a89b5e840>):
     5:     .row
     6:       .col-md-12.preview-height
     7:         .mx-auto.preview class="participant#{page[:messages_count]}"
     8:           = image_tag memory.background.background_image.image, class: "w-100 common-border"
     9:           .title-wrap

This is my controller code:

  def show
    @memory = Memory.find(params[:id])
    results = CalculatePagesForMemory.new(memory: @memory).call
    @pages = results.data if results.success?

    respond_to do |format|
      format.html
      format.pdf do
        controller = ActionController::Base.new
        html = controller.render_to_string(
          template: 'admin/memories/show',
          layout: 'admin/application',
          locals: { memory: @memory, pages: @pages }
        )
        pdf = Grover.new(html).to_pdf

        send_data(pdf, filename: 'your_filename.pdf', type: 'application/pdf')
      end
    end
  end

So I have this in my initializer

Grover.configure do |config|
  config.options = {
    format: 'A4',
    margin: {
      top: '5px',
      bottom: '10cm'
    },
    viewport: {
      width: 1300,
      height: 1400
    },
    prefer_css_page_size: true,
    emulate_media: 'screen',
    cache: false,
    timeout: 0,
    launch_args: ['--font-render-hinting=medium', '--lang=ja'],
    wait_until: 'domcontentloaded'
  }
end

Help would be appreciated.

Grover completes rendering before images are loaded

Howdy, and thanks much for an excellent gem!

I'm having trouble with renders (to JPEG, for what it's worth) occurring before the destination page has completed loading of all images. Thus, I get output with missing or only partially-loaded images; note the only halfway-loaded "False" meter (which appears in various states during different passes):

grover-render

Digging through the Puppeteer options, I think what I want is to be able to set waitUntil: 'networkidle0' somewhere, but I'm not sure where or how. So:

  1. Is this a known limitation, or…
  2. Is there a Grover config option I can set that will instruct Puppeteer to wait until the page has completely finished loading before rendering the image, or…
  3. Am I missing something?

A bit more detail: this is a locally-rendered ERB template being passed in with image tags (and remote fonts) pointing to a variety of external servers. That particular image does happen to be served from a fairly slow HTTPS destination and generally takes 1-2 seconds to load. (Loaded in Chrome via Rails, that template fires DOMContentLoaded in ~850ms and Load in ~2.5s.)
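
Other configurations quoted on this page pass a `wait_until` option to Grover, which looks like the snake_case counterpart of Puppeteer's `waitUntil`. A minimal sketch of setting it per render follows. The helper `with_wait_until` is hypothetical, and whether `networkidle0` is enough for a slow image host is an assumption that would need testing.

```ruby
# Puppeteer's documented waitUntil lifecycle events.
WAIT_UNTIL_EVENTS = %w[load domcontentloaded networkidle0 networkidle2].freeze

# Hypothetical helper: merges a wait_until setting into a Grover options hash,
# rejecting values Puppeteer would not recognise.
def with_wait_until(options, event = 'networkidle0')
  unless WAIT_UNTIL_EVENTS.include?(event)
    raise ArgumentError, "unknown wait_until value: #{event.inspect}"
  end

  options.merge(wait_until: event)
end

jpeg_options = with_wait_until({ timeout: 10_000 })
puts jpeg_options.inspect

# With the gem loaded, this would be used roughly as:
#   Grover.new(html, **jpeg_options).to_jpeg
```

`networkidle0` waits until there have been no network connections for a short period, so it is usually the stricter choice compared with `load` or `domcontentloaded`.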

Multi Page PDF Generation

Hey guys. I have Grover implemented, but it isn't working quite as I would expect. I have a Reports feature where a user can export a report as a PDF. This report is essentially a table that could have 1 or 1000 rows. When exporting, all I get back is the first page, or the current viewport. What do I need to set up so that I also get the content below the fold?

page_ranges option doesn't seem to be working

Grover.new("https://stackoverflow.com/questions/5905054/how-can-i-recursively-find-all-files-in-current-and-subfolders-based-on-wildcard", format: "A4", page_ranges: "1-2", timeout: 0).to_pdf

This code intermittently gives me 7 or 8 pages in a pdf, rather than the expected 2. Is this a puppeteer issue? Or are the options incorrectly being formatted when passed to puppeteer? Or am I just using Grover incorrectly?

Thanks in advance

Expected to get |string| or |function| as the first argument, but got "false" instead.

I'm having issues getting grover working in my Rails app.

Environment

  • MacOS 10.15.6
  • Ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-darwin19]
  • Rails 6.0.3.4
  • Grover ~> 0.14
  • Webpacker 5.2.1
  • Webpack 4.44.2
  • Puppeteer 5.5.0

config/initializers/grover.rb

# frozen_string_literal: true

Grover.configure do |config|
  config.options = {
    format: 'A4',
    full_page: true,
    execute_script: true,
    margin: {
      top: 10,
      right: 10,
      bottom: 10,
      left: 10
    },
    viewport: {
      width: 1000,
      height: 2950
    },
    prefer_css_page_size: true,
    emulate_media: 'screen',
    cache: false,
    # timeout: 0, # Timeout in ms. A value of `0` means 'no timeout'
    launch_args: ['--font-render-hinting=medium'],
    wait_until: 'networkidle0'
  }
end

PdfsController

# frozen_string_literal: true

module Pro
  module Game
    class PdfsController < ApplicationController
      skip_authorization_check

      before_action :load_report

      def new
        attach_pdf

        redirect_to url_for(@game_report.pdf)
      end

      def show
        attach_pdf unless @game_report.pdf.attached?

        redirect_to url_for(@game_report.pdf)
      end

      private

      def attach_pdf
        pdf_from_string

        # @game_report.pdf.attach(
        #   io: StringIO.new(pdf_from_string),
        #   filename: "game_report_#{@game_report.id}_#{Time.zone.now}",
        #   content_type: 'application/pdf'
        # )
      end

      def load_report
        @game_report = Report.includes_associations.find(params[:report_id]).decorate
      end

      def pdf_from_string
        base_url = "#{request.protocol}#{request.host_with_port}/"
        processed_html = Grover::HTMLPreprocessor.process(pdf_string, base_url, request.protocol)

        grover_options = {
          display_url: request.referer,
          execute_script: false,
          cache: true,
          timeout: 50_000
          # timeout: 0,
          # debug: {
          #   headless: false,
          #   devtools: false
          # },
          # wait_until: 'domcontentloaded'
        }

        Grover.new(processed_html, grover_options).to_pdf
      end

      def pdf_string
        ReportsController.new.render_to_string(
          template: 'pro/game/reports/show',
          formats: :pdf,
          layout: 'pro/game/reports',
          locals: { '@game_report': @game_report }
        )
      end
    end
  end
end

I've confirmed the processed_html is indeed a string. This gets passed into the initializer but somewhere between the initializer and to_pdf something goes awry.
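
One detail stands out in the configuration above: `execute_script` is `true` in the initializer and `false` in the per-request options. If that option is meant to hold a string of JavaScript (an assumption here, suggested by the error complaining about receiving "false" where a string or function was expected), then a boolean in either place could produce exactly this message. The sketch below, using a hypothetical helper, drops the option unless it is a usable string.

```ruby
# Hypothetical helper: drops execute_script unless it is a non-empty String,
# on the assumption that the option should hold JavaScript source.
def sanitize_execute_script(options)
  script = options[:execute_script]
  return options if script.is_a?(String) && !script.strip.empty?

  options.reject { |key, _| key == :execute_script }
end

puts sanitize_execute_script({ execute_script: false, cache: true }).inspect
puts sanitize_execute_script({ execute_script: 'document.title = "Report"' }).inspect
```

Since per-request options are merged over the global ones, the boolean would need to be removed from both the initializer and the controller for this to have any effect.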

Heroku - No usable sandbox! Update your kernel or see

2019-07-08T23:50:42.562794+00:00 app[web.1]: [0708/235042.049464:FATAL:zygote_host_impl_linux.cc(116)] No usable sandbox! Update your kernel or see https://chromium.googlesource.com/chromium/src/+/master/docs/linux_suid_sandbox_development.md for more information on developing with the SUID sandbox. If you want to live dangerously and need an immediate workaround, you can try using --no-sandbox.

I set the env variable GROVER_NO_SANDBOX=true and added the buildpack https://github.com/jontewks/puppeteer-heroku-buildpack

What am I doing wrong?

Css background-image won't ever show

So I'm using your gem to generate cards for a prototype game.
It works fine and is sufficiently documented. The only problem is that I'm using the background-image: url() CSS rule to display repeated elements (borders, keywords and so on), and none of them show.
Images in img tags also don't display in the final render.

I've tried using these parameters for to_png, with no success:

  • wait_until: %w[load domcontentloaded]
  • timeout: 0
  • omit_background: true
  • print_background: true
  • full_page: true

grover = Grover.new(converted_html, wait_until: %w[load domcontentloaded], timeout: 0, omit_background: true, print_background: true, full_page: true)

# [...]

grover.to_png(local_path)

I can't find anything else that could prevent the background images from displaying, unless the Puppeteer "screenshot" is taken before the images are fully loaded. Any idea how to resolve this?

Stylesheet renders in grover debug but not on downloaded pdf

I'm using Grover to render and download pdf copies of documents that are generated in my app and am having trouble applying css to it. When in grover's debug mode (and therefore exported and rendered in Chromium), the CSS renders exactly how I would expect. However, when I put it in headless mode and download as a pdf, it appears that my application.css is not being applied.

The PDF is being generated with the following method in a concern:

def purchase_order_gen(the_purchase_order, request_host_with_port)
  puts "generating pdf"
  puts "Generating PDF for PO " + the_purchase_order.id.to_s
  puts "Host is " + request_host_with_port

  html = render_to_string({
                              partial: '/purchase_orders/display_purchase_order',
                              layout: '/layouts/pdf_layout.html.erb',
                              locals: { po: the_purchase_order }
                              })
  puts "html"
  ap html

  the_display_url = 'http://' + request_host_with_port
  grover = Grover.new(html, {format: 'A4' , display_url: the_display_url})
  pdf = grover.to_pdf
  puts "Back From Grover!"

  return pdf
end

And I'm referencing the css with the following line, placed in the <head> of my layout.

<%= stylesheet_link_tag '/assets/stylesheets/application.css'%>

If I had to guess, I'd figure that I'm bungling the stylesheet_link_tag and that the debug version is able to pull from the asset pipeline, where the headless version is not.

Thanks in advance

Is it possible to set the viewport size with `<meta>` tags?

Is it possible to set the viewport size with <meta> tags? Ex:

<meta name="grover-viewport-width" content=640>
<meta name="grover-viewport-height" content=480>

This doesn't work because 640 and 480 are converted to strings and puppeteer throws an error "Protocol error (Emulation.setDeviceMetricsOverride): Invalid parameters width: integer value expected; height: integer value expected"
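
The error message says integers are expected, while meta tag content always arrives as text. The conversion that would be needed somewhere between parsing the meta tags and calling Puppeteer can be sketched as follows. `coerce_numeric` is a hypothetical helper and not necessarily how Grover handles this. Passing the viewport from Ruby, as in `viewport: { width: 640, height: 480 }`, avoids the string problem altogether.

```ruby
# Hypothetical helper: turns meta tag content such as "640" into an Integer
# (or "0.75" into a Float), leaving non-numeric strings like "10px" untouched.
def coerce_numeric(value)
  return value unless value.is_a?(String)
  return value.to_i if value.match?(/\A-?\d+\z/)
  return value.to_f if value.match?(/\A-?\d+\.\d+\z/)

  value
end

# Values as they would arrive from <meta> tags: always strings.
meta_viewport = { 'width' => '640', 'height' => '480' }
viewport = meta_viewport.transform_values { |v| coerce_numeric(v) }
puts viewport.inspect
```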

(Asking here because it comes up when searching, so hopefully that will help others benefit from the answer)

Originally posted by @willkoehler in #26 (comment)

Can't see any styles on debug mode (non headless and devtools)

Hi there,

While trying to debug some issues with the HTML that I'm converting to PDF using the Grover Middleware, I turned on the debug configuration so the HTML opens on Chromium with Devtools.

The problem is that all the styles I try to inspect are empty. My CSS stylesheets are inside the head tag and they load perfectly.

I attach a screenshot so maybe we can find out if this is an issue that would belong here or not.

Thanks a lot!

Captura de pantalla de 2019-04-23 14-08-46

About addStyleTag and addScriptTag

Hi, @abrom
Maybe this is not a bug but a question of mine: my SCSS and JS files are not taking effect in the PDF file.
I have tried the following code:

  def edit
    pdf_html = render_to_string(template: 'teams/written_pad_reports/show', layout: 'pdf.html.slim', )

    base_url = "#{request.protocol}#{request.host_with_port}/"
    grover_options = {
      display_url: base_url,
      format: 'A4',
      print_background: true,
    }

    pdf = Grover.new(pdf_html, grover_options).to_pdf
    send_data pdf, filename: "#{test}.pdf", type: 'application/pdf'
  end

and

  def edit
    pdf_html = render_to_string(template: 'teams/written_pad_reports/show', layout: 'pdf.html.slim', )
    pdf = Grover.new(pdf_html).to_pdf
    send_data pdf, filename: "#{test}.pdf", type: 'application/pdf'
  end

my html code:
image
Neither the CSS nor the JS takes effect. Then I saw your style_tag_options and script_tag_options,
but I don't quite understand how I should write these two options.
I have tried writing them like this:

script_tag_options = [
  { url: 'http://localhost:3000/packs/css/application.js' },
  { path: 'application.js' }
]
style_tag_options = [
  { url: 'http://localhost:3000/packs/css/application.css' },
  { path: 'application.css' }
]

It won't work. (They are in webpacker.)
Could you please advise?

cover_path params not present in Rails Controller

I might be doing something wrong here, but I am unable to see my cover_path params when the request is initiated by grover. When I request the page directly via browser the params are visible.
Is this a known limitation?

front_cover_path: pdf_cover_page_path(title: "Title", subtitle: "Subtitle"),

[5, 16] in app/controllers/pdf_cover_pages_controller.rb
    5:   def show
    6:     byebug
=>  7:     @title = pdf_cover_page_params[:title]
    8:     @subtitle = pdf_cover_page_params[:subtitle]
    9:   end
   10:
   11:   private
   12:
   13:   def pdf_cover_page_params
   14:     params.permit(:title, :subtitle)
   15:   end
   16: end
(byebug) request.params
{"controller"=>"pdf_cover_pages", "action"=>"show"}
(byebug) request.url
"https://host.redacted.net/pdf_cover_page?subtitle=Subtitle&title=Title"
(byebug) params
<ActionController::Parameters {"controller"=>"pdf_cover_pages", "action"=>"show"} permitted: false>

"undefined method `close' for nil:NilClass" on Heroku

Description

When trying to generate a PDF on Heroku I get the following error

NoMethodError (undefined method `close' for nil:NilClass)

This error happens when running Grover.new(some_html).to_pdf

Heroku configuration

I use the jontewks/puppeteer buildpack, which is set to run prior the ruby buildpack.
Screen Shot 2020-06-22 at 11 14 45

It seems correctly installed when looking in the build logs. I also set GROVER_NO_SANDBOX=true in env variables.
Screen Shot 2020-06-22 at 11 14 54

Grover runs correctly on my Mac though w/ puppeteer installed.

I don't know whether it's a Grover or buildpack related issue; sorry if this isn't appropriate for this repo.

Versions

jontewks/puppeteer: 1.1.7
grover: 0.12.1

Background images not printing through middleware

Hi again,

After trying to convert to PDF a more complex HTML set, I suspect that the property print_background is not being properly passed to Puppeteer, or maybe there is a timeout problem.

Here is a WORKING example:

grover = Grover.new('<html><body><div style="display: block; width: 300px; height: 300px; background-image: url(https://statics.memondo.com/p/99/cfs/2017/12/CF_57532_33c1af8a2ba0422f9d908b927670e9e9_perros_cuando_no_sabes_si_tienes_una_foquita_o_un_perrete.jpg);background-size:cover"></div></body></html>'
)

grover.to_pdf

Here is a NOT WORKING example:

render(:inline => '<html><body><div style="display: block; width: 300px; height: 300px; background-image: url(https://statics.memondo.com/p/99/cfs/2017/12/CF_57532_33c1af8a2ba0422f9d908b927670e9e9_perros_cuando_no_sabes_si_tienes_una_foquita_o_un_perrete.jpg);background-size:cover"></div></body></html>', :layout => 'layout_pdf')

I went on to change the gem a little so that I could print to the Rails console exactly which options Puppeteer receives after your normalization step, and they look exactly the same in both cases:

grover.rb (line 127)

puts normalized_options

And I get this:

{"displayUrl"=>"http://app.dev.informa/es/informe_pdfkit/informe_comercial/3193516/informe_comercial_A80192727", "format"=>"A4", "margin"=>{"top"=>"0.8in", "bottom"=>"0.65in", "right"=>"0.45in", "left"=>"0.5in"}, "preferCSSPageSize"=>true, "displayHeaderFooter"=>true, "printBackground"=>true, "emulateMedia"=>"print", "cache"=>false, "timeout"=>0}

As you can see, I pass the print_background option, so Puppeteer should be receiving it and returning the PDF with background images.

Do you have an idea what could be happening here? I tried with Puppeteer alone, passing the same html string and it works perfectly.

Thank you very much.

Heroku slug size

Hello

Thanks for this great Gem.

On one application I am running into an issue when deploying Grover to Heroku: my slug size goes above the 500 MB allowed, so I am unable to deploy.

I have tried purging the cache on Heroku and followed several tutorials on reducing the slug size, but without any major success.

Looking at my slug in more detail, I realized that Puppeteer downloads a Chrome build of around 300 MB. I looked into using puppeteer-core with a Chrome buildpack to see if this could help. I could specify puppeteer-core, but I don't know how to tell Grover to use the Chrome install that comes from the buildpack.

Do you have some guidance on how I could do this, or any other suggestions for working with Heroku and minimizing the slug size with Puppeteer?

Thanks

how to change the viewport size in ruby

# reports_controller.rb

def pdf
    pdf_html = ActionController::Base.new.render_to_string(template: 'reports/pdf.html.erb', layout: 'pdf' )

    pdf = Grover.new(pdf_html, {
      format: 'A4' # dynamic
    }).to_pdf
end

# grover.rb

Grover.configure do |config|
  config.options = {
    emulate_media: 'screen',
    cache: false,
    print_background: true,
    prefer_css_page_size: true,
    # timeout: 0 # Timeout in ms. A value of `0` means 'no timeout'
  }
end

I tried pdf.setViewPort(), but I think that is meant to be called on a Puppeteer page object rather than on the PDF.
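
Other configurations quoted on this page set the size through a `viewport` option with integer `width` and `height`, either globally in the initializer or per request. A minimal sketch of building per-request options that way is shown below. `pdf_options` is a hypothetical helper, and the dimensions are placeholder values.

```ruby
# Hypothetical helper: builds per-request options with an integer viewport.
# Integer() raises if a value cannot be converted, rather than passing a
# string through to Puppeteer.
def pdf_options(format:, width:, height:)
  {
    format: format,
    viewport: { width: Integer(width), height: Integer(height) }
  }
end

options = pdf_options(format: 'A4', width: 1300, height: '1400')
puts options.inspect

# With the gem loaded, this would be used roughly as:
#   pdf = Grover.new(pdf_html, **options).to_pdf
```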

Best way to optimize using Grover (just discovered 750 chrome processes opened)

Hi there,

I am having a similar issue to #60, but my scenario is slightly different. I am using Sidekiq (plus Ruby on Rails) to generate a PDF document within a Docker container. A Sidekiq worker processes the HTML content (which sometimes has up to two charts).

Here's how I'm using them:

    grover_options = {
      format: "Letter",
      full_page: true,
      prefer_css_page_size: false,
      emulate_media: "screen",
      cache: false,
      timeout: 0, # Timeout in ms. A value of `0` means 'no timeout'
      launch_args: ["--font-render-hinting=medium", "--no-sandbox"],
      margin: { top: 75, right: 30, bottom: 44, left: 30 },
      print_background: true,
      scale: 0.75,
      display_header_footer: true,
      header_template: header_template,
      footer_template: footer_template,
    }

    final_pdf = CombinePDF.new
    final_pdf << CombinePDF.parse(Grover.new(cover_html, grover_options.merge(margin: { top: 0, right: 0, bottom: 0, left: 0 })).to_pdf)
    final_pdf << CombinePDF.parse(Grover.new(body_html, grover_options).to_pdf)

I realized that, lately, the Sidekiq worker has been failing with these types of errors:

2021-01-22T03:29:04.135Z pid=31936 tid=gmywnrgm0 WARN: Grover::Error: Failed to instantiate worker process:

2021-01-22T03:29:04.136Z pid=31936 tid=gmywnrgm0 WARN: /usr/local/bundle/gems/grover-0.14.1/lib/grover/processor.rb:42:in `ensure_packages_are_initiated'
/usr/local/bundle/gems/grover-0.14.1/lib/grover/processor.rb:19:in `convert'
/usr/local/bundle/gems/grover-0.14.1/lib/grover.rb:50:in `to_pdf'
/myapp/app/workers/report_generator_worker.rb:219:in `get_report_contents'
/myapp/app/workers/report_generator_worker.rb:68:in `perform'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:196:in `execute_job'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:164:in `block (2 levels) in process'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/middleware/chain.rb:133:in `invoke'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:163:in `block in process'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:136:in `block (6 levels) in dispatch'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/job_retry.rb:111:in `local'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:135:in `block (5 levels) in dispatch'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/rails.rb:14:in `block in call'
/usr/local/bundle/gems/activesupport-5.2.4/lib/active_support/execution_wrapper.rb:87:in `wrap'
/usr/local/bundle/gems/activesupport-5.2.4/lib/active_support/reloader.rb:73:in `block in wrap'
/usr/local/bundle/gems/activesupport-5.2.4/lib/active_support/execution_wrapper.rb:87:in `wrap'
/usr/local/bundle/gems/activesupport-5.2.4/lib/active_support/reloader.rb:72:in `wrap'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/rails.rb:13:in `call'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:131:in `block (4 levels) in dispatch'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:257:in `stats'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:126:in `block (3 levels) in dispatch'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/job_logger.rb:13:in `call'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:125:in `block (2 levels) in dispatch'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/job_retry.rb:78:in `global'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:124:in `block in dispatch'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/logger.rb:10:in `with'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/job_logger.rb:33:in `prepare'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:123:in `dispatch'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:162:in `process'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:78:in `process_one'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:68:in `run'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/util.rb:15:in `watchdog'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/util.rb:24:in `block in safe_thread'
Killed

The above output is the most common. The one below is one I've just started seeing:

2021-01-22T03:28:49.846Z pid=31936 tid=gmywnu0js WARN: Grover::Error: Worker process failed:
(node:592) UnhandledPromiseRejectionWarning: Error: Page crashed!
    at Page._onTargetCrashed (/myapp/node_modules/puppeteer/lib/Page.js:209:28)
    at CDPSession.<anonymous> (/myapp/node_modules/puppeteer/lib/Page.js:129:57)
    at CDPSession.emit (events.js:314:20)
    at CDPSession._onMessage (/myapp/node_modules/puppeteer/lib/Connection.js:166:18)
    at Connection._onMessage (/myapp/node_modules/puppeteer/lib/Connection.js:83:25)
    at WebSocket.<anonymous> (/myapp/node_modules/puppeteer/lib/WebSocketTransport.js:25:32)
    at WebSocket.onMessage (/myapp/node_modules/ws/lib/event-target.js:132:16)
    at WebSocket.emit (events.js:314:20)
    at Receiver.receiverOnMessage (/myapp/node_modules/ws/lib/websocket.js:825:20)
    at Receiver.emit (events.js:314:20)
(node:592) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:592) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

2021-01-22T03:28:49.848Z pid=31936 tid=gmywnu0js WARN: /usr/local/bundle/gems/grover-0.14.1/lib/grover/processor.rb:96:in `rescue in call_js_method'
/usr/local/bundle/gems/grover-0.14.1/lib/grover/processor.rb:79:in `call_js_method'
/usr/local/bundle/gems/grover-0.14.1/lib/grover/processor.rb:20:in `convert'
/usr/local/bundle/gems/grover-0.14.1/lib/grover.rb:50:in `to_pdf'
/myapp/app/workers/report_generator_worker.rb:220:in `get_report_contents'
/myapp/app/workers/report_generator_worker.rb:68:in `perform'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:196:in `execute_job'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:164:in `block (2 levels) in process'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/middleware/chain.rb:133:in `invoke'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:163:in `block in process'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:136:in `block (6 levels) in dispatch'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/job_retry.rb:111:in `local'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:135:in `block (5 levels) in dispatch'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/rails.rb:14:in `block in call'
/usr/local/bundle/gems/activesupport-5.2.4/lib/active_support/execution_wrapper.rb:87:in `wrap'
/usr/local/bundle/gems/activesupport-5.2.4/lib/active_support/reloader.rb:73:in `block in wrap'
/usr/local/bundle/gems/activesupport-5.2.4/lib/active_support/execution_wrapper.rb:87:in `wrap'
/usr/local/bundle/gems/activesupport-5.2.4/lib/active_support/reloader.rb:72:in `wrap'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/rails.rb:13:in `call'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:131:in `block (4 levels) in dispatch'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:257:in `stats'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:126:in `block (3 levels) in dispatch'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/job_logger.rb:13:in `call'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:125:in `block (2 levels) in dispatch'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/job_retry.rb:78:in `global'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:124:in `block in dispatch'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/logger.rb:10:in `with'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/job_logger.rb:33:in `prepare'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:123:in `dispatch'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:162:in `process'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:78:in `process_one'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/processor.rb:68:in `run'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/util.rb:15:in `watchdog'
/usr/local/bundle/gems/sidekiq-6.1.2/lib/sidekiq/util.rb:24:in `block in safe_thread'

I haven't seen the error above before except for today. I did some investigating and, from what I can see, each time I run something as simple as:

Grover.new("<body>html</body>", {launch_args: ["--font-render-hinting=medium", "--no-sandbox"]}).to_pdf

This spawns 5 processes with "chrome" in their names, and they never seem to close.

root@61195ba0f622:/myapp# pgrep -f chrome | wc -l
750
root@61195ba0f622:/myapp# pgrep -f chrome | wc -l
755

I wonder if this is my problem? If so, is there a best practice I should follow to ensure that the chrome processes close once they have finished? As you can see from the example above, it's just loading very basic HTML with little data, yet the 5 chrome processes stay open.

It gets to the point where I can't even kill the chrome process with kill -9 [pid]:

root     32724  0.0  0.0      0     0 ?        Z    03:06   0:00 [chrome] <defunct>
root     32735  0.3  0.0      0     0 ?        Z    03:06   0:09 [chrome] <defunct>
root@61195ba0f622:/myapp# kill -9 32735
root@61195ba0f622:/myapp# ps aux | grep -i 32735
root      1966  0.0  0.0   4836   892 pts/3    S+   03:49   0:00 grep -i 32735
root     32735  0.3  0.0      0     0 ?        Z    03:06   0:09 [chrome] <defunct>
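
A note on the listing above: state `Z` (`<defunct>`) marks a zombie, a process that has already exited but has not yet been reaped by its parent. `kill -9` has no effect on it because there is nothing left to kill. Zombies go away once the parent waits on them or the parent itself exits. In containers, running an init process as PID 1 (for example with `docker run --init`) is a common way to get orphaned children reaped; whether that applies to this setup is an assumption. The sketch below, using a hypothetical helper, counts such processes from `ps aux`-style output, which may be more informative than `pgrep -f chrome | wc -l`.

```ruby
# Hypothetical helper: extracts the PIDs of zombie (state "Z") chrome
# processes from `ps aux`-style output.
# Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
def defunct_chrome_pids(ps_output)
  ps_output.each_line.filter_map do |line|
    fields = line.split
    next unless fields.length >= 11

    pid = fields[1]
    state = fields[7]
    command = fields[10..].join(' ')
    pid.to_i if state.start_with?('Z') && command.include?('chrome')
  end
end

sample = <<~PS
  root     32724  0.0  0.0      0     0 ?        Z    03:06   0:00 [chrome] <defunct>
  root     32735  0.3  0.0      0     0 ?        Z    03:06   0:09 [chrome] <defunct>
  root      1966  0.0  0.0   4836   892 pts/3    S+   03:49   0:00 grep -i 32735
PS

puts defunct_chrome_pids(sample).inspect

# On a live system the input could come from: defunct_chrome_pids(`ps aux`)
```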

Cannot find module 'puppeteer'.

Hello, there is an error in my Rails project:

Grover::DependencyError (Cannot find module 'puppeteer'. You need to add it to '/package.json' and run 'npm install'):

Environment:

Ruby 2.7.1
Rails 6.0.3.2
Grover 0.13.3
in production

# package-lock.json
  "puppeteer": {
      "version": "5.5.0",
  },

However, if I modify js/processor.js as follows, the problem goes away:

# js/processor.js
// var puppeteer = require(require.resolve('puppeteer', { paths: Module._nodeModulePaths(process.cwd()) }));
var puppeteer = require('puppeteer');

Do you have any other solutions?

Rails' grover_options match my pdf.js file, but I get two different results (only half the styles work in the Rails-generated PDF)

I'm new to this gem and appreciate the work on it.

I'm trying to figure out why I'm missing so many styles when I try to render content from within Rails compared to when rendering it just straight from the command line.

For example, here's what my controller has:

          grover_options = {
            format: 'Letter',
            margin: {
              top: '5px',
              bottom: '10cm'
            },
            viewport: {
              width: 640,
              height: 480
            },
            prefer_css_page_size: true,
            emulate_media: 'screen',
            cache: false,
            timeout: 0, # Timeout in ms. A value of `0` means 'no timeout'
            launch_args: ['--font-render-hinting=medium', '--no-sandbox'],
            wait_until: 'domcontentloaded',
            waitUntil: 'networkidle2'
          }
          grover = Grover.new('http://localhost:3000/test.html', grover_options)
          send_data grover.to_pdf, filename: "Report.pdf", type: 'application/pdf', disposition: :inline

and here's what I have inside of my Rails app's pdf.js file:

'use strict';

const puppeteer = require('puppeteer');

const createPdf = async() => {
  let browser;
  try {
    browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
    const page = await browser.newPage();
    await page.goto(process.argv[2], {timeout: 3000, waitUntil: 'networkidle2'});
    await page.waitFor(250);
    await page.pdf({
      path: process.argv[3],
      format: 'Letter',
      full_page: true,
      prefer_css_page_size: true,
      emulate_media: 'screen',
      margin: { top: 36, right: 36, bottom: 20, left: 36 },
      printBackground: true
    });
  } catch (err) {
      console.log(err.message);
  } finally {
    if (browser) {
      browser.close();
    }
    process.exit();
  }
};
createPdf();

and I call it using:

node pdf.js http://localhost:3000/test.html public/output.pdf

Aren't these two pretty much configured to do the same thing? When I run node from the command line, I can see my table headers' backgrounds and everything looks fine. However, when I try to generate the PDF from the controller, the table header background doesn't show. It seems like some styles work (for example the h1 elements are applied) but some others don't.

I'm also running this on a DigitalOcean droplet, inside a container, so it's a little tough to debug.

Not sure what I'm doing wrong here. Any help would be greatly appreciated.

How to save PDF file using middleware?

This gem is the cleanest, simplest, sturdiest view-to-PDF option for Rails I can find, and I have tried several (wicked; rails_pdf; html2pdf-rails). Thank you!

The middleware to tack ".pdf" on to the end of a URL and have it output a PDF using my print styles is great. Is there a way to have Rails save this file to my server's disk while using the middleware?

This would be easier than refactoring my views to pass in local vars, and trying to get an absolute path for my styles working, to then use the Grover.new(...).to_pdf approach.

TypeError [ERR_INVALID_ARG_TYPE]: The "original" argument must be of type function

Hey @abrom,

I just rebooted my docker container and for some reason I'm getting this issue when trying to use Grover:

[screenshot]

Any suggestions on why this may be happening? I got this on another container too while trying to dockerize a rails app. Not sure why this may be happening.

EDIT

Additional details from the console:

[10] pry(#<Consultants::ReportsController>)> grover = Grover.new('https://google.com', format: 'A4')
=> #<Grover:0x47462255799660 @url="https://google.com">
[11] pry(#<Consultants::ReportsController>)> grover.to_pdf
Grover::Error: TypeError [ERR_INVALID_ARG_TYPE]: The "original" argument must be of type function
from /usr/local/rvm/gems/ruby-2.5.1/gems/grover-0.12.1/lib/grover/processor.rb:53:in `parse_package_error'
[12] pry(#<Consultants::ReportsController>)>

Emojis not working

When I render inline HTML with emojis, the PDF contains odd characters instead of emojis.

My HTML:

image

PDF, generated using Grover.new(File.read("test.html")).to_pdf("test.pdf");

image

Any clue why this would happen?

PDF Cover Pages are not rendered for non-middleware executions

Hi there, I appreciate the work you guys have done with this gem!

I'm not sure if this is intentional, but cover pages aren't rendered when you use direct execution:

# cover page will not render
Grover.new(pdf_report_url, front_cover_path: pdf_title_url).to_pdf

This is because the cover page logic lives in the middleware itself. It is a little confusing, because the Grover instance accepts the front_cover_path and back_cover_path options but doesn't do anything with them on its own.

On closer reading of the README, the Cover Page section does mention the middleware, but I assumed the information about the global/meta-tag configuration was there for the sake of middleware execution, not that it was a hard restriction.

If the middleware restriction is intentional, I think it might help to point out the restriction explicitly. i.e. "This won't work for direct execution!"

As a fix, I used the same combine_pdf gem as the middleware uses:

  require 'combine_pdf'

  # ...

  def invoke(file_path)
    pdf = CombinePDF.parse(Grover.new(pdf_report_url).to_pdf)
    # >> inserts the title pages at the beginning of the report
    pdf >> CombinePDF.parse(Grover.new(pdf_title_url).to_pdf)
    pdf.save file_path
  end

This is an unrelated note and I don't expect you to try to resolve it, but when I tried to use middleware execution I would always get a Navigation Timeout error from the underlying puppeteer page.pdf() call. Running the same execution directly would quickly succeed. I don't have a good repro for that though (my app is a Single Page App that uses Hash Routing, which is why I ignored the middleware). I am assuming that the issue had to do with that app's architecture and isn't a middleware bug, but I figured I'd mention it in case that sparked an idea for you.

TypeError: util_1.promisify is not a function

It works great locally, but on my production server I get the error "TypeError: util_1.promisify is not a function". Everything points to node being below version 8, but my node is version 14.
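
One way to narrow this down is to check which node binary the Ruby process itself resolves, since a server process can run with a different PATH than an interactive shell. `util.promisify` was added in Node.js 8.0.0, so an older binary earlier on the PATH would produce this kind of error. The following is a minimal sketch, not part of Grover; the helper names `node_major_version` and `detected_node_version` are made up for illustration:

```ruby
require 'open3'

# Extracts the major version from `node --version` output such as "v14.4.0".
# Returns nil when the string does not look like a Node version.
def node_major_version(version_output)
  match = version_output.to_s.strip.match(/\Av?(\d+)\./)
  match && match[1].to_i
end

# Asks whichever `node` is first on this process's PATH for its version.
# Returns nil if no node binary can be executed.
def detected_node_version
  output, status = Open3.capture2('node', '--version')
  status.success? ? output.strip : nil
rescue Errno::ENOENT
  nil
end

version = detected_node_version
major = node_major_version(version)
if major.nil?
  puts 'No usable node binary found on PATH'
elsif major < 8
  puts "node #{version} predates util.promisify (added in Node 8)"
else
  puts "node #{version} should provide util.promisify"
end
```

Running this from the same environment as the app (for example through the app's own console on the production server) shows the version the app would use, which may differ from the one reported in a login shell.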

CPU/memory usage

Hi! Thanks a lot for this gem! It's working beautifully.

I'm using it in a Docker container in Kubernetes. If multiple jobs are processed at the same time (a Sidekiq worker with N threads), does the gem start that many browsers to take the screenshots?

I am concerned about resources usage.

Thanks in advance for any clarification 🙂

Zombie node processes

First of all I want to thank you for the great gem. Unfortunately I'm seeing a steadily growing number of zombie Chrome/Node.js processes. Judging by the Puppeteer issue tracker, it seems to be a widespread problem. As far as I can see, browser.close() is present but apparently is not executed. Are you experiencing the same issue, or do you have any idea how we can prevent this? Thanks!

grover v0.6.2
puppeteer v1.10.0

Edit: It seems they are not Chrome zombies but node zombies after all

alex             98350   0,0  0,4  4576400  31432 s000  S+   10:05pm   0:00.42 node -e try {\012  var puppeteer = require("puppeteer");\012} catch (e)

Maybe it's related to schmooze gem..

Generating PDF takes a long time

It seems the page loads for a long time before anything happens. Even with a 1s timeout, it sits there trying to load the page for more than a minute.

Any ideas what could be happening?

From what I saw, it hangs for a long time on the to_pdf call.

Guidance using Grover with Devise

Apologies first. This is not really an issue, but I couldn't find anywhere else to ask this.

Rails 6.0.3.4
Ruby 2.7.2

I have been experimenting with Grover (cool by the way) and find I have to add a

skip_before_action :authenticate_user!, only: [:show] (action where pdf is generated in an invoice controller)

otherwise I am unable to log in to view the pdf.

I don't really want to do this as I don't want anyone visiting the route and viewing the invoice page. I use Devise for authentication in my admin backend.

If I missed something in your docs let me know. Thanks

Broken pipe - Can't read from worker (Worker process failed)

Hey, I have a weird issue with Grover, and it seems to happen only on our production server.
Has anyone experienced an error where it just says:

Grover::Error
Worker process failed:

which actually started with:

Errno::EPIPE
Broken pipe - Can't read from worker

I can't seem to get more information on this, and am wondering if / how I could debug it (can't reproduce the issue either, just happens sometimes)

Call Grover.new('').to_pdf gets stuck forever

Happy path:

  • When you call to_pdf with a value with content (even spaces), it returns the PDF

Bad path:

  • When you call to_pdf with a nil value, it returns an error. I don't know if it should return an error or change it to a blank value and return the PDF.
  • When you call to_pdf with an empty value (Grover.new('').to_pdf), it gets frozen until it reaches the timeout.
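
Until the empty-string case is handled by the gem, a caller can guard against it. This is a minimal sketch based only on the behaviour described above (nil raises, an empty string hangs, whitespace renders); `checked_grover_input` is a hypothetical helper, not a Grover API:

```ruby
# Rejects the two inputs reported above as problematic before they reach Grover:
# nil (which raises inside the gem) and '' (which hangs until the timeout).
# Whitespace-only strings are passed through because they were reported to render.
def checked_grover_input(html)
  raise ArgumentError, 'HTML or URL must not be nil' if html.nil?
  raise ArgumentError, 'HTML or URL must not be empty' if html.empty?

  html
end

# Intended use (requires the grover gem, so it is left commented out here):
#   pdf = Grover.new(checked_grover_input(html)).to_pdf
```

Failing fast with an ArgumentError keeps the calling thread from waiting on the timeout; substituting a single space instead would be another option if a blank PDF is the desired result.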

How can I debug JS/CSS or view the output?

Hi there!

Thanks so much for the amazing work you have done with this library.

I am getting a blank page when trying to export a PDF containing some view rendered from a controller.

The generated page is just blank, and I can't find any way to debug this in the docs.

Is there any?

Thanks

Font styles won't apply in non-English languages

Hi,

First of all let me say great job with Grover!

Now, I recently used Grover to generate some PDFs for my web app and a very strange thing happened:

I'm using a particular Google font, namely Noto Sans, which I load in my template. The "problem" is that only English characters are rendered with Noto Sans. Greek words (which I use) are rendered using the system defaults.

To add insult to injury, I used these options to get a closer look:

debug: {
  headless: false,  # Default true. When set to false, the Chromium browser will be displayed
  devtools: true    # Default false. When set to true, the browser devtools will be displayed.
}

and much to my surprise everything looked OK in Chromium.

I'm attaching the following screenshots to get a better idea of what I am talking about:

Final PDF result in Chrome (notice that the English and the Greek words don't share the same font):
[screenshot]

and this is how it looks in Chromium using the devtools (notice that now the English and Greek words do share the same font):
[screenshot]

Any ideas on what this might be?

Thanks.

Most common encoding issues?

I'd like to know whether there are any common encoding issues with Grover.
I'm successfully generating PNGs from full image content, but I get conversion errors such as \xD3 from ASCII-8BIT to UTF-8 with combined text and images produced by Grover. I haven't yet tried with images not produced by Grover.

I tried forcing the encoding at every step (within the JS and Ruby processors, within Grover's Ruby code, and at every step of my own process).
I also get the errors without specifying a path for the conversion, and with both to_png and to_pdf. The error seems to trigger after most of the return chain has run; I was even able to print from within Grover.screenshot before it triggered.

The trigger seems to be a write method that Ruby reports as being in my code, though the line it points to is the Grover to_png / to_pdf call, and the only write method I could trace back to was within the JS processor. But the error triggers long after that has been called and has produced output.

Also, the consumed HTML displays correctly at the point where Grover loads it (a Rails ERB view, tested by preventing the Puppeteer browser from closing).

Nothing seems to work so far. I will keep at it tomorrow with different HTML and different ways of producing it ... right now I need some sleep.
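
For reference, this class of error is what Ruby raises when a binary (ASCII-8BIT) string, such as PNG or PDF bytes, is transcoded to UTF-8. Whether that is what happens in this particular app is an assumption. The sketch below uses a few made-up bytes rather than real Grover output, and the helper names `transcode_error` and `binary_round_trip` are hypothetical; it shows the failure and a binary-safe write:

```ruby
require 'tempfile'

# Tries to transcode the given bytes to UTF-8.
# Returns the exception if Ruby cannot convert them, or nil on success.
def transcode_error(bytes)
  bytes.encode('UTF-8')
  nil
rescue Encoding::UndefinedConversionError => e
  e
end

# Writes the bytes in binary mode (no transcoding) and reads them back.
def binary_round_trip(bytes)
  Tempfile.create('grover-demo') do |file|
    File.binwrite(file.path, bytes)
    File.binread(file.path)
  end
end

# Stand-in for image data; \xD3 has no meaning as a lone byte in UTF-8.
image_like = "PNG\xD3".b

error = transcode_error(image_like)
puts "Transcoding failed: #{error.class}" if error

puts "Binary write preserved the bytes: #{binary_round_trip(image_like) == image_like}"
```

If the stack trace points at a write call in application code, checking whether that call opens the file in text mode, or passes the bytes through something that expects UTF-8, may be a useful first step.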

Issue setting up sandbox

Failed to launch the browser process! [0730/163311.689983:FATAL:zygote_host_impl_linux.cc(117)] No usable sandbox! Update your kernel or see https://chromium.googlesource.com/chromium/src/+/master/docs/linux/suid_sandbox_development.md for more information on developing with the SUID sandbox. If you want to live dangerously and need an immediate workaround, you can try using --no-sandbox.

I get this error message in my browser when launching my Rails app. I went to the URL, and it says to install the sandbox from build/update-linux-sandbox.sh, which is not a file on my system, nor do I know how to get it. I installed Puppeteer with npm install and Grover with bundle install.

I am developing in a Linux subsystem on a Windows PC.

Help would be appreciated.
