
fluent-plugin-bigquery's Introduction

fluent-plugin-bigquery

Fluentd output plugin to load/insert data into Google BigQuery.

  • Plugin type: Output

The current version of this plugin supports the Google API with Service Account Authentication, but does not support the OAuth flow for installed applications.

Support Version

| plugin version | fluentd version | ruby version |
|----------------|-----------------|--------------|
| v0.4.x | 0.12.x | 2.0 or later |
| v1.x.x | 0.14.x or later | 2.2 or later |
| v2.x.x | 0.14.x or later | 2.3 or later |
| v3.x.x | 1.x or later | 2.7 or later |

With docker image

If you use the official Alpine-based fluentd Docker image (https://github.com/fluent/fluentd-docker-image), you need to install the bigdecimal gem in your own Dockerfile, because the Alpine-based image ships only a minimal Ruby environment to keep the image size small, and in most cases a dependency on an embedded gem is not declared in the gemspec, since such a dependency can restrict the Ruby environment.
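
A minimal sketch of such a Dockerfile, assuming the official Alpine-based image as the base (the image tag and the choice to install this plugin in the same step are assumptions; adjust to your setup):

# The base image tag below is only an example.
FROM fluent/fluentd:v1.16-1
USER root
# bigdecimal is not part of the minimal Ruby environment in the Alpine image
RUN apk add --no-cache --virtual .build-deps build-base ruby-dev \
 && gem install bigdecimal fluent-plugin-bigquery \
 && apk del .build-deps
USER fluent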

Configuration

Options

common

| name | type | required? | placeholder? | default | description |
|------|------|-----------|--------------|---------|-------------|
| auth_method | enum | yes | no | private_key | private_key or json_key or compute_engine or application_default |
| email | string | yes (private_key) | no | nil | GCP Service Account Email |
| private_key_path | string | yes (private_key) | no | nil | GCP Private Key file path |
| private_key_passphrase | string | yes (private_key) | no | nil | GCP Private Key Passphrase |
| json_key | string | yes (json_key) | no | nil | GCP JSON Key file path or JSON Key string |
| location | string | no | no | nil | BigQuery data location, i.e. the geographic location of the job. Required except for US and EU. |
| project | string | yes | yes | nil | |
| dataset | string | yes | yes | nil | |
| table | string | yes (either tables) | yes | nil | |
| tables | array(string) | yes (either table) | yes | nil | Multiple table names can be set, separated by "," |
| auto_create_table | bool | no | no | false | If true, creates the table automatically |
| ignore_unknown_values | bool | no | no | false | Accept rows that contain values that do not match the schema; the unknown values are ignored. |
| schema | array | yes (either fetch_schema or schema_path) | no | nil | Schema definition, formatted as JSON. |
| schema_path | string | yes (either fetch_schema) | yes | nil | Schema definition file path, formatted as JSON. |
| fetch_schema | bool | yes (either schema_path) | no | false | If true, fetch the table schema definition from the BigQuery table automatically. |
| fetch_schema_table | string | no | yes | nil | If set, fetch the table schema definition from this table. If fetch_schema is false, this param is ignored. |
| schema_cache_expire | integer | no | no | 600 | Value in seconds. If the current time is past the expiration interval, re-fetch the table schema definition. |
| request_timeout_sec | integer | no | no | nil | BigQuery API response timeout |
| request_open_timeout_sec | integer | no | no | 60 | BigQuery API connection and request timeout. If you send big data to BigQuery, set a large value. |
| time_partitioning_type | enum | no (either day) | no | nil | Type of the BigQuery time partitioning feature. |
| time_partitioning_field | string | no | no | nil | Field used to determine how to create a time-based partition. |
| time_partitioning_expiration | time | no | no | nil | Expiration milliseconds for BigQuery time partitioning. |
| clustering_fields | array(string) | no | no | nil | One or more fields on which data should be clustered. The order of the specified columns determines the sort order of the data. |
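
As an illustration, a minimal sketch that combines the location and time partitioning options from the table above (the field name, location, and expiration are placeholders, not defaults):

<match dummy>
  @type bigquery_load

  ...

  location asia-northeast1                # required except for US and EU
  project yourproject_id
  dataset yourdataset_id
  table   yourtable

  time_partitioning_type day
  time_partitioning_field event_time      # hypothetical column in the schema
  time_partitioning_expiration 30d
</match>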

bigquery_insert

| name | type | required? | placeholder? | default | description |
|------|------|-----------|--------------|---------|-------------|
| template_suffix | string | no | yes | nil | Can use the %{time_slice} placeholder, replaced by time_slice_format |
| skip_invalid_rows | bool | no | no | false | |
| insert_id_field | string | no | no | nil | Use the value of this key as the insertId parameter of the streaming insert API. See https://docs.fluentd.org/v1.0/articles/api-plugin-helper-record_accessor |
| add_insert_timestamp | string | no | no | nil | Adds a timestamp column just before sending the rows to BigQuery, so that buffering time is not taken into account. Gives a field in BigQuery which represents the insert time of the row. |
| allow_retry_insert_errors | bool | no | no | false | Retry inserting rows when insertErrors occur. Rows may be inserted in duplicate. |
| require_partition_filter | bool | no | no | false | If true, queries over this table require a partition filter (usable for partition elimination) to be specified. |

bigquery_load

| name | type | required? | placeholder? | default | description |
|------|------|-----------|--------------|---------|-------------|
| source_format | enum | no | no | json | Specify the source format: json, csv, or avro. If you change this parameter, you must also change the formatter plugin via the <format> config section. |
| max_bad_records | integer | no | no | 0 | If the number of bad records exceeds this value, an invalid error is returned in the job result. |

Buffer section

| name | type | required? | default | description |
|------|------|-----------|---------|-------------|
| @type | string | no | memory (insert) or file (load) | |
| chunk_limit_size | integer | no | 1MB (insert) or 1GB (load) | |
| total_limit_size | integer | no | 1GB (insert) or 32GB (load) | |
| chunk_records_limit | integer | no | 500 (insert) or nil (load) | |
| flush_mode | enum | no | interval | default, lazy, interval, immediate |
| flush_interval | float | no | 1.0 (insert) or 3600 (load) | |
| flush_thread_interval | float | no | 0.05 (insert) or 5 (load) | |
| flush_thread_burst_interval | float | no | 0.05 (insert) or 5 (load) | |

Other parameters (defined by the base class) are also available.

See https://github.com/fluent/fluentd/blob/master/lib/fluent/plugin/output.rb

Inject section

This replaces time_field and time_format from previous versions.

For example:

<inject>
  time_key time_field_name
  time_type string
  time_format %Y-%m-%d %H:%M:%S
</inject>

| name | type | required? | default | description |
|------|------|-----------|---------|-------------|
| hostname_key | string | no | nil | |
| hostname | string | no | nil | |
| tag_key | string | no | nil | |
| time_key | string | no | nil | |
| time_type | string | no | nil | |
| time_format | string | no | nil | |
| localtime | bool | no | true | |
| utc | bool | no | false | |
| timezone | string | no | nil | |
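
A slightly fuller sketch combining several of these keys (the key names on the right are examples, not requirements):

<inject>
  hostname_key host            # adds the hostname under the "host" key
  tag_key tag                  # adds the fluentd tag under the "tag" key
  time_key time_field_name
  time_type string
  time_format %Y-%m-%d %H:%M:%S
</inject>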

See https://github.com/fluent/fluentd/blob/master/lib/fluent/plugin_helper/inject.rb

Formatter section

This section applies to load mode only. In insert mode, only the json formatter is used.

BigQuery supports the csv, json, and avro formats. The default is json, and I recommend using json for now.

For example:

source_format csv

<format>
  @type csv
  fields col1, col2, col3
</format>

See https://github.com/fluent/fluentd/blob/master/lib/fluent/plugin_helper/formatter.rb
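
Putting source_format and the <format> section together in a load configuration might look like the sketch below (the schema path and column names are placeholders). Note that with csv the field order has to match the column order of the table schema.

<match dummy>
  @type bigquery_load

  ...

  source_format csv
  schema_path /path/to/httpd.schema

  <format>
    @type csv
    fields col1, col2, col3
  </format>
</match>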

Examples

Streaming inserts

Configure the insert specifications with the target table schema and your credentials. This is the minimum configuration:

<match dummy>
  @type bigquery_insert

  auth_method private_key   # default
  email xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxx@developer.gserviceaccount.com
  private_key_path /home/username/.keys/00000000000000000000000000000000-privatekey.p12
  # private_key_passphrase notasecret # default

  project yourproject_id
  dataset yourdataset_id
  table   tablename

  schema [
    {"name": "time", "type": "INTEGER"},
    {"name": "status", "type": "INTEGER"},
    {"name": "bytes", "type": "INTEGER"},
    {"name": "vhost", "type": "STRING"},
    {"name": "path", "type": "STRING"},
    {"name": "method", "type": "STRING"},
    {"name": "protocol", "type": "STRING"},
    {"name": "agent", "type": "STRING"},
    {"name": "referer", "type": "STRING"},
    {"name": "remote", "type": "RECORD", "fields": [
      {"name": "host", "type": "STRING"},
      {"name": "ip", "type": "STRING"},
      {"name": "user", "type": "STRING"}
    ]},
    {"name": "requesttime", "type": "FLOAT"},
    {"name": "bot_access", "type": "BOOLEAN"},
    {"name": "loginsession", "type": "BOOLEAN"}
  ]
</match>

For high-rate streaming inserts, you should specify flush intervals and buffer chunk options:

<match dummy>
  @type bigquery_insert
  
  <buffer>
    flush_interval 0.1  # flush as frequent as possible
    
    total_limit_size 10g
    
    flush_thread_count 16
  </buffer>
  
  auth_method private_key   # default
  email xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxx@developer.gserviceaccount.com
  private_key_path /home/username/.keys/00000000000000000000000000000000-privatekey.p12
  # private_key_passphrase notasecret # default

  project yourproject_id
  dataset yourdataset_id
  tables  accesslog1,accesslog2,accesslog3

  schema [
    {"name": "time", "type": "INTEGER"},
    {"name": "status", "type": "INTEGER"},
    {"name": "bytes", "type": "INTEGER"},
    {"name": "vhost", "type": "STRING"},
    {"name": "path", "type": "STRING"},
    {"name": "method", "type": "STRING"},
    {"name": "protocol", "type": "STRING"},
    {"name": "agent", "type": "STRING"},
    {"name": "referer", "type": "STRING"},
    {"name": "remote", "type": "RECORD", "fields": [
      {"name": "host", "type": "STRING"},
      {"name": "ip", "type": "STRING"},
      {"name": "user", "type": "STRING"}
    ]},
    {"name": "requesttime", "type": "FLOAT"},
    {"name": "bot_access", "type": "BOOLEAN"},
    {"name": "loginsession", "type": "BOOLEAN"}
  ]
</match>

Important options for high-rate events are:

  • tables
    • two or more tables can be specified, separated by ','
    • out_bigquery uses these tables for table-sharded inserts
    • all of them must have the same schema
  • buffer/chunk_limit_size
    • max size of an insert or chunk (default 1000000, i.e. 1MB)
    • the max size is limited to 1MB on BigQuery
  • buffer/chunk_records_limit
    • the streaming insert API limits the number of records to 500 per insert or chunk
    • out_bigquery flushes the buffer with 500 records per insert API call
  • buffer/queue_length_limit
    • BigQuery streaming inserts need very small buffer chunks
    • for high-rate events, buffer_queue_limit should be configured with a large number
    • up to 1GB of memory may be used under network problems with the default configuration
      • chunk_limit_size (default 1MB) x queue_length_limit (default 1024)
  • buffer/flush_thread_count
    • threads for making insert API calls in parallel
    • specify this option for 100 or more records per second
    • 10 or more threads seem good for inserts over the internet
    • fewer threads may be fine for Google Compute Engine instances (with low latency to BigQuery)
  • buffer/flush_interval
    • interval between data flushes (default 0.25)
    • you can set subsecond values such as 0.15 on Fluentd v0.10.42 or later

See the Quota policy section in the Google BigQuery documentation.

Load

<match bigquery>
  @type bigquery_load

  <buffer>
    path bigquery.*.buffer
    flush_at_shutdown true
    timekey_use_utc
  </buffer>

  auth_method json_key
  json_key json_key_path.json

  project yourproject_id
  dataset yourdataset_id
  auto_create_table true
  table yourtable%{time_slice}
  schema_path bq_schema.json
</match>

I recommend using a file buffer and a long flush interval.
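
For instance, a buffer section along these lines (the values are illustrative, not requirements):

<buffer>
  @type file
  path /var/log/fluentd/bigquery.*.buffer
  flush_interval 3600        # load jobs work well with large, infrequent chunks
  chunk_limit_size 1g
  total_limit_size 32g
  flush_at_shutdown true
</buffer>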

Authentication

Four methods are supported for fetching an access token for the service account.

  1. Public-Private key pair of GCP(Google Cloud Platform)'s service account
  2. JSON key of GCP(Google Cloud Platform)'s service account
  3. Predefined access token (Compute Engine only)
  4. Google application default credentials (http://goo.gl/IUuyuX)

Public-Private key pair of GCP's service account

The examples above use the first one. You first need to create a service account (client ID), download its private key and deploy the key with fluentd.

JSON key of GCP(Google Cloud Platform)'s service account

You first need to create a service account (client ID), download its JSON key and deploy the key with fluentd.

<match dummy>
  @type bigquery_insert

  auth_method json_key
  json_key /home/username/.keys/00000000000000000000000000000000-jsonkey.json

  project yourproject_id
  dataset yourdataset_id
  table   tablename
  ...
</match>

You can also provide json_key as an embedded JSON string like this. You only need to include the private_key and client_email keys from the JSON key file.

<match dummy>
  @type bigquery_insert

  auth_method json_key
  json_key {"private_key": "-----BEGIN PRIVATE KEY-----\n...", "client_email": "[email protected]"}

  project yourproject_id
  dataset yourdataset_id
  table   tablename
  ...
</match>

Predefined access token (Compute Engine only)

When you run fluentd on a Google Compute Engine instance, you don't need to explicitly create a service account for fluentd. In this authentication method, you need to add the API scope "https://www.googleapis.com/auth/bigquery" to the scope list of your Compute Engine instance; then you can configure fluentd like this.

<match dummy>
  @type bigquery_insert

  auth_method compute_engine

  project yourproject_id
  dataset yourdataset_id
  table   tablename

  ...
</match>

Application default credentials

The Application Default Credentials provide a simple way to get authorization credentials for use in calling Google APIs, which are described in detail at http://goo.gl/IUuyuX.

In this authentication method, the credentials returned are determined by the environment the code is running in. Conditions are checked in the following order (a minimal configuration sketch follows the list):

  1. The environment variable GOOGLE_APPLICATION_CREDENTIALS is checked. If this variable is specified, it should point to a JSON key file that defines the credentials.
  2. The environment variables GOOGLE_PRIVATE_KEY and GOOGLE_CLIENT_EMAIL are checked. If these variables are specified, GOOGLE_PRIVATE_KEY should hold the private_key and GOOGLE_CLIENT_EMAIL the client_email from a JSON key.
  3. The well-known path is checked. If the file exists, it is used as a JSON key file. This path is $HOME/.config/gcloud/application_default_credentials.json.
  4. The system default path is checked. If the file exists, it is used as a JSON key file. This path is /etc/google/auth/application_default_credentials.json.
  5. If you are running in Google Compute Engine production, the built-in service account associated with the virtual machine instance is used.
  6. If none of these conditions is true, an error occurs.
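
For example, a minimal sketch using a service account JSON key via the environment variable (the key path is a placeholder):

# export GOOGLE_APPLICATION_CREDENTIALS=/path/to/jsonkey.json before starting fluentd
<match dummy>
  @type bigquery_insert

  auth_method application_default

  project yourproject_id
  dataset yourdataset_id
  table   tablename

  ...
</match>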

Table id formatting

This plugin supports fluentd v0.14 style placeholders.

strftime formatting

table and tables options accept Time#strftime format to construct table ids. Table ids are formatted at runtime using the chunk key time.

See https://docs.fluentd.org/configuration/buffer-section

For example, with the configuration below, data is inserted into tables accesslog_2014_08_02, accesslog_2014_08_03 and so on.

<match dummy>
  @type bigquery_insert

  ...

  project yourproject_id
  dataset yourdataset_id
  table   accesslog_%Y_%m_%d

  <buffer time>
    timekey 1d
  </buffer>
  ...
</match>

NOTE: In current fluentd (v1.15.x), the maximum unit supported by strftime formatting is day granularity.

record attribute formatting

The table name can include a record attribute name as a placeholder.

CAUTION: the format is different from previous versions.

<match dummy>
  ...
  table   accesslog_${status_code}

  <buffer status_code>
  </buffer>
  ...
</match>

If an attribute name is given, the time used for formatting is the value of each row. The value for the time should be a UNIX time.

time_slice_key formatting

Use strftime formatting instead.

The strftime formatting in the current version is based on the chunk key, which is the same as the previous time_slice_key formatting.

Date partitioned table support

This plugin can insert (load) data into date-partitioned tables.

Use a placeholder:

<match dummy>
  @type bigquery_load

  ...
  table   accesslog$%Y%m%d

  <buffer time>
    timekey 1d
  </buffer>
  ...
</match>

However, dynamic table creation does not support date-partitioned tables yet, and streaming inserts are not allowed with the $%Y%m%d suffix. If you use a date-partitioned table with streaming inserts, omit the $%Y%m%d suffix from table.
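
For example, a streaming-insert sketch against a date-partitioned table simply targets the base table name:

<match dummy>
  @type bigquery_insert

  ...
  table   accesslog          # no $%Y%m%d suffix for streaming inserts
  ...
</match>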

Dynamic table creating

When auto_create_table is set to true, the plugin tries to create the table via the BigQuery API when an insertion fails with code=404 "Not Found: Table ...". The next insertion retry is then expected to succeed.

NOTE: The auto_create_table option cannot be used with fetch_schema. You should create the table in advance when using fetch_schema.

<match dummy>
  @type bigquery_insert

  ...

  auto_create_table true
  table accesslog_%Y_%m

  ...
</match>

Also, you can create a clustered table by using clustering_fields, as in the sketch below.
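
A minimal sketch of that, assuming the clustering columns exist in the table schema:

<match dummy>
  @type bigquery_insert

  ...

  auto_create_table true
  table accesslog_%Y_%m
  time_partitioning_type day
  clustering_fields ["vhost", "path"]    # columns assumed to exist in the schema

  ...
</match>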

Table schema

There are three methods to describe the schema of the target table.

  1. List fields in fluent.conf
  2. Load a schema file in JSON.
  3. Fetch a schema using BigQuery API

The examples above use the first method. In this method, you can also specify nested fields by nesting them under their parent RECORD field.

<match dummy>
  @type bigquery_insert

  ...

  schema [
    {"name": "time", "type": "INTEGER"},
    {"name": "status", "type": "INTEGER"},
    {"name": "bytes", "type": "INTEGER"},
    {"name": "vhost", "type": "STRING"},
    {"name": "path", "type": "STRING"},
    {"name": "method", "type": "STRING"},
    {"name": "protocol", "type": "STRING"},
    {"name": "agent", "type": "STRING"},
    {"name": "referer", "type": "STRING"},
    {"name": "remote", "type": "RECORD", "fields": [
      {"name": "host", "type": "STRING"},
      {"name": "ip", "type": "STRING"},
      {"name": "user", "type": "STRING"}
    ]},
    {"name": "requesttime", "type": "FLOAT"},
    {"name": "bot_access", "type": "BOOLEAN"},
    {"name": "loginsession", "type": "BOOLEAN"}
  ]
</match>

This schema accepts structured JSON data like:

{
  "request":{
    "time":1391748126.7000976,
    "vhost":"www.example.com",
    "path":"/",
    "method":"GET",
    "protocol":"HTTP/1.1",
    "agent":"HotJava",
    "bot_access":false
  },
  "remote":{ "ip": "192.0.2.1" },
  "response":{
    "status":200,
    "bytes":1024
  }
}

The second method is to specify a path to a BigQuery schema file instead of listing fields. In this case, your fluent.conf looks like:

<match dummy>
  @type bigquery_insert

  ...
  
  schema_path /path/to/httpd.schema
</match>

where /path/to/httpd.schema is the path to the JSON-encoded schema file that you used for creating the table on BigQuery. By using an external schema file you are able to write a full schema that supports NULLABLE/REQUIRED/REPEATED; this feature is really useful and adds full flexibility.
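
For illustration, a minimal sketch of such a schema file (the field names are examples; the mode attribute is where NULLABLE/REQUIRED/REPEATED is expressed):

[
  {"name": "time",   "type": "INTEGER", "mode": "REQUIRED"},
  {"name": "vhost",  "type": "STRING",  "mode": "NULLABLE"},
  {"name": "path",   "type": "STRING",  "mode": "NULLABLE"},
  {"name": "tags",   "type": "STRING",  "mode": "REPEATED"},
  {"name": "remote", "type": "RECORD",  "mode": "NULLABLE", "fields": [
    {"name": "host", "type": "STRING",  "mode": "NULLABLE"},
    {"name": "ip",   "type": "STRING",  "mode": "NULLABLE"}
  ]}
]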

The third method is to set fetch_schema to true to fetch the schema using the BigQuery API. In this case, your fluent.conf looks like:

<match dummy>
  @type bigquery_insert

  ...
  
  fetch_schema true
  # fetch_schema_table other_table # if you want to fetch schema from other table
</match>

If you specify multiple tables in the configuration file, the plugin fetches the schema data of each table from BigQuery and merges it.

NOTE: Since JSON does not define how to encode data of TIMESTAMP type, you are still recommended to specify the type for TIMESTAMP fields explicitly, as the "time" field does in the example, if you use the second or third method.

Specifying insertId property

BigQuery uses the insertId property to detect duplicate insertion requests (see data consistency in the Google BigQuery documentation). You can set the insert_id_field option to specify the field to use as the insertId property. insert_id_field can use the fluentd record_accessor format such as $['key1'][0]['key2'] (details: https://docs.fluentd.org/v1.0/articles/api-plugin-helper-record_accessor).

<match dummy>
  @type bigquery_insert

  ...

  insert_id_field uuid
  schema [{"name": "uuid", "type": "STRING"}]
</match>
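
If the unique key is nested, the record_accessor syntax mentioned above can point at it. A sketch assuming records shaped like {"payload": {"uuid": "..."}}:

<match dummy>
  @type bigquery_insert

  ...

  insert_id_field $['payload']['uuid']
  schema [
    {"name": "payload", "type": "RECORD", "fields": [
      {"name": "uuid", "type": "STRING"}
    ]}
  ]
</match>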

TODO

  • OAuth installed application credentials support
  • Google API discovery expiration
  • check row size limits

Authors

  • @tagomoris: First author, original version
  • KAIZEN platform Inc.: Maintainer, since 2014.08.19
  • @joker1007

fluent-plugin-bigquery's People

Contributors

abicky, cosmo0920, dianthudia, glstephen, hakobera, hirose31, joker1007, kenhys, kiyoto, matsuzj, miyakawataku, mugenen, nagachika, naoya, okkez, paulbellamy, potato2003, ryanchao2012, s-tajima, siburu, tagomoris, threetreeslight, tomykaira, ttanimichi, wapa5pow, yoheimuta, yoshiso, yugui, yuya-takeyama, yuya373


fluent-plugin-bigquery's Issues

Warn with Buffer in Copy?

I have config like this:

<match prod.project.push.myEvent>
  @type copy

  <store>
    @type bigquery
    @id mytable
    table mytable
    dataset mydataset

    auth_method compute_engine
    project myproject
    fetch_schema true
    <buffer>
      flush_interval 0.1  # flush as frequent as possible
      buffer_queue_limit 1024        # 1MB * 10240 -> 10GB!
      flush_thread_count 16
    </buffer>
  </store>
  <store>
    @type flowcounter
    tag prod.project.metric.myEvent
    unit minute
    aggregate all
  </store>
</match>

This config is emitting this sort of error in the logs:

2017-07-07 14:34:05 +0000 [warn]: parameter 'flush_interval' in

Is this an issue with fluentd or with this plugin? Things seem to be working, but I'd like to resolve these warnings if possible.

Gem (activesupport) conflict on Ruby 2.3.0

Hi, I am using fluent docker image (https://github.com/fluent/fluentd-docker-image) which has ruby 2.3.0 and fluent 0.12.31 and I am getting error with bigquery plugin.

I am also using bigquery plugin on Amazon linux with ruby 2.0.0 and that works fine.

2016-12-18 21:22:34 +0000 [info]: reading config file path="/home/fluent/fluentd.conf"
2016-12-18 21:22:34 +0000 [info]: starting fluentd-0.12.31
2016-12-18 21:22:34 +0000 [info]: gem 'fluent-mixin-config-placeholders' version '0.4.0'
2016-12-18 21:22:34 +0000 [info]: gem 'fluent-mixin-plaintextformatter' version '0.2.6'
2016-12-18 21:22:34 +0000 [info]: gem 'fluent-plugin-bigquery' version '0.3.3'
2016-12-18 21:22:34 +0000 [info]: gem 'fluent-plugin-buffer-lightening' version '0.0.2'
2016-12-18 21:22:34 +0000 [info]: gem 'fluentd' version '0.12.31'
2016-12-18 21:22:34 +0000 [info]: adding match pattern="nginx.shop-access" type="bigquery"
/usr/lib/ruby/2.3.0/rubygems/specification.rb:2284:in `raise_if_conflicts': Unable to activate activesupport-4.2.7.1, because json-2.0.2 conflicts with json (>= 1.7.7, ~> 1.7) (Gem::ConflictError)
        from /usr/lib/ruby/2.3.0/rubygems/specification.rb:1407:in `activate'
        from /usr/lib/ruby/2.3.0/rubygems/specification.rb:1441:in `block in activate_dependencies'
        from /usr/lib/ruby/2.3.0/rubygems/specification.rb:1427:in `each'
        from /usr/lib/ruby/2.3.0/rubygems/specification.rb:1427:in `activate_dependencies'
        from /usr/lib/ruby/2.3.0/rubygems/specification.rb:1409:in `activate'
        from /usr/lib/ruby/2.3.0/rubygems.rb:196:in `rescue in try_activate'
        from /usr/lib/ruby/2.3.0/rubygems.rb:193:in `try_activate'
        from /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:125:in `rescue in require'
        from /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:40:in `require'
        from /home/fluent/.gem/ruby/2.3.0/gems/fluent-plugin-bigquery-0.3.3/lib/fluent/plugin/out_bigquery.rb:3:in `<top (required)>'
        from /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
        from /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/plugin.rb:172:in `block in try_load_plugin'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/plugin.rb:170:in `each'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/plugin.rb:170:in `try_load_plugin'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/plugin.rb:130:in `new_impl'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/plugin.rb:59:in `new_output'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/agent.rb:131:in `add_match'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/agent.rb:64:in `block in configure'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/agent.rb:57:in `each'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/agent.rb:57:in `configure'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/root_agent.rb:86:in `configure'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/engine.rb:129:in `configure'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/engine.rb:103:in `run_configure'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/supervisor.rb:489:in `run_configure'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/supervisor.rb:160:in `block in start'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/supervisor.rb:366:in `main_process'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/supervisor.rb:339:in `block in supervise'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/supervisor.rb:338:in `fork'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/supervisor.rb:338:in `supervise'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/supervisor.rb:156:in `start'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/command/fluentd.rb:173:in `<top (required)>'
        from /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:68:in `require'
        from /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:68:in `require'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/bin/fluentd:5:in `<top (required)>'
        from /usr/bin/fluentd:23:in `load'
        from /usr/bin/fluentd:23:in `<main>'
2016-12-18 21:22:34 +0000 [info]: process finished code=256
2016-12-18 21:22:34 +0000 [warn]: process died within 1 second. exit.

Support for json private key instead of p12

Does this plugin support private key in json format? Google has deprecated p12 format now in favor of json. If the plugin already supports json, please mention the config in the README.

Placeholder for schema_path

Environments

  • fluentd version: 0.14.21
  • plugin version: 1.0.0

I'm working on refactoring fluentd configurations at work.
And I thought it would be nice if placeholders can be used in schema_path like fetch_schema_table.

Ability to convert to Json ignored unknown values

We love the plugin, and we primarily use it for event shipping.

We use the ignore_unknown_values config option, but we would like to convert all ignored values to a JSON string, similar to what the convert_hash_to_json option does, AND we would like to name a column for it, e.g. meta

proposed config line:

convert_ignored_unknown_json_key meta #column name to hold the JSON string
time_string meta

this would take all ignored values and, with their names, compile them as JSON and place them in the column named meta

expected in meta (in BigQuery there is no column for host/ip/port/user):

"meta" => "{\"host\":\"remote.example\",\"ip\":\"192.0.2.1\",\"port\":12345,\"user\":\"tagomoris\"}"

for us it's really high priority, let me know once you read it, what you think.

active_support/json LoadError in v0.2.15

I have encountered this error in v0.2.15.
Does this plugin require active_support (rails)?

06:42:54 system      | sending SIGTERM to all processes
06:42:54 dockergen.1 | terminated by SIGTERM
06:42:54 fluentd.1   | exited with code 0
06:42:54             | 2016-03-01 06:42:54 +0000 [info]: shutting down fluentd
06:42:54             | 2016-03-01 06:42:54 +0000 [info]: shutting down input type="forward" plugin_id="object:2aca4b870cb8"
06:42:54             | 2016-03-01 06:42:54 +0000 [info]: shutting down input type="forward" plugin_id="object:2aca4b871960"
06:42:54             | /usr/local/lib/ruby/site_ruby/2.2.0/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- active_support/json (LoadError)
06:42:54             |  from /usr/local/lib/ruby/site_ruby/2.2.0/rubygems/core_ext/kernel_require.rb:55:in `require'
06:42:54             |  from /usr/local/bundle/gems/fluent-plugin-bigquery-0.2.15/lib/fluent/plugin/out_bigquery.rb:137:in `initialize'
06:42:54             |  from /usr/local/bundle/gems/fluentd-0.12.20/lib/fluent/plugin.rb:128:in `new'
06:42:54             |  from /usr/local/bundle/gems/fluentd-0.12.20/lib/fluent/plugin.rb:128:in `new_impl'
06:42:54             |  from /usr/local/bundle/gems/fluentd-0.12.20/lib/fluent/plugin.rb:57:in `new_output'
06:42:54             |  from /usr/local/bundle/gems/fluentd-0.12.20/lib/fluent/plugin/out_copy.rb:42:in `block in configure'
06:42:54             |  from /usr/local/bundle/gems/fluentd-0.12.20/lib/fluent/plugin/out_copy.rb:35:in `each'
06:42:54             |  from /usr/local/bundle/gems/fluentd-0.12.20/lib/fluent/plugin/out_copy.rb:35:in `configure'
06:42:54             |  from /usr/local/bundle/gems/fluent-plugin-forest-0.3.0/lib/fluent/plugin/out_forest.rb:132:in `block in plant'
06:42:54             |  from /usr/local/bundle/gems/fluent-plugin-forest-0.3.0/lib/fluent/plugin/out_forest.rb:128:in `synchronize'
06:42:54             |  from /usr/local/bundle/gems/fluent-plugin-forest-0.3.0/lib/fluent/plugin/out_forest.rb:128:in `plant'
06:42:54             |  from /usr/local/bundle/gems/fluent-plugin-forest-0.3.0/lib/fluent/plugin/out_forest.rb:168:in `emit'
06:42:54             |  from /usr/local/bundle/gems/fluentd-0.12.20/lib/fluent/event_router.rb:88:in `emit_stream'
06:42:54             |  from /usr/local/bundle/gems/fluentd-0.12.20/lib/fluent/plugin/in_forward.rb:155:in `on_message'
06:42:54             |  from /usr/local/bundle/gems/fluentd-0.12.20/lib/fluent/plugin/in_forward.rb:276:in `call'
06:42:54             |  from /usr/local/bundle/gems/fluentd-0.12.20/lib/fluent/plugin/in_forward.rb:276:in `block in on_read_msgpack'
06:42:54             |  from /usr/local/bundle/gems/fluentd-0.12.20/lib/fluent/plugin/in_forward.rb:275:in `feed_each'
06:42:54             |  from /usr/local/bundle/gems/fluentd-0.12.20/lib/fluent/plugin/in_forward.rb:275:in `on_read_msgpack'
06:42:54             |  from /usr/local/bundle/gems/fluentd-0.12.20/lib/fluent/plugin/in_forward.rb:261:in `call'
06:42:54             |  from /usr/local/bundle/gems/fluentd-0.12.20/lib/fluent/plugin/in_forward.rb:261:in `on_read'
06:42:54             |  from /usr/local/bundle/gems/cool.io-1.4.3/lib/cool.io/io.rb:123:in `on_readable'
06:42:54             |  from /usr/local/bundle/gems/cool.io-1.4.3/lib/cool.io/io.rb:186:in `on_readable'
06:42:54             |  from /usr/local/bundle/gems/cool.io-1.4.3/lib/cool.io/loop.rb:88:in `run_once'
06:42:54             |  from /usr/local/bundle/gems/cool.io-1.4.3/lib/cool.io/loop.rb:88:in `run'
06:42:54             |  from /usr/local/bundle/gems/fluentd-0.12.20/lib/fluent/plugin/in_forward.rb:98:in `run'
06:42:54             | 2016-03-01 06:42:54 +0000 [info]: process finished code=256

td-agent error - permission issue for BQ Scheme

Getting this error :

[error]: fluent/supervisor.rb:184:rescue in dry_run: dry run failed: Permission denied @ rb_sysopen - /home/kpm/public_html/bq_scheme/impression_v1.json

td-agent.conf look like this:
USING [ and ] instead of < >
[match debug.**]
type stdout
[/match]

[source]
type debug_agent
bind 127.0.0.1
port 24230
[/source]
[match impression_stream]
@type bigquery
num_threads 4
flush_interval 1
buffer_queue_limit 10240
buffer_chunk_records_limit 100

auth_method private_key # default
email *************************[email protected]
private_key_path /_/****_/__/
_*******************.p12

project ********
dataset **********
fetch_schema false
auto_create_table true
table impression_table_%Y%d%m
schema_path /home/kpm/public_html/bq_scheme/impression_v1.json
[/match]

i tried to run chmod on the impression_v1.json to 777 , 0644 and also chown to : root , td-agent , www-data , user

and no luck ,
the error appears when running td-agent restart. I am running on Ubuntu 14.04 and using the following plugins and fluentd version:

2016-04-19 19:51:26 +0000 [info]: fluent/supervisor.rb:403:read_config: reading config file path="/etc/td-agent/td-agent.conf"
2016-04-19 19:51:26 +0000 [info]: fluent/supervisor.rb:176:dry_run: starting fluentd-0.12.12 as dry run mode
2016-04-19 19:51:26 +0000 [info]: fluent/engine.rb:90:block in configure: gem 'fluent-mixin-config-placeholders' version '0.3.0'
2016-04-19 19:51:26 +0000 [info]: fluent/engine.rb:90:block in configure: gem 'fluent-mixin-plaintextformatter' version '0.2.6'
2016-04-19 19:51:26 +0000 [info]: fluent/engine.rb:90:block in configure: gem 'fluent-plugin-bigquery' version '0.2.16'
2016-04-19 19:51:26 +0000 [info]: fluent/engine.rb:90:block in configure: gem 'fluent-plugin-buffer-lightening' version '0.0.2'
2016-04-19 19:51:26 +0000 [info]: fluent/engine.rb:90:block in configure: gem 'fluent-plugin-mongo' version '0.7.10'
2016-04-19 19:51:26 +0000 [info]: fluent/engine.rb:90:block in configure: gem 'fluent-plugin-rewrite-tag-filter' version '1.4.1'
2016-04-19 19:51:26 +0000 [info]: fluent/engine.rb:90:block in configure: gem 'fluent-plugin-s3' version '0.5.9'
2016-04-19 19:51:26 +0000 [info]: fluent/engine.rb:90:block in configure: gem 'fluent-plugin-scribe' version '0.10.14'
2016-04-19 19:51:26 +0000 [info]: fluent/engine.rb:90:block in configure: gem 'fluent-plugin-td' version '0.10.27'
2016-04-19 19:51:26 +0000 [info]: fluent/engine.rb:90:block in configure: gem 'fluent-plugin-td-monitoring' version '0.2.1'
2016-04-19 19:51:26 +0000 [info]: fluent/engine.rb:90:block in configure: gem 'fluent-plugin-webhdfs' version '0.4.1'
2016-04-19 19:51:26 +0000 [info]: fluent/engine.rb:90:block in configure: gem 'fluentd' version '0.12.12'
2016-04-19 19:51:26 +0000 [info]: fluent/agent.rb:123:add_match: adding match pattern="debug.**" type="stdout"
2016-04-19 19:51:26 +0000 [info]: fluent/agent.rb:123:add_match: adding match pattern="impression_stream" type="bigquery"

thanks for any help!.

http proxy support

The plugin is unusable if an HTTP proxy is required; please support HTTP proxy.

Improve schema error handling

If a required field is null, this plugin raises an exception.
But it raises before Buffer#emit.
Because of that, the invalid data is not redirected to the secondary output plugin.
That is a problem.

Dynamic add column

BigQuery can dynamically add columns; is it a good idea to add this handling to the plugin?

To avoid checking the table schema every time, this could be added to the error handling.

The logic may look like this:
the BQ table has 5 columns

  1. generate 5 columns data -> fluentd -> bq (normal flow)
  2. generate 7 column data -> fluentd -> bq (error, trigger add column and retry)

Any suggestion?

support for multi worker like fluent-plugin-gcs

Fluentd v0.14.12 supports multi process workers.
https://www.fluentd.org/blog/fluentd-v0.14.12-has-been-released
but this plugin errored like this:

2017-07-26 16:49:31 +0900 [error]: config error file="/etc/td-agent/td-agent.conf" error_class=Fluent::ConfigError error="Plugin 'bigquery' does not support multi workers configurat
ion (Fluent::Plugin::BigQueryOutput)"

Could you support multi-process workers?

fluent-plugin-gcs may support multi process workers.
https://github.com/daichirata/fluent-plugin-gcs

Add note on spaces for field_integer/field_string/field_float/field_boolean

Great project, anyway I had some rough time trying to figure out why I keep getting null logs in Bigquery, turns out that I have spaces in between "field_integer/field_string/field_float/field_boolean".

I'm thinking that there's a high chance many new users will encounter the same issue. Hence I'm suggesting that we can either:

  1. Make a note not to have spaces for field_integer/field_string/field_float/field_boolean in the README
  2. Or clean up the spaces while processing the logs

fluent-plugin-bigquery.gemspec OLD?

Hi.

td-agent gets angry like this.

/opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/specification.rb:2112:in `raise_if_conflicts': Unable to activate fluent-plugin-bigquery-0.4.4, because fluentd-0.14.16 conflicts with fluentd (~> 0.12.0) (Gem::ConflictError)

So, is this something wrong with gem [0.4.4], or am I missing something?

thanks.

  • information
    $ rpm -qa td-agent
    td-agent-2.3.4-0.el6.x86_64
    $ /opt/td-agent/embedded/bin/gem list |grep bigquery
    fluent-plugin-bigquery (0.4.4)

  • .gemspec
    $ less /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-bigquery-0.4.4/fluent-plugin-bigquery.gemspec

# coding: utf-8
lib = File.expand_path('../lib', __FILE__)
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
require 'fluent/plugin/bigquery/version'

Gem::Specification.new do |spec|
  spec.name          = "fluent-plugin-bigquery"
  spec.version       = Fluent::BigQueryPlugin::VERSION
  spec.authors       = ["Naoya Ito", "joker1007"]
  spec.email         = ["[email protected]", "[email protected]"]
  spec.description   = %q{Fluentd plugin to store data on Google BigQuery, by load, or by stream inserts}
  spec.summary       = %q{Fluentd plugin to store data on Google BigQuery}
  spec.homepage      = "https://github.com/kaizenplatform/fluent-plugin-bigquery"
  spec.license       = "Apache-2.0"

  spec.files         = `git ls-files`.split($/)
  spec.executables   = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
  spec.test_files    = spec.files.grep(%r{^(test|spec|features)/})
  spec.require_paths = ["lib"]

  spec.add_development_dependency "rake"
  spec.add_development_dependency "rr"
  spec.add_development_dependency "test-unit"
  spec.add_development_dependency "test-unit-rr"

  spec.add_runtime_dependency "google-api-client", "~> 0.9.3"
  spec.add_runtime_dependency "googleauth", ">= 0.5.0"
  spec.add_runtime_dependency "multi_json"
  spec.add_runtime_dependency "activesupport", ">= 3.2", "< 6"
  spec.add_runtime_dependency "fluentd", "~> 0.12.0"
  spec.add_runtime_dependency "fluent-mixin-plaintextformatter", '>= 0.2.1'
  spec.add_runtime_dependency "fluent-mixin-config-placeholders", ">= 0.3.0"
  spec.add_runtime_dependency "fluent-plugin-buffer-lightening", ">= 0.0.2"
end

Dynamic table name assign support

I saw a few days ago that the plugin supports table names with a time format.
But sometimes a time-formatted table name may not be enough.
Is it possible to let the plugin read the table name from record fields?
For example:
table accesslog_%Y_%m_%{name}

"name" is one of the fileds in fluentd.

Release v0.3.0

  • Support load
  • Enhance fetch_schema
  • Enhance error handling and invalid record handing
  • Conceal secret json key string

Weird error with "Google::Apis::TransmissionError"

My fluentd with the bigquery plugin got the error message below and it seems to lead to high CPU usage; does anyone have any clue about this?

2017-03-16 15:31:40 +0800 [warn]: temporarily failed to flush the buffer. next_retry=2017-03-16 15:31:02 +0800 error_class="Google::Apis::TransmissionError" error="SSL_connect SYSCALL returned=5 errno=0 state=SSLv2/v3 read server hello A" plugin_id="object:3f847a2f9f20"
2017-03-16 15:31:41 +0800 [warn]: retry succeeded. plugin_id="object:3f847a2f9f20"
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/httpclient-2.8.2.4/lib/httpclient/ssl_socket.rb:46:in `connect'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/httpclient-2.8.2.4/lib/httpclient/ssl_socket.rb:46:in `ssl_connect'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/httpclient-2.8.2.4/lib/httpclient/ssl_socket.rb:24:in `create_socket'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/httpclient-2.8.2.4/lib/httpclient/session.rb:746:in `block in connect'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/timeout.rb:90:in `block in timeout'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/timeout.rb:100:in `call'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/timeout.rb:100:in `timeout'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/httpclient-2.8.2.4/lib/httpclient/session.rb:742:in `connect'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/httpclient-2.8.2.4/lib/httpclient/session.rb:504:in `query'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/httpclient-2.8.2.4/lib/httpclient/session.rb:174:in `query'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/httpclient-2.8.2.4/lib/httpclient.rb:1240:in `do_get_block'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/httpclient-2.8.2.4/lib/httpclient.rb:1017:in `block in do_request'
2017-03-16 15:31:43 +0800 [warn]: buffer flush took longer time than slow_flush_log_threshold: plugin_id="object:3f847994ec50" elapsed_time=45.6442592 slow_flush_log_threshold=40.0
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/httpclient-2.8.2.4/lib/httpclient.rb:1131:in `protect_keep_alive_disconnected'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/httpclient-2.8.2.4/lib/httpclient.rb:1012:in `do_request'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/httpclient-2.8.2.4/lib/httpclient.rb:854:in `request'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/google-api-client-0.9.28/lib/google/apis/core/http_client_adapter.rb:17:in `block in call'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/hurley-0.2/lib/hurley/client.rb:252:in `initialize'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/google-api-client-0.9.28/lib/google/apis/core/http_client_adapter.rb:16:in `new'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/google-api-client-0.9.28/lib/google/apis/core/http_client_adapter.rb:16:in `call'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/hurley-0.2/lib/hurley/client.rb:122:in `call_with_redirects'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/hurley-0.2/lib/hurley/client.rb:89:in `call'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/hurley-0.2/lib/hurley/client.rb:71:in `post'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/google-api-client-0.9.28/lib/google/apis/core/http_command.rb:272:in `execute_once'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/google-api-client-0.9.28/lib/google/apis/core/http_command.rb:107:in `block (2 levels) in execute'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/retriable-2.1.0/lib/retriable.rb:54:in `block in retriable'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/retriable-2.1.0/lib/retriable.rb:48:in `times'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/retriable-2.1.0/lib/retriable.rb:48:in `retriable'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/google-api-client-0.9.28/lib/google/apis/core/http_command.rb:104:in `block in execute'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/retriable-2.1.0/lib/retriable.rb:54:in `block in retriable'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/retriable-2.1.0/lib/retriable.rb:48:in `times'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/retriable-2.1.0/lib/retriable.rb:48:in `retriable'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/google-api-client-0.9.28/lib/google/apis/core/http_command.rb:96:in `execute'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/google-api-client-0.9.28/lib/google/apis/core/base_service.rb:353:in `execute_or_queue_command'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/google-api-client-0.9.28/generated/google/apis/bigquery_v2/service.rb:664:in `insert_all_table_data'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-bigquery-0.4.0/lib/fluent/plugin/bigquery/writer.rb:94:in `insert_rows'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-bigquery-0.4.0/lib/fluent/plugin/out_bigquery.rb:426:in `insert'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-bigquery-0.4.0/lib/fluent/plugin/out_bigquery.rb:421:in `block in _write'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-bigquery-0.4.0/lib/fluent/plugin/out_bigquery.rb:420:in `each'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-bigquery-0.4.0/lib/fluent/plugin/out_bigquery.rb:420:in `_write'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-bigquery-0.4.0/lib/fluent/plugin/out_bigquery.rb:352:in `write'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.33/lib/fluent/buffer.rb:354:in `write_chunk'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.33/lib/fluent/buffer.rb:333:in `pop'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.33/lib/fluent/output.rb:342:in `try_flush'
  2017-03-16 15:31:40 +0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.33/lib/fluent/output.rb:149:in `run'

Big query no longer inserting: Errno::EPIPE

Hello,
I have started to see an error with the bigquery plugin. I get a Errno::EPIPE and no further details.
Is there anything that can be done to make the logging more verbose? I am already running td-agent with -vvv
Thanks,

Sean

stacktrace:

2015-06-09 09:36:17 +0000 [warn]: fluent/output.rb:354:rescue in try_flush: temporarily failed to flush the buffer. next_retry=2015-06-09 09:36:11 +0000 error_class="Errno::EPIPE" error="Broken pipe" plugin_id="redir-vis"
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/openssl/buffering.rb:326:in `syswrite'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/openssl/buffering.rb:326:in `do_write'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/openssl/buffering.rb:344:in `write'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/net/protocol.rb:211:in `write0'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/net/protocol.rb:185:in `block in write'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/net/protocol.rb:202:in `writing'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/net/protocol.rb:184:in `write'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/net/http/generic_request.rb:184:in `send_request_with_body'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/net/http/generic_request.rb:130:in `exec'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/net/http.rb:1406:in `block in transport_request'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/net/http.rb:1405:in `catch'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/net/http.rb:1405:in `transport_request'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/net/http.rb:1378:in `request'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/net/http.rb:1371:in `block in request'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/net/http.rb:853:in `start'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/net/http.rb:1369:in `request'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/faraday-0.9.1/lib/faraday/adapter/net_http.rb:82:in `perform_request'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/faraday-0.9.1/lib/faraday/adapter/net_http.rb:40:in `block in call'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/faraday-0.9.1/lib/faraday/adapter/net_http.rb:87:in `with_net_http_connection'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/faraday-0.9.1/lib/faraday/adapter/net_http.rb:32:in `call'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/faraday-0.9.1/lib/faraday/response.rb:8:in `call'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/google-api-client-0.8.6/lib/google/api_client/request.rb:163:in `send'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/google-api-client-0.8.6/lib/google/api_client.rb:648:in `block (2 levels) in execute!'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/retriable-1.4.1/lib/retriable/retry.rb:27:in `perform'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/retriable-1.4.1/lib/retriable.rb:15:in `retriable'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/google-api-client-0.8.6/lib/google/api_client.rb:645:in `block in execute!'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/retriable-1.4.1/lib/retriable/retry.rb:27:in `perform'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/retriable-1.4.1/lib/retriable.rb:15:in `retriable'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/google-api-client-0.8.6/lib/google/api_client.rb:636:in `execute!'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/google-api-client-0.8.6/lib/google/api_client.rb:679:in `execute'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-bigquery-0.2.8/lib/fluent/plugin/out_bigquery.rb:285:in `insert'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-bigquery-0.2.8/lib/fluent/plugin/out_bigquery.rb:346:in `write'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/buffer.rb:325:in `write_chunk'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/buffer.rb:304:in `pop'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:321:in `try_flush'
  2015-06-09 09:36:17 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:140:in `run'

Not working well with copy plugin?

When I tried to use the bq plugin with the built-in copy plugin with the following td-agent.conf entry, it can insert the event into BQ but fluentd stopped forwarding events to norikra. If you comment out the store element for "to BQ", it works well. Can you please check if the BQ plugin works with the copy plugin properly?

<match nginx.access>
  type copy

  # to norikra
  <store>
    type forward
    <server>
      host 10.240.201.118
    </server>
    flush_interval 1s
  </store>

  # to BQ
  <store>
    type bigquery
    auth_method compute_engine

    project gcp-samples
    dataset gcp_samples
    table nginx3

    flush_interval 1
    buffer_chunk_records_limit 1000
    buffer_queue_limit 1024
    num_threads 50

    time_format %s
    time_field time
    field_string agent,code,host,method,path,referer,user
    field_integer time,size
  </store>

</match>

"table" doesn't work

Currently only the "tables" configuration works fine, but "table" doesn't.
Specifying "table" always causes a ConfigError.

Can't insert repeated fields

Since BigQuery expects that repeated fields are stored as arrays of values, inserting records with repeated fields fails.

Having trouble with Json formatting when this plugin to send logs to BigQuery

I am using Fluentd plugin to send logs to BigQuery, but the BQ plugin seems to change the ":" to "=>" when sending a value that is a json blob for a key to BigQuery. The logs arrive in BigQuery with key=>value pairs inside that json blob instead of key:value formatting. I have the following match definitions in td-agent.conf

<match bq..>
type copy
deep_copy true

type bigquery
auth_method json_key
json_key /home/fereshteh/keys/LL-POC-9081311ba6a0.json
project my-poc
dataset MY_POC
table LogMessage
auto_create_table true
field_string body,header
buffer_queue_limit 10240
num_threads 16
buffer_type file
buffer_path /var/log/td-agent/buffer/bq


type file
path /var/log/td-agent/bq-logtextmsg.log

Using "copy" feature, I was able to verify the source part is working correctly and that the copied logs do show the correct formatting of the json logs, key:value. However, in BigQuery they show up as key=>value. Any suggestions on how to change that to use ":"? BigQuery json_extract functions doesn't like "=>" and expects ":"s.

Here is a log as was saved in bq-logtextmsg.log in the copy section:
{"body":{"asset_id":"00000000","loc_id":"76fd-7e32","sender_id":"8d512d0f ....... },"header {"topic":"LogTextMessage","time":"2015-12-03T13:12:01","host":".... }}

Here is the log as shows up in BigQuery, in two fields header and body:
"body" field value: {"asset_id"=>"00000000", "loc_id"=>"76fd-7e32", "sender_id"=>".......
"header" field value: {"topic"=>"LogTextMessage", "time"=>"2015-12-03T13:12:01", "host"=>"s......

As you can see, the colons used between key/value pairs in the content of the header and body have been converted to "=>".

Any suggestions?

certificate verify failed on windows

Seems to be an overall ruby problem on windows with certificate validation, just wondering is there an easy way to turn off verification?

After looking around, it seems that this is a known issue, and it might be fixed by downloading cacert.pem and pointing to it via the SSL_CERT_FILE environment variable. Not sure if this is a good solution, but in case it helps someone.

Configurable wait_interval

Environments

  • fluentd version: 14
  • plugin version: 1.0.0

I'm considering making wait_interval configurable in the version of fluent-plugin-bigquery that I am using. The current wait time for jobs is too short and my debug view is very very chatty. Would you like a PR for this?

Thread Errors - Deadlock? Other source?

I'm seeing errors in fluent-plugin-bigquery that seem similar to deadlock issues that have been reported in general with FluentD 14. However, since the errors specifically reference fluent-plugin-bigquery, I wanted to drop them in here. Things to know: restart takes a long time and finally dies with these errors.

2017-08-24 16:41:41 +0000 [warn]: #1 [bqPumpBidWon] thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::BigQueryOutput title=:flush_thread_0 thread=#<Thread:0x007efd4a26fc20@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.20/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2017-08-24 16:41:41 +0000 [warn]: #0 [bqPumpBidResponse] thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::BigQueryOutput title=:flush_thread_0 thread=#<Thread:0x007f264245c210@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.20/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2017-08-24 16:41:41 +0000 [warn]: #1 [bqPumpBidRequested] thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::BigQueryOutput title=:flush_thread_0 thread=#<Thread:0x007efd4a25e3f8@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.20/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2017-08-24 16:41:41 +0000 [warn]: #0 [bqPumpBidTimeout] thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::BigQueryOutput title=:flush_thread_0 thread=#<Thread:0x007f2642466a08@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.20/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2017-08-24 16:41:41 +0000 [warn]: #1 [bqauctionInit] thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::BigQueryOutput title=:enqueue_thread thread=#<Thread:0x007efd4a23f0e8@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.20/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2017-08-24 16:41:41 +0000 [warn]: #0 [bqauctionInit] thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::BigQueryOutput title=:flush_thread_2 thread=#<Thread:0x007f264242faa8@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.20/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2017-08-24 16:41:41 +0000 [warn]: #1 [bqauctionInit] thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::BigQueryOutput title=:flush_thread_2 thread=#<Thread:0x007efd4a245718@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.20/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2017-08-24 16:41:41 +0000 [warn]: #1 [bqauctionInit] thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::BigQueryOutput title=:flush_thread_3 thread=#<Thread:0x007efd4a244368@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.20/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2017-08-24 16:41:41 +0000 [warn]: #1 thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::ForwardInput title=:event_loop thread=#<Thread:0x007efd4a237c08@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.20/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2017-08-24 16:41:41 +0000 [warn]: #1 [bqPumpBidResponse] thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::BigQueryOutput title=:flush_thread_0 thread=#<Thread:0x007efd4a267c78@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.20/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2017-08-24 16:41:41 +0000 [warn]: #1 [bqauctionInit] thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::BigQueryOutput title=:flush_thread_1 thread=#<Thread:0x007efd4a247450@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.20/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2017-08-24 16:41:41 +0000 [warn]: #1 [bqPumpBidTimeout] thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::BigQueryOutput title=:flush_thread_0 thread=#<Thread:0x007efd4a27b958@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.20/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2017-08-24 16:41:41 +0000 [warn]: #1 [bqauctionInit] thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::BigQueryOutput title=:flush_thread_0 thread=#<Thread:0x007efd4a24e1d8@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.20/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2017-08-24 16:41:41 +0000 [warn]: #0 thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::ForwardInput title=:event_loop thread=#<Thread:0x007f26424136f0@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.20/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2017-08-24 16:41:41 +0000 [warn]: #0 [bqauctionInit] thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::BigQueryOutput title=:flush_thread_0 thread=#<Thread:0x007f2642435020@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.20/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2017-08-24 16:41:41 +0000 [warn]: #0 [bqauctionInit] thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::BigQueryOutput title=:flush_thread_3 thread=#<Thread:0x007f264242eec8@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.20/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2017-08-24 16:41:41 +0000 [warn]: #0 [bqauctionInit] thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::BigQueryOutput title=:flush_thread_1 thread=#<Thread:0x007f2642434508@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.20/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2017-08-24 16:41:41 +0000 [warn]: #0 [bqPumpBidRequested] thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::BigQueryOutput title=:flush_thread_0 thread=#<Thread:0x007f264244dcd8@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.20/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2017-08-24 16:41:41 +0000 [warn]: #0 [bqPumpBidWon] thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::BigQueryOutput title=:flush_thread_0 thread=#<Thread:0x007f26424647d0@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.20/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2017-08-24 16:41:41 +0000 [warn]: #0 [bqPumpBidRequested] thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::BigQueryOutput title=:enqueue_thread thread=#<Thread:0x007f264244d3a0@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.20/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil

Error occurs in ruby-2.1.x

Hi,

I tried fluent-plugin-bigquery-0.2.4, but it does not seem to work with ruby-2.1.4-p265, while it works with ruby-2.0.x.

Does fluent-plugin-bigquery not support ruby-2.1.x yet?

2014-11-05 16:16:27 +0900 [error]: unexpected error error_class=ArgumentError error=#<ArgumentError: unknown keyword: interval>
  2014-11-05 16:16:27 +0900 [error]: /Users/bokko/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/google-api-client-0.7.1/lib/google/api_client.rb:595:in `execute!'
  2014-11-05 16:16:27 +0900 [error]: /Users/bokko/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/google-api-client-0.7.1/lib/google/api_client.rb:330:in `discovery_document'
  2014-11-05 16:16:27 +0900 [error]: /Users/bokko/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/google-api-client-0.7.1/lib/google/api_client.rb:375:in `discovered_api'
  2014-11-05 16:16:27 +0900 [error]: /Users/bokko/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/fluent-plugin-bigquery-0.2.4/lib/fluent/plugin/out_bigquery.rb:205:in `start'
  2014-11-05 16:16:27 +0900 [error]: /Users/bokko/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/fluentd-0.10.56/lib/fluent/match.rb:40:in `start'
  2014-11-05 16:16:27 +0900 [error]: /Users/bokko/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/fluentd-0.10.56/lib/fluent/engine.rb:263:in `block in start'
  2014-11-05 16:16:27 +0900 [error]: /Users/bokko/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/fluentd-0.10.56/lib/fluent/engine.rb:262:in `each'
  2014-11-05 16:16:27 +0900 [error]: /Users/bokko/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/fluentd-0.10.56/lib/fluent/engine.rb:262:in `start'
  2014-11-05 16:16:27 +0900 [error]: /Users/bokko/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/fluentd-0.10.56/lib/fluent/engine.rb:213:in `run'
  2014-11-05 16:16:27 +0900 [error]: /Users/bokko/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/fluentd-0.10.56/lib/fluent/supervisor.rb:464:in `run_engine'
  2014-11-05 16:16:27 +0900 [error]: /Users/bokko/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/fluentd-0.10.56/lib/fluent/supervisor.rb:135:in `block in start'
  2014-11-05 16:16:27 +0900 [error]: /Users/bokko/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/fluentd-0.10.56/lib/fluent/supervisor.rb:250:in `call'
  2014-11-05 16:16:27 +0900 [error]: /Users/bokko/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/fluentd-0.10.56/lib/fluent/supervisor.rb:250:in `main_process'
  2014-11-05 16:16:27 +0900 [error]: /Users/bokko/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/fluentd-0.10.56/lib/fluent/supervisor.rb:225:in `block in supervise'
  2014-11-05 16:16:27 +0900 [error]: /Users/bokko/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/fluentd-0.10.56/lib/fluent/supervisor.rb:224:in `fork'
  2014-11-05 16:16:27 +0900 [error]: /Users/bokko/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/fluentd-0.10.56/lib/fluent/supervisor.rb:224:in `supervise'
  2014-11-05 16:16:27 +0900 [error]: /Users/bokko/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/fluentd-0.10.56/lib/fluent/supervisor.rb:128:in `start'
  2014-11-05 16:16:27 +0900 [error]: /Users/bokko/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/fluentd-0.10.56/lib/fluent/command/fluentd.rb:164:in `<top (required)>'
  2014-11-05 16:16:27 +0900 [error]: /Users/bokko/.rbenv/versions/2.1.4/lib/ruby/2.1.0/rubygems/core_ext/kernel_require.rb:55:in `require'
  2014-11-05 16:16:27 +0900 [error]: /Users/bokko/.rbenv/versions/2.1.4/lib/ruby/2.1.0/rubygems/core_ext/kernel_require.rb:55:in `require'
  2014-11-05 16:16:27 +0900 [error]: /Users/bokko/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/fluentd-0.10.56/bin/fluentd:6:in `<top (required)>'
  2014-11-05 16:16:27 +0900 [error]: /Users/bokko/.rbenv/versions/2.1.4/bin/fluentd:23:in `load'
  2014-11-05 16:16:27 +0900 [error]: /Users/bokko/.rbenv/versions/2.1.4/bin/fluentd:23:in `<main>'

The following is my configuration.

fluentd.conf

<source>
  type forward
  port 24224
</source>

<match access.log>
  type bigquery

  method insert

  flush_interval 1

  auth_method private_key
  email [email protected]
  private_key_path /path/xxx.p12

  project yourproject-id
  dataset yourdataset-id
  table tablename

  time_format %s
  time_field time
  schema_path /path/schema.json
</match>

schema.json

[
  {
    "name": "time",
    "type": "INTEGER"
  },
  {
    "name": "uri",
    "type": "STRING"
  }
]

Feature Request: Log Load Failures or Move Failed Logs

We are moving many of our streaming jobs to load jobs because their volume of data has become very large. With streaming inserts, a bad insert would fail and be put back into the retry queue; failed load jobs are simply deleted. Is it possible to log these or move the buffer file somewhere on failure? Or is there some other way to handle this?
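One possible mitigation (a sketch assuming a fluentd v1 setup, where the built-in secondary_file output is available; it is not something this plugin documents for load jobs): attach a <secondary> section so chunks whose retries are exhausted are written to disk instead of being discarded. All names and paths below are placeholders.

<match your.tag>
  @type bigquery_load            # plugin v2+ load output; adjust to your version/method
  auth_method json_key
  json_key /path/to/key.json
  project your-project-id
  dataset your_dataset
  table your_table
  schema_path /path/to/schema.json

  <secondary>
    # chunks that finally fail to load are kept here for inspection / re-import
    @type secondary_file
    directory /var/log/td-agent/failed_bigquery
  </secondary>
</match>

Whether the failover actually triggers depends on the plugin and fluentd versions in use (see the secondary-output report below).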

Auto table creation fails when table partition is specified

If auto_create_table is true and table contains a partition specifier, automatic table creation fails. For example, if you have the following config

<match test.**>
  @type bigquery
  method load

  buffer_type memory

  auth_method json_key
  json_key json_key.json
  project your-project-id
  dataset test

  auto_create_table true
  time_slice_format %Y%m%d
  table test$%{time_slice}
  time_partitioning_type day
  schema_path sample.schema
</match>

and the table test does not exist, the plugin tries to create the table test$20161212, which fails with the following error:

2016-12-12 11:46:58 +0900 [error]: tables.insert API
  project_id="your-project-id" dataset="test" table="test$20161212" code=400
  message="invalid: Invalid table ID \"test$20161212\". Table IDs must be
  alphanumeric (plus underscores) and must be at most 1024 characters long."
  reason=nil

Thanks.

Secondary Output Not Working in FluentD 14?

I'm working on converting a v0.12 config over to v0.14 because I need to dynamically name the tables. Everything is working for the most part, but the secondary output will not work regardless of what I try. My config is below. This is one of two store elements in a copy section; the other is a flowcounter. I left that surrounding config out.

I've disabled "ignore_unknown_values true" to force some extra data to fail the load job.

Can you provide any guidance here? I've tried a ton of configs and none work. The jobs are buffered, but never fail over to the failed folder. This worked well in V12, but I need the dynamic table naming for load jobs.

<store>
    @type bigquery
    @id bqTablePump
    #auth_method application_default
    auth_method json_key
    json_key /vagrant/creds.json

    project project
    dataset dataset
    table ${event_type}_${bq_guid}

    fetch_schema true
    max_bad_records 5
    #ignore_unknown_values true

    path bigquery.${event_type}.${bq_guid}.*.log

    <buffer event_type, bq_guid>
      @type file
      flush_interval 30s
      flush_at_shutdown true
      timekey_use_utc
    </buffer>

     <secondary>
        @type file # or forward
        path /var/log/td-agent/buffer/failed
     </secondary>
  </store>

I think I found the source of the issue. In version 1.0 the plugin does the following to handle retries, but it never seems to flush the buffers to the secondary the way the previous version did.

Version for Fluentd 0.14:

        rescue Fluent::BigQuery::Error => e
          raise if e.retryable?


          if @secondary
            # TODO: find better way
            @retry = retry_state_create(
              :output_retries, @buffer_config.retry_type, @buffer_config.retry_wait, @buffer_config.retry_timeout,
              forever: false, max_steps: @buffer_config.retry_max_times, backoff_base: @buffer_config.retry_exponential_backoff_base,
              max_interval: @buffer_config.retry_max_interval,
              secondary: true, secondary_threshold: Float::EPSILON,
              randomize: @buffer_config.retry_randomize
            )
          else
            @retry = retry_state_create(
              :output_retries, @buffer_config.retry_type, @buffer_config.retry_wait, @buffer_config.retry_timeout,
              forever: false, max_steps: 0, backoff_base: @buffer_config.retry_exponential_backoff_base,
              max_interval: @buffer_config.retry_max_interval,
              randomize: @buffer_config.retry_randomize
            )
          end


          raise
        end

For Fluentd 0.12:

      rescue Fluent::BigQuery::Error => e
        if e.retryable?
          raise e
        elsif @secondary
          flush_secondary(@secondary)

Cannot load such file "bigdecimal/util"

I am getting a new error after the activesupport update (#103):

~ $ fluentd -c /home/fluent/fluentd.conf -p /fluentd/plugins
2016-12-19 11:10:41 +0000 [info]: reading config file path="/home/fluent/fluentd.conf"
2016-12-19 11:10:41 +0000 [info]: starting fluentd-0.12.31
2016-12-19 11:10:42 +0000 [info]: gem 'fluent-mixin-config-placeholders' version '0.4.0'
2016-12-19 11:10:42 +0000 [info]: gem 'fluent-mixin-plaintextformatter' version '0.2.6'
2016-12-19 11:10:42 +0000 [info]: gem 'fluent-plugin-bigquery' version '0.3.4'
2016-12-19 11:10:42 +0000 [info]: gem 'fluent-plugin-buffer-lightening' version '0.0.2'
2016-12-19 11:10:42 +0000 [info]: gem 'fluentd' version '0.12.31'
2016-12-19 11:10:42 +0000 [info]: adding match pattern="nginx.shop-access" type="bigquery"
/usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- bigdecimal/util (LoadError)
        from /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
        from /home/fluent/.gem/ruby/2.3.0/gems/activesupport-5.0.0.1/lib/active_support/core_ext/big_decimal/conversions.rb:2:in `<top (required)>'
        from /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
        from /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
        from /home/fluent/.gem/ruby/2.3.0/gems/activesupport-5.0.0.1/lib/active_support/core_ext/object/json.rb:4:in `<top (required)>'
        from /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
        from /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
        from /home/fluent/.gem/ruby/2.3.0/gems/activesupport-5.0.0.1/lib/active_support/json/encoding.rb:1:in `<top (required)>'
        from /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
        from /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
        from /home/fluent/.gem/ruby/2.3.0/gems/activesupport-5.0.0.1/lib/active_support/json.rb:2:in `<top (required)>'
        from /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
        from /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
        from /home/fluent/.gem/ruby/2.3.0/gems/fluent-plugin-bigquery-0.3.4/lib/fluent/plugin/out_bigquery.rb:167:in `initialize'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/plugin.rb:132:in `new'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/plugin.rb:132:in `new_impl'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/plugin.rb:59:in `new_output'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/agent.rb:131:in `add_match'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/agent.rb:64:in `block in configure'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/agent.rb:57:in `each'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/agent.rb:57:in `configure'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/root_agent.rb:86:in `configure'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/engine.rb:129:in `configure'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/engine.rb:103:in `run_configure'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/supervisor.rb:489:in `run_configure'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/supervisor.rb:160:in `block in start'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/supervisor.rb:366:in `main_process'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/supervisor.rb:339:in `block in supervise'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/supervisor.rb:338:in `fork'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/supervisor.rb:338:in `supervise'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/supervisor.rb:156:in `start'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/lib/fluent/command/fluentd.rb:173:in `<top (required)>'
        from /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
        from /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
        from /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.12.31/bin/fluentd:5:in `<top (required)>'
        from /usr/bin/fluentd:23:in `load'
        from /usr/bin/fluentd:23:in `<main>'
2016-12-19 11:10:42 +0000 [info]: process finished code=256
2016-12-19 11:10:42 +0000 [warn]: process died within 1 second. exit.

Support for formatters

Hi there :)

I was wondering if you could enhance this plugin with the ability to support formatters.

The simple use case is that anyone who wishes to send data to BigQuery in the Avro format is unable to do so at the moment. There is already an Avro formatter plugin, but it cannot be used together with this plugin.

Thank you and best wishes,
Luki

Update the document for 100k rps support

BigQuery streaming inserts now support 100k rows per second. Can you update the documentation to reflect this, e.g. the best parameter values for buffer_chunk_records_limit, buffer_queue_limit, and num_threads for 100k rps insertion?

Setup instructions

Environments

  • fluentd version: fluentd-0.12.35
  • plugin version: N/A

I recently installed fluentd on ubuntu-trusty and am wondering how I can install this plugin. I am new to Fluentd setup.

Appreciate any help.

Thank you,
Ashish

REPEATED functionality needed

We are actively using the plugin for event shipping, and our roles field was recently changed to REPEATED mode. As far as I know, the plugin doesn't support repeated fields yet. Can this feature be added as a high priority?

Plugin with PHP input?

I am trying to test this fluentd plugin with PHP input; however, I get the following problem:

2016-06-03 12:06:06 -0400 [warn]: temporarily failed to flush the buffer. next_retry=2016-06-03 12:06:07 -0400 error_class="MultiJson::ParseError" error="invalid comment format at line 1, column 1 [parse.c:96]" plugi$
  2016-06-03 12:06:06 -0400 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/multi_json-1.11.0/lib/multi_json/adapters/oj.rb:15:in `load'
  2016-06-03 12:06:06 -0400 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/multi_json-1.11.0/lib/multi_json/adapters/oj.rb:15:in `load'
  2016-06-03 12:06:06 -0400 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/multi_json-1.11.0/lib/multi_json/adapter.rb:21:in `load'
  2016-06-03 12:06:06 -0400 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/multi_json-1.11.0/lib/multi_json.rb:119:in `load'
  2016-06-03 12:06:06 -0400 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/googleauth-0.5.1/lib/googleauth/service_account.rb:75:in `read_json_key'
  2016-06-03 12:06:06 -0400 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/googleauth-0.5.1/lib/googleauth/service_account.rb:59:in `make_creds'
  2016-06-03 12:06:06 -0400 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-bigquery-0.2.16/lib/fluent/plugin/out_bigquery.rb:272:in `client'
  2016-06-03 12:06:06 -0400 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-bigquery-0.2.16/lib/fluent/plugin/out_bigquery.rb:419:in `insert'
  2016-06-03 12:06:06 -0400 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-bigquery-0.2.16/lib/fluent/plugin/out_bigquery.rb:414:in `block in _write'
  2016-06-03 12:06:06 -0400 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-bigquery-0.2.16/lib/fluent/plugin/out_bigquery.rb:413:in `each'
  2016-06-03 12:06:06 -0400 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-bigquery-0.2.16/lib/fluent/plugin/out_bigquery.rb:413:in `_write'
  2016-06-03 12:06:06 -0400 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-bigquery-0.2.16/lib/fluent/plugin/out_bigquery.rb:366:in `write'
  2016-06-03 12:06:06 -0400 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/buffer.rb:345:in `write_chunk'
  2016-06-03 12:06:06 -0400 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/buffer.rb:324:in `pop'
  2016-06-03 12:06:06 -0400 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/output.rb:329:in `try_flush'
  2016-06-03 12:06:06 -0400 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/output.rb:140:in `run'

The data I'm sending does have some JSON-encoded data (inside the JSON), like this:

$array = [
 'one'=>'{"something": [1,2,3,4]}',
 'two'=>2
];

Any insight on what might cause this issue?

queue_length_limit is ignored

Error "queue_length_limit" ignore happens with latest fluent v0.14.20 & fluent-plugin-bigquery v1.0.0.

Config:

<match filtered>
  @type  bigquery
  method insert

  <buffer>
    @type               file
    path                /var/log/td-agent/buffer/fluentd_log_buffer
    flush_interval      1
    queue_length_limit  10000
    chunk_limit_size    1M
    chunk_records_limit 200
  </buffer>
:
:

warning:

[warn]: parameter 'queue_length_limit' in <buffer>
  @type "file"
  path "/var/log/td-agent/buffer/fluentd_log_buffer"
  flush_interval 1
  queue_length_limit 10000
  chunk_limit_size 1M
  chunk_records_limit 200
  flush_mode interval
  flush_thread_interval 0.05
  flush_thread_burst_interval 0.05
  total_limit_size 1073741824
</buffer> is not used.
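For reference, a sketch of how the same intent might be expressed with fluentd v1 buffer parameters (assuming a reasonably recent fluentd v1; the old v0.12 name queue_length_limit is not a v1 buffer parameter, which is what the warning above is saying):

<buffer>
  @type               file
  path                /var/log/td-agent/buffer/fluentd_log_buffer
  flush_interval      1
  chunk_limit_size    1M
  chunk_limit_records 200            # v1 name for the per-chunk record limit
  queued_chunks_limit_size 10000     # caps the number of queued chunks (recent fluentd v1 releases)
  total_limit_size    1g             # caps the total buffered size
</buffer>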

Support for Record's timestamp key in formatting Table Id

Hi,

We have a requirement where we want to be sure that records are going into the correct BigQuery table based upon their timestamp.
Currently, table IDs are formatted based on the local time of the fluentd server: "Note that the timestamp of logs and the date in the table id do not always match, because there is a time lag between collection and transmission of logs." I would like to know the reason behind this limitation. Also, I'm not familiar with Ruby; would it be possible to make some minor changes in the plugin to achieve this?

Thanks for your help.
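For what it's worth, a sketch of how this can be approached with fluentd v1-style buffers (plugin v1.x or later), where table placeholders are expanded from buffer chunk keys rather than the server's wall clock; with a time chunk key, %Y%m%d reflects the record timestamps. The tag, paths, and names below are placeholders, and the flush settings are illustrative only.

<match access.**>
  @type bigquery_insert
  auth_method json_key
  json_key /path/to/key.json
  project your-project-id
  dataset your_dataset
  # expanded per chunk from its timekey, i.e. from the event time
  table access_%Y%m%d
  schema_path /path/to/schema.json

  <buffer time>
    @type file
    path /var/log/td-agent/buffer/bq_access
    timekey 1d               # chunks are keyed by event time, one per day
    timekey_wait 10m         # grace period for late records
    flush_mode interval      # still flush frequently instead of once per day
    flush_interval 10s
  </buffer>
</match>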

v0.2.15 conflicts with fluent-plugin-td-monitoring

STDERR: /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/specification.rb:2064:in `raise_if_conflicts': Unable to activate google-api-client-0.9.1, because httpclient-2.5.3.3 conflicts with httpclient (~> 2.7) (Gem::LoadError)

Reason:
td-agent 2.3.0's built-in plugin fluent-plugin-td-monitoring requires httpclient [">= 2.4.0", "< 2.6.0"], but fluent-plugin-bigquery requires httpclient '~> 2.7' via google-api-client 0.9.1.

lexical error: invalid bytes in UTF8 string

Periodically I catch the following error:

2017-07-29 11:46:43 +0300 [warn]: #0 plugin/output.rb:1096:block in update_retry_state: failed to flush the buffer. retry_time=0 next_retry_seconds=2017-07-29 11:46:43 +0300 chunk="55570d4efaf3ca91e7b235dffef577db" error_class=MultiJson::ParseError error="lexical error: invalid bytes in UTF8 string.\n          w=1&hl=ru&ie=windows-1251&q=\xF0\xEE\xE1\xEE\xF2\xE0 ua\",\"host\":\"rabota.ua\",\"s\n                     (right here) ------^\n"

I'm not sure whether this is a BigQuery, fluentd, or Ruby issue.

After looking around, I added .scrub.strip to all fields that might cause this issue in a record_transformer filter, but still with no luck.
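For context, this is roughly the kind of filter described above (a sketch assuming fluentd's built-in record_transformer filter with enable_ruby; the tag and the field name message are placeholders for whatever fields carry user-supplied text):

<filter nginx.access>
  @type record_transformer
  enable_ruby true
  <record>
    # Replace invalid UTF-8 byte sequences and trim whitespace before the
    # record reaches the BigQuery output.
    message ${record["message"].to_s.scrub("?").strip}
  </record>
</filter>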

Only Send Fields Matching the Schema

Is it possible to add, or does the plugin already support, the ability to ignore fields that don't match the schema? We have some instances where inconsistent data flows into our fluentd. I would rather keep the fields that match and drop only the ones that don't, instead of rejecting the whole row. Is this possible? Basically, the schema would act like a sieve instead of a block.

Could you bump up the gem version?

I'd like to use features added since v0.2.16 (especially my pull req, #82).
So if there's no blocker, could you bump up the gem version and publish it to rubygems.org?

Retryable errors should be logged at warning level

Hi, thank you for developing a useful plugin.

The BigQuery API often returns 500 or 503 errors, and in most cases the API call succeeds on retry.
When an API call fails, errors and warnings are logged as follows:

fluent.error

2017-04-02T21:16:41+09:00       fluent.error    {"project_id":"XXXXX","dataset":"server_logs","table":"reverse_proxy_nginx_access_log$20170402","code":503,"message":"tabledata.insertAll API project_id=\"XXXXX\" dataset=\"server_logs\" table=\"reverse_proxy_nginx_access_log$20170402\" code=503 message=\"Server error\" reason=nil","reason":null}

fluent.warn

2017-04-02T21:16:41+09:00       fluent.warn     {"next_retry":"2017-04-02 21:16:36 +0900","error_class":"Fluent::BigQuery::RetryableError","error":"Server error","plugin_id":"bigquery_reverse_proxy_nginx_access_log","message":"temporarily failed to flush the buffer. next_retry=2017-04-02 21:16:36 +0900 error_class=\"Fluent::BigQuery::RetryableError\" error=\"Server error\" plugin_id=\"bigquery_reverse_proxy_nginx_access_log\""}
2017-04-02T21:16:42+09:00       fluent.warn     {"plugin_id":"bigquery_reverse_proxy_nginx_access_log","message":"retry succeeded. plugin_id=\"bigquery_reverse_proxy_nginx_access_log\""}

I'm monitoring the fluentd error log, and it is noisy that these are often recorded at the error level.
Of course it depends on the workload, but in my environment this occurs about once every few days.

I know it is natural to record this as an error because the BigQuery API returned an error.
However, fluentd itself only seems to output an error once the retry limit is reached and the retry finally fails.
So I think the warning level is more suitable for retryable errors.
What do you think?
