googleapis / nodejs-bigquery

Node.js client for Google Cloud BigQuery: A fast, economical and fully-managed enterprise data warehouse for large-scale data analytics.

Home Page: https://cloud.google.com/bigquery/

License: Apache License 2.0

Languages: JavaScript 0.86%, TypeScript 99.01%, Python 0.12%
Topics: nodejs, database, sql, bigquery

nodejs-bigquery's Introduction


Google BigQuery Client Library for Node.js

A comprehensive list of changes in each version may be found in the CHANGELOG.

Read more about the client libraries for Cloud APIs, including the older Google APIs Client Libraries, in Client Libraries Explained.

Table of contents:

Quickstart

Before you begin

  1. Select or create a Cloud Platform project.
  2. Enable the Google BigQuery API.
  3. Set up authentication with a service account so you can access the API from your local workstation.

Installing the client library

npm install @google-cloud/bigquery

Using the client library

// Imports the Google Cloud client library
const {BigQuery} = require('@google-cloud/bigquery');

// The name for the new dataset, e.g. 'my_new_dataset'
const datasetName = 'my_new_dataset';

async function createDataset() {
  // Creates a client
  const bigqueryClient = new BigQuery();

  // Create the dataset
  const [dataset] = await bigqueryClient.createDataset(datasetName);
  console.log(`Dataset ${dataset.id} created.`);
}
createDataset();
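If you prefer not to rely on Application Default Credentials (step 3 above), the client constructor also accepts explicit options. A minimal sketch; the project ID and key file path are placeholders:

// Imports the Google Cloud client library
const {BigQuery} = require('@google-cloud/bigquery');

// Both options are optional; when omitted, the client falls back to
// Application Default Credentials and the ambient project.
const bigqueryClient = new BigQuery({
  projectId: 'my-project-id',           // placeholder project ID
  keyFilename: '/path/to/keyfile.json', // placeholder service account key file
});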

Samples

Samples are in the samples/ directory. Each sample's README.md has instructions for running it.

Sample Source Code Try it
Add Column Load Append source code Open in Cloud Shell
Add Column Query Append source code Open in Cloud Shell
Add Empty Column source code Open in Cloud Shell
Auth View Tutorial source code Open in Cloud Shell
Browse Table source code Open in Cloud Shell
Cancel Job source code Open in Cloud Shell
Client JSON Credentials source code Open in Cloud Shell
Copy Table source code Open in Cloud Shell
Copy Table Multiple Source source code Open in Cloud Shell
Create Dataset source code Open in Cloud Shell
Create Job source code Open in Cloud Shell
Create Model source code Open in Cloud Shell
Create Routine source code Open in Cloud Shell
Create Routine DDL source code Open in Cloud Shell
Create Table source code Open in Cloud Shell
Create Table Clustered source code Open in Cloud Shell
Create Table Column ACL source code Open in Cloud Shell
Create Table Partitioned source code Open in Cloud Shell
Create Table Range Partitioned source code Open in Cloud Shell
Create View source code Open in Cloud Shell
Ddl Create View source code Open in Cloud Shell
Delete Dataset source code Open in Cloud Shell
Delete Label Dataset source code Open in Cloud Shell
Delete Label Table source code Open in Cloud Shell
Delete Model source code Open in Cloud Shell
Delete Routine source code Open in Cloud Shell
Delete Table source code Open in Cloud Shell
Extract Table Compressed source code Open in Cloud Shell
Extract Table JSON source code Open in Cloud Shell
Extract Table To GCS source code Open in Cloud Shell
Get Dataset source code Open in Cloud Shell
Get Dataset Labels source code Open in Cloud Shell
Get Job source code Open in Cloud Shell
BigQuery Get Model source code Open in Cloud Shell
Get Routine source code Open in Cloud Shell
BigQuery Get Table source code Open in Cloud Shell
Get Table Labels source code Open in Cloud Shell
Get View source code Open in Cloud Shell
Insert Rows As Stream source code Open in Cloud Shell
Inserting Data Types source code Open in Cloud Shell
BigQuery Label Dataset source code Open in Cloud Shell
Label Table source code Open in Cloud Shell
List Datasets source code Open in Cloud Shell
List Datasets By Label source code Open in Cloud Shell
List Jobs source code Open in Cloud Shell
BigQuery List Models source code Open in Cloud Shell
BigQuery List Models Streaming source code Open in Cloud Shell
List Routines source code Open in Cloud Shell
List Tables source code Open in Cloud Shell
Load CSV From GCS source code Open in Cloud Shell
Load CSV From GCS Autodetect source code Open in Cloud Shell
Load CSV From GCS Truncate source code Open in Cloud Shell
Load JSON From GCS source code Open in Cloud Shell
Load JSON From GCS Autodetect source code Open in Cloud Shell
Load JSON From GCS Truncate source code Open in Cloud Shell
Load Local File source code Open in Cloud Shell
Load Orc From GCS Truncate source code Open in Cloud Shell
Load Parquet From GCS Truncate source code Open in Cloud Shell
Load Table Clustered source code Open in Cloud Shell
Load Table GCS Avro source code Open in Cloud Shell
Load Table GCS Avro Truncate source code Open in Cloud Shell
Load Table GCSORC source code Open in Cloud Shell
Load Table GCS Parquet source code Open in Cloud Shell
Load Table Partitioned source code Open in Cloud Shell
Load Table URI Firestore source code Open in Cloud Shell
Nested Repeated Schema source code Open in Cloud Shell
Query source code Open in Cloud Shell
Query Batch source code Open in Cloud Shell
Query Clustered Table source code Open in Cloud Shell
Query Destination Table source code Open in Cloud Shell
Query Disable Cache source code Open in Cloud Shell
Query Dry Run source code Open in Cloud Shell
Query External GCS Perm source code Open in Cloud Shell
Query External GCS Temp source code Open in Cloud Shell
Query Legacy source code Open in Cloud Shell
Query Legacy Large Results source code Open in Cloud Shell
Query Pagination source code Open in Cloud Shell
Query Params Arrays source code Open in Cloud Shell
Query Params Named source code Open in Cloud Shell
Query Params Named Types source code Open in Cloud Shell
Query Params Positional source code Open in Cloud Shell
Query Params Positional Types source code Open in Cloud Shell
Query Params Structs source code Open in Cloud Shell
Query Params Timestamps source code Open in Cloud Shell
Query Stack Overflow source code Open in Cloud Shell
Quickstart source code Open in Cloud Shell
Relax Column source code Open in Cloud Shell
Relax Column Load Append source code Open in Cloud Shell
Relax Column Query Append source code Open in Cloud Shell
Remove Table Clustering source code Open in Cloud Shell
Set Client Endpoint source code Open in Cloud Shell
Set User Agent source code Open in Cloud Shell
Table Exists source code Open in Cloud Shell
Undelete Table source code Open in Cloud Shell
Update Dataset Access source code Open in Cloud Shell
Update Dataset Description source code Open in Cloud Shell
Update Dataset Expiration source code Open in Cloud Shell
BigQuery Update Model source code Open in Cloud Shell
Update Routine source code Open in Cloud Shell
Update Table Column ACL source code Open in Cloud Shell
Update Table Description source code Open in Cloud Shell
Update Table Expiration source code Open in Cloud Shell
Update View Query source code Open in Cloud Shell

The Google BigQuery Node.js Client API Reference documentation also contains samples.

Supported Node.js Versions

Our client libraries follow the Node.js release schedule. Libraries are compatible with all current active and maintenance versions of Node.js. If you are using an end-of-life version of Node.js, we recommend that you update as soon as possible to an actively supported LTS version.

Google's client libraries support legacy versions of Node.js runtimes on a best-efforts basis with the following warnings:

  • Legacy versions are not tested in continuous integration.
  • Some security patches and features cannot be backported.
  • Dependencies cannot be kept up-to-date.

Client libraries targeting some end-of-life versions of Node.js are available, and can be installed through npm dist-tags. The dist-tags follow the naming convention legacy-(version). For example, npm install @google-cloud/bigquery@legacy-8 installs client libraries for versions compatible with Node.js 8.

Versioning

This library follows Semantic Versioning.

This library is considered to be stable. The code surface will not change in backwards-incompatible ways unless absolutely necessary (e.g. because of critical security issues) or with an extensive deprecation period. Issues and requests against stable libraries are addressed with the highest priority.

More Information: Google Cloud Platform Launch Stages

Contributing

Contributions welcome! See the Contributing Guide.

Please note that this README.md, the samples/README.md, and a variety of configuration files in this repository (including .nycrc and tsconfig.json) are generated from a central template. To edit one of these files, make the edit to the corresponding template in the synthtool repository.

License

Apache Version 2.0

See LICENSE

nodejs-bigquery's People

Contributors

alexander-fenster, alixhami, alvarowolfx, bcoe, c0b, callmehiphop, crwilcox, dpebot, fhinkel, gcf-owl-bot[bot], greenkeeper[bot], iida-hayato, jkwlui, jmdobry, justinbeckwith, laljikanjareeya, loferris, lukesneeringer, meredithslota, release-please[bot], renovate-bot, renovate[bot], shollyman, sofisl, steffnay, stephenplusplus, surferjeffatgoogle, tswast, yoshi-automation, zamnuts


nodejs-bigquery's Issues

Invalid timestamp value

I'm seeing this locally and in CI:

 1) BigQuery
       BigQuery/Table
         importing & exporting
           should start export data to a storage file:
     ApiError: Invalid timestamp value: 1414634759012000000
      at src\job.js:427:13
      at node_modules\@google-cloud\common\src\service-object.js:274:5
      at Object.handleResp (node_modules\@google-cloud\common\src\util.js:140:3)
      at node_modules\@google-cloud\common\src\util.js:496:12
      at Request.onResponse [as _callback] (node_modules\retry-request\index.js:191:7)
      at Request.self.callback (node_modules\request\request.js:186:22)
      at Request.<anonymous> (node_modules\request\request.js:1163:10)
      at Gunzip.<anonymous> (node_modules\request\request.js:1085:12)
      at endReadableNT (_stream_readable.js:1054:12)
      at _combinedTickCallback (internal/process/next_tick.js:138:11)
      at process._tickCallback (internal/process/next_tick.js:180:9)

Convert to TypeScript

Hello, I'm trying to use this module with TypeScript, but found out that it doesn't have TypeScript support, like google-cloud/storage does.

Is TypeScript support planned for the future, and is there any workaround in the meantime?

Job.prototype.poll_ does not callback with err when job failed.

From @ziplokk1 on November 15, 2017 20:57

Description:
Job.prototype.poll_ does not call back with an error when a job fails.
This causes issues with Job.promise(), since the promise will always resolve instead of reject.

Problem snippet:

/**
 * Poll for a status update. Execute the callback:
 *
 *   - callback(err): Job failed
 *   - callback(): Job incomplete
 *   - callback(null, metadata): Job complete
 *
 * @private
 *
 * @param {function} callback
 */
Job.prototype.poll_ = function(callback) {
  this.getMetadata(function(err, metadata, apiResponse) {
    if (!err && apiResponse.status && apiResponse.status.errors) {
      // ******************************************
      // Here is where the issue lies.
      // For some reason err is undefined which is 
      // causing the following if statement condition to 
      // be false, thus not calling the callback with the error.
      // 
      // The solution is just putting `new` before common.util.ApiError.
      // err = new common.util.ApiError(apiResponse.status);
      err = common.util.ApiError(apiResponse.status);
    }
    if (err) {
      callback(err);
      return;
    }

    if (metadata.status.state !== 'DONE') {
      callback();
      return;
    }

    callback(null, metadata);
  });
};

Environment Details:

  • OS: agnostic
  • Node.js version: v6.11.2
  • npm version: 3.10.10
  • @google-cloud/bigquery version: 0.10.0

Steps to reproduce:

const bigquery = require('@google-cloud/bigquery')();
// Query a table that doesn't exist to cause the job to fail.
bigquery.startQuery({query: "SELECT * FROM dataset.abc_this_table_doesnt_exist;"})
    .then(([job, apiResponse]) => {
        job.on('error', function (err) {
            // Never called
            console.error(err);
        });
        job.on('complete', function (meta) {
            console.log('success');
            console.log(meta.status.errors);
        });
    });

Copied from original issue: googleapis/google-cloud-node#2745

TypeScript Typings

Adding TypeScript typings will make the library easier to consume for TypeScript users, and it will also enable intellisense for JavaScript users (at least in Visual Studio Code).

Basically this means creating a .d.ts file with all the function signatures, pointing to it from the typings field of your package.json, and publishing it to NPM. I can help and try to work on a PR, if you are interested.

Also, here is a partial typings file I'm currently using for my own project:

declare module '@google-cloud/bigquery' {
    interface IInsertResponse {
        kind: 'bigquery#tableDataInsertAllResponse';
        insertErrors?: [{
            'index': number,
            'errors': [{
                'reason': string,
                'location': string,
                'debugInfo': string,
                'message': string,
            }],
        }];
    }

    class Table {
        public insert(rows: any[]): Promise<IInsertResponse[] | null>;
    }

    class Dataset {
        public table(name: string): Table;
    }

    class Bigquery {
        public query(options: { query: string, params?: any[], maxResults?: number }): Promise<[any[]]>;
        public dataset(name: string): Dataset;
    }

    function create(options?: any): Bigquery;

    namespace create {
    }

    export = create;
}

bigquery stream (insert) directly into date partition table leaves rows with _PARTITIONTIME column NULL

I'm trying to stream data into partitioned tables as documented in the official docs.

When streaming data into a partition using a table decorator (table_name$YYYYMMDD), the rows are correctly added and a successful response is returned:

[{"kind":"bigquery#tableDataInsertAllResponse"}]
[ [ { "name": "test" } ] ] // table.getRows()

If the same operation is performed directly against the table rather than a partition (table_name), the response is the same and the rows are added, as seen from the Node.js API:

[{"kind":"bigquery#tableDataInsertAllResponse"}]
[ [ { "name": "test" }, { "name": "test" } ] ] // table.getRows()

But if I use the WebUI I get the following results:

SELECT * FROM [bicg-mosaic:test_dataset.test_table] 
WHERE _PARTITIONTIME >= "2017-12-12 00:00:00" AND _PARTITIONTIME < "2017-12-13 00:00:00" 

1 row

SELECT * FROM [bicg-mosaic:test_dataset.test_table] 
WHERE IS_NULL(_PARTITIONTIME) 

1 row

From the official docs I understand that both rows should have the same date in _PARTITIONTIME.

Is this a misunderstanding of how it should work, or is it a bug in the Node.js library?

sample code

const table_name = process.env.TABLE || 'test_table'

const table = bigquery.dataset('test_dataset').table(table_name)

const data = [
  {
    name: 'test'
  }
]

table.insert(data, {}).then((response) => {
  console.log(response)
}).then(()=>{
  return table.getRows({})
}).then((response)=> {
  console.log(response)
}).catch((err) => {
  console.error('Error: %o', err)
})

Environment details

  • OS: macOS
  • Node.js version: v8.9.0
  • npm version: 5.5.1
  • @google-cloud/bigquery version: 0.11.1

Steps to reproduce

  1. Execute above code using TABLE=test_table node index.js
  2. Execute above code again using TABLE=test_table\$20171212 node index.js
  3. Check WebUI to verify results

Many thanks!

Support access token authentication

Hi,

I did some research about this topic but I did not find a clear explanation of all the different authentication methods for the BigQuery Node.js client.

In my understanding it's only possible to authenticate via a service account. I don't see any option to log in with OAuth and pass an access_token anywhere.

Could you please clarify? Thanks!

Regards

Edit: Found this related issue: googleapis/nodejs-common#11

Add support for BigQuery customer-managed encryption keys

Copied from original issue: googleapis/google-cloud-node#2801

@choenden
February 6, 2018 4:33 PM

BigQuery customer-managed encryption keys allow users to specify a Cloud KMS key to protect their BigQuery table.

API:
Jobs: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs
In a job, there is a destinationEncryptionConfiguration field, which indicates which Cloud KMS key should be used for the destination.

Tables: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables
In a table, there is an encryptionConfiguration field, which indicates which Cloud KMS key protects (or should protect, in the case of CreateTable) a BigQuery table.

These are the main APIs that are required for day-to-day interaction.
With lower priority, support for getServiceAccount would also be nice: https://cloud.google.com/bigquery/docs/reference/rest/v2/projects/getServiceAccount
Note that unlike the other methods mentioned above, this would generally only be called once and the resulting value (the email address) does not change - so it can easily also be called from UI/API/CLI without much hindrance (hence lower priority).

Feature request: Wildcard query matching no tables should not throw an error

When executing wildcard queries using REGEXP_MATCH that match no tables, the library currently throws an exception stating that no tables have been matched. This does not feel correct, as matching no tables is very often a valid use case. Just returning no rows feels much more natural. Or maybe add a parameter that we can pass in to prevent the exception from being thrown.

bigquery:createWriteStream() should propagate insertErrors

From @chrishiestand on September 29, 2017 4:30

The gist is that BigQuery sometimes silently drops one or more streams when many streams are used in parallel (42 in this example). If I'm hitting a quota, an error ought to be thrown; instead, no errors are thrown, the program runs as expected, but in the end there are rows missing from the BigQuery data.

If there is a bug, it might be in gcloud-node, or it might be in the bigquery api. Both seem less likely than me having made a mistake, so I hope you can find something I've done wrong.

The tricky part is that the bug doesn't always reproduce. When the bug does reproduce, n streams of size s are dropped, so BigQuery will be missing n * s rows. So if n=2 and s=150, BigQuery will be missing 300 rows. In other words, the problem does not appear to be that a subset of stream data is missing, but rather that one or more entire streams are missing.

This seems to reproduce reliably sometimes, and other times it reliably does not reproduce. To try and get the opposite result, try again later and/or change the stream load with the env variables.

This small bug reproduction project, https://github.com/chrishiestand/gcloud-node-bigquery-manystreams-bug, is the result of troubleshooting missing BigQuery data in a production system where 50 streams are processed in parallel.

Below is a screenshot of the reproduction repository showing a reproduction of the issue.

image

In contrast, if I reduce the number of streams from 42 to 10, the tests pass as below:
image

Environment details

  • OS: OS X 10.12.6
  • Node.js version: 8.6.0
  • npm version: 5.3.0
  • google-cloud-node version: @google-cloud/bigquery = 0.9.6

Steps to reproduce

Please go here for a reproduction project: https://github.com/chrishiestand/gcloud-node-bigquery-manystreams-bug

Copied from original issue: googleapis/google-cloud-node#2635
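While this is investigated, one user-side mitigation is to attach explicit listeners to every write stream so that a silently dropped stream at least surfaces a log line. A sketch under stated assumptions: the dataset/table names are placeholders, and the 'complete'/'error' event names are taken from the createWriteStream docs of that era (newer releases emit 'job' instead):

const bigquery = require('@google-cloud/bigquery')();
const table = bigquery.dataset('my_dataset').table('my_table'); // placeholder names

// Write one batch of newline-delimited JSON rows per stream and log the per-stream outcome.
function writeBatch(rows, batchIndex) {
  return new Promise((resolve, reject) => {
    const stream = table.createWriteStream({sourceFormat: 'NEWLINE_DELIMITED_JSON'});
    stream
      .on('error', err => {
        console.error(`batch ${batchIndex} failed:`, err);
        reject(err);
      })
      .on('complete', job => {
        console.log(`batch ${batchIndex} loaded by job ${job.id}`);
        resolve(job);
      });
    rows.forEach(row => stream.write(JSON.stringify(row) + '\n'));
    stream.end();
  });
}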

Job ID system test failure

@callmehiphop could you take a look?

https://circleci.com/gh/googleapis/nodejs-bigquery/1524:

  1) BigQuery
       should honor the job id option:
     Uncaught ApiError: Already Exists: Job long-door-651:US.hi-im-a-job-id
      at Object.parseHttpRespBody (node_modules/@google-cloud/common/src/util.js:193:30)
      at Object.handleResp (node_modules/@google-cloud/common/src/util.js:131:18)
      at /root/project/node_modules/@google-cloud/common/src/util.js:496:12
      at Request.onResponse [as _callback] (node_modules/retry-request/index.js:195:7)
      at Request.self.callback (node_modules/request/request.js:186:22)
      at Request.<anonymous> (node_modules/request/request.js:1163:10)
      at Gunzip.<anonymous> (node_modules/request/request.js:1085:12)
      at endReadableNT (_stream_readable.js:1064:12)
      at _combinedTickCallback (internal/process/next_tick.js:138:11)
      at process._tickCallback (internal/process/next_tick.js:180:9)

bigquery: consider moving from getQueryResults to listTableData

From @pongad on October 31, 2017 3:35

The current query method uses the getQueryResults RPC to read the rows of the results. However, benchmarks show that listTableData runs significantly faster; experiments in Go show getQueryResults taking 70% longer.

The choice of RPC is an implementation detail and can be changed at any time. However, listTableData returns fewer fields. In Java, we decided to tweak the API surface to make it easier to change the RPC later. Of course, any change to the API needs to happen before GA.

Will Node require any API change?

Copied from original issue: googleapis/google-cloud-node#2709

TypeError: Cannot read property 'constructor' of null during insert

Environment details

Running in Firebase Cloud Functions

  • @google-cloud/bigquery version: 0.11.0

Steps to reproduce

Feed any object containing null values to insert function like this:

  const BigQuery = require('@google-cloud/bigquery')
  const bq = BigQuery({ projectId: 'prjctd' })

  bq
    .dataset('ds')
    .table('tbl')
    .insert([{
      'first': 'A value',
      'second': null
    }])

Here is the stacktrace:

TypeError: Cannot read property 'constructor' of null
    at Function.Table.encodeValue_ (/user_code/node_modules/@google-cloud/bigquery/src/table.js:286:30)
    at /user_code/node_modules/@google-cloud/bigquery/src/table.js:303:24
    at Array.reduce (native)
    at Function.Table.encodeValue_ (/user_code/node_modules/@google-cloud/bigquery/src/table.js:302:31)
    at /user_code/node_modules/@google-cloud/bigquery/src/table.js:1027:21
    at Array.map (native)
    at Table.insert (/user_code/node_modules/@google-cloud/bigquery/src/table.js:1025:30)
    at /user_code/node_modules/@google-cloud/bigquery/node_modules/@google-cloud/common/src/util.js:753:22
    at Table.wrapper [as insert] (/user_code/node_modules/@google-cloud/bigquery/node_modules/@google-cloud/common/src/util.js:737:12)
    at BigQueryService.insertStuff (/user_code/index.js:2992:14)

Looking at the Table.encodeValue_ function, there probably should be a simple check like this at the beginning of the function:

if (value === undefined || value === null) {
  return null
}
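Until a check like this lands in the library, a user-side workaround is to strip null/undefined properties before calling insert. A minimal sketch, reusing the client and IDs from the snippet above:

// Drop null/undefined properties so Table.encodeValue_ never sees them.
function stripNulls(row) {
  const clean = {};
  Object.keys(row).forEach(key => {
    if (row[key] !== null && row[key] !== undefined) {
      clean[key] = row[key];
    }
  });
  return clean;
}

bq.dataset('ds')
  .table('tbl')
  .insert([{first: 'A value', second: null}].map(stripNulls));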

Allow to stream compressed data into BigQuery

Copied from original issue: googleapis/google-cloud-node#2811

@xgalen
March 20, 2018 3:19 PM

Hi all,

In order to save data transfer (out) costs, we would want to stream the data compressed with gzip.

I have made some tries and it works, requesting directly to the API. Example (omitting values for the sake of simplicity):

#!/bin/bash

...

OBJECT="{'kind': 'bigquery#tableDataInsertAllRequest', 'skipInvalidRows': true, 'ignoreUnknownValues': true, 'rows': $ROWS}"

echo $OBJECT | gzip -cf > compressed.gz

curl -v -H "Authorization: Bearer $ACCESS_TOKEN" \
     -H "Content-Type: text/plain" \
     -H "Content-Encoding: gzip" \
     --data-binary @compressed.gz \
"https://www.googleapis.com/bigquery/v2/projects/$GOOGLE_CLOUD_PROJECT/datasets/$DATASET_ID/tables/$TABLE_ID/insertAll"

But I couldn't find where to set the header to change the Content-Encoding to gzip. I know it's an option for responses and for storing files ("@param {boolean} options.gzip - Specify if you would like the file compressed"), but not for requests.

Is it possible to add a new setting to allow compression? In that case, the responsibility for handling the compressed payload would be on the server, not on this module. I think it would be worth it :)

Of course, I could help if needed.

Thanks!

Alfredo

Unable to limit results in versions past 0.9.6

In versions prior to 0.11.x, limiting results was done by passing in the maxResults option. However, after upgrading to 0.11.x, results are no longer being limited (no errors, though). Looking at the example source code, LIMIT is now explicitly added to the query; however, doing so results in the following error:

Environment details

  • OS: n/a
  • Node.js version: 6.11.1
  • npm version: n/a
  • @google-cloud/bigquery version: 0.11.1

Steps to reproduce

  1. Use LIMIT in your query

losing precision in converting TIMESTAMP and INT64 to Javascript Number

From @c0b on October 9, 2016 5:5

googleapis/google-cloud-node#1648 (comment)

The BigQuery TIMESTAMP has up to microsecond precision, but when converted to a JavaScript Date it is truncated to milliseconds.

googleapis/google-cloud-node#1648 (comment)

A JavaScript Number is really only a FLOAT64; there is no real INT64, so during conversion some precision is lost:

$ node ./bigquery/queries.js sync 'SELECT ARRAY<INT64>[0x7fff1234deadbeef, -0x8000000000000000] AS example_array'
{ err: null,
  rows: [ { example_array: [ 9223110580161593000, -9223372036854776000 ] } ],
  nextQuery: null,
  apiResponse: 
   { kind: 'bigquery#queryResponse',
     schema: { fields: [ { name: 'example_array', type: 'INTEGER', mode: 'REPEATED' } ] },
     jobReference: { ... },
     totalRows: '1',
     rows: [ { f: [ { v: [ { v: '9223110580161593071' }, { v: '-9223372036854775808' } ] } ] } ],
     totalBytesProcessed: '0',
     jobComplete: true,
     cacheHit: false } }
Received 1 row(s)!
[ { example_array: [ 9223110580161593000, -9223372036854776000 ] } ]

I don't really have a solution; please suggest one for when an application needs this much precision.

Copied from original issue: googleapis/google-cloud-node#1681
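To illustrate where the precision goes, compare the string the API returns with what a JavaScript Number can represent. A small sketch; it uses BigInt, which is only available in modern Node.js, purely for demonstration:

// The API returns INT64 values as strings (see apiResponse.rows above).
const raw = '9223110580161593071';

console.log(Number(raw));              // 9223110580161593000 -- rounded to the nearest FLOAT64
console.log(BigInt(raw).toString());   // 9223110580161593071 -- exact
console.log(Number.MAX_SAFE_INTEGER);  // 9007199254740991, far below the INT64 range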

do not suggest using manual pagination for query()

Let's change the example code to use manual pagination for startQuery() only. query() should be the simple, high-level method that just does what is expected; all tweaks should apply to startQuery() only.
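For context, a sketch of the two styles under discussion; the query strings are placeholders, and the getQueryResults options reflect the documented surface of that era:

const bigquery = require('@google-cloud/bigquery')();

// Simple, high-level: query() pages through results internally and resolves all rows.
bigquery
  .query('SELECT word FROM `bigquery-public-data.samples.shakespeare` LIMIT 10')
  .then(([rows]) => console.log(rows.length));

// Manual pagination: startQuery() returns a Job whose results are fetched page by page.
bigquery
  .startQuery({query: 'SELECT word FROM `bigquery-public-data.samples.shakespeare`'})
  .then(([job]) => job.getQueryResults({autoPaginate: false, maxResults: 100}))
  .then(([rows, nextQuery]) => {
    console.log(rows.length, nextQuery ? 'more pages available' : 'done');
  });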

An in-range update of @google-cloud/nodejs-repo-tools is breaking the build 🚨

☝️ Greenkeeper’s updated Terms of Service will come into effect on April 6th, 2018.

Version 2.2.3 of @google-cloud/nodejs-repo-tools was just published.

Branch Build failing 🚨
Dependency @google-cloud/nodejs-repo-tools
Current Version 2.2.2
Type devDependency

This version is covered by your current version range and after updating it in your project the build failed.

@google-cloud/nodejs-repo-tools is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • continuous-integration/appveyor/branch Waiting for AppVeyor build to complete Details
  • ci/circleci: node9 Your tests failed on CircleCI Details
  • ci/circleci: node4 Your tests failed on CircleCI Details
  • ci/circleci: node8 Your tests failed on CircleCI Details
  • ci/circleci: node6 Your tests failed on CircleCI Details

Commits

The new version differs by 1 commits.

  • 7b3af41 Fix link to open in cloud shell button image.

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Table.setMetadata function is duplicated

I received this feedback on a code sample. There are duplicate method definitions for Table.setMetadata.

One refers to Table#update

nodejs-bigquery/src/table.js

Lines 1798 to 1845 in affdfa6

/**
* Set the metadata on the table.
*
* @see [Tables: update API Documentation]{@link https://cloud.google.com/bigquery/docs/reference/v2/tables/update}
*
* @param {object} metadata The metadata key/value object to set.
* @param {string} metadata.description A user-friendly description of the
* table.
* @param {string} metadata.name A descriptive name for the table.
* @param {string|object} metadata.schema A comma-separated list of name:type
* pairs. Valid types are "string", "integer", "float", "boolean", "bytes",
* "record", and "timestamp". If the type is omitted, it is assumed to be
* "string". Example: "name:string, age:integer". Schemas can also be
* specified as a JSON array of fields, which allows for nested and repeated
* fields. See a [Table resource](http://goo.gl/sl8Dmg) for more detailed
* information.
* @param {function} [callback] The callback function.
* @param {?error} callback.err An error returned while making this request.
* @param {object} callback.apiResponse The full API response.
* @returns {Promise}
*
* @example
* const BigQuery = require('@google-cloud/bigquery');
* const bigquery = new BigQuery();
* const dataset = bigquery.dataset('my-dataset');
* const table = dataset.table('my-table');
*
* const metadata = {
* name: 'My recipes',
* description: 'A table for storing my recipes.',
* schema: 'name:string, servings:integer, cookingTime:float, quick:boolean'
* };
*
* table.setMetadata(metadata, function(err, metadata, apiResponse) {});
*
* //-
* // If the callback is omitted, we'll return a Promise.
* //-
* table.setMetadata(metadata).then(function(data) {
* const metadata = data[0];
* const apiResponse = data[1];
* });
*/
Table.prototype.setMetadata = function(metadata, callback) {
var body = Table.formatMetadata_(metadata);
common.ServiceObject.prototype.setMetadata.call(this, body, callback);
};

The other refers to Table#patch

/**
* Set the metadata for this Table. This can be useful for updating table
* labels.
*
* @see [Tables: patch API Documentation]{@link https://cloud.google.com/bigquery/docs/reference/v2/tables/patch}
*
* @method Table#setMetadata
* @param {object} metadata Metadata to save on the Table.
* @param {function} [callback] The callback function.
* @param {?error} callback.err An error returned while making this
* request.
* @param {object} callback.apiResponse The full API response.
* @returns {Promise}
*
* @example
* const BigQuery = require('@google-cloud/bigquery');
* const bigquery = new BigQuery();
* const dataset = bigquery.dataset('my-dataset');
*
* const table = dataset.table('my-table');
*
* const metadata = {
* labels: {
* foo: 'bar'
* }
* };
*
* table.setMetadata(metadata, function(err, apiResponse) {});
*
* //-
* // If the callback is omitted, we'll return a Promise.
* //-
* table.setMetadata(metadata).then(function(data) {
* const apiResponse = data[0];
* });
*/
setMetadata: true,
};

I believe only PATCH should be supported, as UPDATE can be unsafe (it can end up modifying or removing unintended properties).

default useLegacySql option changed from true to false?

When upgrading from 0.9.6 to 1.0.0 it seems that the default SQL dialect has been changed to standard SQL rather than legacy SQL, as it was before. I can't find any documentation for this breaking change.

REST API still says that useLegacySql true is default. https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query

Specifies whether to use BigQuery's legacy SQL dialect for this query. The default value is true. If set to false, the query will use BigQuery's standard SQL: https://cloud.google.com/bigquery/sql-reference/ When useLegacySql is set to false, the value of flattenResults is ignored; query will be run as if flattenResults is false.

I need to explicitly pass the option from now on.

bq.query({
      query: queryString,
      useLegacySql: true,
});

Environment details

  • OS: mac
  • Node.js version: 8.9.1
  • npm version: 5.6.0
  • @google-cloud/bigquery version: 1.0.0

bigquery: set user-agent

From @kwent on October 6, 2017 7:19

Hi,

I'm trying to override the user-agent used by bigquery.

I browsed the code but I don't see any obvious way to do it.

Could you walk me through it, or is it something that would need to be implemented?

Regards

Copied from original issue: googleapis/google-cloud-node#2659

allow customization of an operation's polling interval

From @c0b on October 31, 2017 17:48

From the BigQuery job API I was only aware of the complete event for listening to a job with a callback when the job completes; only recently, from some shared gist code, did I learn that a job.promise() is available. Since our application uses Node v6 and recently upgraded to v8, the promise API fits the code better and works with the async/await model; shouldn't it at least be documented?
https://googlecloudplatform.github.io/google-cloud-node/#/docs/bigquery/0.9.6/bigquery/job

On the other hand, I spent some time figuring out how the default job.promise() works. I traced the call down to the Operation's setTimeout self.startPolling every 500ms, so it polls at a hard-coded interval of 500ms, while many gcloud products' best practices prefer a backoff retry strategy:
https://github.com/GoogleCloudPlatform/google-cloud-node/blob/master/packages/common/src/operation.js#L184

Polling every 500ms may be acceptable (or even wanted) in some cases, but for our ETL scripts, which run hundreds of query jobs concurrently in BATCH mode, it is just not efficient. For this ETL purpose I have a piece of code that has been in production for a long while and implements a backoff strategy; it supports an optional options object with waitbegin (default 500ms) and waitmax (default 10s) parameters.

// Loop on a BigQuery job until it's 'DONE' or errors,
//   using a backoff strategy: waiting starts at 500ms,
//     then increases by half until the max of 10s is reached.
function waitJobFinish(job, {waitbegin=500, waitmax=10000, initialState='UNKNOWN'} = {}) {
  return new Promise((fulfilled, rejected) =>
    function loop(retries=0, wait=waitbegin, state=initialState) {
      job.getMetadata()
        .catch(rejected)
        .then(([ metadata, apiResponse ]) => {
          if (metadata.status.state !== state) {
            console.log(`Job ${metadata.id} state transit from ${state} to ${metadata.status.state}, at ${(new Date).toJSON()} after ${retries} retries check job status.`);
            state = metadata.status.state;
          }

          if (metadata.status.errorResult)
            return rejected(metadata.status.errorResult);

          if (metadata.status.state === 'DONE')
            return fulfilled([ metadata, apiResponse, retries ]);

          setTimeout(loop, wait, retries+1, Math.min(waitmax, (wait+=wait/2)), state);
        });
    }() // (0, waitbegin, 'UNKNOWN')
  );
}

With this API, similar to job.promise(), we can write code like the following, but internally it applies a backoff strategy when retrying the metadata retrieval;

  bigquery.startQuery({
    query: '...',
    // more options
  })
  .then(([ job ]) => waitJobFinish(job))
  .then(([ metadata, apiResponse, retries ]) => { ... })

or with async await

  // in an async function
  const [ job ] = await bigquery.startQuery(...);
  const [ metadata, apiResponse, retries ] = await waitJobFinish(job);
  // ...

The console.log lines give us visibility into how healthy each job run is, with state transitions from 'PENDING' to 'RUNNING' to 'DONE'.

I'm not sure this strategy can live in the Operation for all the @google-cloud/... packages, but it at least works for BigQuery jobs; let me know if you like the code.

Copied from original issue: googleapis/google-cloud-node#2710

table.load() is not a function

I am using Cloud Functions to ingest newly created JSON files into BigQuery. The function below is triggered when a file is uploaded.

function insertGCS(datasetId, tableId, bucketName, filename, projectId) { 
const BigQuery = require('@google-cloud/bigquery');
const bigquery = new BigQuery()
const dataset = bigquery.dataset(datasetId);
const table = dataset.table(tableId);

var metadata = {
            sourceFormat: "NEWLINE_DELIMITED_JSON",
            autodetect : true,
            schemaUpdateOptions: ["ALLOW_FIELD_ADDITION","ALLOW_FIELD_RELAXATION"],
            createDisposition: "CREATE_IF_NEEDED",
            writeDisposition: "WRITE_APPEND"
  };

var gcs = require('@google-cloud/storage')({
  projectId: projectId
});
var data = gcs.bucket(bucketName).file(filename);
table.load(data, metadata, function(err, apiResponse) {});
}

Table reference in documentation (https://cloud.google.com/nodejs/docs/reference/bigquery/1.0.x/Table#load) states:

const table = bigquery.table(tableId);

which results in "bigquery.table is not a function". When changed to:

const table = dataset.table(tableId);

it executes as expected; however, it now fails with "table.load is not a function".

When printing the table object in the logger I can see that the table only has the following methods:

    methods: {
        create: true,
        delete: true,
        exists: true,
        get: true,
        getMetadata: true
    }
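For reference, a sketch of the documented load call, assuming a library version where Table#load is available; the project, bucket, dataset, and table IDs are placeholders:

const BigQuery = require('@google-cloud/bigquery');
const gcs = require('@google-cloud/storage')({projectId: 'my-project'}); // placeholder project

const bigquery = new BigQuery();
const table = bigquery.dataset('my_dataset').table('my_table'); // placeholder IDs

const metadata = {
  sourceFormat: 'NEWLINE_DELIMITED_JSON',
  autodetect: true,
  writeDisposition: 'WRITE_APPEND',
};

// Table#load accepts a Storage File object plus load-job metadata.
table
  .load(gcs.bucket('my-bucket').file('data.json'), metadata)
  .then(([apiResponse]) => console.log('Load finished:', apiResponse.status.state))
  .catch(console.error);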

getMetadata: allow arbitrary request parameters

From @c0b on October 20, 2017 5:49

The REST API reference mentions selectedFields [2], and all Google Cloud services support the fields parameter [3], but from [1] I don't see how I can pass these parameters.

  1. https://googlecloudplatform.github.io/google-cloud-node/#/docs/bigquery/0.9.6/bigquery/table?method=getMetadata
  2. https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/get
  3. https://developers.google.com/apis-explorer/#p/bigquery/v2/bigquery.tables.get

For background: we have tens of thousands of tables in BigQuery and want to figure out which ones are the most expensive (use the most storage), so I wrote a little script to enumerate all table metadata. It runs slowly because each API call has to retrieve all the metadata, so I want to use fields to get only the needed information; if anyone knows a workaround, let me know.
(the http command here is an advanced version of curl; from https://httpie.org/)

$ access_token=$(gcloud auth application-default print-access-token)
$ http https://www.googleapis.com/bigquery/v2/projects/<projectId>/datasets/<datasetId>/tables/<tableId> \
 fields==id,numBytes,numLongTermBytes,numRows,creationTime,expirationTime,lastModifiedTime,type,location \
 access_token==$access_token
[...]
{
 "id": "<projectId>:<datasetId>.<tableId>",
 "numBytes": "112728076",
 "numLongTermBytes": "0",
 "numRows": "28431",
 "creationTime": "1505962477264",
 "expirationTime": "1513738477264",
 "lastModifiedTime": "1505962477264",
 "type": "TABLE",
 "location": "US"
}

Copied from original issue: googleapis/google-cloud-node#2684

Unable to LIMIT query using parameter

Limiting results using an @parameter results in the following error:

ApiError: Syntax error: Unexpected keyword LIMIT at [1:426] at Object.parseHttpRespBody

Environment details

  • OS: n/a
  • Node.js version: 6.11.5
  • npm version: n/a
  • version: 1.0.0

Steps to reproduce

Use LIMIT @limitVariable in your query and pass in limitVariable in params: { ... } option.
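For reference, named parameters in general are passed through the params option. A minimal sketch (the table is a placeholder, and whether LIMIT itself accepts a parameter is a server-side question, so the parameter here is used in a WHERE clause):

const bigquery = require('@google-cloud/bigquery')();

bigquery
  .query({
    query: 'SELECT name FROM `my_dataset.my_table` WHERE age >= @minAge', // placeholder table
    params: {minAge: 21},
    useLegacySql: false, // named parameters require standard SQL
  })
  .then(([rows]) => console.log(rows));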

Support user-provided job ID for all job types

I got the following feedback from a user in the docs. https://cloud.google.com/bigquery/docs/running-jobs

include a disclaimer here for nodejs ... no matter what we assign to this in a configuration resource object, the returned jobReference.jobId is always one created by the service, not the one supplied.

I believe all other languages have a jobId and jobPrefix parameter for all job types, which allows using a manually-specified job ID or a prefix for a randomly-generated job ID.
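For comparison, this is roughly what the requested surface looks like where it already exists for query jobs (see the "should honor the job id option" system test above); the option names are assumptions rather than a confirmed API for every job type:

const bigquery = require('@google-cloud/bigquery')();

bigquery
  .startQuery({
    query: 'SELECT 1',
    jobId: 'my-manually-chosen-job-id', // use exactly this ID (assumed option)
    // jobPrefix: 'etl-',               // or: prefix a randomly generated ID (assumed option)
  })
  .then(([job]) => console.log(job.id));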

ApiError: Not found: Files

Copied from original issue: googleapis/google-cloud-node#2795

@cvanputten
January 18, 2018 5:18 PM

I am suddenly getting an error from '@google-cloud/bigquery' saying it can't find a file on Google Drive. I am not referencing this file in my code, and I think the error is coming from the @google-cloud node module. Here is the full error:

 { ApiError: Not found: Files /gdrive/id/1XehS5UuvCKR3fHGkeoX_7JphVc0TqGsVy6YlGRBTvZU
    at Object.parseHttpRespBody (/user_code/node_modules/@google-cloud/bigquery/node_modules/@google-cloud/common/src/util.js:192:30)
    at Object.handleResp (/user_code/node_modules/@google-cloud/bigquery/node_modules/@google-cloud/common/src/util.js:132:18)
    at /user_code/node_modules/@google-cloud/bigquery/node_modules/@google-cloud/common/src/util.js:465:12
    at Request.onResponse [as _callback] (/user_code/node_modules/@google-cloud/bigquery/node_modules/retry-request/index.js:180:7)
    at Request.self.callback (/user_code/node_modules/@google-cloud/bigquery/node_modules/request/request.js:186:22)
    at emitTwo (events.js:106:13)
    at Request.emit (events.js:191:7)
    at Request.<anonymous> (/user_code/node_modules/@google-cloud/bigquery/node_modules/request/request.js:1163:10)
    at emitOne (events.js:96:13)
    at Request.emit (events.js:188:7)
  code: 404,
  errors: 
   [ { domain: 'global',
       reason: 'notFound',
       message: 'Not found: Files /gdrive/id/1XehS5UuvCKR3fHGkeoX_7JphVc0TqGsVy6YlGRBTvZU' } ],
  response: undefined,
  message: 'Not found: Files /gdrive/id/1XehS5UuvCKR3fHGkeoX_7JphVc0TqGsVy6YlGRBTvZU' }"  
 timestamp:  "2018-01-18T09:13:17.477Z

Google Auth - error loading default credentials

Hello all,

I'm trying to use this library but with no luck so far.

I have the service account activated:
[root@localhost credentials]# gcloud auth activate-service-account --key-file=/var/www/xxxx/config/credentials/keyfile-default.json
Activated service account credentials for: [[email protected]]

And when I try to use the functions insertAll or insert I get the following error:

ERROR: Error: **Could not load the default credentials**. Browse to https://developers.google.com/accounts/docs/application-default-credentials for more information.
    at /var/www/xxxx/node_modules/google-auth-library/lib/auth/googleauth.js:316:21
    at /var/www/xxxx/node_modules/google-auth-library/lib/auth/googleauth.js:346:7
    at Request._callback (/var/www/xxxx/node_modules/google-auth-library/lib/transporters.js:70:30)
    at self.callback (/var/www/xxxx/node_modules/request/request.js:188:22)
    at emitOne (events.js:96:13)
    at Request.emit (events.js:188:7)
    at Request.onRequestError (/var/www/xxxx/node_modules/request/request.js:884:8)
    at emitOne (events.js:96:13)
    at ClientRequest.emit (events.js:188:7)
    at Socket.socketErrorListener (_http_client.js:309:9)
    at emitOne (events.js:96:13)
    at Socket.emit (events.js:188:7)
    at connectErrorNT (net.js:1021:8)
    at _combinedTickCallback (internal/process/next_tick.js:80:11)
    at process._tickCallback (internal/process/next_tick.js:104:9)

Here are the two code snippets:

// Imports the newest Google BigQuery library
    const BigQuery = require('@google-cloud/bigquery');

    // The project ID to use
    const projectId = "xxxxx-dev";

    // The ID of the dataset of the table into which data should be inserted
    const datasetId = "xxxx";

    // The ID of the table into which data should be inserted
    const tableId = "yyyyy";

    // Execute the following command first:
    // export GOOGLE_APPLICATION_CREDENTIALS=/var/www/xxxx/config/credentials/keyfile-default.json
    const bigquery = BigQuery({
        projectId: projectId
    });
    
    bigquery.tabledata.insertAll({
        //auth: oauth2Client,
        'projectId': projectId,
        'datasetId': datasetId,
        'tableId': tableId,
        'resource': {
            "kind": "bigquery#tableDataInsertAllRequest",
            "rows": rows
        }
    }, function(err, result)
    {
        if (err)
        {
            return console.error(err);
        }
        console.log(result);
    });

OR

bigquery
    .dataset(datasetId)
    .table(tableId)
    .insert(rows)
    .then((insertErrors) => {
        console.log('Inserted:');
        rows.forEach((row) => console.log(row));
        if (insertErrors && insertErrors.length > 0)
        {
            console.log('Insert errors:');
            insertErrors.forEach((err) => console.error(err));
        }
    })
    .catch((err) => {
        console.error('ERROR:', err);
    });

Environment details

  • OS: CentOS 7 - 64Bits
  • Node.js version: 6.11.0
  • npm version: 3.10.10
  • @google-cloud/bigquery version: 0.9.6

Steps to reproduce

  1. Try to use the bigquery.tabledata.insertAll function or bigquery.dataset(datasetId).table(tableId).insert(rows) function

Thanks in advance for your help.
Regards,
DR

nodejs bigquery client fails for ssl error, if run behind corporate proxy

It seems the BigQuery client uses axios, which has issues connecting via a corporate proxy. Is there a fix for this?

I am using "@google-cloud/bigquery": "^1.2.0",

Below is the error log.

[0] ERROR: { Error: write EPROTO 140736266007488:error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol:../deps/openssl/openssl/ssl/s23_clnt.c:827:
[0]
[0] at _errnoException (util.js:1022:11)
[0] at WriteWrap.afterWrite [as oncomplete] (net.js:880:14)
[0] code: 'EPROTO',
[0] errno: 'EPROTO',
[0] syscall: 'write',
[0] config:
[0] { adapter: [Function: httpAdapter],
[0] transformRequest: { '0': [Function: transformRequest] },
[0] transformResponse: { '0': [Function: transformResponse] },
[0] timeout: 0,
[0] xsrfCookieName: 'XSRF-TOKEN',
[0] xsrfHeaderName: 'X-XSRF-TOKEN',
[0] maxContentLength: -1,
[0] validateStatus: [Function: validateStatus],
[0] headers:
[0] { Accept: 'application/json, text/plain, /',
[0] 'Content-Type': 'application/x-www-form-urlencoded',
[0] 'User-Agent': 'axios/0.18.0',
[0] 'Content-Length': 709,
[0] host: 'www.googleapis.com' },
[0] method: 'post',
[0] url: 'https://www.googleapis.com/oauth2/v4/token',
[0] data: 'grant_type=urn%3Aietf%3Aparams%3Aoauth%3Agrant-type%3Ajwt-bearer&assertion=eyJhbGciOiJSUzI1NiJ9.eyJpc3MiOiJzdmMtYXhwLWdjcC1iaWdxdWVyeUBheHAtbXdlLXJ1bS5pYW0uZ3NlcnZpY2VhY2NvdW50LmNvbSIsInNjb3BlIjoiaHR0cHM6Ly93d3cuZ29vZ2xlYXBpcy5jb20vYXV0aC9iaWdxdWVyeSIsImF1ZCI6Imh0dHBzOi8vd3d3Lmdvb2dsZWFwaXMuY29tL29hdXRoMi92NC90b2tlbiIsImV4cCI6MTUyNTM2Mjg0OSwiaWF0IjoxNTI1MzU5MjQ5fQ.HnJ6xssGIr7-VIYL6SPiwfwDBttAfa_pkqdUldiH4ljzfseg-CcQcW1fhqIX_f8z9xWfO8Q8HbeIMkQK0xpNm7JWCJK0KiYLH0ph1-eoVsyMbQZRHhIvaCxTYUqtdIym3ol9gScz3p_hYT4sGtbLOX60wEnZEAB-pE0yOU99VyGxzhHpZ5WZB7wlDKr7i0TwOKzkWGxiHtNDoykd-6as1iqOWjlgGDQvilbxsOED34kFIb1Sjrs3n9lFKWP6h6lqRQFm5DjNtWpYV2_CGsLgm3oIPfw_hSIS7DUQMthWSB3RWb_YwXAxT3MAf9qErE2gpyNsUlietUs6PNB0_91MSg' },
[0] request:
[0] Writable {
[0] _writableState:
[0] WritableState {
[0] objectMode: false,
[0] highWaterMark: 16384,
[0] finalCalled: false,
[0] needDrain: false,
[0] ending: false,
[0] ended: false,
[0] finished: false,
[0] destroyed: false,
[0] decodeStrings: true,
[0] defaultEncoding: 'utf8',
[0] length: 0,
[0] writing: false,
[0] corked: 0,
[0] sync: true,
[0] bufferProcessing: false,
[0] onwrite: [Function: bound onwrite],
[0] writecb: null,
[0] writelen: 0,
[0] bufferedRequest: null,
[0] lastBufferedRequest: null,
[0] pendingcb: 0,
[0] prefinished: false,
[0] errorEmitted: false,
[0] bufferedRequestCount: 0,
[0] corkedRequestsFree: [Object] },
[0] writable: true,
[0] domain: null,
[0] _events:
[0] { response: [Function: handleResponse],
[0] error: [Function: handleRequestError] },
[0] _eventsCount: 2,
[0] _maxListeners: undefined,
[0] _options:
[0] { protocol: 'https:',
[0] maxRedirects: 21,
[0] maxBodyLength: 10485760,
[0] path: 'https://www.googleapis.com/oauth2/v4/token',
[0] method: 'post',
[0] headers: [Object],
[0] agent: undefined,
[0] auth: undefined,
[0] hostname: 'proxy..com',
[0] port: '8080',
[0] host: 'proxy.
.com',
[0] nativeProtocols: [Object],
[0] pathname: 'https://www.googleapis.com/oauth2/v4/token' },
[0] _redirectCount: 0,
[0] _requestBodyLength: 709,
[0] _requestBodyBuffers: [ [Object] ],
[0] _onNativeResponse: [Function],
[0] _currentRequest:
[0] ClientRequest {
[0] domain: null,
[0] _events: [Object],
[0] _eventsCount: 6,
[0] _maxListeners: undefined,
[0] output: [],
[0] outputEncodings: [],
[0] outputCallbacks: [],
[0] outputSize: 0,
[0] writable: true,
[0] _last: true,
[0] upgrading: false,
[0] chunkedEncoding: false,
[0] shouldKeepAlive: false,
[0] useChunkedEncodingByDefault: true,
[0] sendDate: false,
[0] _removedConnection: false,
[0] _removedContLen: false,
[0] _removedTE: false,
[0] _contentLength: null,
[0] _hasBody: true,
[0] _trailer: '',
[0] finished: true,
[0] _headerSent: true,
[0] socket: [Object],
[0] connection: [Object],
[0] _header: 'POST https://www.googleapis.com/oauth2/v4/token HTTP/1.1\r\nAccept: application/json, text/plain, /\r\nContent-Type: application/x-www-form-urlencoded\r\nUser-Agent: axios/0.18.0\r\nContent-Length: 709\r\nhost: www.googleapis.com\r\nConnection: close\r\n\r\n',
[0] _onPendingData: [Function: noopPendingOutput],
[0] agent: [Object],
[0] socketPath: undefined,
[0] timeout: undefined,
[0] method: 'POST',
[0] path: 'https://www.googleapis.com/oauth2/v4/token',
[0] _ended: false,
[0] res: null,
[0] aborted: undefined,
[0] timeoutCb: null,
[0] upgradeOrConnect: false,
[0] parser: null,
[0] maxHeadersCount: null,
[0] _redirectable: [Circular],
[0] [Symbol(outHeadersKey)]: [Object] },
[0] _currentUrl: 'https://proxy.***.com/https://www.googleapis.com/oauth2/v4/token' },

Allow running load job to a different project ID

There can be a need to run a load job against a different projectId than the project ID of the BigQuery client. The metadata object defines destinationTable.projectId, which should be used when it's provided; if it's not provided, the fallback should be the same as today: the projectId of the instantiated BigQuery client.

Thanks!
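A sketch of what the requested behavior would look like from the caller's side; the IDs are placeholders, and honoring destinationTable.projectId is the requested change, not current behavior:

const bigquery = require('@google-cloud/bigquery')(); // client instantiated for project A

const metadata = {
  sourceFormat: 'CSV',
  // Requested: when provided, this projectId should win over the client's own project.
  destinationTable: {
    projectId: 'project-b',       // placeholder target project
    datasetId: 'target_dataset',  // placeholder dataset
    tableId: 'target_table',      // placeholder table
  },
};

bigquery
  .dataset('target_dataset')
  .table('target_table')
  .load('./data.csv', metadata) // local file path; the source is illustrative
  .then(() => console.log('Load submitted.'))
  .catch(console.error);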

ApiError: Invalid value for: STRING is not a valid value

{
  schema: 'Code: string, ChannelSlug: string, Path: string, Country: string'
}

A schema created in this way produces this error.

Environment details

  • OS: macOS
  • Node.js version: v7.7.3
  • npm version: v4.1.2
  • @google-cloud/bigquery version: 1.0.0
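One way to avoid the comma-separated string parsing entirely is to pass the schema as a structured object. A minimal sketch; where the options object is used (createTable, load, etc.) depends on your call site:

const options = {
  schema: {
    fields: [
      {name: 'Code', type: 'STRING'},
      {name: 'ChannelSlug', type: 'STRING'},
      {name: 'Path', type: 'STRING'},
      {name: 'Country', type: 'STRING'},
    ],
  },
};

// e.g. dataset.createTable('my_table', options) or table.load(file, options)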

reuse connections on table.insert

TL;DR: the current implementation of BigQuery establishes a new connection on each table.insert() call. This causes problems for Cloud Functions because of a connection quota that our users hit. Please update the client libraries to reuse connections.

Why new connections cause problems:

Cloud Functions is a serverless platform for executing snippets of code. Many of our customers use functions to store events in BigQuery. GCF has a quota on connections, and this quota is relatively low because each connection requires a NAT port to be allocated and maintained until TCP packet timeout, and there are only 32k ports per server.

Why BigQuery is worse than other client libraries:

I found three types of clients:

  1. Clients that always reuse connections - even if we construct the client object in a local scope - @google-cloud/storage
  2. Clients that reuse connections only if we declare the client object at global scope - @google-cloud/pubsub, @google-cloud/language, @google-cloud/spanner
  3. Clients that never reuse connections - @google-cloud/bigquery

It would be nice if all libraries worked as 1.

How to reproduce the problem:

Every call to the function exported by the code below creates a new connection.

******************* function.js ****************

const bigquery = require('@google-cloud/bigquery')();
const table = bigquery.dataset('Tests').table('RandomNumbers');

exports.function = function(req, res) {
  var r = Math.floor(Math.random() * 100);
  console.log("Generated random number " + r);
  var row = {number: r};
  table.insert(row, function(err, apiResponse) {
    if (err) {
      result = 'Error inserting data to bigquery';
      console.log(result + ": " + JSON.stringify(err, null, 2));
      res.status(500).send("Error" + err);
    } else {
      res.status(200).send("" + r);
    }
  });
};

************* package.json ****************

{
  "dependencies": {
    "@google-cloud/bigquery": "^0.10.0"
  }
}

We also found that BigQuery uses IPv4, while other libraries prefer IPv6. This should also be fixed.

[Googlers: internal tracking id b/68240537]

error: ApiError: Error during request.

Environment details

  • OS: OS X high sierra
  • Node.js version: v8.7.0
  • npm version: 5.4.2
  • @google-cloud/bigquery version: 1.0.0

Steps to reproduce

  1. Create a table from CSV with schema autodetect enabled
  2. Create a load job without schema autodetect

The documentation https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load says that I can omit the schema if the table exists. However, if I leave autodetect out, I get this API error. I'm not sure what the actual reason is, as it's just a general error.

How do I run queries that depend on UDF?

I don't have steps to reproduce, but as my question suggests, I have a query that depends on a UDF. How do I execute it using the SDK? There is no single place to find all the information/examples.

Appreciate any help.
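With standard SQL, one way to run a query that depends on a UDF is to define a temporary function inline with CREATE TEMP FUNCTION and submit the whole text as the query. A minimal sketch:

const bigquery = require('@google-cloud/bigquery')();

const query = `
  CREATE TEMP FUNCTION multiply(x FLOAT64, y FLOAT64)
  RETURNS FLOAT64
  LANGUAGE js AS "return x * y;";
  SELECT multiply(3, 4) AS product;
`;

bigquery
  .query({query, useLegacySql: false})
  .then(([rows]) => console.log(rows)) // [{product: 12}]
  .catch(console.error);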

Action required: Greenkeeper could not be activated 🚨

🚨 You need to enable Continuous Integration on all branches of this repository. 🚨

To enable Greenkeeper, you need to make sure that a commit status is reported on all branches. This is required by Greenkeeper because it uses your CI build statuses to figure out when to notify you about breaking changes.

Since we didn’t receive a CI status on the greenkeeper/initial branch, it’s possible that you don’t have CI set up yet. We recommend using Travis CI, but Greenkeeper will work with every other CI service as well.

If you have already set up a CI for this repository, you might need to check how it’s configured. Make sure it is set to run on all new branches. If you don’t want it to run on absolutely every branch, you can whitelist branches starting with greenkeeper/.

Once you have installed and configured CI on this repository correctly, you’ll need to re-trigger Greenkeeper’s initial pull request. To do this, please delete the greenkeeper/initial branch in this repository, and then remove and re-add this repository to the Greenkeeper App’s white list on Github. You'll find this list on your repo or organization’s settings page, under Installed GitHub Apps.

bigquery: support raw configuration

From @c0b on March 19, 2017 22:6

The Node.js API is the most flexible (compared to other language bindings, like googleapis/google-cloud-go#554 (comment))
in that it allows passing arbitrary key/value pairs in the configuration.query object, which may be supported in the REST API but not in the client libraries; I worked around that in past months for query parameters.

https://googlecloudplatform.github.io/google-cloud-node/#/docs/bigquery/0.8.0/bigquery?method=startQuery

Now I am researching the bq command line's label feature; it seems useful for managing things when there are too many load/query jobs, and I want to tag them with different labels.

But per the REST API reference, the labels should be set on the configuration object, so that they can apply to query, load, copy, and extract jobs alike. When I tried to pass labels: { k1: 'v1', ... } to startQuery, it seems to be passed inside the configuration.query object, the server side seems to ignore it, and the returned job doesn't come with any labels.

https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query

An example from bq tool:

$ bq --apilog - --nosync --format prettyjson query --label k1:v1 '#standardSQL
          SELECT 1'

INFO:root:body: {"configuration": {"query": {"query": "#standardSQL\nSELECT 1"}, "labels": {"k1": "v1"}}, "jobReference": {"projectId": "XXXXX", "jobId": "bqjob_XXXXXXXXXXXXXXXXX"}}

INFO:root:{
 "kind": "bigquery#job",
 "etag": "\"..........................................\"",
 "id": "..............",
 "selfLink": "https://www.googleapis.com/bigquery/v2/projects/.........",
 "jobReference": {
  ...
 },
 "configuration": {
  "query": {
   "query": "#standardSQL\nSELECT 1",
   "destinationTable": {
      ...
   },
   "createDisposition": "CREATE_IF_NEEDED",
   "writeDisposition": "WRITE_TRUNCATE",
   "useLegacySql": false
  },
  "labels": {
   "k1": "v1"
  }
 },
 "status": {
  "state": "RUNNING"
 },
 "statistics": {
  "creationTime": "1489959702315",
  "startTime": "1489959702555"
 },
...

The bq command line tool seems to be the only one that supports labels on jobs right now. I have looked through GoogleCloudPlatform/google-cloud-node, GoogleCloudPlatform/google-cloud-go, and GoogleCloudPlatform/google-cloud-python; Python still needs googleapis/google-cloud-python#2931 to add support for labels on datasets/tables, but Node.js already supports labels on datasets/tables by nature of the setMetadata method calling PATCH, so it doesn't need any special work in the client libraries.

But for labels on jobs, I believe the Node.js client still requires some changes; Python and Go would need similar changes as well.
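
For datasets and tables, the setMetadata route mentioned above is enough on its own; a minimal sketch with placeholder names and label values:

const BigQuery = require('@google-cloud/bigquery');
const bigquery = new BigQuery({projectId: 'my-project'});

// Attach labels to a dataset via setMetadata, which issues a PATCH request.
bigquery
  .dataset('my_dataset')
  .setMetadata({labels: {team: 'analytics', env: 'prod'}})
  .then(([apiResponse]) => console.log(apiResponse.labels))
  .catch(err => console.error('ERROR:', err));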

Copied from original issue: googleapis/google-cloud-node#2107

Incomplete error object for streaming inserts

I'm using nodejs-bigquery version 1.0.

I'm using the following code to stream records into the database:

module.exports.sendToBigQuery = (rows) => {
    bigquery
        .dataset(DATASET_NAME)
        .table(TABLE_NAME)
        .insert(rows)
        .catch(err => {
            if (err && err.name === 'PartialFailureError') {
                if (err.errors && err.errors.length > 0) {
                    console.log('Insert errors:');
                    err.errors.forEach(err => console.error(err));
                }
            } else {
                console.error('ERROR:', err);
            }
        });
};

Unfortunately, whenever my data doesn't match the schema, all I get is this cryptic error object: { errors: [ { message: 'no such field.', reason: 'invalid' } ],

There is no location field that would tell me which field is missing, which makes debugging this code a nightmare for more complex schemas.

Is there any way to enable a debug level of errors somehow? Or is it just a bug in the client implementation? Any idea how I could access this information?
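
For what it's worth, each entry in err.errors from insert() pairs the rejected row with its error list, which at least identifies which record was refused even when the server omits a location field. A minimal sketch with placeholder names:

const BigQuery = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

bigquery
  .dataset('my_dataset')
  .table('my_table')
  .insert([{nonexistent_field: 1}])
  .catch(err => {
    if (err && err.name === 'PartialFailureError') {
      // Each entry carries the original row object alongside its errors.
      err.errors.forEach(failure => {
        console.error('Rejected row:', JSON.stringify(failure.row));
        (failure.errors || []).forEach(e =>
          console.error(`  ${e.reason}: ${e.message}`)
        );
      });
    } else {
      console.error('ERROR:', err);
    }
  });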

Cannot convert value to floating point (bad value) when using 0 or 0.0 as the floating point value on table.insert

Environment details

  • OS: Running this in Cloud Functions
  • Node.js version: Running this in Cloud Functions
  • npm version: Running this in Cloud Functions
  • @google-cloud/bigquery version: 0.11.1

Steps to reproduce

I've been passing a field from a CSV to the Natural Language API to fetch sentiments and then inserting them into BigQuery. However, there are cases where users do not enter any text, and I want to do an insert anyway with the sentiment scores set to zero. If I try manually setting them to zero, however:

	row["Sentimentscore"] = 0.0;
	row["Sentimentmagnitude"] = 0.0;

I get a bad value error. I've tried several variations (0, 0.0, '') and get the same error each time. I know that a manual load into BigQuery with a value of 0.0 works, but I can't get it to work here, which perplexes me. All other values returned from the NL API that aren't zero are inserted without errors.

My insert code:

bigquery
  .dataset(datasetId)
  .table(tableId)
  .insert(row, options)
  .then(() => {
    console.log('Inserted : ', JSON.stringify(row));
  })
  .catch(err => {
    if (err && err.name === 'PartialFailureError') {
      if (err.errors && err.errors.length > 0) {
        console.log('Insert errors:');
        err.errors.forEach(err => console.error(err));
      }
    } else {
      console.error('ERROR:', err);
    }
  });

P.S. first time submitting an issue in a repo, apologies in advance if I made any mistakes.

delete, create, insert into table - silent failure

  • OS: Windows
  • Node.js version: v8.9.4
  • npm version: 5.6.0
  • @google-cloud/bigquery version: ^1.0.0

Steps to reproduce

I copied and pasted the sample examples to put together an ETL data refresh process.
Using the Node.js BigQuery API, I wrote a script that does the following with chained promises:

  1. delete xyz table
  2. create xyz table
  3. insert refreshed rows into xyz table using streaming insert api
  4. verify getRows matches inserted rows.length
    • Verification fails: 0 rows were inserted and the catch is not executed.

Expected: a non-silent failure, or promises that wait until the work is fully complete.

const BigQuery = require('@google-cloud/bigquery');
const Papa = require('papaparse');
const path = require('path');
const fs = require('fs');


const projectId = "projectid";
const datasetId = "sqlserver";
const tableId = "xyz";

const bigquery = new BigQuery({
  projectId: projectId,
});

const initBigQueryTable = () => {
  const schema = {
    fields: [{
      "mode": "NULLABLE",
      "name": "EventId",
      "type": "INTEGER"
    }]
  };

  // Delete original table.
  return bigquery
    .dataset(datasetId)
    .table(tableId)
    .delete().then(() => {
      console.log('Deleted:', tableId);
      // Create a new table in the dataset
      return bigquery
        .dataset(datasetId)
        .createTable(tableId, { schema: schema })
        .then(() => {
          console.log('Created:', tableId);
        })
    });
};

const importRows = (toInsertRows) => {

  return bigquery
    .dataset(datasetId)
    .table(tableId)
    .getRows().then(results => {
      const rows = results[0];
      if (rows.length == 0) {
        return bigquery
          .dataset(datasetId)
          .table(tableId)
          .insert(toInsertRows)
          .catch(err => {
            if (err && err.name === 'PartialFailureError') {
              if (err.errors && err.errors.length > 0) {
                console.log('Insert errors:');
                err.errors.forEach(err => console.error(err));
              }
            } else {
              console.error('ERROR:', err);
            }
          });
      } else {
        throw new Error(`new table should have zero rows. Found ${rows.length} expected 0.`);
      }
    }).then(() => {
      return bigquery
        .dataset(datasetId)
        .table(tableId)
        .getRows().then(results => {
          console.log(`Should have inserted ${toInsertRows.length} rows.`);
          const rows = results[0];
          if (rows.length != toInsertRows.length) {
            throw new Error(`rows should match inserted rows. Found ${rows.length} expected ${toInsertRows.length}.`);
          }
        })
    })
};


const csvRows = fs.readFileSync('./xyz.csv', "utf8").trim();
Papa.parse(csvRows, {
  header: true,
  delimiter: ',',
  complete: function (results) {
    if ((results.errors || []).length > 0) {
      console.log('failed', results.errors);
      return;
    }

    initBigQueryTable().then(() => {
      return importRows(results.data);
    }).then(() => {
      console.log('done');
      process.exit(0);
    }).catch(err => {
      console.error('ERROR:', err);
      process.exit(1);
    });
  }
});


update or patch

Hi,
Is there any way to update or patch a table without deleting/copying it first?
It would be great to be able to insert columns in a specific order. If I have to delete the table, it is going to mess up my charts.

I can't find any documentation about it.
I tried

table.patch(table, options).then(function(data) {
  console.log(data);
});

and

table.update(table, options).then(function(data) {
  console.log(data);
});

Thanks
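
One approach that avoids deleting the table is to update the schema in place with getMetadata/setMetadata. A minimal sketch with placeholder names (note that BigQuery only allows appending new columns, not reordering existing ones):

const BigQuery = require('@google-cloud/bigquery');
const bigquery = new BigQuery();
const table = bigquery.dataset('my_dataset').table('my_table');

// Fetch the current schema, append a NULLABLE column, and PATCH it back.
table
  .getMetadata()
  .then(([metadata]) => {
    metadata.schema.fields.push({
      name: 'new_column',
      type: 'STRING',
      mode: 'NULLABLE',
    });
    return table.setMetadata({schema: metadata.schema});
  })
  .then(() => console.log('Column added'))
  .catch(err => console.error('ERROR:', err));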

FR: Let clients use encodeValue function

I'd really like to do this:

const bq = BigQuery({ projectId: 'prjctd' })
const table = bq.dataset('dtst').table('tbl')
const valueToInsert = {
  insertId: 'myInsertId',
  json: table.encodeValue_(myItem)
}

but currently the function is not exposed. I could copy it into my own project, of course, but it would be nicer to be able to use it directly.
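
For context, the {insertId, json} shape above matches what insert() already accepts when the raw option is set; the json payload just has to be encoded by hand instead of by encodeValue_. A minimal sketch reusing the names from the snippet:

const BigQuery = require('@google-cloud/bigquery');
const bq = new BigQuery({projectId: 'prjctd'});
const table = bq.dataset('dtst').table('tbl');

// Raw streaming insert: rows follow the tabledata.insertAll wire format,
// so insertId and the pre-encoded json payload are supplied directly.
const rows = [{insertId: 'myInsertId', json: {name: 'example', count: 1}}];

table
  .insert(rows, {raw: true})
  .then(() => console.log('Inserted'))
  .catch(err => console.error('ERROR:', err));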

dryRun option stopped working in 1.0.0

For some of my integration tests I utilize the dryRun option to make sure my query is syntactically correct while avoiding getting charged for the processing.

Upgrading from 0.9.6 to 1.0.0 gives me an error every time I execute a query with the dryRun option enabled.

bq.query({
  query: queryString,
  useLegacySql: true,
  dryRun: true,
});

Testing with the query string SELECT * FROM publicdata.samples.natality LIMIT 5; I get a really weird error:

  ErrorClass {
      code: 404,
      errors:
       [ { domain: 'global',
           reason: 'notFound',
           message: 'Not found: Job MY_PROJECT_ID:SOME_UUID' } ],
      response: undefined,
      message: 'Not found: Job MY_PROJECT_ID:SOME_UUID' }

With dryRun turned off, it all works as normal.

  • OS: mac
  • Node.js version: 8.9.1
  • npm version: 5.6.0
  • @google-cloud/bigquery version: 1.0.0
