databricks / databricks-sql-nodejs Goto Github PK

View Code? Open in Web Editor NEW

24.0 10.0 32.0 1.84 MB

Databricks SQL Connector for Node.js

License: Apache License 2.0

TypeScript 99.84% JavaScript 0.16%

databricks node node-js nodejs sql dwh

databricks-sql-nodejs's Introduction

Databricks SQL Driver for Node.js

Description

The Databricks SQL Driver for Node.js is a Javascript driver for applications that connect to Databricks clusters and SQL warehouses. This project is a fork of Hive Driver which connects via Thrift API.

Requirements

Node.js 14 or newer

Installation

npm i @databricks/sql

Usage

examples/usage.js

const { DBSQLClient } = require('@databricks/sql');

const client = new DBSQLClient();

client
  .connect({
    host: '********.databricks.com',
    path: '/sql/2.0/warehouses/****************',
    token: 'dapi********************************',
  })
  .then(async (client) => {
    const session = await client.openSession();

    const queryOperation = await session.executeStatement('SELECT "Hello, World!"');
    const result = await queryOperation.fetchAll();
    await queryOperation.close();

    console.table(result);

    await session.close();
    await client.close();
  })
  .catch((error) => {
    console.log(error);
  });

Run Tests

Unit tests

You can run all unit tests, or specify a specific test to run:

npm test
npm test -- <path/to/file.test.js>

e2e tests

Before running end-to-end tests, create a file named tests/e2e/utils/config.local.js and set the Databricks SQL connection info:

{
    host: '***.databricks.com',
    path: '/sql/2.0/warehouses/***',
    token: 'dapi***',
    database: ['catalog', 'database'],
}

Then run

npm run e2e
npm run e2e -- <path/to/file.test.js>

Contributing

See CONTRIBUTING.md

Issues

If you find any issues, feel free to create an issue or send a pull request directly.

License

Apache License 2.0

databricks-sql-nodejs's People

Contributors

Stargazers

Watchers

databricks-sql-nodejs's Issues

Aliasing columns in a query results in the query's result missing data

Check existing issues for a duplicate of this bug

Summary

This query returns the expected data,

SELECT carat as a, color as b FROM default.diamonds LIMIT 2;

-- Result
┌─────────┬────────┬─────┐
│ (index) │   a    │  b  │
├─────────┼────────┼─────┤
│    0    │ '0.23' │ 'E' │
│    1    │ '0.21' │ 'E' │
└─────────┴────────┴─────┘

Whereas this query returns results missing data,

SELECT carat as a, color as a FROM default.diamonds LIMIT 2;

-- Result
┌─────────┬─────┐
│ (index) │  a  │
├─────────┼─────┤
│    0    │ 'E' │
│    1    │ 'E' │
└─────────┴─────┘

Is it possible to handle this scenario properly so we get the right data for such queries?

Reproduction

You'll find a minimal and complete reproduction example here that you can run yourself https://github.com/varun-dc/databricks-nodejs-duplicate-column-select-bug-reproduction

Fail to connect using an all-purpose cluster when it's running

Hi folks,
I'm having trouble connecting to a terminated all-purpose cluster. If the cluster is not running, it will fail.
Appreciate any help!
Cheers,
Quang

My code

      const client = await dbsqlClient.connect({
        host: process.env["DB_HOST"],
        path: process.env["DB_PATH"],
        token: process.env["DB_TOKEN"],
      });

Error Message

[error] Worker 98804825-b3c9-4902-bf7b-9e39aa2a1dae uncaught exception (learn more: https://go.microsoft.com/fwlink/?linkid=2097909 ): TypeError: Converting circular structure to JSON --> starting at object with constructor 'TLSSocket' | property 'parser' -> object with constructor 'HTTPParser' --- property 'socket' closes the circle at JSON.stringify (<anonymous>) at exports.HttpConnection.<anonymous> (C:\Users\mmqqq\Workspace\SmarterKnowledge\websites\api\node_modules\@databricks\sql\dist\DBSQLClient.js:92:69) at exports.HttpConnection.emit (node:events:390:28) at exports.HttpConnection.responseCallback (C:\Users\mmqqq\Workspace\SmarterKnowledge\websites\api\node_modules\thrift\lib\nodejs\lib\thrift\http_connection.js:173:12) at ClientRequest.connection.responseCallback (C:\Users\mmqqq\Workspace\SmarterKnowledge\websites\api\node_modules\@databricks\sql\dist\connection\connections\HttpConnection.js:67:30) at Object.onceWrapper (node:events:510:26) at ClientRequest.emit (node:events:390:28) at HTTPParser.parserOnIncomingClient [as onIncoming] (node:_http_client:623:27) at HTTPParser.parserOnHeadersComplete (node:_http_common:128:17) at TLSSocket.socketOnData (node:_http_client:487:22) [2022-11-04T03:03:20.959Z] Waiting for the debugger to disconnect... [2022-11-04T03:03:21.122Z] Executed 'Functions.GetGeoJson' (Failed, Id=296d02dd-006a-473f-8019-b1743fd24641, Duration=1057ms) [2022-11-04T03:03:21.123Z] System.Private.CoreLib: Exception while executing function: Functions.GetGeoJson. System.Private.CoreLib: node exited with code 1 LanguageWorkerConsoleLog[error] Worker 98804825-b3c9-4902-bf7b-9e39aa2a1dae uncaught exception (learn more: https://go.microsoft.com/fwlink/?linkid=2097909 ): TypeError: Converting circular structure to JSON --> starting at object with constructor 'TLSSocket' | property 'parser' -> object with constructor 'HTTPParser' --- property 'socket' closes the circle at JSON.stringify (<anonymous>) at exports.HttpConnection.<anonymous> (C:\Users\mmqqq\Workspace\SmarterKnowledge\websites\api\node_modules\@databricks\sql\dist\DBSQLClient.js:92:69) at exports.HttpConnection.emit (node:events:390:28) at exports.HttpConnection.responseCallback (C:\Users\mmqqq\Workspace\SmarterKnowledge\websites\api\node_modules\thrift\lib\nodejs\lib\thrift\http_connection.js:173:12) at ClientRequest.connection.responseCallback (C:\Users\mmqqq\Workspace\SmarterKnowledge\websites\api\node_modules\@databricks\sql\dist\connection\connections\HttpConnection.js:67:30) at Object.onceWrapper (node:events:510:26) at ClientRequest.emit (node:events:390:28) at HTTPParser.parserOnIncomingClient [as onIncoming] (node:_http_client:623:27) at HTTPParser.parserOnHeadersComplete (node:_http_common:128:17) at TLSSocket.socketOnData (node:_http_client:487:22).

HTTP connection doesn't use http keep-alive

The node.js driver is currently not using HTTP keep alive.The thrift HTTP connection sets the "Keep-Alive" header in https://github.com/apache/thrift/blob/66d897667c451ef6560d89b979b7001c57a3eda6/lib/nodejs/lib/thrift/http_connection.js#L101 but that's not sufficient. We also need to configure the node.js http client for keep-alive (see https://stackoverflow.com/questions/28229044/http-keep-alive-in-node-js).

Turning on keep-alive on a low latency connection resulted in 30%-50% performance improvements for me.

Use TypeScript for tests

PECO-1390

await client.openSession() unhandled error

I cannot find any solution to this issue when our cluster is 'inactive'. I also cannot find a way to test the connection status.

Additionally, it seems like the try catch isn't able to trap the error.

Connect fails - due to missing package.json

Symptom:

During client.connect - the code tries to create a User-Agent with the package version.
However the buildUserAgentString - relies on a specific location for its package.json, which is hardcoded.

const json = JSON.parse(fs.readFileSync(path.join(__dirname, '../../package.json')).toString());

If the file is not available (in case of a mono repo setup), it fails.

Fix requested:
If the file is not available, it should print a warning but not raise an exception.

Columns with '.' in name break

When selecting columns such as read.id the result contains keys named id instead of read.id.

Is there an option that I'm missing or is that a bug in the client?

Connecting to third party databases

Question: is it possible to use this library to connect to a third party database such as teradata.

This can be accomplished in Databricks directly by using python to import your drivers and load the data. Is is possible to use this library to either execute python code, trigger a workbook that has python code, or some other solution that connects us to out third party databases?

Getting error on insert "Column b is not specified in INSERT"

Hey,
I created simple table:

CREATE TABLE schema_1.test_11 (a DECIMAL(29,0), b STRING);

Tried to run simple insert, the same insert works from with no issue in Databricks Notbook:

INSERT INTO schema_1.test_11 (`a`) VALUES (0), (-1);

Got Exception:

Error: The operation failed due to an error
    at OperationStateError.HiveDriverError [as constructor] (/usr/src/app/node_modules/@databricks/sql/dist/errors/HiveDriverError.js:21:42)
    at new OperationStateError (/usr/src/app/node_modules/@databricks/sql/dist/errors/OperationStateError.js:25:28)
    at WaitUntilReady.isReady (/usr/src/app/node_modules/@databricks/sql/dist/utils/WaitUntilReady.js:98:23)
    at WaitUntilReady.<anonymous> (/usr/src/app/node_modules/@databricks/sql/dist/utils/WaitUntilReady.js:69:44)
    at step (/usr/src/app/node_modules/@databricks/sql/dist/utils/WaitUntilReady.js:33:23)
    at Object.next (/usr/src/app/node_modules/@databricks/sql/dist/utils/WaitUntilReady.js:14:53)
    at fulfilled (/usr/src/app/node_modules/@databricks/sql/dist/utils/WaitUntilReady.js:5:58)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5) {
  response: {
    status: {
      statusCode: 0,
      infoMessages: null,
      sqlState: null,
      errorCode: null,
      errorMessage: null
    },
    operationState: 5,
    sqlState: null,
    errorCode: 0,
    errorMessage: 'org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: Column b is not specified in INSERT\n' +
      '\tat org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:47)\n' +
      '\tat org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:435)\n' +
      '\tat org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:257)\n' +
      '\tat scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)\n' +
      '\tat org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties(ThriftLocalProperties.scala:123)\n' +
      '\tat org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties$(ThriftLocalProperties.scala:48)\n' +
      '\tat org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:52)\n' +
      '\tat org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:235)\n' +
      '\tat org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:220)\n' +
      '\tat java.security.AccessController.doPrivileged(Native Method)\n' +
      '\tat javax.security.auth.Subject.doAs(Subject.java:422)\n' +
      '\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)\n' +
      '\tat org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:269)\n' +
      '\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n' +
      '\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n' +
      '\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n' +
      '\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n' +
      '\tat java.lang.Thread.run(Thread.java:748)\n' +
      'Caused by: org.apache.spark.sql.AnalysisException: Column b is not specified in INSERT\n' +
      '\tat com.databricks.sql.transaction.tahoe.DeltaErrors$.missingColumnsInInsertInto(DeltaErrors.scala:1613)\n' +
      '\tat com.databricks.sql.transaction.tahoe.DeltaAnalysis.$anonfun$resolveQueryColumnsByName$3(DeltaAnalysis.scala:644)\n' +
      '\tat com.databricks.sql.transaction.tahoe.DeltaAnalysis.$anonfun$resolveQueryColumnsByName$3$adapted(DeltaAnalysis.scala:641)\n' +
      '\tat scala.collection.Iterator.foreach(Iterator.scala:943)\n' +
      '\tat scala.collection.Iterator.foreach$(Iterator.scala:943)\n' +
      '\tat scala.collection.AbstractIterator.foreach(Iterator.scala:1431)\n' +
      '\tat scala.collection.IterableLike.foreach(IterableLike.scala:74)\n' +
      '\tat scala.collection.IterableLike.foreach$(IterableLike.scala:73)\n' +
      '\tat org.apache.spark.sql.types.StructType.foreach(StructType.scala:104)\n' +
      '\tat com.databricks.sql.transaction.tahoe.DeltaAnalysis.com$databricks$sql$transaction$tahoe$DeltaAnalysis$$resolveQueryColumnsByName(DeltaAnalysis.scala:641)\n' +
      '\tat com.databricks.sql.transaction.tahoe.DeltaAnalysis$$anonfun$apply$1.applyOrElse(DeltaAnalysis.scala:93)\n' +
      '\tat com.databricks.sql.transaction.tahoe.DeltaAnalysis$$anonfun$apply$1.applyOrElse(DeltaAnalysis.scala:79)\n' +
      '\tat org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$2(AnalysisHelper.scala:171)\n' +
      '\tat org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:167)\n' +
      '\tat org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:171)\n' +
      '\tat org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:324)\n' +
      '\tat org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:169)\n' +
      '\tat org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:165)\n' +
      '\tat org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:30)\n' +
      '\tat org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:161)\n' +
      '\tat org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:160)\n' +
      '\tat org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:30)\n' +
      '\tat com.databricks.sql.transaction.tahoe.DeltaAnalysis.apply(DeltaAnalysis.scala:79)\n' +
      '\tat com.databricks.sql.transaction.tahoe.DeltaAnalysis.apply(DeltaAnalysis.scala:72)\n' +
      '\tat org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$3(RuleExecutor.scala:216)\n' +
      '\tat com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)\n' +
      '\tat org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:216)\n' +
      '\tat scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)\n' +
      '\tat scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)\n' +
      '\tat scala.collection.immutable.List.foldLeft(List.scala:91)\n' +
      '\tat org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:213)\n' +
      '\tat org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:205)\n' +
      '\tat scala.collection.immutable.List.foreach(List.scala:431)\n' +
      '\tat org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:205)\n' +
      '\tat org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:301)\n' +
      '\tat org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$1(Analyzer.scala:294)\n' +
      '\tat org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:196)\n' +
      '\tat org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:294)\n' +
      '\tat org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:222)\n' +
      '\tat org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:184)\n' +
      '\tat org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:126)\n' +
      '\tat org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:184)\n' +
      '\tat org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:274)\n' +
      '\tat org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:331)\n' +
      '\tat org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:273)\n' +
      '\tat org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:128)\n' +
      '\tat com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)\n' +
      '\tat org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:151)\n' +
      '\tat org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:265)\n' +
      '\tat org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:968)\n' +
      '\tat org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:265)\n' +
      '\tat org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:129)\n' +
      '\tat org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:126)\n' +
      '\tat org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:118)\n' +
      '\tat org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$compileQuery$2(SparkExecuteStatementOperation.scala:340)\n' +
      '\tat org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:968)\n' +
      '\tat org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$compileQuery$1(SparkExecuteStatementOperation.scala:334)\n' +
      '\tat org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getOrCreateDF(SparkExecuteStatementOperation.scala:327)\n' +
      '\tat org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.compileQuery(SparkExecuteStatementOperation.scala:334)\n' +
      '\tat org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:390)\n' +
      '\t... 16 more\n',
    taskStatus: null,
    operationStarted: null,
    operationCompleted: null,
    hasResultSet: null,
    progressUpdateResponse: null,
    numModifiedRows: null
  }
}

Fetching structs containing datetimes fails silently

In this step,

databricks-sql-nodejs/lib/result/JsonResult.ts

Line 121 in c99dfd1

private toJSON(value: any, defaultValue: any): any {

databricks-sql-nodejs/lib/result/JsonResult.ts

Line 89 in c99dfd1

case TTypeId.STRUCT_TYPE:

If the data being parsed contains datetimes, it fails and returns defaultValue. Example error and data are below:

SyntaxError: Unexpected number in JSON at position 39
    at JSON.parse (<anonymous>)
    at JsonResult.toJSON (./databricks-sql-nodejs/dist/result/JsonResult.js:93:25)
    at JsonResult.convertData (./databricks-sql-nodejs/dist/result/JsonResult.js:64:29)
    at ./databricks-sql-nodejs/dist/result/JsonResult.js:47:25
    at Array.map (<anonymous>)
    at JsonResult.getSchemaValues (./databricks-sql-nodejs/dist/result/JsonResult.js:43:35)
    at ./databricks-sql-nodejs/dist/result/JsonResult.js:27:62
    at Array.reduce (<anonymous>)
    at JsonResult.getRows (./databricks-sql-nodejs/dist/result/JsonResult.js:27:28)
    at ./databricks-sql-nodejs/dist/result/JsonResult.js:16:31

The string that's failing to parse is below (partially scrubbed)
'{"id":414247,"created_at":2021-12-21 21:33:59.339,"updated_at":2021-12-21 21:33:59.339,"deleted_at":null,"s3_bucket":"thebucket","s3_key":"c411f24d-1b4a-4eb0-b25b-d2287c7ba3c0"}'

Also, would it make sense to at least log a warning if parsing fails and returns the default value?

Parameterized queries

Hi there

In the examples you insert data in tables like this:

session.executeStatement('INSERT INTO pokes VALUES(123, "Hello, world!"');

Are parameterized queries comming or how would you handle sanitizing this string against sql injections?
We need to insert user input.

Thank you

"Cannot read property 'waitUntilReady' of undefined"

Hi team,

I am using the code example below,

https://docs.databricks.com/dev-tools/nodejs-sql-driver.html#language-JavaScript

but got "Cannot read property 'waitUntilReady' of undefined" recently.

missing errorCode and statusCode on errors

Hi,

in case of error there is no indication in errorCode and statusCode (or maybe also other fields), only the message

response: {
    status: {
      statusCode: 0,
      infoMessages: null,
      sqlState: null,
      errorCode: null,
      errorMessage: null
    },
    operationState: 5,
    sqlState: null,
    errorCode: 0,
    errorMessage: 'org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.parser.ParseException: \n' +
      'DataType string(35) is not supported.(line 6, pos 28)\n' +
	  ...
	  ...
    taskStatus: null,
    operationStarted: null,
    operationCompleted: null,
    hasResultSet: null,
    progressUpdateResponse: null,
    numModifiedRows: null
  }

THTTPException: Received a response with a bad HTTP status code: 400

Hi! I had TypeError: Converting circular structure to JSON" when calling client.openSession() error. After applying changes from #89 I managed to get the real error:

THTTPException: Received a response with a bad HTTP status code: 400
    at exports.HttpConnection.responseCallback (/workspace/node_modules/thrift/lib/nodejs/lib/thrift/http_connection.js:173:26)
    at ClientRequest.connection.responseCallback (/workspace/node_modules/@databricks/sql/dist/connection/connections/HttpConnection.js:67:30)
    at Object.onceWrapper (node:events:628:26)
    at ClientRequest.emit (node:events:513:28)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (node:_http_client:693:27)
    at HTTPParser.parserOnHeadersComplete (node:_http_common:128:17)
    at TLSSocket.socketOnData (node:_http_client:534:22)
    at TLSSocket.emit (node:events:513:28)
    at addChunk (node:internal/streams/readable:315:12)
    at readableAddChunk (node:internal/streams/readable:289:9)

the credentials are right because I'm refactoring the code that uses odbc + simbaspark driver and it works with the same credentials.

token in demo file

Most examples are properly sanitized, but in one file i found some tokens, that probably shouldnt be there

See: https://github.com/databricks/databricks-sql-nodejs/blob/main/examples/logging.js#L12

Doc issue

In the documentation https://github.com/databricks/databricks-sql-nodejs/blob/main/docs/readme.md#dbsqlsession there's the following code sample:

...
const operation = await session.executeStatement(
    'CREATE TABLE IF NOT EXISTS pokes (foo INT, bar STRING)',
    { runSync: true }
);

I think there's a small mistyping for property runAsync in the optionsobject which is written as runSync

Getting build issue if we give npm run build

> [email protected] build
> tsc -p tsconfig.json

node_modules/@databricks/sql/dist/contracts/IDBSQLSession.d.ts:1:23 - error TS2688: Cannot find type definition file for 'node-int64'.

1 /// <reference types="node-int64" />
                        ~~~~~~~~~~

node_modules/@databricks/sql/dist/DBSQLClient.d.ts:3:8 - error TS1192: Module '"C:/MPM/MPM1/isc-mpm-db-sys-services/src/production/ISC-MPM-db-sys-services/node_modules/@databricks/sql/thrift/TCLIService"' has no default export.

3 import TCLIService from '../thrift/TCLIService';
         ~~~~~~~~~~~

node_modules/@databricks/sql/dist/DBSQLLogger.d.ts:1:8 - error TS1259: Module '"C:/MPM/MPM1/isc-mpm-db-sys-services/src/production/ISC-MPM-db-sys-services/node_modules/winston/index"' can only be default-imported using the 'esModuleInterop' flag

1 import winston, { Logger } from 'winston';
         ~~~~~~~

  node_modules/winston/index.d.ts:219:1
    219 export = winston;
        ~~~~~~~~~~~~~~~~~
    This module is declared with 'export =', and can only be used with a default import when using the 'esModuleInterop' flag.

node_modules/@databricks/sql/dist/dto/InfoValue.d.ts:2:23 - error TS2688: Cannot find type definition file for 'node-int64'.

2 /// <reference types="node-int64" />
                        ~~~~~~~~~~

node_modules/@databricks/sql/dist/hive/HiveDriver.d.ts:1:8 - error TS1192: Module '"C:/MPM/MPM1/isc-mpm-db-sys-services/src/production/ISC-MPM-db-sys-services/node_modules/@databricks/sql/thrift/TCLIService"' has no default export.

1 import TCLIService from '../../thrift/TCLIService';
         ~~~~~~~~~~~

node_modules/@databricks/sql/dist/index.d.ts:2:8 - error TS1192: Module '"C:/MPM/MPM1/isc-mpm-db-sys-services/src/production/ISC-MPM-db-sys-services/node_modules/@databricks/sql/thrift/TCLIService"' has no default export.

2 import TCLIService from '../thrift/TCLIService';

Support for Azure AD tokens instead of PAT tokens

The Python SQL driver supports the use of Azure AD tokens instead of PAT tokens.

Can the NodeJS driver add support for using Azure AD tokens (Bearer) rather than PAT tokens (Basic).

Thank you

uncaughtexception of cluster is temporary unavailable

I want to know if we can anyhow catch the uncaughtexception of cluster state when the cluster is off and we hit the query through executeStatement and return that in API response so that user maybe notified he has to wait?

Setup E2E tests with GitHub Actions

Questions about SQL Query metrics

Hello :)

We are new to using the databricks sql driver and have a question.

After executing a query, is there a way to see the metrics for that query?

We were able to see the metrics in the Databricks Cloud SQL History like the screenshot below.

I would like to receive this metric information after a successful query execution in the SQL driver. (even with the query progress callback if possible)

Do you currently have a way to receive metrics or are there plans to provide a way?

Thank you.

Add Parameters to session.executeStatement for Parameterized Queries

Hello,

We would like to have parameterized queries instead of passing a string literal to session.executeStatement. Perhaps there is a way to do this and I missed it in the documentation.

Something like this would be nice:

const queryOperation = await session.executeStatement('select from table where id = %(id)', { id: '123' });

I am curious what other db drivers do implementation wise to protect against sql injection but those are the protections that I would hope to see.

Much appreciated.

Invalid configuration value detected for fs.azure.account.key - after update

Hi,

After updating this api, I started facing errors in Azure.

My queries before were running fine using the 'utils' package from DBSQLClient, but now, it is giving this error:

Error running query: Failure to initialize configuration Invalid configuration value detected for fs.azure.account.key

From my research, this error is related to authentication in Azure's File Storage.

If this is the case, how can I pass my credentials to correctly execute the queries?
If not, how can I solve this problem?

Thanks.

OperationStateError: The operation failed due to an error
    at OperationStatusHelper.<anonymous> (C:\Users\adrianofe\Documents\incident-handling\node_modules\@databricks\sql\dist\DBSQLOperation\OperationStatusHelper.js:92:27)
    at Generator.next (<anonymous>)
    at fulfilled (C:\Users\adrianofe\Documents\incident-handling\node_modules\@databricks\sql\dist\DBSQLOperation\OperationStatusHelper.js:5:58)
    at processTicksAndRejections (internal/process/task_queues.js:95:5) {
  response: {
    status: {
      statusCode: 0,
      infoMessages: null,
      sqlState: null,
      errorCode: null,
      errorMessage: null
    },
    operationState: 5,
    sqlState: '42000',
    errorCode: 0,
    errorMessage: 'org.apache.hive.service.cli.HiveSQLException: Error running query: Failure to initialize configurationInvalid configuration value detected for fs.azure.account.key\n' +
      '\tat org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:53)\n' +
      '\tat org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:435)\n' +
      '\tat org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:257)\n' +
      '\tat scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)\n' +
      '\tat org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties(ThriftLocalProperties.scala:123)\n' +
      '\tat org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties$(ThriftLocalProperties.scala:48)\n' +
      '\tat org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:52)\n' +
      '\tat org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:235)\n' +
      '\tat org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:220)\n' +
      '\tat java.security.AccessController.doPrivileged(Native Method)\n' +
      '\tat javax.security.auth.Subject.doAs(Subject.java:422)\n' +

[release] When will GA release be?

Hey guys,

I'm actively using this package and wondering:

When a new release will be published on npm
When this package will be out of beta. Is there a timeline somewhere?

`HiveUtils.fetchAll` fetches one block too many

I noticed that HiveUtils.fetchAll calls driver.fetchResults twice for small operations that fit in a single response. Apparently checkIfOperationHasMoreRows returns true if the response contains any rows regardless of the hasMoreRows value provided by the server.

This might not be a big issue for fetching large datasets but for small operations on high latency connections this is pretty impactful.

Add configuration for http.Agent

Follow-up on #24

Make parameters for http agent configurable by the end user

getTables giving back weird response

Hi,
Running into a bit of an issue while running the getTables function.
const session = await client.openSession( {initialCatalog: 'hive_metastore'} ); const tables = await session.getTables( { catalogName: 'TARGET_DATABASE', schemaName: 'TARGET_SCHEMA', tableTypes: ['TABLE'], } ); console.log(tables) const views = await session.getTables( { catalogName: 'TARGET_DATABASE', schemaName: 'TARGET_SCHEMA', tableTypes: ['VIEW'], } );

Getting this as a response.
DBSQLOperation { driver: HiveDriver { client: { output: [THeaderTransport], pClass: [Function], _seqid: 2, _reqs: {} } }, operationHandle: { operationId: { guid: <Buffer 71 ed 93 70 8c 4b 4d b0 8c 70 b7 5d 6d 8a e3 82>, secret: <Buffer 1f 81 14 76 e4 7c 43 48 ab 8a a2 8d 3d de 51 9a> }, operationType: 4, hasResultSet: true, modifiedRowCount: null }, logger: DBSQLLogger { transports: { console: [Console], file: [File] }, logger: DerivedLogger { _readableState: [ReadableState], readable: true, _events: [Object: null prototype], _eventsCount: 3, _maxListeners: undefined, _writableState: [WritableState], writable: true, allowHalfOpen: true, _transformState: [Object], silent: undefined, format: [Format], defaultMeta: null, levels: [Object], level: 'info', exceptions: [ExceptionHandler], rejections: [RejectionHandler], profilers: {}, exitOnError: true, [Symbol(kCapture)]: false } }, _status: OperationStatusHelper { statusFactory: StatusFactory {}, state: 2, hasResultSet: true, driver: HiveDriver { client: [Object] }, operationHandle: { operationId: [Object], operationType: 4, hasResultSet: true, modifiedRowCount: null }, operationStatus: { status: [Object], operationState: 2, sqlState: null, errorCode: null, errorMessage: null, taskStatus: null, operationStarted: null, operationCompleted: null, hasResultSet: null, progressUpdateResponse: null, numModifiedRows: null, displayMessage: null, diagnosticInfo: null } }, _schema: SchemaHelper { statusFactory: StatusFactory {}, metadata: { status: [Object], schema: [Object], resultFormat: 1, lz4Compressed: null, arrowSchema: null, cacheLookupResult: null, uncompressedBytes: null, compressedBytes: null }, driver: HiveDriver { client: [Object] }, operationHandle: { operationId: [Object], operationType: 4, hasResultSet: true, modifiedRowCount: null } }, _data: FetchResultsHelper { fetchOrientation: 4, statusFactory: StatusFactory {}, prefetchedResults: [ [Object] ], hasMoreRows: false, driver: HiveDriver { client: [Object] }, operationHandle: { operationId: [Object], operationType: 4, hasResultSet: true, modifiedRowCount: null } }, _completeOperation: CompleteOperationHelper { statusFactory: StatusFactory {}, closed: true, cancelled: false, driver: HiveDriver { client: [Object] }, operationHandle: { operationId: [Object], operationType: 4, hasResultSet: true, modifiedRowCount: null } } }

Add ESLint

maxRows does not appear to limit returned chunk size

package version 1.0.0
nodejs 16.8
os linux

Hello again,

I'm scratching my head about the use of maxRows. I am testing its use, but a get back the whole result no matter what value I set.

For example, I've set an artificially low number for maxRows here:

let result = await queryOperation.fetchChunk({ maxRows: 500 });
console.log(result.length);

But in my test example, this will print 8544 as the length of the result, which is the full result of the query.

I am trying to implement a chunked consumption like so.

do {
    let result = await queryOperation.fetchChunk({ maxRows: 1000 });
    myStream.write(result);
} while (await queryOperation.hasMoreRows());

This never loops, because it fetches all rows the first time, and hasMoreRows returns false.

Is there something I'm missing about how maxRows, hasMoreRows and fetchChunk should work?

Thanks very much.

Slow Response Time

Hi,

Currently the response time that I got from NodeJS is between 3 to 5 seconds slower than the same query ran in the Databricks UI, e.g. in Databricks I have 0.5s, and in other side with NodeJS connector I got 4.5s.

There is anything that I can do to improve this?

I followed the documentation avaiable here: https://docs.databricks.com/dev-tools/nodejs-sql-driver.html

Thank you

Patch-Package security vulnerability

Dependabot flagged this project because it requires patch-package, which has a YAML dependency on the current version. Looks like patch-package 7.0.0 has the fix.

I can submit a PR, but I don't super understand what the package is accomplishing, so I don't know how to test if it broke anything.

Native vs. SQL-92 syntax

When connecting via ODBC, Databricks accepts a parameter to specify SQL dialect via the UseNativeQuery=1 parameter. Is it possible to do the equivalent, either at the connection, session, or operation level, in this Node library? If not, is this likely to come in the future?

"TypeError: Converting circular structure to JSON" when calling client.openSession()

package version 1.0.0
nodejs 16.8
os linux

Hi folks, I'm having trouble opening a session.

I'm essentially using the example code in your readme:

let statement = `...`;
const client = new DBSQLClient();
client
  .connect(connOptions)
  .then(async (client) => {
    const session = await client.openSession();
    const queryOperation = await session.executeStatement(statement, { runAsync: true });
    const result = await queryOperation.fetchAll();
    await queryOperation.close();
    console.table(result);
    await session.close();
    await client.close();
  })
  .catch((error) => {
    console.log(error);
  });

The client is successfully created, but I get the following error:

{
  "date": "Tue Nov 01 2022 11:28:52 GMT-0700 (Pacific Daylight Time)",
  "error": {},
  "exception": true,
  "level": "error",
  "message": "uncaughtException: Converting circular structure to JSON\n    --> starting at object with constructor 'TLSSocket'\n    |     property 'parser' -> object with constructor 'HTTPParser'\n    --- property 'socket' closes the circle\nTypeError: Converting circular structure to JSON\n    --> starting at object with constructor 'TLSSocket'\n    |     property 'parser' -> object with constructor 'HTTPParser'\n    --- property 'socket' closes the circle\n    at JSON.stringify (<anonymous>)\n    at exports.HttpConnection.<anonymous> (/***/node_modules/@databricks/sql/dist/DBSQLClient.js:92:69)\n    at exports.HttpConnection.emit (node:events:394:28)\n    at exports.HttpConnection.responseCallback (/***/node_modules/thrift/lib/nodejs/lib/thrift/http_connection.js:173:12)\n    at ClientRequest.connection.responseCallback (/***/node_modules/@databricks/sql/dist/connection/connections/HttpConnection.js:67:30)\n    at Object.onceWrapper (node:events:514:26)\n    at ClientRequest.emit (node:events:394:28)\n    at HTTPParser.parserOnIncomingClient [as onIncoming] (node:_http_client:621:27)\n    at HTTPParser.parserOnHeadersComplete (node:_http_common:128:17)\n    at TLSSocket.socketOnData (node:_http_client:487:22)",
  "os": { "loadavg": [0.24, 0.32, 0.35], "uptime": 519547.65 },
  "process": {
    "argv": [
      "/nix/store/p6h1iz5lcvfz1hg1z497ypa22rv52xxx-nodejs-16.8.0/bin/node",
      "/***/dbHandlers/test2.js"
    ],
    "cwd": "/***",
    "execPath": "/nix/store/p6h1iz5lcvfz1hg1z497ypa22rv52xxx-nodejs-16.8.0/bin/node",
    "gid": 100,
    "memoryUsage": {
      "arrayBuffers": 206093,
      "external": 2085823,
      "heapTotal": 17436672,
      "heapUsed": 10880032,
      "rss": 55578624
    },
    "pid": 355355,
    "uid": 1000,
    "version": "v16.8.0"
  },
  "stack": "TypeError: Converting circular structure to JSON\n    --> starting at object with constructor 'TLSSocket'\n    |     property 'parser' -> object with constructor 'HTTPParser'\n    --- property 'socket' closes the circle\n    at JSON.stringify (<anonymous>)\n    at exports.HttpConnection.<anonymous> (/***/node_modules/@databricks/sql/dist/DBSQLClient.js:92:69)\n    at exports.HttpConnection.emit (node:events:394:28)\n    at exports.HttpConnection.responseCallback (/***/node_modules/thrift/lib/nodejs/lib/thrift/http_connection.js:173:12)\n    at ClientRequest.connection.responseCallback (/***/node_modules/@databricks/sql/dist/connection/connections/HttpConnection.js:67:30)\n    at Object.onceWrapper (node:events:514:26)\n    at ClientRequest.emit (node:events:394:28)\n    at HTTPParser.parserOnIncomingClient [as onIncoming] (node:_http_client:621:27)\n    at HTTPParser.parserOnHeadersComplete (node:_http_common:128:17)\n    at TLSSocket.socketOnData (node:_http_client:487:22)",
  "trace": [
    {
      "column": null,
      "file": null,
      "function": null,
      "line": null,
      "method": null,
      "native": false
    },
    {
      "column": null,
      "file": null,
      "function": "JSON.stringify",
      "line": null,
      "method": "stringify",
      "native": false
    },
    {
      "column": 69,
      "file": "/***/node_modules/@databricks/sql/dist/DBSQLClient.js",
      "function": null,
      "line": 92,
      "method": null,
      "native": false
    },
    {
      "column": 28,
      "file": "node:events",
      "function": "exports.HttpConnection.emit",
      "line": 394,
      "method": "emit",
      "native": false
    },
    {
      "column": 12,
      "file": "/***/node_modules/thrift/lib/nodejs/lib/thrift/http_connection.js",
      "function": "exports.HttpConnection.responseCallback",
      "line": 173,
      "method": "responseCallback",
      "native": false
    },
    {
      "column": 30,
      "file": "/***/node_modules/@databricks/sql/dist/connection/connections/HttpConnection.js",
      "function": "ClientRequest.connection.responseCallback",
      "line": 67,
      "method": "responseCallback",
      "native": false
    },
    {
      "column": 26,
      "file": "node:events",
      "function": "Object.onceWrapper",
      "line": 514,
      "method": "onceWrapper",
      "native": false
    },
    {
      "column": 28,
      "file": "node:events",
      "function": "ClientRequest.emit",
      "line": 394,
      "method": "emit",
      "native": false
    },
    {
      "column": 27,
      "file": "node:_http_client",
      "function": "HTTPParser.parserOnIncomingClient [as onIncoming]",
      "line": 621,
      "method": "parserOnIncomingClient [as onIncoming]",
      "native": false
    },
    {
      "column": 17,
      "file": "node:_http_common",
      "function": "HTTPParser.parserOnHeadersComplete",
      "line": 128,
      "method": "parserOnHeadersComplete",
      "native": false
    },
    {
      "column": 22,
      "file": "node:_http_client",
      "function": "TLSSocket.socketOnData",
      "line": 487,
      "method": "socketOnData",
      "native": false
    }
  ]
}

I get this error whether the cluster is running or not.

I tried reverting back to 0.1.8-beta.2 but I get a different failure at the same step (it logs the whole request object and overflows my terminal, and terminates without throwing an error).

I've also tried various other node versions.

The type of session should be exported

I am trying to import and use all the right types and found that the type returned by DBSQLClient.openSession(): Promise<IHiveSession> is not exported at the top level, so I can't import it.

Update: I see that there's been some epic refactoring going on and it now returns a DBSQLSession. It would be really helpful if that could be exported in the same way that DBSQLClient is here =>

databricks-sql-nodejs/lib/index.ts

Line 22 in a2a4b32

export { DBSQLClient };

It probably makes sense to export DBSQLOperation as well.

I note that the last build published to npm was 0.1.8-beta.1 a couple months ago. I realise its beta but still keen to use the latest changes.

Happy to do a PR for this small contribution if you agree it is a useful change.

client.openSession() returning "bad request"

Hi,

I'm trying to open a session, but the method is returning "bad request" even after client gets a connection

This is my code

const {
    DBSQLClient, thrift
} = require('@databricks/sql');

const RECONNECT_ATTEMPTS = 50;
const RECONNECT_TIMEOUT = 3000; // millisecond

const client = new DBSQLClient();

client.on('close', () => {
    console.error('[Connection Lost]');

    connect(RECONNECT_ATTEMPTS).catch(error => {
        console.error('[Connection Failed]', error);
    });
});

const connect = (attempts) => new Promise((resolve, reject) => {
    setTimeout(() => {
        client.connect({
            host: "adb-xxxxx.azuredatabricks.net",
            path: "sql/protocolv1/endpoint",
            token: "dapixxxx",
        }).then((client) => {
            console.log('Connected successfully!'); // until here it's okay
            resolve(client);
        }, (error) => {
            console.error('[Connection Failed] attempt:' + attempts, error.message);

            if (!attempts) {
                reject(error);
            } else {
                connect(attempts - 1).then(resolve, reject);
            }
        });
    }, RECONNECT_TIMEOUT);
})

connect(RECONNECT_ATTEMPTS).then(async client => {
    // work with client
    console.log(`conected...`)

    const session = await client.openSession(); // the client can't open a session - but the client is connected.
}, (error) => {
    console.error('[Connection Failed]', error);
});

HiveDriverError: Unsupported result format: undefined

Any one idea how can i resolve it.
please help to resolved out

Array<TIMESTAMP> Returns an empty list

Hi,

When I SELECT * from a table with columns containing an Array of TIMESTAMPS I always get back an empty list. Array of STRING seems to work perfectly fine as well as columns of type TIMESTAMP.

TypeError: Cannot read properties of undefined (reading 'binaryVal')

getting an error when I try to get columns of the table.

version: @databricks/[email protected]

TypeError: Cannot read properties of undefined (reading 'binaryVal')
    at JsonResult.getColumnValue (/Users/jani/code/lightdash/node_modules/@databricks/sql/dist/result/JsonResult.js:108:23)
    at JsonResult.getSchemaValues (/Users/jani/code/lightdash/node_modules/@databricks/sql/dist/result/JsonResult.js:40:34)
    at /Users/jani/code/lightdash/node_modules/@databricks/sql/dist/result/JsonResult.js:28:62
    at Array.reduce (<anonymous>)
    at JsonResult.getRows (/Users/jani/code/lightdash/node_modules/@databricks/sql/dist/result/JsonResult.js:28:28)
    at /Users/jani/code/lightdash/node_modules/@databricks/sql/dist/result/JsonResult.js:17:31
    at Array.reduce (<anonymous>)
    at JsonResult.getValue (/Users/jani/code/lightdash/node_modules/@databricks/sql/dist/result/JsonResult.js:15:26)
    at getResult (/Users/jani/code/lightdash/node_modules/@databricks/sql/dist/DBSQLOperation/getResult.js:16:20)
    at /Users/jani/code/lightdash/node_modules/@databricks/sql/dist/DBSQLOperation/index.js:48:56

code that I am running:

query = await session.getColumns({
    catalogName: request.database,
    schemaName: request.schema,
    tableName: request.table,
});

const result = await query.fetchAll()

client.openSession() not running

Hi,
Currently running this through my express server with the same format as put in...

client
  .connect({
    host: serverHostname,
    path: httpPath,
    token: token,
  })
  .then(async (client) => {
    const session = await client.openSession();
    await session.close();
    await client.close();
  })
  .catch((error) => {
    console.error(error.message);
    console.error(error.stack);
  });

Getting this as my response

{"date":"Thu Jan 12 2023 20:16:20 GMT-0500 (Eastern Standard Time)","error":{},"exception":true,"level":"error","message":"uncaughtException: Converting circular structure to JSON\n    --> starting at object with constructor 'TLSSocket'\n    |     property 'parser' -> object with constructor 'HTTPParser'\n    --- property 'socket' closes the circle\nTypeError: Converting circular structure to JSON\n    --> starting at object with constructor 'TLSSocket'\n    |     property 'parser' -> object with constructor 'HTTPParser'\n    --- property 'socket' closes the circle\n    at JSON.stringify (<anonymous>)\n    at exports.HttpConnection.<anonymous> (/Users/adobayua/Downloads/Picasso/node_modules/@databricks/sql/dist/DBSQLClient.js:92:69)\n    at exports.HttpConnection.emit (events.js:315:20)\n    at exports.HttpConnection.responseCallback (/Users/adobayua/Downloads/Picasso/node_modules/thrift/lib/nodejs/lib/thrift/http_connection.js:173:12)\n    at ClientRequest.connection.responseCallback (/Users/adobayua/Downloads/Picasso/node_modules/@databricks/sql/dist/connection/connections/HttpConnection.js:67:30)\n    at Object.onceWrapper (events.js:422:26)\n    at ClientRequest.emit (events.js:315:20)\n    at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:632:27)\n    at HTTPParser.parserOnHeadersComplete (_http_common.js:117:17)\n    at TLSSocket.socketOnData (_http_client.js:501:22)","os":{"loadavg":[3.14306640625,3.54052734375,4.97265625],"uptime":64697},"process":{"argv":["/usr/local/bin/node","/Users/adobayua/Downloads/Picasso/server.js","--ignore","client"],"cwd":"/Users/adobayua/Downloads/Picasso","execPath":"/usr/local/bin/node","gid":20,"memoryUsage":{"arrayBuffers":47660,"external":1480305,"heapTotal":27160576,"heapUsed":9508992,"rss":43745280},"pid":54902,"uid":503,"version":"v14.0.0"},"stack":"TypeError: Converting circular structure to JSON\n    --> starting at object with constructor 'TLSSocket'\n    |     property 'parser' -> object with constructor 'HTTPParser'\n    --- property 'socket' closes the circle\n    at JSON.stringify (<anonymous>)\n    at exports.HttpConnection.<anonymous> (/Users/adobayua/Downloads/Picasso/node_modules/@databricks/sql/dist/DBSQLClient.js:92:69)\n    at exports.HttpConnection.emit (events.js:315:20)\n    at exports.HttpConnection.responseCallback (/Users/adobayua/Downloads/Picasso/node_modules/thrift/lib/nodejs/lib/thrift/http_connection.js:173:12)\n    at ClientRequest.connection.responseCallback (/Users/adobayua/Downloads/Picasso/node_modules/@databricks/sql/dist/connection/connections/HttpConnection.js:67:30)\n    at Object.onceWrapper (events.js:422:26)\n    at ClientRequest.emit (events.js:315:20)\n    at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:632:27)\n    at HTTPParser.parserOnHeadersComplete (_http_common.js:117:17)\n    at TLSSocket.socketOnData (_http_client.js:501:22)","trace":[{"column":null,"file":null,"function":null,"line":null,"method":null,"native":false},{"column":null,"file":null,"function":"JSON.stringify","line":null,"method":"stringify","native":false},{"column":69,"file":"/Users/adobayua/Downloads/Picasso/node_modules/@databricks/sql/dist/DBSQLClient.js","function":null,"line":92,"method":null,"native":false},{"column":20,"file":"events.js","function":"exports.HttpConnection.emit","line":315,"method":"emit","native":false},{"column":12,"file":"/Users/adobayua/Downloads/Picasso/node_modules/thrift/lib/nodejs/lib/thrift/http_connection.js","function":"exports.HttpConnection.responseCallback","line":173,"method":"responseCallback","native":false},{"column":30,"file":"/Users/adobayua/Downloads/Picasso/node_modules/@databricks/sql/dist/connection/connections/HttpConnection.js","function":"ClientRequest.connection.responseCallback","line":67,"method":"responseCallback","native":false},{"column":26,"file":"events.js","function":"Object.onceWrapper","line":422,"method":"onceWrapper","native":false},{"column":20,"file":"events.js","function":"ClientRequest.emit","line":315,"method":"emit","native":false},{"column":27,"file":"_http_client.js","function":"HTTPParser.parserOnIncomingClient [as onIncoming]","line":632,"method":"parserOnIncomingClient [as onIncoming]","native":false},{"column":17,"file":"_http_common.js","function":"HTTPParser.parserOnHeadersComplete","line":117,"method":"parserOnHeadersComplete","native":false},{"column":22,"file":"_http_client.js","function":"TLSSocket.socketOnData","line":501,"method":"socketOnData","native":false}]}

Also getting a SyntaxError: JSON.parse: unexpected end of data at line 1 column 1 of the JSON data that I am logging.

Any help would be much appreciated.
Is there also any way to just log the error, it seems like theres a lot of distorted stuff in the log.
Thanks!

getTables method does not honor the tableTypes in the request and also returns blank for table_type in response

I realize there is some refactoring going on so this might be out of date or you might already be aware of it.

The signature of the method HiveSession.getTables() looks like:

IHiveSession.getTables(request: TablesRequest): Promise<IOperation>

... after all the handling is done the response looks something like this:

[
  {
    TABLE_CAT: 'hive_metastore',
    TABLE_SCHEM: 'sqltools_databricks_driver',
    TABLE_NAME: 'parent',
    TABLE_TYPE: '',
    REMARKS: 'UNKNOWN',
    TYPE_CAT: null,
    TYPE_SCHEM: null,
    TYPE_NAME: null,
    SELF_REFERENCING_COL_NAME: null,
    REF_GENERATION: null
  },
  {
    TABLE_CAT: 'hive_metastore',
    TABLE_SCHEM: 'sqltools_databricks_driver',
    TABLE_NAME: 'parent_view',
    TABLE_TYPE: '',
    REMARKS: 'UNKNOWN',
    TYPE_CAT: null,
    TYPE_SCHEM: null,
    TYPE_NAME: null,
    SELF_REFERENCING_COL_NAME: null,
    REF_GENERATION: null
  }
]

Notice that both of the returned objects have TABLE_TYPE: ''

I am calling it something like the following.

    const tables = await session.getTables(
        {
            catalogName: TARGET_DATABASE,
            schemaName: TARGET_SCHEMA,
            tableTypes: ['TABLE'], 
        }
    );
    await handle(tables);

    const views = await session.getTables(
        {
            catalogName: TARGET_DATABASE,
            schemaName: TARGET_SCHEMA,
            tableTypes: ['VIEW'],
        }
    );
    await handle(views);

I notice it makes no difference if I use a value in the tableTypes or if I leave that out completely, note that it is optional. The docs / code as inherited from the Hive project sadly don't explain the usage here- if it is supposed to filter or not. I assume that is its purpose.

export declare type TablesRequest = {
    catalogName?: string;
    schemaName?: string;
    tableName?: string;
    tableTypes?: Array<string>;
};

I also note that the getTableTypes() method returns this, so I am confident that 'TABLE' or 'VIEW' are the correct values here:

[ { TABLE_TYPE: 'TABLE' }, { TABLE_TYPE: 'VIEW' } ]

side note

I was hoping to find some type definitions of these various responses, if they exist. I would appreciate being pointed
to them, otherwise I am recreating them just through testing them and looking at the responses. There is a fair bit of
indirection in the codebase and the method that eventually runs has an any return type (sadface)
OperationResult.getValue(): any

My main task here is to extract some of these fields and use in a different library's interface so I need to know what types to expect.

for example getColumns() returns some nulls and I wonder if these should be union types

type columnResponse = [
    {
        TABLE_CAT: string // 'hive_metastore',
        TABLE_SCHEM: string // 'sqltools_databricks_driver',
        TABLE_NAME: string // 'parent',
        COLUMN_NAME: string // 'id',
        DATA_TYPE: number // 4,
        TYPE_NAME: string //'INT',
        COLUMN_SIZE: number // 4,
        BUFFER_LENGTH: any // null,
        DECIMAL_DIGITS: number // 0,
        NUM_PREC_RADIX: number // 10,
        NULLABLE: number // 0,
        REMARKS: string // '',
        COLUMN_DEF: any // null,
        SQL_DATA_TYPE: any //  null,
        SQL_DATETIME_SUB: any //  null,
        CHAR_OCTET_LENGTH: any //  null,
        ORDINAL_POSITION: 0,
        IS_NULLABLE: string // 'YES',
        SCOPE_CATALOG: any //  null,
        SCOPE_SCHEMA: any //  null,
        SCOPE_TABLE: any //  null,
        SOURCE_DATA_TYPE: any //  null,
        IS_AUTO_INCREMENT: string // 'NO'
      }
]

Reproducing the issue

To quickly repro this I did something similar to the E2E test, but I've added a .env file and using the dotenv package to autoconfigure them.

.env file

SQLTOOLS_DATABRICKS_HOST='**********.cloud.databricks.com'
SQLTOOLS_DATABRICKS_PATH='/sql/1.0/endpoints/************'
SQLTOOLS_DATABRICKS_TOKEN='***********TOKENMCTOKENFACE****'

*test script, just running this like npx ts-node test_script.ts

import { DBSQLClient } from '@databricks/sql';
import IOperation from '@databricks/sql/dist/contracts/IOperation';
import dotenv from 'dotenv';

const utils = DBSQLClient.utils;
const client = new DBSQLClient();
dotenv.config();

const TARGET_DATABASE = 'hive_metastore';
const TARGET_SCHEMA = 'sqltools_databricks_driver';
const TARGET_TABLE = 'parent';
const TARGET_VIEW = 'parent_view';

type catalogsResponse = [
    {
        TABLE_SCHEM: string
    }
];

type schemasResponse = [
    {
        TABLE_SCHEM: string
        TABLE_CATALOG: string
    }
];

type tableTypesResponse = [
    {
        TABLE_TYPE: string
    }
];

type tablesResponse = [
    {
        TABLE_CAT: string // 'hive_metastore',
        TABLE_SCHEM: string // 'sqltools_databricks_driver',
        TABLE_NAME: string // 'parent',
        TABLE_TYPE: string // '', // why is this blank?
        REMARKS: string // 'UNKNOWN',
        TYPE_CAT: string // null,
        TYPE_SCHEM: string // null,
        TYPE_NAME: string // null,
        SELF_REFERENCING_COL_NAME: string // null,
        REF_GENERATION: string // null
      }
];

type columnResponse = [
    {
        TABLE_CAT: string // 'hive_metastore',
        TABLE_SCHEM: string // 'sqltools_databricks_driver',
        TABLE_NAME: string // 'parent',
        COLUMN_NAME: string // 'id',
        DATA_TYPE: number // 4,
        TYPE_NAME: string //'INT',
        COLUMN_SIZE: number // 4,
        BUFFER_LENGTH: any // null,
        DECIMAL_DIGITS: number // 0,
        NUM_PREC_RADIX: number // 10,
        NULLABLE: number // 0,
        REMARKS: string // '',
        COLUMN_DEF: any // null,
        SQL_DATA_TYPE: any //  null,
        SQL_DATETIME_SUB: any //  null,
        CHAR_OCTET_LENGTH: any //  null,
        ORDINAL_POSITION: 0,
        IS_NULLABLE: string // 'YES',
        SCOPE_CATALOG: any //  null,
        SCOPE_SCHEMA: any //  null,
        SCOPE_TABLE: any //  null,
        SOURCE_DATA_TYPE: any //  null,
        IS_AUTO_INCREMENT: string // 'NO'
      }
]

type response = (
    catalogsResponse | schemasResponse | tableTypesResponse |
    tablesResponse | columnResponse
);

async function handle(queryOperation: IOperation, logResult: boolean = true): Promise<response> {
    await utils.waitUntilReady(queryOperation, false, () => {});
    await utils.fetchAll(queryOperation);
    await queryOperation.close();
    const result = utils.getResult(queryOperation).getValue();
    if (logResult) {
        console.log(result);
    }

    return Promise.resolve(result);

};

client
  .connect({
    host:  `${process.env.SQLTOOLS_DATABRICKS_HOST}`,
    path:  `${process.env.SQLTOOLS_DATABRICKS_PATH}`,
    token: `${process.env.SQLTOOLS_DATABRICKS_TOKEN}`,
  })
  .then(async (client) => {
    const session = await client.openSession();

    const createSchema = await session.executeStatement(`create schema if not exists hive_metastore.sqltools_databricks_driver`, {runAsync: true});
    await handle(createSchema, false);

    const createTable = await session.executeStatement(`
    create or replace table hive_metastore.sqltools_databricks_driver.parent (
        id int not null,
        name string not null,
        desc string not null
      )`
      ,{runAsync: true}
    );
    await handle(createTable, false);

    const insertTable = await session.executeStatement(`
    insert into hive_metastore.sqltools_databricks_driver.parent (id, name, desc)
    values
    (1, 'hey', 'yo'),
    (1, 'whut', 'noway')`
      ,{runAsync: true}
    );
    await handle(insertTable, false);

    const createView = await session.executeStatement(`
    create or replace view hive_metastore.sqltools_databricks_driver.parent_view as
    select * from hive_metastore.sqltools_databricks_driver.parent`
    ,{runAsync: true}
    );
    await handle(createView, false);

    const databases = await session.getCatalogs();
    await handle(databases);

    const schemas = await session.getSchemas(
        { //schemaName: 'default', // don't provide this
        catalogName: TARGET_DATABASE}
    );
    await handle(schemas);

    const tabletypes = await session.getTableTypes();
    await handle(tabletypes);

    const tables = await session.getTables(
        {
            catalogName: TARGET_DATABASE,
            schemaName: TARGET_SCHEMA,
            tableTypes: ['TABLE'], // no MATERIALIZED_VIEW ...yet
        }
    );
    await handle(tables);

    const views = await session.getTables(
        {
            catalogName: TARGET_DATABASE,
            schemaName: TARGET_SCHEMA,
            tableTypes: ['VIEW'], // no MATERIALIZED_VIEW ...yet
        }
    );
    await handle(views);

    const table_columns = await session.getColumns(
        {
            catalogName: TARGET_DATABASE,
            schemaName: TARGET_SCHEMA,
            tableName: TARGET_TABLE,
            //tableTypes: [TABLE, VIEW, MATERIALIZED_VIEW],
        }
    )
    await handle(table_columns);

    const view_columns = await session.getColumns(
        {
            catalogName: TARGET_DATABASE,
            schemaName: TARGET_SCHEMA,
            tableName: TARGET_VIEW,
            //tableTypes: [TABLE, VIEW, MATERIALIZED_VIEW],
        }
    )
    await handle(view_columns);

    //Nonsense: compulsory functionName arg means this returns single function info
    //          not a list
    //const functions = await session.getFunctions(
    //    {
    //        functionName: '',
    //        catalogName: 'hive_metastore', //samples
    //        schemaName: '',
    //    }
    //)
    //await handle(functions);

    // requires unity catalog
    // const primarykeys = await session.getPrimaryKeys(
    //     {
    //         schemaName: TARGET_SCHEMA,
    //         tableName: TARGET_TABLE,
    //         catalogName: TARGET_DATABASE, // - this is optional?? madness.
    //     }
    // )
    // await handle(primarykeys);

    // requires unity catalog
    // const foreignkeys = await session.getCrossReference(
    //     {
    //         parentCatalogName: '',
    //         parentSchemaName: '',
    //         parentTableName: '',
    //         foreignCatalogName: '',
    //         foreignSchemaName: '',
    //         foreignTableName: '',
    //     }
    // )
    // await handle(foreignkeys);

    const cleanUp = await session.executeStatement(
        `drop schema hive_metastore.sqltools_databricks_driver cascade`
        ,{runAsync: true}
    );
    await handle(cleanUp, false);

    await session.close();
    await client.close();
  })
  .catch((error) => {
    console.log(error);
  });

Bug: fetch Throwing Cast Exception

We noticed an odd behavior today that was working yesterday. We can make any sql call such as

const queryOperation = await session.executeStatement('select * from table limit 10', { runAsync: true });

and it will work once and only once. Follow up calls of the same sql will result in a 'org.apache.spark.sql.catalyst.expressions.UnsafeRow cannot be cast to org.apache.spark.sql.Row' exception. Uniquely new sql calls we haven't invoked before will work once before throwing this exception in subsequent calls. Narrowing it down, it's failing in the firstFetch function which is returning a response with a status like this:

response.status { statusCode: 3, infoMessages: ['*java.lang.ClassCastException:org.apache.spark.sql.catalyst.expressions.UnsafeRow cannot be cast to org.apache.spark.sql.Row:122:121',], sqlState: '08000', errorCode: null, errorMessage: 'org.apache.spark.sql.catalyst.expressions.UnsafeRow cannot be cast to org.apache.spark.sql.Row' }

Given this function is a fetch, I suspect the issue may not be with the library itself but the databricks instance its communicating with. Curious if anyone has seen anything like this. If it's not the databricks npm package, Im not sure where to go to debug from here. Any suggestions would be appreciated.

Simplify basic usage (hide Thrift details)

Current basic usage example:

const driver = require('databricks-sql-node');
const { TCLIService, TCLIService_types } = driver.thrift;
const client = new driver.DBSQLClient(
    TCLIService,
    TCLIService_types
);

client.connect({
    host: '********.databricks.com',
    path: '/sql/1.0/endpoints/****************',
    token: 'dapi********************************',
}).then(async client => {
    const session = await client.openSession();
    const response = await session.getInfo(
        TCLIService_types.TGetInfoType.CLI_DBMS_VER
    );

    console.log(response.getValue());

    await session.close();
}).catch(error => {
    console.log(error);
});

Let's simplify it to be:

const driver = require('databricks-sql-node');
const client = new driver.DBSQLClient();

await client.connect({
    host: '********.databricks.com',
    path: '/sql/1.0/endpoints/****************',
    token: 'dapi********************************',
});

const session = await client.openSession();
const response = await session.executeStatement("SELECT 1");

// TODO: probably need to change the following
console.log(response.getValue());

await session.close();

Update CHANGELOG

Just start a new one.

UPDATE with MERGE INTO and Temporary View - Execute Multiple Queries Statements

I need to join some tables to update data, and seems that the best approach to do it in Databricks is with MERGE INTO. However, it works fine in Databricks UI, but with Node JS connector it doesn't work because has multiples queries statements (divided by semicolons ";"). It's like this:

DROP VIEW IF EXISTS tempView; 

CREATE TEMPORARY VIEW tempView AS
    SELECT ... FROM table1 t1 JOIN... WHERE xyz="..." ;

MERGE INTO table2 t2 USING tempView ON t1.id = t2.id
 WHEN MATCHED THEN 
     UPDATE SET
         column1 = ..., 
         column2 = ...

How can I execute this query with NodeJS connector?

Getting "TypeError: process.on is not a function"

Not sure if i am doing something wrong, but I have a VueJS website where i want to make use of databricks-sql-nodejs and anytime i try to call the code i wrote, i get this error "TypeError: process.on is not a function". Attached is a screenshot and my example code that gets imported into a page. Upon trying to call the exported function, i get the error. I am making the assumption that because the error mentioned DBSQLLogger at the end, that it is related to the library but maybe that is not correct? any ideas?

I am using Node 16.5.0

import { DBSQLClient } from "@databricks/sql";

const host = 'xxx'
const hostPath = 'xxx'
const token = 'xxx'

async function GetExecutionNotebooks(clientName) {
    const dbSqlClient = new DBSQLClient();
    
    dbSqlClient.connect({
        host: host,
        path: hostPath,
        token: token
    })
    .then(async (client)=>{
        const session = await client.openSession();

        const sql = `select client, notebook, enabled, final, job_id from xx.xxx where client="${clientName}"`
        const queryOperation = await session.executeStatement(sql, { runAsync: true });
        const result = await queryOperation.fetchAll();
        await queryOperation.close();
    
        console.table(result);
    
        await session.close();
        await client.close();
    })
    .catch((error)=>{
        console.log(error);
    })

}

export { GetExecutionNotebooks }

Does session.close() and client.close() actually close the session/client connection?

Hello,

I've been doing some testing with the code from this repository as we're planning to use it in production.

I used the example from the readme and I just added two lines of code (I close the session AND client before executing the query operation). However, I don't know if I have missed something basic or if I'm doing something wrong but the below code still prints out 'Hello, World!'.

const { DBSQLClient } = require('@databricks/sql');

const client = new DBSQLClient();

client
  .connect({
    host: '***',
    path: '***',
    token: '***'
  })
  .then(async (client) => {
    const session = await client.openSession();

    // added code
    await session.close();
    await client.close();

    const queryOperation = await session.executeStatement('SELECT "Hello, World!"', { runAsync: true });
    const result = await queryOperation.fetchAll();
    await queryOperation.close();

    console.table(result);

    await session.close();
    await client.close();
  })
  .catch((error) => {
    console.log(error);
  });

Is this intended to work like that?

Default for `maxRows` is too small

The default number of rows to fetch is currently 100. Since fetching each block causes a roundtrip to the server, fetching large data sets (even moderately small ones) takes a lot of time. High latency amplifies this problem.

I had a case where I was trying to fetch 11.000 rows over a high latency connection and with default settings it took me 1:40min. Changing the default to 10.000 dropped the time to fetch the data to 6s.

I'd suggest to increase the default to 10.000 or 100.000.

Call DBSQLClient.connect with host option set to sql/protocolv1/o/1234567890123456/1234-567890-abcdefgh for a cluster.

Actual:

Get a bunch of pseudo-tracing output when trying to get info from session.executeStatement.

Expected:

Get output that resembles something like the session.ExecuteStatement results, for example a table/row representation.

Fix:

When I change the host option set to /sql/protocolv1/o/1234567890123456/1234-567890-abcdefgh (prepend a forward slash), I get the expected output.

Note:

SQL warehouses already prepend a forward slash, for example /sql/1.0/endpoints/a1b234c5678901d2.

databricks / databricks-sql-nodejs Goto Github PK

databricks-sql-nodejs's Introduction

Databricks SQL Driver for Node.js

Description

Requirements

Installation

Usage

Run Tests

Unit tests

e2e tests

Contributing

Issues

License

databricks-sql-nodejs's People

Contributors

Stargazers

Watchers

Forkers

databricks-sql-nodejs's Issues

Summary

Reproduction

My code

Error Message

side note

Reproducing the issue

Recommend Projects

Recommend Topics

Recommend Org

Jobs