goccy / bigquery-emulator

BigQuery emulator server implemented in Go

License: MIT License

Go 99.40% Dockerfile 0.21% Makefile 0.21% Python 0.18%
go bigquery golang emulator google-cloud gcp google-cloud-platform

bigquery-emulator's Introduction

BigQuery Emulator


BigQuery emulator server implemented in Go.
BigQuery emulator provides a way to launch a BigQuery server on your local machine for testing and development.

Features

  • If you use Go as your BigQuery client, you can launch the BigQuery emulator in the same process as your tests by using httptest.
  • The BigQuery emulator can be built as a single static binary and launched as a standalone process, so you can also use it from programs written in languages other than Go, or from tools such as the bq command, by pointing them at the address of the launched emulator.
  • The BigQuery emulator uses SQLite for storage. You can select either memory or file as the data storage destination at startup; if you choose file, data is persisted (see the example after this list).
  • You can load seed data from a YAML file on startup.
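
For example, a minimal sketch of file-backed persistence using the --database option documented in the CLI help below (the file path here is only an illustration):

$ ./bigquery-emulator --project=test --database=./emulator.db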

Status

Although this project is still in beta, many features are already available.

BigQuery API

We have implemented all of the BigQuery APIs except those for manipulating IAM resources. Some options may not be supported yet; if you find one, please report it in an issue.

Google Cloud Storage linkage

BigQuery emulator supports loading data from Google Cloud Storage and extracting table data to it. Currently, only CSV and JSON formats can be used for extraction. If you use a Google Cloud Storage emulator, please set the STORAGE_EMULATOR_HOST environment variable, as in the example below.
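
For example, a minimal sketch assuming a local storage emulator such as fake-gcs-server listening on port 4443 (adjust the host and port to your setup):

$ export STORAGE_EMULATOR_HOST=localhost:4443
$ ./bigquery-emulator --project=test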

BigQuery Storage API

gRPC-based reads and writes are supported via the BigQuery Storage API, in both Apache Avro and Apache Arrow formats.

Google Standard SQL

BigQuery emulator supports much of the Google Standard SQL specification. For example, it has the following features:

  • 200+ standard functions
  • Wildcard table
  • Templated Argument Function
  • JavaScript UDF

If you want to know exactly which features are supported, please see here

Goals and Sponsors

The goal of this project is to build a server that behaves exactly like BigQuery from the BigQuery client's perspective. To do so, we need to support all features present in BigQuery (Model API, Connection API, INFORMATION_SCHEMA, etc.) in addition to evaluating Google Standard SQL.

However, this is a personal project that I develop on my days off and after work. I work full time and maintain a lot of OSS, so the time available for this project is limited. I will keep adding features and fixing bugs regularly to get closer to these goals, but if you want me to implement the features you need, please consider sponsoring me. Of course, you can use this project for free, but sponsorship is great motivation. Especially if you are part of a commercial company and could use this project, I would be glad if you would consider sponsoring me at the same time.

Install

If Go is installed, you can install the latest version with the following command

$ go install github.com/goccy/bigquery-emulator/cmd/bigquery-emulator@latest

The BigQuery emulator depends on go-zetasql. This library takes a very long time to install because it automatically builds the ZetaSQL library during installation. It may look like it hangs because nothing is logged during the build process, but if a clang process is running in the background it is working fine, so just wait it out. For the same reason, the following environment variables must be set for installation.

CGO_ENABLED=1
CXX=clang++
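
For example, both variables can be set inline for the install command shown above:

$ CGO_ENABLED=1 CXX=clang++ go install github.com/goccy/bigquery-emulator/cmd/bigquery-emulator@latest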

You can also download the docker image with the following command

$ docker pull ghcr.io/goccy/bigquery-emulator:latest

You can also download the darwin (amd64) and linux (amd64) binaries directly from the releases page

How to start the standalone server

Once the bigquery-emulator CLI is installed, you can start the server with the following options.

$ ./bigquery-emulator -h
Usage:
  bigquery-emulator [OPTIONS]

Application Options:
      --project=        specify the project name
      --dataset=        specify the dataset name
      --port=           specify the http port number. this port used by bigquery api (default: 9050)
      --grpc-port=      specify the grpc port number. this port used by bigquery storage api (default: 9060)
      --log-level=      specify the log level (debug/info/warn/error) (default: error)
      --log-format=     specify the log format (console/json) (default: console)
      --database=       specify the database file if required. if not specified, it will be on memory
      --data-from-yaml= specify the path to the YAML file that contains the initial data
  -v, --version         print version

Help Options:
  -h, --help            Show this help message

Start the server by specifying the project name

$ ./bigquery-emulator --project=test
[bigquery-emulator] REST server listening at 0.0.0.0:9050
[bigquery-emulator] gRPC server listening at 0.0.0.0:9060

If you want to use the Docker image to start the emulator, run the following.

$ docker run -it ghcr.io/goccy/bigquery-emulator:latest --project=test
  • If you are using an M1 Mac (and Docker Desktop) you may get a warning. In that case, please use the --platform linux/x86_64 option, as shown below.
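
For example, combining the command above with the platform option:

$ docker run -it --platform linux/x86_64 ghcr.io/goccy/bigquery-emulator:latest --project=test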

How to use from bq client

1. Start the standalone server

$ ./bigquery-emulator --project=test --data-from-yaml=./server/testdata/data.yaml
[bigquery-emulator] REST server listening at 0.0.0.0:9050
[bigquery-emulator] gRPC server listening at 0.0.0.0:9060
  • server/testdata/data.yaml is here

2. Call endpoint from bq client

$ bq --api http://0.0.0.0:9050 query --project_id=test "SELECT * FROM dataset1.table_a WHERE id = 1"

+----+-------+---------------------------------------------+------------+----------+---------------------+
| id | name  |                  structarr                  |  birthday  | skillNum |     created_at      |
+----+-------+---------------------------------------------+------------+----------+---------------------+
|  1 | alice | [{"key":"profile","value":"{\"age\": 10}"}] | 2012-01-01 |        3 | 2022-01-01 12:00:00 |
+----+-------+---------------------------------------------+------------+----------+---------------------+

How to use from python client

1. Start the standalone server

$ ./bigquery-emulator --project=test --dataset=dataset1
[bigquery-emulator] REST server listening at 0.0.0.0:9050
[bigquery-emulator] gRPC server listening at 0.0.0.0:9060

2. Call endpoint from python client

Create ClientOptions with the api_endpoint option and use AnonymousCredentials to disable authentication.

from google.api_core.client_options import ClientOptions
from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery
from google.cloud.bigquery import QueryJobConfig

client_options = ClientOptions(api_endpoint="http://0.0.0.0:9050")
client = bigquery.Client(
  "test",
  client_options=client_options,
  credentials=AnonymousCredentials(),
)
client.query(query="...", job_config=QueryJobConfig())

If you use a DataFrame as the download destination for the query results, you must either disable the BigQuery Storage client with create_bqstorage_client=False or create a BigQuery Storage client that points at the local gRPC port (default 9060).

https://cloud.google.com/bigquery/docs/samples/bigquery-query-results-dataframe?hl=en

result = client.query(sql).to_dataframe(create_bqstorage_client=False)

or

from google.cloud import bigquery_storage

client_options = ClientOptions(api_endpoint="0.0.0.0:9060")
read_client = bigquery_storage.BigQueryReadClient(client_options=client_options)
result = client.query(sql).to_dataframe(bqstorage_client=read_client)

Synopsis

If you use Go as your BigQuery client, you can launch the BigQuery emulator in the same process as your tests.
Import github.com/goccy/bigquery-emulator/server (and github.com/goccy/bigquery-emulator/types), then use the server.New API to create an emulator server instance.

See the API reference for more information: https://pkg.go.dev/github.com/goccy/bigquery-emulator

package main

import (
  "context"
  "fmt"

  "cloud.google.com/go/bigquery"
  "github.com/goccy/bigquery-emulator/server"
  "github.com/goccy/bigquery-emulator/types"
  "google.golang.org/api/iterator"
  "google.golang.org/api/option"
)

func main() {
  ctx := context.Background()
  const (
    projectID = "test"
    datasetID = "dataset1"
    routineID = "routine1"
  )
  bqServer, err := server.New(server.TempStorage)
  if err != nil {
    panic(err)
  }
  if err := bqServer.Load(
    server.StructSource(
      types.NewProject(
        projectID,
        types.NewDataset(
          datasetID,
        ),
      ),
    ),
  ); err != nil {
    panic(err)
  }
  if err := bqServer.SetProject(projectID); err != nil {
    panic(err)
  }
  testServer := bqServer.TestServer()
  defer testServer.Close()

  client, err := bigquery.NewClient(
    ctx,
    projectID,
    option.WithEndpoint(testServer.URL),
    option.WithoutAuthentication(),
  )
  if err != nil {
    panic(err)
  }
  defer client.Close()
  routineName, err := client.Dataset(datasetID).Routine(routineID).Identifier(bigquery.StandardSQLID)
  if err != nil {
    panic(err)
  }
  sql := fmt.Sprintf(`
CREATE FUNCTION %s(
  arr ARRAY<STRUCT<name STRING, val INT64>>
) AS (
  (SELECT SUM(IF(elem.name = "foo",elem.val,null)) FROM UNNEST(arr) AS elem)
)`, routineName)
  job, err := client.Query(sql).Run(ctx)
  if err != nil {
    panic(err)
  }
  status, err := job.Wait(ctx)
  if err != nil {
    panic(err)
  }
  if err := status.Err(); err != nil {
    panic(err)
  }

  it, err := client.Query(fmt.Sprintf(`
SELECT %s([
  STRUCT<name STRING, val INT64>("foo", 10),
  STRUCT<name STRING, val INT64>("bar", 40),
  STRUCT<name STRING, val INT64>("foo", 20)
])`, routineName)).Read(ctx)
  if err != nil {
    panic(err)
  }

  var row []bigquery.Value
  if err := it.Next(&row); err != nil {
    if err == iterator.Done {
        return
    }
    panic(err)
  }
  fmt.Println(row[0]) // 30
}

Debugging

If you have specified a database file when starting bigquery-emulator, you can check the status of the database by using the zetasqlite-cli tool. See here for details.

How it works

BigQuery Emulator Architecture Overview

After the emulator receives a ZetaSQL query via the REST API from bq or a language-specific client SDK, go-zetasqlite parses and analyzes the query and outputs an AST. A SQLite query is then generated from the AST, and go-sqlite3 is used to access the SQLite database.

Type Conversion Flow

BigQuery has a number of types that do not exist in SQLite (e.g. ARRAY and STRUCT). To handle them in SQLite, go-zetasqlite encodes every type other than INT64 / FLOAT64 / BOOL as a combination of type information and data and stores the result in SQLite. When the encoded data is used, it is decoded by a custom function registered with go-sqlite3 before use.
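
The following Go program is a minimal, self-contained sketch of this general idea only; it is not the actual encoding used by go-zetasqlite, and the encodedValue type and helper functions are hypothetical illustrations.

package main

import (
  "encoding/json"
  "fmt"
)

// encodedValue illustrates storing the type information together with the
// data so that SQLite only ever sees a single TEXT blob for complex types.
type encodedValue struct {
  Type  string          `json:"type"`
  Value json.RawMessage `json:"value"`
}

// encode marshals a value together with its BigQuery type name.
func encode(typ string, v interface{}) (string, error) {
  raw, err := json.Marshal(v)
  if err != nil {
    return "", err
  }
  b, err := json.Marshal(encodedValue{Type: typ, Value: raw})
  return string(b), err
}

// decode recovers the type name and the value from an encoded blob, playing
// the role of the custom decode function registered with go-sqlite3.
func decode(blob string) (string, interface{}, error) {
  var ev encodedValue
  if err := json.Unmarshal([]byte(blob), &ev); err != nil {
    return "", nil, err
  }
  var v interface{}
  if err := json.Unmarshal(ev.Value, &v); err != nil {
    return "", nil, err
  }
  return ev.Type, v, nil
}

func main() {
  // An ARRAY<STRUCT<key STRING, value STRING>> value like the structarr
  // column in the bq example above.
  blob, err := encode(
    "ARRAY<STRUCT<key STRING, value STRING>>",
    []map[string]string{{"key": "profile", "value": `{"age": 10}`}},
  )
  if err != nil {
    panic(err)
  }
  fmt.Println(blob)

  typ, val, err := decode(blob)
  if err != nil {
    panic(err)
  }
  fmt.Println(typ, val)
}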

Reference

For the background story of bigquery-emulator, see the following articles.

License

MIT

bigquery-emulator's People

Contributors

cabljac, daku10, dependabot[bot], goccy, gyuta, kitagry, marcellustavares, ohaibbq, seanmcnealy, shohei-ihaya, tomrussello, totem3, tyamagu2


bigquery-emulator's Issues

Crash when MIN/MAX/COUNT aggregate functions are used as window functions

I am running into issues when I try to use certain aggregate functions as window functions. For example, for this query:

SELECT MIN(a) OVER (PARTITION BY b)
FROM UNNEST([STRUCT(1 AS a, 2 AS b)]);

I get this error:

internal error	{"error": "failed to format query SELECT MIN(a) OVER (PARTITION BY b) AS min_a FROM UNNEST([STRUCT(1 AS a, 2 as b)]);: min function is unimplemented"}

I am seeing a similar crash for MAX and COUNT functions as well, but not AVG and SUM.

Thanks!

Ambiguous column name when creating table from select

This Project looks awesome!

I'm hoping to use this to test the transformation queries in our ETL pipeline. A pretty common use case is to select specific columns from two source tables joined on an id / target_id foreign key.

I spun up this quick POC with some simple test data and ran into issues when attempting to join two tables within a CREATE OR REPLACE TABLE where the join condition contains a column that appears on both tables (in this case id).

from pprint import pprint
from typing import List

from google.api_core.client_options import ClientOptions
from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery
from google.cloud.bigquery import QueryJobConfig
from google.cloud.exceptions import NotFound


class EmulatorPOC:
    def __init__(self, project_id: str) -> None:
        self.client = self._client(project_id)
        self.project_id = project_id

    def _client(self, project_id: str):
        client_options = ClientOptions(api_endpoint="http://0.0.0.0:9050")
        client = bigquery.Client(
            project_id,
            client_options=client_options,
            credentials=AnonymousCredentials(),
        )

        return client

    def _json_schema_to_bq_schema(self, schema: List[dict]) -> List:
        return [
            bigquery.SchemaField(item.get("name"), item.get("type"), mode="REQUIRED")
            for item in schema
        ]

    def create_table(
        self, dataset_id: str, table_id: str, schema: List[dict]
    ) -> bigquery.Table:
        table_name = f"{self.project_id}.{dataset_id}.{table_id}"
        try:
            table = self.client.get_table(table_name)
        except NotFound:
            bq_schema = self._json_schema_to_bq_schema(schema)
            table = bigquery.Table(table_name, schema=bq_schema)
            table = self.client.create_table(table)

        return table

    def create_dataset(self, dataset_id: str) -> bigquery.Dataset:
        dataset_ref = f"{self.project_id}.{dataset_id}"
        try:
            dataset = self.client.get_dataset(dataset_ref)
        except NotFound:
            dataset = bigquery.Dataset(dataset_ref)
            dataset.location = "US"
            dataset = self.client.create_dataset(dataset)

        return dataset

    def poc(self, dataset_id: str, source_table_id: str, join_table_id: str):
        # Create dataset
        dataset = self.create_dataset(dataset_id=dataset_id)

        # Create tables
        source_table = self.create_table(
            dataset_id=dataset_id,
            table_id=source_table_id,
            schema=[
                {"name": "id", "type": "INTEGER"},
                {"name": "key", "type": "STRING"},
                {"name": "thing", "type": "STRING"},
            ],
        )

        join_table = self.create_table(
            dataset_id=dataset_id,
            table_id=join_table_id,
            schema=[
                {"name": "id", "type": "INTEGER"},
                {"name": "key", "type": "STRING"},
                {"name": "another_thing", "type": "STRING"},
                {"name": f"{source_table_id}_id", "type": "INTEGER"},
            ],
        )

        # Insert data
        self.client.insert_rows_json(
            source_table,
            [{"id": 1, "key": "key_123", "thing": "something"}],
        )

        self.client.insert_rows_json(
            join_table,
            [{"id": 1, "key": "key_123", "another_thing": "something_else", f"{source_table_id}_id": 1}],
        )

        # Select thing / another_thing and join on source.id = join.source_id
        transform_statement = f"""
        CREATE OR REPLACE TABLE {self.project_id}.{dataset_id}.test_transformation AS (
        SELECT
            t.thing,
            j.another_thing,
        FROM {self.project_id}.{dataset_id}.{source_table_id} t
        JOIN {self.project_id}.{dataset_id}.{join_table_id} j ON t.id = j.{source_table_id}_id
        );
        """

        # Execute query to create table
        transformed_table = self.client.query(
            query=transform_statement, job_config=QueryJobConfig()
        ).result()

        # Select results and return them
        result = self.client.query(query=f"select * from {self.project_id}.{dataset_id}.test_transformation", job_config=QueryJobConfig())

        return [dict(row) for row in result]



resp = EmulatorPOC("poc").poc("emulator_poc", "source_table", "join_table")
pprint(resp)

[screenshot of the resulting ambiguous-column-name error]

As I am only selecting specific columns from the source tables, I don't believe this should be an issue. To confirm, I ran this same script against a sandbox BQ instance in GCP and it produced the expected resulting transformation table.

Also, if I rename the join table id column to id_ this executes without issue, so the underlying CREATE OR REPLACE TABLE appears to work just fine when a duplicated column is not specified in a join condition.

Thanks again for putting this project together! Let me know if this is something you are able to replicate and if you think it can / should be addressed.

Cheers

Docker Image Causes API to Timeout

When I download the bigquery emulator using go install ..., I am able to hit the API after starting up the server with the CLI bigquery-emulator. However, when I run

docker run -it ghcr.io/goccy/bigquery-emulator:latest --project=test

I only receive timeout errors when trying to hit the API. For example, when trying to create a dataset in python

from google.api_core.client_options import ClientOptions
from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery

client_options = ClientOptions(api_endpoint="http://0.0.0.0:9050")
client = bigquery.Client(
  "test",
  client_options=client_options,
  credentials=AnonymousCredentials(),
)
client.create_dataset(bigquery.Dataset("test.dataset1"), timeout=30)

will just time out, but against the bigquery-emulator CLI, I am successfully able to create a dataset.

However, this may also be because I am running this on an M1 MacBook, since when I run docker run, I get

>>> docker run -it ghcr.io/goccy/bigquery-emulator:latest --project=test
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
[bigquery-emulator] REST server listening at 0.0.0.0:9050
[bigquery-emulator] gRPC server listening at 0.0.0.0:9060

So I instead did

docker run --platform linux/amd64 -it ghcr.io/goccy/bigquery-emulator:latest --project=test
[bigquery-emulator] REST server listening at 0.0.0.0:9050
[bigquery-emulator] gRPC server listening at 0.0.0.0:9060

to make the warning go away, but the issue still persisted. Maybe this is because there is no arm image of the bigquery emulator for M1 machines?

Thanks!

[Question] STRUCT support in emulator

Hey there, thank you for your work on the BQ emulator; we have really been missing it in our development :(

So I am having a little issue with the STRUCT data type. I am trying to create such a table (using the Node.js SDK, script below).

And this works fine for other data types, but for STRUCT it fails :(

Some additional resources:

Example script that I use to create the table:

script.mjs

import { BigQuery } from '@google-cloud/bigquery';

const schema = {
  "fields": [
    {
      "name": "tenantId",
      "type": "STRING",
      "mode": "REQUIRED"
    },
    {
      "name": "transactions",
      "type": "RECORD",
      "mode": "REPEATED",
      "fields": [
        {
          "name": "transactionType",
          "type": "STRING",
          "mode": "NULLABLE"
        }
      ]
    }
  ]
}

const bq = new BigQuery({
  projectId: 'project-id',
  apiEndpoint: 'http://localhost:9050',
});

const dataset = bq.dataset('local_testing_dataset');
if (!(await dataset.exists())[0]) {
  await bq.createDataset('local_testing_dataset');
}

const table = dataset.table('transactions_table');
if (!(await table.exists())[0]) {
   await dataset.createTable('transactions_table', { schema });
}

Note: the script is written for Node 16 & ES modules, so to run it you have to have Node 16 installed and have to name the file exactly script.mjs.

Logs with errors

[bigquery-emulator] listening at 0.0.0.0:9050
{"L":"INFO","T":"2022-08-10T09:19:34.399Z","C":"server/middleware.go:41","M":"GET /bigquery/v2/projects/project-id/datasets/local_testing_dataset","query":"prettyPrint=false"}
{"L":"INFO","T":"2022-08-10T09:19:34.402Z","C":"server/middleware.go:155","M":"dataset is not found","datasetID":"local_testing_dataset"}
{"L":"INFO","T":"2022-08-10T09:19:34.408Z","C":"server/middleware.go:41","M":"POST /bigquery/v2/projects/project-id/datasets","query":"prettyPrint=false"}
{"L":"INFO","T":"2022-08-10T09:19:34.422Z","C":"server/middleware.go:41","M":"GET /bigquery/v2/projects/project-id/datasets/local_testing_dataset/tables/transactions_table","query":"prettyPrint=false"}
{"L":"INFO","T":"2022-08-10T09:19:34.425Z","C":"server/middleware.go:205","M":"table is not found","tableID":"transactions_table"}
{"L":"INFO","T":"2022-08-10T09:19:34.432Z","C":"server/middleware.go:41","M":"POST /bigquery/v2/projects/project-id/datasets/local_testing_dataset/tables","query":"prettyPrint=false"}
{"L":"ERROR","T":"2022-08-10T09:19:34.442Z","C":"server/handler.go:40","M":"internal error","error":"failed to create table CREATE TABLE `transactions_table` (`tenantId` STRING,`transactions` STRUCT): failed to analyze query: CREATE TABLE `transactions_table` (`tenantId` STRING,`transactions` STRUCT): INVALID_ARGUMENT: Syntax error: Expected \"<\" but got \")\" [at 1:75]"}

Docker Compose with Emulator

version: "3.3"
services:
  bigquery:
    env_file: .env
    image: ghcr.io/goccy/bigquery-emulator:latest
    ports:
      - '9050:9050'
    command: ["/bin/bigquery-emulator", "--port=9050", "--project=project-id", "--log-level=debug", "--log-format=json"]

How to build docker image for M1 processor?

Hello, first of all thank you for this amazing project!

I'm trying to build using the provided Dockerfile and I'm getting this error; I'm not sure how to fix it.

Here is the log:

> [stage-0 7/7] RUN make emulator/build:
#0 0.188 CGO_ENABLED=1 CXX=clang++ go build -o bigquery-emulator \
#0 0.188        -ldflags='-s -w -X main.version=latest -X main.revision=353f38c -linkmode external -extldflags "-static"' \
#0 0.188        ./cmd/bigquery-emulator
#0 805.4 # github.com/goccy/bigquery-emulator/cmd/bigquery-emulator
#0 805.4 /usr/local/go/pkg/tool/linux_arm64/link: running clang++ failed: exit status 1
#0 805.4 /usr/bin/ld: /usr/bin/ld: DWARF error: could not find variable specification at offset 62f34c
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 6ef417
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 6f16b9
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset e4b4f7
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset e5329a
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset e53315
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset e53374
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset e533d3
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset e5341a
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset e53461
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset e66d11
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset e67919
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset fa8c72
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset fd21fe
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset fd26c2
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset fd2a08
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset fd2d4e
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset fd3094
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset fd33da
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset fd3720
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset fd3a66
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset fd3df9
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset fd418c
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 1060adf
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 10632b0
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 1063a71
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 1078783
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 108bc96
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 108befd
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 108c166
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 10e8ab6
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 10ee31e
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 10eeff5
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 10f0771
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 10f29d8
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 10f348f
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 10f3f54
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 10f49e1
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 10f5a9e
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 10f67cd
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 10f746e
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 10f8950
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 10fcbf5
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 10fd6ba
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 10fe171
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 1100611
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 11027ba
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 1102c0d
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 1102e5c
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 11030af
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 110330a
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 1103579
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 11037e8
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 11039e9
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 1103bec
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 1103df3
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 1104004
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 1104216
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 11044f9
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 11047de
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 1104b8c
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 1104c05
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 1104c63
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 1104ca9
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 1158e10
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 1159939
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 115a54b
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 115b277
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 115bf30
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 1164bd2
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 1164d92
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 116507a
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 11768c5
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 1177ca5
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 1179160
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 117a0e9
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 117a96f
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 117b2d5
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 117c2cf
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 117cf32
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 11883c8
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 11a020b
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 11a0ebd
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 11a196b
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 11a2263
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 11a3167
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 11a4103
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 11e1fa3
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 11ee64b
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 11f271a
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 11f4a57
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 11f5900
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 11f67b4
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 11fefe3
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 1200093
#0 805.4 /usr/bin/ld: DWARF error: could not find variable specification at offset 123b56c
#0 805.4 /tmp/go-link-3564573297/000024.o: in function `export_zetasql_uprv_dl_open':
#0 805.4 /go/pkg/mod/github.com/goccy/[email protected]/internal/ccall/icu/common/putil.cpp:2294: warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
#0 805.4 /usr/bin/ld: /tmp/go-link-3564573297/000054.o: in function `mygetgrouplist':
#0 805.4 /_/os/user/getgrouplist_unix.go:15: warning: Using 'getgrouplist' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
#0 805.4 /usr/bin/ld: /tmp/go-link-3564573297/000053.o: in function `mygetgrgid_r':
#0 805.4 /_/os/user/cgo_lookup_unix.go:37: warning: Using 'getgrgid_r' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
#0 805.4 /usr/bin/ld: /tmp/go-link-3564573297/000053.o: in function `mygetgrnam_r':
#0 805.4 /_/os/user/cgo_lookup_unix.go:42: warning: Using 'getgrnam_r' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
#0 805.4 /usr/bin/ld: /tmp/go-link-3564573297/000053.o: in function `mygetpwnam_r':
#0 805.4 /_/os/user/cgo_lookup_unix.go:32: warning: Using 'getpwnam_r' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
#0 805.4 /usr/bin/ld: /tmp/go-link-3564573297/000053.o: in function `mygetpwuid_r':
#0 805.4 /_/os/user/cgo_lookup_unix.go:27: warning: Using 'getpwuid_r' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
#0 805.4 /usr/bin/ld: /tmp/go-link-3564573297/000004.o: in function `_cgo_6cc2654a8ed3_C2func_getaddrinfo':
#0 805.4 /tmp/go-build/cgo-gcc-prolog:58: warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
#0 805.4 /usr/bin/../lib/gcc/aarch64-linux-gnu/10/../../../aarch64-linux-gnu/libc.a(init-first.o): in function `__libc_init_first':
#0 805.4 (.text+0x14): relocation truncated to fit: R_AARCH64_LD64_GOTPAGE_LO15 against undefined symbol `_dl_starting_up'
#0 805.4 /usr/bin/ld: (.text+0x14): warning: too many GOT entries for -fpic, please recompile with -fPIC
#0 805.4 clang: error: linker command failed with exit code 1 (use -v to see invocation)
#0 805.4 
#0 805.8 make: *** [Makefile:9: emulator/build] Error 2

I'm having performance issues using the default image, since Docker for Mac emulates the amd64 environment; that's why I'm attempting this build.

Thanks!

Array-type results causing crashes in the Python client

When I run any query with an ARRAY type column in a result, the Python client (I am using 2.34.4, an older version) crashes with this trace:

  File ".../lib/python3.9/site-packages/google/cloud/bigquery/job/query.py", line 1522, in result
    self._query_results.schema,
  File ".../lib/python3.9/site-packages/google/cloud/bigquery/query.py", line 967, in schema
    return _parse_schema_resource(self._properties.get("schema", {}))
  File ".../lib/python3.9/site-packages/google/cloud/bigquery/schema.py", line 356, in _parse_schema_resource
    return [SchemaField.from_api_repr(f) for f in info.get("fields", ())]
  File ".../lib/python3.9/site-packages/google/cloud/bigquery/schema.py", line 356, in <listcomp>
    return [SchemaField.from_api_repr(f) for f in info.get("fields", ())]
  File ".../lib/python3.9/site-packages/google/cloud/bigquery/schema.py", line 148, in from_api_repr
    field_type = api_repr["type"].upper()
KeyError: 'type'

Here is a script that produces this result:

from typing import Iterator

from google.api_core.client_options import ClientOptions
from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery, exceptions


class BigQueryClientWrapper:
    def __init__(self, project_id: str) -> None:

        self.project_id = project_id
        self.client = bigquery.Client(
            project_id,
            client_options=ClientOptions(api_endpoint="http://0.0.0.0:9050"),
            credentials=AnonymousCredentials(),
        )

    def run_query(self, query_str: str) -> Iterator[bigquery.Row]:

        query_job = self.client.query(
            query=query_str,
            location="US",
            job_config=bigquery.QueryJobConfig(),
        )

        return query_job.result()


if __name__ == '__main__':
    project_id_ = 'test_project'

    client = BigQueryClientWrapper(project_id=project_id_)

    client.run_query(f"SELECT [1, 2, 3] as a;")

The same script does not crash for me when I swap out the bigquery.Client for one that is querying against a real BigQuery project.

Thank you!

Want to see Resource Information detail

There does not seem to be much information available in the emulator.
My use case requires Job and Table information.
Can you make it so that I get the same information as BigQuery?

https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs
https://cloud.google.com/bigquery/docs/reference/rest/v2/tables

Job in BigQuery

$ bq query --use_legacy_sql=false --project_id=my-project --job_id=dummy-job-id 'select * from `dataset1`.`table_a`'
[
  {
    "id": "1", 
    "name": "alice"
  }, 
  {
    "id": "2", 
    "name": "bob"
  }
]
$ bq show --project_id=my-project --job=true dummy-job-id
{
  "configuration": {
    "jobType": "QUERY", 
    "query": {
      "destinationTable": {
        "datasetId": "<datasetId>", 
        "projectId": "my-project", 
        "tableId": "<tableId>"
      }, 
      "priority": "INTERACTIVE", 
      "query": "select * from `dataset1`.`table_a`", 
      "useLegacySql": false, 
      "writeDisposition": "WRITE_TRUNCATE"
    }
  }, 
  "etag": "<etag>", 
  "id": "my-project:<location>.dummy-job-id", 
  "jobReference": {
    "jobId": "dummy-job-id", 
    "location": "<location>", 
    "projectId": "my-project"
  }, 
  "kind": "bigquery#job", 
  "selfLink": "https://bigquery.googleapis.com/bigquery/v2/projects/my-project/jobs/dummy-job-id?location=<location>", 
  "statistics": {
    "creationTime": "1662625182766", 
    "endTime": "1662625182927", 
    "query": {
      "cacheHit": true, 
      "statementType": "SELECT", 
      "totalBytesBilled": "0", 
      "totalBytesProcessed": "0"
    }, 
    "startTime": "1662625182805", 
    "totalBytesProcessed": "0"
  }, 
  "status": {
    "state": "DONE"
  }, 
  "user_email": "<my email address>"
}

Job in emulator

$ bq --api=http://localhost:9050/ query --use_legacy_sql=false --project_id=test --job_id=dummy-job-id 'select * from `dataset1`.`table_a`'
[
  {
    "id": "1", 
    "name": "alice"
  }, 
  {
    "id": "2", 
    "name": "bob"
  }
]
$ bq --api=http://localhost:9050/ show --project_id=test --job=true dummy-job-id
{
  "configuration": {
    "query": {
      "query": "select * from `dataset1`.`table_a`", 
      "useLegacySql": false
    }
  }, 
  "jobReference": {
    "jobId": "dummy-job-id", 
    "projectId": "test"
  }, 
  "status": {
    "state": "DONE"
  }
}

emulator logs

2022-09-08T09:08:50.000Z	INFO	server/middleware.go:41	GET /discovery/v1/apis/bigquery/v2/rest	{"query": ""}
2022-09-08T09:09:04.985Z	INFO	server/middleware.go:41	POST /projects/test/jobs	{"query": "alt=json"}
2022-09-08T09:09:04.993Z	INFO	contentdata/repository.go:144		{"query": "select * from `dataset1`.`table_a`", "values": []}
2022-09-08T09:09:04.994Z	DEBUG	contentdata/repository.go:210	query result	{"rows": [[1,"alice"],[2,"bob"]]}
2022-09-08T09:09:05.007Z	INFO	server/middleware.go:41	GET /projects/test/queries/dummy-job-id	{"query": "timeoutMs=0&startIndex=0&alt=json&maxResults=100"}

2022-09-08T09:09:10.764Z	INFO	server/middleware.go:41	GET /discovery/v1/apis/bigquery/v2/rest	{"query": ""}
2022-09-08T09:09:25.770Z	INFO	server/middleware.go:41	GET /projects/test/jobs/dummy-job-id	{"query": "alt=json"}

Table in BigQuery

$ bq show --project_id=my-project dataset1.table_a
{
  "creationTime": "1662527315669", 
  "etag": "<etag>", 
  "id": "my-project:dataset1.table_a", 
  "kind": "bigquery#table", 
  "lastModifiedTime": "1662625126801", 
  "location": "<location>", 
  "numActiveLogicalBytes": "28", 
  "numActivePhysicalBytes": "3853", 
  "numBytes": "28", 
  "numLongTermBytes": "0", 
  "numLongTermLogicalBytes": "0", 
  "numLongTermPhysicalBytes": "0", 
  "numRows": "2", 
  "numTimeTravelPhysicalBytes": "1071", 
  "numTotalLogicalBytes": "28", 
  "numTotalPhysicalBytes": "3853", 
  "schema": {
    "fields": [
      {
        "mode": "NULLABLE", 
        "name": "id", 
        "type": "INTEGER"
      }, 
      {
        "mode": "NULLABLE", 
        "name": "name", 
        "type": "STRING"
      }
    ]
  }, 
  "selfLink": "https://bigquery.googleapis.com/bigquery/v2/projects/my-project/datasets/dataset1/tables/table_a", 
  "tableReference": {
    "datasetId": "dataset1", 
    "projectId": "my-project", 
    "tableId": "table_a"
  }, 
  "type": "TABLE"
}

Table in emulator

$ bq --api=http://localhost:9050/ show --project_id=test dataset1.table_a
{}

emulator logs

2022-09-08T09:06:49.803Z	INFO	server/middleware.go:41	GET /discovery/v1/apis/bigquery/v2/rest	{"query": ""}
2022-09-08T09:07:04.792Z	INFO	server/middleware.go:41	GET /projects/test/datasets/dataset1/tables/table_a	{"query": "alt=json"}

Feature request: Implemented features table

First of all thanks for your amazing work on this project.

It would be nice to know which features are implemented and which are not yet / impossible to implement (maybe a table in readme?).

I think this would be very helpful for everyone who is considering using the emulator.

Support providing project and schema from env vars

Thanks for this great project!

For users who run the emulator as a Docker container or even inside a Kubernetes cluster, it is more convenient to provide arguments through env vars.

  • --project can be replaced as is with a simple string env var, $PROJECT_NAME for example.
  • --data-from-yaml would perhaps be better replaced with an env var named $SCHEMA, where the actual contents of the schema YAML can be placed. That saves the need to mount a volume with the YAML file that holds the schema, which makes things a bit more complex.

Freezing on Begin Transaction

Example POST to /bigquery/v2/projects/{project-id}/jobs

{
    "configuration": {
        "query": {
            "defaultDataset": {
                "datasetId": "<my dataset ID>"
            },
            "query": "begin transaction;",
            "useLegacySql": false,
            "useQueryCache": true,
            "scriptOptions.keyResultStatement": "FIRST_SELECT"
        }
    },
    "jobReference": {
        "jobId": "SimbaJDBC_Job_12312312312313"
    }
}

This query originates from a Spring JPA repository using the Simba BigQuery driver. The emulator never responds to this request and from that point forward cannot respond to other requests, so I have to kill it and restart.

Content-Type not set to application/json (does not work when using a dart client)

When inserting data using the Dart BigQuery API, the client crashes.

Unhandled exception:
ApiRequestError(message: Unable to read response with content-type text/plain; charset=utf-8.)
#0      ApiRequester.request (package:_discoveryapis_commons/src/clients.dart:86:9)
<asynchronous suspension>
#1      DatasetsResourceApi.insert (package:googleapis/bigquery/v2.dart:234:32)

The problem seems to be that although the response body is JSON, the Content-Type header is text/plain. You can see the Dart code checking for this here:

Stream<String>? _decodeStreamAsText(http.StreamedResponse response) {
  // TODO: Correctly handle the response content-types, using correct
  // decoder.
  // Currently we assume that the api endpoint is responding with json
  // encoded in UTF8.
  if (isJson(response.headers['content-type'])) {
    return response.stream.transform(const Utf8Decoder(allowMalformed: true));
  } else {
    return null;
  }
}

I would suggest fixing this by setting Content-Type: application/json; charset=utf-8 on all JSON responses, as in the sketch below.
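
A minimal, hypothetical server-side sketch of that suggestion (this is not the emulator's actual handler code; the route and response body are placeholders):

package main

import (
  "fmt"
  "net/http"
)

func main() {
  http.HandleFunc("/bigquery/v2/projects/test/datasets", func(w http.ResponseWriter, r *http.Request) {
    // Set the header before writing the body so clients that check the
    // content type (like the Dart client above) can decode the response.
    w.Header().Set("Content-Type", "application/json; charset=utf-8")
    fmt.Fprint(w, `{"kind":"bigquery#dataset"}`)
  })
  http.ListenAndServe(":9050", nil)
}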

Problem inserting into a REPEATED RECORD column

Hi, I'm in a project that uses a metadata column of type repeated record:

bq show project:dataset.table
...
+- metadata: record (repeated)
 |  |- name: string (required)
 |  |- value: string

I've been able to create our table with bigquery-emulator but I'm not able to get the values properly from the metadata column.

./bigquery-emulator --project=test_project --log-level=debug
# ...
bq --api http://0.0.0.0:9050 mk --project_id=test_project --dataset test_dataset 

bq --api http://0.0.0.0:9050 mk --project_id=test_project --table test_project:test_dataset.test_metadata ./metadata_schema.json
bq --api http://0.0.0.0:9050 show --project_id=test_project --schema test_project:test_dataset.test_metadata
# [{"fields":[{"mode":"REQUIRED","name":"name","type":"STRING"},{"mode":"NULLABLE","name":"value","type":"STRING"}],"mode":"REPEATED","name":"metadata","type":"RECORD"}]
bq --api http://0.0.0.0:9050 query --project_id=test_project 'INSERT INTO test_dataset.test_metadata (metadata) VALUES (ARRAY[ STRUCT("FOO","BAR") ])'
bq --api http://0.0.0.0:9050 query --project_id=test_project 'SELECT * FROM test_dataset.test_metadata'
# +--------------+
# |  metadata    |
# +--------------+
# | [{"":"BAR"}] |
# +--------------+

The schema file is as follows:

[
  {
    "fields": [
      {
        "mode": "REQUIRED",
        "name": "name",
        "type": "STRING"
      },
      {
        "mode": "NULLABLE",
        "name": "value",
        "type": "STRING"
      }
    ],
    "mode": "REPEATED",
    "name": "metadata",
    "type": "RECORD"
  }
]

and when I query our test table, I get:

+-----------+-----------------------------------------+
|  service  |        metadata                         |
+-----------+-----------------------------------------+
| XxxxXxxx  | [{"name":"NNNNNNN","value":"VVVVVVV1"}] |
| XxxxXxxx  | [{"name":"NNNNNNN","value":"VVVVVVV2"}] |
+-----------+-----------------------------------------+

Am I doing the insert correctly and can I achieve the same result as google bigquery?

bq.py can't handle notFound error

For example, bq --api="http://localhost:9050" --project_id="test" mk --dataset test (with ./bigquery-emulator --project test) fails with the following error:

BigQuery error in mk operation: Could not connect with BigQuery server.
Http response status: 404
Http response content:
b'dataset test is not found'

bq checks whether the dataset/table/etc. already exists before creating it. bigquery-emulator already returns a 404, but bq can't handle a plain-text error body.

fmt.Fprintf(w, "job %s is not found", jobID)

I fixed the errors by following https://cloud.google.com/bigquery/docs/error-messages with a minimal example:

diff --git a/server/middleware.go b/server/middleware.go
index 7a51ae9..25a8d8a 100644
--- a/server/middleware.go
+++ b/server/middleware.go
@@ -153,8 +153,18 @@ func withDatasetMiddleware() func(http.Handler) http.Handler {
                                dataset := project.Dataset(datasetID)
                                if dataset == nil {
                                        logger.Logger(ctx).Info("dataset is not found", zap.String("datasetID", datasetID))
+                                       w.Header().Add("Content-Type", "application/json")
                                        w.WriteHeader(http.StatusNotFound)
-                                       fmt.Fprintf(w, "dataset %s is not found", datasetID)
+                                       encodeResponse(ctx, w, map[string]interface{}{
+                                               "error": map[string]interface{}{
+                                                       "errors": []interface{}{
+                                                               map[string]interface{}{
+                                                                       "message": fmt.Sprintf("dataset %s is not found", datasetID),
+                                                                       "reason":  "notFound",
+                                                               },
+                                                       },
+                                               },
+                                       })
                                        return
                                }
                                ctx = withDataset(ctx, dataset)

UDF with table name doesn't work

Hi, I found that when a UDF and a table name like dataset.table_a appear in the same query, it doesn't work.

A sample is here.

package main

import (
	"context"
	"fmt"

	"cloud.google.com/go/bigquery"
	"github.com/goccy/bigquery-emulator/server"
	"github.com/goccy/bigquery-emulator/types"
	"google.golang.org/api/iterator"
	"google.golang.org/api/option"
)

func main() {
	ctx := context.Background()

	const (
		projectName = "test"
	)

	bqServer, err := server.New(server.TempStorage)
	if err != nil {
		panic(err)
	}
	if err := bqServer.Load(
		server.StructSource(
			types.NewProject(
				projectName,
				types.NewDataset(
					"dataset1",
					types.NewTable(
						"table_a",
						[]*types.Column{
							types.NewColumn("id", types.INTEGER),
							types.NewColumn("name", types.STRING),
						},
						types.Data{
							{
								"id":   1,
								"name": "alice",
							},
							{
								"id":   2,
								"name": "bob",
							},
						},
					),
				),
			),
		),
	); err != nil {
		panic(err)
	}
	testServer := bqServer.TestServer()
	defer func() {
		testServer.Close()
		bqServer.Close()
	}()

	client, err := bigquery.NewClient(
		ctx,
		projectName,
		option.WithEndpoint(testServer.URL),
		option.WithoutAuthentication(),
	)
	if err != nil {
		panic(err)
	}
	defer client.Close()

	query := client.Query(`
CREATE TEMP FUNCTION DoubleFn(x INT64)
RETURNS INT64
AS (
	x + x
);
SELECT
	id
FROM
	dataset1.table_a;
`)

	it, err := query.Read(ctx)
	if err != nil {
		panic(err)
	}
	for {
		var row []bigquery.Value
		if err := it.Next(&row); err != nil {
			if err == iterator.Done {
				break
			}
			panic(err)
		}
		fmt.Println("row = ", row)
	}
	if err := client.Dataset("dataset1").DeleteWithContents(ctx); err != nil {
		panic(err)
	}
}

and got

panic: googleapi: Error 400: INVALID_ARGUMENT: Table not found: dataset1.table_a [at 10:9], jobInternalError

goroutine 1 [running]:
main.main()
        /workspaces/bigquery-emulator/main.go:109 +0x907
exit status 2

Of course, changing the query to

	query := client.Query(`
SELECT
	id
FROM
	dataset1.table_a;
`)

or to

	query := client.Query(`
CREATE TEMP FUNCTION DoubleFn(x INT64)
RETURNS INT64
AS (
	x + x
);
SELECT
	id
FROM
	UNNEST([1,2]) AS id;
`)

works properly.

and got

row =  [1]
row =  [2]

Can I use INSERT DML?

https://cloud.google.com/bigquery/docs/reference/standard-sql/data-manipulation-language

BigQuery supports DML statements.
They can be run with the bq command-line tool.

$ bq query --use_legacy_sql=false --project_id=my-project 'insert into `dataset1`.`table_a` (`id`, `name`) values (3, "cook")'
Waiting on bqjob_r1064c0f0fe670c54_00000183165a9c07_1 ... (0s) Current status: DONE   
Number of affected rows: 1

But when I run an INSERT DML statement on the emulator, it never returns any results.

$ bq --api=http://localhost:9050/ query --use_legacy_sql=false --project_id=test 'insert into `dataset1`.`table_a` (`id`, `name`) values (3, "cook")'
^C

The emulator outputs the following logs.

2022-09-07T05:13:48.496Z	INFO	server/middleware.go:41	GET /discovery/v1/apis/bigquery/v2/rest	{"query": ""}
2022-09-07T05:14:03.481Z	INFO	server/middleware.go:41	POST /projects/test/jobs	{"query": "alt=json"}
2022-09-07T05:14:03.489Z	INFO	contentdata/repository.go:144		{"query": "insert into `dataset1`.`table_a` (`id`, `name`) values (3, \"cook\")", "values": []}

schema

[screenshot of the dataset1.table_a schema]

data

$ bq --api=http://localhost:9050/ query --project_id=test 'select * from `dataset1`.`table_a`'
[
  {
    "id": "1", 
    "name": "alice"
  }, 
  {
    "id": "2", 
    "name": "bob"
  }
]

Odd cast to string from numeric value

Hi again, now I have a numeric field with a value of 1234.567 and I want to get the sum of the values as a string, using CAST(SUM(quantity) AS STRING) as quantity, to show elsewhere. From bq + Google BigQuery I get the usual 1234.567 decimal-looking sum, but from the emulator I get 1234567/1000 (which is technically correct, but it looks like some evaluation is missing).

Crash using MIN/MAX as window functions with a date value

I'm seeing a crash when I try to use MIN/MAX as a window function with a date value as an argument, i.e. a query of this form:

SELECT MAX(some_date_column) OVER (PARTITION BY another_column) FROM `project.dataset.table`;

Here is a script that reproduces the issue:

from typing import List, Iterator, Dict, Any

from google.api_core.client_options import ClientOptions
from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery, exceptions


class BigQueryClientWrapper:
    def __init__(self, project_id: str) -> None:

        self.project_id = project_id
        self.client = bigquery.Client(
            project_id,
            client_options=ClientOptions(api_endpoint="http://0.0.0.0:9050"),
            credentials=AnonymousCredentials(),
        )

    def create_dataset(
            self,
            dataset_id: str
    ) -> bigquery.Dataset:
        dataset = bigquery.Dataset(self._dataset_ref(dataset_id))
        return self.client.create_dataset(dataset)

    def create_table(
            self,
            dataset_id: str,
            table_id: str,
            schema: List[bigquery.SchemaField],
    ) -> bigquery.Table:
        dataset_ref = self._dataset_ref(dataset_id)
        table_ref = bigquery.TableReference(dataset_ref, table_id)
        table = bigquery.Table(table_ref, schema)
        return self.client.create_table(table)

    def get_table(self, dataset_id: str, table_id: str) -> bigquery.Table:
        dataset_ref = self._dataset_ref(dataset_id)
        table_ref = dataset_ref.table(table_id)
        return self.client.get_table(table_ref)

    def run_query(self, query_str: str) -> Iterator[bigquery.Row]:

        query_job =  self.client.query(
            query=query_str,
            location="US",
            job_config=bigquery.QueryJobConfig(),
        )

        return query_job.result()

    def _dataset_ref(self, dataset_id: str) -> bigquery.DatasetReference:
        return bigquery.DatasetReference.from_string(
            dataset_id, default_project=self.project_id
        )

    def load_data(
            self,
            dataset_id: str,
            table_id: str,
            rows: List[Dict[str, Any]],
    ) -> None:
        dataset_ref = self._dataset_ref(dataset_id)
        table_ref = dataset_ref.table(table_id)
        table = self.client.get_table(table_ref)
        errors = self.client.insert_rows(table, rows)
        if errors:
            raise RuntimeError(
                f"Failed to insert rows into {dataset_ref.dataset_id}.{table_id}:\n"
                + "\n".join(str(error) for error in errors)
            )


if __name__ == '__main__':
    project_id_ = 'test-project'
    dataset_id = 'my_dataset'
    table_id = 'my_table'
    schema = [
        bigquery.SchemaField(
            "a",
            field_type=bigquery.enums.SqlTypeNames.DATE.value,
            mode="REQUIRED",
        ),
        bigquery.SchemaField(
            "b",
            field_type=bigquery.enums.SqlTypeNames.INTEGER.value,
            mode="NULLABLE",
        ),
    ]

    client = BigQueryClientWrapper(project_id=project_id_)

    # Create the first table
    client.create_dataset(dataset_id)
    client.create_table(dataset_id, table_id, schema)

    client.load_data(dataset_id, table_id, [{"a": "2022-01-01", "b": 1}])

    # This query does not crash
    client.run_query(f"SELECT MIN(b) OVER (PARTITION BY a) FROM `test-project.{dataset_id}.{table_id}`;")

    # This query does crash with 'no such function: zetasqlite_window_min_date'
    client.run_query(f"SELECT MIN(a) OVER (PARTITION BY b) FROM `test-project.{dataset_id}.{table_id}`;")

Logs:

2022-08-29T23:53:39.372Z	ERROR	server/handler.go:40	internal error	{"error": "failed to query SELECT `$analytic1#4` FROM (SELECT `a#1`,`b#2`,( SELECT zetasqlite_window_min_date(`a#1`,zetasqlite_window_partition_string(`b#2`),zetasqlite_window_order_by_string(`b#2`, true),zetasqlite_window_frame_unit_string(1),zetasqlite_window_boundary_start_string(1, 0),zetasqlite_window_boundary_end_string(5, 0),zetasqlite_window_rowid_string(`row_id`)) FROM (SELECT `a` AS `a#1`,`b` AS `b#2` FROM `test-project_my_dataset_my_table`) ) AS `$analytic1#4` FROM (SELECT *, ROW_NUMBER() OVER() AS `row_id` FROM (SELECT `a` AS `a#1`,`b` AS `b#2` FROM `test-project_my_dataset_my_table`)) ORDER BY `b#2`,`a#1`): no such function: zetasqlite_window_min_date"}

This is a follow-up to #19, which did correctly implement MIN/MAX/COUNT as window functions for integer values.

Thank you!

[Question] Are Positional Parameters not supported?

Thanks for goccy/bigquery-emulator, it is a very useful tool.

Can I use BigQuery Positional Parameters?
https://cloud.google.com/bigquery/docs/samples/bigquery-query-params-positional

In my environment, the following error occurs (using the Docker image).

$ bq --api http://0.0.0.0:9050 query --use_legacy_sql=false --parameter=:integer:1 --project_id=test "select * from dataset1.table_a where id = ?"
You have encountered a bug in the BigQuery CLI. Please file a bug report in our public issue tracker:
https://issuetracker.google.com/issues/new?component=187149&template=0
Please include a brief description of the steps that led to this issue, as well as any rows that can be made public from the following information:

========================================
== Platform ==
  CPython:2.7.16:Darwin-19.6.0-x86_64-i386-64bit
== bq version ==
  2.0.75

...

========================================

Unexpected exception in query operation: You have encountered a bug in the BigQuery CLI. Please file a bug report in our public issue tracker:
https://issuetracker.google.com/issues/new?component=187149&template=0
Please include a brief description of the steps that led to this issue, as well as any rows that can be made public from the following information:

bigquery-emulator logged the following.

2022-08-26T03:24:31.949Z	INFO	server/middleware.go:41	GET /discovery/v1/apis/bigquery/v2/rest	{"query": ""}
2022-08-26T03:24:31.977Z	INFO	server/middleware.go:41	POST /projects/test/jobs	{"query": "alt=json"}
2022-08-26T03:24:31.982Z	INFO	contentdata/repository.go:145		{"query": "select * from dataset1.table_a where id = ?", "values": ["1"]}
2022-08-26T03:24:31.990Z	INFO	server/middleware.go:41	GET /projects/test/queries/bqjob_r4eb0707bb65b00af_00000182d82e0938_1	{"query": "timeoutMs=0&startIndex=0&alt=json&maxResults=100"}
2022-08-26T03:24:32.097Z	ERROR	server/handler.go:41	internal error	{"error": "INVALID_ARGUMENT: Positional parameters are not supported [at 1:43]"}

In actual Google BigQuery, it works as follows.

$ bq query --use_legacy_sql=false --parameter=:integer:1 --project_id=my-project 'select * from dataset1.table_a where id = ?'
[
  {
    "id": "1", 
    "name": "alice"
  }
]

Is there any option to make Positional Parameters available?
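
As a possible workaround (untested against the emulator, so treat this as an assumption), the same query can be expressed with a named parameter, which the bq CLI also supports:

$ bq --api http://0.0.0.0:9050 query --use_legacy_sql=false --parameter=id:integer:1 --project_id=test "select * from dataset1.table_a where id = @id"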

Error while writing timestamp column value to destination table

The emulator fails to write a query result containing a timestamp column value to the destination table, with an error like the following.

googleapi: Error 400: failed to add table data: failed to convert value from {Name: Ordinal:1 Value:1668610120.347600}: failed to convert 1668610120.347600 to time.Time type, jobInternalError

Here's the code to reproduce the error. It uses CURRENT_TIMESTAMP() but I confirmed that the same error occurs when the query refers to a timestamp column of a table.

package emulator_test

import (
	"context"
	"testing"

	"cloud.google.com/go/bigquery"
	"github.com/goccy/bigquery-emulator/server"
	"github.com/goccy/bigquery-emulator/types"
	"google.golang.org/api/option"
)

func TestQueryWithDestination(t *testing.T) {
	const (
		projectName = "test"
		datasetName = "dataset1"
		tableName   = "table_a"
	)

	ctx := context.Background()

	bqServer, err := server.New(server.TempStorage)
	if err != nil {
		t.Fatal(err)
	}
	project := types.NewProject(projectName, types.NewDataset(datasetName))
	if err := bqServer.Load(server.StructSource(project)); err != nil {
		t.Fatal(err)
	}

	testServer := bqServer.TestServer()
	defer func() {
		testServer.Close()
		bqServer.Close()
	}()

	client, err := bigquery.NewClient(
		ctx,
		projectName,
		option.WithEndpoint(testServer.URL),
		option.WithoutAuthentication(),
	)
	if err != nil {
		t.Fatal(err)
	}
	defer client.Close()

	table := client.Dataset(datasetName).Table(tableName)
	if err := table.Create(ctx, &bigquery.TableMetadata{
		Schema: []*bigquery.FieldSchema{
			{Name: "ts", Type: bigquery.TimestampFieldType},
		},
	}); err != nil {
		t.Fatalf("%+v", err)
	}

	query := client.Query("SELECT CURRENT_TIMESTAMP() AS ts")
	query.QueryConfig.Dst = &bigquery.Table{
		ProjectID: projectName,
		DatasetID: datasetName,
		TableID:   table.TableID,
	}
	job, err := query.Run(ctx)
	if err != nil {
		t.Fatal(err) // Fails here.
	}
	status, err := job.Wait(ctx)
	if err != nil {
		t.Fatalf("%+v", err)
	}
	if err := status.Err(); err != nil {
		t.Fatalf("%+v", err)
	}
}

M1/ARM Mac support

I have been trying to use the project for integration testing purposes, but when I try to install it locally (via go install github.com/goccy/bigquery-emulator/cmd/bigquery-emulator@latest) or integrate it into my code, the process gets stuck. I wonder if this is some kind of problem with CGO and Macs with ARM chips. I see a lot of clang processes being started (and consuming ~4gb of RAM) but they never finish and I have to force kill them.

Have you had any experience running this project in this environment? Any tips on how to run it?

I'm open to contributions, but not sure where to start.

Crash querying when the project_id has a hyphen (-) in it

I am seeing this crash running SQL queries when I use the emulator with a project_id value that has a hyphen in it:

google.api_core.exceptions.BadRequest: 400 GET http://0.0.0.0:9050/bigquery/v2/projects/test-project/queries/483be6fb-cb54-40b7-83fc-73e86deb2f46?maxResults=0&location=US&prettyPrint=false: failed to query SELECT `a#1` FROM (SELECT `a` AS `a#1` FROM test-project_my_dataset_my_table): near "-": syntax error

The following script crashes when I use project_id = 'test-project' but does not crash if I use project_id = 'test_project':

from typing import List, Iterator

from google.api_core.client_options import ClientOptions
from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery, exceptions


class BigQueryClientWrapper:
    def __init__(self, project_id: str) -> None:

        self.project_id = project_id
        self.client = bigquery.Client(
            project_id,
            client_options=ClientOptions(api_endpoint="http://0.0.0.0:9050"),
            credentials=AnonymousCredentials(),
        )

    def create_dataset(
            self,
            dataset_id: str
    ) -> bigquery.Dataset:
        dataset = bigquery.Dataset(self._dataset_ref(dataset_id))
        return self.client.create_dataset(dataset)

    def create_table(
            self,
            dataset_id: str,
            table_id: str,
            schema: List[bigquery.SchemaField],
    ) -> bigquery.Table:
        dataset_ref = self._dataset_ref(dataset_id)
        table_ref = bigquery.TableReference(dataset_ref, table_id)
        table = bigquery.Table(table_ref, schema)
        return self.client.create_table(table)

    def table_exists(self, dataset_id: str, table_id: str) -> bool:
        dataset_ref = self._dataset_ref(dataset_id)
        table_ref = dataset_ref.table(table_id)

        try:
            self.client.get_table(table_ref)
            return True
        except exceptions.NotFound:
            return False

    def run_query(self, query_str: str) -> Iterator[bigquery.Row]:

        query_job =  self.client.query(
            query=query_str,
            location="US",
            job_config=bigquery.QueryJobConfig(),
        )

        return query_job.result()

    def _dataset_ref(self, dataset_id: str) -> bigquery.DatasetReference:
        return bigquery.DatasetReference.from_string(
            dataset_id, default_project=self.project_id
        )


if __name__ == '__main__':
    project_id_ = 'test-project'
    dataset_id_ = 'my_dataset'
    table_id_ = 'my_table'
    schema_ = [
        bigquery.SchemaField(
            "a",
            field_type=bigquery.enums.SqlTypeNames.INTEGER.value,
            mode="NULLABLE",
        ),
    ]

    client = BigQueryClientWrapper(project_id=project_id_)

    client.create_dataset(dataset_id_)

    client.create_table(dataset_id_, table_id_, schema_)

    # Verify the table now does exist
    if not client.table_exists(dataset_id_, table_id_):
        raise ValueError(
            f"Table already exists: {dataset_id_}.{table_id_}."
        )

    # !! This crashes !!
    client.run_query(f"SELECT * FROM `{project_id_}.{dataset_id_}.{table_id_}`")

The Google Cloud docs do mention that project ids can contain hyphens: https://cloud.google.com/resource-manager/docs/creating-managing-projects.

Thank you!

Nested use of TO_JSON produces incorrect results

I'm seeing incorrect query behavior on nested usages of TO_JSON. For example for this query:

SELECT TO_JSON(
  STRUCT(
    "foo" AS a, 
    TO_JSON(STRUCT("bar" AS c)) AS b
  )
) AS result;

Expected result:

result
{"a":"foo","b":{"c":"bar"}}

Actual result:

result
{"\"foo\"":"foo","zetasqlite_to_json_json(\"zetasqlitestruct:eyJjIjoiYmFyIn0=\",false)":{"c":"bar"}}

This query should also produce the exact same result, but produces a slightly different, still incorrect result:

WITH inner_json AS (
  SELECT TO_JSON(STRUCT("bar" AS c)) AS b
)
SELECT TO_JSON(STRUCT("foo" as a, b)) AS result
FROM inner_json;

Actual result:

result
{"\"foo\"":"foo","b":{"c":"bar"}}

(Looks like you're using the value as the column name instead of the proper column name?)

Thank you!

DELETE request for datasets does not properly delete tables in that dataset when `deleteContents=True`

When I try to delete a dataset with a table in it, it appears that the table is not being deleted properly. When I do this:

  • Create dataset my_dataset
  • Create table my_dataset.my_table
  • Delete my_dataset with deleteContents=True
  • Create dataset my_dataset again
  • Create table my_dataset.my_table again

I see this crash:

2022-08-24T17:58:17.432Z	ERROR	server/handler.go:41	internal error	{"error": "failed to create table CREATE TABLE `my_table` (`a` INT64): failed to exec CREATE TABLE `my_table` (`a` INT64): table `recidiviz-bq-emulator-project_my_dataset_my_table` already exists"}

Here is a script that reproduces the issue:

from typing import List

from google.api_core.client_options import ClientOptions
from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery, exceptions


class BigQueryClientWrapper:
    def __init__(self, project_id: str) -> None:

        self.project_id = project_id
        self.client = bigquery.Client(
            project_id,
            client_options=ClientOptions(api_endpoint="http://0.0.0.0:9050"),
            credentials=AnonymousCredentials(),
        )

    def create_dataset(
            self,
            dataset_id: str
    ) -> bigquery.Dataset:
        dataset = bigquery.Dataset(self._dataset_ref(dataset_id))
        return self.client.create_dataset(dataset)

    def dataset_exists(self, dataset_id: str) -> bool:
        dataset_ref = self._dataset_ref(dataset_id)
        try:
            self.client.get_dataset(dataset_ref)
            return True
        except exceptions.NotFound:
            return False

    def create_table(
            self,
            dataset_id: str,
            table_id: str,
            schema: List[bigquery.SchemaField],
    ) -> bigquery.Table:
        dataset_ref = self._dataset_ref(dataset_id)
        table_ref = bigquery.TableReference(dataset_ref, table_id)
        table = bigquery.Table(table_ref, schema)
        return self.client.create_table(table)

    def delete_table(self, dataset_id: str, table_id: str) -> None:
        dataset_ref = self._dataset_ref(dataset_id)
        table_ref = dataset_ref.table(table_id)
        self.client.delete_table(table_ref)

    def delete_dataset(self, dataset_id: str) -> None:
        dataset_ref = self._dataset_ref(dataset_id)
        return self.client.delete_dataset(dataset_ref, delete_contents=True)

    def _dataset_ref(self, dataset_id: str) -> bigquery.DatasetReference:
        return bigquery.DatasetReference.from_string(
            dataset_id, default_project=self.project_id
        )


if __name__ == '__main__':
    project_id_ = 'my-project'
    dataset_id_ = 'my_dataset'
    table_id_ = 'my_table'
    schema_1 = [
        bigquery.SchemaField(
            "a",
            field_type=bigquery.enums.SqlTypeNames.INTEGER.value,
            mode="NULLABLE",
        ),
    ]

    client = BigQueryClientWrapper(project_id=project_id_)

    # Create the first table
    client.create_dataset(dataset_id_)
    client.create_table(dataset_id_, table_id_, schema_1)

    # Delete the whole dataset
    client.delete_dataset(dataset_id_)

    if client.dataset_exists(dataset_id_):
        raise ValueError("Dataset still exists after it has been deleted")

    # Creating the dataset succeeds
    client.create_dataset(dataset_id_)

    # !! This crashes: table `recidiviz-bq-emulator-project_my_dataset_my_table` already exists
    client.create_table(dataset_id_, table_id_, schema_1)

Thank you!

Cannot update the schema of an existing table

Code snippet in Kotlin:

val bigQuery = BigQueryOptions.newBuilder()
            .setHost("http://0.0.0.0:9050")
            .setProjectId("test")
            .setCredentials(NoCredentials.getInstance())
            .build().service

        val tableDefinition: TableDefinition = StandardTableDefinition.newBuilder()
            .setSchema(Schema.of(
                Field.of("id", LegacySQLTypeName.STRING)
            )).build()

        val tableInfo = TableInfo.of(TableId.of("internal", "test"), tableDefinition)

        bigQuery.update(tableInfo)      // first call it with .create() so it will create the table 

and I get this error:

Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 500 Internal Server Error
POST http://0.0.0.0:9050/bigquery/v2/projects/test/datasets/internal/tables/test?prettyPrint=false
{
  "code" : 500,
  "errors" : [ {
    "location" : "",
    "message" : "unexpected request path: /bigquery/v2/projects/test/datasets/internal/tables/test",
    "reason" : "internalError",
    "debugInfo" : ""
  } ],
  "message" : "unexpected request path: /bigquery/v2/projects/test/datasets/internal/tables/test"
}

Add compatibility with the BigQuery JDBC driver

Hi @goccy,

We use the BigQuery JDBC driver (see documentation) in our project and would like to add integration tests using bigquery-emulator.

It's possible to specify a custom endpoint for the driver using the undocumented RootURL argument, e.g.:

jdbc:bigquery://http://localhost/:9050;RootURL=http://localhost:9050;ProjectId=test;OAuthType=2;OAuthAccessToken=<token>

During our tests we found the following issues caused by the JDBC driver's behavior. We would like to contribute to your project and fix these issues if that's ok for you.

Path prefix

The driver always uses path prefix /bigquery/v2, e.g. POST /bigquery/v2/projects/test/queries. This is not configurable.
We propose to add a command line switch that allows overriding the default prefix "", e.g.:

--path-prefix="/bigquery/v2"
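
A minimal sketch of how such a switch could work, using http.StripPrefix from the standard library; the flag wiring and handler names below are illustrative assumptions, not the emulator's actual code:

package main

import (
	"flag"
	"log"
	"net/http"
)

func main() {
	// Hypothetical flag; the emulator does not currently expose it.
	pathPrefix := flag.String("path-prefix", "", `path prefix, e.g. "/bigquery/v2"`)
	flag.Parse()

	api := http.NewServeMux() // stands in for the emulator's existing handler
	api.HandleFunc("/projects/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})

	var handler http.Handler = api
	if *pathPrefix != "" {
		// Accept requests under the prefix and route them to the same handlers.
		handler = http.StripPrefix(*pathPrefix, api)
	}
	log.Fatal(http.ListenAndServe(":9050", handler))
}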

Content-Encoding gzip

The driver sends payload with Content-Encoding: gzip. This is not configurable.
We propose to add a middleware function to server.go that unzips the request payload if necessary.
This would require adding an additional dependency, e.g. github.com/klauspost/compress/gzip.
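
A minimal sketch of such a middleware, using the standard library's compress/gzip rather than github.com/klauspost/compress/gzip; the package and function names are illustrative assumptions, not the emulator's actual code:

package middleware

import (
	"compress/gzip"
	"net/http"
)

// decompressRequest transparently unzips request bodies that arrive with
// Content-Encoding: gzip so that downstream handlers always see plain JSON.
func decompressRequest(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Content-Encoding") == "gzip" {
			gz, err := gzip.NewReader(r.Body)
			if err != nil {
				http.Error(w, err.Error(), http.StatusBadRequest)
				return
			}
			defer gz.Close()
			r.Body = gz
			r.Header.Del("Content-Encoding")
			r.ContentLength = -1 // length of the decompressed body is unknown
		}
		next.ServeHTTP(w, r)
	})
}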

Job ID

When sending a SQL query, the JDBC driver sends request POST /bigquery/v2/projects/test/queries with the following payload:

{"dryRun":false,"maxResults":10000,"query":"select 2*5","timeoutMs":10000,"useLegacySql":false,"useQueryCache":true}

The emulator replies with this response:

{"jobComplete":true,"jobReference":{"projectId":"test"},"rows":[{"f":[{"v":"10"}]}],"schema":{"fields":[{"name":"$col1","type":"INTEGER"}]},"totalRows":"1"}

This causes the following exception in the JDBC driver which is caused by the missing jobId in jobReference.

java.sql.SQLException: [Simba][BigQueryJDBCDriver](100030) Error trying to obtain Google Bigquery object.
	at com.simba.googlebigquery.googlebigquery.dataengine.BQResultSet.<init>(Unknown Source)
	at com.simba.googlebigquery.googlebigquery.dataengine.BQSQLExecutor.execute(Unknown Source)
	at com.simba.googlebigquery.jdbc.common.SStatement.executeNoParams(Unknown Source)
	at com.simba.googlebigquery.jdbc.common.BaseStatement.executeQuery(Unknown Source)
	at com.exasol.adapter.dialects.bigquery.BigQueryJdbcTest.test(BigQueryJdbcTest.java:23)

The emulator uses r.queryRequest.RequestId as jobId (see handler.go) which is nil in this case.

We propose to generate a random ID in case r.queryRequest.RequestId is nil.
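
A minimal sketch of the proposed fallback; the helper name and call site are illustrative assumptions, not the emulator's actual handler code:

package main

import (
	"crypto/rand"
	"fmt"
)

// jobIDOrRandom returns the client-supplied request ID if present, and a
// randomly generated job ID otherwise, so jobReference.jobId is never empty.
func jobIDOrRandom(requestID string) string {
	if requestID != "" {
		return requestID
	}
	buf := make([]byte, 8)
	if _, err := rand.Read(buf); err != nil {
		// crypto/rand should not fail in practice; fall back to a fixed ID.
		return "job_fallback"
	}
	return fmt.Sprintf("job_%x", buf)
}

func main() {
	fmt.Println(jobIDOrRandom("")) // e.g. job_3f9c2a1d0b7e4c55
}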

If these changes are OK for you, we would like to create separate pull requests for each of these issues, or discuss how to solve them in a different way.

Thank you very much for your support!

Several issues on OS X

Hi, I REALLY want to use your emulator (also, thank you for taking the initiative to put this together!) but I am having several issues with it.

  1. I tried using the exact configuration described here (#29 (comment)) but I see:
bigquery_1         | unknown flag `c'
spdb_bigquery_1 exited with code 1
  2. When trying to use --data-from-yaml when starting the docker container, I always get an error about a missing file:
> ls -l data.yaml
-rw-r--r--    1 cb871d    10050   316 Sep 10 17:46 data.yaml
> docker run -it ghcr.io/goccy/bigquery-emulator:latest --project=test --data-from-yaml=./data.yaml
open ./data.yaml: no such file or directory
  3. This issue magically disappeared, I am not sure why, but previously when I successfully started the docker container, I would just keep getting SQL connection errors.

I'd start the container with:

> docker run -it ghcr.io/goccy/bigquery-emulator:latest --project=test --port=9050 --log-level=debug --dataset=dataset1
[bigquery-emulator] listening at 0.0.0.0:9050

And any http request, e.g., http://0.0.0.0:9050/bigquery/v2/projects would get the following response:
"sql: connection is already closed"

Are there some prerequisites that are perhaps not listed?

Unable to compile the BigQuery emulator from a Go program

When I include the package for the BigQuery emulator, I get the following error:

github.com/goccy/go-zetasql/internal/ccall/go-absl/time/go_internal/cctz/time_zone

bind.cc:97:10: fatal error: 'absl/time/internal/cctz/include/cctz/time_zone.h' file not found

Any resolution for this would be greatly helpful. Thanks

ARRAY_AGG(... IGNORE NULLS) crash when null value is in first row

When I run a query like this:

SELECT b, ARRAY_AGG(a IGNORE NULLS) AS a_list
FROM `project.my_dataset.my_table`
GROUP BY b;

It works if any null values in column a are NOT the first row. However, if the first inserted row has a null value for a, I am seeing this crash:

Traceback (most recent call last):
  File ".../test.py", line 107, in <module>
    print(list(result))
  File ".../lib/python3.9/site-packages/google/api_core/page_iterator.py", line 209, in _items_iter
    for item in page:
  File ".../lib/python3.9/site-packages/google/api_core/page_iterator.py", line 131, in __next__
    result = self._item_to_value(self._parent, item)
  File ".../lib/python3.9/site-packages/google/cloud/bigquery/table.py", line 2684, in _item_to_row
    _helpers._row_tuple_from_json(resource, iterator.schema),
  File ".../lib/python3.9/site-packages/google/cloud/bigquery/_helpers.py", line 429, in _row_tuple_from_json
    row_data.append(_field_from_json(cell["v"], field))
  File ".../lib/python3.9/site-packages/google/cloud/bigquery/_helpers.py", line 403, in _field_from_json
    return [converter(item["v"], field) for item in resource]
TypeError: 'NoneType' object is not iterable

Here is the script I am using to reproduce:

from typing import List, Iterator, Dict, Any

from google.api_core.client_options import ClientOptions
from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery, exceptions


class BigQueryClientWrapper:
    def __init__(self, project_id: str) -> None:

        self.project_id = project_id
        self.client = bigquery.Client(
            project_id,
            client_options=ClientOptions(api_endpoint="http://0.0.0.0:9050"),
            credentials=AnonymousCredentials(),
        )

    def create_dataset(
            self,
            dataset_id: str
    ) -> bigquery.Dataset:
        dataset = bigquery.Dataset(self._dataset_ref(dataset_id))
        return self.client.create_dataset(dataset)

    def create_table(
            self,
            dataset_id: str,
            table_id: str,
            schema: List[bigquery.SchemaField],
    ) -> bigquery.Table:
        dataset_ref = self._dataset_ref(dataset_id)
        table_ref = bigquery.TableReference(dataset_ref, table_id)
        table = bigquery.Table(table_ref, schema)
        return self.client.create_table(table)

    def get_table(self, dataset_id: str, table_id: str) -> bigquery.Table:
        dataset_ref = self._dataset_ref(dataset_id)
        table_ref = dataset_ref.table(table_id)
        return self.client.get_table(table_ref)

    def run_query(self, query_str: str) -> Iterator[bigquery.Row]:

        query_job =  self.client.query(
            query=query_str,
            location="US",
            job_config=bigquery.QueryJobConfig(),
        )

        return query_job.result()

    def _dataset_ref(self, dataset_id: str) -> bigquery.DatasetReference:
        return bigquery.DatasetReference.from_string(
            dataset_id, default_project=self.project_id
        )

    def load_data(
            self,
            dataset_id: str,
            table_id: str,
            rows: List[Dict[str, Any]],
    ) -> None:
        dataset_ref = self._dataset_ref(dataset_id)
        table_ref = dataset_ref.table(table_id)
        table = self.client.get_table(table_ref)
        errors = self.client.insert_rows(table, rows)
        if errors:
            raise RuntimeError(
                f"Failed to insert rows into {dataset_ref.dataset_id}.{table_id}:\n"
                + "\n".join(str(error) for error in errors)
            )


if __name__ == '__main__':
    project_id_ = 'recidiviz-bq-emulator-project'
    dataset_id = 'my_dataset'
    table_id = 'my_table'
    schema = [
        bigquery.SchemaField(
            "a",
            field_type=bigquery.enums.SqlTypeNames.INTEGER.value,
            mode="NULLABLE",
        ),
        bigquery.SchemaField(
            "b",
            field_type=bigquery.enums.SqlTypeNames.INTEGER.value,
            mode="REQUIRED",
        ),
    ]

    client = BigQueryClientWrapper(project_id=project_id_)

    # Create the first table
    client.create_dataset(dataset_id)
    client.create_table(dataset_id, table_id, schema)

    client.load_data(dataset_id, table_id, [{"a": None, "b": 2}, {"a": 1, "b": 2}])

    result = client.run_query(f"""
SELECT b, ARRAY_AGG(a IGNORE NULLS) AS a_list
FROM `{project_id_}.{dataset_id}.{table_id}`
GROUP BY b;
""")

    print(list(result))

If you change the load_data line to client.load_data(dataset_id, table_id, [{"a": 1, "b": 2}, {"a": None, "b": 2}]), the query works perfectly.

Thank you!

BigQuery queries using ORDER BY, GROUP BY, FORMAT_DATE, COUNT, and subqueries not working on bigquery-emulator

Hey @goccy

I have written the query below for inserting and fetching data.
Inserting data works fine, but when I try to fetch data using GROUP BY, COUNT, ORDER BY, or a subquery, it does not return proper records and gives me an error.

This is the code I tried.

type MockUsageEvent struct {
	Key       string `json:"key" binding:"required" conform:"trim"`
	Product   string `json:"product" conform:"trim"`
	Version   string `json:"version" conform:"trim"`
	Platform  string `json:"platform" conform:"trim"`
	Event     string `json:"event" binding:"required,oneofCI=activation" conform:"trim,lower"`
	InstallId string `json:"installid" conform:"trim"`
	Created   string `json:"created"`
}
func (i *MockUsageEvent) Save() (map[string]bigquery.Value, string, error) {
	return map[string]bigquery.Value{
		"key":       i.Key,
		"product":   i.Product,
		"version":   i.Version,
		"platform":  i.Platform,
		"event":     i.Event,
		"installid": i.InstallId,
		"created":   i.Created,
	}, bigquery.NoDedupeID, nil
}
ctx := context.Background()
	meta := &bigquery.DatasetMetadata{
		Location: "US", // See https://cloud.google.com/bigquery/docs/locations
	}
	if err := bigQueryClient.Dataset("dataset").Create(ctx, meta); err != nil {
		return
	}

	sampleSchema := bigquery.Schema{
		{Name: "key", Type: bigquery.StringFieldType},
		{Name: "product", Type: bigquery.StringFieldType},
		{Name: "version", Type: bigquery.StringFieldType},
		{Name: "platform", Type: bigquery.StringFieldType},
		{Name: "event", Type: bigquery.StringFieldType},
		{Name: "installid", Type: bigquery.StringFieldType},
		{Name: "created", Type: bigquery.StringFieldType},
	}

	metaData := &bigquery.TableMetadata{
		Schema:         sampleSchema,
		ExpirationTime: time.Now().Add(time.Duration(1*60) * time.Second), // Table will be automatically deleted in 1 minute.
	}
	tableRef := bigQueryClient.Dataset("dataset").Table("table")

	if err := tableRef.Create(ctx, metaData); err != nil {
		return
	}

	inserter := bigQueryClient.Dataset("dataset").Table("table").Inserter()

	items := []*MockUsageEvent{
		{Key: "1bjCljWkq6tinyAMuu0tEAhB80", Product: "Phrased Flintstones0", Version: "Phrased Flintstones0", Platform: "Phrased Flintstones0", Event: "license-request", InstallId: "1", Created: "2022-10-11T17:31:39"},
		{Key: "1bjCljWkq6tinyAMuu0tEAhB81", Product: "Phrased Flintstones1", Version: "Phrased Flintstones1", Platform: "Phrased Flintstones1", Event: "camera", InstallId: "1", Created: "2022-10-11T17:31:39"},
		{Key: "1bjCljWkq6tinyAMuu0tEAhB82", Product: "Phrased Flintstones2", Version: "Phrased Flintstones2", Platform: "Phrased Flintstones2", Event: "activation", InstallId: "1", Created: "2022-10-11T17:31:39"},
		{Key: "1bjCljWkq6tinyAMuu0tEAhB83", Product: "Phrased Flintstones3", Version: "Phrased Flintstones3", Platform: "Phrased Flintstones3", Event: "textsearch", InstallId: "1", Created: "2022-10-11T17:31:39"},
		{Key: "1bjCljWkq6tinyAMuu0tEAhB84", Product: "Phrased Flintstones4", Version: "Phrased Flintstones4", Platform: "Phrased Flintstones4", Event: "barcode", InstallId: "1", Created: "2022-10-11T17:31:39"},
		{Key: "1bjCljWkq6tinyAMuu0tEAhB85", Product: "Phrased Flintstones5", Version: "Phrased Flintstones5", Platform: "Phrased Flintstones5", Event: "ocr", InstallId: "1", Created: "2022-10-11T17:31:39"},
		{Key: "1bjCljWkq6tinyAMuu0tEAhB86", Product: "Phrased Flintstones6", Version: "Phrased Flintstones6", Platform: "Phrased Flintstones6", Event: "fooddetection", InstallId: "1", Created: "2022-10-11T17:31:39"},
		{Key: "1bjCljWkq6tinyAMuu0tEAhB87", Product: "Phrased Flintstones7", Version: "Phrased Flintstones7", Platform: "Phrased Flintstones7", Event: "classifications", InstallId: "1", Created: "2022-10-11T17:31:39"},
		{Key: "1bjCljWkq6tinyAMuu0tEAhB88", Product: "Phrased Flintstones8", Version: "Phrased Flintstones8", Platform: "Phrased Flintstones8", Event: "customdetection", InstallId: "1", Created: "2022-10-11T17:31:39"},
		{Key: "1bjCljWkq6tinyAMuu0tEAhB89", Product: "Phrased Flintstones9", Version: "Phrased Flintstones9", Platform: "Phrased Flintstones9", Event: "detectinimage", InstallId: "1", Created: "2022-10-11T17:31:39"},
		{Key: "1bjCljWkq6tinyAMuu0tEAhB89", Product: "Phrased Flintstones9", Version: "Phrased Flintstones10", Platform: "Phrased Flintstones10", Event: "packagedfood", InstallId: "1", Created: "2022-10-11T17:31:39"},
	}

	if err := inserter.Put(ctx, items); err != nil {
		fmt.Println("err=>", err)
		return
	}

When I try to fetch records using the query below, it does not work.

THIS QUERY DOES NOT WORK:

SELECT yearMonth, count, key
			FROM (
					SELECT key, FORMAT_DATE('%Y-%m', PARSE_DATETIME('%Y-%m-%d %H:%M:%S', STRING(created))) AS yearMonth, count(distinct installid) as count
					FROM test.dataset.table
					WHERE created BETWEEN '2022-09-14' AND DATE_ADD(DATE '2022-10-14', INTERVAL 1 DAY)
						 AND installid IS NOT NULL AND length(installid) > 0 
					GROUP BY key, yearMonth
					ORDER BY yearMonth
				) AS counts
		 WHERE key IN ("1bjCljWkq6tinyAMuu0tEAhB80","1bjCljWkq6tinyAMuu0tEAhB82","1bjCljWkq6tinyAMuu0tEAhB86") ORDER BY key 

It gives me the error: panic: runtime error: invalid memory address or nil pointer dereference [recovered]

Whenever I run this SIMPLE query, it gives me 11 records and works fine, but the query above still does not work.

rows := bigQueryClient.Query(`
			SELECT *
			FROM ` + "`test.dataset.table`" + `
	`)

	it, err := rows.Read(ctx)
	if err != nil {
		fmt.Println("err=>", err)
		return
	}
	fmt.Println("TOTAL ROWS COUNT", it.TotalRows)

// OUTPUT : TOTAL ROWS COUNT 11

Please suggest a proper solution for the above query.
Thank you @goccy

Issues deleting and recreating a table with a different schema

I am seeing a handful of issues on version v0.1.6 when I do the following:

  • Create a table at location my_dataset.my_table with column a
  • Delete the table
  • Create a table again at location my_dataset.my_table with column b

Both insert and query operations seem to think that the table has the original schema, even though the GET table operation returns the correct info.

Here is a script that reproduces the issues:

from typing import List, Iterator, Dict, Any

from google.api_core.client_options import ClientOptions
from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery, exceptions


class BigQueryClientWrapper:
    def __init__(self, project_id: str) -> None:

        self.project_id = project_id
        self.client = bigquery.Client(
            project_id,
            client_options=ClientOptions(api_endpoint="http://0.0.0.0:9050"),
            credentials=AnonymousCredentials(),
        )

    def create_dataset(
            self,
            dataset_id: str
    ) -> bigquery.Dataset:
        dataset = bigquery.Dataset(self._dataset_ref(dataset_id))
        return self.client.create_dataset(dataset)

    def create_table(
            self,
            dataset_id: str,
            table_id: str,
            schema: List[bigquery.SchemaField],
    ) -> bigquery.Table:
        dataset_ref = self._dataset_ref(dataset_id)
        table_ref = bigquery.TableReference(dataset_ref, table_id)
        table = bigquery.Table(table_ref, schema)
        return self.client.create_table(table)

    def get_table(self, dataset_id: str, table_id: str) -> bigquery.Table:
        dataset_ref = self._dataset_ref(dataset_id)
        table_ref = dataset_ref.table(table_id)
        return self.client.get_table(table_ref)

    def run_query(self, query_str: str) -> Iterator[bigquery.Row]:

        query_job =  self.client.query(
            query=query_str,
            location="US",
            job_config=bigquery.QueryJobConfig(),
        )

        return query_job.result()

    def _dataset_ref(self, dataset_id: str) -> bigquery.DatasetReference:
        return bigquery.DatasetReference.from_string(
            dataset_id, default_project=self.project_id
        )

    def load_data(
            self,
            dataset_id: str,
            table_id: str,
            rows: List[Dict[str, Any]],
    ) -> None:
        dataset_ref = self._dataset_ref(dataset_id)
        table_ref = dataset_ref.table(table_id)
        table = self.client.get_table(table_ref)
        errors = self.client.insert_rows(table, rows)
        if errors:
            raise RuntimeError(
                f"Failed to insert rows into {dataset_ref.dataset_id}.{table_id}:\n"
                + "\n".join(str(error) for error in errors)
            )

    def delete_table(self, dataset_id: str, table_id: str) -> None:
        dataset_ref = self._dataset_ref(dataset_id)
        table_ref = dataset_ref.table(table_id)
        self.client.delete_table(table_ref)


if __name__ == '__main__':
    project_id_ = 'my-project'
    dataset_id_ = 'my_dataset'
    table_id_ = 'my_table'
    schema_1 = [
        bigquery.SchemaField(
            "a",
            field_type=bigquery.enums.SqlTypeNames.INTEGER.value,
            mode="NULLABLE",
        ),
    ]

    client = BigQueryClientWrapper(project_id=project_id_)

    # Create the first table
    client.create_dataset(dataset_id_)
    client.create_table(dataset_id_, table_id_, schema_1)

    # Delete that table
    client.delete_table(dataset_id_, table_id_)

    schema_2 = [
        bigquery.SchemaField(
            "b",
            field_type=bigquery.enums.SqlTypeNames.INTEGER.value,
            mode="NULLABLE",
        ),
    ]

    # Create the same table again, this time with a different schema
    client.create_table(dataset_id_, table_id_, schema_2)

    table = client.get_table(dataset_id_, table_id_)

    # The get_table() call returns the correct, new schema :-)
    if table.schema != schema_2:
        raise ValueError(f"Found incorrect table schema: {table.schema}")

    # !! This crashes: INVALID_ARGUMENT: Column b is not present in table  !!
    client.load_data(dataset_id_, table_id_, [{"b": 1}])

    # !! This crashes: INVALID_ARGUMENT: Unrecognized name: b !!
    client.run_query(f"SELECT b FROM `{project_id_}.{dataset_id_}.{table_id_}`")

Thank you!

Request: Support for JSON / TO_JSON

I would love to be able to run a query like this against the emulator:

SELECT TO_JSON([1, 2, 3]) as a;

Which would return a result like this:

a
"[1,2,3]"

Right now, that query throws an error:

ERROR	server/handler.go:41	internal error	{"error": "INVALID_ARGUMENT: Function not found: TO_JSON [at 1:8]"}

Thank you!

Crash when re-creating a table after it has been deleted

Hi there! Thank you so much for your work on this project. It has the potential to be incredibly useful for my work.

I was trying to integrate the emulator into some Python unit tests and I think I have found a bug related to table creation. In particular, if I create a table, delete the table, then create a table at the same location, I see a crash like this:

google.api_core.exceptions.BadRequest: 400 POST http://0.0.0.0:9050/bigquery/v2/projects/test_project/datasets/my_dataset/tables?prettyPrint=false: failed to create table CREATE TABLE `my_table` (`a` INT64): failed to exec CREATE TABLE `my_table` (`a` INT64): table `test_project_my_dataset_my_table` already exists.

Here is the script I'm using to reproduce the issue:

from typing import List, Iterator

from google.api_core.client_options import ClientOptions
from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery, exceptions


class BigQueryClientWrapper:
    def __init__(self, project_id: str) -> None:

        self.project_id = project_id
        self.client = bigquery.Client(
            project_id,
            client_options=ClientOptions(api_endpoint="http://0.0.0.0:9050"),
            credentials=AnonymousCredentials(),
        )

    def create_dataset(
        self,
        dataset_id: str
    ) -> bigquery.Dataset:
        dataset = bigquery.Dataset(self._dataset_ref(dataset_id))
        return self.client.create_dataset(dataset)

    def create_table(
            self,
            dataset_id: str,
            table_id: str,
            schema: List[bigquery.SchemaField],
    ) -> bigquery.Table:
        dataset_ref = self._dataset_ref(dataset_id)
        table_ref = bigquery.TableReference(dataset_ref, table_id)
        table = bigquery.Table(table_ref, schema)
        return self.client.create_table(table)

    def table_exists(self, dataset_id: str, table_id: str) -> bool:
        dataset_ref = self._dataset_ref(dataset_id)
        table_ref = dataset_ref.table(table_id)

        try:
            self.client.get_table(table_ref)
            return True
        except exceptions.NotFound:
            return False

    def run_query(self, query_str: str) -> Iterator[bigquery.Row]:

        query_job =  self.client.query(
            query=query_str,
            location="US",
            job_config=bigquery.QueryJobConfig(),
        )

        return query_job.result()

    def delete_table(self, dataset_id: str, table_id: str) -> None:
        dataset_ref = self._dataset_ref(dataset_id)
        table_ref = dataset_ref.table(table_id)
        self.client.delete_table(table_ref)

    def _dataset_ref(self, dataset_id: str) -> bigquery.DatasetReference:
        return bigquery.DatasetReference.from_string(
            dataset_id, default_project=self.project_id
        )


if __name__ == '__main__':
    project_id_ = 'test_project'
    dataset_id_ = 'my_dataset'
    table_id_ = 'my_table'
    schema_ = [
        bigquery.SchemaField(
            "a",
            field_type=bigquery.enums.SqlTypeNames.INTEGER.value,
            mode="NULLABLE",
        ),
    ]

    client = BigQueryClientWrapper(project_id=project_id_)

    client.create_dataset(dataset_id_)

    # Verify table does not yet exist
    if client.table_exists(dataset_id_, table_id_):
        raise ValueError(
            f"Table already exists: {dataset_id_}.{table_id_}."
        )

    # Create the table
    client.create_table(dataset_id_, table_id_, schema_)

    # Verify the table now does exist
    if not client.table_exists(dataset_id_, table_id_):
        raise ValueError(
            f"Table does not exist: {dataset_id_}.{table_id_}."
        )

    # Verify that we can query the table successfully
    client.run_query(f"SELECT * FROM `{project_id_}.{dataset_id_}.{table_id_}`")

    # Delete the table
    client.delete_table(dataset_id_, table_id_)

    # Verify table does not yet exist (according to the API)
    if client.table_exists(dataset_id_, table_id_):
        raise ValueError(
            f"Table already exists: {dataset_id_}.{table_id_}."
        )

    # Create the table a second time - !!! This crashes, saying table exists !!!
    client.create_table(dataset_id_, table_id_, schema_)

Thanks!

Empty result instead of runtime error when ARRAY_AGG has null values

When I run this query in the BigQuery UI, I get a runtime execution error: 'Array cannot have a null element; error in writing field a_list'

SELECT b, ARRAY_AGG(a) AS a_list
FROM UNNEST([
   STRUCT(1 AS a, 2 AS b),
   STRUCT(NULL AS a, 2 AS b)
])
GROUP BY b;

However, when I do an ARRAY_AGG over the same data in the emulator, the row with NULL values in the ARRAY_AGG result is just silently dropped with no runtime error.

Here is the script that reproduces:

from typing import List, Iterator, Dict, Any

from google.api_core.client_options import ClientOptions
from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery, exceptions


class BigQueryClientWrapper:
    def __init__(self, project_id: str) -> None:

        self.project_id = project_id
        self.client = bigquery.Client(
            project_id,
            client_options=ClientOptions(api_endpoint="http://0.0.0.0:9050"),
            credentials=AnonymousCredentials(),
        )

    def create_dataset(
            self,
            dataset_id: str
    ) -> bigquery.Dataset:
        dataset = bigquery.Dataset(self._dataset_ref(dataset_id))
        return self.client.create_dataset(dataset)

    def create_table(
            self,
            dataset_id: str,
            table_id: str,
            schema: List[bigquery.SchemaField],
    ) -> bigquery.Table:
        dataset_ref = self._dataset_ref(dataset_id)
        table_ref = bigquery.TableReference(dataset_ref, table_id)
        table = bigquery.Table(table_ref, schema)
        return self.client.create_table(table)

    def get_table(self, dataset_id: str, table_id: str) -> bigquery.Table:
        dataset_ref = self._dataset_ref(dataset_id)
        table_ref = dataset_ref.table(table_id)
        return self.client.get_table(table_ref)

    def run_query(self, query_str: str) -> Iterator[bigquery.Row]:

        query_job =  self.client.query(
            query=query_str,
            location="US",
            job_config=bigquery.QueryJobConfig(),
        )

        return query_job.result()

    def _dataset_ref(self, dataset_id: str) -> bigquery.DatasetReference:
        return bigquery.DatasetReference.from_string(
            dataset_id, default_project=self.project_id
        )

    def load_data(
            self,
            dataset_id: str,
            table_id: str,
            rows: List[Dict[str, Any]],
    ) -> None:
        dataset_ref = self._dataset_ref(dataset_id)
        table_ref = dataset_ref.table(table_id)
        table = self.client.get_table(table_ref)
        errors = self.client.insert_rows(table, rows)
        if errors:
            raise RuntimeError(
                f"Failed to insert rows into {dataset_ref.dataset_id}.{table_id}:\n"
                + "\n".join(str(error) for error in errors)
            )


if __name__ == '__main__':
    project_id_ = 'recidiviz-bq-emulator-project'
    dataset_id = 'my_dataset'
    table_id = 'my_table'
    schema = [
        bigquery.SchemaField(
            "a",
            field_type=bigquery.enums.SqlTypeNames.INTEGER.value,
            mode="NULLABLE",
        ),
        bigquery.SchemaField(
            "b",
            field_type=bigquery.enums.SqlTypeNames.INTEGER.value,
            mode="REQUIRED",
        ),
    ]

    client = BigQueryClientWrapper(project_id=project_id_)

    # Create the first table
    client.create_dataset(dataset_id)
    client.create_table(dataset_id, table_id, schema)

    client.load_data(dataset_id, table_id, [{"a": 1, "b": 1}, {"a": None, "b": 1}, {"a": 2, "b": 2}])

    # This query does not crash and returns an empty list, but should throw a runtime error.
    result = client.run_query(f"""
SELECT b, ARRAY_AGG(a) AS a_list
FROM `{project_id_}.{dataset_id}.{table_id}`
GROUP BY b;
""")

(Note: I couldn't actually test with the UNNEST statement above because of #30)

Thank you!

Emulator does not handle two tables with same name, different datasets

I am seeing a number of issues when I create two tables that have the same table_id but different dataset_ids and schemas.

If I create table my_dataset_1.my_table with single column a and table my_dataset_2.my_table with single column b, the emulator seems to be confused about which table/schema to use.

If I try to load data into my_dataset_2.my_table after creating both tables, I get this error:

google.api_core.exceptions.BadRequest: 400 POST http://0.0.0.0:9050/bigquery/v2/projects/test_project/datasets/my_dataset_2/tables/my_table/insertAll?prettyPrint=false: INVALID_ARGUMENT: Column b is not present in table my_table [at 1:20]

If I try to load data into my_dataset_1.my_table after creating both tables, I get this error:

google.api_core.exceptions.BadRequest: 400 POST http://0.0.0.0:9050/bigquery/v2/projects/test_project/datasets/my_dataset_2/tables/my_table/insertAll?prettyPrint=false: unknown column name a

Finally, if I query the table information of my_dataset_1.my_table after creating both tables, it returns info for my_dataset_2.my_table.

Here is the script I have been using to reproduce:

from typing import List, Iterator, Dict, Any

from google.api_core.client_options import ClientOptions
from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery, exceptions


class BigQueryClientWrapper:
    def __init__(self, project_id: str) -> None:

        self.project_id = project_id
        self.client = bigquery.Client(
            project_id,
            client_options=ClientOptions(api_endpoint="http://0.0.0.0:9050"),
            credentials=AnonymousCredentials(),
        )

    def create_dataset(
            self,
            dataset_id: str
    ) -> bigquery.Dataset:
        dataset = bigquery.Dataset(self._dataset_ref(dataset_id))
        return self.client.create_dataset(dataset)

    def create_table(
            self,
            dataset_id: str,
            table_id: str,
            schema: List[bigquery.SchemaField],
    ) -> bigquery.Table:
        dataset_ref = self._dataset_ref(dataset_id)
        table_ref = bigquery.TableReference(dataset_ref, table_id)
        table = bigquery.Table(table_ref, schema)
        return self.client.create_table(table)

    def get_table(self, dataset_id: str, table_id: str) -> bigquery.Table:
        dataset_ref = self._dataset_ref(dataset_id)
        table_ref = dataset_ref.table(table_id)
        return self.client.get_table(table_ref)

    def run_query(self, query_str: str) -> Iterator[bigquery.Row]:

        query_job =  self.client.query(
            query=query_str,
            location="US",
            job_config=bigquery.QueryJobConfig(),
        )

        return query_job.result()

    def _dataset_ref(self, dataset_id: str) -> bigquery.DatasetReference:
        return bigquery.DatasetReference.from_string(
            dataset_id, default_project=self.project_id
        )


    def load_data(
        self,
        dataset_id: str,
        table_id: str,
        rows: List[Dict[str, Any]],
    ) -> None:
        dataset_ref = self._dataset_ref(dataset_id)
        table_ref = dataset_ref.table(table_id)
        table = self.client.get_table(table_ref)
        errors = self.client.insert_rows(table, rows)
        if errors:
            raise RuntimeError(
                f"Failed to insert rows into {dataset_ref.dataset_id}.{table_id}:\n"
                + "\n".join(str(error) for error in errors)
            )


if __name__ == '__main__':
    project_id_ = 'test_project'
    dataset_id_1 = 'my_dataset'
    table_id_1 = 'my_table'
    schema_1 = [
        bigquery.SchemaField(
            "a",
            field_type=bigquery.enums.SqlTypeNames.INTEGER.value,
            mode="NULLABLE",
        ),
    ]

    dataset_id_2 = 'my_dataset_2'
    schema_2 = [
        bigquery.SchemaField(
            "b",
            field_type=bigquery.enums.SqlTypeNames.INTEGER.value,
            mode="NULLABLE",
        ),
    ]

    client = BigQueryClientWrapper(project_id=project_id_)

    # Create the first table
    client.create_dataset(dataset_id_1)
    client.create_table(dataset_id_1, table_id_1, schema_1)

    # Create a second table with the same table name in a different dataset, with a
    # different schema.
    client.create_dataset(dataset_id_2)
    client.create_table(dataset_id_2, table_id_1, schema_2)

    # Querying the proper columns works without crashing.
    client.run_query(f"SELECT a FROM `{project_id_}.{dataset_id_1}.{table_id_1}`")
    client.run_query(f"SELECT b FROM `{project_id_}.{dataset_id_2}.{table_id_1}`")

    try:
        # This throws, as expected: 'INVALID_ARGUMENT: Unrecognized name: a'
        client.run_query(f"SELECT a FROM `{project_id_}.{dataset_id_2}.{table_id_1}`")
    except Exception as e:
        print(e)

    # !! This returns info for dataset_id_2.table_id_1 !!
    table_1 = client.get_table(dataset_id_1, table_id_1)

    try:
        # !! This throws: 'unknown column name a' !!
        client.load_data(dataset_id_1, table_id_1, [{"a": 1}])
    except Exception as e:
        print(e)

    try:
        # !! This throws: 'Column b is not present in table my_table' !!
        client.load_data(dataset_id_2, table_id_1, [{"b": 1}])
    except Exception as e:
        print(e)

Thank you!

UPDATE statement is not working

Hello, first of all, I would like to congratulate you on this amazing project.

I'm trying to do a simple UPDATE using the python example provided in this repo:

from typing import List, Iterator, Dict, Any

from google.api_core.client_options import ClientOptions
from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery, exceptions

def main():
    client = bigquery.Client(
        "test",
        client_options=ClientOptions(api_endpoint="http://bigquery:9050"),
        credentials=AnonymousCredentials(),
    )
    job = client.query(
        # query="SELECT * FROM dataset1.table_a;",
        query="UPDATE dataset1.table_a SET name = 'foo' WHERE id = 1;",
        job_config=bigquery.QueryJobConfig(),
    )
    print(list(job.result()))

main()

The program freezes for some reason. Can someone help me? Is this normal?

Thanks

Recommended docker-compose health check

Is there a recommended health check for use with docker?

# docker-compose.yaml
version: '3'
services:
  bigquery:
    container_name: bigquery-local
    image: ghcr.io/goccy/bigquery-emulator:latest
    healthcheck:
      test: # HERE
      interval: 5s
      timeout: 5s
      retries: 5

thanks!

Response format for Timestamp type data is different from actual Big Query?

Using the following schema and data.

id (INTEGER) name (STRING) created_at (TIMESTAMP)
1 alice 2020-01-01T00:00:00+09:00
2 bob 2020-06-15T09:00:00+09:00

The following screenshot shows the data in BigQuery (image omitted).

The emulator is populated with data using the following yaml.

projects:
- id: test
  datasets:
    - id: dataset1
      tables:
        - id: table_a
          columns:
            - name: id
              type: INTEGER
            - name: name
              type: STRING
            - name: created_at
              type: TIMESTAMP
          data:
            - id: 1
              name: alice
              created_at: 2020-01-01T00:00:00+09:00 
            - id: 2
              name: bob
              created_at: 2020-06-15T09:00:00+09:00

I can get the following response from Big Query. (Using bq command line tool)

$ bq query --use_legacy_sql=false --project_id=my-project 'select * from dataset1.table_a'
[
  {
    "created_at": "2019-12-31 15:00:00", 
    "id": "1", 
    "name": "alice"
  }, 
  {
    "created_at": "2020-06-15 00:00:00", 
    "id": "2", 
    "name": "bob"
  }
]

But I can get the following response from the emulator.

$ bq --api http://0.0.0.0:9050 query --use_legacy_sql=false --project_id=test 'select * from dataset1.table_a'
[
  {
    "created_at": "<date out of range for display>", 
    "id": "1", 
    "name": "alice"
  }, 
  {
    "created_at": "<date out of range for display>", 
    "id": "2", 
    "name": "bob"
  }
]

The emulator outputs the following logs.

2022-08-30T03:48:52.030Z	INFO	server/middleware.go:41	GET /discovery/v1/apis/bigquery/v2/rest	{"query": ""}
2022-08-30T03:48:52.061Z	INFO	server/middleware.go:41	POST /projects/test/jobs	{"query": "alt=json"}
2022-08-30T03:48:52.065Z	INFO	contentdata/repository.go:144		{"query": "select * from dataset1.table_a", "values": []}
2022-08-30T03:48:52.066Z	DEBUG	contentdata/repository.go:207	query result	{"rows": [[1,"alice","2019-12-31T15:00:00Z"],[2,"bob","2020-06-15T00:00:00Z"]]}
2022-08-30T03:48:52.076Z	INFO	server/middleware.go:41	GET /projects/test/queries/bqjob_r95c129985562ba2_00000182ecddc0bb_1	{"query": "timeoutMs=0&startIndex=0&alt=json&maxResults=100"}

Also (I'm using a PHP client), I get a microtime-format response like 1577836800.0 from Big Query, whereas I get a datetime-format response like 2020-01-01 00:00:00 +0000 UTC from the emulator.

Because of this difference, I am having trouble parsing to Timestamp type objects on the client.

Fetching Query Job Results with Java BigQuery Library Fails.

Hi thanks for an awesome library!

It seems that the Java lib expects that a created job populates the destination table even for a query, since it uses it to actually list data (source). Right now the emulator returns anonymous for the dataset and table of query jobs (source).

exception:

Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
GET http://localhost:60034/bigquery/v2/projects/test-project/datasets/anonymous/tables/anonymous/data?prettyPrint=false
{
  "code" : 404,
  "errors" : [ {
    "location" : "",
    "message" : "",
    "reason" : "notFound",
    "debugInfo" : ""
  } ],
  "message" : ""
}
	at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:439)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1111)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:525)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:466)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:576)
	at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.listTableData(HttpBigQueryRpc.java:525)
	... 105 more

kotlin code to repro:

val datasetId = DatasetId.of("test-project", "ds")
bigQueryClient.create(DatasetInfo.of(datasetId))
val tableId = TableId.of("ds", "test-table")
bigQueryClient.create(
  TableInfo.of(
    tableId,
    StandardTableDefinition.of(Schema.of(Field.of("test_field", LegacySQLTypeName.STRING)))
  )
)

bigQueryClient.insertAll(InsertAllRequest.newBuilder(tableId).addRow("row1", mapOf("test_field" to "test_val")).build())

bigQueryClient
  .create(JobInfo.of(QueryJobConfiguration.of("SELECT * FROM `test-project.ds.test-table`")))
  .getQueryResults()

Sample Docker Compose File & Docker File

Hello Team,

Thanks for providing Big Query Emulator which will definitely help in local testing.

I am trying to set this up on my local machine using Docker. I am stuck, getting errors like "project flag is not set" and "unknown flag c".
Could you please provide a sample Docker Compose file and Dockerfile which I can use to resolve the issue? (I referred to another issue which says to create a new image from the provided one using a Dockerfile.)

Note: I am using the Docker image to start the emulator and want to use Python as the base for working with the BigQuery emulator.

Thanks
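
For reference, a minimal docker-compose sketch based on the flags that appear in other issues (--project, --dataset, --port, --log-level); the project and dataset names are placeholders and may need adjusting for your setup:

# docker-compose.yaml (sketch)
version: '3'
services:
  bigquery:
    image: ghcr.io/goccy/bigquery-emulator:latest
    command: --project=test --dataset=dataset1 --port=9050 --log-level=debug
    ports:
      - "9050:9050"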

Cooperation with google's java library google-cloud-bigquery

Hi! Thank you for creating bigquery-emulator. This gives hope for regular testing experience which we all need :)

I have tried to use it with google-cloud-bigquery java library from Google and encountered some problems. I hope that it will be possible to include fixes in bigquery-emulator.

I have made a simple example which shows problems in here: SampleTest.kt

  1. Handling gzipped requests

When the library sends requests with a Content-Encoding: gzip header, the emulator does not handle them properly, which gives an error:

invalid character '\x1f' looking for beginning of value
com.google.cloud.bigquery.BigQueryException: invalid character '\x1f' looking for beginning of value
	at app//com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.translate(HttpBigQueryRpc.java:115)
	at app//com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.create(HttpBigQueryRpc.java:170)
	at app//com.google.cloud.bigquery.BigQueryImpl$1.call(BigQueryImpl.java:269)
	at app//com.google.cloud.bigquery.BigQueryImpl$1.call(BigQueryImpl.java:266)
	at app//com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
	at app//com.google.cloud.RetryHelper.run(RetryHelper.java:76)
	at app//com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
	at app//com.google.cloud.bigquery.BigQueryImpl.create(BigQueryImpl.java:265)
	at app//info.szadkowski.bqissue.SampleTest.create dataset and table(SampleTest.kt:41)

And in logs of bigquery-emulator can be seen:

2022-10-31T13:33:43.563+0100	ERROR	server/handler.go:608	invalid	{"error": "invalid: invalid character '\\x1f' looking for beginning of value"}

\\x1f is the first byte of a gzipped request.

Compression of requests cannot be turned off in google-cloud-bigquery, but when debugging this behaviour can be altered by putting a breakpoint in com.google.api.client.http.HttpRequest:889 and evaluating encoding=null. This allowed me to encounter the next issue.

  2. Missing type in response for table creation

When the request is sent without compression, another issue appears. google-cloud-bigquery expects a "type" key in the response which indicates what kind of "table" it is. This key is not returned by the emulator.

Empty enum constants not allowed.
java.lang.IllegalArgumentException: Empty enum constants not allowed.
	at com.google.cloud.StringEnumType.valueOf(StringEnumType.java:66)
	at com.google.cloud.bigquery.TableDefinition$Type.valueOf(TableDefinition.java:102)
	at com.google.cloud.bigquery.TableDefinition.fromPb(TableDefinition.java:159)
	at com.google.cloud.bigquery.TableInfo$BuilderImpl.<init>(TableInfo.java:195)
	at com.google.cloud.bigquery.Table.fromPb(Table.java:630)
	at com.google.cloud.bigquery.BigQueryImpl.create(BigQueryImpl.java:291)
	at info.szadkowski.bqissue.SampleTest.create dataset and table(SampleTest.kt:48)
	...

I would really appreciate any assistance.
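For reference, here is a minimal sketch of the table resource shape the Java client appears to require, expressed with the google.golang.org/api/bigquery/v2 types. The helper name and field values are illustrative, not the emulator's actual handler code:

// Hypothetical sketch: when returning the created table, populate the "type"
// field that TableDefinition.fromPb in the Java client requires.
package server

import bigqueryv2 "google.golang.org/api/bigquery/v2"

func toTableResponse(projectID, datasetID, tableID string, schema *bigqueryv2.TableSchema) *bigqueryv2.Table {
	return &bigqueryv2.Table{
		TableReference: &bigqueryv2.TableReference{
			ProjectId: projectID,
			DatasetId: datasetID,
			TableId:   tableID,
		},
		Schema: schema,
		// "TABLE" (or "VIEW"/"EXTERNAL" as appropriate) must be present,
		// otherwise the Java client fails with "Empty enum constants not allowed."
		Type: "TABLE",
	}
}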

Crash when using DATE/NULL inside of an UNNEST statement

I'm seeing a crash when I try to use a date inside a STRUCT in an UNNEST statement.

For reference, this query does NOT crash:

SELECT DATE(2022, 1, 1) AS a, b 
FROM UNNEST([
  STRUCT(1 AS b)
]);

But this very similar query that should produce the same results does:

SELECT a, b 
FROM UNNEST([
  STRUCT(DATE(2022, 1, 1) AS a, 1 AS b)
]);

This is the error log:

2022-08-29T23:38:03.026Z	ERROR	server/handler.go:40	internal error	{"error": "failed to query SELECT zetasqlite_get_struct_field_date(`$unnest1#1`, 0) AS `a#2`,zetasqlite_get_struct_field_int64(`$unnest1#1`, 1) AS `b#3` FROM (SELECT json_each.value AS `$unnest1#1` FROM json_each(zetasqlite_decode_array_string(zetasqlite_make_array_array(zetasqlite_make_struct_struct('zetasqlite_date_date(2022,1,1)',zetasqlite_date_date(2022,1,1),'1',1))))): no such function: zetasqlite_make_array_array"}

I also see the same crash when using a different date function:

SELECT a, b 
FROM UNNEST([
  STRUCT(PARSE_DATE('%Y-%m-%d', '2022-01-01') AS a, 1 AS b)
]);

Thank you!

Support Table-Valued Functions

Would it be possible to provide support for table-valued functions? In particular, I am trying to use the change history feature of BigQuery.

To reproduce the issue:

Run the BigQuery emulator as shown in the example docs.

./bigquery-emulator --project=test --data-from-yaml=./server/testdata/data.yaml

Then run the change history query against the emulator with

bq --api http://0.0.0.0:9050 query --project_id=test "SELECT * FROM APPENDS(TABLE dataset1.table_a, NULL, NULL)"

which results in this error:

BigQuery error in query operation: Error processing job
'test:bqjob_r428ff2cbc7cf45e0_00000184cf2e3298_1': failed to analyze:
INVALID_ARGUMENT: Table-valued functions are not supported [at 1:15]

Also, thanks so much for your work on the emulator! I really appreciate it.

Creating and dropping and recreating tables

In our automated test suite, we create tables in the setUp method and drop them again in the tearDown method; they then get recreated in setUp for the next test. The emulator doesn't seem to be able to drop and recreate a table: the drop command works, but the next create command throws this error:

ERROR server/handler.go:2037 duplicate {"error": "duplicate: table asset1669896670005735815: table is already created"}
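A minimal Go repro sketch of this setUp/tearDown cycle, assuming the emulator listens on localhost:9050 with project test and dataset dataset1 (the table name and schema are placeholders):

// Sketch of the create -> drop -> recreate sequence using the Go client.
package emulator

import (
	"context"
	"testing"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/option"
)

func TestCreateDropRecreate(t *testing.T) {
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, "test",
		option.WithEndpoint("http://localhost:9050"),
		option.WithoutAuthentication(),
	)
	if err != nil {
		t.Fatal(err)
	}
	table := client.Dataset("dataset1").Table("asset")
	meta := &bigquery.TableMetadata{
		Schema: bigquery.Schema{{Name: "id", Type: bigquery.StringFieldType}},
	}

	if err := table.Create(ctx, meta); err != nil { // setUp
		t.Fatal(err)
	}
	if err := table.Delete(ctx); err != nil { // tearDown
		t.Fatal(err)
	}
	if err := table.Create(ctx, meta); err != nil { // setUp of the next test
		t.Fatalf("recreate failed: %v", err) // fails with "table is already created"
	}
}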

[Question] Does the emulator support streaming data with the BigQuery Storage Write API?

Hey,
Thanks for creating the bigquery emulator.

Can I use the BigQuery Storage Write API?

https://cloud.google.com/python/docs/reference/bigquerystorage/latest/google.cloud.bigquery_storage_v1.services.big_query_write.BigQueryWriteClient#google_cloud_bigquery_storage_v1_services_big_query_write_BigQueryWriteClient_create_write_stream

I tried to create a write stream in Python against the emulator, and when creating it the execution gets stuck.

The emulator was deployed in a Docker container.
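In case it helps to narrow things down, here is a hedged Go sketch of the equivalent call against the emulator's gRPC endpoint. The port (9060) and the project/dataset/table names are assumptions about the local setup, not confirmed values:

// Sketch, not a verified setup: connect the Storage Write client to the
// emulator's gRPC port with plaintext credentials and create a COMMITTED stream.
package main

import (
	"context"
	"fmt"
	"log"

	storage "cloud.google.com/go/bigquery/storage/apiv1"
	"cloud.google.com/go/bigquery/storage/apiv1/storagepb"
	"google.golang.org/api/option"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	ctx := context.Background()
	client, err := storage.NewBigQueryWriteClient(ctx,
		option.WithEndpoint("localhost:9060"), // assumed emulator gRPC port
		option.WithoutAuthentication(),
		option.WithGRPCDialOption(grpc.WithTransportCredentials(insecure.NewCredentials())),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	stream, err := client.CreateWriteStream(ctx, &storagepb.CreateWriteStreamRequest{
		Parent: "projects/test/datasets/dataset1/tables/table_a", // placeholder names
		WriteStream: &storagepb.WriteStream{
			Type: storagepb.WriteStream_COMMITTED,
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("created stream:", stream.GetName())
}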

Thanks

Using the emulator with Apache Beam's BigQuery IOConnector

I tried to figure out how to use the emulator with Apache Beam's BigQuery IOConnector, but the connector seems to have a hardwired flow in which it first stages the data in GCP and then loads it from there. I couldn't see any way to insert the emulator into that flow.

Has anyone tried this and figured out how to make it work?

I will also post this question on the Apache Beam mailing list.
