GithubHelp home page GithubHelp logo

martinsahlen / bq-utils Goto Github PK

View Code? Open in Web Editor NEW
13.0 5.0 1.0 7.15 MB

Utitilties for BigQuery such as downloading table / query to csv/ndjson/excel/gsheet or new table using iterators for a low memory footprint.

Go 100.00%

bq-utils's Introduction

bq-utils

TLDR: Export BigQuery tables and queries to csv, ndjson, Excel and Google Sheets (!), supporting Excel/Google Sheet files with multiple sheets for a mix of full tables and ad-hoc queries. If you provide a filename starting with gs:// it will upload the file to the provided GCP Storage bucket.

It also supports build your own "whatever" using the code. The CLI is just composed of the functions and components contained in the bqutils package. In does some of the same things as bq, and some more.

It also supports use cases such as getting data from biguqery and adding columns, and then uploading it to another table. This can be useful in cases where you need to perform a lookup based on some data in a table, such as adding a country code based on latitude / longitude fields.

Install

go get github.com/MartinSahlen/bq-utils

Background and motivation

So, you know the feeling when the sales people ask you to "just grab some data"? Or, many times I have simple wanted to grab a set of data to play around with in a frontend / visualisation or some other analytics tool like a graph database.

Assuming your company is using BigQuery for yo' data and analytics needs, this is an easy task. Just launch the web UI and do the query. You can then download it as csv / ndjson file as well as exporting it to a Google sheet. Or you can use the bq command line tool from

There are however, some limitations to this. Let's say, for instance that the query result is too big to save or download. Then the query must be saved (or you can use the temporary table) and exported to a GCS bucket, after which you must download and upload it to a sheet.... You get it. Also, What if somebody wants an excel file with multiple sheets?

What if I told you that you that there is a tool that allows you to do run a query / scan a table and dump it to your local file system. You can even run multiple queries / table scans and have them appear in different sheets in an excel file. I present: bq-utils, a package for working with ad-hoc biguquery data in a friendlier way.

Overview

bq-utils is a CLI as well as a set of functions that allow you to do stuff with bigquery such as

  • Write a table or query to CSV
  • Write a set of tables/queries to an excel spreadsheet with a sheet per query result
  • Due to composability, most functions and features are exported. I give no guarantees to maintain compatibility, so please use a dependency manager to pin a version to make sure I don't break your builds! If this package matures I am sure there will be some conventions, semver and pull requests from the community that will stabilize the feature set.
  • Using dep for dependency management, and ignoring vendor folder.
  • Using STANDARD SQL only
  • It will BLAST (probably) if the provided project is not the one you are currently authenticated against.
  • Everything (all data) runs through this computer so you might want to have a strong connection / run this in a box within google cloud.

Usage

Command Overview

BigQuery Utilities

Usage:
bq-utils --project=<project> (--csv|--ndjson) --output=<file> (--query=<query>|--table=<table>)
bq-utils --project=<project> (--excel|--google_sheet) --output=<file> (--query=<query> <query-sheet-name>|--table=<table> <table-sheet-name>)...

Options:
-h --help                     Show this screen
-p project --project=project  The GCP project you are working with.
-q query --query=query        The query to use as input to the csv writer
-t table --table=table        The table to use as input to the csv writer
-c --csv                      Use CSV as output for the writer
-n --ndjson                   Use Newline delimited JSON as output for the writer
-e --excel                    Use Excel as the output for the writer
-g --google_sheet             Use Google sheet as the output for the writer
-o file --output=file         The path (name) of the output file, i.e ~/Desktop/file.csv
-v --version                  Show version`

CSV

Exporting a table to CSV

bq-utils -p my-project -o file.csv -c -t dataset.table

Exporting a query to CSV

bq-utils -p my-project -o file.csv -c -q 'SELECT * FROM dataset.table'

Exporting a complex query to CSV

In some cases, we might have complex queries that would be bad to write out in the CLI. Currently I haven't bothered to support anything here, it can be solved pretty easily like this:

bq-utils -p my-project -o file.csv -c -q "$(cat query.sql)"

Newline delimited json (NDJSON)

Exporting a table to NDJSON

bq-utils -p my-project -o file.ndjson -n -t dataset.table

Exporting a query to NDJSON

bq-utils -p my-project -o file.ndjson -n -q 'SELECT * FROM dataset.table'

Exporting a complex query to NDJSON

bq-utils -p my-project -o file.ndjson -n -q "$(cat query.sql)"

Excel

When exporting to Excel, remember to include the sheet name (even though you only have one sheet)

Exporting a table to Excel

bq-utils -p my-project -o file.xslx -e -t dataset.table table-sheet

Exporting a query to Excel

bq-utils -p my-project -o file.xslx -e -q 'SELECT * FROM dataset.table' query-sheet

Exporting a complex query to Excel

bq-utils -p my-project -o file.csv -e -q "$(cat query.sql)" complex-query-sheet

Exporting a mix of queries and tables to Excel

bq-utils -p my-project -o file.csv -e -q "$(cat query.sql)" complex-query-sheet -q 'SELECT * FROM dataset.table' query-sheet -t dataset.table table-sheet

Exporting to google Sheets

Exporting to google sheets is a bit more work. you need to follow the guide in step 1 here and then, you need to teel bq-utils where to find the credentials by exposing it as the environment variable CLIENT_SECRET=path-to-client-secret. You can also run it like this CLIENT_SECRET=path-to-client-secret bq-utils ...

When exporting to Google sheets, remember to include the sheet name (even though you only have one sheet). NB NB if the total number of cells (across sheets) exceeds 2000000 (2 million), Exporting to google sheets will not work. The code tries to minimize the use of columns * rows per sheet, but this is a hard limit from Google's side. Sorry maaan!

Exporting a table to Google Sheets

bq-utils -p my-project -o file -g -t dataset.table table-sheet

Exporting a query to Google Sheets

bq-utils -p my-project -o file -g -q 'SELECT * FROM dataset.table' query-sheet

Exporting a complex query to Google Sheets

bq-utils -p my-project -o file -g -q "$(cat query.sql)" complex-query-sheet

Exporting a mix of queries and tables to Google Sheets

bq-utils -p my-project -o file.csv -g -q "$(cat query.sql)" complex-query-sheet -q 'SELECT * FROM dataset.table' query-sheet -t dataset.table table-sheet

Roadmap

  • Workerpool for running parallel queries and table scans, and populating excel sheets in parallel.
  • Support legacy SQL
  • Auto-detecting if you send a file for query parameter
  • Add sending spreadsheets to a specified email if your boss wants some data TOMORROW
  • Wrap errors to get origin of error
  • Write tests
  • DOCUMENTATION AND COMMENTS PLEASE

bq-utils's People

Contributors

martinsahlen avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

Forkers

pelicanrouge

bq-utils's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.