BigQuery-Python

Simple Python client for interacting with Google BigQuery.

This client provides an API for retrieving and inserting BigQuery data by wrapping Google's low-level API client library. It also provides facilities that make it convenient to access data that is tied to an App Engine appspot, such as request logs.

Documentation

Installation

pip install bigquery-python
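
If you want the latest development version, pip can also install straight from the GitHub repository. This relies on pip's standard Git support rather than anything specific to this project:

pip install git+https://github.com/tylertreat/BigQuery-Python.git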

Basic Usage

from bigquery import get_client

# BigQuery project id as listed in the Google Developers Console.
project_id = 'project_id'

# Service account email address as listed in the Google Developers Console.
service_account = '[email protected]'

# PKCS12 or PEM key provided by Google.
key = 'key.pem'

client = get_client(project_id, service_account=service_account,
                    private_key_file=key, readonly=True)

# JSON key provided by Google
json_key = 'key.json'
 
client = get_client(json_key_file=json_key, readonly=True)

# Submit an async query.
job_id, _results = client.query('SELECT * FROM dataset.my_table LIMIT 1000')

# Check if the query has finished running.
complete, row_count = client.check_job(job_id)

# Retrieve the results.
results = client.get_query_rows(job_id)

Executing Queries

The BigQuery client allows you to execute raw queries against a dataset. The query method inserts a query job into BigQuery. By default, query runs asynchronously with a timeout of 0. When a non-zero timeout is specified, the call waits for the results and raises an exception if the job does not finish within that timeout.

When you run an async query, you can use the returned job_id to poll for job status later with check_job.

# Submit an async query
job_id, _results = client.query('SELECT * FROM dataset.my_table LIMIT 1000')

# Do other work in the meantime.

# Poll for query completion.
complete, row_count = client.check_job(job_id)

# Retrieve the results.
if complete:
    results = client.get_query_rows(job_id)
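
For long-running jobs, a simple pattern is to poll check_job until it reports completion. A minimal sketch using only the calls shown above (the five-second sleep interval is an arbitrary choice):

import time

# Poll until the job reports completion.
complete, row_count = client.check_job(job_id)
while not complete:
    time.sleep(5)
    complete, row_count = client.check_job(job_id)

results = client.get_query_rows(job_id)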

You can also specify a non-zero timeout value if you want your query to be synchronous.

# Submit a synchronous query.
# BigQueryTimeoutException is assumed to be importable from bigquery.errors,
# where the client's exception classes are defined.
from bigquery.errors import BigQueryTimeoutException

try:
    _job_id, results = client.query('SELECT * FROM dataset.my_table LIMIT 1000', timeout=10)
except BigQueryTimeoutException:
    print "Timeout"

Query Builder

The query_builder module provides an API for generating query strings that can be run using the BigQuery client.

from bigquery.query_builder import render_query

selects = {
    'start_time': {
        'alias': 'Timestamp',
        'format': 'INTEGER-FORMAT_UTC_USEC'
    }
}

conditions = [
    {
        'field': 'Timestamp',
        'type': 'INTEGER',
        'comparators': [
            {
                'condition': '>=',
                'negate': False,
                'value': 1399478981
            }
        ]
    }
]

grouping = ['Timestamp']

having = [
    {
        'field': 'Timestamp',
        'type': 'INTEGER',
        'comparators': [
            {
                'condition': '==',
                'negate': False,
                'value': 1399478981
            }
        ]
    }
]

order_by = {'fields': ['Timestamp'], 'direction': 'desc'}

query = render_query(
    'dataset',
    ['table'],
    select=selects,
    conditions=conditions,
    groupings=grouping,
    having=having,
    order_by=order_by,
    limit=47
)

job_id, _ = client.query(query)

Managing Tables

The BigQuery client provides facilities to manage dataset tables, including creating and deleting tables, checking whether a table exists, and retrieving table metadata.

# Create a new table.
schema = [
    {'name': 'foo', 'type': 'STRING', 'mode': 'nullable'},
    {'name': 'bar', 'type': 'FLOAT', 'mode': 'nullable'}
]
created = client.create_table('dataset', 'my_table', schema)

# Delete an existing table.
deleted = client.delete_table('dataset', 'my_table')

# Check if a table exists.
exists = client.check_table('dataset', 'my_table')

# Get a table's full metadata. Includes numRows, numBytes, etc. 
# See: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables
metadata = client.get_table('dataset', 'my_table')
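
A common pattern is to create a table only when it does not already exist. A minimal sketch combining the calls above:

# Create the table only if it is not already present.
if not client.check_table('dataset', 'my_table'):
    created = client.create_table('dataset', 'my_table', schema)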

There is also functionality for retrieving tables that are associated with a Google App Engine appspot, assuming table names are in the form of appid_YYYY_MM or YYYY_MM_appid. This allows the tables within a date range to be selected and queried.

# Get appspot tables falling within a start and end time.
from datetime import datetime, timedelta
range_end = datetime.utcnow()
range_start = range_end - timedelta(weeks=12)
tables = client.get_tables('dataset', 'appid', range_start, range_end)

Inserting Data

The client provides an API for inserting data into a BigQuery table. The last parameter names an optional insert-id key used to de-duplicate entries.

# Insert data into table.
rows = [
    {'one': 'ein', 'two': 'zwei'},
    {'id': 'NzAzYmRiY', 'one': 'uno', 'two': 'dos'},
    {'id': 'NzAzYmRiY', 'one': 'ein', 'two': 'zwei'} # duplicate entry
]

inserted = client.push_rows('dataset', 'table', rows, 'id')
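
For larger uploads it is common to stream rows in batches rather than in one call. A rough sketch, assuming push_rows returns a truthy value on success as the example above suggests; the batch size of 500 follows general BigQuery streaming-insert guidance and is not a requirement of this client:

# Stream rows in batches of 500 (an arbitrary, commonly recommended size).
batch_size = 500
for start in range(0, len(rows), batch_size):
    batch = rows[start:start + batch_size]
    if not client.push_rows('dataset', 'table', batch, 'id'):
        print "Failed to insert batch starting at row %d" % start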

Write Query Results to Table

You can write query results directly to a table. When either the dataset or the table parameter is omitted, the query results are written to a temporary table.

# write to permanent table
job = client.write_to_table('SELECT * FROM dataset.original_table LIMIT 100',
                            'dataset',
                            'table')
try:
    job_resource = client.wait_for_job(job, timeout=60)
    print job_resource
except BigQueryTimeoutException:
    print "Timeout"

# write to permanent table with UDF in query string
external_udf_uris = ["gs://bigquery-sandbox-udf/url_decode.js"]
query = """SELECT requests, title
            FROM
              urlDecode(
                SELECT
                  title, sum(requests) AS num_requests
                FROM
                  [fh-bigquery:wikipedia.pagecounts_201504]
                WHERE language = 'fr'
                GROUP EACH BY title
              )
            WHERE title LIKE '%ç%'
            ORDER BY requests DESC
            LIMIT 100
        """
job = client.write_to_table(
  query,
  'dataset',
  'table',
  external_udf_uris=external_udf_uris
)

try:
    job_resource = client.wait_for_job(job, timeout=60)
    print job_resource
except BigQueryTimeoutException:
    print "Timeout"

# write to temporary table
job = client.write_to_table('SELECT * FROM dataset.original_table LIMIT 100')
try:
    job_resource = client.wait_for_job(job, timeout=60)
    print job_resource
except BigQueryTimeoutException:
    print "Timeout"

Import data from Google Cloud Storage

schema = [ {"name": "username", "type": "string", "mode": "nullable"} ]
job = client.import_data_from_uris( ['gs://mybucket/mydata.json'],
                                    'dataset',
                                    'table',
                                    schema,
                                    source_format=JOB_SOURCE_FORMAT_JSON)

try:
    job_resource = client.wait_for_job(job, timeout=60)
    print job_resource
except BigQueryTimeoutException:
    print "Timeout"

Export data to Google Cloud Storage

job = client.export_data_to_uris(['gs://mybucket/mydata.json'],
                                  'dataset',
                                  'table')
try:
    job_resource = client.wait_for_job(job, timeout=60)
    print job_resource
except BigQueryTimeoutException:
    print "Timeout"

Managing Datasets

The client provides an API for listing, creating, deleting, updating and patching datasets.

# List datasets
datasets = client.get_datasets()


# Create dataset
dataset = client.create_dataset('mydataset', friendly_name="My Dataset", description="A dataset created by me")

# Get dataset
client.get_dataset('mydataset')

# Delete dataset
client.delete_dataset('mydataset')
client.delete_dataset('mydataset', delete_contents=True) # delete even if it contains data

# Update dataset
client.update_dataset('mydataset', friendly_name="mon Dataset") # description is deleted

# Patch dataset
client.patch_dataset('mydataset', friendly_name="mon Dataset") # friendly_name changed; description is preserved

# Check if dataset exists.
exists = client.check_dataset('mydataset')
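
As with tables, a dataset can be created only when it is missing. A minimal sketch using the calls above:

# Create the dataset only if it does not already exist.
if not client.check_dataset('mydataset'):
    client.create_dataset('mydataset', friendly_name="My Dataset")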

Creating a schema from a sample record

from bigquery import schema_from_record

schema_from_record({"id":123, "posts": [{"id":123, "text": "this is a post"}], "username": "bob"})

Contributing

Requirements for contributions:

  • Branch off master, PR back to master.
  • Your code should pass Flake8.
  • Unit test coverage is required.
  • Good docstrings are required.
  • Good commit messages are required.

bigquery-python's Issues

hi still get the error [('asn1 encoding routines', 'ASN1_D2I_READ_BIO', 'not enough data')]

My code is:

    project_id=r"foretribebigquery"
    service_account=r"account"
    # secret_key=r'somekey'
    with open('private_key', 'rb') as f:
        secret_key = f.read()
    dataset=r"testdata"
    table=r"postdata"
    client=get_client(project_id, service_account=service_account, private_key=secret_key, readonly=True)
    schema=[
        {'name':'zip','type':'STRING','node':'nullable'},
        {'name':'type','type':'STRING','node':'nullable'},
        {'name':'primary_city','type':'STRING','node':'nullable'},
        {'name':'unacceptable_cities','type':'STRING','node':'nullable'},
        {'name':'state','type':'STRING','node':'nullable'},
        {'name':'country','type':'STRING','node':'nullable'},
        {'name':'timezone','type':'STRING','node':'nullable'},
        {'name':'area_codes','type':'STRING','node':'nullable'},
        {'name':'latitude','type':'FLOAT','node':'nullable'},
        {'name':'longitude','type':'FLOAT','node':'nullable'},
        {'name':'estimated_population','type':'INTEGER','node':'nullable'}
    ]

    created=client.create_table(dataset,table, schema)

I read the key file by 'rb', but still get the error:

Traceback (most recent call last):
File "C:\Python27\lib\site-packages\flask\app.py", line 1836, in call
return self.wsgi_app(environ, start_response)
File "C:\Python27\lib\site-packages\flask\app.py", line 1820, in wsgi_app
response = self.make_response(self.handle_exception(e))
File "C:\Python27\lib\site-packages\flask\app.py", line 1403, in handle_exception
reraise(exc_type, exc_value, tb)
File "C:\Python27\lib\site-packages\flask\app.py", line 1817, in wsgi_app
response = self.full_dispatch_request()
File "C:\Python27\lib\site-packages\flask\app.py", line 1477, in full_dispatch_request
rv = self.handle_user_exception(e)
File "C:\Python27\lib\site-packages\flask\app.py", line 1381, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "C:\Python27\lib\site-packages\flask\app.py", line 1475, in full_dispatch_request
rv = self.dispatch_request()
File "C:\Python27\lib\site-packages\flask\app.py", line 1461, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "C:\personal\foretribebigquery\app\routes\index.py", line 41, in googledata
client=get_client(project_id, service_account=service_account, private_key=secret_key, readonly=True)
File "C:\personal\foretribebigquery\bigquery\client.py", line 77, in get_client
readonly=readonly)
File "C:\personal\foretribebigquery\bigquery\client.py", line 94, in _get_bq_service
service = build('bigquery', 'v2', http=http)
File "C:\Python27\lib\site-packages\oauth2client\util.py", line 135, in positional_wrapper
return wrapped(*args, **kwargs)
File "C:\Python27\lib\site-packages\googleapiclient\discovery.py", line 198, in build
resp, content = http.request(requested_url)
File "C:\Python27\lib\site-packages\oauth2client\util.py", line 135, in positional_wrapper
return wrapped(*args, **kwargs)
File "C:\Python27\lib\site-packages\oauth2client\client.py", line 530, in new_request
self._refresh(request_orig)
File "C:\Python27\lib\site-packages\oauth2client\client.py", line 744, in _refresh
self._do_refresh_request(http_request)
File "C:\Python27\lib\site-packages\oauth2client\client.py", line 768, in _do_refresh_request
body = self._generate_refresh_request_body()
File "C:\Python27\lib\site-packages\oauth2client\client.py", line 1375, in _generate_refresh_request_body
assertion = self._generate_assertion()
File "C:\Python27\lib\site-packages\oauth2client\client.py", line 1504, in _generate_assertion
private_key, self.private_key_password), payload)
File "C:\Python27\lib\site-packages\oauth2client\crypt.py", line 138, in from_string
pkey = crypto.load_pkcs12(key, password).get_privatekey()
File "build\bdist.win32\egg\OpenSSL\crypto.py", line 2216, in load_pkcs12
_raise_current_error()
File "build\bdist.win32\egg\OpenSSL_util.py", line 22, in exception_from_error_queue
raise exceptionType(errors)
Error: [('asn1 encoding routines', 'ASN1_D2I_READ_BIO', 'not enough data')]

I have read issue 7; it seems that loading the secret key in binary format can solve it, but in my case it does not seem to work.

Public method for getting all tables for a dataset

Need

Currently, there's no support for getting all tables from a given dataset (there is _get_all_tables, but it skips tables that do not respect a certain format).

Solution

Add get_all_tables(dataset_id) to the BigQueryClient class.

ImportError: No module named bigquery.client

After having run make, I tried to run the simple usage example and got the following:

from bigquery.client import get_client

ImportError: No module named bigquery.client

please advise.

thanks!

push_col available?

Hi!

Is there a push_col function available? I want to write a DF on that pivot. Lots of rows, not as many columns. Is there a suggestion if I could pull and push back some source? Someplace I could start?

Thanks!
Chris

Allow Large Results

Is there currently a way to set the allowLargeResults job argument to true using this package?

Error with your basic connection script example?

When I try to run your Basic Usage connection example, with the path to the .p12 key file for "key", I get an error saying:
get_client() got an unexpected keyword argument 'private_key_file'

I think your example needs updating?

CryptoUnavailableError: No crypto library available

Trying to use the library on Google App Engine, I run into this error:

File "C:\Users\BELVI-PC\Documents\Eclipse Projects\OpenTraffic\bigquery\client.py", line 97, in _get_bq_service

credentials = _credentials()(service_account, private_key, scope=scope)

File "C:\Users\BELVI-PC\Documents\Eclipse Projects\OpenTraffic\oauth2client\util.py", line 142, in positional_wrapper

return wrapped(*args, **kwargs)

File "C:\Users\BELVI-PC\Documents\Eclipse Projects\OpenTraffic\oauth2client\client.py", line 1622, in init

_RequireCryptoOrDie()

File "C:\Users\BELVI-PC\Documents\Eclipse Projects\OpenTraffic\oauth2client\client.py", line 1573, in _RequireCryptoOrDie

raise CryptoUnavailableError('No crypto library available')

CryptoUnavailableError: No crypto library available

Add integration tests

It would be nice to have an integration suite which tests each piece of functionality against a live (configurable) BigQuery project.

OpenSSL not available on appengine

I'm getting ImportError: cannot import name SignedJwtAssertionCredentials due to the fact that this module only tries SSL with OpenSSL. Unfortunately App Engine doesn't have this lib, so maybe there should be compatibility with pycrypto?

AssertionError: No api proxy found for service "memcache"

When I run this code manually everything works fine, but when I run it inside my code it prints a warning message which is bothering me, and I don't know why it happens.

        client = get_client(project,
                                 service_account=self.service_account,
                                 private_key_file=self.google_key,
                                 readonly=False)

WARNING:root:No api proxy found for service "memcache"
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery_cache/appengine_memcache.py", line 42, in get
return memcache.get(url, namespace=NAMESPACE)
File "/home/ricci/google_appengine/google/appengine/api/memcache/__init__.py", line 559, in get
rpc = self.get_multi_async([key], namespace=namespace, for_cas=for_cas)
File "/home/ricci/google_appengine/google/appengine/api/memcache/__init__.py", line 612, in get_multi_async
self.get_hook, user_key)
File "/home/ricci/google_appengine/google/appengine/api/memcache/__init__.py", line 381, in _make_async_call
rpc = create_rpc()
File "/home/ricci/google_appengine/google/appengine/api/memcache/__init__.py", line 295, in create_rpc
return apiproxy_stub_map.UserRPC('memcache', deadline, callback)
File "/home/ricci/google_appengine/google/appengine/api/apiproxy_stub_map.py", line 414, in __init__
self.__rpc = CreateRPC(service, stubmap)
File "/home/ricci/google_appengine/google/appengine/api/apiproxy_stub_map.py", line 68, in CreateRPC
assert stub, 'No api proxy found for service "%s"' % service
AssertionError: No api proxy found for service "memcache"
WARNING:root:No api proxy found for service "memcache"
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery_cache/appengine_memcache.py", line 48, in set
memcache.set(url, content, time=int(self._max_age), namespace=NAMESPACE)
File "/home/ricci/google_appengine/google/appengine/api/memcache/__init__.py", line 764, in set
namespace=namespace)
File "/home/ricci/google_appengine/google/appengine/api/memcache/__init__.py", line 869, in _set_with_policy
time, '', namespace)
File "/home/ricci/google_appengine/google/appengine/api/memcache/__init__.py", line 971, in _set_multi_async_with_policy
(server_keys, user_key))
File "/home/ricci/google_appengine/google/appengine/api/memcache/__init__.py", line 381, in _make_async_call
rpc = create_rpc()
File "/home/ricci/google_appengine/google/appengine/api/memcache/__init__.py", line 295, in create_rpc
return apiproxy_stub_map.UserRPC('memcache', deadline, callback)
File "/home/ricci/google_appengine/google/appengine/api/apiproxy_stub_map.py", line 414, in __init__
self.__rpc = CreateRPC(service, stubmap)
File "/home/ricci/google_appengine/google/appengine/api/apiproxy_stub_map.py", line 68, in CreateRPC
assert stub, 'No api proxy found for service "%s"' % service
AssertionError: No api proxy found for service "memcache"

Can someone help me with this?

Thanks

Unable to get failure reason for jobs that have failed

Happy to write this myself, but just wanted to validate direction:

Use Case: I've submitted a query asynchronously to BQ. The query fails for any number of reasons. The job will be marked as "done" but I do not have the ability to get the reason for the failure and the current check_job implementation will just say 0 rows.

Solutions, in no particular order:

  1. Leave it alone, it's good enough
  2. Hijack check_job and have it return richer job status information.
  3. Add a new get_job_status method that's responsible for returning the status, including failure reasons if any.

I'm guessing number 2 is probably the best option, but wanted to pass it by before making a pull request.

Whatdya think?

Build using google credentials

Is it possible to build a service using GoogleCredentials? I tried giving the GoogleCredentials as a parameter, but it crashes when building the service:
service = build('bigquery', 'v2', http=http, discoveryServiceUrl=service_url)
I'm doing something like this:
get_client(project, credentials=GoogleCredentials.get_application_default(), private_key="private_key.json", readonly=False)
But I got:

oauth2client.client.AccessTokenRefreshError: invalid_scope: Empty or missing scope not allowed.

I just don't know if I'm doing something wrong or if I don't know how to use the lib/services correctly, but I believe I need something like this:
discovery.build('bigquery', 'v2', credentials=credentials)

Data limit?

Hi,

It seems that there is a limit on how much data you can read. Whenever I read more data (a fat table or a skinny long table), the read is not successful and it only returns partial data or nothing. Is there any way to get around this?

Thanks,

Hongli

Versions on requirements.txt modules

Hi,

would it be possible to maintain a versioned requirements.txt file?

I'm getting an error which I think is due to wrong versioning of python-dateutil:

Traceback (most recent call last):
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1535, in __call__
    rv = self.handle_exception(request, response, e)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1529, in __call__
    rv = self.router.dispatch(request, response)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1272, in default_dispatcher
    self.handlers[handler] = handler = import_string(handler)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1850, in import_string
    return getattr(__import__(module, None, None, [obj]), obj)
  File "/Users/nicholas/starmaker/gae/webapp2/tasks/bq.py", line 5, in <module>
    import bigquery
  File "/Users/nicholas/starmaker/gae/webapp2/smutils/../modules/bigquery/__init__.py", line 1, in <module>
    from client import get_client
  File "/Users/nicholas/starmaker/gae/webapp2/smutils/../modules/bigquery/client.py", line 17, in <module>
    from bigquery.schema_builder import schema_from_record
  File "/Users/nicholas/starmaker/gae/webapp2/smutils/../modules/bigquery/schema_builder.py", line 5, in <module>
    import dateutil.parser
ImportStringError: import_string() failed for 'tasks.bq.TestBQ'. Possible reasons are:

- missing __init__.py in a package;
- package or module path not included in sys.path;
- duplicated package or module name taking precedence in sys.path;
- missing module, class, function or variable;

Original exception:

ImportError: No module named dateutil.parser

Error if no insert id specified for push_rows

File "seg_size_bq_dump.py", line 81, in <module> find_segments() File "seg_size_bq_dump.py", line 76, in find_segments save_segment_size_to_bq(a["aid"],a["name"],s["slug_name"],s["name"],s["id"],s["size"]) File "seg_size_bq_dump.py", line 36, in save_segment_size_to_bq inserted = client.push_rows('customer_success','segment_count_history', row) File "/Library/Python/2.7/site-packages/bigquery/client.py", line 909, in push_rows if insert_id_key in row: TypeError: coercing to Unicode: need string or buffer, NoneType found

Example with no 4th parameter:

inserted = client.push_rows('dataset','table', row)

The platform is not working on Google App Engine

I am trying to build a Google App Engine app that includes your code as additional libs.
Currently Google App Engine only provides these libraries:
https://cloud.google.com/appengine/docs/python/tools/libraries27

I add:

sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'lib'))
basedir = os.path.abspath(os.path.dirname(__file__))

to make the lib directory hold my external libs.
I added the libs oauth2client, dateutil, simplejson, and bigquery.

  File "C:\Users\sheng\PycharmProjects\helloboard\main.py", line 104, in get
    client=bigquery.get_client(project_id, service_account=service_email, private_key_file=key,  readonly=True)
  File "C:\Users\sheng\PycharmProjects\helloboard\lib\bigquery\client.py", line 85, in get_client
    readonly=readonly)
  File "C:\Users\sheng\PycharmProjects\helloboard\lib\bigquery\client.py", line 99, in _get_bq_service
    credentials = _credentials()(service_account, private_key, scope=scope)
  File "C:\Users\sheng\PycharmProjects\helloboard\lib\oauth2client\util.py", line 135, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "C:\Users\sheng\PycharmProjects\helloboard\lib\oauth2client\client.py", line 1454, in __init__
    _RequireCryptoOrDie()
  File "C:\Users\sheng\PycharmProjects\helloboard\lib\oauth2client\client.py", line 1408, in _RequireCryptoOrDie
    raise CryptoUnavailableError('No crypto library available')
CryptoUnavailableError: No crypto library available

The error is that no crypto library is available. I tried to add pycrypto, but pycrypto contains a lot of C/C++ code, and I do not think it will run in the cloud. Do you know of a crypto library that does not depend on code in other languages, or another way to invoke oauth2client?

_get_all_tables(datasetID,cache=False) returns defaultdict(None, {})

Hi, I am trying to retrieve all the tables under a dataset using the Python client. Why is it returning none when there are 7 tables under the public dataset?

Here is my code:

class get_Tables:
    def GET(self, r):
        tables = []
        datasetID = web.input().dataSetID
        project_id = 'publicdata'
        service_account = '10113222700-epqo6lmkl67j6u1qafha9dlke0pmcck3@developer.gserviceaccount.com'
        key = 'Digin-d245e7da9.p12'
        client = get_client(project_id, service_account=service_account,
                            private_key_file=key, readonly=True)
        tables = client._get_all_tables(datasetID, cache=False)
        print tables

not all rows being returned

When I run the following, it returns a row_count of 9025, which is the same number of rows returned when the query is run through BigQuery's web interface.

complete, row_count = client.check_job(job_id)

I run this command next:

results = client.get_query_rows(job_id)

and len(results) returns 8374, so it appears that not all the rows are being returned in the results object.

Allow get_client to read key from a file

get_client currently requires the private key to be passed in as a string. It would be more convenient to just pass the file name and let it handle reading it. Allow the key itself or the file name to be passed in through the private_key kwarg.

it stopped working ?

Hi @tylertreat

I have been using this package for 3-4 days and it was working very well, but it suddenly stopped working yesterday. It's also not working for one of my friends.

Can you please look into this issue? Is it only happening to the two of us?

ImportError: cannot import name get_client

Hi there,

I can't seem to resolve this issue. I upgraded all my Anaconda packages and reinstalled this package. However, I keep getting this error:

ImportError: cannot import name get_client

When I try to import as such:

from bigquery import get_client

Any tips?

Here is what I see when I install this package:

Collecting bigquery-python
Requirement already satisfied (use --upgrade to upgrade): google-api-python-client in c:\python27_x64\lib\site-packages (from bigquery-python)
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in c:\python27_x64\lib\site-packages (from bigquery-python)
Requirement already satisfied (use --upgrade to upgrade): httplib2 in c:\python27_x64\lib\site-packages (from bigquery-python)
Requirement already satisfied (use --upgrade to upgrade): uritemplate<1,>=0.6 in c:\python27_x64\lib\site-packages (from google-api-python-client->bigquery-python)
Requirement already satisfied (use --upgrade to upgrade): oauth2client<3,>=2.0.0 in c:\python27_x64\lib\site-packages (from google-api-python-client->bigquery-python)
Requirement already satisfied (use --upgrade to upgrade): six<2,>=1.6.1 in c:\python27_x64\lib\site-packages (from google-api-python-client->bigquery-python)
Requirement already satisfied (use --upgrade to upgrade): simplejson>=2.5.0 in c:\python27_x64\lib\site-packages (from uritemplate<1,>=0.6->google-api-python-client->bigquery-python)
Requirement already satisfied (use --upgrade to upgrade): pyasn1>=0.1.7 in c:\python27_x64\lib\site-packages (from oauth2client<3,>=2.0.0->google-api-python-client->bigquery-python)
Requirement already satisfied (use --upgrade to upgrade): pyasn1-modules>=0.0.5 in c:\python27_x64\lib\site-packages (from oauth2client<3,>=2.0.0->google-api-python-client->bigquery-python)
Requirement already satisfied (use --upgrade to upgrade): rsa>=3.1.4 in c:\python27_x64\lib\site-packages (from oauth2client<3,>=2.0.0->google-api-python-client->bigquery-python)
Installing collected packages: bigquery-python
Successfully installed bigquery-python-1.8.0

Incorrect error logging

There are a couple places where the wrong thing is being logged on error which makes it difficult to debug what's going on. Refer to #44.

Raise Exceptions on Timeout

I was thinking that for the "client.query" and "client.wait_for_job" methods, we should raise exceptions on timeout instead of returning empty results for "client.query" and the job resource for "client.wait_for_job". What do you think?

# Instead of this
job_id, results = client.query('SELECT * FROM dataset.my_table LIMIT 1000', timeout=5)
complete, row_count = client.check_job(job_id)
if not complete:
    print "Timeout"

# We can run this
try:
    job_id, results = client.query('SELECT * FROM dataset.my_table LIMIT 1000', timeout=5)
except BigQueryTimeoutException:
    print "Timeout"


# Instead of this
job = client.write_to_table('SELECT * FROM dataset.original_table LIMIT 100')
job = client.wait_for_job(job, timeout=60)
if not job["status"]["state"] == u'DONE':
    print "Timeout"

# We can run this
job = client.write_to_table('SELECT * FROM dataset.original_table LIMIT 100')
try:
    client.wait_for_job(job, timeout=60)
except BigQueryTimeoutException:
    print "Timeout"

Feature Request: Export Query Results to Table

This library could become a data pipeline powerhouse if it added the hooks to flush query results to tables with batch priority.

This is something you can do from the web ui by fiddling with those options at the bottom that nobody likes to look at:

image

Relevant docs:

https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.query.destinationTable

potential entrypoint?

https://github.com/tylertreat/BigQuery-Python/blob/master/bigquery/client.py#L147-L152

Alternatively, maybe the service could be exposed through this library to let users construct their own query obj according to the BQ docs?

Error while creating tables.

Hi,

I am new to BigQuery, and I am trying to create a test table in BigQuery using BigQuery-Python. I was able to query an existing table, but I get the following error while trying to create/delete tables.

ERROR:root:Cannot create table sample.my_table
Http Error:

Could you please advise on what might be wrong?

Thanks!

Queries results limited

Hello, I don't know why, but all of my queries are limited: I can get a maximum of 100,000 rows per query.
I see that there is a parameter allow_large_results in the original API, and I added it to the query method in the query_data dict, but the results are still limited...

I use the synchronous "query" method.

Do you know what is going on ?

Thank you very much

Argument limit ignored in get_query_rows

The get_query_rows method in client.py ignores the limit argument when page_token is not None. This means that when a query returns enough rows to require pagination, the rows returned will be the first limit rows of each page. This goes against the intuitive behavior, which would be to return only limit rows in total.

Using insertId and still get lots of duplicated entries for data upload

Hello!

Great tool! Everything works like a charm except the data upload, where I get lots of duplicated data:

inserted = client.push_rows('dataset', 'table', rows, 'id')

mine is:

inserted = client.push_rows('twitter', 'tweets', rows, 'id')

I also have a list of dictionaries where one key is called id, with a unique number (integer). Still, some values appear up to ten times in the test table.

Any hint? Thanks in advance!

No handlers could be found for logger "bigquery.client"

from bigquery import get_client

# BigQuery project id as listed in the Google Developers Console.
project_id = 'project_id'

# Service account email address as listed in the Google Developers Console.
service_account = '[email protected]'

# PKCS12 or PEM key provided by Google.
key = 'key.pem'

client = get_client(project_id, service_account=service_account,
                    private_key_file=key, readonly=True)

# JSON key provided by Google
json_key = 'key.json'

client = get_client(json_key_file=json_key, readonly=True)

# Insert data into table.
rows = [
    {'one': 'ein', 'two': 'zwei'},
    {'id': 'NzAzYmRiY', 'one': 'uno', 'two': 'dos'},
    {'id': 'NzAzYmRiY', 'one': 'ein', 'two': 'zwei'} # duplicate entry
]

inserted = client.push_rows('dataset', 'table', rows, 'id')

Unflat results

I'm about to write this but just wanted to confirm if it's something I can send as a PR.

It's the ability to unflatten results using the schema when you have nested/repeated properties. This is very useful and I think it should be part of the library.

TIMESTAMP returned as unicode

Simple query on a table with a BigQuery TIMESTAMP column 'dt'

    _job_id, results = client.query('SELECT * FROM dataset.table LIMIT 1', timeout=10)

Yields this, instead of what I would expect (ideally a Python datetime or alternatively a UNIX timestamp int):

   print(results)

   [
       {
           u'dt': u'1.447216127E9'
       }, 
   ]

datetimes not json serializable

Attempting to use push_rows with datetime fields, I get this:

*** TypeError: datetime.datetime(2015, 4, 8, 11, 53, 2, 65490, tzinfo=) is not JSON serializable

get_datasets() returns only partial list

On a project with 63 datasets, a get_datasets() call returned only the first 50 datasets. Since the docstring reads List all datasets in the project, this does not seem to be intended behavior.

AttributeError: 'Module_six_moves_urllib_parse' object has no attribute 'urlencode'

Hello!

Here is my error code:
Traceback (most recent call last):
  File "/Users/Nikhil/Dropbox/Eclipse Workspace/BigQueryAPI/main/BigQueryWrapperTes.py", line 19, in <module>
    private_key_file=key, readonly=True)
  File "/Library/Python/2.7/site-packages/bigquery/client.py", line 146, in get_client
    service_url=service_url)
  File "/Library/Python/2.7/site-packages/bigquery/client.py", line 158, in _get_bq_service
    discoveryServiceUrl=service_url)
  File "/Library/Python/2.7/site-packages/oauth2client/util.py", line 135, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/googleapiclient/discovery.py", line 207, in build
    cache)
  File "/Library/Python/2.7/site-packages/googleapiclient/discovery.py", line 254, in _retrieve_discovery_doc
    resp, content = http.request(actual_url)
  File "/Library/Python/2.7/site-packages/oauth2client/client.py", line 616, in new_request
    self._refresh(request_orig)
  File "/Library/Python/2.7/site-packages/oauth2client/client.py", line 873, in _refresh
    self._do_refresh_request(http_request)
  File "/Library/Python/2.7/site-packages/oauth2client/client.py", line 900, in _do_refresh_request
    body = self._generate_refresh_request_body()
  File "/Library/Python/2.7/site-packages/oauth2client/client.py", line 1613, in _generate_refresh_request_body
    body = urllib.parse.urlencode({
AttributeError: 'Module_six_moves_urllib_parse' object has no attribute 'urlencode'

I initially attempted to authenticate with just my JSON key, but it threw an error asking for the PEM key; then this error gets thrown when I provide my p12 key.

Any ideas?
