Comments (14)
It looks like you are retrieving the anonymous table to which the query result was written, rather than setting a destination table when submitting the query. All query results are written to tables in BigQuery; the workaround is to specify an output table explicitly rather than letting the system create an anonymous table for you.
I don't know the BQQuerier object that you're using here, but at the BQ API level, this means using the jobs.insert API rather than jobs.query, and specifying the destination table in your JobConfigurationQuery.
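At the REST level, that configuration lives in the jobs.insert request body. Here is a minimal sketch of such a body as a plain Python dict (the project, dataset, and table names are placeholders, and `build_query_job_config` is just an illustrative helper, not part of any client library):

```python
def build_query_job_config(project, dataset, table, sql):
    """Build a jobs.insert request body that pins the query result to an
    explicit destination table instead of an anonymous temp table."""
    return {
        "configuration": {
            "query": {
                "query": sql,
                "useLegacySql": False,
                # JobConfigurationQuery.destinationTable: without this field,
                # BigQuery writes the result to an anonymous temp table.
                "destinationTable": {
                    "projectId": project,
                    "datasetId": dataset,
                    "tableId": table,
                },
                "writeDisposition": "WRITE_TRUNCATE",
            }
        }
    }

body = build_query_job_config("my-project", "my_dataset", "query_result",
                              "SELECT 1 AS x")
```

The resulting destination table is a regular table, so the Storage API (and the Spark connector) can read from it directly.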
from spark-bigquery-connector.
@sharadbhadouria the fix is being rolled out at the moment. I will update this bug when it's enabled globally.
Just to add: the WHERE clause works perfectly fine from the BigQuery Query Editor UI. I tried this with another BigQuery table and the results were the same.
This is a known issue with the Read API, where the result of the query needs to be at least 10 MB. We are working on fixing it on the backend; once we do, it will work with no changes in the client.
I'll leave this open until it is fixed.
Thanks pmkc. Do you know a tentative date by when it could be fixed?
There is no tentative date for this work item at this point, although it's a high-priority item for us. I'll update this bug when we have something more to share here.
Hi @kmjung, any updates on this? We want to query only a small subset (50 rows) but are unable to because of this bug. Is there a workaround?
As a short-term workaround, you can set a destination table on the original query and use the storage API to read from that table.
@kmjung
We are currently setting a destination table:

```python
bq = BQQuerier.instance()
query_job = bq.get_bq_query_job(query=self._query)
query_job.result()  # execute query
table = (f'{query_job.destination.dataset_id}.'
         f'{query_job.destination.table_id}')
```
Can you please elaborate on using the Storage API to read from the table? We are currently reading with the Spark connector. Could you give some references or an example of using the Storage API to read from the destination table?
```python
df = (spark
      .read
      .format('bigquery')
      .option('table', table)
      .load())
df = df.select(*self._columns).where(self._where)
```
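For context on the two ways of naming the destination table: the Storage Read API addresses a table by its full resource name, while the connector's `table` option takes a dataset-qualified name. A small sketch of building both (the helper names and the project/dataset/table values are placeholders for illustration):

```python
def storage_api_table_path(project, dataset, table):
    # Fully qualified resource name used by the Storage Read API
    # when creating a read session for the destination table.
    return f"projects/{project}/datasets/{dataset}/tables/{table}"

def connector_table_option(dataset, table):
    # 'dataset.table' form accepted by the spark-bigquery-connector's
    # 'table' option (the project defaults to the parent project).
    return f"{dataset}.{table}"

path = storage_api_table_path("my-project", "tmp_ds", "query_result")
opt = connector_table_option("tmp_ds", "query_result")
```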
The downside of this workaround is, of course, that you have to pay for storing these tables, while the temp tables are free of charge. So this issue is still important. To avoid creating tables for small result sets, I added this function, where the query is performed by Spark:
```scala
def selectFromBigQuery(query: String): DataFrame = {
  val logicalPlan = spark.sessionState.sqlParser.parsePlan(query)
  // Collect every table referenced in the query and register each one as a temp view.
  val allTables = logicalPlan.collectLeaves()
    .collect { case x: UnresolvedRelation => x }
    .map(_.tableIdentifier)
    .map(x => {
      val dataset = x.database.getOrElse(
        throw new IllegalArgumentException("A table id without a dataset id was encountered: " + x.table))
      TableId.of(parentProject, dataset, x.table)
    })
    .map { table =>
      val viewName = getSqlTableName(table).replace('.', '_') + "_view"
      readFromBigQueryPartition(table).createOrReplaceTempView(viewName) // register the table as a temp view
      table -> viewName
    }.toMap
  // Replace each table reference in the plan with its temp view name.
  val transformed = logicalPlan.transformDown {
    case tab: UnresolvedRelation =>
      val tid = TableId.of(
        parentProject, // must match the keys of allTables, which include the project
        tab.tableIdentifier.database.getOrElse(
          throw new IllegalArgumentException("No dataset provided in addition to table name.")),
        tab.tableIdentifier.table)
      UnresolvedRelation(TableIdentifier(allTables.getOrElse(tid, throw new IllegalArgumentException), None))
  }
  val analyzed = spark.sessionState.analyzer.executeAndCheck(transformed) // resolve the plan first
  new org.apache.spark.sql.Dataset[Row](spark, analyzed, RowEncoder(analyzed.schema)) // now we can build a DataFrame
}
```
This seems to work just fine.
@kmjung any update on the API fix?
Not sure this is the right place to follow up on backend issues, as many other libraries are impacted.
We're targeting early Q2 2020 for a fix here.
Hi @kmjung , is this still on your radar?
The fix has been rolled out globally now.