Comments (14)
It looks like you are retrieving the anonymous table to which the query result was written, rather than setting a destination table when submitting the query. All query results are written to tables in BigQuery; the workaround is to specify an output table explicitly rather than letting the system create an anonymous table for you.
I don't know the BQQuerier object that you're using here, but at the BQ API level, this means using the jobs.insert API rather than jobs.query, and specifying the destination table in your JobConfigurationQuery.
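At the REST level, that configuration lives in the jobs.insert request body. Here is a minimal sketch of such a body as a plain Python dict (the project, dataset, and table names are placeholders, and `build_query_job_config` is just an illustrative helper, not part of any client library):

```python
def build_query_job_config(project, dataset, table, sql):
    """Build a jobs.insert request body that pins the query result to an
    explicit destination table instead of an anonymous temp table."""
    return {
        "configuration": {
            "query": {
                "query": sql,
                "useLegacySql": False,
                # JobConfigurationQuery.destinationTable: without this field,
                # BigQuery writes the result to an anonymous temp table.
                "destinationTable": {
                    "projectId": project,
                    "datasetId": dataset,
                    "tableId": table,
                },
                "writeDisposition": "WRITE_TRUNCATE",
            }
        }
    }

body = build_query_job_config("my-project", "my_dataset", "query_result",
                              "SELECT 1 AS x")
```

The resulting destination table is a regular table, so the Storage API (and the Spark connector) can read from it directly.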
from spark-bigquery-connector.
@sharadbhadouria the fix is being rolled out at the moment. I will update this bug when it's enabled globally.
Just to add: the WHERE clause works perfectly fine from the BigQuery Query Editor UI. I tried this with another BigQuery table and the results were the same.
This is a known issue with the Read API, where the result of the query needs to be at least 10 MB. We are working on fixing it on the backend; once we do, it will work with no changes in the client.
I'll leave this open until it is fixed.
Thanks pmkc. Do you know a tentative date by when it could be fixed?
There is no tentative date for this work item at this point, although it's a high-priority item for us. I'll update this bug when we have something more to share here.
Hi @kmjung, any updates on this? We want to query only a small subset (50 rows) but are unable to because of this bug. Is there a workaround?
As a short-term workaround, you can set a destination table on the original query and use the storage API to read from that table.
@kmjung
We are currently setting a destination table:

```python
bq = BQQuerier.instance()
query_job = bq.get_bq_query_job(query=self._query)
query_job.result()  # execute query
table = (f'{query_job.destination.dataset_id}.'
         f'{query_job.destination.table_id}')
```
Can you please elaborate on using the Storage API to read from the table? We are currently reading with the Spark connector. Could you give some references or an example of using the Storage API to read from the destination table?
```python
df = (spark
      .read
      .format('bigquery')
      .option('table', table)
      .load())
df = df.select(*self._columns).where(self._where)
```
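For context on the two ways of naming the destination table: the Storage Read API addresses a table by its full resource name, while the connector's `table` option takes a dataset-qualified name. A small sketch of building both (the helper names and the project/dataset/table values are placeholders for illustration):

```python
def storage_api_table_path(project, dataset, table):
    # Fully qualified resource name used by the Storage Read API
    # when creating a read session for the destination table.
    return f"projects/{project}/datasets/{dataset}/tables/{table}"

def connector_table_option(dataset, table):
    # 'dataset.table' form accepted by the spark-bigquery-connector's
    # 'table' option (the project defaults to the parent project).
    return f"{dataset}.{table}"

path = storage_api_table_path("my-project", "tmp_ds", "query_result")
opt = connector_table_option("tmp_ds", "query_result")
```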
The downside of this workaround is, of course, that you have to pay for storing these tables, while the temp tables are free of charge. So this issue is still important. To avoid creating tables for small result sets, I added this function, where the query is performed by Spark:
```scala
def selectFromBigQuery(query: String): DataFrame = {
  val logicalPlan = spark.sessionState.sqlParser.parsePlan(query)
  // Collect every table referenced in the query and register each one as a temp view.
  val allTables = logicalPlan.collectLeaves()
    .collect { case x: UnresolvedRelation => x }
    .map(_.tableIdentifier)
    .map(x => {
      val dataset = x.database.getOrElse(
        throw new IllegalArgumentException("A table id without a dataset id was encountered: " + x.table))
      TableId.of(parentProject, dataset, x.table)
    })
    .map { table =>
      val viewName = getSqlTableName(table).replace('.', '_') + "_view"
      readFromBigQueryPartition(table).createOrReplaceTempView(viewName) // register the table as a temp view
      table -> viewName
    }.toMap
  // Replace each table reference in the plan with its temp view name.
  val transformed = logicalPlan.transformDown {
    case tab: UnresolvedRelation =>
      val tid = TableId.of(
        parentProject, // must match the keys of allTables, which include the project
        tab.tableIdentifier.database.getOrElse(
          throw new IllegalArgumentException("No dataset provided in addition to table name.")),
        tab.tableIdentifier.table)
      UnresolvedRelation(TableIdentifier(allTables.getOrElse(tid, throw new IllegalArgumentException), None))
  }
  val analyzed = spark.sessionState.analyzer.executeAndCheck(transformed) // resolve the plan first
  new org.apache.spark.sql.Dataset[Row](spark, analyzed, RowEncoder(analyzed.schema)) // now we can build a DataFrame
}
```
This seems to work just fine.
@kmjung any update on the API fix?
Not sure this is the right place to follow up on backend issues, as many other libraries are impacted.
We're targeting early Q2 2020 for a fix here.
Hi @kmjung , is this still on your radar?
The fix has been rolled out globally now.