FAILED_PRECONDITION: there was an error creating the session: the table has a storage format that is not supported [spark-bigquery-connector, CLOSED]

googleclouddataproc commented on May 21, 2024
FAILED_PRECONDITION: there was an error creating the session: the table has a storage format that is not supported

from spark-bigquery-connector.

Comments (14)

kmjung commented on May 21, 2024

It looks like you are retrieving the anonymous table to which the query result was written, rather than setting the destination table when submitting the query for execution. All query results are written to tables in BigQuery, but the workaround is to specify an output table rather than letting the system create an anonymous table for you.

I don't know the BQQuerier object that you're using here, but at the BQ API level, this means using the jobs.insert API rather than jobs.query, and specifying the destination table in your JobConfigurationQuery.
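
For readers who want to see what this looks like at the API level, here is a minimal sketch of a jobs.insert request body whose JobConfigurationQuery names a destination table. The helper `build_query_job_body` and the project, dataset, and table names are hypothetical placeholders, not part of any library; only the field names follow the BigQuery REST API.

```python
import json


def build_query_job_body(query, project_id, dataset_id, table_id):
    """Build a jobs.insert request body that writes results to a named table.

    Hypothetical helper for illustration; only the field names come from
    the BigQuery REST API (Job -> configuration -> query).
    """
    return {
        "configuration": {
            "query": {
                "query": query,
                "useLegacySql": False,
                # Naming a destination table keeps BigQuery from writing the
                # result to an anonymous table.
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                # Overwrite the table if it already exists.
                "writeDisposition": "WRITE_TRUNCATE",
            }
        }
    }


if __name__ == "__main__":
    # Placeholder identifiers; substitute your own.
    body = build_query_job_body(
        "SELECT 1 AS x", "my-project", "scratch", "query_results"
    )
    print(json.dumps(body, indent=2))
```

A client library that wraps jobs.insert should accept the same information through its own job configuration object. The table then persists until you delete it or it expires.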

kmjung commented on May 21, 2024

@sharadbhadouria the fix is being rolled out at the moment. I will update this bug when it's enabled globally.

virender7 commented on May 21, 2024

Just to add: the WHERE clause works perfectly fine from the BigQuery Query Editor UI. I tried this with another BigQuery table and the results were the same.

pmkc commented on May 21, 2024

This is a known issue (KI) with the read API, where the result of the query needs to be at least 10MB. We are working on fixing it on the backend; when we do, it will work with no changes in the client.

I'll leave this open until it is fixed.

virender7 commented on May 21, 2024

Thanks, pmkc. Do you have a tentative date for when it could be fixed?

kmjung commented on May 21, 2024

There is no tentative date for this work item at this point, although it's a high-priority item for us. I'll update this bug when we have something more to share here.

sreenivasan-fathom commented on May 21, 2024

Hi @kmjung, any updates on this? We want to query only a small subset (50 rows), but we are unable to because of this bug. Is there a workaround?

kmjung commented on May 21, 2024

As a short-term workaround, you can set a destination table on the original query and use the storage API to read from that table.
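
A minimal sketch of the read side of this workaround, assuming the query has already been run with an explicit destination table. The spark-bigquery-connector reads through the BigQuery Storage API itself, so pointing it at that table should be enough. The helper `read_destination_table` and the table name in the usage comment are hypothetical.

```python
def read_destination_table(spark, table, columns=None, where=None):
    """Read an explicit (non-anonymous) BigQuery table through the connector.

    `spark` is an existing SparkSession with the spark-bigquery-connector
    on its classpath; `table` is a 'dataset.table' or
    'project.dataset.table' string. Hypothetical helper for illustration.
    """
    df = (spark
          .read
          .format('bigquery')
          .option('table', table)
          .load())  # load() is what actually builds the DataFrame
    if columns:
        df = df.select(*columns)
    if where:
        df = df.where(where)
    return df


# Usage (placeholder names):
#   df = read_destination_table(spark, 'scratch.query_results',
#                               columns=['id', 'name'], where='id > 10')
```

Passing the column list and filter to the DataFrame, rather than baking them into the query, lets the connector decide how much of that work to push down to the read.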

sreenivasan-fathom commented on May 21, 2024

@kmjung
We are currently setting a destination table:

            bq = BQQuerier.instance()
            query_job = bq.get_bq_query_job(query=self._query)
            query_job.result() # execute query
            table = (f'{query_job.destination.dataset_id}.'
                     f'{query_job.destination.table_id}')

Can you please elaborate on using the Storage API to read from the table? We are currently reading with the Spark connector. Could you please give some references or an example of using the Storage API to read from the destination table?

        df = (spark
              .read
              .format('bigquery')
              .option('table', table)
              .load())
        df = df.select(*self._columns).where(self._where)

chile12 commented on May 21, 2024

The downside of this workaround is, of course, that you have to pay for storing these tables, while the temp tables are free of charge. So this issue is still important. To avoid creating tables for small result sets, I added this function, where the query is performed by Spark:

// Assumed to be defined elsewhere in the poster's code: spark, parentProject,
// getSqlTableName and readFromBigQueryPartition.
import com.google.cloud.bigquery.TableId
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
import org.apache.spark.sql.catalyst.encoders.RowEncoder

def selectFromBigQuery(query: String): DataFrame = {
  val logicalPlan = spark.sessionState.sqlParser.parsePlan(query)
  // Register every BigQuery table referenced by the query as a Spark temp view.
  val allTables = logicalPlan.collectLeaves()
    .collect { case x: UnresolvedRelation => x }
    .map(_.tableIdentifier)
    .map { x =>
      val dataset = x.database.getOrElse(throw new IllegalArgumentException("A table id without a dataset id was encountered: " + x.table))
      TableId.of(parentProject, dataset, x.table)
    }
    .map { table =>
      val viewName = getSqlTableName(table).replace('.', '_') + "_view"
      readFromBigQueryPartition(table).createOrReplaceTempView(viewName)
      table -> viewName
    }.toMap

  // Replace each table name in the plan with the name of its temp view.
  val transformed = logicalPlan.transformDown {
    case tab: UnresolvedRelation =>
      val dataset = tab.tableIdentifier.database.getOrElse(throw new IllegalArgumentException("No dataset provided in addition to table name."))
      // Build the key with parentProject so it matches the keys stored in allTables.
      val tid = TableId.of(parentProject, dataset, tab.tableIdentifier.table)
      UnresolvedRelation(TableIdentifier(allTables.getOrElse(tid, throw new IllegalArgumentException("No temp view registered for " + tid)), None))
  }
  val analyzed = spark.sessionState.analyzer.executeAndCheck(transformed) // the plan has to be resolved first
  new org.apache.spark.sql.Dataset[Row](spark, analyzed, RowEncoder(analyzed.schema)) // now we can create a DataFrame
}

This seems to work just fine.

yan-hic commented on May 21, 2024

@kmjung, any update on the API fix?
I'm not sure this is the right place to follow up on backend issues, as many other libraries are impacted.

kmjung commented on May 21, 2024

We're targeting early Q2 2020 for a fix here.

sharadbhadouria commented on May 21, 2024

Hi @kmjung, is this still on your radar?

emkornfield commented on May 21, 2024

The fix has been rolled out globally now.
