Comments (9)
@katerina-kogan Can you please share the exact steps and a code sample to reproduce the issue?
from spark-bigquery-connector.
@vishalkarve15, please check the steps:
- dbt Cloud Python model in fct_model.py:
import pyspark.sql.functions as F
from pyspark.sql.window import Window

TWO_YEARS_DAYS = 365*2

def model(dbt, session):
    session.conf.set("viewsEnabled","true")
    table1_df = dbt.source("dataset1", "table1")
    table2_df = dbt.source("dataset1", "table2")
    table3_df = dbt.source("dataset2", "table3")
    table4_df = dbt.source("dataset2", "table4")
    table5_df = dbt.ref("ref_table")
    return table1_df \
        .join(table2_df, table1_df.o_id==table2_df.oo_id,"inner") \
        .join(table4_df, table4_df.o_id==table2_df.id,"inner") \
        .join(table3_df, table4_df.g_id==table3_df.id,"inner") \
        .join(table5_df, table4_df.status==table5_df.status_code,"left") \
        .filter(table1_df.created >= F.date_sub(F.current_date(), TWO_YEARS_DAYS)) \
        .filter(table4_df.created >= F.date_sub(F.current_date(), TWO_YEARS_DAYS)) \
        .withColumn("attempt_number", F.row_number().over(Window.partitionBy(table1_df.id).orderBy(table2_df.created_at))) \
        .withColumn("payment_type", F.coalesce(F.get_json_object(table4_df["extra"], "$.result.data.payment_type"), table3_df["payment_type"])) \
        .select(
            table2_df["customer_id"],
            F.to_date(table1_df["created"]).alias("created"),
            F.col("attempt_number"),
            F.col("status_desc").alias("status"),
            F.col("payment_type")
        ).repartition(10)
- Description of the model in model.yml:
version: 2
models:
  - name: fct_model
    description: fct_model
    config:
      submission_method: cluster
    columns:
      - name: customer_id
        description: customer_id
      - name: created
        description: created
      - name: attempt_number
        description: attempt_number
      - name: status_desc
        description: status_desc
      - name: payment_type
        description: payment_type
- The model runs on Dataproc Serverless via the "dbt build" command in dbt Cloud
- The first run is successful: it creates a fct_model table in BigQuery and adds data
- Any subsequent run results in the error: "Destination schema is not compatible"
P.S. As stated in the issue description, if the columns section is absent from the model.yml file above, the model runs fine. Once the section is added, subsequent runs start failing. I can't say the two are definitely related; it may be an unfortunate coincidence with something going wrong behind the scenes.
Thank you!
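For anyone trying to narrow this down, a quick sanity check is to compare the column names returned by the model's final select with the names listed under columns in model.yml. The sketch below is plain Python with both lists copied by hand from the two files above; compare_columns is a hypothetical helper, not part of dbt or the connector. It shows one mismatch (status vs. status_desc). Whether that mismatch is related to the "Destination schema is not compatible" error is not established in this thread.

```python
# Column names produced by the final .select() in fct_model.py (copied by hand).
model_columns = ["customer_id", "created", "attempt_number", "status", "payment_type"]

# Column names listed under `columns:` in model.yml (copied by hand).
yaml_columns = ["customer_id", "created", "attempt_number", "status_desc", "payment_type"]


def compare_columns(model_cols, yaml_cols):
    """Return (names only in the model output, names only in the YAML)."""
    only_in_model = [c for c in model_cols if c not in yaml_cols]
    only_in_yaml = [c for c in yaml_cols if c not in model_cols]
    return only_in_model, only_in_yaml


only_in_model, only_in_yaml = compare_columns(model_columns, yaml_columns)
print("only in model output:", only_in_model)
print("only in model.yml:", only_in_yaml)
```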
Updated issue details: according to the doc, Dataproc Serverless uses a built-in connector.
I also tested on a cluster and specified the connector version directly, as the doc suggests:
SPARK_BQ_CONNECTOR_VERSION=0.34.0
and it works fine.
By "it works fine", do you mean this issue does not occur with the 0.34.0 connector, even when the columns section is included?
I believe the issue is with the built-in connector, but the doc doesn't say exactly which version it is.
If you're using Serverless 2.1, it comes with connector version 0.28.1 built in. See https://cloud.google.com/dataproc-serverless/docs/concepts/versions/spark-runtime-versions
Hi @vishalkarve15, we developed this on a Dataproc cluster with Spark 3.3.2:
https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-release-2.1
That one comes with 0.27.1. We updated the documentation yesterday to reflect this.
I'm closing this since it has been fixed in the latest version, 0.34.0.
Feel free to reopen if you still face issues.
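To summarize the versions mentioned so far, here is a minimal Python sketch that checks whether a bundled connector is older than the release reported above as containing the fix. The version numbers come from this thread; the dictionary keys and the helper names (parse_version, older_than_reported_fix) are hypothetical and only for illustration.

```python
# Connector versions as reported in this thread (not queried from any API).
BUNDLED_VERSIONS = {
    "Dataproc Serverless 2.1": "0.28.1",
    "Dataproc cluster image 2.1": "0.27.1",
}

# Release reported in this thread as containing the fix.
REPORTED_FIXED_IN = "0.34.0"


def parse_version(version):
    """Turn a dotted version string such as '0.28.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def older_than_reported_fix(version, fixed=REPORTED_FIXED_IN):
    """True if `version` sorts before the release reported as fixed."""
    return parse_version(version) < parse_version(fixed)


for runtime, version in BUNDLED_VERSIONS.items():
    print(runtime, version, "older than reported fix:", older_than_reported_fix(version))
```

Both bundled versions sort before 0.34.0, which is consistent with the error showing up with the built-in connector but not when 0.34.0 was set explicitly on the cluster.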
Sorry folks, I had to open another issue: #1158
Same error, but now happening in version 0.35.