
Comments (9)

vishalkarve15 avatar vishalkarve15 commented on June 13, 2024

@katerina-kogan Can you please share the exact steps and a code sample to reproduce the issue?

from spark-bigquery-connector.

katerina-kogan avatar katerina-kogan commented on June 13, 2024

@vishalkarve15, please see the steps to reproduce below:

  1. dbt Cloud Python model in fct_model.py:
import pyspark.sql.functions as F
from pyspark.sql.window import Window

TWO_YEARS_DAYS = 365*2


def model(dbt, session):
    session.conf.set("viewsEnabled","true")

    table1_df = dbt.source("dataset1", "table1")
    table2_df = dbt.source("dataset1", "table2")
    table3_df = dbt.source("dataset2", "table3")
    table4_df = dbt.source("dataset2", "table4")
    table5_df = dbt.ref("ref_table")


    return table1_df \
        .join(table2_df, table1_df.o_id == table2_df.oo_id, "inner") \
        .join(table4_df, table4_df.o_id == table2_df.id, "inner") \
        .join(table3_df, table4_df.g_id == table3_df.id, "inner") \
        .join(table5_df, table4_df.status == table5_df.status_code, "left") \
        .filter(table1_df.created >= F.date_sub(F.current_date(), TWO_YEARS_DAYS)) \
        .filter(table4_df.created >= F.date_sub(F.current_date(), TWO_YEARS_DAYS)) \
        .withColumn("attempt_number", F.row_number().over(Window.partitionBy(table1_df.id).orderBy(table2_df.created_at))) \
        .withColumn("payment_type", F.coalesce(F.get_json_object(table4_df["extra"], "$.result.data.payment_type"), table3_df["payment_type"])) \
        .select(
            table2_df["customer_id"],
            F.to_date(table1_df["created"]).alias("created"),
            F.col("attempt_number"),
            F.col("status_desc").alias("status"),
            F.col("payment_type")
        ).repartition(10)
  2. Description of the model in model.yml:
version: 2

models:
  - name: fct_model
    description: fct_model
    config:
      submission_method: cluster
    columns:
      - name: customer_id
        description: customer_id
      - name: created
        description: created
      - name: attempt_number
        description: attempt_number
      - name: status_desc
        description: status_desc
      - name: payment_type
        description: payment_type
  3. The model runs on Dataproc Serverless via the "dbt build" command in dbt Cloud.
  4. The first run succeeds: it creates the fct_model table in BigQuery and loads the data.
  5. Any subsequent run fails with the error: "Destination schema is not compatible".

P.S. As stated in the issue description, if the columns section is absent from the model.yml file above, the model runs fine. Once the section is added, subsequent runs start failing. I can't say for certain that the columns section is the cause; it may be an unfortunate coincidence with something else going wrong behind the scenes.
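One thing worth double-checking (a hypothetical cause inferred only from the snippets above, not confirmed by the maintainers): the model's .select() aliases status_desc to status, while model.yml documents a status_desc column. A quick diff of the two column lists, hand-copied from the snippets, shows the mismatch:

```python
# Column names hand-copied from the snippets above (an illustrative check,
# not part of the actual dbt/connector code).
model_columns = ["customer_id", "created", "attempt_number", "status", "payment_type"]  # from .select()
yml_columns = ["customer_id", "created", "attempt_number", "status_desc", "payment_type"]  # from model.yml

# Symmetric difference: columns present in one list but not the other.
mismatch = set(model_columns) ^ set(yml_columns)
print(mismatch)  # {'status', 'status_desc'}
```

If the destination table's schema is validated against the documented columns on subsequent runs, a mismatch like this could plausibly surface as a "Destination schema is not compatible" error, though only the maintainers can confirm whether that check is involved.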

Thank you!


katerina-kogan avatar katerina-kogan commented on June 13, 2024

Updated issue details: according to the docs, Dataproc Serverless uses the built-in connector.
I also tested on a cluster and specified the connector version directly, as the docs suggest:
SPARK_BQ_CONNECTOR_VERSION=0.34.0
and it works fine.
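For anyone following along, pinning the connector version on a cluster is done through cluster metadata at creation time. A sketch, assuming the SPARK_BQ_CONNECTOR_VERSION metadata key described in the Dataproc docs; the cluster name, region, and image version below are placeholders:

```shell
# Hypothetical cluster creation pinning the Spark BigQuery connector version.
# CLUSTER_NAME, region, and image version are placeholders for your own values.
gcloud dataproc clusters create CLUSTER_NAME \
    --region=us-central1 \
    --image-version=2.1-debian11 \
    --metadata=SPARK_BQ_CONNECTOR_VERSION=0.34.0
```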


vishalkarve15 avatar vishalkarve15 commented on June 13, 2024

"It works fine" meaning this issue does not occur with the 0.34.0 connector, even with the columns section included?


katerina-kogan avatar katerina-kogan commented on June 13, 2024

I believe the issue is with the built-in connector, but the docs don't say which exact version it is.


vishalkarve15 avatar vishalkarve15 commented on June 13, 2024

If you're using Serverless 2.1, it comes with the built-in 0.28.1. See https://cloud.google.com/dataproc-serverless/docs/concepts/versions/spark-runtime-versions
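If the built-in 0.28.1 is the problem, one possible workaround (a sketch, not verified against this setup, and user-supplied jars can conflict with the built-in connector on the classpath) is to pass a newer connector jar explicitly when submitting the Serverless batch. Google publishes the connector jars under gs://spark-lib/bigquery/:

```shell
# Hypothetical batch submission supplying a newer connector jar.
# fct_model.py and the region are placeholders for your own values.
gcloud dataproc batches submit pyspark fct_model.py \
    --region=us-central1 \
    --jars=gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.34.0.jar
```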


katerina-kogan avatar katerina-kogan commented on June 13, 2024

Hi @vishalkarve15, we developed on a Dataproc cluster with Spark 3.3.2:
https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-release-2.1


vishalkarve15 avatar vishalkarve15 commented on June 13, 2024

That one comes with 0.27.1. We updated the documentation yesterday to reflect this.
I'm closing this since it has been fixed in the latest release, 0.34.0.
Feel free to reopen if you still face issues.


katerina-kogan avatar katerina-kogan commented on June 13, 2024

Sorry folks, I have to open another issue: #1158
Same error, but now happening with version 0.35.

