Spark 3.4 Support (abris, open issue)

rbtrtr commented on June 24, 2024
Spark 3.4 Support

Comments (13)

luisvicenteatprima commented on June 24, 2024

I have tested it on Databricks with Spark 3.4.1 and it works.

hafizmujadidKhalid commented on June 24, 2024

I tested it with Spark 3.5 as well, and it works fine.

kevinwallimann commented on June 24, 2024

Hi @jelmew
Thanks for reporting the issue. Could you please double-check if you are really using v3.5.0, or maybe the latest master version? I'm looking at https://github.com/apache/spark/blob/master/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala#L59 and indeed I can see that a new constructor argument useStableIdForUnionType: Boolean was added. However, this is on the master branch of Spark. I don't see that argument in v3.5.0 https://github.com/apache/spark/blob/v3.5.0/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala#L53-L56, and @hafizmujadidKhalid reported no issues with Spark 3.5.0.

@cerveada See https://issues.apache.org/jira/browse/SPARK-43380. We need to take care of this for the next release of Spark.

cerveada commented on June 24, 2024

Unless Spark changed some of the APIs we use between versions, Abris will work fine. Thanks for reporting your tests, it's valuable information for us and for other Abris users.

jelmew commented on June 24, 2024

There is an incompatibility with Spark 3.5. It seems some constructors have changed, which causes problems for Abris.


Caused by: java.lang.NoSuchMethodException: org.apache.spark.sql.avro.AvroDeserializer.<init>(org.apache.avro.Schema, org.apache.spark.sql.types.DataType, java.lang.String)
	at java.lang.Class.getConstructor0(Class.java:3082)
	at java.lang.Class.getConstructor(Class.java:1825)
	at org.apache.spark.sql.avro.AbrisAvroDeserializer$$anonfun$1.applyOrElse(AbrisAvroDeserializer.scala:38)
	at org.apache.spark.sql.avro.AbrisAvroDeserializer$$anonfun$1.applyOrElse(AbrisAvroDeserializer.scala:37)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
	at scala.util.Failure.recover(Try.scala:234)
	at org.apache.spark.sql.avro.AbrisAvroDeserializer.<init>(AbrisAvroDeserializer.scala:37)
	at za.co.absa.abris.avro.sql.AvroDataToCatalyst.deserializer$lzycompute(AvroDataToCatalyst.scala:71)
	at za.co.absa.abris.avro.sql.AvroDataToCatalyst.deserializer(AvroDataToCatalyst.scala:71)
	at za.co.absa.abris.avro.sql.AvroDataToCatalyst.nullSafeEval(AvroDataToCatalyst.scala:87)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:32)
	at com.google.common.collect.Iterators$PeekingImpl.hasNext(Iterators.java:1139)
	at com.databricks.photon.NativeRowBatchIterator.hasNext(NativeRowBatchIterator.java:44)
	at 0xa8e0d62 <photon>.HasNext(external/workspace_spark_3_5/photon/jni-wrappers/jni-row-batch-iterator.cc:50)
	at 0x5e136fb <photon>.OpenImpl(external/workspace_spark_3_5/photon/exec-nodes/file-writer-node.cc:166)
	at com.databricks.photon.JniApiImpl.open(Native Method)
	at com.databricks.photon.JniApi.open(JniApi.scala)
	at com.databricks.photon.JniExecNode.open(JniExecNode.java:71)
	at com.databricks.photon.PhotonWriteResultHandler.$anonfun$getResult$3(PhotonWriteStageExec.scala:121)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.photon.PhotonResultHandler.timeit(PhotonResultHandler.scala:30)
	at com.databricks.photon.PhotonResultHandler.timeit$(PhotonResultHandler.scala:28)
	at com.databricks.photon.PhotonWriteResultHandler.timeit(PhotonWriteStageExec.scala:67)
	at com.databricks.photon.PhotonWriteResultHandler.$anonfun$getResult$2(PhotonWriteStageExec.scala:121)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1542)
	at com.databricks.photon.PhotonWriteResultHandler.getResult(PhotonWriteStageExec.scala:118)
	... 38 more
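
The trace shows `AbrisAvroDeserializer` looking up the `AvroDeserializer` constructor with `Class.getConstructor` inside the `recover` block of a `scala.util.Try`, and the `(Schema, DataType, String)` signature it asks for does not exist in the Databricks build. One general way to tolerate this kind of signature drift is to probe several candidate signatures in order and use the first one that exists. The sketch below illustrates the idea with a hypothetical `Decoder` class and a hypothetical `Candidate` helper; it is not Abris's actual code, and the real fix may work differently.

```scala
import java.lang.reflect.Constructor
import scala.util.{Failure, Try}

// Hypothetical stand-in for a library class whose constructor gained a
// trailing Boolean parameter in a newer build (compare AvroDeserializer).
class Decoder(val schema: String, val mode: String, val stableIds: Boolean) {
  def describe: String = s"$schema/$mode/$stableIds"
}

object ConstructorFallback {
  // One candidate signature plus the arguments to pass if it exists.
  final case class Candidate(paramTypes: Seq[Class[_]], args: Seq[AnyRef])

  /** Instantiates `cls` with the first candidate whose public constructor
    * exists; if none match, returns the last failure (e.g. NoSuchMethodException). */
  def instantiate[T](cls: Class[T], candidates: Seq[Candidate]): Try[T] = {
    val noMatch: Try[T] = Failure(new NoSuchMethodException(s"${cls.getName}: no candidates"))
    candidates.foldLeft(noMatch) { (acc, c) =>
      // orElse only runs the next lookup if every earlier attempt failed.
      acc.orElse(Try {
        val ctor: Constructor[T] = cls.getConstructor(c.paramTypes: _*)
        ctor.newInstance(c.args: _*)
      })
    }
  }
}

object Demo {
  import ConstructorFallback.Candidate

  val candidates: Seq[Candidate] = Seq(
    // Older two-argument signature: absent on Decoder, so this lookup fails.
    Candidate(Seq(classOf[String], classOf[String]), Seq("schema", "FAILFAST")),
    // Newer signature with the extra flag (classOf[Boolean] is primitive boolean).
    Candidate(Seq(classOf[String], classOf[String], classOf[Boolean]),
      Seq("schema", "FAILFAST", java.lang.Boolean.FALSE))
  )

  def main(args: Array[String]): Unit =
    println(ConstructorFallback.instantiate(classOf[Decoder], candidates).map(_.describe))
}
```

Running `Demo` prints `Success(schema/FAILFAST/false)`: the first lookup throws `NoSuchMethodException`, which `Try` captures, and the second signature matches. Probing by signature rather than by reported Spark version matters here, because the Databricks runtime reports 3.5.0 while shipping a newer constructor.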

jelmew commented on June 24, 2024

Hi @kevinwallimann

This is using Databricks Runtime 14.2 https://docs.databricks.com/en/release-notes/runtime/14.2.html. Might they have backported something from master?

Kind regards,
Jelmer

Edit:

Okay, yup. They included it.

[screenshot]

sauletawil commented on June 24, 2024

+1

kevinwallimann commented on June 24, 2024

Hi @jelmew
Unfortunately, I don't have access to Spark on Databricks Runtime, but I will reproduce and fix the error by building Spark locally from the latest commit on branch-3.5. Since I've identified a few issues when running the Abris tests with Spark 3.5.0, I need to fix those first. See #350.

lucafurrer commented on June 24, 2024

+1

kevinwallimann commented on June 24, 2024

Hi @jelmew
I've released v6.4.0 with the bugfix. https://github.com/AbsaOSS/ABRiS/releases/tag/v6.4.0. Please let me know if it fixes your issue.
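
For anyone trying the release, a minimal sbt sketch of pulling it in. The coordinates (group `za.co.absa`, artifact `abris` cross-built per Scala version) and the Confluent resolver are assumed to match the ABRiS README; verify them against the release page. On a Databricks cluster with a Scala 2.12 runtime, the equivalent Maven library coordinate would presumably be `za.co.absa:abris_2.12:6.4.0`.

```scala
// build.sbt (sketch; coordinates assumed from the ABRiS README)
// ABRiS depends on Confluent artifacts, which are served from Confluent's own repository.
resolvers += "confluent" at "https://packages.confluent.io/maven/"

// %% appends the Scala binary version, e.g. abris_2.12 for Scala 2.12.
libraryDependencies += "za.co.absa" %% "abris" % "6.4.0"
```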

jelmew commented on June 24, 2024

We are testing.

roicostas commented on June 24, 2024

Same problem here. It seems that the AvroDeserializer constructor has changed a lot recently, gaining more parameters with each version:
3.5.0 has 3 parameters
3.5.1 has 4 parameters
current master version has 5 parameters

We are also using the latest Databricks LTS version, 14.3, which in theory uses Spark 3.5.0. However, it must include some customizations: AvroDeserializer appears to be provided by one of their own jars, and that copy has the newer 5-parameter constructor.
The jar which provides AvroDeserializer is: file:/databricks/jars/----ws_3_5--connector--avro--avro-hive-2.3__hadoop-3.2_2.12_shaded---606136534--avro-unshaded-hive-2.3__hadoop-3.2_2.12_deploy.jar!/org/apache/spark/sql/avro/AvroDeserializer.class
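
The path above resembles what `getResource(...).getPath` returns for a class packed inside a jar. A minimal sketch of one way to find which jar or module supplies a given class, using only `Class.getResource`; `WhichJar` is a hypothetical helper name, and on a cluster you would pass `org.apache.spark.sql.avro.AvroDeserializer` as the argument.

```scala
object WhichJar {
  /** URL of the .class resource that cls's class loader finds, e.g.
    * "jar:file:/path/to/some.jar!/pkg/Name.class", or None if not found. */
  def classFileUrl(cls: Class[_]): Option[String] = {
    // Absolute resource path "/pkg/Name.class" (nested classes keep their '$').
    val resource = "/" + cls.getName.replace('.', '/') + ".class"
    Option(cls.getResource(resource)).map(_.toString)
  }

  def main(args: Array[String]): Unit = {
    // Pass a fully qualified class name; defaults to java.lang.String.
    val name = args.headOption.getOrElse("java.lang.String")
    println(classFileUrl(Class.forName(name)).getOrElse(s"$name: class file not found"))
  }
}
```

With unusual (child-first) class loaders the resource found this way can differ from where the class was actually loaded; `cls.getProtectionDomain.getCodeSource` is an alternative check in that case.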

We printed out the available constructors and confirmed that the available constructor is the 5-parameter one.
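
A minimal sketch of such a constructor check with plain Java reflection. `Sample` is a hypothetical class used only for demonstration; on a cluster you would pass a fully qualified name such as `org.apache.spark.sql.avro.AvroDeserializer`. Note that `getConstructors` lists public constructors only, which are the ones `Class.getConstructor` can return.

```scala
// Hypothetical demo class with two public constructors.
class Sample(val a: String, val b: Int) {
  def this(a: String) = this(a, 0)
}

object ListConstructors {
  /** One line per public constructor of `cls`, e.g. "Sample(java.lang.String, int)". */
  def signatures(cls: Class[_]): Seq[String] =
    cls.getConstructors.toSeq
      .map(_.getParameterTypes.map(_.getName).mkString(s"${cls.getSimpleName}(", ", ", ")"))
      .sorted

  def main(args: Array[String]): Unit = {
    // Pass a fully qualified class name to inspect it; defaults to Sample.
    val cls: Class[_] = args.headOption.map(name => Class.forName(name)).getOrElse(classOf[Sample])
    signatures(cls).foreach(println)
  }
}
```

Run without arguments, it prints `Sample(java.lang.String)` and `Sample(java.lang.String, int)`; the parameter list shows at a glance which of the 3-, 4-, or 5-parameter variants a runtime actually ships.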

With Databricks 13.3 it works fine.

jelmew commented on June 24, 2024

I agree. Databricks 13.3 works fine, but 14.3 LTS runs into the problem above once again.
