
Comments (7)

**cerveada** commented on September 25, 2024

Hello,

Could you print d2.schema() and post it here?

What version of org.apache.spark:spark-avro library is on your classpath?

from abris.

**kevinwallimann** commented on September 25, 2024

Hi @joserivera1990 Unfortunately, your avro schema is invalid. Per the avro spec, "Scale must be [...] less than or equal to the precision.", see https://avro.apache.org/docs/current/spec.html#Decimal, which is not the case (127 > 64).
What happens is that the logical type (Decimal) is not validated when the schema is parsed and instead falls back to null. So, Spark doesn't see the avro logical type decimal and interprets it as a BinaryType instead of a DecimalType.
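The spec rule referenced above can be sketched as a minimal check (illustrative only, not ABRiS or Avro library code) to show why this particular schema's logical type is invalid and gets dropped:

```python
# Minimal sketch of the Avro spec's validity rule for the decimal logical type.
def decimal_logical_type_is_valid(precision: int, scale: int) -> bool:
    # Per the Avro spec: precision must be positive, and scale must be
    # zero or positive and less than or equal to the precision.
    return precision > 0 and 0 <= scale <= precision

# The schema in question declares precision=64, scale=127, which violates
# the rule, so the logical type is silently ignored and Spark sees plain bytes.
print(decimal_logical_type_is_valid(64, 127))  # False
print(decimal_logical_type_is_valid(38, 10))   # True
```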

To solve the issue, you could


**joserivera1990** commented on September 25, 2024

> Hello,
>
> Could you print d2.schema() and post it here?
>
> What version of org.apache.spark:spark-avro library is on your classpath?

Hi @cerveada,

Here is the output of d2.schema():

```
StructType(
  StructField(STFAMPRO,StringType,true),
  StructField(CHFAMPRO,StringType,true),
  StructField(TEST_NUMBER,BinaryType,true),
  StructField(TEST_NUMBER_DECIMAL,BinaryType,true),
  StructField(table,StringType,true),
  StructField(SCN_CMD,StringType,true),
  StructField(OP_TYPE_CMD,StringType,true),
  StructField(op_ts,StringType,true),
  StructField(current_ts,StringType,true),
  StructField(row_id,StringType,true),
  StructField(username,StringType,true)
)
```

Checking the external libraries, I have org.apache.spark:spark-avro_2.12:2.4.8 on the classpath.

Regards.


**joserivera1990** commented on September 25, 2024

> Hi @joserivera1990 Unfortunately, your avro schema is invalid. Per the avro spec, "Scale must be [...] less than or equal to the precision.", see https://avro.apache.org/docs/current/spec.html#Decimal, which is not the case (127 > 64). What happens is that the logical type (Decimal) is not validated when the schema is parsed and instead falls back to null. So, Spark doesn't see the avro logical type decimal and interprets it as a BinaryType instead of a DecimalType.
>
> To solve the issue, you could

Hi @kevinwallimann,

I get your point about changing the schema generation process, but this schema is generated by Confluent's Oracle CDC connector:
https://docs.confluent.io/kafka-connect-oracle-cdc/current/troubleshooting.html#numeric-data-type-with-no-precision-or-scale-results-in-unreadable-output

I ran a test using the .provideReaderSchema function, setting precision: 38 and scale: 10 in the schema:

```json
{
  "name": "TEST_NUMBER",
  "type": [
    "null",
    {
      "type": "bytes",
      "scale": 10,
      "precision": 38,
      "connect.version": 1,
      "connect.parameters": {"scale": "10"},
      "connect.name": "org.apache.kafka.connect.data.Decimal",
      "logicalType": "decimal"
    }
  ],
  "default": null
}
```

And it threw the following error: `Decimal precision 128 exceeds max precision 38`
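A quick way to see where the 128 comes from (a sketch, assuming the stored value is the 127-decimal-place sample shown below): a value with scale 127 plus one integer digit needs 128 significant digits, far beyond Spark's DecimalType maximum of 38.

```python
from decimal import Decimal

# Hypothetical reconstruction of the stored value: the digit 5 followed by
# 127 zeros after the decimal point, i.e. scale 127.
value = Decimal("5." + "0" * 127)
sign, digits, exponent = value.as_tuple()

# Scale is 127 and the total number of significant digits (the precision)
# is 128, which exceeds Spark's DecimalType maximum of 38.
print(-exponent)    # 127  (scale)
print(len(digits))  # 128  (precision)
```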

Finally, I think there should be some way to get the number with a scale of 127 and a precision of 64, perhaps as a string instead of a decimal. For example, I'm using the connector com.snowflake.kafka.connector.SnowflakeSinkConnector with io.confluent.connect.avro.AvroConverter as the value.converter, and in the Snowflake database that row is saved like this:

```json
{
  "STFAMPRO": "AA",
  "TEST_NUMBER": "5.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000",
  "TEST_NUMBER_DECIMAL": "5.1500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000",
  "current_ts": "1651059748133"
}
```

The problem is that I don't know how Snowflake implements the com.snowflake.kafka.connector.SnowflakeSinkConnector connector.

Thanks for your time!


**kevinwallimann** commented on September 25, 2024

Hi @joserivera1990 I see, the problem now is that your data has decimal precision 128 which is larger than the maximum that Spark supports (38). In this case, Spark uses the BinaryType as a fallback. You could try to convert the BinaryType to a human-readable format after it's already in a Spark Dataframe. I think this problem should be solved outside of Abris.
Just for the sake of completeness, there is a way to have your own custom logic to convert from Avro to a Spark Dataframe in Abris, see https://github.com/AbsaOSS/ABRiS#custom-data-conversions. However, it's quite involved and I wouldn't recommend you that approach.
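One possible shape for the "convert the BinaryType afterwards" suggestion (a sketch, not ABRiS or Spark API code; the UDF wiring is omitted): Avro encodes a decimal as the two's-complement big-endian bytes of the unscaled integer, so the raw binary column can be decoded with plain Python given the known scale.

```python
from decimal import Decimal

def avro_decimal_bytes_to_decimal(raw: bytes, scale: int) -> Decimal:
    # Avro stores a decimal as the two's-complement big-endian bytes of the
    # unscaled integer; divide by 10**scale to recover the value.
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)

# 0x0203 is the unscaled integer 515; with scale 2 this is 5.15.
print(avro_decimal_bytes_to_decimal(b"\x02\x03", 2))  # 5.15
```

A function like this could be registered as a Spark UDF over the BinaryType column, returning a string to sidestep the 38-digit precision cap.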


**joserivera1990** commented on September 25, 2024

> Hi @joserivera1990 I see, the problem now is that your data has decimal precision 128 which is larger than the maximum that Spark supports (38). In this case, Spark uses the BinaryType as a fallback. You could try to convert the BinaryType to a human-readable format after it's already in a Spark Dataframe. I think this problem should be solved outside of Abris. Just for the sake of completeness, there is a way to have your own custom logic to convert from Avro to a Spark Dataframe in Abris, see https://github.com/AbsaOSS/ABRiS#custom-data-conversions. However, it's quite involved and I wouldn't recommend you that approach.

Hi @kevinwallimann, I will review your advice. Thanks for your time!


**joserivera1990** commented on September 25, 2024

Hi everyone,
This issue was solved in version 2.0.0 of the Oracle CDC connector by adding the property numeric.mapping with the value best_fit_or_decimal.

https://docs.confluent.io/kafka-connect-oracle-cdc/current/configuration-properties.html
Explanation from the Confluent connector documentation:
> "Use best_fit_or_decimal if NUMERIC columns should be cast to Connect's primitive type based upon the column's precision and scale. If the precision and scale exceed the bounds for any primitive type, Connect's DECIMAL logical type will be used instead."

This way, when the Oracle column is NUMERIC without precision or scale, the schema registry registers the field as a double. The connector will only use the DECIMAL logical type if the value exceeds what the double type can represent.
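The trade-off of the double mapping can be sketched like this (assuming the 5.15 sample value from earlier in the thread): a double carries only about 15-17 significant decimal digits, which is more than enough here because the value itself is short, despite its 128-digit decimal representation.

```python
from decimal import Decimal

# The exact value the connector previously emitted as a 128-digit decimal.
exact = Decimal("5.15" + "0" * 125)

# Mapped to double via best_fit_or_decimal: the trailing zeros carry no
# information, so the conversion is effectively lossless for this value.
as_double = float(exact)
print(as_double)  # 5.15
```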

My new schema in the schema registry is (note that TEST_NUMBER and TEST_NUMBER_DECIMAL are now double):

```json
{"type":"record","name":"ConnectDefault","namespace":"io.confluent.connect.avro","fields":[{"name":"STFAMPRO","type":["null","string"],"default":null},{"name":"CHFAMPRO","type":["null","string"],"default":null},{"name":"TEST_NUMBER","type":["null","double"],"default":null},{"name":"TEST_NUMBER_DECIMAL","type":["null","double"],"default":null},{"name":"table","type":["null","string"],"default":null},{"name":"SCN_CMD","type":["null","string"],"default":null},{"name":"OP_TYPE_CMD","type":["null","string"],"default":null},{"name":"op_ts","type":["null","string"],"default":null},{"name":"current_ts","type":["null","string"],"default":null},{"name":"row_id","type":["null","string"],"default":null},{"name":"username","type":["null","string"],"default":null}]}
```
I will close this issue. Thanks everyone!

