openmetadata-spark-agent's Issues

Support Iceberg catalog of type Glue

The agent does not support Glue-backed Iceberg catalogs and fails with:

24/04/23 12:17:37 ERROR PlanUtils3: Catalog glue is unsupported
io.openlineage.spark3.agent.lifecycle.plan.catalog.UnsupportedCatalogException: glue
	at io.openlineage.spark3.agent.lifecycle.plan.catalog.IcebergHandler.getDatasetIdentifier(IcebergHandler.java:83)
	at io.openlineage.spark3.agent.lifecycle.plan.catalog.CatalogUtils3.lambda$getDatasetIdentifier$2(CatalogUtils3.java:61)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
	at java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1361)

Here, glue is the name of a catalog whose type is glue.

This catalog type is supported in openlineage-spark starting from 1.8.0.

Data Lineage Not Added for New Table Creation in Spark

Description:
When a new Impala table is created in Spark from existing Impala tables, no data lineage is added. It seems that the toEntity in the code doesn't retrieve any information for the new table.

Expected Behavior:
In general, when the target is a new table, I would expect the table to be added to the database service (Impala here) in OpenMetadata, and data lineage to be added automatically for the new Impala table created in Spark, so that the metadata lineage is complete.

Actual Behavior:
No data lineage is added for the new table, resulting in incomplete metadata lineage.

Additional Information:

  • Environment: YARN
  • OpenMetadata Configs: 1.3x
  • openmetadata-spark-agent: 1.0.0-beta
  • Spark Version: 3.2.3
  • Impala Version: 4.1.1

Infinite recursion error

df1 = spark.sql("select ....");
df1.createOrReplaceTempView("view1");
df2 = spark.sql("select c1 from view1");

This program fails with an infinite recursion error:
org.apache.spark.sql.catalyst.expressions.AttributeReference->org.apache.spark.sql.catalyst.expressions.AttributeReference["canonicalized"]

Data lineage between different database services is not showing

We have OpenMetadata set up on an AKS cluster using the Helm chart, with Databricks and Azure SQL database services connected. When we create a table on Databricks from a table in Azure SQL, the data lineage should be azure_sql_table --> databricks_table, but this lineage does not appear. We only see lineage between tables on Databricks, such as databricks_table1 --> databricks_table2. We tried creating the tables with openmetadata-spark-agent as well as openmetadata-spark-agent-1.0-beta, and neither gives the expected result.

Jar Files

Please provide the jar files for this project.

Improve column level lineage

spark.sql("insert into table1 (col1) select concat(col2, col3) from table2 limit 100").show()

This query only generates column-level lineage between col1 and col2; col3 is missing from the lineage.
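
To make the expected result concrete, here is a minimal sketch of the lineage one would expect for this query. It is only an illustration: the helper source_columns and the SQL_FUNCTIONS set are hypothetical names introduced here, not part of openmetadata-spark-agent or OpenLineage, and the regex only handles simple expressions like the one in the report.

```python
import re

# Hypothetical illustration only; this is not how the agent computes lineage.
# Function names to ignore when collecting column references.
SQL_FUNCTIONS = {"concat"}


def source_columns(expression):
    """Return the set of identifiers referenced in a simple SQL expression."""
    identifiers = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", expression)
    return {name for name in identifiers if name.lower() not in SQL_FUNCTIONS}


# Target column -> expression that populates it, taken from the query above.
insert_mapping = {"col1": "concat(col2, col3)"}

# Every column referenced by the expression should appear as an upstream column.
expected_lineage = {
    target: sorted(source_columns(expr)) for target, expr in insert_mapping.items()
}
print(expected_lineage)  # {'col1': ['col2', 'col3']}
```

In other words, col1 should have both col2 and col3 as upstream columns, whereas the agent currently reports only col2.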
