
Comments (18)

wajda avatar wajda commented on August 17, 2024

It could be caused by a different event type sent by Spark when running in a Hive environment.
Can you reproduce it locally with Hive enabled?
Or, if you could connect a debugger to the Spark driver, put a breakpoint on this line: https://github.com/AbsaOSS/spline-spark-agent/blob/release/0.5.2/core/src/main/scala/za/co/absa/spline/harvester/builder/write/WriteCommandExtractor.scala#L42

def asWriteCommand(operation: LogicalPlan): Option[WriteCommand] = {
    val maybeWriteCommand = condOpt(operation) {
    ...

from spline-spark-agent.

wajda avatar wajda commented on August 17, 2024

Here Spline tries to figure out what kind of LogicalPlan instance it is holding, i.e. whether or not it represents some kind of write command. If it's not recognized as a known write command, the lineage is ignored, because Spline only captures lineage of persistent operations, where data is actually written to some persistent location. This is vital.
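
To illustrate the idea, a toy version of that recognition step could look like this (a sketch, not the agent's actual code; the matched command here is just one example):

import scala.PartialFunction.condOpt

import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand

// condOpt yields Some(result) only for plan nodes the partial function is
// defined for, and None for everything else, in which case lineage is skipped.
def recognizeWrite(operation: LogicalPlan): Option[String] =
  condOpt(operation) {
    case cmd: InsertIntoHadoopFsRelationCommand =>
      s"write to ${cmd.outputPath}"
    // ... other recognized write-command node types would be matched here ...
  }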

from spline-spark-agent.

DaimonPl avatar DaimonPl commented on August 17, 2024

I tried with Hive enabled locally (local Spark master) and it works just fine.

There's no way to connect a debugger to a live Spark job on the cluster; our setup does not allow it.

Would changing the Spline mode to REQUIRED generate a helpful exception message that would reveal the cause?
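
(For reference, I assume the mode can be switched via the "spline.mode" configuration property, whose current value the agent prints at startup; a minimal sketch under that assumption:)

// Assumption: "spline.mode" is read from JVM system properties among other
// configuration sources; the startup log below shows BEST_EFFORT as the
// current mode. Set the property before the SparkSession is created.
System.setProperty("spline.mode", "REQUIRED")
val spark = org.apache.spark.sql.SparkSession.builder().getOrCreate()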

from spline-spark-agent.

wajda avatar wajda commented on August 17, 2024

No, I don't think it's related. I bet it's some specific logical-plan node that only appears in that environment and is ignored as a non-write node.

Could you try adding some log statements to the method I pointed to above and see what happens? Hopefully it'll shed some light. Specifically, focus on the LogicalPlan instance that is passed into the asWriteCommand method: print its full class name and try to find out which library it belongs to.
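
For example, something like this could go at the top of that method (a sketch; the helper name is illustrative, not part of the agent):

import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Print the runtime class of each plan node passed to asWriteCommand, plus the
// jar it was loaded from, to identify which library the node type comes from.
def logPlanClass(operation: LogicalPlan): Unit = {
  val clazz  = operation.getClass
  val source = Option(clazz.getProtectionDomain.getCodeSource)
    .flatMap(cs => Option(cs.getLocation))
    .map(_.toString)
    .getOrElse("<unknown source>")
  println(s"asWriteCommand received: ${clazz.getName} (loaded from $source)")
}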

from spline-spark-agent.

DaimonPl avatar DaimonPl commented on August 17, 2024

@wajda Sure, I'll try to get some info, but I've already seen this PR: 0e0c549

Do you have a snapshot Maven repository available, so I could get it directly from there?

from spline-spark-agent.

wajda avatar wajda commented on August 17, 2024

No, we don't. But you can build it yourself; it's as simple as running ./build-all.sh. All you need is Java 1.8 and Maven.

from spline-spark-agent.

DaimonPl avatar DaimonPl commented on August 17, 2024

Thanks, I managed to build it locally and run the job. It's "strange" because I can clearly see that Spline reports it harvested lineage from a command that was supposed to be harvested:

Successfully harvested lineage from class org.apache.spark.sql.hive.execution.InsertIntoHiveTable

Still, there's no lineage in the Spline UI.

Here's the full log from Spline:

2020-05-27 11:33:10.435 | class org.apache.spark.sql.execution.command.AlterTableSetPropertiesCommand was not recognized as a write-command. Skipping.
2020-05-27 11:33:10.434 | Harvesting lineage from class org.apache.spark.sql.execution.command.AlterTableSetPropertiesCommand
2020-05-27 11:33:07.857 | Successfully harvested lineage from class org.apache.spark.sql.hive.execution.InsertIntoHiveTable
2020-05-27 11:33:07.843 | Harvesting lineage from class org.apache.spark.sql.hive.execution.InsertIntoHiveTable
2020-05-27 11:32:59.284 | Harvesting lineage from class org.apache.spark.sql.execution.command.AlterTableSetPropertiesCommand
2020-05-27 11:32:59.284 | class org.apache.spark.sql.execution.command.AlterTableSetPropertiesCommand was not recognized as a write-command. Skipping.
2020-05-27 11:32:58.028 | Successfully harvested lineage from class org.apache.spark.sql.hive.execution.InsertIntoHiveTable
2020-05-27 11:32:58.000 | Harvesting lineage from class org.apache.spark.sql.hive.execution.InsertIntoHiveTable
2020-05-27 11:32:51.158 | Successfully harvested lineage from class org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand
2020-05-27 11:32:51.133 | Harvesting lineage from class org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand
2020-05-27 11:32:45.584 | Successfully harvested lineage from class org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand
2020-05-27 11:32:45.300 | Harvesting lineage from class org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand
2020-05-27 11:32:42.302 | Harvesting lineage from class org.apache.spark.sql.catalyst.plans.logical.Aggregate
2020-05-27 11:32:42.302 | class org.apache.spark.sql.catalyst.plans.logical.Aggregate was not recognized as a write-command. Skipping.
2020-05-27 11:32:41.764 | Harvesting lineage from class org.apache.spark.sql.catalyst.plans.logical.Aggregate
2020-05-27 11:32:41.764 | class org.apache.spark.sql.catalyst.plans.logical.Aggregate was not recognized as a write-command. Skipping.
2020-05-27 11:32:40.637 | class org.apache.spark.sql.catalyst.plans.logical.GlobalLimit was not recognized as a write-command. Skipping.
2020-05-27 11:32:40.636 | Harvesting lineage from class org.apache.spark.sql.catalyst.plans.logical.GlobalLimit
2020-05-27 11:32:40.615 | Harvesting lineage from class org.apache.spark.sql.execution.command.DescribeTableCommand
2020-05-27 11:32:40.615 | class org.apache.spark.sql.execution.command.DescribeTableCommand was not recognized as a write-command. Skipping.
2020-05-27 11:32:40.411 | class org.apache.spark.sql.catalyst.plans.logical.GlobalLimit was not recognized as a write-command. Skipping.
2020-05-27 11:32:40.411 | Harvesting lineage from class org.apache.spark.sql.catalyst.plans.logical.GlobalLimit
2020-05-27 11:32:40.371 | class org.apache.spark.sql.execution.command.DescribeTableCommand was not recognized as a write-command. Skipping.
2020-05-27 11:32:40.371 | Harvesting lineage from class org.apache.spark.sql.execution.command.DescribeTableCommand
2020-05-27 11:32:39.929 | class org.apache.spark.sql.execution.datasources.RefreshTable was not recognized as a write-command. Skipping.
2020-05-27 11:32:39.929 | Harvesting lineage from class org.apache.spark.sql.execution.datasources.RefreshTable
2020-05-27 11:32:39.929 | class org.apache.spark.sql.execution.datasources.RefreshTable was not recognized as a write-command. Skipping.
2020-05-27 11:32:39.929 | Harvesting lineage from class org.apache.spark.sql.execution.datasources.RefreshTable
2020-05-27 11:32:34.686 | class org.apache.spark.sql.catalyst.plans.logical.LocalRelation was not recognized as a write-command. Skipping.
2020-05-27 11:32:34.686 | Harvesting lineage from class org.apache.spark.sql.catalyst.plans.logical.LocalRelation
2020-05-27 11:32:34.675 | Harvesting lineage from class org.apache.spark.sql.execution.command.ShowCreateTableCommand
2020-05-27 11:32:34.675 | class org.apache.spark.sql.execution.command.ShowCreateTableCommand was not recognized as a write-command. Skipping.
2020-05-27 11:32:34.528 | Harvesting lineage from class org.apache.spark.sql.catalyst.plans.logical.LocalRelation
2020-05-27 11:32:34.528 | class org.apache.spark.sql.catalyst.plans.logical.LocalRelation was not recognized as a write-command. Skipping.
2020-05-27 11:32:34.516 | class org.apache.spark.sql.execution.command.ShowCreateTableCommand was not recognized as a write-command. Skipping.
2020-05-27 11:32:34.515 | Harvesting lineage from class org.apache.spark.sql.execution.command.ShowCreateTableCommand
2020-05-27 11:32:33.182 | Harvesting lineage from class org.apache.spark.sql.catalyst.plans.logical.LocalRelation
2020-05-27 11:32:33.182 | class org.apache.spark.sql.catalyst.plans.logical.LocalRelation was not recognized as a write-command. Skipping.
2020-05-27 11:32:33.172 | class org.apache.spark.sql.execution.command.ShowCreateTableCommand was not recognized as a write-command. Skipping.
2020-05-27 11:32:33.171 | Harvesting lineage from class org.apache.spark.sql.execution.command.ShowCreateTableCommand
2020-05-27 11:32:33.091 | Harvesting lineage from class org.apache.spark.sql.catalyst.plans.logical.LocalRelation
2020-05-27 11:32:33.091 | class org.apache.spark.sql.catalyst.plans.logical.LocalRelation was not recognized as a write-command. Skipping.
2020-05-27 11:32:33.039 | class org.apache.spark.sql.execution.command.ShowCreateTableCommand was not recognized as a write-command. Skipping.
2020-05-27 11:32:33.039 | Harvesting lineage from class org.apache.spark.sql.execution.command.ShowCreateTableCommand
2020-05-27 11:32:32.037 | class org.apache.spark.sql.catalyst.plans.logical.Aggregate was not recognized as a write-command. Skipping.
2020-05-27 11:32:31.758 | Harvesting lineage from class org.apache.spark.sql.catalyst.plans.logical.Aggregate
2020-05-27 11:32:31.732 | Spline successfully initialized. Spark Lineage tracking is ENABLED.
2020-05-27 11:32:31.731 | Selected Producer API version: 1
2020-05-27 11:32:31.661 | readTimeout = 30000 milliseconds
2020-05-27 11:32:31.660 | connectionTimeout = 30000 milliseconds
2020-05-27 11:32:31.660 | baseURL = http://my-domain.com:9090/producer
2020-05-27 11:32:31.649 | Instantiating LineageDispatcher for class name: za.co.absa.spline.harvester.dispatcher.HttpLineageDispatcher
2020-05-27 11:32:31.648 | Instantiating UserExtraMetadataProvider for class name: za.co.absa.spline.harvester.extra.NoopUserExtraMetaDataProvider
2020-05-27 11:32:31.646 | Instantiating IgnoredWriteDetectionStrategy for class name: za.co.absa.spline.harvester.iwd.DefaultIgnoredWriteDetectionStrategy
2020-05-27 11:32:31.644 | Spline mode: BEST_EFFORT
2020-05-27 11:32:31.643 | Spline version: null
2020-05-27 11:32:31.631 | Initializing Spline agent...

from spline-spark-agent.

DaimonPl avatar DaimonPl commented on August 17, 2024

Oh, I found a different log :) It was the org.apache.spark.sql.util.ExecutionListenerManager logger:

java.lang.RuntimeException: Cannot send lineage data to http://my-spline:9090/producer/execution-plans
	at za.co.absa.spline.harvester.dispatcher.HttpLineageDispatcher.sendJson(HttpLineageDispatcher.scala:152)
	at za.co.absa.spline.harvester.dispatcher.HttpLineageDispatcher.send(HttpLineageDispatcher.scala:134)
	at za.co.absa.spline.harvester.QueryExecutionEventHandler.$anonfun$onSuccess$2(QueryExecutionEventHandler.scala:45)
	at za.co.absa.spline.harvester.QueryExecutionEventHandler.$anonfun$onSuccess$2$adapted(QueryExecutionEventHandler.scala:43)
	at scala.Option.foreach(Option.scala:407)
	at za.co.absa.spline.harvester.QueryExecutionEventHandler.onSuccess(QueryExecutionEventHandler.scala:43)
	at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.$anonfun$onSuccess$1(SplineQueryExecutionListener.scala:37)
	at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.$anonfun$onSuccess$1$adapted(SplineQueryExecutionListener.scala:37)
	at scala.Option.foreach(Option.scala:407)
	at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.onSuccess(SplineQueryExecutionListener.scala:37)
	at org.apache.spark.sql.util.ExecutionListenerManager.$anonfun$onSuccess$2(QueryExecutionListener.scala:127)
	at org.apache.spark.sql.util.ExecutionListenerManager.$anonfun$onSuccess$2$adapted(QueryExecutionListener.scala:126)
	at org.apache.spark.sql.util.ExecutionListenerManager.$anonfun$withErrorHandling$1(QueryExecutionListener.scala:148)
	at org.apache.spark.sql.util.ExecutionListenerManager.$anonfun$withErrorHandling$1$adapted(QueryExecutionListener.scala:146)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at scala.collection.generic.TraversableForwarder.foreach(TraversableForwarder.scala:38)
	at scala.collection.generic.TraversableForwarder.foreach$(TraversableForwarder.scala:38)
	at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:47)
	at org.apache.spark.sql.util.ExecutionListenerManager.withErrorHandling(QueryExecutionListener.scala:146)
	at org.apache.spark.sql.util.ExecutionListenerManager.$anonfun$onSuccess$1(QueryExecutionListener.scala:126)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.sql.util.ExecutionListenerManager.readLock(QueryExecutionListener.scala:159)
	at org.apache.spark.sql.util.ExecutionListenerManager.onSuccess(QueryExecutionListener.scala:126)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:678)
	at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:340)
	at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:320)
	(***)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
	at scala.util.Success.$anonfun$map$1(Try.scala:255)
	at scala.util.Success.map(Try.scala:213)
	at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
	at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
	at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: scalaj.http.HttpStatusException: 400 Error: HTTP/1.1 400
	at scalaj.http.HttpResponse.throwIf(Http.scala:156)
	at scalaj.http.HttpResponse.throwError(Http.scala:168)
	at za.co.absa.spline.harvester.dispatcher.HttpLineageDispatcher.sendJson(HttpLineageDispatcher.scala:147)
	... 42 more

I'm using spline-rest-server 0.5.1 with the 0.6.0-SNAPSHOT agent now.

from spline-spark-agent.

DaimonPl avatar DaimonPl commented on August 17, 2024

Unfortunately, there's nothing helpful in the Docker container's Catalina logs:

10.73.161.130 - - [27/May/2020:09:32:22 +0000] "HEAD /producer/status HTTP/1.1" 200 -
10.73.66.119 - - [27/May/2020:09:32:31 +0000] "HEAD /producer/status HTTP/1.1" 200 -
10.73.66.119 - - [27/May/2020:09:32:49 +0000] "POST /producer/execution-plans HTTP/1.1" 400 284
10.73.66.119 - - [27/May/2020:09:32:51 +0000] "POST /producer/execution-plans HTTP/1.1" 400 284
10.73.66.119 - - [27/May/2020:09:32:58 +0000] "POST /producer/execution-plans HTTP/1.1" 400 284
10.73.66.119 - - [27/May/2020:09:33:07 +0000] "POST /producer/execution-plans HTTP/1.1" 400 284
172.29.1.84 - - [27/May/2020:09:34:01 +0000] "GET /consumer/execution-events?sortOrder=desc&sortField=timestamp&pageNum=1&asAtTime=1590572040939 HTTP/1.1" 200 2651
10.73.161.130 - - [27/May/2020:09:34:09 +0000] "POST /producer/execution-plans HTTP/1.1" 400 284
10.73.161.130 - - [27/May/2020:09:34:09 +0000] "POST /producer/execution-plans HTTP/1.1" 400 284
10.73.161.130 - - [27/May/2020:09:34:29 +0000] "POST /producer/execution-plans HTTP/1.1" 400 284
10.73.161.130 - - [27/May/2020:09:34:34 +0000] "POST /producer/execution-plans HTTP/1.1" 400 284
172.29.1.84 - - [27/May/2020:09:34:55 +0000] "GET /consumer/execution-events?sortOrder=desc&sortField=timestamp&pageNum=1&asAtTime=1590572095845 HTTP/1.1" 200 2651
172.29.1.84 - - [27/May/2020:09:35:49 +0000] "GET /consumer/execution-events?sortOrder=desc&sortField=timestamp&pageNum=1&asAtTime=1590572149286 HTTP/1.1" 200 2651
172.29.1.84 - - [27/May/2020:09:38:14 +0000] "GET /consumer/execution-events?sortOrder=desc&sortField=timestamp&pageNum=1&asAtTime=1590572294214 HTTP/1.1" 200 2651
172.29.1.84 - - [27/May/2020:10:03:47 +0000] "GET /consumer/execution-events?sortOrder=desc&sortField=timestamp&pageNum=1&asAtTime=1590573827527 HTTP/1.1" 200 2651
172.29.1.84 - - [27/May/2020:10:04:01 +0000] "GET /consumer/execution-events?sortOrder=desc&sortField=timestamp&pageNum=1&asAtTime=1590573841481 HTTP/1.1" 200 2651
172.29.1.84 - - [27/May/2020:10:14:23 +0000] "GET /consumer/execution-events?sortOrder=desc&sortField=timestamp&pageNum=1&asAtTime=1590574463625 HTTP/1.1" 200 2651

from spline-spark-agent.

wajda avatar wajda commented on August 17, 2024

Hm, that's interesting! The logs were helpful though :)

from spline-spark-agent.

DaimonPl avatar DaimonPl commented on August 17, 2024

@wajda Does spline-rest-server return anything in the HTTP response? Could we add logging to Spline to record the exact cause of a failure to send lineage data?

from spline-spark-agent.

wajda avatar wajda commented on August 17, 2024

From what I see, the error is most likely caused by incorrectly deserialized data (HTTP 400). Also, the payload size (if that's a payload size displayed after the 400) looks way too small to me. So I'd check for data corruption.

Regarding the logs and the response: normally, if an error happens on the server, it is both logged (with a unique ID) and returned in the response body to the client.
Try checking the server logs (catalina.out). And also try to sniff the request if you can, to see what is actually sent to and returned from the server.
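
For example, a quick way to surface the server's error message could look like this (a sketch assuming scalaj-http, which the stack trace above shows the agent already uses; the helper is illustrative):

import scalaj.http.Http

// POST a payload to the producer endpoint and print the body of any non-2xx
// response instead of only throwing, so the server's error message is visible.
def postAndInspect(url: String, json: String): Unit = {
  val response = Http(url)
    .postData(json)
    .header("Content-Type", "application/json")
    .asString
  if (!response.is2xx)
    println(s"Producer returned HTTP ${response.code}: ${response.body}")
}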

from spline-spark-agent.

wajda avatar wajda commented on August 17, 2024

Another thought: since version 0.5 the agent negotiates data encoding with the server, and gzips the data if the server supports compressed requests. In this case the agent adds a "Content-Encoding" header to the request. Unfortunately, request encoding isn't quite an HTTP standard. So what could hypothetically happen is that if the request passes through a proxy that is very strict in terms of the HTTP spec and removes non-compliant headers from the request, the body would not be correctly decoded on the server.
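
Schematically, the scenario is something like this (a sketch for illustration only; the agent's actual header handling may differ):

import java.io.ByteArrayOutputStream
import java.util.zip.GZIPOutputStream

import scalaj.http.Http

// Gzip the JSON body and mark it with "Content-Encoding: gzip". If a strict
// proxy strips this non-standard request header, the server receives a
// compressed body with no hint that it needs decompressing.
def gzippedPost(url: String, json: String): Int = {
  val buf = new ByteArrayOutputStream()
  val gz  = new GZIPOutputStream(buf)
  gz.write(json.getBytes("UTF-8"))
  gz.close()
  Http(url)
    .postData(buf.toByteArray)
    .header("Content-Type", "application/json")
    .header("Content-Encoding", "gzip")
    .asString
    .code
}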

from spline-spark-agent.

wajda avatar wajda commented on August 17, 2024

To exclude this case, could you try Spline 0.4 on the server? If it works, then the problem would be in the broken encoding.

from spline-spark-agent.

DaimonPl avatar DaimonPl commented on August 17, 2024

There's no HTTP proxy in between.

I'll check tomorrow with some additional logs.

from spline-spark-agent.

DaimonPl avatar DaimonPl commented on August 17, 2024

@wajda I added some logs to the Spark agent and managed to get the cause:

{"error":"JSON parse error: ; nested exception is com.twitter.finatra.json.internal.caseclass.exceptions.CaseClassMappingException: nErrors:ttcom.twitter.finatra.json.internal.caseclass.exceptions.CaseClassValidationException: agentInfo.version: field is requirednn"}

So it's probably directly related to #75.

from spline-spark-agent.

DaimonPl avatar DaimonPl commented on August 17, 2024

@wajda I wanted to test your branch, but it looks like it already contains the version fix, so I can't really verify the logging; in any case, my problem no longer occurs :)

2020-05-29 12:57:56.391 | Spline version: 0.5.3-SNAPSHOT

from spline-spark-agent.

DaimonPl avatar DaimonPl commented on August 17, 2024

@wajda Looks like the logging works OK. Here are the details of the other problem I found: #86

from spline-spark-agent.
