Monitoring Azure Databricks in an Azure Log Analytics Workspace
❗
This branch of the library supports Azure Databricks Runtimes 10.x (Spark 3.2.x) and earlier (see Supported configurations). Databricks has contributed an updated version to support Azure Databricks Runtimes 11.0 (Spark 3.3.x) and above on the l4jv2 branch at: https://github.com/mspnp/spark-monitoring/tree/l4jv2. Be sure to use the correct branch and version for your Databricks Runtime.
⚠️
This library and GitHub repository are in maintenance mode. There are no plans for further releases, and issue support will be best-effort only. For any additional questions regarding this library or the roadmap for monitoring and logging of your Azure Databricks environments, please contact [email protected].
This repository extends the core monitoring functionality of Azure Databricks to send streaming query event information to Azure Monitor. For more information about using this library to monitor Azure Databricks, see Monitoring Azure Databricks.
The project has the following directory structure:
The spark-listeners-loganalytics and spark-listeners directories contain the code for building the two JAR files that are deployed to the Databricks cluster. The spark-listeners directory includes a scripts directory that contains a cluster node initialization script to copy the JAR files from a staging directory in the Azure Databricks file system to execution nodes. The pom.xml file is the main Maven project object model build file for the entire project.
The spark-sample-job directory is a sample Spark application demonstrating how to implement a Spark application metric counter.
The perftools directory contains details on how to use Azure Monitor with Grafana to monitor Spark performance.
Prerequisites
Before you begin, ensure you have the following prerequisites in place:
Clone or download this GitHub repository.
An active Azure Databricks workspace. For instructions on how to deploy an Azure Databricks workspace, see get started with Azure Databricks.
This library currently has a size limit per event of 25MB, based on the Log Analytics limit of 30MB per API Call with additional overhead for formatting. The default behavior when hitting this limit is to throw an exception. This can be changed by modifying the value of EXCEPTION_ON_FAILED_SEND in GenericSendBuffer.java to false.
Note: You will see an error like: java.lang.RuntimeException: Failed to schedule batch because first message size nnn exceeds batch size limit 26214400 (bytes). in the Spark logs if your workload is generating logging messages of greater than 25MB, and your workload may not proceed. You can query Log Analytics for this error condition with:
SparkLoggingEvent_CL
| where TimeGenerated > ago(24h)
| where Message contains "java.lang.RuntimeException: Failed to schedule batch because first message size"
Build the Azure Databricks monitoring library
You can build the library using either Docker or Maven. All commands are intended to be run from the base directory of the repository.
The jar files that will be produced are:
spark-listeners_<Spark Version>_<Scala Version>-<Version>.jar - This is the generic implementation of the Spark Listener framework that provides capability for collecting data from the running cluster for forwarding to another logging system.
spark-listeners-loganalytics_<Spark Version>_<Scala Version>-<Version>.jar - This is the specific implementation that extends spark-listeners. This project provides the implementation for connecting to Log Analytics and formatting and passing data via the Log Analytics API.
Option 1: Docker
Linux:
# To build all profiles:
docker run -it --rm -v `pwd`:/spark-monitoring -v "$HOME/.m2":/root/.m2 mcr.microsoft.com/java/maven:8-zulu-debian10 /spark-monitoring/build.sh
# To build a single profile (example for latest long term support version 10.4 LTS):
docker run -it --rm -v `pwd`:/spark-monitoring -v "$HOME/.m2":/root/.m2 -w /spark-monitoring/src mcr.microsoft.com/java/maven:8-zulu-debian10 mvn install -P "scala-2.12_spark-3.2.1"
Windows:
# To build all profiles:
docker run -it --rm -v %cd%:/spark-monitoring -v "%USERPROFILE%/.m2":/root/.m2 mcr.microsoft.com/java/maven:8-zulu-debian10 /spark-monitoring/build.sh
# To build a single profile (example for latest long term support version 10.4 LTS):
docker run -it --rm -v %cd%:/spark-monitoring -v "%USERPROFILE%/.m2":/root/.m2 -w /spark-monitoring/src mcr.microsoft.com/java/maven:8-zulu-debian10 mvn install -P "scala-2.12_spark-3.2.1"
Option 2: Maven
Import the Maven project object model file, pom.xml, located in the /src folder into your project. This will import two projects:
spark-listeners
spark-listeners-loganalytics
Activate a single Maven profile that corresponds to the versions of the Scala/Spark combination that is being used. By default, the Scala 2.12 and Spark 3.0.1 profile is active.
Execute the Maven package phase in your Java IDE to build the JAR files for each of these projects:
If you do not want to add your Log Analytics workspace id and key into the init script in plaintext, you can also create an Azure Key Vault backed secret scope and reference those secrets through your cluster's environment variables.
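For example, a cluster environment variable can reference a secret using the {{secrets/&lt;scope&gt;/&lt;key&gt;}} syntax. The scope and key names below are hypothetical, and the variable names should be checked against what spark-monitoring.sh expects:

```shell
# Hypothetical secret scope and key names -- create them with your own values first.
# Set these in the cluster's "Environment Variables" field under Advanced Options.
LOG_ANALYTICS_WORKSPACE_ID={{secrets/monitoring-scope/workspace-id}}
LOG_ANALYTICS_WORKSPACE_KEY={{secrets/monitoring-scope/workspace-key}}
```

With this in place, the workspace id and key never appear in plaintext in the init script or cluster configuration.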
To add the x-ms-AzureResourceId header to each HTTP request, modify the following environment variables in /src/spark-listeners/scripts/spark-monitoring.sh.
For instance:
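(The variable names below are assumptions; verify them against your copy of spark-monitoring.sh.)

```shell
# Assumed variable names from spark-monitoring.sh -- check your version of the script.
export AZ_SUBSCRIPTION_ID="11111111-5c17-4032-ae54-fc33d56047c2"
export AZ_RSRC_GRP_NAME="myAzResourceGroup"
export AZ_RSRC_PROV_NAMESPACE="Microsoft.Databricks"
export AZ_RSRC_TYPE="workspaces"
export AZ_RSRC_NAME="myDatabricks"

# The script assembles the x-ms-AzureResourceId header value from these parts:
AZ_RESOURCE_ID="/subscriptions/${AZ_SUBSCRIPTION_ID}/resourceGroups/${AZ_RSRC_GRP_NAME}/providers/${AZ_RSRC_PROV_NAMESPACE}/${AZ_RSRC_TYPE}/${AZ_RSRC_NAME}"
echo "${AZ_RESOURCE_ID}"
```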
Now the _ResourceId /subscriptions/11111111-5c17-4032-ae54-fc33d56047c2/resourceGroups/myAzResourceGroup/providers/Microsoft.Databricks/workspaces/myDatabricks will be part of the header.
(Note: if any of these variables is not set, the header won't be included.)
Use the Azure Databricks CLI to copy src/spark-listeners/scripts/spark-monitoring.sh to the directory created in step 3:
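For example, with the (legacy) Databricks CLI configured, something like the following should work; the DBFS path matches the one referenced later in the cluster init script configuration:

```shell
# Create the staging directory and copy the init script to DBFS.
dbfs mkdirs dbfs:/databricks/spark-monitoring
dbfs cp --overwrite src/spark-listeners/scripts/spark-monitoring.sh dbfs:/databricks/spark-monitoring/spark-monitoring.sh
```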
Navigate to your Azure Databricks workspace in the Azure Portal.
Under "Compute", click "Create Cluster".
Choose a name for your cluster and enter it in the "Cluster name" text box.
In the "Databricks Runtime Version" dropdown, select Runtime: 10.4 LTS (Scala 2.12, Spark 3.2.1).
Under "Advanced Options", click the "Init Scripts" tab. Go to the last line under the "Init Scripts" section. In the "Destination" dropdown, select "DBFS", enter "dbfs:/databricks/spark-monitoring/spark-monitoring.sh" in the text box, and click the "Add" button.
Click the "Create Cluster" button to create the cluster. Next, click on the "start" button to start the cluster.
Run the sample job (optional)
The repository includes a sample application that shows how to send application metrics and application logs to Azure Monitor.
When building the sample job, specify a Maven profile compatible with your Databricks runtime from the supported configurations section.
Use Maven to build the POM located at sample/spark-sample-job/pom.xml or run the following Docker command:
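By analogy with the library build commands above, a sketch of the Docker invocation (the working directory and profile are assumptions; pick the profile matching your runtime):

```shell
# Build the sample job with the same Maven image used for the library build.
docker run -it --rm -v `pwd`:/spark-monitoring -v "$HOME/.m2":/root/.m2 \
  -w /spark-monitoring/sample/spark-sample-job \
  mcr.microsoft.com/java/maven:8-zulu-debian10 \
  mvn install -P "scala-2.12_spark-3.2.1"
```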
Navigate to your Databricks workspace and create a new job, as described here.
In the job detail page, set Type to JAR.
For Main class, enter com.microsoft.pnp.samplejob.StreamingQueryListenerSampleJob.
Upload the JAR file from /src/spark-jobs/target/spark-jobs-1.0-SNAPSHOT.jar in the Dependent Libraries section.
Select the cluster you created previously in the Cluster section.
Select Create.
Click the Run Now button to launch the job.
When the job runs, you can view the application logs and metrics in your Log Analytics workspace. After you verify the metrics appear, stop the sample application job.
Viewing the Sample Job's Logs in Log Analytics
After your sample job has run for a few minutes, you should be able to query for
these event types in Log Analytics:
SparkListenerEvent_CL
This custom log will contain Spark events that are serialized to JSON. You can limit the volume of events in this log with filtering. If filtering is not employed, this can be a large volume of data.
Note: There is a known issue when the Spark framework or workload generates events that have more than 500 fields, or where data for an individual field is larger than 32kb. Log Analytics will generate an error indicating that data has been dropped. This is an incompatibility between the data being generated by Spark, and the current limitations of the Log Analytics API.
Example for querying SparkListenerEvent_CL for job throughput over the last 7 days:
let results = SparkListenerEvent_CL
| where TimeGenerated > ago(7d)
| where Event_s == "SparkListenerJobStart"
| extend metricsns = column_ifexists("Properties_spark_metrics_namespace_s", Properties_spark_app_id_s)
| extend apptag = iif(isnotempty(metricsns), metricsns, Properties_spark_app_id_s)
| project Job_ID_d, apptag, Properties_spark_databricks_clusterUsageTags_clusterName_s, TimeGenerated
| order by TimeGenerated asc nulls last
| join kind= inner (
    SparkListenerEvent_CL
    | where Event_s == "SparkListenerJobEnd"
    | where Job_Result_Result_s == "JobSucceeded"
    | project Event_s, Job_ID_d, TimeGenerated
) on Job_ID_d;
results
| extend slice = strcat("#JobsCompleted ", Properties_spark_databricks_clusterUsageTags_clusterName_s, "-", apptag)
| summarize count() by bin(TimeGenerated, 1h), slice
| order by TimeGenerated asc nulls last
SparkLoggingEvent_CL
This custom log will contain data forwarded from Log4j (the standard logging system in Spark). The volume of logging can be controlled by altering the level of logging to forward or with filtering.
Example for querying SparkLoggingEvent_CL for logged errors over the last day:
SparkLoggingEvent_CL
| where TimeGenerated > ago(1d)
| where Level == "ERROR"
The library is configurable to limit the volume of logs that are sent to each of the different Azure Monitor log types. See filtering for more details.
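As a sketch, filtering is driven by environment variables set alongside the init script. The variable names below are assumptions and should be checked against the filtering documentation before use:

```shell
# Assumed variable names -- verify against the filtering docs.
# Regexes select which event/logger names are forwarded to each custom log type.
export LA_SPARKLISTENEREVENT_REGEX="SparkListenerJobStart|SparkListenerJobEnd"
export LA_SPARKLOGGINGEVENT_NAME_REGEX="com\.mycompany\..*"
```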
Debugging
If you encounter any issues with the init script, you can refer to the docs on debugging.
It is not quite clear in the docs how to get Python notebooks to log to Log Analytics.
I have read about the two options mentioned in #28.
The reason it works in the sample is because we have configured log4j to log from our sample job package. To use this from a Databricks Notebook, you will need to do the same. There are a couple of options.
You can configure the whole cluster to log to Log Analytics, which will include notebooks
You can include the code below in every Databricks Notebook. It should work for any notebook because it pulls the class name from the notebook when it runs, but this is lightly tested, so YMMV.
Option A: is it simply that the init script does this? Or do I need to configure something extra to have all notebooks send logs to Log Analytics?
Option B: the code is Scala, but what about Python-based notebooks?
The readme says that I should be able to open the sample project's POM located at sample/spark-sample-job/pom.xml and build it using Maven.
I don't have much experience with Java and Maven. I'm using IntelliJ IDEA. I'm able to build the other projects, but the sample one says: Cannot resolve com.microsoft.pnp:spark-listeners_2.11_2.4.3:1.0.0
Looking in the Maven repository there isn't a package by that name so I'm assuming this should come from project in /src.
The sample pom.xml has this but I'm not sure how to tell it where to find that dependency
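One likely fix (an assumption based on the build instructions earlier in this document): install the library projects into your local Maven repository first, so the sample's dependency resolves from there:

```shell
# Build and install the library projects into ~/.m2 first,
# using the profile that matches your runtime, then build the sample.
cd src
mvn install -P "scala-2.12_spark-3.2.1"
cd ../sample/spark-sample-job
mvn package -P "scala-2.12_spark-3.2.1"
```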
I built the library and tried it out in our environment for a few days. The cluster ran only a few hours per day, but I'm receiving a huge amount of log data in the Azure Log Analytics workspace. I assume it can be configured to forward only the chosen logs. Could anybody help me out?
So the sample project is set up to send logs to three different custom logs: SparkListenerEvent_CL, SparkLoggingEvent_CL, and SparkMetric_CL. These are mostly for internal Spark logs and metrics.
What we need is to set up our own logger class that will write our custom application logs (by calling logInfo, logWarning, logError, etc. throughout our application).
We've extended the sample project to include the following class:
package com.microsoft.pnp.samplejob

import java.util

import com.microsoft.pnp.logging.{Log4jConfiguration, MDCCloseableFactory}
import com.microsoft.pnp.util.TryWith
import org.apache.spark.internal.Logging

import scala.util.Try

case class LogAnalyticsLogger(runId: String, sqlServer: String, storageRoot: String) extends Logging {

  // Configure our logging
  TryWith(getClass.getResourceAsStream("/com/microsoft/pnp/samplejob/log4j.properties")) {
    stream => {
      Log4jConfiguration.configure(stream)
    }
  }

  // Create a HashMap to add the custom properties that will be added to the event.
  val context = new util.HashMap[String, AnyRef]()
  context.put("runId", runId)
  context.put("sqlServer", sqlServer)
  context.put("storageRoot", storageRoot)

  // Create an MDC factory that will add the properties to the event.
  // This class is not thread-safe, so wrap it in a TryWith and don't make it live longer than needed.
  val mdcFactory: MDCCloseableFactory = new MDCCloseableFactory()
  TryWith(mdcFactory.create(context))(
    _ => {
      logInfo("Initializing logger with context")
      logWarning("Warning logger with context")
      logError("Error logger with context")
    }
  )

  def info(msg: String): Try[Unit] = {
    val mdcFactory: MDCCloseableFactory = new MDCCloseableFactory()
    TryWith(mdcFactory.create(context))(
      _ => {
        logInfo(msg)
      }
    )
  }

  def warning(msg: String): Try[Unit] = {
    val mdcFactory: MDCCloseableFactory = new MDCCloseableFactory()
    TryWith(mdcFactory.create(context))(
      _ => {
        logWarning(msg)
      }
    )
  }

  def error(msg: String, ex: Throwable): Try[Unit] = {
    val mdcFactory: MDCCloseableFactory = new MDCCloseableFactory()
    TryWith(mdcFactory.create(context))(
      _ => {
        logError(msg, ex)
      }
    )
  }
}
So far this works as expected, logging with our custom properties and all. But the logs are all mixed with the rest of the Spark internal logs in SparkLoggingEvent_CL. We want to have them in our own separate Custom Log group.
We're assuming we need to edit the log4j.properties file but we're not sure how and we can't find documentation about it.
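A sketch of what that might look like, based on the appender properties shown elsewhere in this thread: the appender's logType property controls the target custom log. The appender and logger names below are illustrative only:

```properties
# Illustrative only: route com.mycompany loggers to their own custom log (MyAppLogs_CL).
log4j.appender.myAppLogAnalytics=com.microsoft.pnp.logging.loganalytics.LogAnalyticsAppender
log4j.appender.myAppLogAnalytics.layout=com.microsoft.pnp.logging.JSONLayout
log4j.appender.myAppLogAnalytics.logType=MyAppLogs
log4j.logger.com.mycompany=INFO, myAppLogAnalytics
# Prevent the same events from also flowing to SparkLoggingEvent_CL via the root logger.
log4j.additivity.com.mycompany=false
```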
I am trying to use the logging framework within a Databricks notebook; however, when I use the logger to write any type of log (warn, debug), it does not write to my Log Analytics workspace. It does work using the sample job/JAR you provided. If possible, can you provide an example of how to use this logging framework within a Databricks notebook? Thank you!
Recently I found that the volume of logs flowing into the Log Analytics workspace is surprisingly large. Since usually only specific log levels are necessary, is there a way to do some filtering before sending?
I have added the monitoring package for my Structured Streaming job (Azure Databricks runtime 6.4, Spark 2.4.5, Scala 2.11).
I can see the logs showing up in SparkLoggingEvent_CL and SparkListenerEvent_CL, but my Spark job log in the Databricks UI continuously logs the error
"Error sending to Log Analytics
java.io.IOException: Error sending to Log Analytics"
(attached image)
Since I am using Log Analytics to create alerts based on log queries for error records, I am not sure whether some of the logs are failing to be pushed to Log Analytics; my alert will not fire for error records that get dropped due to the above error.
I built the jars and followed the instructions on the page, but my cluster does not start and send metrics to the Azure monitor.
Attaching the stdout and stderr logs for reference. The error message shown is workspaceId cannot be null. Any help or guidance would be great. stdout.txt stderr.txt
I am trying to send Azure Databricks custom logs to an Azure Log Analytics workspace using the steps given in the GitHub documentation. The code I am using in a Databricks notebook is:
import com.microsoft.pnp.util.TryWith
import com.microsoft.pnp.logging.Log4jConfiguration
import java.io.ByteArrayInputStream
import org.slf4j.LoggerFactory
import org.slf4j.Logger
val loggerName :String = "fromNotebook"
val level : String = "INFO"
val logType: String = "testlogs"
val log4jConfig = s"""
log4j.appender.logAnalytics=com.microsoft.pnp.logging.loganalytics.LogAnalyticsAppender
log4j.appender.logAnalytics.layout=com.microsoft.pnp.logging.JSONLayout
log4j.appender.logAnalytics.layout.LocationInfo=false
log4j.appender.logAnalytics.logType=$logType
log4j.additivity.$loggerName=false
log4j.logger.$loggerName=$level, logAnalytics
"""
TryWith(new ByteArrayInputStream(log4jConfig.getBytes())) {
stream => {
Log4jConfiguration.configure(stream)
}
}
val logger = LoggerFactory.getLogger(loggerName);
logger.info("logging info from " + loggerName)
logger.warn("Warn message " + loggerName)
logger.error("Error message " + loggerName)
My /home/ubuntu/databricks/spark/dbconf/log4j/executor/log4j.properties for this appender looks like:
log4j.rootCategory=INFO, console, logAnalyticsAppender
# logAnalytics
log4j.appender.logAnalyticsAppender=com.microsoft.pnp.logging.loganalytics.LogAnalyticsAppender
log4j.appender.logAnalyticsAppender.filter.spark=com.microsoft.pnp.logging.SparkPropertyEnricher
#Disable all other logs
log4j.appender.logAnalyticsAppender.Threshold=INFO
But it is showing odd behavior. It works fine for level INFO, but if I try to log anything below or above the level I declared in the configuration, it throws the error below, and after that, no matter what changes I make in my code, it only works again after I restart my cluster. My cluster's performance is also impacted once it throws the error. Sometimes this code even keeps running for an indefinite period of time.
Error I am getting is:
> log4j:ERROR Error sending logging event to Log Analytics
> java.util.concurrent.RejectedExecutionException: Task com.microsoft.pnp.client.loganalytics.LogAnalyticsSendBufferTask@5b2a430 rejected from java.util.concurrent.ThreadPoolExecutor@699636fd[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 112]
> at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
> at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
> at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
> at com.microsoft.pnp.client.GenericSendBuffer.send(GenericSendBuffer.java:88)
> at com.microsoft.pnp.client.loganalytics.LogAnalyticsSendBufferClient.sendMessage(LogAnalyticsSendBufferClient.java:43)
> at com.microsoft.pnp.logging.loganalytics.LogAnalyticsAppender.append(LogAnalyticsAppender.java:52)
> at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
> at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
> at org.apache.log4j.Category.callAppenders(Category.java:206)
> at org.apache.log4j.Category.forcedLog(Category.java:391)
> at org.apache.log4j.Category.log(Category.java:856)
> at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:305)
> at log4jWrapper.MyLogger.info(MyLogger.scala:48)
> at line07d51ea7c1834afc957316967b0d0e8225.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-764707897465587:10)
> at line07d51ea7c1834afc957316967b0d0e8225.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-764707897465587:63)
> at line07d51ea7c1834afc957316967b0d0e8225.$read$$iw$$iw$$iw$$iw.<init>(command-764707897465587:65)
> at line07d51ea7c1834afc957316967b0d0e8225.$read$$iw$$iw$$iw.<init>(command-764707897465587:67)
> at line07d51ea7c1834afc957316967b0d0e8225.$read$$iw$$iw.<init>(command-764707897465587:69)
> at line07d51ea7c1834afc957316967b0d0e8225.$read$$iw.<init>(command-764707897465587:71)
> at line07d51ea7c1834afc957316967b0d0e8225.$read.<init>(command-764707897465587:73)
> at line07d51ea7c1834afc957316967b0d0e8225.$read$.<init>(command-764707897465587:77)
> at line07d51ea7c1834afc957316967b0d0e8225.$read$.<clinit>(command-764707897465587)
> at line07d51ea7c1834afc957316967b0d0e8225.$eval$.$print$lzycompute(<notebook>:7)
> at line07d51ea7c1834afc957316967b0d0e8225.$eval$.$print(<notebook>:6)
> at line07d51ea7c1834afc957316967b0d0e8225.$eval.$print(<notebook>)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:793)
> at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1054)
> at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:645)
> at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:644)
> at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
> at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
> at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:644)
> at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:576)
> at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:572)
> at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:215)
> at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply$mcV$sp(ScalaDriverLocal.scala:202)
> at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:202)
> at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:202)
> at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:685)
> at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:638)
> at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:202)
> at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$8.apply(DriverLocal.scala:373)
> at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$8.apply(DriverLocal.scala:350)
> at com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:238)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
> at com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:233)
> at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:48)
> at com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:271)
> at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:48)
> at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:350)
> at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:644)
> at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:644)
> at scala.util.Try$.apply(Try.scala:192)
> at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:639)
> at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:485)
> at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:597)
> at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:390)
> at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:337)
> at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:219)
> at java.lang.Thread.run(Thread.java:748)
Hi,
I don't have any custom application logging, but I want only "ERROR" logs from Spark to be written to Log Analytics. Is there any option to configure this? Presently it is logging everything (Info/Warning/Error) to Log Analytics.
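Based on the Threshold property shown in the log4j.properties excerpt earlier in this document, a minimal sketch (assuming the appender is named logAnalyticsAppender, as in that excerpt):

```properties
# Raise the appender threshold so only ERROR (and above) is forwarded to Log Analytics.
log4j.appender.logAnalyticsAppender.Threshold=ERROR
```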
Do you plan to publish a release of this library in a public artifact repository like Maven Central in the near future? I want to use this lib in one of my projects, but it's very complicated for me to manage dependencies if the lib is not published in a public artifact repository.
The stack trace from uncaught exceptions in jobs sometimes (more often than not) never reach Log Analytics because of the 5000ms/25M log buffer implemented in the underlying layers of the appender.
The LogAnalyticsAppender log4j appender implementation doesn't expose the fact that it is buffering, so even when immediate flush is set to true (the default), the underlying classes buffer the log messages. When a Databricks job is killed due to an uncaught exception, the last few buffered messages (including the stack trace) appear to be lost along with it.
We've somewhat worked around this by adding LogManager.shutdown() to a try-catch block in the job logic, but this appears to have side effects we are still investigating -- my current hunch is that this call causes all other logs (e.g. cluster infra logs) to stop flowing as well after the job terminates.
Tangentially related - it looks like the buffering code was refactored at one point and LogAnalyticsSendBufferClient has references to the 5000ms/25M but those aren't used anywhere, and are in fact controlled by the LogAnalyticsSendBuffer instance.
I guess my previous cluster, where the monitoring solution was working, was on runtime 5.5. But after I updated my cluster to the latest runtime, based on Scala 2.12, the monitoring tool does not work. More specifically, there appears to be a problem in the init bash script (spark-monitoring.sh).
I wonder if you have a plan to update the spark-monitoring package for the new version of the Databricks runtime.
I have seen in my azure log analytics workspace errors like this one:
Data of type SparkListenerEvent was dropped: The number of custom fields 501 is above the limit of 500 fields per data type. See https://aka.ms/AA593as to find instructions for removing unnecessary custom fields for this type.
Guys, in Log Analytics, all the queries that use SparkListenerEvent_CL give me the following error message, whereas the ones that rely on SparkMetric_CL work well:
'where' operator: Failed to resolve table or column expression named 'SparkListenerEvent_CL'
I'm running DB 6.4 (includes Apache Spark 2.4.5, Scala 2.11)
I had to tweak the pom.xml and build.sh to compile the libs for Spark 2.4.5.
The libs are properly uploaded to DB and the init script ran well.
Can you help me?
I am new to Java and need help understanding how I can use Dropwizard to create counters. The docs refer to "application code"; where is the code I can see? Can you help me understand this point in detail?
How can I make use of this in a project that uses SBT instead of Maven? The sample shows how to do it on Maven but I can't seem to get it working on my SBT project.
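A sketch for SBT, assuming you have first run mvn install so the jars land in your local Maven repository; the artifact coordinates below are illustrative and depend on the profile you built:

```scala
// build.sbt -- illustrative coordinates; match them to the jars your Maven build produced.
resolvers += Resolver.mavenLocal

libraryDependencies ++= Seq(
  "com.microsoft.pnp" % "spark-listeners_3.2.1_2.12" % "1.0.0",
  "com.microsoft.pnp" % "spark-listeners-loganalytics_3.2.1_2.12" % "1.0.0"
)
```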
This is a followup on #68 that fixed support for databricks 6.0-6.3, but does not cover 6.4 runtime.
The runtime uses Apache Spark version 2.4.5, as stated in the docs. PR #69 added a Maven build profile for 6.0-6.3 and updated the cluster init script to load the required .jar files dynamically based on the Databricks runtime version.
This results in the script trying to load a jar file containing scala-2.11_spark-2.4.5 in the file name.
The quickfix would be to add a new maven build profile that creates the proper jar file with right dependency to spark version 2.4.5.
I will try to submit a PR for the quickfix, but I'm guessing this might be an issue in the future for any updates to the Databricks runtime, e.g. 6.5 and 7.0.
Information:java: Errors occurred while compiling module 'spark-listeners'
Information:javac 11.0.4 was used to compile java sources
Information:10/8/2019 7:21 AM - Build completed with 2 errors and 6 warnings in 27 s 90 ms
C:\Users\sandeep_avuthu\Downloads\spark-monitoring-master\spark-monitoring-master\src\spark-listeners\src\main\java\com\microsoft\pnp\logging\SparkPropertyEnricher.java
Error:(7, 24) java: cannot find symbol
symbol: class SparkInformation
location: package org.apache.spark
Error:(24, 40) java: cannot find symbol
symbol: variable SparkInformation
location: class com.microsoft.pnp.logging.SparkPropertyEnricher
C:\Users\sandeep_avuthu\Downloads\spark-monitoring-master\spark-monitoring-master\src\spark-listeners\src\main\java\com\microsoft\pnp\client\GenericSendBuffer.java
C:\Users\sandeep_avuthu\Downloads\spark-monitoring-master\spark-monitoring-master\src\spark-listeners\src\main\java\com\microsoft\pnp\logging\JSONLayout.java
We tried to compile a jar file, but there is an issue with the dependencies.
We are using the following command to resolve the dependencies: mvn dependency:resolve
... and receive the following error message:
[ERROR] Failed to execute goal on project spark-jobs: Could not resolve dependencies for project com.microsoft.pnp:spark-jobs:jar:1.0-SNAPSHOT: Could not find artifact com.microsoft.pnp:spark-listeners:jar:1.0-SNAPSHOT -> [Help 1]
When I used this library on the 6.0 Databricks runtime it worked fine, but I want to use the latest version, 6.3, with this library. I am facing an error: the init script fails while the cluster is coming up.
Event Logs:
TERMINATING
2020-01-28 11:22:28 IST
Cluster terminated. Reason: Init Script Failure
INIT_SCRIPTS_FINISHED
2020-01-28 11:22:27 IST
Finished Init Scripts execution.
INIT_SCRIPTS_STARTED
2020-01-28 11:22:26 IST
Starting Init Scripts execution.
After the init script finished its execution, the cluster failed.
The reason is:
Message Cluster terminated. Reason: Init Script Failure Init script dbfs:/databricks/spark-monitoring/spark-monitoring.sh failed: Script exit status is non-zero
The same script works fine on the 5.5 and 6.0 Databricks runtime versions.
I am getting this error while building the jar for spark-listeners-loganalytics: The POM for com.microsoft.pnp:spark-listeners_2.4.5_2.11:jar:1.0.0 is missing, no dependency information available.
I am getting the below error while starting the cluster. 625514421484409 is my workspace ID.
Init Script Failure: Init script dbfs:/databricks/monitoring-staging/listeners.sh failed: /625514421484409/databricks/monitoring-staging/listeners.sh: No such file or directory.
We are monitoring structured streaming using the spark monitoring library. I am interested in progress events only. I set the log4j logging level to Info ("log4j.logger.org.apache.spark.sql.execution.streaming.StreamExecution=INFO"). This is still writing a lot of data. Is it possible to restrict this to progress events only? (i.e. QueryProgressEvent)
I am working with Databricks and am looking to set up monitoring by collecting metrics into Azure Log Analytics, which is exactly what you are suggesting. After some issues I finally managed to set it up on my DEV Databricks workspace using a basic cluster and the default runtime; it collects the metrics with no issue. However, when trying with a more complex cluster that uses a custom Docker image, the init script fails at the very beginning.
Here is my cluster:
The Init Script configuration:
After starting and finally stopped in error that's the error found in the cluster-logs:
cp: cannot create regular file '/mnt/driver-daemon/jars': No such file or directory
It's happening in dbfs:/databricks/spark-monitoring/spark-monitoring.sh at line 63 when trying to copy the JAR files to the cluster:
It seems this folder does not exist in the Docker image I'm using. Is there a way to bypass this, or to use another folder? If so, what other files should I update to ensure this new folder is used for the JARs?
I tried to build the library today and it gives me Service 'sparkDriver' could not bind to a random port.
I am attaching the screen.
I was able to successfully build some time before Aug 16.
Can someone help?
Output:
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.115 s
[INFO] Finished at: 2020-03-03T23:31:18Z
[INFO] ------------------------------------------------------------------------
[WARNING] The requested profile "scala-2.11_spark-2.4.3" could not be activated because it does not exist.
[ERROR] The goal you specified requires a project to execute but there is no POM in this directory (/). Please verify you invoked Maven from the correct directory. -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MissingProjectException
Can you please help me identify what should be changed? It seems that Maven is not able to find the POM file.
I am not able to generate the JAR file for spark-listeners-loganalytics.
I am getting the two errors below. Can you please help me here?
Error:(6, 8) object SparkListenerSink is not a member of package org.apache.spark.listeners.sink
import org.apache.spark.listeners.sink.SparkListenerSink
Error:(11, 8) object SparkInformation is not a member of package org.apache.spark
import org.apache.spark.SparkInformation
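Both missing classes live in the sibling spark-listeners module, so these errors typically appear when spark-listeners-loganalytics is compiled on its own before spark-listeners has been built. A hedged sketch of the usual fix is to build from the repository root so Maven's reactor builds the modules in order; the profile name below is only an example and must match a profile actually defined in pom.xml for your runtime (run `mvn help:all-profiles` to list them):

```shell
# Build from the spark-monitoring repo root so Maven builds
# spark-listeners before spark-listeners-loganalytics.
# Skipped with a hint when not run inside a clone of the repo.
if [ -f pom.xml ] && command -v mvn >/dev/null 2>&1; then
  mvn -P scala-2.11_spark-2.4.3 clean install
else
  echo "run this from the spark-monitoring repo root with Maven installed"
fi
```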
When any debug logging is turned on, the log4j appender produces a StackOverflowError.
This is caused by the appender and its dependent classes using log4j to log statements during the setup of the appender itself. This creates an endless loop, hence the StackOverflowError.
I believe the correct way to log during log4j initialization is via the LogLog class, which outputs to System.out.
We had added the Spark monitoring init script in our workspace for interactive and automated clusters running Databricks Runtime 6.3 (includes Apache Spark 2.4.4, Scala 2.11). We now have issues because this runtime version is out of support. We tried changing the cluster's runtime version beyond 6.3, and after adding the init scripts the cluster start-up time is much longer than expected.
The current LogAnalyticsClient.java doesn't support setting the x-ms-AzureResourceId request header. This property would help populate a unique identifier to track the Azure resource involved.
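For context, the header in question belongs to the Azure Monitor HTTP Data Collector API that LogAnalyticsClient ultimately calls. A hedged curl sketch of such a request is below; the workspace ID, signature, log type, and resource ID are all placeholders, and the HMAC-SHA256 signature computation is omitted:

```shell
# Sketch of a Data Collector API call carrying the optional
# x-ms-AzureResourceId header. WORKSPACE_ID and SIGNATURE are
# placeholders; the request is skipped when they are not set.
if [ -n "${WORKSPACE_ID:-}" ] && [ -n "${SIGNATURE:-}" ]; then
  curl -X POST \
    "https://${WORKSPACE_ID}.ods.opinsights.azure.com/api/logs?api-version=2016-04-01" \
    -H "Content-Type: application/json" \
    -H "Log-Type: SparkLoggingEvent" \
    -H "x-ms-date: $(date -u '+%a, %d %b %Y %H:%M:%S GMT')" \
    -H "Authorization: SharedKey ${WORKSPACE_ID}:${SIGNATURE}" \
    -H "x-ms-AzureResourceId: /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Databricks/workspaces/<ws>" \
    -d '[{"Message":"example"}]'
else
  echo "set WORKSPACE_ID and SIGNATURE to send a real request"
fi
```

Supporting the header in the client would mean exposing a corresponding setting and adding this one header to the POST it already builds.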
I am trying to retrieve a report on the SQL queries that have been executed on a given cluster. What I noticed, however, when running queries against the Log Analytics workspace is that longer SQL queries are truncated, ending in "...". The attribute that holds this information is Properties_spark_job_description_s.
Can you please help me understand how this can be modified to report the full query?
This did not work for me through a Scala notebook. I see cluster logs going to the Log Analytics workspace, but not the notebook logs. Steps followed for DBR Runtime 5.5 LTS:
Created a new cluster to run listeners.sh as Init Script
Configured logging in notebook:
// Configure our logging
TryWith(getClass.getResourceAsStream("/log4j.properties")) {
  stream => {
    Log4jConfiguration.configure(stream)
  }
}
Logged events which did not come up in the Log Analytics workspace:
val logger = LoggerFactory.getLogger(getClass)
logger.info("Info message")
logger.warn("Warn message")
logger.error("Error message")
Am I missing something here? Would appreciate any help.
I am very new to this and still learning IT. I am trying to follow all the steps, but I am hitting an error while building the JAR files and cannot find the answer. If someone could help me I would be very glad.
I am executing the Maven package build phase in IntelliJ but the build is failing. The error says "Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.4.2:compile (default) on project spark-listeners: Execution default of goal net.alchim31.maven:scala-maven-plugin:3.4.2:compile failed.: CompileFailed".
I also get an error about the compiler not being found in the mirror:
I am using the Azure instance of Grafana, and I am trying to set up monitoring for Azure Databricks using Grafana dashboards. I used this link: https://docs.microsoft.com/bs-latn-ba/azure/architecture/databricks-monitoring/dashboards to set up Grafana and its dashboard. But when I completed the setup, the panels in the Grafana dashboard show "Request Error" at the top. When I looked into the query of a panel, it shows "TypeError: undefined is not iterable (cannot read property Symbol(Symbol.iterator))" as the error.
The syntax error I could find in the browser console is:
'extend' operator: Failed to resolve scalar expression named 'Properties_spark_metric_namespace_s'
The instructions in the README are helpful for building the monitoring library. It would be great if there were an Azure DevOps YAML pipeline definition file to help those who want to use the library in Azure Databricks but may not be familiar with building Scala on their local machine.
If this is a useful contribution, I would be willing to create the YAML file and update the README with usage instructions.
I have an issue getting spark-monitoring to work in automated clusters running a Python notebook job. The notebook has both Python code and some Scala code running in separate %scala commands. Both the Python and Scala code need to log events that should be forwarded to Log Analytics.
Using the default cluster init script from this repo, and configuring it like in the README.md works as expected.
But I have tweaked the log4j config in the cluster init script spark-monitoring.sh as in the following code snippet.
Basically, I want to forward only log events of level WARN to Log Analytics by setting a Threshold on the default logAnalyticsAppender used by the rootCategory logger. I also want a separate logger called pipelineLog that forwards log events of level INFO to a new table called PipelineLoggingEvent in Log Analytics.
The log4j config shown in the snippet works fine when running the notebook on an interactive cluster, but when scheduling it with jobs and automated clusters it does not work: nothing gets sent when using the pipelineLog logger from either Python or Scala code.
Both the interactive cluster and the automated clusters use the revised cluster init script and the Databricks 5.5 LTS runtime.
I hope to get some insight into why interactive and automated clusters behave differently with this log4j config.
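For reference, the setup described above might look roughly like the fragment below in the log4j.properties that the init script writes. This is a hedged reconstruction: the appender class name follows this repository's package layout, and the `logType` property name for routing to a custom table is an assumption, not a documented setting:

```properties
# Forward only WARN and above through the root logger's appender.
log4j.appender.logAnalyticsAppender.Threshold=WARN

# Separate pipeline logger at INFO with its own appender / custom
# table. The logType property name is illustrative.
log4j.logger.pipelineLog=INFO, pipelineAppender
log4j.additivity.pipelineLog=false
log4j.appender.pipelineAppender=com.microsoft.pnp.logging.loganalytics.LogAnalyticsAppender
log4j.appender.pipelineAppender.logType=PipelineLoggingEvent
```

One thing worth checking for the interactive-vs-automated difference is whether the tweaked init script is actually attached to the job clusters, since automated clusters are created fresh per run and only pick up init scripts configured in the job's cluster spec.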