phdata / pulse

phData Pulse application log aggregation and monitoring

License: Apache License 2.0

Makefile 1.35% Scala 77.40% Shell 5.59% Python 7.08% Java 8.57%
log-aggregation solr solrcloud hadoop akka-streams scala csd

pulse's Introduction


Hadoop log aggregation, alerting, and lifecycle management


Pulse

Pulse is an Apache 2.0 licensed log aggregation framework built on top of Solr Cloud (Cloudera Search). It can be used with applications written in any language, but was built especially for improving logging in Apache Spark Streaming applications running on Apache Hadoop.

Pulse gives application users centralized, full-text search of their logs and flexible alerting on log content, and it works with several visualization tools.

Pulse handles log lifecycle, so application developers don't have to worry about rotating or maintaining log indexes themselves.

See our documentation page on readthedocs.org for more details: https://pulse-logging.readthedocs.io/en/latest/

pulse's People

Contributors

afoerster, astadtler, bpmcd, brockn, caseycrawford, cattmarlin, davidbluml, edemiraydin, jtbirdsell, keithssmith, kjmccarthy, mariyamg, namanj, niveditha-phdata, paladin235, raymondblanc, sada3390, safwanislam, shashireddypalle, sumitbsn


pulse's Issues

Create default configurations for first run

The default configuration would allow the Collection Roller and Alert Engine to start on a first install and validate that everything is working. It can use a test collection named 'pulse-test-default'.

Add a role action to trigger test email

  • Create a main class that takes arguments (the config and an email address) and sends an email (see the sketch after this list)
  • Add a role action in 'control.sh' and the service.sdl CSD file
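
A minimal sketch of such a main class, assuming Scallop is used for argument parsing as elsewhere in the project; the wiring to the existing mail/notification code is left as a placeholder and the option names are illustrative:

import org.rogach.scallop.ScallopConf

object TestEmailMain {

  // Hypothetical argument parser for this role action; option names are illustrative.
  class Args(arguments: Seq[String]) extends ScallopConf(arguments) {
    val conf  = opt[String]("conf", required = true, descr = "Path to the alert engine configuration")
    val email = opt[String]("email", required = true, descr = "Recipient address for the test email")
    verify()
  }

  def main(args: Array[String]): Unit = {
    val parsed = new Args(args)
    // Load SMTP settings from the configuration file and send a single test message.
    // Wire this to the project's existing mail/notification code; println is a stand-in.
    println(s"Would send a test email to ${parsed.email()} using config ${parsed.conf()}")
  }
}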

Adding a non-string value to the MDC can cause the app to hang

When using the HttpAppender, adding a non-string value to the MDC can cause the application to hang.

The error:

Exception in thread "HTTP appender dispatcher" java.lang.RuntimeException: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String
	at io.phdata.pulse.log.HttpAppender$Dispatcher.flush(HttpAppender.java:357)
	at io.phdata.pulse.log.HttpAppender$Dispatcher.run(HttpAppender.java:341)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String
	at io.phdata.pulse.log.JsonParser.marshallEventInternal(JsonParser.java:68)
	at io.phdata.pulse.log.JsonParser.marshallArray(JsonParser.java:24)
	at io.phdata.pulse.log.HttpAppender$Dispatcher.flush(HttpAppender.java:355)

The location:

      for (Map.Entry<String, String> entry : props) {
        jg.writeStringField(entry.getKey(), entry.getValue());
      }

I think the easy fix is to call toString on entry.getValue(); I don't see any downsides to this. A sketch of the fix follows.
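
A hedged sketch of that fix in the same loop, in Java to match the snippet above, assuming the MDC properties are iterated with Object values (the MDC map is effectively untyped):

      for (Map.Entry<String, Object> entry : props) {
        // MDC values are not guaranteed to be Strings; convert defensively instead of casting.
        jg.writeStringField(entry.getKey(), String.valueOf(entry.getValue()));
      }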

Application ID logging for Spark Driver

Pulse will automatically log the application ID for executors based on environment variables passed into YARN containers, but the same method isn't working for the drivers.

Figure out a way to automatically log application IDs in drivers, or write an example using the log4j MDC.
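
For reference, a minimal sketch of the MDC workaround in a driver, assuming Pulse indexes a field such as "application_id" (the key name is illustrative, not the project's actual field name):

import org.apache.log4j.MDC
import org.apache.spark.{ SparkConf, SparkContext }

object DriverMdcExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pulse-mdc-example"))
    // Make the YARN application ID available to every log event from the driver.
    // "application_id" is an illustrative key; use whatever field name Pulse indexes.
    MDC.put("application_id", sc.applicationId)
    sc.parallelize(1 to 10).count()
    sc.stop()
  }
}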

Create an endpoint to ingest raw json

The current endpoint requires that the JSON payload conform to a LogEvent type. This LogEvent type has the data needed for log4j or Python logging, but isn't flexible if we want to insert arbitrary data, like metrics, into Pulse.

  • Create a new endpoint in LogCollectorRoutes with the path `v1/json`
  • Parse the JSON using Spray (already a project dependency) into a Map[String, String]
  • Change all code in SolrCloudStreams to work with Map[String, String] instead of LogEvent. Figure out: do we still need the LogEvent class? How can we move it up the call stack?
  • The new endpoint should dump the Map[String, String] onto the Solr stream.
  • At the end of the stream, here: solrService.insertDocuments(latestCollectionAlias, events.map(DocumentConversion.toSolrDocument)) we need a function to convert the Map into a SolrDocument (a rough sketch follows this list)
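
A rough, hypothetical sketch of the new route and of the Map-to-SolrDocument conversion, assuming the routes are akka-http and the payload is a flat JSON object of string values; none of the names below are the project's actual API:

import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route
import org.apache.solr.common.SolrInputDocument
import spray.json._
import spray.json.DefaultJsonProtocol._

object RawJsonEndpointSketch {

  // Hypothetical route; the real stream plumbing lives in SolrCloudStreams.
  val jsonRoute: Route =
    path("v1" / "json") {
      post {
        parameter("application") { application =>
          entity(as[String]) { body =>
            // Parse the raw payload into a flat Map[String, String] with Spray.
            val event: Map[String, String] = body.parseJson.convertTo[Map[String, String]]
            // Hand the map to the Solr stream here (omitted), then acknowledge the request.
            complete(s"accepted event for $application with ${event.size} fields")
          }
        }
      }
    }

  // Converting a Map[String, String] into a SolrInputDocument at the end of the stream.
  def toSolrDocument(event: Map[String, String]): SolrInputDocument = {
    val doc = new SolrInputDocument()
    event.foreach { case (key, value) => doc.addField(key, value) }
    doc
  }
}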

Client configurations?

From the base log appender:

curl -X POST -H 'Content-Type: application/json' -d '{"category": "'$category'","timestamp": '$timestamp', "level": "'$level'", "message": "'$message'", "threadName": "'$threadName'"}' http://0.0.0.0:9005/log?application=$application

It'd be nice to get this from a client config, e.g. /etc/pulse/conf/env.sh, so all I'd have to do as a user is:

source /opt/cloudera/parcels/PULSE/lib/appenders/logger.sh

and I'd be running.

Queries calling StatsComponent on TextField are failing

Arcadia is calling StatsComponent on some text fields causing the error:

 error: Error reading Solr data: Field type text_general{class=org.apache.solr.schema.TextField,analyzer=org.apache.solr.analysis.TokenizerChain,args={positionIncrementGap=100, class=solr.TextField}} is not currently supported 

This should be fixed by not using the 'text_general' type (which is backed by TextField) on short fields like 'category', keeping it only for long free-text fields. Arcadia tries to aggregate on category, so I suspect this is where the issue is.

The Bash logger script adds extra single quotes around fields and breaks the log collector.

Stack trace:

Error posting documents to solr
org.apache.solr.common.SolrException: Could not find collection : 'kafka_kudu_streaming'_latest
at org.apache.solr.common.cloud.ClusterState.getCollection(ClusterState.java:162)
at org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:324)
at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:563)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
at io.phdata.pulse.common.SolrService.insertDocuments(SolrService.scala:176)

Can be fixed by updating the logger.sh script to not quote every property, or by sanitizing the params in the log-collector POST handler...

Add secured solrconfig set option

It can be copied from solrconfigv2. There are some changes made to solrconfig.xml that will need to be moved to the secure version of the file.

Empty SMTP password config results in alert engine crash

2018-08-27 14:40:20,134 INFO io.phdata.pulse.alertengine.notification.MailNotificationService: starting notification for profile mailProfile1
2018-08-27 14:40:20,185 INFO io.phdata.pulse.alertengine.notification.MailNotificationService: sending alert
2018-08-27 14:40:20,186 INFO io.phdata.pulse.alertengine.notification.Mailer: authenticating with password
2018-08-27 14:40:20,320 ERROR io.phdata.pulse.alertengine.AlertEngineMain$: caught exception in Collection Roller task
javax.mail.AuthenticationFailedException: null
at javax.mail.Service.connect(Service.java:306)
at javax.mail.Service.connect(Service.java:156)
at javax.mail.Service.connect(Service.java:105)
at javax.mail.Transport.send0(Transport.java:168)
at javax.mail.Transport.send(Transport.java:98)
at io.phdata.pulse.alertengine.notification.Mailer.sendMail(Mailer.scala:60)
at io.phdata.pulse.alertengine.notification.MailNotificationService$$anonfun$notify$1.apply(MailNotificationService.scala:35)
at io.phdata.pulse.alertengine.notification.MailNotificationService$$anonfun$notify$1.apply(MailNotificationService.scala:31)
at scala.collection.immutable.List.foreach(List.scala:392)
at io.phdata.pulse.alertengine.notification.MailNotificationService.notify(MailNotificationService.scala:31)
at io.phdata.pulse.alertengine.AlertEngineImpl$$anonfun$sendAlert$1.apply(AlertEngineImpl.scala:139)
at io.phdata.pulse.alertengine.AlertEngineImpl$$anonfun$sendAlert$1.apply(AlertEngineImpl.scala:138)
at scala.collection.immutable.List.foreach(List.scala:392)
at io.phdata.pulse.alertengine.AlertEngineImpl.sendAlert(AlertEngineImpl.scala:138)
at io.phdata.pulse.alertengine.AlertEngineImpl$$anonfun$notify$1.apply(AlertEngineImpl.scala:112)
at io.phdata.pulse.alertengine.AlertEngineImpl$$anonfun$notify$1.apply(AlertEngineImpl.scala:110)
at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
at io.phdata.pulse.alertengine.AlertEngineImpl.notify(AlertEngineImpl.scala:110)
at io.phdata.pulse.alertengine.AlertEngineImpl.run(AlertEngineImpl.scala:45)
at io.phdata.pulse.alertengine.AlertEngineMain$AlertEngineTask.run(AlertEngineMain.scala:143)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2018-08-27 14:40:20,321 WARN io.phdata.pulse.alertengine.AlertEngineMain$: Caught exit signal, trying to cleanup tasks
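
A minimal sketch of one possible guard, assuming the mail session is built from a Properties object and an Option[String] password; the method and parameter names are illustrative, not the actual Mailer API:

import java.util.Properties
import javax.mail.{ Authenticator, PasswordAuthentication, Session }

object MailerSessionSketch {

  // Only authenticate when a non-empty password is actually configured.
  def buildSession(props: Properties, user: String, password: Option[String]): Session =
    password.filter(_.nonEmpty) match {
      case Some(pass) =>
        props.put("mail.smtp.auth", "true")
        Session.getInstance(props, new Authenticator {
          override def getPasswordAuthentication(): PasswordAuthentication =
            new PasswordAuthentication(user, pass)
        })
      case None =>
        // No (or empty) password: skip SMTP authentication instead of failing with
        // javax.mail.AuthenticationFailedException.
        props.put("mail.smtp.auth", "false")
        Session.getInstance(props)
    }
}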

Kafka Integration

This story includes all work associated with getting Kafka integration working. It can be broken out as needed into smaller pieces.

Add kafka arguments to LogCollectorCliParser

LogCollectorCliParser

Here is a rough cut:

import org.rogach.scallop.ScallopConf

class LogCollectorCliParser(args: Seq[String]) extends ScallopConf(args) {
  lazy val port    = opt[Int]("port", required = false, descr = "Listening port")
  lazy val zkHosts = opt[String]("zk-hosts", required = true, descr = "Zookeeper hosts")
  lazy val topic   = opt[String]("topic", required = false, descr = "Kafka Topic")
  lazy val mode    = opt[String]("consume-mode", required = false, descr = "'http' or 'kafka'",
                                 default = Some("http"))

  verify()
}

Verify that a mode is chosen and that it is valid. To keep backward compatibility, if no mode is chosen it should default to 'http'.

Verify that 'port' is provided in the http listen mode and that 'topic' is provided in the kafka listen mode (a sketch of these checks follows).
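
A hedged sketch of those checks as plain assertions after parsing; the helper name and its placement are illustrative:

// Illustrative validation, e.g. invoked at the top of LogCollector's main method.
def validateArgs(args: Array[String]): LogCollectorCliParser = {
  val cli  = new LogCollectorCliParser(args)
  val mode = cli.mode() // defaults to 'http' to keep backward compatibility

  require(Set("http", "kafka").contains(mode), s"Unsupported consume-mode: $mode")
  if (mode == "http") require(cli.port.isSupplied, "--port is required in http mode")
  if (mode == "kafka") require(cli.topic.isSupplied, "--topic is required in kafka mode")
  cli
}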

Expose Kafka

Branch on the new 'consume-mode' in LogCollector.scala to start listening to the Kafka topic.
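
Roughly, with hypothetical start-up helpers standing in for whatever the real entry points end up being:

// Hypothetical branch in LogCollector's startup code; startHttpServer and
// startKafkaConsumer are placeholder names, not existing methods.
cli.mode() match {
  case "kafka" => startKafkaConsumer(cli.zkHosts(), cli.topic())
  case _       => startHttpServer(cli.zkHosts(), cli.port())
}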

Integrate new arguments into control.sh

Add environment variables in control.sh for the new arguments.
Create a script in the bin directory to run the kafka consume mode by calling control.sh. This will make it easy to test changes to the scripts and arguments.

Create a test producer

Create a test producer that will put events onto a topic that will then be read by the Kafka consumer. The test producer will make it easy to run the Kafka consumer outside of unit tests, without needing a full production deployment (a sketch follows).
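
A minimal sketch of such a producer, assuming the standard kafka-clients API; the broker address and topic name are placeholders:

import java.util.Properties
import org.apache.kafka.clients.producer.{ KafkaProducer, ProducerRecord }

object TestLogProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      (1 to 10).foreach { i =>
        val event = s"""{"category":"test","level":"INFO","message":"test event $i"}"""
        producer.send(new ProducerRecord[String, String]("pulse-test", event))
      }
    } finally {
      producer.close() // flushes buffered records before exit
    }
  }
}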

Integrate new arguments into service.sdl

Service.sdl is the configuration file for the CSD: https://github.com/cloudera/cm_ext/wiki/Service-Descriptor-Language-Reference

There should be at least two new arguments, for consume-mode and topic.
The consume mode should default to 'http'.

Deploy the CSD to Valhalla and test

Test all changes with the CSD and new parcel deployed on a test cluster.

I have scripts for this that are not yet committed; hopefully they will be by the time this task is reached.

Document changes

Add a page to the docs dir describing usage and limitations, and register it in mkdocs.yml.

Messages can be lost in the HttpAppender buffer when application exits

Since Pulse v2 we have an asynchronous appender, and it doesn't get flushed properly because nothing calls 'close' when the application completes, so log events at the end of an application can get lost.

A workaround would be to close the buffer manually in a high-level finally block, or to add a shutdown hook (a sketch follows).
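
A minimal sketch of the shutdown-hook workaround, assuming log4j 1.x (LogManager.shutdown closes all appenders, which flushes the HttpAppender buffer):

import org.apache.log4j.LogManager

object PulseShutdownHook {
  def install(): Unit =
    sys.addShutdownHook {
      // Closes all appenders, flushing any log events still buffered in the HttpAppender.
      LogManager.shutdown()
    }
}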
