witnesslq / eagleeye

This project was forked from yeleid/eagleeye.

License: Apache License 2.0

Eagle Eye

An app built on Cloudera Enterprise for tracking jobs that run in the YARN framework. These runtime "logs" are important, as they can be used for:

  • real-time monitoring: check whether the most critical jobs have been allocated enough resources;
  • resource usage summary reporting at job/queue granularity, for deeper insights into your cluster;
  • resource allocation snapshots: quickly check which user, job, or queue is hogging resources for any time period.

As the best Hadoop administration console, Cloudera Manager can fulfill almost all administration requirements, e.g. monitoring overall cluster resource usage, service/host status, etc.

[image: kpi]

In some circumstances, customers need to trace back and check job status within a specific time window. Currently, they usually go to the "yarn application" console:

[image: yarn]

However, only summarized information is available there; it is NOT straightforward to reconstruct what happened at a specific point in time. A better solution is to capture the behavior of a job during its whole lifecycle, index it for searching, and make it available to other analytic applications.

1. Architecture

[image: architecture]

The high-level diagram above illustrates a simple setup. More specifically, the Eagle Eye service retrieves the metrics of all running applications and sends them to a Kafka channel, which in turn passes them to a MorphlineSolrSink in a Flume agent. The morphline parses the messages, converts them into Solr documents, and sends them to the Solr server. Once the indexed documents appear in Solr, dashboards (e.g. in Hue) can be built for visualization. The metrics data can also be sent to Spark (and/or Spark Streaming) for more advanced analysis.

If no Kafka brokers are deployed yet, the Eagle Eye service can also send the metrics to Flume directly (through an HTTP source).
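As a rough illustration of the direct-to-Flume path, the sketch below builds the JSON payload that Flume's default HTTP source handler (JSONHandler) accepts, and POSTs it with plain java.net. The class and helper names are hypothetical, not the actual Eagle Eye sources; the endpoint address matches the http_server sample value used later in this walkthrough.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Sketch only: post one metrics record to a Flume HTTP source.
public class FlumeHttpPublisher {

    // Flume's JSONHandler expects a JSON array of events,
    // each with a "headers" map and a string "body".
    // (Only double quotes are escaped here; a real producer
    // would use a proper JSON library.)
    static String buildFlumeBody(String record) {
        return "[{\"headers\":{},\"body\":\""
                + record.replace("\"", "\\\"") + "\"}]";
    }

    static int post(String endpoint, String payload) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode(); // 200 means Flume accepted the batch
    }

    public static void main(String[] args) throws Exception {
        String csv = "2016-02-04T01:54:08Z,root.hive,1024,1,application_1454403433606_0061,hive";
        System.out.println(buildFlumeBody(csv));
        // post("http://172.31.13.75:4140", buildFlumeBody(csv)); // with a live agent
    }
}
```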

Application metrics (from the Resource Manager REST API) breakdown:

<app>
    <id>application_1326815542473_0001</id>
    <user>user1</user>
    <name>word count</name>
    ...
    <allocatedMB>0</allocatedMB>
    <allocatedVCores>0</allocatedVCores>
    <runningContainers>0</runningContainers>
    <memorySeconds>151730</memorySeconds>
    <vcoreSeconds>103</vcoreSeconds>
</app>

For simplicity, we only extract allocatedMB and allocatedVCores to track the CPU/memory usage of a job.
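A minimal sketch of that extraction, working directly against the XML shape shown above. A production version would use a real XML (or JSON) parser; a regex keeps the idea visible. The class and method names are illustrative, not the actual Eagle Eye code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: pull a numeric field out of one <app> element.
public class AppMetricsParser {

    // Returns the integer between <tag> and </tag>, or -1 if absent.
    static int tagValue(String xml, String tag) {
        Matcher m = Pattern.compile("<" + tag + ">(-?\\d+)</" + tag + ">").matcher(xml);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        String app = "<app><id>application_1326815542473_0001</id>"
                + "<allocatedMB>1024</allocatedMB>"
                + "<allocatedVCores>1</allocatedVCores></app>";
        System.out.println(tagValue(app, "allocatedMB"));     // 1024
        System.out.println(tagValue(app, "allocatedVCores")); // 1
    }
}
```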

2. Prerequisite

Given the requirements discussed above, we need at least Solr, Kafka, and Flume. Fortunately, they are all included in CDH (with professional support), so any Java developer with Cloudera Enterprise at hand can implement this prototype from scratch within a day. All you have to do is write some simple Java code and a few morphline/Flume configurations.

Deploy Kafka

Add the Kafka service in the Cloudera Manager Admin console:

[image: kafka]

Create a topic for Eagle Eye:

bin/kafka/create_topic.sh
After you create the topic, run bin/kafka/list_topics.sh to verify that the topic was created correctly; it should display:
eagleeye

Deploy Solr

Add the Solr service in the Cloudera Manager Admin console:

[image: solr]

bin/solr/create_collection.sh 
create solr collection...
Uploading configs from /tmp/eagleeye_configs/conf to 172.31.13.75:2181/solr. This may take up to a minute.

Later, you may consider purging Solr data for storage/query efficiency via Solr aliases.

Deploy Flume

To get metrics from Kafka, you may use a configuration like:

a1.sinks.k1.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
a1.sinks.k1.morphlineFile = bin/flume/morphline/morphline-json.conf
...
# Use a channel which buffers events in memory
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.parseAsFlumeEvent = false
...
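Pieced together, a complete Kafka-to-Solr agent definition might look like the sketch below. This is an assumption, not the repo's actual flume.conf: the Kafka channel property names (brokerList, zookeeperConnect, topic) follow the Flume 1.6-era conventions shipped with CDH 5.x, and the broker/ZooKeeper addresses are the sample ones used elsewhere in this walkthrough.

```
# Hedged sketch of a full agent definition (Flume 1.6-era property names)
a1.channels = c1
a1.sinks = k1

a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.brokerList = 172.31.13.76:9092,172.31.13.77:9092,172.31.13.78:9092
a1.channels.c1.zookeeperConnect = 172.31.13.75:2181
a1.channels.c1.topic = eagleeye
a1.channels.c1.parseAsFlumeEvent = false

a1.sinks.k1.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
a1.sinks.k1.channel = c1
a1.sinks.k1.morphlineFile = bin/flume/morphline/morphline-json.conf
```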

This Flume agent consumes data from Kafka directly and leverages the morphline to index data into Solr in near real time (NRT). To start Flume agent a1, run the command below:

bin/flume/ng-flume.sh
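For context, the morphline file referenced above (morphline-json.conf) might look roughly like this sketch in Kite Morphlines syntax; the actual file in the repo may differ, and the field names here are simply those visible in the published metrics record shown later in this walkthrough.

```
# Hedged sketch of a JSON-to-Solr morphline, not the repo's actual file
SOLR_LOCATOR : {
  collection : eagleeye
  zkHost : "172.31.13.75:2181/solr"
}
morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**"]
    commands : [
      { readJson {} }
      { extractJsonPaths {
          flatten : true
          paths : {
            id : /id
            applicationId : /applicationId
            user : /user
            queue : /queue
            name : /name
            allocatedMB : /allocatedMB
            allocatedVCores : /allocatedVCores
            timestamp : /timestamp
          }
      } }
      # drop anything the Solr schema does not know about, then index
      { sanitizeUnknownSolrFields { solrLocator : ${SOLR_LOCATOR} } }
      { loadSolr { solrLocator : ${SOLR_LOCATOR} } }
    ]
  }
]
```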

To receive metrics directly from the Eagle Eye service, you may use a configuration like:

a1.sinks.k1.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
a1.sinks.k1.morphlineFile = bin/flume/morphline/morphline-csv.conf
...
a1.sources.r1.type = http
a1.sources.r1.port = 4140

3. Eagle Eye Setup & Demonstration

Let's first look at the properties Eagle Eye uses to configure the app dynamically:

# define the resource manager address from which to retrieve application information
yarn_applications_path=/ws/v1/cluster/apps?states=RUNNING
rm_address=http://172.31.13.75:8088

# define the sampling frequency, the format of metrics, and how to publish metrics
schedule_period=5
producer=http,logger
# producer=kafka,logger
format=csv
# format=json

# define the kafka broker and topic properties
kafka_topic=eagleeye
kafka_broker_list=172.31.13.76:9092,172.31.13.77:9092,172.31.13.78:9092

# define the http server
http_server=http://172.31.13.75:4140
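To show how these properties could drive the polling loop, here is a minimal sketch: it parses the comma-separated producer list and schedules a poll of the RM REST endpoint every schedule_period seconds. Class and method names are illustrative, not the actual Eagle Eye sources.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch only: wire the config properties to a periodic poller.
public class EagleEyeConfigSketch {

    // "producer=http,logger" -> [http, logger]
    static List<String> parseProducers(Properties p) {
        return Arrays.asList(p.getProperty("producer", "logger").split(","));
    }

    public static void main(String[] args) throws Exception {
        Properties p = new Properties();
        p.setProperty("rm_address", "http://172.31.13.75:8088");
        p.setProperty("yarn_applications_path", "/ws/v1/cluster/apps?states=RUNNING");
        p.setProperty("schedule_period", "5");
        p.setProperty("producer", "http,logger");

        long period = Long.parseLong(p.getProperty("schedule_period"));
        String url = p.getProperty("rm_address") + p.getProperty("yarn_applications_path");

        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        Runnable poll = () ->
                System.out.println("would poll " + url + " via " + parseProducers(p));
        // The real service would keep this running indefinitely.
        exec.scheduleAtFixedRate(poll, 0, period, TimeUnit.SECONDS);
        Thread.sleep(200);   // demo only: let the first poll fire
        exec.shutdownNow();  // then stop so the sketch terminates
    }
}
```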

By now, the environment for Eagle Eye is all set, and you can start the app with the following command:

bin/eagleeye.sh 
16/02/04 01:40:28 INFO solutions.Main: Eagle eye starts monitoring ...

Eagle Eye is good to go. You can run bin/job-submit.sh to submit several jobs on behalf of different users, e.g. hue, hive, and impala. Every 5 seconds, Eagle Eye pulls information from the Resource Manager, parses it, and publishes the final metrics to Kafka. During this period, you may see console logs like:

16/02/04 01:54:09 INFO solutions.Publisher: [{"timestamp":"2016-02-04T01:54:08Z","queue":"root.hive","allocatedMB":1024,"allocatedVCores":1,"applicationId":"application_1454403433606_0061","user":"hive","name":"QuasiMonteCarlo","id":"application_1454403433606_0061$1454550848874"}]

To visualize the metrics, we use Hue, one of the easiest ways to build custom dashboards:

[image: hue]

Since the data are all in CDH and accessible via standard/open protocols, you can easily retrieve them and build more dedicated dashboards, e.g. using the d3 library.

[image: d3]

4. References

Flume MorphlineSolrSink: https://flume.apache.org/FlumeUserGuide.html
Cloudera Distribution of Apache Kafka: http://www.cloudera.com/documentation/kafka/latest/topics/kafka_packaging.html
YARN RM REST API: https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html

Contributors

yeleid
