
hadoop-job-analyzer's Introduction

Example Graphs using Graphite and Graphitus

  • File bytes read per logical group
  • Job counts per submitting host
  • Job counts per user
  • Job durations per submitting host
  • Failed maps per logical group
  • Map tasks per user
  • Reduce bytes per jobname tag
  • Spilled record count per user

Hadoop Job Analyzer

Hadoop job analyzer is a tool that analyzes Hadoop jobs, aggregates the information according to user-specified cross-sections, and sends the output to a metrics backend for visualization and analysis.

The tool is non-intrusive and works by analyzing the history folder on the JobTracker.

Job Analysis

The tool analyzes and exposes all the information about each job. This ranges from mapper/reducer counts (including failure counts), processing durations, and input/output record counts and bytes, up to dynamic counters created by higher layers (such as Hive-level counters), and even counters computed by the tool itself, such as the time between job submission and the actual start of execution.

All the data for each job is broken down into "fields", each containing one piece of information about the job. For example, SUBMIT_TIME contains the job submission time, USER contains the name of the user running the job, and SOURCE_HOST contains the name of the host that submitted the job.

The tool also supports extracting job metadata from the job name using regular expressions. The extracted metadata becomes part of the job information, and the user can aggregate on it just like on any other field. See below for examples.
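As a rough illustration, a regular expression with named groups can pull metadata out of a job name. This is only a sketch; the job name convention and field names below (GROUP, NAME, TAG) are made up for the example and are not the tool's actual convention:

  import re

  # Hypothetical job name convention: "<group>.<name>.<tag>".
  job_name = "billing.daily-rollup.prod"
  match = re.match(r"(?P<GROUP>[^.]+)\.(?P<NAME>[^.]+)\.(?P<TAG>[^.]+)$", job_name)
  if match:
      metadata = match.groupdict()
      # {'GROUP': 'billing', 'NAME': 'daily-rollup', 'TAG': 'prod'} - these
      # extracted fields can then be projected on, just like USER or SOURCE_HOST.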

Aggregation

The tool provides intrinsic aggregation capabilities. The aggregation is done on two levels simultaneously (a sketch of both levels follows the list):

  • Time-based aggregation - Every job is put into a "time bucket", which is calculated from one of the time-related fields of the job. By default, the time bucket is calculated at 1-minute intervals based on the job submission time.

  • Field-based aggregation - The user can ask for multiple views or "projections" of the data, based on field values. The most common projections are SOURCE_HOST and USER, which provide views based on the submitting host and the requesting user, respectively. The tool supports an arbitrary number of projections of the same data.
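The following is a minimal sketch of how the two levels combine, assuming simplified job records. The field names match the ones described above, but the code is illustrative rather than the tool's actual implementation:

  from collections import defaultdict

  jobs = [
      {"SUBMIT_TIME": 1371121815, "USER": "alice", "SOURCE_HOST": "gw1", "MAP_TASKS": 40},
      {"SUBMIT_TIME": 1371121850, "USER": "bob",   "SOURCE_HOST": "gw1", "MAP_TASKS": 10},
      {"SUBMIT_TIME": 1371121855, "USER": "alice", "SOURCE_HOST": "gw2", "MAP_TASKS": 5},
  ]

  BUCKET_SIZE = 60                       # default: 1-minute time buckets
  projections = ["USER", "SOURCE_HOST"]  # the requested field-based views

  totals = defaultdict(int)
  for job in jobs:
      # Time-based aggregation: round the submission time down to its bucket.
      bucket = job["SUBMIT_TIME"] // BUCKET_SIZE * BUCKET_SIZE
      # Field-based aggregation: one series per projection value per bucket.
      for field in projections:
          totals[(field, job[field], bucket, "MAP_TASKS")] += job["MAP_TASKS"]

  # e.g. totals[("USER", "alice", 1371121800, "MAP_TASKS")] == 45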

Metric Sending

Ultimately, the tool is meant for exposing the job information in a usable way to a metric backend, for analysis, trending and troubleshooting purposes.

Currently, the tool supports sending metrics to Graphite and to stdout. Additional metric backends can be supported through a simple plugin mechanism.
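For reference, Graphite's plaintext protocol is just a "metric.path value timestamp" line sent over TCP (port 2003 by default), so a metric backend essentially has to do something like the following sketch; the host name and metric path here are placeholders:

  import socket
  import time

  # "graphite.example.com" is a placeholder for your Graphite host;
  # 2003 is Graphite's default plaintext-protocol port.
  sock = socket.create_connection(("graphite.example.com", 2003))
  line = "hja.USER.alice.MAP_TASKS 45 %d\n" % int(time.time())
  sock.sendall(line.encode("ascii"))
  sock.close()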

A Graphitus dashboard for the metrics sent to Graphite is also included, for easy visualization of the data the tool sends. If you don't know Graphitus, you should.

To use the Graphitus dashboard, see the README in the graphitus dashboard folder.

The example graphs above were created with the help of Graphitus.

Highlights

  • Full slicing-and-dicing support - allows pinpointing exact changes in behavior over time
  • Provides insight into all internal job metrics
  • Proven in production - has run on three 300-node production clusters for over a year
  • Allows extending job metadata through the job name itself, using a job name convention
  • Supports both CDH3 and CDH4 Hadoop versions
  • Additional metric backends can easily be added if needed

Installation

The current stable version is 0.8. To install it, follow these steps:

  1. Extract the tar.gz to a folder on the job tracker.

  2. Run pip install cElementTree.

  3. Create a script that runs the tool with your selected parameters. You can use the -l flag and -C stdout while creating the script, in order to see the output before actually productizing it. See usage for details.

  4. Run this script from cron at regular intervals (e.g. every hour); an example wrapper script and cron entry are sketched below.
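For example, a wrapper script and cron entry might look like the following. The paths and server name are placeholders to adjust for your cluster; the flags themselves (-f, -r, -C, -P) are the ones used throughout this README:

  #!/bin/bash
  # run-hja.sh - hypothetical wrapper script; adjust paths for your cluster.
  cd /opt/hadoop-job-analyzer
  ./hadoop-job-analyzer -f /var/log/hadoop/history -r -C graphite -P server=graphite.example.com,port=2003

  # crontab entry, running the script hourly:
  # 0 * * * * /opt/hadoop-job-analyzer/run-hja.sh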

Installation through RPM/Debian is under way.

Usage

The tool needs to run periodically on the job tracker, using a set of parameters to control its behavior.

Look at the Usage Page for details.

Examples

Several examples are provided here.

Relaxed Mode

By default, the tool stops if there is an analysis problem. You can use the -r (relaxed mode) flag to make the tool continue to the next job even in the face of errors. Metrics are sent on job analysis failures and on job name parsing failures.
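For example, a relaxed run that prints its metrics to stdout for inspection (the history folder path is a placeholder):

  ./hadoop-job-analyzer -f /var/log/hadoop/history -r -l -C stdout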

Some of these problems stem from minor differences between the various Hadoop cluster versions and can usually be fixed quickly, so I would appreciate any feedback about such issues, and will fix them promptly.

Metadata Naming Convention and Measurement Units

See here for details.

Metric backend plugins

Read here on the metric backend plugins.

Logging

The tool writes a rotated log file to the logs/ folder under the location of the tool itself.

Note that this doesn't affect the stdout metric plugin, which writes directly to stdout regardless of the logging.

Limitations

  • Metric names are currently fully processed by the infrastructure and not by the metric plugin. This might change in the future.

  • Currently, all aggregations are sum() aggregations. If needed, other aggregation types will be added, possibly by providing .AVG and .SUM versions of the metrics.

Contact

Any feedback would be much appreciated, as well as pull requests, of course.

Harel Ben-Attia, [email protected], @harelba on Twitter

hadoop-job-analyzer's People

Contributors

harelba, jeromeserrano


hadoop-job-analyzer's Issues

Parsing fails if "JOB_TRACKER_START_TIME" is empty

For whatever reason, that value is empty on our cluster. The parser fails with:

There was an error during job analysis. Consider using relaxed flag if needed.

Traceback (most recent call last):
  File "./hadoop-job-analyzer", line 900, in <module>
    parser.parse()
  File "./hadoop-job-analyzer", line 636, in parse
    self.maximum_jobname_metadata_length)
  File "./hadoop-job-analyzer", line 167, in __init__
    self.analyze()
  File "./hadoop-job-analyzer", line 174, in analyze
    self.convert_times_to_seconds()
  File "./hadoop-job-analyzer", line 184, in convert_times_to_seconds
    self.data[k] = float(self.data[k] / 1000.0)
TypeError: unsupported operand type(s) for /: 'NoneType' and 'float'

it can't run

My Hadoop cluster version is Apache 1.0.3. Running hadoop-job-analyzer returns an error; the message is:

There was an error during job analysis. Consider using relaxed flag if needed.

Traceback (most recent call last):
  File "/home/shidongjie/hadoop-job-analyzer-master/hadoop-job-analyzer", line 755, in ?
    parser.parse()
  File "/home/shidongjie/hadoop-job-analyzer-master/hadoop-job-analyzer", line 514, in parse
    jhp = JobHistoryProvider(self.history_folder,self.relaxed)
  File "/home/shidongjie/hadoop-job-analyzer-master/hadoop-job-analyzer", line 337, in __init__
    self.analyze()
  File "/home/shidongjie/hadoop-job-analyzer-master/hadoop-job-analyzer", line 353, in analyze
    jcf = JobConfFilename(conf_filename)
  File "/home/shidongjie/hadoop-job-analyzer-master/hadoop-job-analyzer", line 276, in __init__
    self.analyze_filename()
  File "/home/shidongjie/hadoop-job-analyzer-master/hadoop-job-analyzer", line 280, in analyze_filename
    self.jt,self.jt_start_ts,_,self.job_ts,self.job_number,_ = base_name.split("_",5)
ValueError: need more than 4 values to unpack

My Python version is 2.4.

AttributeError: 'NoneType' object has no attribute 'replace'

When I run hadoop-job-analyzer, an exception occurs:

./hadoop-job-analyzer -f hadoop-0.20/ -l

Traceback (most recent call last):
  File "./hadoop-job-analyzer", line 712, in <module>
    metric_name_prefix = materialize_metric_name_prefix(options.metric_name_prefix)
  File "./hadoop-job-analyzer", line 589, in materialize_metric_name_prefix
    result = metric_name_prefix.replace("${HOSTNAME}",HOSTNAME)
AttributeError: 'NoneType' object has no attribute 'replace'

Please help me solve this problem, harelba. Thanks, waiting for your answer.

Attribute Error

Hello Harel,

When running one of your sample tests, ./hadoop-job-analyzer -f example-history-folder/ -l -s 2, on the example-job-history-folder that comes with the hja installation, I get attribute errors and I cannot explain them. Do you have any clue why this might be happening?

Data not being plotted in graphite

Hi,

I was trying to run this tool and submit the data to Graphite using the command:
./hadoop-job-analyzer -f /root/hadoop-job-analyzer/data/18 -r -l -C graphite -P server=IP,port=2003

I have my Graphite server up and running on port 2003, but I see only system metrics in Graphite and nothing from the parsed log info.

I have also tried with "server=graphite", as mentioned in the README in graphitus-dashboard.
