Comments (7)
Yes, it's a bug that I had fixed but hadn't pushed yet.
The metric name prefix was missing. Adding -n aaa.bbb to the command line works around it.
However, I have just pushed the fix.
Thanks for noticing.
Harel
from hadoop-job-analyzer.
That did the trick. Thanks! Also, this may be a silly question, but is there a way to run aggregations and select only specific parameters? For example, aggregating over all users, but only for the total duration, failed reducers, and total maps?
from hadoop-job-analyzer.
Great.
I didn't think about providing such filtering, but it can be easily added. I'll try to add support for this (e.g. some kind of parameter for filtering metrics according to a regexp), and update here when I push it.
Harel
from hadoop-job-analyzer.
So how is it that you are able to graph only single metric fields like spilled record count per selected cross region? Is that something that graphite handles? Also, I know that graphitus is included, but do we also have to download graphite ourselves?
from hadoop-job-analyzer.
hadoop-job-analyzer outputs metrics to graphite, broken down ("grouped") by what you request in the -p parameter. The metrics are output in a format that allows smart querying by graphite. Try running the tool using -C stdout instead of -C graphite and you'll be able to see the metric names.
Part of each metric name is the "group by" values. For example, if you request -p SUBMIT_HOST/USER, then the metric name will be something like:
<prefix>.projections.SUBMIT_HOST-USER.<submit-host>.<user>.<metric-name>.<value>
Graphite is very powerful at querying and dissecting the data once it's there. It can group by any part of the metric name and hold other parts of the metric name "constant" (such as the spilled record metric name). Look at the groupByNode() function in the graphite API documentation.
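To make the groupByNode() idea concrete, here is a minimal sketch that builds a Graphite target string for the naming scheme described above. The helper name group_by_node_target is hypothetical, and the sketch assumes that metric names end in a literal ".value" and that node indices are counted from zero starting at the first part of the prefix. The "sumSeries" callback is only one possible choice.

```python
def group_by_node_target(prefix, projection, metric, group_field, callback="sumSeries"):
    """Build a Graphite target that groups one metric by a single "group by" field.

    Hypothetical helper: it assumes names of the form
    <prefix>.projections.<FIELD1-FIELD2>.<value1>.<value2>.<metric>.value
    """
    fields = projection.split("/")
    # One wildcard per "group by" value, so all hosts/users are matched.
    wildcards = ".".join("*" for _ in fields)
    pattern = f"{prefix}.projections.{'-'.join(fields)}.{wildcards}.{metric}.value"
    # Nodes before the group values: the prefix parts, "projections", and the projection name.
    node = len(prefix.split(".")) + 2 + fields.index(group_field)
    return f'groupByNode({pattern},{node},"{callback}")'


# Failed reducers per user, summed over all submit hosts.
target = group_by_node_target("aaa.bbb", "SUBMIT_HOST/USER", "FAILED_REDUCES", "USER")
print(target)
```

The resulting string can be pasted into the Graphite composer or passed as the target parameter of a render request. Here the metric name is held constant while the submit host is wildcarded away and the user node is kept.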
Example metrics when running ./hadoop-job-analyzer -f example-history-folder/ -n aaa.bbb -C stdout -p SUBMIT_HOST/USER (the -C stdout parameter tells the tool to output metrics to stdout instead of to graphite):
Metric - name is aaa.bbb.projections.SUBMIT_HOST-USER.machineA.diana.MAP_COUNTERS.org_apache_hadoop_mapred_Task__Counter.COMBINE_INPUT_RECORDS.value value is 0.000 timestamp is 1368598500
Metric - name is aaa.bbb.projections.SUBMIT_HOST-USER.machineA.diana.COUNTERS.org_apache_hadoop_mapred_Task__Counter.REDUCE_SHUFFLE_BYTES.value value is 0.000 timestamp is 1368598500
Metric - name is aaa.bbb.projections.SUBMIT_HOST-USER.machineB.diana.MAP_COUNTERS.org_apache_hadoop_mapred_Task__Counter.MAP_OUTPUT_BYTES.value value is 0.000 timestamp is 1368598500
Metric - name is aaa.bbb.projections.SUBMIT_HOST-USER.machineB.diana.FAILED_REDUCES.value value is 0.000 timestamp is 1368598500
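Until metric filtering exists in the tool itself, output like the lines above can be post-processed outside of it. The following is a rough sketch, not part of hadoop-job-analyzer: it assumes the stdout line format is exactly as in the example lines, and the function names (parse_metric_line, sum_by_node) are made up for illustration. It keeps only metrics whose name matches a regexp and sums them by one node of the metric name, similar in spirit to what groupByNode() does inside graphite.

```python
import re
from collections import defaultdict

# Assumed format, taken from the example output lines.
LINE_RE = re.compile(r"^Metric - name is (\S+) value is (\S+) timestamp is (\d+)$")


def parse_metric_line(line):
    """Return (name, value, timestamp), or None if the line is not a metric line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group(1), float(match.group(2)), int(match.group(3))


def sum_by_node(lines, name_regexp, node):
    """Sum the values of metrics matching name_regexp, keyed by one zero-based name node."""
    wanted = re.compile(name_regexp)
    totals = defaultdict(float)
    for line in lines:
        parsed = parse_metric_line(line)
        if parsed is None:
            continue  # skip anything that is not a metric line
        name, value, _timestamp = parsed
        if wanted.search(name):
            totals[name.split(".")[node]] += value
    return dict(totals)


sample = [
    "Metric - name is aaa.bbb.projections.SUBMIT_HOST-USER.machineA.diana.COUNTERS.org_apache_hadoop_mapred_Task__Counter.REDUCE_SHUFFLE_BYTES.value value is 0.000 timestamp is 1368598500",
    "Metric - name is aaa.bbb.projections.SUBMIT_HOST-USER.machineB.diana.FAILED_REDUCES.value value is 0.000 timestamp is 1368598500",
]

# Node 5 is the user when the prefix is aaa.bbb and the projection is SUBMIT_HOST/USER.
per_user = sum_by_node(sample, r"\.FAILED_REDUCES\.value$", node=5)
print(per_user)
```

To run this against real output, the sample list could be replaced with sys.stdin and the tool's -C stdout output piped into the script.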
You do need to install graphite on your own, though. It's not part of hadoop-job-analyzer. However, maybe I'll try to provide a script for downloading and auto-installing a simple graphite installation in order to ease the initial overhead in cases where the user doesn't have graphite already installed.
from hadoop-job-analyzer.
Thank you for all of the support. It has been incredibly helpful. My (hopefully) last question is if there is anything that must be done differently for YARN. Thanks in advance!
from hadoop-job-analyzer.
No problem. Glad it helps.
I'm not sure about YARN. I haven't played with it in that regard, and we don't have a production cluster running YARN yet. I do believe that some changes will be needed in order to support it, since YARN has a separate JobHistoryServer which collects and manages historical data.
Harel
from hadoop-job-analyzer.