Attribute Error about hadoop-job-analyzer (CLOSED)

ayuk23 commented on May 30, 2024
Attribute Error

from hadoop-job-analyzer.

Comments (7)

harelba commented on May 30, 2024

Yes, it's a bug that I had already fixed but hadn't pushed yet.

It's caused by a missing metric name prefix. Adding -n aaa.bbb to the command line works around it.

However, I just pushed the fix for it.

Thanks for noticing.
Harel

ayuk23 commented on May 30, 2024

That did the trick. Thanks! Also, this may be a silly question, but is there a way to run aggregations and select only specific parameters? For example, aggregate over all users and report only the total duration, failed reducers, and total maps?

harelba commented on May 30, 2024

Great.

I didn't think about providing such filtering, but it can be easily added. I'll try to add support for this (e.g. some kind of parameter for filtering metrics according to a regexp), and update here when I push it.

Harel

ayuk23 commented on May 30, 2024

So how is it that you are able to graph only single metric fields like spilled record count per selected cross region? Is that something that graphite handles? Also, I know that graphitus is included, but do we also have to download graphite ourselves?

harelba commented on May 30, 2024

hadoop-job-analyzer outputs metrics to graphite, broken down ("grouped") by what you request in the -p parameter. The metrics are output in a format that allows smart querying by graphite. Try running the tool using -C stdout instead of -C graphite and you'll be able to see the metric names.

Part of the metric names is the "group by" values. For example, if you request -p SUBMIT_HOST/USER, then the metric name will be something like:

<prefix>.projections.SUBMIT_HOST-USER.<submit-host>.<user>.<metric-name>.<value>

Graphite is very powerful at querying and dissecting the data once it's there. It can group by any part of the metric name and hold other parts of the metric name "constant" (such as the spilled record metric name). Look at the groupByNode() function in the Graphite API documentation.
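To make the grouping idea concrete, here is a minimal Python sketch that mimics locally what grouping by one node of the metric name means. It is not Graphite's implementation: the function name group_by_node, the sample values, the user "bob", and the choice of a sum as the aggregate are all assumptions made for illustration. It also assumes that no node (host name, user name) contains a dot. Nodes are counted from 0, so in a name starting with aaa.bbb.projections.SUBMIT_HOST-USER the submit host is node 4 and the user is node 5.

```python
from collections import defaultdict


def group_by_node(samples, node_index):
    """Sum the values of all metrics that share the same dot-separated node.

    samples: iterable of (metric_name, value) pairs.
    node_index: 0-based position of the node to group by.
    """
    totals = defaultdict(float)
    for name, value in samples:
        nodes = name.split(".")
        totals[nodes[node_index]] += value
    return dict(totals)


# Hypothetical sample values; only the name layout follows the tool's output.
samples = [
    ("aaa.bbb.projections.SUBMIT_HOST-USER.machineA.diana.FAILED_REDUCES.value", 1.0),
    ("aaa.bbb.projections.SUBMIT_HOST-USER.machineB.diana.FAILED_REDUCES.value", 2.0),
    ("aaa.bbb.projections.SUBMIT_HOST-USER.machineB.bob.FAILED_REDUCES.value", 4.0),
]

by_user = group_by_node(samples, 5)  # group by the user node
by_host = group_by_node(samples, 4)  # group by the submit-host node
print(by_user)
print(by_host)
```

In Graphite itself, the corresponding query would be something along the lines of groupByNode(aaa.bbb.projections.SUBMIT_HOST-USER.*.*.FAILED_REDUCES.value, 5, "sumSeries"), which keeps the metric name fixed and groups the matching series by user.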

Example metrics when running ./hadoop-job-analyzer -f example-history-folder/ -n aaa.bbb -C stdout -p SUBMIT_HOST/USER (the -C stdout parameter tells the tool to write the metrics to stdout instead of sending them to graphite):

Metric - name is aaa.bbb.projections.SUBMIT_HOST-USER.machineA.diana.MAP_COUNTERS.org_apache_hadoop_mapred_Task__Counter.COMBINE_INPUT_RECORDS.value value is 0.000 timestamp is 1368598500
Metric - name is aaa.bbb.projections.SUBMIT_HOST-USER.machineA.diana.COUNTERS.org_apache_hadoop_mapred_Task__Counter.REDUCE_SHUFFLE_BYTES.value value is 0.000 timestamp is 1368598500
Metric - name is aaa.bbb.projections.SUBMIT_HOST-USER.machineB.diana.MAP_COUNTERS.org_apache_hadoop_mapred_Task__Counter.MAP_OUTPUT_BYTES.value value is 0.000 timestamp is 1368598500
Metric - name is aaa.bbb.projections.SUBMIT_HOST-USER.machineB.diana.FAILED_REDUCES.value value is 0.000 timestamp is 1368598500
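If only a few of these metrics are of interest (for example FAILED_REDUCES), one option is to post-process the stdout lines. The Python sketch below assumes every line has exactly the "Metric - name is ... value is ... timestamp is ..." shape shown above. The function names parse_metric_line and select_metrics and the regular expressions are illustrative and not part of hadoop-job-analyzer.

```python
import re

# Assumed line layout, taken from the example output above.
LINE_RE = re.compile(
    r"^Metric - name is (?P<name>\S+) value is (?P<value>\S+) timestamp is (?P<ts>\d+)$"
)


def parse_metric_line(line):
    """Return (name, value, timestamp), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group("name"), float(match.group("value")), int(match.group("ts"))


def select_metrics(lines, pattern):
    """Keep only the parsed metrics whose name matches the given regexp."""
    wanted = re.compile(pattern)
    parsed = (parse_metric_line(line) for line in lines)
    return [p for p in parsed if p is not None and wanted.search(p[0])]


output = [
    "Metric - name is aaa.bbb.projections.SUBMIT_HOST-USER.machineB.diana.MAP_COUNTERS.org_apache_hadoop_mapred_Task__Counter.MAP_OUTPUT_BYTES.value value is 0.000 timestamp is 1368598500",
    "Metric - name is aaa.bbb.projections.SUBMIT_HOST-USER.machineB.diana.FAILED_REDUCES.value value is 0.000 timestamp is 1368598500",
]

selected = select_metrics(output, r"\.FAILED_REDUCES\.value$")
for name, value, timestamp in selected:
    print(name, value, timestamp)
```

Lines that do not match the assumed layout are skipped rather than raising an error, so other output from the tool would not break the filter.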

You do need to install graphite on your own, though. It's not part of hadoop-job-analyzer. However, maybe I'll try to provide a script for downloading and auto-installing a simple graphite installation in order to ease the initial overhead in cases where the user doesn't have graphite already installed.

ayuk23 commented on May 30, 2024

Thank you for all of the support. It has been incredibly helpful. My (hopefully) last question is if there is anything that must be done differently for YARN. Thanks in advance!

harelba commented on May 30, 2024

No problem. Glad it helps.

I'm not sure about YARN. I haven't played with it in that regard, and we don't have a production cluster running YARN yet. I do believe that some changes will be needed in order to support it, since YARN has a separate JobHistoryServer which collects and manages historical data.

Harel
