
jtune's Introduction

JTune - a high precision Java CMS optimizer

NOTE

Version 4.0 removes Python 2 compatibility and is now Python 3 only. There are no feature additions. If you're running under a Python 2 environment, you must run a version of JTune prior to v4.

Overview

JTune is a tool that helps you tune and troubleshoot a running JVM (Java 6 through Java 8) without restarting it. It currently doesn't work with the G1 garbage collector, and will error out if G1 is detected. Tuning is based on two metrics: the aggregate time spent doing GCs, and the standard deviation of the GC pause times. Upon invocation, JTune captures the output of jstat for the given pid, as well as the GC log data, during the sample period.
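The two tuning metrics can be sketched in a few lines of Python; the pause times below are made-up values for illustration, not JTune output:

```python
import statistics

# Hypothetical young-GC pause times (seconds) scraped from a GC log
ygc_pauses = [0.017, 0.026, 0.051, 0.022, 0.031]

aggregate_gc_time = sum(ygc_pauses)          # total time spent doing GCs
pause_stdev = statistics.stdev(ygc_pauses)   # consistency of pause times

# A well-tuned JVM drives both numbers down: little total GC time,
# and pauses that are predictably similar in length.
```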

Installation

The easiest way to install JTune is via pip. A pip install will fetch the latest version of JTune from the PyPI repo.

pip install jtune

The latest development branch can be installed via setuptools by performing the following:

git clone https://github.com/linkedin/JTune.git
cd JTune
python setup.py install

If you wish to run JTune as a one-time action, all JTune logic is present in a single file (jtune.py); simply copying jtune.py to your target host will work as well.

It is recommended that you install JTune into a Python virtualenv.

Options Help

In normal use, run JTune with the -p <pid> parameter, and it will run indefinitely. When you are ready, hit CTRL-C, and detailed information about its findings will be printed, along with recommendations for improvement (if any).

There are additional options you can take advantage of. See the output below:

$ jtune -h
usage: jtune [-h] [-o OPTIMIZE] [-P] [-s FGC_STOP_COUNT]
                 [-y YGC_STOP_COUNT] [-c STOP_COUNT] [-n] (-r [FILE] | -p PID)

Run jstat w/ analytics

optional arguments:
  -h, --help            show this help message and exit
  -o OPTIMIZE, --optimize OPTIMIZE
                        Optimize for latency or throughput (range 0-11, 0 =
                        ygc @ 180/min, 11 = ygc @ 1/min). Floats allowed.
  -P, --no-paste        Don't save the screen output to the paste service
  -s FGC_STOP_COUNT, --fgc-stop-count FGC_STOP_COUNT
                        How many full gcs should happen before I stop (very
                        important for analytics)
  -y YGC_STOP_COUNT, --ygc-stop-count YGC_STOP_COUNT
                        How many young gcs should happen before I stop
  -c STOP_COUNT, --stop-count STOP_COUNT
                        How many iterations of jstat to run before stopping
  -n, --no-jstat-output
                        Do not show jstat output - only print summary
  -p PID, --pid PID     Which java PID should I attach to
  • You can also have it stop after X number of YGCs, FGCs, or jstat iterations (-y, -s, -c respectively). If you want it to make tuning suggestions, you'll want to let it run for at least 3 FGCs (-s <#>) before exiting.
  • There may be cases where you want JTune to optimize for a given number of CMS GCs; you can do this with the '-o #' parameter. Right now you can specify a value between 0 and 11, which corresponds to a range from 180 CMS GCs/min down to 1 CMS GC/min. In most cases you can leave it at the default. The way this parameter is used will likely change.

Command Output

Here's an example of a JTune run for a test instance, broken up into chunks.

  • JTune runs against PID 25815 for 40 iterations, then exits and reports its findings:
$ jtune -c 40 -p 25815
#####
# Start Time:  2015-03-23 12:31:45.079102 GMT
# Host:        fake-host.linkedin.com
#####
   EC      EP      EU  S0C/S1C     S0U     S1U      OC      OP      OU     MC     MU    YGC  YGCD  FGC  FGCD
~~~~~~  ~~~~~~  ~~~~~~  ~~~~~~~  ~~~~~~  ~~~~~~  ~~~~~~  ~~~~~~  ~~~~~~  ~~~~~  ~~~~~  ~~~~~  ~~~~  ~~~  ~~~~
  1.1G   96.0%    1.1G   117.3M   76.5M      0K   13.6G   50.6%    6.9G    90M  88.2M  71876     -  138     -
  1.1G   28.0%    329M   117.3M      0K    9.5M   13.6G   50.7%    6.9G    90M  88.2M  71877    +1  138     -
  1.1G   51.5%  604.7M   117.3M      0K    9.5M   13.6G   50.7%    6.9G    90M  88.2M  71877     -  138     -
  1.1G   55.1%  646.4M   117.3M      0K    9.5M   13.6G   50.7%    6.9G    90M  88.2M  71877     -  138     -
  1.1G   74.5%  874.6M   117.3M      0K    9.5M   13.6G   50.7%    6.9G    90M  88.2M  71877     -  138     -
  1.1G   88.1%      1G   117.3M      0K    9.5M   13.6G   50.7%    6.9G    90M  88.2M  71877     -  138     -
  1.1G   92.5%    1.1G   117.3M      0K    9.5M   13.6G   50.7%    6.9G    90M  88.2M  71877     -  138     -
  1.1G   15.6%  182.8M   117.3M     17M      0K   13.6G   50.7%    6.9G    90M  88.2M  71878    +1  138     -
  1.1G   50.3%  589.7M   117.3M     17M      0K   13.6G   50.7%    6.9G    90M  88.2M  71878     -  138     -
  1.1G   60.8%  713.8M   117.3M     17M      0K   13.6G   50.7%    6.9G    90M  88.2M  71878     -  138     -
  1.1G   65.4%  767.2M   117.3M     17M      0K   13.6G   50.7%    6.9G    90M  88.2M  71878     -  138     -
  1.1G   66.0%  774.7M   117.3M     17M      0K   13.6G   50.7%    6.9G    90M  88.2M  71878     -  138     -
  1.1G   78.6%  922.5M   117.3M     17M      0K   13.6G   50.7%    6.9G    90M  88.2M  71878     -  138     -
  1.1G    5.1%   59.7M   117.3M      0K   14.4M   13.6G   50.7%    6.9G    90M  88.2M  71879    +1  138     -
  1.1G   28.5%    335M   117.3M      0K   14.4M   13.6G   50.7%    6.9G    90M  88.2M  71879     -  138     -
  1.1G   63.3%  742.9M   117.3M      0K   14.4M   13.6G   50.7%    6.9G    90M  88.2M  71879     -  138     -
  1.1G   67.5%  791.6M   117.3M      0K   14.4M   13.6G   50.7%    6.9G    90M  88.2M  71879     -  138     -
  1.1G   74.2%  870.4M   117.3M      0K   14.4M   13.6G   50.7%    6.9G    90M  88.2M  71879     -  138     -
  1.1G    3.2%   38.1M   117.3M   48.6M      0K   13.6G   50.7%    6.9G    90M  88.2M  71880    +1  138     -
  1.1G    8.9%  104.2M   117.3M   48.6M      0K   13.6G   50.7%    6.9G    90M  88.2M  71880     -  138     -
  1.1G   27.9%  327.2M   117.3M   48.6M      0K   13.6G   50.7%    6.9G    90M  88.2M  71880     -  138     -
  1.1G   29.5%  346.1M   117.3M   48.6M      0K   13.6G   50.7%    6.9G    90M  88.2M  71880     -  138     -
  1.1G   35.2%  413.6M   117.3M   48.6M      0K   13.6G   50.7%    6.9G    90M  88.2M  71880     -  138     -
  1.1G   53.5%  628.1M   117.3M   48.6M      0K   13.6G   50.7%    6.9G    90M  88.2M  71880     -  138     -
  1.1G    1.0%   12.1M   117.3M      0K   47.3M   13.6G   50.7%    6.9G    90M  88.2M  71881    +1  138     -
  1.1G   19.2%  225.7M   117.3M      0K   47.3M   13.6G   50.7%    6.9G    90M  88.2M  71881     -  138     -
  1.1G   72.6%  852.3M   117.3M      0K   47.3M   13.6G   50.7%    6.9G    90M  88.2M  71881     -  138     -
  1.1G   79.5%  933.1M   117.3M      0K   47.3M   13.6G   50.7%    6.9G    90M  88.2M  71881     -  138     -
  1.1G    5.6%     66M   117.3M   53.3M      0K   13.6G   50.7%    6.9G    90M  88.2M  71882    +1  138     -
  1.1G   36.2%  424.9M   117.3M   53.3M      0K   13.6G   50.7%    6.9G    90M  88.2M  71882     -  138     -
  1.1G   47.5%  557.2M   117.3M   53.3M      0K   13.6G   50.7%    6.9G    90M  88.2M  71882     -  138     -
  1.1G   57.0%  669.3M   117.3M   53.3M      0K   13.6G   50.7%    6.9G    90M  88.2M  71882     -  138     -
  1.1G   66.9%  785.5M   117.3M   53.3M      0K   13.6G   50.7%    6.9G    90M  88.2M  71882     -  138     -
  1.1G   87.1% 1022.3M   117.3M   53.3M      0K   13.6G   50.7%    6.9G    90M  88.2M  71882     -  138     -
  1.1G   99.8%    1.1G   117.3M   53.3M      0K   13.6G   50.7%    6.9G    90M  88.2M  71882     -  138     -
  1.1G   36.7%  430.6M   117.3M      0K     78M   13.6G   50.7%    6.9G    90M  88.2M  71883    +1  138     -
  1.1G   78.1%  915.9M   117.3M      0K     78M   13.6G   50.7%    6.9G    90M  88.2M  71883     -  138     -
  1.1G   19.7%  231.4M   117.3M  117.3M      0K   13.6G   50.9%    6.9G    90M  88.2M  71884    +1  138     -
  1.1G   68.5%  804.3M   117.3M  117.3M      0K   13.6G   50.9%    6.9G    90M  88.2M  71884     -  138     -
  1.1G   87.2% 1022.7M   117.3M  117.3M      0K   13.6G   50.9%    6.9G    90M  88.2M  71884     -  138     -

* Reading gc.log file... done. Scanned 45 lines in 0.0001 seconds.
* Reading the public access log file... done. Scanned 169 lines in 0.0014 seconds.
  • When it exits, this first section gives useful meta information about the process, information about the requests that are coming into it, GC allocation/promotion rates, and survivor death rates.
Meta:
-----
Sample Time:    40 seconds
System Uptime:  1046d18h
CPU Uptime:     25122d21h
Proc Uptime:    6d23h
Proc Usertime:  3d15h (0.01%)
Proc Systime:   8h28m (0.00%)
Proc RSS:       36.95G
Proc VSize:     55.46G
Proc # Threads: 771

YG Allocation Rates*:
---------------------
per sec (min/mean/max):     177.37M/s     214.35M/s     311.97M/s
per day (min/mean/max):      14.61T/d      17.66T/d      25.71T/d

OG Promotion Rates:
-------------------
per sec (min/mean/max):     326.80K/s       6.66M/s      27.12M/s
per hr (min/mean/max):        1.12G/h      23.43G/h      95.33G/h

Survivor Death Rates:
---------------------
Lengths (min/mean/max): 1/1.8/2
Death Rate Breakdown:
   Age 1:  4.6% / 63.9% / 95.2% / 36.1% (min/mean/max/cuml alive %)
   Age 2:  0.0% / 21.3% / 79.2% / 28.4% (min/mean/max/cuml alive %)

GC Information:
---------------
YGC/FGC Count: 8/0 (Rate: 12.00/min, 0.00/min)

GC Load (since JVM start): 0.39%
Sample Period GC Load:     0.51%

CMS Sweep Times: 0.000s /  0.000s /  0.000s / 0.00 (min/mean/max/stdev)
YGC Times:       17ms / 26ms / 51ms / 11.04 (min/mean/max/stdev)
FGC Times:       0ms / 0ms / 0ms / 0.00 (min/mean/max/stdev)
Agg. YGC Time:   205ms
Agg. FGC Time:   0ms

Est. Time Between FGCs (min/mean/max):         12h8m     34m53s      8m34s
Est. OG Size for 1 FGC/hr (min/mean/max):      1.12G     23.43G     95.33G

Overall JVM Efficiency Score*: 99.488%

Current JVM Configuration:
--------------------------
          NewSize: 1.38G
          OldSize: 13.62G
    SurvivorRatio: 10
 MinHeapFreeRatio: 40
 MaxMetaspaceSize: 16E
 MaxHeapFreeRatio: 70
      MaxHeapSize: 15G
    MetaspaceSize: 20.80M
         NewRatio: 2
  • This section provides what analysis it was able to do. For a very accurate/detailed analysis, you need to let it run long enough to capture at least 3 FGCs. It will warn you if it doesn't have enough data, and will skip analysis in a specific area if there is insufficient data. Here you can see that there weren't enough FGCs.
Recommendation Summary:
-----------------------
Warning: There were only 8 YGC entries to do the analysis on. It's better to
have > 10 to get more realistic results.


* Error: Your survivor age is too short; your last age of 2 has 63.89% of its
objects still alive. Unset or increase the MaxTenuringThreshold to mitigate this
problem.


---
* The allocation rate is the increase in usage before a GC is done. Growth rate
  is the increase in usage after a GC is done.

* The JVM efficiency score is a convenient way to quantify how efficient the
  JVM is. The most efficient JVM is 100% (pretty much impossible to obtain).

* There were no full GCs during this sample period. This reporting will
  be less useful/accurate as a result.

* A copy of the critical data used to generate this report is stored
  in /tmp/jtune_data-{user}.bin.bz2. Please copy this to your homedir if you
  want to save/analyze this further.

License

This application is distributed under the terms of the Apache Software License version 2.0. See the COPYING file for more details.

Authors

FAQ

Q: Do I have to run jtune as root?
A: You should run it as the user of the Java process you want to analyze (or root).

Q: What versions of Java does this support?
A: JTune works with Java versions 6-8.

Q: What JVM options should I have turned on to properly use this tool?
A: You should have the following enabled: -Xloggc:<file>, -XX:+PrintTenuringDistribution,
   -XX:+PrintGCDetails, and -XX:+PrintGCDateStamps

Q: Can it tune the G1 GC?
A: Not at this time. G1 is quite a bit harder to tweak, but it's a work in progress.

jtune's People

Contributors

blangel, ericbullen, eshiferax, jesseward, kei-yamazaki, klauern, mosheeshel, mrevutskyi, rajkannanreddy, ryanmaclean


jtune's Issues

Not able to parse GC logs with `-verbose:gc -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -XX:+PrintGCDateStamps`

When I run jtune.py with some GC logs that I have created previously, I get this output:

 $ python jtune.py -r mygc.log
2015-04-15 12:39:40,439: "root" (line: 1858) - ERROR I was not able to read the replay file. Exiting.

I have a set of JVM's that I am already using GC logging for, but I am unable to have JTune parse the log files. The following are the GC log settings I have in place already:

-verbose:gc -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -XX:+PrintGCDateStamps

It may be due to the difference between what -XX:+PrintGCDateStamps provides versus the default time output, or something that -verbose:gc is doing already.
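The -XX:+PrintGCDateStamps prefix can be matched with a small regex; this is a sketch of the parsing concern, not JTune's actual parser, and the sample log line is illustrative:

```python
import re

# ISO-8601 date stamp, optional seconds-since-start, then the GC event tag,
# as produced by -XX:+PrintGCDateStamps (and -XX:+PrintGCTimeStamps)
GC_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4})"
    r"(?::\s*(?P<uptime>\d+\.\d+))?"
    r":\s*\[(?P<event>Full GC|GC)"
)

sample = "2015-04-15T12:39:40.439+0200: 1853.034: [GC [ParNew ..."
m = GC_LINE.match(sample)
```

Lines written with only -verbose:gc (or only time stamps) won't carry the date prefix, which is one way a replay file can fail to parse.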

Issue finding GC log file with log file rotation

JTune cannot locate the GC log file when log file rotation is enabled with the -XX:+UseGCLogFileRotation flag. This happens when running on a Java 8 VM (1.8.0_66), and probably some later versions of Java 7. Support for GCLogFileRotation was added in PR #5.

The issue is that the current GC log is now suffixed with ".current". This change was discussed in JDK-7164841.

The suffix should make finding the current GC log file quite a bit easier.
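A sketch of locating the active log under rotation, assuming the ".current" suffix described in JDK-7164841 (function and file names are hypothetical, not JTune's code):

```python
import glob
import os

def find_current_gc_log(log_dir, base="gc.log"):
    """Return the active GC log, preferring the rotated '*.current' file."""
    current = glob.glob(os.path.join(log_dir, base + "*.current"))
    if current:
        # If several match, the most recently modified one is the live log
        return max(current, key=os.path.getmtime)
    plain = os.path.join(log_dir, base)
    return plain if os.path.exists(plain) else None
```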

can not find java_path

Hi, my process uses a java_path like this:

~/java/bin/java

but jtune.py does this at lines 1204-1205:

if 'java_path' not in details:
    details['java_path'] = ''.join(liverun("which java")).strip().replace("/java", "")

which sets java_path to "~/bin". Try changing it to:

details['java_path'] = ''.join(liverun("which java")).strip()[:-5]

That works well.
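The breakage is reproducible without JTune; a dirname-based variant (a sketch, not the project's actual patch) avoids stripping every "/java" substring:

```python
import posixpath

java_path = "~/java/bin/java"

# Buggy: str.replace removes EVERY "/java" occurrence, not just the last one
buggy = java_path.replace("/java", "")    # yields "~/bin"

# Safer: take the directory component of the binary's path
fixed = posixpath.dirname(java_path)      # yields "~/java/bin"
```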

JTune Mistakenly Thinks JVM is Running G1, Java HotSpot 1.7.0_75

Here are the JAVA_OPTS:

java -Djava.util.logging.config.file=/app/apache/apache-tomcat-8.0.20/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xloggc:/app/apache/apache-tomcat-8.0.20/logs/gc.log -verbose:gc -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xmx2048m -Xms2048m -XX:MaxPermSize=1024m

Output of jtune run:

$ ./jtune.py -p $(pgrep -f tomcat)
Start Time:  2015-04-22 13:01:09.133616 GMT
Host:        tomcathost.example.com
2015-04-22 13:01:09,328: "root" (line: 1533) - ERROR Looks like you're running the G1 collector. This tool unfortunately doesn't currently support G1 analysis.

java -version:

$ ./java -version 
java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)

OS Oracle Enterprise Linux 6.6:

$ uname -a
Linux tomcathost.example.com 2.6.32-504.8.1.el6.x86_64 #1 SMP Tue Jan 27 12:21:41 PST 2015 x86_64 x86_64 x86_64 GNU/Linux

Are negative CMSInitiatingOccupancyFraction values valid?

I have been running JTune on a couple of my servers and applying the tuning suggestions. I re-run it and re-apply those new values. It appears I'm getting honed in on an optimal set of JVM flags for the server, but I get what look like really odd -XX:CMSInitiatingOccupancyFraction values:

    * Reading gc.log file... done. Scanned 3008 lines in 0.0016 seconds.

    Meta:
    ~~~~~
    Sample Time:    29m13s (1753 seconds)
    System Uptime:  52d49m
    CPU Uptime:     104d1h
    Proc Uptime:    1m50s
    Proc Usertime:  2m22s (0.00%)
    Proc Systime:   5s (0.00%)
    Proc RSS:       1.71G
    Proc VSize:     5.07G
    Proc # Threads: 93

    YG Allocation Rates*:
    ~~~~~~~~~~~~~~~~~~~~~
    per sec (min/mean/max):       1.42M/s     145.62M/s     470.29M/s
    per day (min/mean/max):     119.83G/d         12T/d      38.75T/d

    OG Promotion Rates:
    ~~~~~~~~~~~~~~~~~~~
    per sec (min/mean/max):          9K/s      47.39M/s     583.80M/s
    per hr (min/mean/max):       31.63M/h     166.60G/h          2T/h

    Survivor Death Rates:
    ~~~~~~~~~~~~~~~~~~~~~
    Lengths (min/mean/max): 0/1.9/12
    Death Rate Breakdown:
       Age 1:  0.0% / 32.9% / 100.0% / 67.1% (min/mean/max/cuml alive %)
       Age 2: -0.4% / 11.3% / 89.2% / 59.5% (min/mean/max/cuml alive %)
       Age 3: -0.2% /  1.5% / 50.8% / 58.7% (min/mean/max/cuml alive %)
       Age 4: -0.0% /  1.0% / 40.9% / 58.0% (min/mean/max/cuml alive %)
       Age 5: -0.2% /  0.6% / 54.4% / 57.7% (min/mean/max/cuml alive %)
       Age 6: -0.0% /  0.6% / 31.7% / 57.4% (min/mean/max/cuml alive %)
       Age 7: -0.0% /  0.5% / 48.7% / 57.1% (min/mean/max/cuml alive %)
       Age 8:  0.0% /  0.3% / 23.4% / 56.9% (min/mean/max/cuml alive %)
       Age 9: -0.2% /  0.1% /  5.7% / 56.9% (min/mean/max/cuml alive %)
       Age 10: -0.0% /  0.1% / 12.7% / 56.8% (min/mean/max/cuml alive %)
       Age 11: -0.2% /  0.1% / 15.6% / 56.8% (min/mean/max/cuml alive %)
       Age 12: -0.0% /  0.0% /  0.5% / 56.8% (min/mean/max/cuml alive %)
       Age 13: -0.0% /  0.0% /  8.0% / 56.8% (min/mean/max/cuml alive %)
       Age 14:  0.0% /  0.1% / 22.9% / 56.7% (min/mean/max/cuml alive %)

    GC Information:
    ~~~~~~~~~~~~~~~
    YGC/FGC Count: 430/12 (Rate: 14.72/min, 0.41/min)

    GC Load (since JVM start): 3.80%
    Sample Period GC Load:     3.20%

    CMS Sweep Times: 2.326s /  4.335s /  5.275s / 1.21 (min/mean/max/stdev)
    YGC Times:       0ms / 122ms / 570ms / 100.47 (min/mean/max/stdev)
    FGC Times:       0ms / 51ms / 112ms / 30.62 (min/mean/max/stdev)
    Agg. YGC Time:   55480ms
    Agg. FGC Time:   673ms

    Est. Time Between FGCs (min/mean/max):          4d6h       1m8s         5s
    Est. OG Size for 1 FGC/hr (min/mean/max):     31.63M    166.60G         2T

    Overall JVM Efficiency Score*: 96.797%

    Current JVM Configuration:
    ~~~~~~~~~~~~~~~~~~~~~~~~~~
              NewSize: 172M
              OldSize: 5.19M
        SurvivorRatio: 1
     MinHeapFreeRatio: 40
     MaxHeapFreeRatio: 70
          MaxHeapSize: 3.34G
             PermSize: 240M
             NewRatio: 2

    Recommendation Summary:
    ~~~~~~~~~~~~~~~~~~~~~~~
    Warning: The process I'm doing the analysis on has been up for 1m50s,
    and may not be in a steady-state. It's best to let it be up for more
    than 5 minutes to get more realistic results.

    * Warning: The calculated recommended survivor ratio of 0.46 is less than 1.
    This is not possible, so I increased the size of newgen by 87.43M, and set the
    survivor ratio to 1. Try the tuning suggestions, and watch closely.

    - With a mean YGC time goal of 50ms, the suggested (optimized for a
    YGC rate of 33.55/min) size of NewGen (including adjusting for
    calculated max tenuring size) considering the above criteria should be
    163 MiB (currently: 172 MiB).
    - Because we're decreasing the size of NewGen, it can have an impact
    on system load due to increased memory management requirements.
    There's not an easy way to predict the impact to the application, so
    watch this after it's tuned.
    - It's recommended to have the PermGen size 1.2-1.5x (used 1.5x) the size of the
    live PermGen size. New recommended size is 241MiB (currently: 240MiB).
    - Looking at the worst (max) survivor percentages for all the ages, it looks
    like a TenuringThreshold of 5 is ideal.
    - The survivor size should be 2x the max size for tenuring threshold
    of 5 given above. Given this, the survivor size of 163M is ideal.
    - To ensure enough survivor space is allocated, a survivor ratio of 1 should be
    used.
    - It's recommended to have the max heap size 3-4x the size of the live data size
    (OldGen + PermGen), and adjusted to include the recommended survivor and newgen
    size. New recommended size is 4293MiB (currently: 3416MiB).
    - With a max 99th percentile OG promotion rate of 122.10M/s, and the max CMS
    sweep time of 5.275s, you should not have a occupancy fraction any higher than
    -12363.

    Java G1 Settings:
    ~~~~~~~~~~~~~~~~~~~
    - With a max ygc stdev of 46.95, and a 99th percentile ygc mean ms of 190ms,
    your config is probably not ready to move to the G1 garbage collector. Try
    tuning the JVM, and see if that improves things first.

    The JVM arguments from the above recommendations:
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    -Xmx4293m -Xms4293m -Xmn163m -XX:SurvivorRatio=1 -XX:MaxTenuringThreshold=5
    -XX:CMSInitiatingOccupancyFraction=-12363 -XX:PermSize=241m -XX:MaxPermSize=241m
    ~~~

    * The allocation rate is the increase in usage before a GC is done. Growth rate
      is the increase in usage after a GC is done.

    * The JVM efficiency score is a convenient way to quantify how efficient the
      JVM is. The most efficient JVM is 100% (pretty much impossible to obtain).

    * A copy of the critical data used to generate this report is stored
      in /tmp/jpulse_data-eaihost.bin.bz2. Please copy this to your homedir if you
      want to save/analyze this further.

Not able to start jtune

Hi team,

Thanks for developing a great tool. But unfortunately we aren't able to use it because of a common mistake. We tried to resolve it but couldn't get through. I have listed the error thrown below. Could you please look into it?

python jtune.py -h
  File "jtune.py", line 272
    return iter(subproc.stdout.readline, b'')
                                            ^
SyntaxError: invalid syntax

Thanks
Sattish.

Couldn't connect to jvm via jmap to get valid data

ranjith@pc-ranjith-1290:~$ python jtune.py -c 10 -p 29874
2016-08-11 14:52:56,204: "root" (line: 1775) - WARNING: Couldn't connect to jvm via jmap to get valid data. Sleeping 2 seconds, and trying again.
2016-08-11 14:52:58,327: "root" (line: 1775) - WARNING: Couldn't connect to jvm via jmap to get valid data. Sleeping 4 seconds, and trying again.
2016-08-11 14:53:02,453: "root" (line: 1775) - WARNING: Couldn't connect to jvm via jmap to get valid data. Sleeping 6 seconds, and trying again.
2016-08-11 14:53:08,586: "root" (line: 1775) - WARNING: Couldn't connect to jvm via jmap to get valid data. Sleeping 8 seconds, and trying again.
^Z
[3]+ Stopped python jtune.py -c 10 -p 2987

JTune fails when run as root due to bug in jstat

At some point, jstat/jps were broken in that they could not discover processes owned by other users when run as root. See JDK-8075773, where it looks like the issue has been fixed.

However, it was somewhat difficult figuring out the cause when running JTune, since the failure would occur when accessing a nonexistent key in the jstat_data map.

I think it would be preferable to throw an exception when the return code from the subprocess call is non-zero. This might also warrant a mention in the documentation, since the FAQ says you can run JTune as root.
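The suggestion can be sketched like this (names are hypothetical; JTune's real jstat wrapper differs):

```python
import subprocess

def run_tool(cmd):
    """Run an external tool (e.g. jstat) and fail loudly on a non-zero exit,
    instead of silently handing back empty data."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with {proc.returncode}: {proc.stderr.strip()}"
        )
    return proc.stdout
```

Failing fast here would have surfaced the jstat/jps permission bug directly, instead of as a missing-key error deep in the jstat_data handling.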

Release to PyPI

It would be great if JTune were on PyPI, so that it could be installed with pip, easy_install, buildout, etc.

Error trying to run it

When I execute jtune:

jtune.py -p 1234

This error occurs:
╰─✘ ./jtune.py --pid 13817
Traceback (most recent call last):
  File "./jtune.py", line 1899, in <module>
    proc_details = get_proc_info(cmd_args.pid)
  File "./jtune.py", line 1211, in get_proc_info
    gc_path = line.split(":", 1)[1]
IndexError: list index out of range

I am executing jtune on:

  • Ubuntu 14.04
  • Linux kernel 4.0.0
  • Java version 1.8.0_25

jTune output off by orders of magnitude

Java HotSpot(TM) 64-Bit Server VM (25.5-b02) for linux-amd64 JRE (1.8.0_05-b13), built on Mar 18 2014 00:29:27 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 65699620k(21908400k free), swap 67092472k(67092472k free)
CommandLine flags: -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSScavengeBeforeRemark -XX:ErrorFile=logs/hs_err.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/export/content/spelin-server-heapdumps/ -XX:InitialHeapSize=4294967296 -XX:+ManagementServer -XX:MaxHeapSize=4294967296 -XX:MaxNewSize=536870912 -XX:NewSize=536870912 -XX:OldPLABSize=16 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC

Resulted in JTune thinking I had a 4TB heap.

Is it possible to pipe output to another tool in Unix?

So one of the things I'm doing is trying to run a number of JTune instances against a bunch of our different environments and apps. One thing I wanted to do was have it email me the results automatically. I currently set up a screen session to get JTune running, and then check in periodically to see if it has completed.

My next thought was to try piping the output to a file and then calling mailx on it, but it seems that, at least with my naive understanding of Unix, Ctrl-C breaks how these pipe commands work.

For instance, in the usage below, Ctrl-C just outputs ^C, and calling it again seems to cause an exception in the tee app. I see similar errors with things like mail, etc., or anything that I want to pipe (|) or have run after it completes (&&):

[user@host gc]$ python jtune.py -s 7 -p 49023 | tee jtune-Thing-49023.txt
^C^Cclose failed in file object destructor:
Error in sys.excepthook:

Original exception was:

I guess my first thought is whether it's possible to just have it start on its merry way immediately when I call JTune, instead of having to kick it off with Ctrl-C every time? I would then be able to do something more along the lines of:

sleep 300 && python jtune --immediately -s 7 -p 49023 | tee jtune-Output-49023.txt

But this really could boil down to some missing knowledge of process control and the shell prompt that I'm overlooking. In any event, I would love some advice on how to accomplish this.
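One workaround with the existing flags: since the stop counts (-c, -s, -y) documented above make JTune exit on its own, the interactive Ctrl-C, and the signal it sends down the pipeline, can be avoided entirely. A sketch, with a hypothetical mail address and PID:

```shell
# Let jtune stop itself after 7 full GCs, then mail the captured report
jtune -s 7 -p 49023 > jtune-49023.txt 2>&1
mailx -s "JTune report (pid 49023)" ops@example.com < jtune-49023.txt
```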

Does it work with OpenJDK?

Hi,

I am trying to run JTune with OpenJDK 1.8, but it's throwing an exception:

$ jtune -c 40 -p 27164
  File "/usr/local/bin/jtune", line 9, in <module>
    load_entry_point('jtune==2.0.3', 'console_scripts', 'jtune')()
  File "/usr/local/lib/python2.7/site-packages/jtune/jtune.py", line 1923, in main
    jmap_data = get_jmap_data(cmd_args.pid, proc_details)
  File "/usr/local/lib/python2.7/site-packages/jtune/jtune.py", line 1775, in get_jmap_data
    jmap_data = _run_jmap(pid, procdetails)
  File "/usr/local/lib/python2.7/site-packages/jtune/jtune.py", line 1331, in _run_jmap
    java_int = procdetails['java_ver_int']
KeyError: 'java_ver_int'

jtune run yields KeyError: 'OC'

  File "jtune-2.0.0-py2-none-any.whl/jtune/__init__.py", line 1928, in main
    jstat_data = run_jstat(cmd_args.pid, java_path, cmd_args.no_jstat_output, cmd_args.fgc_stop_count, cmd_args.stop_count, cmd_args.ygc_stop_count)
  File "jtune-2.0.0-py2-none-any.whl/jtune/__init__.py", line 1439, in run_jstat
    if jstat_data['OC'] and jstat_data['OU']:
KeyError: 'OC'

negative CMSInitiatingOccupancyFraction values

I have an issue similar to issue #9.

java command args:

/usr/bin/java -Xms31g -Xmx31g -Xmn20g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -XX:PermSize=256m -XX:-UseAdaptiveSizePolicy -XX:+UseTLAB -XX:+CMSClassUnloadingEnabled -XX:SurvivorRatio=6 -XX:TargetSurvivorRatio=90 -XX:+UseBiasedLocking -XX:MaxTenuringThreshold=20 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintClassHistogram -XX:+PrintTenuringDistribution -Xloggc:/var/log/elasticsearch/gc.log -Delasticsearch -Des.pidfile=/var/run/elasticsearch/elasticsearch.pid -Des.path.home=/usr/share/elasticsearch -cp :/usr/share/elasticsearch/lib/elasticsearch-1.5.2.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/* -Des.default.path.home=/usr/share/elasticsearch -Des.default.path.logs=/var/log/elasticsearch -Des.default.path.data=/data1/elasticsearch,/data2/elasticsearch,/data3/elasticsearch,/data4/elasticsearch,/data5/elasticsearch,/data6/elasticsearch,/data7/elasticsearch,/data8/elasticsearch,/data9/elasticsearch,/data10/elasticsearch,/data11/elasticsearch,/data12/elasticsearch -Des.default.path.work=/tmp/elasticsearch -Des.default.path.conf=/etc/elasticsearch org.elasticsearch.bootstrap.Elasticsearch

jtune's bin data:
https://www.dropbox.com/s/kyxlixafuirs84u/jtune_data-elasticsearch.bin.bz2?dl=0

jtune's output:

* Reading gc.log file... done. Scanned 852 lines in 0.0004 seconds.

Meta:
~~~~~
Sample Time:    33m50s (2030 seconds)
System Uptime:  35d21h
CPU Uptime:     861d8h
Proc Uptime:    1d31m
Proc Usertime:  1d4h (0.14%)
Proc Systime:   1h35m (0.01%)
Proc RSS:       32.33G
Proc VSize:     131G
Proc # Threads: 315

YG Allocation Rates*:
~~~~~~~~~~~~~~~~~~~~~
per sec (min/mean/max):     268.76M/s     392.23M/s       1.16G/s
per day (min/mean/max):      22.15T/d      32.32T/d      98.12T/d

OG Promotion Rates:
~~~~~~~~~~~~~~~~~~~
per sec (min/mean/max):     127.97K/s      19.58M/s         89M/s
per hr (min/mean/max):      449.90M/h      68.84G/h     312.88G/h

Survivor Death Rates:
~~~~~~~~~~~~~~~~~~~~~
Lengths (min/mean/max): 7/11.8/13
Death Rate Breakdown:
   Age 1: 19.2% / 78.8% / 96.9% / 21.2% (min/mean/max/cuml alive %)
   Age 2:  2.3% / 20.0% / 62.1% / 17.0% (min/mean/max/cuml alive %)
   Age 3:  0.0% / 11.5% / 42.2% / 15.0% (min/mean/max/cuml alive %)
   Age 4: -0.2% /  6.2% / 38.5% / 14.1% (min/mean/max/cuml alive %)
   Age 5:  0.0% /  5.0% / 19.8% / 13.4% (min/mean/max/cuml alive %)
   Age 6:  0.0% /  6.0% / 44.9% / 12.6% (min/mean/max/cuml alive %)
   Age 7:  0.0% /  4.6% / 15.4% / 12.0% (min/mean/max/cuml alive %)
   Age 8: -0.0% /  4.5% / 29.5% / 11.5% (min/mean/max/cuml alive %)
   Age 9: -0.0% /  3.8% / 18.3% / 11.0% (min/mean/max/cuml alive %)
   Age 10: -0.0% /  3.1% / 16.6% / 10.7% (min/mean/max/cuml alive %)
   Age 11: -17.2% /  2.6% / 23.0% / 10.4% (min/mean/max/cuml alive %)
   Age 12: -0.4% /  3.7% / 18.4% / 10.0% (min/mean/max/cuml alive %)
   Age 13:  0.0% /  3.8% / 48.6% /  9.7% (min/mean/max/cuml alive %)
   Age 14: -203819.3% / -70191.6% / -10048.5% / 6786.1% (min/mean/max/cuml alive %)

GC Information:
~~~~~~~~~~~~~~~
YGC/FGC Count: 46/4 (Rate: 1.36/min, 0.12/min)

GC Load (since JVM start): 0.79%
Sample Period GC Load:     1.87%

CMS Sweep Times: 2.388s /  2.468s /  2.549s / 0.11 (min/mean/max/stdev)
YGC Times:       0ms / 203ms / 769ms / 327.00 (min/mean/max/stdev)
FGC Times:       0ms / 240ms / 489ms / 277.50 (min/mean/max/stdev)
Agg. YGC Time:   33105ms
Agg. FGC Time:   4766ms

Est. Time Between FGCs (min/mean/max):          1d1h      9m35s       2m6s
Est. OG Size for 1 FGC/hr (min/mean/max):    449.90M     68.84G    312.88G

Overall JVM Efficiency Score*: 98.134%

Current JVM Configuration:
~~~~~~~~~~~~~~~~~~~~~~~~~~
          NewSize: 20G
          OldSize: 5.19M
    SurvivorRatio: 6
 MinHeapFreeRatio: 40
 MaxHeapFreeRatio: 70
      MaxHeapSize: 31G
         PermSize: 256M
         NewRatio: 2

Recommendation Summary:
~~~~~~~~~~~~~~~~~~~~~~~
* Warning: The calculated recommended survivor ratio of 0.00 is less than 1.
This is not possible, so I increased the size of newgen by 391.88M, and set the
survivor ratio to 1. Try the tuning suggestions, and watch closely.

- With a mean YGC time goal of 50ms, the suggested (optimized for a
YGC rate of 33.55/min) size of NewGen (including adjusting for
calculated max tenuring size) considering the above criteria should be
393 MiB (currently: 20 MiB).
- It's recommended to have the PermGen size 1.2-1.5x (used 1.5x) the size of the
live PermGen size. New recommended size is 63MiB (currently: 256MiB).
- Looking at the worst (max) survivor percentages for all the ages, it looks
like a TenuringThreshold of 74 is ideal.
- The survivor size should be 2x the max size for tenuring threshold
of 74 given above. Given this, the survivor size of 393M is ideal.
- To ensure enough survivor space is allocated, a survivor ratio of 1 should be
used.
- It's recommended to have the max heap size 3-4x the size of the live data size
(OldGen + PermGen), and adjusted to include the recommended survivor and newgen
size. New recommended size is 28944MiB (currently: 31744MiB).
- With a max 99th percentile OG promotion rate of 89M/s, and the max CMS sweep
time of 2.549s, you should not have an occupancy fraction any higher than -1337.

Java G1 Settings:
~~~~~~~~~~~~~~~~~~~
- With a max ygc stdev of 159.67, and a 99th percentile ygc mean ms of 681ms,
your config is probably not ready to move to the G1 garbage collector. Try
tuning the JVM, and see if that improves things first.

The JVM arguments from the above recommendations:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Xmx28944m -Xms28944m -Xmn393m -XX:SurvivorRatio=1 -XX:MaxTenuringThreshold=74
-XX:CMSInitiatingOccupancyFraction=-1337 -XX:PermSize=63m -XX:MaxPermSize=63m
~~~

* The allocation rate is the increase in usage before a GC is done. The growth
  rate is the increase in usage after a GC is done.

* The JVM efficiency score is a convenient way to quantify how efficient the
  JVM is. The most efficient JVM is 100% (pretty much impossible to obtain).

* A copy of the critical data used to generate this report is stored
  in /tmp/jtune_data-elasticsearch.bin.bz2. Please copy this to your homedir if you
  want to save/analyze this further.

TypeError in Recommendation Summary

When running JTune against a JVM whose heap and size values I had previously adjusted based on a prior run, I get this:

* Reading gc.log file... done. Scanned 3007 lines in 0.0014 seconds.

Meta:
~~~~~
Sample Time:    13m50s (830 seconds)
System Uptime:  52d1h
CPU Uptime:     104d2h
Proc Uptime:    27m15s
Proc Usertime:  11m59s (0.01%)
Proc Systime:   30s (0.00%)
Proc RSS:       3.83G
Proc VSize:     5.16G
Proc # Threads: 183

YG Allocation Rates*:
~~~~~~~~~~~~~~~~~~~~~
per sec (min/mean/max):       2.85M/s     160.04M/s     512.28M/s
per day (min/mean/max):     240.86G/d      13.19T/d      42.21T/d

OG Promotion Rates:
~~~~~~~~~~~~~~~~~~~
per sec (min/mean/max):      52.96K/s      50.73M/s     114.85M/s
per hr (min/mean/max):      186.20M/h     178.33G/h     403.76G/h

Survivor Death Rates:
~~~~~~~~~~~~~~~~~~~~~
Lengths (min/mean/max): 0/1.5/11
Death Rate Breakdown:
   Age 1:  0.0% / 33.2% / 100.0% / 66.8% (min/mean/max/cuml alive %)
   Age 2: -0.0% /  9.3% / 89.3% / 60.6% (min/mean/max/cuml alive %)
   Age 3:  0.0% /  0.6% / 23.5% / 60.3% (min/mean/max/cuml alive %)
   Age 4:  0.0% /  0.5% / 23.4% / 60.0% (min/mean/max/cuml alive %)
   Age 5:  0.0% /  0.6% / 99.6% / 59.7% (min/mean/max/cuml alive %)
   Age 6:  0.0% /  0.5% / 81.6% / 59.4% (min/mean/max/cuml alive %)
   Age 7: -0.0% /  0.3% / 94.9% / 59.1% (min/mean/max/cuml alive %)
   Age 8:  0.0% /  0.1% / 15.3% / 59.1% (min/mean/max/cuml alive %)
   Age 9:  0.0% /  0.0% /  8.7% / 59.1% (min/mean/max/cuml alive %)
   Age 10:  0.0% /  0.1% / 30.2% / 59.0% (min/mean/max/cuml alive %)
   Age 11: -0.0% /  0.1% / 12.3% / 58.9% (min/mean/max/cuml alive %)
   Age 12:  0.0% /  0.2% / 45.0% / 58.9% (min/mean/max/cuml alive %)
   Age 13:  0.0% /  0.0% /  1.2% / 58.9% (min/mean/max/cuml alive %)
   Age 14:  0.0% /  0.0% / 16.0% / 58.8% (min/mean/max/cuml alive %)

GC Information:
~~~~~~~~~~~~~~~
YGC/FGC Count: 469/12 (Rate: 33.90/min, 0.87/min)

GC Load (since JVM start): 7.37%
Sample Period GC Load:     7.45%

CMS Sweep Times: 3.382s /  4.366s /  4.982s / 0.66 (min/mean/max/stdev)
YGC Times:       0ms / 130ms / 705ms / 102.68 (min/mean/max/stdev)
FGC Times:       19ms / 47ms / 108ms / 23.44 (min/mean/max/stdev)
Agg. YGC Time:   61306ms
Agg. FGC Time:   568ms

Est. Time Between FGCs (min/mean/max):        17h25m       1m3s        28s
Est. OG Size for 1 FGC/hr (min/mean/max):    186.20M    178.33G    403.76G

Overall JVM Efficiency Score*: 92.545%

Current JVM Configuration:
~~~~~~~~~~~~~~~~~~~~~~~~~~
          NewSize: 172M
          OldSize: 5.19M
    SurvivorRatio: 1
 MinHeapFreeRatio: 40
 MaxHeapFreeRatio: 70
      MaxHeapSize: 3.34G
         PermSize: 240M
         NewRatio: 2

Recommendation Summary:
~~~~~~~~~~~~~~~~~~~~~~~
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "jtune/jtune.py", line 1756, in _at_exit
    optimized_for_ygcs_rate)
  File "jtune/jtune.py", line 752, in _run_analysis
    proc_details)
  File "jtune/jtune.py", line 1029, in _show_recommendations
    adj_ng_size)
  File "jtune/jtune.py", line 873, in _get_survivor_info
    survivor_ratio, reduce_k((max_tenuring_size - adj_ng_size) / 1024)),
  File "jtune/jtune.py", line 418, in reduce_k
    return reduce_k(size / Decimal("1024.0"), precision=precision,
TypeError: unsupported operand type(s) for /: 'float' and 'Decimal'
Error in sys.exitfunc:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "jtune/jtune.py", line 1756, in _at_exit
    optimized_for_ygcs_rate)
  File "jtune/jtune.py", line 752, in _run_analysis
    proc_details)
  File "jtune/jtune.py", line 1029, in _show_recommendations
    adj_ng_size)
  File "jtune/jtune.py", line 873, in _get_survivor_info
    survivor_ratio, reduce_k((max_tenuring_size - adj_ng_size) / 1024)),
  File "jtune/jtune.py", line 418, in reduce_k
    return reduce_k(size / Decimal("1024.0"), precision=precision,
TypeError: unsupported operand type(s) for /: 'float' and 'Decimal'
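The traceback above comes from mixing Python's binary `float` with `decimal.Decimal`: `Decimal` deliberately refuses implicit arithmetic with floats, so `(max_tenuring_size - adj_ng_size) / 1024` producing a `float` that is later divided by `Decimal("1024.0")` raises exactly this `TypeError`. A minimal sketch of the failure and the usual fix, coercing to `Decimal` up front (hypothetical simplified `reduce_k`; the real function in `jtune.py` differs):

```python
from decimal import Decimal

def reduce_k(size, precision=2, suffix="K"):
    """Divide a size down by 1024 into a human-readable string.
    Coercing the argument to Decimal first avoids the float/Decimal
    TypeError seen in the traceback above."""
    size = Decimal(str(size))
    suffixes = ["K", "M", "G", "T"]
    idx = suffixes.index(suffix)
    while size >= 1024 and idx < len(suffixes) - 1:
        size /= Decimal("1024.0")
        idx += 1
    return "{0:.{1}f}{2}".format(size, precision, suffixes[idx])

# The reported crash reproduced: dividing a plain float by a Decimal
# is a TypeError, because Decimal refuses implicit mixing with floats.
raised = False
try:
    3.5 / Decimal("1024.0")
except TypeError:
    raised = True
```

With the `Decimal(str(size))` coercion in place, `reduce_k` accepts either `int` or `float` callers without caring which arithmetic type reaches the division.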

Jstat readings get printed while using the -n/--no-jstat-output argument

If my understanding is correct, the -n/--no-jstat-output argument is meant to suppress the jstat output and print only the summary.
Here is my finding: it suppresses only the header columns, while the jstat readings (calculated by the script) still get printed. I can send a pull request on your confirmation.
If my understanding is wrong, can you please give me more insight into this argument?
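If the flag really only gates the header, the likely fix is that the per-interval readings and the header must be guarded by the same flag. A hypothetical argparse sketch of the intended behavior (names and sample data are illustrative, not JTune's actual code):

```python
import argparse

parser = argparse.ArgumentParser(description="Run jstat w/ analytics")
parser.add_argument("-n", "--no-jstat-output", action="store_true",
                    help="suppress per-interval jstat readings; print summary only")
args = parser.parse_args(["-n"])

# Fake per-interval readings standing in for the jstat samples.
readings = [("ygc", 46), ("fgc", 4)]

lines = []
if not args.no_jstat_output:
    # Both the header AND the readings are gated by the same flag;
    # gating only the header reproduces the behavior reported above.
    lines.append("EC EP OC OP ...")
    lines.extend("{0}: {1}".format(k, v) for k, v in readings)
lines.append("Recommendation Summary: ...")
```

With `-n` passed, only the summary line survives; without it, the header and every reading are printed together.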
