
Comments (34)

jimmidyson commented on July 30, 2024

@brian-brazil Thing is without any rules this should only be exporting JVM stats & this should be super fast, right? Saying it's faster than something else that is really slow doesn't seem like a sane argument to me. This could very easily lead to overlapping scrapes.

Is there any chance we could reopen this issue & think about a solution to this together?


mdillenk commented on July 30, 2024

OMG that was the whole problem. Thanks so much!


brian-brazil commented on July 30, 2024

Assuming that people tend to generate quite generic and large rulesets so that they can have a uniform agent installation everywhere, this multiplies to the number above.

The assumption is that you have a per-service ruleset as is shown in the example configs. Do you have benchmarks that indicate that this will be a problem compared to mBean slowness?

Time measurement is done with metrics and exported also via MBeans.

In general I wouldn't use that for benchmarks, as it's an exponentially decaying number (and also about 10x slower compared to simpler types of metrics). I'd suggest simple logging, JMH, or something like the simpleclient Summary/Histogram as they'll give you exact numbers.
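For illustration, a minimal sketch of what timing individual attribute reads with the simpleclient Summary could look like (the metric name and the timedRead helper are made up for this example, not part of the exporter):

import io.prometheus.client.Summary;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class AttributeReadTiming {
    // Hypothetical metric; gives exact counts and sums rather than a decaying average.
    static final Summary readDuration = Summary.build()
        .name("jmx_attribute_read_seconds")
        .help("Time spent reading a single JMX attribute.")
        .register();

    static Object timedRead(MBeanServer server, ObjectName bean, String attribute) throws Exception {
        Summary.Timer timer = readDuration.startTimer();
        try {
            return server.getAttribute(bean, attribute);
        } finally {
            timer.observeDuration();  // records the elapsed time as an observation
        }
    }
}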

My suggestion is to apply the rule before and use it for the query

The challenge with that is that with the features offered we need the full attribute with its value in order to apply the rule.
I do expect a blacklist on mBean name to be added at some point, but I don't think this helps you. I guess we could extend that to attribute name too.
I'd also expect most rulesets to end up exposing most attributes, so I'm not sure you'll save much.

The fundamental issue is that JMX is slow as your benchmarks show, do you have any idea for ways to make that better?
If this is code you control I suggest switching your instrumentation to the java simpleclient which should be much faster and easier to use (see https://github.com/prometheus/client_java/blob/master/benchmark/README.md). The JMX exporter (as with all exporters) is intended for cases where you're transitioning to Prometheus or you don't have the option to use Prometheus more directly so we are limited to what other systems offer us in terms of metrics and performance.
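As a rough sketch of what direct instrumentation with the Java simpleclient looks like (the metric name and port are arbitrary, and the HTTP exposition assumes the simpleclient_httpserver module is on the classpath):

import io.prometheus.client.Counter;
import io.prometheus.client.exporter.HTTPServer;

public class DirectInstrumentation {
    // Hypothetical application metric updated inline, with no JMX indirection involved.
    static final Counter requests = Counter.build()
        .name("myapp_requests_total")
        .help("Total requests handled.")
        .register();

    public static void main(String[] args) throws Exception {
        new HTTPServer(9091);  // serves /metrics for Prometheus to scrape
        requests.inc();        // updating a counter like this costs on the order of nanoseconds
    }
}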


rhuss commented on July 30, 2024

The assumption is that you have a per-service ruleset as is shown in the example configs. Do you have benchmarks that indicate that this will be a problem compared to mBean slowness?

Not that, but we had the following use case in mind: we want to create a base Java Docker image similar to this one, which can install and configure a Java agent during startup. It would have been nice to provide some base, very open ruleset that matches all possible application components (camel, jms, jdbc pools, ...), where a rule applies only when the corresponding component is present and is ignored otherwise. This would ease maintenance, but of course, if the rules are not used in the MBean selection process, that doesn't make much sense.

In general I wouldn't use that for benchmarks, as it's an exponentially decaying number (and also about 10x slower compared to simpler types of metrics). I'd suggest simple logging, JMH, or something like the simpleclient Summary/Histogram as they'll give you exact numbers.

You are right, that was just a quick hack to demonstrate the problem, which hopefully becomes clear: many MBeans usually mean many attributes. It's not a benchmark.

The challenge with that is that with the features offered we need the full attribute with its value in order to apply the rule.

What about making this a multi-stage process: use the MBean part of the rule to filter out MBeans in the query, then filter on the attribute names, and only then read the value and filter on it in a loop?
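Roughly, the three stages could look like the following sketch (the Cassandra-style ObjectName pattern and the "Value" attribute name are just stand-ins for whatever a rule's MBean and attribute parts would translate to; this is not how the exporter currently works):

import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class StagedScrape {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Stage 1: let the MBeanServer do the coarse filtering via an ObjectName pattern.
        Set<ObjectName> beans = server.queryNames(
            new ObjectName("org.apache.cassandra.metrics:type=*,name=*,*"), null);
        for (ObjectName bean : beans) {
            // Stage 2: filter on attribute names before touching any values.
            for (MBeanAttributeInfo attr : server.getMBeanInfo(bean).getAttributes()) {
                if (!"Value".equals(attr.getName())) continue;
                // Stage 3: only now pay for the (possibly expensive) read and apply
                // any value-based part of the rule to the result.
                Object value = server.getAttribute(bean, attr.getName());
                System.out.println(bean + " " + attr.getName() + " = " + value);
            }
        }
    }
}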

The fundamental issue is that JMX is slow as your benchmarks show, do you have any idea for ways to make that better?

I don't think that JMX per se is slow; it depends on the number of requests (4500 attribute reads per run in this example) and, of course, on how the MBeans are implemented (which you can't really influence). If you could filter out the 'expensive' reads already when querying for the MBeans, that would help quite a bit. I'm pretty sure that your simpleclient would easily become slow, too, if it had to read expensive (i.e. on-the-fly calculated) attributes. It uses MXBeans internally as well for some metrics, so there is not much difference here.

Personally, I don't consider JMX to be legacy (albeit quite old), and you get tons of metrics for free on many platforms. Also, with tools like hawtio there is some sort of JMX revival these days. A good and fast JMX exporter/bridge/access ... for Prometheus would be a good thing IMO, not only for transitioning.

To summarize: I don't think that JMX is slow; it's the way the values are fetched here.


brian-brazil commented on July 30, 2024

This would ease maintenance, but of course, if the rules are not used in the MBean selection process, that doesn't make much sense.

It still makes sense I think.

What about making this a multi-stage process: use the MBean part of the rule to filter out MBeans in the query, then filter on the attribute names, and only then read the value and filter on it in a loop?

As MBeans are used in such a wide variety of ways, the way we support this generically is by putting everything into a string and letting you throw regexes at it. Accordingly, we can't split out the MBean name, as that'd require reverse-engineering the regexes.

A blacklist of some form is an option, to deal with particularly slow or problematic mbeans/attributes.

I don't think that JMX per se is slow; it depends on the number of requests (4500 attribute reads per run in this example)

4.5k metrics isn't particularly large. Doing a quick test with the Python client (which is less performant than the Java client), 4.5k metrics takes 60ms to render.

I'm pretty sure that your simpleclient would easily become slow, too, if it had to read expensive (i.e. on-the-fly calculated) attributes.

It does permit that for special cases (e.g. that's how the jmx exporter works), but in the general case we encourage usage such that updating a metric takes 10-15ns. On-the-fly calculations are discouraged, as they tend to be slow and harder to implement, and there's usually a better way.

To summarize: I don't think that JMX is slow; it's the way the values are fetched here.

It sounds like it may be how the values are generated. Something like this should take nanoseconds or microseconds per read, not milliseconds.


rhuss commented on July 30, 2024

The pattern you mention, which prevents optimisations: is this something Prometheus-specific or specific to this exporter? If the latter, and if you agree that optimisation at the JMX query level makes sense, why not change the configuration syntax into something a bit more restrictive (but optimizable)? E.g. a separate MBean pattern (which you can use in MBeanServer.queryNames()), an attribute pattern (which you can iterate over before fetching the value), and a value pattern (finally, the pattern on the value)?


brian-brazil commented on July 30, 2024

The pattern you mention, which prevents optimisations: is this something Prometheus-specific or specific to this exporter?

It's for this exporter. JMX is sufficiently non-standardised that a generic and highly configurable approach is called for.

If the latter, and if you agree that optimisation at the JMX query level makes sense, why not change the configuration syntax into something a bit more restrictive (but optimizable)?

That would make it less usable for the user, as they could no longer use the matching groups from a single regexp to determine details of a sample.

It's my belief that wanting to not read certain attributes for latency is an edge case, so I don't want to sacrifice usability for it.
What sort of speedups are you expecting from this?


rhuss commented on July 30, 2024

If I want to track 5000 attributes per server, then none. If I want to concentrate on the 20 most important values, then I expect a speedup of at least 100x. Most of the values exposed via JMX are configuration values anyway, which typically won't change every 15 seconds (the "M" in JMX).

Sorry, I think we come from different angles, so I don't want to bother you any longer.

Some final remarks, though:

  • I really don't see many use cases for filtering on the exact value (maybe on thresholds, though)
  • Patterns with matching groups, as you use them, can easily be applied to MBean names and attribute names. E.g. a ^org.apache.cassandra.metrics<type=(\w+), name=(\w+)><count>Value: can easily be translated into MBeanServer.queryNames(new ObjectName("org.apache.cassandra.metrics:type=*,name=*,*")), iterating over the result, checking for attributes named "count" and only then fetching the value. The only restriction for this to work is that capturing groups don't cross bracket boundaries. Such a big sacrifice to usability?
  • You know that the order of key-value properties in an ObjectName is insignificant? (There is a 'natural' and a 'canonical' ordering, but in fact the order of the key-value pairs doesn't matter from a JMX perspective.) I wonder because, in the default format, you give the first key a special meaning in the naming.
  • Somewhat unrelated, but it might be helpful to you: many appservers (older JBoss, especially Weblogic and Websphere) don't use the PlatformMBeanServer but their own implementation, and sometimes not only a single one (Weblogic uses up to three). If you are interested in which server uses which MBeanServer, you might want to have a look into these classes


brian-brazil commented on July 30, 2024

If I want to track 5000 attributes per server, then none. If I want to concentrate on the 20 most important values, then I expect a speedup of at least 100x. Most of the values exposed via JMX are configuration values anyway, which typically won't change every 15 seconds (the "M" in JMX).

You indicated above that it was things being calculated on the fly that were slowing things down, so accordingly I'd expect static configuration values to be quite fast to process.

My suspicion is that there's a handful of mBeans/attributes that are really slow. Do you have a better breakdown of where time is being spent?

With the jmx servers I've worked with so far (apache things), static configuration is very much in the minority and I've wanted almost all the beans coming back as metrics.

Sorry, I think we come from different angles, so I don't want to bother you any longer.

I'm happy to accept adding a blacklist, or if JMX general performance is really that bad a whitelist.

I really don't see many use cases for filtering on the exact value (maybe on thresholds, though)

It's mainly for extracting the value to put in a label, as Prometheus doesn't do strings, or, where there's an enum, filtering to convert it into a more useful set of variables.

The only restriction for this to work is that capturing groups don't cross bracket boundaries. Such a big sacrifice to usability?

Yes, as it'd require a custom parser for the replacement strings and that wouldn't look like a normal regex substitution string.

I wonder because, in the default format, you give the first key a special meaning in the naming.

That format pre-dates making the jmx exporter configurable, and I didn't want to break existing users so it remained as-is. The algorithm does seem to provide reasonably okay metrics in most cases.

Many appservers (older JBoss, especially Weblogic and Websphere) don't use the PlatformMBeanServer but their own implementation, and sometimes not only a single one (Weblogic uses up to three).

Good to know, thanks. This aspect of JMX was not documented anywhere that I could find.
This is something jmx_exporter should probably support in some form.


rhuss commented on July 30, 2024

FYI: I was just able to run jmx_exporter on a vanilla Wildfly 8.2.0.Final without any rules or deployed application (it needs some tweaks to the bootclasspath for Wildfly):

# HELP jmx_scrape_duration_seconds Time this JMX scrape took, in seconds.
# TYPE jmx_scrape_duration_seconds gauge
jmx_scrape_duration_seconds 2.438419079
# HELP jmx_scrape_error Non-zero if this scrape failed.
# TYPE jmx_scrape_error gauge
jmx_scrape_error 1.0


brian-brazil commented on July 30, 2024

Speed-wise that doesn't sound too bad (Cassandra takes about 12s), though your run bailed out with an error so it probably didn't get to read everything.


rhuss commented on July 30, 2024

You mean it stops after a read error occurs? Read errors happen quite frequently, because many MBeans are generated automatically and things like 'IllegalArgumentException' or 'UnsupportedOperationException' are quite common.

I suggest simply continuing (but I think that is what jmx_exporter does anyway).


jimmidyson commented on July 30, 2024

@brian-brazil Our use case for jmx_exporter is to provide a way to export JMX metrics to Prometheus without the app having any dependency on any other libraries than it needs to function, including simpleclient.

Using jmx_exporter as a jvm agent is perfect for this use case as we can simply attach it to JVMs with appropriate config & we get JMX metrics ready to be scraped by Prometheus.


brian-brazil commented on July 30, 2024

Thing is without any rules this should only be exporting JVM stats & this should be super fast, right?

It'll export everything in PlatformMBeanServer by default. For the JVM stats the hope would be to add them to the simpleclient_hotspot collector so that all jvm users can benefit from them.
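For reference, exposing the JVM stats via the simpleclient collectors only takes a couple of lines once the simpleclient_hotspot and simpleclient_httpserver modules are on the classpath (a sketch; the port is arbitrary):

import io.prometheus.client.exporter.HTTPServer;
import io.prometheus.client.hotspot.DefaultExports;

public class JvmStats {
    public static void main(String[] args) throws Exception {
        DefaultExports.initialize();  // registers memory, GC, thread and classloading collectors
        new HTTPServer(9091);         // serves them on /metrics
    }
}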

Is there any chance we could reopen this issue & think about a solution to this together?

Sure, I'm happy to look for a solution. Without knowing why things are slow it's hard to select the right solution.

I suggest simply continuing (but I think that is what jmx_exporter does anyway).

It'll continue on common errors:
https://github.com/prometheus/jmx_exporter/blob/master/collector/src/main/java/io/prometheus/jmx/JmxScraper.java#L89
If there are any missing we can add them, though I wonder if we should just catch Exception.

Using jmx_exporter as a jvm agent is perfect for this use case as we can simply attach it to JVMs with appropriate config & we get JMX metrics ready to be scraped by Prometheus.

That's the intended use case.


rhuss commented on July 30, 2024

@brian-brazil thanks for reopening. BTW, here is how we use jmx_exporter in a Docker base image --> https://registry.hub.docker.com/u/fabric8/java-agent-bond/


jimmidyson commented on July 30, 2024

Without knowing why things are slow it's hard to select the right solution.

Do you think that @rhuss' initial assessment is inaccurate? It makes sense to me, but you're right we haven't done any profiling to be 100% sure.

It'll export everything in PlatformMBeanServer by default.

If rules are specified, does the exporter still list everything and filter after the fact, or does it target JMX queries to the specified rules?

The challenge with that is that with the features offered we need the full attribute with its value in order to apply the rule.

Could you give me an example of where the value is useful in the rule? Is there potentially a new flag we could add to enable the pre-query filtering, with the proviso that you would not be allowed to use values in rules? Would that be a fair compromise?


brian-brazil commented on July 30, 2024

Do you think that @rhuss' initial assessment is inaccurate?

It only gives an average. The question is whether it's a handful of attributes that are slow, or whether everything is slow.

Knowing that, we can tell whether the filtering should happen at the mbean or attribute level and whether a whitelist or blacklist is appropriate.

If rules are specified, does the exporter still list everything and filter after the fact, or does it target JMX queries to the specified rules?

It still lists everything, and filters after the fact.

Could you give me an example of where the value is useful in the rule?

If it's an enum I can get labels out of it, or for getting strings into labels (once I get around to adding that).

Is there potentially a new flag we could add to enable the pre-query filtering, with the proviso that you would not be allowed to use values in rules? Would that be a fair compromise?

That'd mean some existing regexes may not work. It'd have to be a separate option that'd apply at the scrape stage.


rhuss commented on July 30, 2024

It only gives an average. The question is whether it's a handful of attributes that are slow, or whether everything is slow.

That's true. It's really hard to predict what's 'out there' in the JMX world, since everybody can register any sort of MBean, with no restrictions on what happens when you read an attribute. In my JMX experience over the years I have found that all sorts of objects are exposed via JMX, even internal objects like a HibernateSessionFactory or the database pools themselves. Much of the data is configuration data only, which doesn't change over the lifetime of a server. Some (not so few) attributes or operations throw exceptions by default. Values declared as read-only can in fact only be written. Sometimes security exceptions occur. And only a few (~5-10%, but that's more a feeling) of the exposed attributes are useful for monitoring.

A lot of metrics which are useful for various appservers, and which I found out over time, can be found here. Not so many.

To make it short: not everything is slow. OK, you have to go through the JMX subsystem, which adds maybe 2-3 extra layers, but some reads are expensive because of how they are implemented. You can't influence this, and maintaining a blacklist will become a nightmare when you have 1k+ attributes.

A whitelist could probably help, but why not then use the given rules as the whitelist? I guess most of the rules will have a filter on the MBean and/or attribute name. This filter can be applied before values are read, and if there is a filter on the value, it can be applied afterwards.

Of course, this would mean that the rules need to be analysed before they are applied. It's probably not super easy but should be feasible. If you don't mind, I would give it a try and potentially prepare a PR.

If it's an enum I can get labels out of it, or for getting strings into labels (once I get around to adding that).

Yes, for preparing labels. But is the value important for specifying what should be read? Only in that case do you need to read every attribute's value in order to select the metrics by that value. I really don't see this use case. It's not about capturing but about selecting.

That'd mean some existing regexes may not work. It'd have to be a separate option that'd apply at the scrape stage.

One could either keep the flag backwards compatible (i.e. keep the current algorithm if the flag is not given) or, ideally, detect from the given rule patterns which algorithm is appropriate.


brian-brazil commented on July 30, 2024

Of course, this would mean that the rules need to be analysed before they are applied. It's probably not super easy but should be feasible. If you don't mind, I would give it a try and potentially prepare a PR.

Here's a thought I just had. Doing such an analysis would be very difficult; however, what would work is to cache which attributes we didn't use at the last collection and then not read them the next time around.
The first scrape will still be slow, but as long as it's not too long that should work out.


brian-brazil commented on July 30, 2024

Actually that won't work due to the values. We could use some dummy value for the value when testing the regex and document it. The value is an uncommon use case, so making it a bit more difficult to use is okay.
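A sketch of that idea (not the exporter's actual code): remember every attribute identifier that no rule matched when tested with a dummy value, and skip reading it on later collections. The attributeId string here stands in for whatever matching string the exporter builds.

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

public class UnmatchedAttributeCache {
    // Attribute identifiers that no rule matched on a previous collection.
    private final Set<String> unmatched = ConcurrentHashMap.newKeySet();
    private static final String DUMMY_VALUE = "0";  // stand-in value for the regex test

    boolean shouldRead(String attributeId, Iterable<Pattern> rulePatterns) {
        if (unmatched.contains(attributeId)) {
            return false;  // skipped: nothing matched it before
        }
        for (Pattern p : rulePatterns) {
            if (p.matcher(attributeId + ": " + DUMMY_VALUE).lookingAt()) {
                return true;  // some rule wants this attribute, so read its real value
            }
        }
        unmatched.add(attributeId);
        return false;
    }
}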


rhuss commented on July 30, 2024

Good idea!

Caching what has not been matched could indeed help when reading attributes, although you still have to iterate over all MBeans.

Alternatively, one could keep track of what has been matched and register a JMX NotificationListener to be notified when MBeans are registered/deregistered. This might be even more efficient, since then one only has to iterate over the set of matched MBeans (not all of them).
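A minimal sketch of such a listener registration against the platform MBeanServer (the println calls merely stand in for re-running the rule matching):

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.MBeanServerDelegate;
import javax.management.MBeanServerNotification;
import javax.management.NotificationListener;

public class RegistrationTracker {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        NotificationListener listener = (notification, handback) -> {
            if (notification instanceof MBeanServerNotification) {
                MBeanServerNotification msn = (MBeanServerNotification) notification;
                if (MBeanServerNotification.REGISTRATION_NOTIFICATION.equals(msn.getType())) {
                    System.out.println("registered: " + msn.getMBeanName());    // re-match rules here
                } else if (MBeanServerNotification.UNREGISTRATION_NOTIFICATION.equals(msn.getType())) {
                    System.out.println("unregistered: " + msn.getMBeanName());  // drop cached matches here
                }
            }
        };
        // The MBeanServerDelegate emits registration/unregistration notifications.
        server.addNotificationListener(MBeanServerDelegate.DELEGATE_NAME, listener, null, null);
    }
}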


brian-brazil commented on July 30, 2024

Some initial research on activemq indicates that about 30% of the time is spent on attribute reads, and another 30% on regex matching. getMBeanInfo is at ~7%, and there's another ~16% in JmxCollector and ~17% in JmxScraper that I need to pinpoint.

This points towards switching to a more efficient regex engine, and some form of mBean-based whitelist.


brian-brazil commented on July 30, 2024

@bbaja42 was looking at this at the Hackathon yesterday; it seems that Google's RE2 in Java has some performance issues: https://groups.google.com/forum/#!topic/re2j-discuss/8c3L06m6wbY


brian-brazil commented on July 30, 2024

I've added support for mbean whitelists and blacklists, which in my small test case on activemq reduced the time by 90%. Can you try it out and see if it helps?
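For anyone trying it out: as far as I recall the README of that era, the options are called whitelistObjectNames and blacklistObjectNames, each a list of ObjectName patterns (later releases renamed them, so check the version you run). A sketch, assuming the YAML config format:

whitelistObjectNames: ["org.apache.activemq:*", "java.lang:*"]
rules:
- pattern: ".*"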


brian-brazil commented on July 30, 2024

Reports are that the whitelists/blacklists allow for a 50x speedup in a large production setup. We can't do anything about the regexes currently, unfortunately, but I think that should be sufficient for most use cases.


rhuss commented on July 30, 2024

Thanks for the update, that's indeed a big improvement. I will try it out, but unfortunately I'm quite busy these days.

Thanks!


mdillenk commented on July 30, 2024

If the jmx-scraper is erroring out, how can you find out what it's erroring on?


brian-brazil commented on July 30, 2024

It shouldn't be erroring out with other changes, but http://www.robustperception.io/viewing-logs-for-the-jmx-exporter/ will help you debug.


mdillenk commented on July 30, 2024

Thanks so much, I will try this. Just for reference, I am trying to monitor Apache ActiveMQ version 5.8. Do you know of an existing config for that? If not, just being able to debug my config will help a lot. Thanks again.


brian-brazil commented on July 30, 2024

An empty config should always work, though typically won't produce the best possible results.


mdillenk commented on July 30, 2024

{
"hostPort": "127.0.0.1:9892",
"rules": [
{"pattern": ".*"},
]
}

This gives me a lot of JVM memory, CPU, and garbage collection info, but nothing about amqp or topics or queues. And always jmx_scrape_error 1.0.


brian-brazil commented on July 30, 2024

You probably have the wrong port. I think it uses port 1099, and in general the java agent approach is preferred.


mdillenk commented on July 30, 2024

I'm using the java agent; should I omit the hostPort option from the JSON? I was confused about that because the scraper port is defined in the AMQ start script when you define -javaagent:etc.....


brian-brazil commented on July 30, 2024

Yes. It's probably best to bring this onto the mailing list, rather than confusing those reading this bug in future.

