Comments (8)
I'm still not sure what to do about this yet, but I wanted to get the issue open because this is a critical regression. We need to figure something out before 1.13 GA, or we may need to revert the performance enhancement that introduced this behavior, which would be rather unfortunate.
from micrometer.
> When I said config, I didn't literally mean a property.
I understood. If it is pluggable, it is exposed API that adds complexity users need to understand (as opposed to an implementation detail), which is the concern.
> Disallowing a metric would mean the metric is no longer usable, because its count can no longer be relied upon: it represents only a sample. Flattening out the high-cardinality dimension, on the other hand, keeps the count accurate.
I understand why the example filter results in more useful metrics than denying meters, but metrics instrumentation with a high-cardinality tag is essentially a bug that should be fixed if at all possible. It already requires prior knowledge that such a filter needs to be configured for a high-cardinality tag on a meter. Such a filter should be a stop-gap solution, or for cases where it is out of the control of the thing doing the configuration to ensure tags are limited to low cardinality (for example, Spring Boot configures a high-cardinality tag filter for HTTP client metrics because users may not use uri templates - see here).

With the most common case of HTTP client instrumentation, there should always be a way that templated paths can be tagged, but this requires users to use the HTTP client in a certain way that isn't (and can't be) enforced. Proper instrumentation should make it possible for the cardinality to be low as long as usage follows some pattern. Then it is up to the end user (application code) to use it that way. If not, such MeterFilters will be needed, but they should be an indication that the code should be fixed; not a permanent "solution". The filter's primary purpose is to prevent an OOM error and indicate a problem for the application owners to fix.
Does that make sense? Am I missing something?
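To make the stop-gap idea above concrete, here is a minimal plain-Java sketch of the bounding logic such a filter performs (the class and names are invented for illustration and are not Micrometer API): once a tag has produced a fixed number of distinct values, every further value is collapsed into one fallback, so the number of time series stays bounded.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative sketch (not Micrometer API): caps the number of distinct
 * values allowed for a tag, collapsing overflow into one fallback value.
 */
class TagValueCap {
    private final int maxDistinctValues;
    private final String fallback;
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    TagValueCap(int maxDistinctValues, String fallback) {
        this.maxDistinctValues = maxDistinctValues;
        this.fallback = fallback;
    }

    /** Returns the value unchanged until the cap is hit, then the fallback. */
    String map(String tagValue) {
        if (seen.contains(tagValue)) {
            return tagValue;
        }
        if (seen.size() < maxDistinctValues) {
            seen.add(tagValue);
            return tagValue;
        }
        // Cardinality is now bounded at maxDistinctValues + 1 (the fallback).
        return fallback;
    }
}
```

In real applications, Micrometer's built-in `MeterFilter.maximumAllowableTags(...)` plays this role; the sketch only shows the shape of the bounding logic being discussed.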
I like the idea of a pluggable strategy (other than the complexity it adds), but I do worry about getting the API for this right directly in GA with no pre-releases. I'm wondering if maybe the safest way forward is to disable the cache by default and offer some way to enable it or possibly configure a strategy for it in an experimental way that it is clear it is not GA. I'll try to experiment with this.
WDYT about making this cache pluggable via config? We can provide a default cache that would cap the cache at 2x the number of registered meters, or any other strategy that would be an ideal default.
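One possible shape for this, as a hedged sketch (the interface and names are invented for illustration, not an API proposal): the registry asks a pluggable strategy whether the pre-filter cache may grow, and the default implementation caps the cache at twice the number of registered meters, as suggested above.

```java
/**
 * Hypothetical sketch of a pluggable cache-bounding strategy
 * (names are illustrative, not actual Micrometer API).
 */
interface PreFilterCacheBound {
    /** Decide whether one more entry may be added to the pre-filter id cache. */
    boolean mayCache(int currentCacheSize, int registeredMeterCount);
}

/** The default suggested in the thread: cap the cache at 2x the meter count. */
class TwiceMeterCountBound implements PreFilterCacheBound {
    @Override
    public boolean mayCache(int currentCacheSize, int registeredMeterCount) {
        return currentCacheSize < 2 * registeredMeterCount;
    }
}
```

Keeping the strategy behind a small interface like this would let the default stay an implementation detail while still allowing advanced users to plug in different behavior.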
@lenin-jaganathan I usually prefer to not make more configuration, but I think in this case we may not have a better way forward. I like the general direction of what you mentioned. It somewhat reminds me of the HighCardinalityTagsDetector we already have. I'll think on it some and discuss internally.

A challenge is how to describe to users what is configurable, why, and what the tradeoffs are of different configurations. I think (but unfortunately we can't really know) that most users should be unaffected by this issue, which means most should not need to touch the default or know about this configuration. However, for those that do need to know about it and change the default, how do they find out they need to change it? We can log when we reach the cap (or some point before reaching the cap, perhaps), but we'll need to figure out what the log message should say so that it is clear to users.
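A minimal sketch of what such a signal could look like (all names here are hypothetical, nothing like this exists in Micrometer): warn once when the cache crosses a fraction of the configured cap, naming the likely cause so users can discover the configuration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative sketch: emit a one-time warning when a cache nears its cap. */
class CapWarning {
    private final int cap;
    private final double warnFraction;
    private final AtomicBoolean warned = new AtomicBoolean();

    CapWarning(int cap, double warnFraction) {
        this.cap = cap;
        this.warnFraction = warnFraction;
    }

    /** Returns a warning message the first time size crosses the threshold, else null. */
    String check(int currentSize) {
        if (currentSize >= cap * warnFraction && warned.compareAndSet(false, true)) {
            return "Meter id cache at " + currentSize + " of " + cap
                + "; a MeterFilter may be mapping many ids to one meter."
                + " Consider fixing high-cardinality tags at the source.";
        }
        return null;
    }
}
```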
A tangential question is: should the cause of this (a filter like the one in the issue description) be something we discourage or eventually even disallow? It doesn't seem unreasonable without the new implementation detail in MeterRegistry that results in this issue. That said, we have in some way, I think, encouraged using a deny filter when reaching max allowable tags rather than the approach in the meter filter implementation here. There may be other use cases to consider as well.
> I usually prefer to not make more configuration
When I said config, I didn't literally mean a property. What I was thinking of is more of a pluggable interface with a basic default implementation, which would be ideal. It could then be plugged to get the desired behavior based on needs.
> should the cause of this (a filter like the one in the issue description) be something we discourage or eventually even disallow?
I probably would disagree a bit on this. There are a lot of open source libraries out there using Micrometer, and there is always a chance they expose high-cardinality dimensions, e.g. HTTP paths.
Disallowing a metric would mean the metric is no longer usable, because its count can no longer be relied upon: it represents only a sample. Flattening out the high-cardinality dimension, on the other hand, loses a dimension, but we can still answer what the total count of things is (or distributions in the case of timers/summaries).
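The point about counts surviving flattening can be shown with a small self-contained sketch (plain Java, not Micrometer API; the class name is invented): collapsing the high-cardinality value before keying the counter loses the dimension but keeps the total accurate, whereas denying the meter would drop those events entirely.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

/** Illustrative sketch: counters keyed by a tag value that is flattened first. */
class FlattenedCounters {
    private final Map<String, Integer> counts = new ConcurrentHashMap<>();
    private final UnaryOperator<String> flatten;

    FlattenedCounters(UnaryOperator<String> flatten) {
        this.flatten = flatten;
    }

    void increment(String tagValue) {
        // The dimension is collapsed, but every event is still counted.
        counts.merge(flatten.apply(tagValue), 1, Integer::sum);
    }

    int total() {
        return counts.values().stream().mapToInt(Integer::intValue).sum();
    }

    int seriesCount() {
        return counts.size();
    }
}
```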
> I do worry about getting the API for this right directly in GA with no pre-releases
I agree with this.
After much thought, I eventually went to write a test for this, only to realize we don't have the memory leak we thought we would, because meters are only added to the preFilterIdToMeterMap if they are not already in the meterMap. So in the case given, or the general case of a filter that maps multiple IDs to the same ID, only the first ID will be added to the cache. After that, other IDs that end up mapping to the same post-filter ID will not use the cache; they will have all filters applied to them, and the shared meter will be retrieved from the meterMap. Incidentally, this seems like the right behavior here, without the need to make anything configurable. We get the performance gain in most cases like we want, and in cases like this where we don't get the performance boost, we avoid a memory leak by not adding to the cache. Here's the test I wrote:
```java
@Test
void differentPreFilterIdsMapToSameId_thenCacheIsBounded() {
    registry.config().meterFilter(MeterFilter.replaceTagValues("secret", s -> "redacted"));

    Counter c1 = registry.counter("counter", "secret", "value");
    Counter c2 = registry.counter("counter", "secret", "value2");

    // Both pre-filter ids map to the same post-filter id, so the same meter is returned...
    assertThat(c1).isSameAs(c2);
    // ...but only the first pre-filter id was added to the cache
    assertThat(registry.preFilterIdToMeterMap).hasSize(1);

    assertThat(registry.remove(c1)).isSameAs(c2);
    assertThat(registry.remove(c2)).isNull();
    assertThat(registry.getMeters()).isEmpty();
    assertThat(registry.preFilterIdToMeterMap).isEmpty();
}
```
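The registration path this test exercises can be sketched in isolation (plain Java with invented names; this is not the actual MeterRegistry internals): the pre-filter cache is only written when the post-filter id is new, so a filter that maps many pre-filter ids onto one id only ever gets the first pre-filter id cached.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

/** Illustrative sketch of the registration path described above. */
class TinyRegistry {
    final Map<String, Object> preFilterIdToMeter = new ConcurrentHashMap<>();
    final Map<String, Object> idToMeter = new ConcurrentHashMap<>();
    private final UnaryOperator<String> filter;

    TinyRegistry(UnaryOperator<String> filter) {
        this.filter = filter;
    }

    Object getOrCreate(String preFilterId) {
        // Fast path: a pre-filter cache hit skips running the filters.
        Object cached = preFilterIdToMeter.get(preFilterId);
        if (cached != null) {
            return cached;
        }
        String id = filter.apply(preFilterId);
        Object existing = idToMeter.get(id);
        if (existing != null) {
            // Meter already exists: reuse it WITHOUT caching this pre-filter id,
            // which keeps the cache bounded under many-to-one filters.
            return existing;
        }
        Object meter = new Object();
        idToMeter.put(id, meter);
        preFilterIdToMeter.put(preFilterId, meter);
        return meter;
    }
}
```

Under a filter that redacts the tag value, a second registration with a different value finds the existing meter via the post-filter id, so the pre-filter cache stays at one entry.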
Yeah, I can see that now. We won't have a many-to-one mapping in the cache. Sorry for the confusion, and thanks for confirming it.