
Comments (3)

jaronoff97 commented on June 18, 2024

Thank you so much for putting the time and effort into this issue. Given how detailed your question is, I'm going to do my best to give you a worthy response... Honestly, I think we should switch to the consistent hashing strategy as the default: it lets you autoscale your target allocators, and it's a much better strategy in terms of efficiency as well. To answer the question, though...

This is actually the intended behavior. The worry is that if you redistribute targets too often, your metrics are more prone to data loss. Internally we call this "stable allocation". There are a few scenarios we have in mind for this; I'll describe a few, but let me know if you have any more questions.

You could imagine a scenario where one collector's allocated target is very busy (imagine something like kube-state-metrics, which handles many high-cardinality metrics). This would mean the assigned collector is more prone to OOMs and restarts. If we were to detect a restart and reallocate this target, we may actually cause another collector to OOM and drop more data. Ultimately this would cause gaps in your metrics as your collectors come up, scrape some data, and die.

Another scenario is more dangerous: imagine a large enough cluster with millions of potential targets being discovered. The collector pool doing the sharded scraping probably has relatively small per-pod resourcing so that it is more resilient to failures. And let's say you have 10,000 targets per collector in this state. If your cluster is constantly scaling up and down, every time you add a collector you need to lock the target allocator to delete and re-allocate the entire pool of targets. If you hold on to that lock for too long, the collectors' requests for their targets will begin to time out.
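To make the locking concern concrete, here's a minimal Go sketch of that naive approach. This is illustrative only and is not the operator's actual allocator code; the type and function names are made up. The point is that the whole target map is rebuilt while a single lock is held, so time under the lock grows with the total number of targets.

```go
package main

import (
	"fmt"
	"sync"
)

// Illustrative stand-ins only; the real target allocator types in
// opentelemetry-operator look different.
type Target struct{ JobName, URL string }
type Collector struct{ Name string }

type naiveAllocator struct {
	mu          sync.Mutex
	assignments map[Target]string // target -> collector name
}

// SetCollectors is called whenever the collector pool scales up or down.
// In a naive strategy, every target is dropped and re-assigned while the
// lock is held, so the time spent here grows linearly with the total number
// of targets (e.g. 10,000 per collector). Collectors asking the allocator
// for their targets block on the same lock in the meantime, which is how
// long reallocations turn into timeouts.
func (a *naiveAllocator) SetCollectors(cols []Collector) {
	a.mu.Lock()
	defer a.mu.Unlock()

	if len(cols) == 0 {
		a.assignments = map[Target]string{}
		return
	}
	old := a.assignments
	a.assignments = make(map[Target]string, len(old))
	i := 0
	for t := range old { // full re-allocation of the entire pool
		a.assignments[t] = cols[i%len(cols)].Name
		i++
	}
}

func main() {
	a := &naiveAllocator{assignments: map[Target]string{
		{JobName: "kube-state-metrics", URL: "http://ksm:8080/metrics"}: "collector-0",
	}}
	a.SetCollectors([]Collector{{Name: "collector-0"}, {Name: "collector-1"}})
	fmt.Println(a.assignments)
}
```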

The last reason we do this is a bit more challenging to explain, but I'll try... There's a known compatibility issue between OTel and Prometheus relating to start timestamps. It's explained in detail here, but to summarize: whenever an assigned collector's targets change, that collector's Prometheus receiver starts a new timeseries. If we were to re-distribute the targets from collector A to collector B and back to collector A, I'm not sure we would properly count a reset for a cumulative metric, so the series might not stay accurate. This is mostly speculation on my end; I haven't tested it explicitly, but from the conversations I've had with my colleagues, we believe that keeping stable allocations is ultimately going to be best for users.

We discussed the implementation of the consistent hashing strategy in our SIG meetings and a bit on the introductory PR here. The gist of our goal there is that when a new collector comes online, we minimize the possible redistributions by placing many replicas of each collector on a large enough hashring (sketched below). cc @swiatekm-sumo, who may have more info he'd like to add here.
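Here's a minimal, hand-rolled consistent-hashing sketch of the replication idea. It is not the operator's actual implementation (the target allocator relies on an existing consistent-hashing library), and the replica count and names are purely illustrative, but it shows why adding a collector only moves a fraction of the targets.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

const replicas = 100 // virtual nodes per collector; value chosen for illustration

type ring struct {
	hashes []uint32          // sorted virtual-node hashes
	owner  map[uint32]string // virtual-node hash -> collector name
}

func hashOf(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// newRing places every collector on the ring `replicas` times.
func newRing(collectors []string) *ring {
	r := &ring{owner: map[uint32]string{}}
	for _, c := range collectors {
		for i := 0; i < replicas; i++ {
			h := hashOf(fmt.Sprintf("%s-%d", c, i))
			r.hashes = append(r.hashes, h)
			r.owner[h] = c
		}
	}
	sort.Slice(r.hashes, func(i, j int) bool { return r.hashes[i] < r.hashes[j] })
	return r
}

// collectorFor walks clockwise from the target's hash to the next virtual node.
func (r *ring) collectorFor(target string) string {
	h := hashOf(target)
	i := sort.Search(len(r.hashes), func(i int) bool { return r.hashes[i] >= h })
	if i == len(r.hashes) {
		i = 0 // wrap around the ring
	}
	return r.owner[r.hashes[i]]
}

func main() {
	targets := make([]string, 10000)
	for i := range targets {
		targets[i] = fmt.Sprintf("target-%d", i)
	}

	before := newRing([]string{"collector-0", "collector-1", "collector-2"})
	after := newRing([]string{"collector-0", "collector-1", "collector-2", "collector-3"})

	moved := 0
	for _, t := range targets {
		if before.collectorFor(t) != after.collectorFor(t) {
			moved++
		}
	}
	// Roughly 1/4 of targets change owners when the 4th collector joins,
	// instead of nearly all of them under a naive modulo re-assignment.
	fmt.Printf("targets moved: %d of %d\n", moved, len(targets))
}
```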

Circling back to where I started, I do think it's time we change the default strategy, and I would support a PR that does exactly that. It wouldn't be a breaking change for target allocator users; in fact, our average user may see a performance boost. With that change we should also recommend the relabel filter strategy, which also brings performance improvements.


utr1903 commented on June 18, 2024

@jaronoff97 Thanks a lot for the great response! It definitely helps me understand the intention behind the current behavior.
I would be happy to open that PR to switch the default allocation strategy to consistent-hashing (and to set the default filtering strategy to relabel-config, if agreed).

