Comments (15)
I suspect that this problem has corner cases that are unresolvable in the general case, such as when rules with the same name are applied to many variables, and thus that correctness can't be guaranteed. A mostly-working linter/sanity checker should be an achievable goal though.
from prometheus.
Yeah, you can create two rules with the same name, but you usually wouldn't encourage that, right? Maybe we should even forbid that? Or are there good use cases for this?
By the way, I wasn't thinking about a linter, but about Prometheus automatically sorting rules topologically according to their dependencies on other rules.
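Here is a minimal sketch of that idea. It is written in Python for brevity, although Prometheus itself is written in Go. It assumes each recording rule has already been reduced to its output name plus the set of metric names its expression selects. That extraction step, the shape of the `deps` map, and the names `topo_order` and `CycleError` are all hypothetical. A depth-first search appends each rule only after every rule it reads. It raises an error on loops, including a rule that reads its own output.

```python
from typing import Dict, List, Set


class CycleError(Exception):
    """Raised when rules read each other's output in a loop (hypothetical)."""


def topo_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Order rules so each one comes after every rule whose output it reads.

    `deps` maps a rule's output name to the metric names its expression
    selects. Names that no rule produces (e.g. scraped series) are ignored.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the current DFS path / done
    color = {name: WHITE for name in deps}
    order: List[str] = []

    def visit(name: str, path: List[str]) -> None:
        color[name] = GREY
        path.append(name)
        for dep in sorted(deps[name]):
            if dep not in deps:
                continue  # not a rule output, nothing to order against
            if color[dep] == GREY:  # back edge: dep is still on the DFS path
                cycle = path[path.index(dep):] + [dep]
                raise CycleError(" -> ".join(cycle))
            if color[dep] == WHITE:
                visit(dep, path)
        path.pop()
        color[name] = BLACK
        order.append(name)  # all of this rule's dependencies are already in `order`

    for name in sorted(deps):
        if color[name] == WHITE:
            visit(name, [])
    return order


rules = {
    "job:errors:rate5m": {"http_errors_total"},
    "job:requests:rate5m": {"http_requests_total"},
    "job:error_ratio:rate5m": {"job:errors:rate5m", "job:requests:rate5m"},
}
print(topo_order(rules))
# ['job:errors:rate5m', 'job:requests:rate5m', 'job:error_ratio:rate5m']
```

The code sorts the map keys and each dependency set only so that the output is deterministic. Any valid topological order would work.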
> Yeah, you can create two rules with the same name, but you usually wouldn't encourage that, right? Maybe we should even forbid that?
I think it's an option to forbid it depending on what exact naming and rule conventions we decide on.
Even if we could make that work in theory, including all the knock-on effects, in practice I expect many users won't follow the conventions and/or will end up with clashes on metric names. I'm willing to bet we'll have clashes just within the "official" client libraries for something like GC, before you even get to applications themselves.
There are also cases like rules that depend on themselves.
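If duplicate rule names were forbidden, a load-time check could be as small as the sketch below. Rules are modelled as hypothetical `(output name, expression)` pairs rather than Prometheus's actual rule-file types. `check_unique_rule_names` is an illustrative name.

```python
from collections import Counter
from typing import List, Tuple


def check_unique_rule_names(rules: List[Tuple[str, str]]) -> None:
    """Reject a rule file in which two rules record the same output name.

    `rules` is a hypothetical list of (output name, expression) pairs.
    """
    counts = Counter(name for name, _expr in rules)
    dupes = sorted(name for name, n in counts.items() if n > 1)
    if dupes:
        raise ValueError("rule names defined more than once: " + ", ".join(dupes))


# Two distinct output names: passes silently.
check_unique_rule_names([
    ("job:requests:rate5m", "sum by (job) (rate(http_requests_total[5m]))"),
    ("job:errors:rate5m", "sum by (job) (rate(http_errors_total[5m]))"),
])
```

This check catches only clashes between rule outputs. Clashes between scraped metric names, such as the client-library case mentioned above, would pass unnoticed.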
I think as a first step it’d be a good idea to evaluate rules in the order in which they were defined in the config file.
Currently we are reading the files and their rules in order. On each evaluation iteration, we evaluate concurrently, though, because naive sequential evaluation can take longer than the evaluation interval. In that case iterations are missed.
Side note: if a single query in the concurrent evaluation takes too long, a whole iteration is missed for all rules. We should not wait for all rules in an iteration to finish; instead, only the slow queries should skip iterations.
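The side note above could be sketched as a per-rule scheduler using Python's `asyncio`. Everything in it is hypothetical and is not Prometheus's rule manager, including `RuleScheduler`, `tick`, and the `rules` map of async evaluation functions. On each tick, the scheduler starts every rule whose previous evaluation has finished. A rule that is still in flight is skipped and counted, while the other rules carry on.

```python
import asyncio
from typing import Any, Callable, Coroutine, Dict, List

EvalFn = Callable[[], Coroutine[Any, Any, None]]


class RuleScheduler:
    """Hypothetical per-rule scheduler: a slow rule skips iterations on its own."""

    def __init__(self, rules: Dict[str, EvalFn]) -> None:
        self.rules = rules
        self.in_flight: Dict[str, asyncio.Task] = {}
        self.skipped: Dict[str, int] = {name: 0 for name in rules}

    def tick(self) -> List[str]:
        """Run once per evaluation interval, from inside the event loop."""
        started: List[str] = []
        for name, evaluate in self.rules.items():
            task = self.in_flight.get(name)
            if task is not None and not task.done():
                self.skipped[name] += 1  # only this rule misses the iteration
                continue
            self.in_flight[name] = asyncio.create_task(evaluate())
            started.append(name)
        return started


async def demo(interval: float = 0.1) -> Dict[str, int]:
    async def fast() -> None:
        await asyncio.sleep(interval / 10)

    async def slow() -> None:
        await asyncio.sleep(interval * 2.5)  # outlasts two intervals

    scheduler = RuleScheduler({"fast": fast, "slow": slow})
    for _ in range(3):
        scheduler.tick()
        await asyncio.sleep(interval)
    return scheduler.skipped


print(asyncio.run(demo()))
```

With a 0.1 s interval, the slow rule (0.25 s) should be skipped on the second and third ticks, while the fast rule should never be skipped. A real implementation would also need a timeout or cancellation policy for evaluations that never finish.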
> I think as a first step it’d be a good idea to evaluate rules in the order in which they were defined in the config file.
I've proposed that previously; some users are already at a point where that doesn't work, for the reason @fabxc explains.
The last time we talked about this, the plan was to add syntax to let you define groups of rules, which would be run in order. Different groups could then run at the same time. This could also be used to enable things like allowing rules to access remote storage (which you wouldn't want enabled by default).
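Here is a sketch of the ordered-groups plan under some simple assumptions. A group is a hypothetical list of `(name, evaluate)` pairs that run strictly in order. Each rule sees only the results that its own group has produced so far. Groups run concurrently via `asyncio.gather`. The `const` and `double` helpers are illustrative stand-ins for real queries.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

# A rule receives the results its own group has produced so far.
Evaluate = Callable[[Dict[str, float]], Awaitable[float]]
Group = List[Tuple[str, Evaluate]]


async def eval_group(rules: Group) -> Dict[str, float]:
    """Evaluate one group strictly in order, so later rules see earlier ones."""
    results: Dict[str, float] = {}
    for name, evaluate in rules:
        results[name] = await evaluate(results)
    return results


async def eval_all(groups: Dict[str, Group]) -> Dict[str, float]:
    """Run all groups concurrently; no ordering is promised between groups."""
    per_group = await asyncio.gather(*(eval_group(rules) for rules in groups.values()))
    merged: Dict[str, float] = {}
    for results in per_group:
        merged.update(results)
    return merged


def const(value: float) -> Evaluate:
    async def evaluate(_seen: Dict[str, float]) -> float:
        await asyncio.sleep(0)  # stand-in for a real query
        return value
    return evaluate


def double(src: str) -> Evaluate:
    async def evaluate(seen: Dict[str, float]) -> float:
        return 2 * seen[src]  # safe only because `src` ran earlier in this group
    return evaluate


groups = {
    "api": [("api:requests", const(1.0)), ("api:requests:x2", double("api:requests"))],
    "db": [("db:queries", const(5.0)), ("db:queries:x2", double("db:queries"))],
}
print(asyncio.run(eval_all(groups)))
# {'api:requests': 1.0, 'api:requests:x2': 2.0, 'db:queries': 5.0, 'db:queries:x2': 10.0}
```

Nothing orders the groups relative to each other. As a result, a rule that reads another group's output would be racing against that group. Managing this is the trade-off that the grouping syntax asks of users.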
Mh, it seems like we can solve this internally. Not sure whether taking the issue to the user is the best approach here.
Building a dependency graph will also be more optimal than one level of logical grouping by the user.
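One way a dependency graph could beat a single level of user grouping is by levelling. Kahn's algorithm splits rules into batches whose members don't read each other's output. Each batch can then be evaluated concurrently once the earlier batches have finished. This sketch uses the same hypothetical `deps` shape, mapping each output name to the metric names it selects. `evaluation_levels` is an illustrative name.

```python
from typing import Dict, List, Set


def evaluation_levels(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Split rules into batches with Kahn's algorithm.

    Rules within a batch don't read each other's output, so a batch can be
    evaluated concurrently once all earlier batches have finished.
    """
    # Keep only edges to other rules; scraped series impose no ordering.
    rule_deps = {name: {d for d in ds if d in deps} for name, ds in deps.items()}
    dependents: Dict[str, Set[str]] = {name: set() for name in deps}
    for name, ds in rule_deps.items():
        for d in ds:
            dependents[d].add(name)

    pending = {name: len(ds) for name, ds in rule_deps.items()}
    ready = sorted(name for name, n in pending.items() if n == 0)
    levels: List[List[str]] = []
    while ready:
        levels.append(ready)
        unlocked: List[str] = []
        for name in ready:
            for child in dependents[name]:
                pending[child] -= 1
                if pending[child] == 0:
                    unlocked.append(child)
        ready = sorted(unlocked)

    # Rules on a cycle (including self-loops) never reach zero pending deps.
    if sum(len(level) for level in levels) != len(deps):
        raise ValueError("rules with cyclic dependencies cannot be levelled")
    return levels


example = {
    "a": {"http_requests_total"},
    "b": {"http_errors_total"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(evaluation_levels(example))  # [['a', 'b'], ['c', 'd']]
```

This approach needs the complete graph up front. The replies below argue that the complete graph isn't always available, because of loops and regex matchers.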
> Building a dependency graph will also be more optimal than one level of logical grouping by the user.
This is only one aspect of the problem. What we also need to do is spread out the load of the rule evaluations and allow for cases where there are loops or things we can't determine about a dependency graph.
I think ordered groups of rules, with some micro-optimisation where it's safe, are the route to take.
I would think that cyclic dependencies should cause an error when reading the rules. Or are there reasonable cases where they should be allowed?
With that, evaluating one chain of rules without blocking independent ones is not an issue.
(Sent by email in reply to Brian Brazil's comment above, which he wrote on Fri, May 22, 2015 at 12:48 PM.)
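If cycles are rejected at load time, the rules fall apart into independent chains that can be evaluated without blocking each other. A union-find pass over the same hypothetical `deps` map can find these chains. As before, the map goes from each output name to the metric names it selects. `independent_chains` is an illustrative name, not a Prometheus function.

```python
from typing import Dict, List, Set


def independent_chains(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Partition rules into chains that share no dependency edges (union-find)."""
    parent = {name: name for name in deps}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for name, ds in deps.items():
        for d in ds:
            if d in deps:  # only edges between rules link chains together
                parent[find(name)] = find(d)

    chains: Dict[str, List[str]] = {}
    for name in deps:
        chains.setdefault(find(name), []).append(name)
    return sorted(sorted(chain) for chain in chains.values())


example = {
    "a": {"http_requests_total"},
    "b": {"a"},
    "c": {"node_cpu_seconds_total"},
    "d": {"c"},
    "e": {"up"},
}
print(independent_chains(example))  # [['a', 'b'], ['c', 'd'], ['e']]
```

Each chain could then be run as its own ordered sequence. This partition only separates rules that are provably independent. It does not address the concern below about coupling evaluation across jobs.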
> I would think that cyclic dependencies should cause an error when reading the rules. Or are there reasonable cases where they should be allowed?
There are advanced use cases where it comes up. It's also possible that there are rules that aren't actually cyclic, but where we can't determine that due to, e.g., the use of regexes. You also wouldn't want to couple rule evaluation between jobs, as that could let a problem in one job take out evaluation in another.
Can you elaborate on the "use of regexes" part? I know there were plans to have relabel() in the QL at some point; are you referring to that?
@fabxc For example, if someone uses a regex matcher on the metric name in a rule expression, you can no longer determine whether the resulting metric name(s) would require some other rule to be executed first. The same goes for other labels (you could have parts of one metric being selected in one rule and other parts in another rule).
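Here is one conservative workaround, which was not proposed in this thread. A recording rule's output name is fixed by its definition, so a regex on the metric name can at least be tested against the known set of rule outputs. Every match is then treated as a possible dependency. The sketch models a name matcher as a hypothetical `(operator, value)` pair. Python's `re` stands in for Prometheus's RE2 syntax. This approach over-approximates, so it can add edges (and even apparent cycles) that never occur at runtime. It also does nothing for the second case above, where rules share a name and a label matcher picks out only some of their output.

```python
import re
from typing import Iterable, Set, Tuple


def possible_rule_deps(name_matcher: Tuple[str, str], rule_outputs: Iterable[str]) -> Set[str]:
    """Over-approximate which rule outputs a selector's metric-name matcher can hit.

    PromQL regex matchers are fully anchored, hence `fullmatch`.
    """
    op, value = name_matcher
    outputs = set(rule_outputs)
    if op == "=":
        return outputs & {value}
    if op == "!=":
        return outputs - {value}
    if op in ("=~", "!~"):
        pattern = re.compile(value)
        matched = {o for o in outputs if pattern.fullmatch(o)}
        return matched if op == "=~" else outputs - matched
    raise ValueError(f"unknown matcher operator {op!r}")


outputs = ["job:errors:rate5m", "job:requests:rate5m", "instance:cpu:rate1m"]
print(sorted(possible_rule_deps(("=~", "job:.*:rate5m"), outputs)))
# ['job:errors:rate5m', 'job:requests:rate5m']
```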
Ah, those regexes – of course, thanks!
Superseded by #1095.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.