Metric Timestamps · fastly-exporter · OPEN · 15 comments

fastly commented on August 22, 2024
Metric Timestamps

Comments (15)

SuperQ commented on August 22, 2024

I'm going to continue to do some debugging on my end. I'll get back to you on what I find out.

peterbourgon commented on August 22, 2024

You can change the poll intervals.

edit: Not the actual real-time stats interval, though — see below.

peterbourgon commented on August 22, 2024

Actually, just for my understanding...

The real-time API takes a Unix timestamp as input, and returns after an arbitrary period of time with, essentially, deltas of all the stats. So you could give ts=100 and get back ts=105 with 5s worth of data, or get back ts=113 with 13s of data, etc. What timestamp would you apply? The time when the response was received?
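
For reference, a minimal sketch of that request/response loop (not the exporter's actual code; the URL shape follows the public rt.fastly.com endpoint, with auth and error handling omitted):

package sketch

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// rtResponse models just the fields relevant here; the real response
// carries the full stats payload as well.
type rtResponse struct {
	Timestamp uint64          `json:"Timestamp"`
	Data      json.RawMessage `json:"Data"`
}

// poll fetches deltas since ts and returns the new timestamp to feed back in.
// You could pass ts=100 and get back Timestamp=105 with 5s worth of Data.
func poll(serviceID string, ts uint64) (uint64, error) {
	url := fmt.Sprintf("https://rt.fastly.com/v1/channel/%s/ts/%d", serviceID, ts)
	resp, err := http.Get(url) // Fastly-Key header omitted for brevity
	if err != nil {
		return ts, err
	}
	defer resp.Body.Close()

	var r rtResponse
	if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
		return ts, err
	}
	return r.Timestamp, nil // next call: poll(serviceID, r.Timestamp)
}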

SuperQ commented on August 22, 2024

I'm thinking it should get the end timestamp of the delta accumulation.

SuperQ commented on August 22, 2024

Yes, my thinking is that there is some jitter both in the timing of each real-time poll and in when Prometheus performs the scrape, so stamping the samples with the data's own end timestamp takes that jitter out of the equation.

peterbourgon commented on August 22, 2024

I dug around in the docs for about 15 minutes and couldn't figure out how to do this. Given that this is how metrics are updated, how should I apply a specific timestamp?
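
For context, client_golang can attach an explicit timestamp, but only from a custom Collector rather than the ordinary Counter/Gauge types the exporter updates today. A minimal sketch, with a hypothetical collector that the poll loop would feed:

package sketch

import (
	"sync"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// tsCollector is hypothetical: the poll loop would store each delta's value
// and end timestamp here, and scrapes would expose both together.
type tsCollector struct {
	mu    sync.Mutex
	desc  *prometheus.Desc
	ts    time.Time
	value float64
}

func (c *tsCollector) Describe(ch chan<- *prometheus.Desc) { ch <- c.desc }

func (c *tsCollector) Collect(ch chan<- prometheus.Metric) {
	c.mu.Lock()
	defer c.mu.Unlock()
	ch <- prometheus.NewMetricWithTimestamp(
		c.ts, // end timestamp of the delta accumulation
		prometheus.MustNewConstMetric(c.desc, prometheus.CounterValue, c.value),
	)
}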

bjk-reddit commented on August 22, 2024

Yup, exactly as I thought. The time to pull from the API is highly variable. This means that sometimes data arrives slightly on one side or the other of a scrape.

Test patch:

diff --git a/pkg/rt/subscriber.go b/pkg/rt/subscriber.go
index a0ec01d..8551018 100644
--- a/pkg/rt/subscriber.go
+++ b/pkg/rt/subscriber.go
@@ -131,6 +131,7 @@ func (s *Subscriber) Run(ctx context.Context) error {
 // Recoverable errors are logged internally and not returned, so any non-nil
 // error returned by this method should be considered fatal to the subscriber.
 func (s *Subscriber) query(ctx context.Context, ts uint64) (currentName string, result apiResult, delay time.Duration, newts uint64, fatal error) {
+       start := time.Now()
        name, ver, found := s.provider.Metadata(s.serviceID)
        version := strconv.Itoa(ver)
        if !found {
@@ -169,7 +170,7 @@ func (s *Subscriber) query(ctx context.Context, ts uint64) (currentName string,

        switch resp.StatusCode {
        case http.StatusOK:
-               level.Debug(s.logger).Log("status_code", resp.StatusCode, "response_ts", response.Timestamp, "err", apiErr)
+               level.Debug(s.logger).Log("status_code", resp.StatusCode, "response_ts", response.Timestamp, "err", apiErr, "took", time.Since(start).Seconds())
                if strings.Contains(apiErr, "No data available") {
                        result = apiResultNoData
                } else {

Results:

level=debug component=rt.fastly.com service_id=SECRET status_code=200 response_ts=1633434059 err=<none> took=2.018836437
level=debug component=rt.fastly.com service_id=SECRET status_code=200 response_ts=1633434060 err=<none> took=0.971126704
level=debug component=rt.fastly.com service_id=SECRET status_code=200 response_ts=1633434061 err=<none> took=0.958338924
level=debug component=rt.fastly.com service_id=SECRET status_code=200 response_ts=1633434062 err=<none> took=1.061254601
level=debug component=rt.fastly.com service_id=SECRET status_code=200 response_ts=1633434063 err=<none> took=0.969232367
level=debug component=rt.fastly.com service_id=SECRET status_code=200 response_ts=1633434064 err=<none> took=0.979430087
level=debug component=rt.fastly.com service_id=SECRET status_code=200 response_ts=1633434065 err=<none> took=1.134385254
level=debug component=rt.fastly.com service_id=SECRET status_code=200 response_ts=1633434066 err=<none> took=0.829105108
level=debug component=rt.fastly.com service_id=SECRET status_code=200 response_ts=1633434067 err=<none> took=0.986448288
level=debug component=rt.fastly.com service_id=SECRET status_code=200 response_ts=1633434068 err=<none> took=0.988539304
level=debug component=rt.fastly.com service_id=SECRET status_code=200 response_ts=1633434069 err=<none> took=0.970611605
level=debug component=rt.fastly.com service_id=SECRET status_code=200 response_ts=1633434070 err=<none> took=1.12616397
level=debug component=rt.fastly.com service_id=SECRET status_code=200 response_ts=1633434071 err=<none> took=0.915297038
level=debug component=rt.fastly.com service_id=SECRET status_code=200 response_ts=1633434072 err=<none> took=1.005067603
level=debug component=rt.fastly.com service_id=SECRET status_code=200 response_ts=1633434073 err=<none> took=0.929945924
level=debug component=rt.fastly.com service_id=SECRET status_code=200 response_ts=1633434074 err=<none> took=1.074395758
level=debug component=rt.fastly.com service_id=SECRET status_code=200 response_ts=1633434075 err=<none> took=0.959273333
level=debug component=rt.fastly.com service_id=SECRET status_code=200 response_ts=1633434076 err=<none> took=0.995491316

peterbourgon commented on August 22, 2024

I guess I don't really understand how timestamps will help here. Are you saying to use the response_ts as the timestamp? Are you seeing that as substantially different from the local time?

edit: I ran it for a bit while logging the diff between the response_ts and the local timestamp when the response was received. It varied but only up to about 1100ms, which is exactly what you'd expect given the response_ts only has 1s granularity. It also seems to line up with your data. Given the API "refresh interval" is about 1s and the Prometheus scrape interval is 15s, I can't see how the dips and peaks in the graph are a function of any kind of data skew. Can you explain like I'm 5?

bjk-reddit commented on August 22, 2024

The issue is much more visible when you have a large amount of traffic and you're looking at the extrapolated rate between two scrapes. Any small amount of missing accumulation is greatly exaggerated by the extrapolation.

Think about it this way: if the update interval is 1s and the scrape interval is 15s, an individual scrape could be missing 1/15th of the data. That leads to the rate being off by -6.7%. The next scrape includes the missing data, so it's now +6.7%.
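
(Concretely, with a hypothetical steady 1000 req/s: a 15s window that misses one 1s delta reports 14000/15 ≈ 933 req/s, about -6.7%; the next window picks up the stray delta and reports 16000/15 ≈ 1067 req/s, about +6.7%.)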

By including the timestamp, we align the data with the actual time window it represents. Currently the implicit timestamp is the scrape time, which trails the end timestamp of the data by the random lag in the API.

peterbourgon commented on August 22, 2024

> By including the timestamp, we align the data with the actual time window it represents.

Ah, but the returned data always lags the truth to at least some degree. At the moment, by several seconds! See the AggregateDelay parameter described here, which we don't consider at all right now. That could be a feature request, but it's of arguable benefit, because there's always going to be the possibility of "data drift" where some increments arrive late to the system serving rt.fastly.com itself. So I don't think computing "the extrapolated rate between two scrapes" will ever be precise in the way you're assuming/hoping it is, just by the nature of the system.
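
For illustration, a sketch of what starting to consider that field might look like; the field name follows the API docs, and the logger mirrors the go-kit style used in the patch above:

package sketch

import (
	"github.com/go-kit/log"
	"github.com/go-kit/log/level"
)

// rtResponse is hypothetical: just the two fields relevant here.
type rtResponse struct {
	Timestamp      uint64 `json:"Timestamp"`
	AggregateDelay int    `json:"AggregateDelay"` // seconds the reported window trails real time
}

// logLag surfaces the lag the API itself reports, so operators can see how
// stale each window is before deciding what timestamp samples should carry.
func logLag(logger log.Logger, r rtResponse) {
	level.Debug(logger).Log(
		"response_ts", r.Timestamp,
		"aggregate_delay_s", r.AggregateDelay,
	)
}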

Manipulating data timestamps also introduces risks related to those timestamps being incorrect, for whatever reason. That would require safeguards, and the semantics of those safeguards in the presence of unsound input data are non-obvious. Can of worms....

bjk-reddit commented on August 22, 2024

Lagging behind the truth is fine; what's important is that the timestamp of the truth is reasonable. We do this when importing data from other monitoring systems, for example AWS CloudWatch.

peterbourgon commented on August 22, 2024

I'm sure I'm being dense here but I still don't understand how a difference of less than a second — which is the granularity of the data itself — is meaningful enough to warrant correction. The delta never exceeds 1s, right? Would you not see exactly the same squiggles in the graph if we did the correction you're suggesting, but Prometheus happens to scrape on something other than precisely the .000 of the scrape interval? I'm honestly not trying to be difficult, I honestly don't understand how this would help. Or, I think maybe equivalently: I honestly don't understand why you think the extrapolated rate between two scrapes should be as precise as you're expecting.

bjk-reddit commented on August 22, 2024

The problem is that the API is crazy jittery, yet the data aligns perfectly with second boundaries. So the delay wobbles the results to either side of scrape alignment. It's fine if there is enough offset between the Prometheus scrape and the width of the API wobble, but when they're closely aligned, the data becomes excessively noisy.

I will work on a patch to see if I can demonstrate this.

peterbourgon commented on August 22, 2024

> The problem is that the API is crazy jittery

In my tests it has a delta of like +/- 1s. Is that a lot? And any exporter which is a proxy to an API is surely going to have this issue, right? My intuition is that it would typically be much more "jitter" than 1s -- am I off base? Do all exporters like this need to provide timestamps?

SuperQ commented on August 22, 2024

Since it's basically polling another monitoring system, yes.

Another way we could handle this is to convert the exporter into a remote-write streaming server: rather than scrape -> scrape, scrape -> stream.
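
A rough sketch of the streaming direction, assuming the standard remote-write 1.0 wire format (snappy-compressed protobuf over HTTP); the metric name and endpoint are made up:

package sketch

import (
	"bytes"
	"net/http"
	"time"

	"github.com/gogo/protobuf/proto"
	"github.com/golang/snappy"
	"github.com/prometheus/prometheus/prompb"
)

// push sends one sample with an explicit timestamp, e.g. the end of the
// real-time delta window, to a remote-write receiver.
func push(endpoint string, value float64, ts time.Time) error {
	req := &prompb.WriteRequest{
		Timeseries: []prompb.TimeSeries{{
			Labels: []prompb.Label{
				{Name: "__name__", Value: "fastly_rt_requests_total"}, // hypothetical
			},
			Samples: []prompb.Sample{{
				Value:     value,
				Timestamp: ts.UnixMilli(), // explicit timestamp, in milliseconds
			}},
		}},
	}

	raw, err := proto.Marshal(req)
	if err != nil {
		return err
	}

	httpReq, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(snappy.Encode(nil, raw)))
	if err != nil {
		return err
	}
	httpReq.Header.Set("Content-Encoding", "snappy")
	httpReq.Header.Set("Content-Type", "application/x-protobuf")
	httpReq.Header.Set("X-Prometheus-Remote-Write-Version", "0.1.0")

	resp, err := http.DefaultClient.Do(httpReq)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}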
