fastly / fastly-exporter

A Prometheus exporter for the Fastly Real-time Analytics API

License: Apache License 2.0

Languages: Go 99.66%, Dockerfile 0.09%, Makefile 0.25%
Topics: tool, fastly-oss-tier1

fastly-exporter's Introduction

fastly-exporter

This program consumes from the Fastly Real-time Analytics API and makes the data available to Prometheus. It should behave like you expect: dynamically adding new services, removing old services, and reflecting changes to service metadata like name and version.

Getting

Binary

Go to the releases page.

Docker

Available on the packages page as fastly/fastly-exporter.

docker pull ghcr.io/fastly/fastly-exporter:latest

Note that version latest will track RCs, alphas, etc. -- always use an explicit version in production.

Helm chart

Helm must be installed to use the prometheus-fastly-exporter chart from the prometheus-community Helm repository. Please refer to Helm's documentation to get started.

Once Helm is set up properly, add the repo as follows:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

And install:

helm upgrade --install fastly-exporter prometheus-community/prometheus-fastly-exporter --namespace monitoring --set token="fastly_api_token"

Source

If you have a working Go installation, you can clone the repo and install the binary from any revision, including HEAD.

git clone git@github.com:fastly/fastly-exporter
cd fastly-exporter
go build ./cmd/fastly-exporter
./fastly-exporter -h

Using

Basic

For simple use cases, all you need is a Fastly API token. See the Fastly documentation on creating API tokens for details. The token can be provided via the -token flag or the FASTLY_API_TOKEN environment variable.

fastly-exporter -token XXX

This will collect real-time stats for all Fastly services visible to your token, and make them available as Prometheus metrics on 127.0.0.1:8080/metrics.
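
Once it's running, a quick sanity check is to query the endpoint directly (assuming the default listen address, and the fastly_rt_ metric prefix seen in the examples further down):

curl -s http://127.0.0.1:8080/metrics | grep -c '^fastly_rt_'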

Filtering services

By default, all services available to your token will be exported. You can specify an explicit set of service IDs to export by using the -service xxx flag. (Service IDs are available at the top of your Fastly dashboard.) You can also include only those services whose name matches a regex by using the -service-allowlist '^Production' flag, or exclude any service whose name matches a regex by using the -service-blocklist '.*TEST.*' flag.
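
For example, to export only services whose names start with Production while skipping anything containing TEST (patterns are illustrative), the flags can be combined:

fastly-exporter -token XXX -service-allowlist '^Production' -service-blocklist '.*TEST.*'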

For tokens with access to a lot of services, it's possible to "shard" the services among different fastly-exporter instances by using the -service-shard flag. For example, to shard all services between 3 exporters, you would start each exporter as

fastly-exporter [common flags] -service-shard 1/3
fastly-exporter [common flags] -service-shard 2/3
fastly-exporter [common flags] -service-shard 3/3

Filtering metrics

By default, all metrics provided by the Fastly real-time stats API are exported as Prometheus metrics. You can export only those metrics whose name matches a regex by using the -metric-allowlist 'bytes_total$' flag, or elide any metric whose name matches a regex by using the -metric-blocklist imgopto flag.

Filter semantics

All flags that filter services or metrics are repeatable. Repeating the same flag causes its condition to be combined with OR semantics. For example, -service A -service B would include both services A and B (but not service C). Or, -service-blocklist Test -service-blocklist Staging would skip any service whose name contained Test or Staging.

Different flags (for the same filter target) combine with AND semantics. For example, -metric-allowlist 'bytes_total$' -metric-blocklist imgopto would only export metrics whose names ended in bytes_total, but didn't include imgopto.
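
As a concrete command line, that last example corresponds to:

fastly-exporter -token XXX -metric-allowlist 'bytes_total$' -metric-blocklist imgopto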

Service discovery

Per-service metrics are available via /metrics?target=<service ID>. Available services are enumerated as targets on the /sd endpoint, which is compatible with the generic HTTP service discovery feature of Prometheus. An example Prometheus scrape config for the Fastly exporter follows.

scrape_configs:
  - job_name: fastly-exporter
    http_sd_configs:
      - url: http://127.0.0.1:8080/sd
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: service
      - target_label: __address__
        replacement: 127.0.0.1:8080

Dashboards and Alerting

Data from the Fastly exporter can be used to build dashboards and alerts with Grafana and Alertmanager. For a fully working example, see fastly-dashboards created by @mrnetops. Fastly-dashboards contains a Docker Compose setup, which boots up a full fastly-exporter + Prometheus + Alertmanager + Grafana + Fastly dashboard stack with Slack alerting integration.

fastly-exporter's People

Contributors

arslanbekov, bridgetlane, davidbirdsong, dependabot[bot], froesef, gaashh, leklund, ljagiello, mrnetops, peterbourgon, phamann, shawnps, skgsergio, superq, takanabe, thommahoney, tomhughes, xamebax

fastly-exporter's Issues

metric consolidation for http2/http3

Similar to how we consolidated

  • tls_v10
  • tls_v11
  • tls_v12
  • tls_v13

into fastly_rt_tls_total{tls_version=$version}, we should consolidate

  • http2
  • http3
  • the silent but implicit http1

into fastly_rt_http_total{version=$version}.

Right now we need to do hinky things like

sum(rate(fastly_rt_requests_total{}[1m])) 
- sum(rate(fastly_rt_http2_total{}[1m]))
- sum(rate(fastly_rt_http3_total{}[1m])) 

to derive http1 requests, which requires us to explicitly know every potential metric name in play and derive the value by exclusion.

That is, if http4 is ever added, the exclusion calculation above breaks unless it is explicitly adjusted.

fastly_rt_http_total {version=1} would be soooo much nicer and way less fragile.
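
With the proposed metric in place, deriving HTTP/1 traffic would reduce to a single selector (label name taken from the proposal above):

sum(rate(fastly_rt_http_total{version="1"}[1m]))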

Default endpoint value

I ran into an issue deploying this exporter on k8s where I hadn't set the endpoint argument, which ended up defaulting to http://127.0.0.1:8080/metrics. As a result, Prometheus wasn't able to scrape it: the pod's metrics endpoint wasn't reachable from other pods even though they all had access. While my team and I were troubleshooting, we realized that according to the Dockerfile, the entrypoint actually sets the endpoint to http://0.0.0.0:8080/metrics. So shouldn't the default value of this argument also be http://0.0.0.0:8080/metrics?
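
As a workaround, the address can be set explicitly. Here is a minimal Kubernetes sketch, assuming the -endpoint flag used by the Dockerfiles shown elsewhere on this page and a hypothetical secret holding the token:

containers:
  - name: fastly-exporter
    image: ghcr.io/fastly/fastly-exporter:latest
    args: ["-endpoint", "http://0.0.0.0:8080/metrics"]
    env:
      - name: FASTLY_API_TOKEN            # the token can be supplied via this env var
        valueFrom:
          secretKeyRef:
            name: fastly-api-token        # hypothetical secret name
            key: token
    ports:
      - containerPort: 8080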

Docker tag for latest binary release

It looks like the Docker image tags aren't up-to-date/in sync with the binary releases on Github. While I'm successfully using the latest tag on Docker for now, in general it's preferable to tag the docker image to a specific release for compatibility reasons etc. Just wondering if that's something you'd be able to do, at least for major versions such as v3.

Also, thanks for writing this tool! It's been working out great so far. Cheers!

missing metrics

Hello,

Correct me if I missed something big here, but looking at the Fastly historical stats API documentation (https://docs.fastly.com/api/stats), I see many metrics which are not exported to my Prometheus server, such as status_5xx. Did I miss something, or is this currently not supported?

Consider adding a way to (optionally) track tokens

Tracking tokens in order to monitor when they get old and should be renewed, or when they stop being used and could be destroyed, is a must for security departments.

So it would be great to be able to get such stats via Prometheus, but that requires a service to expose the data.

This applies to both user and automation tokens.

Minimum data required:

  • type (automation/user)
  • creation timestamp
  • last used timestamp
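
For illustration only, this data could be exposed along these lines (the metric names here are hypothetical, not implemented):

# hypothetical metric names, for illustration only
fastly_token_info{token_id="abc123", type="user"} 1
fastly_token_created_timestamp_seconds{token_id="abc123"} 1.6338816e+09
fastly_token_last_used_timestamp_seconds{token_id="abc123"} 1.6660512e+09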

Expose fastly rt api 'recorded' timestamp in prometheus metrics

The fastly real-time api includes a recorded field in the response, which is a timestamp of when a metric was generated. This is currently represented in the APIResponse struct.

Prometheus can be asked to honour the metric timestamps seen when scraping a target.

And the prometheus golang client allows for the creation of metrics with timestamps.

This issue is to update the fastly-exporter to use the recorded field as the metric timestamp when converting fastly metrics to prometheus ones. Doing this could help keep fastly metrics from being offset from other scrape targets.
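
A minimal sketch of the idea using the Prometheus Go client, assuming a custom collector that holds the decoded values and the recorded field for one service (names are illustrative, not the exporter's actual types):

package rt

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// rtCollector holds the most recent real-time API values for one service.
type rtCollector struct {
	requestsDesc *prometheus.Desc
	serviceID    string
	requests     float64
	recorded     int64 // the "recorded" field from the API response (Unix seconds)
}

func (c *rtCollector) Describe(ch chan<- *prometheus.Desc) { ch <- c.requestsDesc }

func (c *rtCollector) Collect(ch chan<- prometheus.Metric) {
	m := prometheus.MustNewConstMetric(c.requestsDesc, prometheus.CounterValue, c.requests, c.serviceID)
	// Stamp the sample with the API-provided timestamp instead of the scrape time.
	ch <- prometheus.NewMetricWithTimestamp(time.Unix(c.recorded, 0), m)
}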

Metric Timestamps

While it's normally not best practice to expose timestamps, I think this exporter may need them.

I have been seeing issues with update vs scrape time alignment.

This may also be an artifact of the latency for ingesting updates from the real-time API. If the update comes in just slightly after the scrape, the update will not include some data. But the next scrape will catch up with the current value.

This causes artifacts in the graphs.

(Screenshot, 2021-10-04: graph showing the scrape-alignment artifacts.)

Miss duration histogram has fewer buckets in v6.0.0 alpha.1 release

Hello! Thanks for your work on this project. It's super useful getting this data into Prometheus.

I've started using the v6.0.0-alpha.1 release since it makes it easier to get 429 response rates. Overall the release is working great.

The problem I'm having is that the MissDurationSeconds histogram only has three buckets for durations greater than 1 second (2.5, 5, and 10). In v5.0.0, there were double the number of buckets for durations greater than 1 second (2, 4, 8, 16, 32, 60).

In practice, I think this means I'm getting less accurate data on p99 miss latency. I'm seeing about a 300-400ms difference compared to before. Obviously, this issue will be experienced differently by users based on their specific response time patterns.

If it's desirable, I'm happy to submit a PR to either switch the bucket values back to their previous configuration, or to add a command-line option (e.g. -miss-duration-buckets 0.005,0.01,0.025,0.05,0.1,0.25,0.5,1,2,4,8,10) to allow the bucket configuration to be specified at runtime.

Fastly's API returns a significant amount of buckets:

The miss_histogram object is a histogram. Each key is the upper bound of a span of 10 milliseconds, and the values are the number of requests to origin during that 10ms period. Any origin request that takes more than 60 seconds to return will be in the 60000 bucket.

From my limited querying of the API, I seem to see the following pattern for buckets from Fastly:

  • 1ms buckets from 0-10ms
  • 10ms buckets from 10-250ms
  • 50ms buckets from 250-1000ms
  • 100ms buckets from 1000-3000ms
  • 500ms buckets from 3000-60000ms

Exporter should set its own custom user-agent

It would be nice if the exporter could set a custom User-Agent string so that it can be identified by downstream APIs, such as User-Agent: fastly-exporter v1.2.3.

I'm happy to take this work on next week, but detailing here before I forget.
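
A small sketch of one way to do this in Go: wrap the HTTP client's transport so every request to the Fastly APIs carries the version string (type and field names are illustrative):

package rt

import "net/http"

// userAgentTransport stamps every outgoing request with a custom User-Agent.
type userAgentTransport struct {
	userAgent string
	next      http.RoundTripper
}

func (t *userAgentTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	req = req.Clone(req.Context()) // avoid mutating the caller's request
	req.Header.Set("User-Agent", t.userAgent)
	return t.next.RoundTrip(req)
}

The client would then be built as &http.Client{Transport: &userAgentTransport{userAgent: "fastly-exporter v1.2.3", next: http.DefaultTransport}}.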

Pagination is not working properly (not fetching every page)

Hi!
After updating to the latest version we noticed that we were missing some services.
On version 7:
level=debug component=api.fastly.com refresh_took=749.283548ms total_service_count=127 accepted_service_count=0
On version 6 or before:
level=debug component=api.fastly.com refresh_took=2.052766648s total_service_count=7528

We have 76 pages of services, and the 127 services found by version 7 come from only the first page and the last one. So the pagination in the exporter is not working properly: it fetches only the first and last pages rather than every page.

Thank you!

RT API meta-metric

It would be useful to have a meta-metric like fastly_rt_up that indicates if the exporter has a valid connection to the RT API for each service.
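
A rough sketch of how such a gauge could be declared with the Prometheus Go client, reusing the fastly/rt namespace and subsystem the exporter already uses (the label set is an assumption):

package rt

import "github.com/prometheus/client_golang/prometheus"

// rtUp would be set to 1 while the exporter has a healthy rt.fastly.com
// subscription for a service, and 0 otherwise.
var rtUp = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Namespace: "fastly",
		Subsystem: "rt",
		Name:      "up",
		Help:      "Whether the exporter currently has a working connection to rt.fastly.com for this service.",
	},
	[]string{"service_id", "service_name"},
)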

Missing Docker images

It looks like mrnetops/fastly-exporter is missing the v6.1.0 release on Docker Hub.

Support scraping multiple services

It looks like we can only specify a single Fastly service ID in the exporter right now.

Some shops have multiple services/service IDs set up, so it would be useful to be able to pass a comma-separated string or something and scrape multiple services.

`-metric-blocklist` doesn't work with `fastly_rt_datacenter_info`

The -metric-blocklist can't filter out the metric fastly_rt_datacenter_info.

Here is how to reproduce the issue with the latest stable version:

docker run \
  --env FASTLY_API_TOKEN="<your token>" \
  --interactive \
  --publish="0.0.0.0:8080:8080" \
  --rm \
  --tty \
  ghcr.io/fastly/fastly-exporter:v7.6.1 \
  -metric-blocklist='^fastly_rt_datacenter_info$'
curl -s http://127.0.0.1:8080/metrics | grep fastly_rt_datacenter_info

As you can see, the fastly_rt_datacenter_info metric continues to be exported, even when explicitly filtered out.


PS: Kudos for maintaining this exporter. It is really handy! πŸ™πŸΌ

Better logging for rt.fastly.com (Client.Timeout exceeded while awaiting headers)

Because of how fastly-exporter will wait for new stats to be published for services, we tend to get a ton of logging like this for services that are simply not handling requests, and so not generating stats.

level=error component=rt.fastly.com service_id=xxx during="execute request" err="Get "https://rt.fastly.com/v1/channel/xxx/ts/1666656765\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"

This can make it hard to suss out whether there are in fact errors with, or connecting to, rt.fastly.com, versus simply having a number of idle services. This is a problem that is going to scale with the number of services in play in the account in question (assuming more services overall increases the incidence and volume of idle services).

Possibly these errors should be reclassified as info, as they are byproducts of the intended use case of connecting and listening for stat updates. And/or we should have better logging for when there are real issues (connection refused, non-2xx responses, etc.).

Short term, I have attempted to minimize the spurious errors with -rt-timeout 120s to increase the likelihood of a service request resulting in a stat response.

Interestingly, that seems to have tentatively addressed all of the errors, which makes me wonder if there is an interaction with a maximum time to stat response from rt.fastly.com, even if stats are zero. So possibly, raise that default to > the maximum stat response time from rt.fastly.com (if that is in fact what is happening)?

Race condition between processing and scraping

There is a race condition when a scrape happens while the per-datacenter metrics are being incremented. When the results are processed from the real-time stats API, the exporter iterates over the response and increments the metrics per datacenter. If a scrape happens during that processing loop, the reported metrics won't include all metrics for all datacenters, since the response from the real-time API hasn't finished processing yet. As a result, that scrape doesn't report all of the data from the last second of real-time data. I was able to easily reproduce this by adding an artificial delay in the processing loop to force the scrape to happen in the middle of the loop. This can cause interesting graphs when running queries like:

(sum(rate(fastly_rt_requests_total[1m])) by(service_id)- (
sum(rate(fastly_rt_tls_total[1m]))by(service_id) ))

This line should be flat:

(Screenshot, 2023-07-06: the query result shows spikes instead of a flat line.)

A potential solution is to add some locking so that every scrape is guaranteed to have a full set of data from any given response from the API. This has some performance implications especially when running against many services.
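
A sketch of that locking idea (type and field names are illustrative, not the exporter's actual code): a full API response is applied under a write lock, and scrapes read under a read lock, so a scrape never observes a partially applied response.

package rt

import (
	"sync"

	"github.com/prometheus/client_golang/prometheus"
)

type serviceMetrics struct {
	mu       sync.RWMutex
	requests *prometheus.CounterVec // per-datacenter counters, e.g. fastly_rt_requests_total
}

// apply processes one real-time API response (one entry per datacenter) atomically.
func (s *serviceMetrics) apply(byDatacenter map[string]float64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for dc, n := range byDatacenter {
		s.requests.WithLabelValues(dc).Add(n)
	}
}

// Collect implements prometheus.Collector; it runs either before or after a
// full apply, never in the middle of one.
func (s *serviceMetrics) Collect(ch chan<- prometheus.Metric) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	s.requests.Collect(ch)
}

func (s *serviceMetrics) Describe(ch chan<- *prometheus.Desc) {
	s.requests.Describe(ch)
}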

Thanks to @mrnetops for reporting.

Continue starting up if fastly is down

Right now if Fastly is down or a temporary network glitch is occurring during startup the exporter crashes:

fastly # [    7.238528] prometheus-fastly-exporter-start[935]: level=error component=api.fastly.com during="initial API calls" err="error executing API services request: Get \"https://api.fastly.com/service\": dial tcp: lookup api.fastly.com: Temporary failure in name resolution"
fastly # [    7.242486] systemd[1]: prometheus-fastly-exporter.service: Main process exited, code=exited, status=1/FAILURE

For me it would be preferable for the exporter to continue starting up but emit a metric saying Fastly seems to be down.

re https://github.com/NixOS/nixpkgs/pull/151427/files#diff-e669f3682eb07a05197060a93f278ff57f160465c8168906e12f1f2c472026d8R262-R273
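
A minimal sketch of what that could look like (the gauge name and initialRefresh function are hypothetical, standing in for the exporter's "initial API calls"):

package main

import (
	"log"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// initialRefresh stands in for the exporter's initial api.fastly.com calls.
func initialRefresh() error { return nil }

func main() {
	// fastly_api_up is a hypothetical gauge: 1 when the last refresh succeeded, 0 otherwise.
	apiUp := prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "fastly_api_up",
		Help: "Whether the last refresh against api.fastly.com succeeded.",
	})
	prometheus.MustRegister(apiUp)

	// Retry with backoff instead of exiting on the first failure.
	delay := time.Second
	for {
		err := initialRefresh()
		if err == nil {
			apiUp.Set(1)
			break
		}
		apiUp.Set(0)
		log.Printf("initial API refresh failed: %v; retrying in %s", err, delay)
		time.Sleep(delay)
		if delay < time.Minute {
			delay *= 2
		}
	}
	// ... continue with normal startup (HTTP listener, per-service subscriptions) ...
}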

Reduce output size of metrics endpoint

Problem

Currently, when collecting stats for 201 services after running the exporter for 13 days with 12 shards the metrics endpoint output size is as follows:

Shard   Services   Payload (KB)
1       18         30,792
2       25         55,378
3       16         40,243
4       15         29,123
5       21         34,345
6       22         40,100
7       10         19,948
8       19         47,790
9       11         20,234
10      15         40,499
11      19         37,366
12      19         29,092
Total   210        424,910

With a scrape interval of 60 seconds, the bandwidth requirement becomes 7,082 KB/s. In terms of storage, this is 424,910 KB per scrape × 60 scrapes per hour × 24 hours ≈ 584 GB of raw data per day.

This can cause considerable impact on Prometheus scraping performance as this is a very large payload.

Proposal

Currently, each datacenter is a label, which multiplies the number of series for each metric. When combined with a metric that has a status_code label, this can explode the number of metrics returned.

A possible solution to reduce the output size of the metrics endpoint would be to aggregate the datacenter.

Analysis of how this might impact the output size for the earlier example is as follows:

Shard   Services   Payload (KB)
1       17         645
2       25         934
3       16         607
4       14         531
5       21         796
6       21         786
7       10         394
8       18         686
9       10         398
10      15         582
11      19         718
12      15         569
Total   201        7,646

With a scrape interval of 60 seconds, the bandwidth requirement becomes 127 KB/s. In terms of storage, this is 7,646 KB per scrape × 60 scrapes per hour × 24 hours ≈ 11 GB of raw data per day.

A comparison to the results with having individual datacenter metrics shows the following improvements:

Datacenter   Payload (KB)   Rate (KB/s)   Storage (daily, GB)   Reduction
Individual   424,910        7,082         584
Aggregated   7,646          127           11                    98%

A side effect of aggregating datacenter metrics is that memory consumption should also be reduced. It is hard to determine the exact impact, but there should certainly be some improvement.

Conclusion

Aggregated datacenter metrics would provide an option for users who wish to reduce the metrics endpoint output size. By providing this as an option (not the default), users could decide whether the benefits of reducing the output size outweigh the loss of individual datacenter metrics.

fastly-exporter-2.1.0-linux-amd64 doesn't work in alpine

fastly-exporter-2.1.0-linux-amd64 doesn't work in an Alpine container.

fastly-exporter-2.0.0-linux-amd64:

➜  fastly-exporter cat Dockerfile-2.0.0
FROM alpine:3.8

RUN apk add --no-cache ca-certificates

RUN wget https://github.com/peterbourgon/fastly-exporter/releases/download/v2.0.0/fastly-exporter-2.0.0-linux-amd64 -O /fastly-exporter && chmod a+x fastly-exporter

ENTRYPOINT ["/fastly-exporter", "-endpoint", "http://0.0.0.0:8080/metrics"]
➜  fastly-exporter docker build --pull -t fastly-exporter-2.0.0 -f Dockerfile-2.0.0 .
[…]
➜  fastly-exporter docker run --rm fastly-exporter-2.0.0:latest
level=error err="-token is required"

fastly-exporter-2.1.0-linux-amd64:

➜  fastly-exporter cat Dockerfile-2.1.0
FROM alpine:3.8

RUN apk add --no-cache ca-certificates

RUN wget https://github.com/peterbourgon/fastly-exporter/releases/download/v2.1.0/fastly-exporter-2.1.0-linux-amd64 -O /fastly-exporter && chmod a+x fastly-exporter

ENTRYPOINT ["/fastly-exporter", "-endpoint", "http://0.0.0.0:8080/metrics"]
➜  fastly-exporter docker build --pull -t fastly-exporter-2.1.0 -f Dockerfile-2.1.0 .
[…]
➜  fastly-exporter docker run --rm fastly-exporter-2.1.0:latest
standard_init_linux.go:190: exec user process caused "no such file or directory"

It looks like the 2.1.0 binary is not statically linked.

/ # ldd fastly-exporter-2.1.0-linux-amd64
	/lib64/ld-linux-x86-64.so.2 (0x7fa4e59f2000)
	libpthread.so.0 => /lib64/ld-linux-x86-64.so.2 (0x7fa4e59f2000)
	libc.so.6 => /lib64/ld-linux-x86-64.so.2 (0x7fa4e59f2000)
/ # ldd fastly-exporter-2.0.0-linux-amd64
ldd: fastly-exporter-2.0.0-linux-amd64: Not a valid dynamic program
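
A common fix for this class of problem, assuming the dynamic linkage comes from cgo (for example the net package's DNS resolver), is to build the release binary with cgo disabled so it is statically linked and runs on Alpine:

CGO_ENABLED=0 go build -o fastly-exporter ./cmd/fastly-exporter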

remove tls_version="any" from fastly_rt_tls_total

This essentially includes a total field in the metric itself, which violates Prometheus best practices.

Per https://prometheus.io/docs/practices/naming/

As a rule of thumb, either the sum() or the avg() over all dimensions of a given metric should be meaningful (though not necessarily useful).

This means that a simple sum(rate(fastly_rt_tls_total[1m])) by (service_name) ends up double counting the values (v10 + v11 + v12 + v13 + any), as seen in this normalized snippet:

fastly_rt_tls_total{datacenter="YYZ",service_id="XXX",service_name="XXX",tls_version="any"} 11686
fastly_rt_tls_total{datacenter="YYZ",service_id="XXX",service_name="XXX",tls_version="v10"} 0
fastly_rt_tls_total{datacenter="YYZ",service_id="XXX",service_name="XXX",tls_version="v11"} 0
fastly_rt_tls_total{datacenter="YYZ",service_id="XXX",service_name="XXX",tls_version="v12"} 11686
fastly_rt_tls_total{datacenter="YYZ",service_id="XXX",service_name="XXX",tls_version="v13"} 0

We shouldn't be adding any total or sum subvalue that will impact totaling or summing the metric itself.
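
Until that changes, one workaround is to exclude the aggregate series at query time, for example:

sum(rate(fastly_rt_tls_total{tls_version!="any"}[1m])) by (service_name)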

Cannot pull new Docker image

Hello,

I'm receiving this message when trying to pull from the new place:

$ docker pull ghcr.io/fastly/fastly-exporter:latest
Error response from daemon: Head "https://ghcr.io/v2/fastly/fastly-exporter/manifests/latest": unauthorized

Thank you.

How can we get a Dockerfile for this app?

I would love to work with this exporter in a container environment like AWS or GCP. Have you stopped working on a Dockerfile for this project?

Thanks for this project so far!

iterating http_sd support towards generic discovery

I believe we can make http_sd work generically out of the box without relabel_configs.

By explicitly setting the __param_target label per target (and ideally using the host header and port from the request to set the target:port), we can get service discovery to work without any jiggery pokery.

I used https://github.com/pagarme/static-response-server to host the following

[
  {
    "targets": [
      "fastly-exporter:8080"
    ],
    "labels": {
      "__param_target": "0AizkuJPvMmqhulU7fXXXX"
    }
  },
  {
    "targets": [
      "fastly-exporter:8080"
    ],
    "labels": {
      "__param_target": "0KO5PPKDAMlzAQ22fsXXXX"
    }
  }
]

along with the following minimal http_sd_configs

  - job_name: 'fastly-exporter'

    http_sd_configs:
            - url: http://static-content:7070

and everything worked out of the box.

Idle services also trigger net/http "Client.Timeout exceeded while awaiting headers"

I just noticed that one of my idle developer services is regularly reporting timeouts

level=error component=monitors service_id=xxxxxxxxxxxxxxxxxx service_name=tmol.com err="Get https://rt.fastly.com/v1/channel/xxxxxxxxxxxxxxxxxx/ts/1541720127: net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
level=error component=monitors service_id=xxxxxxxxxxxxxxxxxx service_name=tmol.com err="Get https://rt.fastly.com/v1/channel/xxxxxxxxxxxxxxxxxx/ts/1541720219: net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
level=error component=monitors service_id=xxxxxxxxxxxxxxxxxx service_name=tmol.com err="Get https://rt.fastly.com/v1/channel/xxxxxxxxxxxxxxxxxx/ts/1541720441: net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
level=error component=monitors service_id=xxxxxxxxxxxxxxxxxx service_name=tmol.com err="Get https://rt.fastly.com/v1/channel/xxxxxxxxxxxxxxxxxx/ts/1541720828: net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
level=error component=monitors service_id=xxxxxxxxxxxxxxxxxx service_name=tmol.com err="Get https://rt.fastly.com/v1/channel/xxxxxxxxxxxxxxxxxx/ts/1541721003: net/http: request canceled (Client.Timeout exceeded while awaiting headers)"

Looking at the corresponding Fastly dashboard, there is a note saying "There is currently no new data. Graphs will resume when data is received."

Should we update the error messaging to note "This may also be an idle service", or is there a better way to differentiate no data vs. timeout/connectivity problems?

cardinality explosion with fastly_rt_datacenter_info

fastly_rt_datacenter_info looks to have an unintended cardinality explosion.

Instead of the expected ~100 pop time series, it's getting multiplied by service, so we're getting 10s of thousands of time series instead.

i.e.
fastly_rt_datacenter_info{datacenter="ACC", group="Africa", latitude="5.573", longitude="-0.203", name="Ghana", service="XXX" }

fastly_rt_service_info for group_left/group_right metadata

https://www.robustperception.io/exposing-the-software-version-to-prometheus
https://www.robustperception.io/how-to-have-labels-for-machine-roles

Technically, we could do all kinds of fun stuff like

  • fastly_rt_service_info
  • fastly_rt_datacenter_info
  • fastly_rt_domain_info
  • fastly_rt_customer_info

and stuff all sorts of other one-off tidbits in without binding them to the individual metrics.

We could even do service_name that way

something like

fastly_rt_service_info{service_id="$SID", service_name="$NAME", customer_id="$CID"} 1
fastly_rt_datacenter_info{datacenter_code="$CODE", datacenter_name="$NAME", datacenter_group="$GROUP", datacenter_shield="$SHIELD"} 1
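
Assuming a fastly_rt_service_info metric like the one sketched above existed, the metadata could then be joined onto any series at query time, e.g.:

sum(rate(fastly_rt_requests_total[1m])) by (service_id)
  * on (service_id) group_left (customer_id)
    fastly_rt_service_info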

Add discovery/scrape-time service selection

With Prometheus 2.28, there is now a generic http service discovery.

The exporter can now produce an API output that lists all of the available services so that they can be scraped independently. This improves the performance of ingestion by spreading it out over time and allowing Prometheus to ingest the data over multiple target threads.

On the Prometheus side, you would configure the job like this:

scrape_configs:
- job_name: fastly
  metrics_path: /fastly
  http_sd_configs:
  - url: http://fastly-exporter:8080/sd
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: fastly-exporter:8080

The /service-discovery endpoint would output json like this:

[
  { 
    "targets": [
      "<Service ID 1>",
      "<Service ID 2>",
      "<Service ID 3>",
      "<Service ID ...>"
    ]
  }
]

The relabel_config would then produce exporter URLs like /fastly?service=<Service ID 1>.

Question: Strategy for large number of services

Hi There,
Thanks again for your time developing this software, it's helped us out immensely thus far. We've been running an old version (version 0.x) for quite a while and it's been good to us. Right now we're running two fastly_exporter instances with approximately 150 services each on two VMs to share the load. I'm interested in the auto-discovery feature that you've implemented in the new versions of this exporter but I have concerns about how I can manage a large number of Fastly services with it.

In total, we have approximately 900 Fastly services deployed to one Fastly account. As one can imagine, if I were to even attempt to boot the fastly_exporter with autodiscovery enabled, it would only be a bad time. Up until this point, I'd been manually curating our list of 'important services' to monitor with the exporter by manually filtering out our Staging environments, etc.

I was wondering if there were any existing strategies out there for dealing with an excessive number of Fastly properties with the exporter, and how one might go about architecting the exporter and the prometheus ingestion to deal with these volumes.

A couple of key things come to mind:

  • Might be nice/necessary to distribute & co-ordinate chunks of services to different instances of the exporter
  • Perhaps a feature to be able to exclude/include services based on a regular expression command line flag? This, in combination with the already-existing autodiscovery feature could be a viable method of dynamically and predictably consuming a sub-section of work. (Lots of our services are convention-based names, and this would make it easy to filter out in bulk)

Support rt.fastly.com "demo" channel

 ./fastly-exporter-3.0.1-linux-amd64 -token xyz -service demo
level=info prometheus_addr=127.0.0.1:8080 path=/metrics namespace=fastly subsystem=rt
level=info component=api.fastly.com filtering_on="explicit service IDs" count=1
level=error component=api.fastly.com during="initial service refresh" err="error decoding API services response: json: cannot unmarshal object into Go value of type []api.Service"

There is a special channel ID demo which is used for the analytics on the fastly.com
(https://www.fastly.com) homepage.

Source: https://docs.fastly.com/api/analytics#channels

Useful for populating test data
