ribbybibby / s3_exporter

Exports Prometheus metrics about S3 buckets and objects

License: Apache License 2.0

Go 85.96% Makefile 11.51% Dockerfile 2.52%
prometheus s3-bucket metrics aws aws-s3 prometheus-exporter prometheus-metrics monitoring

s3_exporter's Introduction

AWS S3 Exporter

This exporter provides metrics for AWS S3 bucket objects by querying the API with a given bucket and prefix and constructing metrics based on the returned objects.

I find it useful for ensuring that backup jobs and batch uploads are functioning by comparing the growth in size/number of objects over time, or comparing the last modified date to an expected value.

Building

make

Running

./s3_exporter <flags>
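
For example, with an explicit listen address and a custom endpoint (both flags are optional and documented under Flags below; the endpoint URL is a placeholder):

./s3_exporter --web.listen-address=":9340" --s3.endpoint-url="http://s3.example.local"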

You can query a bucket and prefix combination by supplying them as parameters to /probe:

curl 'localhost:9340/probe?bucket=some-bucket&prefix=some-folder/some-file.txt'

AWS Credentials

The exporter creates an AWS session without any configuration. You must specify credentials yourself as documented here.

Remember, if you want to load credentials from ~/.aws/config then you need to set:

export AWS_SDK_LOAD_CONFIG=true
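
For example, one common way to supply credentials is through the standard AWS SDK environment variables (placeholder values):

export AWS_ACCESS_KEY_ID=<access-key-id>
export AWS_SECRET_ACCESS_KEY=<secret-access-key>
export AWS_REGION=<region>
./s3_exporter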

Docker

docker pull ribbybibby/s3-exporter

You will need to supply AWS credentials to the container, as mentioned in the previous section, either by setting the appropriate environment variables with -e, or by mounting your ~/.aws/ directory with -v.

# Environment variables
docker run -p 9340:9340 -e AWS_ACCESS_KEY_ID=<value> -e AWS_SECRET_ACCESS_KEY=<value> -e AWS_REGION=<value> s3-exporter:latest <flags>
# Mounted volume
docker run -p 9340:9340 -e AWS_SDK_LOAD_CONFIG=true -e HOME=/ -v $HOME/.aws:/.aws s3-exporter:latest <flags>

Flags

  -h, --help                     Show context-sensitive help (also try --help-long and --help-man).
      --web.listen-address=":9340"
                                 Address to listen on for web interface and telemetry.
      --web.metrics-path="/metrics"
                                 Path under which to expose metrics
      --web.probe-path="/probe"  Path under which to expose the probe endpoint
      --web.discovery-path="/discovery"
                                 Path under which to expose service discovery
      --s3.endpoint-url=""       Custom endpoint URL
      --s3.disable-ssl           Custom disable SSL
      --s3.force-path-style      Custom force path style
      --log.level="info"         Only log messages with the given severity or above. Valid levels: [debug, info, warn, error, fatal]
      --log.format="logger:stderr"
                                 Set the log target and format. Example: "logger:syslog?appname=bob&local=7" or "logger:stdout?json=true"
      --version                  Show application version.

Flags can also be set as environment variables, prefixed by S3_EXPORTER_. For example: S3_EXPORTER_S3_ENDPOINT_URL=http://s3.example.local.

Metrics

Metric                               Labels                      Meaning
s3_biggest_object_size_bytes         bucket, prefix              The size of the largest object.
s3_common_prefixes                   bucket, prefix, delimiter   A count of all the keys between the prefix and the next occurrence of the string specified by the delimiter.
s3_last_modified_object_date         bucket, prefix              The modification date of the most recently modified object.
s3_last_modified_object_size_bytes   bucket, prefix              The size of the object that was modified most recently.
s3_list_duration_seconds             bucket, prefix, delimiter   The duration of the ListObjects operation.
s3_list_success                      bucket, prefix, delimiter   Did the ListObjects operation complete successfully?
s3_objects_size_sum_bytes            bucket, prefix              The sum of the size of all the objects.
s3_objects                           bucket, prefix              The total number of objects.
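
For illustration, a probe against a single bucket and prefix might return series along these lines (label sets follow the table above; the values are made up):

s3_list_success{bucket="some-bucket",delimiter="",prefix="some-folder/"} 1
s3_objects{bucket="some-bucket",prefix="some-folder/"} 42
s3_objects_size_sum_bytes{bucket="some-bucket",prefix="some-folder/"} 1.073741824e+09
s3_biggest_object_size_bytes{bucket="some-bucket",prefix="some-folder/"} 5.24288e+07
s3_last_modified_object_date{bucket="some-bucket",prefix="some-folder/"} 1.6648e+09
s3_last_modified_object_size_bytes{bucket="some-bucket",prefix="some-folder/"} 1.048576e+06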

Common prefixes

Rather than generating metrics for the objects with a particular prefix, you can set the delimiter parameter to produce a count of all the keys between the prefix and the next occurrence of the given delimiter.

For instance:

$ curl 'localhost:9340/probe?bucket=registry-bucket&prefix=docker/registry/v2/blobs/sha256/&delimiter=/'
# HELP s3_common_prefixes A count of all the keys between the prefix and the next occurrence of the string specified by the delimiter
# TYPE s3_common_prefixes gauge
s3_common_prefixes{bucket="registry-bucket",delimiter="/",prefix="docker/registry/v2/blobs/sha256/"} 133
# HELP s3_list_duration_seconds The total duration of the list operation
# TYPE s3_list_duration_seconds gauge
s3_list_duration_seconds{bucket="registry-bucket",delimiter="/",prefix="docker/registry/v2/blobs/sha256/"} 0.921488535
# HELP s3_list_success If the ListObjects operation was a success
# TYPE s3_list_success gauge
s3_list_success{bucket="registry-bucket",delimiter="/",prefix="docker/registry/v2/blobs/sha256/"} 1

See this page for more information.

Prometheus

Configuration

You can pass the params to a single instance of the exporter using relabelling, like so:

scrape_configs:
  - job_name: "s3"
    metrics_path: /probe
    static_configs:
      - targets:
          - bucket=stuff;prefix=thing.txt;
          - bucket=other-stuff;prefix=another-thing.gif;
    relabel_configs:
      - source_labels: [__address__]
        regex: "^bucket=(.*);prefix=(.*);$"
        replacement: "${1}"
        target_label: "__param_bucket"
      - source_labels: [__address__]
        regex: "^bucket=(.*);prefix=(.*);$"
        replacement: "${2}"
        target_label: "__param_prefix"
      - target_label: __address__
        replacement: 127.0.0.1:9340 # S3 exporter.
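
With this relabelling in place, each static target above is translated into a probe request against the exporter of the form:

http://127.0.0.1:9340/probe?bucket=stuff&prefix=thing.txt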

Service Discovery

Rather than defining a static list of buckets you can use the /discovery endpoint in conjunction with HTTP service discovery to discover all the buckets the exporter has access to.

This should be all the config required to successfully scrape every bucket:

scrape_configs:
  - job_name: "s3"
    metrics_path: /probe
    http_sd_configs:
      - url: http://127.0.0.1:9340/discovery

Use relabel_configs to select the buckets you want to scrape:

scrape_configs:
  - job_name: "s3"
    metrics_path: /probe
    http_sd_configs:
      - url: http://127.0.0.1:9340/discovery
    relabel_configs:
      # Keep buckets that start with example-
      - source_labels: [__param_bucket]
        action: keep
        regex: ^example-.*

The prefix can be set too, but be mindful that this will apply to all buckets:

scrape_configs:
  - job_name: "s3"
    metrics_path: /probe
    http_sd_configs:
      - url: http://127.0.0.1:9340/discovery
    params:
      prefix: ["thing.txt"]

Example Queries

Return series where the last modified object date is more than 24 hours ago:

(time() - s3_last_modified_object_date) / 3600 > 24
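
As a rough sketch, the same expression could back a Prometheus alerting rule. The rule name, "for" duration, and annotation below are illustrative and not part of this exporter:

groups:
  - name: s3
    rules:
      - alert: S3ObjectTooOld
        expr: (time() - s3_last_modified_object_date) / 3600 > 24
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "No object written under {{ $labels.prefix }} in {{ $labels.bucket }} for more than 24 hours"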

s3_exporter's People

Contributors

allapospelova, kiuby, panos--, ribbybibby

s3_exporter's Issues

1000 requests a minute

We set up our Prometheus configuration so that it scrapes S3 every hour for metrics, but for some reason it started sending around 1000 requests a minute. Has anyone else run into this?

unknown long flag --s3.endpoint-url

We're deploying the s3_exporter on a managed Kubernetes cluster and can't set a custom endpoint URL:

s3_exporter: error: unknown long flag '--s3.endpoint-url region.domain.com', try --help

but it seems to exist:

endpointURL = app.Flag("s3.endpoint-url", "Custom endpoint URL").Default("").String()

Any idea?

Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: s3-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: s3-exporter
  template:
    metadata: 
      labels:
        app: s3-exporter
    spec:
      containers:
        - name: s3-exporter
          image: ribbybibby/s3-exporter:latest
          command: ["/s3_exporter"]
          args: ["--s3.endpoint-url", "region.domain.com"]
          envFrom:
          - configMapRef:
              name: s3-exporter-config

select metrics

It would be nice to be able to select which metrics you want. For example, for some buckets I just want the latest date for a file and don't need any of the other metrics.

Investigate delimiter option

It's possible to define a delimiter in an S3 list operation, which will only return the objects directly underneath the given prefix. It may make sense to support this option to help with performance when collecting metrics for large buckets.

certificate signed by unknown authority

I'm trying to connect to an internal S3 service and I configured the s3_endpoint_url, but now I'm getting the error "certificate signed by unknown authority". Has anyone been successful in running this with an internal S3 service?

Service discovery for buckets

Idea: a subcommand that runs a daemon that will collect bucket/prefix combinations dynamically and write them out to a file in a format for consumption by file_sd_configs.

The buckets could be discovered via patterns, e.g. --bucket=* or --bucket=my-org-*. The prefix could be taken from tags on the bucket?

How can I get the prefix size?

I have this config:

- job_name: 's3-backup'
  metrics_path: /probe
  static_configs:
    - targets:
        - bucket=c-admin;prefix=backups/conr/mysql/;
        - bucket=c-admin;prefix=backups/conr/dse/;
        - bucket=c-admin;prefix=backups/conr/opscenter;
        - bucket=c-admin;prefix=backups/conr/redis;
        - bucket=c-admin;prefix=backups/conr/redis-local;
        - bucket=c-admin;prefix=backups/conr/mysql-local;
        - bucket=c-admin;prefix=backups/conr/scripts-windows;
        - bucket=c-admin;prefix=backups/conr/webroot-iis;
        - bucket=c-admin;prefix=backups/conr/windows-server;
  relabel_configs:
    - source_labels: [__address__]
      regex: '^bucket=(.*);prefix=(.*);$'
      replacement: '${1}'
      target_label: '__param_bucket'
    - source_labels: [__address__]
      regex: '^bucket=(.*);prefix=(.*);$'
      replacement: '${2}'
      target_label: '__param_prefix'
    - target_label: __address__
      replacement: 127.0.0.1:9340

I want to get the size of a folder (prefix) instead of a single file. Is this config correct?

Add AWS_REGION as parameter

Looking at the code it doesn't seem like this is a thing already.

Would it be possible to add AWS_REGION as a URL parameter?

This would allow a single Prometheus job to scrape information about buckets in multiple regions from a single exporter. It seems that at the moment you need to have an s3_exporter instance per AWS region you have buckets of interest in.

Something like:
http://127.0.0.1:9340/probe?bucket=bucket-in-eu-west-1&prefix=bla&region=eu-west-1
http://127.0.0.1:9340/probe?bucket=bucket-in-eu-west-2&prefix=bla&region=eu-west-2

A job config target would look like this:

  - bucket=bucket-in-eu-west-1;prefix=bla;region=eu-west-1;
  - bucket=bucket-in-eu-west-2;prefix=bla;region=eu-west-2;

Really like this exporter!

custom metrics for s3-exporter

I want to know all the objects present in my S3 bucket, because I want to monitor all the file names through Grafana and Prometheus.

s3_last_modified_object_size_bytes metric is missing

Deployed image ribbybibby/s3-exporter:latest on a Kubernetes cluster and the s3_last_modified_object_size_bytes metric is missing.

I don't know if you are still maintaining this project, but here is a feature proposal:
For a dynamic filename check (e.g. backup_YYMMDD.csv), the Prometheus scraping configuration needs to be recreated and reloaded in order to get up-to-date metrics/alerts.

It would be interesting to add a feature to load a configuration file with templating functionality, e.g.:

BackupDate:
  bucket: BucketA
  prefix: /backup_{{YYMMDD}}.csv
  interval:

BackupRandom:
  bucket: BucketB
  prefix: /backup_*.csv
  interval:

This could load multiple bucket checks, with logic to limit API calls to S3 by setting the interval (Prometheus will scrape with its own interval, but in the config file you could set a lower interval to check for a file, e.g. 1h; not sure how long Prometheus holds metrics info, in my case it's 6h).

Btw, great work.. and useful.

Regards,
Dejan

Config file for multiple prefixes of single bucket

Hello Team,

I would like to scrape the metrics of multiple prefixes of a single bucket.
Let's say I have dev-bucket, which contains folder-1, folder-2 and folder-3.

I tried the following configuration, but no luck.

- job_name: 's3'
  metrics_path: /probe
  static_configs:
    - targets:
        - bucket=dev-bucket;prefix=folder-1;
        - bucket=dev-bucket;prefix=folder-2;
        - bucket=dev-bucket;prefix=folder-3;
  relabel_configs:
    - source_labels: [__address__]
      regex: '^bucket=(.*)&prefix=(.*);$'
      replacement: '${1}'
      target_label: '__param_bucket'
    - source_labels: [__address__]
      regex: '^bucket=(.*)&prefix=(.*);$'
      replacement: '${2}'
      target_label: '__param_prefix'
    - source_labels: [__address__]
      regex: '^bucket=(.*)&prefix=(.*);$'
      replacement: '${3}'
      target_label: '__param_prefix'
    - target_label: __address__
      replacement: 127.0.0.1:9347

I'm not surprised it returned metrics only for folder-3, but it returned only one metric, s3_list_success.
FYI, there is no problem on the exporter side, as I could see all the metrics at its endpoints.

Kindly help us with the Prometheus configuration file.

Impossible to disable certificate checks

Currently, when the certificate is valid but issued for another domain, using the exporter is impossible:

Sep 29 18:24:43 host.example.com s3_exporter[59303]: time="2022-09-29T18:24:43+07:00" level=error msg="RequestError: send request failed\ncaused by: Get \"https://192.168.100.98:443/\": x509: cannot validate certificate for 192.168.100.98 because it doesn't contain any IP SANs" source="s3_exporter.go:177"

Export storage class information

See #29 which introduced this idea. It would be useful to be able to reason about the metrics based on storage class.

My thinking is that we could add a storage_class label to metrics where it makes sense. This would allow you to do useful things like compare the number of objects that have been transitioned to glacier vs those in standard.

This would be a breaking change, as it would change the output of a number of queries when there is more than one storage class. I don't mind cutting a major version for this, but it might make sense to think about any other breaking changes I want to make that could be batched into the release too.
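
For illustration, with such a label the object count could be split along these lines (hypothetical output):

s3_objects{bucket="some-bucket",prefix="backups/",storage_class="STANDARD"} 1200
s3_objects{bucket="some-bucket",prefix="backups/",storage_class="GLACIER"} 300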

Tests

Create tests for this exporter.

s3_objects_size_sum_bytes only shows size of first 1000 objects.

Hi, we've been wanting to use your S3 exporter for monitoring purposes but face the issue that the metric s3_objects_size_sum_bytes only reflects the sum (storage-size-wise) of the first 1000 objects. So, while the S3 exporter shows that the bucket size equals 34 GiB, the actual size is 90 GiB, because there are more than 1000 objects.

Is there a way to fix this, or are we missing something?

Thanks and kind regards,
Jacob

verbose Logs

Hello ,

thank you for this project.

I'm just testing the exporter on k8s and the logs show:

`time="2021-04-22T20:58:16Z" level=info msg="Starting s3_exporter (version=0.4.0, branch=tags/v0.4.0, revision=4ecf1c121b7c4e4f1ee6c82aa0aee33ef028a081)" source="s3_exporter.go:190"

time="2021-04-22T20:58:16Z" level=info msg="Build context (go=go1.15.3, user=root@6f41cb83a3e1, date=20201018-15:45:51)" source="s3_exporter.go:191"

time="2021-04-22T20:58:16Z" level=info msg="Listening on :9340" source="s3_exporter.go:208"`

but I'm not getting any metrics from my S3 bucket.

Is there a way to activate some debug logs to find out what the issue is?

Many thanx

Crash on invalid resp object

Hello,
thanks for this very nice project, it's exactly what we needed at work for our use-case.

Just to report a bug: we have a Ceph S3 and we use the path-style model. On initial setup I forgot the option and this triggered the crash below (on today's master):

INFO[0000] Starting s3_exporter (version=, branch=, revision=)  source="s3_exporter.go:190"
INFO[0000] Build context (go=go1.15.3, user=, date=)     source="s3_exporter.go:191"
INFO[0000] Listening on :9340                            source="s3_exporter.go:208"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xb4b798]

goroutine 43 [running]:
main.(*Exporter).Collect(0xc0001f2f00, 0xc00012f320)
	/home/lblot/Devel/VP/s3_exporter/s3_exporter.go:110 +0x218
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1()
	/home/lblot/Devel/VP/s3_exporter/vendor/github.com/prometheus/client_golang/prometheus/registry.go:444 +0x1a2
created by github.com/prometheus/client_golang/prometheus.(*Registry).Gather
	/home/lblot/Devel/VP/s3_exporter/vendor/github.com/prometheus/client_golang/prometheus/registry.go:455 +0x5ce

You should handle it :)

With that option fixed, I'm just burning my PC on a 700GB GitLab artifact bucket :trollface:

Cannot export metrics where the prefix matches more than 1000 objects

Since we are using the ListObjectsV2 call, the S3 API will limit the response to 1000 objects. Adding to this, objects are listed in no guaranteed order, so once you reach 1000 objects the metrics are pretty much baloney.

To address this, I believe there are two options which start with using the ListObjectsV2Pages API call instead:

  • We can recurse through all the pages, which yields good metrics at the cost of API calls and compute time.
  • We can list the pages, and just read the last. This would potentially lead to inaccurate metrics, but have more or less the same compute and API call cost as the current implementation.

We might need this feature, so I'll likely submit a PR.
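
For reference, here is a minimal sketch of the first option using aws-sdk-go's ListObjectsV2Pages. This is not the exporter's actual code; the bucket, prefix, and variable names are illustrative:

package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	svc := s3.New(session.Must(session.NewSession()))

	var objects, sizeSum, biggest int64
	input := &s3.ListObjectsV2Input{
		Bucket: aws.String("some-bucket"),
		Prefix: aws.String("some-folder/"),
	}

	// Walk every page of results rather than stopping at the first 1000 keys.
	err := svc.ListObjectsV2Pages(input, func(page *s3.ListObjectsV2Output, lastPage bool) bool {
		for _, obj := range page.Contents {
			objects++
			sizeSum += *obj.Size
			if *obj.Size > biggest {
				biggest = *obj.Size
			}
		}
		return true // keep paginating
	})
	if err != nil {
		panic(err)
	}

	fmt.Printf("objects=%d size_sum=%d biggest=%d\n", objects, sizeSum, biggest)
}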

Getting a NoSuchBucket error, I am not able to find the option to give the S3 bucket name?

time="2021-04-25T13:47:39Z" level=info msg="Starting s3_exporter (version=0.4.0, branch=tags/v0.4.0, revision=4ecf1c121b7c4e4f1ee6c82aa0aee33ef028a081)" source="s3_exporter.go:190"
time="2021-04-25T13:47:39Z" level=info msg="Build context (go=go1.15.3, user=root@6f41cb83a3e1, date=20201018-15:45:51)" source="s3_exporter.go:191"
time="2021-04-25T13:47:39Z" level=info msg="Listening on :9340" source="s3_exporter.go:208"
time="2021-04-25T13:48:13Z" level=error msg="NoSuchBucket: The specified bucket does not exist\n\tstatus code: 404, request id: 3SDTXM7EGD1ATPPW, host id: HLqhGGWA0wUVBz/eMVruHiZhcQZmoE42C4QYLukqT4uqlIey03N01nZC9oJyQQ5/ujwbJQK1amc=" source="s3_exporter.go:92"
