

Prometheus Replica Operator

A Kubernetes Operator for Prometheus + Thanos, built on top of the Operator SDK.

This blog post shows how I built the Prometheus Replica Operator, the first complex Go program I’ve written. The Go code itself may not be the best example of software engineering, but I wanted to try out my team’s Operator SDK: as the Product Manager for the Operator SDK, I want first-hand knowledge of our tool. As an early employee of CoreOS, I know a lot about Operators, and I want to pass on some best practices through the post.

What the PRO does

The Prometheus Replica Operator (PRO) installs and configures a full monitoring stack on a Kubernetes cluster, using Prometheus to ingest, store, and query time series data. Thanos automatically sends archival data to a cloud storage bucket.

[Figure: Prometheus and Thanos architecture.]
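As a rough picture of that architecture, here is a minimal Go sketch that enumerates the objects the PRO creates for each PrometheusReplica, in the order they appear in the example log later in this README. The `Component` type and `desiredComponents` function are hypothetical illustrations, not the operator's actual code.

```go
package main

import "fmt"

// Component describes one Kubernetes object the operator manages for a
// PrometheusReplica. Hypothetical type, for illustration only.
type Component struct {
	Kind string // Kubernetes kind, e.g. "StatefulSet"
	Role string // which part of the monitoring stack it serves
}

// desiredComponents returns the objects in the order the example log
// reports creating them. Hypothetical helper, not the PRO's real code.
func desiredComponents() []Component {
	return []Component{
		{Kind: "StatefulSet", Role: "Prometheus"},
		{Kind: "Service", Role: "Prometheus"},
		{Kind: "Service", Role: "Thanos peers"},
		{Kind: "StatefulSet", Role: "Thanos store"},
		{Kind: "Service", Role: "Thanos store"},
		{Kind: "Deployment", Role: "Thanos query"},
		{Kind: "Service", Role: "Thanos query"},
	}
}

func main() {
	// Print each object the operator would create, one per line.
	for _, c := range desiredComponents() {
		fmt.Printf("%s for %s\n", c.Kind, c.Role)
	}
}
```

Keeping the desired set as plain data like this makes the "desired vs actual" comparison in the log a simple loop over a list.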

Example

The Operator watches for PrometheusReplica objects, such as this one:

apiVersion: "prometheus.robszumski.com/v1alpha1"
kind: "PrometheusReplica"
metadata:
  name: "example"
spec:
  configMap: prometheus-config
  highlyAvailable: true
  baseDomain: "ingress.example.com"
  metrics:
    retention: 24h
    blockDuration: 1h
  bucketSecret: s3-bucket

It then configures the entire monitoring stack:

INFO[0000] Go Version: go1.10.2
INFO[0000] Go OS/Arch: darwin/amd64
INFO[0000] operator-sdk Version: 0.0.5+git
INFO[0000] Watching prometheus.robszumski.com/v1alpha1, PrometheusReplica, default, 5
INFO[0000] starting prometheusreplicas controller
...detected object...
INFO[0000] Parsing PrometheusReplica example in default
INFO[0000] Updating PrometheusReplica status for example
INFO[0000] Status of PrometheusReplica example is now Install
INFO[0000] Creating Prometheus StatefulSet for example
INFO[0000]   StatefulSet: Translating HighlyAvailable to 2 replicas
INFO[0000]   StatefulSet: Setting overall metrics retention to 24h
INFO[0000]   StatefulSet: Setting duration until upload to storage bucket to 1h
INFO[0000]   StatefulSet: Using Prometheus config from ConfigMap prometheus-config
INFO[0000]   StatefulSet: Using bucket parameters from Secret s3-bucket
INFO[0001] Creating Prometheus service for example
INFO[0001] Creating Thanos peers service for example
INFO[0001] Creating Thanos store StatefulSet for example
INFO[0001]   StatefulSet: Using bucket parameters from Secret s3-bucket
INFO[0001] Creating Thanos store service for example
INFO[0001] Creating Thanos query Deployment for example
INFO[0001]   Deployment: Using bucket parameters from Secret s3-bucket
INFO[0001]   Deployment: Translating HighlyAvailable to 2 replicas
INFO[0001] Creating Thanos query service for example
INFO[0001] Checking desired vs actual state for components of PrometheusReplica example
INFO[0002] Creating Prometheus StatefulSet for example
INFO[0002]   StatefulSet: Translating HighlyAvailable to 2 replicas
INFO[0002]   StatefulSet: Setting overall metrics retention to 24h
INFO[0002]   StatefulSet: Setting duration until upload to storage bucket to 1h
INFO[0002]   StatefulSet: Using Prometheus config from ConfigMap prometheus-config
INFO[0002]   StatefulSet: Using bucket parameters from Secret s3-bucket
INFO[0002]   Checking StatefulSet for Prometheus
INFO[0004] Parsing PrometheusReplica example in default
INFO[0004] Updating PrometheusReplica status for example
INFO[0004] Status of PrometheusReplica example is now Creating
INFO[0005] Checking desired vs actual state for components of PrometheusReplica example
INFO[0005]   Checking StatefulSet for Prometheus
INFO[0005]   Checking Deployment for Thanos query
...create is now done...
INFO[0005] Parsing PrometheusReplica example in default
INFO[0005] Updating PrometheusReplica status for example
INFO[0005] Status of PrometheusReplica example is now Running
INFO[0006] Checking desired vs actual state for components of PrometheusReplica example
INFO[0006]   Checking StatefulSet for Prometheus
INFO[0006]   Checking Deployment for Thanos query
...looping...
INFO[0006] Parsing PrometheusReplica example in default
INFO[0006] Updating PrometheusReplica status for example
INFO[0007] Checking desired vs actual state for components of PrometheusReplica example
INFO[0007]   Checking StatefulSet for Prometheus
INFO[0008]   Checking Deployment for Thanos query
...loop forever...
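The status in the log moves from Install to Creating to Running. A minimal sketch of that progression, assuming each reconcile pass can summarize whether all objects exist and whether they are ready, might look like this. The `observed` type and `nextPhase` function are hypothetical, and the sketch covers only the transitions visible in the log, not how the PRO repairs drift during the endless checking loop.

```go
package main

import "fmt"

// Phase holds the status values that appear in the example log.
type Phase string

const (
	PhaseInstall  Phase = "Install"
	PhaseCreating Phase = "Creating"
	PhaseRunning  Phase = "Running"
)

// observed is a hypothetical summary of one reconcile pass.
type observed struct {
	created bool // every desired object exists
	ready   bool // the StatefulSets and Deployment report their replicas ready
}

// nextPhase picks the status to record after a reconcile pass.
func nextPhase(obs observed) Phase {
	switch {
	case !obs.created:
		return PhaseInstall
	case !obs.ready:
		return PhaseCreating
	default:
		return PhaseRunning
	}
}

func main() {
	// Simulate the passes seen in the log: nothing yet, objects created,
	// then ready, then the steady-state loop.
	passes := []observed{{false, false}, {true, false}, {true, true}, {true, true}}
	for i, obs := range passes {
		fmt.Printf("pass %d: %s\n", i+1, nextPhase(obs))
	}
}
```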

Install

First, install the CustomResourceDefinition (CRD):

$ kubectl create -f https://raw.githubusercontent.com/robszumski/prometheus-replica-operator/master/deploy/crd.yaml

Then run the Operator:

$ kubectl create -f https://raw.githubusercontent.com/robszumski/prometheus-replica-operator/master/deploy/operator.yaml

Last, create the PrometheusReplica object:

$ kubectl create -f https://raw.githubusercontent.com/robszumski/prometheus-replica-operator/master/deploy/cr.yaml

If everything worked, you should see the Pods, Services, Deployments, and StatefulSets that the Operator created. The PrometheusReplica's status should also contain this information:

$ kubectl -n default get prometheusreplica/example -o yaml
