grafana / cortex-tools

If you're using this tool with Grafana Mimir, please switch to "mimirtool" instead: https://github.com/grafana/mimir

License: Apache License 2.0

Makefile 1.51% Go 97.67% Dockerfile 0.67% Shell 0.16%

cortex-tools's Issues

Automate the release process

It involves a number of manual steps, they can be seen here: #51

Delete series doesn't delete the metric+label rows

Queries that have a matcher would still work, as they don't use the GetReadQueriesForMetricName function.

go.mod and repo name discrepancy

The go.mod file specifies cortextool as the name, while the repo is actually named cortex-tools.

While this is not a big problem, it is confusing when naively trying to import code from this repo:

go: github.com/sh0rez/gctl/pkg/spec imports
	github.com/grafana/cortex-tools/pkg/client: github.com/grafana/[email protected]: parsing go.mod:
	module declares its path as: github.com/grafana/cortextool
	        but was required as: github.com/grafana/cortex-tools

To not actually break go imports, the "clean" solution would probably be to rename this repo to grafana/cortextool

Remove trailing slash (/) from Cortex Address

Using an address with a trailing slash can cause unexpected behaviour.

e.g.

$ CORTEX_ADDRESS=https://prometheus-us-central1.grafana.net/ cortextool rules load test.yml
ERRO[0000] unable to load rule group                     error="requested resource not found" group=up_job namespace=example_namespace
cortextool: error: load operation unsuccessful, try --help

In some cases the trailing slash causes no error, e.g. when using the diff command:

$ CORTEX_ADDRESS=https://prometheus-us-central1.grafana.net/ cortextool rules diff --rule-files=test.yml
Changes are indicated with the following symbols:
  + updated

The following changes will be made if the provided rule set is synced:
~ Namespace: example_namespace
  ~ Group: up_job

Diff Summary: 0 Groups Created, 1 Groups Updated, 0 Groups Deleted

I think this is because the diff command does not use any endpoints where the trailing slash matters (e.g. subroutes on the API).
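
A minimal sketch of the normalization the title asks for, assuming the address is joined with API paths by plain string concatenation. The helper name normalizeAddress is hypothetical and is not taken from the cortex-tools code base.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeAddress is a hypothetical helper: it strips trailing slashes so
// that joining the address with an API path never produces a double slash.
func normalizeAddress(addr string) string {
	return strings.TrimRight(addr, "/")
}

func main() {
	base := normalizeAddress("https://prometheus-us-central1.grafana.net/")
	// prints https://prometheus-us-central1.grafana.net/api/prom/rules
	fmt.Println(base + "/api/prom/rules")
}
```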

upload tsdb block to cortex

It would be nice if cortex-tools gained a block upload feature that posts a block to block storage, e.g. after a Cortex downtime.

YAML format of `rules get` should match `rules sync`

It would be helpful if the output format of cortextool rules get <namespace> <group> matched the exact format expected by cortextool rules sync. This would make it easier to pull down all rules, commit them to Git, and sync them back again. Right now I have to munge the YAML slightly to make it compatible with sync.

Right now the format is this:

$ cortextool rules get somenamespace anygroup
name: anygroup
rules:
    - alert: FrontEnd Prometheus
      expr: .......

Ideally it would be this:

namespace: somenamespace
groups:
    - name: anygroup
      rules:
        - alert: FrontEnd Prometheus
          expr: ........

Allow for configurable metric name with loadgen tool

Split CORTEX_ADDRESS between AM / Ruler

You can have your Ruler and Alertmanager at separate URLs. As a result, it becomes tedious to change the address between commands. We should make it more explicit that these are two separate components.

Allow send TLS client certificate to Cortex API

👋 hi!

I've been running Cortex on k8s for a while and I've protected the API with client TLS authentication with the help of the ingress-nginx controller.

Right now I want to use cortex-tools to lint and load rules in an automated fashion from a CI pipeline. Thus I would like to authenticate the http client with a TLS client certificate.

I saw you're using the Go http client, so it shouldn't be hard to add TLS certs to the CortexClient struct:

cortex-tools/pkg/client/client.go

Line 31 in 94327d5

client http.Client

It would be something like https://gist.github.com/michaljemala/d6f4e01c4834bf47a9c4

The cli flags would look like:

cortextool rules load my-rule.yml --address=ADDRESS --id=ID --cacert ca.pem --key client.key --cert client.pem

and also adding environment variables:

CORTEX_TLS_CA_CERT
CORTEX_TLS_CLIENT_KEY
CORTEX_TLS_CLIENT_CERT

I'm wondering if you consider tls client auth useful for the project. In that case I'm willing to send a PR.

Thanks!
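
A rough sketch of what the request above could look like, using only the Go standard library. The constructor name newTLSClient is hypothetical, the environment variable names are the ones proposed in this issue, and empty paths are assumed to mean "not configured". It is not the implementation that was eventually merged.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"os"
)

// newTLSClient is a hypothetical constructor. An empty path means the
// corresponding option was not set.
func newTLSClient(caPath, certPath, keyPath string) (*http.Client, error) {
	cfg := &tls.Config{}

	// Optional CA bundle used to verify the server certificate.
	if caPath != "" {
		pem, err := os.ReadFile(caPath)
		if err != nil {
			return nil, fmt.Errorf("reading CA certificate: %w", err)
		}
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(pem) {
			return nil, fmt.Errorf("no certificates found in %s", caPath)
		}
		cfg.RootCAs = pool
	}

	// Optional client certificate and key presented to the server.
	if certPath != "" || keyPath != "" {
		cert, err := tls.LoadX509KeyPair(certPath, keyPath)
		if err != nil {
			return nil, fmt.Errorf("loading client key pair: %w", err)
		}
		cfg.Certificates = []tls.Certificate{cert}
	}

	return &http.Client{Transport: &http.Transport{TLSClientConfig: cfg}}, nil
}

func main() {
	client, err := newTLSClient(
		os.Getenv("CORTEX_TLS_CA_CERT"),
		os.Getenv("CORTEX_TLS_CLIENT_CERT"),
		os.Getenv("CORTEX_TLS_CLIENT_KEY"),
	)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("client ready: %T\n", client)
}
```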

rules list command returns generic 404 when no rules are configured

Running cortextool rules list when no rules are configured returns the following error message

$ cortextool rules list
time="2019-12-17T12:49:29-05:00" level=fatal msg="unable to read rules from cortex, requested resource not found"

This initially led me to believe there was an issue with my CORTEX_ADDRESS value. Should it return an empty list / fail silently instead?
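
One possible shape for the suggested behaviour, sketched as a pure function. It assumes that a 404 from the rules endpoint means "no rules configured", which cannot be told apart from a wrong path by the status code alone. The name rulesFromResponse is hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
)

// rulesFromResponse is a hypothetical helper that interprets the status of
// a rules listing response. It returns the body on success, nil on 404
// (assumed to mean an empty rule set), and an error otherwise.
func rulesFromResponse(status int, body []byte) ([]byte, error) {
	switch status {
	case http.StatusOK:
		return body, nil
	case http.StatusNotFound:
		// Assumption: the endpoint exists but the tenant has no rules yet.
		return nil, nil
	default:
		return nil, fmt.Errorf("unexpected status %d from the rules endpoint", status)
	}
}

func main() {
	body, err := rulesFromResponse(http.StatusNotFound, nil)
	fmt.Printf("rules: %d bytes, err: %v\n", len(body), err)
}
```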

Allow cortextool rules diff to accept an allowlist of namespaces rather than only a denylist

usage: cortextool rules diff --address=ADDRESS --id=ID [<flags>]

diff a set of rules to a designated cortex endpoint

Flags:
  --help                      Show context-sensitive help (also try --help-long and --help-man).
  --log.level="info"          set level of the logger
  --push-gateway.endpoint=PUSH-GATEWAY.ENDPOINT
                              url for the push-gateway to register metrics
  --push-gateway.job=PUSH-GATEWAY.JOB
                              job name to register metrics
  --push-gateway.interval=1m  interval to forward metrics to the push gateway
  --key=""                    Api key to use when contacting cortex, alternatively set $CORTEX_API_KEY.
  --backend=cortex            Backend type to interact with: <cortex|loki>
  --address=ADDRESS           Address of the cortex cluster, alternatively set CORTEX_ADDRESS.
  --id=ID                     Cortex tenant id, alternatively set CORTEX_TENANT_ID.
  --tls-ca-path=""            TLS CA certificate to verify cortex API as part of mTLS, alternatively set CORTEX_TLS_CA_PATH.
  --tls-cert-path=""          TLS client certificate to authenticate with cortex API as part of mTLS, alternatively set CORTEX_TLS_CERT_PATH.
  --tls-key-path=""           TLS client certificate private key to authenticate with cortex API as part of mTLS, alternatively set CORTEX_TLS_KEY_PATH.
  --ignored-namespaces=IGNORED-NAMESPACES
                              comma-separated list of namespaces to ignore during a diff.
  --rule-files=RULE-FILES     The rule files to check. Flag can be reused to load multiple files.
  --rule-dirs=RULE-DIRS       Comma separated list of paths to directories containing rules yaml files. Each file in a directory with a .yml or .yaml
                              suffix will be parsed.
  --disable-color             disable colored output

Currently, as you can see above, the diff command only accepts a list of namespaces to ignore, which makes it very hard to diff one particular namespace.

I'd like to suggest adding an allowlist of namespaces as well, and making the two flags mutually exclusive.
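
The proposal could be sketched roughly as follows. The function selectNamespaces and its parameters are hypothetical; only the ignore list corresponds to an existing flag (--ignored-namespaces).

```go
package main

import (
	"errors"
	"fmt"
)

// toSet turns a slice into a lookup table.
func toSet(items []string) map[string]bool {
	set := make(map[string]bool, len(items))
	for _, item := range items {
		set[item] = true
	}
	return set
}

// selectNamespaces is a hypothetical filter. When allow is non-empty, only
// those namespaces are kept; when ignore is non-empty, those are dropped.
// Supplying both is rejected, mirroring the "exclusive flags" suggestion.
func selectNamespaces(all, allow, ignore []string) ([]string, error) {
	if len(allow) > 0 && len(ignore) > 0 {
		return nil, errors.New("allowlist and ignore list are mutually exclusive")
	}
	allowed, ignored := toSet(allow), toSet(ignore)

	var out []string
	for _, ns := range all {
		if len(allow) > 0 && !allowed[ns] {
			continue
		}
		if ignored[ns] {
			continue
		}
		out = append(out, ns)
	}
	return out, nil
}

func main() {
	got, err := selectNamespaces([]string{"a", "b", "c"}, []string{"b"}, nil)
	fmt.Println(got, err) // prints [b] <nil>
}
```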

SIGSEGV in cortextool version command

When running cortextool version on a host that can't reach github, the program crashes.

± .cortextool version    
version 0.3.2
checking latest version... panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1e491ce]

goroutine 1 [running]:
github.com/grafana/cortex-tools/pkg/version.getLatestFromGitHub(0xc0013c2d00, 0x1a)
	/build/source/pkg/version/version.go:40 +0x10e
github.com/grafana/cortex-tools/pkg/version.CheckLatest()
	/build/source/pkg/version/version.go:21 +0x49
main.main.func1(0xc00152c510, 0x40c5a3, 0x20d0500)
	/build/source/cmd/cortextool/main.go:33 +0x98
gopkg.in/alecthomas/kingpin%2ev2.(*actionMixin).applyActions(0xc0002cbd58, 0xc00152c510, 0x0, 0x0)
	/build/go/pkg/mod/gopkg.in/alecthomas/[email protected]/actions.go:28 +0x6d
gopkg.in/alecthomas/kingpin%2ev2.(*Application).applyActions(0xc00120c0f0, 0xc00152c510, 0x0, 0x0)
	/build/go/pkg/mod/gopkg.in/alecthomas/[email protected]/app.go:557 +0xdc
gopkg.in/alecthomas/kingpin%2ev2.(*Application).execute(0xc00120c0f0, 0xc00152c510, 0xc00103afe0, 0x1, 0x1, 0x0, 0x0, 0x0, 0xc001747f08)
	/build/go/pkg/mod/gopkg.in/alecthomas/[email protected]/app.go:390 +0x8f
gopkg.in/alecthomas/kingpin%2ev2.(*Application).Parse(0xc00120c0f0, 0xc00000e090, 0x1, 0x1, 0x1, 0xc000b8ac38, 0x0, 0x1)
	/build/go/pkg/mod/gopkg.in/alecthomas/[email protected]/app.go:222 +0x1fe
main.main()
	/build/source/cmd/cortextool/main.go:38 +0x1bf
[1]    142838 exit 2     cortextool version

Dereference at issue: https://github.com/grafana/cortex-tools/blob/v0.3.2/pkg/version/version.go#L40
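
A sketch of the usual guard against this kind of crash, assuming the panic comes from using a response value after the request itself failed. It uses plain net/http rather than whatever client the version package uses, and the names latestReleaseStatus, failingTransport and offlineClient are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// failingTransport simulates a host that cannot reach the release server.
type failingTransport struct{}

func (failingTransport) RoundTrip(*http.Request) (*http.Response, error) {
	return nil, errors.New("network unreachable")
}

// offlineClient returns a client whose requests always fail.
func offlineClient() *http.Client {
	return &http.Client{Transport: failingTransport{}}
}

// latestReleaseStatus is a hypothetical stand-in for the version check. The
// error is inspected before the response is touched, so a failed request
// becomes an error value instead of a nil pointer dereference.
func latestReleaseStatus(client *http.Client, url string) (int, error) {
	resp, err := client.Get(url)
	if err != nil {
		// resp is nil here; reading resp.StatusCode would panic.
		return 0, fmt.Errorf("checking latest version: %w", err)
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	_, err := latestReleaseStatus(offlineClient(), "https://example.invalid/releases/latest")
	fmt.Println("error reported instead of a panic:", err != nil)
}
```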

Unify the docker images

I don't think we need three (and maybe more?) images. We could pack all the binaries into a single image, which would make the release process simpler.

Add Docker Image building to the CI

With #47 we introduced a breaking change that wouldn't allow us to build docker images - it'll be good to have an image building process as part of the pipeline to catch these a bit earlier.

Rule groups are getting double escaped

#131 shows an example where GetRuleGroup escapes the space in the namespace and buildRequest then re-escapes the already escaped value, so the test fails with %2520 instead of %20.

Include chunktool and logtool as part of goreleaser

At the moment we only have cortextool, but would be good to include the other binaries as well.

HTTP requests are not URL encoding parameters

I tried to delete a rule group that includes a % in the name. I got an error with the following reasoning:

invalid URL escape "% n"

Command line inputs need to be sanitized before being used in a URL.
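
A small sketch of escaping each path segment exactly once with the standard library. The helper ruleGroupPath is hypothetical; the /api/prom/rules prefix is the one visible in the debug logs elsewhere on this page.

```go
package main

import (
	"fmt"
	"net/url"
)

// ruleGroupPath is a hypothetical helper that builds the request path and
// escapes the namespace and group once, at the point where the path is
// assembled, so no later step needs to escape them again.
func ruleGroupPath(namespace, group string) string {
	return "/api/prom/rules/" + url.PathEscape(namespace) + "/" + url.PathEscape(group)
}

func main() {
	// prints /api/prom/rules/my%20namespace/100%25%20n
	fmt.Println(ruleGroupPath("my namespace", "100% n"))
}
```

Escaping in a single place also avoids the double escaping case, where an already escaped %20 is escaped again into %2520.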

Unify use of YAML libraries

At the moment, we're using both go-yaml.v2 and go-yaml.v3, it'll be good to unify usage of both and avoid any potential pitfalls for having two versions.

No version command

We don't have a version command.

Log address, tenant and namespace when using rules commands

It would be useful to know which server and tenant we are syncing or diffing rules against, given that these can be configured through environment variables as well as command-line arguments.

Richer diff when running `cortextool diff|sync`

Right now, when you use the diff command it will only tell you which groups are going to be changed, but does not tell you which individual rules/alerts are changing.

It would be good to have an idea of what exactly is changing when you run this or the sync command.

cortex-tools client results in huge binary

The cortex-tools project includes a client for the Cortex ruler. This, in itself, seems pretty simple. However, including the client in a simple Go app causes that app to grow from 12 MB to 72 MB, pulling in Cortex, the AWS CLI, and lots of other things.

Is it possible to simplify the client such that it doesn't increase binary size so much?

New mixed-unit durations from prometheus/common v0.11.0 cannot be deserialized

Recently, github.com/prometheus/[email protected] introduced the ability to define durations using mixed units, e.g. 1h30m. This change is not backwards compatible, which creates issues for this project.

This change has been in the cortexproject/cortex master branch for a while, which some extremely notable users (Grafana Cloud!) are using. These changes are also in this project's master branch, but are unreleased. This means that there is no released version of cortextool that can properly interact with Prometheus rules stored in these Cortex instances.

All this requires is a new release of this project! Please cut a new release!

alertmanager get command returns generic 404 when no custom configuration has been provided

Running cortextool alertmanager get when no custom configuration has been provided returns the following error message

$ cortextool alertmanager get
cortextool.exe: error: requested resource not found, try --help

This initially led me to believe there was an issue with my CORTEX_ADDRESS value. Should it return an empty list / fail silently instead?

alertmanager and rules commands use different environment variables for tenant id

The alertmanager subcommands use CORTEX_TENANT_ID as the id parameter while the rules subcommands use CORTEX_TENTANT_ID.

Improve changelogs

Right now we keep a central changelog for all the binaries. Consider giving each binary its own separate changelog.

Invalid YAML loading a rule with a multiline field

How to reproduce?

port-forward ruler Pod

$ kubectl -n cortex port-forward deploy/ruler 8080:80

create a rule group YAML file with a multiline expr field that starts with an empty line

$ cat >> test.yml << EOF
groups:
  - name: rule-group-name
    rules:
    - alert: alert-name
      expr: |

        up{jop="my-awesome-job"} == 0
EOF

load rule

$ cortextool rules load --address "http://localhost:8080" --id=0 --log.level="debug" test.yml

cortextool debug output:

INFO[0000] log level set to debug
DEBU[0000] path built to request rule group              url=/api/prom/rules/test/rule-group-name
DEBU[0000] sending request to cortex api                 method=GET url="http://localhost:8080/api/prom/rules/test/rule-group-name"
DEBU[0000] checking response                             status="404 Not Found"
DEBU[0000] resource not found                            fields.msg="request failed with response body group does not exist\n" status="404 Not Found"
DEBU[0000] sending request to cortex api                 method=POST url="http://localhost:8080/api/prom/rules/test"
DEBU[0000] checking response                             status="400 Bad Request"
ERRO[0000] requests failed                               fields.msg="request failed with response body unable to decoded rule group\n" status="400 Bad Request"
ERRO[0000] unable to load rule group                     error="failed request to the cortex api" group=rule-group-name namespace=test
cortextool: error: load operation unsuccessful, try --help

additional data

cortex ruler version 1.4.0
cortex-tools compiled with current master HEAD 432ad77
http request dump

POST /api/prom/rules/test HTTP/1.1
Host: localhost:8080
User-Agent: Go-http-client/1.1
Content-Length: 107
X-Scope-Orgid: 0
Accept-Encoding: gzip

name: rule-group-name
rules:
    - alert: alert-name
      expr: |4

        up{jop="my-awesome-job"} == 0

As you can see, the body content is not valid YAML.

expected behaviour

cortex-tools should ensure it is sending valid YAML before the request reaches the Cortex API.

I'm willing to help to fix it with some help.

Thanks!

Add tracing to the loadgen command

In cases where the metrics reported by the cortextool and Cortex diverge, tracing would be useful.

How to list all user rule groups

I am a Cortex administrator. At the moment, it is unclear which tenants create or delete rule groups.
Is there an API that can export the rule groups of all users?

I want to record all rule groups in a MySQL database and sync them to Cortex periodically.

What is the difference between the `load` and `sync` commands?

At the moment, it is unclear what the exact difference between the two commands is. From a quick peek at the code, it seems like load means "only upload if the rule group does not exist under that namespace" while sync means "replace everything, but tell me about it".

It'd be good to make clear how we support each of the following use cases:

Create or update a namespace/group/rule only if it doesn't exist
Create or update a namespace/group/rule (regardless of its status)
Find the difference (at each of namespace/group/rules) between the input file and what already exists on the server. Then delete or create whatever is missing.
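
To make the third use case concrete, here is a rough sketch of computing such a difference. It is not taken from the cortextool source: the plan type, the planSync function, and the representation of rule groups as a map from "namespace/group" to serialized content are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// plan lists the rule groups that would be created, updated, or deleted.
type plan struct {
	Create, Update, Delete []string
}

// planSync is a hypothetical comparison. Keys are "namespace/group" and
// values are the serialized group contents.
func planSync(local, remote map[string]string) plan {
	var p plan
	for key, want := range local {
		have, exists := remote[key]
		switch {
		case !exists:
			p.Create = append(p.Create, key)
		case have != want:
			p.Update = append(p.Update, key)
		}
	}
	for key := range remote {
		if _, exists := local[key]; !exists {
			p.Delete = append(p.Delete, key)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(p.Create)
	sort.Strings(p.Update)
	sort.Strings(p.Delete)
	return p
}

func main() {
	local := map[string]string{"ns/a": "v2", "ns/b": "v1"}
	remote := map[string]string{"ns/a": "v1", "ns/c": "v1"}
	fmt.Printf("%+v\n", planSync(local, remote))
}
```

Under this model, the first two use cases differ only in whether the Update list is applied or skipped.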

Implement a basic "linter" for rules files

Often when preparing rules ($ rules prepare) you would like to have a clear idea of what changes in the diff. Because there is no uniform way of formatting rules YAML (e.g. a Prometheus rules linter), a side effect of marshalling and unmarshalling rules files is that your expressions and the file itself end up being reformatted by either the PromQL parser or the Go YAML library.

This makes it difficult to have a consistent diff.

Given there's a promfmt for rules in the works, let's do something simple as an intermediate step: take the file(s), unmarshal them into our struct, marshal them again, and format the PromQL expressions in the rules file. With this, our users can "lint" their files before running them through the prepare command and get a more consistent diff of what changed.

Mark differences between OSS and Grafana Cloud specific

Confusing parse error messages when using loki backend

Given bug.yaml:

namespace: bug
groups:
  - name: bug
    rules:
      - alert: AlwaysFire
        expr: vector(1)

cortextool rules lint --backend=loki bug.yaml gives:

ERRO[0000] unable parse rules file                       error="could not parse expression: parse error at line 1, col 1: syntax error: unexpected IDENTIFIER" file=bug.yaml
cortextool: error: prepare operation unsuccessful, unable to parse rules files: file read error, try --help

There's nothing obviously wrong at line 1, col 1. I have an invalid logQL expression and cortextool should tell me that directly.

cortextool 0.3.2

Does cortextool support JSON format?

I want to use the jsonnet framework to template alerts. Does cortextool work with the JSON format?

This is for alert creation as well as configuration.

cortextool not creating or updating existing rules

It seems that cortextool sync/cortextool load isn't working in my environment. We're running Cortex v1.5.0 and CortexTool v0.5.0.

I've created a new rulegroup in a new NameSpace using the below config:

namespace: collector-rules
groups:
    - name: collector-status
      rules:
        - record: ""
          alert: PrometheuServerIsDown
          expr: absent(up)
          for: 10m
          labels:
            severity: critical
          annotations:
            assignment_group: Site Reliability Engineering
            company: REDACTED
            description: Cortex has not received any metrics from the n4monitoring tenant for 10 minutes
            impact: "1"
            suggested_actions: Check if Prometheus is running in the REDACTED namespace
            summary: 'Cortex has not received metrics for REDACTED for 10minutes'
            urgency: "1"

The namespace collector-rules does not exist. cortextool rules load throws an error:

ryan@WINDOWS-H8Q4C40:~/rw170/Documents/cortex-config$ cortextool rules load n4monitoring/rulegroups/collector.yml
ERRO[0000] unable to load rule group                     error="requested resource not found" group=collector-status namespace=collector-rules
cortextool: error: load operation unsuccessful, try --help

cortextool rules sync --rule-dirs=<dir> also throws an error:

ryan@WINDOWS-H8Q4C40:~/rw170/Documents/cortex-config$ cortextool rules sync --rule-dirs=n4monitoring/rulegroups
INFO[0000] creating group                                group=collector-status namespace=collector-rules
cortextool: error: sync operation unsuccessful, unable to complete executing changes.: requested resource not found, try --help

I've got a feeling it's potentially related to our ingress rules, but I can't spot anything. Here is the ingress:

spec:
  rules:
  - host: REDACTED
    http:
      paths:
      - backend:
          serviceName: alertmanager
          servicePort: 80
        path: /multitenant_alertmanager/status
      - backend:
          serviceName: alertmanager
          servicePort: 80
        path: /alertmanager
      - backend:
          serviceName: alertmanager
          servicePort: 80
        path: /api/v1/alerts
      - backend:
          serviceName: ruler
          servicePort: 80
        path: /ruler/ring
      - backend:
          serviceName: ruler
          servicePort: 80
        path: /api/v1/rules
      - backend:
          serviceName: ruler
          servicePort: 80
        path: /api/prom/api/v1/alerts
      - backend:
          serviceName: ruler
          servicePort: 80
        path: /api/prom/rules
      - backend:
          serviceName: distributor
          servicePort: 80
        path: /distributor/all_user_stats

rules diff subcommand reports spurious changes

Loading the following namespace/file into Cortex:

groups:
- name: my_group
  rules:
  - record: value
    expr: vector(0)
    labels:
      val: '0'

And then immediately diffing with cortextool rules diff will produce a report that the group my_group will be updated.

Changes are indicated with the following symbols:
  + updated

The following changes will be made if the provided rule set is synced:
~ Namespace: my_namespace
  ~ Group: my_group

Diff Summary: 0 Groups Created, 1 Groups Updated, 0 Groups Deleted

This is because the rules file is unmarshalled to a Prometheus Rule struct with Annotations map[string]string = nil while Cortex assigns the same field to an empty map. This leads the deep equality check to report a difference.

Users can work around this bug by adding an empty annotations map to their rule files. But this is a little counterintuitive, given that the documentation doesn't show annotations as a valid field for recording rules. It might be better for cortextool to fill this field in itself if it is going to rely on reflect.DeepEqual.
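
The nil versus empty map behaviour can be reproduced with a simplified stand-in for the rule struct. The rule type, normalize and rulesEqual below are hypothetical names, and normalizing before comparing is only one possible fix.

```go
package main

import (
	"fmt"
	"reflect"
)

// rule is a simplified stand-in for the Prometheus rule struct.
type rule struct {
	Record      string
	Expr        string
	Labels      map[string]string
	Annotations map[string]string
}

// normalize replaces nil maps with empty ones so that "absent" and "empty"
// compare as equal.
func normalize(r rule) rule {
	if r.Labels == nil {
		r.Labels = map[string]string{}
	}
	if r.Annotations == nil {
		r.Annotations = map[string]string{}
	}
	return r
}

// rulesEqual compares two rules after normalizing their maps.
func rulesEqual(a, b rule) bool {
	return reflect.DeepEqual(normalize(a), normalize(b))
}

func main() {
	local := rule{Record: "value", Expr: "vector(0)", Labels: map[string]string{"val": "0"}}
	remote := local
	remote.Annotations = map[string]string{} // what the server is described as returning

	fmt.Println(reflect.DeepEqual(local, remote)) // false: nil map vs empty map
	fmt.Println(rulesEqual(local, remote))        // true
}
```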

`ruleEquals` function does not work with yaml V3

When running Cortex locally I tried loading/diffing a local rules file against the rules endpoint multiple times. Since rulefmt started using yaml.v3 the RuleNode struct contains extra information about the formatting of the underlying yaml file. With Cortex the yaml returned from the API will not have the same formatting which can lead to diffs when none exist:

INFO[0000] updating group                                difference="rule #0 does not match {{8 0 !!str sum_up  <nil> []    3 15} {0 0    <nil> []    0 0} {8 0 !!str sum(up)  <nil> []    4 13} 0s map[] map[]} != {{8 0 !!str sum_up  <nil> []    5 13} {0 0    <nil> []    0 0} {8 0 !!str sum(up)  <nil> []    4 11} 0s map[] map[]}" group=test_rules namespace=rules

The differences in the above string are due to the yaml column and row and not the rules themselves.
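
A sketch of a comparison that ignores positions, using a simplified node type instead of the real yaml.v3 one. The types node and ruleNode and the function sameRule are hypothetical; the values are taken from the log line above.

```go
package main

import "fmt"

// node mimics the part of a YAML v3 node that matters here: the scalar
// value plus the position it was parsed from.
type node struct {
	Value  string
	Line   int
	Column int
}

// ruleNode holds the two scalar fields of a recording rule.
type ruleNode struct {
	Record node
	Expr   node
}

// sameRule compares the values only and ignores where they appeared in the
// source file.
func sameRule(a, b ruleNode) bool {
	return a.Record.Value == b.Record.Value && a.Expr.Value == b.Expr.Value
}

func main() {
	local := ruleNode{Record: node{"sum_up", 3, 15}, Expr: node{"sum(up)", 4, 13}}
	remote := ruleNode{Record: node{"sum_up", 5, 13}, Expr: node{"sum(up)", 4, 11}}

	fmt.Println(local == remote)         // false: positions differ
	fmt.Println(sameRule(local, remote)) // true: values match
}
```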

`rules print` should have the option to disable color

Many tools do not handle shell colors very well. In my case, I am trying to parse the YAML output using https://github.com/kislyuk/yq, but it fails due to the colorized input. As a workaround, I have to pipe the output through sed -e $'s/\x1b\[[0-9;]*m//g', which frankly I barely understand.
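
For reference, the sed expression removes ANSI color sequences: an escape byte, a "[", any run of digits and semicolons, and a closing "m". The same pattern in Go looks like this (stripColor is a hypothetical helper name):

```go
package main

import (
	"fmt"
	"regexp"
)

// ansiColor matches sequences such as ESC[32m or ESC[0m.
var ansiColor = regexp.MustCompile(`\x1b\[[0-9;]*m`)

// stripColor removes color sequences and leaves the text untouched.
func stripColor(s string) string {
	return ansiColor.ReplaceAllString(s, "")
}

func main() {
	// prints name: my_group
	fmt.Println(stripColor("\x1b[32mname\x1b[0m: my_group"))
}
```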

Many valid alertmanager configurations do not work. Old alertmanager dependency.

This project currently pulls in version 0.13 of alertmanager. The current version is 0.20.

The result of this is that there are many valid alertmanager configurations that do not work. For example, the image_url and actions fields in slack configuration.

This should be updated.

Do not return fatal when no rules are loaded yet

When you have no rules loaded yet, and try to do a cortextool rules list the output you'll get is the following:

$ cortextool rules list
FATA[0000] unable to read rules from cortex, requested resource not found

This is a bit deceiving: the request to Cortex succeeded, there just aren't any rules yet.

Cannot print rules since loading a recording rule

I loaded this:

namespace: test
groups:
- name: default
  rules:
    - alert: AlwaysFiring
      record: ""
      for: 0s
      expr: 1 == bool 1
    - record: "agent:custom_server_info:up"
      alert: ""
      expr: |2
          custom_server_info * 0
        unless on (agent_hostname)
          up{job="integrations/agent"}
        or on (agent_hostname)
          custom_server_info

Since then, I can't print the rules anymore.

FATA[0000] unable to read rules from cortex, yaml: line 10: did not find expected key

I can query the API and get the expected YAML.

curl -u $CORTEX_USER:$CORTEX_KEY "$CORTEX_URL/api/v1/rules"
bug:
    - name: default
      rules:
        - record: test:scalar:bug
          expr: vector(1)
test:
    - name: default
      rules:
        - alert: AlwaysFiring
          expr: 1 == bool 1
        - record: agent:custom_server_info:up
          expr: |4
              custom_server_info * 0
            unless on (agent_hostname)
              up{job="integrations/agent"}
            or on (agent_hostname)
              custom_server_info

To try to isolate the bug, I deleted all the rules and I tried loading this one:

namespace: bug
groups:
- name: test
  rules:
    - record: "test:scalar:bug"
      expr: |2
          vector(1)
        or
          vector(2)

cortextool rules load rules-bug.yml \
--address=$CORTEX_URL \
--id=$CORTEX_USER \
--key=$CORTEX_KEY \
--log.level=debug

INFO[0000] log level set to debug
DEBU[0000] path built to request rule group              url=/api/prom/rules/bug/test
DEBU[0000] sending request to cortex api                 method=GET url="https://prometheus-us-central1.grafana.net/api/prom/rules/bug/test"
DEBU[0000] checking response                             status="404 Not Found"
DEBU[0000] resource not found                            fields.msg="request failed with response body group does not exist\n" status="404 Not Found"
DEBU[0000] sending request to cortex api                 method=POST url="https://prometheus-us-central1.grafana.net/api/prom/rules/bug"
DEBU[0000] checking response                             status="400 Bad Request"
ERRO[0000] requests failed                               fields.msg="request failed with response body unable to decoded rule group\n" status="400 Bad Request"
ERRO[0000] unable to load rule group                     error="failed request to the cortex api" group=test namespace=bug

I was able to load this rule group using curl.

name: bug
rules:
  - record: "test:scalar:bug"
    expr: |2
        vector(1)
      or
        vector(2)

curl -u $CORTEX_USER:$CORTEX_KEY "$CORTEX_URL/api/prom/rules/bug" -H "Content-Type: application/yaml" --data-binary @rules-bug-api.yml -i
HTTP/2 202                                                                                                                                                                                                           
content-length: 58                                                                                                                                                                                                   
content-type: application/json                                                                            
date: Fri, 20 Nov 2020 16:41:23 GMT                                                                                                                                                                                  
via: 1.1 google                                                                                                                                                                                                      
alt-svc: clear                                                                                            
                                                                                                          
{"status":"success","data":null,"errorType":"","error":""}

I'm still unable to print the rules:

INFO[0000] log level set to debug
DEBU[0000] sending request to cortex api                 method=GET url="https://prometheus-us-central1.grafana.net/api/prom/rules"
DEBU[0000] checking response                             status="200 OK"
FATA[0000] unable to read rules from cortex, yaml: line 3: did not find expected key

But I can GET them from the API:

curl -u $CORTEX_USER:$CORTEX_KEY "$CORTEX_URL/api/prom/rules"
bug:
    - name: bug
      rules:
        - record: test:scalar:bug
          expr: |4
              vector(1)
            or
              vector(2)

Also, this rule does not run! I don't see the test:scalar:bug metric in my database.

If I create the same rule on a single line, then it works, so I think both Cortex and cortextool have an issue with YAML block scalars that use an indentation indicator, the syntax described in the Prometheus docs.

Allow cortextool to delete a whole namespace

Right now when you try to delete, it only allows a group specification.

Add a flag to check whether rules have colons (:) in their names

Recording rules have best practices to follow, one of them is the level:metric:operations naming convention.

To ensure rules within a file are named properly, I think we could add a flag to rules lint that checks the metric name follows this convention.
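
A minimal sketch of such a check, assuming a strict reading of the convention: exactly three non-empty parts separated by colons. The function name followsNamingConvention is hypothetical, and a real lint flag might want a looser rule.

```go
package main

import (
	"fmt"
	"strings"
)

// followsNamingConvention reports whether a recording rule name has the
// shape level:metric:operations, with all three parts present.
func followsNamingConvention(name string) bool {
	parts := strings.Split(name, ":")
	if len(parts) != 3 {
		return false
	}
	for _, p := range parts {
		if p == "" {
			return false
		}
	}
	return true
}

func main() {
	for _, name := range []string{"agent:custom_server_info:up", "value"} {
		fmt.Printf("%s: %v\n", name, followsNamingConvention(name))
	}
}
```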

`alertmanager load` does not use the template files

The help text for alertmanager load shows the usage as:

usage: cortextool alertmanager load <config> [<template-files>...]

However, the implementation does not use the template files for anything.
https://github.com/grafana/cortextool/blob/master/pkg/commands/alerts.go#L76

`rules list` command should support json or yaml output

I'd like to be able to list rules in Cortex, then programmatically process them. The output of rules list is great for the human eye, but needlessly difficult to program around.

We should add an -o, --output flag to support YAML output. JSON output would also be appreciated, though plenty of client tools can make this conversion as necessary.

Linting includes empty tags

As we lint/prepare we should not include the tags that are empty.

Specifying leading directories in path to template files causes parsing errors

I created an issue on Cortex (cortexproject/cortex#3357) about weird directory processing behavior on the alertmanager side, but @gotjosh suggested I create an issue here to address the root problem. I wouldn't assume the path that I'm asking cortextool to read it from on the local machine would have any effect on how cortex processes the file on the backend. I'm not sure the path should be sent to the backend. Even the filename is kind of annoying to have to be sent, but to match up the template name with the alert yaml I think that is necessary.

This seems like an issue where the tool should ignore the directory specified and not send that full path to the backend.
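
One way to do this would be to send only the file's base name. The helper templateName below is hypothetical and simply wraps the standard library call:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// templateName returns the name a template would be sent under: the file
// name without any leading directories.
func templateName(path string) string {
	return filepath.Base(path)
}

func main() {
	// prints slack.tmpl
	fmt.Println(templateName("alerts/templates/slack.tmpl"))
}
```

A side effect of this choice is that two template files with the same name in different directories would collide, so the tool would need to reject or report that case.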

Add tenant ID header with loadgen requests

If a user does not front cortex with an authentication gateway, adding the tenant ID by default can be useful.
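
A short sketch of attaching the tenant header to an outgoing request. The header is the X-Scope-OrgID one visible in the HTTP request dump earlier on this page; the function name newPushRequest and the example URL are placeholders, not loadgen's actual code or endpoint.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// newPushRequest is a hypothetical helper that builds a POST request and
// sets the tenant header when a tenant ID is configured.
func newPushRequest(url, tenantID string, body io.Reader) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, url, body)
	if err != nil {
		return nil, err
	}
	if tenantID != "" {
		req.Header.Set("X-Scope-OrgID", tenantID)
	}
	return req, nil
}

func main() {
	// The URL is a placeholder for wherever loadgen writes to.
	req, err := newPushRequest("http://localhost:8080/push", "0", nil)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Header.Get("X-Scope-OrgID")) // prints 0
}
```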

Homebrew formula

It would be great if this was packaged for homebrew to make it easier to update/manage via brew.

Add namespace flag to configure rules commands

Currently, cortextool always sets the namespace based on the name of the file. This behavior results in rule organization that feels quite unnatural. For example, if we want to define one alert per file, we end up with an absurd number of namespaces.

Additionally, not having control over the namespace makes the new sync functionality difficult to use. It allows us to ignore namespaces, but since there are so many namespaces that are dynamically created, this flag doesn't do anything particularly useful for us. We would much rather use sync with a specific namespace, then have all the changes applied within that namespace.
