
robusta-dev / robusta

Kubernetes observability and automation, with an awesome Prometheus integration

Home Page: https://home.robusta.dev/

License: MIT License

Python 98.94% Shell 0.56% Dockerfile 0.25% Smarty 0.26%
kubernetes monitoring notifications devops prometheus dashboard grafana alerting alertmanager observability

robusta's Introduction

Robusta.dev

Keep your Kubernetes microservices up and running

Connect your existing Prometheus, gain 360° observability

(Prometheus recommended, but not required)


💻 About the project

Robusta is both an automations engine for Kubernetes, and a multi-cluster observability platform.

Robusta is commonly used alongside Prometheus, but other tools are supported too.

By listening to all the events in your cluster, Robusta can tell you why alerts fired, what happened at the same time, and what you can do about it.

Robusta can either improve your existing alerts, or be used to define new alerts triggered by APIServer changes.

🛠️ How it works

Robusta's behaviour is defined by rules like this:

triggers:
  - on_prometheus_alert:
      alert_name: KubePodCrashLooping
actions:
  - logs_enricher: {}
sinks:
  - slack

In the above example, whenever the KubePodCrashLooping alert fires, Robusta will fetch logs from the right pod and attach them to the alert. The result looks like this:

Robusta also supports alert-remediations:

Over 50 types of automations and enrichments are built-in »


📒 Installing Robusta

  1. Install our Python CLI:
python3 -m pip install -U robusta-cli --no-cache
  2. Generate a values file for Helm:
robusta gen-config
  3. Install Robusta with Helm:
helm repo add robusta https://robusta-charts.storage.googleapis.com && helm repo update
helm install robusta robusta/robusta -f ./generated_values.yaml

Detailed instructions »

💡 How can I use Robusta?

  • Enhanced Prometheus Alerts: All your Prometheus alerts are transformed with better structure, labels, and priority details.
  • Enrichment: Receive alerts with graphs from Prometheus, application logs, Kubernetes events and more without any extra configuration.
  • Alert Routing: Send alerts to different teams based on the namespace or alert type, or even to a different chat app altogether.
  • Automatic Remediation: Want to run a bash script when an alert is triggered? How about creating a new Job and gathering some data? Done!
  • Resolve Jira Tickets: Enriched Jira tickets are created for specific alerts. If the issue is resolved, they are marked resolved automatically.
  • Integrations: Get everyday alerts on Slack, and weekly application efficiency reports via email. Use Robusta's 15+ integrations to bring enriched alerts directly to your teams.

📝 Documentation

Interested? Learn more about Robusta

Full documentation »

✉️ Contact

📑 License

Robusta is distributed under the MIT License. See LICENSE.md for more information.

🕐 Stay up to date

We add new features regularly. Stay up to date by watching us on GitHub.

robusta's People

Contributors

aantn, alikhanxgrid, anfatum, arikalon1, avi-robusta, avinashupadhya99, daanvinken, djarv1337, ganeshrvel, jsoref, k4kratik, kotlickya, leavemyyard, levtomer66, lippertmarkus, martynbristow, mershal, michmartineau, neonsludge, oscgu, pablos44, paoloyx, pavangudiwada, rishavmehra, robertszefler, roiglinik, samalex0808, sheeproid, shubh28698, wahajxgrid

robusta's Issues

Improve Sinks Documentation

Today our sinks are documented in two places:

  1. On the Sinks page (and subpages of it)
  2. On the Configuration Guide page

I've seen a few people confused by this. They go to the Sinks page and read that we support, e.g., Kafka but they can't find details on how to configure Kafka.

I suggest we move the details on how to configure each sink out of the Configuration Guide page (the 2nd page mentioned above) and into a unique page for each sink. Each sink should have a subpage under the Sinks page and it should have both the screenshot and the details on yaml configuration.

@arikalon1 wdyt?

Using playbooksRepos causes error: "coalesce.go:163: warning: skipped value for playbookRepos: Not a table."

Enabling git playbooks in my private generated_values.yaml file causes this error:
coalesce.go:163: warning: skipped value for playbookRepos: Not a table.

Steps to reproduce the behavior:

  1. In the generated_values.yaml file, add:
playbookRepos:
  some_git_playbooks:
    url: "[email protected]:robusta-dev/my-playbooks.git"
  2. Run helm install robusta ...
  3. See the error

Expected behavior
The git playbooks repo should be installed and loaded into Robusta runner

Support overriding the default playbooks PVC size

When playbooksPersistentVolume: true is set, the chart tries to create a PVC of 128Mi.
Some storage classes don't support such a small size, so the PVC creation fails.

Steps to reproduce the behavior:

  1. Set the default storage class to io2
  2. Set playbooksPersistentVolume: true in the values.yaml file
  3. Run helm install robusta ...
  4. See error

Expected behavior
There should be a helm value that allows overriding the default PVC size

Routing messages

Hi,

thank you for the great tool!

We might be able to contribute on the feature request below but I wanted to first check if you had any thoughts, design concerns or suggestions:

It would be nice to be able to route Robusta messages similarly to how Alertmanager does it.
Here's an example for Slack:

- slack_sink:
    name: prod_alerts
    match:
       namespace: prod-*
    slack_channel: prod-issues

This would send messages whose namespace matches prod-* to the prod-issues Slack channel.
Ideally, it should be possible to match against all attributes of the Finding class.
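
To make the proposal concrete, here is a minimal sketch of the matching step in Python. It is not Robusta's implementation: the function name finding_matches, the dict-based representation of a finding, and the choice of shell-style globs (via the standard fnmatch module) are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase
from typing import Mapping


def finding_matches(finding: Mapping[str, str], match: Mapping[str, str]) -> bool:
    """Return True when every attribute in `match` glob-matches the finding.

    `finding` is a hypothetical flat view of a Finding's attributes.
    An empty `match` accepts every finding.
    """
    for attribute, pattern in match.items():
        value = finding.get(attribute)
        # A missing attribute can never satisfy a pattern.
        if value is None or not fnmatchcase(value, pattern):
            return False
    return True


# Hypothetical sink definition mirroring the YAML in this issue.
prod_sink = {
    "name": "prod_alerts",
    "match": {"namespace": "prod-*"},
    "slack_channel": "prod-issues",
}

if __name__ == "__main__":
    for namespace in ("prod-payments", "staging"):
        routed = finding_matches({"namespace": namespace}, prod_sink["match"])
        print(namespace, "->", prod_sink["slack_channel"] if routed else "(not routed)")
```

Requiring every key in match to succeed keeps the semantics close to an Alertmanager route with several matchers, where all of them must hold.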

thanks!

Playbooks pushed to namespace other than default are not present in `robusta playbooks list`

Describe the bug
Playbooks pushed to namespace other than default are not present in robusta playbooks list

To Reproduce
Steps to reproduce the behavior:

  1. Install robusta in a namespace other than the default using helm install robusta robusta/robusta -f ./generated_values.yaml -n robusta --create-namespace
  2. Create a custom playbook repository by following https://docs.robusta.dev/master/developer-guide/actions/playbook-repositories.html
  3. Load the playbook using robusta playbooks push ./my-playbooks-project-root --namespace robusta
  4. Run robusta playbooks list --namespace robusta | grep 'my_action'. It returns nothing.

Expected behavior
The custom playbook pushed should be present in the list of playbooks returned by robusta playbooks list --namespace robusta

Screenshots
Happy to provide the contents of pyproject.toml and the file in my_action

Robusta version

robusta version
version 0.8.26

Additional context
N/A

Support for ElasticSearch

Is your feature request related to a problem? Please describe.
How can we send events to Elasticsearch?

Add playbook action to monitor changes to ClusterRoleBindings

Motivation
It is useful to track changes to ClusterRoleBindings to stay on top of who has what permissions.

Suggested Feature
Robusta already has triggers for ClusterRoleBinding changes (see docs) but there are no built-in actions set up for those triggers. We should add an action called cluster_permissions_watcher which notifies when ClusterRoleBindings change and outputs summarized information about the change.

Alternatives
You can monitor ClusterRoleBindings today using the resource_babysitter action (see tutorial and docs) but the output there is very generic and technical. (It just shows a diff.) If we are going to implement an action for this it should be optimized for ClusterRoleBindings and print more useful data like "The ClusterRole named XYZ now has permission to...."
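
As an illustration of what "more useful data" could look like, the sketch below compares two versions of a ClusterRoleBinding (as plain dicts shaped like the Kubernetes manifest) and reports which subjects gained or lost the role. The function name summarize_binding_change and the wording of the messages are hypothetical; only the roleRef and subjects fields come from the Kubernetes API.

```python
from typing import Any, Dict, List, Set, Tuple

Subject = Tuple[str, str, str]  # (kind, namespace, name)


def _subjects(binding: Dict[str, Any]) -> Set[Subject]:
    """Collect the binding's subjects as hashable tuples."""
    return {
        (s.get("kind", ""), s.get("namespace", ""), s.get("name", ""))
        for s in binding.get("subjects") or []
    }


def summarize_binding_change(old: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    """Describe a ClusterRoleBinding update in terms of who gained or lost a role."""
    # roleRef is immutable on an existing binding, so reading it from `new` is enough.
    role = new.get("roleRef", {}).get("name", "<unknown>")
    added = sorted(_subjects(new) - _subjects(old))
    removed = sorted(_subjects(old) - _subjects(new))
    lines = [f"{kind} {name} was granted ClusterRole {role}" for kind, _, name in added]
    lines += [f"{kind} {name} lost ClusterRole {role}" for kind, _, name in removed]
    return lines


if __name__ == "__main__":
    before = {
        "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
        "subjects": [{"kind": "User", "name": "alice"}],
    }
    after = {
        "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
        "subjects": [{"kind": "User", "name": "alice"}, {"kind": "User", "name": "bob"}],
    }
    print("\n".join(summarize_binding_change(before, after)))
```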

Exploring failed AWS API requests

I would like to have a way to inspect and analyze failed requests to the AWS API, mainly those that occur when a deployment to Kubernetes is made. In this case, I usually have access to the request's UUID, but not to the request itself or to the response.

For example, I failed to deploy an ingress controller due to a permissions issue. During the deployment process, I got events describing an unauthorized operation with a specific UUID. If I had information about the request, I could deduce what the operation was, and what were the missing permissions.

Get visibility for ingresses

Currently I can't use robusta to get notifications about ingresses.
I would like to get notifications about ingresses (or other resources) the same way I get them about pods.

Embed images directly in Slack messages instead of attaching them to the message

Today images are attached to Slack messages as attachments. It would be nice to embed the images directly into the message itself so that you don't need to click to see them.

This StackOverflow question seems relevant

When implementing this, we will need to verify that images remain private and are not available outside of the Slack community in which they're shared.

We probably can't embed SVG images and that's OK.

Thanks to @tim-sendible for the idea

Clarify in Robusta docs and error messages that Slack channel needs to exist first

In other words, Robusta doesn't create the channel.

Add something to error messages in gen-config and perhaps in robusta log output. E.g.

Cannot send to Slack channel XYZ. Please verify that the channel exists and the Robusta app was added to the channel. (See video in [docs](https://docs.robusta.dev/master/catalog/sinks/slack.html#sending-robusta-notifications-to-a-private-channel))

Thank you to @orclassiq for reporting

Robusta lost connection to Prometheus when some pods got CPU Throttling

Describe the bug
After some pods were CPU throttled, Robusta learned about it through its connection to Prometheus. After that, however, Robusta lost its connection to Prometheus.

To Reproduce
Steps to reproduce the behavior:

  1. Install Robusta version 0.8.23 with Helm
  2. Some pods get CPU throttled. Robusta tries to query some information from Prometheus, then loses its connection to Prometheus and can't send any Prometheus alerts. Basic actions, such as notifying when a pod is crashing, still work.

Expected behavior
Robusta should send a notification to let the user know about the error (that it can't run some queries against Prometheus).

Logs

2022-02-17 04:35:15.024 INFO     Successfully loaded Kubernetes resource happy-backend-7bdcf76cf-p54t7 for alert CPUThrottlingHigh
2022-02-17 04:35:15.231 INFO     Successfully loaded Kubernetes resource happy-backend-7bdcf76cf-p54t7 for alert CPUThrottlingHigh
2022-02-17 04:37:24.557 WARNING  Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f578cc17310>: Failed to establish a new connection: [Errno 110] Connection timed out')': /api/v1/query_range?query=sum+by%28container%2C+pod%2C+namespace%29+%28increase%28container_cpu_cfs_throttled_periods_total%7Bcontainer%21%3D%22%22%7D%5B5m%5D%29%29+%2F+sum+by%28container%2C+pod%2C+namespace%29+%28increase%28container_cpu_cfs_periods_total%5B5m%5D%29%29+%3E+%2825+%2F+100%29&start=1644971170&end=1645072515&step=1689.0859742833334&timeout=90.0
2022-02-17 04:37:24.558 WARNING  Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f578ccdfd30>: Failed to establish a new connection: [Errno 110] Connection timed out')': /api/v1/query_range?query=sum+by%28container%2C+pod%2C+namespace%29+%28increase%28container_cpu_cfs_throttled_periods_total%7Bcontainer%21%3D%22%22%7D%5B5m%5D%29%29+%2F+sum+by%28container%2C+pod%2C+namespace%29+%28increase%28container_cpu_cfs_periods_total%5B5m%5D%29%29+%3E+%2825+%2F+100%29&start=1645068915&end=1645072515&step=60.0&timeout=90.0
2022-02-17 04:37:24.560 WARNING  Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f578cbb5160>: Failed to establish a new connection: [Errno 110] Connection timed out')': /api/v1/query_range?query=sum+by%28container%2C+pod%2C+namespace%29+%28increase%28container_cpu_cfs_throttled_periods_total%7Bcontainer%21%3D%22%22%7D%5B5m%5D%29%29+%2F+sum+by%28container%2C+pod%2C+namespace%29+%28increase%28container_cpu_cfs_periods_total%5B5m%5D%29%29+%3E+%2825+%2F+100%29&start=1644970270&end=1645072515&step=1704.0781680833331&timeout=90.0
2022-02-17 04:39:37.662 WARNING  Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5770701190>: Failed to establish a new connection: [Errno 110] Connection timed out')': /api/v1/query_range?query=sum+by%28container%2C+pod%2C+namespace%29+%28increase%28container_cpu_cfs_throttled_periods_total%7Bcontainer%21%3D%22%22%7D%5B5m%5D%29%29+%2F+sum+by%28container%2C+pod%2C+namespace%29+%28increase%28container_cpu_cfs_periods_total%5B5m%5D%29%29+%3E+%2825+%2F+100%29&start=1644970270&end=1645072515&step=1704.0781680833331&timeout=90.0
2022-02-17 04:39:37.663 WARNING  Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f57707014f0>: Failed to establish a new connection: [Errno 110] Connection timed out')': /api/v1/query_range?query=sum+by%28container%2C+pod%2C+namespace%29+%28increase%28container_cpu_cfs_throttled_periods_total%7Bcontainer%21%3D%22%22%7D%5B5m%5D%29%29+%2F+sum+by%28container%2C+pod%2C+namespace%29+%28increase%28container_cpu_cfs_periods_total%5B5m%5D%29%29+%3E+%2825+%2F+100%29&start=1645068915&end=1645072515&step=60.0&timeout=90.0

Additional info
After restarting the deployment, Robusta can connect to Prometheus again and send alerts.
I use the Prometheus Stack with the configuration below:

  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ['job']
      group_wait: 30s
      group_interval: 30m
      repeat_interval: 4h
      receiver: 'robusta'
      routes:
      - match:
          alertname: Watchdog
        receiver: 'null'
    receivers:
    - name: 'null'
    - name:  'robusta'
      webhook_configs:
        - url: 'http://robusta-runner.robusta.svc.cluster.local/api/alerts'
          send_resolved: true

External action not found python_debugger

I got the above error when trying to manually trigger an action as instructed in the documentation.

"robusta playbooks trigger python_debugger name=podname namespace=default"

I'm new to Kubernetes and everything related, so please let me know what information you need to help debug this issue. Someone mentioned that they will need to see the logs for robusta-runner. Let me know where I can find these logs so I can send them to you for review.

Thanks.

Add action to show files in Volume

Motivation
It can be hard to understand exactly what data is contained inside a Kubernetes volume. We can provide visibility with Robusta actions.

Suggested Implementation
Add a Robusta action which:

  1. Takes a Kubernetes volume as input
  2. Runs a "reader" job/pod which mounts that volume
  3. The reader pod should run ls -R on the volume to show all files.
  4. The Robusta action should gather the reader pod's output and send it to Slack/other sinks by creating a finding
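
Steps 2 and 3 can be sketched as a function that builds the reader pod's manifest. This is only an illustration under several assumptions: the volume is given as a PersistentVolumeClaim name, the image is busybox, and the helper name build_reader_pod is made up. Creating the pod, collecting its logs, and building the finding (step 4) are left out.

```python
from typing import Any, Dict


def build_reader_pod(pvc_name: str, namespace: str, image: str = "busybox") -> Dict[str, Any]:
    """Build a Pod manifest that mounts a PVC read-only and lists its files."""
    mount_path = "/data"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        # generateName lets the API server pick a unique suffix for each run.
        "metadata": {"generateName": "volume-reader-", "namespace": namespace},
        "spec": {
            # The pod runs once and exits; it should not be restarted.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "reader",
                    "image": image,
                    "command": ["ls", "-R", mount_path],
                    "volumeMounts": [
                        {"name": "target", "mountPath": mount_path, "readOnly": True}
                    ],
                }
            ],
            "volumes": [
                {
                    "name": "target",
                    "persistentVolumeClaim": {"claimName": pvc_name, "readOnly": True},
                }
            ],
        },
    }


if __name__ == "__main__":
    import json

    print(json.dumps(build_reader_pod("my-data", "default"), indent=2))
```

Mounting read-only is a deliberate choice here, since the action is meant to inspect the volume and never change it.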

Bonus
Add support for VolumeSnapshots too. For VolumeSnapshots you will first need to create a temporary Volume based on the snapshot, then do the above, and finally delete the temporary volume.

Caveats
The above sometimes won't work if the Volume is in use, depending on the mount's AccessMode. This can be fixed in various ways. For example, for ReadWriteOnce it can be bypassed by running the reader pod on the same node as the pod that is currently using the volume. Alternatively, it could be fixed for all AccessModes by always creating a VolumeSnapshot and reading the snapshot, not the original.

In any event, a first version doesn't need to support any of this.

CRD support for watching?

Is your feature request related to a problem? Please describe.
I'd like to be able to watch for changes to CRDs, but the docs don't describe that. It appears it's not supported.

Describe the solution you'd like
Extend support to register CRDs to trigger on changes

Describe alternatives you've considered
Doing it myself (writing a small service), looking at kube-watch and the open PRs there for how this may be supported.

Additional context
N/A

Thank you!

Add better documentation on CallbackBlocks

CallbackBlocks are part of Robusta's API. They're used to write playbooks where the user clicks a button (e.g. in Slack) and this triggers another playbook which runs only when the user clicks.

This powers, for example, the playbook that lets you increase the HPA max replicas. (See docs)

Today CallbackBlocks aren't documented well in the docs on writing playbook actions.

We should document them there in a new page in that section of the docs. The page will be all about callbacks, how they work, and how to write a playbook that uses them.

For reference, you can search for CallbackBlock in the playbooks/ directory and read the existing playbooks to see how it works.

Poor alt text on index page

Describe the bug

To Reproduce
Steps to reproduce the behavior:

  1. Turn off loading images in your web browser
  2. Go to https://docs.robusta.dev/master/

Expected behavior
Alt text should be meaningful.

If the logo is the logo for Robusta, then the word Robusta should be in the alt text.

Screenshots

Desktop (please complete the following information):

  • OS: macOS Big Sur
  • Browser: Chrome
  • Version: 98


Add pprof playbook action

Motivation
Many Go applications expose debug information using pprof. We should add a playbook action to collect that data.

Specify the cluster name in Slack messages

In default messages we send to Slack, we should specify the cluster name. This is important for multi-cluster environments / environments with more than one AlertManager.

We should make sure that we send the cluster name for:

  1. Crashloop notifications
  2. Prometheus alerts

If possible, it would be nice to let users customize our bot-name (Robusta) or at least to specify the cluster name there too.

Thanks again to @tim-sendible for the feedback

robusta gen-config fails with python error for invalid account token

Describe the bug
Providing an invalid account token to the robusta CLI while generating the generated_values.yaml file throws different Python errors in different scenarios.

To Reproduce
Steps to reproduce the behavior:

  1. Run robusta gen-config
  2. Answer Y to "Would you like to use Robusta UI? This is HIGHLY recommended. [y/N]:" and to "Do you already have a Robusta account? [y/N]:"
  3. Enter an invalid account token at "Please insert your Robusta account token:". Examples include entering the account_id, or the actual account token with a few characters removed or changed.
  4. See the error

Expected behavior
Robusta should catch these errors and display an informative error message.

Errors

  • Changing some characters of the actual account token:
Traceback (most recent call last):
  File "/home/avinashupadhyaya/.local/bin/robusta", line 8, in <module>
    sys.exit(app())
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/typer/main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/typer/main.py", line 497, in wrapper
    return callback(**use_params)  # type: ignore
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/robusta/cli/main.py", line 234, in gen_config
    token = json.loads(base64.b64decode(robusta_api_key))
  File "/usr/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.9/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 375 (char 374)
  • Removing some characters from the account token:
Traceback (most recent call last):
  File "/home/avinashupadhyaya/.local/bin/robusta", line 8, in <module>
    sys.exit(app())
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/typer/main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/typer/main.py", line 497, in wrapper
    return callback(**use_params)  # type: ignore
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/robusta/cli/main.py", line 234, in gen_config
    token = json.loads(base64.b64decode(robusta_api_key))
  File "/usr/lib/python3.9/base64.py", line 87, in b64decode
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
  • Providing the account_id for the account token:
Traceback (most recent call last):
  File "/home/avinashupadhyaya/.local/bin/robusta", line 8, in <module>
    sys.exit(app())
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/typer/main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/typer/main.py", line 497, in wrapper
    return callback(**use_params)  # type: ignore
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/robusta/cli/main.py", line 234, in gen_config
    token = json.loads(base64.b64decode(robusta_api_key))
  File "/usr/lib/python3.9/json/__init__.py", line 341, in loads
    s = s.decode(detect_encoding(s), 'surrogatepass')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte
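
All three tracebacks come from the same line, json.loads(base64.b64decode(robusta_api_key)), and all three exception types (binascii.Error, json.JSONDecodeError, UnicodeDecodeError) are subclasses of ValueError. A minimal sketch of a fix, assuming only what the tracebacks show, is to wrap that line and raise one friendly error. The names parse_account_token and InvalidTokenError are hypothetical and not part of the robusta CLI.

```python
import base64
import json


class InvalidTokenError(Exception):
    """Raised when an account token cannot be decoded (hypothetical helper type)."""


def parse_account_token(raw_token: str) -> dict:
    """Decode a base64-encoded JSON token, turning low-level errors into one message."""
    try:
        decoded = json.loads(base64.b64decode(raw_token))
    except ValueError as exc:
        # Covers binascii.Error (bad padding), UnicodeDecodeError (not UTF-8)
        # and json.JSONDecodeError (not valid JSON); all subclass ValueError.
        raise InvalidTokenError(
            "The account token is invalid. Please copy it again and retry."
        ) from exc
    if not isinstance(decoded, dict):
        raise InvalidTokenError("The account token does not contain a JSON object.")
    return decoded


if __name__ == "__main__":
    good = base64.b64encode(json.dumps({"account_id": "demo"}).encode()).decode()
    print(parse_account_token(good))
    try:
        parse_account_token(good[:-3])  # simulate a few characters removed
    except InvalidTokenError as err:
        print("error:", err)
```

The CLI could then catch InvalidTokenError at the prompt and ask for the token again instead of printing a traceback.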

Additional context

N/A

Installing Robusta into a separate ns

Is your feature request related to a problem? Please describe.

Usually, when installing a helm chart, it will install all resources related to the deployment into a separate ns. However, when installing robusta, it does not create a new ns but installs everything into the default ns.

Describe the solution you'd like

I would like a new ns to be created called robusta.

Describe alternatives you've considered

N/A

Additional context

I think it is common practice to have a separate ns so it would be nice if that was the case when installing the Robusta Helm Chart.

New Sink: ntfy.sh

Is your feature request related to a problem? Please describe.
I'd like an additional sink, and one that's very extensible.

Describe the solution you'd like
I use ntfy.sh for push notifications to my phone. It accepts PUT/POST to an endpoint, and then pushes them to the subscriber. I suspect there are other tools that would also accept a simple payload that could make use of this kind of output.
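
For a sense of how small such a sink could be, here is a sketch that builds the HTTP request with Python's standard library. The function name build_ntfy_request and the topic name are made up, and this is not an existing Robusta sink. It relies on ntfy's documented behaviour of accepting a POST to the topic URL with the message as the body and an optional Title header.

```python
import urllib.request


def build_ntfy_request(
    topic: str,
    message: str,
    title: str = "",
    server: str = "https://ntfy.sh",
) -> urllib.request.Request:
    """Build (but do not send) a POST request that publishes `message` to a topic."""
    headers = {"Title": title} if title else {}
    return urllib.request.Request(
        url=f"{server.rstrip('/')}/{topic}",
        data=message.encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    request = build_ntfy_request(
        "my-robusta-alerts",  # hypothetical topic name
        "Pod happy-backend is crash looping",
        title="KubePodCrashLooping",
    )
    print(request.get_method(), request.full_url)
    # To actually publish: urllib.request.urlopen(request, timeout=10)
```

Keeping the server parameter configurable would also cover self-hosted ntfy instances and, more generally, other tools that accept a simple POST payload.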

Send resource creations and deletions to the Robusta UI, not just modifications

Today we don't send resource creations/deletions to the Robusta UI. We should send them because they add useful information.

The relevant code to change is in values.yaml. The change is probably trivial, but it requires testing. If you work on this, please document any manual testing that you did to verify this works.

Relevant code in values.yaml:

platformPlaybooks:
- triggers:
    - on_deployment_update: {}
  actions:
    - resource_babysitter: {}
  sinks:
    - "robusta_ui_sink"
- triggers:
    - on_daemonset_update: {}
  actions:
    - resource_babysitter: {}
  sinks:
    - "robusta_ui_sink"
- triggers:
    - on_statefulset_update: {}
  actions:
    - resource_babysitter: {}
  sinks:
    - "robusta_ui_sink"

See also: https://docs.robusta.dev/master/catalog/triggers/kubernetes.html

Helm chart defaults to runner image version 0.0.0, which doesn't exist.

Describe the bug
The Helm values file defaults to runner version 0.0.0, which causes the installation documented on the website to fail.

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://docs.robusta.dev/master/getting-started/installation.html and follow the instructions.

The runner never deploys because the image tag defaults to 0.0.0.

Expected behavior
I should be able to install this like a normal Helm chart, with sensible defaults.

Allow silencing Prometheus alerts directly from Slack (and other sinks)

Motivation
Sometimes alerts are really noisy and you just want to silence them temporarily. You can do so by opening AlertManager and creating a silence in the UI (or by using the amtool cli) but you can't do so directly from Slack.

Suggested Solution

  1. Write a playbook action which takes a Prometheus alert and silences it
  2. Write a playbook action which is triggered on every Prometheus alert and adds a CallbackBlock which lets the user trigger (1) when they choose
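
The core of action (1) is building a silence for Alertmanager's v2 API (POST /api/v2/silences). The sketch below only builds the JSON body; the function name build_silence_payload, the default comment, and the idea of matching on every label of the alert are assumptions, not an existing Robusta action.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Mapping, Optional


def build_silence_payload(
    alert_labels: Mapping[str, str],
    duration: timedelta,
    created_by: str = "robusta",
    comment: str = "Silenced from a chat sink",
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build the JSON body for Alertmanager's POST /api/v2/silences."""
    starts_at = now or datetime.now(timezone.utc)
    ends_at = starts_at + duration
    return {
        # One exact-match matcher per label, so only this alert is silenced.
        "matchers": [
            {"name": name, "value": value, "isRegex": False}
            for name, value in sorted(alert_labels.items())
        ],
        "startsAt": starts_at.isoformat(),
        "endsAt": ends_at.isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }


if __name__ == "__main__":
    payload = build_silence_payload(
        {"alertname": "CPUThrottlingHigh", "namespace": "default"},
        duration=timedelta(hours=2),
    )
    print(payload)
```

The CallbackBlock from step (2) would then only need to carry the alert's labels and the chosen duration to the silencing action.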

Different playbooks sinks override one another

When defining multiple playbooks with different sinks, the sinks of one playbook may override the sinks of other playbooks.

Steps to reproduce the behavior:

  1. Define 2 sinks, sink_a, and sink_b
  2. Define 2 playbooks: both triggered by on_deployment_update, and both with the resource_babysitter action. For one define sink_a and for the other sink_b
  3. Update one of the cluster's deployments
  4. The update message is received on sink_b only (assuming it's defined last)

Expected behavior
An update message is sent to both sinks
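
The expected behavior can be written down as a small model. This is not Robusta's dispatch code, and it does not claim to show the cause of the bug, which this report does not state. The function deliveries_for_event and the dict layout of a playbook are hypothetical. The point is only that each matching playbook contributes its own (action, sink) deliveries and none replaces another.

```python
from typing import Dict, List, Tuple


def deliveries_for_event(playbooks: List[Dict[str, List[str]]], trigger: str) -> List[Tuple[str, str]]:
    """List every (action, sink) pair that should receive output for `trigger`."""
    deliveries: List[Tuple[str, str]] = []
    for playbook in playbooks:
        if trigger not in playbook["triggers"]:
            continue
        # Each playbook keeps its own sinks; nothing is shared or overwritten.
        for action in playbook["actions"]:
            for sink in playbook["sinks"]:
                deliveries.append((action, sink))
    return deliveries


playbooks = [
    {"triggers": ["on_deployment_update"], "actions": ["resource_babysitter"], "sinks": ["sink_a"]},
    {"triggers": ["on_deployment_update"], "actions": ["resource_babysitter"], "sinks": ["sink_b"]},
]

if __name__ == "__main__":
    print(deliveries_for_event(playbooks, "on_deployment_update"))
```

A regression test for the fix could assert exactly this: one deployment update yields one delivery to sink_a and one to sink_b.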

thanks @tuananh2508 for reporting this

Remember namespace and cluster in UI

When I log out and log back in to the UI, I need to select the cluster and namespace again (instead of the default ALL). I want to be able to configure which cluster and namespace are shown by default, or at least have the UI remember my last choice.

Python memory profiler start and stop indications

Is your feature request related to a problem? Please describe.
Using the Python memory profiler is tricky. It is unclear when the profiling actually starts and stops, so timing related actions is difficult.

Describe the solution you'd like
I would like an indication of when the profiling starts and when it stops.

Hierarchy for robusta playbooks

  • As of now, all the Robusta playbooks are in a single folder.
  • We intend to add many more playbooks in the future (Reference).
  • We need to set up the hierarchy either based on k8s resources or based on different categories.

Use new custom triggers API to refactor existing playbooks

Today we try to write Robusta actions that can be re-used in multiple scenarios. For example, an action that fetches logs should be usable both in the case of a crashing pod and a prometheus alert.

In the past, we sometimes wrote actions that included triggering logic too. For example, the restart_loop_reporter action is connected to the trigger on_pod_update. This fires very frequently and not only when a pod restarts. Therefore the restart_loop_reporter action has triggering-logic which decides when the action should even do anything.

This breaks the normal separation of triggers and actions. To solve this problem, we introduced the ability to write custom triggers. For example, you can write a crashloop_backoff trigger which inherits from on_pod_update and only fires on pod updates which are due to a crashing pod.

We should rewrite old actions to use the new custom-triggers API. This will lead to more re-usable code.

Actions to rewrite:

  • restart_loop_reporter - should be just a logs_enricher action (already exists) and a new on_restart_loop custom trigger
  • alert_on_hpa_reached_limit - should be an offer_to_resize_hpa action and a new on_hpa_max custom trigger
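
The separation described above can be sketched with plain functions. Everything here is a simplified stand-in rather than Robusta's custom-triggers API: PodUpdate, the restart_threshold parameter, and run_playbook are hypothetical, and on_restart_loop and logs_enricher only borrow the names used in this issue.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PodUpdate:
    """Hypothetical, simplified view of a pod update event."""
    name: str
    old_restart_count: int
    new_restart_count: int


def on_pod_update(event: PodUpdate) -> bool:
    """Base trigger: fires on every pod update."""
    return True


def on_restart_loop(event: PodUpdate, restart_threshold: int = 2) -> bool:
    """Custom trigger: narrows on_pod_update to updates caused by a new restart."""
    return (
        on_pod_update(event)
        and event.new_restart_count > event.old_restart_count
        and event.new_restart_count >= restart_threshold
    )


def logs_enricher(event: PodUpdate) -> str:
    """Reusable action with no triggering logic of its own."""
    return f"logs for {event.name}"


def run_playbook(
    trigger: Callable[[PodUpdate], bool],
    action: Callable[[PodUpdate], str],
    event: PodUpdate,
) -> Optional[str]:
    """Run the action only when the trigger fires."""
    return action(event) if trigger(event) else None


if __name__ == "__main__":
    print(run_playbook(on_restart_loop, logs_enricher, PodUpdate("api", 2, 3)))
    print(run_playbook(on_restart_loop, logs_enricher, PodUpdate("api", 3, 3)))
```

With the filtering moved into the trigger, the same logs_enricher can be paired with a Prometheus alert trigger without modification.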

Allow declaring Robusta automations in Prometheus labels/annotations

This is based on feedback from @tim-sendible and I believe we've heard it at least once before from an existing user.

The general idea is that when you define a PrometheusRule (e.g. using the Prometheus Operator), you should be able to define alongside that rule the automations to run when that alert fires. This way you don't need to define the alert in one place (PrometheusRule) and the automation in another (Robusta's values.yaml).

For simple automations, this should be easy. We can parse an annotation like:

robusta.dev/action: logs_enricher

For actions which take parameters, the syntax might get more complicated.
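
A rough sketch of how such an annotation could be read, assuming the annotations arrive as a plain dict of strings. The robusta.dev/action key is the one proposed above; the robusta.dev/action-params key and its JSON encoding are invented here purely to illustrate one possible parameter syntax.

```python
import json
from typing import Any, Dict, Optional

ACTION_KEY = "robusta.dev/action"  # key proposed in this issue
PARAMS_KEY = "robusta.dev/action-params"  # hypothetical key, invented for this sketch


def automation_from_annotations(annotations: Dict[str, str]) -> Optional[Dict[str, Any]]:
    """Return {action_name: params} in the shape used by rule files, or None if unset."""
    action = annotations.get(ACTION_KEY, "").strip()
    if not action:
        return None
    raw_params = annotations.get(PARAMS_KEY, "").strip()
    params = json.loads(raw_params) if raw_params else {}
    if not isinstance(params, dict):
        raise ValueError(f"{PARAMS_KEY} must hold a JSON object")
    return {action: params}


if __name__ == "__main__":
    print(automation_from_annotations({"robusta.dev/action": "logs_enricher"}))
```

Keeping parameters in a separate JSON-valued annotation is only one option; it leaves the simple case as a single short line at the cost of a second key for anything more complex.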

@tim-sendible thank you for the feedback and feel free to comment if I've misunderstood or missed anything

New Sink: Telegram

Describe the solution you'd like
Telegram is a very popular tool right now for sending alerts, and I think it would be very handy if a Telegram sink were supported.

More details on 'Crashing pod' Slack message

I have a number of containers inside the pod, and when the pod crashes I want to see in the Slack message exactly which container caused it. For example, my workflow is to launch some init containers before the pod starts (fetch secrets, internal dependencies, flyway), and for now I get the same message 'Crashing pod in namespace ', and I need to describe the pod to understand which container failed. I want to look at the message and immediately know - ah, it's a flyway problem, no panic.
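
One way the failing container could be identified, sketched against a pod object in the JSON shape returned by the Kubernetes API (status.initContainerStatuses and status.containerStatuses). The function name failing_containers and the output wording are hypothetical; this is not how Robusta builds its Slack message.

```python
from typing import Any, Dict, List


def failing_containers(pod: Dict[str, Any]) -> List[str]:
    """Describe containers that are crash-looping or last exited with a non-zero code."""
    status = pod.get("status", {})
    findings = []
    groups = (
        ("init container", "initContainerStatuses"),
        ("container", "containerStatuses"),
    )
    for kind, key in groups:
        for cs in status.get(key) or []:
            state = cs.get("state") or {}
            waiting = state.get("waiting") or {}
            # Fall back to the previous termination when the container is waiting to restart.
            terminated = (
                state.get("terminated")
                or (cs.get("lastState") or {}).get("terminated")
                or {}
            )
            crash_looping = waiting.get("reason") == "CrashLoopBackOff"
            bad_exit = terminated.get("exitCode", 0) != 0
            if crash_looping or bad_exit:
                reason = waiting.get("reason") or terminated.get("reason") or "unknown"
                restarts = cs.get("restartCount", 0)
                findings.append(f"{kind} '{cs['name']}' ({reason}, restarts: {restarts})")
    return findings
```

For the workflow described above, a pod whose flyway init container keeps failing would yield a single entry naming flyway, while completed init containers and the still-initializing main container are left out.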

`robusta playbooks push` fails if robusta is installed in namespace other than default and namespace is not specified

Describe the bug
robusta playbooks push fails if robusta is installed in a namespace other than the default namespace and the --namespace flag is not specified. Since it is not recommended to have pods in the default namespace, it is common to install robusta in its own namespace.

To Reproduce
Steps to reproduce the behavior:

  1. Install robusta in a namespace other than the default using helm install robusta robusta/robusta -f ./generated_values.yaml -n robusta --create-namespace
  2. Create a custom playbook repository by following https://docs.robusta.dev/master/developer-guide/actions/playbook-repositories.html
  3. Load the playbook using robusta playbooks push ./my-playbooks-project-root
  4. See an error

Expected behavior
An appropriate error message should be displayed instead of the Python traceback.

Errors

======================================================================
Uploading playbooks code...
======================================================================
No resources found in default namespace.
======================================================================
Runner pod not found.
======================================================================
======================================================================
Fetching logs...
======================================================================
Error from server (NotFound): deployments.apps "robusta-runner" not found
Traceback (most recent call last):
  File "/home/avinashupadhyaya/.local/bin/robusta", line 8, in <module>
    sys.exit(app())
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/typer/main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/typer/main.py", line 497, in wrapper
    return callback(**use_params)  # type: ignore
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/robusta/cli/playbooks_cmd.py", line 84, in push
    return
  File "/usr/lib/python3.9/contextlib.py", line 124, in __exit__
    next(self.gen)
  File "/home/avinashupadhyaya/.local/lib/python3.9/site-packages/robusta/cli/utils.py", line 93, in fetch_runner_logs
    subprocess.check_call(
  File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'kubectl logs  deployment/robusta-runner -c runner --since=1s' returned non-zero exit status 1.

Robusta version

robusta version
version 0.8.26

Additional context

Sorry, I opened the issue without checking for the namespace flag. I found the namespace flag but would like appropriate error messages to be displayed if possible.
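
A small sketch of the kind of handling requested, assuming the failure surfaces as subprocess.CalledProcessError as in the traceback above. The helper name run_or_explain and the wording of the hint are hypothetical; this is not the robusta-cli implementation.

```python
import subprocess
import sys
from typing import List, Optional


def run_or_explain(cmd: List[str], namespace: Optional[str]) -> bool:
    """Run cmd; on failure print a short hint instead of letting a traceback escape."""
    try:
        subprocess.check_call(cmd)
        return True
    except subprocess.CalledProcessError as exc:
        if namespace:
            searched = f"namespace '{namespace}'"
        else:
            searched = "the current kubectl context's namespace"
        print(
            f"Command failed with exit status {exc.returncode}. "
            f"Looked in {searched}; if Robusta is installed elsewhere, "
            "pass --namespace.",
            file=sys.stderr,
        )
        return False
    except FileNotFoundError:
        # Raised when the executable itself (for example kubectl) is not on PATH.
        print(f"Could not find executable: {cmd[0]}", file=sys.stderr)
        return False


if __name__ == "__main__":
    ok = run_or_explain(
        ["kubectl", "logs", "deployment/robusta-runner", "-c", "runner", "--since=1s"],
        namespace=None,
    )
    sys.exit(0 if ok else 1)
```

Returning a boolean rather than re-raising lets the caller decide the exit code while the user sees one readable line that points at the --namespace flag.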

Initiate the golang troubleshooting guide

Hi everyone, I am new here as a member and would love to contribute to the troubleshooting section. As far as I can see, the Robusta docs have Python and Java guides, each with its own SDK.

I would like to create an SDK for Golang so that the documentation and Robusta's features are further enhanced. Can you provide any guidance on how to build it and on how the Python and Java SDKs work? I want to learn and improve my skills along the way. Looking forward to hearing from the community.

Add Mattermost support in sinks

Is your feature request related to a problem? Please describe.
No

Describe the solution you'd like
Add Mattermost support in sinks.

Describe alternatives you've considered
N/A

Additional context
N/A

Error installing Helm chart: Error: create: failed to create: Request entity too large: limit is 3145728

Describe the bug
Python 3.10.1
Error installing Helm chart: Error: create: failed to create: Request entity too large: limit is 3145728

To Reproduce
Steps to reproduce the behavior:

  1. cd ./helm/robusta
  2. pip install -U robusta-cli
  3. robusta gen-config
  4. helm upgrade -i robusta ./ -f ./generated_values.yaml --debug

Expected behavior
Helm chart installation success

Logs

$ helm upgrade -i robusta ./ -f ./generated_values.yaml  --debug
history.go:56: [debug] getting history for release robusta
Release "robusta" does not exist. Installing it now.
install.go:178: [debug] Original chart version: ""
install.go:199: [debug] CHART PATH: /home/nolche/Git/robusta/helm/robusta

walk.go:74: found symbolic link in path: /home/nolche/Git/robusta/helm/robusta/venv/bin/python resolves to /usr/bin/python3.10
walk.go:74: found symbolic link in path: /home/nolche/Git/robusta/helm/robusta/venv/bin/python3 resolves to /usr/bin/python3.10
walk.go:74: found symbolic link in path: /home/nolche/Git/robusta/helm/robusta/venv/bin/python3.10 resolves to /usr/bin/python3.10
client.go:128: [debug] creating 1 resource(s)
install.go:151: [debug] CRD alertmanagerconfigs.monitoring.coreos.com is already present. Skipping.
client.go:128: [debug] creating 1 resource(s)
install.go:151: [debug] CRD alertmanagers.monitoring.coreos.com is already present. Skipping.
client.go:128: [debug] creating 1 resource(s)
install.go:151: [debug] CRD podmonitors.monitoring.coreos.com is already present. Skipping.
client.go:128: [debug] creating 1 resource(s)
install.go:151: [debug] CRD probes.monitoring.coreos.com is already present. Skipping.
client.go:128: [debug] creating 1 resource(s)
install.go:151: [debug] CRD prometheuses.monitoring.coreos.com is already present. Skipping.
client.go:128: [debug] creating 1 resource(s)
install.go:151: [debug] CRD prometheusrules.monitoring.coreos.com is already present. Skipping.
client.go:128: [debug] creating 1 resource(s)
install.go:151: [debug] CRD servicemonitors.monitoring.coreos.com is already present. Skipping.
client.go:128: [debug] creating 1 resource(s)
install.go:151: [debug] CRD thanosrulers.monitoring.coreos.com is already present. Skipping.
Error: create: failed to create: Request entity too large: limit is 3145728
helm.go:88: [debug] Request entity too large: limit is 3145728
create: failed to create
helm.sh/helm/v3/pkg/storage/driver.(*Secrets).Create
        helm.sh/helm/v3/pkg/storage/driver/secrets.go:164
helm.sh/helm/v3/pkg/storage.(*Storage).Create
        helm.sh/helm/v3/pkg/storage/storage.go:69
helm.sh/helm/v3/pkg/action.(*Install).RunWithContext
        helm.sh/helm/v3/pkg/action/install.go:340
main.runInstall
        helm.sh/helm/v3/cmd/helm/install.go:267
main.newUpgradeCmd.func2
        helm.sh/helm/v3/cmd/helm/upgrade.go:124
github.com/spf13/cobra.(*Command).execute
        github.com/spf13/cobra@v1.2.1/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
        github.com/spf13/cobra@v1.2.1/command.go:974
github.com/spf13/cobra.(*Command).Execute
        github.com/spf13/cobra@v1.2.1/command.go:902
main.main
        helm.sh/helm/v3/cmd/helm/helm.go:87
runtime.main
        runtime/proc.go:255
runtime.goexit
        runtime/asm_amd64.s:1581
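
The walk.go lines in the log show a venv directory inside the chart path, which may be inflating what Helm tries to store; that is a guess from the log, not a confirmed cause. A small diagnostic sketch that totals file sizes per top-level entry of a chart directory follows. The function name is hypothetical, and because Helm encodes the release before storing it, the raw totals only hint at which entries are large and are not directly comparable to the 3145728-byte limit.

```python
import os
from typing import Dict

HELM_LIMIT_BYTES = 3145728  # the limit quoted in the error message


def size_by_top_level_entry(chart_dir: str) -> Dict[str, int]:
    """Total the bytes of regular files under each top-level entry of chart_dir."""
    totals: Dict[str, int] = {}
    for root, _dirs, files in os.walk(chart_dir):
        rel = os.path.relpath(root, chart_dir)
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue  # skip symlinks such as venv/bin/python
            totals[top] = totals.get(top, 0) + os.path.getsize(path)
    return totals


if __name__ == "__main__":
    totals = size_by_top_level_entry(".")
    for entry, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        marker = "  <-- larger than the limit on its own" if size > HELM_LIMIT_BYTES else ""
        print(f"{size:>12}  {entry}{marker}")
```

If an entry such as venv turns out to dominate, excluding it through .helmignore or moving it out of the chart directory may be worth trying.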
