
nataizya-s / kubelet-monitor


A script to allow the kubelet on AWS EKS worker nodes to be monitored and automatically restarted if failing.

Shell 100.00%


kubelet-monitor

A script to allow the kubelet on AWS EKS worker nodes to be monitored and automatically restarted if it fails. The kubelet's systemd service already defines "Restart=on-failure", which should restart the kubelet when it fails. This script adds a further layer of redundancy to help ensure that the kubelet service always recovers.

If the kubelet still fails to restart after 5 attempts, the logs on the node are collected and the node is automatically terminated.
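The restart-and-retry behaviour described above can be sketched in Bash as follows. This is a minimal illustration under stated assumptions, not the repository's healthchecker.sh: the function names (kubelet_healthy, restart_kubelet, retry_restart) and the RETRY_DELAY_SECONDS wait are hypothetical. The check and restart steps are passed in as function names so the retry loop can be exercised without systemd.

```shell
#!/usr/bin/env bash
# Minimal sketch of the monitor's restart logic (hypothetical names; not the
# repository's healthchecker.sh).

MAX_ATTEMPTS=5                                   # the README's "5 attempts"
RETRY_DELAY_SECONDS=${RETRY_DELAY_SECONDS:-30}   # assumed wait after each restart

# Succeeds only if systemd reports the kubelet unit as active.
kubelet_healthy() {
  systemctl is-active --quiet kubelet
}

# Asks systemd to restart the kubelet unit.
restart_kubelet() {
  systemctl restart kubelet
}

# retry_restart CHECK_FN RESTART_FN
# Returns 0 as soon as CHECK_FN succeeds. Otherwise calls RESTART_FN up to
# MAX_ATTEMPTS times, re-checking after each attempt, and returns 1 if the
# service never becomes healthy.
retry_restart() {
  local check_fn=$1 restart_fn=$2 attempt
  "$check_fn" && return 0
  for ((attempt = 1; attempt <= MAX_ATTEMPTS; attempt++)); do
    "$restart_fn" || true   # a failed restart still counts as an attempt
    sleep "$RETRY_DELAY_SECONDS"
    "$check_fn" && return 0
  done
  return 1
}
```

On a node, the script body might run retry_restart kubelet_healthy restart_kubelet and, only if that returns non-zero, move on to collecting logs and terminating the instance.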

The logs on the worker node are collected using the script here.
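Once the retries are exhausted, the node's logs are pushed to S3 and the instance is terminated. The sketch below shows one way to do that final step. It assumes the log archive has already been produced by the log collection script, that the AWS CLI is installed on the node, that IMDSv2 is reachable, and that the bucket is in the same region as the node. imds_get, log_object_key, and upload_logs_and_terminate are hypothetical helpers; s3_bucket mirrors the variable named in the Resources section, and its value here is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical escalation step: upload an already-collected log archive to S3,
# then terminate this instance. Assumes the AWS CLI, IMDSv2, and a bucket in
# the node's own region.

s3_bucket="example-log-bucket"   # placeholder: set to your existing bucket

# imds_get PATH: reads one instance-metadata path using an IMDSv2 session token.
imds_get() {
  local token
  token=$(curl -sf -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 300") || return 1
  curl -sf -H "X-aws-ec2-metadata-token: ${token}" \
    "http://169.254.169.254/latest/meta-data/$1"
}

# log_object_key INSTANCE_ID ARCHIVE: S3 key of the form <instance-id>/<file name>.
log_object_key() {
  printf '%s/%s\n' "$1" "$(basename "$2")"
}

# upload_logs_and_terminate ARCHIVE
upload_logs_and_terminate() {
  local archive=$1 instance_id region
  instance_id=$(imds_get instance-id) || return 1
  region=$(imds_get placement/region) || return 1
  # A failed upload is reported but does not block termination.
  aws s3 cp "$archive" "s3://${s3_bucket}/$(log_object_key "$instance_id" "$archive")" \
    --region "$region" || echo "log upload to s3://${s3_bucket} failed" >&2
  aws ec2 terminate-instances --instance-ids "$instance_id" --region "$region"
}
```

Keying objects by instance ID keeps logs from different nodes in the node group from overwriting each other in the shared bucket.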

This script can be added to the user data of the worker nodes (for example, via the launch template or launch configuration).

Prerequisites

IAM Permissions on Node Instance Role

IAM Action | Reason
--- | ---
ec2:TerminateInstances | Allows the script to terminate the instance when the kubelet fails to start.
s3:PutObject | Allows the script to push the collected logs from the instance to the specified S3 bucket.
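The two permissions in the table could be granted to the node instance role with an inline IAM policy along these lines. This is a sketch: node_policy_document is a hypothetical helper, the bucket name is a placeholder, and ec2:TerminateInstances is granted on all instances ("Resource": "*"), which you may want to narrow for your account.

```shell
#!/usr/bin/env bash
# Sketch: print an inline IAM policy granting the two actions from the table.
# node_policy_document is a hypothetical helper; BUCKET is your log bucket.

# node_policy_document BUCKET
node_policy_document() {
  local bucket=$1
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:TerminateInstances",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
}
```

The output could then be attached with aws iam put-role-policy --role-name <node-instance-role> --policy-name kubelet-monitor --policy-document "$(node_policy_document my-log-bucket)", where the role name, policy name, and bucket are placeholders.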

Resources

Resource | Description
--- | ---
S3 Bucket | Set the "s3_bucket" variable in the healthchecker.sh script to the name of your existing bucket, to which the logs will be pushed. Note that the bucket policy must also allow the worker node instance role.
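A bucket policy statement that lets the worker node instance role upload logs might look like the following sketch. bucket_policy_document is a hypothetical helper, and both arguments (bucket name and role ARN) are placeholders for your own values.

```shell
#!/usr/bin/env bash
# Sketch: print a bucket policy allowing the node instance role to upload logs.
# bucket_policy_document is a hypothetical helper; arguments are placeholders.

# bucket_policy_document BUCKET ROLE_ARN
bucket_policy_document() {
  local bucket=$1 role_arn=$2
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowNodeRoleLogUpload",
      "Effect": "Allow",
      "Principal": { "AWS": "${role_arn}" },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
}
```

It could be applied with aws s3api put-bucket-policy --bucket my-log-bucket --policy "$(bucket_policy_document my-log-bucket arn:aws:iam::111122223333:role/my-node-role)". Note that put-bucket-policy replaces any existing bucket policy, so merge this statement into the current policy if the bucket already has one.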

How to make the script work

Once the script is added to the user data, you also need a cron job that runs it on a regular schedule. For example, the following cron entry runs the healthchecker.sh script every 5 minutes:

    */5 * * * * /var/healthchecker.sh

This cron entry must also be added to the user data so that it is configured on every worker node in the node group. Steps on how to set up a cron job can be found here.
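One way to register the job from user data is to write a system cron file instead of editing a user crontab. The sketch below assumes the script has already been written to /var/healthchecker.sh earlier in the same user data and that the AMI runs a cron daemon that reads /etc/cron.d (verify both for your AMI). install_cron_entry and the kubelet-monitor file name are hypothetical. Unlike a user crontab entry, a line in /etc/cron.d includes a user field, here root.

```shell
#!/usr/bin/env bash
# Sketch: register healthchecker.sh with cron from EC2 user data.

# install_cron_entry SCRIPT_PATH [CRON_DIR]
# Makes SCRIPT_PATH executable and writes a system crontab file that runs it
# as root every 5 minutes. CRON_DIR defaults to /etc/cron.d.
install_cron_entry() {
  local script_path=$1 cron_dir=${2:-/etc/cron.d}
  local cron_file="${cron_dir}/kubelet-monitor"   # hypothetical file name
  chmod +x "$script_path" || return 1
  # /etc/cron.d entries take a user field (root) between schedule and command.
  printf '*/5 * * * * root %s\n' "$script_path" > "$cron_file" || return 1
  chmod 0644 "$cron_file"
}
```

In user data this would be called as install_cron_entry /var/healthchecker.sh. Using an absolute path matters here, because cron does not run jobs from the directory the script lives in.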

Things to note

Node Draining

The script does not drain the node before terminating it. Pods that are running on the node but are not managed by a controller (for example, a Deployment or ReplicaSet) will not be rescheduled elsewhere, so there may be downtime.

Script Changes

Feel free to modify the script to make it more robust and to follow best practices. It is only intended as a base for automated management of worker nodes, helping keep them highly available and self-healing.
