GithubHelp home page GithubHelp logo

k8_networking_health's Introduction

K8 Networking Health

KNH is a tool to check the networking health of your kubernetes cluster.

Overview

KNH is a kubernetes networking health checker based on the bully alogrithm. It is meant to be deployed as a kubernetes daemonset or a deployment. Once deployed through a leader election mechanism a coordinator is elected and the coordinator is responsible for communicating with other KNH pods in the cluster. In case coordinator fails to communicate with the pod a webhook is triggered to indicate the failure of the peer. In case of failure of coordinator through election mechanism a new coordinator is elected and the process will continue.

Status of all the nodes on which KNH pods are spawned will also be posted to a webhook

How it works ?

KNH is based on the modified version of bully algorithm implimentation found here TimTosi/bully-algorithm

Each KNH pods hosts a bully which exposes two http endpoint /ping to respond to health check and /coordinator to set coordinator

Using the client-go package information about the IP's of other KNH pods are obtained and added as peers. IP's are updated after each endpoint refresh which may occur due to re-deployment or KNH pods being scaled up or down

In case coordinator fails to communicate with the pod a post request will be made to configured endpoint with the following payload

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "peer": {
      "type": "object",
      "properties": {
        "id": {
          "type": "string"
        },
        "address": {
          "type": "string"
        },
        "alive": {
          "type": "boolean"
        },
        "node_name": {
          "type": "string"
        }
      },
      "required": [
        "id",
        "address",
        "alive",
        "node_name"
      ]
    },
    "coordinator": {
      "type": "string"
    },
    "coordinator_address": {
      "type": "string"
    }
  },
  "required": [
    "peer",
    "coordinator",
    "coordinator_address"
  ]
}
{"peer":{"id":"my-node1@103205","address":"10.32.0.5:8080","alive":false,"node_name":"my-node1"},"coordinator":"my-node@103209","coordinator_address":"10.32.0.9:8080"}

This webhook will be triggered once when the pod which was responsive becomes unresponsive, if the pod becomes responsive again it will be added back to list of peers with alive as true but no webhooks will be triggered

Satus of all the peers will also be posted after each leader election to configured endpoints with the following payload

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "peer_map": {
      "type": "array",
      "items": [
        {
          "type": "object",
          "properties": {
            "id": {
              "type": "string"
            },
            "address": {
              "type": "string"
            },
            "alive": {
              "type": "boolean"
            },
            "node_name": {
              "type": "string"
            }
          },
          "required": [
            "id",
            "address",
            "alive",
            "node_name"
          ]
        }
      ]
    },
    "coordinator": {
      "type": "string"
    },
    "coordinator_address": {
      "type": "string"
    }
  },
  "required": [
    "peer_map",
    "coordinator",
    "coordinator_address"
  ]
}
{"peer_map":[{"id":"my-node1@103205","address":"10.32.0.5:8080","alive":true,"node_name":"my-node1"}],"coordinator":"my-node0@103209","coordinator_address":"10.32.0.9:8080"}

Installation and using it in a K8 Cluster

  • Clone the repo
  • Stat to the project root and build the docker image by running docker build -t knh:v1.0 .
  • Create a sample deamonset as show here daemonset.yaml
  • Run kubectl apply -f daemonset.yaml
  • Verify all the pods are up and running

Configuration

Time units are in seconds

Name Required Default Description
POD_IP Yes - IP of the pod that has to be passed to container as environment variable
MY_NODE_NAME Yes - Node Name on which the pod has spawned, this is used to generate id for a the bully and while making post request to cluster health end point url's
SERVICE_NAME Yes - Service name targeting the Daemonset/Deployment
NAMESPACE Yes - Namespace of the Daemonset/Deployment
CONNECT_MAX_RETRIES No 5 Each pod ties to establish connection to peer, this environment variable controls how many times pod has to retry before giving up on establishing connection
CONNECT_COOLDOWN_PERIOD No 2 Time between each connection retries attempts
SEND_MAX_RETRIES No 5 Coordinator sends election to other pods, this this environment variable controls how many times pod has to retry sending election to other pod before giving up in case it fails to send/
Note : Pods to which coordinator fails to send election will be marked as dead
SEND_COOLDOWN_PERIOD No 1 Time between each send retry attempts
ELECTION_COOLDOWN_PERIOD No 15 Time to sleep after each election
SANITY_CHECK_URL No - Coordinator will query the given url to determine if its healthy or not, this is done to ensure that in case of coordinator failure multiple post request wont be send cluster health endpoints
PEER_STATUS_URL No - URL to send the dead peer status as post request
CLUSTER_STATUS_URL No - URL to send the cluster status as post request
LOG_LEVEL No info Following levels can be set

panic
fatal
error
warn
info
debug
trace

k8_networking_health's People

Contributors

glendsoza avatar glen-uc avatar

Watchers

 avatar

Forkers

prakx87

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.