erda-project / kubeprober

Large-scale Kubernetes cluster diagnostic tool.

License: Apache License 2.0

kubeprober's Introduction

KubeProber

What is KubeProber?

KubeProber is a diagnostic tool designed for large-scale Kubernetes clusters. It runs diagnostic items in a Kubernetes cluster to prove that the cluster's functions are working normally. KubeProber has the following characteristics:

  • Large-scale cluster support: supports multi-cluster management; the relationship between clusters and diagnostic items is configured on the management side, and the diagnostic results of all clusters can be viewed in one place;
  • Cloud native: the core logic is implemented as an operator, providing complete Kubernetes API compatibility;
  • Extensible: supports user-defined diagnostic items.

Unlike a monitoring system, KubeProber proves that the cluster's functions are normal from a diagnostic perspective. Monitoring is a forward link and cannot cover every scenario in the system: even if the monitoring data of every environment is normal, that does not prove the system is 100% normal. A tool is therefore needed to prove the availability of the system from the reverse direction and to discover unavailable points in the cluster before users do, such as:

  • Whether all nodes in the cluster can be scheduled, and whether there are special taints, etc.;
  • Whether a pod can be created and destroyed normally, verifying the entire link from Kubernetes and kubelet to Docker;
  • Create a service and test connectivity to verify whether the kube-proxy link is normal;
  • Resolve an internal or external domain name to verify whether CoreDNS is working properly;
  • Visit an ingress domain name to verify whether the ingress component in the cluster is working properly;
  • Create and delete a namespace to verify whether the related webhook is working properly;
  • Perform operations such as put/get/delete on etcd to verify whether etcd is running normally;
  • Run operations through mysql-client to verify that MySQL is working normally;
  • Simulate a user logging in and operating the business system to verify whether the main business process works smoothly;
  • Check whether the certificates of each environment have expired;
  • Check whether cloud resources have expired;
  • ... more!

Architecture

probe-master

The operator running on the management cluster. It maintains two CRDs: Cluster, which is used to manage the managed clusters, and Probe, which is used to manage the built-in and user-written diagnostic items. By watching these two CRDs, probe-master pushes the latest diagnostic configuration to the managed clusters. It also provides an interface for viewing the diagnostic results of the managed clusters.

probe-agent

The operator running on the managed cluster. It maintains two CRDs. One is Probe, which is identical to the one on probe-master; probe-agent executes the cluster's diagnostic items according to the Probe definition. The other is ProbeStatus, which records the diagnostic results of each Probe. Users can view the cluster's diagnostic results by running kubectl get probestatus in the managed cluster.

Getting started

Get started with this doc.

To start developing kubeprober

You can build and run probe-master and probe-agent locally. Please make sure that ~/.kube/config can access the Kubernetes cluster.

install crd && webhook resources

make dev

run probe-master

APP=probe-master make run

run probe-tunnel

# export the env values obtained from the created Cluster resource
export PROBE_MASTER_ADDR="http://127.0.0.1:8088"
export CLUSTER_NAME="moon"
export SECRET_KEY="a944499f-97f3-4986-89fa-bc7dfc7e009a"

# run probe-tunnel
APP=probe-tunnel make run

run probe-agent

APP=probe-agent make run

probe-agent parameters precedence order and format

# precedence order and format; each item takes precedence over the item below it (e.g. --cluster-name)
flag         --cluster-name
env          CLUSTER_NAME
config       cluster_name
default
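
The precedence order above can be sketched as a small lookup that returns the first value that is set. This is an illustrative sketch only, not the probe-agent implementation; the function name resolve is hypothetical, and treating an empty string as "not set" is an assumption made to keep the example short.

```go
package main

import "fmt"

// resolve is a hypothetical helper (not taken from probe-agent).
// It returns the first non-empty value in precedence order:
// command-line flag, environment variable, config file, then default.
// Assumption: an empty string means "not set".
func resolve(flagVal, envVal, configVal, defaultVal string) string {
	for _, v := range []string{flagVal, envVal, configVal} {
		if v != "" {
			return v
		}
	}
	return defaultVal
}

func main() {
	// --cluster-name not passed, CLUSTER_NAME=moon, cluster_name: earth
	fmt.Println(resolve("", "moon", "earth", "default-cluster")) // moon
}
```

With this ordering, passing --cluster-name on the command line overrides both CLUSTER_NAME and the cluster_name config entry.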

build binary file

APP=probe-master make build
APP=probe-agent make build

build image

# build with default version: latest
# output image format: kubeprober/probe-master:latest
APP=probe-master make docker-build

# build with custom version: v0.0.1
# output image format: kubeprober/probe-master:v0.0.1
APP=probe-master V=v0.0.1 make docker-build

# build with default version: latest
APP=probe-agent make docker-build

# push with default version: latest
APP=probe-agent make docker-push

# build & push
APP=probe-agent make docker-build-push
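
The image naming described in the comments above (kubeprober/<app>:<version>, with latest as the default version) can be expressed as a small function. This sketch only mirrors the naming stated in those comments; it is not the Makefile logic, and the function name imageName is hypothetical.

```go
package main

import "fmt"

// imageName is a hypothetical helper that mirrors the image format stated
// in the build comments: kubeprober/<app>:<version>.
// An empty version falls back to "latest".
func imageName(app, version string) string {
	if version == "" {
		version = "latest"
	}
	return fmt.Sprintf("kubeprober/%s:%s", app, version)
}

func main() {
	fmt.Println(imageName("probe-master", ""))       // kubeprober/probe-master:latest
	fmt.Println(imageName("probe-master", "v0.0.1")) // kubeprober/probe-master:v0.0.1
}
```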

Write your prober

custom probes

Contributing

Contributions are always welcome. Please refer to Contributing to KubeProber for details.

Contact Us

If you have any questions, please feel free to contact us.

License

KubeProber is under the Apache 2.0 license. See the LICENSE file for details.

kubeprober's People

Contributors

ai-run, cxiongwei, fish-pro, harverywxu, iutx, jferic, luobily, njxz, qinlaodeqingwa, sixther-dc

kubeprober's Issues

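The standalone example does not run correctly

What happened:

The standalone example does not run correctly. Symptoms:

  1. probe-agent fails at startup with an error that configmap-probeagent cannot be found
  2. The probe-agent log reports forbidden permission errors for list and other operations on various resources
  3. The job of the Prober example reports that configmap-extra-config cannot be found

What you expected to happen:

The standalone example runs correctly.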

How to reproduce it (as minimally and precisely as possible):

  1. Follow the document https://docs.erda.cloud/1.5/manual/eco-tools/kubeprober/best-practices/standalone_kubeprober.html; the prober-agent container fails to start;
  2. Follow the document https://docs.erda.cloud/1.5/manual/eco-tools/kubeprober/guides/first_prober.html to run the prober example; the job does not run correctly.

Anything else we need to know?:

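I have located the causes myself and made fixes:
dotDuck@ade3b0d#diff-ebef744877d88253a7e2a26f413155959097fe3659045131b32418fb9af80937
I can open a pull request if that would be helpful.

Cause of problem 1: declaration issues in probe-agent-standalone.yaml

  • The declaration of the configmap probeagent is missing (symptom 1)
  • The serviceaccount kubeprober-worker is declared but never used; the serviceaccount bound by kubeprober-worker-rolebinding should be changed to kubeprober (symptom 2)
  • The probe-agent image should be updated to the latest version on Docker Hub (this can also be ignored)
  • The logs still report other configmaps as not found, such as dice-cluster-info, dice-tools-info, and dice-addon-info, but this does not affect overall operation

Cause of problem 2: declaration issue in prober-demo-example

  • The declaration of the configmap extra-config is missing, but there is no reference for its contents and no suitable place to add it, so I created the configmap manually as a temporary workaround (symptom 3)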

Environment:

  • Erda version: none
  • Kubernetes version (use kubectl version): Kind kubernetes v1.19.11

reconciler group": "kubeprober.erda.cloud", "reconciler kind": "Probe", "name": "probe-test01", "namespace": "kubeprober", "error": "Job.batch \"probe-test01\" is invalid: [spec.template.spec.containers: Required value, spec.template.spec.restartPolicy: Unsupported value: \"Always\": supported values: \"OnFailure\", \"Never\"]"

probe yaml

apiVersion: kubeprober.erda.cloud/v1
kind: Probe
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"kubeprober.erda.cloud/v1","kind":"Probe","metadata":{"annotations":{},"name":"probe-test01","namespace":"kubeprober"},"spec":{"probeList":[{"name":"probe-test01","spec":{"containers":[{"image":"kubeprober/demo-error:v0.0.1","name":"demo-error","resources":{"requests":{"cpu":"10m","memory":"50Mi"}}}],"restartPolicy":"Never"}}]}}
  creationTimestamp: "2021-08-18T09:24:26Z"
  generation: 1
  name: probe-test01
  namespace: kubeprober
  resourceVersion: "1255475"
  selfLink: /apis/kubeprober.erda.cloud/v1/namespaces/kubeprober/probes/probe-test01
  uid: 1c6a2f23-a68c-4283-9869-0c2ed3e891c7
spec:
  policy: {}
  probeList:
  - name: probe-test01
    spec:
      containers:
      - image: kubeprober/demo-error:v0.0.1
        name: demo-error
        resources:
          requests:
            cpu: 10m
            memory: 50Mi
      restartPolicy: Never
status:
  md5: 0e69f266c6e6d360c7e4130a4b4e6ff4

probe-agent error (screenshot not preserved)

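How to visualize the diagnostic results

The project documentation does not describe how to visualize the diagnostic results.

It also does not describe how to integrate with Prometheus: neither the configuration nor the exposed metrics are documented. Could the documentation be improved, or the relevant visualization features be added? Thanks.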

I can't get the probestatus.

I have deployed a test case, but I can't get the probestatus.

➜ samples git:(master) ✗ kubectl get cluster
NAME   VERSION   NODECOUNT   PROBENAMESPACES   HEARTBEATTIME         SECRETKEY                              PROBE   AGE
moon   v1.19.0   1           kubeprober        2021-08-18 08:53:25   4012ecac-08a3-4a6c-a206-b9447a13d987           33m
➜ samples git:(master) ✗ kubectl get probe
NAME                   RUNINTERVAL   AGE
probe-cron-link-test   2             19m
probe-link-test                      13m
probe-link-test1                     2m47s
➜ samples git:(master) ✗ kubectl get probestatus -A
No resources found
➜ samples git:(master) ✗ kubectl get cluster --show-labels
NAME   VERSION   NODECOUNT   PROBENAMESPACES   HEARTBEATTIME         SECRETKEY                              PROBE   AGE   LABELS
moon   v1.19.0   1           kubeprober        2021-08-18 08:53:50   4012ecac-08a3-4a6c-a206-b9447a13d987           33m   probe/probe-cron-link-test=true,probe/probe-cron-sample=true,probe/probe-link-test1=true,probe/probe-link-test=true

The webhook svc is accidentally deleted when executing the agent's undeploy

What happened:

The webhook svc was accidentally deleted when executing the agent's undeploy.

How to reproduce it (as minimally and precisely as possible):

Install master and agent according to the install doc, then execute APP=probe-agent make undeploy

Environment:

  • Erda version: master branch code
  • Kubernetes version (use kubectl version): 1.19.11
  • OS (e.g: cat /etc/os-release): macOS
