
alipay / container-observability-service


Simplify Kubernetes application operations with one-stop observability services, including resource delivery SLOs, root cause diagnosis, container lifecycle tracing, and more.

License: Apache License 2.0

Makefile 0.08% Go 88.73% Smarty 0.05% CSS 1.07% JavaScript 3.35% Dockerfile 0.27% TypeScript 6.41% Shell 0.04%
container diagnose kubernetes observability slo tracing

container-observability-service's Introduction

Lunettes - Container Lifecycle Observability Service

Observe Your Stack, Energize Your APP

Apache-2.0 License PRs welcome!

Chinese version

🌾 Introduction

Kubernetes is widely used for building container-as-a-service platforms, but its numerous autonomous components working together to drive the container delivery process can create significant complexity for developers and SREs.

Lunettes is a comprehensive observability service that uses signals such as apiserver requests and events to build container lifecycle SLIs/SLOs, diagnosis, and tracing services. These services let developers and SREs monitor and manage their workloads on Kubernetes based on measured data.

By making troubleshooting and performance optimization more approachable, Lunettes helps improve the overall quality of services on Kubernetes.

🔥 Key features

Resource Delivery SLIs/SLOs:

Lunettes calculates the time taken by the infrastructure to attempt to deliver a container (Pod on Kubernetes) and defines this metric as the container delivery SLI. Based on this metric, Lunettes recognizes the time costs associated with different container lifecycle stages, including scheduling, image pulling, IP allocation, and container starting, thereby enabling the calculation of total infrastructure time consumption. The container delivery SLO, on the other hand, is defined based on container specifications.

Lunettes' definition of the container delivery SLI/SLO enables service owners to evaluate and improve the quality of the platform's resource delivery process based on measured data.
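As a rough illustration of how such an SLI could be derived, the sketch below sums per-stage durations and compares the total with an SLO threshold. The stage names follow the list above; the function names, the sample timings, and the 90-second threshold are assumptions made for this example and are not taken from Lunettes' actual data model.

```python
from datetime import timedelta

# Stage names follow the description above; all timings below are made-up sample data.
STAGES = ("scheduling", "image_pulling", "ip_allocation", "container_starting")


def delivery_sli(stage_durations):
    """Sum the durations of the known lifecycle stages (missing stages count as zero)."""
    return sum((stage_durations.get(stage, timedelta()) for stage in STAGES), timedelta())


def slowest_stage(stage_durations):
    """Name of the stage that consumed the most time."""
    return max(STAGES, key=lambda stage: stage_durations.get(stage, timedelta()))


def meets_slo(stage_durations, slo):
    """True when the total delivery time stays within the SLO threshold."""
    return delivery_sli(stage_durations) <= slo


sample = {
    "scheduling": timedelta(seconds=2),
    "image_pulling": timedelta(seconds=40),
    "ip_allocation": timedelta(seconds=3),
    "container_starting": timedelta(seconds=10),
}
slo = timedelta(seconds=90)  # hypothetical threshold for this pod specification

print(delivery_sli(sample), slowest_stage(sample), meets_slo(sample, slo))
```

Breaking the total down by stage is what makes the SLI actionable: a violated SLO can be attributed to the stage that dominated the delivery time.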

ContainerDeliverySli/Slo

Container Lifecycle Diagnose Service

To identify the root cause of an issue, Lunettes analyzes observability signals throughout the container lifecycle and assigns an error code that covers common problems such as excessive resource consumption and configuration errors.
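One simple way to picture this kind of diagnosis is a rule table that maps observed signals to error codes. The sketch below is an assumption-laden illustration: the event reasons are standard Kubernetes event reasons, but the error code names and the first-match rule are hypothetical and do not reflect Lunettes' real code list or matching logic.

```python
# Hypothetical rule table: Kubernetes event reason -> illustrative error code.
RULES = {
    "FailedScheduling": "SCHEDULE_FAILED",
    "ErrImagePull": "IMAGE_PULL_FAILED",
    "ImagePullBackOff": "IMAGE_PULL_FAILED",
    "FailedCreatePodSandBox": "SANDBOX_CREATE_FAILED",
}


def diagnose(event_reasons):
    """Return the error code for the first event reason that matches a rule.

    event_reasons is expected in chronological order, so the earliest
    failure signal wins and later symptoms of the same problem are ignored.
    """
    for reason in event_reasons:
        code = RULES.get(reason)
        if code is not None:
            return code
    return "UNKNOWN"


print(diagnose(["Scheduled", "Pulling", "ErrImagePull", "ImagePullBackOff"]))
```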


Container Lifecycle Tracing Service

By recognizing the start and end of each container lifecycle stage, Lunettes is able to construct a tracing structure that follows OpenTelemetry standards.


🎬 Getting Started

Quick Start

To get started with kind quickly, see this guide.

Deploy

Step1: Bootstrap a Kubernetes cluster with Kubeadm/Kind.

The following method will expose the service through NodePort. Please make sure that your current operating environment can access the Kubernetes nodeIP.

Step2: Install Lunettes with Helm

Note: Beginning with Helm v3.8.0, OCI support is enabled by default and has graduated from experimental to general availability, so we recommend using Helm v3.8.0 or later.

# Use NodePort.
# Setting enableAuditApiserver to true enables apiserver auditing for you.
# Please note that this process will restart the apiserver.
# (Comments are kept above the command: a comment line between the
# backslash-continued lines would cut the command short.)
helm install lunettes oci://registry-1.docker.io/lunettes/lunettes-chart --version [version] \
  --set enableAuditApiserver=true \
  --set grafanaType=NodePort \
  --set jaegerType=NodePort

See the available versions.

Step3: Find the endpoint of Lunettes dashboard service

export LUNETTES_IP=node_ip
export GRAFANA_NODEPORT=$(kubectl -n lunettes get svc grafana -o jsonpath='{.spec.ports[0].nodePort}')
export JAEGER_NODEPORT=$(kubectl -n lunettes get svc jaeger-collector -o jsonpath='{.spec.ports[0].nodePort}')

Open http://[LUNETTES_IP]:[GRAFANA_NODEPORT] in your browser to access the debugpod or debugslo endpoint. The default username and password are admin:admin.

Open http://[LUNETTES_IP]:[JAEGER_NODEPORT]/search? in your browser to access the trace endpoint.

🛠 Configurations

Lunettes is highly configurable. Below we give some examples of how you can adjust resource delivery SLO and container lifecycle tracing to different scenarios with simple configurations.

Resource Delivery SLO configuration

{
    "UserOnlineConfigMap":{
        "test-ns-one":"1m30s",
        "test-ns-two":"6m"
    },
    "IgnoredNamespaceForAudit":[
        "app-ns"
    ],
    "IgnoreDeleteReasonNamespace":[
        "test-ns-three",
        "test-ns-four"
    ]
}
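To make the configuration above more concrete, here is a minimal sketch of how the per-namespace values might be read. It assumes that UserOnlineConfigMap maps a namespace to its delivery SLO and that the values use Go-style duration strings limited to the h, m, and s units; both are inferences from the sample, and the helper names and the 2-minute default are hypothetical.

```python
import json
import re
from datetime import timedelta

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)([hms])")


def parse_duration(text):
    """Parse a Go-style duration such as '1m30s' (only h, m and s are supported here)."""
    pos, seconds = 0, 0.0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:  # unexpected characters between tokens
            raise ValueError(f"invalid duration: {text!r}")
        seconds += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):  # empty input or trailing garbage
        raise ValueError(f"invalid duration: {text!r}")
    return timedelta(seconds=seconds)


def slo_for_namespace(config, namespace, default):
    """Look up the namespace-specific SLO, falling back to a default."""
    raw = config.get("UserOnlineConfigMap", {}).get(namespace)
    return parse_duration(raw) if raw else default


config = json.loads("""
{
    "UserOnlineConfigMap": {"test-ns-one": "1m30s", "test-ns-two": "6m"},
    "IgnoredNamespaceForAudit": ["app-ns"],
    "IgnoreDeleteReasonNamespace": ["test-ns-three", "test-ns-four"]
}
""")
default_slo = timedelta(minutes=2)  # hypothetical default, not from the source

print(slo_for_namespace(config, "test-ns-one", default_slo))
```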

Container Lifecycle Tracing configuration

[
  {
    "ObjectRef":{
      "Resource":"pods",
      "Name":"PodSpans",
      "APIVersion":"v1"
    },
    "ActionType":"PodCreate",
    "LifeFlag":{
      "Mode":"start-finish",
      "StartEvent":[
        {
          "Type":"operation",
          "Operation":"pod:create:success"
        }
      ],
      "FinishEvent":[
        {
          "Type":"operation",
          "Operation":"condition:Ready:true"
        }
      ]
    },
    "ExtraProperties":{
      "bizName":{
        "Name":"",
        "ValueRex":"metadata#labels#meta.k8s.com/biz-name",
        "NeedMetric":true
      }
    },
    "Spans":[
      {
        "Name":"default_schedule_span",
        "Type":"default_schedule_span",
        "SpanOwner":"k8s",
        "Mode":"start-finish",
        "StartEvent":[
          {
            "Type":"operation",
            "Operation":"schedule:default-scheduler:entry"
          }
        ],
        "EndEvent":[
          {
            "Type":"operation",
            "Operation":"schedule:binding:success"
          }
        ]
      }
    ]
  }
]
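The span definition above can be read as "open the span at the first matching start operation and close it at the next matching end operation". The sketch below illustrates that reading for the default_schedule_span entry. The timeline format, the build_span helper, and the timestamps are assumptions for illustration; the real matching logic in Lunettes may differ.

```python
from datetime import datetime, timedelta

span_config = {
    "Name": "default_schedule_span",
    "StartEvent": [{"Type": "operation", "Operation": "schedule:default-scheduler:entry"}],
    "EndEvent": [{"Type": "operation", "Operation": "schedule:binding:success"}],
}


def build_span(span_config, timeline):
    """Build one span from a chronological list of (timestamp, operation) pairs.

    Returns None when either the start or the end operation was not observed.
    """
    start_ops = {event["Operation"] for event in span_config["StartEvent"]}
    end_ops = {event["Operation"] for event in span_config["EndEvent"]}

    start = next((ts for ts, op in timeline if op in start_ops), None)
    if start is None:
        return None
    end = next((ts for ts, op in timeline if op in end_ops and ts >= start), None)
    if end is None:
        return None
    return {
        "name": span_config["Name"],
        "start": start,
        "end": end,
        "duration_s": (end - start).total_seconds(),
    }


t0 = datetime(2024, 1, 1, 12, 0, 0)  # made-up sample timestamps
timeline = [
    (t0, "pod:create:success"),
    (t0 + timedelta(seconds=1), "schedule:default-scheduler:entry"),
    (t0 + timedelta(seconds=3), "schedule:binding:success"),
    (t0 + timedelta(seconds=20), "condition:Ready:true"),
]

span = build_span(span_config, timeline)
print(span["name"], span["duration_s"])
```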

📑 Documentation

Please visit docs

💡 Community

For any questions related to Lunettes, please reach us via:

container-observability-service's People

Contributors

beilineili, d3c3mber, gaius-qi, hnhbwlp, larryck, linuzb, liubin, llkcoder, tim-zhang, wuchang0201, zhangtong007


container-observability-service's Issues

Feat: make images download quickly in China

What would you like to be added?

Push the Lunettes image to both Docker Hub and an image registry hosted in China, so that users in China can deploy it quickly.

Why is this change required? What problem does it solve?

Network access to Docker Hub from China is relatively slow. Providing images hosted in China would speed up deployment for users there.

The domain name cannot be modified

There's no existing/similar bug report.

  • I have searched the existing issues

Describe the bug:

The domain name cannot be modified.

Expected behavior:

The domain name should be a configurable item.

Additional

No response

Lunettes version

No response

Kubernetes version

No response

Add grafana summary visualization

What would you like to be added?

Add display of pod status summary in Grafana, allowing users to understand the root cause of pod delivery errors.

Why is this change required? What problem does it solve?

There are a large number of key events in the pod lifecycle, and users who are not familiar with Kubernetes will find it difficult to extract useful information from them. A summary description of errors is needed.

In addition, users are unaware of the SLO timeout period set by Lunettes. We would like to inform users of it before delivery is completed.

low disk watermark

There's no existing/similar bug report.

  • I have searched the existing issues

Describe the bug:

Lunettes does not provide scheduled data cleaning. When the amount of data exceeds the disk watermark, Elasticsearch stops storing new data and Lunettes cannot process new data.


Expected behavior:

Lunettes should provide scheduled cleaning of stored data. For example, when installing Lunettes, users could choose how many days of data to keep, instead of having to clean the data manually.
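A retention policy of this kind usually boils down to selecting the data that is older than the configured number of days. The sketch below shows that selection step only. It assumes daily indices whose names end in a "-YYYY.MM.DD" suffix; the index names, the expired_indices helper, and the 7-day retention are hypothetical, and the actual deletion call against Elasticsearch is deliberately left out.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today, keep_days):
    """Return the dated indices that are older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        _, _, suffix = name.rpartition("-")
        try:
            day = datetime.strptime(suffix, "%Y.%m.%d").date()
        except ValueError:
            continue  # not a dated index: never touch it
        if day < cutoff:
            expired.append(name)
    return expired


# Hypothetical index names, for illustration only.
indices = ["events-2024.01.01", "events-2024.01.05", "events-2024.01.09", "config"]
print(expired_indices(indices, today=date(2024, 1, 10), keep_days=7))
```

Skipping names without a parseable date is a deliberately conservative choice, so that a cleanup job built on this selection cannot remove indices it does not recognize.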

Additional

No response

Lunettes version

v1.0.0

Kubernetes version

v1.20.3

HyperEvent needs a universal extraction capability.

Problem description

Currently, the extraction process in HyperEvent is customized for pod resources in operations. If users need to trace a new resource, they have to write code to add an extractor, which is quite costly.

What I expect

The extraction action in HyperEvent should be more universal and able to adapt to multiple resources.

What I want to do

  • Perform a diff between the request and response objects in the audit logs of Update verbs, which requires a high-performance implementation.

  • Establish a new Operation model to identify Patch and Update verb operations uniformly as JSON path format operations.

  • Deserialize runtime.Object using a generic approach, referencing the method used by client-go to identify unknown Custom Resources (CR), to avoid the need to introduce the user's scheme when adding a new CR.
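The first two items can be pictured as turning a pair of objects into a flat list of path-based operations. The sketch below does this with JSON Patch-style operations and JSON Pointer-style paths, which is a substitution chosen for illustration; the issue does not specify the exact path format, and the diff_objects helper and sample objects are hypothetical. Lists are compared as whole values to keep the example short.

```python
def _escape(key):
    """Escape a key for use in a JSON Pointer-style path ('~' -> '~0', '/' -> '~1')."""
    return str(key).replace("~", "~0").replace("/", "~1")


def diff_objects(old, new, path=""):
    """Describe how `old` becomes `new` as a list of add/remove/replace operations."""
    if isinstance(old, dict) and isinstance(new, dict):
        ops = []
        for key in sorted(old.keys() | new.keys()):
            child = f"{path}/{_escape(key)}"
            if key not in old:
                ops.append({"op": "add", "path": child, "value": new[key]})
            elif key not in new:
                ops.append({"op": "remove", "path": child})
            else:
                ops.extend(diff_objects(old[key], new[key], child))
        return ops
    if old != new:
        return [{"op": "replace", "path": path, "value": new}]
    return []


# Made-up request/response objects resembling a pod before and after binding.
before = {"spec": {"nodeName": ""}, "status": {"phase": "Pending"}}
after = {
    "metadata": {"labels": {"a/b": "x"}},
    "spec": {"nodeName": "node-1"},
    "status": {"phase": "Pending"},
}

for operation in diff_objects(before, after):
    print(operation)
```

Because the same operation shape can also describe a Patch request, a representation like this would let Patch and Update verbs be handled by one code path, which is the uniformity the second item asks for.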

Policy/v1 version and K8S version do not match

There's no existing/similar bug report.

  • I have searched the existing issues

Describe the bug:

When the Kubernetes version is later than v1.23, api-versions only supports "policy/v1", but the YAML file still sets "policy/v1beta1".

Expected behavior:

Update the YAML file, changing "v1beta1" to "v1".

Additional

No response

Lunettes version

No response

Kubernetes version

v1.22.0

Support custom nodeAffinity/imagePullSecrets/dnsConfig/hostPath in helm chart

What would you like to be added?

Add more options in helm chart for deployment of Lunettes

Why is this change required? What problem does it solve?

Currently, deploying from the Helm chart lacks flexibility because users cannot set customized fields such as nodeAffinity, imagePullSecrets, dnsConfig, and hostPath.
This issue intends to provide more options for users when deploying Lunettes.

Creating a default community health file

About default community health files

Community health files and their descriptions:

  • CODE_OF_CONDUCT.md: A CODE_OF_CONDUCT file defines standards for how to engage in a community. For more information, see "Adding a code of conduct to your project."

  • CONTRIBUTING.md: A CONTRIBUTING file communicates how people should contribute to your project. For more information, see "Setting guidelines for repository contributors."

  • Discussion category forms: Discussion category forms customize the templates that are available for community members to use when they open new discussions in your repository. For more information, see "Creating discussion category forms."

  • FUNDING.yml: A FUNDING file displays a sponsor button in your repository to increase the visibility of funding options for your open source project. For more information, see "Displaying a sponsor button in your repository."

  • GOVERNANCE.md: A GOVERNANCE file lets people know about how your project is governed. For example, it might discuss project roles and how decisions are made.

  • Issue and pull request templates and config.yml: Issue and pull request templates customize and standardize the information you'd like contributors to include when they open issues and pull requests in your repository. For more information, see "About issue and pull request templates."

  • SECURITY.md: A SECURITY file gives instructions for how to report a security vulnerability in your project. For more information, see "Adding a security policy to your repository."

  • SUPPORT.md: A SUPPORT file lets people know about ways to get help with your project. For more information, see "Adding support resources to your project."

New feature: hcs-dive requires a filter panel

What would you like to be added?

Add a filter panel plugin that lets users customize the filter options and change the dashboard constants by changing the filter.

Why is this change required? What problem does it solve?

Grafana's variables can dynamically inject the values that users need into the dashboard panel, but visitors cannot customize the number of variables or the keywords retrieved. We need a panel that allows users to define their own filter options and formats.

Add debug pod federation API for querying multiple clusters

What would you like to be added?

Add debug pod API federation feature for querying multiple clusters.

Why is this change required? What problem does it solve?

Organizations usually run more than one cluster, so users may want to query data from multiple Kubernetes clusters in a single API call. This issue addresses that need.
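A federated query of this kind can be sketched as a fan-out over per-cluster clients followed by a merge of the results. Everything in the example below is an assumption for illustration: the federated_query helper, the shape of the per-cluster callables, and the cluster names are hypothetical and do not describe the debug pod API itself.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_query(clusters, pod_name):
    """Query every cluster concurrently and collect per-cluster results.

    `clusters` maps a cluster name to a callable that takes a pod name and
    returns a list of records. A failing cluster does not fail the whole call;
    its error is reported next to an empty record list.
    """
    def query(item):
        name, fetch = item
        try:
            return name, fetch(pod_name), None
        except Exception as exc:  # keep going when one cluster is unreachable
            return name, [], str(exc)

    with ThreadPoolExecutor(max_workers=max(1, len(clusters))) as pool:
        results = list(pool.map(query, clusters.items()))
    return {name: {"records": records, "error": error} for name, records, error in results}


def unreachable(pod_name):
    raise RuntimeError("unreachable")


# Stub clients standing in for real per-cluster API calls.
clusters = {
    "cluster-a": lambda pod_name: [{"pod": pod_name, "phase": "Running"}],
    "cluster-b": unreachable,
}
result = federated_query(clusters, "demo-pod")
print(result)
```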

log panel with override config

What would you like to be added?

Create a Grafana log panel in which fields can be configured as data links.

Why is this change required? What problem does it solve?

With this configuration, log fields can support data links.

custom log panel scrollbar not controlled

There's no existing/similar bug report.

  • I have searched the existing issues

Describe the bug:

A custom log panel has been added to the dashboard. When the user scrolls to this panel, the dashboard scrolls uncontrollably back to the top.

Expected behavior:

Scrolling should remain under the user's control.

Additional

No response

Lunettes version

v0.1.4

Kubernetes version

No response

Rename cronjob from delete-es-job to purge-es-job

There's no existing/similar bug report.

  • I have searched the existing issues

Describe the bug:

Rename the cronjob, because "delete" is not the proper word.

Expected behavior:

N/A

Additional

No response

Lunettes version

No response

Kubernetes version

No response

YAML panel's JSON format is incorrect

There's no existing/similar bug report.

  • I have searched the existing issues

Describe the bug:

1. The JSON data displayed on the YAML panel is incorrectly formatted.
2. When using the search command (Ctrl/Command+F) on the JSON, the results do not include all matching content.

Expected behavior:

1. The JSON data displayed on the YAML panel should be correctly formatted.
2. The search results should include all matching content.

Additional

No response

Lunettes version

No response

Kubernetes version

No response

Install failed through helm

There's no existing/similar bug report.

  • I have searched the existing issues

Describe the bug:

The deployment failed through Helm, and several components cannot run properly.

Expected behavior:

Helm can deploy and run the corresponding components properly. For example, a default password can be configured for components that require one, and the corresponding RBAC rules need to be configured for components that lack sufficient permissions.

Additional

No response

Lunettes version

main

Kubernetes version

v1.21.3

Need to support different versions of k8s

What would you like to be added?

After testing, different versions of k8s node labels may differ,
BETA:kind v1.27

Why is this change required? What problem does it solve?

Filebeat pods cannot be scheduled on higher versions of Kubernetes because of the hard nodeAffinity policy.

Add debug workload feature

We can now see what happened in each span of the pod lifecycle. It would be great to extend this feature to workloads such as ReplicaSets and StatefulSets. That would give us a global view at the workload level and help us find which pod is slow or failing, and why the whole workload is not healthy.

add badge and emoji to the readme file

There's no existing/similar bug report.

  • I have searched the existing issues

Describe the bug:

Add badge and emoji to enrich the readme file, and welcome PRs from the community.

Expected behavior:

welcome PRs from the community.

Additional

No response

Lunettes version

No response

Kubernetes version

No response

change QUICK_START.md filename to quick_start.md

There's no existing/similar bug report.

  • I have searched the existing issues

Describe the bug:

The filename QUICK_START.md is confusing to users. It would be better to change it to lowercase letters.

Expected behavior:

change QUICK_START.md filename to quick_start.md

Additional

No response

Lunettes version

No response

Kubernetes version

No response

The version of the volkovlabs-form-panel plugin is inappropriate

There's no existing/similar bug report.

  • I have searched the existing issues

Describe the bug:

The latest version of the volkovlabs-form-panel is 3.7.0, which is not suitable for the current project.

Expected behavior:

Set volkovlabs-form-panel's version to 3.1.0

Additional

No response

Lunettes version

No response

Kubernetes version

No response

Add a panel to display yaml and json

What would you like to be added?

Create a Grafana panel to display YAML and JSON, with code folding and download support.

Why is this change required? What problem does it solve?

The Lunettes hcs API podyaml page cannot be displayed.

Add deploy documents on minikube

What would you like to be added?

Provide deployment documents for minikube.

Why is this change required? What problem does it solve?

Some users deploy with minikube, so we should provide the relevant deployment documents.

incorrect log printing

test

Periodic and high-frequency behaviors are not suitable for INFO-level logging. Consider changing them to the DEBUG level.

position:pkg/config/config.go:100
test

Add grafana summary users' feedback

What would you like to be added?

Add user feedback on whether the pod summary results provided by Lunettes are satisfactory in Grafana, and store the feedback, summary, and corresponding pod information in the Elasticsearch backend, to help Lunettes contributors improve the quality of the summary.

Why is this change required? What problem does it solve?

Lunettes lacks an evaluation metric that can help developers measure the quality of summary results. Such a collection and storage ability for user feedback can better improve the accuracy and quality of summary.

modify PodCreate or PodDelete format in PodDelivery

What would you like to be added?

modify PodCreate or PodDelete format in PodDelivery

Why is this change required? What problem does it solve?

The current display format is not user-friendly: it does not clearly distinguish between content being created and content being deleted.

grafana fails to install the specified version of plugin

There's no existing/similar bug report.

  • I have searched the existing issues

Describe the bug:

In the current release github action, the grafana plugin uses a space as the separator between the plugin name and the version, so two parameters are passed in.

grafana-cli plugins install volkovlabs-form-panel:3.1.0

However, in the actual install script, the plugin and version are not separated, causing the install script to pass "pluginName version" as one parameter.


related pipeline

Expected behavior:

The plugin is installed correctly.

Additional

No response

Lunettes version

No response

Kubernetes version

No response

Custom-log data exception

There's no existing/similar bug report.

  • I have searched the existing issues

Describe the bug:

The data displayed on custom-log is inconsistent with the data returned from the API.

Expected behavior:

Custom-log's data should be consistent with the API's data.

Additional

No response

Lunettes version

No response

Kubernetes version

No response

Support Lunettes namespace configuration

What would you like to be added?

Add ability to configure namespace of Lunettes.

Why is this change required? What problem does it solve?

Currently, Lunettes can only run in the lunettes namespace. This lacks flexibility and may cause problems if the cluster is not in a healthy state and some pods are stuck in the Terminating state.

CreationResult gets the wrong result

There's no existing/similar bug report.

  • I have searched the existing issues

Describe the bug:

Before the modification, CreationResult is taken from StartUpResultFromCreate. This is not correct: CreationResult should be taken from SLOViolationReason.

Expected behavior:

Take CreationResult from SLOViolationReason instead of StartUpResultFromCreate.

Additional

No response

Lunettes version

No response

Kubernetes version

No response
