keikoproj / active-monitor

Provides deep monitoring and self-healing of Kubernetes clusters

License: Apache License 2.0

Languages: Go 86.80%, Makefile 11.55%, Dockerfile 1.65%
Topics: kubernetes, argo-workflows, metrics, prometheus-metrics, healthcheck, kubernetes-cluster, kubernetes-controller, kubernetes-tools, self-healing, remedy

active-monitor's Introduction

Active-Monitor


Motivation

Active-Monitor is a Kubernetes custom resource controller which enables deep cluster monitoring and self-healing using Argo workflows.

While it is not too difficult to know that all entities in a cluster are running individually, it is often quite challenging to know that they can all coordinate with each other as required for successful cluster operation (network connectivity, volume access, etc).

Overview

Active-Monitor will create a new health namespace when installed in the cluster. Users can then create and submit HealthCheck objects to the Kubernetes server. A HealthCheck / Remedy is essentially an instrumented wrapper around an Argo workflow.

The HealthCheck workflow is run periodically, as defined by repeatAfterSec or a schedule: cron property in its spec, and watched by the Active-Monitor controller.
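
For illustration, here is a minimal HealthCheck sketch using the cron-style schedule instead of repeatAfterSec (the exact schedule field layout is an assumption based on the description above; see the examples/ directory for the authoritative syntax):

apiVersion: activemonitor.keikoproj.io/v1alpha1
kind: HealthCheck
metadata:
  generateName: cron-hello-
  namespace: health
spec:
  # repeatAfterSec: 60          # alternative: run every 60 seconds
  schedule:
    cron: "*/5 * * * *"         # assumed layout: run every 5 minutes
  description: "Monitor pod dns connections on a cron schedule"
  workflow:
    generateName: cron-dns-workflow-
    resource:
      namespace: health
      serviceAccount: activemonitor-controller-sa
      source:
        inline: |
            apiVersion: argoproj.io/v1alpha1
            kind: Workflow
            spec:
              ttlSecondsAfterFinished: 60
              entrypoint: start
              templates:
              - name: start
                container:
                  image: tutum/dnsutils
                  command: [sh, -c]
                  args: ["nslookup www.google.com"]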

Active-Monitor sets the status of the HealthCheck CR to indicate whether the monitoring check succeeded or failed. If the monitoring check fails, the Remedy workflow is executed to fix the issue, and the status of the Remedy is updated in the CR. External systems can query these CRs and take appropriate action if they failed.

The RemedyRunsLimit parameter configures how many times a remedy should be run. If the Remedy action fails for any reason, further retries are stopped. It is an optional parameter; if it is not set, the RemedyWorkflow is triggered whenever the HealthCheck workflow fails.

The RemedyResetInterval parameter resets the remedy after the given interval so that the RemedyWorkflow can be retried if the monitor workflow fails again. If the remedy reaches RemedyRunsLimit, it is reset when the HealthCheck passes in any subsequent run before RemedyResetInterval.
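
As a rough sketch, the remedy limit settings sit alongside the other spec fields (the field names/casing and the interval unit here are assumptions based on the description above; consult the examples/ directory for the authoritative spec):

spec:
  repeatAfterSec: 60
  remedyRunsLimit: 3          # assumed field name: stop triggering the remedy after 3 runs
  remedyResetInterval: 3600   # assumed field name/unit: allow the remedy to run again after this interval
  # workflow: ... and remedyworkflow: ... blocks follow as in the sample CRs further below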

Typical examples of such workflows include tests for basic Kubernetes object creation/deletion, tests for cluster-wide services such as policy engines checks, authentication and authorization checks, etc.

The sorts of HealthChecks one could run with Active-Monitor include:

  • verify namespace and deployment creation
  • verify AWS resources are using < 80% of their instance limits
  • verify kube-dns by running DNS lookups on the network
  • verify kube-dns by running DNS lookups on localhost
  • verify KIAM agent by running aws sts get-caller-identity on all available nodes
  • verify whether a pod has reached its maximum thread count
  • verify whether the storage volume for a pod (e.g., Prometheus) has reached its capacity
  • verify whether critical pods (e.g., calico, kube-dns/core-dns) are in a failed or CrashLoopBackOff state

HealthChecks can be run at either Cluster or Namespace level, in any namespace, provided that namespace already exists. The level field in the HealthCheck spec defines the scope at which it runs; it can be either Namespace or Cluster.

When level is set to Namespace, Active-Monitor creates a ServiceAccount in the namespace defined in the workflow spec, along with a Role and RoleBinding with namespace-level permissions, so that the HealthChecks can be performed in that namespace.

When level is set to Cluster, Active-Monitor creates a ServiceAccount in the namespace defined in the workflow spec, along with a ClusterRole and ClusterRoleBinding with cluster-level permissions, so that the HealthChecks can be performed at cluster scope.
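
For example, a namespace-scoped check simply sets the level field alongside the rest of the spec (a fragment, mirroring the cluster-scoped sample further below):

spec:
  repeatAfterSec: 60
  level: namespace            # or: cluster
  workflow:
    resource:
      namespace: test         # ServiceAccount, Role and RoleBinding are created here
      serviceAccount: activemonitor-healthcheck-sa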

Dependencies

Installation Guide

# step 0: ensure that all dependencies listed above are installed or present

# step 1: install argo workflow controller
kubectl apply -f https://raw.githubusercontent.com/keikoproj/active-monitor/master/deploy/deploy-argo.yaml

# step 2: install active-monitor CRD and start controller
kubectl apply -f https://raw.githubusercontent.com/keikoproj/active-monitor/master/config/crd/bases/activemonitor.keikoproj.io_healthchecks.yaml
kubectl apply -f https://raw.githubusercontent.com/keikoproj/active-monitor/master/deploy/deploy-active-monitor.yaml

Alternate Install - using locally cloned code

# step 0: ensure that all dependencies listed above are installed or present

# step 1: install argo workflow-controller
kubectl apply -f deploy/deploy-argo.yaml

# step 2: install active-monitor controller
make install
kubectl apply -f deploy/deploy-active-monitor.yaml

# step 3: run the controller via Makefile target
make run

Usage and Examples

Create a new healthcheck:

Example 1:

Create a new healthcheck with cluster-level bindings to the specified service account, in the health namespace:

kubectl create -f https://raw.githubusercontent.com/keikoproj/active-monitor/master/examples/inlineHello.yaml

OR with local source code:

kubectl create -f examples/inlineHello.yaml

Then, list all healthchecks:

kubectl get healthcheck -n health OR kubectl get hc -n health

NAME                 LATEST STATUS   SUCCESS CNT     FAIL CNT    AGE
inline-hello-7nmzk   Succeeded        7               0          7m53s

View additional details/status of a healthcheck:

kubectl describe healthcheck inline-hello-7nmzk -n health

...
Status:
  Failed Count:              0
  Finished At:               2019-08-09T22:50:57Z
  Last Successful Workflow:  inline-hello-4mwxf
  Status:                    Succeeded
  Success Count:             13
Events:                      <none>

Example 2:

Create a new healthcheck with namespace-level bindings to the specified service account, in a specified namespace:

kubectl create ns test

kubectl create -f https://raw.githubusercontent.com/keikoproj/active-monitor/master/examples/inlineHello_ns.yaml

OR with local source code:

kubectl create -f examples/inlineHello_ns.yaml

Then, list all healthchecks:

kubectl get healthcheck -n test OR kubectl get hc -n test

NAME                 LATEST STATUS   SUCCESS CNT     FAIL CNT    AGE
inline-hello-zz5vm  Succeeded         7               0          7m53s

View additional details/status of a healthcheck:

kubectl describe healthcheck inline-hello-zz5vm -n test

...
Status:
  Failed Count:              0
  Finished At:               2019-08-09T22:50:57Z
  Last Successful Workflow:  inline-hello-4mwxf
  Status:                    Succeeded
  Success Count:             13
Events:                      <none>

argo list -n test

NAME                 STATUS      AGE   DURATION   PRIORITY
inline-hello-88rh2   Succeeded   29s   7s         0
inline-hello-xpsf5   Succeeded   1m    8s         0
inline-hello-z8llk   Succeeded   2m    7s         0

Generates Resources

  • activemonitor.keikoproj.io/v1alpha1/HealthCheck
  • argoproj.io/v1alpha1/Workflow

Sample HealthCheck CR:

apiVersion: activemonitor.keikoproj.io/v1alpha1
kind: HealthCheck
metadata:
  generateName: dns-healthcheck-
  namespace: health
spec:
  repeatAfterSec: 60
  description: "Monitor pod dns connections"
  workflow:
    generateName: dns-workflow-
    resource:
      namespace: health
      serviceAccount: activemonitor-controller-sa
      source:
        inline: |
            apiVersion: argoproj.io/v1alpha1
            kind: Workflow
            spec:
              ttlSecondsAfterFinished: 60
              entrypoint: start
              templates:
              - name: start
                retryStrategy:
                  limit: 3
                container: 
                  image: tutum/dnsutils
                  command: [sh, -c]
                  args: ["nslookup www.google.com"]

Sample HealthCheck CR with a RemedyWorkflow:

apiVersion: activemonitor.keikoproj.io/v1alpha1
kind: HealthCheck
metadata:
  generateName: fail-healthcheck-
  namespace: health
spec:
  repeatAfterSec: 60 # duration in seconds
  level: cluster
  workflow:
    generateName: fail-workflow-
    resource:
      namespace: health # workflow will be submitted in this ns
      serviceAccount: activemonitor-healthcheck-sa # workflow will be submitted using this
      source:
        inline: |
            apiVersion: argoproj.io/v1alpha1
            kind: Workflow
            metadata:
              labels:
                workflows.argoproj.io/controller-instanceid: activemonitor-workflows
            spec:
              ttlSecondsAfterFinished: 60
              entrypoint: start
              templates:
              - name: start
                retryStrategy:
                  limit: 1
                container: 
                  image: ravihari/ctrmemory:v2
                  command: ["python"]
                  args: ["promanalysis.py", "http://prometheus.system.svc.cluster.local:9090", "health", "memory-demo", "memory-demo-ctr", "95"]
  remedyworkflow:
    generateName: remedy-test-
    resource:
      namespace: health # workflow will be submitted in this ns
      serviceAccount: activemonitor-remedy-sa # workflow will be submitted using this acct
      source:
        inline: |
          apiVersion: argoproj.io/v1alpha1
          kind: Workflow
          spec:
            ttlSecondsAfterFinished: 60
            entrypoint: kubectl
            templates:
              - name: kubectl
                container:
                  image: "ravihari/kubectl:v1"
                  command: ["/bin/bash", "-c"]
                  args: ["kubectl delete po/memory-demo"]

Active-Monitor Architecture

Access Workflows on Argo UI

kubectl -n health port-forward deployment/argo-ui 8001:8001

Then visit: http://127.0.0.1:8001

Prometheus Metrics

The Active-Monitor controller also exports metrics in Prometheus format, which can be used for notifications and alerting.

Prometheus metrics are available on :8080/metrics

kubectl -n health port-forward deployment/activemonitor-controller 8080:8080

Then visit: http://localhost:8080/metrics

Active-Monitor, by default, exports the following Prometheus metrics:

  • healthcheck_success_count - The total number of successful healthcheck resources
  • healthcheck_error_count - The total number of erred healthcheck resources
  • healthcheck_runtime_seconds - Time taken for the healthcheck's workflow to complete

Active-Monitor also supports custom metrics. For this to work, your workflow should export a global parameter. The parameter will be programmatically available in the completed workflow object under: workflow.status.outputs.parameters.

The global output parameters should look like below:

"{\"metrics\":
  [
    {\"name\": \"custom_total\", \"value\": 123, \"metrictype\": \"gauge\", \"help\": \"custom total\"},
    {\"name\": \"custom_metric\", \"value\": 12.3, \"metrictype\": \"gauge\", \"help\": \"custom metric\"}
  ]
}"

❤ Contributing ❤

Please see CONTRIBUTING.md.

To add a new example of a healthcheck and/or workflow:

Release Process

Please see RELEASE.

License

The Apache 2 license is used in this project. Details can be found in the LICENSE file.

Other Keiko Projects

Instance Manager - Kube Forensics - Addon Manager - Upgrade Manager - Minion Manager - Governor

active-monitor's People

Contributors

awwwd, ccfish2, ccpeng, davemasselink, dependabot[bot], dheerajgupta217, grayudu, psaia, ravihari, rkilingr, sasagarw, shrinandj, susritha, tekenstam, thbishop-intuit


active-monitor's Issues

Contribute more advanced real-world example workflow

Is your feature request related to a problem? Please describe.
The existing example workflows are all quite simple. They don't represent real-world workflows well.

Describe the solution you'd like
There should be a new workflow example in the examples/ directory which carries out the following steps:

  • create namespace
  • create deployment
  • delete deployment
  • delete namespace

At this point, no record of the deployment or namespace should exist.

Update healthcheck spec and controller to support automatic "remediation"

Currently, Healthcheck custom resource and its child argo workflow can "detect" if there is a problem.

However, there is no place to express what action to take in case of a problem.

Therefore, healthcheck spec should support an alternative argo workflow for "remediation". If the main workflow fails, the "remediation" workflow should be run. Also, additional remediation metrics should be captured and exposed accordingly.


Open Source software thrives with your contribution. It not only gives skills you might not be able to get in your day job, it also looks amazing on your resume.

If you want to get involved, check out the
contributing guide, then reach out to us on Slack so we can see how to get you started.

Extend support for cluster/namespace scoping of healthcheck

Is your feature request related to a problem? Please describe.
This project has previously supported healthchecks being defined with either cluster or namespace scoping.

Describe the solution you'd like
This feature needs to be confirmed to still work. It may also require some design work to ensure it's still a necessary feature.

Update Argo controller version

Is your feature request related to a problem? Please describe.
Update Argo Controller version to use latest features.

Describe the solution you'd like
Use latest Argo Controller version to leverage latest features.

Have you thought about contributing yourself?
Yes

Active-Monitor should process custom resources in parallel

Is your feature request related to a problem? Please describe.
Active-Monitor should process custom resources in parallel.

Describe the solution you'd like
Kubebuilder supports MaxParallel option to run multiple go routines. We should pass this maxparallel option into the reconciler to process multiple CR's.

Have you thought about contributing yourself?
Yes.. I will add this feature.

Update Default TTL Strategy to secondsAfterCompletion

Is your feature request related to a problem? Please describe.
Argo Workflows 3.x does not support ttlSecondsAfterFinished anymore.
It needs to be replaced with ttlStrategy: secondsAfterCompletion

Describe the solution you'd like
Update the default ttlstrategy here:

if ttlSecondAfterFinished := data["spec"].(map[string]interface{})["ttlSecondsAfterFinished"]; ttlSecondAfterFinished == nil {
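
In the workflow spec itself, the replacement looks roughly like this:

spec:
  # ttlSecondsAfterFinished: 60     # no longer supported by Argo Workflows 3.x
  ttlStrategy:
    secondsAfterCompletion: 60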


Limit number of times the Self-Healing/Remedy should be run

Is your feature request related to a problem? Please describe.
HealthCheck Custom Resource should have an ability to limit the number of times the remedy/self-healing should be run in a given interval of time.

Describe the solution you'd like
HealthCheck Custom Resource should provide parameters to limit the number of times the remedy should be run in a given interval. This would be helpful to avoid a continuous loop of running health check and remedy in case when remedy action given does not work.

Have you thought about contributing yourself?

Yes. I will implement it.


BUG: Workflow not recreated on next run

Describe the bug
A workflow can fail for some reason (e.g., scheduling) and therefore not run, but the state in the HealthCheck is not changed to reflect this error.


Race condition possible when deleting HealthCheck

Describe the bug
Rarely, an error will occur if a HealthCheck resource is deleted while it has a corresponding child workflow currently running.

To Reproduce

  1. create a healthcheck (kubectl create -f examples/inlineHello.yaml)
  2. while a corresponding workflow is running, delete the healthcheck (kubectl delete healthcheck inline-hello-abc01)
  3. infrequently, but eventually, this will cause an error seen by the error condition log message indicated below.

Expected behavior
Regardless of WHEN a healthcheck is deleted, it and any corresponding workflow resources should be successfully cleaned up. The repeating workflow executions should also stop at this time.

Error Condition

2019-08-09T15:55:18.479-0700    ERROR   controllers.HealthCheck Error updating healthcheck resource     {"HealthCheck": "health/url-hello-dkkxt", "error": "Operation cannot be fulfilled on healthcheck.activemonitor.orkaproj.io \"url-hello-dkkxt\": StorageError: invalid object, Code: 4, Key: /registry/activemonitor.orkaproj.io/healthcheck/health/url-hello-dkkxt, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: a7c1b9b6-3e8f-4200-b41a-d81f53447ba2, UID in object meta: "}
github.com/go-logr/zapr.(*zapLogger).Error
        /Users/dmasselink/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
github.com/orkaproj/active-monitor/controllers.(*HealthCheckReconciler).watchWorkflowReschedule
        /Users/dmasselink/go/src/github.com/orkaproj/active-monitor/controllers/healthcheck_controller.go:229
github.com/orkaproj/active-monitor/controllers.(*HealthCheckReconciler).createSubmitWorkflowHelper.func1
        /Users/dmasselink/go/src/github.com/orkaproj/active-monitor/controllers/healthcheck_controller.go:147

Version
0.1.0

ServiceAcct not correctly applied to workflow

Describe the bug
When a serviceAccount is passed to the HealthCheck in the spec, this same account should be applied to the Argo workflows which the controller is creating. In this way, it will be passed down to the running pod where the account will actually be used.

To Reproduce
If we cannot reproduce, we cannot fix! Steps to reproduce the behavior:

  1. Craft a healthcheck which uses the serviceAccount property.
  2. Submit healthcheck via kubectl create -f <filename>
  3. Inspect the workflow with argo get <workflowName>
  4. See that the default account was applied regardless of configured serviceAccount

Expected behavior
Configured serviceAccount should be set on workflow

Version
all

Release 0.6.0 - Active-Monitor

Release issue, predominantly for visibility purposes.

  • Move to Github Actions for CI enhancement - #81
  • Active Monitor crashing with concurrent map updates - #98
  • Update Default TTL Strategy to secondsAfterCompletion - #99
  • Update Argo controller version - #80

Reduce code complexity

metrics/collector.go

(screenshot of the nested conditionals in collector.go omitted)

We can flatten the nested if checks into early returns:

if workflowStatus.Outputs == nil {
  return
}
if workflowStatus.Outputs.Parameters == nil {
  return
}
// the nested for loops over the parameters then follow without the extra nesting
for ...
   for ...

Update AWS limit/quota check healthcheck(s) considering recent AWS updates

https://aws.amazon.com/blogs/compute/preview-vcpu-based-instance-limits/

AWS has recently modified how it handles limits/quotas. Rather than limiting the number of EC2 instances or other direct resources, limits are now defined based on vCPU quotas.

This means that healthcheck(s) using the AWS CLI/APIs to determine usage vs. limits may benefit from an update in order to use the newer quota mechanisms, which will be supported further into the future.

Design/wireframe UI/dashboard for Active-Monitor (and all KeikoProj components)

The Keiko project components are awesome... but it's sometimes hard for people (who aren't deeply familiar with k8s) to understand how awesome they are. However, once the Keiko components have a user interface or dashboard, they will be immediately understandable/accessible to a much wider audience.

The aim of this ticket is to design one or more views for Active-Monitor data in a web interface.

Each component will likely have a similar task to design their respective views.

One possibility is to build plug-ins for the Octant UI project, sponsored by VMware - https://github.com/vmware-tanzu/octant

Active Monitor crashing with concurrent map updates

Describe the bug
Active-Monitor is crashing intermittently with this error:

fatal error: concurrent map writes
goroutine 105843 [running]:
runtime.throw(0x1633f60, 0x15)
/usr/local/go/src/runtime/panic.go:1116 +0x72 fp=0xc00056baa8 sp=0xc00056ba78 pc=0x436532
runtime.mapassign_faststr(0x1488fa0, 0xc00048a1b0, 0xc0006e2ee0, 0x1b, 0xc0008362a0)
/usr/local/go/src/runtime/map_faststr.go:291 +0x3d8 fp=0xc00056bb10 sp=0xc00056baa8 pc=0x414538
github.com/keikoproj/active-monitor/controllers.(*HealthCheckReconciler).watchWorkflowReschedule(0xc0002530e0, 0x17e9da0, 0xc000126200, 0x0, 0x0, 0x0, 0x0, 0x17f1d20, 0xc0008362a0, 0xc000681d80, ...)
/workspace/controllers/healthcheck_controller.go:676 +0x12cf fp=0xc00056bf00 sp=0xc00056bb10 pc=0x132f2ef
github.com/keikoproj/active-monitor/controllers.(*HealthCheckReconciler).createSubmitWorkflowHelper.func1()
/workspace/controllers/healthcheck_controller.go:459 +0x1b5 fp=0xc00056bfe0 sp=0xc00056bf00 pc=0x13394d5
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc00056bfe8 sp=0xc00056bfe0 pc=0x46b941
created by time.goFunc
/usr/local/go/src/time/sleep.go:167 +0x45
goroutine 1 [select, 1658 minutes]:
sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).Start(0xc000330a80, 0xc0000fa9c0, 0x0, 0x0)
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:433 +0x1dd
main.main()
/workspace/main.go:84 +0x59c

Clean-up example/sample workflows

Is your feature request related to a problem? Please describe.
Currently, the project contains both an examples/ as well as a sample-workflows/ directory. The directory names and corresponding documentation references don't really make it clear why both directories exist and when a new workflow should be added to one or the other.

Describe the solution you'd like
Instead, it would likely be best to consolidate all example or sample workflows into a single directory and ensure that all documentation references are up-to-date. Further, the README should be extended to explain what a contributor wanting to add a new example workflow ought to do and where it should be added.

Exponentially reduce Kubernetes API calls

Is your feature request related to a problem? Please describe.
In Active-Monitor the status of the workflow is polled every second, which results in too many Kubernetes API calls. As the number of monitors or self-healing use cases increases, this can result in additional cost in managed Kubernetes clusters; in non-managed Kubernetes clusters the master nodes might have to be scaled up to accommodate these API calls.

Describe the solution you'd like
This can be solved by leveraging https://github.com/keikoproj/inverse-exp-backoff library. There is an API in this library where we can use inverse exponential backoff with timeout. This will allow us to reduce the API calls made to the kubernetes API server exponentially.

Have you thought about contributing yourself?

Yes I will implement it.


Addl metric indicating start and end times of latest run for each healthcheck

Is your feature request related to a problem? Please describe.
first discussed in this slack thread: https://intuit-teams.slack.com/archives/GBLA5J9DH/p1579889119031100

Describe the solution you'd like
Controller should expose the start and end times of latest run for each healthcheck as a metric. This would assist in cluster issue debugging/diagnosis since it will be more obvious when work/traffic/etc. is happening due to a healthcheck rather than organic work/traffic.

Currently this can be determined only by looking at the healthcheck status and, even then, only completion times are tracked

Yet to discuss
Are timestamps the best piece of data to track? Otherwise would an "ongoing" boolean and "lastRunDuration" float be easier to make sense of?

After upgrade argo BDD fails with errors


[2022-03-11T20:08:20.710Z] [2022-03-11T20:08:20Z] INFO - skipping instancegroup since addon active-monitor is missing file instancegroup.yaml
[2022-03-11T20:08:21.274Z] error: error validating "/tmp/kubectl_manifest775841473": error validating data: [ValidationError(CustomResourceDefinition.spec): unknown field "additionalPrinterColumns" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec, ValidationError(CustomResourceDefinition.spec): unknown field "subresources" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec, ValidationError(CustomResourceDefinition.spec): unknown field "validation" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec, ValidationError(CustomResourceDefinition.spec): unknown field "version" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec]; if you choose to ignore these errors, turn validation off with --validate=false
[2022-03-11T20:08:21.274Z] [2022-03-11T20:08:21Z] WARN - cmd.Run() failed for cmd kubectl apply --filename /tmp/kubectl_manifest775841473 --context arktika-bdd-data-usw2 with exit status 1


Improve reliability of active-monitor-controller

Is your feature request related to a problem? Please describe
Though it's generally rare in testing so far, there are cases in which the active-monitor-controller can crash and be forced to restart. We may be losing information about the cause of the crash unless the condition is noticed and logged in some way.

Describe the solution you'd like
We should follow the panic/recover pattern found in other, similar projects. ex: https://github.com/keikoproj/addon-manager/pull/28/files

Related Reading
https://blog.golang.org/defer-panic-and-recover

Improve usage of failed healthcheck error message

Is your feature request related to a problem? Please describe.
If a healthcheck's underlying workflow fails, a helpful error message as to why this occurred should be set as the healthcheck's error message in its status object. Currently, this error message is always hard-coded to a message indicating that the workflow couldn't start. This isn't always correct and therefore could be confusing to someone trying to deduce the reasons of a healthcheck/workflow failure.

Describe the solution you'd like
Instead, a relevant and meaningful description of an underlying reason should be set into this property in the status resource.

Release 0.5.2 - Active-Monitor

Release issue, predominantly for visibility purposes.

DRAFT CHANGELOG:
Bug Fix:
#88 - Active Monitor crashing with concurrent map updates

Workflow creation ignores metadata information in the healthcheck spec

Describe the bug
The active monitor controller ignores the metadata information while submitting the monitoring workflow. The metadata information has the controller instanceID details which are needed for the workflow controller to pick and execute the workflow.
The controller should pick and parse the metadata information and should include it while the workflow is submitted.

To Reproduce
If we cannot reproduce, we cannot fix! Steps to reproduce the behavior:
If the workflow controller is started with specific InstanceID details, it will not pick up the Active-Monitor workflow for execution; the workflow stays in the Pending status.

Expected behavior
Start the workflow controller on a specific InstanceID and make the changes in the active monitor controller to parse the metadata information while submitting the workflow.


Allow for success/failure counts to be reset at user request or periodically

Currently, the success and failure counts associated with each healthcheck will start at 0 when first registered and monotonically increase over the life of the healthcheck (or active-monitor controller).

This works alright, however, one downside of such an approach is that you need to know how long the healthcheck has been running in order to have any context for whether the counts are large or small.

Therefore, the aim of this ticket is to provide:

  • a mechanism for a user to "reset" the counts so that they will better understand what a non-0 failure count means, for instance OR
  • a new property in the spec which allows for the user to indicate how frequently the counts should be reset to 0 (this would allow for alerting if the failure count surpassed N within that time period) OR
  • an addl status property which would represent a moving average rate of success/fail over some unit time (ex: 4 successes/hr, 2 failures/day)

Open questions:

  • which approach to pursue
  • what unit to use as the default time duration, ex: hr, day (if 3rd approach above is pursued)
  • are there any better strategies which allow for counts to reliably be used in alerting logic

Update README and feature/bug template with correct Orka project Slack links

Is your feature request related to a problem? Please describe.
Currently, links to Slack in the README and issue templates either are broken links OR actually direct to argoproj workspace.

Describe the solution you'd like
Once a new slack workspace is setup for orkaproj, update the assets in this repo to match.

Active Monitor crashing with concurrent map updates

Describe the bug
Active-Monitor is crashing with the following error:

fatal error: concurrent map read and map write

goroutine 5518 [running]:
runtime.throw(0x2245810, 0x21)
        /usr/local/go/src/runtime/panic.go:1116 +0x72 fp=0xc0006e8ff0 sp=0xc0006e8fc0 pc=0x1037c12
runtime.mapaccess2(0x208cf80, 0xc00088ee70, 0xc00082edf0, 0xc00082edf0, 0xc00036f802)
        /usr/local/go/src/runtime/map.go:469 +0x25b fp=0xc0006e9030 sp=0xc0006e8ff0 pc=0x101179b
reflect.mapaccess(0x208cf80, 0xc00088ee70, 0xc00082edf0, 0x2238f1d)
        /usr/local/go/src/runtime/map.go:1309 +0x3f fp=0xc0006e9068 sp=0xc0006e9030 pc=0x1066aff
reflect.Value.MapIndex(0x208cf80, 0xc00088ee70, 0x15, 0x2037380, 0xc00082edf0, 0x98, 0x21f8c40, 0x208cf80, 0x208cf80)
        /usr/local/go/src/reflect/value.go:1188 +0x16e fp=0xc0006e90e0 sp=0xc0006e9068 pc=0x109cbee
encoding/json.mapEncoder.encode(0x22cd1b0, 0xc000650080, 0x208cf80, 0xc00088ee70, 0x15, 0x2080000)
        /usr/local/go/src/encoding/json/encode.go:801 +0x30d fp=0xc0006e9258 sp=0xc0006e90e0 pc=0x111896d
encoding/json.mapEncoder.encode-fm(0xc000650080, 0x208cf80, 0xc00088ee70, 0x15, 0x2cb0000)
        /usr/local/go/src/encoding/json/encode.go:777 +0x65 fp=0xc0006e9298 sp=0xc0006e9258 pc=0x1124e65
encoding/json.(*encodeState).reflectValue(0xc000650080, 0x208cf80, 0xc00088ee70, 0x15, 0xc0006e0000)
        /usr/local/go/src/encoding/json/encode.go:358 +0x82 fp=0xc0006e92d0 sp=0xc0006e9298 pc=0x1115b02
encoding/json.(*encodeState).marshal(0xc000650080, 0x208cf80, 0xc00088ee70, 0x1f50000, 0x0, 0x0)
        /usr/local/go/src/encoding/json/encode.go:330 +0xf4 fp=0xc0006e9330 sp=0xc0006e92d0 pc=0x11156f4
encoding/json.(*Encoder).Encode(0xc00061ec80, 0x208cf80, 0xc00088ee70, 0x30, 0x30)
        /usr/local/go/src/encoding/json/stream.go:206 +0x8b fp=0xc0006e93c0 sp=0xc0006e9330 pc=0x112294b
go.uber.org/zap/zapcore.(*jsonEncoder).AddReflected(0xc0003d2e10, 0x222b748, 0x8, 0x208cf80, 0xc00088ee70, 0xc0006e94b0, 0x1085ff4)
        /Users/rhari/go/pkg/mod/go.uber.org/[email protected]/zapcore/json_encoder.go:150 +0x65 fp=0xc0006e9440 sp=0xc0006e93c0 pc=0x1f5f385
go.uber.org/zap/zapcore.Field.AddTo(0x222b748, 0x8, 0x16, 0x0, 0x0, 0x0, 0x208cf80, 0xc00088ee70, 0x240ac20, 0xc0003d2e10)
        /Users/rhari/go/pkg/mod/go.uber.org/[email protected]/zapcore/field.go:159 +0xb16 fp=0xc0006e9518 sp=0xc0006e9440 pc=0x1f5e3d6
go.uber.org/zap/zapcore.addFields(0x240ac20, 0xc0003d2e10, 0xc00084c800, 0x1, 0x1)
        /Users/rhari/go/pkg/mod/go.uber.org/[email protected]/zapcore/field.go:199 +0xcf fp=0xc0006e95c0 sp=0xc0006e9518 pc=0x1f5eaaf
go.uber.org/zap/zapcore.consoleEncoder.writeContext(0xc000260000, 0xc00023bc00, 0xc00084c800, 0x1, 0x1)
        /Users/rhari/go/pkg/mod/go.uber.org/[email protected]/zapcore/console_encoder.go:131 +0xcb fp=0xc0006e9660 sp=0xc0006e95c0 pc=0x1f5a88b
go.uber.org/zap/zapcore.consoleEncoder.EncodeEntry(0xc000260000, 0x0, 0xc027abae3294edc8, 0x5acbf2bbb7f, 0x2cb8ce0, 0xc000041680, 0x17, 0x223ac34, 0x18, 0x0, ...)
        /Users/rhari/go/pkg/mod/go.uber.org/[email protected]/zapcore/console_encoder.go:110 +0x3df fp=0xc0006e9718 sp=0xc0006e9660 pc=0x1f5a23f
sigs.k8s.io/controller-runtime/pkg/log/zap.(*KubeAwareEncoder).EncodeEntry(0xc000478140, 0x0, 0xc027abae3294edc8, 0x5acbf2bbb7f, 0x2cb8ce0, 0xc000041680, 0x17, 0x223ac34, 0x18, 0x0, ...)
        /Users/rhari/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/log/zap/kube_helpers.go:126 +0x175 fp=0xc0006e9930 sp=0xc0006e9718 pc=0x1f78c95
go.uber.org/zap/zapcore.(*ioCore).Write(0xc000260060, 0x0, 0xc027abae3294edc8, 0x5acbf2bbb7f, 0x2cb8ce0, 0xc000041680, 0x17, 0x223ac34, 0x18, 0x0, ...)
        /Users/rhari/go/pkg/mod/go.uber.org/[email protected]/zapcore/core.go:86 +0xa9 fp=0xc0006e9a08 sp=0xc0006e9930 pc=0x1f5b0c9
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc00016e6e0, 0xc00084c800, 0x1, 0x1)
        /Users/rhari/go/pkg/mod/go.uber.org/[email protected]/zapcore/entry.go:215 +0x12d fp=0xc0006e9ba8 sp=0xc0006e9a08 pc=0x1f5cb4d
github.com/go-logr/zapr.(*infoLogger).Info(0xc0004781c8, 0x223ac34, 0x18, 0xc0004c43e0, 0x2, 0x2)
        /Users/rhari/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:70 +0xdd fp=0xc0006e9c10 sp=0xc0006e9ba8 pc=0x1f776fd
github.com/keikoproj/active-monitor/controllers.(*HealthCheckReconciler).parseWorkflowFromHealthcheck(0xc00073c780, 0x23f72a0, 0xc0004781c0, 0xc000154a00, 0xc0005262a0, 0x0, 0x0)
        /Users/rhari/go/src/github.com/keikoproj/active-monitor/controllers/healthcheck_controller.go:851 +0x5d0 fp=0xc0006e9de0 sp=0xc0006e9c10 pc=0x1f36010
github.com/keikoproj/active-monitor/controllers.(*HealthCheckReconciler).createSubmitWorkflow(0xc00073c780, 0x23ef140, 0xc0001341f8, 0x23f72a0, 0xc0004781c0, 0xc000154a00, 0x0, 0xc000b00000, 0xc000001e00, 0x0)
        /Users/rhari/go/src/github.com/keikoproj/active-monitor/controllers/healthcheck_controller.go:471 +0x8c fp=0xc0006e9f00 sp=0xc0006e9de0 pc=0x1f2f30c
github.com/keikoproj/active-monitor/controllers.(*HealthCheckReconciler).createSubmitWorkflowHelper.func1()
        /Users/rhari/go/src/github.com/keikoproj/active-monitor/controllers/healthcheck_controller.go:456 +0x115 fp=0xc0006e9fe0 sp=0xc0006e9f00 pc=0x1f3b535
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc0006e9fe8 sp=0xc0006e9fe0 pc=0x106d0a1
created by time.goFunc
        /usr/local/go/src/time/sleep.go:167 +0x45

To Reproduce
Run multiple workflows in parallel.

Expected behavior
Active-Monitor should continue to work without issues.

Version
Latest changes.

Upgrade argo to v3.2.6 as well as its dependencies


kubectl describe healthcheck does not get expected health status

Steps to reproduce

step 1: install argo workflow-controller

kubectl apply -f https://raw.githubusercontent.com/orkaproj/active-monitor/master/deploy/deploy-argo.yaml

step 2: install active-monitor controller

kubectl apply -f https://raw.githubusercontent.com/orkaproj/active-monitor/master/config/crd/bases/activemonitor.orkaproj.io_healthchecks.yaml
kubectl apply -f https://raw.githubusercontent.com/orkaproj/active-monitor/master/deploy/deploy-active-monitor.yaml
#step 3
make run
#step 4
kubectl create -f examples/inlineHello.yaml
#step 5
kubectl get healthcheck -n health

NAME                 AGE
inline-hello-fbgj9   5m

#step 6
kubectl describe healthcheck inline-hello-fbgj9 -n health
The result does not match what the README suggests:
...
Status:
  Failed Count:              0
  Finished At:               <timestamp>
  Last Successful Workflow:  inline-hello-fbgj9
  Status:                    Succeeded
  Success Count:
Events:

Release 0.5.0 - Active-Monitor

Release issue, predominantly for visibility purposes.

DRAFT CHANGELOG:

#64 - Exponentially reduce Kubernetes API calls.
#65 - Limit number of times the Self-Healing/Remedy should be run
#67 - Enable default PodGC strategy as OnPodCompletion in workflow
#70 - Add Events to Active-Monitor Custom Resources

Update go version to 1.18

Is your feature request related to a problem? Please describe.
Earlier Go versions produce breaking changes. The CI workflow runs Go 1.18 whereas the repo uses 1.15; this breaks the workflow run because Go 1.18 uses go install instead of go get. Even kubebuilder is breaking, since we should set up the test env using make targets.

Describe the solution you'd like
Update go to 1.18 and use make targets for setting up test env.


Issue in Getting Started with Active-monitor

Hi Team,

I am new to Go and Kubernetes and am exploring Kubernetes monitoring tools. I came across the active-monitor tool. I am facing a few issues while getting started with this tool. Any help in this regard will be highly appreciated. The details are as under:

Versions:
OS: Linux 5.11.0-25-generic, 20.04.1-Ubuntu
Go: go1.13.8 linux/amd64
Kubectl client: v1.22.0
Kubectl Server: v1.21.2
minikube: v1.22.0
argo: v3.0.10
active-monitor: 0.6.0

Also tried with Kubectl client version:v1.19.0 and server version:v1.20.0 but still the same warnings and errors.

Issue:
While following step 2 for both types of installation, a warning is raised regarding the CRD versions (screenshots "err1" and "err2" were attached to the original issue).

While running the main.go file, the healthcheck starts but it produces errors for some Go files (screenshot "Err 3").

Please let me know how can I proceed further to run active-monitor.

Build unit tests around healthcheck_controller

Is your feature request related to a problem? Please describe.
Currently there are no unit tests for the bulk of the logic involved in this project, the controller.

Describe the solution you'd like
Build out tests to get controller coverage > 66%. This may be tricky since it isn't always straightforward to mock out kube-api related interactions.

active-monitor running workflows more frequently than the configuration

Describe the bug
Active-Monitor workflows are run continuously if there are errors updating the Custom Resources, e.g., a storage error or the API server being busy.
The timers are then not stopped, causing leaks and a number of workflow pods being created.

Expected behavior
If the CR update fails, the timers should be stopped and the update requeued.

Logs

2021-02-22T13:57:55.825Z	ERROR	controllers.HealthCheck	Error updating healthcheck resource	{"HealthCheck": "monitoring/dns-healthcheck", "error": "Operation cannot be fulfilled on healthchecks.activemonitor.keikoproj.io \"dns-healthcheck\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
github.com/keikoproj/active-monitor/controllers.(*HealthCheckReconciler).watchWorkflowReschedule
	/workspace/controllers/healthcheck_controller.go:525
github.com/keikoproj/active-monitor/controllers.(*HealthCheckReconciler).createSubmitWorkflowHelper.func1
	/workspace/controllers/healthcheck_controller.go:391

2021-02-22T14:58:59.848Z	ERROR	controllers.HealthCheck	Error updating healthcheck resource	{"HealthCheck": "monitoring/dns-healthcheck", "error": "Operation cannot be fulfilled on healthchecks.activemonitor.keikoproj.io \"dns-healthcheck\": StorageError: invalid object, Code: 4, Key: /registry/activemonitor.keikoproj.io/healthchecks/monitoring/dns-healthcheck, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: xxxx, UID in object meta: "}

Enhance documentation to simplify cluster installation

Is your feature request related to a problem? Please describe.
Currently, the documentation suggests that users either directly build the project OR build a docker image and use that.

Describe the solution you'd like
Update documentation to highlight an installation track which requires nothing more than applying some yaml to a cluster with kubectl

Add Events to Active-Monitor Custom Resources

Is your feature request related to a problem? Please describe.
Active-Monitor Custom Resources should have events displayed. Events give the state of each CR and its progress.

Describe the solution you'd like
We need to add events to custom resources for Active-Monitor

Have you thought about contributing yourself?
Yes

Release 0.5.1 - Active-Monitor

Release issue, predominantly for visibility purposes.

DRAFT CHANGELOG:

Bug Fix:
#82 - active-monitor running workflows more frequently than the configuration.

Move to Github Actions for CI

Based on the message at: https://travis-ci.org/

Please be aware travis-ci.org will be shutting down in several weeks, with all accounts migrating to travis-ci.com. Please stay tuned here for more information.

We need to migrate the CI pipeline to GitHub Actions.

make serviceAccount attribute optional for resource

Describe the bug
Currently the Workflow's resource.serviceAccount attribute is required. However, it isn't actually necessary for all healthchecks/workflows. So, this attribute should be re-configured to be optional.

To Reproduce
If we cannot reproduce, we cannot fix! Steps to reproduce the behavior:

  1. Create a healthcheck with an in-line workflow, leave out the resource.serviceAccount attribute
  2. Submit (create/apply) this healthcheck
  3. See error

Extend support for custom metrics

Is your feature request related to a problem? Please describe.
After this project was converted to kubebuilder style, custom metrics haven't been tested and confirmed to work.

Describe the solution you'd like
The aim of this ticket is to confirm this and update README accordingly

Support cron-like expressions as an alternative to repeatAfterSec param

Currently, active-monitor repeats the workflow submissions based on the repeatAfterSec spec parameter.

However, it would be more flexible if we also allowed users to specify how often they want the workflow to run using a cron-like expression.

@pzou1974 had the great suggestion to look at this library to help with parsing cron expressions: https://godoc.org/gopkg.in/robfig/cron.v2



Consolidate metrics end points

Don't begin this work until we can confirm we want to make this change. This will likely require a design decision, the outcome of which should be attached to this ticket as a comment.

Is your feature request related to a problem? Please describe.
After this project was re-worked using kubebuilder 2.0, a prometheus style metrics server was exposed by default. This server provides data such as underlying golang cpu/memory/networking/gc details for the Active-Monitor controller. http://0.0.0.0:8080/metrics

Our application also exposes its own metrics server running on a separate port and endpoint. This server is meant to communicate details of healthcheck operation and to be consumed by entities which may need to take a remediating action. http://0.0.0.0:2112/metrics

Describe the solution you'd like
An open question is: should these two servers, which are doing pretty much the same thing (though with respect to different data sets), be combined in some way?

Should there be just a single port/path combo where ALL metrics (whether built-in or health check oriented) could be exposed?

If not a single port/path combo, how about the same port but 1 path for healthcheck metrics and another for internal metrics? ex: 0.0.0.0:8080/metrics and 0.0.0.0:8080/internal or similar

Add metrics to the status

Is your feature request related to a problem? Please describe.
Add:

  • Total
  • LastSuccess
  • LastFailure


Document release process

The project doesn't yet have a defined and documented release process. This task's aim is to document that in the README and get a 1.0.0 release pushed to dockerhub.

Document how to recognize and correct for false-positive and false-negatives

The concept of a health check succeeding or failing is related to the final return value from the nested/imported Argo workflow.

This isn't always incredibly obvious and can lead to scenarios where the workflow doesn't behave as expected yet is still marked as succeeded. Similarly, even if the workflow behaves as expected, it may indicate a failure if a non-0 return code is used.

README documentation should be improved to highlight this and provide users with patterns/strategies to ensure that healthchecks are behaving as expected and building confidence in the usage of Active-Monitor.

Add healthcheck status info to kubectl get response

Is your feature request related to a problem? Please describe
The problem is that the latest status as well as the success/fail counts related to a healthcheck can only be seen when the resource is inspected with kubectl describe.

Describe the solution you'd like
Instead, at least three columns should be added to the custom printer for healthcheck objects. Those are: status, success count, failure count.

This should be straightforward to accomplish using kubebuilder annotations regarding additional-printer-columns

Current example:

NAME      AGE
foo       1h
bar       1h

Target example:

NAME        LATEST STATUS        SUCCESS CNT        FAIL CNT        AGE
foo         Succeeded            4                  0               1h
bar         Failed               1                  3               1h

Related Reading
https://book.kubebuilder.io/reference/generating-crd.html#additional-printer-columns

https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/#additional-printer-columns
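
A sketch of the extra printer columns as an apiextensions.k8s.io/v1 CRD fragment (the JSONPaths are assumptions based on the status fields shown earlier):

additionalPrinterColumns:
- name: LATEST STATUS
  type: string
  jsonPath: .status.status              # assumed status field
- name: SUCCESS CNT
  type: integer
  jsonPath: .status.successCount        # assumed status field
- name: FAIL CNT
  type: integer
  jsonPath: .status.failedCount         # assumed status field
- name: AGE
  type: date
  jsonPath: .metadata.creationTimestamp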

Enable default PodGC strategy as OnPodCompletion in workflow

Is your feature request related to a problem? Please describe.
As the use cases for Monitor/Remedy grow, the number of pods created grows as well. If the TTL for pod deletion is not aggressive, there can be many pods left in the Completed state which still consume resources such as IPs (in EKS).

Describe the solution you'd like
Enable the default PodGC strategy as OnPodCompletion in the Active-Monitor workflow. https://argoproj.github.io/argo/fields/#podgc
This will help clean up the pods immediately after execution. The status of the pod execution is updated in the Argo workflow; as the Active-Monitor controller reads the status from the Argo workflow, we do not need the pod itself once it has executed. This will save resources.
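
In the generated workflow spec this would look roughly like:

spec:
  entrypoint: start
  podGC:
    strategy: OnPodCompletion     # delete workflow pods as soon as they complete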

Have you thought about contributing yourself?

Yes.

Release 0.4.0 - Active-Monitor

Release issue, predominantly for visibility purposes.

DRAFT CHANGELOG:

  • #47 - Controller-gen and kube-builder updates.
  • #7 - Consolidate Metric Endpoints for Active-Monitor
  • #49 - Update Status fields to include Total Healthcheck count
  • #23 - Update healthcheck spec and controller to support automatic "remediation"
  • #55 - Workflow creation ignores metadata information in the healthcheck spec
  • #54 - Active-Monitor should process custom resources in parallel

Enable/Disable flag in HealthCheckSpec

Is your feature request related to a problem? Please describe.

Having an Enable/Disable flag in HealthCheckSpec will provide the flexibility to stop monitoring an individual HealthCheck on a given cluster. Sometimes we might encounter a problematic situation on a cluster; with this flag in place we can instruct the controller not to process the health check on that cluster until the problem is addressed.

  • Noise alerts can be addressed instantly
  • A cluster upgrade operation can be avoided when the goal is just to disable a HealthCheck

Describe the solution you'd like
Under the HealthCheckSpec struct, we should add a field called EnableHealthCheck, set to true by default; if set to false, the controller reconciler shouldn't process the HealthCheck. This field should be read dynamically.

We will have to handle this under the process workflow method.
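
A sketch of how the proposed (hypothetical, not yet implemented) field might look in a spec:

spec:
  repeatAfterSec: 60
  enableHealthCheck: false        # proposed field; defaults to true, false pauses reconciliation of this CR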

Have you thought about contributing yourself?
Yes I would like to work on this solution.
