Sumo Logic collection solution for Kubernetes

License: Apache License 2.0

Shell 11.19% HCL 0.43% Makefile 4.27% Python 9.78% Smarty 16.82% Go 56.84% Jinja 0.23% Nix 0.44%

sumologic-kubernetes-collection's Introduction

sumologic-kubernetes-collection

This repo contains all the necessary resources to collect observability data from Kubernetes clusters and send it to Sumo Logic. Sumo Logic leverages CNCF supported technology including OpenTelemetry, Prometheus and Falco to collect logs, metrics and traces from Kubernetes clusters. The following diagram provides an overview of the collection process.

Installation

Detailed instructions are available in our Installation Guides.

Documentation

Sumo Logic Helm Chart Version

Supported versions

Below is a table with documentation for every supported minor release. EOL for the latest release will be six months after next minor release.

version	planned end of life date
v4.9	TBA
v4.8	2025-01-01
v4.7	2024-12-07
v4.6	2024-10-10
v4.5	2024-09-27
v4.4	2024-08-22
v4.3	2024-07-24
v3.19	2024-08-21

Unsupported versions

version	end of life date
v4.2	2024-06-13
v4.1	2024-05-27
v4.0	2024-05-03
v3.18	2024-04-20
v3.17	2024-04-20
v3.16	2024-04-20
v3.15	2024-04-18
v3.14	2024-03-18
v3.13	2024-03-01
v3.12	2024-02-21
v3.11	2024-02-07
v3.10	2024-01-28
v3.9	2024-01-06
v3.8	2023-12-14
v3.7	2023-11-22
v3.6	2023-11-11
v3.5	2023-11-04
v3.4	2023-10-14
v3.3	2023-09-27
v3.2	2023-09-01
v3.1	2023-08-16
v3.0	2023-08-09
v2.19	2023-07-20
v1.3	2021-07-14
v0.17	2020-11-21

Roadmap

Please refer to roadmap document.

License

This project is released under the Apache 2.0 License.

Contributing

Please refer to our Contributing documentation to get started.

Code Of Conduct

Please refer to our Code of Conduct.

sumologic-kubernetes-collection's People

Contributors

Stargazers

Watchers

Forkers

abhi-sumo-zz rvmiller89 funzie19 rickgong djsly yuriydee jpbelanger-mtl jrobersonaquent willthames hasselrot matthewdaltonfanduel frankreno kabamyqiu sachinar vakkur granular-ryanbonham evgo-vlopez pmaciolek rem14rem rplahn chunmk saicharan232 jc-benchmarkcorp spchin alexbklein91 breinero dbw0011 dbraab cmedley2 yunzhezyz spostek0 aj-garcia05 perlj victorcq bergerx severity1 ajithreddygeth sunysaurabh mohit077 blame19 zmotu kkruk-sumo adhira-deogade karthimohan unxmaal bogdansucila sekka1 managedkube danpop-chainguard takescoop onebadsanta kkujawa-sumo ag237 netflash brandon-rollins somasharath szpuni msuterski mohitbmehta jorgyp notsoluckycharm msnitowsky sandip750 fgshadden jhamohneesh mozz-lx ankitgoelcmu visu-svmx cwitthaus voidlily gizmo-rt hasantutac arvindiyengar psaia 4ragumurthy cbuto terrylillie sepulworld rghorpade-mdsol 814himanny venkatag08 frank-wang-xero andrewboyd-contsec poblahblahblah yashoza19 jineekr antpallen ajayk nairb e-jjj anphn-mtt nrsb-11 smark91 pszmytka-viacom cjones12 sheinat-ab pmalek-sumo kyleechols z-reed mpoirierpax8

sumologic-kubernetes-collection's Issues

Installation with helm example

In order to set the desired cluster metadata field value for both logs and metrics follow the below example when executing the setup script and helm install.

curl -s https://raw.githubusercontent.com/SumoLogic/sumologic-kubernetes-collection/master/deploy/docker/setup/setup.sh
| bash -s - -d false -y false -c <collector_name> -k <cluster_name> <api_endpoint> <access_id> <access_key>

helm install sumologic/sumologic --name collection --namespace sumologic --set prometheus-operator.prometheus.prometheusSpec.externalLabels.cluster=<cluster_name>

Make sure the two <cluster_name> values are the same.

kubelet https fix, command should be updated

In this section:

https://github.com/SumoLogic/sumologic-kubernetes-collection/blob/master/deploy/docs/Troubleshoot_Collection.md#2-disable-the-kubeletservicemonitorhttps-flag-in-the-prometheus-operator

The fix is to install the prometheus-operator we say "helm install stable/prometheus-operator" but that only applies if you are running prometheus-operator separately. We should add this command to just upgrade our release: helm upgrade collection sumologic/sumologic --namespace sumologic --reuse-values --set prometheus-operator.kubelet.serviceMonitor.https=false 

fluentd deployment prevents safe cluster-autoscaler scale in due to pos-files volume

Due to the use of a hostPath volume for pos-files in the fluentd deployment, cluster-autoscaler is unable to evict pods for safe rescheduling on another node without risk of duplicate ingestion.

I suspect it may be better to use a persistent volume claim that can follow the pod to a new node, though there may other considerations I'm missing.

Running Helm install the second time yields the CRD already exists error

The prometheus operator chart has a flag to cleanup the CRDs. We should enable that flag by default or provide a flag the user can use to not have the CRDs run documented, so that users don't have to dig into the Prometheus Operator chart documentation to figure out how to avoid this error.

helm install fails on prometheus-operator

helm install sumologic/sumologic is failing on prometheus-operator. Following the install instructions to install the collector via helm when no prometheus is installed. I run into this error

Error: apiVersion "monitoring.coreos.com/v1" in sumologic/charts/prometheus-operator/templates/prometheus/rules/kubernetes-system.yaml is not available

helm install sumologic/sumologic --name collection --namespace sumologic --set sumologic.endpoint=https://api.us2.sumologic.com/api/v1/ --set sumologic.accessId=XXXXXXXXXX --set sumologic.accessKey=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx --set prometheus-operator.prometheus.prometheusSpec.externalLabels.cluster="my-cluster" --set sumologic.clusterName="my-cluster"
Error: apiVersion "monitoring.coreos.com/v1" in sumologic/charts/prometheus-operator/templates/prometheus/rules/kubernetes-system.yaml is not available

https://github.com/SumoLogic/sumologic-kubernetes-collection/blob/master/deploy/docs/Installation_with_Helm.md#installation-steps

Helm fails to create CRDs for Prometheus. The workaround is in the prometheus-operator helm Chart here.

https://github.com/helm/charts/tree/master/stable/prometheus-operator#helm-fails-to-create-crds

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/podmonitor.crd.yaml

Problem in chart hook prevents installation

Our installation is currently fails due to the https://github.com/SumoLogic/sumologic-kubernetes-collection/blob/e61a3f5665cdd75af82befe9daf061f0f12fe1bd/deploy/helm/sumologic/templates/setup/setup-job.yaml.

Here is what we see in helm install:

# helm install sumologic/sumologic --name collection --namespace sumologic --set sumologic.endpoint=https://api.sumologic.com/api/v1/ --set sumologic.accessId={...cropped...} --set sumologic.accessKey={...cropped...} -f sumo_values.yaml 
Error: timed out waiting for the condition

Checking the events:

# kubectl describe job collection-sumologic-setup -n sumologic
...
Events:
  Type     Reason                Age   From            Message
  ----     ------                ----  ----            -------
  Normal   SuccessfulCreate      27m   job-controller  Created pod: collection-sumologic-setup-5hh47
  Normal   SuccessfulDelete      21m   job-controller  Deleted pod: collection-sumologic-setup-5hh47
  Warning  BackoffLimitExceeded  21m   job-controller  Job has reached the specified backoff limit

Available logs in the Job's Pod:

# kubectl logs collection-sumologic-setup-chxnl -n sumologic
Namespace 'sumologic' exists, skip creating.
Checking for secret 'sumologic'...
Creating collector 'kubernetes-{...cropped...}' for cluster kubernetes...
Failed to create collector:
{ "status" : 301, "id" : "{...cropped...}", "code" : "moved", "message" : "The requested resource SHOULD be accessed through returned URI in Location Header." }
#

Error: timed out waiting for the condition

When running the below command ran into an issue, where it is timing out with the error
Error: timed out waiting for the condition

helm install sumologic/sumologic --name collection --namespace sumologic --set sumologic.endpoint=https://api.us2.sumologic.com/api/v1/ --set sumologic.accessId=xxxxxxx--set sumologic.accessKey=xxxxxx--set prometheus-operator.prometheus.prometheusSpec.externalLabels.cluster="xxxxx" --set sumologic.clusterName="xxxx-sumo" --no-crd-hook

Can I exclude all systemd and kubelet logs when install via helm?

Hi, is there any way to exclude all logs (systemd, kubelet) except containers' logs?

DPM ingestion rate

How can you slow down the DPM ingestion rate?

We ran the one line deploy and our ingestion rate jumped 15,000 overnight.

https://github.com/SumoLogic/sumologic-kubernetes-collection/tree/master/deploy

Error: secrets "sumologic" not found when setupEnabled = false

The setup process is not idempotent and will fail if the collector already exists. As a result, we needed to set setupEnabled: false.

When this flag is set, the sumologic secret never gets set. The sumologic service pods fail with the error:

Error: secrets "sumologic" not found

Sumologic helm chart approach

Current Sumologic proposed solution is working fine, your implementation helm chart contains Falco, Prometheus, Fluentd and Filebeat.

But a lot of organisations is using Sumologic for log management (without metrics). This Kubernetes deployment approach is somehow forcing organisations to use tools which they dont need or have other equivalent for them (for example metrics are gathered by Sysdig or Datadog). Could you rethink this chart and release log_only chart or allow such customisation in existing chart which can be implemented without any additional stuff like Prometheus/Falco.

This is very important to some customers because they dont want duplicate or replace existing processes (for example for metrics gathering by other tool) or they simply don`t require such features like Falco. It will also remove some issues which are currently problematic with your chart (like Falco which is failing or require to modify base AMI).

I know that all those pieces which are in chart are creating your super APP in Sumologic (which is really great) but even if this will be cost to loose it it will be still beneficial to customers to have log_only chart.

Digging into chart I found that falco can be disabled by:

helm upgrade --namespace sumologic --set sumologic.accessId=*** --set sumologic.accessKey=***  --set sumologic.clusterName=eks-testenvironment --set falco.enabled=false  collection sumologic/sumologic

so looks that it is possible, this is not documented anywhere.

Bash syntax error in SumoLogic/sumologic-kubernetes-collection/master/deploy/kubernetes/setup.sh

line 87 of returned scripts is

if [ -z $NAMESPACE]; then

however, this is incorrect Bash.

whitespace is required beween variable name and the bracket for bash to tokenize the bracket.

Log Details

It will be really helpful to mention in the readme as what logs are we capturing like container logs, controller manager, kubelet etc.

Facilitating gitops for setup

We utilize git ops for our Kubernetes deployments and for security reasons we do not want to hold our accessId and accessKey in git. To Facilitate this would it be possible to utilize envFrom in the setup charts to be able to source those secrets from a kubernetes native secret

source_category_replace_dash is not being applied to systemd logs

After upgrading from the old stable/sumologic helm chart, we noticed that source_category_replace_dash isn't being honored for systemd logs.

Upon inspection, it looks like source_category_replace_dash is missing from both the kubelet and systemd filters: https://github.com/bboerst/sumologic-kubernetes-collection/blob/7cfcf16379e77d72bcc133880e6de8881cb804e8/deploy/helm/sumologic/conf/logs/logs.source.systemd.conf#L36

I opened #483 to fix this. If approved, this would really help our workflow.

fluentdLogLevel in values.yaml has no effect

Either fluentdLogLevel in the values yaml has no effect, or does not perform as described. Changing to values like "warn" or "error" do not actually appear to have any impact on log level.

This is desired because fluentd stdout is exceptionally chatty, particularly when there are warn messages, which in our case were being issued repeatedly in the tens-of-millions which is unhelpful and greatly made issues worse by unexpectedly and dramatically increasing log volume.

fluent bit config doesn't allow to easily provide a hostname field

When getting logs from fluent-bit

we can get
container or system logs
the container log will have a field node and hostname
while the systemd logs will have a field _HOSTNAME

we started augmenting the fluent bit input and there is no easy way to have the source (hostname) of the new scrapped logs from the nodes with a proper hostname field

It would be great if fluent-bit+fluentd could provide a generic solution to have a common fields to query logs based on node/hostname

Prometheus-Operator(Charts) comes with different record rules for K8S >= 1.14 which renders some dashboards inoperable

For Example

When using K8S >= 1.14, the prometheus-operator charts only installs the following rules

https://github.com/helm/charts/blob/master/stable/prometheus-operator/templates/prometheus/rules-1.14/node.rules.yaml

compared to

https://github.com/helm/charts/blob/master/stable/prometheus-operator/templates/prometheus/rules/node.rules.yaml

that is 27 missing rules just for the node...

The following remoteWrite entry contains lots of deprecated recordNames that are used in the different Kubernetes App Dashboards.

https://github.com/SumoLogic/sumologic-kubernetes-collection/blob/master/deploy/helm/sumologic/values.yaml#L355

Note that AKS is about to support N-2 starting in December and that 1.14/1.15 will be 2 of the three supported versions.

Azure/AKS#1235

Allow clobbering collector

To avoid the script failing when a user forgets to delete the collector and wants to re-use the same name, it would be good to add a flag to allow the collector to be clobbered.

Installation behind proxy fails

Our kubernetes cluster is on our corporate network, any http/https access to the internet has to go through a proxy. I can see no way to set a proxy to allow this to work. Attempting to install hangs on the pre-install setup

Please make it possible to specify an http/https proxy on install.

Requests and Limits removed from the new values file

In the previous version 0.14.0, but values file for 0.15.0 this was removed, is this intentional?

sumologic:
image:
repository: sumologic/kubernetes-fluentd
tag: 0.14.0
pullPolicy: IfNotPresent

nameOverride: ""

deployment:
nodeSelector: {}
tolerations: {}
replicaCount: 3
resources:
limits:
memory: 1Gi
cpu: 1
requests:
memory: 768Mi
cpu: 0.5

eventsDeployment:
nodeSelector: {}
tolerations: {}
resources:
limits:
memory: 256Mi
cpu: "100m"
requests:
memory: 256Mi
cpu: "100m"

Missing kubelet metrics troubleshooting step doesn't work on AKS.

Missing kubelet metrics troubleshooting step doesn't work on AKS, kops doesn't support AKS yet.
I used below commands for AKS and it worked:

kubectl -n sumologic get servicemonitor prometheus-operator-kubelet -o yaml | sed 's/https/http/' >> prometheus-operator-kubelet.yaml
kubectl apply -f prometheus-operator-kubelet.yaml

I think these should be added to the readme for AKS

Can we default the collection process to exclude Sumologic Logs

The suggestion that we should default to excluding the sumologic namespace as those are primarily sumo specific logs.

Replace prometheus-operator with a template value

"helm get values prometheus-operator > current-values.yaml"

Not everyone's release names are the same, so good to make sure we call out those can change.

Helm generates bad templates for the deployments and deployments-events object

running the following generates deployment yamls that are invalid

helm template sumologic-0.10.0.tgz --namespace sumologic --set sumologic.endpoint=https://test.com --set sumologic.accessId=1234ads --set sumologic.accessKey=123asdkljr --set falco.enabled=false --set prometheus-operator.enabled=false --set sumologic.setupEnabled=false

Both the second heritage and release are wrongly indented.

this causes helm to fail to install:

23:03 $ helm install sumologic/sumologic --name collection --namespace sumologic --set sumologic.endpoint=https://test.com --set sumologic.accessId=1234ads --set sumologic.accessKey=123asdkljr --set falco.enabled=false --set prometheus-operator.enabled=false --set sumologic.setupEnabled=false --debug --dry-run
[debug] Created tunnel using local port: '57442'

[debug] SERVER: "127.0.0.1:57442"

[debug] Original chart version: ""
[debug] Fetched sumologic/sumologic to /Users/sylvain_boily/.helm/cache/archive/sumologic-0.10.0.tgz

[debug] CHART PATH: /Users/sylvain_boily/.helm/cache/archive/sumologic-0.10.0.tgz

Error: error validating "": error validating data: [ValidationError(Deployment.spec.template): unknown field "heritage" in io.k8s.api.core.v1.PodTemplateSpec, ValidationError(Deployment.spec.template): unknown field "release" in io.k8s.api.core.v1.PodTemplateSpec]

yaml example

# Source: sumologic/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: release-name-sumologic
  labels:
    app: release-name-sumologic
    chart: "sumologic-0.10.0"
    release: "release-name"
    heritage: "Tiller"
spec:
  selector:
    matchLabels:
      app: release-name-sumologic
  replicas: 3
  template:
    metadata:
      labels:
        app: release-name-sumologic
        chart: "sumologic-0.10.0"
    release: "release-name"
    heritage: "Tiller"
    spec:

# Source: sumologic/templates/events-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: release-name-sumologic-events
  labels:
    app: release-name-sumologic-events
    chart: "sumologic-0.10.0"
    release: "release-name"
    heritage: "Tiller"
spec:
  selector:
    matchLabels:
      app: release-name-sumologic-events
  template:
    metadata:
      labels:
        app: release-name-sumologic-events
        chart: "sumologic-0.10.0"
    release: "release-name"
    heritage: "Tiller"
    spec:

23:04 $ helm version
Client: &version.Version{SemVer:"v2.14.1", GitCommit:"5270352a09c7e8b6e8c9593002a73535276507c0", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.0", GitCommit:"05811b84a3f93603dd6c2fcfe57944dfa7ab7fd0", GitTreeState:"clean"}

Having issues deploying to GKE - Falco fails

Installing the latest release of sumologic helm chart, it deploys the sources and pods but fails when trying falco.

kubernetes version: 1.14.7-gke.23
helm: v2.14.3

not sure what other info is needed to help

Runtime error: error opening device /host/dev/falco0. Make sure you have root credentials and that the falco-probe module is loaded.. Exiting.

Trying to download precompiled module from https://s3.amazonaws.com/download.draios.com/stable/sysdig-probe-binaries/falco-probe-0.17.1-x86_64-4.14.138%2B-3ed0774cb82bc5c7f3b6f5190c3b82ef.ko

Download failed, consider compiling your own falco-probe and loading it or getting in touch with the sysdig community

[Metrics] Check the Prometheus UI Troubleshooting step needs sumologic namespace

[Metrics] Check the Prometheus UI Troubleshooting step needs sumologic namespace as we are installing prometheus in sumologic namespace:

kubectl port-forward prometheus-prometheus-operator-prometheus-0 8080:9090

should be changed to:

kubectl port-forward prometheus-prometheus-operator-prometheus-0 8080:9090 -n sumologic

Provide example for how to override the `clusterLabel` in Prometheus Spec.

Given that it is really common to override the clusterLabel in our case, we should make an example that shows someone how to set the cluster label for both the metrics and logs streams through the helm script.

Missing kubelet metrics in AKS

Per this guide option 2 works when modifying the service monitor. However, using http vs https in insecure and need the ability to allow Prometheus to scrape these endpoints using https.

setup.sh silently fails when -k and -c are ommitted, but a value is passed

Example, notice the flag is missing but the value is there. The script silently fails when trying to create the collector:

curl -s https://raw.githubusercontent.com/SumoLogic/sumologic-kubernetes-collection/master/deploy/kubernetes/setup.sh \ | bash -s - someCollector someCluster <api_endpoint> <access_id> <access_key>

values.yaml missing "resources:" section for prometheus

There is no configuration available in the values.yaml for prometheus resource declaration such as

resources:
limits:
memory: 1Gi
cpu: 1
requests:
memory: 768Mi
cpu: 0.5

Backport #549 into release-v0.17

Deployment script deploy/kubernetes/setup.sh : kubectl describe uses substring match

In the script deploy/kubernetes/setup.sh
the $NAMESPACE is checked for existence by using ;
kubectl describe namespace $NAMESPACE

However this command does substring matches. I do not know of a method of convincing it not to do so. This is unfortunate as the script will then record the non-zero exit code and determine this a failure.

Cleanup steps for the script.sh

It would be great if there was documentation on how to cleanup the setup.sh items as well for any setting. Maybe the setup.sh could have a flag for cleanup?

Add prometheus pushgateway to the helm chart

At GoSpotCheck, we use the pushgateway to receive metrics from our Spark jobs from outside the cluster. The prometheus-operator chart does not include the pushgateway from some reason.

The stable pushgateway chart is compatible with the operator in that it can add an optional service monitor. So a dependency on that chart could be added to this.

Right now that isn't compatible with our needs because it doesn't allow annotations on the service, only the ingress (we expose the pushgateway via Ambassador which reads service annotations). I have a PR open to add annotations to the service on the chart, but no idea how long that will take to get merged. But that personal problem doesn't mean it's not the right solution overall here.

EDIT: oh hey 6 days ago Ambassador added support as an Ingress. So now just adding a dependency on the stable pushgateway chart would work fine.

Filesystem Metrics

Some customers might need Filesystem metrics. These metrics can either describe disks mounted to nodes or persistent volume claims. It would be great if there were some documented steps around how to get at these metrics.

Apparently Prometheus already collects kubelet_volume_* metrics. This might be something that you could add to the remoteWrite config section by adding it to one of the regex filters described here:
https://github.com/SumoLogic/sumologic-kubernetes-collection/blob/master/deploy/helm/sumologic/values.yaml#L422

You would just add |kubelet_volume_*

Setup.sh Script in the Helm instructions should include cluster and collection flag.

You should still ask the user to pass in the -k and the -c flags to set the cluster name. Otherwise, the user will not set the logs stream to having the same cluster name relative to the metrics stream.

Add way to add custom gems without having to maintain docker image.

One possible way to do this is via an init container. The user could provide a list of additional gems, the init container would come up and do a gemgem install on those.

installing with helm: --namespace not used in helm 3

In helm 3, --namespace is not supported and errors. It looks like helm picks up kubernetes namespace instead. Install instructions might need to be updated.

how to remove ADD_TIMESTAMP and ADD_STREAM from the application level

I'm running fluentd and fluentbit inside our cluster with this env enabled. I want to know if there is a way to disable this using an annotation from the application k8s deployment manifest. so far the only annotations i have are these:
annotations:
sumologic.com/sourceCategory: test-stage-application
sumologic.com/kubernetes_meta_reduce: "true"
sumologic.com/format: text

name: ADD_TIMESTAMP
value: "true"
- name: ADD_STREAM
value: "true"

Setting source category for events source does not get applied with Helm install

I am installing the sumo collector via helm and trying to set a custom source category for the events source. I have downloaded the values.yaml from https://raw.githubusercontent.com/SumoLogic/sumologic-kubernetes-collection/master/deploy/helm/sumologic/values.yaml and updated the section for sumologic.events.sourceCategory. I first tried setting it to %{sourceCategoryPrefix}/%{clusterName}/events and then to a fixed string and in both cases the source category on the logs is set to Http Input. The logs are also tagged with a source name of Http Input even though the source tag is set to events.

Curl command documentation updates

In the section "How to install if you have an existing Prometheus operator" it says to run a curl command put doesn't output it to a file.

Should be updated to the below:

curl -LJO https://raw.githubusercontent.com/SumoLogic/sumologic-kubernetes-collection/master/deploy/helm/sumologic/values.yaml

Helm chart Uninstall should use --purge

I believe if we just delete the release name will continue to be used. This means that I can't uninstall and then install the helm chart because I'll have to change the release name.

I propose that we change the uninstall step from:

helm delete collection

helm delete --purge collection

Collection setup fails with api secrets from an environment variable

Summary

When trying to install the Helm Chart via a custom values.yaml file, the setup pod script fails saying that it cannot find the access_id and access_secret keys.

Supplementing Info

Helm version: v3.0.2
Sumo helm chart version: 0.12.0
Kubectl version: v.17.0
K8s version: 1.14.9
Provider: 'EKS'

Setup

I wiped all old sumologic namespaces and old helm version

kubectl delete ns sumologic
kubectl create ns sumologic

I created the secrets manually. Here is the output:

$> kubectl describe secret -n sumologic sumologic-api-secret
Name:         sumologic-api-secret
Namespace:    sumologic
Labels:
Annotations:

Type:  Opaque

Data
====
access_id:      14 bytes
access_secret:  64 bytes

^ btw, I've tried both access_id and accessId formats, it results in the same exact error output.

I install via the Helm chart:
helm install collection sumologic/sumologic --namespace sumologic -f sumo_custom.yaml

My yaml has this snippet declared:

sumologic:
  ## Setup

  # If enabled, a pre-install hook will create Collector and Sources in Sumo Logic
  setupEnabled: true

  # If enabled, accessId and accessKey will be sourced from Secret Name given
  envFromSecret: sumologic-api-secret

  # Sumo access ID
  #accessId: ""

  # Sumo access key
  #accessKey: ""

  # Sumo API endpoint; Leave blank for automatic endpoint discovery and redirection
  # ref: https://help.sumologic.com/APIs/General-API-Information/Sumo-Logic-Endpoints-and-Firewall-Security
  endpoint: "https://api.us2.sumologic.com/api/v1/"

Logs and Error

This is the output of the collection-sumologic-setup pod:

kubernetes_namespace.sumologic_collection_namespace: Importing from ID "sumologic"...
kubernetes_namespace.sumologic_collection_namespace: Import prepared!
Prepared kubernetes_namespace for import
kubernetes_namespace.sumologic_collection_namespace: Refreshing state... [id=sumologic]
Error: Missing required argument
on /terraform/sumo-k8s.tf line 27, in provider "sumologic":
27: provider "sumologic" {}
The argument "access_key" is required, but no definition was found.
Error: Missing required argument
on /terraform/sumo-k8s.tf line 27, in provider "sumologic":
27: provider "sumologic" {}
The argument "access_id" is required, but no definition was found.
kubernetes_secret.sumologic_collection_secret: Importing from ID "sumologic/sumologic"...
kubernetes_secret.sumologic_collection_secret: Import prepared!
Prepared kubernetes_secret for import
kubernetes_secret.sumologic_collection_secret: Refreshing state... [id=sumologic/sumologic]
Error: Cannot import non-existent remote object
While attempting to import an existing object to
kubernetes_secret.sumologic_collection_secret, the provider detected that no
object exists with the given id. Only pre-existing objects can be imported;
check that the id is correct and that it is associated with the provider's
configured region or endpoint, or use "terraform apply" to create a new remote
object for this resource.
Error: Missing required argument
on /terraform/sumo-k8s.tf line 27, in provider "sumologic":
27: provider "sumologic" {}
The argument "access_id" is required, but no definition was found.
Error: Missing required argument
on /terraform/sumo-k8s.tf line 27, in provider "sumologic":
27: provider "sumologic" {}
The argument "access_key" is required, but no definition was found.
provider.sumologic.access_id
Enter a value:
provider.sumologic.access_key
Enter a value:
Error: sumologic provider: access_id should be set; access_key should be set;
on <input-prompt> line 1:
(source code not available)

No information given when release name or namespace are changed

When running the helm install, helm install sumologic/sumologic --name collection --namespace sumologic If you change the name or namespace and don't change the remote write and the fluentd endpoint then nothing will work. If someone changes the release name or namespace from the default options we should give some type of info to let them know it won't work without the proper updates to the values.yaml.

sumologic-collector-setup job fails with collector not found error.

curl -s https://raw.githubusercontent.com/SumoLogic/sumologic-kubernetes-collection/v0.13.0/deploy/kubernetes/setup-sumologic.yaml.tmpl | \
sed 's/\$NAMESPACE'"/sumologic/g" |
sed 's/\$SUMOLOGIC_ACCESSID'"/<SUMOLOGIC_ACCESSID>/g" | 
sed 's/\$SUMOLOGIC_ACCESSKEY'"/<SUMOLOGIC_ACCESSKEY>/g" | 
sed 's/\$COLLECTOR_NAME'"/collector/g" | 
sed 's/\$CLUSTER_NAME'"/<my-aks-cluster-name>/g" | 
tee setup-sumologic.yaml | 
kubectl -n sumologic apply -f -

I ran the above command with the appropriate values as shown above.
The job starts but quickly fails with the following error.

�[0m�[1m�[32mTerraform has been successfully initialized!�[0m�[32m�[0m
�[0m�[32m
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.�[0m
�[0m�[1msumologic_collector.collector: Importing from ID "collector"...�[0m
�[0m�[1m�[32msumologic_collector.collector: Import prepared!�[0m
�[0m�[32m  Prepared sumologic_collector for import�[0m
�[0m�[1msumologic_collector.collector: Refreshing state... [id=collector]�[0m
�[31m
�[1m�[31mError: �[0m�[0m�[1mcollector with name 'collector' does not exist�[0m

�[0m�[0m�[0m
�[0m�[1msumologic_http_source.default_metrics_source: Importing from ID "collector/(default-metrics)"...�[0m
�[31m
�[1m�[31mError: �[0m�[0m�[1mcollector with name 'collector' does not exist�[0m

�[0m�[0m�[0m
�[0m�[1msumologic_http_source.apiserver_metrics_source: Importing from ID "collector/apiserver-metrics"...�[0m

Based on this part of the documentation:

It will create Kubernetes resources in your environment and run a container that uses the Sumo Logic Terraform provider to create a Hosted Collector and multiple HTTP Sources in Sumo.

It is my understanding that this job is to supposed to create a controller with the name we have passed in as a parameter?
If this assumption is incorrect -- what is the recommended process for creating this controller?

I am currently trying the non-helm installation. But when using helm the job fails with the same error.

This lead me to believe the issue might might have been with the sumologic endpoint, https://api.sumologic.com/api, I have provided but upon curling to the end point does not return errors. [actually it does not return anything]

Might anyone be able to provide any hints as to what might be going wrong?

fluent-bit daemonset may not run on every cluster node

Right now the values for the fluent-bit daemonset contain the following tolerations:

tolerations:
  - key: node-role.kubernetes.io/master
    effect: NoSchedule

Since this pod is required to collect node local logs, I think most customers (us included) would like to run these pods on every worker even those that might have NoSchedule taints.

I think this can be achieved by a more inclusive toleration:

tolerations:
	- effect: NoSchedule
	   operator: Exists

which should mean that the fluent-bit pod can tolerate any NoScheule taint regardless of key.

Update helm install with helm upgrade --install

We need to have a helm upgrade command example if we want to change parameters in the values.yaml file.

Suggested example:
helm upgrade collection sumologic/sumologic --install --force --namespace sumologic --set sumologic.endpoint=https://api.us2.sumologic.com/api/v1/ --set sumologic.accessId=id --set sumologic.accessKey=key -f values.yaml

Log lines over 16KB are truncated

We have found that when a pod produces a single JSON log line over 16kb, the resulting log gets truncated in SumoLogic. We don't see the log being split up into multiple messages in Sumo, instead the first part of the log gets lost and we only see the second "half" of the log.

We've done a bit of research on the issue and found that Docker chunks messages larger than 16KB, so it's essentially up to the log processor/aggregator to recombine the chunked messages. Relevant issues we found on the subject:
kubernetes/kubernetes#52444
moby/moby#34855

One of the recommendations was to use the fluentd concat plugin https://github.com/fluent-plugins-nursery/fluent-plugin-concat . In the readme, there's an example for "Handle Docker logs splitted in several parts (using partial_message), and do not add new line between parts."

<filter>
  @type concat
  key message
  partial_key partial_message
  partial_value true
  separator ""
</filter>

We are considering testing this out to vet the solution, but we figured we would open an issue since this might come up for other users of the chart.

Instruction to overwrite prometheus operator config to add remote write url is dangerous

https://raw.githubusercontent.com/SumoLogic/sumologic-kubernetes-collection/v0.9.0/deploy/helm/prometheus-overrides.yaml

The file will disable alertmanager and grafana as well as remove any additionalServiceMonitors that the original installation might have created

alertmanager:
  enabled: false
grafana:
  enabled: false
  defaultDashboardsEnabled: false
prometheus:
  additionalServiceMonitors:
    - name: collection-sumologic

The user should be asked to update his prometheus-operator with a prometheus-override which only contains

prometheus:
  prometheusSpec:
    remoteWrite:

Since the Documentation is asking him

https://github.com/SumoLogic/sumologic-kubernetes-collection/tree/master/deploy#overwrite-prometheus-remote-write-configuration

If you have not already customized your remote write configuration, run the following to update the remote write configuration of the prometheus operator by installing with the prometheus overrides file we provide below.

sumologic / sumologic-kubernetes-collection Goto Github PK