openshift / cluster-logging-operator

Operator to support logging subsystem of OpenShift

License: Apache License 2.0

Go 86.22% Shell 12.42% Dockerfile 0.19% Makefile 0.86% Awk 0.04% Python 0.27%
logging fluentd vector logcollection

cluster-logging-operator's Introduction

Cluster Logging Operator

An operator to support OKD aggregated cluster logging. Cluster logging configuration information is found in the configuration documentation.

Overview

The CLO (Cluster Logging Operator) provides a set of APIs to control collection and forwarding of logs from all pods and nodes in a cluster. This includes application logs (from regular pods), infrastructure logs (from system pods and node logs), and audit logs (special node logs with legal/security implications).

The CLO does not collect or forward logs itself: it starts, configures, monitors and manages the components that do the work.

CLO currently uses:

  • Vector as collector/forwarder

  • Loki as store

  • OpenShift console for visualization.

(Fluentd, Elasticsearch, and Kibana are still supported for compatibility.)

The goal is to encapsulate those technologies behind APIs so that:

  1. The user has less to learn, and has a simpler experience to control logging.

  2. These technologies can be replaced in the future without affecting the user experience.

The CLO can also forward logs over multiple protocols, to multiple types of log stores, on- or off-cluster.

The CLO owns the following APIs:

  • ClusterLogging: Top level control of cluster-wide logging resources

  • ClusterLogForwarder: Configure forwarding of logs to external sources
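To make the ClusterLogForwarder API concrete, the sketch below writes out a minimal, hypothetical forwarder manifest. The output name, type, and URL are illustrative assumptions, not values from this repository; consult the configuration documentation for the authoritative schema.

```shell
# Write a minimal, hypothetical ClusterLogForwarder manifest to a file.
# The output name, type, and URL are illustrative assumptions only.
cat > /tmp/clf-example.yaml <<'EOF'
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
    - name: remote-elasticsearch
      type: elasticsearch
      url: https://es.example.com:9200
  pipelines:
    - name: forward-app-logs
      inputRefs:
        - application
      outputRefs:
        - remote-elasticsearch
EOF
# On a live cluster this would then be applied with:
#   oc apply -f /tmp/clf-example.yaml
```

The pipeline selects one of the log categories described above (application, infrastructure, or audit) via inputRefs and routes it to a named output.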

To install a released version of cluster logging, see the OpenShift documentation (e.g., OCP v4.5).

To experiment with or contribute to the development of cluster logging, see the hacking and review documentation.

To debug the cluster logging stack, see README.md

To find currently known Cluster Logging Operator issues with work-arounds, see the Troubleshooting guide.

cluster-logging-operator's People

Contributors

abrennan89, ahadas, ajaygupta978, alanconway, andreaskaris, blockloop, bparees, btaani, cahartma, clee2691, eranra, ewolinetz, jaormx, jcantrill, jlarriba, k-keiichi-rh, lukas-vlcek, nhosoi, openshift-ci[bot], openshift-merge-bot[bot], openshift-merge-robot, periklis, pmoogi-redhat, red-gv, richm, syedriko, vimalk78, vladmasarik, vparfonov, xperimental


cluster-logging-operator's Issues

Restructure 'all-in-one' as it's currently defined before 4.0 release

Overview

While working through:

  • standing up cluster-logging
  • configuration options
  • documentation
  • reviewing code

I fundamentally believe the approach we are taking to configure split clusters repeats the same problem we had with the deployer, Ansible, and now the operator. Prior to feature freeze for 4.0, we must re-evaluate the current CR, as it will become an API we will need to maintain for a long time going forward.

Issue

We currently treat the split scenario (apps to one cluster, infra to another) as a special case. The implementation depends on an annotation for which we introduce 'if' checks (i.e., the elasticsearch case) in multiple places. This is contrary to the advice we received several releases ago to consider how we might treat these cases as the same but different instances (e.g., the class and object metaphor). With regard to the application and operations Elasticsearch stacks (i.e., ES, Kibana, Curator), there is no difference between the two besides the name. By subtly altering how we represent these use cases in the CR, we can remove the special-case nature of the current design. This should simplify the code.

Proposal

This proposal is a variant of one of the alternates listed below. It would introduce an additional hierarchy to group stacks accordingly (allowing additional ones in the future if that makes sense) and configure message routing in the collector. This change would also allow us to treat clusters uniformly:

Clusters

apiVersion: "logging.openshift.io/v1alpha1"
kind: "ClusterLogging"
metadata:
  name: "cluster-logging"
spec:
  managementState: "Managed"
  stacks:
    - name: app
      type: elastic
      elastic:
        logStore:
          type: "elasticsearch"
          elasticsearch:
            dataReplication: "NoReplication"
        visualization:
          type: "kibana"
          kibana:
            replicas: 1
        curation:
          type: "curator"
          curator:
            schedule: "30 3 * * *"
    - name: infra
      type: elastic
      elastic:
        logStore:
          type: "elasticsearch"
          elasticsearch:
            dataReplication: "NoReplication"
        visualization:
          type: "kibana"
          kibana:
            replicas: 1
        curation:
          type: "curator"
          curator:
            schedule: "30 3 * * *"
...

One could further suggest an additional optimization: since we know the stacks[].type, we no longer need component types; we will ALWAYS have the same components in a given cluster type (e.g., Elasticsearch, Kibana, Curator).

apiVersion: "logging.openshift.io/v1alpha1"
kind: "ClusterLogging"
metadata:
  name: "cluster-logging"
spec:
  managementState: "Managed"
  stacks:
    - name: app
      type: elastic
      elastic:
        logStore:
          resources:
            request:
            limits:
          dataReplication: "NoReplication"
        visualization:
          resources:
            request:
            limits:
          replicas: 1
        curation:
          resources:
            request:
            limits:
          schedule: "30 3 * * *"
    - name: infra
      type: elastic
      elastic:
        logStore:
        visualization:
        curation:
          type: "curator"
          curator:
            schedule: "30 3 * * *"

What's in a name

Ideally, we would use the name either as the name for all dependent resources or as a suffix to the resources the operator creates (e.g., elasticsearch-infra). Alternatively, we might consider applying the suffix (as we do now) only when there are multiple cluster definitions. Additionally, we should consider supporting only the names apps and infra, since they have special meaning.

Collectors

Initially, message routing would require us to make some opinionated assumptions based on the deployed clusters:

  • Single cluster: all messages route here
  • Multiple clusters: app logs -> app, infra -> infra

In the future we could introduce a way to define where messages are routed, but that is intentionally absent here.
apiVersion: "logging.openshift.io/v1alpha1"
kind: "ClusterLogging"
metadata:
  name: "cluster-logging"
spec:
  collection:
    logCollection:
      type: "fluentd"
      fluentd:
        nodeSelector:
          logging-infra-fluentd: "true"

Alternates

Multiple CRs, one for each cluster

Ref: https://gist.github.com/jcantrill/4a9365170f32f72ed57c83f6bb566b4f#file-gistfile1-txt-L27

Cons

  • Requires the cluster admin to 'wire' logs from the collector to various destinations.
  • No inherent relation between multiple CRs/clusters in a single 'cluster logging' setup

Single CR with named sources

https://gist.github.com/chancez/6f326e68412dbe760aeffd2be7ea5adf

Cons

  • Introduces named clusters in a way that is limiting (e.g., infraLog, appLog)

cluster logging operator csv stuck on pending

Hi! I am trying to deploy the cluster logging operator with the manifests/4.2/cluster-logging.v4.2.0.clusterserviceversion.yaml file. First I create the OperatorGroups that it needs and the RBAC for the Elasticsearch operator, then deploy the Elasticsearch operator, also from a YAML file. In both cases the CSV is stuck on Pending and it reports "Service account does not exist".
When I deploy it via OperatorHub it deploys successfully, but I eventually want to deploy it in a disconnected environment.

Thanks

cluster logging operator pod can't use clusterlogging

hey, to make things clear this time -
I have an issue deploying the operator in my environment. First I'm testing it in a normal cluster, but in the end I'll need to deploy it on a disconnected one.

I followed the instructions from these walkthroughs:
https://github.com/operator-framework/getting-started
https://docs.openshift.com/container-platform/4.2/logging/cluster-logging-deploying.html

and everything goes well until the point where I deploy the cluster logging operator CSV:
https://github.com/openshift/cluster-logging-operator/blob/release-4.2/manifests/4.2/cluster-logging.v4.2.0.clusterserviceversion.yaml

where I receive the following message when I try it:
Failed to list *v1.ClusterLogging: clusterloggings.logging.openshift.io is forbidden: User "system:serviceaccount:openshift-logging:cluster-logging-operator" cannot list resource "clusterloggings" in API group "logging.openshift.io" at the cluster scope

make deploy-example fails on macOS

Issue

The command REMOTE_CLUSTER=true make deploy-example fails on macOS because the mktemp option passed as a parameter is illegal:

++ mktemp --tmpdir -d cluster-logging-operator-build-XXXXXXXXXX
mktemp: illegal option -- -
usage: mktemp [-d] [-q] [-t prefix] [-u] template ...
       mktemp [-d] [-q] [-u] -t prefix
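The failure comes from the GNU-only --tmpdir long option, which BSD/macOS mktemp rejects. A portable workaround (a sketch, not necessarily how the Makefile was eventually fixed) is to pass a full template path instead:

```shell
# Portable temp-dir creation: avoid the GNU-only --tmpdir long option and
# build the template path explicitly, honoring TMPDIR when set.
# Works with both GNU coreutils mktemp and BSD/macOS mktemp.
WORKDIR=$(mktemp -d "${TMPDIR:-/tmp}/cluster-logging-operator-build-XXXXXXXXXX")
echo "created $WORKDIR"
```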

ElasticSearch or Cluster-Logging Operator should extend the vm.max_map_count

The kernel parameter vm.max_map_count should be modified by the Cluster Logging Operator or the Elasticsearch Operator in order to allow an Elasticsearch instance to deploy correctly into the target project.

# Should be vm.max_map_count=262144
sysctl -n vm.max_map_count                          
65530
  • MC Patch
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-sysctl-elastic
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          # vm.max_map_count=262144
          source: data:text/plain;charset=utf-8;base64,dm0ubWF4X21hcF9jb3VudD0yNjIxNDQ=
        filesystem: root
        mode: 0644
        path: /etc/sysctl.d/99-elasticsearch.conf
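The base64 payload in the MachineConfig's data: URL can be decoded to confirm the sysctl line it writes:

```shell
# Decode the MachineConfig file contents to verify the sysctl setting.
printf '%s' 'dm0ubWF4X21hcF9jb3VudD0yNjIxNDQ=' | base64 -d
# prints: vm.max_map_count=262144
```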

Use existing ES cluster

Hi,

I'm using an Elasticsearch cluster for a search project, managed by the Elastic operator from elastic.co. Now I want to turn on cluster logging, but it seems it will try to create another cluster altogether.

Given the heavy resource requirements of ES clusters, how can I leverage my existing one instead of letting this operator create a new one? I can't seem to find anything in the ClusterLogging CR to do that, at least in the 4.2 version.

Thanks

CLO should not allow specification of the number of ES deployment replicas

We should not allow users to specify the number of ES pod replicas, since we know our model is one deployment per ES node. If this value represents 'ES nodes', then we should change the name accordingly. Furthermore, if this represents 'nodes', we should consider removing this field (or defaulting it if it doesn't exist), since it is possible to go into the unmanaged state, and we can default the number of ES nodes to '3'. Recall our target is 99% of installations on an AWS OpenShift installation, which should have enough infra and worker nodes to support logging.

Make /usr/share/logging/ location customizable

As of now, the location of the folder /usr/share/logging/ seems to be hardcoded in the Dockerfile, see:

RUN mkdir -p /usr/share/logging/

When running the CLO locally (e.g., during development), this location is expected to contain a couple of files; some examples are:

ERRO[0030] Unable to read file to get contents: open /usr/share/logging/curator/curator-actions.yaml: no such file or directory 
ERRO[0030] Unable to read file to get contents: open /usr/share/logging/curator/curator5-config.yaml: no such file or directory 
ERRO[0030] Unable to read file to get contents: open /usr/share/logging/curator/curator-config.yaml: no such file or directory

This can be a challenge for Apple users, as this location cannot be modified on macOS. See here or here.

Would it make sense to make this location customizable? Can we think of any downsides?
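One low-risk way to make the location customizable is an environment variable with a fallback to the current default, sketched below in shell. The variable name LOGGING_SHARE_DIR is an assumption for illustration, not an existing CLO option:

```shell
# Hypothetical override: use the hardcoded default unless the (assumed)
# LOGGING_SHARE_DIR environment variable is set by the developer.
LOGGING_SHARE_DIR="${LOGGING_SHARE_DIR:-/usr/share/logging}"
echo "curator config: ${LOGGING_SHARE_DIR}/curator/curator-config.yaml"
```

During development on macOS one could then point the variable at a writable checkout directory instead of /usr/share/logging/.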

Errors using the deploy-example goal

I'm trying to get cluster logging working on a local instance of minishift (or cluster up) and am facing some issues.

First issue

The first issue is that ELASTICSEARCH_OP_REPO seems to need to be set explicitly; otherwise I see this error:

+ popd
~/tmp/wildfly-efk/cluster-logging-operator
+ CREATE_ES_SECRET=false
+ NAMESPACE=openshift-logging
+ make -C hack/../../elasticsearch-operator deploy-setup
make[1]: *** hack/../../elasticsearch-operator: No such file or directory.  Stop.
make: *** [Makefile:91: deploy-setup] Error 2

Second Issue

The second issue is that the vendor/github.com/openshift/elasticsearch-operator/hack/deploy-setup.sh script has a line where it points to an invalid directory:

pushd vendor/github.com/coreos/prometheus-operator/example/prometheus-operator-crd
  for file in prometheusrule.crd.yaml servicemonitor.crd.yaml; do 
    oc create -n ${NAMESPACE} -f ${file} ||:
  done
popd

The directory vendor/github.com/coreos/prometheus-operator/example/prometheus-operator-crd does not exist.

Third Issue

This one could actually just be the environment I'm attempting to use: I'm using OpenShift 3, and it looks like this targets OpenShift 4. Anyway, the error is:

--> FROM registry.svc.ci.openshift.org/openshift/origin-v4.0:base as 1
--> RUN INSTALL_PKGS="       openssl       " &&     yum install -y $INSTALL_PKGS &&     rpm -V $INSTALL_PKGS &&     yum clean all &&     mkdir /tmp/_working_dir &&     chmod og+w /tmp/_working_dir
Loaded plugins: ovl, product-id, search-disabled-repos, subscription-manager
This system is not receiving updates. You can use subscription-manager on the host to register and assign subscriptions.
http://base-4-0.ocp.svc/rhel-fast-datapath/repodata/repomd.xml: [Errno 14] curl#6 - "Could not resolve host: base-4-0.ocp.svc; Unknown error"
Trying other mirror.


 One of the configured repositories failed (rhel-fast-datapath),
 and yum doesn't have enough cached data to continue. At this point the only
 safe thing yum can do is fail. There are a few ways to work "fix" this:

     1. Contact the upstream for the repository and get them to fix the problem.

     2. Reconfigure the baseurl/etc. for the repository, to point to a working
        upstream. This is most often useful if you are using a newer
        distribution release than is supported by the repository (and the
        packages for the previous distribution release still work).

     3. Run the command with the repository temporarily disabled
            yum --disablerepo=rhel-fast-datapath ...

     4. Disable the repository permanently, so yum won't use it by default. Yum
        will then just ignore the repository until you permanently enable it
        again or use --enablerepo for temporary usage:

            yum-config-manager --disable rhel-fast-datapath
        or
            subscription-manager repos --disable=rhel-fast-datapath

     5. Configure the failing repository to be skipped, if it is unavailable.
        Note that yum will try to contact the repo. when it runs most commands,
        so will have to try and fail each time (and thus. yum will be be much
        slower). If it is a very temporary problem though, this is often a nice
        compromise:

            yum-config-manager --save --setopt=rhel-fast-datapath.skip_if_unavailable=true

failure: repodata/repomd.xml from rhel-fast-datapath: [Errno 256] No more mirrors to try.
http://base-4-0.ocp.svc/rhel-fast-datapath/repodata/repomd.xml: [Errno 14] curl#6 - "Could not resolve host: base-4-0.ocp.svc; Unknown error"
running 'INSTALL_PKGS="       openssl       " &&     yum install -y $INSTALL_PKGS &&     rpm -V $INSTALL_PKGS &&     yum clean all &&     mkdir /tmp/_working_dir &&     chmod og+w /tmp/_working_dir' failed with exit code 1
make: *** [Makefile:75: image] Error 1

Permission problem when allow to run as anyuid

Quick install of Openshift 4.5 on AWS
Use gp2 as default storage class
Install cluster-logging and elasticsearch operators
Add scc to group:

 oc adm policy add-scc-to-group anyuid system:authenticated

Deploy a default instance as described on Openshift 4.5 documentation.

Then the Elasticsearch pods throw an error while copying files:

[2020-07-23 06:59:28,823][INFO ][container.run            ] Begin Elasticsearch startup script
[2020-07-23 06:59:28,826][INFO ][container.run            ] Comparing the specified RAM to the maximum recommended for Elasticsearch...
[2020-07-23 06:59:28,827][INFO ][container.run            ] Inspecting the maximum RAM available...
[2020-07-23 06:59:28,828][INFO ][container.run            ] ES_JAVA_OPTS: ' -Xms8192m -Xmx8192m'
[2020-07-23 06:59:28,829][INFO ][container.run            ] Copying certs from /etc/openshift/elasticsearch/secret to /etc/elasticsearch//secret
[2020-07-23 06:59:28,834][INFO ][container.run            ] Building required jks files and truststore
Importing keystore /etc/elasticsearch//secret/admin.p12 to /etc/elasticsearch//secret/admin.jks...
Entry for alias 1 successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/admin.jks -destkeystore /etc/elasticsearch//secret/admin.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/admin.jks -destkeystore /etc/elasticsearch//secret/admin.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/admin.jks -destkeystore /etc/elasticsearch//secret/admin.jks -deststoretype pkcs12".
Importing keystore /etc/elasticsearch//secret/elasticsearch.p12 to /etc/elasticsearch//secret/elasticsearch.jks...
Entry for alias 1 successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/elasticsearch.jks -destkeystore /etc/elasticsearch//secret/elasticsearch.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/elasticsearch.jks -destkeystore /etc/elasticsearch//secret/elasticsearch.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/elasticsearch.jks -destkeystore /etc/elasticsearch//secret/elasticsearch.jks -deststoretype pkcs12".
Importing keystore /etc/elasticsearch//secret/logging-es.p12 to /etc/elasticsearch//secret/logging-es.jks...
Entry for alias 1 successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/logging-es.jks -destkeystore /etc/elasticsearch//secret/logging-es.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/logging-es.jks -destkeystore /etc/elasticsearch//secret/logging-es.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/logging-es.jks -destkeystore /etc/elasticsearch//secret/logging-es.jks -deststoretype pkcs12".
Certificate was added to keystore
Certificate was added to keystore
cp: cannot create regular file '/etc/elasticsearch/elasticsearch.yml': Permission denied
cp: cannot create regular file '/etc/elasticsearch/log4j2.properties': Permission denied

Proxyconfig-controller fails to delete logcollector service account

This may not be the core issue; however, I see the following error in the cluster-logging-operator logs repeat several times during startup. It does eventually stop appearing, but maybe this is something we can handle more gracefully with a get check first?

{"level":"error","ts":1582132106.5039325,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"proxyconfig-controller","request":"/cluster","error":"Unable to create or update collection for \"\": Failure deleting logcollector service account: an empty namespace may not be set when a resource name is provided","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/cluster-logging-operator/_output/src/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/openshift/cluster-logging-operator/_output/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:217\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/openshift/cluster-logging-operator/_output/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/openshift/cluster-logging-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/openshift/cluster-logging-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/openshift/cluster-logging-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}

Does CLO managed state control EO managed state?

Working through config options and the spec:

  • What happens when I set the CLO to Unmanaged?
  • What happens when I set the EO to Unmanaged but the CLO is Managed?

We need to sort this out and describe it in our documentation.

fluentd pods scheduled only on worker nodes

I deployed the logging stack following the README.md file, and the operators and pods were up and running.

I no longer see a nodeSelector in the DaemonSet configuration, but the fluentd pods are up and running only on worker nodes, not on master nodes.

[root@localhost ocp1]# oc get pods -owide -l component=fluentd
NAME            READY     STATUS    RESTARTS   AGE       IP            NODE
fluentd-4xxgl   1/1       Running   0          36m       10.131.x.xx   ip-10-0-xxx-xx.us-east-2.compute.internal
fluentd-r56fp   1/1       Running   0          36m       10.128.x.xx   ip-10-0-143-93.us-east-2.compute.internal
fluentd-swrqv   1/1       Running   0          36m       10.129.x.xx   ip-10-0-175-200.us-east-2.compute.internal
[root@localhost ocp1]#

The cluster comprises 6 nodes: 3 masters and 3 workers.

Could someone check whether this is expected behavior, or am I missing something?

Create custom elasticsearch index in fluentd configuration

Can you detail the steps to create a custom Elasticsearch index in the fluentd configuration?

I tried the following, but it says it cannot create any new index other than project_full, operations_full..

<elasticsearch_index_name>
     enabled "true"
     tag "myapp*"
     name_type custom_index
</elasticsearch_index_name>

Thanks for any help in advance

Set max_map_count when using minishift

Follow-up from #41: when deploying the example and the OpenShift cluster is set up via minishift, the script should run the following:

minishift ssh -- sudo sysctl -w vm.max_map_count=262144

500 Internal Error Additional Trusted CA Bundle missing

Hello,

Internally signed TLS trusted CA bundles are not being copied to the kibana and kibana-proxy pods.

I had to set the operator to Unmanaged and create the ConfigMap with the additional trusted CA bundle, named "trusted-ca-bundle".

Then modified the deployment.

$ diff -U5 deployment_kibana.yaml.old deployment_kibana.yaml
--- deployment_kibana.yaml.old  2019-10-25 09:01:02.446738600 -0400
+++ deployment_kibana.yaml      2019-10-25 09:00:10.815299500 -0400
@@ -87,10 +87,13 @@
         terminationMessagePolicy: File
         volumeMounts:
         - mountPath: /etc/kibana/keys
           name: kibana
           readOnly: true
+        - mountPath: /etc/pki/ca-trust/extracted/pem
+          name: trusted-ca-bundle
+          readOnly: true
       - args:
         - --upstream-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
         - --https-address=:3000
         - -provider=openshift
         - -client-id=system:serviceaccount:openshift-logging:kibana
@@ -128,10 +131,13 @@
         terminationMessagePolicy: File
         volumeMounts:
         - mountPath: /secret
           name: kibana-proxy
           readOnly: true
+        - mountPath: /etc/pki/ca-trust/extracted/pem
+          name: trusted-ca-bundle
+          readOnly: true
       dnsPolicy: ClusterFirst
       nodeSelector:
         kubernetes.io/os: linux
         node-role.kubernetes.io/infra: ""
       restartPolicy: Always
@@ -147,10 +153,17 @@
           secretName: kibana
       - name: kibana-proxy
         secret:
           defaultMode: 420
           secretName: kibana-proxy
+      - configMap:
+          defaultMode: 420
+          items:
+          - key: ca-bundle.crt
+            path: tls-ca-bundle.pem
+          name: trusted-ca-bundle
+        name: trusted-ca-bundle
 status:
   availableReplicas: 1
   conditions:
   - lastTransitionTime: "2019-10-24T18:21:02Z"
     lastUpdateTime: "2019-10-24T18:21:02Z"

Failure creating Elasticsearch CR

Issue

Cluster logging can't be deployed on OCP 4. The Cluster Logging Operator reports this error:

time="2019-04-09T16:14:18Z" level=error msg="error syncing key (openshift-logging/instance): Unable to create or update logstore for \"instance\": Failure creating Elasticsearch CR: failed to get resource client: failed to get resource type: failed to get the resource REST mapping for GroupVersionKind(logging.openshift.io/v1, Kind=Elasticsearch): no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1\""
time="2019-04-09T16:14:23Z" level=error msg="error syncing key (openshift-logging/instance): Unable to create or update logstore for \"instance\": Failure creating Elasticsearch CR: failed to get resource client: failed to get resource type: failed to get the resource REST mapping for GroupVersionKind(logging.openshift.io/v1, Kind=Elasticsearch): no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1\""

after we applied the following CR in the namespace openshift-logging:

oc create -n openshift-logging -f hack/cr.yaml
where cr.yaml is:

apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    elasticsearch:
      nodeCount: 1
      storage: {}
      redundancyPolicy: "ZeroRedundancy"
  visualization:
    type: "kibana"
    kibana:
      replicas: 1
  curation:
    type: "curator"
    curator:
      schedule: "30 3,9,15,21 * * *"
  collection:
    logs:
      type: "fluentd"
      fluentd: {}
-->
clusterlogging.logging.openshift.io/instance created

Info

Red Hat OpenShift Container Platform
OpenShift is Red Hat's container application platform that allows developers to quickly develop, host, and scale applications in a cloud environment.

Cluster ID
a3acddcb-6eff-41d5-b10c-eafb2b905d11
Kubernetes Master Version
v1.12.4+0ba401e

Cluster Logging Operator deployed : 4.1.0 (preview)

ImageStreams are not propagated to the deployment

There looks to be a discrepancy between what's in the manifest:
https://github.com/openshift/cluster-logging-operator/blob/master/manifests/05-deployment.yaml#L29-L42

and what gets rolled out.

    spec:
      containers:
      - command:
        - cluster-logging-operator
        env:
        - name: WATCH_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: OPERATOR_NAME
          value: cluster-logging-operator
        - name: ELASTICSEARCH_IMAGE
          value: quay.io/openshift-release-dev/ocp-v4.0-art-dev:v4.0.0-0.93.0.0-ose-elasticsearch-operator
        image: quay.io/openshift/cluster-logging-operator:latest
        imagePullPolicy: IfNotPresent

Note: I manually added ELASTICSEARCH_IMAGE to try to update the ES image; it didn't exist before that.
This will prevent us from deploying the correct images during release.

Also, it doesn't appear the value was added to the ES operator deployment:

  - command:
    - elasticsearch-operator
    env:
    - name: WATCH_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: OPERATOR_NAME
      value: elasticsearch-operator
    image: quay.io/openshift/elasticsearch-operator:latest

Use RetryOnConflict for updates to existing objects #28

First take a look at https://github.com/kubernetes/client-go/blob/master/examples/create-update-delete-deployment/main.go#L102 which describes why RetryOnConflict is needed.

There are several patterns in our code like this:

  client.Get(object)
  object.somefield = "new value"
  client.Update(object)

The problem is that the object can be updated by another client between the Get and the Update, in which case the Update will return a Conflict error. Instead, we need to wrap all such places in our code with RetryOnConflict.

I've already seen cases running e2e tests where we get errors from conflicts.

Change repository permissions

I had a badly set up .git/config and by mistake pushed to upstream/master.

Can we change the repo settings so that direct pushes are always declined?

how to set MERGE_JSON_LOG for fluentd

Hi,
I'm trying to set MERGE_JSON_LOG=true while preserving ManagementState == Managed (unlike the solution from the docs).

Target (DaemonSet autogenerated by the cluster-logging-operator)

kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: fluentd
spec:
  template:
    spec:
      containers:
        - name: fluentd
          resources: {}
          env:
            - name: MERGE_JSON_LOG
              value: 'true'

Current State
I've got a clusterlogging 4.3.1 operator up and running, and every manual manipulation of the DaemonSet is instantly reverted.
generators/forwarding/fluentd/templates.go#L167 looks like what I want to do, but I can't figure out how to pass the json_fields/ENV correctly.

My CL-instance.yaml looks like this:

apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  managementState: Managed
  collection:
    logs:
      type: fluentd
      fluentd:
        merge_json_log: true          # <----  that's what I'd like to do

Any hint appreciated!
Best, Nick

Switching the logCollection type from fluentd to rsyslog does not delete fluentd pods.

How to reproduce the issue.

Original pods:

NAME                                                  READY     STATUS    RESTARTS   AGE
cluster-logging-operator-5b8f47b598-lh7zp             1/1       Running   0          45m
elasticsearch-clientdatamaster-0-1-84d764899d-qqkqv   1/1       Running   0          44m
elasticsearch-operator-649f9b69b5-6wkkj               1/1       Running   0          45m
fluentd-82pp7                                         1/1       Running   0          44m
fluentd-sdlth                                         1/1       Running   0          44m
kibana-675b587dfd-l5s5j                               2/2       Running   0          44m

oc edit clusterlogging example - change the spec.collection.logCollection.type to "rsyslog"

Rsyslog pods are created, but still the fluentd pods are running.

NAME                                                  READY     STATUS    RESTARTS   AGE
cluster-logging-operator-5b8f47b598-lh7zp             1/1       Running   0          46m
elasticsearch-clientdatamaster-0-1-84d764899d-qqkqv   1/1       Running   0          45m
elasticsearch-operator-649f9b69b5-6wkkj               1/1       Running   0          45m
fluentd-82pp7                                         1/1       Running   0          45m
fluentd-sdlth                                         1/1       Running   0          45m
kibana-675b587dfd-l5s5j                               2/2       Running   0          45m
rsyslog-92hz4                                         1/1       Running   0          42s
rsyslog-hcpcc                                         1/1       Running   0          42s

Assign a priority class to pods

Priority classes docs:
https://docs.openshift.com/container-platform/3.11/admin_guide/scheduling/priority_preemption.html#admin-guide-priority-preemption-priority-class

Example: https://github.com/openshift/cluster-monitoring-operator/search?q=priority&unscoped_q=priority

Notes: The pre-configured system priority classes (system-node-critical and system-cluster-critical) can only be assigned to pods in kube-system or openshift-* namespaces. Most likely, core operators and their pods should be assigned system-cluster-critical. Please do not assign system-node-critical (the highest priority) unless you are really sure about it.
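
Assigning the class is a one-line addition to the pod template. An illustrative snippet for the collector daemonset, following the note above (class choice is a suggestion, not a decision):

```yaml
spec:
  template:
    spec:
      # Allowed here because the pods run in an openshift-* namespace.
      priorityClassName: system-cluster-critical
      containers:
      - name: fluentd
        image: docker.io/openshift/origin-logging-fluentd:latest
```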

Wrong indentation for the field - metadata within hack/cr-aws.yaml

Issue

The METADATA field is not well positioned within the file hack/cr-aws.yaml

apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
  metadata:
  name: "instance"

Should be

apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"

Feature Request: Allow list of ElasticSearch endpoints in Log Forwarding API

Hello,

In OpenShift 4.5 I would like to send the logs collected by fluentd to multiple Elasticsearch endpoints. My ELK administrator sent me a list of three ES nodes to which fluentd is supposed to send the logs. There is no load balancer in front of ES.

In the LogForwarding CR you can only specify one FQDN for each endpoint:

oc describe crd logforwardings.logging.openshift.io
[..]
Schema:
  openAPIV3Schema:
    Properties:
      Spec:
        Description: Specification for logforwarding of messages
        Properties:
          Outputs:
            Description: Destinations for log messages
            Items:
              Description: An individual entry for a specific output
              Properties:
                Endpoint:
                  Description: the url to the service defined by this output
               => Type:        string
                Name:
                  Description: The name of the output
                  Type:        string
[..]

I would like to specify a list of hosts and have fluentd choose one of them.
Fluentd accepts a list of hosts (https://docs.fluentd.org/output/elasticsearch#hosts-optional)

Thanks, Thomas
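
For reference, with the plugin's hosts option the output config would look roughly like this (host names are placeholders; other required settings omitted):

```
<match **>
  @type elasticsearch
  # Comma-separated list; the plugin distributes requests across live hosts.
  hosts es-node1.example.com:9200,es-node2.example.com:9200,es-node3.example.com:9200
  scheme https
</match>
```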

Reconsider how collector nodeSelector is defined

The recent submission of a PR to support rsyslog has made me question why we have not taken the opportunity in 4.0 to possibly change our node selector for the collector. Consider the following;
Ref: https://github.com/openshift/cluster-logging-operator/blob/master/hack/cr.yaml

...
  collection:
    logCollection:
      type: "fluentd"
      fluentd:
        nodeSelector:
          logging-infra-fluentd: "true"
      rsyslog:
        nodeSelector:
          logging-infra-rsyslog: "true"

Wouldn't it be more reasonable to have a single label which takes a collector:

logging-infra-collector: "fluentd | rsyslog"

Additionally, moving to a 'well-known-label'='known value' pattern would allow us to remove the nodeSelector block altogether. I do not see a need for a customer to have to modify the selector.

Either way allows us to split collectors across nodes, but the latter would allow us to add additional collectors without requiring an additional label.

  • Why did we decide to continue to use a boolean to identify which nodes will receive the collector?
  • Additionally, node labeling is now outside the responsibility of this operator correct? What is the workflow to get alternate collectors landed on nodes?
  • One could additionally argue that eventrouter is a specialized collector. Is there a need to support a variant of landing multiple collectors on the same node?

Invalid fields in Logging Cluster Components status

The Logging Cluster is installed properly and I am able to see the logs. But when I try to see the statuses of the different components, I see the messages shown in the screen capture below. The screen capture is from OpenShift 4.3, but I saw the same in 4.5 as well.

After digging into it, I saw that the paths being accessed in the CSV to check the status are incorrect in this file.
The below paths are arrays:

- visualization.kibanaStatus
- logStore.elasticsearchStatus

But they are being referred to as if they were single objects:

- visualization.kibanaStatus.pods
- logStore.elasticsearchStatus.pods.client
- logStore.elasticsearchStatus.pods.data
- logStore.elasticsearchStatus.pods.master

Rather, they should have been as below:

- visualization.kibanaStatus[0].pods
- logStore.elasticsearchStatus[0].pods.client
- logStore.elasticsearchStatus[0].pods.data
- logStore.elasticsearchStatus[0].pods.master
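
In the CSV, that would translate to statusDescriptors along these lines (descriptions and surrounding fields are illustrative, not the file's actual contents):

```yaml
statusDescriptors:
- description: The status for each of the Kibana pods
  path: visualization.kibanaStatus[0].pods
- description: The status for each of the Elasticsearch client pods
  path: logStore.elasticsearchStatus[0].pods.client
- description: The status for each of the Elasticsearch data pods
  path: logStore.elasticsearchStatus[0].pods.data
- description: The status for each of the Elasticsearch master pods
  path: logStore.elasticsearchStatus[0].pods.master
```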

I saw this issue in 4.2, 4.3, 4.4 and 4.5.
It's just a small change that needs to be done.
I plan to create a PR for 4.5.

The screen capture mentioned above: Invalid_field_clusterlogging

PersistentVolumeClaim does not seem like a viable storage choice

Based on the description in [1], repeated here, this does not seem like a viable choice to represent the ES node storage. The description indicates you can only specify a single, existing PVC, which is not usable for anything but a single-node ES cluster.

 78     // PersistentVolumeClaim will NOT try to regenerate PVC, it will be used
 79     // as-is. You may want to use it instead of VolumeClaimTemplate in case
 80     // you already have bounded PersistentVolumeClaims you want to use, and the names
 81     // of these PersistentVolumeClaims doesn't follow the naming convention.
 82     PersistentVolumeClaim *v1.PersistentVolumeClaimVolumeSource `json:"persistentVolumeClaim,omitempty"`

Additionally, we provide no mechanism to specify or default a storage class. This seems like an issue, given that storageClass is the convenient way to define storage in a single representation of kind, size, etc.

[1] https://github.com/openshift/elasticsearch-operator/blob/master/pkg/apis/elasticsearch/v1alpha1/types.go#L78-L82
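
For contrast, a volumeClaimTemplate-style field would let the operator stamp out one PVC per ES node and carry a storage class. A hypothetical shape only — the actual CRD field names may differ:

```yaml
  nodeSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: gp2   # defaultable; 'gp2' is just an example
          accessModes: [ReadWriteOnce]
          resources:
            requests:
              storage: 200Gi
```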

reconciliation resulting in certificate issue and causing elasticsearch unstable.

While using OpenShift 4.3.5 with Cluster Logging Operator 4.3.9-202003230345, after around 24 hours we start getting the below error related to certificates, and the Elasticsearch pods go down. It seems to be a bug in the operator, which is not able to reconcile or rotate the certificates for Elasticsearch and Kibana?

{"level":"error","ts":1586146160.3927138,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"logforwarding-controller","request":"openshift-logging/instance","error":"Unable to create or update certificates for "instance": Error running script: exit status 1","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/cluster-logging-operator/_output/src/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/openshift/cluster-logging-operator/_output/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:217\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/openshift/cluster-logging-operator/_output/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/openshift/cluster-logging-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/openshift/cluster-logging-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/openshift/cluster-logging-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}

Error deploying an instance of Cluster-Logging regarding ES SearchGuard

The error I've faced is with Elasticsearch and SearchGuard: SearchGuard cannot be initialized, the cluster stays RED, and it does not self-recover:

  • Context:
OCP 4.4
OCS 4.3
ELO 4.4
CLO 4.4
  • Fluentd
2020-01-28 09:12:02 +0000 [warn]: [retry_clo_default_output_es] Could not communicate to Elasticsearch, resetting connection and trying again. Connection refused - connect(2) for 172.30.166.250:9200 (Errno::ECONNREFUSED)
2020-01-28 09:12:02 +0000 [warn]: [retry_clo_default_output_es] Remaining retry: 14. Retry to communicate after 2 second(s).
2020-01-28 09:12:07 +0000 [warn]: [retry_clo_default_output_es] Could not communicate to Elasticsearch, resetting connection and trying again. Connection refused - connect(2) for 172.30.166.250:9200 (Errno::ECONNREFUSED)
2020-01-28 09:12:07 +0000 [warn]: [retry_clo_default_output_es] Remaining retry: 13. Retry to communicate after 4 second(s).
2020-01-28 09:12:16 +0000 [warn]: [retry_clo_default_output_es] Could not communicate to Elasticsearch, resetting connection and trying again. Connection refused - connect(2) for 172.30.166.250:9200 (Errno::ECONNREFUSED)
2020-01-28 09:12:16 +0000 [warn]: [retry_clo_default_output_es] Remaining retry: 12. Retry to communicate after 8 second(s).
2020-01-28 09:12:33 +0000 [warn]: [retry_clo_default_output_es] Could not communicate to Elasticsearch, resetting connection and trying again. Connection refused - connect(2) for 172.30.166.250:9200 (Errno::ECONNREFUSED)
2020-01-28 09:12:33 +0000 [warn]: [retry_clo_default_output_es] Remaining retry: 11. Retry to communicate after 16 second(s).
2020-01-28 09:13:05 +0000 [warn]: [retry_clo_default_output_es] Could not communicate to Elasticsearch, resetting connection and trying again. Connection refused - connect(2) for 172.30.166.250:9200 (Errno::ECONNREFUSED)
2020-01-28 09:13:05 +0000 [warn]: [retry_clo_default_output_es] Remaining retry: 10. Retry to communicate after 32 second(s).
2020-01-28 09:14:10 +0000 [warn]: [retry_clo_default_output_es] Could not communicate to Elasticsearch, resetting connection and trying again. Connection refused - connect(2) for 172.30.166.250:9200 (Errno::ECONNREFUSED)
2020-01-28 09:14:10 +0000 [warn]: [retry_clo_default_output_es] Remaining retry: 9. Retry to communicate after 64 second(s).
2020-01-28 09:14:15 +0000 [error]: unexpected error error_class=Elasticsearch::Transport::Transport::Errors::ServiceUnavailable error="[503] Search Guard not initialized (SG11). See https://github.com/floragunncom/search-guard-docs/blob/master/sgadmin.md"
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/elasticsearch-transport-7.4.0/lib/elasticsearch/transport/transport/base.rb:205:in `__raise_transport_error'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/elasticsearch-transport-7.4.0/lib/elasticsearch/transport/transport/base.rb:333:in `perform_request'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/elasticsearch-transport-7.4.0/lib/elasticsearch/transport/transport/http/faraday.rb:24:in `perform_request'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/elasticsearch-transport-7.4.0/lib/elasticsearch/transport/client.rb:152:in `perform_request'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/elasticsearch-api-7.4.0/lib/elasticsearch/api/actions/info.rb:19:in `info'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-3.7.1/lib/fluent/plugin/out_elasticsearch.rb:394:in `detect_es_major_version'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-3.7.1/lib/fluent/plugin/out_elasticsearch.rb:264:in `block in configure'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-3.7.1/lib/fluent/plugin/elasticsearch_index_template.rb:35:in `retry_operate'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-3.7.1/lib/fluent/plugin/out_elasticsearch.rb:263:in `configure'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin.rb:164:in `configure'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/multi_output.rb:74:in `block in configure'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/multi_output.rb:63:in `each'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/multi_output.rb:63:in `configure'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/out_copy.rb:36:in `configure'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin.rb:164:in `configure'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/agent.rb:130:in `add_match'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/agent.rb:72:in `block in configure'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/agent.rb:64:in `each'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/agent.rb:64:in `configure'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/label.rb:31:in `configure'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/root_agent.rb:147:in `block in configure'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/root_agent.rb:147:in `each'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/root_agent.rb:147:in `configure'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/engine.rb:131:in `configure'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/engine.rb:96:in `run_configure'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/supervisor.rb:812:in `run_configure'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/supervisor.rb:558:in `block in run_worker'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/supervisor.rb:741:in `main_process'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/supervisor.rb:554:in `run_worker'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/command/fluentd.rb:330:in `<top (required)>'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/share/rubygems/rubygems/core_ext/kernel_require.rb:59:in `require'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/share/rubygems/rubygems/core_ext/kernel_require.rb:59:in `require'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/bin/fluentd:8:in `<top (required)>'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/bin/fluentd:23:in `load'
  2020-01-28 09:14:15 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/bin/fluentd:23:in `<main>'

On the Elasticsearch side I see that the cluster is in RED state, but it appears as Running and Ready on OpenShift:

NAME                                            READY   STATUS             RESTARTS   AGE
cluster-logging-operator-667799d786-z4nkh       1/1     Running            0          40m
elasticsearch-cdm-nwjeo1ix-1-68855bd9b-qr6d4    2/2     Running            0          36m
elasticsearch-cdm-nwjeo1ix-2-5f5866dd47-hjwfg   2/2     Running            0          36m
elasticsearch-cdm-nwjeo1ix-3-7c75876bd4-7zpc6   2/2     Running            0          36m
fluentd-58g69                                   0/1     CrashLoopBackOff   7          13m
fluentd-c77vw                                   0/1     CrashLoopBackOff   7          13m
fluentd-ljwxz                                   0/1     CrashLoopBackOff   7          13m
fluentd-v7tgt                                   0/1     CrashLoopBackOff   7          13m
fluentd-v7tgt-debug                             1/1     Running            0          10m
kibana-69c75d9cd9-lcljp                         2/2     Running            0          36m

This happens the first time you deploy an instance; if you kill the fluentd pods, they end up in the same status. The real error is in Elasticsearch, which for some reason cannot initialize SearchGuard.

To work around the error, you just need to delete the ES pods; with time the cluster will reach the Green state:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
[2020-01-28 10:33:50,205][INFO ][container.run            ] Elasticsearch is ready and listening
/usr/share/elasticsearch/init ~
[2020-01-28 10:33:50,211][INFO ][container.run            ] Starting init script: 0001-jaeger
[2020-01-28 10:33:50,213][INFO ][container.run            ] Completed init script: 0001-jaeger
[2020-01-28 10:33:50,251][INFO ][container.run            ] Forcing the seeding of ACL documents
[2020-01-28 10:33:50,254][INFO ][container.run            ] Seeding the searchguard ACL index.  Will wait up to 604800 seconds.
[2020-01-28 10:33:50,286][INFO ][container.run            ] Seeding the searchguard ACL index.  Will wait up to 604800 seconds.
/etc/elasticsearch /usr/share/elasticsearch/init
Search Guard Admin v5
Will connect to localhost:9300 ... done
ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
Elasticsearch Version: 5.6.16
Search Guard Version: <unknown>
Contacting elasticsearch cluster 'elasticsearch' ...
Clustername: elasticsearch
Clusterstate: RED
Number of nodes: 3
Number of data nodes: 3
.searchguard index already exists, so we do not need to create one.
ERR: .searchguard index state is RED.
Populate config from /opt/app-root/src/sgconfig/
Will update 'config' with /opt/app-root/src/sgconfig/sg_config.yml
   SUCC: Configuration for 'config' created or updated
Will update 'roles' with /opt/app-root/src/sgconfig/sg_roles.yml
   SUCC: Configuration for 'roles' created or updated
Will update 'rolesmapping' with /opt/app-root/src/sgconfig/sg_roles_mapping.yml
   SUCC: Configuration for 'rolesmapping' created or updated
Will update 'internalusers' with /opt/app-root/src/sgconfig/sg_internal_users.yml
   SUCC: Configuration for 'internalusers' created or updated
Will update 'actiongroups' with /opt/app-root/src/sgconfig/sg_action_groups.yml
   SUCC: Configuration for 'actiongroups' created or updated
Done with success
/usr/share/elasticsearch/init
[2020-01-28 10:34:25,709][INFO ][container.run            ] Seeded the searchguard ACL index
[2020-01-28 10:34:25,710][INFO ][container.run            ] Disabling auto replication
/etc/elasticsearch /usr/share/elasticsearch/init
Search Guard Admin v5
Will connect to localhost:9300 ... done
ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
Elasticsearch Version: 5.6.16
Search Guard Version: <unknown>
Reload config on all nodes
Auto-expand replicas disabled
/usr/share/elasticsearch/init
[2020-01-28 10:34:39,568][INFO ][container.run            ] Updating replica count to 1
/etc/elasticsearch /usr/share/elasticsearch/init
Search Guard Admin v5
Will connect to localhost:9300 ... done
ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
...
...
  • After killing the ES pods and waiting a bit, you can see this entry in the ES logs, which suggests that the recovery happens:
Elasticsearch Version: 5.6.16
Search Guard Version: <unknown>
Reload config on all nodes
Update number of replicas to 1 with result: true
/usr/share/elasticsearch/init
[2020-01-28 10:34:45,420][INFO ][container.run            ] Adding index templates
[2020-01-28 10:34:45,608][INFO ][container.run            ] Index template 'com.redhat.viaq-openshift-operations.template.json' found in the cluster, overriding it
{"acknowledged":true}[2020-01-28 10:35:15,288][INFO ][container.run            ] Index template 'com.redhat.viaq-openshift-orphaned.template.json' found in the cluster, overriding it
{"acknowledged":true}[2020-01-28 10:35:45,330][INFO ][container.run            ] Index template 'com.redhat.viaq-openshift-project.template.json' found in the cluster, overriding it
{"acknowledged":true}[2020-01-28 10:35:50,283][INFO ][container.run            ] Index template 'common.settings.kibana.template.json' found in the cluster, overriding it
{"acknowledged":true}[2020-01-28 10:36:00,286][INFO ][container.run            ] Index template 'common.settings.operations.orphaned.json' found in the cluster, overriding it
{"acknowledged":true}[2020-01-28 10:36:00,585][INFO ][container.run            ] Index template 'common.settings.operations.template.json' found in the cluster, overriding it
{"acknowledged":true}[2020-01-28 10:36:00,868][INFO ][container.run            ] Index template 'common.settings.project.template.json' found in the cluster, overriding it
{"acknowledged":true}[2020-01-28 10:36:01,151][INFO ][container.run            ] Index template 'jaeger-service.json' found in the cluster, overriding it
{"acknowledged":true}[2020-01-28 10:36:01,435][INFO ][container.run            ] Index template 'jaeger-span.json' found in the cluster, overriding it
{"acknowledged":true}[2020-01-28 10:36:01,729][INFO ][container.run            ] Index template 'org.ovirt.viaq-collectd.template.json' found in the cluster, overriding it
{"acknowledged":true}[2020-01-28 10:36:01,935][INFO ][container.run            ] Finished adding index templates
[2020-01-28 10:36:01,940][INFO ][container.run            ] Starting init script: 0500-remove-index-patterns-without-uid
[2020-01-28 10:36:02,090][INFO ][container.run            ] Found 0 index-patterns to evaluate for removal
[2020-01-28 10:36:02,090][INFO ][container.run            ] Completed init script: 0500-remove-index-patterns-without-uid with 0 successful and 0 failed bulk requests
[2020-01-28 10:36:02,094][INFO ][container.run            ] Starting init script: 0510-bz1656086-remove-index-patterns-with-bad-title
[2020-01-28 10:36:02,255][INFO ][container.run            ] Found 0 index-patterns to remove
[2020-01-28 10:36:02,441][INFO ][container.run            ] Completed init script: 0510-bz1656086-remove-index-patterns-with-bad-title
[2020-01-28 10:36:02,446][INFO ][container.run            ] Starting init script: 0520-bz1658632-remove-old-sg-indices
[2020-01-28 10:36:02,740][WARN ][container.run            ] Found .searchguard setting 'index.routing.allocation.include._name' to be null
[2020-01-28 10:36:02,741][INFO ][container.run            ] Updating .searchguard setting 'index.routing.allocation.include._name' to be null
[2020-01-28 10:36:02,899][INFO ][container.run            ] Completed init script: 0520-bz1658632-remove-old-sg-indices
[2020-01-28 10:36:02,903][INFO ][container.run            ] Starting init script: 0530-bz1667801-fix-kibana-replica-shards
[2020-01-28 10:36:03,042][INFO ][container.run            ] Found 0 Kibana indices with replica count not equal to 1
[2020-01-28 10:36:03,042][INFO ][container.run            ] Completed init script: 0530-bz1667801-fix-kibana-replica-shards

Missing PriorityClass

I'm trying to get a simple CR deployed, but I'm facing the following error:

ERRO[0031] error syncing key (openshift-logging/example): Unable to create or update collection: Failure creating Collection priority class: failed to get resource client: failed to get resource type: failed to get the resource REST mapping for GroupVersionKind(scheduling.k8s.io/v1beta1, Kind=PriorityClass): no matches for kind "PriorityClass" in version "scheduling.k8s.io/v1beta1" 

Here are the steps I'm following, based on the readme and trial and error.

$ minishift version
v1.25.0+90fb23e
$ minishift start --cpus 2 --memory 8192
$ oc login -u system:admin
$ oc create -f manifests/01-namespace.yaml
$ oc project openshift-logging
$ make deploy-setup
$ REPO_PREFIX=openshift/ \
    IMAGE_PREFIX=origin- \
    OPERATOR_NAME=cluster-logging-operator \
    WATCH_NAMESPACE=openshift-logging \
    KUBERNETES_CONFIG=~/.kube/config \
    ELASTICSEARCH_IMAGE=docker.io/openshift/origin-logging-elasticsearch5:latest \
    OAUTH_PROXY_IMAGE=docker.io/openshift/oauth-proxy:latest \
    KIBANA_IMAGE=docker.io/openshift/origin-logging-kibana5:latest \
    CURATOR_IMAGE=docker.io/openshift/origin-logging-curator5:latest \
    FLUENTD_IMAGE=docker.io/openshift/origin-logging-fluentd:latest \
    go run cmd/cluster-logging-operator/main.go
$ oc apply -f hack/cr.yaml

What am I doing wrong?

Schema change introduced parsing error

Doesn't tell where the error is in the code, unfortunately.

ERRO[0000] error syncing key (openshift-logging/example): failed to decode json data with gvk(logging.openshift.io/v1alpha1, Kind=ClusterLogging): v1alpha1.ClusterLogging.Spec: v1alpha1.ClusterLoggingSpec.Collection: v1alpha1.CollectionSpec.LogCollection: v1alpha1.LogCollectionSpec.FluentdSpec: v1alpha1.FluentdSpec.NodeSelector: ReadMapCB: expect { or n, but found ", error found in #10 byte of ...|elector":"logging-in|..., bigger context ...|ion":{"logCollection":{"fluentd":{"nodeSelector":"logging-infra-fluentd=true"},"type":"fluentd"}},"c|... 
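
Reading the decode error's context, though, the cause is that nodeSelector was passed as a single string where the schema expects a map — i.e. the failing form versus the accepted one:

```yaml
# rejected: nodeSelector as a flat string
fluentd:
  nodeSelector: "logging-infra-fluentd=true"

# accepted: nodeSelector as a map
fluentd:
  nodeSelector:
    logging-infra-fluentd: "true"
```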

Error deploying ES: Regarding SearchGuard

The error I've faced is with Elasticsearch: it cannot be initialized, and the cluster stays RED and does not self-recover. OCP version is 4.4.

  • oc get pods
NAME                                            READY   STATUS             RESTARTS   AGE
cluster-logging-operator-598b875dfc-mmtp4       1/1     Running            2          3d12h
elasticsearch-cdm-85u334ts-1-5dd99bb9-p6lz6     2/2     Running            0          13m
elasticsearch-cdm-85u334ts-2-dbdc7d9d5-z7chm    2/2     Running            0          12m
elasticsearch-cdm-85u334ts-3-5744fbfd4b-4zxw6   2/2     Running            0          12m
fluentd-825zp                                   0/1     CrashLoopBackOff   7          14m
fluentd-8djwz                                   0/1     CrashLoopBackOff   7          14m
fluentd-crrqz                                   0/1     CrashLoopBackOff   7          14m
fluentd-dzqm6                                   0/1     CrashLoopBackOff   7          14m
fluentd-kmwn7                                   0/1     CrashLoopBackOff   7          14m
fluentd-ph2rh                                   0/1     CrashLoopBackOff   7          14m
fluentd-px7kz                                   0/1     CrashLoopBackOff   7          14m
kibana-6c4b5d7c8d-nqqzc                         2/2     Running            0          45m
  • fluentd
2020-06-15 08:46:27 +0000 [error]: unexpected error error_class=Elasticsearch::Transport::Transport::Errors::ServiceUnavailable error="[503] Search Guard not initialized (SG11). See https://github.com/floragunncom/search-guard-docs/blob/master/sgadmin.md"
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/elasticsearch-transport-7.4.0/lib/elasticsearch/transport/transport/base.rb:205:in `__raise_transport_error'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/elasticsearch-transport-7.4.0/lib/elasticsearch/transport/transport/base.rb:333:in `perform_request'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/elasticsearch-transport-7.4.0/lib/elasticsearch/transport/transport/http/faraday.rb:24:in `perform_request'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/elasticsearch-transport-7.4.0/lib/elasticsearch/transport/client.rb:152:in `perform_request'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/elasticsearch-api-7.4.0/lib/elasticsearch/api/actions/info.rb:19:in `info'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-3.7.1/lib/fluent/plugin/out_elasticsearch.rb:394:in `detect_es_major_version'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-3.7.1/lib/fluent/plugin/out_elasticsearch.rb:264:in `block in configure'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-3.7.1/lib/fluent/plugin/elasticsearch_index_template.rb:35:in `retry_operate'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-3.7.1/lib/fluent/plugin/out_elasticsearch.rb:263:in `configure'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin.rb:164:in `configure'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/multi_output.rb:74:in `block in configure'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/multi_output.rb:63:in `each'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/multi_output.rb:63:in `configure'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/out_copy.rb:36:in `configure'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin.rb:164:in `configure'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/agent.rb:130:in `add_match'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/agent.rb:72:in `block in configure'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/agent.rb:64:in `each'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/agent.rb:64:in `configure'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/label.rb:31:in `configure'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/root_agent.rb:147:in `block in configure'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/root_agent.rb:147:in `each'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/root_agent.rb:147:in `configure'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/engine.rb:131:in `configure'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/engine.rb:96:in `run_configure'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/supervisor.rb:812:in `run_configure'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/supervisor.rb:558:in `block in run_worker'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/supervisor.rb:741:in `main_process'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/supervisor.rb:554:in `run_worker'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/command/fluentd.rb:330:in `<top (required)>'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/share/rubygems/rubygems/core_ext/kernel_require.rb:59:in `require'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/share/rubygems/rubygems/core_ext/kernel_require.rb:59:in `require'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/bin/fluentd:8:in `<top (required)>'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/bin/fluentd:23:in `load'
  2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/bin/fluentd:23:in `<main>'
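The `[503] Search Guard not initialized (SG11)` error above means fluentd reached Elasticsearch before its Search Guard ACL index was seeded, so the next step is to check the Elasticsearch side. A hedged sketch, assuming the `component=elasticsearch` pod label and the `es_util` helper shipped in the OpenShift Elasticsearch image:

```shell
# Pick one Elasticsearch pod (label is an assumption; verify with
# 'oc get pods -n openshift-logging --show-labels').
ESPod=$(oc get pods -n openshift-logging -l component=elasticsearch \
        -o jsonpath='{.items[0].metadata.name}')

# Query cluster health through the es_util helper; a RED status here
# matches the Search Guard seeding failures seen in the fluentd trace.
oc exec -n openshift-logging -c elasticsearch "$ESPod" -- \
   es_util --query=_cluster/health?pretty
```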
  • oc logs -f $ESPod -c elasticsearch:
[2020-06-15 08:31:52,097][INFO ][container.run            ] Elasticsearch is ready and listening
/usr/share/elasticsearch/init ~
[2020-06-15 08:31:52,114][INFO ][container.run            ] Starting init script: 0001-jaeger
[2020-06-15 08:31:52,116][INFO ][container.run            ] Completed init script: 0001-jaeger
[2020-06-15 08:31:52,160][INFO ][container.run            ] Forcing the seeding of ACL documents
[2020-06-15 08:31:52,164][INFO ][container.run            ] Seeding the searchguard ACL index.  Will wait up to 604800 seconds.
[2020-06-15 08:31:52,204][INFO ][container.run            ] Seeding the searchguard ACL index.  Will wait up to 604800 seconds.
/etc/elasticsearch /usr/share/elasticsearch/init
Search Guard Admin v5
Will connect to localhost:9300 ... done
ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
Elasticsearch Version: 5.6.16
Search Guard Version: <unknown>
Contacting elasticsearch cluster 'elasticsearch' ...
Clustername: elasticsearch
Clusterstate: RED
Number of nodes: 3
Number of data nodes: 3
.searchguard index already exists, so we do not need to create one.
ERR: .searchguard index state is RED.
Populate config from /opt/app-root/src/sgconfig/
Will update 'config' with /opt/app-root/src/sgconfig/sg_config.yml
   FAIL: Configuration for 'config' failed because of UnavailableShardsException[[.searchguard][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.searchguard][0]] containing [index {[.searchguard][config][0], source[{"config":"....................eXBlIjoibm9vcCJ9fX19fX0="}]}] and a refresh]]
Will update 'roles' with /opt/app-root/src/sgconfig/sg_roles.yml
   FAIL: Configuration for 'roles' failed because of UnavailableShardsException[[.searchguard][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.searchguard][0]] containing [index {[.searchguard][roles][0], source[{"roles":"..........kaWNlczphZG1pbi9nZXQqIl19fX19"}]}] and a refresh]]
Will update 'rolesmapping' with /opt/app-root/src/sgconfig/sg_roles_mapping.yml
   FAIL: Configuration for 'rolesmapping' failed because of UnavailableShardsException[[.searchguard][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.searchguard][0]] containing [index {[.searchguard][rolesmapping][0], source[{"rolesmapping":"..........sImJhY2tlbmRyb2xlcyI6WyJqYWVnZXIiXX19"}]}] and a refresh]]
Will update 'internalusers' with /opt/app-root/src/sgconfig/sg_internal_users.yml
   FAIL: Configuration for 'internalusers' failed because of UnavailableShardsException[[.searchguard][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.searchguard][0]] containing [index {[.searchguard][internalusers][0], source[{"internalusers":"eyJETFdaUmhRTSI6eyJoYXNoIjoiT2tEcnBIdnVwS0x0d1Q3aDAwdWsifX0="}]}] and a refresh]]
Will update 'actiongroups' with /opt/app-root/src/sgconfig/sg_action_groups.yml
   FAIL: Configuration for 'actiongroups' failed because of UnavailableShardsException[[.searchguard][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.searchguard][0]] containing [index {[.searchguard][actiongroups][0], source[n/a, actual length: [2.8kb], max length: 2kb]}] and a refresh]]
null
null
null
Done with failures
/usr/share/elasticsearch/init
[2020-06-15 08:37:55,055][INFO ][container.run            ] Seeded the searchguard ACL index
[2020-06-15 08:37:55,055][INFO ][container.run            ] Disabling auto replication
/etc/elasticsearch /usr/share/elasticsearch/init
Search Guard Admin v5
Will connect to localhost:9300 ... done
ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
Elasticsearch Version: 5.6.16
Search Guard Version: <unknown>
Reload config on all nodes
Auto-expand replicas disabled
/usr/share/elasticsearch/init
[2020-06-15 08:38:57,990][INFO ][container.run            ] Updating replica count to 0
/etc/elasticsearch /usr/share/elasticsearch/init
Search Guard Admin v5
Will connect to localhost:9300 ... done
ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
Elasticsearch Version: 5.6.16
Search Guard Version: <unknown>
Reload config on all nodes
Update number of replicas to 0 with result: true
/usr/share/elasticsearch/init
[2020-06-15 08:40:00,688][INFO ][container.run            ] Adding index templates
[2020-06-15 08:40:00,769][INFO ][container.run            ] Index template 'com.redhat.viaq-openshift-operations.template.json' found in the cluster, overriding it
{"acknowledged":true}[2020-06-15 08:40:01,195][INFO ][container.run            ] Index template 'com.redhat.viaq-openshift-orphaned.template.json' found in the cluster, overriding it
{"acknowledged":true}[2020-06-15 08:40:01,424][INFO ][container.run            ] Index template 'com.redhat.viaq-openshift-project.template.json' found in the cluster, overriding it
{"acknowledged":true}[2020-06-15 08:40:01,665][INFO ][container.run            ] Index template 'common.settings.kibana.template.json' found in the cluster, overriding it
{"acknowledged":true}[2020-06-15 08:40:01,837][INFO ][container.run            ] Index template 'common.settings.operations.orphaned.json' found in the cluster, overriding it
{"acknowledged":true}[2020-06-15 08:40:02,018][INFO ][container.run            ] Index template 'common.settings.operations.template.json' found in the cluster, overriding it
{"acknowledged":true}[2020-06-15 08:40:02,187][INFO ][container.run            ] Index template 'common.settings.project.template.json' found in the cluster, overriding it
{"acknowledged":true}[2020-06-15 08:40:02,351][INFO ][container.run            ] Index template 'jaeger-service.json' found in the cluster, overriding it
{"acknowledged":true}[2020-06-15 08:40:02,520][INFO ][container.run            ] Index template 'jaeger-span.json' found in the cluster, overriding it
{"acknowledged":true}[2020-06-15 08:40:02,693][INFO ][container.run            ] Index template 'org.ovirt.viaq-collectd.template.json' found in the cluster, overriding it
{"acknowledged":true}[2020-06-15 08:40:02,841][INFO ][container.run            ] Finished adding index templates
[2020-06-15 08:40:02,846][INFO ][container.run            ] Starting init script: 0500-remove-index-patterns-without-uid
[2020-06-15 08:40:02,940][INFO ][container.run            ] Found 0 index-patterns to evaluate for removal
[2020-06-15 08:40:02,941][INFO ][container.run            ] Completed init script: 0500-remove-index-patterns-without-uid with 0 successful and 0 failed bulk requests
[2020-06-15 08:40:02,945][INFO ][container.run            ] Starting init script: 0510-bz1656086-remove-index-patterns-with-bad-title
[2020-06-15 08:40:03,025][INFO ][container.run            ] Found 0 index-patterns to remove
[2020-06-15 08:40:03,126][INFO ][container.run            ] Completed init script: 0510-bz1656086-remove-index-patterns-with-bad-title
[2020-06-15 08:40:03,131][INFO ][container.run            ] Starting init script: 0520-bz1658632-remove-old-sg-indices
[2020-06-15 08:40:03,303][WARN ][container.run            ] Found .searchguard setting 'index.routing.allocation.include._name' to be null
[2020-06-15 08:40:03,305][INFO ][container.run            ] Updating .searchguard setting 'index.routing.allocation.include._name' to be null
[2020-06-15 08:40:03,419][INFO ][container.run            ] Completed init script: 0520-bz1658632-remove-old-sg-indices
[2020-06-15 08:40:03,423][INFO ][container.run            ] Starting init script: 0530-bz1667801-fix-kibana-replica-shards
[2020-06-15 08:40:03,493][INFO ][container.run            ] Found 0 Kibana indices with replica count not equal to 0
[2020-06-15 08:40:03,494][INFO ][container.run            ] Completed init script: 0530-bz1667801-fix-kibana-replica-shards
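The log above shows the recovery path: the `.searchguard` index started RED with its primary shard unassigned, the seeding initially failed, and it succeeded once the shard became active. If the index stays RED, the unassigned-shard reason can be inspected directly. A sketch under the same assumptions as before (`es_util` helper, `$ESPod` set to an Elasticsearch pod name):

```shell
# Show per-index health; look for the .searchguard index status column.
oc exec -n openshift-logging -c elasticsearch "$ESPod" -- \
   es_util --query=_cat/indices?v

# Ask Elasticsearch why shards remain unassigned (disk pressure,
# allocation filtering, missing nodes, etc.).
oc exec -n openshift-logging -c elasticsearch "$ESPod" -- \
   es_util --query=_cluster/allocation/explain?pretty
```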
  • CLO instance yaml
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance" 
  namespace: "openshift-logging"
spec:
  managementState: "Managed"  
  logStore:
    type: "elasticsearch"  
    elasticsearch:
      nodeCount: 3
      resources:
        limits:
          memory: "4Gi"
        requests:
          cpu: "1"
          memory: "4Gi"
      storage:
        storageClassName: nfs-storage-provisioner
        size: 40Gi      
  visualization:
    type: "kibana"
    kibana:
      replicas: 1
  curation:
    type: "curator"  
    curator:
      schedule: "30 3 * * *"
  collection:
    logs:
      type: "fluentd"  
      fluentd: {}
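To reproduce this setup, the CR above can be applied and the rollout watched. A minimal sketch, assuming the CR is saved as `clo-instance.yaml` (a hypothetical filename) and the namespace/name from the example:

```shell
# Apply the ClusterLogging instance and confirm the operator accepts it.
oc apply -f clo-instance.yaml
oc get clusterlogging instance -n openshift-logging -o yaml

# Watch the operator roll out elasticsearch, kibana, and fluentd pods.
oc get pods -n openshift-logging -w
```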
