I'm running cluster logging on OCP 4.4 and Elasticsearch cannot be initialized: the cluster stays in RED state and does not recover on its own. Because Search Guard never gets initialized, the Fluentd pods end up in CrashLoopBackOff.
Pod status:
NAME                                            READY   STATUS             RESTARTS   AGE
cluster-logging-operator-598b875dfc-mmtp4       1/1     Running            2          3d12h
elasticsearch-cdm-85u334ts-1-5dd99bb9-p6lz6     2/2     Running            0          13m
elasticsearch-cdm-85u334ts-2-dbdc7d9d5-z7chm    2/2     Running            0          12m
elasticsearch-cdm-85u334ts-3-5744fbfd4b-4zxw6   2/2     Running            0          12m
fluentd-825zp                                   0/1     CrashLoopBackOff   7          14m
fluentd-8djwz                                   0/1     CrashLoopBackOff   7          14m
fluentd-crrqz                                   0/1     CrashLoopBackOff   7          14m
fluentd-dzqm6                                   0/1     CrashLoopBackOff   7          14m
fluentd-kmwn7                                   0/1     CrashLoopBackOff   7          14m
fluentd-ph2rh                                   0/1     CrashLoopBackOff   7          14m
fluentd-px7kz                                   0/1     CrashLoopBackOff   7          14m
kibana-6c4b5d7c8d-nqqzc                         2/2     Running            0          45m
Fluentd pod log:
2020-06-15 08:46:27 +0000 [error]: unexpected error error_class=Elasticsearch::Transport::Transport::Errors::ServiceUnavailable error="[503] Search Guard not initialized (SG11). See https://github.com/floragunncom/search-guard-docs/blob/master/sgadmin.md"
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/elasticsearch-transport-7.4.0/lib/elasticsearch/transport/transport/base.rb:205:in `__raise_transport_error'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/elasticsearch-transport-7.4.0/lib/elasticsearch/transport/transport/base.rb:333:in `perform_request'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/elasticsearch-transport-7.4.0/lib/elasticsearch/transport/transport/http/faraday.rb:24:in `perform_request'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/elasticsearch-transport-7.4.0/lib/elasticsearch/transport/client.rb:152:in `perform_request'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/elasticsearch-api-7.4.0/lib/elasticsearch/api/actions/info.rb:19:in `info'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-3.7.1/lib/fluent/plugin/out_elasticsearch.rb:394:in `detect_es_major_version'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-3.7.1/lib/fluent/plugin/out_elasticsearch.rb:264:in `block in configure'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-3.7.1/lib/fluent/plugin/elasticsearch_index_template.rb:35:in `retry_operate'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-3.7.1/lib/fluent/plugin/out_elasticsearch.rb:263:in `configure'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin.rb:164:in `configure'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/multi_output.rb:74:in `block in configure'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/multi_output.rb:63:in `each'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/multi_output.rb:63:in `configure'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/out_copy.rb:36:in `configure'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin.rb:164:in `configure'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/agent.rb:130:in `add_match'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/agent.rb:72:in `block in configure'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/agent.rb:64:in `each'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/agent.rb:64:in `configure'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/label.rb:31:in `configure'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/root_agent.rb:147:in `block in configure'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/root_agent.rb:147:in `each'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/root_agent.rb:147:in `configure'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/engine.rb:131:in `configure'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/engine.rb:96:in `run_configure'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/supervisor.rb:812:in `run_configure'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/supervisor.rb:558:in `block in run_worker'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/supervisor.rb:741:in `main_process'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/supervisor.rb:554:in `run_worker'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/command/fluentd.rb:330:in `<top (required)>'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/share/rubygems/rubygems/core_ext/kernel_require.rb:59:in `require'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/share/rubygems/rubygems/core_ext/kernel_require.rb:59:in `require'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/bin/fluentd:8:in `<top (required)>'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/bin/fluentd:23:in `load'
2020-06-15 08:46:27 +0000 [error]: /opt/rh/rh-ruby25/root/usr/local/bin/fluentd:23:in `<main>'
Elasticsearch container log:
[2020-06-15 08:31:52,097][INFO ][container.run ] Elasticsearch is ready and listening
/usr/share/elasticsearch/init ~
[2020-06-15 08:31:52,114][INFO ][container.run ] Starting init script: 0001-jaeger
[2020-06-15 08:31:52,116][INFO ][container.run ] Completed init script: 0001-jaeger
[2020-06-15 08:31:52,160][INFO ][container.run ] Forcing the seeding of ACL documents
[2020-06-15 08:31:52,164][INFO ][container.run ] Seeding the searchguard ACL index. Will wait up to 604800 seconds.
[2020-06-15 08:31:52,204][INFO ][container.run ] Seeding the searchguard ACL index. Will wait up to 604800 seconds.
/etc/elasticsearch /usr/share/elasticsearch/init
Search Guard Admin v5
Will connect to localhost:9300 ... done
ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
Elasticsearch Version: 5.6.16
Search Guard Version: <unknown>
Contacting elasticsearch cluster 'elasticsearch' ...
Clustername: elasticsearch
Clusterstate: RED
Number of nodes: 3
Number of data nodes: 3
.searchguard index already exists, so we do not need to create one.
ERR: .searchguard index state is RED.
Populate config from /opt/app-root/src/sgconfig/
Will update 'config' with /opt/app-root/src/sgconfig/sg_config.yml
FAIL: Configuration for 'config' failed because of UnavailableShardsException[[.searchguard][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.searchguard][0]] containing [index {[.searchguard][config][0], source[{"config":"....................eXBlIjoibm9vcCJ9fX19fX0="}]}] and a refresh]]
Will update 'roles' with /opt/app-root/src/sgconfig/sg_roles.yml
FAIL: Configuration for 'roles' failed because of UnavailableShardsException[[.searchguard][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.searchguard][0]] containing [index {[.searchguard][roles][0], source[{"roles":"..........kaWNlczphZG1pbi9nZXQqIl19fX19"}]}] and a refresh]]
Will update 'rolesmapping' with /opt/app-root/src/sgconfig/sg_roles_mapping.yml
FAIL: Configuration for 'rolesmapping' failed because of UnavailableShardsException[[.searchguard][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.searchguard][0]] containing [index {[.searchguard][rolesmapping][0], source[{"rolesmapping":"..........sImJhY2tlbmRyb2xlcyI6WyJqYWVnZXIiXX19"}]}] and a refresh]]
Will update 'internalusers' with /opt/app-root/src/sgconfig/sg_internal_users.yml
FAIL: Configuration for 'internalusers' failed because of UnavailableShardsException[[.searchguard][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.searchguard][0]] containing [index {[.searchguard][internalusers][0], source[{"internalusers":"eyJETFdaUmhRTSI6eyJoYXNoIjoiT2tEcnBIdnVwS0x0d1Q3aDAwdWsifX0="}]}] and a refresh]]
Will update 'actiongroups' with /opt/app-root/src/sgconfig/sg_action_groups.yml
FAIL: Configuration for 'actiongroups' failed because of UnavailableShardsException[[.searchguard][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.searchguard][0]] containing [index {[.searchguard][actiongroups][0], source[n/a, actual length: [2.8kb], max length: 2kb]}] and a refresh]]
null
null
null
Done with failures
/usr/share/elasticsearch/init
[2020-06-15 08:37:55,055][INFO ][container.run ] Seeded the searchguard ACL index
[2020-06-15 08:37:55,055][INFO ][container.run ] Disabling auto replication
/etc/elasticsearch /usr/share/elasticsearch/init
Search Guard Admin v5
Will connect to localhost:9300 ... done
ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
Elasticsearch Version: 5.6.16
Search Guard Version: <unknown>
Reload config on all nodes
Auto-expand replicas disabled
/usr/share/elasticsearch/init
[2020-06-15 08:38:57,990][INFO ][container.run ] Updating replica count to 0
/etc/elasticsearch /usr/share/elasticsearch/init
Search Guard Admin v5
Will connect to localhost:9300 ... done
ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
Elasticsearch Version: 5.6.16
Search Guard Version: <unknown>
Reload config on all nodes
Update number of replicas to 0 with result: true
/usr/share/elasticsearch/init
[2020-06-15 08:40:00,688][INFO ][container.run ] Adding index templates
[2020-06-15 08:40:00,769][INFO ][container.run ] Index template 'com.redhat.viaq-openshift-operations.template.json' found in the cluster, overriding it
{"acknowledged":true}[2020-06-15 08:40:01,195][INFO ][container.run ] Index template 'com.redhat.viaq-openshift-orphaned.template.json' found in the cluster, overriding it
{"acknowledged":true}[2020-06-15 08:40:01,424][INFO ][container.run ] Index template 'com.redhat.viaq-openshift-project.template.json' found in the cluster, overriding it
{"acknowledged":true}[2020-06-15 08:40:01,665][INFO ][container.run ] Index template 'common.settings.kibana.template.json' found in the cluster, overriding it
{"acknowledged":true}[2020-06-15 08:40:01,837][INFO ][container.run ] Index template 'common.settings.operations.orphaned.json' found in the cluster, overriding it
{"acknowledged":true}[2020-06-15 08:40:02,018][INFO ][container.run ] Index template 'common.settings.operations.template.json' found in the cluster, overriding it
{"acknowledged":true}[2020-06-15 08:40:02,187][INFO ][container.run ] Index template 'common.settings.project.template.json' found in the cluster, overriding it
{"acknowledged":true}[2020-06-15 08:40:02,351][INFO ][container.run ] Index template 'jaeger-service.json' found in the cluster, overriding it
{"acknowledged":true}[2020-06-15 08:40:02,520][INFO ][container.run ] Index template 'jaeger-span.json' found in the cluster, overriding it
{"acknowledged":true}[2020-06-15 08:40:02,693][INFO ][container.run ] Index template 'org.ovirt.viaq-collectd.template.json' found in the cluster, overriding it
{"acknowledged":true}[2020-06-15 08:40:02,841][INFO ][container.run ] Finished adding index templates
[2020-06-15 08:40:02,846][INFO ][container.run ] Starting init script: 0500-remove-index-patterns-without-uid
[2020-06-15 08:40:02,940][INFO ][container.run ] Found 0 index-patterns to evaluate for removal
[2020-06-15 08:40:02,941][INFO ][container.run ] Completed init script: 0500-remove-index-patterns-without-uid with 0 successful and 0 failed bulk requests
[2020-06-15 08:40:02,945][INFO ][container.run ] Starting init script: 0510-bz1656086-remove-index-patterns-with-bad-title
[2020-06-15 08:40:03,025][INFO ][container.run ] Found 0 index-patterns to remove
[2020-06-15 08:40:03,126][INFO ][container.run ] Completed init script: 0510-bz1656086-remove-index-patterns-with-bad-title
[2020-06-15 08:40:03,131][INFO ][container.run ] Starting init script: 0520-bz1658632-remove-old-sg-indices
[2020-06-15 08:40:03,303][WARN ][container.run ] Found .searchguard setting 'index.routing.allocation.include._name' to be null
[2020-06-15 08:40:03,305][INFO ][container.run ] Updating .searchguard setting 'index.routing.allocation.include._name' to be null
[2020-06-15 08:40:03,419][INFO ][container.run ] Completed init script: 0520-bz1658632-remove-old-sg-indices
[2020-06-15 08:40:03,423][INFO ][container.run ] Starting init script: 0530-bz1667801-fix-kibana-replica-shards
[2020-06-15 08:40:03,493][INFO ][container.run ] Found 0 Kibana indices with replica count not equal to 0
[2020-06-15 08:40:03,494][INFO ][container.run ] Completed init script: 0530-bz1667801-fix-kibana-replica-shards
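The sgadmin output above is long, so here is a minimal Python sketch that boils the FAIL lines down to their cause. It is only an illustration: the sample lines are copied from the log but cut off after the timeout (the long request payloads are dropped), and `FAIL_RE` and `summarize_failures` are hypothetical names, not part of any OpenShift or Search Guard tooling.

```python
import re

# FAIL lines from the sgadmin run above, truncated after "Timeout: [1m]"
# (the "request: [...]" payloads are omitted here for readability).
SGADMIN_LINES = [
    "ERR: .searchguard index state is RED.",
    "FAIL: Configuration for 'config' failed because of UnavailableShardsException[[.searchguard][0] primary shard is not active Timeout: [1m]",
    "FAIL: Configuration for 'roles' failed because of UnavailableShardsException[[.searchguard][0] primary shard is not active Timeout: [1m]",
    "FAIL: Configuration for 'rolesmapping' failed because of UnavailableShardsException[[.searchguard][0] primary shard is not active Timeout: [1m]",
    "FAIL: Configuration for 'internalusers' failed because of UnavailableShardsException[[.searchguard][0] primary shard is not active Timeout: [1m]",
    "FAIL: Configuration for 'actiongroups' failed because of UnavailableShardsException[[.searchguard][0] primary shard is not active Timeout: [1m]",
]

# Hypothetical pattern: config document, exception class, index, shard, reason.
FAIL_RE = re.compile(
    r"^FAIL: Configuration for '(?P<doc>\w+)' failed because of "
    r"(?P<exc>\w+)\[\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\] "
    r"(?P<reason>.+?) Timeout"
)


def summarize_failures(lines):
    """Map each failed config document to (exception, index, shard, reason)."""
    failures = {}
    for line in lines:
        match = FAIL_RE.match(line)
        if match:
            failures[match["doc"]] = (
                match["exc"],
                match["index"],
                int(match["shard"]),
                match["reason"],
            )
    return failures


failures = summarize_failures(SGADMIN_LINES)
distinct_causes = set(failures.values())

for doc, cause in failures.items():
    print(doc, cause)
print("distinct causes:", len(distinct_causes))
```

All five config documents fail for the same reason, so the thing to investigate seems to be the inactive primary shard 0 of the `.searchguard` index rather than any individual Search Guard configuration file.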
ClusterLogging custom resource:
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    elasticsearch:
      nodeCount: 3
      resources:
        limits:
          memory: "4Gi"
        requests:
          cpu: "1"
          memory: "4Gi"
      storage:
        storageClassName: nfs-storage-provisioner
        size: 40Gi
  visualization:
    type: "kibana"
    kibana:
      replicas: 1
  curation:
    type: "curator"
    curator:
      schedule: "30 3 * * *"
  collection:
    logs:
      type: "fluentd"
      fluentd: {}
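To find out why the `.searchguard` primary shard is not active, the next step would probably be to ask Elasticsearch itself. The Python sketch below only builds and prints the `oc exec` commands; it does not run them. It makes several assumptions: that the `es_util` helper script is available in the `elasticsearch` container and accepts `--query=`, that the pod name from the listing above is still current, and that `es_query_cmd` and `QUERIES` are just illustrative names. The three endpoints (`_cluster/health`, `_cat/shards`, `_cluster/allocation/explain`) are standard Elasticsearch 5.x APIs.

```python
import shlex

# Assumptions taken from the output above; adjust to the actual cluster.
NAMESPACE = "openshift-logging"
ES_POD = "elasticsearch-cdm-85u334ts-1-5dd99bb9-p6lz6"


def es_query_cmd(query, pod=ES_POD, namespace=NAMESPACE):
    """Build an `oc exec` command that runs es_util in the elasticsearch container.

    es_util is assumed to be the query helper shipped in the OpenShift
    Elasticsearch image; verify it exists before relying on this.
    """
    return [
        "oc", "exec", "-n", namespace, "-c", "elasticsearch", pod,
        "--", "es_util", f"--query={query}",
    ]


QUERIES = [
    # Overall cluster state (expected to report "red" here).
    "_cluster/health?pretty",
    # State of the .searchguard shards and why they are unassigned.
    "_cat/shards/.searchguard?v&h=index,shard,prirep,state,unassigned.reason,node",
    # Without a request body this explains the first unassigned shard it finds.
    "_cluster/allocation/explain?pretty",
]

for query in QUERIES:
    # Quote each argument so the printed line can be pasted into a shell.
    print(" ".join(shlex.quote(arg) for arg in es_query_cmd(query)))
```

The printed commands can be pasted into a shell that is logged in to the cluster. The `unassigned.reason` column and the allocation explanation should indicate whether the shard cannot be allocated because of storage, node, or routing problems.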