
fluent-bit-kubernetes-logging's Introduction


fluent-bit-kubernetes-logging's People

Contributors

agup006, blixtra, cordoval, dotdc, edsiper, egernst, evankanderson, hobti01, james-callahan, karlskewes, lecaros, m-yosefpor, solsson, wkruse


fluent-bit-kubernetes-logging's Issues

Certificate verification failed, e.g. CRL, CA or signature check

With a standard configuration I keep getting the error below and I have no idea why; I have applied all the relevant configuration.

[2019/04/08 17:01:31] [error] [io_tls] flb_io_tls.c:304 X509 - Certificate verification failed, e.g. CRL, CA or signature check
[2019/04/08 17:01:31] [error] [filter_kube] upstream connection error

I had to set tls.verify Off to make it work; now I get [2019/04/08 17:03:01] [ info] [filter_kube] API server connectivity OK.
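For reference, a minimal sketch of the kubernetes filter settings involved in that verification, assuming the default in-cluster service-account paths (adjust to your cluster before relying on it):

[FILTER]
    Name            kubernetes
    Match           kube.*
    Kube_URL        https://kubernetes.default.svc:443
    # CA bundle and token mounted by Kubernetes for the pod's service account
    Kube_CA_File    /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
    tls.verify      On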

Add Archive notice

As this repo is not fully maintained, we should point users to the Helm charts, which have frequent releases timed with new Fluent Bit versions. We can also add some walkthrough documentation if users would like to learn more.

[error] [sqldb] cannot open database /var/log/flb_kube.db

Bug Report

Describe the bug
Openshift 4
CoreOS cri-o
fluent-bit v1.8.10 deployed in the logging namespace as described in the HOWTO.
Role, RoleBinding and ServiceAccount created as described in the HOWTO. When the DaemonSet is deployed, the following error message occurs.

To Reproduce

  • Rubular link if applicable:
  • Example log message if applicable:
Fluent Bit v1.8.10
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/03/25 09:51:42] [ info] [engine] started (pid=1)
[2022/03/25 09:51:42] [ info] [storage] version=1.1.5, initializing...
[2022/03/25 09:51:42] [ info] [storage] in-memory
[2022/03/25 09:51:42] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/03/25 09:51:42] [ info] [cmetrics] version=0.2.2
[2022/03/25 09:51:42] [error] [sqldb] cannot open database /var/log/flb_kube.db
[2022/03/25 09:51:42] [error] [input:tail:tail.0] could not open/create database
[2022/03/25 09:51:42] [error] Failed initialize input tail.0
[2022/03/25 09:51:42] [error] [lib] backend failed
  • Steps to reproduce the problem:
    Deploy daemonset

Expected behavior
Fluent-bit starts and transfers logs to the external Elasticsearch instance.

Your Environment
Openshift 4
CoreOS cri-o
fluent-bit v1.8.10

Additional context
The only thing changed is Path, since pod logs live at a different path on CoreOS, as shown below. Where does fluent-bit want to create this database: on the node's filesystem or on a mounted PVC? Writing to the nodes is most likely prohibited by permissions and security contexts.

[INPUT]
Name tail
Tag kube.*
Path /host/var/log/pods/*.log
Parser cri
DB /var/log/flb_kube.db
Mem_Buf_Limit 5MB
Skip_Long_Lines On
Refresh_Interval 10
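The DB path is resolved inside the container, so it ends up on whatever backs that path: a hostPath mount if one covers it, otherwise the container's writable layer. A hedged sketch of one workaround is to give the tail plugin its own writable emptyDir (flb-state is a made-up name) and point DB there:

        volumeMounts:
        - name: flb-state
          mountPath: /fluent-bit/state
      volumes:
      - name: flb-state
        emptyDir: {}

with DB /fluent-bit/state/flb_kube.db in the INPUT section. Note that an emptyDir does not survive pod re-creation, so file offsets are lost on restart; a writable hostPath keeps them on the node.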

unable to validate against any security context constraint

Hi, I want to deploy fluent-bit as a DaemonSet on an all-in-one cluster. I have followed all the steps given in the README.md file, but I am still not able to deploy fluent-bit. It gives me the error message below.

Error Description:
Error creating: pods "fluent-bit-" is forbidden: unable to validate against any security context constraint: [spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used]
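On OpenShift, hostPath volumes are only admitted for service accounts bound to an SCC that allows them. A hedged example, assuming the service account is named fluent-bit in the logging namespace (privileged is the bluntest choice; a custom SCC that only allows hostPath is tighter):

oc adm policy add-scc-to-user privileged -z fluent-bit -n logging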

docker version:
Client: Docker Engine - Community
Version: 19.03.5
API version: 1.40
Go version: go1.12.12
Git commit: 633a0ea838
Built: Wed Nov 13 07:50:12 2019
OS/Arch: linux/amd64
Experimental: false

Server: Docker Engine - Community
Engine:
Version: 19.03.5
API version: 1.40 (minimum version 1.12)
Go version: go1.12.12
Git commit: 633a0ea838
Built: Wed Nov 13 07:48:43 2019
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.10
GitCommit: b34a5c8af56e510852c35414db4c1f4fa6172339
runc:
Version: 1.0.0-rc8+dev
GitCommit: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
docker-init:
Version: 0.18.0
GitCommit: fec3683

oc version:
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://10.65.117.99:8443
kubernetes v1.11.0+d4cacc0

Please fix your k8s manifests

My team just came back from the Open Source Summit and was quite excited about what they heard about Fluent Bit running in a k8s cluster. Reason enough for me to have a deeper look, as the idea of enriched logs promises some very cool use cases. BUT there are some very annoying issues within this repo:

  • The manifest in the docs and in this repo don't match
  • The image used here inside the daemonset-yaml is not the "official" image from docker hub
  • Image tags for "vanilla" fluentbit and for the fluentbit daemonset are out of sync. Actually, there is no further need to push separate images anymore, since you now provide the additional config for k8s within a configmap. It might also be a good idea to move this into the examples section of the original fluentbit repository.
  • I really appreciate the usage of configmaps, but you should then also mount them into your pods and indent them correctly
  • The name of the created service-account for RBAC-setup does not match the one consumed by the daemonset

But after fixing all those minor issues I was very pleased with the result, as the application itself does a great job :)

fluent-bit: error while loading shared libraries: librdkafka.so

$ kubectl create namespace logging
$ kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-service-account.yaml
$ kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-role.yaml
$ kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-role-binding.yaml
$ kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/0.13-dev/output/elasticsearch/fluent-bit-configmap.yaml

After this I get the following error:

/fluent-bit/bin/fluent-bit: error while loading shared libraries: librdkafka.so: cannot open shared object file: No such file or directory

The fluent-bit pod is created but ends up in CrashLoopBackOff.

Can anyone suggest how to resolve this?

Not able to access fluent-bit metrics

Hi,

I want to deploy fluent-bit on an OKD 3.11 all-in-one cluster. The deployment creates the pod successfully and logs do show up on the pod console, but when I attempt to access the metrics using curl or a web browser I get nothing; it ends with an error message saying the pod or application is not running.

Replicate the issue:

  1. Create the service account

oc create -f fluentbit-sa.yaml

  2. Create the role

oc create -f fluentbit-role.yaml

  3. Create the role binding

oc create -f fluentbit-rb.yaml

  4. Create the config map

oc create -f fluentbit-configmap.yaml

  5. Create the application

oc new-app --docker-image=fluent/fluent-bit:latest -e FLUENT_ELASTICSEARCH_HOST=172.17.0.9 -e FLUENT_ELASTICSEARCH_PORT=9200

fluent-bit.zip
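For what it's worth, the metrics endpoint only answers when the HTTP server is enabled in the SERVICE section; a hedged sketch, with 2020 being the port used elsewhere in this repo's configmap:

[SERVICE]
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_Port    2020

# then, from somewhere that can reach the pod IP:
# curl http://<pod-ip>:2020/api/v1/metrics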

CPU and Memory Requests and Limits?

What would be a good set of CPU and memory requests and limits? For comparison, this is filebeat:

        resources:
          requests:
            cpu: 2m
            memory: 10Mi
          limits:
            cpu: 10m
            memory: 20Mi

I know that the documentation talks about Memory limits being dependent on the buffer amount.

So, if we impose a limit of 10MB for the input plugins and consider the worst-case scenario of the output plugin consuming 20MB extra, as a minimum we need (30MB x 1.2) = 36MB.

Given the Mem_Buf_Limit 5MB, would we need 13 MB?
I have very little insight as to an appropriate CPU request and limit.
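For comparison, a hedged starting point derived from the 36 MB estimate above plus some headroom (these numbers are illustrative, not a recommendation from this repo):

        resources:
          requests:
            cpu: 100m
            memory: 64Mi
          limits:
            cpu: 200m
            memory: 128Mi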

Best Practice Question: Multiple fluent-bit or single with multiple outputs?

We need to be able to output (with Retry_Limit false) to:

  • ElasticSearch
  • Kafka
  • File

If one output fails (say ElasticSearch is down), but the others succeed, what happens?

What I am really asking is what is your recommended best practice. I can either have multiple fluent-bits with a single output or a single fluent-bit with multiple outputs. Which would you recommend?
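For context, a single instance can fan out to several outputs by giving each OUTPUT section the same Match pattern; a minimal sketch with placeholder hosts and topic names:

[OUTPUT]
    Name   es
    Match  kube.*
    Host   ${FLUENT_ELASTICSEARCH_HOST}
    Port   ${FLUENT_ELASTICSEARCH_PORT}

[OUTPUT]
    Name    kafka
    Match   kube.*
    Brokers kafka:9092
    Topics  k8s-logs

[OUTPUT]
    Name  file
    Match kube.*
    Path  /tmp/flb-out

As far as I know, retries are scheduled per output, so a failing Elasticsearch does not block delivery to Kafka or the file output, though with Retry_Limit false the failed chunks keep occupying buffer until they eventually succeed.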

outdated manifests don't allow creation of DaemonSet

According to kubectl changelog for 1.15.0:

The following APIs are no longer served by default:

  • All resources under apps/v1beta1 and apps/v1beta2 - use apps/v1 instead
  • daemonsets, deployments, replicasets resources under extensions/v1beta1 - use apps/v1 instead
  • networkpolicies resources under extensions/v1beta1 - use networking.k8s.io/v1 instead
  • podsecuritypolicies resources under extensions/v1beta1 - use policy/v1beta1 instead
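For the DaemonSet shipped here, moving to apps/v1 also makes spec.selector mandatory; a hedged sketch of the top of the updated manifest:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
  labels:
    k8s-app: fluent-bit-logging
spec:
  selector:
    matchLabels:
      k8s-app: fluent-bit-logging
  template:
    metadata:
      labels:
        k8s-app: fluent-bit-logging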

Garbage appended to JSON response

Upgrading to 0.11.6 is not working for me. I get the following output, where a JSON can't be parsed because it has some garbage appended. Downgrading to 0.11.5 works correctly.

[2017/08/02 17:59:50] [ info] [engine] started
[2017/08/02 17:59:50] [ info] [filter_kube] https=1 host=kubernetes.default.svc port=443
[2017/08/02 17:59:50] [ info] [filter_kube] local POD info OK
[2017/08/02 17:59:50] [ info] [filter_kube] testing connectivity with API server...
[2017/08/02 17:59:50] [ info] [filter_kube] API server connectivity OK
[2017/08/02 17:59:51] [error] [out_es] could not pack JSON response
{"took":52,"errors":false,"items":[{"index":{"_index":"container-logs-2017.08.02","_type":"flb_type","_id":"AV2kGpK9qYkcGJsH9A7f","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"created":true,"status":201}},{"index":{"_index":"container-logs-2017.08.02","_type":"flb_type","_id":"AV2kGpK9qYkcGJsH9A7g","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"created":true,"status":201}},{"index":{"_index":"container-logs-2017.08.02","_type":"flb_type","_id":"AV2kGpK9qYkcGJsH9A7h","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"created":true,"status":201}},{"index":{"_index":"container-logs-2017.08.02","_type":"flb_type","_id":"AV2kGpK9qYkcGJsH9A7i","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"created":true,"status":201}},{"index":{"_index":"container-logs-2017.08.02","_type":"flb_type","_id":"AV2kGpK9qYkcGJsH9�6c��"[�-#[� �D#[����"[�`c#[�`]#[�`l#[�` �"[��c�"[���"[�Pa�"[���"[�l�ר�I�ʝ��N���#[���#[����"[����"[������������"[���"[����x�BuJG�c�"[��?"[��@�"[��"[� "[�"[��?�����AA A@!!!#!Caa$aa��"�B�b���� �@�`���� �@"Bb���!�A�a������!!!!A�� �@�`���������!�Aa��!�A�a��AA!AAAaA�A�A�AA"ABAbA�A�A�!!"!B!Aa���"�B�a���#�Caa$aa$aa$��$�D�?�$���������$���_�$�����$�����aa$aa$aa$����#Cc���"�B�b������$Dd����#�C�c��������d���#�C�c����$�D�d���?���#�C�c�����������_���_�_��aa$�?[2017/08/02 17:59:51] [ warn] [out_es] Elasticsearch error

My configuration is standard, using elasticsearch as output

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: monitoring
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush          1
        Daemon         Off
        Log_Level      info
        Parsers_File   parsers.conf

    [INPUT]
        Name           tail
        Tag            kube.*
        Path           /var/log/containers/*.log
        Parser         docker
        DB             /var/log/flb_kube.db
        Mem_Buf_Limit  5MB

    [FILTER]
        Name           kubernetes
        Match          kube.*
        Kube_URL       https://kubernetes.default.svc:443
        Merge_JSON_Log On

    [OUTPUT]
        Name   es
        Match  *
        Host   ${FLUENT_ELASTICSEARCH_HOST}
        Port   ${FLUENT_ELASTICSEARCH_PORT}
        Logstash_Format On
        Logstash_Prefix container-logs

  parsers.conf: |
    [PARSER]
        Name   apache
        Format regex
        Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   apache2
        Format regex
        Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   apache_error
        Format regex
        Regex  ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$

    [PARSER]
        Name   nginx
        Format regex
        Regex ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   json-test
        Format json
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    [PARSER]
        Name        syslog
        Format      regex
        Regex       ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key    time
        Time_Format %b %d %H:%M:%S
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: monitoring
  labels:
    k8s-app: fluent-bit-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  template:
    metadata:
      labels:
        k8s-app: fluent-bit-logging
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      tolerations:
      - operator: "Exists"
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit-kubernetes-daemonset:0.11.5
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch.elk.svc.cluster.local"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        resources:
          limits:
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 100Mi
        terminationMessagePath: /dev/termination-log
        volumeMounts:
        - mountPath: /var/log
          name: varlog
        - mountPath: /var/lib/docker/containers
          name: varlibdockercontainers
          readOnly: true
        - mountPath: /fluent-bit/etc
          name: fluentbitetc
      terminationGracePeriodSeconds: 10
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluentbitetc
        configMap:
          defaultMode: 420
          name: fluent-bit-config

My ElasticSearch deployment was created using quay.io/pires/docker-elasticsearch-kubernetes:5.5.0

Fluent-bit cannot send logs to elasticsearch in one environment but works fine in another.

I need help locating the problem that is blocking the log push from the fluent-bit containers to Elasticsearch.

This setup works fine without any problems in one environment but does not in our staging environment, where it must succeed before moving to the production environment.


Setup

Kubernetes v1.11 (installed using RKE CLI with controlplan, etcd, and workers on separate nodes)
Elasticsearch v6.4.3 native install
Fluent-bit image: fluent/fluent-bit:0.14.6
Kibana v.6.4.2

The elasticsearch host is accessible from every node in the problem cluster. Fluent-bit containers can read logs but what happens after that is a mystery. Here is the docker log from one of the nodes:

docker logs 54b2ed96ca7f
Fluent-Bit v0.14.6
Copyright (C) Treasure Data

[2018/12/07 22:15:28] [ info] [engine] started (pid=1)
[2018/12/07 22:15:28] [ info] [filter_kube] https=1 host=kubernetes.default.svc.cluster.local port=443
[2018/12/07 22:15:28] [ info] [filter_kube] local POD info OK
[2018/12/07 22:15:28] [ info] [filter_kube] testing connectivity with API server...
[2018/12/07 22:15:28] [ info] [filter_kube] API server connectivity OK
[2018/12/07 22:15:28] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020

I don't know if it has any bearing, but I don't have permission on the system to check if port 2020 is available or not.

The /var/log/messages in the fluent-bit container on a node is flooded with messages like the following:

kernel: ipmi-sensors:61430 map pfn expected mapping type uncached-minus for [mem 0xbfee0000-0xbfee0fff], got write-back

Dec 7 22:44:37 , dockerd: time="2018-12-07T22:44:37.062465721Z" level=error msg="Error running exec in container: OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "exec: \"bash\": executable file not found in $PATH": unknown"

dockerd: time="2018-12-07T22:44:37.665307619Z" level=error msg="stream copy error: reading from a closed fifo"

Dec 7 22:24:39 dockerd: time="2018-12-07T22:24:39.310744098Z" level=error msg="Error running exec in container: OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "exec: \"bash\": executable file not found in $PATH": unknown"
Dec 7 22:24:39 dockerd: time="2018-12-07T22:24:39.424232019Z" level=error msg="stream copy error: reading from a closed fifo"
Dec 7 22:24:39 dockerd: time="2018-12-07T22:24:39.424235038Z" level=error msg="stream copy error: reading from a closed fifo"
Dec 7 22:25:01 systemd: Created slice User Slice of pcp.
Dec 7 22:25:01 systemd: Starting User Slice of pcp.
Dec 7 22:25:01 systemd: Started Session 45542 of user pcp.
Dec 7 22:25:01 systemd: Starting Session 45542 of user pcp.
Dec 7 22:25:01 systemd: Removed slice User Slice of pcp.
Dec 7 22:25:01 systemd: Stopping User Slice of pcp.
Dec 7 22:25:10 telegraf: 2018-12-07T22:25:10Z E! [outputs.influxdb]: when writing to : received error partial write: max-values-per-tag limit exceeded (100055/100000): measurement="net" tag="interface" value="<some_string>" dropped=1; discarding points
Dec 7 22:25:37 dockerd: time="2018-12-07T22:25:37.189532650Z" level=error msg="stream copy error: reading from a closed fifo"
Dec 7 22:25:37 dockerd: time="2018-12-07T22:25:37.189532758Z" level=error msg="stream copy error: reading from a closed fifo"
Dec 7 22:25:37 dockerd: time="2018-12-07T22:25:37.199774849Z" level=error msg="Error running exec in container: OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "exec: \"bash\": executable file not found in $PATH": unknown"
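One hedged first step when nothing appears after the startup banner is to raise the log level in the SERVICE section so the engine reports flush, retry and output activity:

[SERVICE]
    Flush        1
    Log_Level    debug
    Parsers_File parsers.conf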


How to connect with elastic cloud?

I set up an EK stack on Elastic Cloud,
but I can't find any field for the Elastic Cloud user and password.
I tried the following, but I can't get any connection.

  output-elasticsearch.conf: |
    [OUTPUT]
        Name            es
        Match           *
        Host            ${FLUENT_ELASTICSEARCH_HOST}
        Port            ${FLUENT_ELASTICSEARCH_PORT}
        HTTP_User       ${FLUENT_ELASTICSEARCH_USER}
        HTTP_Passwd     ${FLUENT_ELASTICSEARCH_PASSWORD}
        Logstash_Format On
        Retry_Limit     False
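Elastic Cloud endpoints are HTTPS, so besides the credentials the es output needs TLS enabled (and typically port 9243 rather than 9200); a hedged variant of the block above:

  output-elasticsearch.conf: |
    [OUTPUT]
        Name            es
        Match           *
        Host            ${FLUENT_ELASTICSEARCH_HOST}
        Port            9243
        HTTP_User       ${FLUENT_ELASTICSEARCH_USER}
        HTTP_Passwd     ${FLUENT_ELASTICSEARCH_PASSWORD}
        tls             On
        tls.verify      On
        Logstash_Format On
        Retry_Limit     False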

Confused metadata while tailing kubernetes containers logs

Logs from /var/log/containers/service-1.log were enriched with metadata from /var/log/containers/service-2.log.
This bug reproduces after a fluent-bit restart.

Kubernetes v.1.6
Docker version 1.12.0, build 8eab29e
Ubuntu 16.04 LTS
Fluent-bit container fluent/fluent-bit-kubernetes-daemonset:0.11
Elasticsearch container docker.elastic.co/elasticsearch/elasticsearch:5.4.0

Kubernetes deployment configuration:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: kube-system
  labels:
    k8s-app: fluent-bit-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  template:
    metadata:
      labels:
        k8s-app: fluent-bit-logging
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      serviceAccount: fluent
      serviceAccountName: fluent
      containers:
      - name: fluent-bit
        image: docker.company.com/fluent-bit-debug:0.11
        imagePullPolicy: Always
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "10.10.7.21"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        resources:
          limits:
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 10
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: fluent
  namespace: kube-system
rules:
- apiGroups: [""]
  resources:
  - pods
  verbs:
  - get
  - list
  - watch

fluent-bit.conf

[SERVICE]
Flush 1
Daemon Off
Log_Level debug
Parsers_File parsers.conf

[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/*.log
Exclude_Path /var/log/containers/fluent.log,/var/log/containers/calico.log
Parser docker
DB /var/log/flb_kube.db
Mem_Buf_Limit 5MB

[FILTER]
Name kubernetes
Match kube.*
Kube_URL https://kubernetes.default.svc:443
Merge_JSON_Log On

[OUTPUT]
Name es
Match *
Host ${FLUENT_ELASTICSEARCH_HOST}
Port ${FLUENT_ELASTICSEARCH_PORT}
Logstash_Format On
Retry_Limit False

Log entry in Elasticsearch:

{
"_index": "logstash-2017.06.16",
"_type": "flb_type",
"_id": "AVyv0PEQkGtdARlOstLQ",
"_version": 1,
"_score": null,
"_source": {
"@timestamp": "2017-06-16T07:31:39Z",
"log": "{"levelname": "DEBUG", "module": "containers", "lineno": 177, "request_id": null, "args": ["<ServiceContainer [profiles_service] at 0x7f69e8227048>"], "msg": "starting %s"}\n",
"stream": "stderr",
"time": "2017-06-16T07:31:39.745970396Z",
"levelname": "DEBUG",
"module": "containers",
"lineno": 177,
"request_id": null,
"args": [
"u003cServiceContainer [profiles_service] at 0x7f69e8227048u003e"
],
"msg": "starting %s",
"kubernetes": {
"pod_name": "auth-emul-3229163131-xtfvw",
"namespace_name": "pandora",
"container_name": "auth-emul",
"docker_id": "3dbb153b9254ad292a459d84cab6936ed053b3f90d1c485035b2038bf9d9957c",
"pod_id": "ca96a54c-5265-11e7-826e-724e14304815",
"labels": {
"app": "auth-emul",
"pod-template-hash": "3229163131"
},
"annotations": {
"kubernetes_io/created-by": "{"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"pandora","name":"auth-emul-3229163131","uid":"ca9617a7-5265-11e7-826e-724e14304815","apiVersion":"extensions","resourceVersion":"3343079"}}\n"
}
}
},
"fields": {
"@timestamp": [
1497598299000
],
"time": [
1497598299745
]
},
"highlight": {
"kubernetes.container_name": [
"@kibana-highlighted-field@auth@/kibana-highlighted-field@-@kibana-highlighted-field@emul@/kibana-highlighted-field@"
],
"kubernetes.namespace_name": [
"@kibana-highlighted-field@pandora@/kibana-highlighted-field@"
]
},
"sort": [
1497598299745
]
}

Index not created for elasticsearch with logstash_format on and logstash_prefix set

With just "logstash_format On" it does create an index of value "logstash-". But if a prefix is specified, no index is created and logs do not get indexed.
Using fluent-bit version 1.3.3

Here's what my elastic output config looks like:

  output-elasticsearch.conf: |
    [OUTPUT]
        Name                 es
        Match                 *
        Host                    10.128.241.156
        Port                     9200
        logstash_format On
        Replace_Dots     On
        Retry_Limit         False
        logstash_prefix  myapp

I don't know if it matters, but I have also tried capitalizing them as "Logstash_Format" and "Logstash_Prefix"; it still does not work.
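For reference, a cleaned-up version of that block with the canonical spelling used elsewhere in this repo (as far as I know the keys are case-insensitive in the classic config format, so this is mostly about readability):

  output-elasticsearch.conf: |
    [OUTPUT]
        Name            es
        Match           *
        Host            10.128.241.156
        Port            9200
        Logstash_Format On
        Logstash_Prefix myapp
        Replace_Dots    On
        Retry_Limit     False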

How to exclude Kubernetes namespace

Hi
Is there a way to exclude certain namespaces in fluent-bit? I would like to exclude certain namespaces so that Fluent Bit doesn't forward the logs created in those namespaces to ELK.

Is there a way to do it besides adding an annotation to each pod in those namespaces? I'm aware that you can run kubectl annotate to update all the pods in a namespace with a new annotation.

Thanks in advance for your support and maintaining this project.
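One hedged option, assuming a reasonably recent Fluent Bit with record-accessor support in the grep filter, is to drop records by namespace after the kubernetes filter has added its metadata (the namespace list here is just an example):

[FILTER]
    Name    grep
    Match   kube.*
    Exclude $kubernetes['namespace_name'] ^(kube-system|kube-public)$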

Kubernetes jobs - logging

We have installed fluent-bit as a DaemonSet in our Kubernetes cluster to forward logs to our centralized logging cluster. Everything is working fine except for some pods that run as Kubernetes Jobs: we are not able to see the logs of those jobs being forwarded. Please suggest the best way/configuration on the fluent-bit side to get the logs of these jobs.

Role creation failed on GKE cluster

I have just tried to install fluentbit as a DaemonSet on a GKE cluster (v1.9.2), following the README.

I created the namespace and the service account, but when I tried to create the role I got an error message:
Error from server (Forbidden): error when creating "https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-role.yaml": clusterroles.rbac.authorization.k8s.io "fluent-bit-read" is forbidden: attempt to grant extra privileges: https://gist.github.com/levlas/c956099b7c9753decf5d97d8a7cb26dd

The solution was to bind the GKE user to the cluster-admin role: fluent/fluentd-kubernetes-daemonset#14 (comment)

Would you add that step to the docs?
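The step referenced there is a one-time binding of your own GCP user to cluster-admin before creating the RBAC objects; a hedged example (the binding name is arbitrary):

kubectl create clusterrolebinding cluster-admin-binding \
  --clusterrole=cluster-admin \
  --user=$(gcloud config get-value account)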

Timestamp_Key not working when using kafka output

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
  labels:
    k8s-app: fluent-bit
data:
  # Configuration files: server, input, filters and output
  # ======================================================
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-kafka.conf

  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10

  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc.cluster.local:443
        Merge_Log           On
        K8S-Logging.Parser  On

  output-kafka.conf: |
    [OUTPUT]
        Name           kafka
        Match          *
        Brokers        192.168.12.104:9092,192.168.12.105:9092,192.168.12.106:9092
        Topics         app-logs-k8s
        Format         json
        Timestamp_Key  @timestamp
        Timestamp_Format iso8601
        Retry_Limit    false
        # hides errors "Receive failed: Disconnected" when kafka kills idle connections
        rdkafka.log.connection.close false
        rdkafka.queue.buffering.max.kbytes 10240
        rdkafka.request.required.acks 1
......
{"@timestamp":1552268032.077178, "log":"2019-03-11 01:33:52.076 [INFO][76] ipsets.go 222: Asked to resync with the dataplane on next update. family=\"inet\"\n", "stream":"stdout", "time":"2019-03-11T01:33:52.077177656Z", "kubernetes":{"pod_name":"calico-node-ls9fs", "namespace_name":"kube-system", "pod_id":"76fe49b0-3da4-11e9-9109-00163e06302b", "labels":{"controller-revision-hash":"755ffcc586", "k8s-app":"calico-node", "pod-template-generation":"1"}, "annotations":{"kubespray.etcd-cert/serial":"DDC485DA1003CC0D", "prometheus.io/port":"9091", "prometheus.io/scrape":"true", "scheduler.alpha.kubernetes.io/critical-pod":""}, "host":"a-docker-cluster01", "container_name":"calico-node", "docker_id":"bb8eb87a56fc0c48cc9ac23e64f18707331943b8cf8c905fc7350423a7a040c2"}}
{"@timestamp":1552268032.077077, "log":"2019-03-11 01:33:52.076 [INFO][76] int_dataplane.go 733: Applying dataplane updates\n", "stream":"stdout", "time":"2019-03-11T01:33:52.077076685Z", "kubernetes":{"pod_name":"calico-node-ls9fs", "namespace_name":"kube-system", "pod_id":"76fe49b0-3da4-11e9-9109-00163e06302b", "labels":{"controller-revision-hash":"755ffcc586", "k8s-app":"calico-node", "pod-template-generation":"1"}, "annotations":{"kubespray.etcd-cert/serial":"DDC485DA1003CC0D", "prometheus.io/port":"9091", "prometheus.io/scrape":"true", "scheduler.alpha.kubernetes.io/critical-pod":""}, "host":"a-docker-cluster01", "container_name":"calico-node", "docker_id":"bb8eb87a56fc0c48cc9ac23e64f18707331943b8cf8c905fc7350423a7a040c2"}}
{"@timestamp":1552268032.077666, "log":"2019-03-11 01:33:52.077 [INFO][76] ipsets.go 253: Resyncing ipsets with dataplane. family=\"inet\"\n", "stream":"stdout", "time":"2019-03-11T01:33:52.077666066Z", "kubernetes":{"pod_name":"calico-node-ls9fs", "namespace_name":"kube-system", "pod_id":"76fe49b0-3da4-11e9-9109-00163e06302b", "labels":{"controller-revision-hash":"755ffcc586", "k8s-app":"calico-node", "pod-template-generation":"1"}, "annotations":{"kubespray.etcd-cert/serial":"DDC485DA1003CC0D", "prometheus.io/port":"9091", "prometheus.io/scrape":"true", "scheduler.alpha.kubernetes.io/critical-pod":""}, "host":"a-docker-cluster01", "container_name":"calico-node", "docker_id":"bb8eb87a56fc0c48cc9ac23e64f18707331943b8cf8c905fc7350423a7a040c2"}}
{"@timestamp":1552268032.082286, "log":"2019-03-11 01:33:52.082 [INFO][76] ipsets.go 295: Finished resync family=\"inet\" numInconsistenciesFound=0 resyncDuration=4.928899ms\n", "stream":"stdout", "time":"2019-03-11T01:33:52.082285552Z", "kubernetes":{"pod_name":"calico-node-ls9fs", "namespace_name":"kube-system", "pod_id":"76fe49b0-3da4-11e9-9109-00163e06302b", "labels":{"controller-revision-hash":"755ffcc586", "k8s-app":"calico-node", "pod-template-generation":"1"}, "annotations":{"kubespray.etcd-cert/serial":"DDC485DA1003CC0D", "prometheus.io/port":"9091", "prometheus.io/scrape":"true", "scheduler.alpha.kubernetes.io/critical-pod":""}, "host":"a-docker-cluster01", "container_name":"calico-node", "docker_id":"bb8eb87a56fc0c48cc9ac23e64f18707331943b8cf8c905fc7350423a7a040c2"}}
{"@timestamp":1552268032.082313, "log":"2019-03-11 01:33:52.082 [INFO][76] int_dataplane.go 747: Finished applying updates to dataplane. msecToApply=5.248031999999999\n", "stream":"stdout", "time":"2019-03-11T01:33:52.082312879Z", "kubernetes":{"pod_name":"calico-node-ls9fs", "namespace_name":"kube-system", "pod_id":"76fe49b0-3da4-11e9-9109-00163e06302b", "labels":{"controller-revision-hash":"755ffcc586", "k8s-app":"calico-node", "pod-template-generation":"1"}, "annotations":{"kubespray.etcd-cert/serial":"DDC485DA1003CC0D", "prometheus.io/port":"9091", "prometheus.io/scrape":"true", "scheduler.alpha.kubernetes.io/critical-pod":""}, "host":"a-docker-cluster01", "container_name":"calico-node", "docker_id":"bb8eb87a56fc0c48cc9ac23e64f18707331943b8cf8c905fc7350423a7a040c2"}}
{"@timestamp":1552268034.247845, "log":"172.18.88.200 - - [11/Mar/2019:01:33:54 +0000] \"GET / HTTP/1.1\" 200 3420\n", "stream":"stdout", "time":"2019-03-11T01:33:54.247845484Z", "kubernetes":{"pod_name":"passport-phpmyadmin-cfd7b6649-59flh", "namespace_name":"infrastructure", "pod_id":"406ad9e5-3e4e-11e9-9109-00163e06302b", "labels":{"app":"phpmyadmin", "chart":"phpmyadmin-2.0.5", "pod-template-hash":"cfd7b6649", "release":"passport"}, "host":"a-docker-cluster01", "container_name":"phpmyadmin", "docker_id":"01323fddd854c7079b338ad3e251e4dc2f180cf69483ef9aa2d100baf1fd8f1b"}}

The @timestamp field is still numeric, and Logstash reports an error:

......
[2019-03-11T09:22:46,971][WARN ][org.logstash.Event       ] Unrecognized @timestamp value type=class org.jruby.RubyFloat
[2019-03-11T09:22:47,953][WARN ][org.logstash.Event       ] Unrecognized @timestamp value type=class org.jruby.RubyFloat
[2019-03-11T09:22:47,966][WARN ][org.logstash.Event       ] Unrecognized @timestamp value type=class org.jruby.RubyFloat
[2019-03-11T09:22:50,953][WARN ][org.logstash.Event       ] Unrecognized @timestamp value type=class org.jruby.RubyFloat

Json fields parser and Logstash_prefix problem

Hi, we have working EFK log aggregation on Kubernetes. Now I am trying to migrate from Fluentd to Fluent Bit.
I have 2 issues:

  1. It creates the index '-YYYY.MM.DD' instead of 'logstest-YYYY.MM.DD'. When I set Logstash_Format Off it creates the fluentbit index correctly.
  2. @fields are not exposed as I expected, and as it worked with Fluentd thanks to the merge_json_log=true default set in the fluentd kubernetes_metadata filter.

Configuration file is placed in configmap:

    [SERVICE]
      Flush 1
      Daemon Off
      Log_Level info
      Parsers_File /fluent-bit/etc/parsers.conf

    [INPUT]
      Name tail
      Tag kubernetes.*
      Path /var/log/containers/*.log
      Parser docker
      DB /var/log/flb_kube.db
      Mem_Buf_Limit 5MB

    [FILTER]
      Name kubernetes
      Match kubernetes.*
      Kube_URL https://10.0.0.1:443

    [OUTPUT]
      Name es
      Match *
      Index fluentbit
      Type flb_type
      Host elasticsearch-logging
      Port 9200
      Logstash_Format On
      Logstash_Prefix logstest
      Time_Key @timestamp
      Retry_Limit False

I expect:

{
      "_index" : "logstash-2017.04.10",
      "_type" : "fluentd",
      "_id" : "AVtVnkb-9g_Xl9E-AQK1",
      "_score" : 6.289148,
      "_source" : {
        "@timestamp" : "2017-04-10T02:08:04+00:00",
        "@fields" : {
          "remote_addr" : "172.17.0.1",
          "remote_user" : "-",
          "body_bytes_sent" : "518",
          "request_time" : "0.000",
          "status" : "200",
          "request" : "GET / HTTP/1.1",
          "request_method" : "GET",
          "http_referrer" : "-",
          "http_user_agent" : "Go-http-client/1.1"
        },
        "log" : "{ \"@timestamp\": \"2017-04-10T02:08:04+00:00\", \"@fields\": { \"remote_addr\": \"172.17.0.1\", \"remote_user\": \"-\", \"body_bytes_sent\": \"518\", \"request_time\": \"0.000\", \"status\": \"200\", \"request\": \"GET / HTTP/1.1\", \"request_method\": \"GET\", \"http_referrer\": \"-\", \"http_user_agent\": \"Go-http-client/1.1\" } }\n",

I have:

{
      "_index" : "-2017.04.27",
      "_type" : "flb_type",
      "_id" : "AVuvHyc1A___NU0ewDMH",
      "_score" : 8.767609,
      "_source" : {
        "@timestamp" : "2017-04-27T11:15:05",
        "log" : "{ \"@timestamp\": \"2017-04-27T11:13:44+00:00\", \"@fields\": { \"remote_addr\": \"172.17.0.1\", \"remote_user\": \"-\", \"body_bytes_sent\": \"518\", \"request_time\": \"0.000\", \"status\": \"200\", \"request\": \"GET / HTTP/1.1\", \"request_method\": \"GET\", \"http_referrer\": \"-\", \"http_user_agent\": \"Go-http-client/1.1\" } }\n",
        "stream" : "stdout",

My previous FluentD conf:

    <source>
      type tail
      path /var/log/containers/*.log
      pos_file /var/log/es-containers.log.pos
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      tag kubernetes.*
      format json
      read_from_head true
    </source>
    <filter kubernetes.**>
      type kubernetes_metadata
    </filter>
    <filter kubernetes.**>
      @type record_transformer
      enable_ruby
      auto_typecast true
      <record>
        @timestamp ${t = Time.now; (t.iso8601(3))}
      </record>
    </filter>
    <match **>
       type elasticsearch
       log_level info
       include_tag_key true
       host elasticsearch-logging
       port 9200
       logstash_format true
       # Set the chunk limit the same as for fluentd-gcp.
       buffer_chunk_limit 2M
       # Cap buffer memory usage to 2MiB/chunk * 32 chunks = 64 MiB
       buffer_queue_limit 32
       flush_interval 5s
       # Never wait longer than 5 minutes between retries.
       max_retry_wait 30
       # Disable the limit on the number of retries (retry forever).
       disable_retry_limit
       # Use multiple threads for processing.
       num_threads 8
    </match>

Do you have an idea what I missed?
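Regarding the second point: the JSON payload inside the log field is only lifted into top-level keys when the kubernetes filter is told to merge it; a hedged addition to the FILTER block above (older releases call the option Merge_JSON_Log, newer ones Merge_Log):

    [FILTER]
      Name kubernetes
      Match kubernetes.*
      Kube_URL https://10.0.0.1:443
      Merge_Log On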

Unstable using latest from 0.13-dev branch

I am experiencing a lot of instability when applying the latest changes from 0.13-dev branch, specifically #16

Eventually, if a pod crashes on a busy node and enters CrashLoopBackOff, it won't ever recover. I am still investigating, but if you can see anything obvious, I would really appreciate your insight.

At first I thought it was the memory and/or CPU limits, so I removed those, and the crashes now happen much less often. Without limits I'm still seeing what looks like multiple failure reasons. I changed the namespace (to kangaroo) and the Kafka topic (to k8s-firehose), and I changed the Log_Level to debug. With Kube_URL set to kubernetes.default.svc I got a few Temporary failure in name resolution errors, so I changed it to kubernetes.default.svc.cluster.local and have not seen them since.

I am using kail to follow all the daemonset pods in parallel, but that's quite chatty, so I do filter it down to errors with some context using:

kail --ds=fluent-bit | grep -A 10 -B 10 error

The output I get is:

kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [out_kafka] enqueued message (910 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [out_kafka] enqueued message (912 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [out_kafka] enqueued message (885 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [out_kafka] enqueued message (896 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [out_kafka] enqueued message (850 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [out_kafka] enqueued message (854 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [out_kafka] enqueued message (871 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [oukangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 13:49:41] [debug] [retry] re-using retry for task_id=1 attemps=9
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 13:49:41] [debug] [sched] retry=0x7fdd38017938 1 in 234 seconds
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:49:47] [debug] [retry] re-using retry for task_id=5 attemps=9
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:49:47] [debug] [sched] retry=0x7fbce1a17938 5 in 329 seconds
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 13:50:00] [debug] [retry] re-using retry for task_id=2 attemps=9
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 13:50:00] [debug] [sched] retry=0x7f2aa9a17a00 2 in 101 seconds
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 13:50:15] [debug] [retry] re-using retry for task_id=1 attemps=9
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 13:50:15] [debug] [sched] retry=0x7f2aa9a179d8 1 in 749 seconds
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:50:48] [debug] [retry] re-using retry for task_id=3 attemps=9
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:50:48] [debug] [sched] retry=0x7fbce1a17960 3 in 691 seconds
--
--
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [debug] [out_kafka] enqueued message (867 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [debug] [out_kafka] enqueued message (910 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [debug] [out_kafka] enqueued message (912 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [debug] [out_kafka] enqueued message (884 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [debug] [out_kafka] enqueued message (896 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [debug] [out_kafka] enqueued message (922 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [debug] [out_kafka] enqueued message (850 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [dekangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:54:23] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Receive failed: Disconnected
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:54:23] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Receive failed: Disconnected
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 13:54:29] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Receive failed: Disconnected
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 13:54:29] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Receive failed: Disconnected
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:54:36] [debug] [retry] re-using retry for task_id=2 attemps=10
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:54:36] [debug] [sched] retry=0x7fbce1a178e8 2 in 1974 seconds
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:55:15] [debug] [retry] re-using retry for task_id=5 attemps=10
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:55:15] [debug] [sched] retry=0x7fbce1a17938 5 in 1763 seconds
--
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:54:36] [debug] [retry] re-using retry for task_id=2 attemps=10
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:54:36] [debug] [sched] retry=0x7fbce1a178e8 2 in 1974 seconds
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:55:15] [debug] [retry] re-using retry for task_id=5 attemps=10
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:55:15] [debug] [sched] retry=0x7fbce1a17938 5 in 1763 seconds
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 13:57:34] [error] [out_kafka] fluent-bit#producer-1: [thrd:bootstrap.kafka:9092/bootstrap]: bootstrap.kafka:9092/bootstrap: Receive failed: Disconnected
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 13:57:34] [error] [out_kafka] fluent-bit#producer-1: [thrd:bootstrap.kafka:9092/bootstrap]: bootstrap.kafka:9092/bootstrap: Receive failed: Disconnected
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [ info] [engine] started
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] inotify watch fd=20
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] scanning path /var/log/containers/*.log
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] add to scan queue /var/log/containers/fluent-bit-xkv8g_kangaroo_fluent-bit-9e77c3d34cae27579fb2236fd361cc4d8d0f4018e2f1cb76a68a4d8f0b16b774.log, offset=0
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] add to scan queue /var/log/containers/go-spew-765547c587-cfjlw_teachers_go-spew-500d900f34d18b7e084a8bd024fce038bc4ee79994b1e13ad2ee7d8604926a4a.log, offset=491254
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] add to scan queue /var/log/containers/go-spew-765547c587-dfv6k_teachers_go-spew-99b1ea45b08173409691edc456a5f112b950424eea12034a7c0c36cc90c99a3a.log, offset=2778795
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] add to scan queue /var/log/containers/kube-proxy-ip-172-28-82-94.ec2.internal_kube-system_kube-proxy-42d02ce390db2f79131df096e7aa5153052e0ad9bdf7ca0fee4be782950a8577.log, offset=13507
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] add to scan queue /var/log/containers/kube2iam-bcz9n_kube-system_kube2iam-3ade77e31c2d067214e178c71332a655396fa8ad4eab5c4bffe7d4e61ed94b0a.log, offset=160
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] add to scan queue /var/log/containers/kube2iam-bcz9n_kube-system_kube2iam-4e7d2b495c01f0317306bfb3e9d09a327142f79645e6c30d9ae6958dac25b348.log, offset=1319893
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] add to scan queue /var/log/containers/logs-fluentbit-6b95b54d7b-n7mxc_test-kafka_testcase-df39898bc6379370606389693ec0c32d76aae1acbe66745fcc36c325f2ef4835.log, offset=284611
--
--
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_kafka] enqueued message (922 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_kafka] message delivered (1112 bytes, partition 4)
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_kafka] enqueued message (910 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_kafka] enqueued message (912 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_kafka] enqueued message (885 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_kafka] enqueued message (896 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_kafka] enqueued message (850 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:00:08] [debug] [retry] re-using retry for task_id=5 attemps=9
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:00:08] [debug] [sched] retry=0x7fdd38017988 5 in 1123 seconds
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:00:11] [debug] [retry] re-using retry for task_id=3 attemps=9
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:00:11] [debug] [sched] retry=0x7f2aa9a17a50 3 in 1609 seconds
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:00:53] [debug] [retry] re-using retry for task_id=3 attemps=10
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:00:53] [debug] [sched] retry=0x7fdd380178e8 3 in 1386 seconds
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 14:02:08] [debug] [retry] re-using retry for task_id=1 attemps=9
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 14:02:08] [debug] [sched] retry=0x7fbce1a178c0 1 in 1636 seconds
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:02:13] [debug] [retry] re-using retry for task_id=2 attemps=11
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:02:13] [debug] [sched] retry=0x7f2aa9a17a00 2 in 839 seconds
--
--
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:02:13] [debug] [retry] re-using retry for task_id=2 attemps=11
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:02:13] [debug] [sched] retry=0x7f2aa9a17a00 2 in 839 seconds
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 14:02:18] [debug] [retry] re-using retry for task_id=3 attemps=10
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 14:02:18] [debug] [sched] retry=0x7fbce1a17960 3 in 888 seconds
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:02:43] [debug] [retry] re-using retry for task_id=1 attemps=10
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:02:43] [debug] [sched] retry=0x7f2aa9a179d8 1 in 1755 seconds
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:03:35] [debug] [retry] re-using retry for task_id=0 attemps=10
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:03:35] [debug] [sched] retry=0x7f2aa9a179b0 0 in 1741 seconds
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:03:56] [debug] [retry] rkangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:04:29] [error] [out_kafka] fluent-bit#producer-1: [thrd:bootstrap.kafka:9092/bootstrap]: bootstrap.kafka:9092/bootstrap: Receive failed: Disconnected
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:04:29] [error] [out_kafka] fluent-bit#producer-1: [thrd:bootstrap.kafka:9092/bootstrap]: bootstrap.kafka:9092/bootstrap: Receive failed: Disconnected
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:04:34] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-2.broker.kafka.svc.cluster.local:9092/2]: kafka-2.broker.kafka.svc.cluster.local:9092/2: Receive failed: Disconnected
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:04:34] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-2.broker.kafka.svc.cluster.local:9092/2]: kafka-2.broker.kafka.svc.cluster.local:9092/2: Receive failed: Disconnected
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [ info] [engine] started
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] inotify watch fd=20
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] scanning path /var/log/containers/*.log
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] add to scan queue /var/log/containers/fluent-bit-xkv8g_kangaroo_fluent-bit-1ee969753bd79567e97d12a8c82e10542c4b720db6d0aa1f78a4009c6d064920.log, offset=0
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] add to scan queue /var/log/containers/go-spew-765547c587-cfjlw_teachers_go-spew-500d900f34d18b7e084a8bd024fce038bc4ee79994b1e13ad2ee7d8604926a4a.log, offset=515164
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] add to scan queue /var/log/containers/go-spew-765547c587-dfv6k_teachers_go-spew-99b1ea45b08173409691edc456a5f112b950424eea12034a7c0c36cc90c99a3a.log, offset=2800532
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] add to scan queue /var/log/containers/kube-proxy-ip-172-28-82-94.ec2.internal_kube-system_kube-proxy-42d02ce390db2f79131df096e7aa5153052e0ad9bdf7ca0fee4be782950a8577.log, offset=13507
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] add to scan queue /var/log/containers/kube2iam-bcz9n_kube-system_kube2iam-3ade77e31c2d067214e178c71332a655396fa8ad4eab5c4bffe7d4e61ed94b0a.log, offset=160
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] add to scan queue /var/log/containers/kube2iam-bcz9n_kube-system_kube2iam-4e7d2b495c01f0317306bfb3e9d09a327142f79645e6c30d9ae6958dac25b348.log, offset=1346840
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] add to scan queue /var/log/containers/logs-fluentbit-6b95b54d7b-n7mxc_test-kafka_testcase-df39898bc6379370606389693ec0c32d76aae1acbe66745fcc36c325f2ef4835.log, offset=335508

extra line in fluent-bit role file for 1.22

I got the following error when creating the fluent-bit role on a 1.22 cluster:

error: error parsing https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-role-1.22.yaml

This is due to a leftover "git diff" line at the end of the file:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-read
rules:
- apiGroups: [""]
  resources:
  - namespaces
  - pods
  verbs: ["get", "list", "watch"]
diff --git a/fluent-bit-role-binding-1.22.yaml b/fluent-bit-role-binding-1.22.yaml

for containerd/cri please include note about /var/lib dir

I noticed that there are notes about changing the parser to cri if you are using containerd or cri-o, but there is no example manifest for it (daemonset, etc.) and it doesn't say anywhere to also modify the volumes (I presume this is also not accounted for in the Helm chart).

On a node running containerd as the k8s runtime there is no /var/lib/docker, but there is /var/lib/containerd, yet the DaemonSet will still mount (or try to mount) the docker dir; a volume sketch follows the directory listing below.

ls -la /var/lib/docker
ls: cannot access '/var/lib/docker': No such file or directory
[root@ip-10-42-10-221 ~]# ls -la /var/lib/container*
/var/lib/containerd:
total 8
drwx------. 10 root root 4096 Jan 20 22:27 .
drwxr-xr-x. 30 root root 4096 Jan 20 22:27 ..
drwxr-xr-x.  3 root root   20 Jan 20 22:27 io.containerd.content.v1.content
drwx--x--x.  2 root root   21 Jan 20 22:27 io.containerd.metadata.v1.bolt
drwx--x--x.  2 root root    6 Jan 20 22:27 io.containerd.runtime.v1.linux
drwx--x--x.  2 root root    6 Jan 20 22:27 io.containerd.runtime.v2.task
drwxr-xr-x.  2 root root    6 Jan 20 22:27 io.containerd.snapshotter.v1.btrfs
drwx------.  3 root root   23 Jan 20 22:27 io.containerd.snapshotter.v1.native
drwx------.  3 root root   23 Jan 20 22:27 io.containerd.snapshotter.v1.overlayfs
drwx------.  2 root root    6 Jan 20 22:27 tmpmounts

/var/lib/containers:
total 4
drwxr-xr-x.  4 root root   37 Jan 20 22:27 .
drwxr-xr-x. 30 root root 4096 Jan 20 22:27 ..
drwxr-xr-x.  2 root root    6 Sep 23 16:19 sigstore
drwx------.  9 root root  169 Jan 20 22:27 storage
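A hedged sketch of the volume changes for a containerd node, mirroring the docker variant in this repo (varlibcontainers is a made-up volume name, and on many containerd setups the log files actually live under /var/log/pods, so mounting /var/log alone may already be enough):

        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibcontainers
          mountPath: /var/lib/containerd
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibcontainers
        hostPath:
          path: /var/lib/containerd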

v1.6.7 seems to have broken Kubernetes API connection

I have a daemonset of fluent-bit v1.6 pods running in three different (EKS) Kubernetes clusters.
For historical reasons, I have a cronjob to restart them every 6 hours, and after they restarted this morning, they could not contact https://kubernetes.default.svc:443 in any cluster.
After doing a bit of digging, I spotted that my daemonset was configured to

      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:1.6
        imagePullPolicy: Always

and that v1.6.7 was released a few hours ago.
Pinning the version to v1.6.6 resolved the issue, so there seems to be a bug in the new version.

The specific errors I was seeing in fluent-bit's logs were:

[2020/12/03 10:39:42] [error] [io] connection #42 failed to: kubernetes.default.svc:443
[2020/12/03 10:39:42] [error] [filter:kubernetes:kubernetes.0] upstream connection error
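The pin described above, as it would look in the DaemonSet container spec (a sketch; 1.6.6 simply being the last tag that worked here):

      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:1.6.6
        imagePullPolicy: IfNotPresent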

fluent-bit kafka data pipeline is not working.

Hi Team,

We have configured the fluent-bit -> Kafka data pipeline following the link below.

https://github.com/fluent/fluent-bit-kubernetes-logging/tree/master/output/kafka

When we deployed this pipeline the pods came up and the Kafka topic was created, but logs are not flowing to the Kafka cluster.
We could not see any error in the pod logs; we only see the info messages below.
#########################################################

[root@centos]# kubectl logs fluent-bit-nb886 -n ec2-kafka
Fluent Bit v1.5.7

* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/09/12 07:11:40] [ info] [engine] started (pid=1)
[2022/09/12 07:11:40] [ info] [storage] version=1.0.5, initializing...
[2022/09/12 07:11:40] [ info] [storage] in-memory
[2022/09/12 07:11:40] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/09/12 07:11:40] [ info] [filter:kubernetes:kubernetes.0] https=1 host=kubernetes.default.svc port=443
[2022/09/12 07:11:40] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2022/09/12 07:11:40] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2022/09/12 07:11:40] [ info] [filter:kubernetes:kubernetes.0] API server connectivity OK
[2022/09/12 07:11:40] [ info] [output:kafka:kafka.0] brokers='x.x.x.x:9092,x.x.x.x:9092,x.x.x.x:9092' topics='kafka-bit'
[2022/09/12 07:11:40] [ info] [sp] stream processor started
##########################################################################

Please let us know how to solve this issue.

Thanks in advance..!!

Getting HTTP post in ELK from Fluent-bit

Hi
I finished setting up the solution and I am getting entries like the ones you can see in the attached picture; they seem to be regular HTTP POST headers.

Does anyone know how to filter them out, or is this a normal setup?


Thanks in advance for all the help and the project.

Container logs broken symlinks

The logs at /var/log/containers/*.log end up referencing files in /var/data on my k8s cluster (v1.9.7), which is not mounted by the DaemonSet, resulting in broken symlinks. The workaround is to just mount /var as a whole.
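Sketched as DaemonSet volume entries, this is roughly what a narrower fix would look like (vardata is a made-up name; mounting /var wholesale works the same way with path: /var):

        volumeMounts:
        - name: vardata
          mountPath: /var/data
          readOnly: true
      volumes:
      - name: vardata
        hostPath:
          path: /var/data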

[filter_kube] invalid pattern for given tag

Could anyone help me with this error? I am using EFK stack
Fluent bit: v1.0.1
Kubernetes: 1.10.5
Docker: 18.06.0-ce

[error] [out_es] could not pack/validate JSON
[ warn] [out_es] Elasticsearch error

{"log":"[2019/01/14 14:53:46] [error] [out_es] could not pack/validate JSON response\n","stream":"stderr","time":"2019-01-14T14:53:46.176768172Z"}
{"log":"{"took":158,"errors":true,"items":[{"index":{"_index":"logstash-2019.01.14","_type":"flb_type","_id":"-TLaTGgBbqH75WQ8rG2j","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":10946,"_primary_term":1,"status":201}},{"index":{"_index":"logstash-2019.01.14","_type":"flb_type","_id":"-jLaTGgBbqH75WQ8rG2j","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":11085,"_primary_term":1,"status":201}},{"index":{"_index":"logstash-2019.01.14","_type":"flb_type","_id":"-zLaTGgBbqH75WQ8rG2j","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":10879,"_primary_term":1,"status":201}},{"index":{"_index":"logstash-2019.01.14","_type":"flb_type","_id":"_DLaTGgBbqH75WQ8rG2j","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":10880,"_primary_term":1,"status":201}},{"index":{"_index":"logstash-\n","stream":"stderr","time":"2019-01-14T14:53:46.176879287Z"}
{"log":"[2019/01/14 14:53:46] [ warn] [out_es] Elasticsearch error\n","stream":"stderr","time":"2019-01-14T14:53:46.176902697Z"}
{"log":"{"took":158,"errors":true,"items":[{"index":{"_index":"logstash-2019.01.14","_type":"flb_type","_id":"-TLaTGgBbqH75WQ8rG2j","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":10946,"_primary_term":1,"status":201}},{"index":{"_index":"logstash-2019.01.14","_type":"flb_type","_id":"-jLaTGgBbqH75WQ8rG2j","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":11085,"_primary_term":1,"status":201}},{"index":{"_index":"logstash-2019.01.14","_type":"flb_type","_id":"-zLaTGgBbqH75WQ8rG2j","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":10879,"_primary_term":1,"status":201}},{"index":{"_index":"logstash-2019.01.14","_type":"flb_type","_id":"_DLaTGgBbqH75WQ8rG2j","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":10880,"_primary_term":1,"status":201}},{"index":{"_index":"logstash-2019.01.14","_type\n","stream":"stderr","time":"2019-01-14T14:53:46.17691558Z"}
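Not from the original report, but a bulk response with "errors":true while the visible items all show status 201 usually means that some later records in the same batch were rejected (mapping conflicts are a common cause). A sketch of es output settings that can help surface the rejected documents, assuming these options exist in your Fluent Bit version:

    [OUTPUT]
        Name            es
        Match           *
        Logstash_Format On
        # avoid dotted field names clashing with object mappings
        Replace_Dots    On
        # assumption: newer releases print the failing request/response when Elasticsearch returns an error
        Trace_Error     On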

Support CoreOS?

We tested it in our OCP4 env, which uses CoreOS for the worker hosts, but we always get a permission denied error for "/var/log/containers/*.log".

[2021/02/01 08:10:44] [ warn] [input:tail:tail.0] error scanning path: /var/log/containers/*.log
[2021/02/01 08:10:54] [error] [input:tail:tail.0] read error, check permissions: /var/log/containers/*.log
[2021/02/01 08:10:54] [ warn] [input:tail:tail.0] error scanning path: /var/log/containers/*.log
[2021/02/01 08:11:04] [error] [input:tail:tail.0] read error, check permissions: /var/log/containers/*.log
[2021/02/01 08:11:04] [ warn] [input:tail:tail.0] error scanning path: /var/log/containers/*.log
[2021/02/01 08:11:14] [error] [input:tail:tail.0] read error, check permissions: /var/log/containers/*.log

Below is the content of the configmap:

apiVersion: v1
data:
  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020
    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-elasticsearch.conf
  input-kubernetes.conf: |
    [INPUT]
        Name          tail
        Path          /var/log/containers/*.log
        Parser        cri
        Tag           kube.*
        Mem_Buf_Limit 5MB
        Skip_Long_Lines Off
        Refresh_Interval  10
  output-elasticsearch.conf: |
    [OUTPUT]
        Name            es
        Match           *
        Host            <es host>
        Port            443
        Logstash_Format On
        Replace_Dots    On
        Retry_Limit     False
        HTTP_User       <user>
        HTTP_Passwd     <passwd>

Also, oc rsh and oc debug cannot be used for troubleshooting inside the container:

$oc rsh fluent-bit-5gwn2
ERRO[0000] exec failed: container_linux.go:348: starting container process caused "exec: \"/bin/sh\": stat /bin/sh: no such file or directory"
exec failed: container_linux.go:348: starting container process caused "exec: \"/bin/sh\": stat /bin/sh: no such file or directory"
command terminated with exit code 1

Thank you.
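Not part of the original report, but on OpenShift these read errors are usually SELinux/SCC related rather than plain file permissions: the pod needs to run privileged (or with a suitable SELinux type) to read host log files, and the service account needs an SCC that allows that. A sketch of the usual workaround, assuming the service account is named fluent-bit in the logging namespace as in this repo's manifests:

    # grant the fluent-bit service account access to the privileged SCC
    # (assumption: SA named fluent-bit in namespace logging)
    oc adm policy add-scc-to-user privileged -z fluent-bit -n logging

and in the DaemonSet container spec:

        securityContext:
          privileged: true

As for oc rsh failing: the stock fluent/fluent-bit image ships without a shell, which is why /bin/sh is not found; the -debug image tags include one if you need to exec into the pod.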

Time_Key_Format for elasticsearch does not work with milliseconds

Hi,
I would like to have the time key @timestamp in Elasticsearch with milliseconds.
I set Time_Key_Format:

%Y-%m-%dT%H:%M:%S.%L%z

but it logs:

"@timestamp" : "2017-05-16T10:43:34.%L+0000"

I found %L used in the Time_Format entries in parsers.conf.

Elasticsearch Logs:

"_source" : {
        "@timestamp" : "2017-05-16T10:43:34.%L+0000",
        "log" : "{\"timestamp\":\"2017-05-16T10:43:25.358Z\",\"level\":\"DEBUG\",\"thread\":\"ccc_agent-akka.actor.default-dispatcher-36\",\"mdc\":{\"sourceThread\":\"ccc_agent-akka.actor.default-dispatcher-25\",\"akkaTimestamp\":\"10:43:25.358UTC\",\"akkaSource\":\"akka://ccc_agent/user/agentHttpInterface\",\"sourceActorSystem\":\"ccc_agent\"},\"logger\":\"com.intel.ccc.agent.controller.JobControllerActor\",\"message\":\"no longer watched by Actor[akka://ccc_agent/user/IO-HTTP/listener-0/847#489616481]\",\"context\":\"default\"}\n",
        "stream" : "stdout",
        "time" : "2017-05-16T10:43:25.359158656Z",
        "timestamp" : "2017-05-16T10:43:25.358Z",

With Fluentd I used a record_transformer filter on kubernetes.** records to add milliseconds:

<filter kubernetes.**>
      @type record_transformer
      enable_ruby
      auto_typecast true
      <record>
        @timestamp ${t = Time.now; (t.iso8601(3))}
      </record>
    </filter>
"@timestamp" : "2017-05-10T21:59:53.163+00:00"

fail to run fluent-bit-ds on kubernetes v1.16.2 (output to elasticsearch)

  1. apiVersion extensions/v1beta1 no longer supports DaemonSet, so I switched to apps/v1.
  2. spec needs a selector.
  3. I changed the es service name to elasticsearch-01-service.

Based on the above, my fluent-bit-ds.yaml is:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
  labels:
    k8s-app: fluent-bit-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: fluent-bit-logging
      version: v1
      kubernetes.io/cluster-service: "true"
  template:
    metadata:
      labels:
        k8s-app: fluent-bit-logging
        version: v1
        kubernetes.io/cluster-service: "true"
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "2020"
        prometheus.io/path: /api/v1/metrics/prometheus
    spec:
      containers:
      - name: fluent-bit
        image: registry.cn-hangzhou.aliyuncs.com/mykg/fluent-bit:1.2.1
        imagePullPolicy: Always
        ports:
          - containerPort: 2020
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch-01-service"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
      terminationGracePeriodSeconds: 10
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
      serviceAccountName: fluent-bit
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      - operator: "Exists"
        effect: "NoExecute"
      - operator: "Exists"
        effect: "NoSchedule"

Then my pods fail to start:

(screenshot of pod status attached)

The logs are:

(screenshot of pod logs attached)
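Since the pod status and log output above were attached as screenshots, the underlying text is not recoverable here; a generic way to capture it as text (assuming the labels and namespace from the manifest above):

    kubectl -n logging get pods -l k8s-app=fluent-bit-logging
    kubectl -n logging describe pod <pod-name>
    kubectl -n logging logs <pod-name>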
