aws-observability / aws-otel-helm-charts
AWS Distro for OpenTelemetry (ADOT) Helm Charts
Home Page: https://aws-otel.github.io/
License: Apache License 2.0
Describe the bug
Usually, the namespace in a Helm chart is set by Helm itself: the namespace name via --namespace (or simply -n), and the option to create it via --create-namespace. That way the chart does not have to "host" the namespace, because Helm takes care of managing it.
In this chart, we declare the namespace amazon-metrics explicitly and set up all resources there, which is confusing when you try to deploy the chart to a different namespace.
My proposal is to:
change .Values.adotCollector.daemonSet.namespace to .Release.Namespace, in order to use the namespace set by Helm
move the values.global.namespaceOverride value one layer up, to Values.namespaceOverride
so it will look the same as most Helm charts.
Steps to reproduce
helm install -n [NAMESPACE_NAME] [RELEASE_NAME] [REPO_NAME]/adot-exporter-for-eks-on-ec2
helm install -n monitoring adot aws-otel/adot-exporter-for-eks-on-ec2
What did you expect to see?
All resources being created in [NAMESPACE_NAME], which is monitoring in my case.
What did you see instead?
All resources being created in the amazon-metrics namespace, which is also created by the chart.
Environment
This issue is environment-agnostic
Additional context
I'm willing to fix it by forking the chart and sending the PR for my proposal. Please let me know what you think about this issue.
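The proposed change could be sketched with a small helper, along the lines many charts use (the helper name below is illustrative, not something the chart currently defines):

```yaml
{{/* _helpers.tpl (sketch): prefer an explicit override, else the release namespace */}}
{{- define "adot-exporter.namespace" -}}
{{- default .Release.Namespace .Values.namespaceOverride -}}
{{- end -}}
```

Each templated resource would then use `namespace: {{ include "adot-exporter.namespace" . }}` instead of the hardcoded value.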
Describe the bug
I tried to helm install ADOT. After I attached the CloudWatchAgentServerPolicy policy to the EKS node group, I could see the metrics and logs in CloudWatch. However, I see the log lines below when I print the collector's logs. I am not sure how this affects functionality.
kubectl logs adot-collector-daemonset-c98m8 -n amazon-metrics
E0917 19:25:08.199040 1 leaderelection.go:367] Failed to update lock: leases.coordination.k8s.io is forbidden: User "system:serviceaccount:amazon-metrics:adot-collector-sa" cannot create resource "leases" in API group "coordination.k8s.io" in the namespace "amazon-metrics"
E0917 19:25:17.959812 1 leaderelection.go:334] error initially creating leader election record: leases.coordination.k8s.io is forbidden: User "system:serviceaccount:amazon-metrics:adot-collector-sa" cannot create resource "leases" in API group "coordination.k8s.io" in the namespace "amazon-metrics"
Steps to reproduce
eks 1.21
helm install container-insights aws-observability/adot-exporter-for-eks-on-ec2 -f values.yml
Same as the default values file, with receivers and exporters updated for CloudWatch:
ampexporters:
  namespaces: ""
  endpoint: ""
  resourcetootel: false
  authenticator: "sigv4auth"
service:
  metrics:
    receivers: ["awscontainerinsightreceiver"]
    processors: ["batch/metrics"]
    exporters: ["awsemf"]
  extensions: ["health_check", "sigv4auth"]
What did you expect to see?
no failed or error logs
What did you see instead?
failed and error logs
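For reference, the leader-election errors above are usually resolved by granting lease permissions to the collector's service account. A sketch only, mirroring the names in the error message rather than the chart's actual templates:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: adot-collector-leader-election   # hypothetical name
  namespace: amazon-metrics
rules:
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["create", "get", "list", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: adot-collector-leader-election
  namespace: amazon-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: adot-collector-leader-election
subjects:
  - kind: ServiceAccount
    name: adot-collector-sa
    namespace: amazon-metrics
```

Whether the errors actually affect functionality depends on whether leader election is needed for the components in use.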
Hi,
I would like to suggest a new feature (multi pipeline support) for the helm chart.
Currently, only one pipeline is supported by the ADOT configuration. I have a use case where I would like to write some metrics to Prometheus and some metrics to CloudWatch.
A configuration like this would be a solution.
data:
  adot-config: |
    extensions:
      health_check:
    ...
    service:
      pipelines:
        metrics/prometheus:
          receivers:
            - prometheus
          processors:
            - batch/metrics
          exporters:
            - awsprometheusremotewrite
        metrics/cloudwatch:
          receivers:
            - awscontainerinsightreceiver
          processors:
            - batch/metrics
          exporters:
            - awsemf
However, there is currently no possibility to configure the service like that.
I deployed this to our cluster but hit IMDS errors and only partial logging. After reading many GitHub issue threads, I realized the chart ships a really old version and sets IMDSv1 by default.
I suggest modifying the default value for the aws/aws-for-fluent-bit version in values.yaml from 2.21.1 to 2.28.1. The v2.21.1 release is from Nov 2021, and there have been 18 releases since then, fixing a great number of issues and making improvements.
Additionally, setting imdsVersion to v2 by default (instead of v1) may lead to better outcomes.
This stuff is reasonably challenging to set up and validate; that is why people come to the chart. Keeping it up to date will help users be successful with it without a lot of labor tracking down defects that are already fixed in related packages.
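Until the defaults change, the values can be overridden at install time. The key names below are assumptions based on typical chart layouts; check the chart's values.yaml for the real ones:

```yaml
# values override (hypothetical key names)
fluentbit:
  image:
    tag: "2.28.1"
  imdsVersion: "v2"
```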
Hi team,
I was going over some documentation associated with this repo today:
However, I noticed that in #82, the templates for logging were removed from the helm chart, citing "stability of Logs upstream in the OTel community in 2023". I don't see any further issues or rationale behind this in the PR.
After installing the Helm chart, I don't see any container logs in CloudWatch. Is this expected of adot-collector, or is something in my configuration wrong? I've created values.yml:
---
awsRegion: "us-east-1"
clusterName: "my_cluster_name"
fluentbit:
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: "arn:aws:iam::012345678901:role/AmazonEKSFluentBitRole"
      eks.amazonaws.com/sts-regional-endpoints: "true"
adotCollector:
  daemonSet:
    serviceAccount:
      annotations:
        eks.amazonaws.com/role-arn: "arn:aws:iam::012345678901:role/AmazonEKSOTELCollectorRole"
        eks.amazonaws.com/sts-regional-endpoints: "true"
And installed the chart:
helm install cloudwatch-container-insights aws-observability/adot-exporter-for-eks-on-ec2 -f values.yml
While the values are being merged:
helm get values cloudwatch-container-insights
USER-SUPPLIED VALUES:
adotCollector:
  daemonSet:
    serviceAccount:
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::012345678901:role/AmazonEKSOTELCollectorRole
        eks.amazonaws.com/sts-regional-endpoints: "true"
awsRegion: us-east-1
clusterName: my_cluster_name
fluentbit:
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::012345678901:role/AmazonEKSFluentBitRole
      eks.amazonaws.com/sts-regional-endpoints: "true"
The generated manifest does not include the annotations (I've stripped some content and left only ServiceAccount kind):
helm get manifest cloudwatch-container-insights
# Source: adot-exporter-for-eks-on-ec2/templates/adot-collector/serviceaccount.yaml
# Service account provides identity information for a user to be able to authenticate processes running in a pod.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: adot-collector-sa
  namespace: amzn-cloudwatch-metrics
---
# Source: adot-exporter-for-eks-on-ec2/templates/aws-for-fluent-bit/serviceaccount.yaml
# Service account provides identity information for a user to be able to authenticate processes running in a pod.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: amazon-cloudwatch
One more confirmation that the annotations were not created:
kubectl get sa -n amzn-cloudwatch-metrics adot-collector-sa -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    meta.helm.sh/release-name: cloudwatch-container-insights
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2022-03-23T10:44:38Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: adot-collector-sa
  namespace: amzn-cloudwatch-metrics
  resourceVersion: "418055"
  uid: 0bf1c377-eba7-4f72-9098-1f587037556f
secrets:
  - name: adot-collector-sa-token-44bnz
kubectl get sa -n amazon-cloudwatch fluent-bit -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    meta.helm.sh/release-name: cloudwatch-container-insights
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2022-03-23T10:44:38Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: fluent-bit
  namespace: amazon-cloudwatch
  resourceVersion: "418054"
  uid: ec9456ab-7c4e-4b0b-910b-c5f48e76e6df
secrets:
  - name: fluent-bit-token-8sb8t
Expected results: Annotations for serviceAccount to be created in order to use IRSA.
Describe the bug
Steps to reproduce
Updated the receivers and the exporters to offload metrics and logs to CloudWatch.
helm install \
[RELEASE_NAME] [REPO_NAME]/adot-exporter-for-eks-on-ec2 \
--set clusterName=[CLUSTER_NAME] --set awsRegion=[AWS_REGION]
What did you expect to see?
FluentBit and Collector pods are both running
What did you see instead?
FluentBit pods do not deploy.
Describe the issue
Setting envFrom will help with passing in sensitive variables, such as AWS credentials or AMP credentials.
My proposal is to add {{ .Values.envFrom }} to the template, with a default value of envFrom: {} in values.yaml.
What did you expect to see?
Environment variables being populated from secret/configmap to the daemonset/sidecar
Environment
This issue is environment-agnostic
Additional context
I'm willing to fix it by forking the chart and sending the PR for my proposal. Please let me know what you think about this issue.
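The envFrom proposal above could be sketched like this (the container name and template layout are illustrative; note that envFrom is a list in the Pod spec, so [] may be a more natural default than {}):

```yaml
# values.yaml (proposed default)
envFrom: []

# daemonset.yaml template (sketch)
containers:
  - name: adot-collector-container
    {{- with .Values.envFrom }}
    envFrom:
      {{- toYaml . | nindent 6 }}
    {{- end }}
```

A user could then reference a Secret via `envFrom: [{secretRef: {name: my-aws-creds}}]` without baking credentials into the values file.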
Hi, I want to unset the CPU limit for adot-collector-container, but if I don't set the CPU limit in values.yaml, it uses the default value (200m).
helm chart version: 0.14.0
I am attaching my values.yaml file.
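A common chart pattern that would allow leaving the CPU limit unset is to emit the key only when a value is provided. This is a sketch of a possible template change, not the chart's current behavior:

```yaml
resources:
  limits:
    {{- with .Values.adotCollector.daemonSet.resources.limits.cpu }}
    cpu: {{ . | quote }}
    {{- end }}
    memory: {{ .Values.adotCollector.daemonSet.resources.limits.memory | quote }}
```

With this pattern, omitting the cpu key in values.yaml would leave the container with no CPU limit rather than the 200m default.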
While trying to use the ADOT Collector to scrape metrics from the Prometheus /metrics endpoint, I noticed that the config ignores the prometheus.io/port pod annotation.
I fixed it by updating the ampreceivers.scrapeConfigs configuration
from
- source_labels: [__address__]
  action: replace
  regex: ([^:]+)(?::\d+)?;(\d+)
  replacement: $$1:$$2
  target_label: __address__
to
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
  action: replace
  regex: ([^:]+)(?::\d+)?;(\d+)
  replacement: $$1:$$2
  target_label: __address__
An evaluation should be made of whether the kubeVersion in the adot-exporter-for-eks-on-ec2 chart is still valid. Deprecated/removed APIs may be in use that are not compatible with newer versions of EKS.
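For illustration, the constraint lives in Chart.yaml; the range below is only an example, not a recommendation:

```yaml
# Chart.yaml (sketch)
kubeVersion: ">=1.22.0-0"
```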
I have installed the AWS OTel Collector using the Helm chart provided in this repository. I am able to send metrics to CloudWatch, and I can see logs appearing in CloudWatch Logs as well, which is a good sign that the collector is working.
The installation went fine with a couple of hiccups. I tried instrumenting a sample application, but the pod is unable to connect to the collector service on port 4317. Logs below.
My OTel HelmRelease looks like this:
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: aws-otel
  namespace: infra
spec:
  releaseName: aws-otel
  interval: 5m
  chart:
    spec:
      chart: adot-exporter-for-eks-on-ec2
      sourceRef:
        kind: HelmRepository
        name: aws-otel
        namespace: infra
  values:
    nameOverride: aws-otel
    clusterName: dev
    awsRegion: "us-west-2"
    adotCollector:
      image:
        name: "aws-otel-collector"
        repository: "amazon/aws-otel-collector"
        tag: "v0.29.0"
        daemonSetPullPolicy: "IfNotPresent"
        sidecarPullPolicy: "Always"
      daemonSet:
        enabled: true
        daemonSetName: "adot-collector-daemonset"
        createNamespace: false
        namespace: "infra"
        clusterRoleName: "dataos-core-dev-adot-collector-role"
        clusterRoleBindingName: "adot-collector-role-binding"
        command:
          - "/awscollector"
          - "--config=/conf/adot-config.yaml"
        resources:
          limits:
            cpu: "200m"
            memory: "200Mi"
          requests:
            cpu: "200m"
            memory: "200Mi"
        receivers:
          otlp:
            protocols:
              grpc:
                endpoint: 0.0.0.0:4317
              http:
                endpoint: 0.0.0.0:4318
        exporters:
          awsxray:
            region: us-west-2
        processors:
          memory_limiter:
            limit_mib: 100
            check_interval: 5s
        extensions:
          sigv4auth:
            assume_role:
              arn: "arn:aws:iam::xxxxxxxxxx:role/adot-collector-sa"
              sts_region: "us-west-2"
        cwexporters:
          namespace: "ContainerInsights"
          logGroupName: "aws-otel"
          logStreamName: "InputNodeName"
          enabled: true
          dimensionRollupOption: "NoDimensionRollup"
          parseJsonEncodedAttrValues: ["Sources", "kubernetes"]
          metricDeclarations: |
            # node metrics
            - dimensions: [[NodeName, InstanceId, ClusterName]]
              metric_name_selectors:
                - node_cpu_utilization
                - node_memory_utilization
                - node_network_total_bytes
                - node_cpu_reserved_capacity
                - node_memory_reserved_capacity
                - node_number_of_running_pods
                - node_number_of_running_containers
            - dimensions: [[ClusterName]]
              metric_name_selectors:
                - node_cpu_utilization
                - node_memory_utilization
                - node_network_total_bytes
                - node_cpu_reserved_capacity
                - node_memory_reserved_capacity
                - node_number_of_running_pods
                - node_number_of_running_containers
                - node_cpu_usage_total
                - node_cpu_limit
                - node_memory_working_set
                - node_memory_limit
            # pod metrics
            - dimensions: [[PodName, Namespace, ClusterName], [Service, Namespace, ClusterName], [Namespace, ClusterName], [ClusterName]]
              metric_name_selectors:
                - pod_cpu_utilization
                - pod_memory_utilization
                - pod_network_rx_bytes
                - pod_network_tx_bytes
                - pod_cpu_utilization_over_pod_limit
                - pod_memory_utilization_over_pod_limit
            - dimensions: [[PodName, Namespace, ClusterName], [ClusterName]]
              metric_name_selectors:
                - pod_cpu_reserved_capacity
                - pod_memory_reserved_capacity
            - dimensions: [[PodName, Namespace, ClusterName]]
              metric_name_selectors:
                - pod_number_of_container_restarts
            # cluster metrics
            - dimensions: [[ClusterName]]
              metric_name_selectors:
                - cluster_node_count
                - cluster_failed_node_count
            # service metrics
            - dimensions: [[Service, Namespace, ClusterName], [ClusterName]]
              metric_name_selectors:
                - service_number_of_running_pods
            # node fs metrics
            - dimensions: [[NodeName, InstanceId, ClusterName], [ClusterName]]
              metric_name_selectors:
                - node_filesystem_utilization
            # namespace metrics
            - dimensions: [[Namespace, ClusterName], [ClusterName]]
              metric_name_selectors:
                - namespace_number_of_running_pods
        service:
          pipelines:
            traces:
              processors:
                - memory_limiter
              receivers:
                - otlp
              exporters:
                - awsxray
            metrics:
              receivers: ["awscontainerinsightreceiver"]
              processors: ["batch/metrics"]
              exporters: ["awsemf"]
          extensions: ["sigv4auth"]
Apart from the above issue, there are a couple of things I do not understand, such as the sigv4auth extension. Any help on this would be really appreciated, as I cannot find anything on the internet related to such a problem.
Hi,
Making this a general issue and not a bug report, as I did not have time to re-test. There is a large possibility I am wrong here, and I apologize if that is the case!
Does the adot-exporter-for-eks-on-ec2 chart support container metrics from EC2 hosts using Bottlerocket and containerd?
I see no containerdsock mount in values.yaml. That is in line with the "No pod metrics when using Bottlerocket for Amazon EKS" common error, and it would be in line with the experience I had with CloudWatch Container Insights.
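For comparison, collecting pod metrics on containerd hosts generally requires mounting the containerd socket into the collector DaemonSet. A sketch of what that could look like (the socket path is the common default; these keys are not currently in the chart's values.yaml):

```yaml
volumes:
  - name: containerdsock
    hostPath:
      path: /run/containerd/containerd.sock
containers:
  - name: adot-collector-container
    volumeMounts:
      - name: containerdsock
        mountPath: /run/containerd/containerd.sock
        readOnly: true
```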
Hi,
As per my understanding, this Helm chart takes care of deploying the following agents/collectors on the k8s cluster:
1.) FluentBit agent: deployed as a DaemonSet on the k8s cluster, responsible for gathering and offloading application, host, and data-plane logs into CloudWatch.
2.) OTel collector: also deployed as a DaemonSet on the k8s cluster, responsible for gathering and offloading metrics data into CloudWatch.
So I'm wondering what the role of the CloudWatch agent is in this setup. I see the following section in the values.yaml file, and these fields are referenced in the INPUT section of the configmap.yaml file:
cloudwatchAgent:
  path: "/var/log/containers/cloudwatch-agent*"
  dockerModeParser: "cwagent_firstline"
  db: "/var/fluent-bit/state/flb_cwagent.db"
  memBufLimit: "5MB"
Describe the bug
The adot-exporter-for-eks-on-ec2 Helm installation does not work with an existing namespace and service account. I'm using CDK blueprints: I create a namespace amazon-metrics, then create an IRSA named adot-collector-sa, and then deploy the chart with the following additional values:
let values: ValuesSchema = {
  awsRegion: cluster.stack.region,
  clusterName: cluster.clusterName,
  fluentbit: {
    enabled: true
  },
  serviceAccount: {
    create: false,
  },
  adotCollector: {
    daemonSet: {
      createNamespace: false,
      service: {
        metrics: {
          receivers: ["awscontainerinsightreceiver"],
          exporters: ["awsemf"],
        }
      },
      serviceAccount: {
        create: false,
      },
      cwexporters: {
        logStreamName: "EKSNode",
      }
    }
  }
};
The Helm installation fails with the errors below, which clearly show that serviceAccount: create: false does not work. I would appreciate any resolutions to this. This is a dependency for a blueprints CDK EKS add-on.
2:49:47 PM | CREATE_FAILED | Custom::AWSCDK-EKS-HelmChart | blueprintconstruct...oreksonec2037B3D69
Received response status [FAILED] from custom resource. Message returned: Error: b'Release "adot-eks-addon" does not exist. Installing it now.\nError: rendered manifests contain a resource that already exists. Unable to continue with install: ServiceAccount "adot-collector-sa" in namespace "amazon-metrics" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "adot-eks-addon"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "default"\n'
Hi Team,
Enhancement #23 enabled CloudWatch log group retention for:
But it did not enable it for the performance log group, which is used by the adot-collector for metrics. I think it needs to be added to the adot-config data in the adot-collector ConfigMap, something like:
exporters:
  awsemf:
    namespace: {{ .Values.adotCollector.daemonSet.cwexporters.namespace }}
    log_group_name: '/aws/containerinsights/{{ .Values.clusterName }}/performance'
    log_stream_name: {{ .Values.adotCollector.daemonSet.cwexporters.logStreamName }}
    log_retention: 60
As described on this page: https://aws-otel.github.io/docs/getting-started/cloudwatch-metrics
It would be great if the Helm chart could be updated with this extra option.
Thanks
My goal is to use this chart to deploy FluentBit to forward logs to CloudWatch logs as described in the docs here.
I'm setting my values to the following:
adotCollector.daemonSet.service.metrics.receivers is awscontainerinsightreceiver
adotCollector.daemonSet.service.metrics.exporters is awsemf
Note these values are slightly different from what the doc referenced above says they should be, but I believe these are the correct ones. I've tried the other ones too, of course. It's unclear to me whether the adotCollector is metrics-only or has something to do with logs as well.
I see that the collector has quite a bit of variety when it comes to components (receivers, processors, etc.), but I'm lost as to what the magic combination might be.
With the above config, the fluent-bit pods fail to start and enter CrashLoopBackOff. Running kubectl logs <fluent bit podname> yields the following, but I have no way to debug this because I can't connect to the pod to view that application-log.conf file.
Fluent Bit v1.8.9
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
Error: Configuration file contains errors. Aborting
AWS for Fluent Bit Container Image Version 2.21.1
[2022/04/08 16:52:33] [ Error] File application-log.conf
[2022/04/08 16:52:33] [ Error] Error in line 58: Key has an empty value
Questions
What should I set in the fluentBit portion of the values? I have fluentBit.enabled set to true, but nothing else seems like something I should set, and I don't see anything in the docs about that.
Regarding the prerequisites, I have bound two worker-role managed policies to my nodes: CloudWatchLogsFullAccess and CloudWatchAgentServerPolicy. Since I'm able to see the metrics pods start up and send their metrics to CloudWatch, perhaps this part is fine.
I'm deploying the FluentBit part of this chart. Here's what my pods look like:
$ kubectl get pods --all-namespaces | grep amazon
amazon-cloudwatch fluent-bit-448z6 1/1 Running 0 123m
amazon-cloudwatch fluent-bit-9s8jz 1/1 Running 0 123m
amazon-cloudwatch fluent-bit-jblg5 1/1 Running 0 123m
amazon-cloudwatch fluent-bit-ts4kg 1/1 Running 0 123m
amazon-metrics adot-collector-daemonset-2s4zj 1/1 Running 0 123m
amazon-metrics adot-collector-daemonset-9fhd7 1/1 Running 0 123m
amazon-metrics adot-collector-daemonset-g6t9m 1/1 Running 0 123m
amazon-metrics adot-collector-daemonset-qdcf2 1/1 Running 0 123m
The docs say here that 4 log groups should be created once the pods are deployed, but in the CloudWatch dashboard I only see one group, /performance, for the cluster.
What additional config do I need to see application logs? The only thing I'm doing now is setting fluentBit.enabled to true in the values.
My pods generate logs (I can see them by doing kubectl logs <pod name> at least).
The logs for a given fluent-bit pod look like this:
Fluent Bit v1.8.9
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2022/04/08 17:57:22] [ info] [engine] started (pid=1)
[2022/04/08 17:57:22] [ info] [storage] created root path /var/fluent-bit/state/flb-storage/
[2022/04/08 17:57:22] [ info] [storage] version=1.1.5, initializing...
[2022/04/08 17:57:22] [ info] [storage] root path '/var/fluent-bit/state/flb-storage/'
[2022/04/08 17:57:22] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/04/08 17:57:22] [ info] [storage] backlog input plugin: storage_backlog.8
[2022/04/08 17:57:22] [ info] [cmetrics] version=0.2.2
[2022/04/08 17:57:22] [ info] [input:storage_backlog:storage_backlog.8] queue memory limit: 4.8M
[2022/04/08 17:57:22] [ info] [filter:kubernetes:kubernetes.0] https=1 host=kubernetes.default.svc port=443
[2022/04/08 17:57:22] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2022/04/08 17:57:22] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2022/04/08 17:57:22] [ info] [filter:kubernetes:kubernetes.0] connectivity OK
Describe the bug
In the FluentBit config, IMDS is hardcoded to v1. When using IMDSv2, the v1 endpoint no longer works and FluentBit complains.
Steps to reproduce
Deploy the Helm chart with FluentBit enabled, on a cluster where IMDSv2 is enabled.
What did you expect to see?
The helm chart working and logs appearing in CloudWatch Logs.
What did you see instead?
[error] [filter:aws:aws.2] Could not retrieve ec2 metadata from IMDS
Environment
EKS version: 1.21
EC2 workers: Managed Node Group, version 1.21, using Bottlerocket OS 1.4.2 (aws-k8s-1.21), with IMDSv2 enabled.
Additional context
Note the hardcoded v1 string:
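The Fluent Bit aws filter does expose an imds_version option, so one possible fix is to make it configurable in the chart's ConfigMap template. A sketch, with a hypothetical value key:

```
[FILTER]
    Name          aws
    Match         *
    imds_version  {{ .Values.fluentbit.imdsVersion | default "v2" }}
```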
Hi Team,
By default, CloudWatch retains the logs forever. Many of our customers have expressed interest in retaining logs for a specific duration in CloudWatch to save on storage cost. We implemented that by configuring the log_retention_days parameter in the FluentBit agent config file when installing the agent separately on the EKS cluster.
It would be great if the Helm chart could also support this capability. I can work on this enhancement; please let me know if that works for the team. Thanks.
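For reference, the cloudwatch_logs output plugin supports log_retention_days, so the chart could thread a value through its ConfigMap template. A sketch, with a hypothetical value key:

```
[OUTPUT]
    Name                cloudwatch_logs
    Match               application.*
    region              {{ .Values.awsRegion }}
    log_group_name      /aws/containerinsights/{{ .Values.clusterName }}/application
    log_retention_days  {{ .Values.fluentbit.logRetentionDays }}
```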
Describe the bug
I'd like to be able to set image pull secrets in the DaemonSets for the Helm chart.
Steps to reproduce
N/A
What did you expect to see?
Allow a Helm value to be set that specifies an image pull secret to be used.
What did you see instead?
This is just not configurable yet.
Environment
Useful for enterprise solutions
Additional context
This is useful in cases where images are stored in an Enterprise repository, and access to the images requires a docker secret.
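A common way charts expose this (a sketch; the value name is a proposal, not something the chart currently has):

```yaml
# values.yaml (proposed)
imagePullSecrets: []

# daemonset.yaml template (sketch)
spec:
  template:
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
```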
Describe the bug
Following the documentation, I am getting a failure for the ADOT collector add-on. It used to work with version 0.1.0 of the chart, but it fails with all higher versions.
Steps to reproduce
Follow the documentation for offloading metrics to Amazon CloudWatch. I can see that the instructions for the two sections are identical:
CloudWatch
CloudWatch and AMP
What did you expect to see?
Expected an option to enable CloudWatch metrics only.
What did you see instead?
It appears that the AMP exporter is active by default.
The adot collector daemonset failed with the following error message:
builder/exporters_builder.go:40 Exporter is starting... {"kind": "exporter", "name": "awsprometheusremotewrite"}
Error: cannot start exporters: invalid endpoint: "http://some.url:9411/api/prom/push"
2022/07/13 02:26:20 application run finished with error: cannot start exporters: invalid endpoint: "http://some.url:9411/api/prom/push"
Stream closed EOF for amazon-metrics/adot-collector-daemonset-n54d9 (adot-collector-container)
Environment
EKS 1.21
Add OTEL Prometheus to Helm chart for full Container Insights Prometheus functionality
Noticed this while working on #32. I believe the doc here has incorrect values. I'm fairly sure I wasn't able to get the ADOT collector started unless I used the values below instead of the ones listed.
The doc suggests that:
adotCollector.daemonSet.service.metrics.receivers is awscontainerinsight
adotCollector.daemonSet.service.metrics.exporters is awsemfexporter
I think the correct values are:
adotCollector.daemonSet.service.metrics.receivers is awscontainerinsightreceiver
adotCollector.daemonSet.service.metrics.exporters is awsemf
I installed the ADOT collector with Helm using Argo CD, but when I check the logs for the daemonset it shows the error below. When I check the clusterrole template file https://github.com/aws-observability/aws-otel-helm-charts/blob/main/charts/adot-exporter-for-eks-on-ec2/templates/adot-collector/clusterrole.yaml I do not see a permission for the "services" resource.
Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User "system:serviceaccount:amazon-metrics:adot-collector-sa" cannot list resource "services" in API group "" at the cluster scope
W0512 17:39:48.906520 1 reflector.go:535] k8s.io/[email protected]/tools/cache/reflector.go:229: failed to list *v1.Service: services is forbidden: User "system:serviceaccount:amazon-metrics:adot-collector-sa" cannot list resource "services" in API group "" at the cluster scope
Hello, I have a cluster in the eu-south-1 region, but some ContainerInsights features are still not deployed to this region, so I set up the chart to send logs and metrics to eu-west-1. The logs are sent correctly to the right region, but the metrics still go to eu-south-1. It seems you missed propagating the value of awsRegion to the awsemf exporter config.
Just append the region property to the awsemf exporter:
region: {{ .Values.awsRegion }}
Hope this helps, have a nice day.
Hello,
Since our system runs on an EKS Fargate configuration, it would be great to have a Helm chart for installing and configuring AWS OTel.
Regards,
Vincenzo.
Hi
I use this Helm chart to install the ADOT collector in EKS. I deploy the collector in daemonset mode, and I can see it started successfully. There are no errors in the collector's log. I also modified the ConfigMap so that the collector can receive logs via OTLP and export them to CloudWatch Logs. This also looks OK.
However, from my app in another pod, with the collector endpoint set to "http://cluster-node-IP:4317", nothing happens when it tries to send logs to the collector. There are no messages in the app's log, and no new messages in the collector's log either.
Then I enabled the self-diagnostics log for OpenTelemetry in my app, and I can see these two exceptions when it tries to send logs to the collector.
Exception 1 - HTTP/2 handshake error
2024-05-15T04:16:15.7280926Z:Exporter failed send data to collector to {0} endpoint. Data will not be sent. Exception: {1}{http://10.17.72.214:4317/}{Grpc.Core.RpcException: Status(StatusCode="Unavailable", Detail="Error starting gRPC call. HttpRequestException: An error occurred while sending the request. IOException: An HTTP/2 connection could not be established because the server did not complete the HTTP/2 handshake. ObjectDisposedException: Cannot access a disposed object.
Object name: 'System.Net.Sockets.NetworkStream'.", DebugException="System.Net.Http.HttpRequestException: An error occurred while sending the request.")
---> System.Net.Http.HttpRequestException: An error occurred while sending the request.
---> System.IO.IOException: An HTTP/2 connection could not be established because the server did not complete the HTTP/2 handshake.
---> System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'System.Net.Sockets.NetworkStream'.
at System.Net.Sockets.NetworkStream.WriteAsync(ReadOnlyMemory`1 buffer, CancellationToken cancellationToken)
at Grpc.Net.Client.Balancer.Internal.StreamWrapper.WriteAsync(ReadOnlyMemory`1 buffer, CancellationToken cancellationToken)
at System.Net.Http.Http2Connection.SetupAsync(CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at System.Net.Http.Http2Connection.SetupAsync(CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.ConstructHttp2ConnectionAsync(Stream stream, HttpRequestMessage request, IPEndPoint remoteEndPoint, CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at System.Net.Http.HttpConnectionPool.ConstructHttp2ConnectionAsync(Stream stream, HttpRequestMessage request, IPEndPoint remoteEndPoint, CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.AddHttp2ConnectionAsync(QueueItem queueItem)
at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
at System.Net.Http.DiagnosticsHandler.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at Grpc.Net.Client.Balancer.Internal.BalancerHttpHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at Grpc.Net.Client.Internal.GrpcCall`2.RunCall(HttpRequestMessage request, Nullable`1 timeout)
--- End of inner exception stack trace ---
at Grpc.Net.Client.Internal.HttpClientCallInvoker.BlockingUnaryCall[TRequest,TResponse](Method`2 method, String host, CallOptions options, TRequest request)
at Grpc.Core.Interceptors.InterceptingCallInvoker.<BlockingUnaryCall>b__3_0[TRequest,TResponse](TRequest req, ClientInterceptorContext`2 ctx)
at Grpc.Core.ClientBase.ClientBaseConfiguration.ClientBaseConfigurationInterceptor.BlockingUnaryCall[TRequest,TResponse](TRequest request, ClientInterceptorContext`2 context, BlockingUnaryCallContinuation`2 continuation)
at Grpc.Core.Interceptors.InterceptingCallInvoker.BlockingUnaryCall[TRequest,TResponse](Method`2 method, String host, CallOptions options, TRequest request)
at OpenTelemetry.Proto.Collector.Logs.V1.LogsService.LogsServiceClient.Export(ExportLogsServiceRequest request, CallOptions options)
at OpenTelemetry.Proto.Collector.Logs.V1.LogsService.LogsServiceClient.Export(ExportLogsServiceRequest request, Metadata headers, Nullable`1 deadline, CancellationToken cancellationToken)
at OpenTelemetry.Exporter.OpenTelemetryProtocol.Implementation.ExportClient.OtlpGrpcLogExportClient.SendExportRequest(ExportLogsServiceRequest request, DateTime deadlineUtc, CancellationToken cancellationToken)}
Exception 2 - Broken Pipe error
2024-05-15T07:22:39.8784780Z:Exporter failed send data to collector to {0} endpoint. Data will not be sent. Exception: {1}{http://10.17.72.214:4317/}{Grpc.Core.RpcException: Status(StatusCode="Unavailable", Detail="Error starting gRPC call. HttpRequestException: An error occurred while sending the request. IOException: The request was aborted. IOException: Unable to write data to the transport connection: Broken pipe. SocketException: Broken pipe", DebugException="System.Net.Http.HttpRequestException: An error occurred while sending the request.")
---> System.Net.Http.HttpRequestException: An error occurred while sending the request.
---> System.IO.IOException: The request was aborted.
---> System.IO.IOException: Unable to write data to the transport connection: Broken pipe.
---> System.Net.Sockets.SocketException (32): Broken pipe
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.CreateException(SocketError error, Boolean forAsyncThrow)
at System.Net.Sockets.NetworkStream.WriteAsync(ReadOnlyMemory`1 buffer, CancellationToken cancellationToken)
at Grpc.Net.Client.Balancer.Internal.StreamWrapper.WriteAsync(ReadOnlyMemory`1 buffer, CancellationToken cancellationToken)
at System.Net.Http.Http2Connection.FlushOutgoingBytesAsync()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
at System.Net.Http.Http2Connection.FlushOutgoingBytesAsync()
at System.Net.Http.Http2Connection.ProcessOutgoingFramesAsync()
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
--- End of stack trace from previous location ---
--- End of inner exception stack trace ---
at System.Net.Http.Http2Connection.FlushOutgoingBytesAsync()
--- End of inner exception stack trace ---
at System.Net.Http.Http2Connection.ThrowRequestAborted(Exception innerException)
at System.Net.Http.Http2Connection.Http2Stream.CheckResponseBodyState()
at System.Net.Http.Http2Connection.Http2Stream.TryEnsureHeaders()
at System.Net.Http.Http2Connection.Http2Stream.ReadResponseHeadersAsync(CancellationToken cancellationToken)
at System.Net.Http.Http2Connection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at System.Net.Http.Http2Connection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
at System.Net.Http.DiagnosticsHandler.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at Grpc.Net.Client.Balancer.Internal.BalancerHttpHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at Grpc.Net.Client.Internal.GrpcCall`2.RunCall(HttpRequestMessage request, Nullable`1 timeout)
--- End of inner exception stack trace ---
at Grpc.Net.Client.Internal.HttpClientCallInvoker.BlockingUnaryCall[TRequest,TResponse](Method`2 method, String host, CallOptions options, TRequest request)
at Grpc.Core.Interceptors.InterceptingCallInvoker.<BlockingUnaryCall>b__3_0[TRequest,TResponse](TRequest req, ClientInterceptorContext`2 ctx)
at Grpc.Core.ClientBase.ClientBaseConfiguration.ClientBaseConfigurationInterceptor.BlockingUnaryCall[TRequest,TResponse](TRequest request, ClientInterceptorContext`2 context, BlockingUnaryCallContinuation`2 continuation)
at Grpc.Core.Interceptors.InterceptingCallInvoker.BlockingUnaryCall[TRequest,TResponse](Method`2 method, String host, CallOptions options, TRequest request)
at OpenTelemetry.Proto.Collector.Trace.V1.TraceService.TraceServiceClient.Export(ExportTraceServiceRequest request, CallOptions options)
at OpenTelemetry.Proto.Collector.Trace.V1.TraceService.TraceServiceClient.Export(ExportTraceServiceRequest request, Metadata headers, Nullable`1 deadline, CancellationToken cancellationToken)
at OpenTelemetry.Exporter.OpenTelemetryProtocol.Implementation.ExportClient.OtlpGrpcTraceExportClient.SendExportRequest(ExportTraceServiceRequest request, DateTime deadlineUtc, CancellationToken cancellationToken)}
Do you know what the root cause of this problem is?
Besides installing the collector with this Helm chart, is there anything else that must be installed first?
Thank you
TP
Installing the chart is not possible anymore.
Simple installation via helm:
helm install cloudwatch-container-insights aws-observability/adot-exporter-for-eks-on-ec2 -f values.yml
will fail with:
Error: INSTALLATION FAILED: create: failed to create: Request entity too large: limit is 3145728
I think the documentation directory can be safely added to .helmignore
Update the Helm chart to use a recent collector version, a recent Kubernetes version, and SigV4 auth for prometheusremotewrite.
We are currently using the ADOT collector on our EKS 1.23 cluster to send OTLP traces to Amazon OpenSearch. We are successfully able to ingest the traces and see the service map for our application.
However, since 1.23 is nearing end of support, we are planning to move to EKS v1.24.
Docker is not supported as a container runtime on 1.24, and we plan to use containerd as our container runtime.
Currently, in the values.yaml file, I see that this Helm chart mounts "/var/lib/docker" and "/var/run/docker.sock" as volumes in the collector DaemonSet.
Since Docker is not supported on EKS version 1.24 and later, will this collector DaemonSet still function as intended?
Can someone officially verify this Helm chart's support for v1.24?
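For reference, on containerd-based nodes the Container Insights receiver typically needs the containerd socket rather than the Docker-specific paths. A minimal sketch of a values override follows; the key names under adotCollector.daemonSet are illustrative and may not match this chart's actual schema:

```yaml
# Hypothetical override: replace the Docker-specific mounts with the
# containerd socket. Key names below are illustrative, not confirmed
# chart values.
adotCollector:
  daemonSet:
    volumes:
      - name: containerdsock
        hostPath:
          path: /run/containerd/containerd.sock
    volumeMounts:
      - name: containerdsock
        mountPath: /run/containerd/containerd.sock
        readOnly: true
```

Whether this is sufficient depends on how the receiver detects the container runtime, so it should be treated as a starting point, not a verified fix.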
Describe the issue
Currently, the default (static) configuration file exports metrics to AMP, ignoring the option to deploy only to CloudWatch. If you don't use AMP and expect your metrics to be pushed only to CloudWatch, the exporter breaks and gets stuck in a CrashLoop because no AMP URL has been set.
I ended up creating my own ConfigMap and forking the whole chart codebase to make it fit.
data:
adot-config: |
extensions:
health_check:
sigv4auth:
region: us-east-1
receivers:
awscontainerinsightreceiver:
collection_interval:
container_orchestrator:
add_service_as_attribute:
prefer_full_pod_name:
add_full_pod_name_metric_label:
processors:
batch/metrics:
timeout: 60s
exporters:
awsemf:
namespace: ContainerInsights
log_group_name: '/aws/containerinsights/clou-eu-central-1/performance'
log_stream_name: InputNodeName
region: eu-central-1
resource_to_telemetry_conversion:
enabled: true
dimension_rollup_option: NoDimensionRollup
parse_json_encoded_attr_values:
- Sources
- kubernetes
metric_declarations:
# node metrics
- dimensions: [[NodeName, InstanceId, ClusterName]]
metric_name_selectors:
- node_cpu_utilization
- node_memory_utilization
- node_network_total_bytes
- node_cpu_reserved_capacity
- node_memory_reserved_capacity
- node_number_of_running_pods
- node_number_of_running_containers
- dimensions: [[ClusterName]]
metric_name_selectors:
- node_cpu_utilization
- node_memory_utilization
- node_network_total_bytes
- node_cpu_reserved_capacity
- node_memory_reserved_capacity
- node_number_of_running_pods
- node_number_of_running_containers
- node_cpu_usage_total
- node_cpu_limit
- node_memory_working_set
- node_memory_limit
# pod metrics
- dimensions: [[PodName, Namespace, ClusterName], [Service, Namespace, ClusterName], [Namespace, ClusterName], [ClusterName]]
metric_name_selectors:
- pod_cpu_utilization
- pod_memory_utilization
- pod_network_rx_bytes
- pod_network_tx_bytes
- pod_cpu_utilization_over_pod_limit
- pod_memory_utilization_over_pod_limit
- dimensions: [[PodName, Namespace, ClusterName], [ClusterName]]
metric_name_selectors:
- pod_cpu_reserved_capacity
- pod_memory_reserved_capacity
- dimensions: [[PodName, Namespace, ClusterName]]
metric_name_selectors:
- pod_number_of_container_restarts
# cluster metrics
- dimensions: [[ClusterName]]
metric_name_selectors:
- cluster_node_count
- cluster_failed_node_count
# service metrics
- dimensions: [[Service, Namespace, ClusterName], [ClusterName]]
metric_name_selectors:
- service_number_of_running_pods
# node fs metrics
- dimensions: [[NodeName, InstanceId, ClusterName], [ClusterName]]
metric_name_selectors:
- node_filesystem_utilization
# namespace metrics
- dimensions: [[Namespace, ClusterName], [ClusterName]]
metric_name_selectors:
- namespace_number_of_running_pods
service:
pipelines:
metrics:
receivers:
- awscontainerinsightreceiver
processors:
- batch/metrics
exporters:
- awsemf
extensions:
- health_check
- sigv4auth
We need to build a template which will allow users to choose the exporter destination as well as the whole pipeline.
My proposal is to:
Template the configuration file with Helm and set the destination in values, whether it's an export to AMP, to CloudWatch, or both.
What did you expect to see?
Configuration file being adapted for the Cloudwatch use-case
Environment
This issue is environment-agnostic
Additional context
I'm willing to fix it by forking the chart and sending the PR for my proposal. Please let me know what you think about this issue.
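One way the proposal above could look in practice is a destination switch in values driving the exporters section of the ConfigMap template. The values schema (adotCollector.destinations, adotCollector.ampEndpoint) and the template below are purely illustrative, not existing chart options:

```yaml
# Hypothetical values schema:
# adotCollector:
#   destinations: ["cloudwatch"]   # any of: cloudwatch, amp
#
# Sketch of the corresponding ConfigMap template logic:
exporters:
  {{- if has "cloudwatch" .Values.adotCollector.destinations }}
  awsemf:
    namespace: ContainerInsights
    region: {{ .Values.awsRegion }}
  {{- end }}
  {{- if has "amp" .Values.adotCollector.destinations }}
  prometheusremotewrite:
    endpoint: {{ required "set adotCollector.ampEndpoint when amp is enabled" .Values.adotCollector.ampEndpoint }}
    auth:
      authenticator: sigv4auth
  {{- end }}
```

The service.pipelines.metrics.exporters list would need the same conditional treatment so the pipeline only references exporters that were actually rendered.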
Describe the bug
After setting the log_retention properties below to enabled: true
and days: 60
, the retention period shown in the CloudWatch console is still Never Expire:
fluentbit:
enabled: true
image:
tag: 2.28.1
output:
applicationLog:
log_retention:
enabled: true
days: 60
dataplaneLog:
log_retention:
enabled: true
days: 60
hostLog:
log_retention:
enabled: true
days: 60
Environment
Kubernetes 1.23
fluentbit 2.28.1 (I tried 2.21.1 with the same result)
aws-otel-helm-charts 0.7.0
Have a nice day
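Until the chart-level setting works, retention can be applied directly to the affected log groups with the AWS CLI. The log group name below is a placeholder; substitute your cluster's groups:

```shell
# Workaround sketch: set retention manually on an existing log group.
# Requires AWS credentials with logs:PutRetentionPolicy; the group name
# is a placeholder.
aws logs put-retention-policy \
  --log-group-name /aws/containerinsights/MY-CLUSTER/application \
  --retention-in-days 60
```

This only affects log groups that already exist, so it has to be re-run for any group the chart creates later.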
fluentbit and fargateLog already contain an enabled attribute, which can be set to true or false.
Please add the same "enabled" attribute to the adotCollector section as well, so one can decide what should be installed for each of the three components.
Describe the bug
Those pod attributes that the otel-exporter definitely needs to function, such as env
, volume
, command
and so on, shouldn't need to be set via values; they should instead be placed directly in the templates/
directory.
The current approach is inconvenient when you want to set your own environment variable, such as AWS SDK credentials: you then have to list in your values file all the other environment variables declared in the default values file, otherwise they will be gone. The same applies to volumes. Generally, the values file could be shorter and such inconveniences avoided.
My proposal is to:
Move env
, volume
, command
and other default attributes from values.yaml
to templates
, for both the daemonset and the sidecar.
Steps to reproduce
containersName: "adot-collector-container"
env:
- name: "AWS_REGION"
valueFrom:
secretKeyRef:
name: "adot-aws-credentials"
key: "AWS_REGION"
- name: "AWS_ACCESS_KEY_ID"
valueFrom:
secretKeyRef:
name: "adot-aws-credentials"
key: "AWS_ACCESS_KEY_ID"
- name: "AWS_SECRET_ACCESS_KEY"
valueFrom:
secretKeyRef:
name: "adot-aws-credentials"
key: "AWS_SECRET_ACCESS_KEY"
command: ...
helm install -n [NAMESPACE_NAME] [RELEASE_NAME] [REPO_NAME]/adot-exporter-for-eks-on-ec2
helm install -n monitoring adot aws-otel/adot-exporter-for-eks-on-ec2
What did you expect to see?
Declared variables set for the daemonset/sidecar together with the default variables, such as K8S_POD_NAME
What did you see instead?
All default variables are gone.
Environment
This issue is environment-agnostic
Additional context
I'm willing to fix it by forking the chart and sending the PR for my proposal. Please let me know what you think about this issue.
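A sketch of what the proposed template change could look like: the required defaults live in the template itself, and user-supplied entries are appended rather than replacing them. The extraEnv value name is hypothetical:

```yaml
# templates/daemonset.yaml (sketch): defaults are hard-coded in the
# template; entries from a hypothetical .Values.adotCollector.daemonSet.extraEnv
# are appended instead of overwriting the defaults.
env:
  - name: K8S_POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: K8S_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
  {{- with .Values.adotCollector.daemonSet.extraEnv }}
  {{- toYaml . | nindent 2 }}
  {{- end }}
```

With this shape, the reproduction case above would only need the three AWS credential variables under extraEnv, and K8S_POD_NAME and friends would survive.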
After deploying the chart to a second cluster, it crashes with the following error:
2023/05/15 11:14:35 ADOT Collector version: v0.28.0
2023/05/15 11:14:35 found no extra config, skip it, err: open /opt/aws/aws-otel-collector/etc/extracfg.txt: no such file or directory
SDK 2023/05/15 11:14:35 WARN falling back to IMDSv1: operation error ec2imds: getToken, http response error StatusCode: 403, request to EC2 IMDS failed
Error: invalid configuration: extensions::sigv4auth: could not retrieve credential provider: failed to refresh cached credentials, unexpected empty EC2 IMDS role list
2023/05/15 11:14:35 application run finished with error: invalid configuration: extensions::sigv4auth: could not retrieve credential provider: failed to refresh cached credentials, unexpected empty EC2 IMDS role list
Is there a way to increase logging for the sigv4auth extension?
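One possible explanation, judging from the "falling back to IMDSv1 ... 403" warning: the node's IMDSv2 hop limit is 1, so requests from pods (one extra network hop) are rejected and the role list comes back empty. This is a guess, not a confirmed diagnosis. Raising the hop limit on the affected instance can be tested with the AWS CLI; the instance ID is a placeholder:

```shell
# Possible workaround sketch: allow IMDSv2 responses to reach pods by
# raising the hop limit (instance ID is a placeholder).
aws ec2 modify-instance-metadata-options \
  --instance-id i-0123456789abcdef0 \
  --http-put-response-hop-limit 2 \
  --http-tokens required
```

Using IRSA for the collector's service account avoids the instance metadata path entirely and is usually the cleaner fix.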
As per AWS documentation on dockershim deprecation: "Amazon EKS AMIs that are officially published will have containerd as the only runtime starting with version 1.23. This is targeted for end of the second quarter of 2022"
My customer has tested the aws-otel-helm-charts solution on a cluster that uses only containerd, and the solution stopped working, probably because of mounts to Docker-specific host paths.
The customer is asking if/when the solution will be adjusted to use only containerd.
Hi, how can I specify the configuration for Prometheus Remote Write?
Like here : https://github.com/aws-samples/amazon-eks-observability-demo/blob/main/observability/resources/adot-configmap.yaml
adot-collector-config: |
exporters:
awsprometheusremotewrite:
# replace this with your endpoint
endpoint: "${APS_REMOTE_WRITE_ENDPOINT}"
# replace this with your region
aws_auth:
region: "${APS_REGION}"
service: "aps"
namespace: "adot"
awsxray:
region: "${AWS_REGION}"
logging:
loglevel: debug
extensions:
health_check:
pprof:
endpoint: :1888
zpages:
endpoint: :55679
service:
extensions: [pprof, zpages, health_check]
pipelines:
traces:
receivers: [otlp]
exporters: [awsxray]
metrics:
receivers: [prometheus]
exporters: [logging, awsprometheusremotewrite]
In order to make IRSA possible, we need a dedicated annotations field for the fluent-bit and adot-collector-sa ServiceAccounts.
Currently, only the following is available in values.yaml:
serviceAccount:
create: true
annotations: {}
name: ""
Please provide a serviceAccount section for fluent-bit, including annotations, and add annotations to the already existing adot-collector-sa ServiceAccount section.
Currently it is only possible to attach an IAM policy to the worker node instance profile.
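A sketch of the requested values shape, with a fluentbit.serviceAccount block mirroring the existing one; both the exact key paths and the role ARNs are placeholders, only the eks.amazonaws.com/role-arn annotation key is the standard IRSA mechanism:

```yaml
# Hypothetical values layout for IRSA; role ARNs are placeholders.
adotCollector:
  daemonSet:
    serviceAccount:
      create: true
      name: adot-collector-sa
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/adot-collector
fluentbit:
  serviceAccount:
    create: true
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/fluent-bit
```

With annotations templated onto both ServiceAccounts, pods can assume per-component IAM roles instead of relying on the worker node profile.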