aws-observability / aws-otel-community

Welcome to the AWS Distro for OpenTelemetry project. If you're using monitoring and observability tools for AWS products and services, this is a great place to ask questions, request features and network with other community members.

Home Page: https://aws-otel.github.io/

License: Apache License 2.0

Dockerfile 1.63% Go 19.13% Makefile 0.26% Shell 1.19% Mustache 2.59% Python 10.70% Ruby 6.25% Java 41.62% JavaScript 7.93% C# 8.71%
observability opensource opentelemetry opentelemetry-collector opentelemetry-api opentelemetry-exporter prometheus aws-xray

aws-otel-community's Introduction

Welcome to the AWS Distro for OpenTelemetry Community!

If you’re an open source observability user or contributor, we invite you to get involved. You can contribute in the following ways:

  • Ask a question by filing an issue.
  • Report a bug by filing an issue, or contribute a bug fix by filing a pull request (PR).
  • Contribute an enhancement or a feature you need; maintainers will be happy to review it. You can open an issue to discuss the design of your proposed enhancement and then file a PR.
  • If you’re just getting started, we welcome you to start with issues tagged “good first issue” and to join Gitter to ask maintainers and other developers any questions you may have.
  • Join the awesome upstream OpenTelemetry project community. Participate in the OpenTelemetry SIG meetings where observability experts meet and discuss the OpenTelemetry specification and implementation of observability components. And the code is open source - so contribute to OpenTelemetry!

AWS Distro for OpenTelemetry (ADOT) Public Preview Program

All preview features and AWS services integrations available for ADOT are supported under the guidelines of the ADOT Public Preview Program.

ADOT Technical Documentation Site

The ADOT technical documentation is also an open source documentation site hosted on GitHub. This means you can file issues to report documentation errors and updates. You can also file PRs to submit doc fixes and updates. Please take a look at the ADOT site developer guide to make changes and file PRs.

Support

Please note that, as per policy, we provide support via GitHub on a best-effort basis. However, if you have AWS Enterprise Support, you can create a ticket and we will provide direct support within the respective SLAs.

Security issue notifications

If you discover a potential security issue in this project, we ask that you notify AWS/Amazon Security via our vulnerability reporting page. Please do not create a public GitHub issue.

License

This project is licensed under the Apache-2.0 License.

aws-otel-community's People

Contributors

alexperez52, alolita, amanbrar1999, amazon-auto, aneurysm9, awssandra, bryan-aguilar, carolabadeer, dependabot[bot], erichsueh3, humivo, jj22ee, kausik-a, mbeacom, mhausenblas, mohammadalavi1986, mrwacky42, nicksulistio, normalfaults, paurushgarg, pgasca, rapphil, sethamazon, srprash, vasireddy99, wilguo, willarmiros, wytrivail


aws-otel-community's Issues

Logging support in OpenTelemetry

Provide logging support in ADOT (collector, EKS-add on, Lambda layers, SDKs, etc.). Tracking the maturity of components via upstream status.


Update by @mhausenblas on 2023-01-19:

In terms of supporting the logs signal type in ADOT we're particularly interested in feedback on:

  • What's your environment (EKS, ECS, Lambda, EC2, on-prem)?
  • What kind of logs (application-level vs. system-level or both) would you like to handle?
  • What log formats, mappable to OTLP, do you need supported? Anything you can share here helps.
  • Any requirements around throughput?

ECS scheduled task w/ OTEL sidecar

We have an ECS setup that runs a scheduled task every 4 hours which takes <10 minutes to complete. I would like to push the application metrics from this task to Prometheus. The typical recommended way to do this for an ephemeral service is to use a Prometheus push gateway. Is there a recommended pattern to do this with ADOT yet?

I am considering just using a tight scrape loop and adding a sleep to the end of the scheduled task but this feels hacky.
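One pattern worth considering, sketched below under the assumption that the task runs an ADOT collector sidecar listening on localhost:4317 and forwarding to Amazon Managed Prometheus via remote write: have the task push its metrics over OTLP and flush them explicitly before exiting, instead of relying on a scrape loop or a push gateway. The endpoint and metric names are illustrative, not part of the original question.

# Sketch: push metrics over OTLP from an ephemeral task and flush before exit.
# The endpoint and metric names are assumptions for illustration only.
from opentelemetry import metrics
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint="http://localhost:4317", insecure=True)
)
provider = MeterProvider(metric_readers=[reader])
metrics.set_meter_provider(provider)

meter = metrics.get_meter("scheduled-task")
records_processed = meter.create_counter("records_processed")

def run_task():
    # ... the actual scheduled work ...
    records_processed.add(42)

if __name__ == "__main__":
    run_task()
    # Flush buffered metrics before the short-lived task exits.
    provider.force_flush()
    provider.shutdown()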

AMP-onboard-ingest-metrics-OpenTelemetry walkthrough

Describe the bug
The OTEL agent scrapes metrics but encounters an error when sending them to the Prometheus endpoint:

2020-12-16T19:14:23.809Z	ERROR	exporterhelper/queued_retry.go:239	Exporting failed. The error is not retryable. Dropping data.	{"component_kind": "exporter", "component_type": "awsprometheusremotewrite", "component_name": "awsprometheusremotewrite", "error": "Permanent error: Permanent error: server returned HTTP status 404 Not Found: <HttpNotFoundException/>", "dropped_items": 40}
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:239
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/metricshelper.go:116
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:133
github.com/jaegertracing/jaeger/pkg/queue.(*BoundedQueue).StartConsumers.func1
	github.com/jaegertracing/[email protected]/pkg/queue/bounded_queue.go:77

Steps to reproduce
Follow the walkthrough for setting up metrics ingestion using AWS Distro for OpenTelemetry: https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-onboard-ingest-metrics-OpenTelemetry.html

What did you expect to see?
I was expecting to see something similar to the following:

Resource labels:
     -> service.name: STRING(kubernetes-service-endpoints)
     -> host.name: STRING(192.168.16.238)
     -> port: STRING(8080)
     -> scheme: STRING(http)
InstrumentationLibraryMetrics #0
Metric #0
Descriptor:
     -> Name: test_gauge0
     -> Description: This is my gauge
     -> Unit: 
     -> DataType: DoubleGauge
DoubleDataPoints #0
StartTime: 0
Timestamp: 1606511460471000000
Value: 0.000000

What did you see instead?
The export error shown above.

Environment
EKS v1.18


Instrumenting the AWS SDK documentation incorrect

Instrumenting the AWS SDK docs shows:

...
    .addExecutionInterceptor(AwsSdkTracing.create(openTelemetry).newExecutionInterceptor())

however, it should be:

...
    .addExecutionInterceptor(AwsSdkTelemetry.create(openTelemetry).newExecutionInterceptor())

using the latest dependencies as of this writing:

implementation(platform("io.opentelemetry.instrumentation:opentelemetry-instrumentation-bom-alpha:1.14.0-alpha"))
implementation("io.opentelemetry.instrumentation:opentelemetry-aws-sdk-2.2")

Global propagators documentation (Python) out of date

Describe the bug
Python documentation at https://aws-otel.github.io/docs/getting-started/python-sdk/trace-auto-instr#setting-the-global-propagators says to define propagators this way:

from opentelemetry import propagators
from opentelemetry.sdk.extension.aws.trace.propagation.aws_xray_format import AwsXRayFormat

propagators.set_global_textmap(AwsXRayFormat())

But as of opentelemetry-api==1.3.0 (and likely earlier), that method has been moved:

AttributeError: module 'opentelemetry.propagators' has no attribute 'set_global_textmap'

This appears to do the trick:

from opentelemetry import propagate
from opentelemetry.sdk.extension.aws.trace.propagation.aws_xray_format import AwsXRayFormat

propagate.set_global_textmap(AwsXRayFormat())

Prometheus Remote Write Exporter for AMP Credential error

Hi,
I am trying to configure the AMP exporter but get a NoCredentialProviders error in the collector logs, as follows:
WARN batchprocessor/batch_processor.go:209 Sender failed {"kind": "processor", "name": "batch/metrics", "error": "Permanent error: Post \"https://aps-workspaces.us-east-1.<workspace>/api/v1/remote_write\": NoCredentialProviders: no valid providers in chain. Deprecated.\n\tFor verbose messaging see aws.Config.CredentialsChainVerboseErrors"}

I have set the AWS access key according to the default AWS credentials chain and the AWS Go SDK, as described in the documentation. What could I be missing here?

Thanks

Potential documentation issue on aws-otel.github.io site

Hello

I believe there might be a documentation issue in this section of the aws-otel site: Collecting infrastructure metrics

The section says the ADOT Collector uses the AWS Container Insights Receiver, and the link leads to the awscontainerinsightreceiver GitHub folder. However, below it, in the list of supported platforms, the bullet "Amazon ECS with cluster- and service-level metrics" leads to a site section describing the usage of a completely different receiver, awsecscontainermetrics.

Is that intentional or a documentation problem? Can awscontainerinsightreceiver actually be used with AWS ECS to collect cluster and service level metrics (e.g. for Fargate)?

Cheers,
Marcin

ADOT docs revamp

  1. Support versioned documentation
  2. Move agnostic content upstream (docs consolidation project in OpenTelemetry)
  3. Simplify and improve the navigation (fewer top-level items, better search and discovery)
  4. Move all product stuff to PDP

Customer feedback:

Tracing with the AWS Distro for OpenTelemetry Python SDK and X-Ray documentation incorrect

Describe the bug
Tracing with ADOT Python SDK and X-Ray documentation uses some old classes.

What did you expect to see?
The Sending Traces to AWS X-Ray snippet should look like the following, because BatchExportSpanProcessor was renamed to BatchSpanProcessor and AwsXRayIdsGenerator was renamed to AwsXRayIdGenerator in the linked PRs.

from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.extension.aws.trace import AwsXRayIdGenerator
# Sends generated traces in the OTLP format to an ADOT Collector running on port 4317
otlp_exporter = OTLPSpanExporter(endpoint="http://localhost:4317")
# Processes traces in batches as opposed to immediately one after the other
span_processor = BatchSpanProcessor(otlp_exporter)

What did you see instead?
Under Sending Traces to AWS X-Ray, the doc says:

from opentelemetry.sdk.trace.export import BatchExportSpanProcessor
from opentelemetry.sdk.extension.aws.trace import AwsXRayIdsGenerator
# Sends generated traces in the OTLP format to an ADOT Collector running on port 4317
otlp_exporter = OTLPSpanExporter(endpoint="http://localhost:4317")
# Processes traces in batches as opposed to immediately one after the other
span_processor = BatchExportSpanProcessor(otlp_exporter)
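For reference, a complete wiring of the renamed classes might look like the sketch below; the endpoint comes from the snippet above, while the surrounding provider setup is an assumption based on the standard OpenTelemetry Python SDK rather than a quote from the ADOT docs.

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.extension.aws.trace import AwsXRayIdGenerator

# Sends generated traces in the OTLP format to an ADOT Collector running on port 4317
otlp_exporter = OTLPSpanExporter(endpoint="http://localhost:4317")
# Processes traces in batches as opposed to immediately one after the other
span_processor = BatchSpanProcessor(otlp_exporter)
# The renamed AwsXRayIdGenerator produces X-Ray-compatible trace IDs
provider = TracerProvider(id_generator=AwsXRayIdGenerator())
provider.add_span_processor(span_processor)
trace.set_tracer_provider(provider)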

Document AWS Lambda compatibility

Is it possible to use AWS Distro for OpenTelemetry to instrument AWS Lambda functions?

From aws-otel.github.io :

Use AWS Distro for OpenTelemetry to instrument your applications running on Amazon Elastic Compute Cloud (EC2), Amazon Elastic Container Service (ECS), and Amazon Elastic Kubernetes Service (EKS) on EC2, and AWS Fargate, as well as on-premises.

If so, I recommend updating the documentation.

Metrics Path and Port don't change according to pod annotations and metrics without a type get ignored completely

Describe the bugs

  • The collector doesn't check the metrics path nor port annotated on pods
  • Can't collect all spark executor metrics (These metrics don't have a type assigned on them)

Steps to reproduce
If trying to scrape metrics using the Prometheus Receiver under:

data:
  adot-collector-config: |
    receivers:
      prometheus:

These scrape configs won't work:

- action: keep
  regex: true
  source_labels:
  - __meta_kubernetes_service_annotation_prometheus_io_scrape
- action: replace
  regex: (.+)
  source_labels:
  - __meta_kubernetes_service_annotation_prometheus_io_path
  target_label: __metrics_path__
- action: replace
  regex: ([^:]+)(?::\d+)?;(\d+)
  replacement: $1:$2
  source_labels:
  - __address__
  - __meta_kubernetes_service_annotation_prometheus_io_port

What did you expect to see?
I expected ADOT to collect the metrics from the metrics path and port annotated on the pod.

What did you see instead?
My pods had both the metrics path and the port to collect the metrics from in their annotations. I saw in the debug logs that the collector never checked either the port or the metrics path specified.

Workaround
I had to manually write the metrics path for each of my pods in the configuration file below like so:

metrics_path: "/metrics/executors/prometheus"

And under Service I had to specify the path:

spec:
  ports:
  - name: executor-metrics 
    port: 4040

Additional Information (The Second Issue)
After I finally set the metrics path and port explicitly, I noticed it wasn't collecting all the Spark job metrics. This is because they have no type defined, so ADOT just drops them and they all get ignored. I found myself with this issue: only the metrics with the _total suffix in their names were collected; all the others were ignored. I didn't find a workaround for this second issue and wasn't able to collect all the Spark metrics.

Environment
AWS EKS Cluster

Full Configuration Used
I basically just do kubectl apply -f filename.yml with the configuration file below:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: adot-collector-conf
  namespace: adot-col
  labels:
    app: aws-adot
    component: adot-collector-conf
data:
  adot-collector-config: |
    receivers:
      prometheus:
        config:
          global:
            evaluation_interval: 1m
            scrape_interval: 1m
            scrape_timeout: 30s
            

          scrape_configs:
          - job_name: 'kubernetes-service-endpoints-spark'
            tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                insecure_skip_verify: true
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            metrics_path: "/metrics/executors/prometheus"
            kubernetes_sd_configs:
            - role: endpoints
            relabel_configs:
            - action: keep
              regex: true
              source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            - action: replace
              regex: (.+)
              source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
              source_labels: [__address__,__meta_kubernetes_service_annotation_prometheus_io_port]
              target_label: __address__

          - job_name: 'kubernetes-executor-pods-spark'
            tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                insecure_skip_verify: true
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            metrics_path: "/metrics/executors/prometheus"
            kubernetes_sd_configs:
              - role: pod
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scrape
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $1:$2
              source_labels:
              - __address__
              - __meta_kubernetes_pod_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_pod_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_name
              target_label: kubernetes_pod_name

          - job_name: 'kubernetes-executor-pods-spark-slow'
            tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                insecure_skip_verify: true
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            metrics_path: "/metrics/executors/prometheus"
            scrape_interval: 2m
            scrape_timeout: 40s
            kubernetes_sd_configs:
            - role: pod
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scrape_slow
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $1:$2
              source_labels:
              - __address__
              - __meta_kubernetes_pod_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_pod_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_name
              target_label: kubernetes_pod_name
          
          - job_name: 'kubernetes-driver-pods-spark'
            tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                insecure_skip_verify: true
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            metrics_path: "/metrics/driver/prometheus" # spark.metrics.conf.*.sink.prometheusServlet.path
            kubernetes_sd_configs:
              - role: pod
            relabel_configs:
            - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
              action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $1:$2
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_pod_label_(.+)
            - source_labels: [__meta_kubernetes_namespace]
              action: replace
              target_label: kubernetes_namespace
            - source_labels: [__meta_kubernetes_pod_name]
              action: replace
              target_label: kubernetes_pod_name
            

    exporters:
      awsprometheusremotewrite:
        # replace this with your endpoint
        endpoint: <removing this for privacy>
        # replace this with your region
        aws_auth:
          region: <removing this for privacy>
          service: "aps"
        namespace: "adot"
      logging:
        loglevel: debug

    extensions:
      health_check:
        endpoint: :13133
      pprof:
        endpoint: :1777
      zpages:
        endpoint: :55679

    service:
      extensions: [pprof, zpages, health_check]
      pipelines:
        metrics:
          receivers: [prometheus]
          exporters: [logging, awsprometheusremotewrite]
---
# create adot-col service account and role binding
apiVersion: v1
kind: ServiceAccount
metadata:
  name: amp-iamproxy-ingest-service-account
  namespace: adot-col
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<removing this for privacy>:role/amp-iamproxy-ingest-role

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: adotcol-admin-role
rules:
  - apiGroups: [""]
    resources:
    - nodes
    - nodes/proxy
    - services
    - endpoints
    - pods
    verbs: ["get", "list", "watch"]
  - apiGroups:
    - extensions
    resources:
    - ingresses
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: adotcol-admin-role-binding
subjects:
  - kind: ServiceAccount
    name: amp-iamproxy-ingest-service-account
    namespace: adot-col
roleRef:
  kind: ClusterRole
  name: adotcol-admin-role
  apiGroup: rbac.authorization.k8s.io

---
apiVersion: v1
kind: Service
metadata:
  name: adot-collector
  namespace: adot-col
  labels:
    app: aws-adot
    component: adot-collector
spec:
  ports:
  - name: executor-metrics 
    port: 4040
  - name: metrics # Default endpoint for querying metrics.
    port: 8888
  selector:
    component: adot-collector
  type: NodePort
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: adot-collector
  namespace: adot-col
  labels:
    app: aws-adot
    component: adot-collector
spec:
  selector:
    matchLabels:
      app: aws-adot
      component: adot-collector
  minReadySeconds: 5
  template:
    metadata:
      labels:
        app: aws-adot
        component: adot-collector
    spec:
      serviceAccountName: amp-iamproxy-ingest-service-account
      containers:
      - command:
          - "/awscollector"
          - "--config=/conf/adot-collector-config.yaml"
        image: public.ecr.aws/aws-observability/aws-otel-collector:latest
        name: adot-collector
        resources:
          limits:
            cpu: 1
            memory: 2Gi
          requests:
            cpu: 200m
            memory: 400Mi
        ports:
        - containerPort: 8888  # Default endpoint for querying metrics.
        volumeMounts:
        - name: adot-collector-config-vol
          mountPath: /conf
        livenessProbe:
          httpGet:
            path: /
            port: 13133 # Health Check extension port.
        readinessProbe:
          httpGet:
            path: /
            port: 13133 # Health Check extension port.
      volumes:
        - configMap:
            name: adot-collector-conf
            items:
              - key: adot-collector-config
                path: adot-collector-config.yaml
          name: adot-collector-config-vol
---

Example of metrics I'm trying to scrape
Only the metrics that have the _total suffix get scraped (for example, metrics_executor_failedTasks_total{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"}); all the others get ignored.

spark_info{version="3.1.1", revision=""} 1.0
metrics_executor_rddBlocks{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0
metrics_executor_memoryUsed_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 1816
metrics_executor_diskUsed_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0
metrics_executor_totalCores{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0
metrics_executor_maxTasks{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0
metrics_executor_activeTasks{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0
metrics_executor_failedTasks_total{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0
metrics_executor_completedTasks_total{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0
metrics_executor_totalTasks_total{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0
metrics_executor_totalDuration_seconds_total{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0.0
metrics_executor_totalGCTime_seconds_total{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0.0
metrics_executor_totalInputBytes_bytes_total{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0
metrics_executor_totalShuffleRead_bytes_total{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0
metrics_executor_totalShuffleWrite_bytes_total{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0
metrics_executor_maxMemory_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 1078827417
metrics_executor_usedOnHeapStorageMemory_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 1816
metrics_executor_usedOffHeapStorageMemory_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0
metrics_executor_totalOnHeapStorageMemory_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 1078827417
metrics_executor_totalOffHeapStorageMemory_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0
metrics_executor_JVMHeapMemory_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 1224267120
metrics_executor_JVMOffHeapMemory_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 120520560
metrics_executor_OnHeapExecutionMemory_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0
metrics_executor_OffHeapExecutionMemory_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0
metrics_executor_OnHeapStorageMemory_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 5000
metrics_executor_OffHeapStorageMemory_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0
metrics_executor_OnHeapUnifiedMemory_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 5000
metrics_executor_OffHeapUnifiedMemory_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0
metrics_executor_DirectPoolMemory_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 320589
metrics_executor_MappedPoolMemory_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0
metrics_executor_ProcessTreeJVMVMemory_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 5970055168
metrics_executor_ProcessTreeJVMRSSMemory_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 1603010560
metrics_executor_ProcessTreePythonVMemory_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0
metrics_executor_ProcessTreePythonRSSMemory_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0
metrics_executor_ProcessTreeOtherVMemory_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0
metrics_executor_ProcessTreeOtherRSSMemory_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0
metrics_executor_MinorGCCount_total{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 18366
metrics_executor_MajorGCCount_total{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 18
metrics_executor_MinorGCTime_seconds_total{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 147.54
metrics_executor_MajorGCTime_seconds_total{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="driver"} 0.45
metrics_executor_rddBlocks{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="2"} 0
metrics_executor_memoryUsed_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="2"} 1816
metrics_executor_diskUsed_bytes{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="2"} 0
metrics_executor_totalCores{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="2"} 2
metrics_executor_maxTasks{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="2"} 2
metrics_executor_activeTasks{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="2"} 2
metrics_executor_failedTasks_total{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="2"} 0
metrics_executor_completedTasks_total{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="2"} 450466
metrics_executor_totalTasks_total{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="2"} 450468
metrics_executor_totalDuration_seconds_total{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="2"} 2356.288
metrics_executor_totalGCTime_seconds_total{application_id="spark-af748c06812c416e946c4aedfd4b4c4b", application_name="Spark Pi", executor_id="2"} 11.77
...

@alolita

AWS OTel Collector's Jaeger receiver remote_sampling configuration doesn't work

Hi folks. I am using Jaeger as the trace data receiver. Without the remote_sampling configuration it works fine, but if I configure remote_sampling I get an error.

Here is the error,

aws-ot-collector_1 | Error: failed to get config: invalid configuration: receiver "jaeger" has invalid configuration: unable to extract port for the Remote Sampling endpoint: endpoint is not formatted correctly: missing port in address
aws-ot-collector_1 | 2022/04/18 09:25:59 application run finished with error: failed to get config: invalid configuration: receiver "jaeger" has invalid configuration: unable to extract port for the Remote Sampling endpoint: endpoint is not formatted correctly: missing port in address

And the configuration is,

extensions:
  health_check:
  pprof:
    endpoint: 0.0.0.0:1777

receivers:
  jaeger:
    protocols:
      grpc:
    remote_sampling:
      endpoint: "jaeger:14268"
      tls:
        insecure: true
      strategy_file: "/etc/strategies.json"
      strategy_file_reload_interval: 10s

processors:
  batch:

exporters:
  logging:
    loglevel: debug
  awsxray:
    region: 'us-west-2'

service:
  pipelines:
    traces:
      receivers: [jaeger]
      exporters: [awsxray]

  extensions: [pprof]
  telemetry:
    logs:
      level: debug

Please help me if anyone knows where I am going wrong. I would like to provide more info if needed. Thanks.

X-Ray traces have no log information in CloudWatch

This is not a bug, just a configuration question. We've migrated from the Java X-Ray SDK to AWS OTEL, and the thing we are facing now is the absence of log records when analyzing traces. This worked with the X-Ray SDK (screenshot omitted).

However, with OTEL, the logs are not mapped (we use Fluentbit). The only difference we've found in the metrics/traces messages is the absence of the following key:

...
Segments: [
 {
  ...
  "document": {
   ....
   "aws": {
     ...
     "cloudwatch_logs": {
         "log_group": "our log group"
     ....

The aws.cloudwatch_logs.log_group is missing when using OTEL. The documentation says it translates from aws.log.group.names (https://aws-otel.github.io/docs/getting-started/x-ray#otel-span-cw-logs-metadata-translation).

We've tried configuring this via environment properties (OTEL_RESOURCE_ATTRIBUTES) with no luck. What are we missing? Is it possible to map the Fluentbit logs (visible in Logs/Log groups under the desired group and containing AWS-XRAY-TRACE-ID) to the OTEL traces so that we see the traces with the correlated log records?

Thank you.
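For anyone hitting the same question: the log-group metadata comes from the aws.log.group.names resource attribute mentioned above. Below is a minimal sketch of setting it programmatically, shown in Python purely for illustration (the same attribute can be supplied via OTEL_RESOURCE_ATTRIBUTES in any SDK); the log group name is a placeholder, and whether this resolves the original Java/Fluentbit setup is not confirmed here.

# Sketch: attach the log group as a resource attribute so the X-Ray exporter can
# translate it into aws.cloudwatch_logs.log_group. "our-log-group" is a placeholder.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider

resource = Resource.create({"aws.log.group.names": "our-log-group"})
trace.set_tracer_provider(TracerProvider(resource=resource))

# Equivalent environment-variable form (any SDK):
#   OTEL_RESOURCE_ATTRIBUTES=aws.log.group.names=our-log-group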

Clarification on Configuration with ECS

Trying to add the OTEL Collector and Emitter as sidecar containers to our Java app to be able to do tracing in AWS X-Ray using the instructions here. I wasn't sure what the difference between the default config and the task metrics config was, so I went with the default config. Now, in X-Ray, I see an application called "SampleServer" running on port 8000. There was nothing in X-Ray before, so I assume it's picking our application up, but our application runs on a different port.

How can we specify specific configuration for our app using the sidecar container? My understanding was that we would not need to modify our application in any way for this to work, but that may be wrong.

Provide additional guidance re ADOT/ECS/NodeJS and dimensions

We run a NodeJS app in Fargate, and want to export custom metrics.

Unfortunately, CloudWatch is hindered by a restriction of 10 dimensions per metric. The dimensions attached by ADOT by default total 6 (TaskDefinitionFamily, TaskDefinitionRevision, LaunchType, ClusterARN, TaskARN and OtelLib), and when using the OpenTelemetry SDK for NodeJS, four further dimensions are attached by default (service.name, telemetry.sdk.language, telemetry.sdk.name, telemetry.sdk.version). Therefore, when using the OpenTelemetry SDK for NodeJS with ADOT, all of your dimensions are used up, which makes it impossible to add dimensions to your own metrics.

I cannot find advice on how to disable either some of the dimensions attached by ADOT or some of the dimensions added by default by the OpenTelemetry SDK. This is quite frustrating, as this feels like it should be an easy solution to pick up and implement, and yet I am having to hunt for a configuration to customise just to be able to attach a meagre two dimensions to my metric.

YAML configuration docs?

Hello,

I'm deploying an aws-otel-collector instance as an ECS service using Fargate, with a plain HTTP AWS load balancer in front. Currently my ECS task is failing with a "read from UDP socket: read udp [::]:2000: use of closed network connection" message, and I was wondering exactly what I can disable to debug things. Is there a document that details what can go into the configuration YAML file?

Is it possible to just remove the awsxray related stuff below if I know we won't use it?

Many thanks!

extensions:
  health_check:
receivers:
  awsxray:
    endpoint: 0.0.0.0:2000
    transport: udp
  otlp:
    protocols:
      grpc:
        endpoint:
      http:
        endpoint:
processors:
  batch/traces:
    timeout: 5s
    send_batch_size: 256
  resourcedetection:
    detectors: [env, ec2, ecs]
    timeout: 5s
    override: true
exporters:
  otlphttp:
    endpoint: REDACTED
service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [awsxray,otlp]
      processors: [resourcedetection,batch/traces]
      exporters: [otlphttp]

Request for adding OTEL-Permissions Step to AWS Distro for OpenTelemetry (ADOT) add-on.

Currently, the Grant permissions to Amazon EKS add-ons to install ADOT step is listed as a prerequisite. It would be beneficial if kubectl apply -f https://amazon-eks.s3.amazonaws.com/docs/addons-otel-permissions.yaml could be part of the AWS Distro for OpenTelemetry (ADOT) EKS add-on installation. This would help us avoid an additional step when automating the EKS add-on installation. We are currently working on the ADOT add-on for EKS Blueprints, and this would let us drop an extra Kubernetes manifest step.

AWS OTel Collector service fail to start

We have a web app running on Windows Server that is instrumented with the OpenTelemetry AspNet library, and we are trying to run the AWS OTel Collector to export the metrics to an external monitoring solution, Sumo Logic. But when starting the service, it fails with the following error:

At C:\Program Files\Amazon\AWSOTelCollector\aws-otel-collector-ctl.ps1:96 char:12
+     $svc | Start-Service
+            ~~~~~~~~~~~~~
    + CategoryInfo          : OpenError: (System.ServiceProcess.ServiceController:ServiceController) [Start-Service],
   ServiceCommandException
    + FullyQualifiedErrorId : StartServiceFailed,Microsoft.PowerShell.Commands.StartServiceCommand

Help with ECS Fargate setup

Hello,

I followed the documentation here

The otel-trace-emitter is my Java application image, and the collector is aws-otel-collector version 0.7 as a sidecar.

My Java application is up but throws this error:
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:4317

What am I missing?

Thanks,
Raghuram

Day 3 in nodejs, and giving up

Hopefully this will save someone else some time. This is definitely not criticism of AWS or OTEL -- OTEL is essentially experimental, and thus AWS's distro is too. I can't stress enough how much I appreciate that this work is all happening.

But, as someone coming fresh to OpenTelemetry: this is all a minefield. Things are moving quickly, and things are definitely breaking over at OTel JS. Tutorials and help docs are outdated and incorrect. Versions and types don't line up. 😭

Personally, I'll be waiting for more stability. It has been 3 days and I didn't even come close to getting off the ground with the embedded cloudwatch/xray+adot stuff. But... I'm excited for the future. Can't wait.

Lambda exiting before sending telemetry data

A snippet of the main handler, with a delay added to give the function time to send the data, is below. Without the delay at the end, more often than not it would not send the data. Everything works with the delay, but I would rather not incur the extra cost of 400 ms per function call.

I have a separate file in the project that creates the tracer. Any suggestions on how to get this functioning without the delay would be appreciated.

const { tracing } = require('./tracing');

exports.handler = async(event, context) => {
const delay = 400;
const tracer = tracing;
const span = tracer.startSpan('POS Request');
context.callbackWaitsForEmptyEventLoop = true;
let rslt = '';

span.setAttributes({
    'service.version': process.env.SERVICE_VERSION,
    'faas.execution': context.awsRequestId,
    'faas.coldstart': coldStartFlag,
    'faas.logStream': context.logStreamName
});

coldStartFlag = false;

if (event.key_id != -1) {
    span.addEvent('Start processing');
    rslt = await processEvent(event, tracer, span);
} else {
    rslt = JSON.stringify({Path: 'No project selected.'});
    span.addEvent('Project Missing', { result: rslt });
}
if (typeof rslt === 'string' && rslt.toUpperCase().includes('ERROR') || rslt[0] === undefined) {
    span.end();
    await tracer.getActiveSpanProcessor().forceFlush();
    // await tracer.getActiveSpanProcessor().shutdown();
    await sleep(delay);
    throw rslt;
}
else {
    span.addEvent('Finished processing');
    span.end();
    await tracer.getActiveSpanProcessor().forceFlush();
    // await tracer.getActiveSpanProcessor().shutdown();
    await sleep(delay);
    return rslt;
}

};

Instrumenting Prometheus metrics on an ECS service with multiple instances of task definition

Hello,

I'm working on a prototype to instrument application metrics to Prometheus for an application hosted in ECS with an application load balanced fargate service. The AWS OTEL collector is running as a sidecar container - as below:

(screenshot omitted)

This is working well with a single task; however, with multiple instantiations of the task definition, I'm unable to distinguish which instance a metric has been pulled from, resulting in inaccurate data.

Each metric has an instance label; however, it's the same value across all instances, 0.0.0.0:8080, which is the scrape target.

As a result, each collector is writing the same metric plus labels to AWS Managed Prometheus and I have found no way to distinguish.

I tried the following config:

resource_to_telemetry_conversion:
  enabled: true

This added the labels service_name and service_instance_id; however, the values are not unique: aws-otel-app (the job_name) and 0.0.0.0:8080 (the scrape target) for all metrics.

I can see the target metric has the following ECS related labels:
aws_ecs_cluster_name, aws_ecs_launchtype, aws_ecs_service_name, aws_ecs_task_arn, aws_ecs_task_family, aws_ecs_task_id, aws_ecs_task_known_status, aws_ecs_task_launch_type, aws_ecs_task_pull_started_at, aws_ecs_task_pull_stopped_at, aws_ecs_task_revision.

Applying the aws_ecs_task_id label to all other metrics would be useful but I have been unable to do this successfully.

I appreciate any help or guidance,
Thanks,
Matt

adot-config -

receivers:
  prometheus:
    config:
      global:
        scrape_interval: 30s
        scrape_timeout: 10s
      scrape_configs:
        - job_name: "aws-otel-app"
          honor_labels: true
          static_configs:
            - targets: ["0.0.0.0:8080"]
             
  awsecscontainermetrics:
    collection_interval: 10s
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:55681
processors:
  filter:
    metrics:
      include:
        match_type: strict
        metric_names:
          - ecs.task.memory.utilized
          - ecs.task.memory.reserved
          - ecs.task.cpu.utilized
          - ecs.task.cpu.reserved
          - ecs.task.network.rate.rx
          - ecs.task.network.rate.tx
          - ecs.task.storage.read_bytes
          - ecs.task.storage.write_bytes
  memory_limiter:
    limit_mib: 100
    check_interval: 5s
exporters:
  awsprometheusremotewrite:
    endpoint: https://aps-workspaces.eu-west-2.amazonaws.com/workspaces/ws-{workspace-id}/api/v1/remote_write
    aws_auth:
      region: {region}
      service: aps
    resource_to_telemetry_conversion:
      enabled: true
  logging:
    loglevel: info
  awsxray:
    region: {region}
    index_all_attributes: true
extensions:
  health_check:
  pprof:
    endpoint: :1888
  zpages:
    endpoint: :55679
service:
  extensions: [pprof, zpages, health_check]
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [logging, awsprometheusremotewrite]
    metrics/ecs:
      receivers: [awsecscontainermetrics]
      processors: [filter]
      exporters: [logging, awsprometheusremotewrite]
    traces:
      receivers: [otlp]
      processors: [memory_limiter]
      exporters: [awsxray]

Issue while fetching metrics with awsecscontainermetrics receiver.

The ADOT collector is deployed as a sidecar in an ECS task. The receiver is able to get the metrics, but without entity.name, and hence is unable to map them to the service because there is no dimension for it. Is there any way to fix this?

Yaml File:

receivers:
  awsecscontainermetrics:

processors:
  batch:

exporters:
  logging:
    loglevel: debug
    sampling_initial: 5
    sampling_thereafter: 200
  otlp:
    endpoint: ENDPOINT
    headers:
      api-key: SENSITIVE

service:
  pipelines:
      metrics:
          receivers: [awsecscontainermetrics]
          exporters: [otlp, logging]

event:

  {
      "events": [
        {
          "aws.ecs.cluster.name": "ecs-clstr",
          "aws.ecs.launchtype": "fargate",
          "aws.ecs.service.name": "undefined",
          "aws.ecs.task.arn": "arn:aws:ecs:ap-south-1:99999999999:task/ecs-clstr/51caeeaf12a74f86bac677df6df3c24b",
          "aws.ecs.task.family": "sample-app",
          "aws.ecs.task.id": "51caeeaf12a74f86bac677df6df3c24b",
          "aws.ecs.task.known_status": "RUNNING",
          "aws.ecs.task.launch_type": "FARGATE",
          "aws.ecs.task.pull_started_at": "2022-07-20T06:38:06.172986773Z",
          "aws.ecs.task.pull_stopped_at": "2022-07-20T06:38:17.242745615Z",
          "aws.ecs.task.revision": "110",
          "aws.ecs.task.version": "110",
          "cloud.account.id": "99999999999",
          "cloud.availability_zone": "ap-south-1b",
          "cloud.region": "ap-south-1",
          "ecs.task.memory.utilized": {
            "type": "gauge",
            "count": 1,
            "sum": 55,
            "min": 55,
            "max": 55,
            "latest": 55
          },
          "instrumentation.provider": "opentelemetry",
          "metricName": "ecs.task.memory.utilized",
          "newrelic.source": "api.metrics.otlp",
          "otel.library.name": "",
          "otel.library.version": "",
          "timestamp": 1658394321523,
          "unit": "Megabytes"
        }

Also, it is unable to get the ECS service name.

2022-07-20T12:08:41.529+05:30 -> aws.ecs.service.name: STRING(undefined)
2022-07-20T12:08:41.530+05:30 -> aws.ecs.service.name: STRING(undefined)
2022-07-20T12:08:41.530+05:30 -> aws.ecs.service.name: STRING(undefined)
2022-07-20T12:08:41.531+05:30 -> aws.ecs.service.name: STRING(undefined)
2022-07-20T12:09:01.525+05:30 -> aws.ecs.service.name: STRING(undefined)
2022-07-20T12:09:01.525+05:30 -> aws.ecs.service.name: STRING(undefined)

cert-manager version for ADOT add-on

Hello,
There is a note on the ADOT add-on requirements page (https://aws-otel.github.io/docs/getting-started/adot-eks-add-on/requirements#tls-certificate-requirement) saying that cert-manager version < 1.6 is required. This prevents using the latest version for other components, e.g. integrating App Mesh mutual TLS with cert-manager. That version is old and has known CVEs, AFAIK.
Why does this limitation exist?
The ADOT add-on also fails to install if cert-manager is not present, even if I want to use another method to create certificates for the webhooks.

Thanks

How to connect to the aws-otel-collector using opentelemetry-cpp?

I'd like to export from my client app using the C++ lib and connect to the OTLP exporter. I assume I need to authenticate using AWS sig v4 to reach the OTLP HTTP endpoint. Do you have sample code?

Thinking about it, these are two questions:

  • Does the ADOT collector provide IAM authentication for HTTP endpoints when using as OTLP receiver?
  • Where in the C++ lib can I add signatures/headers to the underlying HTTP clients?

Getting Metrics to flow in AWS Distro for OpenTelemetry Lambda Support For Python

Using the current release arn:aws:lambda:<region>:901920570463:layer:aws-otel-python-amd64-ver-1-11-1:1, it looks like OpenTelemetry metrics are available in the SDK used in a Lambda with the above Lambda layer if you import _metrics. At least it doesn't give an error.

But I cannot get metrics to actually flow to an exporter. I have tried something based on the Python metrics example in the OpenTelemetry docs (shown below). The only difference was that I could not get the

processors:
  batch:

in the collector config.yml to work in the v1.11.1 Lambda. It would always crash the extension. I presume the problems are related to that.

I also tried to use the metrics example from v1.11.1 of opentelemetry-python, but could not import from opentelemetry.exporter.otlp.proto.grpc._metric_exporter, as it seemed not to be available in the AWS Python Lambda layer. That example also had the processors/batch lines in the collector config.yaml, which would cause the Lambda to crash.

Here is the lambda code and the collector config.yml I was trying:

import os
import json
from opentelemetry import trace
from typing import Iterable
from opentelemetry import _metrics
from random import randint
import time

# Acquire a tracer
tracer = trace.get_tracer(__name__)

# Acquire a meter and create a counter
meter = _metrics.get_meter(__name__)

my_counter = meter.create_counter(
    "my_counter",
    description="My Counter counting",
)
       
# The Lambda handler
def handler(event, context):
    with tracer.start_as_current_span("top") as topspan:
        res = randint(1, 6)

        # Counter
        my_counter.add(1, {"res.value": res})

        json_region = os.environ['AWS_REGION']
        topspan.set_attribute("region", json_region)
        topspan.set_attribute("res.value", res)
        time.sleep(10)
        return {
            "statusCode": 200,
            "headers": {
                "Content-Type": "application/json"
            },
            "body": json.dumps({
                "Region ": json_region,
                "res ": res
            })
        }

I've tried many variations of this; none end up exporting metrics to Honeycomb (or the logging exporter, as far as I can tell). Traces do flow to Honeycomb.

receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  logging:
    loglevel: debug
  awsxray:
  otlp:
    endpoint: "api.honeycomb.io:443"
    headers: {
      "x-honeycomb-team": "Qf0n7UBOs2sG3DL7SA8CDD",
      "x-honeycomb-dataset": "rob"
    }
  otlp/metrics:
    endpoint: "api.honeycomb.io:443"
    headers:
      "x-honeycomb-team": "Qf0n7UBOs2sG3DL7SA8CDD"
      "x-honeycomb-dataset": "rob"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: []
      exporters: [logging, awsxray, otlp]
    metrics:
      receivers: [otlp]
      processors: []
      exporters: [logging, otlp/metrics]

Either I'm doing something wrong (highly probable) or the v1.11.1 AWS Lambda layer for Python is not quite supporting metrics yet. It's hard for me to tell right now, and I can't find any explicit examples for metrics and the AWS Python Lambda layer. Any help would be appreciated!
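One thing that may be worth ruling out (a sketch, not a confirmed fix): wire up the MeterProvider explicitly and force-flush before the handler returns, so a short-lived invocation doesn't end before the periodic reader exports. The module paths below are the stable metrics API from later opentelemetry-python releases and may not match what the 1.11.1 layer bundles; the localhost:4317 endpoint is assumed to be the layer's local collector.

# Sketch: explicit metrics pipeline with a flush at the end of the handler.
# Module paths and the localhost:4317 endpoint are assumptions (stable API, not 1.11.x _metrics).
from opentelemetry import metrics
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint="http://localhost:4317", insecure=True)
)
provider = MeterProvider(metric_readers=[reader])
metrics.set_meter_provider(provider)

meter = metrics.get_meter(__name__)
my_counter = meter.create_counter("my_counter", description="My Counter counting")

def handler(event, context):
    my_counter.add(1)
    # Flush before returning so the invocation doesn't end with an unexported batch.
    provider.force_flush()
    return {"statusCode": 200}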

Span's add_event and record_exception not working with X-Ray

Span's add_event() and record_exception() are not working when the exporter is X-Ray. For other exporters like Jaeger, I could see Logs as a separate field in the UI for add_event() and record_exception(). set_attribute() seems to work fine, as I can see the attributes in the Metadata section of the span in X-Ray.

Steps to reproduce:

  1. Run the aws-otel-collector in docker as below
    docker run --rm -p 4317:4317 -p 55680:55680 -p 8889:8888 -e "AWS_ACCESS_KEY_ID=" -e "AWS_SECRET_ACCESS_KEY=" -e AWS_REGION= -v :/otel-local-config.yaml --name awscollector public.ecr.aws/aws-observability/aws-otel-collector:latest --config otel-local-config.yaml
  2. Create a simple Python script with 2 spans, as below, and run it:
    """OTLP exporter"""
    import time

    from opentelemetry import trace
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import (
        OTLPSpanExporter,
    )
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.sdk.extension.aws.trace import AwsXRayIdGenerator

    span_exporter = OTLPSpanExporter(
        endpoint="localhost:4317",
        insecure=True,
    )
    span_processor = BatchSpanProcessor(span_exporter)
    tracer_provider = TracerProvider(id_generator=AwsXRayIdGenerator())
    tracer_provider.add_span_processor(span_processor)
    trace.set_tracer_provider(tracer_provider)

    tracer = trace.get_tracer(__name__)

    try:
        s1 = tracer.start_span("foo")
        s1.add_event("Tracing.........Foo...........", {})
        s1.set_attribute("example", "value")
        time.sleep(2)
        s2 = tracer.start_span("bar", context=trace.set_span_in_context(s1))
        try:
            1 / 0
        except Exception as e:
            s2.record_exception(e)
        finally:
            s2.end()
    finally:
        s1.end()

Note: I have to use tracer.start_span instead of tracer.start_as_current_span as per the requirement

Expected:
s1.add_event("Tracing.........Foo...........", {}) should send the event/logs to X-Ray (foo segment).
s2.record_exception(e) should send the exception trace to X-Ray (bar subsegment).

Actual:

  • Don't know where the events/logs for the foo segment end up in X-Ray
  • Did not get the exception trace in the Exceptions tab of the bar subsegment (screenshot omitted)

Environment:
Docker, Python 3.8, opentelemetry-sdk, and opentelemetry-api installed

Incorrect ARN format on Lambda layer docs

Describe the bug
The documentation for Lambda is incorrect. It includes an extra slash in the 'Lambda layer ARN format'

https://aws-otel.github.io/docs/getting-started/lambda/lambda-java

Steps to reproduce
https://aws-otel.github.io/docs/getting-started/lambda/lambda-java page

What did you expect to see?

I should be able to copy and paste the ARN, replace the region and it should work.

What did you see instead?
Instead, the ARN had an additional '\' in it.

Environment
None

Additional context
None

Dynatrace exporter config docs mention invalid key

The docs at https://aws-otel.github.io/docs/partners/dynatrace still list the config key insecure_skip_verify at the top level, although this config was restructured in the collector a while ago (see the collector readme) and nested into a tls section.
This is the same issue as reported here: open-telemetry/opentelemetry-collector-contrib#7566
The exporter readme in the collector was fixed in open-telemetry/opentelemetry-collector-contrib@6cfac22 and open-telemetry/opentelemetry-collector-contrib@c8dc4b9.
Could you please take over these changes in https://aws-otel.github.io/docs/partners/dynatrace or instruct us how to update the docs there?
Thanks!

ADOT guidelines for testing and marking partner exporters as stable


Objectives

  1. Ensure all partner components, including receivers and exporters for partner endpoints, are tested and verified with the latest version of the Collector (v0.30.0), which will guarantee tracing stability in OpenTelemetry.
  2. Testing includes:
    1. ADOT end-to-end testing with public service end-point
    2. ADOT Soak testing to ensure there are no CPU, memory leaks
    3. Performance testing to ensure all known thresholds/limitations are tested against and verified for these components
    4. Security testing identifying any known vulnerabilities

How to conduct testing

  1. Use the latest version of the Collector (corresponding to the tracing stable release targeting launch on 8/13)
  2. Use the ADOT test framework to run end-to-end tests and soak tests. See ADOT Testing Framework guidance here.
  3. Performance testing has to be done for each component based on individual service providers (vendors / partner service end point limitations)
  4. Security testing includes running GoSec for Go modules; CodeQL security scans should be run, and a list of any known vulnerabilities should be submitted in the issue
  5. Use guidelines itemized in the readme to publish end-to-end test and soak test results. See the ADOT test framework here.
  6. OTEL Stable Readiness Issue: File an issue here to indicate that the partner has tested, verified and requested their components to be marked stable for the tracing stable release.
  7. ADOT Stable Readiness issue: File an issue here to publish all ADOT test results for each partner component.

Expected results

  1. Steps 1-4 need to be run and pass all testing success criteria.

Deliverables

  1. 8/13 OTEL Collector core will be marked as stable
  2. 8/23 OTEL will announce tracing stability
  3. 8/20 OTEL Collector contrib components will be ready / marked stable to be included in Collector contrib release (includes AWS and partners exporters)
  4. 9/23 ADOT announces tracing stable GA with X-Ray, OTLP, and Partner tracing exporters

[Prometheus Sample App]: Copy from Open-o11y / Prometheus-Sample-App and Add new features

Add Features:

  • Updates from Prometheus Sample App of Open-O11y
  • Configurable constant labels generation
  • Configurable multiple datapoints generation
  • Introduce normal distribution for summary metric values.
  • Create an example deployment file to simulate a multi-host/endpoint scraping environment for the OTEL Collector using the Prometheus Sample App
  • Update Readme to run cluster on Kubernetes and on Amazon EKS.

cc @alolita @Aneurysm9

Go Demo Walkthrough in the Getting Started section is broken

The current Demo Walkthrough doesn't work. I don't think it worked before (because of unused variables in the initTracer() function and some missing imports), but the OTLP exporters have moved, so now it's really broken in a confusing way. Here is the corrected code that I ended up with, without straying too far from the original:

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"

	"github.com/gorilla/mux"
	"go.opentelemetry.io/contrib/instrumentation/github.com/gorilla/mux/otelmux"
	"go.opentelemetry.io/contrib/propagators/aws/xray"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"google.golang.org/grpc"

	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

var tracer = otel.Tracer("demo-app")

func main() {
	fmt.Println("starting hello world")

	initTracer()

	r := mux.NewRouter()

	r.Use(otelmux.Middleware("my-server"))

	r.HandleFunc("/hello-world", handler).Methods(http.MethodGet)
	http.ListenAndServe(":8080", r)
}

func handler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode("hello world")
}

func initTracer() {
	driver := otlptracegrpc.NewClient(
		otlptracegrpc.WithInsecure(),
		otlptracegrpc.WithEndpoint("127.0.0.1:4317"),
		otlptracegrpc.WithDialOption(grpc.WithBlock()),
	)

	exporter, err := otlptrace.New(context.Background(), driver)
	if err != nil {
		fmt.Println("error!", err)
	}

	idg := xray.NewIDGenerator()

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithSampler(sdktrace.AlwaysSample()),
		sdktrace.WithSyncer(exporter),
		sdktrace.WithIDGenerator(idg),
	)

	otel.SetTracerProvider(tp)
	otel.SetTextMapPropagator(xray.Propagator{})
}
