int128 / datadog-actions-metrics

Send GitHub Actions metrics to Datadog for developer experience

License: Apache License 2.0

JavaScript 0.06% TypeScript 99.94%
github-actions datadog metrics observability

datadog-actions-metrics's Introduction

datadog-actions-metrics

This action sends GitHub Actions metrics to Datadog when an event occurs. It is inspired by yuya-takeyama/github-actions-metrics-to-datadog-action.

Purpose

Improve the reliability and experience of CI/CD pipeline

To collect the metrics when a workflow run is completed:

on:
  workflow_run:
    workflows:
      - '**'
    types:
      - completed

jobs:
  send:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: int128/datadog-actions-metrics@v1
        with:
          # create an API key in https://docs.datadoghq.com/account_management/api-app-keys/
          datadog-api-key: ${{ secrets.DATADOG_API_KEY }}

For developer experience, you can analyze the following metrics:

  • Time to test an application
  • Time to build and deploy an application

For reliability, you can monitor the following metrics:

  • Success rate of the default branch
  • Rate limit of built-in GITHUB_TOKEN

Here is an example screenshot in Datadog.

image

Improve the reliability and experience of self-hosted runners

For the self-hosted runners, you can monitor the following metrics for reliability and experience:

Here is an example screenshot in Datadog.

image

Improve your team development process

You can analyze your development activity, such as the number of merged pull requests, to support continuous improvement of your team's process.

To collect the metrics when a pull request is opened, closed or merged into main:

on:
  pull_request:
    types:
      - opened
      - closed
  push:
    branches:
      - main

jobs:
  send:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: int128/datadog-actions-metrics@v1
        with:
          # create an API key in https://docs.datadoghq.com/account_management/api-app-keys/
          datadog-api-key: ${{ secrets.DATADOG_API_KEY }}

Overview

This action can handle the following events:

  • workflow_run event
  • pull_request event
  • push event
  • schedule event

Other events are ignored.

Metrics for workflow_run event

Workflow run

This action sends the following metrics.

  • github.actions.workflow_run.total
    • Total workflow runs (count)
  • github.actions.workflow_run.conclusion.{CONCLUSION}_total
    • Total workflow runs by the conclusion (count). See the official document for the possible values of CONCLUSION field
    • e.g. github.actions.workflow_run.conclusion.success_total
    • e.g. github.actions.workflow_run.conclusion.failure_total
  • github.actions.workflow_run.duration_second
    • Time from when a workflow run starts until it is updated (gauge)
  • github.actions.workflow_run.duration_second.distribution
    • Time from when a workflow run starts until it is updated (distribution)

It has the following tags:

  • repository_owner
  • repository_name
  • workflow_name
  • workflow_id
  • run_attempt
    • Attempt number of the run, 1 for first attempt and higher if the workflow was re-run
  • event
  • sender
  • sender_type = either Bot, User or Organization
  • branch
  • default_branch = true or false
  • pull_request_number
    • Pull request(s) which triggered the workflow
  • conclusion

See also the actual metrics in the E2E test.

Job

This action sends the following metrics if collect-job-metrics is enabled.

  • github.actions.job.total
    • Total jobs (count)
  • github.actions.job.conclusion.{CONCLUSION}_total
    • Total jobs by the conclusion (count)
    • e.g. github.actions.job.conclusion.success_total
    • e.g. github.actions.job.conclusion.failure_total
  • github.actions.job.queued_duration_second
    • Time from when a job is created until it starts (gauge)
  • github.actions.job.queued_duration_second.distribution
    • Time from when a job is created until it starts (distribution)
  • github.actions.job.duration_second
    • Time from when a job starts until it completes (gauge)
  • github.actions.job.duration_second.distribution
    • Time from when a job starts until it completes (distribution)
  • github.actions.job.start_time_from_workflow_start_second.distribution
    • Time from when the workflow run starts until a job starts (distribution)
  • github.actions.job.lost_communication_with_server_error_total
    • Count of "lost communication with the server" errors on self-hosted runners. See issue #444 for details
  • github.actions.job.received_shutdown_signal_error_total
    • Count of "The runner has received a shutdown signal" errors on self-hosted runners

It has the following tags:

  • repository_owner
  • repository_name
  • workflow_name
  • workflow_id
  • event
  • sender
  • sender_type = either Bot, User or Organization
  • branch
  • default_branch = true or false
  • pull_request_number
    • Pull request(s) which triggered the workflow
  • job_name
  • job_id
  • conclusion
  • status
  • runs_on
    • Runner label inferred from the workflow file if available
    • e.g. ubuntu-latest

Step

This action sends the following metrics if collect-step-metrics is enabled.

  • github.actions.step.total
    • Total steps (count)
  • github.actions.step.conclusion.{CONCLUSION}_total
    • Total steps by the conclusion (count)
    • e.g. github.actions.step.conclusion.success_total
    • e.g. github.actions.step.conclusion.failure_total
  • github.actions.step.duration_second
    • Time from when a step starts until it completes (gauge)
  • github.actions.step.duration_second.distribution
    • Time from when a step starts until it completes (distribution)
  • github.actions.step.start_time_from_workflow_start_second.distribution
    • Time from when the workflow run starts until a step starts (distribution)

It has the following tags:

  • repository_owner
  • repository_name
  • workflow_name
  • workflow_id
  • event
  • sender
  • sender_type = either Bot, User or Organization
  • branch
  • default_branch = true or false
  • pull_request_number
    • Pull request(s) which triggered the workflow
  • job_name
  • job_id
  • step_name
  • step_number = 1, 2, ...
  • conclusion
  • status
  • runs_on
    • Runner label inferred from the workflow file if available
    • e.g. ubuntu-latest

Enable job or step metrics

To send the metrics of jobs and steps:

steps:
  - uses: int128/datadog-actions-metrics@v1
    with:
      datadog-api-key: ${{ secrets.DATADOG_API_KEY }}
      collect-job-metrics: true
      collect-step-metrics: true

To send the metrics of jobs and steps on the default branch only:

steps:
  - uses: int128/datadog-actions-metrics@v1
    with:
      datadog-api-key: ${{ secrets.DATADOG_API_KEY }}
      collect-job-metrics: ${{ github.event.workflow_run.head_branch == github.event.repository.default_branch }}
      collect-step-metrics: ${{ github.event.workflow_run.head_branch == github.event.repository.default_branch }}

This action calls the GitHub REST API and GraphQL API to get the jobs and steps of the current workflow run. Note that this may exceed the API rate limit when many workflows run.

If job or step metrics are enabled, this action requires the following permissions:

permissions:
  actions: read
  checks: read
  contents: read
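
For reference, a complete workflow combining the trigger, the permissions and the job and step flags might look like the sketch below. It assumes the permissions block is declared at the top level of the workflow (GitHub Actions also accepts it per job); the job name send simply follows the earlier examples.

```yaml
on:
  workflow_run:
    workflows:
      - '**'
    types:
      - completed

# Permissions are declared at the workflow (or job) level, not under `on`.
permissions:
  actions: read
  checks: read
  contents: read

jobs:
  send:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: int128/datadog-actions-metrics@v1
        with:
          datadog-api-key: ${{ secrets.DATADOG_API_KEY }}
          collect-job-metrics: true
          collect-step-metrics: true
```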

Metrics for pull_request event

Pull request (opened)

This action sends the following metrics when a pull request is opened (the opened type).

  • github.actions.pull_request_opened.total
    • Total opened events (count)
  • github.actions.pull_request_opened.commits
    • Number of commits in a pull request (count)
  • github.actions.pull_request_opened.changed_files
    • Number of changed files in a pull request (count)
  • github.actions.pull_request_opened.additions
    • Number of added lines in a pull request (count)
  • github.actions.pull_request_opened.deletions
    • Number of deleted lines in a pull request (count)

It has the following tags:

  • repository_owner
  • repository_name
  • sender
  • sender_type = either Bot, User or Organization
  • user
  • pull_request_number
  • draft = true or false
  • base_ref
  • head_ref

Pull request (closed)

This action sends the following metrics when a pull request is closed (the closed type).

  • github.actions.pull_request_closed.total
    • Total closed events (count)
  • github.actions.pull_request_closed.since_opened_seconds
    • Time from when a pull request is opened until it is closed (gauge)
  • github.actions.pull_request_closed.since_first_authored_seconds
    • Time from the authored time of the first commit until the pull request is closed (gauge)
  • github.actions.pull_request_closed.since_first_committed_seconds
    • Time from the committed time of the first commit until the pull request is closed (gauge)
  • github.actions.pull_request_closed.commits
    • Number of commits in a pull request (count)
  • github.actions.pull_request_closed.changed_files
    • Number of changed files in a pull request (count)
  • github.actions.pull_request_closed.additions
    • Number of added lines in a pull request (count)
  • github.actions.pull_request_closed.deletions
    • Number of deleted lines in a pull request (count)

It has the following tags:

  • repository_owner
  • repository_name
  • sender
  • sender_type = either Bot, User or Organization
  • user
  • pull_request_number
  • draft = true or false
  • base_ref
  • head_ref
  • merged = true or false
  • requested_team
    • Team(s) of requested reviewer(s)
  • label
    • Label(s) of a pull request
    • Available if send-pull-request-labels is set

Permissions

For pull_request event, this action requires the following permissions:

permissions:
  pull-requests: read
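
As a rough sketch, the permission can be combined with the pull_request trigger as follows. The top-level placement of permissions and the choice to enable send-pull-request-labels are assumptions made for illustration; omit the flag if the label tag is not needed.

```yaml
on:
  pull_request:
    types:
      - opened
      - closed

permissions:
  pull-requests: read

jobs:
  send:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: int128/datadog-actions-metrics@v1
        with:
          datadog-api-key: ${{ secrets.DATADOG_API_KEY }}
          # Optional: send pull request labels as the `label` tag
          send-pull-request-labels: true
```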

Metrics for push event

This action sends the following metrics.

  • github.actions.push.total
    • Total push events (count)

It has the following tags:

  • repository_owner
  • repository_name
  • sender
  • sender_type = either Bot, User or Organization
  • ref
  • created = true or false
  • deleted = true or false
  • forced = true or false
  • default_branch = true or false

Metrics for schedule event

Workflow run

This action sends the following metrics:

  • github.actions.schedule.queued_workflow_run.total
    • Number of queued workflow runs (gauge)

It has the following tags:

  • repository_owner
  • repository_name

It is useful for monitoring self-hosted runners.

Permissions

For schedule event, this action requires the following permissions:

permissions:
  actions: read
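
A minimal workflow for the schedule event could look like this sketch. The cron expression is an assumption (every 5 minutes, the shortest interval GitHub Actions supports); pick whatever interval suits your monitoring.

```yaml
on:
  schedule:
    # Assumed interval for illustration: every 5 minutes
    - cron: '*/5 * * * *'

permissions:
  actions: read

jobs:
  send:
    # A GitHub-hosted runner, so this job does not wait on the
    # self-hosted runners it is meant to monitor
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: int128/datadog-actions-metrics@v1
        with:
          datadog-api-key: ${{ secrets.DATADOG_API_KEY }}
```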

Metrics for all supported events

Rate limit

This action always sends the following metrics of the built-in GITHUB_TOKEN rate limit.

  • github.actions.api_rate_limit.remaining
    • Remaining requests of GitHub API (gauge)
  • github.actions.api_rate_limit.limit
    • Limit of requests of GitHub API (gauge)

It has the following tags:

  • repository_owner
  • repository_name
  • resource = core, search or graphql

This does not consume the GitHub API rate limit because it only calls the /rate_limit endpoint.

Specification

You can set the following inputs:

| Name | Default | Description |
|------|---------|-------------|
| github-token | github.token | GitHub token to get jobs and steps if needed |
| github-token-rate-limit-metrics | github.token | GitHub token for rate limit metrics |
| datadog-api-key | - | Datadog API key. If not set, this action does not actually send metrics |
| datadog-site | - | Datadog server name such as datadoghq.eu, ddog-gov.com, us3.datadoghq.com |
| datadog-tags | - | Additional tags in the form of key:value in a multiline string |
| send-pull-request-labels | false | Send pull request labels as Datadog tags |
| collect-job-metrics | false | Collect job metrics |
| collect-step-metrics | false | Collect step metrics |
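
To illustrate the optional inputs, the following sketch sets a Datadog site and two additional tags. The tag names and values (team:platform, env:ci) are placeholders, and writing one key:value pair per line is an assumption based on the "multiline string" description above.

```yaml
steps:
  - uses: int128/datadog-actions-metrics@v1
    with:
      datadog-api-key: ${{ secrets.DATADOG_API_KEY }}
      datadog-site: datadoghq.eu
      # Placeholder tags, assumed to be one key:value pair per line
      datadog-tags: |
        team:platform
        env:ci
```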

Proxy

To connect to the Datadog API via an HTTPS proxy, set the https_proxy environment variable. For example:

steps:
  - uses: int128/datadog-actions-metrics@v1
    with:
      datadog-api-key: ${{ secrets.DATADOG_API_KEY }}
    env:
      https_proxy: http://proxy.example.com:8080

Contribution

This is open source software. Feel free to open issues and pull requests.


datadog-actions-metrics's Issues

Is there a Debug Parameter?

Hi, @int128, I've set this up to send data from my GHA workflow to DataDog; however, I've encountered a number of issues.

  1. When trying to filter, not all my workflows show up as options in DataDog.
  2. Some filter options show up then disappear. So not sure what consistent param to filter by.
  3. github.actions.pull_request_closed.since_first_authored_seconds and github.actions.pull_request_closed.since_first_committed_seconds do not show up in DataDog. I only see github.actions.pull_request_closed.since_opened_seconds.

I'd like to figure out why. Any ideas or is there a param I can use to debug the library and/or payload being sent to DataDog please?

Could it be because I'm using this in my config? collect-job-metrics: ${{ github.event.workflow_run.head_branch == github.event.repository.default_branch }}

Which GitHub Actions workflow permissions are required?

I see this error in my log when this action runs

Warning: Could not get the check suite: GraphqlError: Resource not accessible by integration

Which permission scopes do I need to add to the workflow for the GraphQL query to work?

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Repository problems

These problems occurred while renovating this repository. View logs.

  • WARN: File contents are invalid JSON but parse using JSON5. Support for this will be removed in a future release so please change to a support .json5 file name or ensure correct JSON syntax.

This repository currently has no open or pending branches.

Detected dependencies

github-actions
.github/workflows/e2e-target.yaml
  • actions/github-script v7.0.1@60a0d83039c74a4aee543508d2ffcb1c3799cdea
  • actions/checkout v4.1.1@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/setup-node v4.0.2@60edb5dd545a775178f52524783378180af0d1f8
  • actions/github-script v7.0.1@60a0d83039c74a4aee543508d2ffcb1c3799cdea
.github/workflows/e2e.yaml
  • actions/github-script v7.0.1@60a0d83039c74a4aee543508d2ffcb1c3799cdea
  • int128/datadog-actions-metrics v1.80.0@2ca3c09b18e2185806a071ed29fd5a02b75f8599
.github/workflows/release.yaml
  • actions/checkout v4.1.1@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/setup-node v4.0.2@60edb5dd545a775178f52524783378180af0d1f8
  • int128/release-typescript-action v1.27.0@94c45715849473c37ebdc66190c9ec4c9543e46b
.github/workflows/ts.yaml
  • actions/checkout v4.1.1@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/setup-node v4.0.2@60edb5dd545a775178f52524783378180af0d1f8
  • actions/checkout v4.1.1@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/setup-node v4.0.2@60edb5dd545a775178f52524783378180af0d1f8
  • int128/update-generated-files-action v2.48.0@1bf39f3aec4afa88c0c7060cdb0d88332c052991
npm
package.json
  • @actions/core 1.10.1
  • @actions/github 6.0.0
  • @actions/http-client 2.2.0
  • @datadog/datadog-api-client 1.22.0
  • @octokit/webhooks-types 7.3.2
  • graphql 16.8.1
  • @graphql-codegen/cli 5.0.2
  • @graphql-codegen/import-types-preset 3.0.0
  • @graphql-codegen/typescript 4.0.4
  • @graphql-codegen/typescript-operations 4.1.2
  • @octokit/graphql-schema 14.55.1
  • @octokit/webhooks-examples 7.3.2
  • @tsconfig/node16 16.1.1
  • @types/jest 29.5.12
  • @types/js-yaml 4.0.9
  • @types/node 20.11.17
  • @types/proxy 1.0.4
  • @typescript-eslint/eslint-plugin 7.0.1
  • @typescript-eslint/parser 7.0.1
  • @vercel/ncc 0.38.1
  • eslint 8.56.0
  • eslint-plugin-jest 27.6.3
  • jest 29.7.0
  • js-yaml 4.1.0
  • prettier 3.2.5
  • proxy 2.1.1
  • ts-jest 29.1.2
  • typescript 5.3.3
regex
action.yaml
  • node 20
.github/workflows/e2e-target.yaml
  • node 20
.github/workflows/release.yaml
  • node 20
.github/workflows/ts.yaml
  • node 20
  • node 20


Feature: A Metric for 'Dev Cycle Time'

Hello, my team is using this action and find it useful. Thanks for creating this!

Many of these are useful to us, but particularly the github.actions.pull_request_closed.since_opened_seconds metric is shedding light on our current PR process. We'd love to also be able to measure what someone might refer to as 'Dev Cycle Time' or 'Pull Request Lead Time', which would measure the time from first commit on a feature branch, until the PR is closed.

Is this something you would be willing to add to the action? I imagine this feature might require passing a github token with more permission than the one provided by actions to use the API and get the start time (time of first commit, or head branch creation event) to calculate this metric. I would be happy to put together a PR to implement this.

Metrics are wrong for retried workflows

Hi,

I believe metrics are wrong for retried workflows. It looks like when metrics (eg: workflow_run.duration_second) are calculated, the original start_time of the first run is used even for retried workflows. Can you double check? thanks.

Metric of "lost communication with the server" error

Problems to solve

Occasionally, a self-hosted runner is killed by OOM or some other issue. This is reported as the "lost communication with the server" error.

When the error occurs, GitHub Actions adds an annotation with the following message:

The self-hosted runner: POD_NAME lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.

Currently, we send the annotation message to Slack by this action:
https://github.com/int128/workflow-run-summary-action/blob/216f94dd10d099652cfb393e598c2a8f604c3bd0/src/run.ts#L60

How to solve

It would be nice to monitor the count of "lost communication with the server" errors to support fact-based decisions.

Include attempt number into tags

I'd like to include the attempt number into the tags of the metrics payload to be able to filter them for some particular cases.

For example, this second attempt took 40 minutes while the first attempt took 1 hour and 20 minutes.
image
On average this comes to 1 hour, but that is misleading. The value I'm looking for is the 1 hour and 20 minutes of the first attempt, so that I can compare it with other executions.

queued_duration_second data seems wrong

This metric seems wrong. We have jobs that are clearly waiting for github action runners but the chart shows either negative numbers or zero for queuing time. Can you please check? thanks.

image

No job metrics sent

Hi, I recently started using this action and was able to start sending workflow metrics easily

However, I have run into an issue when enabling collect-job-metrics

No job metrics are being sent to datadog and I get the following warning for my workflow:

Warning: Could not get the check suite: GraphqlError: Resource not accessible by integration

Here's my workflow:

name: Track deployment workflow metrics to Datadog

on:
    workflow_run:
        workflows:
            - Deploy to dev environment
            - Deploy to test environment
            - Deploy to prod environment
        types:
            - completed

jobs:
    send_metrics:
        runs-on: ubuntu-latest
        timeout-minutes: 10
        steps:
            - uses: int128/datadog-actions-metrics@v1
              with:
                  datadog-api-key: ${{ secrets.DATADOG_API_KEY }}
                  datadog-site: datadoghq.eu
                  collect-job-metrics: true
                  github-token: ${{ secrets.GITHUB_TOKEN }}

Do I need to enable some permissions for my github token or have I set up my workflow incorrectly?

Many thanks

Craig

Limit of number of metrics per workflow run?

Is there a limit on how many metrics can be sent to Datadog per workflow run? We have a pretty complicated workflow and I'm noticing that there are no metrics for certain jobs. Here's an example:

2022-10-02T22:01:09.3441781Z Sending 3288 metrics to Datadog
2022-10-02T22:01:09.3441869Z Sent as {"status":"ok"}
2022-10-02T22:01:09.3689665Z Cleaning up orphan processes

Even with 3288 metrics, there are still no metrics for certain jobs. Thanks!

Pull request number

I'd like to include the pull request id into the metadata when we are running a workflow that is triggered by a pull request event.

Parameter ''using: node20' is not supported

The recent node upgrade broke my action.

Is this because my organization is behind on GHE versions?

Error: System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values. (Parameter ''using: node20' is not supported, use 'docker', 'node12' or 'node16' instead.')

Unable to produce job metrics despite added permissions

Hi there.

I am very grateful for your action, and I am using it within a repository, as part of a workflow triggered by the completion of any other workflow in that same repository. My Datadog Metrics workflow looks as follows:

name: Datadog Metrics Collector

on:
  workflow_run:
    workflows: 
      - '**'
    types: 
      - completed
    branches:
      - '**'
    permissions:
      actions: read
      checks: read
      contents: read

jobs:
  send_metrics:
    runs-on: [redacted]
    timeout-minutes: 10
    steps:
      - name: Datadog metric collection
        uses: int128/datadog-actions-metrics@29ebe39408450214940f0b17dcb7d8926aea58aa #v1.48.0
        with:
          datadog-api-key: ${{ secrets.redacted }}
          datadog-site: datadoghq.eu
          collect-job-metrics: true

I am receiving workflow metrics, and api limit metrics, but not job metrics. Am I doing something wrong here?

Any help is really appreciated, thank you!

Only getting rate limit metrics

It seems like I'm only getting rate limit metrics and a warning. This is running as a part of a composite action and the job is triggered by a deployment.

Warning: Not supported event deployment

image

Can it send job level metrics only?

When the collect-job-metrics flag is set to true, metrics at the job level as well as step level are sent to datadog. This doesn't work well if a workflow has a lot of jobs and steps. For a large workflow we have, not every job & step level metric is sent to datadog because I think there's a limit.

Can this flag be broken down to two flags?

  • collect-job-metrics (collect job-level metrics only)
  • collect-step-metrics (collect step-level metrics only)

That way, people can choose what they want. Thanks!

Add option to exclude certain metrics?

We noticed that we started sending a lot more metrics recently and realized that those were distribution metrics. Is there a way to exclude certain metrics? This is from a cost perspective. The addition of distribution metrics is great and it's also very costly.

Proxy support

We need to connect to datadog through a proxy, is that supported?
