
actions-runner-controller


This controller operates self-hosted runners for GitHub Actions on your Kubernetes cluster.


Motivation

GitHub Actions is a very useful tool for automating development. GitHub Actions jobs are run in the cloud by default, but you may want to run your jobs in your own environment. A self-hosted runner can be used for such use cases, but it requires the provisioning and configuration of a virtual machine instance. Instead, if you already have a Kubernetes cluster, it makes more sense to run the self-hosted runner on top of it.

actions-runner-controller makes that possible. Just create a Runner resource in your Kubernetes cluster, and it will run and operate the self-hosted runner for the specified repository. Combined with Kubernetes RBAC, you can also build a simple self-hosted runners as a Service.

Installation

actions-runner-controller uses cert-manager for certificate management of its admission webhook. Make sure you have installed cert-manager before installing actions-runner-controller. The installation instructions for cert-manager can be found below.
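
For reference, a minimal sketch of installing cert-manager from its static manifest looks like the below (the version is a placeholder; check the cert-manager releases page for a current one):

kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.1.0/cert-manager.yaml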

Install the custom resources and actions-runner-controller with kubectl or helm. This will create the actions-runner-system namespace in your Kubernetes cluster and deploy the required resources.

kubectl:

# REPLACE "v0.18.2" with the version you wish to deploy
kubectl apply -f https://github.com/actions-runner-controller/actions-runner-controller/releases/download/v0.18.2/actions-runner-controller.yaml

helm:

helm repo add actions-runner-controller https://actions-runner-controller.github.io/actions-runner-controller
helm upgrade --install --namespace actions-runner-system --create-namespace \
             --wait actions-runner-controller actions-runner-controller/actions-runner-controller

GitHub Enterprise Support

The solution supports both GitHub Enterprise Cloud and Server editions, as well as regular GitHub. Both PAT (personal access token) and GitHub App authentication work for installations that will be deploying repository level and/or organization level runners. If you need to deploy enterprise level runners, you are restricted to PAT based authentication, as GitHub doesn't currently support GitHub App based authentication for enterprise runners.

If you are deploying this solution into a GitHub Enterprise Server environment, you will need version >= 3.0.0.

When deploying the solution for a GitHub Enterprise Server environment, you need to provide an additional environment variable as part of the controller deployment:

kubectl set env deploy controller-manager -c manager GITHUB_ENTERPRISE_URL=<GHEC/S URL> --namespace actions-runner-system

Note: The repository maintainers do not have an enterprise environment (cloud or server). Support for the enterprise specific feature set is community driven and on a best endeavors basis. PRs from the community are welcome to add features and maintain support.

Setting up authentication with GitHub API

There are two ways for actions-runner-controller to authenticate with the GitHub API (only one can be configured at a time, however):

  1. Using a GitHub App (not supported when you need enterprise level runners)
  2. Using a PAT

Functionality-wise, there isn't much of a difference between the two authentication methods. The primary benefit of authenticating via a GitHub App is an increased API quota.

If you are deploying the solution into a GitHub Enterprise Server environment, you are able to configure your own rate limiting settings, making this benefit irrelevant. If you're deploying the solution for GitHub Enterprise Cloud or regular GitHub and you run into rate limiting issues, consider deploying the solution using the GitHub App authentication method instead.
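
As a quick sanity check (this is just the standard GitHub API, not part of this project), you can inspect the remaining quota of whichever credential you are using:

curl -s -H "Authorization: token ${GITHUB_TOKEN}" https://api.github.com/rate_limit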

Deploying using GitHub App Authentication

You can create a GitHub App for either your account or any organization. If you want to create a GitHub App for your account, open the following link to the creation page, enter any unique name in the "GitHub App name" field, and hit the "Create GitHub App" button at the bottom of the page.

Note: The permissions are already set in the query string of the provided link:

If you want to create a GitHub App for your organization, replace the :org part of the following URL with your organization name before opening it. Then enter any unique name in the "GitHub App name" field, and hit the "Create GitHub App" button at the bottom of the page to create a GitHub App.

Note: The permissions are already set in the query string of the provided link:

You will see an App ID on the page of the GitHub App you created as follows; the value of this App ID will be used later.

App ID

Download the private key file by pushing the "Generate a private key" button at the bottom of the GitHub App page. This file will also be used later.

Generate a private key

Go to the "Install App" tab on the left side of the page and install the GitHub App that you created for your account or organization.

Install App

When the installation is complete, you will be taken to a URL in one of the following formats; the last number in the URL will be used as the Installation ID later (for example, if the URL ends in settings/installations/12345, then the Installation ID is 12345).

  • https://github.com/settings/installations/${INSTALLATION_ID}
  • https://github.com/organizations/eventreactor/settings/installations/${INSTALLATION_ID}

Finally, register the App ID (APP_ID), Installation ID (INSTALLATION_ID), and the downloaded private key file (PRIVATE_KEY_FILE_PATH) as a Kubernetes secret.

$ kubectl create secret generic controller-manager \
    -n actions-runner-system \
    --from-literal=github_app_id=${APP_ID} \
    --from-literal=github_app_installation_id=${INSTALLATION_ID} \
    --from-file=github_app_private_key=${PRIVATE_KEY_FILE_PATH}

Deploying using PAT Authentication

Personal Access Tokens can be used to register a self-hosted runner by actions-runner-controller.

Log in to a GitHub account that has admin privileges for the repository, and create a personal access token with the appropriate scopes listed below:

Scopes for Repository Runners

  • repo (Full control)

Scopes for Organization Runners

  • repo (Full control)
  • admin:org (Full control)
  • admin:public_key - read:public_key
  • admin:repo_hook - read:repo_hook
  • admin:org_hook
  • notifications
  • workflow

Scopes for Enterprise Runners

  • admin:enterprise (Full control)

Note: when you deploy enterprise runners they will get access to organizations; however, access to the repositories themselves is NOT allowed by default. Each GitHub organization must allow enterprise runner groups to be used in its repositories as an initial one-time configuration step. This only needs to be done once, after which it is permanent for that runner group.


Once you have created the appropriate token, deploy it as a secret to the Kubernetes cluster that you are going to deploy the solution on:

kubectl create secret generic controller-manager \
    -n actions-runner-system \
    --from-literal=github_token=${GITHUB_TOKEN}

Usage

GitHub self-hosted runners can be deployed at various levels in a management hierarchy:

  • The repository level
  • The organization level
  • The enterprise level

There are two ways to use this controller:

  • Manage runners one by one with Runner.
  • Manage a set of runners with RunnerDeployment.

Repository Runners

To launch a single self-hosted runner, you need to create a manifest file that includes a Runner resource as follows. This example launches a self-hosted runner named example-runner for the summerwind/actions-runner-controller repository.

# runner.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: Runner
metadata:
  name: example-runner
spec:
  repository: summerwind/actions-runner-controller
  env: []

Apply the created manifest file to your Kubernetes cluster.

$ kubectl apply -f runner.yaml
runner.actions.summerwind.dev/example-runner created

You can see that the Runner resource has been created.

$ kubectl get runners
NAME             REPOSITORY                             STATUS
example-runner   summerwind/actions-runner-controller   Running

You can also see that the runner pod is running.

$ kubectl get pods
NAME           READY   STATUS    RESTARTS   AGE
example-runner 2/2     Running   0          1m

The runner you created has been registered to your repository.

Actions tab in your repository settings

Now you can use your self-hosted runner. See the official documentation on how to run a job with it.

Organization Runners

To add the runner to an organization, you only need to replace the repository field with organization, so that the runner registers itself with the organization.

# runner.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: Runner
metadata:
  name: example-org-runner
spec:
  organization: your-organization-name

Now you can see the runner on the organization level (if you have organization owner permissions).

Enterprise Runners

To add the runner to an enterprise, you only need to replace the repository field with enterprise, so that the runner registers itself with the enterprise.

# runner.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: Runner
metadata:
  name: example-enterprise-runner
spec:
  enterprise: your-enterprise-name

Now you can see the runner on the enterprise level (if you have enterprise access permissions).

RunnerDeployments

There are RunnerReplicaSet and RunnerDeployment resources that correspond to ReplicaSet and Deployment, but for Runners.

You usually need only RunnerDeployment rather than RunnerReplicaSet, as the former manages the latter.

# runnerdeployment.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runnerdeploy
spec:
  replicas: 2
  template:
    spec:
      repository: mumoshu/actions-runner-controller-ci
      env: []

Apply the manifest file to your cluster:

$ kubectl apply -f runnerdeployment.yaml
runnerdeployment.actions.summerwind.dev/example-runnerdeploy created

You can see that 2 runners have been created as specified by replicas: 2:

$ kubectl get runners
NAME                             REPOSITORY                             STATUS
example-runnerdeploy2475h595fr   mumoshu/actions-runner-controller-ci   Running
example-runnerdeploy2475ht2qbr   mumoshu/actions-runner-controller-ci   Running

Note on scaling to/from 0

This is documentation about an unreleased version of actions-runner-controller.

It would be great if you could try building the latest controller image by following https://github.com/actions-runner-controller/actions-runner-controller#contributing if you are eager to test it early and help the developers by reporting any bugs 😄

You can either delete the runner deployment, or update it to have replicas: 0, so that there will be 0 runner pods in the cluster. This, in combination with e.g. cluster-autoscaler, enables you to save on infrastructure costs when there's no need to run Actions jobs.

# runnerdeployment.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runnerdeploy
spec:
  replicas: 0

The implication of setting replicas: 0 instead of deleting the runner deployment is that you can let GitHub Actions queue jobs until one or more runners become available. See #465 for more information.

Also note that the controller creates a "registration-only" runner per RunnerReplicaSet when it is scaled to zero, and retains it until one or more runners become available.

This, in combination with a correctly configured HorizontalRunnerAutoscaler, allows you to automatically scale to/from 0.

Autoscaling

IMPORTANT: Due to limitations / a bug with GitHub's routing engine, autoscaling does NOT work correctly with RunnerDeployments that target the enterprise level. Scaling activity works as expected; however, jobs fail to get assigned to the scaled-out replicas. This was explored in issue #470. Once GitHub resolves the issue with their backend service, we expect the solution to support autoscaled enterprise RunnerDeployments without any additional changes.

A RunnerDeployment (excluding enterprise runners) can scale the number of runners between the minReplicas and maxReplicas fields based on the chosen scaling metric, as defined in the metrics attribute.

Scaling Metrics

TotalNumberOfQueuedAndInProgressWorkflowRuns

In the below example, actions-runner-controller will poll GitHub for all pending workflow runs, with the poll period defined by the sync period configuration. It will then scale to e.g. 3 if there are 3 pending jobs at sync time. With this scaling metric we are required to define a list of repositories within our metric.

The scale-out performance is controlled via the manager container's startup --sync-period argument. The default value is set to 10 minutes to prevent default deployments from rate-limiting themselves against the GitHub API.

Kustomize Config : The period can be customised in the config/default/manager_auth_proxy_patch.yaml patch
Helm Config : syncPeriod
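
As a sketch, assuming you installed via the Helm chart and that the value is named syncPeriod as noted above, the sync period could be shortened like the below (shorter periods consume more API quota, so adjust with care):

helm upgrade --install --namespace actions-runner-system --create-namespace \
             --set syncPeriod=5m \
             --wait actions-runner-controller actions-runner-controller/actions-runner-controller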

Benefits of this metric

  1. Supports named repositories, allowing you to restrict the runner to a specified set of repositories server-side.
  2. Scales the runner count based on the actual queue depth of the jobs, meaning a closer 1:1 scaling of runners to queued jobs.
  3. Like all scaling metrics, you can manage workflow allocation to the RunnerDeployment through the use of GitHub labels.

Drawbacks of this metric

  1. Repositories must be named within the scaling metric; maintaining a list of repositories may not be viable in larger environments or self-serve environments.
  2. May not scale quickly enough for some users' needs. This metric is pull based, so the queue depth is polled as configured by the sync period; as a result, scaling performance is bound by this sync period, meaning there is a lag in scaling activity.
  3. Relatively large amounts of API requests are required to maintain this metric; you may run into API rate limiting issues depending on the size of your environment and how aggressive your sync period configuration is.

Example RunnerDeployment backed by a HorizontalRunnerAutoscaler:

Important!!! We no longer include the attribute replicas in our RunnerDeployment if we are configuring autoscaling!

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runner-deployment
spec:
  template:
    spec:
      repository: summerwind/actions-runner-controller
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runner-deployment-autoscaler
spec:
  scaleTargetRef:
    name: example-runner-deployment
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
    repositoryNames:
    - summerwind/actions-runner-controller

Additionally, the HorizontalRunnerAutoscaler has an anti-flapping option that prevents a periodic loop of scaling up and down. By default, it doesn't scale down until a grace period of 10 minutes passes after a scale up. The grace period can be configured, however, by adding the setting scaleDownDelaySecondsAfterScaleOut to the HorizontalRunnerAutoscaler spec:

spec:
  scaleDownDelaySecondsAfterScaleOut: 60

PercentageRunnersBusy

The HorizontalRunnerAutoscaler will poll GitHub, based on the configured sync period, for the number of busy runners which live in the RunnerDeployment's namespace, and scale based on the settings.

Kustomize Config : The period can be customised in the config/default/manager_auth_proxy_patch.yaml patch
Helm Config : syncPeriod

Benefits of this metric

  1. Supports named repositories server-side, the same as the TotalNumberOfQueuedAndInProgressWorkflowRuns metric #313
  2. Supports GitHub organization-wide scaling without maintaining an explicit list of repositories; this is especially useful for those working at a larger scale. #223
  3. Like all scaling metrics, you can manage workflow allocation to the RunnerDeployment through the use of GitHub labels
  4. Supports scaling the desired runner count on both a percentage increase/decrease basis as well as a fixed increase/decrease count basis #223 #315

Drawbacks of this metric

  1. May not scale quickly enough for some users' needs. This metric is pull based, so the number of busy runners is polled as configured by the sync period; as a result, scaling performance is bound by this sync period, meaning there is a lag in scaling activity.
  2. We are scaling up and down based on indicative information rather than a count of the actual number of queued jobs, so the desired runner count is likely to under-provision or over-provision runners relative to the actual job queue depth; this may or may not be a problem for you.

Examples of each scaling type implemented with a RunnerDeployment backed by a HorizontalRunnerAutoscaler:

Important!!! We no longer include the attribute replicas in our RunnerDeployment if we are configuring autoscaling!

---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runner-deployment-autoscaler
spec:
  scaleTargetRef:
    name: example-runner-deployment
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: PercentageRunnersBusy
    scaleUpThreshold: '0.75'    # The percentage of busy runners at which the number of desired runners are re-evaluated to scale up
    scaleDownThreshold: '0.3'   # The percentage of busy runners at which the number of desired runners are re-evaluated to scale down
    scaleUpFactor: '1.4'        # The scale up multiplier factor applied to desired count
    scaleDownFactor: '0.7'      # The scale down multiplier factor applied to desired count
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runner-deployment-autoscaler
spec:
  scaleTargetRef:
    name: example-runner-deployment
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: PercentageRunnersBusy
    scaleUpThreshold: '0.75'    # The percentage of busy runners at which the number of desired runners are re-evaluated to scale up
    scaleDownThreshold: '0.3'   # The percentage of busy runners at which the number of desired runners are re-evaluated to scale down
    scaleUpAdjustment: '2'      # The scale up runner count added to desired count
    scaleDownAdjustment: '1'    # The scale down runner count subtracted from the desired count

Like the previous metric, scaling down respects the anti-flapping configuration applied to the HorizontalRunnerAutoscaler, as mentioned previously:

spec:
  scaleDownDelaySecondsAfterScaleOut: 60

Faster Autoscaling with GitHub Webhook

IMPORTANT: Due to missing webhook events, webhook based scaling is not available for enterprise level RunnerDeployments. This was explored in issue #470.

This feature is an ADVANCED feature which may require more work to set up. Please be prepared to put in some time and effort to learn and leverage it!

actions-runner-controller has an optional webhook server that receives GitHub webhook events and scales RunnerDeployments by updating the corresponding HorizontalRunnerAutoscalers.

Today, the webhook server can be configured to respond to GitHub check_run, pull_request, and push events by scaling up the matching HorizontalRunnerAutoscaler by N replica(s), where N is configurable within the HorizontalRunnerAutoscaler's spec.

More concretely, you can configure the targeted GitHub event types and the N in scaleUpTriggers:

kind: HorizontalRunnerAutoscaler
spec:
  scaleTargetRef:
    name: myrunners
  scaleUpTriggers:
  - githubEvent:
      checkRun:
        types: ["created"]
        status: "queued"
    amount: 1
    duration: "5m"

With the above example, the webhook server scales myrunners by 1 replica for 5 minutes on each received check_run event with the type created and the status queued.

The primary benefit of autoscaling on webhooks compared to the standard autoscaling is that it allows you to immediately add "resource slack" for future GitHub Actions job runs.

In contrast, the standard autoscaling requires you to wait for the next sync period to add the missing runners. You can certainly shorten the sync period to make the standard autoscaling more responsive, but doing so will eventually result in the controller becoming non-functional due to GitHub API rate limits.

You can learn the implementation details in #282.

To enable this feature, you first need to install the webhook server.

Currently, only our Helm chart has the ability to install it.

$ helm upgrade --install actions-runner-controller actions-runner-controller/actions-runner-controller \
  --set githubWebhookServer.enabled=true \
  --set githubWebhookServer.ports[0].nodePort=33080

The above command will expose node port 33080 for webhook events. Usually, you need to create an external load balancer targeting the node port, and register the hostname or the IP address of the external load balancer as the webhook URL in your GitHub webhook settings.

Once you have confirmed that the webhook server is ready and reachable from GitHub - this is usually verified by GitHub sending ping events to the webhook server - create or update your HorizontalRunnerAutoscaler resources by following the configuration examples below.

Example 1: Scale up on each check_run event

Note: This should work almost like https://github.com/philips-labs/terraform-aws-github-runner

To scale up the replicas of the runners for example/myrepo by 1 for 5 minutes on each check_run, you write manifests like the below:

kind: RunnerDeployment
metadata:
   name: myrunners
spec:
  repository: example/myrepo
---
kind: HorizontalRunnerAutoscaler
spec:
  scaleTargetRef:
    name: myrunners
  scaleUpTriggers:
  - githubEvent:
      checkRun:
        types: ["created"]
        status: "queued"
    amount: 1
    duration: "5m"

Example 2: Scale on each pull_request event against develop or main branches

kind: RunnerDeployment
metadata:
   name: myrunners
spec:
  repository: example/myrepo
---
kind: HorizontalRunnerAutoscaler
spec:
  scaleTargetRef:
    name: myrunners
  scaleUpTriggers:
  - githubEvent:
      pullRequest:
        types: ["synchronize"]
        branches: ["main", "develop"]
    amount: 1
    duration: "5m"

See "activity types" for the list of valid values for scaleUpTriggers[].githubEvent.pullRequest.types.

Autoscaling to/from 0

This is documentation about an unreleased version of actions-runner-controller.

It would be great if you could try building the latest controller image by following https://github.com/actions-runner-controller/actions-runner-controller#contributing if you are eager to test it early and help the developers by reporting any bugs 😄

Previously, we've discussed how to scale a RunnerDeployment to/from 0.

To automate the process of scaling to/from 0, you can use HorizontalRunnerAutoscaler with a caveat.

That is, you need to choose one of the following configurations for metrics and triggers:

  • TotalNumberOfQueuedAndInProgressWorkflowRuns
  • PercentageRunnersBusy + TotalNumberOfQueuedAndInProgressWorkflowRuns
  • PercentageRunnersBusy + Webhook-based autoscaling

This is because PercentageRunnersBusy, by its definition, needs one or more GitHub runners that can become busy, which cannot happen at all when you have 0 active runners.

If and only if the HorizontalRunnerAutoscaler is configured to have a secondary metric of TotalNumberOfQueuedAndInProgressWorkflowRuns and the controller sees the primary metric of PercentageRunnersBusy return 0 desired replicas, it uses the secondary metric for calculating the desired replicas once again.

A correctly configured TotalNumberOfQueuedAndInProgressWorkflowRuns can return non-zero desired replicas even when there are no runners other than registration-only runners; hence the PercentageRunnersBusy + TotalNumberOfQueuedAndInProgressWorkflowRuns configuration makes scaling from zero possible.

Similarly, webhook-based autoscaling works regardless of whether there are active runners; hence the PercentageRunnersBusy + webhook-based autoscaling configuration makes scaling from zero possible, too.
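
Putting the above together, a sketch of a scale-from-zero configuration using PercentageRunnersBusy as the primary metric and TotalNumberOfQueuedAndInProgressWorkflowRuns as the secondary metric might look like the below (the repository name and thresholds are placeholders):

apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runner-deployment-autoscaler
spec:
  scaleTargetRef:
    name: example-runner-deployment
  minReplicas: 0
  maxReplicas: 3
  metrics:
  # Primary metric: scale on the ratio of busy runners
  - type: PercentageRunnersBusy
    scaleUpThreshold: '0.75'
    scaleDownThreshold: '0.3'
    scaleUpFactor: '2'
    scaleDownFactor: '0.5'
  # Secondary metric: used when the primary metric computes 0 desired replicas
  - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
    repositoryNames:
    - summerwind/actions-runner-controller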

Scheduled Overrides

This is documentation about an unreleased version of actions-runner-controller.

It would be great if you could try building the latest controller image by following https://github.com/actions-runner-controller/actions-runner-controller#contributing if you are eager to test it early and help the developers by reporting any bugs 😄

Scheduled Overrides allows you to configure HorizontalRunnerAutoscaler so that its Spec gets updated only during a certain period of time.

Usually, this feature is used for the following scenarios:

  • You want to pay for the infrastructure cost of running runners only during business hours
  • You want to prepare for scheduled spikes in workloads

For the first scenario, you might consider a configuration like the below:

apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runner-deployment-autoscaler
spec:
  scaleTargetRef:
    name: example-runner-deployment
  scheduledOverrides:
  # Override minReplicas to 0 only between 0am sat to 0am mon
  - startTime: "2021-05-01T00:00:00+09:00"
    endTime: "2021-05-03T00:00:00+09:00"
    recurrenceRule:
      frequency: Weekly
      untilTime: "2022-05-01T00:00:00+09:00"
    minReplicas: 0
  minReplicas: 1

For the second scenario, you might consider something like the below:

apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runner-deployment-autoscaler
spec:
  scaleTargetRef:
    name: example-runner-deployment
  scheduledOverrides:
  # Override minReplicas to 100 only between 2021-06-01T00:00:00+09:00 and 2021-06-03T00:00:00+09:00
  - startTime: "2021-06-01T00:00:00+09:00"
    endTime: "2021-06-03T00:00:00+09:00"
    minReplicas: 100
  minReplicas: 1

The most basic usage of this feature is actually the second scenario mentioned above. A scheduled override without recurrenceRule is considered a one-off override that is active between startTime and endTime. In the second scenario, it overrides minReplicas to 100 only between 2021-06-01T00:00:00+09:00 and 2021-06-03T00:00:00+09:00.

A scheduled override with recurrenceRule is considered a recurring override. A recurring override is initially active between startTime and endTime, and then it repeatedly gets activated after a certain period of time denoted by frequency.

frequency can take one of the following values:

  • Daily
  • Weekly
  • Monthly
  • Yearly

By default, a scheduled override repeats forever. If you want it to repeat until a specific point in time, define untilTime. The controller creates the last recurrence of the override whose startTime is equal to or earlier than untilTime.

Do note that you should leave enough slack in untilTime, so that a delayed or offline actions-runner-controller is much less likely to miss the last recurrence. For example, you might want to set untilTime to M minutes after the last recurrence's startTime, so that actions-runner-controller being offline for up to M minutes doesn't cause it to miss the last recurrence.

Combining Multiple Scheduled Overrides:

In case you have more complex scenarios, try writing two or more entries under scheduledOverrides.

The earlier entry is prioritized higher than later entries. So you usually define one-off overrides at the top of your list, then yearly, monthly, weekly, and lastly daily overrides, as sketched below.
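
For instance, a one-off holiday override on top of a recurring weekend override might be sketched as the below (the dates are placeholders):

apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runner-deployment-autoscaler
spec:
  scaleTargetRef:
    name: example-runner-deployment
  scheduledOverrides:
  # One-off override, listed first so it takes priority
  - startTime: "2021-12-31T00:00:00+09:00"
    endTime: "2022-01-03T00:00:00+09:00"
    minReplicas: 0
  # Recurring weekend override
  - startTime: "2021-05-01T00:00:00+09:00"
    endTime: "2021-05-03T00:00:00+09:00"
    recurrenceRule:
      frequency: Weekly
    minReplicas: 0
  minReplicas: 1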

Runner with DinD

When using the default runner, the runner pod starts up two containers: runner and DinD (Docker-in-Docker). This might create issues if there's a LimitRange set on the namespace. To avoid this, you can run dockerd inside the runner container itself, as in the example below.

# dindrunnerdeployment.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-dindrunnerdeploy
spec:
  replicas: 2
  template:
    spec:
      image: summerwind/actions-runner-dind
      dockerdWithinRunnerContainer: true
      repository: mumoshu/actions-runner-controller-ci
      env: []

This also helps with resource management, as you don't need to assign resources separately to the docker and runner containers.

Additional tweaks

You can pass additional details through the runner spec. Here's an example of what you may like to do:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: actions-runner
  namespace: default
spec:
  replicas: 2
  template:
    spec:
      nodeSelector:
        node-role.kubernetes.io/test: ""

      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/test
        operator: Exists
      # Timeout after a node crashed or became unreachable to evict your pods somewhere else (default 5mins)
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 10

      repository: mumoshu/actions-runner-controller-ci
      image: custom-image/actions-runner:latest
      imagePullPolicy: Always
      resources:
        limits:
          cpu: "4.0"
          memory: "8Gi"
        requests:
          cpu: "2.0"
          memory: "4Gi"
      # true (default) = The runner restarts after running jobs, to ensure a clean and reproducible build environment
      # false = The runner is persistent across jobs and doesn't automatically restart
      # This directly controls the behaviour of `--once` flag provided to the github runner
      ephemeral: false 
      # true (default) = A privileged docker sidecar container is included in the runner pod.
      # false = A docker sidecar container is not included in the runner pod and you can't use docker.
      # If set to false, there is no privileged container and you cannot use docker.
      dockerEnabled: false
      # Optional Docker containers network MTU
      # If your network card MTU is smaller than Docker's default 1500, you might encounter Docker networking issues.
      # To fix these issues, you should set up a Docker MTU smaller than or equal to that of the outgoing network card.
      # More information:
      # - https://mlohr.com/docker-mtu/
      dockerMTU: 1500
      # Optional Docker registry mirror
      # Docker Hub has enabled rate-limiting for free plans.
      # To avoid disruptions in your CI/CD pipelines, you might want to setup an external or on-premises Docker registry mirror.
      # More information:
      # - https://docs.docker.com/docker-hub/download-rate-limit/
      # - https://cloud.google.com/container-registry/docs/pulling-cached-images
      dockerRegistryMirror: https://mirror.gcr.io/
      # false (default) = Docker support is provided by a sidecar container deployed in the runner pod.
      # true = No docker sidecar container is deployed in the runner pod, but docker can be used within the runner container instead. The image summerwind/actions-runner-dind is used by default.
      dockerdWithinRunnerContainer: true
      # Docker sidecar container resource tweak examples below, only applicable if dockerdWithinRunnerContainer = false
      dockerdContainerResources:
        limits:
          cpu: "4.0"
          memory: "8Gi"
        requests:
          cpu: "2.0"
          memory: "4Gi"
      # Any number of additional sidecar containers
      sidecarContainers:
        - name: mysql
          image: mysql:5.7
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: abcd1234
          securityContext:
            runAsUser: 0
      # workDir if not specified (default = /runner/_work)
      # You can customise this setting allowing you to change the default working directory location
      # for example, the below setting is the same as on the ubuntu-18.04 image
      workDir: /home/runner/work
      # You can mount some of the shared volumes to the dind container using dockerVolumeMounts, like any other volume mounting.
      # NOTE: in case you want to use a hostPath like the following example, make sure that Kubernetes doesn't schedule more than one runner
      # per physical host. You can achieve that by setting pod anti-affinity rules and/or resource requests/limits.
      volumes:
        - name: docker-extra
          hostPath:
            path: /mnt/docker-extra
            type: DirectoryOrCreate
      dockerVolumeMounts:
        - mountPath: /var/lib/docker
          name: docker-extra

Runner labels

To run a workflow job on a self-hosted runner, you can use the following syntax in your workflow:

jobs:
  release:
    runs-on: self-hosted

When you have multiple kinds of self-hosted runners, you can distinguish between them using labels. To do so, specify one or more labels in your Runner or RunnerDeployment spec.

# runnerdeployment.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: custom-runner
spec:
  replicas: 1
  template:
    spec:
      repository: summerwind/actions-runner-controller
      labels:
        - custom-runner

Once this spec is applied, you can observe the labels for your runner in the GitHub settings page of the repository or organization. You can now select a specific runner from your workflow by using the label in runs-on:

jobs:
  release:
    runs-on: custom-runner

Note that if you specify self-hosted in your workflow, then this will run your job on any self-hosted runner, regardless of the labels that they have.
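
If, as a sketch, you want a job to require both the self-hosted label and your custom label, list them together:

jobs:
  release:
    runs-on: [self-hosted, custom-runner]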

Runner Groups

Runner groups can be used to limit which repositories are able to use the GitHub Runner at an organization level. Runner groups have to be created in GitHub first before they can be referenced.

To add the runner to the group NewGroup, specify the group in your Runner or RunnerDeployment spec.

# runnerdeployment.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: custom-runner
spec:
  replicas: 1
  template:
    spec:
      group: NewGroup

Using EKS IAM role for service accounts

actions-runner-controller v0.15.0 or later has support for EKS IAM roles for service accounts.

As with regular pods and deployments, you first need an existing service account with the IAM role associated. Create one using e.g. eksctl. You can refer to the EKS documentation for more details.
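
As a sketch, assuming an EKS cluster named my-cluster and an existing IAM policy, such a service account could be created with eksctl (all names and the policy ARN are placeholders):

eksctl create iamserviceaccount \
  --cluster my-cluster \
  --namespace default \
  --name my-service-account \
  --attach-policy-arn arn:aws:iam::123456789012:policy/my-runner-policy \
  --approve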

Once you have set up the service account, all you need to do is add serviceAccountName and fsGroup to any pods that use the IAM-role-enabled service account.

For RunnerDeployment, you can set those two fields under the runner spec at RunnerDeployment.Spec.Template:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runnerdeploy
spec:
  template:
    spec:
      repository: USER/REPO
      serviceAccountName: my-service-account
      securityContext:
        fsGroup: 1000

Software installed in the runner image

Cloud Tooling
The project supports being deployed on the various cloud Kubernetes platforms (e.g. EKS); it does not, however, aim to go beyond that. No cloud specific tooling is bundled in the base runner; this is an active decision to keep the overhead of maintaining the solution manageable.

Bundled Software
The GitHub hosted runners include a large amount of pre-installed software packages. GitHub maintains a list in the README files at https://github.com/actions/virtual-environments/tree/main/images/linux

This solution maintains a few runner images, with latest aligning with GitHub's Ubuntu version. Older images are maintained while GitHub also provides them as an option. These images do not contain all of the software installed on the GitHub runners. They contain the following subset of packages from the GitHub runners:

  • Basic CLI packages
  • git (2.26)
  • docker
  • build-essentials

The virtual environments from GitHub contain a lot more software packages (different versions of Java, Node.js, Golang, .NET, etc.) which are not provided in the runner image. Most of these have dedicated setup actions which allow the tools to be installed on-demand in a workflow, for example actions/setup-java or actions/setup-node, as shown below.
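
For example, a workflow step that installs Node.js on demand with actions/setup-node might look like the below (the version numbers are just examples):

jobs:
  build:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-node@v2
        with:
          node-version: '14'
      - run: node --version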

If there is a need to include packages in the runner image for which there is no setup action, then this can be achieved by building a custom container image for the runner. The easiest way is to start with the summerwind/actions-runner image and install the extra dependencies directly in the Dockerfile:

FROM summerwind/actions-runner:latest

RUN sudo apt update -y \
  && sudo apt install -y YOUR_PACKAGE \
  && sudo rm -rf /var/lib/apt/lists/*

You can then configure the runner to use a custom docker image by configuring the image field of a Runner or RunnerDeployment:

apiVersion: actions.summerwind.dev/v1alpha1
kind: Runner
metadata:
  name: custom-runner
spec:
  repository: summerwind/actions-runner-controller
  image: YOUR_CUSTOM_DOCKER_IMAGE

Common Errors

invalid header field value

2020-11-12T22:17:30.693Z	ERROR	controller-runtime.controller	Reconciler error	{"controller": "runner", "request": "actions-runner-system/runner-deployment-dk7q8-dk5c9", "error": "failed to create registration token: Post \"https://api.github.com/orgs/$YOUR_ORG_HERE/actions/runners/registration-token\": net/http: invalid header field value \"Bearer $YOUR_TOKEN_HERE\\n\" for key Authorization"}

Solutions
Your base64'ed PAT token has a newline at the end; it needs to be created without a \n appended.

  • echo -n $TOKEN | base64
  • Create the secret as described in the docs using the shell and the documented flags (see the check below)
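
If in doubt, one way to check the stored token (a sketch; it prints the decoded bytes, so a trailing \n shows up as the last character) is:

kubectl get secret controller-manager -n actions-runner-system \
    -o jsonpath='{.data.github_token}' | base64 --decode | od -c | tail -n 1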

Contributing

For more details about any requirements or process, please check out Getting Started with Contributing.

The Controller
If you'd like to modify the controller to fork or contribute, I'd suggest using the following snippet for running the acceptance test:

# This sets `VERSION` envvar to some appropriate value
. hack/make-env.sh

DOCKER_USER=*** \
  GITHUB_TOKEN=*** \
  APP_ID=*** \
  PRIVATE_KEY_FILE_PATH=path/to/pem/file \
  INSTALLATION_ID=*** \
  make acceptance

Notes for Ubuntu 20.04+ users

If you're using Ubuntu 20.04 or greater, you might have installed docker with snap.

If you want to stick with snap-provided docker, do not forget to set TMPDIR to somewhere under $HOME. Otherwise kind load docker-image fails while running docker save. See https://kind.sigs.k8s.io/docs/user/known-issues/#docker-installed-with-snap for more information.
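
For example (a sketch; the exact directory is up to you):

mkdir -p "$HOME/tmp"
export TMPDIR="$HOME/tmp"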

Please follow the instructions explained in Using Personal Access Token to obtain GITHUB_TOKEN, and those in Using GitHub App to obtain APP_ID, INSTALLATION_ID, and PRIVATE_KEY_FILE_PATH.

The test creates a one-off kind cluster, deploys cert-manager and actions-runner-controller, and creates a RunnerDeployment custom resource for a public Git repository to confirm that the controller is able to bring up a runner pod with the actions runner registration token installed.

Rerunning a failed test

When one of the tests run by make acceptance fails, you'll probably want to rerun only the failed one.

This can be done with make acceptance/run by setting the combination of ACCEPTANCE_TEST_DEPLOYMENT_TOOL and ACCEPTANCE_TEST_SECRET_TYPE values that failed.

In the example below, we rerun the test for the combination ACCEPTANCE_TEST_DEPLOYMENT_TOOL=helm ACCEPTANCE_TEST_SECRET_TYPE=token only:

DOCKER_USER=*** \
  GITHUB_TOKEN=*** \
  APP_ID=*** \
  PRIVATE_KEY_FILE_PATH=path/to/pem/file \
  INSTALLATION_ID=*** \
  ACCEPTANCE_TEST_DEPLOYMENT_TOOL=helm ACCEPTANCE_TEST_SECRET_TYPE=token \
  make acceptance/run

Testing in a non-kind cluster

If you prefer to test in a non-kind cluster, you can instead run:

KUBECONFIG=path/to/kubeconfig \
  DOCKER_USER=*** \
  GITHUB_TOKEN=*** \
  APP_ID=*** \
  PRIVATE_KEY_FILE_PATH=path/to/pem/file \
  INSTALLATION_ID=*** \
  ACCEPTANCE_TEST_SECRET_TYPE=token \
  make docker-build acceptance/setup \
       acceptance/deploy \
       acceptance/tests

Development Tips

Rerunning the whole acceptance test suite from scratch on every little change to the controller, the runner, and the chart would be counter-productive.

To make your development cycle faster, use the below command to rebuild, deploy, and update all three:

# Let's assume we have all other envvars like DOCKER_USER, GITHUB_TOKEN already set.
# The below command will (re)build `actions-runner-controller:controller1` and `actions-runner:runner1`,
# load those into kind nodes, and then rerun kubectl or helm to install/upgrade the controller,
# and finally upgrade the runner deployment to use the new runner image.
#
# As helm 3 and kubectl are unable to recreate a pod when there is no tag change,
# you either need to bump VERSION and RUNNER_TAG on each run,
# or manually run `kubectl delete pod $POD` on respective pods for changes to actually take effect.
VERSION=controller1 \
  RUNNER_TAG=runner1 \
  make docker-build acceptance/load acceptance/deploy

If you've already deployed actions-runner-controller and only want to recreate pods to use the newer image, you can run:

NAME=$DOCKER_USER/actions-runner-controller \
  make docker-build acceptance/load && \
  kubectl -n actions-runner-system delete po $(kubectl -n actions-runner-system get po -ojsonpath={.items[*].metadata.name})

Similarly, if you'd like to recreate runner pods with the newer runner image, run:

NAME=$DOCKER_USER/actions-runner make \
  -C runner docker-{build,push}-ubuntu && \
  (kubectl get po -ojsonpath={.items[*].metadata.name} | xargs -n1 kubectl delete po)

Runner Tests
A set of example pipelines (./acceptance/pipelines) is provided in this repository which you can use to validate that your runners are working as expected. When raising a PR, please run the relevant suites to prove your change hasn't broken anything.

Running Ginkgo Tests

You can run the integration test suite that is written in Ginkgo with:

make test-with-deps

This will first install a few binaries required to set up the integration test environment and then run go test to start the Ginkgo tests.

If you don't want to use make, like when you're running tests from your IDE, install required binaries to /usr/local/kubebuilder/bin. That's the directory in which controller-runtime's envtest framework locates the binaries.

sudo mkdir -p /usr/local/kubebuilder/bin
make kube-apiserver etcd
sudo mv test-assets/{etcd,kube-apiserver} /usr/local/kubebuilder/bin/
go test -v -run TestAPIs github.com/summerwind/actions-runner-controller/controllers

To run Ginkgo tests selectively, set the pattern of target test names in GINKGO_FOCUS. All the Ginkgo tests that match GINKGO_FOCUS will be run.

GINKGO_FOCUS='[It] should create a new Runner resource from the specified template, add a another Runner on replicas increased, and removes all the replicas when set to 0' \
  go test -v -run TestAPIs github.com/summerwind/actions-runner-controller/controllers
