
machulav / ec2-github-runner


On-demand self-hosted AWS EC2 runner for GitHub Actions

License: MIT License

JavaScript 100.00%
github-actions github-actions-runner javascript aws ec2 github cicd self-hosted actions-runner on-demand

ec2-github-runner's People

Contributors

absalukaskosina, davinchia, dependabot[bot], eschneider1271, hajapy, jonico, jpalomaki, machulav, skyzh, tonyhutter


ec2-github-runner's Issues

Action not detecting registered runners.

Hi,

About ~10 hours ago our build started failing with the timeout error out of the blue; I double checked and made sure no infrastructure or code changes were made.

I also tried upgrading from 2.1.0 to 2.2.0 to no effect.

I confirmed networking is working fine by manually creating an instance with the same networking set up and running the same commands the action runs. I was able to register a runner with Github.

I watched the runners page and saw that runners were coming up and registering fine. They sat in Idle while the action kept polling for them. This screenshot shows my test instance, as well as the two self-hosted instances that our build runs usually start before they are terminated after the 5-minute window.
Screen Shot 2021-06-07 at 12 20 11 PM

Has anyone else observed this in the last day or two? Any ideas how to proceed with this?

Thanks!

30% of runner instantiations fail due to timeout

Run machulav/ec2-github-runner@v2
GitHub Registration Token is received
AWS EC2 instance i-0eeae9ef28dcd04e9 is started
AWS EC2 instance i-0eeae9ef28dcd04e9 is up and running
Waiting 30s for the AWS EC2 instance to be registered in GitHub as a new self-hosted runner
Checking every 10s if the GitHub self-hosted runner is registered
Checking...
.
.
.
Checking...
Error: GitHub self-hosted runner registration error
Checking...
Error: A timeout of 5 minutes is exceeded. Your AWS EC2 instance was not able to register itself in GitHub as a new self-hosted runner.

This is the error I receive for roughly 30% of my runners.
What could cause this, and how can I increase the percentage of successful instantiations?

Use latest GitHub runner build

The current release hardcodes the runner version in aws.js, and that version is older than the latest release. It would be great to fetch the latest release programmatically.
The GitHub API provides this information: curl -s -X GET 'https://api.github.com/repos/actions/runner/releases/latest'
The response could be parsed directly, with a jq script, or via octokit.
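
One way the parsing step could look is sketched below. It assumes the response of the releases/latest endpoint has already been fetched and parsed as JSON. The function name `pickRunnerAsset` is hypothetical, and the asset naming pattern (`actions-runner-<platform>-<arch>-<version>.tar.gz`) is an assumption about how the runner releases are published, not something the action guarantees.

```javascript
// Hypothetical helper: pick the runner tarball from a parsed
// `GET /repos/actions/runner/releases/latest` response.
// Assumption: assets are named actions-runner-<platform>-<arch>-<version>.tar.gz.
function pickRunnerAsset(release, platform = 'linux', arch = 'x64') {
  // Release tags look like "v2.300.0"; the asset names omit the leading "v".
  const version = release.tag_name.replace(/^v/, '');
  const wanted = `actions-runner-${platform}-${arch}-${version}.tar.gz`;
  const asset = (release.assets || []).find((a) => a.name === wanted);
  if (!asset) {
    throw new Error(`No asset named ${wanted} in release ${release.tag_name}`);
  }
  return { version, url: asset.browser_download_url };
}
```

Failing loudly when no asset matches keeps a naming change upstream from silently producing a broken download URL in the user data.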

Can we just stop instead of terminating a finished runner?

In my case, I cache a lot of intermediate compilation objects on EBS and I don't want to lose them when a workflow is finished.
So I'm wondering if we can provide an option that only stops a machine without terminating it, and also an option to support restarting from the previously stopped instance.

mode: start_existing
instances:
  - i-123
  - i-124
  - i-125

mode: stop_no_terminate
instances:
  - i-123
  - i-124
  - i-125

Related AWS APIs:

Another approach could be just to reuse the available EBS volumes that were not deleted after the previous ec2 was terminated. https://aws.amazon.com/premiumsupport/knowledge-center/deleteontermination-ebs/ I do not know which option is technically easier to implement.

Thanks.
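
A minimal sketch of the two proposed modes follows. The mode names come from the proposal above; the function name `applyInstanceMode` is hypothetical, and `ec2` is assumed to be an AWS SDK v2-style client whose `startInstances` and `stopInstances` calls return an object with a `.promise()` method. The action does not currently offer these modes.

```javascript
// Sketch of the proposed modes, assuming an AWS SDK v2-style EC2 client.
// Stopping (rather than terminating) keeps the attached EBS volumes.
async function applyInstanceMode(ec2, mode, instanceIds) {
  const params = { InstanceIds: instanceIds };
  switch (mode) {
    case 'start_existing':
      return ec2.startInstances(params).promise();
    case 'stop_no_terminate':
      return ec2.stopInstances(params).promise();
    default:
      throw new Error(`Unsupported mode: ${mode}`);
  }
}
```

One caveat with this approach: user-data scripts normally run only on the first boot, so a restarted instance would need some other way to register the runner again.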

Support multiple regions/availability zones if an instance type cannot be started

We use instance types that require a lot of resources. Because of this, it is sometimes impossible to start an instance of a given type in a particular region and AZ.
It would be great if this action could iterate over a list of regions/AZs (with their subnets and security groups) and start the instance wherever its type is available.
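
The fallback behaviour described above could be sketched roughly as follows. Everything here is hypothetical: `launchWithFallback`, the shape of a placement (`region`, `subnetId`, `securityGroupId`), and the caller-supplied `launch` function that would wrap the real EC2 call.

```javascript
// Try each placement in order until one launch succeeds.
// `launch` is a caller-supplied async function (hypothetical) that starts an
// instance for the given placement and resolves with its instance ID.
async function launchWithFallback(placements, launch) {
  const errors = [];
  for (const placement of placements) {
    try {
      const instanceId = await launch(placement);
      return { instanceId, placement };
    } catch (error) {
      // Remember why this placement failed, then move on to the next one.
      errors.push(`${placement.subnetId}: ${error.message}`);
    }
  }
  throw new Error(`No placement succeeded: ${errors.join('; ')}`);
}
```

This sketch falls through on any error. A stricter version could move on only for capacity errors and rethrow everything else, so that misconfiguration is not hidden behind retries.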

Can't execute job on created EC2

Hi colleagues,

After the instance is created, the job cannot run on the EC2 machine.
Instances can be created and terminated in my VPC, and the security group allows outbound traffic on port 443.
What could be the cause, and where should I start troubleshooting?

Thanks, Artem.

AWS EC2 instance i-xxxxxxxx is up and running
Waiting 30s for the AWS EC2 instance to be registered in GitHub as a new self-hosted runner
Checking every 10s if the GitHub self-hosted runner is registered
Checking...
Checking...
Checking... (x N times)
Error: GitHub self-hosted runner registration error
Error: A timeout of 5 minutes is exceeded. Your AWS EC2 instance was not able to register itself in GitHub as a new self-hosted runner.

Support multiple security group IDs

Unless I'm missing something it doesn't look like this supports multiple security group IDs when starting instances. Is that right? If so, would it be possible to add that feature?
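
For illustration, here is a sketch of how a multi-value input could be turned into launch parameters. `SecurityGroupIds` is the array-valued parameter of the EC2 RunInstances call; the helper names and the accepted input formats (comma or whitespace separated, or a JSON array) are assumptions, not the action's current behaviour.

```javascript
// Hypothetical parser: accept "sg-1, sg-2" or '["sg-1","sg-2"]'.
function parseSecurityGroupIds(input) {
  const trimmed = input.trim();
  const ids = trimmed.startsWith('[')
    ? JSON.parse(trimmed)
    : trimmed.split(/[\s,]+/);
  return ids.map((id) => String(id).trim()).filter(Boolean);
}

// Return a copy of the launch parameters with all security groups attached.
function withSecurityGroups(params, input) {
  return { ...params, SecurityGroupIds: parseSecurityGroupIds(input) };
}
```

A single ID still works with this parser, so an existing `security-group-id` value would not need to change.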

Hello World Job Completes With a Failed Result on the EC2 Instance

I have followed all the instructions that are mentioned in the README.md but the hello world job results in failure for me.

Here is the output from the EC2 Serial Console:
image

Here is the output from the GitHub action (it stays stuck here):
image

Here is what's in my .github/workflows/aws-ec2-job.yml file:

name: aws-ec2-job

on: pull_request

jobs:

  start-runner:
    name: Start self-hosted EC2 runner
    runs-on: ubuntu-latest
    outputs:
      label: ${{ steps.start-ec2-runner.outputs.label }}
      ec2-instance-id: ${{ steps.start-ec2-runner.outputs.ec2-instance-id }}

    steps:

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_REGION }}

      - name: Start EC2 runner
        id: start-ec2-runner
        uses: machulav/ec2-github-runner@v2
        with:
          mode: start
          github-token: ${{ secrets.REPO_SCOPE_PAT }}
          ec2-image-id: ${{ secrets.EC2_IMAGE_ID }}
          ec2-instance-type: t3.xlarge
          subnet-id: ${{ secrets.SUBNET_ID }}
          security-group-id: ${{ secrets.SECURITY_GROUP_ID }}

  aws-ec2-job:

    name: run the benchmarks on the runner
    needs: start-runner # required to start the main job when the runner is ready

    runs-on: ${{ needs.start-runner.outputs.label }} # run the job on the newly created runner

    steps:
      - name: Hello World
        run: echo 'Hello World!'

  stop-runner:

    name: Stop self-hosted EC2 runner
    needs:
      - start-runner # required to get output from the start-runner job
      - aws-ec2-job # required to wait when the main job is done

    runs-on: ubuntu-latest

    if: ${{ always() }} # required to stop the runner even if the error happened in the previous jobs

    steps:

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_REGION }}

      - name: Stop EC2 runner
        uses: machulav/ec2-github-runner@v2
        with:
          mode: stop
          github-token: ${{ secrets.REPO_SCOPE_PAT }}
          label: ${{ needs.start-runner.outputs.label }}
          ec2-instance-id: ${{ needs.start-runner.outputs.ec2-instance-id }}

Spot instances support

I really like the idea of on-demand runners.
Spot support would make it even more appealing :)

Runner software on running EC2 instance does not register with GitHub

Hello, I'm having an issue with the "start" mode of this action, as the runner software on my EC2 instance does not register with GitHub. I have a feeling that this issue is caused by the instance's lack of internet connection as the action hangs at "Checking every 10 seconds to see if the runner is registered".
Screen Shot 2021-08-12 at 4 27 36 PM
The instance is started and stopped correctly by the action. This is my first use of this action at work, so I have not set it up before.

I've taken the following steps so far:

  • Confirmed that the security group attached to the instance allows all outbound traffic
  • Added a NAT gateway to the VPC/subnet containing the instance
  • Pointed the route table of the VPC/subnet at the NAT gateway for internet traffic
  • Regenerated the GitHub token used in this repository

Has anyone else encountered this issue, if so, how did you resolve it?

Add storage as an option.

By default, the instance gets the default storage capacity for its instance type. Can we add an option to define the storage capacity?
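
A sketch of what such an option could translate to is shown below. `BlockDeviceMappings`, `Ebs.VolumeSize` and `Ebs.DeleteOnTermination` are parameters of the EC2 RunInstances call; the helper name `withRootVolume` is hypothetical, and the default device name is an assumption because the root device name depends on the AMI.

```javascript
// Hypothetical helper: request a larger root volume at launch.
// Assumption: the AMI's root device is /dev/xvda; some AMIs use /dev/sda1.
function withRootVolume(params, sizeGiB, deviceName = '/dev/xvda') {
  if (!Number.isInteger(sizeGiB) || sizeGiB <= 0) {
    throw new Error(`Invalid volume size: ${sizeGiB}`);
  }
  return {
    ...params,
    BlockDeviceMappings: [
      {
        DeviceName: deviceName,
        // Delete the volume together with the instance to avoid leftovers.
        Ebs: { VolumeSize: sizeGiB, DeleteOnTermination: true },
      },
    ],
  };
}
```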

Allow the runner to be run by a non root user

In relation to #64, can functionality be added to allow the runner to be run by a non-root user? We have a requirement for this, which I'm sure is not all that unusual: we need a non-root user to do build tasks. We are not using Docker to build, but running directly on the EC2 instance.

I understand that the startup script is run as root, but there must be a way to tell the script to run the GitHub runner service as a different user.
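
One possible shape for this is sketched below: the root-run startup script hands the runner directory to the target user and then configures and starts the runner through `sudo -u`. The helper name, the `/opt/actions-runner` directory, and the assumption that the runner has already been unpacked there are all hypothetical; `config.sh` and `run.sh` are the runner's own scripts, used here with the `--url`, `--token` and `--labels` flags.

```javascript
// Hypothetical helper: build startup-script lines that configure and start
// the runner as a non-root user. Assumes the runner is already unpacked in `dir`.
function runnerLinesForUser(user, { repoUrl, token, label }, dir = '/opt/actions-runner') {
  const inner = [
    `cd ${dir}`,
    `./config.sh --url ${repoUrl} --token ${token} --labels ${label}`,
    './run.sh',
  ].join(' && ');
  return [
    // The startup script itself runs as root, so hand the directory over first.
    `chown -R ${user} ${dir}`,
    // -H sets HOME to the target user's home directory.
    `sudo -H -u ${user} bash -c '${inner}'`,
  ];
}
```

For example, `runnerLinesForUser('ubuntu', { repoUrl, token, label })` yields two lines that could be appended to the user-data script in place of the root-level `config.sh` and `run.sh` calls.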

Specifying timeout(s)

I may be mistaken but I didn't find any options to specify a timeout.

It seems to me that 2 types of timeouts could be implemented.

  • Stopping the job after x amount of time. This could be achieved in a run statement, but having an argument like "job-timeout" or such could be useful.

  • Terminating the EC2 instance after x amount of time. This would reduce the risk of paying for an unused EC2 instance if some edge case leaves it running. Having the instance terminate itself might be a way to implement this; an "ec2-timeout" argument might then be appropriate.
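
The second idea could be sketched as below, assuming a systemd-based image where `shutdown -h +N` schedules a halt and returns immediately. `InstanceInitiatedShutdownBehavior` and `UserData` are RunInstances parameters; the helper name and the way the user-data lines are passed in are hypothetical. For the first idea, GitHub Actions already has a job-level `timeout-minutes` key.

```javascript
// Hypothetical helper: make the instance terminate itself after a maximum
// lifetime, regardless of whether the stop job ever runs.
function withSelfTermination(params, userDataLines, maxLifetimeMinutes) {
  const [shebang, ...rest] = userDataLines;
  // Schedule the halt first so it is armed even if a later line hangs.
  const lines = [shebang, `shutdown -h +${maxLifetimeMinutes}`, ...rest];
  return {
    ...params,
    // An OS-level shutdown then terminates the instance instead of stopping it.
    InstanceInitiatedShutdownBehavior: 'terminate',
    UserData: Buffer.from(lines.join('\n')).toString('base64'),
  };
}
```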

GitHub Actions results don't match the manual equivalent.

I have spent a few days troubleshooting this issue, and it seems to be an issue with the GitHub runner.

The action involves

  • Pull code
  • Build docker images (frontend and backend)
  • Run docker images (frontend and backend)
  • Run RPA script to virtually interact with the full-stack application.

The following is a simplification of the code. Many packages are pre-installed in the AMI (Docker, database client, etc.).

name: RPA deep test
on:
  push:
    branches:
      - dev
jobs:
  start-runner:
    name: Instanciate temporary EC2 instance
    runs-on: ubuntu-latest
    outputs:
      label: ${{ steps.start-ec2-runner.outputs.label }}
      ec2-instance-id: ${{ steps.start-ec2-runner.outputs.ec2-instance-id }}
    steps:
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.RPA_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.RPA_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.RPA_REGION }}
      - name: Start EC2 runner
        id: start-ec2-runner
        uses: machulav/ec2-github-runner@v2
        with:
          mode: start
          github-token: ${{ secrets.RPA_PERSONAL_ACCESS_TOKEN }}
          ec2-image-id: ${{ secrets.AMI_IMAGE }}
          ec2-instance-type: ${{ secrets.INSTANCE_TYPE }}
          subnet-id: ${{ secrets.SUBNET }}
          security-group-id: ${{ secrets.SECURITY_GROUP }}
  do-the-job:
    name: Perform tests
    needs: start-runner # required to start the main job when the runner is ready
    runs-on: ${{ needs.start-runner.outputs.label }} # run the job on the newly created runner
    steps:
      - name: Enable docker permissions
        id: docker-permission
        run: sudo chmod 666 /var/run/docker.sock

      - name: Clone repo
        id: git-clone-repo
        run: git clone https://${{secrets.GH_USER}}:${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}@github.com/foo-project/foo.git --branch=${GITHUB_REF##*/}

      - name: Build frontend image
        id: build-frontend-image
        run: cd foo && docker build -t frontend-test:latest frontend/.

      - name: Build backend image
        id: build-backend-image
        run: cd sq-web-app && docker build -t backend-test:latest backend/.

      - name: Run frontend image
        id: run-frontend-image
        run: docker run --name frontend -p 3000:80 --env BACKEND=http://localhost:8080/api -d frontend-test:latest

      - name: Run backend image
        id: run-backend-image
        run: docker run --name backend -d -p 8080:8080  backend-test:latest

      - name: Run tests
        id: robot-test
        run: |
          cd sq-web-app
          sudo apt install -y python3-pip
          pip3 install robotframework
          pip3 install robotframework-browser
          python3 -m Browser.entry init
          python3 -m robot test/0*.robot

  stop-runner:
    name: Terminate EC2 Instance
    needs:
      - start-runner # required to get output from the start-runner job
      - do-the-job # required to wait when the main job is done
    runs-on: ubuntu-latest
    if: ${{ always() }} # required to stop the runner even if the error happened in the previous jobs
    steps:
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.RPA_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.RPA_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.RPA_REGION }}
      - name: Stop EC2 runner
        uses: machulav/ec2-github-runner@v2
        with:
          mode: stop
          github-token: ${{ secrets.RPA_PERSONAL_ACCESS_TOKEN }}
          label: ${{ needs.start-runner.outputs.label }}
          ec2-instance-id: ${{ needs.start-runner.outputs.ec2-instance-id }}

When this is executed in GitHub actions, the robot script cannot find the backend server.

When I manually execute these commands with identical settings, everything works flawlessly.

Why does the GitHub runner not execute commands the same way as if I were to SSH into an instance and execute the commands manually? (as you would expect)

Error: GitHub Registration Token receiving error Error: HttpError: Not Found Error: Not Found

I have configured this runner for use but when it tries to run the start-runner job it exits with

Error: GitHub Registration Token receiving error
Error: HttpError: Not Found
Error: Not Found

It seems to be falling over here:

  try {
    const response = await octokit.request('POST /repos/{owner}/{repo}/actions/runners/registration-token', config.githubContext);
    core.info('GitHub Registration Token is received');
    return response.data.token;
  } catch (error) {
    core.error('GitHub Registration Token receiving error');
    throw error;
  }
}

so I guess that /repos/{owner}/{repo}/actions/runners/registration-token is the URL which is Not Found.

I've tried ensuring that my Access Token is correct and it seems to be.

I can't imagine that github would be blocking access to their own API from the runners.

Does anyone know what I've missed?

git: command not found

image
My runner started successfully and terminated successfully:
image

In my setup, I also pass the following user data:
sudo yum update -y &&
sudo yum install docker -y &&
sudo yum install git -y &&
sudo systemctl enable docker

Support for GCE/GCP runner

👋 Internally we have a somewhat similar approach for a GCE/GCP self-hosted runner, and we are considering open-sourcing it. Do you think it would be a good idea to make this action "cloud agnostic" and add support for a GCE-based self-hosted runner here, or would you suggest making a separate action?

Error: A timeout of 5 minutes is exceeded. Please ensure your EC2 instance has access to the Internet.

I'm sure that this is a configuration issue on my end but I'm not sure what the problem is.

Link to my yml:
https://github.com/choderalab/super-duper-guacamole/blob/27b75e91103c5804e1d056908cb4c97110b5a7eb/.github/workflows/self-hosted-test.yml

Link to github action log:
https://github.com/choderalab/super-duper-guacamole/runs/2273873236?check_suite_focus=true

I made the inbound traffic rules wide open to help troubleshoot this:

Inbound rules

Type | Protocol | Port range | Source | Description (optional)
-- | -- | -- | -- | --
All traffic | All | All | 0.0.0.0/0 | wide open for testing
All traffic | All | All | ::/0 | wide open for testing

Outbound rules

Port range | Protocol | Destination | Security groups
-- | -- | -- | --
443 | TCP | 0.0.0.0/0 | GitHubActionSelfHostedRunner

I'm able to SSH onto the EC2 instance that gets spun up -- any other ideas on how to test?

libicu60 needed for arm

I needed to run this on a t4g.micro instance with Amazon Linux 2:

yum install libicu60

Without it, the config step fails with the error:

Cannot get symbol ucol_setMaxVariable_50 from libicui18n
Error: /lib64/libicui18n.so.50: undefined symbol: ucol_setMaxVariable_50

Use pre-installed GitHub runner on the AMI

From @jdraymon in #16:

Would it make sense to have a version fixed in our AMI instead? or maybe a download URI with a default? My organization is a bit wary about downloading things off of the web for every build. Maybe a separate enhancement?

Let's keep it separate and collect feedback from the others.

Why is my GitHub registration hanging?

As you can see in this failed workflow run, my task is able to create an EC2 image, but then fails to connect with GitHub to register.

I made a token with the repo credentials and set it as a secret. So I don't think that's it.

I worry that I've messed up my VPC or security group configuration. Are there any screenshots or more detailed examples of how to set them up? Here's how I made my security group. Is this right?
Screenshot from 2022-09-07 15-25-18

Runners not cleaned up from the list of GitHub runners

When using the standard ec2 runner setup from the documentation it seems as if the GitHub runners aren't being cleaned up. The returned message is GitHub self-hosted runner with label <random id> is not found, so the removal is skipped but when I look at the list of runners, I see the runner with that label as offline. The AWS side seems to clean up properly. In any case, thanks for this great tool!

Hitting Github API Limit on creating ephemeral runners.

Our OSS repo has grown over the last few months to the point where we are actually seeing some issues with the GitHub API rate limits when registering a runner: https://github.com/airbytehq/airbyte/runs/5406131335?check_suite_focus=true#step:3:244

I'm wondering if anyone has seen this issue in their builds and has any recommendations on how to tackle it.

I've added a PAT from another user as a temporary solution for the time being. Thanks!
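
As one possible mitigation, a caller could back off until the limit resets instead of polling into it. The sketch below reads the `x-ratelimit-remaining` and `x-ratelimit-reset` headers that the GitHub REST API sends with its responses; the function name is hypothetical and nothing like this exists in the action today.

```javascript
// Hypothetical helper: given the response headers of a GitHub REST API call,
// return how many milliseconds to wait before the next request.
function msUntilRateLimitReset(headers, nowMs = Date.now()) {
  const remaining = Number(headers['x-ratelimit-remaining']);
  const resetEpochSeconds = Number(headers['x-ratelimit-reset']);
  // Missing or malformed headers: do not wait.
  if (Number.isNaN(remaining) || Number.isNaN(resetEpochSeconds)) return 0;
  // Requests are still available in the current window.
  if (remaining > 0) return 0;
  // x-ratelimit-reset is expressed in seconds since the Unix epoch.
  return Math.max(0, resetEpochSeconds * 1000 - nowMs);
}
```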

[Feature Request] - Configurable timings whilst waiting for runner

It would be great if we could configure the quietPeriodSeconds and retryIntervalSeconds values. I see my EC2 instance (running Amazon Linux 2) is usually booted up and registered in GitHub much earlier than detected in the script.

Would it be possible to expose these as configurable settings within the action so we can adjust the timings? 🙂
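
A rough sketch of a wait loop with these values exposed as options is shown below. The option names follow the ones mentioned above plus a hypothetical `timeoutMinutes`; the defaults mirror the 30s, 10s and 5-minute values printed in the action's log output. `isRegistered` is a caller-supplied async check, and this is not the action's actual implementation.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll `isRegistered` until it resolves to true or the timeout elapses.
async function waitForRunner(isRegistered, options = {}) {
  const {
    quietPeriodSeconds = 30,
    retryIntervalSeconds = 10,
    timeoutMinutes = 5,
  } = options;
  // In this sketch the deadline includes the initial quiet period.
  const deadline = Date.now() + timeoutMinutes * 60 * 1000;
  await sleep(quietPeriodSeconds * 1000);
  while (Date.now() < deadline) {
    // An error thrown by the check propagates and ends the loop immediately.
    if (await isRegistered()) return true;
    await sleep(retryIntervalSeconds * 1000);
  }
  throw new Error(`Runner did not register within ${timeoutMinutes} minutes`);
}
```

With a fast-booting image, something like `waitForRunner(check, { quietPeriodSeconds: 5, retryIntervalSeconds: 3 })` would detect the runner sooner, at the cost of more API calls.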

Support EC2 launch templates

Continuation of the discussion started by @jpalomaki in #62 (comment)


Related issues that may be covered using the EC2 launch template approach:

Feature | Issue | Can be covered by EC2 launch template
-- | -- | --
Re-use runner | #4 | ?
Spot instances | #5 | yes
Parallel processing | #8 | ?
Public IP | #52 | yes
Custom storage | #53 | yes, block device mappings, e.g. larger root volume
Re-use storage | #59 | ?
Multiple regions/AZ support | #60 | ?
Multiple security groups | #68 | yes
Tags | #3 (implemented) | yes
IAM role | #6 (implemented) | yes, via instance profile
EC2 keypair | #74 | yes

Not exiting when runner creation errors out

I had been pushing frequently while iterating on a workflow and ran into an issue which I think is due to hitting the usage limit of 1000 api calls /hr /repo.

Screen Shot 2021-01-21 at 10 44 05 PM

Notice in the image, that though the code reached the timeout, this error continued to repeat and the job didn't exit even after the 10 minute timeout (I manually canceled the workflow). The instance was up and running, but perhaps not in time. It did not ever succeed to register, so the limit must have been hit before it could try to register.

So, is something in the reject/exit path not quite working? Also, it seems likely that the interval used to check whether the runner has been created should be increased or made configurable.

Stop not working when start was cancelled by the concurrency cancellation rule

To ensure that no more than one instance is running across several consecutive pushes, the idea is to cancel any previous, still-running workflows on a new push, so we added a global setting:

concurrency: # cancel previous build on a new push
  group: ${{ github.ref }} # https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#github-context
  cancel-in-progress: true

Now when I test this, stop quite often fails, leaving the instance running, which is very expensive!

Run machulav/ec2-github-runner@v2
Error: Error: Not all the required inputs are provided for the 'stop' mode
Error: Not all the required inputs are provided for the 'stop' mode
Error: TypeError: Cannot read property 'mode' of undefined
Error: Cannot read property 'mode' of undefined

Here is the log from the start job:

 with:
    mode: start
    github-token: ***
    ec2-image-id: ami-03540b272db1624b7
    ec2-instance-type: p3.8xlarge
    security-group-id: sg-f2a4e2fc
    subnet-id: subnet-b7533b96
    aws-resource-tags: [
    {"Key": "Name", "Value": "ec2-github-runner"},
    {"Key": "GitHubRepository", "Value": "bigscience-workshop/Megatron-DeepSpeed"}
  ]
  
  env:
    AWS_DEFAULT_REGION: us-east-1
    AWS_REGION: us-east-1
    AWS_ACCESS_KEY_ID: ***
    AWS_SECRET_ACCESS_KEY: ***
GitHub Registration Token is received
AWS EC2 instance i-038eeed014c994b48 is started
Error: The operation was canceled.

So it looks like the start step didn't set the outputs it was supposed to set because it was cancelled.

Here is the full workflow for context:
https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/7c636d7555e915f1f426984172f73840b2168313/.github/workflows/main.yml

If there are other solutions I'm all ears.

Thank you!
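
One possible workaround, sketched here under assumptions, is a cleanup step that does not depend on the start job's outputs and instead looks for running instances carrying the runner tag. The function below filters a DescribeInstances-shaped response (`Reservations[].Instances[]` with `Tags`, `State.Name` and `LaunchTime`); the function name and the age threshold are hypothetical, and the returned IDs would still have to be passed to a terminate call.

```javascript
// Hypothetical helper: find tagged runner instances that have been running
// longer than `maxAgeMinutes`, based on a DescribeInstances-shaped response.
function findStaleRunnerInstances(response, { tagKey, tagValue, maxAgeMinutes, now = new Date() }) {
  const cutoff = now.getTime() - maxAgeMinutes * 60 * 1000;
  const stale = [];
  for (const reservation of response.Reservations || []) {
    for (const instance of reservation.Instances || []) {
      const tagged = (instance.Tags || []).some(
        (tag) => tag.Key === tagKey && tag.Value === tagValue
      );
      const running = Boolean(instance.State) && instance.State.Name === 'running';
      const old = new Date(instance.LaunchTime).getTime() < cutoff;
      if (tagged && running && old) stale.push(instance.InstanceId);
    }
  }
  return stale;
}
```

With the tags from the log above, the call could use `tagKey: 'Name'` and `tagValue: 'ec2-github-runner'`. The age threshold keeps the cleanup from terminating a runner that a newer workflow run has just started.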

incorrect environment on runner, unable to checkout repo

When I try to run actions/checkout on my EC2 runner without supplying my personal access token (PAT), I receive an error: "remote: Repository not found." When I supply my PAT via the token input, the checkout is successful.

  do-the-job:
    name: Do the job on the runner
    needs: start-runner # required to start the main job when the runner is ready
    runs-on: ${{ needs.start-runner.outputs.label }} # run the job on the newly created runner
    steps:
      - name: clone repo
        uses: actions/checkout@v3
        with:
          token: ${{secrets.GH_PERSONAL_ACCESS_TOKEN}} #without this line it fails

On a GitHub-hosted runner I don't need to supply the token, I think because the action finds it at ${{ github.token }}, so my self-hosted runner must not be receiving it. I inspected the github context on the runner with echo and everything looks fine, except that I obviously can't confirm the token (it's censored).

Others seemingly are able to use actions/checkout on their EC2 runner without supplying a token, example: here.

This may be a problem with how I prepared my runner EC2 image however I'm not sure how to diagnose this.

Is there a way to choose an SSH key pair?

I'd like to be able to SSH into the machine manually to verify things. Is there a way to specify which SSH pair the EC2 instance uses?

Sorry if this was answered in the doc, I didn't see anything in there.

How can I tell the ec2 instance to switch to a non-root user via this runner?

All examples I have seen use a docker image, which has a user parameter.

But I'm not using Docker. How do I then tell the action runner to run as a non-root user (ubuntu in this case)?

I tried many different ways, but no matter what I do the current user remains root

      - name: Who Am I?
        run: |
          sudo su - ubuntu
          whoami
      - name: Who Am I?
        run: |
          sudo -u ubuntu bash
          whoami
      - name: Who Am I?
        shell: bash -l {0}
        run: |
          su - ubuntu
          whoami

I can't find anything on the EC2 side that will let me change the default user. When I connect via ssh it gives the root@ip address.

I have everything already installed/configured under ubuntu.

If this is not the right place to ask, please let me know where I can find this information; I have spent many hours searching and can't find anything.

Thank you!
