
ecs-conex's Introduction

[deprecated] ecs-conex

⚠️ This repository is deprecated and will no longer be maintained ⚠️.

If you’re looking for alternatives for building Docker images for AWS ECS, we recommend checking out AWS CodeBuild.

What is ecs-conex?

ECS Container Express is a continuous integration service for building Docker images and uploading them to ECR repositories in response to push events to Github repositories.

Dockerfile

The Dockerfile contains the commands required to build an image, or snapshot of your repository, when you push to GitHub. This file is located in the root directory of your application code. If you are using private npm modules, your Dockerfile might require some additional commands, as listed here.

ECR Repository

ecs-conex will create one ECR repository for each Github repository. Each time a push is made to the Github repository, a Docker image will be built for the most recent commit in the push, and the image will be tagged with the SHA of that commit. If the most recent commit also corresponds to a git tag, the tag's name will be applied as an additional image tag in the ECR repository.
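For illustration, that tagging scheme might look roughly like the following in shell. This is not the actual ecs-conex code; ${repo}, ${repo_uri}, and ${after} are hypothetical variables for the local image name, the ECR repository URI, and the pushed commit's SHA.

# tag and push the image for the most recent commit in the push
docker tag "${repo}:${after}" "${repo_uri}:${after}"
docker push "${repo_uri}:${after}"

# if that commit is also a git tag, push the tag name as an additional image tag
if git_tag=$(git describe --tags --exact-match "${after}" 2> /dev/null); then
  docker tag "${repo}:${after}" "${repo_uri}:${git_tag}"
  docker push "${repo_uri}:${git_tag}"
fi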

Usage

You only need to run ecs-conex's watch.sh script once to subscribe your repository to the ecs-conex webhook. For more information about associating these resources, see the Getting started documentation.

Documentation

ecs-conex's People

Contributors

amishas157, brendanmcfarland, dnomadb, emilymdubois, ianshward, ingalls, jqtrde, norchard, perrygeo, rclark, sbma44, tadiseshan, vsmart, yhahn


ecs-conex's Issues

ecs-conex Node 4 EoL

Hello @taraadiseshan ! I’m a bot from your friendly neighborhood security team! Our systems have detected a lambda function in this repository running Node 4 (or older). Node 4 officially went end-of-life on April 30 of this year, and is no longer supported or receiving security patches from the Node community. As such, we’d like to get this repo updated at your earliest convenience.

Our systems aren’t perfect, so it’s possible this issue was created in error:

  • if there is nothing in this repo running outdated Node, please leave a comment to that effect and close the issue
  • if this repo is deprecated, no longer in use, or not deployed on our infrastructure, please leave a comment to that effect and close the issue
  • if you are not the right person to contact about this codebase, please leave a comment and tag in the most appropriate person or team you know of to handle it

Thank you so much for your help! If you have any questions or concerns, don’t hesitate to reach out!

Best,
~ Versioning Looker-Outer Bot 3001™


Do not overwrite existing images

We should have safeguards in place so that images cannot be overwritten. Later overwrites could lead to unexpected versions of dependencies that are "more modern" than the image should be.
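A minimal sketch of such a safeguard, assuming hypothetical ${reponame} and ${after} variables in the worker script; aws ecr describe-images fails when the requested tag does not exist in the repository.

# refuse to overwrite: check whether the tag already exists in ECR before pushing
if aws ecr describe-images \
    --repository-name "${reponame}" \
    --image-ids imageTag="${after}" > /dev/null 2>&1; then
  echo "image ${reponame}:${after} already exists in ECR, refusing to overwrite"
  exit 0
fi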

Implement a forced timeout

Let's force a timeout of 60 minutes. Presently, it's possible for workers to hang indefinitely, and that can lead to a stack that can't process any more messages.
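One way this could be enforced, sketched with GNU timeout (variable names hypothetical; the 60-minute figure comes from this issue; timeout exits with status 124 when the limit is hit):

timeout 3600 docker build --no-cache --tag "${repo}:${after}" "${tmpdir}"
status=$?
if [ "${status}" -eq 124 ]; then
  echo "build of ${repo}:${after} exceeded the 60 minute limit" >&2
fi
exit "${status}"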

Use cached layers if repository has a yarn lockfile

Right now, images are built with the --no-cache flag. Using cached layers is a way to significantly decrease build times, and long build times are one of the biggest bummers about our current CI flow.

One of the arguments in favor of --no-cache is that without it, npm install with semver version identifiers in package.json could lead to images that use old cached layers for node.js dependencies. This could lead to unexpected (and very non-deterministic) mismatches between your local environment and your production environment.

Yarn's use of a lockfile that pins node.js dependency versions and is committed to the repo avoids this misstep, and makes me wonder if we could drop the --no-cache flag when there's a yarn lockfile in the repo.

However there are still a few other questions to weigh against such a decision:

  • You would only get build-time caching benefits sometimes and not all the time. This depends on whether your conex worker task lands on an EC2 that still has the cached layers from a previous build.

  • Due to the above, you may want to try to keep cached layers lying around on the EC2s for longer, and this leads to disk space management problems.

It may be worth exploring this anyway, without adjusting anything about how we have our EC2s clean up old images/layers. If we can demonstrate a significant benefit for projects with hefty node.js dependency trees or huge unix package dependencies, it may be worthwhile.
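For reference, a sketch of what the conditional flag might look like, assuming the clone lives in a hypothetical ${tmpdir}:

# keep --no-cache unless the repository pins its dependencies with a yarn lockfile
cache_flag="--no-cache"
if [ -f "${tmpdir}/yarn.lock" ]; then
  cache_flag=""
fi
# ${cache_flag} intentionally unquoted so an empty flag disappears
docker build ${cache_flag} --tag "${repo}:${after}" "${tmpdir}"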

cc @springmeyer @scothis @mcwhittemore @mapsam @GretaCB

--no-cache all the builds

We should always build fresh images, without relying on a previous cache. This will make conex builds take longer, but it will ensure that conex builds are in sync with builds that may be done locally or on other parts of a CI pipeline (e.g. Travis or Circle).

  • remove the code which attempts to download a before image
  • add code that refuses a build if the after image already exists in ECR
  • specify --no-cache on all builds
  • (victory lap) this allows us to always remove the image we just built after it has been uploaded to ECR (see #29)

cc @emilymdubois @jakepruitt

Build multiple images

On AWS, a "task" can run containers from one or more Docker images. I can definitely imagine reasons why a Github repository might contain multiple Dockerfiles to build more than one image. It would be worth considering how this scenario would be handled by this repo.

Large push payloads

InvalidParameterException: Container Overrides length must be at most 8192

A large push payload that hits the webhook will be rejected when watchbot attempts to run the task. The Lambda proxy ought to pluck out the parts of the commit message that ecs-conex needs in order to circumvent this situation.
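A sketch of the kind of filtering the proxy could do, expressed here with jq for brevity; the exact field list ecs-conex needs is an assumption based on the log output quoted elsewhere in these issues (ref, after, pusher, repository name and owner).

# hypothetical jq filter: keep only the fields the worker appears to use and drop the rest
jq '{
  ref: .ref,
  after: .after,
  deleted: .deleted,
  repository: {name: .repository.name, owner: {name: .repository.owner.name}},
  pusher: {name: .pusher.name}
}' payload.json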

Conex support for multiple images in the same repo

Conex currently only builds one container per repo.

We have a repo, we'll call it tabby-cat, which defines multiple containers and uses docker-compose as well as custom bash scripts to build and push the multiple images to ECR. The multiple images are tagged with:

tabby-cat:<git-sha>-rails
tabby-cat:<git-sha>-cgimap
tabby-cat:<git-sha>-orcd

where each of those sub-images is defined in a different part of the services section of the docker-compose.yml file.

It would be great if we could rewrite conex to support multiple containers.

Plan for how to do this

The outputs of docker-compose build are multiple images with the format tabby-cat_rails, tabby-cat_cgimap, etc., where tabby-cat is the name of the folder and rails is the name of the section of the docker compose file. Using some string transformations, I think we could re-tag these images with the <repo>:<gitsha>-<subimage> format and push all of them to ECR.
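A sketch of that re-tagging pass, using the example service names above and hypothetical ${registry} and ${gitsha} variables:

# re-tag and push each docker-compose service image after `docker-compose build`
for service in rails cgimap orcd; do
  docker tag "tabby-cat_${service}" "${registry}/tabby-cat:${gitsha}-${service}"
  docker push "${registry}/tabby-cat:${gitsha}-${service}"
done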

cc/ @Yuffster @rclark

Can webhooks work?

API Gateway allows you to create an API key for your endpoint, and then expects that key to be provided as the x-api-key header in POST requests to the endpoint.

Github, on the other hand, allows you to specify a secret, and then uses that to provide an HMAC digest of the payload as the X-Hub-Signature header in the POST request.

I emailed Github support to ask whether they have any intention of ever supporting custom headers. An alternative would be to:

  • configure API Gateway to reject any request lacking the X-Hub-Signature header
  • set up the Lambda function to check the X-Hub-Signature header against the expected value for the payload provided, and proceed if acceptable (see the sketch below).
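A sketch of the second bullet, shown here with openssl rather than the Lambda runtime. Variable names are hypothetical; GitHub sends the signature as sha1=<hex HMAC of the raw request body> computed with the shared secret.

# recompute the HMAC-SHA1 of the raw payload and compare it with the X-Hub-Signature header
expected="sha1=$(openssl dgst -sha1 -hmac "${webhook_secret}" < payload.json | sed 's/^.* //')"
if [ "${expected}" != "${x_hub_signature}" ]; then
  echo "X-Hub-Signature mismatch, rejecting request" >&2
  exit 1
fi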

cc @zmully

Maybe not --quiet

docker build --no-cache --quiet ${args} --tag ${repo}:${after} ${tmpdir}

Suppressing the build output makes logs less useful to developers trying to determine why a particular image failed to build.

production not on master/ unhandled promise rejection

While working through the eng standards inventory (#141), I noticed that the latest master is not deployed to production; production is still on 112e8ae0.

I started the update and it failed - no healthy tasks could start up because of this error:

(node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): Error: maxJobDuration: not a number

next steps

  • what introduced this regression?
  • deploy a safe fix

cc @mapbox/assembly-line

Load AWS credentials from environment

Right now, during a build ecs-conex attempts to read AWS credentials from the EC2 metadata service. It should read credentials from the environment first, then fall back to the metadata service if there are none in the environment.
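A sketch of that ordering, assuming the standard AWS environment variable names and the plain IMDSv1 metadata endpoints:

# prefer credentials from the environment, only fall back to the instance metadata service
if [ -n "${AWS_ACCESS_KEY_ID}" ] && [ -n "${AWS_SECRET_ACCESS_KEY}" ]; then
  echo "using credentials from the environment"
else
  role=$(curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/)
  creds=$(curl -s "http://169.254.169.254/latest/meta-data/iam/security-credentials/${role}")
fi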

Cleanup strategy

Currently, after building enough images, the host that ecs-conex is running on will run out of disk space and be 🙅 blocked from processing any more jobs. (aws/amazon-ecs-agent#349 (comment))

  • Stopgap: maybe we clean up an image immediately after the job that built it is complete (see the sketch after this list)
  • Ideally: keeping images around after a build will speed up subsequent builds. Some kind of LRU-like behavior around the images kept would be nice.
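A sketch of the stopgap, with hypothetical variable names; the before image is only removed if it was actually pulled.

# remove the image (and any pulled before-image) as soon as the push succeeds
docker push "${repo_uri}:${after}" && \
  docker rmi -f "${repo_uri}:${after}" "${repo_uri}:${before}" 2> /dev/null || true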

Do not retry

If a build fails, it should either

  • never be retried, or
  • be retried some (small) number of times

log-in to private ECR before building image

Sometimes the base images used by Dockerfiles are located in private ECR repositories. ecs-conex needs an option to log into private registries as needed to pull base images before building the Dockerfile.
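A sketch of such a pre-build login step; the account ID and region are placeholders, and older AWS CLIs would use `aws ecr get-login` instead of `get-login-password`.

# log in to the private registry that hosts the base image before running docker build
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin "${base_image_account}.dkr.ecr.us-east-1.amazonaws.com"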

ECR image limits

The documented limit to the number of images in an ECR repository is 1000. The ecr:ListImages request provides no insight into which images are older and which are newer -- in fact there's no clear indication of the sorting order.

How will we manage repository size?
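One observation: ecr:DescribeImages, unlike ecr:ListImages, does return an imagePushedAt timestamp, so a cleanup job could sort on it. A sketch (pagination ignored):

# list the tags of the ten oldest images in the repository, oldest first
aws ecr describe-images --repository-name "${reponame}" \
  | jq '.imageDetails | sort_by(.imagePushedAt) | .[0:10] | map(.imageTags)'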

cc @yhahn @emilymdubois @emilymcafee

Optionally, save image tarballs on S3

ecs-conex should allow the user to provide a list of S3 buckets, perhaps spanning several regions. If provided, each build job should

  • make a tagged version of the image as <service name>:<git sha>
  • docker save the tagged image and gzip the file
  • upload it into the specified buckets:
s3://<bucket name>/images/<service name>/<git sha>.tar.gz
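A sketch of the per-bucket save-and-upload step described above (variable names hypothetical):

# export the tagged image, compress it, and copy it into each configured bucket
docker save "${service}:${gitsha}" | gzip > "${gitsha}.tar.gz"
for bucket in ${buckets}; do
  aws s3 cp "${gitsha}.tar.gz" "s3://${bucket}/images/${service}/${gitsha}.tar.gz"
done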

cc @jakepruitt @zmully

Invalid signatures in Github payloads

@scothis has noticed that some Github payloads sent when a PR is merged have been getting rejected by conex with a 403. I checked and Github is actually providing an incorrect signature in the POST that it sends. I've filed a support request with Github for this.

It is unclear if this problem is repository-specific or not, but in case anyone else encounters it, the current workaround is to push a subsequent empty commit directly to the master branch of your repo. This will fire a webhook with the correct signature.

cc @yhahn @emilymcafee @jakepruitt @emilymdubois

Issues with --force and squashed merge

It appears that push events related to rewriting the git tree can cause conex to fail on this line. The current behavior is to delete the event from the queue and send a failure notification.

Is this the right approach? I'd like to catch one of these push payloads and understand why they refer to commits that are no longer part of the tree.


When performing a "Squash and Merge" from a PR on Github, conex receives two push payloads. One works, the other doesn't. For example:

[Tue, 17 May 2016 00:13:28 GMT] [ecs-conex] [ecd7dcfa-c378-411a-83e1-01c547b4f14a] processing commit 0000000000000000000000000000000000000000 by rclark to refs/heads/twice of mapbox/ecs-watchbot
[Tue, 17 May 2016 00:13:28 GMT] [ecs-conex] [ecd7dcfa-c378-411a-83e1-01c547b4f14a] Cloning into '/mnt/data/xj4q70'...
[Tue, 17 May 2016 00:13:29 GMT] [ecs-conex] [ecd7dcfa-c378-411a-83e1-01c547b4f14a] fatal: reference is not a tree: 0000000000000000000000000000000000000000

We should be able to silently ignore payloads with an .after sha of all zeros. I've also noticed that such a payload has .head_commit: null and .commits: [].
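A sketch of the guard this suggests (hypothetical placement near the top of the worker script):

# ignore push payloads that only rewrite or delete refs and point at no commit
if [ "${after}" = "0000000000000000000000000000000000000000" ]; then
  echo "after sha is all zeros (ref deleted or rewritten), nothing to build"
  exit 0
fi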

More readme

Todo:

  • how do I use it on an ongoing basis for several repositories?
  • how do I use bootstrap.sh to set up ecs-conex in my own account?
  • how do I use manual.ecs-conex.sh, and why would I want to?

Watching private repositories

The stack is provided a GitHub token that has permission to clone private repositories. watch.sh should accept something like a github user or team name, to make sure that the owner of the token being used is listed as a collaborator with (at least) read permission to the repository.
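A sketch of a check watch.sh could run, using the GitHub collaborator-permission endpoint; ${token_user} is a hypothetical variable for the login that owns GithubAccessToken.

# confirm the token's owner can at least read the repository before subscribing it
permission=$(curl -s -H "Authorization: token ${GithubAccessToken}" \
  "https://api.github.com/repos/${owner}/${repo}/collaborators/${token_user}/permission" \
  | jq -r '.permission')
if [ "${permission}" = "none" ] || [ "${permission}" = "null" ]; then
  echo "${token_user} cannot read ${owner}/${repo}; conex will not be able to clone it" >&2
  exit 1
fi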

related #2

Eng standards inventory

Required Elements

If any elements in the below list are not checked, this repo will fail standards compliance.

  • Not running node 4 or below
  • Has at least some test coverage?
  • Has a README?
  • Has no hard-coded critical secrets like API keys?

Rubric

  • 1 pt Is in Version Control/Github ✅ (free points)
  • 1-2 pt node version:
    • 2 pt Best: running node 8+ 🏅
    • 1 pt Questionable: node 6
    • 0 pt Not ok: running node4 or below ⛔️
  • 1 pt No hard-coded config parameters?
  • 1 pt No special branches that need to be deployed?
  • 1 pt All production stacks on latest master?
  • 1 pt No hard-coded secrets like API keys?
  • 1 pt No secrets in CloudFormation templates that don’t use [secure]?
  • 1 pt CI enabled for repo?
  • 1 pt Not running Circle CI version 1? (Point awarded if using Travis)
  • 1 pt nyc integrated to show test coverage summary?
  • 1-3 pt test coverage percentage from nyc?
    • 3 pt High coverage: > 90%
    • 2 pt Moderate coverage: between 75 and 90% total coverage
    • 1 pt 0 - 74% test coverage
  • 1-2 pt evidence of bug fixes/edge cases being tested?
    • 2 pt Strong evidence/several instances noted
    • 1 pt Some evidence
  • 1 pt no flags to enable different functionality in non-test environments?
  • 1 pt Has README?
  • 1-2 pt README explains purpose of a project and how it works to some detail?
    • 2 pt High (but appropriate) amount of detail about the project
    • 1 pt Some detail about the project documented, could be more extensive
  • 1 pt README contains dev install instructions?
  • 1 pt README contains CI badges, as appropriate?
  • 1-2 pt Code seems self-documenting: file/module names, function names, variables? No redundant comments to explain naming conventions?
    • 2 pt Strongly self-documented code, little to no improvements needed
    • 1 pt Some evidence of self-documenting code
  • 1 pt No extraneous permissions in IAM roles?
  • 1 pt Stack has alarms for AWS resources used routed to PagerDuty? (CPU utilization, Lambda failures, etc.)
  • 1 pt Stack has other appropriate alarms routed to PagerDuty? (Point awarded if no other alarms needed)
  • 1 pt Alarms documented?
  • master branch protected?
    • 1 pt PRs can only be merged if tests are passing?
    • 1 pt PRs must be approved before merging?
  • 2 pt BONUS: was this repo covered in a deep dive at some point?

Total possible: 30 points (+2 bonus)
Grading scale:

Point Total     Qualitative Description                                  Scaled Grade
28+ points      Strongly adheres to eng. standards                       5
23-27 points    Adheres to eng. standards fairly well                    4
18-22 points    Adheres to some eng. standards                           3
13-17 points    Starting to adhere to some eng. standards                2
9-12 points     Following a limited number of eng. standard practices    1
< 9 points      Needs significant work, does not follow most standards   0

Repo grade: 3 (21 points)

cc @mapbox/assembly-line

reveal GH authentication failure in `watch`

If GithubAccessToken is set, but the token doesn't have enough permissions, the error from Github ends up hidden here, causing the next line to fail with an unhelpful error.

I wonder if we could inspect the response from the Github API to make sure it hasn't errored before proceeding.
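A sketch of what inspecting that response could look like, assuming the hook is created with curl and a hypothetical ${hook_payload} body; the GitHub API returns an id on success and a message on failure.

# create the webhook, then fail loudly if the API did not return a hook id
response=$(curl -s -H "Authorization: token ${GithubAccessToken}" \
  --data "${hook_payload}" \
  "https://api.github.com/repos/${owner}/${repo}/hooks")
if [ -z "$(echo "${response}" | jq -r '.id // empty')" ]; then
  echo "failed to create webhook: $(echo "${response}" | jq -r '.message')" >&2
  exit 1
fi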

Make conex a transform stream

ecs-conex should announce when it has completed a build. This announcement could be an SNS message to a topic that conex controls, or maybe a custom cloudwatch event?

The message body could include

  • the details of the commit message
  • the repository URIs for the images that it dropped onto ECR
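A sketch of the SNS option, with a hypothetical topic name and message body:

# announce a completed build on an SNS topic owned by the conex stack
aws sns publish \
  --topic-arn "arn:aws:sns:us-east-1:${accountid}:ecs-conex-builds" \
  --subject "ecs-conex build complete" \
  --message "{\"after\":\"${after}\",\"imageUri\":\"${repo_uri}:${after}\"}"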

cc @jakepruitt @zmully

ecs-conex and tags

A moment ago I committed be77d88, tagged it as v0.2.0 and then git push && git push --tags.

These two pushes resulted in two conex jobs, but the ECR repository did not end up with an image tagged with the git sha -- only the v0.2.0 image exists. My hunch is that because the two images are identical, ECR doesn't retain both? This could lead to a botched deploy if the deploy tool assumes that it can use the sha in the stack's GitSha parameter.

I'm not sure if there's a way to mitigate this, maybe we just need to document the behavior?

cc @emilymdubois @yhahn @emilymcafee

ecs-conex check not appearing in github pull request checks

I've set up ecs-conex successfully: it shows up as an installed service under my repo's "Integrations & services" settings, and Docker images are getting built successfully on each commit.

The only issue is that the PR check is missing from the GitHub UI.

@arunasank noted that we should look at

ecs-conex/utils.sh, lines 42 to 50 at 8014ea9:

function github_status() {
  local status=$1
  local description=$2
  curl -s \
    --request POST \
    --header "Content-Type: application/json" \
    --data "{\"state\":\"${status}\",\"description\":\"${description}\",\"context\":\"ecs-conex\"}" \
    ${status_url} > /dev/null
}

cc @rclark @arunasank - not urgent unless this is impacting other users. Thanks!

Tests

I think a good test suite would

  • build the image from the Dockerfile in this repo, then
  • run the image, building an image for another, real Github repository
  • [maybe] run the cloudformation template, triggering a repository build via webhook

cc @karenzshea

"Install docker binary matching EC2 version" (sic)

From https://github.com/mapbox/ecs-conex/blob/master/Dockerfile#L15-L17:

# Install docker binary matching EC2 version
RUN curl -sL https://get.docker.com/builds/Linux/x86_64/docker-1.11.1.tgz > docker-1.11.1.tgz
RUN tar -xzf docker-1.11.1.tgz && cp docker/docker /usr/local/bin/docker && chmod 755 /usr/local/bin/docker

Except the Docker binary on the EC2 is a somewhat moving target. It'll probably be safest if conex enforces version consistency at runtime, checking the docker binary it's going to run and getting access (somehow?) to the host's docker version, to make sure that the docker binary within conex is going to talk to a docker service/socket that it's expecting.
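A sketch of such a runtime check, using docker version's Go-template output to compare the bundled client with the host daemon it talks to:

# warn when the client binary baked into conex does not match the host's docker daemon
client=$(docker version --format '{{.Client.Version}}')
server=$(docker version --format '{{.Server.Version}}')
if [ "${client}" != "${server}" ]; then
  echo "docker client ${client} does not match host daemon ${server}" >&2
fi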

cc @mapbox/platform

Issues with rapid pushes

push or pull ${accountid}.dkr.ecr.us-east-1.amazonaws.com/${reponame} is already in progress

If this occurs, conex will exit, watchbot will retry the job, and an error notification will be sent. Desired behavior would be a silent retry (exit code 4).

Change manifest format on ECR images

Ref: #97 (comment)

When you push and pull images to and from Amazon ECR, your container engine client (for example, Docker) communicates with the registry to agree on a manifest format that is understood by the client and the registry to use for the image.

When you push an image to Amazon ECR with Docker version 1.9 or older, the image manifest format is stored as Docker Image Manifest V2 Schema 1. When you push an image to Amazon ECR with Docker version 1.10 or newer, the image manifest format is stored as Docker Image Manifest V2 Schema 2.

When you pull an image from Amazon ECR by tag, Amazon ECR returns the image manifest format that is stored in the repository, but only if that format is understood by the client. If the stored image manifest format is not understood by the client (for example, if a Docker 1.9 client requests an image manifest that is stored as Docker Image Manifest V2 Schema 2), Amazon ECR converts the image manifest into a format that is understood by the client (in this case, Docker Image Manifest V2 Schema 1).

Next actions

  • Update the Docker version on ecs-conex
  • Set up a stack to pull and re-push existing images in ECR - this will be a one-time task for existing images.

cc/ @mapbox/platform

Provide Better Error Message for Timeout

It seems like there is a default limit of 20 minutes for building Docker images set here.

For more involved Docker images this is often not enough. I didn't get any error message for hitting the timeout; all I saw was the Docker build getting killed partway through building the image.

Is it possible to issue a timeout-reached error message?
