
Introduction

Concourse: the continuous thing-doer.


Concourse is an automation system written in Go. It is most commonly used for CI/CD, and is built to scale to any kind of automation pipeline, from simple to complex.

[image: booklit pipeline]

Concourse is very opinionated about a few things: idempotency, immutability, declarative config, stateless workers, and reproducible builds.

The road to Concourse v10

Concourse v10 is the code name for a set of features which, when used in combination, will have a massive impact on Concourse's capabilities as a generic continuous thing-doer. These features, and how they interact, are described in detail in the Core roadmap: towards v10 and Re-inventing resource types blog posts. (These posts are slightly out of date, but they get the idea across.)

Notably, v10 will make Concourse not suck for multi-branch and/or pull-request driven workflows - examples of spatial change, where the set of things to automate grows and shrinks over time.

Because v10 is really an alias for a ton of separate features, there's a lot to keep track of - here's an overview:

Feature                      RFC   Status
set_pipeline step            #31   ✔ v5.8.0 (experimental)
Var sources for creds        #39   ✔ v5.8.0 (experimental), TODO: #5813
Archiving pipelines          #33   ✔ v6.5.0
Instanced pipelines          #34   ✔ v7.0.0 (experimental)
Static across step 🚧        #29   ✔ v6.5.0 (experimental)
Dynamic across step 🚧       #29   ✔ v7.4.0 (experimental, not released yet)
Projects 🚧                  #32   🙏 RFC needs feedback!
load_var step                #27   ✔ v6.0.0 (experimental)
get_var step                 #27   🚧 #5815 in progress!
Prototypes                   #37   ⚠ Pending first use of protocol (any of the below)
run step 🚧                  #37   ⚠ Pending its own RFC, but feel free to experiment
Resource prototypes          #38   🙏 #5870 looking for volunteers!
Var source prototypes 🚧     —     🚧 #6275 planned, may lead to RFC
Notifier prototypes 🚧       #28   ⚠ RFC not ready
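To make the combination concrete, here is a hedged sketch of how several of these features compose for a multi-branch workflow: a static across step fanning out over branches, with set_pipeline configuring one pipeline instance per branch. Resource names and file paths are assumptions for illustration:

```yaml
# Sketch only: assumes a "repo" resource whose checkout contains
# ci/pipeline.yml, and a Concourse new enough for these experimental steps.
jobs:
- name: set-branch-pipelines
  plan:
  - get: repo
    trigger: true
  - across:
    - var: branch
      values: [main, feature-1]   # static list; the dynamic across step
                                  # would take this from a var instead
    set_pipeline: ci
    file: repo/ci/pipeline.yml
    instance_vars: {branch: ((.:branch))}
```

As branches come and go, the set of pipeline instances grows and shrinks with them, which is exactly the "spatial change" the roadmap describes.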

The Concourse team at VMware will be working on these features, however in the interest of growing a healthy community of contributors we would really appreciate any volunteers. This roadmap is very easy to parallelize, as it is comprised of many orthogonal features, so the faster we can power through it, the faster we can all benefit. We want these for our own pipelines too! 😆

If you'd like to get involved, hop in Discord or leave a comment on any of the issues linked above so we can coordinate. We're more than happy to help figure things out or pick up any work that you don't feel comfortable doing (e.g. UI, unfamiliar parts, etc.).

Thanks to everyone who has contributed so far, whether in code or in the community, and thanks to everyone for their patience while we figure out how to support such common functionality the "Concoursey way!" 🙏

Installation

Concourse is distributed as a single concourse binary, available on the Releases page.

If you want to just kick the tires, jump ahead to the Quick Start.

In addition to the concourse binary, there are a few other supported distribution formats; consult their GitHub repos for more information.

Quick Start

$ wget https://concourse-ci.org/docker-compose.yml
$ docker-compose up
Creating docs_concourse-db_1 ... done
Creating docs_concourse_1    ... done

Concourse will be running at 127.0.0.1:8080. You can log in with username test and password test.

⚠️ If you are using an M1 Mac: M1 Macs are incompatible with the containerd runtime. After downloading the docker-compose file, change CONCOURSE_WORKER_RUNTIME: "containerd" to CONCOURSE_WORKER_RUNTIME: "houdini". This feature is experimental.
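After the edit, the relevant fragment of the downloaded file would look roughly like this (the service name is inferred from the container names printed above and may differ in your copy):

```yaml
# docker-compose.yml fragment - Apple Silicon workaround sketch
services:
  concourse:
    environment:
      CONCOURSE_WORKER_RUNTIME: "houdini"   # was "containerd"
```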

Next, install fly by downloading it from the web UI and target your local Concourse as the test user:

$ fly -t ci login -c http://127.0.0.1:8080 -u test -p test
logging in to team 'main'

target saved

Configuring a Pipeline

There is no GUI for configuring Concourse. Instead, pipelines are configured as declarative YAML files:

resources:
- name: booklit
  type: git
  source: {uri: "https://github.com/vito/booklit"}

jobs:
- name: unit
  plan:
  - get: booklit
    trigger: true
  - task: test
    file: booklit/ci/test.yml

Most operations are done via the accompanying fly CLI. If you've got Concourse installed, try saving the above example as booklit.yml, target your Concourse instance, and then run:

fly -t ci set-pipeline -p booklit -c booklit.yml

These pipeline files are self-contained, maximizing portability from one Concourse instance to the next.

Learn More

Contributing

Our user base is basically everyone that develops software (and wants it to work).

It's a lot of work, and we need your help! If you're interested, check out our contributing docs.


Issues

Destroying terraform environment can fail to delete database user

When I run the terraform destroy job in the environment pipeline, it fails with Error, failed to deleteuser atc in instance <instance-name>. I think this is caused by the fact that terraform tries to delete the user before the database, which it is not allowed to do and results in this error. hashicorp/terraform-provider-google#3820 (comment)

If I run the terraform destroy job a few more times, it seems like randomly it will try to delete the database first (which I think is able to delete it successfully) but then the user is still left in the terraform state and it then errors with Error, failed to deleteuser atc in instance <instance-name>: googleapi: Error 400: Invalid request: Invalid request since instance is not running., invalid.

The way I manually fixed this as a workaround is to run terraform state list, find the terraform Google SQL user (it should look like module.baseline_database.google_sql_user.user), and then run `terraform state rm module.baseline_database.google_sql_user.user`.

But I think maybe the long term fix is to try what is suggested in the issue I linked, where we can try adding a depends_on to the google_sql_database so that it depends on the google_sql_user.
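In terraform terms, the suggested fix could look roughly like this. The resource and attribute names are assumptions based on the state addresses above; the key point is that terraform destroys resources in reverse dependency order, so making the database depend on the user forces the database to be deleted first:

```hcl
resource "google_sql_user" "user" {
  name     = "atc"
  instance = google_sql_database_instance.instance.name
}

resource "google_sql_database" "db" {
  name     = "atc"
  instance = google_sql_database_instance.instance.name

  # Destroy order is the reverse of dependency order, so this makes
  # terraform delete the database before attempting to delete the user.
  depends_on = [google_sql_user.user]
}
```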

write-secrets script not working

Running the script gets me this error:

$ ./scripts/write-secret production concourse/main/test_writing 123
downloading gs://"concourse-greenpeace"/vault/production/data.tar...
BadRequestException: 400 Invalid bucket name: '"concourse-greenpeace"'

I messed around with the script a bit and kept getting the error. Running gsutil directly did not give an error:

gsutil -q cp gs://"concourse-greenpeace"/vault/production/data.tar data.tar

If I add single quotes around the bucket URL, I get the same error:

$ gsutil -q cp 'gs://"concourse-greenpeace"/vault/production/data.tar' data.tar
BadRequestException: 400 Invalid bucket name: '"concourse-greenpeace"'
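The error suggests the script stores the bucket name with literal double quotes inside the variable's value, so they survive into the URL that gsutil sees. A minimal sketch of the distinction (variable names are hypothetical, not from the script):

```shell
# Wrong: the quotes are part of the stored value, so gsutil is asked for
# a bucket literally named "concourse-greenpeace" (including the quotes).
bad='"concourse-greenpeace"'
echo "gs://${bad}/vault/production/data.tar"
# gs://"concourse-greenpeace"/vault/production/data.tar

# Right: keep the value bare and quote the expansion instead.
good=concourse-greenpeace
echo "gs://${good}/vault/production/data.tar"
# gs://concourse-greenpeace/vault/production/data.tar
```

The direct gsutil invocation works because there the quotes are shell syntax, stripped before the URL is built, rather than data inside a variable.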

Don't use basic auth in k8s cluster

We're currently using basic auth to authenticate with our GKE clusters:

username = "concourse"
password = random_string.password.result

GKE has announced they're removing support for basic auth in Kubernetes 1.19 and above. We should switch to a more secure form of authentication - perhaps we can use the service account directly to fetch an access token? See https://registry.terraform.io/providers/hashicorp/google/latest/docs/guides/using_gke_with_terraform#using-the-kubernetes-and-helm-providers
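Following the linked guide, the token-based approach could look roughly like this; the cluster resource name is an assumption:

```hcl
# Sketch based on the linked HashiCorp guide: authenticate the kubernetes
# provider with a short-lived access token from the active credentials
# instead of a static username/password.
data "google_client_config" "default" {}

provider "kubernetes" {
  host  = "https://${google_container_cluster.ci.endpoint}"
  token = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(
    google_container_cluster.ci.master_auth[0].cluster_ca_certificate
  )
}
```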

Scale hush-house properly

Some orphaned teams are no longer using hush-house. With the changes in v7.7 / v7.8 plus the GKE cluster upgrade, hush-house's resources should be re-scaled accordingly.

Improve observability for worker uptime

We've noticed that when many PRs come in quickly, builds will often error because the worker restarts. We don't have any metrics to help us identify situations like these besides erroring builds. We should look into improving observability for worker uptime.

Update all clusters to the E2 machine type

The existing N1 tiers are costly and old. With the reduced workload and the goal of improving budget spending, the clusters should use the newer E2 tier, which delivers equal performance at lower cost.

  • hush-house
  • ci (production)
  • dispatcher
  • k8s-topgun
  • bosh (topgun)
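In terraform terms the change per cluster would be small; a hedged sketch, with the pool name, cluster reference, and exact machine sizes as assumptions:

```hcl
resource "google_container_node_pool" "workers" {
  name    = "workers"
  cluster = google_container_cluster.ci.name

  node_config {
    # Switch from the old N1 tier to the equivalent E2 size,
    # e.g. n1-standard-8 -> e2-standard-8.
    machine_type = "e2-standard-8"
  }
}
```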

Scale production properly

With fewer PRs to run, reduced topgun jobs (removed from the releasing pipelines), and fewer releasing pipelines, we should be able to cut PR workers from 3 to 1 and general workers from 8 to 4.

Below is the current worker list:

name                                  containers  platform  tags        team  state    version  age
administrators-Mac-mini.local         6           darwin    none        main  running  2.4      8d
c90d0f30-76f8-41a5-b10b-22b006356d13  10          linux     bosh        main  running  2.4      29m23s
ci-pr-worker-0                        13          linux     pr          none  running  2.4      1d
ci-pr-worker-1                        15          linux     pr          none  running  2.4      1d
ci-pr-worker-2                        18          linux     pr          none  running  2.4      1d
ci-workers-worker-0                   58          linux     none        none  running  2.4      1d
ci-workers-worker-1                   56          linux     none        none  running  2.4      1d
ci-workers-worker-2                   55          linux     none        none  running  2.4      1d
ci-workers-worker-3                   86          linux     none        none  running  2.4      1d
ci-workers-worker-4                   71          linux     none        none  running  2.4      1d
ci-workers-worker-5                   67          linux     none        none  running  2.4      1d
ci-workers-worker-6                   55          linux     none        none  running  2.4      1d
ci-workers-worker-7                   71          linux     none        none  running  2.4      1d
topgun-worker-worker-0                0           linux     k8s-topgun  none  running  2.4      1d
windows-worker-ci                     0           windows   none        main  running  2.4      50m42s

automate down migrations on downgrade

We have the ability to downgrade Concourse by pinning to an older Concourse version and triggering bump-concourse-to-* - however, if any migrations have run since that version, the down migrations won't automatically be applied on downgrade.

We'll need some sort of mechanism to do this automatically - it could be part of the terraform job that runs before the actual terraform step. Since the migrations are disruptive, we'll need to stop all running ATCs prior to running them.
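One possible shape for this is a pipeline task ahead of the terraform step; this is only a sketch, and the image tag, var names, and version wiring are all assumptions:

```yaml
# Sketch: run down migrations before terraform applies the downgrade.
# Assumes the ATCs have already been stopped and that the currently
# deployed version's binary is used to migrate the DB down.
- task: migrate-down
  config:
    platform: linux
    image_resource:
      type: registry-image
      source: {repository: concourse/concourse, tag: ((current_version))}
    params:
      TARGET_DB_VERSION: ((target_db_version))
    run:
      path: sh
      args:
      - -ec
      - |
        concourse migrate --migrate-db-to-version "$TARGET_DB_VERSION"
```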

Set up vault policy on concourse/* after initialization

Currently, after the initialize-vault job is done, Vault is still not usable until the following policy is added.

concourse.hcl

path "concourse/*"
{
  capabilities = ["read"]
}

vault policy write concourse concourse.hcl

Otherwise, production CI will get 401 permission denied errors.
